h a l f b a k e r y
Warm and Fussy
When checking servers, it's really tough to get a good sense
of whether or not a disk is running out of space. On one
server 0.1 GB of free space may be
plenty, while on another less than 10 GB of free space may be
a cause for alarm. Looking at the numbers at a single
point in time is meaningless.
What's needed is a log of
what the usage was a week, a month or a year ago. In
other words, one server may consume that last GB of
drive space in days while another may not consume it in
a thousand years. It all depends on the usage patterns.
I propose a simple utility that takes a snapshot of the disk
space every week. By looking at the growth of usage in
the past, it will be able to extrapolate into the future and
take a guess at how much time you have left before you
have to consider archiving old data, or simply upgrading
the hard drives.
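The extrapolation described above could be sketched roughly as follows. This is only one way to do it, assuming a straight-line (least-squares) fit through the weekly snapshots; the function name `days_until_full` and the (date, used GB) snapshot format are made up for illustration.

```python
from datetime import date

def days_until_full(snapshots, capacity_gb):
    """Estimate days left before the disk fills up.

    snapshots: list of (date, used_gb) pairs, oldest first (hypothetical format).
    Returns None when there is no upward trend to extrapolate.
    """
    if len(snapshots) < 2:
        return None
    t0 = snapshots[0][0]
    xs = [(d - t0).days for d, _ in snapshots]   # days since first snapshot
    ys = [used for _, used in snapshots]          # used space in GB
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return None                               # all snapshots on the same day
    # Least-squares slope: growth in GB per day.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    if slope <= 0:
        return None                               # usage is flat or shrinking
    intercept = mean_y - slope * mean_x
    day_full = (capacity_gb - intercept) / slope  # day on which the fit hits capacity
    return max(0.0, day_full - xs[-1])            # measured from the latest snapshot

weekly = [
    (date(2024, 1, 1), 100.0),
    (date(2024, 1, 8), 107.0),
    (date(2024, 1, 15), 114.0),
    (date(2024, 1, 22), 121.0),
]
print(days_until_full(weekly, capacity_gb=200.0))
```

A straight line is the simplest possible choice here; it ignores any seasonal pattern in the usage, so the result is a rough guess rather than a forecast.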
This could be very useful for management too. You could
say things like: "this hard drive will last us for 1-3 years,
given the current growth rate".
A separate utility would do, but of course ideally this would
be integrated into the output of commands such as "df".
||This could be done fairly easily using cron to call a script that logs date and remaining capacity. Then make a chart of the logfile with a spreadsheet or whatever.
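A minimal sketch of the logging half of that suggestion, assuming Python is available on the server. The function name `log_snapshot`, the CSV layout and the file paths in the comments are all hypothetical; `shutil.disk_usage` is the standard-library call that reports total, used and free bytes.

```python
import csv
import shutil
from datetime import date

def log_snapshot(mount_point, logfile, today=None):
    """Append one row (ISO date, total, used, free; sizes in bytes) to a CSV log."""
    usage = shutil.disk_usage(mount_point)
    row = [(today or date.today()).isoformat(), usage.total, usage.used, usage.free]
    with open(logfile, "a", newline="") as f:
        csv.writer(f).writerow(row)
    return row

# Example call (paths are placeholders):
#   log_snapshot("/", "/var/log/disk_usage.csv")
#
# A weekly crontab entry could then look something like this,
# assuming the call above is saved in a script at a path of your choosing:
#   0 6 * * 1  python3 /path/to/disk_log.py
```

The resulting file opens directly in a spreadsheet for charting.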
||Sound recording software often displays time remaining until the disk is full. It tends to be months these days.
||You should examine your logs, daily.
||//You should examine your logs, daily.//
||Dear god but that's a dreadful German habit.
||The place I'm currently working at has periodic "spikes" around the beginning and end of the month, and mid-month, during the ides. In addition to this 15-day pulse, there's also heightened traffic over quarter-ends and a yearly spike at year-end. Then there are weekly archiving tasks, tidying up, and moving data off the main SAN and into an archive area, the idea being that we try to maintain a level of storage that remains (roughly) the same throughout the year. Taking on new work includes measuring expected transaction volumes against the storage plan, and umming and ahhing about whether we can make it all fit without keeling over.
||A utility that does all this and spits out a single expected ETC (estimated time to crash) would be cute, but may be a shade simplistic for some applications.
||[bigsleep] - Maybe it would help if you could order a batch of mixed quality drives. Some which have passed QA with flying colours, some which just scraped through and some which fell a little short. This would help the clustering of failures and help the manufacturers shift more drives.
||//maybe it would help if you could order a batch of mixed
quality drives. Some which have passed QA with flying
colours, some which just scraped through and some
which fell a little short. This would help the clustering of
failures and help the manufacturers shift more drives.//
||[wagster] You need to post this as a separate idea so it can
garner the croissants it deserves.
||Better still, RAID systems could read the serial number or manufacture date (perhaps via hdparm) of each drive in an array, and throw an exception to alert the user when the serial number ranges indicate that the drives were made too close to each other.
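As a rough illustration of that check, here is a sketch that assumes the manufacture dates have already been obtained somehow and are passed in as a plain dictionary. Nothing here talks to a real RAID controller or to hdparm; the names `check_array`, `DrivesTooSimilarError` and the 30-day threshold are invented for the example.

```python
from datetime import date
from itertools import combinations

class DrivesTooSimilarError(Exception):
    """Raised when drives in one array were manufactured too close together."""

def check_array(manufacture_dates, min_gap_days=30):
    """manufacture_dates: dict mapping a drive name to its manufacture date.

    Raises DrivesTooSimilarError if any two drives were made fewer than
    min_gap_days apart (the threshold is an arbitrary assumption).
    """
    too_close = [
        (a, b)
        for (a, da), (b, db) in combinations(sorted(manufacture_dates.items()), 2)
        if abs((da - db).days) < min_gap_days
    ]
    if too_close:
        raise DrivesTooSimilarError(
            f"made within {min_gap_days} days of each other: {too_close}"
        )

# Passes silently: the two drives were made months apart.
check_array({"sda": date(2023, 1, 1), "sdb": date(2023, 6, 1)})
```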
||S.M.A.R.T. will tell you Power-On Hours among other things.
||I voted yes for this idea. However, and I hate when people say this, "they already have that."
And it actually works very well.
At my company everyone on a team gets weekly stats as part of our boring conference calls.
You can never predict everything. I mean, an old server that rarely gets used could suddenly be assigned to a new department.
In our server room, we have the analog equivalent of AI -- real humans.
And speaking of humans, the real problem is convincing your boss's boss's boss that the volume is 90% full and/or will go bad soon and we need a new one. Rinse and repeat at 95% full, and 99% too.
When the drive is 100% full and/or has crashed, that's when they will actually approve the order for a new one, make you wait 3 weeks for it, and when it gets there it's the wrong one...