149

There is a server that is used from 4:30 in the morning until about 22:00.

Should it be turned off? I think that, since it is a server, it won't have a problem staying on, but serious professors are telling me that it is dangerous and that the hard disk can fail within 2 years. The server owner believes that his old server, which has been running since 1995 with no backup and a single hard disk (if the hard disk fails he is screwed), has had no problems because he used to turn it off at night.

What do you think about this?

Now it has a RAID 1 array, an external hard disk backup, and several full hard disk backups on DVD and over the internet.

voretaq7
GorillaApe
  • 62
    +1 because even though this is a question that I don't think any of us have ever even entertained the thought of, clearly some people do and it needs to be answered. – Mark Henderson Apr 11 '11 at 00:40
  • 3
    So you save around 6hrs of running to your power bill. Are you going to come in at 4am or so to switch it on and ensure it boots up, and then switch it off again at 10pm every day? – hookenz Apr 11 '11 at 01:36
  • 4
    Can't you use some power settings to spin down the hard drives when they're not in use? Same benefit in terms of wear, but you don't have to turn the whole machine off. – Brendan Long Apr 11 '11 at 04:37
  • 36
    Professors of English Lit? – Iain Holder Apr 11 '11 at 06:28
  • 8
    Your professors are not running servers. They are running workstations with network daemons. – Bacon Bits Apr 11 '11 at 04:00
  • 28
    I would not recommend taking advice from people who are not making backups, especially not if they have been doing this for 15+ years. Saving electrical energy is the only argument for switching it off. Lifetime will be reduced due to heating up and cooling down every day. – Malte Apr 11 '11 at 10:41
  • Thanks for the attention! The server is in an industrial environment, which means that power consumption doesn't matter, as energy-hungry machines are working 24 hours per day there. – GorillaApe Apr 11 '11 at 19:59
  • 1
    Guess I should ask stupid questions more often! – Christian Apr 11 '11 at 22:53

12 Answers

155

To use a car analogy: a taxi can do over 500,000 kilometers before it needs an engine rebuild. The reason is that taxis are running practically 24/7, and once a car's engine is up to temperature, the amount of wear it receives while running is greatly reduced.

A computer is kinda the same. The majority of the "wear" on parts can happen when the server is booting up. Just attach an ammeter to your computer and turn it on. When it starts up, the power it draws climbs very high, and then it settles down once all the disks have spun up and the processor is initialised. Also, think about how much disk activity the server undergoes during boot-up versus when it's working. Chances are the disk access from booting the OS is fairly solid, sustained activity, whereas when the OS is running, unless it's a very heavy database server (I'm guessing not), the disks will most likely stay fairly idle. If there's any time it's going to fail, chances are it will be on boot-up.
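
If you want to see the boot-versus-steady-state difference on your own hardware rather than take my word for it, here is a minimal sketch of my own (assuming a Linux box, since it reads the kernel's /proc/diskstats counters) that samples sector reads over a short window; run it just after boot and again mid-day and compare:

    # diskstats_sample.py -- compare disk read activity over a short window (Linux only).
    # /proc/diskstats field 6 (index 5) is "sectors read" since boot for each device.
    import time

    def sectors_read():
        totals = {}
        with open("/proc/diskstats") as f:
            for line in f:
                fields = line.split()
                name, sectors = fields[2], int(fields[5])
                if not name.startswith(("loop", "ram")):   # ignore pseudo-devices
                    totals[name] = sectors
        return totals

    WINDOW = 10                      # seconds; adjust to taste
    before = sectors_read()
    time.sleep(WINDOW)
    after = sectors_read()

    for disk in sorted(before):
        delta = after.get(disk, before[disk]) - before[disk]
        print(f"{disk}: {delta} sectors read in {WINDOW}s")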

Turning your server on and off is a stupid idea. Not to mention that most servers can take upwards of 2-5 minutes just to get past the BIOS checks, so it's a huge amount of wasted time too.


2018 Update: Given that most computers are now essentially entirely solid-state, this answer may no longer be as accurate as it once was. The taxi analogy doesn't really suit today's modern servers. That said, you still generally don't turn servers off.

Mark Henderson
  • 16
    Could you add some links to research on this topic? – mafu Apr 11 '11 at 15:33
  • 16
    @mafutrct: The WikiBook on [HD Failure](http://en.wikibooks.org/wiki/Minimizing_Hard_Disk_Drive_Failure_and_Data_Loss#Power_cycling_control) and the [Google Labs Study on HD Failure](http://labs.google.com/papers/disk_failures.pdf) show inconclusive evidence that power cycles impact drive life, and total uptime matters less than batch/luck of the draw. Also, obvious things like physical trauma have the most effect. – Chris S Apr 11 '11 at 19:13
  • 4
    How do they fuel the taxi if it's always running? That's illegal in most jurisdictions. – Lightness Races in Orbit Apr 11 '11 at 22:12
  • 8
    @Tomalak - well, the point of the analogy was that the taxi's engine is up to temperature and less wear occurs during this time. In the 90 seconds it takes to re-fuel a taxi the engine does not have a chance to cool down, and thus the wear is still diminished. On a computer, it is "cooled" (for the purposes of the analogy) instantly, and each start is a "cold" start. – Mark Henderson Apr 11 '11 at 22:55
  • @MarkHenderson: You missed my hilarious pedanticism. – Lightness Races in Orbit Apr 11 '11 at 23:00
  • 1
    @Tomalak Perhaps you should pedant more clearly then. – rfelsburg Apr 12 '11 at 01:12
  • @Mark Henderson, this is a great answer +1..., what if your taxi was a VM? – Fergus Apr 12 '11 at 04:08
  • 6
    @Fergus - well, this only applies to physical boxes. If you have a VM, feel free to power it on/off as much as you like, but you won't get any lifespan saving or power saving out of it. – Mark Henderson Apr 12 '11 at 04:19
  • So how does this one go if they are suspending to RAM rather than shutting down? In that case there is no need for excessive hard drive & CPU usage on resume. – intuited Apr 12 '11 at 07:05
  • @intuited - honestly I've never seen a server suspended (or hibernated for that matter) so I don't have any evidence anecdotal or otherwise. I would guess that the wear would be minimal, as most devices are just going into low power mode. – Mark Henderson Apr 12 '11 at 08:01
  • @int Also, the hibernate file has to be loaded from disk again, so it's probably not that much of an advantage I guess? – mafu Apr 12 '11 at 11:38
  • @mafutrct: Yes, that would apply if hibernating. Note that I said *suspending to RAM*. – intuited Apr 12 '11 at 12:09
  • +1 Mark, I had no idea that the car analogy would aptly reflect wear on computer hardware. I had always assumed the contrary. Thanks for the enlightenment! – msanford Apr 13 '11 at 13:39
  • Bah, what the heck. The site needs another gold badge awarded ;) – squillman Apr 13 '11 at 21:27
  • @squill - I'll be the first to admit that this answer does *not* warrant +100! – Mark Henderson Apr 13 '11 at 21:37
  • @Mark hahaha :) – squillman Apr 13 '11 at 21:38
  • I stole your analogy for a similar question on skeptics... http://skeptics.stackexchange.com/questions/2067/is-leaving-a-computer-running-better-than-turning-it-on-and-off – Supercereal Apr 14 '11 at 21:19
  • @Kyle - no worries – Mark Henderson Apr 14 '11 at 23:02
  • This analogy sustained solely by pure conjecture is a pretentious claim to know – and teach – how things really work. IMO it is entirely meaningless without proper factual evidence; as [rfelsburg](http://serverfault.com/questions/258064/should-servers-be-turned-off-at-night/258096#258096) noted in his answer, no such evidence is known to exist. – davide Dec 23 '15 at 00:52
  • On a component level, there may be additional wear and tear caused by repetitive heating and cooling of components if the server is powered on and off each day, regardless of solid state or not. My assumption (and I may be wrong) is that a consistent temperature, whether higher or not, is better for the longevity of the components than rapid warming/cooling. My reasoning is that there may be thermal fatigue / tensile stresses on the internals of the components, aggravated over time by the two extremes. Googling has provided some reasonable justification for this but nothing empirical. – JaredW82 Oct 19 '18 at 18:36
71

Turning the server off and on every day would likely cause it to fail faster than leaving it on.

HostBits
  • 3
    Most likely due to the disk stress @ boot; also, I had a server that happily ran for years, then refused to come up at restart. Turns out the boot disk was slowly degrading, with the MBR completely unreadable - but the MBR was only read when booting, so no-one noticed. Thankfully, the disk *completely* died only after a frantic rush to recover whatever wasn't backed up yet. – Piskvor left the building Apr 11 '11 at 13:00
  • I doubt that this happens with switching off once a day; even enterprise disks are rated for 300,000 power-on cycles in their technical specifications. The problem comes from power management, which does it every 15 minutes. – Lothar Sep 16 '14 at 15:15
  • I don't understand how an answer that merely states someone's opinion, without any reasoning whatsoever, received 70 upvotes. – Bassie-c Jun 06 '19 at 14:18
52

The only thing I can see that's even close to right about what you've been told is that drives can fail within 2 years. They can in fact fail at any time. I'm sure most of us have received at least one brand new drive that was DOA. On average server drives will last anything from about 3 years upwards, with 10 or 20 years not being too uncommon. That doesn't mean any individual drive won't fail a whole lot sooner.
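
Since a drive can die at any age, a more useful habit than a power-off schedule is keeping an eye on what the drive itself reports. Here is a rough sketch of my own, assuming smartmontools 7 or newer (for JSON output) is installed and the script runs as root; /dev/sda is only an example device:

    # smart_summary.py -- print a few SMART attributes relevant to the power-cycle debate.
    # Assumes smartmontools >= 7 is installed and the script runs with root privileges.
    import json
    import subprocess

    DEVICE = "/dev/sda"   # example device; substitute your own

    result = subprocess.run(
        ["smartctl", "--attributes", "--json", DEVICE],
        capture_output=True, text=True, check=False,
    )
    data = json.loads(result.stdout)

    interesting = {"Power_On_Hours", "Power_Cycle_Count", "Reallocated_Sector_Ct"}
    for attr in data.get("ata_smart_attributes", {}).get("table", []):
        if attr["name"] in interesting:
            print(f'{attr["name"]}: {attr["raw"]["string"]}')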

Servers (meaning machines with proper server-grade components) are designed to run continuously. There is no reason to shut one down at night, but there are some very good reasons to leave it running. Nighttime, or whatever other time is "quiet" for a given system, is the time to run all the maintenance and automation.

For example, backups are best taken when there is little or no user activity. This helps to ensure backups are consistent. Sure, there are ways around this, but why not give your backups every chance of success when there is nothing to lose by doing so?
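
Purely as an illustration (the paths below are hypothetical, not taken from the question), the quiet-hours job can be as small as this, scheduled from cron or any other scheduler during the window when nobody is using the system:

    # nightly_backup.py -- minimal sketch of a quiet-hours backup job.
    # SOURCE and DEST are placeholders; point them at real data and a real backup target.
    import tarfile
    from datetime import date
    from pathlib import Path

    SOURCE = Path("/srv/data")        # hypothetical directory to protect
    DEST = Path("/mnt/backup")        # hypothetical external-disk mount point

    DEST.mkdir(parents=True, exist_ok=True)
    archive = DEST / f"data-{date.today():%Y%m%d}.tar.gz"

    with tarfile.open(archive, "w:gz") as tar:
        tar.add(SOURCE, arcname=SOURCE.name)   # dated, compressed snapshot

    print(f"Wrote {archive} ({archive.stat().st_size} bytes)")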

Someone running a "server" with a single disk and no backup is a fool, not an admin. The only reason he got away with it is sheer dumb luck. It had absolutely nothing to do with shutting the machine down at night.

Massimo
John Gardeniers
  • I'm going to take a wild guess here and say that the "admin" who's not taking backups is also not using proper server-grade components. – intuited Apr 12 '11 at 07:07
23

Servers are meant to operate 24x7; shutting servers down overnight is highly atypical. Server hard drives are designed to be more reliable than desktop drives, and now that you have backups and RAID 1, you will not suffer data loss if one of your two drives fails.
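
RAID 1 only saves you if somebody notices when a mirror drops out, so it is worth checking the array state automatically. A minimal sketch, assuming Linux software RAID (md), which exposes array status in /proc/mdstat; a hardware controller would need its vendor's tool instead:

    # mdstat_check.py -- warn if any Linux md RAID array is running degraded.
    import re
    import sys

    degraded = []
    with open("/proc/mdstat") as f:
        for line in f:
            # Status lines look like: "1995712 blocks super 1.2 [2/2] [UU]"
            m = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
            if m and ("_" in m.group(3) or m.group(1) != m.group(2)):
                degraded.append(line.strip())

    if degraded:
        print("DEGRADED array(s):")
        print("\n".join(degraded))
        sys.exit(1)                  # non-zero exit so monitoring can alert
    print("All md arrays healthy.")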

What I would worry about now for this 16-year-old server is a motherboard or non-redundant power supply failure.

Zero Subnet
13

I've never turned a server off at night before.

Hard drives will fail when they are going to fail. Turning the machine on and off isn't going to make the drive fail slower. I've seen hard drives shipped from the vendor that were already bad, and I've seen disks running (and actively being used) for 5+ years without failing.

Your professors are idiots.

mrdenny
  • 2
    Turning the machine on and off will definitely make the drive fail more slowly if it is left off for long enough periods of time. If you turn the machine off for 10 years, the drive is more or less guaranteed to last at least 10 years. The question is how long the average machine needs to be left off for in order to have a positive effect. – intuited Apr 12 '11 at 07:13
  • 4
    @intuited If you turn on the machine after 50 years of waiting and the drive fails to boot, did it last 0 years, 50 years or what? – Cade Roux Apr 12 '11 at 17:44
  • @Cade Roux: I have no idea. What does it matter? Even if we count it as 0, this outlier is not going to significantly change the overall average. – intuited Apr 12 '11 at 20:49
  • 1
    @intuited it was a joke - but actually, it will likely skew the average: http://research.google.com/archive/disk_failures.pdf From this data, it's possible that a new drive turned on after sitting on the shelf for 5 years will be more likely to fail than a drive running for 5 years. The point is moot because it does not pay off to let hardware sit idle any more than it does to turn it off and on again. Hardware degrades and obsolesces and needs to have its maximum value extracted before it wears out and is replaced. – Cade Roux Apr 12 '11 at 21:02
  • @Cade Roux: From the google research you linked to: *As is common in server-class deployments, the disks were powered on, spinning, and generally in service for essentially all of their recorded life.* So that research is not relevant here. It does show that the still-new drive will be more likely than the old one to fail during a given period of time after it's turned back on, but this assumes that the old one hasn't already failed during the 5 years that it was running. – intuited Apr 12 '11 at 23:04
12

This also adds a bigger "human aspect" to the server. Even if you use power settings to turn it off and on at the correct times, you need someone to monitor the server and make sure all the required services and so on start up properly. That's precious time you could spend teaching the professors about backups and RAID.
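
If the machine really were power-cycled every morning, that sanity check ought to be automated rather than done by a person at 4:30 am. A rough sketch, assuming a systemd-based Linux; the service names are placeholders, not anything from the question:

    # service_check.py -- verify that the services we care about came up after boot.
    # Assumes systemd; SERVICES below is a hypothetical list.
    import subprocess

    SERVICES = ["sshd", "httpd", "postgresql"]

    failed = [
        svc for svc in SERVICES
        if subprocess.run(["systemctl", "is-active", "--quiet", svc]).returncode != 0
    ]

    if failed:
        print("Not running:", ", ".join(failed))   # wire this into mail/monitoring
    else:
        print("All monitored services are active.")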

When do you run backups? I would give anything for a 6-hour window to run my daily backups, updates, hotfixes, etc. If nothing else, this downtime can be used for that.

I challenge you to go to these "serious professors" and ask them for research showing that leaving the computer on 24x7 is bad for it. I'd like to see them back up what they are saying.

Theo
  • I said serious because one of them has designed processors and embedded systems and knows the Linux kernel in great detail. As for backups, I have made scripts for automatic backup, but then the owner & admin there looked at me like "WTF, dude. NO, I want to do them manually, daily, and don't tar and compress them." – GorillaApe Apr 11 '11 at 20:06
  • 2
    @Parhs Simply put, you are smarter than your superiors. It would be wise of you to quit and find a real mentor before you have spent too much time in the shadow of idiots who will not let you do the right thing. – Skyhawk Apr 12 '11 at 03:59
10

Realistically most servers are expected to be available 24/7. Plain and simple.

On the off chance yours is not, there is very much a debate over which causes more wear on your server: the constant expansion and contraction of components as the server is turned on and heats up, then turned off and cools down, or the wear on components from constant use.

I have yet to see any research on which is worse, and I very much doubt your professors have access to research claiming otherwise.

In the end you'll have to make the decision based on your needs, but the cost-benefit for most businesses is to have their servers and services available all the time, not just when someone gets in and turns them on, especially when there is a real chance that you are in fact making more trouble for your servers by turning them off.

rfelsburg
7

What is more important is the cooling. Cooling makes a big difference. The temperature inside the box may be much higher than the room, so I would install software to monitor that, such as Everest. Compared to the one you replaced, modern hard disks run hot; some need fans to cool them, and sometimes a small fan can make a big difference. The life of the HD and the server will depend on the cooling.
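
On Linux the kernel already exposes most of those sensors, so you don't strictly need a GUI tool; here is a small sketch of my own (assuming a Linux host with hwmon sensors) that dumps whatever temperatures the kernel can see:

    # temps.py -- list the temperature sensors the kernel exposes through hwmon (Linux).
    # Values in /sys are reported in millidegrees Celsius.
    from pathlib import Path

    for hwmon in sorted(Path("/sys/class/hwmon").glob("hwmon*")):
        chip = (hwmon / "name").read_text().strip()
        for temp_file in sorted(hwmon.glob("temp*_input")):
            label_file = hwmon / temp_file.name.replace("_input", "_label")
            label = label_file.read_text().strip() if label_file.exists() else temp_file.stem
            celsius = int(temp_file.read_text()) / 1000
            print(f"{chip} {label}: {celsius:.1f} °C")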

5

Yeah, not an option. Tell your professors that the industry standard is to leave servers running 24/7 and to have a warranty for failed hardware. If the server is 16 years old, I imagine you're not going to get that warranty. If the server did explode, how much recovery time would you need to build a new one from the backed-up information? I'd start hinting to the clients that their server has reached end-of-life and they should start looking for funds for a new one.

xXhRQ8sD2L7Z
5

It's true that the mechanical stresses of power cycling are hard on the HDD. Also, there were some older drives that (when cooled down enough) could stop working altogether because of "stiction".

With inadequately designed circuits, inrush currents from turning the machine on could also stress some components, though this is not all that likely.

That said, there is some truth to the idea that leaving the machine on takes its toll: capacitors. The numerous electrolytic capacitors on the motherboard are likely to be the weakest link in system reliability. These capacitors are rated for their current/voltage handling capability, operating temperature and lifetime. Typical capacitors are rated for several thousand hours; heavy-duty/long-life caps are rated for several tens of thousands of hours and higher temperatures.

This is why you sometimes see motherboards for sale featuring "server grade capacitors" -- because those machines operate at full speed 24x7 and chew through their motherboard lifetime.
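
For a rough feel of why the rating matters, a commonly quoted datasheet rule of thumb (not something claimed above, just the usual approximation) is that electrolytic capacitor life roughly doubles for every 10 °C you stay below the rated temperature:

    # cap_life.py -- back-of-the-envelope electrolytic capacitor life estimate.
    # Uses the common 10-degree rule: L = L_rated * 2 ** ((T_rated - T_actual) / 10).
    # The numbers below are illustrative, not measurements from any real board.
    def estimated_life_hours(rated_hours, rated_temp_c, actual_temp_c):
        return rated_hours * 2 ** ((rated_temp_c - actual_temp_c) / 10)

    # A 2,000 h @ 105 °C part running at about 65 °C inside the case:
    hours = estimated_life_hours(2000, 105, 65)
    print(f"{hours:,.0f} hours (~{hours / 8760:.1f} years of continuous operation)")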

Toybuilder
  • 15+ years ago I heard the term 'disk statistication' or something like that; the explanation given was that after running for years, the polymers in the disk lubricants start to form long chains, and when the disk is powered off and stops spinning, the bearings seize and will not start again. Power failures in a DC usually meant a bunch of servers would not restart. Of course, I have no idea what disks use as/instead of bearings now. – jqa Apr 11 '11 at 23:30
  • +1 for striction. I was wondering if anyone would mention it. @james, it's called "striction", and there's been a lot of work done to come up with lubricants for the drives that don't thicken over time, use/abuse. Things are better than they used to be, but turning drives on and off is still not a good idea because they're most likely to die when the power hits, either because a component blows or the drive motors can't start spinning the platters – Greg Apr 12 '11 at 03:22
  • Most motherboards produced from 2008 (maybe earlier) to now use solid state capacitors, which have a MUCH MUCH higher lifetime than electrolytic stuff. Power supplies are now the only place where you still see electrolytic capacitors. – Mircea Chirea Apr 12 '11 at 07:05
  • That's stiction (stick + friction), not striction. :-) – kindall Apr 12 '11 at 20:19
2

When I had a server getting monthly preventative maintenance from the manufacturer, they started off with a shutdown every month. This tended to result in component failures. The schedule changed to quarterly, then to only when required. I would not recommend shutting down a system that old unless it is necessary.

BillThor
  • Yes, I have seen servers with 1000+ days of uptime, too. But not rebooting regularly (warm) is a sin; better to encounter a failed system after a planned reboot than after an unplanned one. Also, these reboots tend to uncover configuration mishaps. – sjas Sep 09 '17 at 22:28
2

One thing not mentioned is that most servers have maintenance tasks they perform on a daily, weekly or monthly basis. These are almost always scheduled for the middle of the night, when activity is expected to be at its lowest.

On a Red Hat system, for instance, these activities start at 4:02 am server time. Depending on the server, these could run for a few seconds to an hour or more. If you turn on the server at 4:30, these maintenance tasks will start immediately (by anacron) and the earliest users to log in between then and 5-ish am would be impacted to some extent.
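
If you are curious whether those jobs actually ran on a given box, anacron keeps plain date stamps on disk; a small sketch, assuming a Red Hat-style layout where /var/spool/anacron holds one stamp file per job set (run it as root):

    # anacron_stamps.py -- show when anacron last ran the daily/weekly/monthly job sets.
    # Assumes a Red Hat-style system with stamp files under /var/spool/anacron.
    from pathlib import Path

    for stamp in sorted(Path("/var/spool/anacron").glob("cron.*")):
        print(f"{stamp.name}: last run {stamp.read_text().strip()}")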

Michael Hampton