
The server has 2 CPUs (Intel Xeon E5-2670, Sandy Bridge) and 1 AMD GPU (Tahiti XT, Radeon HD 7970). An SSD (system and executable files) and an HDD (heavy data) are also connected to this server.

Sometimes the server works under high load for several hours. Sometimes it sits idle for several hours (idle periods may last more than 8 hours).

I've heard two opposite opinions:

  1. I shouldn't turn my server off, because servers are designed to run without being turned off. Thousands of servers are never turned off and run for months without stopping.
  2. The server should be turned off when there is no load on it, especially because of the GPU. The GPU and its cooler have a limited resource (service life), and it isn't good for the GPU to be powered on all the time, even when it isn't under heavy load.

Which opinion is right? Should I turn this server off when it is idle to significantly increase its lifetime, or not?

UPD 1 I am asking primarily about nonstop operation of the GPU.

UPD 2 About the GPU choice: this isn't just a gaming GPU. The Radeon 7970 beats, for example, Nvidia Kepler in several cases. See presentation.

UPD 3 There is an opinion that leaving a GPU-based machine powered on while idle is a very bad strategy. I am trying to understand whether this opinion is true or false.

petRUShka
  • Duplicate of http://serverfault.com/questions/258064/should-servers-be-turned-off-at-night – user9517 Aug 24 '13 at 07:31
  • What's the GPU, and a consumer gaming-oriented one at that, actually doing in your server? – Chopper3 Aug 24 '13 at 09:28
  • It isn't a duplicate of that question. I asked about a server with a GPU. In the question you mentioned, they are talking mostly about HDDs – petRUShka Aug 24 '13 at 22:00
  • About "gaming-oriented" see update of the question – petRUShka Aug 24 '13 at 22:00
  • All of the answers to that question apply equally well to the GPU. There's no good reason to shut down the server just because the GPU has nothing to do. – Michael Hampton Aug 24 '13 at 22:36
  • There is an opinion that leaving a GPU-based machine powered on while idle is a very bad strategy. I am trying to understand whether this opinion is true or false – petRUShka Aug 25 '13 at 14:41
  • @petRUShka where is this opinion ? – user9517 Aug 28 '13 at 13:21
  • Whether a card is "gaming oriented" or not depends mostly on the GPU *card* construction and not on the GPU *chip* itself. GPU / GPGPU cards for scientific computing are usually designed to survive operating 24/7/365 with high workloads for long periods of time (months, years). Gaming cards often have a lower MTBF / life expectancy because they are targeted at high but more intermittent workloads. – Luke404 Aug 28 '13 at 13:40
  • @lain in private conversations – petRUShka Aug 28 '13 at 15:28

1 Answer

Pros to turning the server off when idle:

  • Lower (zero) power consumption, which saves on both electricity and cooling costs
  • Less wear on fans, which are the most likely thing to die on the GPU (or on the rest of the server, probably)
  • If you have a scheduled shutdown every night anyways, scheduling Windows updates becomes a lot easier
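
To put a rough number on the power savings, here is a minimal sketch of the arithmetic. The idle wattage, idle hours and electricity price are made-up placeholder values, not measurements of this server; plug in your own readings (for example from a wall power meter).

```python
def idle_energy_cost(idle_watts, idle_hours_per_day, price_per_kwh, days=365):
    """Return the cost of leaving a machine idling instead of powered off."""
    # Energy in kWh = power in kW * hours per day * number of days.
    kwh = idle_watts / 1000.0 * idle_hours_per_day * days
    return kwh * price_per_kwh


if __name__ == "__main__":
    # Hypothetical figures: 150 W idle draw, 8 idle hours a day, 0.15 per kWh.
    cost = idle_energy_cost(150, 8, 0.15)
    print(f"Idle cost per year: {cost:.2f}")
```

This ignores cooling costs, so the real figure would likely be somewhat higher. It is only meant to help you weigh the savings against the cons listed below.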

Cons to turning the server off when idle:

  • Motors (both fans and the spinning drives) are more likely to fail to spin up than fail to keep spinning
  • Server is unavailable if there is some work for it to do in the middle of the night
  • Stress on components - there is a big temperature difference (especially in an air conditioned room) between a running server and a powered off one. The temperature cycling causes metal to expand and contract each time, which will eventually wear parts out.
  • Software and OS issues are more likely to happen at boot time. Maybe the last batch of Windows updates messed something up, or your bootloader is corrupt, etc. Of course, these will come up the next time you reboot anyways, but at least you don't have to worry about them on a daily basis and rush to fix them at 8:50am before everyone comes in at 9am.

Fans and hard drives are the only parts of most systems that have motors. The hard drive motors are well protected from the environment, but the fans are exposed to all the dust in the air. So they wear out quickly compared to other parts. This is why in most servers they are hot swappable - you can replace them without turning the server off. There are also more fans than actually needed, so a single fan failure doesn't cause the system to overheat.

However, that doesn't mean turning them off is necessarily a good thing. Most fans that are starting to wear out work fine once they get up to speed, but have trouble starting. So they will fail to come on at all when the server is turned back on, but may have kept running if it were left on the whole time.

Thoughts specifically about the video card:

  • The video card you are using is meant for high end gaming systems. AMD's FirePro line of video cards is made for server use.
  • One of the big differences you'll notice right away is that only the highest end model has fans; the rest are passively cooled. The one with fans actually has 3 of them, and they are larger and likely more durable than the fans on gaming video cards.
  • Server graphics cards are also built for a 24x7 workload, so they have more durable components overall.

All video cards will slow down their fans and lower their power consumption when idle. There isn't a "limited resource of GPU" if you mean something like "after 1 trillion calculations, the video card will die", but there definitely is a limited number of hours its fan will run before failing. On the desktop side of things, I've had plenty of systems with dedicated video cards that ran nearly 24x7 for 2-3 years before the video card fan died. In an actual server room environment, with hopefully less heat and less dust than a desktop environment, I expect it could run for quite a while without maintenance. But just in case, I would order a couple of replacement fans for it so I have one ready if it dies.
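
Since the fan is the part most likely to go first, it may be worth polling fan speeds so you notice a dying one early. Below is a minimal sketch that assumes a Linux host whose drivers expose fan readings through the kernel's hwmon sysfs files (`fan*_input`, in RPM). Whether your GPU driver exposes its fan there is an assumption you would need to check, and the 300 RPM threshold is an arbitrary illustrative value.

```python
import glob
import os


def read_fan_speeds(hwmon_root="/sys/class/hwmon"):
    """Return {path: rpm} for every fan*_input file found under hwmon_root."""
    speeds = {}
    pattern = os.path.join(hwmon_root, "hwmon*", "fan*_input")
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as f:
                speeds[path] = int(f.read().strip())
        except (OSError, ValueError):
            # Some drivers expose the file but fail on read; skip those.
            continue
    return speeds


def stalled_fans(speeds, min_rpm=300):
    """Return the paths whose reading is below min_rpm (assumed threshold)."""
    return [path for path, rpm in speeds.items() if rpm < min_rpm]


if __name__ == "__main__":
    for path in stalled_fans(read_fan_speeds()):
        print(f"Fan looks stalled or very slow: {path}")
```

You could run something like this from cron and mail yourself the output. On a card that deliberately stops its fan at idle, a zero reading would be a false alarm, so adjust the check to your hardware.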

Conclusion

  • Keep the server on, even if idle, unless it's going to be idle for many days or weeks at a time. And even then, I'd leave it on.
  • Pick up some spare fans for that GPU, especially since they will be hard to find in a year or two when the card is considered outdated.
  • Look into replacing the GPU with a server grade equivalent. Whether that is a good option depends on your GPU processing needs and budget. You may decide it's cheaper to just have an entire spare card lying around in case one dies.
Grant