Our project is planning to migrate from Sparc to x86, and our HA requirement is 99.99%. Previously, on Sparc, we assumed a hardware failure roughly every 4 months to once a year, and we also have test data for our application. From these we derived the requirement that each unplanned recovery (failover) must be short enough to achieve 99.99% availability (52.6 minutes of unplanned downtime per year).
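The 52.6-minute figure, and the per-failover time it implies, can be checked with a few lines of arithmetic. This is a minimal sketch that assumes a 365.25-day year and an annual budget split equally across failures. The function names are hypothetical helpers, not part of any HA product.

```python
# Assumption: a 365.25-day year (averages in leap years) -> 525,960 minutes.
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_budget_minutes(availability: float) -> float:
    """Unplanned downtime allowed per year for a given availability target."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def recovery_budget_per_failure(availability: float, failures_per_year: float) -> float:
    """Longest each recovery (failover) may take if the yearly budget is shared equally."""
    return downtime_budget_minutes(availability) / failures_per_year


if __name__ == "__main__":
    target = 0.9999
    print(f"annual budget: {downtime_budget_minutes(target):.1f} min")
    # The two failure intervals assumed for Sparc in the question.
    for label, per_year in [("every 4 months", 3), ("once a year", 1)]:
        print(f"{label}: {recovery_budget_per_failure(target, per_year):.1f} min per failure")
```

With one failure every 4 months, each failover gets roughly 17.5 minutes. More frequent failures shrink that allowance proportionally.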

Since we are moving to Intel x86, it seems hardware stability may not be as good as on Sparc, but we don't have detailed data.

So, compared with Sparc, how stable is Intel x86? Should we assume more unplanned downtime? If so, how much more: double?

Where can I find more detailed information about these two types of hardware?

Jason T
  • What do you mean by hardware failure? Catastrophic, as in a server going down, or minor, as in losing a disk in a RAID array? If you are getting catastrophic failures every 4-12 months on any server-class platform, you probably have environmental issues that will affect any platform equally. – Zypher Feb 09 '11 at 05:26
  • *it seems the hardware stability is not so good as Sparc, but we don't have the detail data.* -- If you don't have data, then how did you come to this conclusion? I'm not asking to try and be "clever" but rather I'm wondering if you're looking at something specific, which can then be specifically addressed in the answers. – Rob Moir Feb 09 '11 at 08:25
  • "Intel x86" is a vague generic term, not a device. Do you have a question regarding an actual CPU? If so you need to specify what CPU(s) you're talking about. – John Gardeniers Feb 09 '11 at 21:15
  • @JohnGardeniers, it is much worse than that: "x86" is the CPU; need to factor in power source, motherboard, graphics, disks, ... I'd say "server-grade" x86 machines are approximately as reliable as SPARCs, and for the same performance much cheaper. Buy two ;-) – vonbrand Feb 17 '13 at 18:48

3 Answers

Intel's Xeon 75xx series chips inherited 90%+ of the RAS features of their thoroughly enterprise-class Itanium chips when they launched last year. Their stability, especially when coupled with 75xx-aware OSs such as Server 2008 R2 and RHEL 5.5, is significantly better than that of their other x86 counterparts. These chips are available in servers from all the main vendors, such as HP's DL580 G7 and DL980 G7 and their Dell/IBM equivalents. Hope this helps.

Chopper3
  • How about the Sun Blade X6270 M2? How could I get more information about the reliability of this hardware? – Jason T Feb 10 '11 at 02:43
  • Well, it uses Xeon 56xx series chips, which are great and the backbone of all my platforms, but they're not as reliable as the 75xx series I mention above. Sun don't do 75xx series blades (yet), but their X4470 and X4800 servers do use them. Hope this helps. – Chopper3 Feb 11 '11 at 10:12

One of the advantages of x86's relatively low cost is the ability to scale horizontally. Your mission-critical applications should be able to survive a complete failure of any component, including one system in a cluster of systems.
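One way to see why horizontal scaling helps is to compute the chance that at least one node in a cluster is up. The sketch below is a simplified model that assumes node failures are independent and failover is instantaneous. Both assumptions are optimistic, and the 99.9% per-node figure in the usage is purely hypothetical.

```python
def cluster_availability(node_availability: float, nodes: int) -> float:
    """Probability that at least one of `nodes` independent nodes is up.

    Assumes independent failures and instantaneous failover, so the
    service is down only when every node is down at the same time.
    """
    return 1.0 - (1.0 - node_availability) ** nodes


if __name__ == "__main__":
    # Hypothetical per-node availability of 99.9%.
    for n in (1, 2, 3):
        print(n, f"{cluster_availability(0.999, n):.9f}")
```

Under these assumptions, two 99.9% nodes give 99.9999%. In practice, shared power, shared storage, and imperfect failover software reduce the real gain.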

  • Yes, but we need an estimate of how many system crashes will be caused by hardware, so we can work out how to avoid system downtime. For example, if the hardware breaks every month, we need a solution that keeps downtime for the whole system under 1 minute. And if the hardware crashes every week, our application needs much stricter HA (such as transferring services to an available node within seconds). – Jason T Feb 10 '11 at 02:42
  • Without "official" numbers to give you, in my experience the vast majority of hardware failures in a sizable datacenter are hard drive failures, and that is independent of CPU architecture. I've had 1 out of 200 x86 systems fail in the last year that wasn't a hard drive failure. And hard drive failures are almost always trivial: just use RAID. – Justin Morgan Feb 10 '11 at 03:22
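The trade-off in this exchange between failure frequency and recovery time can be expressed with the standard steady-state availability formula, A = MTBF / (MTBF + MTTR). The sketch below plugs in the monthly and weekly failure rates from the comment above. The 1- and 5-minute recovery times are hypothetical, and a 365.25-day year is assumed.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # assumption: 365.25-day year


def steady_state_availability(mtbf_minutes: float, mttr_minutes: float) -> float:
    """Fraction of time the service is up: MTBF / (MTBF + MTTR)."""
    return mtbf_minutes / (mtbf_minutes + mttr_minutes)


if __name__ == "__main__":
    # (mean time between failures, mean time to recover), both in minutes.
    scenarios = {
        "monthly failure, 1 min recovery": (MINUTES_PER_YEAR / 12, 1.0),
        "weekly failure, 1 min recovery": (MINUTES_PER_YEAR / 52, 1.0),
        "weekly failure, 5 min recovery": (MINUTES_PER_YEAR / 52, 5.0),
    }
    for name, (mtbf, mttr) in scenarios.items():
        a = steady_state_availability(mtbf, mttr)
        print(f"{name}: {a:.5%}  meets 99.99%: {a >= 0.9999}")
```

With weekly failures, a 1-minute recovery only just stays within 99.99%, while a 5-minute recovery does not.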

Since the x86 platform is a lot bigger, it's only natural that the spread between top-notch and bottom-feeders is also bigger. In the Sparc architecture, you'd be hard-pressed to find cheap, low-quality machinery; in x86 you can find plenty of it. This holds both between brands/manufacturers and between models within a single manufacturer's range.

However, there is also plenty of quality x86 kit around. All of the reputable brand vendors have mid- to high-end systems that are quite capable of reaching 99.99% (after a burn-in, not counting non-impacting defects, and you didn't specify how many SPOFs you intend to chain). Any good account manager will put you in touch with (sales) engineers to flesh out your actual requirements.
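The point about chaining SPOFs matters because components in series must all be up at once, so their availabilities multiply. This is a minimal sketch assuming independent failures. The three-component chain (server, storage, switch at 99.99% each) is a hypothetical example, not a figure from this thread.

```python
from math import prod
from typing import Sequence


def chain_availability(component_availabilities: Sequence[float]) -> float:
    """Availability of components in series: every one must be up, so multiply.

    Assumes component failures are independent of one another.
    """
    return prod(component_availabilities)


if __name__ == "__main__":
    # Hypothetical chain of single points of failure: server, storage, switch.
    chain = [0.9999, 0.9999, 0.9999]
    print(f"{chain_availability(chain):.4%}")
```

Three 99.99% SPOFs in series land below 99.99% overall, which is why each component in the chain needs a higher individual target, or redundancy.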

Since you appear to be doing a major upgrade, it would be worth it to investigate reliability via software. There is a whole range of companies doing cheap ("somewhat" reliable x86) hardware with smart software to achieve Very High Availability. Some of their work may prove to be easy yet valuable for you.

Joris
  • Yes, we have a test plan for the software, but as I said, if we have more downtime caused by hardware, our solution needs a shorter recovery time to achieve 99.99%. – Jason T Feb 10 '11 at 02:36