Questions tagged [fault-tolerance]

89 questions
16
votes
3 answers

Multi-site high availability

We have a SaaS application that we need to be highly available. We already have an expensive, well-maintained Hyper-V failover cluster, but today the datacenter where we host that cluster had a five-hour power outage that knocked us completely…
Mike
  • 1,261
  • 5
  • 18
  • 31
15
votes
1 answer

What's the difference between a "degraded" RAID 6 array and a "clean" RAID 5 array?

Suppose you have two RAID arrays, one with N disks and one with N+1 disks. The array with N disks was formatted as a RAID 5 and left alone, while the other array was formatted as a RAID 6 before one of its disks was removed. Now both arrays have N…
ATLief
  • 299
  • 2
  • 12
10
votes
1 answer

Systemd does not restart service, although Restart=always

Here is my unit file of a systemd service: [Unit] Description=Tunnel For %i After=network.target [Service] User=autossh ExecStart=/usr/bin/autossh -M 0 -N -o "ExitOnForwardFailure yes" -o "ConnectTimeout=1" -o "ServerAliveInterval 60" -o…
guettli
  • 3,113
  • 14
  • 59
  • 110
10
votes
3 answers

Is round-robin DNS a possible solution for high availability?

Let's say I have 2 IPs for a given domain (round-robin DNS). If one of the IPs becomes unresponsive, will clients try to connect to the other IP, or will they just fail to establish communication with the domain?
10
votes
4 answers

Fault-tolerant NFS?

Probably a FAQ but I haven't found anything useful after a while of searching: Can I set up NFS in such a way that every single error (e.g. server CPU, hard disk, hd controller, network adapter, network cable, power supply) is masked without any…
Peter G.
  • 333
  • 5
  • 12
9
votes
4 answers

Do I need a second RAID controller for fault-tolerance?

I have a server with 3 hard drives installed, and a total capacity of 6. We're planning to max it out, but our consultant also suggested getting a second RAID controller "for redundancy" to support the new drives. To me, this doesn't make much…
Bigbio2002
  • 2,763
  • 11
  • 34
  • 51
9
votes
4 answers

Minimum number of disks to implement RAID6

RAID6 is intended to provide fault tolerance in the event 2 disks fail. What is the minimum number of disks required to implement RAID6? thanks
Upul
  • 211
  • 1
  • 2
  • 6
8
votes
3 answers

Shared storage options for ESXi HA cluster

I am seeking recommendations for shared storage options to support ESXi HA cluster (note I'm NOT asking for product/brand/model recommendation - I know this is against the rules here). I am asking for technology recommendation. The company I work…
7
votes
1 answer

Storage Spaces Direct MTBF

I'm testing S2D now and I would like to calculate MTBF for the whole system. It seems very fragile to me: Let's have 4 nodes (x) with 12 drives (y) in each. MTBF for one node is 1/12 of single drive value. For three-copy mirror we can tolerate 2…
7
votes
1 answer

Azure Virtual Machines - what fault tolerance do they provide?

We are thinking about moving our virtual machines (Hyper-V VHDs) to Windows Azure, but I haven't found much about what kind of fault tolerance that infrastructure provides. I have two questions about running a VHD in Azure: Is my VHD and all the data…
Borek Bernard
  • 709
  • 2
  • 11
  • 21
6
votes
1 answer

Keepalived not sending multicast advertisements

I have two systems, both VMs. They are configured to use bridged networking. I am trying to get keepalived to manage ownership of a VIP - 10.190.1.230. I have tried two versions, keepalived-1.2.2 and keepalived-1.2.1, both built from source. ServerA -…
The_Viper
  • 153
  • 1
  • 8
5
votes
3 answers

How to properly configure power redundancy for devices with only one power input?

Some devices, such as servers and high-end switches have dual PSUs/power inputs, which can then be hooked up to separate UPSes, which are hooked up to separate power circuits (correct so far, right?). However, many devices, such as standard…
Bigbio2002
  • 2,763
  • 11
  • 34
  • 51
5
votes
3 answers

*nix CARP or VMWare Fault Tolerance?

We're experimenting with what VMWare called a "Fully Collapsed DMZ" on a blade centre. Basically our DMZ goes straight into a vSwitch and all the security appliances are virtualised. I've spent days reading up about why this is a good idea and why…
Mark Henderson
  • 68,316
  • 31
  • 175
  • 255
4
votes
1 answer

Do balance-alb and balance-tlb support fault tolerance?

I've read the bonding.txt file in the kernel documentation. It's clear about load balancing, but are balance-alb and balance-tlb really fault tolerant?
sebelk
  • 642
  • 3
  • 13
  • 32
4
votes
1 answer

Distributed mirrored filesystem under FreeBSD

Can someone share their experience in building a distributed mirrored filesystem between multiple FreeBSD machines? I.e., we have two (three, four...) servers and a special partition "part1" mounted on each of them. We make some changes on it on the…