How to achieve zero down time

Question

For an application we want to achieve zero database and application down time using Active Active configuration. Our dB is Oracle

Following are my questions:

How can we achieve active active configuration in Oracle?
Will introducing Cassandra/HBase(or any other No SQL dbs) cloud help in zero downtime or it is only for fast retrieval of data in a large db?
Any other options?

Thanks and Regards, Hiral

This belongs on Serverfault, but keep in mind that as downtime permitted approaches 0, cost approaches ∞. — derobert, Dec 13 '10 at 19:50

score 10 · Answer 1 · answered Dec 13 '10 at 19:42

10

There's no such thing as zero downtime. Set a realistic goal (say, five nines uptime), and plan around that. If you beat your target, great, but promising a system will never, ever go down can trap you in a state where it's impossible to make major architectural upgrades necessary to continue maintaining the system at reasonable expense.

answered Dec 13 '10 at 19:42

Chris

297
1
3

True. I would go a lot lower than five nines for a database. Even with parallel independent databases, you still hit those odd ball replication upgrade incompatibilities that require doing both sides at the same time. Usually it's only a 4-8 hours of downtime every few years. – JOTN Dec 13 '10 at 20:48

score 4 · Answer 2 · answered Dec 13 '10 at 21:54

The main consideration for zero (or close to) downtime is the amount of update activity. Updates (and deletes) conflict in ways that inserts don't.

Level 1. Database is pretty much totally read only (eg used for a Content Management System). This is the easiest to replicate.

Level 2. Only Inserts on a single node which get distributed to other 'read-only' nodes.

Level 3. Only inserts sharded (eg one node takes updates for America, another for Europe, a third for Asia...).

Level 4. Inserts/Updates/Deletes on a single node which get distributed to other 'read-only' nodes.

Level 5. Inserts/Updates/Deletes sharded (eg one node takes updates for America, another for Europe, a third for Asia...).

Level 6. Inserts on multiple nodes distributed to all other nodes.

Level 7. Inserts/Updates/Deletes on multiple nodes distributed to all other nodes.

At level 6/7 I'd be looking into NoSQL solutions. Maybe level 3 and 5 if I though the sharding mechanism might not be sustainable for longer timeframes.

Level 7 is practically impossible to achieve high nines availability. Ultimately you'll get person A trying to update stuff on node 1 at exactly the same time as person B is updating it on node 2....and then you'll lose node 1.

score 1 · Answer 3 · answered Dec 13 '10 at 20:13

This isn't necessarily a serverfault question. Being able to run two active databases in parallel will depend on your application code. The trick is you must design you code so there will never be a conflict with records simultaneously changed in both locations.

A few design ideas:

Use SYS_GUID to create all your primary keys instead of a sequence which can create duplicate keys if you're not careful.
Avoid deletes as they seem to cause the biggest headaches for referential constraint failures on replication.
Try to group transactions that may interfere with each on one database and only move to the alternate if that one fails.
Use unique constraints carefully as they also tend to cause a lot of replication failures

As far as how to set it up, we're using streams for replication and then the failover feature of Net8 to handle a server being down. If you want to pay the big bucks, you can look at RAC.

score 1 · Answer 4 · answered Dec 14 '10 at 17:15

1

The easiest way to achieve an active/active configuration in Oracle is by using Oracle RAC (Real Application Cluster). RAC Documentation can be found here.

RAC can also be combined with other Oracle High Availability tools like Data Guard or Streams. HA Documentation is available here.

Keep in mind that some maintenance operation can require you to shut down one node if not all of them.

answered Dec 14 '10 at 17:15

Benoit

3,499
1
18
17

1

RAC is active-active, but it will normally have a slightly lower availability than a stand alone installation due to its strong dependency on network and storage systems. Standalone oracle will not crash if the storage or network disappear for few minutes, but RAC will. I like DataGuard with FastStartFailOver for high availability, but it also has many limitations. – chenshap Dec 16 '10 at 23:15
@chensap RAC should allow the database to stay usable while at least one node is up and running. – Benoit Dec 17 '10 at 11:05

score 1 · Answer 5 · answered Dec 15 '10 at 15:40

I'd say it is possible to achieve zero downtime.

GoldenGate tries to provide this solution with bi-directional replication. You'll still need conflict resolution for an active-active configuration, and yes it can become a problem, but it's a pretty good solution.

For master/slave ChronicDB can do live updates accounting for replication without inconsistencies.

So the challenges are between active-active vs master-slave, and for both there a good alternatives

score 0 · Answer 6 · answered Nov 22 '11 at 01:28

Will introducing Cassandra/HBase(or any other No SQL dbs) cloud help in zero downtime or it is only for fast retrieval of data in a large db?

Cassandra can help because it allows even major upgrades to happen through rolling restarts (each node being upgraded in succession). Most good Cassandra libraries support automatically sending requests to accessible nodes as you perform the cluster changes. Using a replication factor of 3 and a consistency level of "quorum" allows uptime through rolling restarts even while maintaining consistency.

How to achieve zero down time

6 Answers6