CouchDB replication recommended topologies

Question

I am working on a proposal for a system with seven CouchDB servers (A,B,C,D,E,F,G) in different countries. The idea is to configure multi-master replication so that all data can be kept in sync.

I could configure bi-directional replication from each server to each other server

Mathematica graphics

but I suspect that this may result in too many connections that could reduce performance by increasing the used bandwidth (is this the case?).

So my next idea is to configure them like a ring:

Mathematica graphics

Now we have much fewer connections and still have redundancy as each node is connected to two servers. The problem for my particular situation is that we don't want to have all databases in all the nodes. We would like to have two nodes (A & B) with all the databases and the rest with different subsets of it. For this reason I am thinking about doing this:

Mathematica graphics

As I am not a network topology expert, I would like to ask:

Is it truly not a good idea to replicate all nodes against all nodes?
Is this a reasonable topology (the last shown)?
Where could I learn more about this?

Just for completeness, the figures were generated with the following Mathematica commands:

Graph[Rule @@@ Permutations[CharacterRange["A", "G"], {2}],  VertexLabels -> "Name"]
Graph[Rule @@@ (Partition[CharacterRange["A", "G"], 2, 1, {-1}] /. {a_, b_} :> Sequence[{a, b}, {b, a}]), VertexLabels -> "Name"]
Graph[Flatten[Outer[{#1 -> #2, #2 -> #1} &, {"A", "B"}, CharacterRange["C", "G"]]~Join~{"A" -> "B", "B" -> "A"}], VertexLabels -> "Name"]

score 1 · Answer 1 · edited Feb 10 '18 at 12:58

I have no special experience with seven nodes (but with three nodes) but there should not be a problem at all with replicating each node with each other. I do that also with the three nodes I use in our projects. CouchDB is built to support a multi master setup of nodes. But you are also right in thinking about the used bandwidth when replicating to that many nodes with many connections. I suggest you monitor this.

CouchDB is following the CAP theorem with AP: availability and partition tolerance. That means the data are eventual consistent (see http://guide.couchdb.org/draft/consistency.html). So you should also think about partitioning your data what will result in a different setup you have shown above.

Or you could have a look at CouchDB 2.0 which was released September 20th. Now CouchDB does support clustering. I am pretty sure that this could solve your problem. The proposed setup is to run a cluster with at least (naturally) three nodes (n) holding 8 shards (q) in each node (https://blog.couchdb.org/2016/08/01/couchdb-2-0-architecture/). Usining replication is for still possible and I think it could be a way to reduce your setup (although I am not aware of why you are thinking about a seven node setup).

http://docs.couchdb.org/en/2.0.0/index.html

Danke. My understanding is that CouchDB 2.0 clusters are intended for nodes at a single location. In my case, the servers will be thousand of km apart and the size of the database is well below current HD capacities. — gdelfino, Oct 12 '16 at 20:46
Yeah - it was in my mind that you will most likely have the servers apart with a great distance. And that's the reason why I wrote, that replication is still the tool you want to use in combination with the cluster features of CouchDB 2.0. the only thing to keep in mind is the fact, that the nodes will be eventual consistent ;-) — awenkhh, Oct 12 '16 at 21:02

CouchDB replication recommended topologies

1 Answers1