I have a SolrCloud (5.2.1) cluster with 5 shards of 2 nodes each.

This cluster holds 163,463,543 items. I indexed 200,000 more items, and now the version and sizeInBytes values of the two replicas in each shard no longer match.

shard1   | sizeInBytes | segmentCount |  version |
--------------------------------------------------
replica1 | 71325055021 |           14 | 11877844 |
replica2 | 71330161457 |            8 | 11877874 |

shard2   | sizeInBytes | segmentCount |  version |
--------------------------------------------------
replica2 | 71658372259 |            9 | 11965329 |
replica1 | 71660446852 |           17 | 11965305 |

shard3   | sizeInBytes | segmentCount |  version |
--------------------------------------------------
replica1 | 72328398189 |           24 | 11978919 |
replica2 | 72329934372 |           20 | 11978971 |

shard4   | sizeInBytes | segmentCount |  version |
--------------------------------------------------
replica1 | 71398290694 |           10 | 11882893 |
replica2 | 71398972036 |           16 | 11883065 |

shard5   | sizeInBytes | segmentCount |  version |
--------------------------------------------------
replica2 | 71635961292 |           16 | 11920521 |
replica1 | 71636668652 |            9 | 11920667 |

When I look at the cloud status page in the web GUI, everything seems fine. Any idea what happened and how to fix it?

Luke
stoer
  • I'm curious if this is even a problem - I have the same issue with my SolrCloud running Solr 4.10.3. I'd really like a solid answer, but I can't find one. – Luke Feb 19 '16 at 18:52

1 Answer

I have done a lot of research on this matter, and the only reference I can find is this email in a Solr mailing list: Link

SolrCloud works very differently than the old master-slave replication. The index is NOT copied from the leader to the other replicas, except in extreme recovery circumstances.

Each replica builds its own copy of the index independently from the others. Due to slight timing differences in the indexing operations, and possible actions related to transaction log replay on node restart, each replica may end up with a different index layout. There also could be differences in the number of deleted documents. Unless something goes really wrong, all replicas should contain the same live documents.

Thanks, Shawn

I have seen the same thing myself. I recently created 10 new Solr collections with 2 replicas, loaded several hundred thousand documents into each collection, and the versions no longer match. It seems that the version is a holdover from before the SolrCloud days and does not need to match.
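
If you want to confirm that the replicas really hold the same live documents, as the quoted email says they should, one option is to ask each core for its own document count with `distrib=false` and compare the results per shard. Below is a minimal Python sketch of that idea, not an official tool. The function names and the core URLs in the usage note are hypothetical, and it assumes a commit has been issued and no indexing is in progress, otherwise the counts can differ briefly.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_count_url(core_url):
    """Build a select URL that counts live documents on one core only."""
    params = urlencode({
        "q": "*:*",          # match every live (non-deleted) document
        "rows": 0,           # only numFound is needed, not the documents
        "distrib": "false",  # answer from this core, do not fan out
        "wt": "json",
    })
    return core_url.rstrip("/") + "/select?" + params


def fetch_num_found(core_url, timeout=10):
    """Return numFound for a single core (performs an HTTP request)."""
    with urlopen(build_count_url(core_url), timeout=timeout) as resp:
        return json.load(resp)["response"]["numFound"]


def find_mismatched_shards(counts_by_shard):
    """Return the shards whose replicas report different counts.

    counts_by_shard maps a shard name to {replica name: numFound}.
    """
    return {
        shard: counts
        for shard, counts in counts_by_shard.items()
        if len(set(counts.values())) > 1
    }


def check_replicas(cores_by_shard, fetch=fetch_num_found):
    """Fetch the count of every core and return only the mismatched shards.

    cores_by_shard maps a shard name to {replica name: core URL}.
    The fetch argument is injectable so the logic can be tried offline.
    """
    counts = {
        shard: {name: fetch(url) for name, url in replicas.items()}
        for shard, replicas in cores_by_shard.items()
    }
    return find_mismatched_shards(counts)
```

Calling `check_replicas({"shard1": {"replica1": "http://solr-a:8983/solr/items_shard1_replica1", "replica2": "http://solr-b:8983/solr/items_shard1_replica2"}})` with your own core URLs (the ones here are made up) returns an empty dict when all replicas agree. Differences in sizeInBytes, segmentCount, or version alone would not show up here, which matches the point of the email: only the live documents need to match.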

Luke