RE: Start-up of new Solr shard leaders or existing shard replicas depends on zookeeper?
Hoggarth, Gil 2013-10-24, 15:33
I think my question is now simpler: I believe the problem below was
caused by the very first startup of the 'ldwa01' collection (under the
'ldwa01cfg' zk config name) not specifying the number of shards, which
therefore defaulted to 1.
So, how can I change the number of shards for an existing collection/zk
config name, especially when the ZK ensemble in question is the
production one and supports other Solr collections that I do not want to
interrupt? (Which I think means I can't just delete clusterstate.json and
restart the ZKs, as that would also lose the other Solr collections'
information.)
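If reindexing the ldwa01 data is acceptable, would deleting and recreating
just that collection via the Collections API, with numShards stated
explicitly, be the right way to do this? As far as I can tell that only
touches that collection's entries in ZK. The sketch below is what I have
in mind; the host, port and replica count are placeholders rather than our
exact setup:

    # Remove only the ldwa01 collection; other collections in the shared
    # ensemble should be left untouched.
    curl "http://ld02:8983/solr/admin/collections?action=DELETE&name=ldwa01"

    # Recreate it with the shard count stated explicitly, bound to the
    # already-uploaded 'ldwa01cfg' config set.
    curl "http://ld02:8983/solr/admin/collections?action=CREATE&name=ldwa01&numShards=24&replicationFactor=1&collection.configName=ldwa01cfg"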
Thanks in advance, Gil
From: Hoggarth, Gil [mailto:[EMAIL PROTECTED]]
Sent: 24 October 2013 12:01
To: [EMAIL PROTECTED]
Subject: Start-up of new Solr shard leaders or existing shard replicas
depends on zookeeper?
I'm seeing some confusing behaviour in Solr/zookeeper and hope you can
shed some light on what's happening and how I can correct it. (I've also
asked much the same question on the Solr mailing list, but I think it's a
ZK issue.)
We have two physical servers running automated builds of RedHat 6.4 and
Solr 4.4.0 that host two separate Solr services. We also have two
separate ZK ensembles, a production version and a development version -
both running 3.4.5 built via automation. The first Solr server (called
ld01) has 24 shards and hosts a collection called 'ukdomain'; the second
Solr server (ld02) also has 24 shards and hosts a different collection
called 'ldwa01'. It's evidently important to note that previously both
of these physical servers provided the 'ukdomain' collection, but the
'ldwa01' server has been rebuilt for the new collection. ld01/ukdomain
connects to the production ZK. ld02/ldwa01, well, that depends, and is the
crux of this question.
When I start the ldwa01 solr nodes with their zookeeper configuration
(defined in /etc/sysconfig/solrnode* and with collection.configName as
'ldwa01cfg') pointing to the development zookeeper ensemble, all nodes
initially become shard leaders and then replicas as I'd expect. But if I
change the ldwa01 solr nodes to point to the zookeeper ensemble also
used for the ukdomain collection, all ldwa01 solr nodes start on the
same shard (that is, the first ldwa01 solr node becomes the shard
leader, then every other solr node becomes a replica for this shard).
The significant point here is that no other ldwa01 shards gain leaders (or
replicas).
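For reference, the zookeeper-related settings in each
/etc/sysconfig/solrnode* file look roughly like the sketch below (the
variable names, ports and paths here are illustrative placeholders, not
our exact files); the only value that changes between the two tests is the
zkHost ensemble address:

    # Illustrative /etc/sysconfig/solrnode1 (placeholder names and paths).
    # These properties are passed to the node's JVM at startup.
    SOLR_PORT=8983
    ZK_HOST="zk1.example:2181,zk2.example:2181,zk3.example:2181"   # dev or prod ensemble
    SOLR_OPTS="-DzkHost=${ZK_HOST} \
               -DnumShards=24 \
               -Dcollection.configName=ldwa01cfg \
               -Dsolr.solr.home=/usr/local/solrnode1/solr"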
The ukdomain collection uses a zookeeper collection.configName of
'ukdomaincfg', and prior to the creation of this ldwa01 service the
collection.configName of 'ldwa01cfg' has never previously been used. So
I'm confused why the ldwa01 service would differ when the only
difference is which zookeeper ensemble is used.
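In case it helps, this is how I've been inspecting what each ensemble has
recorded for the collection; the ensemble address is a placeholder, but
the znode paths are the standard SolrCloud ones:

    # Connect to the ensemble in question (address is a placeholder).
    zkCli.sh -server zkprod1.example:2181

    # Inside the zkCli shell:
    get /clusterstate.json     # shard layout recorded per collection
    ls  /collections           # collections registered in this ensemble
    ls  /configs               # uploaded config sets (ldwa01cfg, ukdomaincfg)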
If anyone can explain why this is happening and how I can get the ldwa01
services to start correctly using the non-development zookeeper
ensemble, I'd be very grateful! If more information or explanation is
needed, just ask.
Web Archiving Technical Services Engineer
The British Library, Boston Spa, West Yorkshire, LS23 7BQ