During an upgrade we accidentally configured all nodes with the master role and a wrong data path, which left all shards unassigned and the cluster status red; this led to data loss and corrupted shards.
For example, the cluster health API showed the cluster status after the upgrade.
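The response looked roughly like the sketch below; the cluster name and shard counts are illustrative, not the exact values from the incident:

```
{
  "cluster_name" : "my-cluster",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 0,
  "active_primary_shards" : 0,
  "active_shards" : 0,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 20,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 0.0
}
```

The key signals are `"status" : "red"`, zero active shards, and a non-zero `unassigned_shards` count.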
In this case you need to take a look at the node status. It turned out that we had a wrong configuration: all nodes were set to the master role, and the data path was wrong too:
1 | curl "localhost:9200/_cat/nodes" |
The solution is to set the right configuration (node roles and data path) and restart the whole cluster. Usually, as the nodes rejoin, the unassigned shards transition to assigned/started; if they do not, the data may be lost or corrupted and the shards stay unassigned.
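A minimal sketch of the corrected per-node settings in `elasticsearch.yml`, assuming Elasticsearch 7.9+ where roles are set with `node.roles` (older versions use the separate `node.master` / `node.data` flags) and a placeholder data directory:

```
# elasticsearch.yml (illustrative values)
node.roles: [ master, data ]        # give each node the roles it is actually meant to have
path.data: /var/lib/elasticsearch   # point back at the directory that really holds the shard data
```

After fixing this on every node, do a full cluster restart and watch the cluster health API until the status leaves red.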
Use the allocation explain API to get the details:
1 | curl "http://localhost:9200/_cluster/allocation/explain" | jq |
1 | { |
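For a shard whose copies are gone or damaged, the response looks something like the sketch below; the index name and field values are placeholders, and the exact explanation text varies by version:

```
{
  "index" : "my-index",
  "shard" : 0,
  "primary" : true,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "CLUSTER_RECOVERED",
    "last_allocation_status" : "no_valid_shard_copy"
  },
  "can_allocate" : "no_valid_shard_copy",
  "allocate_explanation" : "cannot allocate because all found copies of the shard are either stale or corrupt"
}
```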
How to proceed (the requests can be sent to any node):
- retry reroute
```
curl -XPOST "localhost:9200/_cluster/reroute?retry_failed=true"
```
- force a reroute and accept data loss, e.g. allocate a primary shard to a node that holds a stale copy (see the sketch of the two commands below)
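The two reroute commands that force an allocation and require `accept_data_loss` are `allocate_stale_primary` and `allocate_empty_primary`. A hedged sketch of both, with placeholder index, shard, and node names taken from whatever the allocation explain output reports:

```
# allocate a primary shard to a node that holds a stale copy of it
curl -XPOST "localhost:9200/_cluster/reroute" -H 'Content-Type: application/json' -d'
{
  "commands": [
    { "allocate_stale_primary": { "index": "my-index", "shard": 0, "node": "node-1", "accept_data_loss": true } }
  ]
}'

# allocate a brand-new empty primary shard (previous data on that shard is gone for good)
curl -XPOST "localhost:9200/_cluster/reroute" -H 'Content-Type: application/json' -d'
{
  "commands": [
    { "allocate_empty_primary": { "index": "my-index", "shard": 0, "node": "node-1", "accept_data_loss": true } }
  ]
}'
```

The first keeps whatever data the stale copy still has; the second starts the shard empty, so use it only when no copy is recoverable.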
- reindex from a backup, if you have one (see the sketch below)
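If the backup lives in another Elasticsearch cluster, reindex-from-remote is one way to pull the data back; a rough sketch, assuming the remote host is whitelisted via `reindex.remote.whitelist` on the local cluster and using placeholder host and index names:

```
curl -XPOST "localhost:9200/_reindex" -H 'Content-Type: application/json' -d'
{
  "source": {
    "remote": { "host": "http://backup-cluster:9200" },
    "index": "my-index"
  },
  "dest": { "index": "my-index" }
}'
```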
At that time, option #2 (force reroute, accepting data loss) solved the issue, but we lost all of the affected indices.
Reference
How to resolve unassigned shards in Elasticsearch (this is a good series).