During an upgrade we accidentally configured every node with the master role only (no data role) and a wrong data path. As a result, all shards became unassigned and the cluster status went red, which meant lost and corrupted shard data.
For example, here is the cluster status after the upgrade, as reported by the cluster health API.
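The check itself is the standard _cluster/health endpoint; ?pretty only formats the output:

curl "localhost:9200/_cluster/health?pretty"

It returned: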
{ "cluster_name":"xxx", "status":"red", "timed_out":false, "number_of_nodes":5, // no data nodes "number_of_data_nodes":0, // no primary shards "active_primary_shards":0, "active_shards":0, "relocating_shards":0, "initializing_shards":0, // all unassigned "unassigned_shards":33, "delayed_unassigned_shards":0, "number_of_pending_tasks":0, "number_of_in_flight_fetch":0, "task_max_waiting_in_queue_millis":0, "active_shards_percent_as_number":0 }
In this case you should take a look at the node status. It turned out the configuration was wrong: every node was set to ingest + master only ("im", no data role), and the data path was wrong too:
curl "localhost:9200/_cat/nodes"
// columns: ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
172.16.0.141  5 86 1 0.00 0.10 0.19 im - 172.16.0.141
172.16.0.140 24 86 2 0.02 0.14 0.16 im - 172.16.0.140
172.16.0.138  4 66 0 0.27 0.18 0.24 im - 172.16.0.138
172.16.0.137  4 73 0 0.00 0.07 0.12 im - 172.16.0.137
172.16.0.152  4 86 1 0.00 0.04 0.09 im * 172.16.0.152
// For the data path: master and data nodes may use different paths,
// but nodes of the same kind should all use the same path.
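To fix this, each node's elasticsearch.yml needs the correct roles and data path. The snippet below is a minimal sketch, not our exact config: it assumes the Elasticsearch 7.x node.roles syntax (older versions use the node.master / node.data booleans) and an example path /var/lib/elasticsearch; substitute the roles and path.data your deployment actually uses.

# elasticsearch.yml (hypothetical example values)
# a dedicated master node would instead use: node.roles: [ master ]
node.roles: [ data, ingest ]
# must point at the directory that actually holds the shard data;
# nodes of the same kind should use the same path
path.data: /var/lib/elasticsearch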
With the correct configuration (node roles and data path) in place, restart the whole cluster. Usually, once the nodes rejoin, the unassigned shards become assigned and then started; if they do not, the shard data may be lost or corrupted, so the shards stay unassigned. The cluster allocation explain API tells you why.
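A hedged example of that query (the endpoint is the standard _cluster/allocation/explain API; with no request body Elasticsearch explains a randomly chosen unassigned shard, which is exactly what the note in the response below says):

curl "localhost:9200/_cluster/allocation/explain?pretty"

In our case it returned: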
{ "note":"No shard was specified in the explain API request, so this response explains a randomly chosen unassigned shard. There may be other unassigned shards in this cluster which cannot be assigned for different reasons. It may not be possible to assign this shard until one of the other shards is assigned correctly. To explain the allocation of other shards (whether assigned or unassigned) you must specify the target shard in the request to this API.", "index":"elastalert-status", "shard":0, "primary":true, "current_state":"unassigned", "unassigned_info":{ "reason":"CLUSTER_RECOVERED", "at":"2021-12-20T23:54:16.720Z", "last_allocation_status":"no_valid_shard_copy" }, "can_allocate":"no_valid_shard_copy", "allocate_explanation":"cannot allocate because a previous copy of the primary shard existed but can no longer be found on the nodes in the cluster" }