Here are some common Elasticsearch APIs for checking different objects:
Elastic Stack Compatibility
The table shows the stack components’ version compatibility.
Some Strategies
If the cluster is unhealthy, check the shard status via the health API and the node status via the node APIs: are all nodes present with their expected roles? Also check path.data on the data nodes. Check this post
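A quick way to see path.data per node is the node info API; a minimal sketch, assuming the default localhost:9200 endpoint (the filter_path may need adjusting for your version):

# show the configured path settings on every node
GET _nodes/settings?filter_path=**.settings.path
curl -s "http://localhost:9200/_nodes/settings?filter_path=**.settings.path&pretty"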
If a node is removed and rejoined intentionally within a short time (e.g. upgrade, OS maintenance, etc.), you can delay the re-allocation of its unassigned replica shards. This setting goes into every index, so applying it may take some time; you can revert the setting after the node rejoins.
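A minimal sketch of that delay, using the standard index.unassigned.node_left.delayed_timeout setting (the 30m duration and the _all index pattern are placeholders, adjust to your maintenance window):

# delay replica re-allocation for 30 minutes when a node leaves
PUT _all/_settings
{
  "settings": {
    "index.unassigned.node_left.delayed_timeout": "30m"
  }
}

# revert to the default after the node has rejoined
PUT _all/_settings
{
  "settings": {
    "index.unassigned.node_left.delayed_timeout": null
  }
}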
# cluster health, examine:
# total shard number, primary shard number, etc
# 1. yellow or red (relocating or unassigned shards?)
# 2. node number: master and data (any lost?)
# 3. huge number of pending tasks (may get the whole cluster stuck;
#    then check what kind of pending tasks they are)
GET _cluster/health
curl -s "http://localhost:9200/_cluster/health?pretty"
# total index number in cluster
GET _cluster/stats?filter_path=indices.count

# total shard number in cluster
GET _cluster/stats?filter_path=indices.shards.total
# explain a shard's current allocation
# can see the ongoing allocation explanation
GET _cluster/allocation/explain
# current settings
# defaults, transient and persistent (usually made by dynamic APIs)
GET _cluster/settings?flat_settings&include_defaults
# set persistent/transient setting
PUT _cluster/settings
{
  "<transient or persistent>" : {
    "cluster.routing.allocation.disk.watermark.flood_stage": "90%",
    "cluster.routing.allocation.disk.watermark.high": "75%",
    "cluster.routing.allocation.disk.watermark.low": "70%",
    "cluster.routing.allocation.cluster_concurrent_rebalance" : "6"
  }
}
# setting cluster_concurrent_rebalance and node_concurrent_recoveries bigger can
# speed up rebalancing
PUT _cluster/settings
{
  "transient" : {
    "cluster.routing.allocation.cluster_concurrent_rebalance" : "10",
    "indices.recovery.max_bytes_per_sec" : "250mb",
    "cluster.routing.allocation.node_concurrent_recoveries": "10"
  }
}
# remove persistent/transient setting
PUT _cluster/settings
{
  "<transient or persistent>" : {
    "cluster.routing.allocation.disk.watermark.low": null
  }
}
# super helpful!
# show allocation details on each node
# shards, disk.percent columns can help observe the rebalancing
GET _cat/allocation?v&s=shards:desc
GET _cat/allocation?v&s=disk.percent:desc
Pending Task
# pending tasks list
# usually grows when the cluster is yellow or under heavy load
GET _cluster/pending_tasks?pretty
curl -s "http://localhost:9200/_cluster/pending_tasks?pretty" > pending_tasks
# then analyze the "source" field to see what kind of pending tasks dominate
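A small sketch for that analysis, assuming jq is installed and using the pending_tasks dump saved above:

# count pending tasks grouped by their "source" (what produced the task)
cat pending_tasks | jq -r '.tasks[].source' | sort | uniq -c | sort -rn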
curl -s "http://localhost:9200/_cat/nodes?pretty" # list all header parameters GET _cat/nodes?v
# check master and data role are properly set GET _cat/nodes?help
# check node ES version, useful in upgrade
GET _cat/nodes?v&h=ip,v
# check node heap used, ram.percent
# ram.percent: used + cached!!
GET _cat/nodes?v&h=ip,heap.current,heap.percent,ram.percent,ram.current,name,role,master
# node metrics
# desc sort by used disk space percent
# as well as show the total indexing operations on each node
GET _cat/nodes?h=ip,disk.total,disk.used_percent,indexing.index_total&s=disk.used_percent:desc
# list custom node attributes
GET _cat/nodeattrs?v
Index Check
Deleting indices can be performed in the Kibana Index Management console.
# list all header parameters
# h=xx,xx,xx
GET _cat/indices?help
curl -s "http://localhost:9200/_cat/indices?help"
# check index mapping and setting
GET <index name>?pretty
# view 2 docs in this index
# so you can have a glimpse of the doc content
GET <index name>/_search?pretty
{
  "size": 2
}

# when you know the doc id
GET <index name>/_doc/<unique id>
# sort by creation date
# creation.date.string: human-readable
# creation.date: Epoch & Unix Timestamp
# sort to see the creation time window for a specific index pattern
GET _cat/indices/<*-index-pattern-2021.11>?h=i,creation.date.string&s=creation.date

# you can use a converter here just in case
# https://www.epochconverter.com/
# pri.store.size: combined size of all primary shards of an index
GET _cat/indices?h=i,pri.store.size
# delete index
curl -XDELETE 'localhost:9200/<index name>/'
Index Template
# cat
GET _cat/templates/<template name>?v
curl -s "http://localhost:9200/_cat/templates/<template name>?v"
# display template definition, for example
# index-patterns field
# alias field
GET _template/<template name>
Index Alias
# get index alias
GET <index name>/_alias

# get available alias list
GET _alias/*

# get list of aliases for all indices, empty ones are shown too
GET */_alias
# list primary/replica shards of a specific index
# show doc number in each and its host node
GET _cat/shards/<index name>?v
curl -s "http://localhost:9200/_cat/shards/<index name>?v"
# check shard allocation per node, disk usage
# useful for distribution/balance assessment
GET _cat/allocation?v
# unassigned reason
# relocating direction
GET _cat/shards?h=index,state,prirep,unassigned.reason
# list relocating shards
# show relocating shard source -> target node
GET _cat/shards?v&h=index,shard,state,node&s=st:desc
# sort shards by size
# s=sto:desc, descending order
GET _cat/shards?h=i,shard,p,ip,st,sto&s=sto:desc,ip:desc
# list all shards on a specified node and sort by shard size desc
curl "localhost:9200/_cat/shards?h=i,shard,ip,prirep,st,store&s=store:desc" \
  | grep "<node ip or node name>" > shards.txt
# query hot shards distribution on data nodes
# sort node ip with order: desc or asc
curl -s "localhost:9200/_cat/shards/<hot index pattern>?s=node:asc" > shards \
  && cat shards | awk '{print $8}' | uniq -c | sort -rn

# get hot shard total based on shards per node
cat shards | awk '{print $8}' | uniq -c | sort -rn | \
  awk 'BEGIN { sum = 0 } { sum += $1 } END { print sum }'

# get hot shard number average
cat shards | awk '{print $8}' | uniq -c | sort -rn | \
  awk 'BEGIN { sum = 0; count = 0 } { sum += $1; count += 1 } END { print sum / count }'
# reroute shard pri/rep
# try dry run first, the output can be big
# the dry run output contains the reasons for success or failure
curl -XPOST "localhost:9200/_cluster/reroute?dry_run" \
  -H 'Content-Type: application/json' \
  -d \
'{
  "commands": [
    {
      "move": {
        "index": "<index name>",
        "shard": "<shard number>",
        "from_node": "<ip or node name>",
        "to_node": "<ip or node name>"
      }
    }
  ]
}'

# or run in Dev Tools
POST /_cluster/reroute?dry_run
{
  "commands": [
    {
      "move": {
        "index": "<index name>",
        "shard": "<shard number>",
        "from_node": "<ip or node name>",
        "to_node": "<ip or node name>"
      }
    }
  ]
}
# sometimes a shard stays unassigned because the max retries have been exceeded
# https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-reroute.html#cluster-reroute-api-request-body
curl -XPOST "localhost:9200/_cluster/reroute?retry_failed=true" \
  -H 'Content-Type: application/json' \
  -d \
'{
  "commands" : [
    {
      "allocate_replica": {
        "index" : "<index name>",
        "shard" : "<shard number>",
        "node" : "<target node>"
      }
    }
  ]
}'

# or in Dev Tools
POST /_cluster/reroute?retry_failed=true
{
  "commands" : [
    {
      "allocate_replica": {
        "index" : "<index name>",
        "shard" : "<shard number>",
        "node" : "<target node>"
      }
    }
  ]
}
# attempt a single retry
# if many otherwise-good unassigned shards are blocked, retry all of them
# without a request body (if shards are stale/corrupted, this will not work)
# you may need to run this multiple times to clean the backlog
curl -XPOST "localhost:9200/_cluster/reroute?retry_failed=true"

# or in Dev Tools
POST /_cluster/reroute?retry_failed=true
# if a shard gets stuck in INITIALIZING status during recovery,
# the reason could be
# 1. the shard is big
# 2. a lot of replicas, which prolongs the initialization process
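To observe such a recovery while it runs, the cat recovery API can help; a small sketch (restricting to active recoveries keeps the output short, trim the columns as needed):

# show ongoing shard recoveries with stage and progress
GET _cat/recovery?v&active_only=true
curl -s "http://localhost:9200/_cat/recovery?v&active_only=true&h=index,shard,stage,source_node,target_node,bytes_percent"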
Data Stream
It is easy to examine data streams in the Kibana Index Management console.
# get a specified data stream's backing indices, template and ILM policy
GET _data_stream/<data stream name>
# rollover a data stream
POST <data stream name>/_rollover
# delete a data stream and all its backing indices
DELETE _data_stream/<data stream name>
What I care about is the current write index of DS:
# list all non-system, non-hidden data streams
curl -s "http://localhost:9200/_data_stream/*?format=json&pretty" \
  | jq -r '.data_streams[].name' | sort -r
# find the current write index (the last one), health status, template and policy
curl -s "http://localhost:9200/_data_stream/<data-stream-name>?format=json&pretty" \
  | jq -r '.data_streams[].indices[-1].index_name'
# data stream stats
# total shards, total backing indices, total storage size
curl -s "http://localhost:9200/_data_stream/<data-stream-name>/_stats?pretty&format=json"
Another important statistic is the distribution of data-stream-based hot shards; it is not straightforward and needs some calculation, so I have written a script to display it.
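The script itself is not reproduced here, but a rough sketch of the idea (assuming jq is available and that the hot shards live in the current write index of each data stream) could look like this:

#!/bin/bash
# sketch: count shards of each data stream's current write index per node
ES="http://localhost:9200"
for ds in $(curl -s "$ES/_data_stream/*" | jq -r '.data_streams[].name'); do
  # the last backing index is the current write index
  wi=$(curl -s "$ES/_data_stream/$ds" | jq -r '.data_streams[].indices[-1].index_name')
  curl -s "$ES/_cat/shards/$wi?h=node"
done | sort | uniq -c | sort -rn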
ILM
The index lifecycle management APIs.
I have observed huge numbers of pending tasks from ILM operations that slow down the whole cluster (cs, traffic, usage, etc.).
# examine index age and phase state
# and any ILM error
GET <index name>/_ilm/explain
# remove ILM from a ds or alias
POST <ds or alias>/_ilm/remove

# need to check if any index is closed by forcemerge, if yes, open it
GET <ds or alias>
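If a backing index did end up closed, reopening it is just the standard open index API:

# open a closed index
POST <index name>/_open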
# retry after the ILM policy gets fixed (updated)
POST <index name>/_ilm/retry
Stop and start the ILM system; this is used when performing scheduled maintenance on cluster nodes that could impact ILM actions.
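A sketch of that flow with the standard ILM operation-mode APIs:

# check the current ILM operation mode: RUNNING, STOPPING or STOPPED
GET _ilm/status

# halt ILM background operations before the maintenance window
POST _ilm/stop

# resume ILM once maintenance is done
POST _ilm/start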