Recently I was working on optimizing the deletion of stale ES indices and the rebalancing of hot shards.
Every weekend a large number of outdated indices need to be deleted. The heavy workload makes the ES cluster busy and causes Kafka topic lag, which ends up triggering PD alerts.
Elasticsearch is also not very smart about shard placement: it can occasionally allocate many hot (active) shards to a single data node, resulting in a hotspot with high CPU load average and IOPS.
    # query hot shards distribution on data nodes
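To make the hotspot check concrete, here is a minimal Python sketch that counts started shards per data node from `_cat/shards` output. The function names and the idea of selecting "hot" shards by an index-name prefix (for example today's daily index) are my assumptions for illustration, not something the cluster gives you out of the box.

```python
from collections import Counter


def shards_per_node(shard_rows, index_prefix=""):
    """Count STARTED shards per node.

    shard_rows: list of dicts shaped like the JSON output of _cat/shards,
    with at least the keys "index", "state" and "node".
    index_prefix: hypothetical way to pick the hot indices, e.g. the
    prefix of today's index; an empty prefix counts every index.
    """
    counts = Counter()
    for row in shard_rows:
        if row.get("state") != "STARTED":
            continue  # skip unassigned, initializing or relocating shards
        if not row["index"].startswith(index_prefix):
            continue
        counts[row["node"]] += 1
    return counts


def fetch_shard_rows(client):
    # client is assumed to be an elasticsearch.Elasticsearch instance
    return client.cat.shards(format="json", h="index,shard,prirep,state,node")
```

With a live cluster, something like `shards_per_node(fetch_shard_rows(es), "logs-2024.05.12").most_common()` would list the most loaded nodes first. The index name here is made up.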
Two basic Python modules can help:
- elasticsearch, the official Python client for Elasticsearch API.
- elasticsearch-curator, Elasticsearch Curator helps you curate, or manage, your indices.
A note on ILM and Curator: Curator will not act on any index associated with an ILM policy unless `allow_ilm_indices` is set to true in the action's options.
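Before pointing Curator at a cluster, it helps to know which indices it will skip because of ILM. The sketch below is only an illustration: it assumes the nested settings layout returned by the get-settings API, and `ilm_managed` is a hypothetical helper name.

```python
def ilm_managed(settings_by_index):
    """Return the sorted names of indices that have an ILM policy attached.

    settings_by_index: dict keyed by index name, shaped like the response
    of client.indices.get_settings() (nested, non-flat settings assumed).
    """
    managed = []
    for name, body in settings_by_index.items():
        lifecycle = body.get("settings", {}).get("index", {}).get("lifecycle", {})
        if lifecycle.get("name"):  # index.lifecycle.name holds the policy name
            managed.append(name)
    return sorted(managed)
```

Any index this returns would be ignored by a Curator action that leaves `allow_ilm_indices` at its default.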
To write your own curator script, see this quick start. For instance, to delete indices:
    import elasticsearch
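As a rough sketch of what such a script could look like, the code below picks indices older than a given age and deletes them in small batches with a pause in between, so that the weekend cleanup does not hit the cluster all at once. The batching and pause are my own assumption about how to soften the load, and the helper names are hypothetical. The rows are assumed to come from `_cat/indices` with the `index` and `creation.date` columns in JSON format.

```python
import time

DAY_MS = 24 * 60 * 60 * 1000


def stale_indices(rows, max_age_days, now_ms=None):
    """Return names of indices older than max_age_days, oldest first.

    rows: list of dicts with "index" and "creation.date" (epoch millis,
    usually a string in _cat/indices JSON output).
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    cutoff = now_ms - max_age_days * DAY_MS
    old = [
        (int(row["creation.date"]), row["index"])
        for row in rows
        if int(row["creation.date"]) < cutoff
    ]
    return [name for _, name in sorted(old)]


def delete_in_batches(client, names, batch_size=10, pause_s=5.0):
    """Delete indices a few at a time, sleeping between batches."""
    for start in range(0, len(names), batch_size):
        batch = names[start:start + batch_size]
        # client is assumed to be an elasticsearch.Elasticsearch instance
        client.indices.delete(index=",".join(batch))
        time.sleep(pause_s)
```

The rows could be fetched with `client.cat.indices(h="index,creation.date", format="json")`. Tune `batch_size` and `pause_s` against what the cluster can absorb.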
BTW, to list indices sorted by creation date in Kibana:
    # view url parameters
If you use the existing Curator Docker container, the curator CLI is ready to use with the following command syntax (the repo is badly organized…):
    curator [--config CONFIG.YML] [--dry-run] ACTION_FILE.YML
Here are examples of config.yml and action.yml. The well-formatted, published documentation is here.
A K8s CronJob is easy to set up with the Docker container mentioned above (see others' examples): put the config and action YAML files in a ConfigMap. BTW, to manually trigger the CronJob for testing purposes:
    kubectl create job --from=cronjob/<cj name> <cj name>-manual-0001
Some useful CronJob settings:
    spec: