Data Stream, ILM and Data Tiers

At the time of writing, we use Elasticsearch 7.16.2. This reference is based on that version, and the content may change in later releases.

Demo

A quick docker compose setup to play with data streams and observe how ILM behaves.

The official documentation describes a more detailed way to set up and use data streams and ILM.

In practice, size each tier by weighing several factors: disk space usage, CPU load average (LA), and CPU usage. For example, when disk space usage is low but the CPU load average could exceed the number of vCPUs, you should not shrink the tier aggressively.
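The sizing rule above can be sketched as a small check. This is only an illustration: the function name and the `disk_threshold_pct` cut-off are hypothetical, not Elasticsearch settings, and real decisions should look at trends over time rather than a single sample.

```python
def safe_to_shrink(disk_used_pct: float, load_avg: float, vcpus: int,
                   disk_threshold_pct: float = 50.0) -> bool:
    """Return True only when both disk and CPU leave headroom.

    disk_threshold_pct is a hypothetical cut-off chosen for this sketch.
    """
    disk_has_headroom = disk_used_pct < disk_threshold_pct
    # A load average at or above the vCPU count means the CPUs are saturated.
    cpu_has_headroom = load_avg < vcpus
    return disk_has_headroom and cpu_has_headroom


if __name__ == "__main__":
    # Low disk usage, but the load average exceeds the 8 vCPUs: do not shrink.
    print(safe_to_shrink(disk_used_pct=20.0, load_avg=9.5, vcpus=8))
```

Low disk usage alone is never enough for this check to pass, which is the point of the paragraph above.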

Data Tiers

For details, see the Data Management explanation.

Data tiers are automatically integrated with data streams to move older indices to less performant, cheaper hardware, as the docker compose setup demo shows.

Things to highlight:

  • The content tier is required. System indices and other indices that aren’t part of a data stream are automatically allocated to the content tier.
  • The hot tier is required. New indices that are part of a data stream are automatically allocated to the hot tier.

Decommission Data Nodes

This relies on cluster-level shard allocation filtering.

To decommission a data node from a tier, first drain all shards from it:

# multiple IPs are separated by commas
PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.exclude._ip": "<target data node ips>"
  }
}

Then check the allocation to make sure no shards remain on the node:

GET _cat/allocation?v

Then revert the cluster-level transient setting:

PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.exclude._ip": null
  }
}
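The drain, check, and revert steps above can be sketched in Python as two pure helpers. This is a minimal sketch, not an official client: the function names and the sample IPs are hypothetical, and it assumes the allocation rows come from `GET _cat/allocation?format=json`, where each row carries `shards` and `ip` as strings.

```python
import json


def exclude_ip_settings(ips):
    """Build the body for PUT _cluster/settings; an empty list resets the filter."""
    value = ",".join(ips) if ips else None
    return {"transient": {"cluster.routing.allocation.exclude._ip": value}}


def drained(cat_allocation_rows, ips):
    """Return True when none of the given IPs still holds shards.

    Rows whose "ip" is not in `ips` (including the UNASSIGNED row) are ignored.
    If an IP does not appear in any row, it is treated as drained.
    """
    targets = set(ips)
    return all(int(row["shards"]) == 0
               for row in cat_allocation_rows
               if row.get("ip") in targets)


if __name__ == "__main__":
    # Hypothetical node IPs, for illustration only.
    print(json.dumps(exclude_ip_settings(["10.0.0.5", "10.0.0.6"]), indent=2))
    rows = [{"shards": "0", "ip": "10.0.0.5"}, {"shards": "12", "ip": "10.0.0.7"}]
    print(drained(rows, ["10.0.0.5"]))
```

Keeping the helpers free of network calls means the payloads can be reviewed before they are sent to the cluster with whatever HTTP client is in use.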

Data Stream

Data streams are well suited for logs, events, metrics, and other continuously generated append-only data.

A data stream consists of hidden backing indices, which should not be confused with hidden data streams. To list all data streams, hidden or not:

GET _data_stream?expand_wildcards=hidden

Hidden data streams usually serve internal, supporting purposes; we don’t use them.

The backing index name pattern:

.ds-<data-stream>-<yyyy.MM.dd>-<generation>
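To make the naming pattern concrete, here is a small Python sketch that splits a backing index name into its parts. The helper name is hypothetical, and the regular expression assumes a six-digit, zero-padded generation, as in names like .ds-example-2022.07.26-000002.

```python
import re
from datetime import date, datetime

# .ds-<data-stream>-<yyyy.MM.dd>-<generation>
BACKING_INDEX = re.compile(
    r"^\.ds-(?P<stream>.+)-(?P<day>\d{4}\.\d{2}\.\d{2})-(?P<gen>\d{6})$"
)


def parse_backing_index(name):
    """Return (data stream name, creation date, generation) for a backing index."""
    match = BACKING_INDEX.match(name)
    if match is None:
        raise ValueError(f"not a backing index name: {name!r}")
    day = datetime.strptime(match["day"], "%Y.%m.%d").date()
    return match["stream"], day, int(match["gen"])


if __name__ == "__main__":
    print(parse_backing_index(".ds-example-2022.07.26-000002"))
```

Because the date and generation are anchored at the end of the name, data stream names that themselves contain hyphens are still parsed correctly.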

The same index template can serve multiple data streams. The index_patterns setting accepts wildcard (*) patterns, not full regular expressions, so one pattern can match several stream names:

"index_patterns" : ["apple-*"]

Then apple-green and apple-yellow are two separate data streams that share the template.

A data stream has exactly one write index at any time.
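The matching behaviour can be approximated in Python with the standard library. This is only a rough sketch: `fnmatchcase` also gives special meaning to `?` and `[...]`, which Elasticsearch index patterns do not, and banana-red is a made-up name added for contrast.

```python
from fnmatch import fnmatchcase


def matching_streams(index_patterns, names):
    """Return the names matched by at least one wildcard pattern."""
    return [name for name in names
            if any(fnmatchcase(name, pattern) for pattern in index_patterns)]


if __name__ == "__main__":
    candidates = ["apple-green", "apple-yellow", "banana-red"]
    print(matching_streams(["apple-*"], candidates))
```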

ILM

ILM (index lifecycle management) manages the index lifecycle. It works closely with data streams, as the docker compose setup demo shows.

One thing worth noting is how the index age is calculated. For instance, the write index in the hot tier (indices always start in the hot tier) is named .ds-example-2022.07.26-000002. When it rolls over, its age is reset to 0, because ILM measures min_age from the rollover time rather than from the index creation time. If the min_age for moving from the hot to the cold tier is 7 days and the rollover happens on 2022.08.01, the index moves to the cold tier on 2022.08.08, not on 2022.08.02 (2022.07.26 + 7 days).

So, to move a backing index to the next tier right after rollover, set that phase’s min_age to 0 days.

The ILM explain API shows the index age, along with other important fields:
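The date arithmetic in the example above can be checked with a few lines of Python. This is a simplified sketch with a hypothetical helper name: it works in whole days and ignores the ILM poll interval, so the real transition happens slightly later than the computed date.

```python
from datetime import date, timedelta


def phase_entry_date(rollover_day, min_age_days):
    """Earliest day an index can enter a phase, counted from its rollover day."""
    return rollover_day + timedelta(days=min_age_days)


if __name__ == "__main__":
    created = date(2022, 7, 26)     # date in the backing index name
    rolled_over = date(2022, 8, 1)  # rollover day from the example

    print(phase_entry_date(rolled_over, 7))  # counted from rollover
    print(created + timedelta(days=7))       # the common, incorrect expectation
    print(phase_entry_date(rolled_over, 0))  # min_age of 0: move right after rollover
```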

GET .ds-example-2022.07.26-000002/_ilm/explain
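As a rough illustration of reading the explain output, the sketch below pulls a few fields out of a response that has already been parsed into a Python dict. The helper name, the policy name, and all sample values are made up for this example; the field names (`managed`, `policy`, `phase`, `age`) follow the 7.x explain response as I understand it and should be checked against a real response.

```python
def summarize_explain(response):
    """Return (index, policy, phase, age) for every ILM-managed index."""
    rows = []
    for name, info in response.get("indices", {}).items():
        if not info.get("managed"):
            continue  # unmanaged indices carry no phase or age
        rows.append((name, info.get("policy"), info.get("phase"), info.get("age")))
    return rows


if __name__ == "__main__":
    # Hand-written sample with made-up values, not captured from a cluster.
    sample = {
        "indices": {
            ".ds-example-2022.07.26-000002": {
                "index": ".ds-example-2022.07.26-000002",
                "managed": True,
                "policy": "example-policy",
                "phase": "hot",
                "age": "3.5d",
            }
        }
    }
    for row in summarize_explain(sample):
        print(row)
```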

Updating an ILM policy that is already in use has some limitations:

  • If changes can be safely applied, ILM updates the cached phase definition. If they cannot, phase execution continues using the previous cached definition.
  • Changes to min_age are not propagated to the cached definition. Changing a phase’s min_age does not affect indices that are currently executing that phase.
  • When you apply a different policy to a managed index, the index completes the current phase using the cached definition from the previous policy. The index starts using the new policy when it moves to the next phase.