Brain Freeze

Please revisit this blog to help yourself understand what causes public speaking anxiety and how to overcome it.

Brain freeze (the mind going blank) usually comes when:

  • speaking to native speakers.
  • speaking to more senior people.

The underlying fear is judgment or negative evaluation by others.

  1. Rehearse /rɪˈhɜːrs/ to increase confidence; don’t memorize words, but focus on message delivery.
  2. Speak slowly and clearly, and take deep breaths.
  3. Practice with written notes and bullet points to keep the discussion on track.
  4. Practice recovering from a brain freeze: purposely stop the talk and shift attention elsewhere, then use your notes to bring yourself back.
  5. Practice for the worst: know what to say to the audience if your mind goes blank.

Self-help videos.

Fluency Disorder

Language learners experience fluctuations in fluency due to:

  • Comfort level: You may feel more fluent when you’re in a relaxed, familiar setting or speaking with people you’re comfortable with. Anxiety can affect performance.
  • Topic familiarity: If you’re talking about a subject you’re familiar with, it’s easier to find the right words. Less familiar topics can cause hesitation.
  • Mental focus: Sometimes, fatigue, stress, or distractions can impact your ability to think clearly and express yourself.
  • Practice and confidence: On days when you’ve been practicing more or feel confident, your fluency improves. Confidence boosts flow, while self-doubt can block it.

With more practice and consistency, you’ll likely notice fewer ups and downs.

How to Stabilize Fluency

  • Practice regularly: Consistent speaking practice.
  • Build vocabulary: Learn new words in context.
  • Think in English: This speeds up your response time and makes your speech more natural.
  • Relax and slow down: Understand the conversation flow to reduce hesitation. When you feel tense, your speech can become disjointed. Take a deep breath and speak more slowly to give yourself time to think clearly.
  • Embrace mistakes: Focus on communication rather than perfection.
  • Prepare for specific topics: Rehearse sentences.
  • Use filler phrases: When you’re stuck, use fillers like “Let me think,” or “What I mean is…” to give yourself a moment to collect your thoughts without losing fluency.
  • Record yourself.
  • Stay positive and patient: Celebrate small wins and remind yourself that fluency will improve with time and effort.

I separated this content from the original Envoy proxy blog to make it shorter. The original Envoy proxy blog was trimmed to focus on Envoy concepts and demos.

General Proxy Related Concepts

Youtube Channel about proxy basics.

1. What is a Proxy (Server)?

A server application that acts as an intermediary between a client requesting a resource and the server providing that resource.

2. What are a Forward Proxy and a Reverse Proxy?

  • Forward proxy: anonymity, caching, block unwanted sites, geofencing.
  • Reverse proxy: load balancing, ingress, caching, isolating internal traffic, logging, canary deployment.

A Forward proxy is a proxy connecting from private to public IP space (which was the original idea for a proxy) while a Reverse proxy connects from public to private IP space, e.g. mapping different web servers behind the proxy to a single, public IP.

How does a forward proxy know the final destination? Via the Host header, available since HTTP/1.1. Ping will not pass through an HTTP proxy, because it uses a lower-level (L3) protocol. In other words, not all traffic goes through the proxy; you can also configure which traffic uses the proxy and which does not.

A proxy can add an additional header, X-Forwarded-For, to tell the server the originating IP. Proxies are dedicated: an HTTP proxy is for HTTP but can be upgraded to support tunneling, while a SOCKS proxy works only at L4.

NOTE: A reverse proxy is not necessarily a load balancer; a load balancer is one form of reverse proxy.

3. What is an HTTP Tunnel?

Well explained on Wikipedia.

The most common form of HTTP tunneling is the standardized HTTP CONNECT method.

In this mechanism, the client asks an HTTP proxy to forward the TCP connection to the desired destination. The proxy server then proceeds to make the connection on behalf of the client. Once the connection has been established by the server, the proxy server continues to proxy the TCP stream to and from the client. Only the initial connection request is HTTP - after that, the server simply proxies the established TCP connection.

This mechanism is how a client behind an HTTP proxy can access websites using SSL or TLS (i.e. HTTPS). Proxy servers may also limit connections by only allowing connections to the default HTTPS port 443, whitelisting hosts, or blocking traffic which doesn’t appear to be SSL.
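Concretely, the raw exchange looks roughly like this (a sketch, with the destination name as a placeholder):

```
CONNECT example.com:443 HTTP/1.1
Host: example.com:443

HTTP/1.1 200 Connection established

(from here on, raw TLS bytes are relayed in both directions)
```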

A proxy server that passes unmodified requests and responses is usually called a gateway or sometimes a tunneling proxy (from the Wikipedia proxy server article).

NOTE: For more details, please see my Envoy proxy demo on GitHub.

4. What is an HTTP Proxy?

An HTTP proxy is a proxy server that speaks the HTTP protocol. It’s especially made for HTTP connections but can be abused for other protocols as well (which is kind of standard already).

The examples show using curl with an HTTP proxy for HTTP or HTTPS (through the CONNECT method, if the proxy supports it!).

The -p (--proxytunnel) flag is not necessary for HTTPS; curl will ask for a tunnel for you. But if you want explicit tunneling for other protocols such as FTP, you need to specify this flag (of course, the proxy needs to support the CONNECT method).

Also please be aware that newer curl versions support an HTTPS proxy, i.e. connecting to the proxy itself over SSL/TLS (not a tunnel, see the curl man page): -x https://<proxy-url>:<port>; otherwise -x <proxy-url>:<port> defaults to http://.
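A minimal sketch of these curl invocations, assuming a hypothetical HTTP proxy listening at 127.0.0.1:3128:

```sh
# Plain HTTP through the proxy: curl sends the full URL in the request line.
curl -x http://127.0.0.1:3128 http://example.com/

# HTTPS through the proxy: curl issues a CONNECT request automatically and
# tunnels the TLS connection through it; no -p needed.
curl -x http://127.0.0.1:3128 https://example.com/

# Explicit tunneling for another protocol such as FTP (the proxy must allow
# CONNECT to the target port).
curl -p -x http://127.0.0.1:3128 ftp://example.com/file.txt

# Newer curl: talk to the proxy itself over TLS (an HTTPS proxy).
curl -x https://proxy.example.com:3129 https://example.com/
```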

NOTE: For more details, please see my Envoy proxy demo on GitHub, especially how to use curl to tunnel other protocols.

5. Can Proxy & Reverse Proxy be Used in the Same Place?

Yes, for example, service mesh.

6. VPN vs Forward Proxy.

Proxy vs VPN, what’s the difference

Main differences:

  • A VPN encrypts the traffic all the way; a proxy (SOCKS, etc.) does not.
  • A VPN covers all traffic; a proxy works at the app level (a specific app or browser).

7. L4 and L7 Reverse proxy.

An L7 proxy works on layer 7: it redirects the request after it has been completely received. The proxy inspects the client request and assembles a new request to the target server.

An L4 proxy works on layer 4 (packet level): it redirects request packets to the target server immediately (it doesn’t wait for all packets).

8. TLS termination proxy and TLS forward proxy.

TLS termination proxy:

           (proxy cert)
client <=================> proxy <------------------> servers
          https                        http

TLS forward proxy, which is not tunneling (for the tunneling type, the name "Tunneling Proxy" might be more suitable):

         (proxy cert)                  (server cert)
client <=================> proxy <==================> servers
          https                          https

9. SNI. SNI (Server Name Indication) is an extension to TLS that allows a client to specify which hostname it is attempting to connect to at the start of the TLS handshaking process. (Because one single virtual server may host several secure web sites, the HOST header is hidden in TLS.)

SNI sends the host name in clear text, since it is in the first hello message of the handshake. ESNI is a new proposal to encrypt the SNI in the hello message.
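A quick way to observe SNI is openssl s_client; a small sketch (example.com stands in for any HTTPS site):

```sh
# -servername sets the SNI value in the ClientHello; the server (or an
# SNI-aware proxy) uses it to pick the right certificate.
openssl s_client -connect example.com:443 -servername example.com </dev/null \
  | openssl x509 -noout -subject
```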

Demo: launch 3 web sites on a laptop: 127.0.0.1:8080, 127.0.0.1:8081, 127.0.0.1:8082 and an haproxy at 0.0.0.0:80 (reverse proxy), configuring the router to route internet inbound traffic to haproxy to mimic the situation in a public cloud.

Then use noip to create 3 different domain names, and assign the router’s public IP to each domain name.

With HTTP, although the domains being visited are different, the IP behind them is the same; based on haproxy’s internal configuration, traffic is forwarded to the corresponding web site by parsing the Host header.

For HTTPS, use certbot to generate 3 certs and private keys corresponding to the 3 web sites, then configure haproxy to use SSL/TLS with these certs. Now that haproxy can no longer see the Host header, SNI comes into play, so the client (browser) can obtain the correct cert. Here haproxy should be doing TLS termination.

This demo explains what I didn’t understand in the Envoy demo at the time: it actually changed the router’s configuration, which is why the noip domains can be used to access the private websites!

There is also an Envoy sandbox for a TLS SNI demo.

The first edition was written on 2020-08-30.

Demo

This GitHub repo has demos for some important types of Envoy proxies.

Some issues I had at the time of using Envoy:

About Source Code

Protobuf plays a central role in Envoy configuration, and every component in Envoy is defined by protobuf. Here I will show some of the ones I explored.

For example, in external authz gRPC server demo code:

import (
    auth_pb "github.com/envoyproxy/go-control-plane/envoy/service/auth/v3"
)

func (s *service) Check(ctx context.Context,
    r *auth_pb.CheckRequest) (*auth_pb.CheckResponse, error) {
    fmt.Println("received check request")
    // return nil, fmt.Errorf("error")
    return &auth_pb.CheckResponse{}, nil
}

func main() {
    auth_pb.RegisterAuthorizationServer(grpcServer, s)
}

The Check handler is specified in module external_auth.pb.go#L704 and defined in proto file service external_auth.proto#L33.

Testing Facilities

There are some CLI and online facilities that can help with proxy testing:

For complex testing where multiple components are involved, use docker compose to make them work together.

NOTE: nc and telnet can also work with an HTTP server, but you need to type the HTTP directives into the connection, for example: GET /<path> HTTP/1.1
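For example, a small sketch of a raw HTTP/1.1 request via nc, assuming a hypothetical server on localhost:8080:

```sh
# HTTP/1.1 requires a Host header; the blank line (final \r\n pair) ends the
# request head. "Connection: close" makes the server hang up after replying.
printf 'GET /health HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n' \
  | nc localhost 8080
```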

Envoy Training

So far the best Envoy learning series. The key takeaways are summarized in subsequent sections.

Episode 1: Intro to Envoy

The codelab Github repo.

  • Cloud Native L4/L7 proxy.
  • Extensibility.
  • Control via API (usually gRPC): control plane / data plane.
  • Observability: traces and metrics.

Core concepts and data flow, in the same order as in the Envoy config yaml file:

Requests
-> Listeners
-> Filters(routing decision): chained and order matters.
-> TCP Filters
-> HCM(http_connection_manager) Filters: turns envoy to http L7 proxy.
-> HTTP Filters: operates on http header, body, etc.
-> Router Filters: sends traffic to upstream.
-> Clusters: upstream destinations
-> Endpoints/Cluster member/Cluster Load Assignment

Episode 05: Envoy filters

Envoy HTTP Filters:

  • Code that can interact with request/response.
  • Async IO.
  • Transparently work with HTTP 1.1 or 2/3.
  • Chained together.

Episode 15: Envoy + External Services

The external authz gRPC server is referenced from this episode, super helpful; see the codelab.

Other Learning Resources

Istio (as far as I understand it) is basically an Envoy discovery service that uses information from the Kubernetes API (e.g. the services in your cluster) to configure Envoy clusters/routes. It has its own configuration language.

Write Better Unit Test

A single unittest case should:

  1. Cover only one path through the code.
  2. Have good asserts and documentation.
  3. Provide informative failure messages.

To avoid reinventing the wheel, use test fixtures:

test fixture: represents the preparation needed to perform one or more tests, and any associated cleanup actions. This may involve, for example, creating temporary or proxy databases, directories, or starting a server process.

graph TD;
  setUpModule --> setUpClass;
  setUpClass --> setUp;
  setUp --> test_*;
  test_* --> tearDown;
  tearDown --> setUp;
  tearDown --> tearDownClass;
  tearDownClass --> setUpClass;
  tearDownClass --> tearDownModule;
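A minimal sketch of these hooks with Python’s unittest (the hook names are standard; the test content is illustrative):

```python
import unittest

def setUpModule():
    # Runs once, before any test class in this module.
    print('module setup')

def tearDownModule():
    # Runs once, after all tests in this module.
    print('module teardown')

class TestExample(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Runs once before all tests in this class, e.g. start a server.
        cls.base = [1, 2]

    @classmethod
    def tearDownClass(cls):
        # Runs once after all tests in this class.
        cls.base = None

    def setUp(self):
        # Runs before every test_* method, e.g. create a temp directory.
        self.data = list(self.base)

    def tearDown(self):
        # Runs after every test_* method, even when the test fails.
        self.data = None

    def test_copy_is_fresh(self):
        self.assertEqual([1, 2], self.data)

if __name__ == '__main__':
    unittest.main()
```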

Write Testable Code

Some important techniques to make code easier to test:

  • Documentation.
  • Dependencies.
  • Decomposition.
  • Graceful and informative failure.

Google Blog Writing Testable Code.

Dependency Replacement

Test Double: A simplified replacement for any dependency of a system under test.

You should use test doubles if the real thing:

  • Isn’t available
  • Won’t return the results needed
  • Would have undesirable side effects
  • Would be too slow

Types of test doubles:

| Type | What it does | When to use |
|------|--------------|-------------|
| Placeholder | Does nothing. Passed around but never used. | You need a placeholder. |
| Stub | Provides canned answers. | You want the same result every time. |
| Spy | A stub that remembers how it was called. | You want to verify functions were called the right way. |
| Mock | Configurable mimic of a particular object. | Can behave like a dummy, stub, or spy. |
| Fake | A simplified version of the real thing. | Interacting with a complicated system. |

Other Fakes

You can find others on the Internet, such as a MySQL fake, etc.

Unittest Module

The Python unittest module: https://docs.python.org/3/library/unittest.html#module-unittest

Basic example: https://docs.python.org/3/library/unittest.html#basic-example

How to organize the code: setUp()/tearDown() for each test case: https://docs.python.org/3/library/unittest.html#organizing-test-code

It also has class- and module-level fixtures: https://docs.python.org/3/library/unittest.html#class-and-module-fixtures

Assertion methods: https://docs.python.org/3/library/unittest.html#assert-methods

Assert methods allow custom messages, for example:

self.assertEqual(
    10, call_count,
    'The call count should be 10, not {}.'.format(call_count))

Command-Line interface: https://docs.python.org/3/library/unittest.html#command-line-interface

# see list of options
python -m unittest -h

# -v: verbose
python -m unittest -v test_module
python -m unittest test_module1 test_module2
python -m unittest test_module.TestClass
python -m unittest test_module.TestClass.test_method

# -k: pattern match with substring
# --durations: show N slowest test cases
python -m unittest -v test_module -k foo --durations 5

test_module is a python file, such as my_module.py, where my_module can be imported by other python programs.

Unittest Parameterized

Parameterized module, installation and examples: https://github.com/wolever/parameterized

When using with mock.patch decorator, the order matters, for example: https://github.com/wolever/parameterized?tab=readme-ov-file#using-with-mockpatch

import unittest
from parameterized import parameterized
from unittest import mock

@mock.patch("os.getpid")
class TestXXX(unittest.TestCase):
    @parameterized.expand(...)
    @mock.patch("os.fdopen")
    @mock.patch("os.umask")
    def test_method(self, param1, param2, ..., mock_umask, mock_fdopen, mock_getpid):
        ...

Unittest Mock

The unittest mock library is extremely important: https://docs.python.org/3.8/library/unittest.mock.html#

Usually Mock, MagicMock and the patch decorator are enough for most cases; please check the quick guide for quick onboarding: https://docs.python.org/3.8/library/unittest.mock.html#quick-guide

There is a separate document for using mock; it is more advanced: https://docs.python.org/3.8/library/unittest.mock-examples.html#

A mock object can pretend to be anything. It can:

  • Return expected values when called.
  • Keep track of how it was called.
  • Have attributes.
  • Call other functions.
  • Raise exceptions via side_effect.

A Mock object can have assertions about how it has been used, for example:

mock = Mock()
mock.method()

mock.assert_called()
mock.method.assert_called_once()

mock.method(1, 2, 3, test='wow')
mock.method.assert_called_with(1, 2, 3, test='wow')

The mock assert methods are under Mock class with examples: https://docs.python.org/3.8/library/unittest.mock.html#the-mock-class

Some other examples:

m = mock.Mock()
# Set up return value.
m.return_value = 'foo'
_ = m()

# Tracking calls and arguments.
self.assertTrue(m.called)          # bool
self.assertEqual(1, m.call_count)  # int: one call so far

# With what args was it called?
# All calls:
m = mock.Mock(return_value=None)
m(1, 2, 3)
m(4, 5, 6)
m()
expected = [mock.call(1, 2, 3), mock.call(4, 5, 6), mock.call()]
self.assertEqual(expected, m.call_args_list)

# Most recent call args.
args, kwargs = m.call_args

# Check the most recent call's usage.
m.assert_called_with('foo', a=1)
m.assert_called_once_with('foo', a=1)

# Check other calls.
# Use mock.ANY as a placeholder.
m.assert_any_call('foo', a=1)
m.assert_has_calls([
    mock.call('foo', a=1),
    mock.call(mock.ANY, a=1),
])

Mock vs MagicMock

Basically, MagicMock is a subclass of Mock with default implementations of most of the magic methods, such as __len__ and __iter__. You can use MagicMock without having to configure the magic methods yourself.
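A small sketch of the difference (standard unittest.mock behavior):

```python
from unittest import mock

m = mock.MagicMock()
len(m)    # 0: __len__ is pre-configured on MagicMock.
list(m)   # []: __iter__ is pre-configured too.

n = mock.Mock()
# len(n) would raise TypeError: Mock leaves magic methods unconfigured.
n.__len__ = mock.Mock(return_value=3)  # configure it yourself when needed
assert len(n) == 3
```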

Return Value vs Side Effect

The return_value is static, but side_effect is versatile: it can not only configure the return value dynamically but also raise exceptions; please see examples: https://docs.python.org/3.8/library/unittest.mock.html#unittest.mock.Mock.side_effect

They can be used together.
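A short sketch of the side_effect forms (standard unittest.mock behavior):

```python
from unittest import mock

m = mock.Mock()

# An exception class or instance: raised every time the mock is called.
m.side_effect = ValueError('boom')
# m() would now raise ValueError('boom').

# A function: computes the return value dynamically from the call args.
m.side_effect = lambda x: x * 2
assert m(3) == 6

# An iterable: returns the next value on each call.
m.side_effect = [1, 2, 3]
assert m() == 1 and m() == 2

# Used together with return_value: if side_effect returns mock.DEFAULT,
# the configured return_value is used instead.
m.side_effect = lambda: mock.DEFAULT
m.return_value = 'fallback'
assert m() == 'fallback'
```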

Patch Decorator

This is a commonly used technique, especially the object patch; for example, when your method calls a class instance method, the patcher can help set up the return value or side effect for it.

For object patch, the object you specify will be replaced with a mock (or another object) during the test and restored when the test ends:

# autospec=True: the mock will be created with a spec from the object being
# replaced.
@patch.object(
    SomeClassorModule,
    'class_or_module_method',
    autospec=True,
    return_value="hello",
    side_effect=xxx)
# The patched object is passed as an extra argument to the decorated function;
# here it is "mock_method".
def test(mock_method):
    ret = SomeClass.class_method(3)
    mock_method.assert_called_with(3)
    assert ret == "hello"

When there are multiple patch decorators, the order is bottom-up:

from unittest.mock import patch

@patch('module.ClassName2')
@patch('module.ClassName1')
def test(MockClass1, MockClass2):
    # Note the order in the signature: MockClass1, the bottom decorator, comes first.
    pass

Helpers

The commonly used helpers:

  • mock.ANY

To revisit, check the Python Basic tutorial and Sample Code.

gRPC is the open source version of Stubby from Google; it uses Protocol Buffers as both its Interface Definition Language (IDL) and its underlying message interchange format.

gRPC clients and servers can run and talk to each other in a variety of environments and can be written in any of gRPC’s supported languages. So, for example, you can easily create a gRPC server in Java with clients in Go, Python, or Ruby.

Core Concepts for gRPC. There are regular and streaming request/response types, and they can be combined in any form, for example:

service Greeter {
  rpc SayHello(HelloRequest) returns (HelloResponse) {}
  // Response is using yield.
  rpc LotsOfReplies(HelloRequest) returns (stream HelloResponse) {}
  // Request is using iterator.
  rpc ZLotsOfGreetings(stream HelloRequest) returns (HelloResponse) {}
  rpc BidiHello(stream HelloRequest) returns (stream HelloResponse) {}
}

The comments in the proto service definition will be translated to docstrings in the generated code, for example:

// import definition.
// https://developers.google.com/protocol-buffers/docs/proto#importing
import "xxx/utility.proto";

// The greeting service definition.
service Greeter {
  // Sends a greeting.
  rpc SayHello (HelloRequest) returns (HelloReply) {}
  // Sends a greeting again.
  rpc SayHelloAgain (HelloRequest) returns (HelloReply) {}
}

// The request message containing the user's name.
message HelloRequest {
  string name = 1;
}
// The response message containing the greetings.
message HelloReply {
  string message = 1;
}

As shown above, write messages and services in the .proto file.

Basic tutorial for Python; run within a python virtualenv, for example:

virtualenv -p python3 grpc
# python venv manager, if you have installed it.
workon grpc

Required python modules for gRPC:

# Install gRPC.
python -m pip install grpcio

# Python's gRPC tools include the protocol buffer compiler protoc and the
# special plugin for generating server and client code from .proto service
# definitions.
python -m pip install grpcio-tools

After finishing the proto file definition:

# Command help
python -m grpc_tools.protoc --help

# This will generate xx_pb2.py and xx_pb2_grpc.py files.
# xx_pb2.py: Contains request and response message classes.
# xx_pb2_grpc.py: Contains client and server and utilities classes:
# - class xxxStub(object)
# - class xxxServicer(object)
# - def add_xxxServicer_to_server(servicer, server)
# Relative paths can be used here.
python -m grpc_tools.protoc \
  --proto_path=<target proto file folder path> \
  --python_out=<Generate Python source file> \
  --pyi_out=<Generate Python pyi stub> \
  --grpc_python_out=<Generate Python source file> \
  <target proto file path>

After finishing the basic codelab, there are many different patterns and usages for creating client-server applications.

TIPS: Use Python typing to indicate the parameter types in service methods to make them clear.
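For instance, a hedged sketch of a typed servicer for the Greeter service above (the module names helloworld_pb2/helloworld_pb2_grpc assume the proto file was named helloworld.proto):

```python
from concurrent import futures

import grpc

# Generated by grpc_tools.protoc from the Greeter proto.
import helloworld_pb2
import helloworld_pb2_grpc

class Greeter(helloworld_pb2_grpc.GreeterServicer):
    # Type annotations make the handler's request/response types explicit.
    def SayHello(self,
                 request: helloworld_pb2.HelloRequest,
                 context: grpc.ServicerContext) -> helloworld_pb2.HelloReply:
        return helloworld_pb2.HelloReply(message=f'Hello, {request.name}!')

def serve() -> None:
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    helloworld_pb2_grpc.add_GreeterServicer_to_server(Greeter(), server)
    server.add_insecure_port('[::]:50051')
    server.start()
    server.wait_for_termination()

if __name__ == '__main__':
    serve()
```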

NOTE: In the basic tutorial’s asynchronous pattern, Python asyncio is used as the implementation.

For gRPC authentication in Python, please see ALTS authentication.

It is possible that the deletion of a k8s CRD hangs indefinitely due to a dependency issue.

If that happens, manually edit the CRD and empty the finalizers block:

finalizers:
- xxx.finalizer.networking.gke.io

For a programmatic way, what I did was run the kubectl delete command in the background, followed by a kubectl patch command to remove the finalizers block, for example:

kubectl patch <crd type name> <crd resource name> \
  -p '{"metadata":{"finalizers":[]}}' \
  --type merge

The background delete command can be forcibly killed at the end, or after some period of time if it does not complete.
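A hedged sketch of that sequence, with hypothetical resource names:

```sh
# Start the deletion in the background; it will block on the finalizers.
kubectl delete mycrd my-resource &
DEL_PID=$!

# Strip the finalizers so the deletion can complete.
kubectl patch mycrd my-resource \
  -p '{"metadata":{"finalizers":[]}}' \
  --type merge

# Force-kill the background delete if it still hasn't returned after 60s.
sleep 60 && kill "$DEL_PID" 2>/dev/null
```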

Some useful CLI tips for daily use, references:

Explain resources with specific fields.

This can help you explore the complete resource definition to find all available fields.

k explain ep
k explain pod.spec
k explain deploy.metadata.labels
k explain BackendConfig.spec # For GKE ingress

Create a manual job from a cronjob; you can also output it as a json template, edit it, and use it.

k create job --from=cronjob/<name of cronjob> <manual job name>

Check service account/regular user permission.

# get sa name
k get sa

k -n <namespace> auth can-i \
  --list \
  --as system:serviceaccount:<namespace>:<service account name>

To check what you can do:

k auth can-i --list

Force delete pods with no grace period

This only works on the pod resource:

k delete pod xxx --grace-period=0 --force

List endpoints (pod IP:port).

k get ep -n <namespace>

List events sorted by lastTimestamp.

# lastTimestamp is added by k8s in the resource definition yaml
k get events --sort-by=".lastTimestamp"

Watching events.

k get events -w --field-selector=type=Warning -A

# watch events for a specific container
# The "involvedObject.fieldPath" is in the JSON return of events, see
# https://stackoverflow.com/questions/51931113/kubectl-get-events-only-for-a-pod
k get events \
  --field-selector involvedObject.fieldPath="spec.containers{<container name>}" \
  --sort-by=".lastTimestamp"

Get raw json for APIs.

k get --raw /apis/apps/v1
# Get metrics
k get --raw /metrics

Wait for pods to be ready.

k wait --for=condition=ready pod -l foo=bar

List custom env vars from a resource.

k set env <resource>/<resource-name> --list

List the node-to-pods mapping.

# column:value mapping:
# NODE:.spec.nodeName
k get po -o=custom-columns=NODE:.spec.nodeName,NAME:.metadata.name

Create starter YAML for resources.

kubectl create deploy anyname --image=nginx --dry-run=client -o yaml

Pods sorted by memory usage.

kubectl top pods -A --sort-by='memory'

To examine the logs of a previously restarted pod/container:

# Other useful options:
# -f: stream logs
# --tail: number of lines to display
# --since: return logs newer than a relative duration: 5s, 2m, 3h
kubectl logs <pod name> -c <container name> --previous
# if no -c is specified, return the first container's log by default

# --tail 5: only show the last 5 lines
kubectl logs <pod name> -c <container name> -f --tail=5

Aggregate(tail/follow) logs from multiple pods into one stream.

kubetail is written in bash, so it can be used without installing other dependencies; git clone it and put the executable on $PATH.

# Tail all pods from a deployment/sts.
kubetail <deployment name>
# Tail specific containers from a deploy/sts.
kubetail <deploy name> -c container1 -c container2
# Tail with regex matching.
kubetail "^app1|.*my-demo.*" --regex

Rolling out/back resources

Better than editing resources manually with the kubectl edit cmd. The cheat sheet:

# rolling update with a new image
# foo1 is the container name
k set image deploy/foo foo1=image:v2

# Check the history of deployments, including the revision
k rollout history deploy/foo

# Rollback to the previous deployment or a specific version
# Undo twice turns it back to the original: 1->2->1
k rollout undo deploy/foo

# Watch rolling update status until completion
k rollout status -w deploy/foo
# Rolling restart of the "foo" deployment
k rollout restart deploy/foo

How to roll back to a specific version:

# first check the current revision number
k get deploy/foo -o yaml | grep revision

# check rollout history and the revision detail
k rollout history deploy/foo
k rollout history deploy/foo --revision=13

# roll back to the target revision
k rollout undo deploy/foo --to-revision=13

Apart from working with data streams and rollover aliases, ILM can also be applied to individual indices with the rollover action disabled.

Individual Index

For example, create an ILM policy that has hot (no rollover) and cold tiers plus a deletion phase, then use this ILM policy in an index template or update existing indices with it.

Although there is no rollover, it is still helpful for managing the retention and data tiers of individual indices.

Switch Policy

To switch the ILM policy for a data stream, alias, or individual index, see the reference.

Debug ILM Error

There is an example of the steps for debugging an ILM error, see the reference.

At the time of this writing, we use Elasticsearch version 7.16.2; the references are based on this version and the content may be subject to change.

Demo

A quick demo to practice rollover alias and ILM.

Why not Data Stream

Manage time series data without data streams.

We recognise there might be use-cases where data needs to be updated or deleted in place and the data streams don’t support delete and update requests directly. In these cases, you can use an index alias to manage indices containing the time series data and periodically roll over to a new index.

Clarification

Note the difference between the rollover_alias field and the _alias API: rollover_alias is configured in the index template’s settings section and paired with ILM to ease rollover. The _alias API is more like a group operation that makes search, query, etc. convenient, and you can also set a write index for it.

In the Kibana template creation, there is an Alias page; it has nothing to do with rollover_alias.

After applying the rollover_alias, the managed backing indices will have the alias (you can see it from GET or Kibana) and the _alias API works on them as well.
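For illustration, a hedged sketch of wiring rollover_alias into a template’s settings (custom, custom_policy, and the index pattern are hypothetical names):

```
PUT _index_template/custom_template
{
  "index_patterns": ["custom-*"],
  "template": {
    "settings": {
      "index.lifecycle.name": "custom_policy",
      "index.lifecycle.rollover_alias": "custom"
    }
  }
}
```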

Reference for Alias API.

Logstash Plugin

Just like with data streams, the rollover_alias has its place in the Logstash Elasticsearch output plugin, for example:

output {
  elasticsearch {
    ilm_rollover_alias => "custom"
    ilm_pattern => "000001"
    ilm_policy => "custom_policy"
  }
}

From the documentation, this config will overwrite the index settings and adjust the Logstash template to write the necessary settings for the template to support index lifecycle management, including the index policy and rollover alias to be used. So it looks like there is no need to preconfigure the template with rollover_alias and ILM; Logstash will do it.

At the time of this writing, we use Elasticsearch version 7.16.2; the references are based on this version and the content may be subject to change.

Demo

A quick docker compose setup to play with data stream and observe how ILM behaves.

A more detailed way to set up and use data stream and ILM from official document.

In reality, consider the tier size in terms of multiple factors: disk space usage, CPU load average, and CPU usage. For example, when the disk space usage is low but the CPU load average could go beyond the vCPU limits, you should not shrink the tier aggressively.

Data Tiers

For details, please see the Data Management explanation.

Data tiers are automatically integrated with Data Streams, moving cold indices to lower-performance, lower-cost hardware, as the docker compose setup demo shows.

Things to highlight:

  • The content tier is required. System indices and other indices that aren’t part of a data stream are automatically allocated to the content tier.
  • The hot tier is required. New indices that are part of a data stream are automatically allocated to the hot tier.

Decommission Data Nodes

Cluster-level shard allocation filtering.

To decommission a data node from the tiers, first drain all shards from it:

# multiple ips separated by comma
PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.exclude._ip" : "<target data node ips>"
  }
}

Then check the allocation to make sure 0 shards reside on it:

GET _cat/allocation?v

Then revert the cluster-level transient setting:

PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.exclude._ip" : null
  }
}

Data Stream

Data Stream is well-suited for logs, events, metrics, and other continuously generated append-only data.

A data stream consists of hidden backing indices, not to be confused with hidden data streams. To list all data streams, hidden or not:

GET _data_stream?expand_wildcards=hidden

Hidden data streams are usually for internal facilitation purposes; we don’t use them.

The backing index name pattern:

.ds-<data-stream>-<yyyy.MM.dd>-<generation>

The same index template can be used for multiple data streams; the index patterns in the settings can use wildcards to broaden the match:

"index_patterns" : ["apple-*"]

Then apple-green and apple-yellow are 2 data streams.

One data stream has only one write index.

ILM

ILM: Manage the index lifecycle. ILM works tightly with Data Streams, as the docker compose setup demo shows.

One thing worth noticing is the age of the shard. For instance, the write index at the hot tier (it always starts at hot) is named .ds-example-2022.07.26-000002; at the time of rollover its age is reset to 0. If the min age from the hot to the cold tier is 7 days, and let’s say the rollover happens on 2022.08.01, then this shard’s shift to the cold tier will be on 2022.08.08, instead of 2022.08.02 (2022.07.26 + 7 days).

So, to transition a backing index to the next tier right after rollover, the window should be set to 0 days.
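For instance, a hedged sketch of such a policy (example_policy and the thresholds are hypothetical):

```
PUT _ilm/policy/example_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_age": "1d", "max_primary_shard_size": "50gb" }
        }
      },
      "cold": {
        "min_age": "0d",
        "actions": {}
      },
      "delete": {
        "min_age": "30d",
        "actions": { "delete": {} }
      }
    }
  }
}
```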

The age of the shard can be examined with the ILM explain API, along with a list of other important fields:

GET .ds-example-2022.07.26-000002/_ilm/explain

Updating a working ILM policy has some limitations:

  • If changes can be safely applied, ILM updates the cached phase definition. If they cannot, phase execution continues using the previous cached definition.
  • Changes to min_age are not propagated to the cached definition. Changing a phase’s min_age does not affect indices that are currently executing that phase.
  • When you apply a different policy to a managed index, the index completes the current phase using the cached definition from the previous policy. The index starts using the new policy when it moves to the next phase.