Customize User Settings

An example settings.json — not an exhaustive list, but useful:

{
// The VSC font size.
"editor.fontSize": 15,
// Indent with 2 spaces.
"editor.tabSize": 2,
// Column ruler and color.
"editor.rulers": [80],
// Match the iTerm2 oh-my-zsh font.
"terminal.integrated.fontFamily": "MesloLGS NF",
// This can be customized as needed.
"python.defaultInterpreterPath": "/usr/bin/python3",
"files.autoSave": "afterDelay",
"workbench.iconTheme": "material-icon-theme",
"workbench.colorCustomizations": {
"editorRuler.foreground": "#5c7858"
},
// After installing go plugin
"go.useLanguageServer": true,
"go.toolsManagement.autoUpdate": true,
// Make VSC build files that carry the build tags below correctly;
// otherwise VSC cannot resolve them, for example the struct in a mock file.
"go.buildTags": "integration test mock lasting",
// For Markdown Preview Github Styling plugin.
"markdown-preview-github-styles.colorTheme": "light",
// For Markdown Preview Mermaid Support plugin.
"markdown-mermaid.darkModeTheme": "neutral"
}

Image Rendering

The pictures displayed in certain blogs are stored in my Google Drive. You need to share them with anyone on the internet (allowlist) and use the path below in Markdown:

![](https://drive.google.com/thumbnail?id=<google img share link id>&sz=s1000)

Useful Extensions

The following VSC plugins are highly recommended.

  • Material Icon Theme, better view for icons

  • Remote Development, it includes:

    • Remote - SSH
    • Remote - SSH: Editing Configuration Files
    • WSL
  • Docker

  • Python

  • Pylint, you need to customize the configuration, see https://code.visualstudio.com/docs/python/linting and https://dev.to/eegli/quick-guide-to-python-formatting-in-vs-code-2040

  • Go: a must-have for Go development

  • YAML

  • HashiCorp Terraform

  • Ansible

  • gRPC Clicker, uses grpcurl under the hood

  • vscode-proto3, proto syntax highlighting

  • Thunder Client, the VSC Postman

  • TODO Highlight

  • Better Comments

  • Indenticator, highlights indent depth

  • GitLens, code blame, heatmaps and authorship, etc.

  • Markdown Preview Github Styling, renders Markdown files in GitHub style

  • Markdown Preview Mermaid Support, draw sequence diagrams, see Example and Live Editor

NOTE: The Mermaid plugin here is for VSCode, not for Hexo deployment. To enable Mermaid in Hexo rendering, please check the Hexo setup blog.

I have encountered an issue where the Go extension does not work properly, for example source code navigation stops working; for troubleshooting please see here. Most likely you need to run `go mod vendor` to download the dependencies locally.

Email Structure

Understand the purpose of every email, for example:

  • Educate or inform
  • Make a request
  • Introduction
  • Respond

Before sending, review for purpose alignment. If not, adjust the message.

Key elements of a good email:

  • Subject line: keeps the email from getting deleted
  • Introduction: creates context, builds trust, reminds them who you are
  • Message: the bulk of the email
  • Call to action: the request, the last part of the body
  • Signature: provides contact information, your brand

Adjust the From name and email address, for example chengdol <email address>; this can be done through Send email as in the Gmail settings.

The signature helps people understand your skill set and value proposition; it may include:

  • your name
  • tagline: directly under your name, a few words, can be your title.
  • phone, address, links

To, CC and BCC

  • To: the people you are directly talking to.
  • CC: people who can hear the conversation but don’t have to act on it.
  • BCC: hidden from the other recipients; only the person who put you in the BCC field knows you are listening, and no one else knows you are on the recipient list. It is fine to reply to the sender, but do not reply all.

Communicating Better

Visuals may help, but should not be distracting; bold or highlighting is good for making a response stand out.

When many people are involved, use Reply All, unless you intend to start a side conversation; then move to Reply, for example:

Thanks again for the introduction, I'll move this conversation with xx to another
thread so we don't clutter your inbox.

Vacation Responder

Set an out-of-office auto reply when, for example, you are at a conference or on leave. A good example:

I’m currently consumed with a project that is taking almost all of my time. If
you are emailing me with a product or support question, please email Liz.
Otherwise, I will try to respond within 24 hours.

Send Report

This is usually a weekly project progress report to stakeholders. Draft it in a spreadsheet and copy it into the email.

There are 5 stages for a project (each marked Not Started, In Progress, or Complete):

  • Concept Exit
  • Design & Planning
  • Execution/Implementation
  • Preview
  • GA

You can also point out weekly:

  • Highlights
  • Lowlights

The risk status of milestones for different teams or components:

  • On Track (green): good, healthy
  • At Risk (amber): signals caution
  • Off Track (red): unlikely to be successful or correct

You can also add Next Steps and ETA, etc.

Here is an example of the table (generated from https://tableconvert.com/markdown-generator); for the status column, use color to highlight:

Overall Risk Status: ON TRACK
Highlights:
Lowlights:

| Workstreams | Exit Date | Weekly Updates | Owner(s) | Open Risks/Mitigation | Status   |
| ----------- | --------- | -------------- | -------- | --------------------- | -------- |
|             |           |                |          |                       | ON TRACK |

Key Decisions made this week:

Schedule Send

This feature is helpful.

Proofread

  • purpose aligned
  • body
  • distraction
  • call to action is clear
  • spelling and grammar
  • proper audience
  • any attachment

When the stakes are high and your email can have a major impact on the outcome, it pays to invest your time in proper proofreading; you can also ask an LLM to help proofread.

Demos

You can also generate a draft or template from Bard or ChatGPT, providing input that describes your purpose.

The First Communication

## subject
Reaching out from Twitter

## introduction
Hi Chris, I have been following you on Twitter for a while, and have interacted
a little with you over the last few weeks. I wanted to bring the conversation
over to email.

## message body

## call to action
Can we get on a call in the next week or so? I am open all day Thursday, just
let me know what works for you.

## signature

Virtual Introduction

## subject
Virtual introduction: Matt and Jesse

## introduction
Hi Matt and Jesse, as per our previous conversations I wanted to introduce you
to one another.

## message
Matt, I’ve known Jesse for a few years and know him to be a very clever
developer, and a loyal friend. I know he can help you with some of the coding
challenges you are facing right now.

Jesse, Matt is my friend and colleague, and can better explain his challenges
than I can, but I think you are the right person for him to talk to.

## call to action
I hope you two can get together soon. I’ll let you take it from here.

## signature

Information Heavy

Follow-up on job search information

Hi Laurie, you had asked for information to help you with your job search
Thursday morning when we spoke.

Below are a few of my favorite blog posts which I think are relevant to where
you are, based on our conversation. I am happy to talk about any of this with
you, over email or on a call. Just let me know what works best for you.

## some links here

I have been blogging for over 14 years, and have plenty to share, but I thought
these would be the most interesting and meaningful to you.

I would love to jump on a call this week to talk about your next steps. Are you
available for a call Friday before 2?

## signature

Respond to Questions

Hi Jim, thank you for your thoughtful email. You have a lot of questions and
ideas in there. Please scroll down and see my comments in `yellow`.

## copy the original email and answer right after each question and highlight
## with yellow

Negative Situation

Team performance

Mike, I promised you an email follow-up from our conversation this morning. I
know this is an uncomfortable conversation and I appreciate your willingness to
address this with me.

There are two issues we need to address.
Project Foo meeting this morning

Hi Carlos, let me first apologize for how the meeting went this afternoon. I
could tell that you were uncomfortable. I wanted to share my perspective on what
was happening.

While I knew there was a chance your Project Foo was going to be killed, I was
not aware of the reasons stated in the meeting. I have seen what you and your
team have done with Project Foo and I have been very impressed.

Seek Help

Hi xx,
Hope you are doing well!

We have started the work related to ...
## body

At this point it is not clear for us how to proceed ...
Can you please provide guidance, or point us to someone who can assist?

Appreciate your help!

Today I decided to split the soft-skill summaries into their own section. Beyond day-to-day work conversations, the important skills going forward include email, phone, technical writing, as well as negotiation, management, and so on.

My focus on leadership started with Amazon interview preparation. Although I dislike being tested on this kind of formality in interviews, I have to admit it is very important, especially in an English-speaking environment where I still lack a lot in this area, so I need to summarize it systematically.

Leadership View

I think you have been doing a nice job.

Figure out customer requirements, hold meetings, write agendas and status reports.

Critical skill of being a leader:

  • Communication
  • Effective management skill
  • Emotional intelligence and empathy

Communicate in a clear, credible, and authentic way. Use passion and confidence to enhance the message (accent, facial expression, body language). Inspire and motivate others; inform, persuade, guide, assure.

When you become a leader in a new team:

  • devote time and energy to establishing how you want your team to work
  • first few weeks are critical
  • get to know your team members
  • showcase your values
  • explain how you want the team to work
  • set and clarify goals, walk the talk
  • don’t be afraid to over communicate

Don’t:

  • Don’t assume the work can be done without building relationships
  • Don’t assume team members understand your working style and expectations
  • Don’t worry about having too many repeated conversations at the beginning (reiterate the strategy over and over)

What does leadership mean to you? connection:

  • focus on the person
  • influence
  • words

changes:

  • vision
  • action
  • drive change

motivate:

  • inspire motivation
  • long-lasting motivation

Why will they follow you? Make them feel comfortable, confident, and satisfied:

  • trust
    • be open, fair and listen
    • admit mistake
    • be decisive
    • respect the opinions of others
  • compassion
  • stability
  • hope

Effective Leader

The more it becomes about people, the less it becomes about your personal tech expertise; the broader your domain becomes, the more removed you are from the details.

  • Always be deciding (identify ambiguity, make tradeoffs)
  • Always be leaving (make the team self-driving without you)
  • Always be scaling

//TODO:
  • [ ] elasticsearch certificate
  • [ ] ES ML node for anomaly detection (this one is quite interesting, used for analyzing data)
  • [ ] logstash input data for testing, movielens
  • [x] linkedin learning
  • [x] JKSJ elasticsearch training and git repo
  • [x] cerebro, tutorial

ELK Stack

The Elastic Stack is one of the most effective ways to leverage open source technology to build a central logging, monitoring, and alerting system for servers and applications.

  • Elasticsearch: a distributed, fast, highly scalable document database.
  • Logstash: aggregates, filters, and supplements log data, then forwards it to Elasticsearch or other backends.
  • Kibana: a web-based front end to visualize and analyze log data.
  • Beats: lightweight utilities for reading logs from a variety of sources and sending the data to Logstash or other backends.
  • Alerting: sends notifications to email, Slack, PagerDuty, and so forth.

ELK Stack vs Prometheus: ELK is a general-purpose NoSQL stack that can be used for monitoring, aggregating all the logging and shipping it to Elasticsearch for easy browsing, and similar tasks. Prometheus is a dedicated monitoring system, usually deployed alongside service discovery (Consul) and Alertmanager.

My Vagrant Elasticsearch cluster setup. A Java runtime and a /bin/bash that supports arrays are required; also note that Elasticsearch cannot be booted by the root user.

Another option is to use docker compose to create testing elasticsearch cluster, see my repo here.

Elasticsearch

Version Rolling Upgrade, some highlights:

  1. Set index.unassigned.node_left.delayed_timeout to hours (see the example after this list)
  2. Start with the data nodes, then the master nodes, one by one; ensure the config YAML file is correct for each role
  3. Wait for recovery with generous retries
  4. Revert index.unassigned.node_left.delayed_timeout
  5. Upgrade the Kibana version
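
For steps 1 and 4, a minimal sketch of raising and then reverting the delayed allocation timeout through the index settings API (the 5h value is an arbitrary example):

# raise the delayed allocation timeout on all indices before restarting nodes
curl -X PUT "http://<master/data bind IP>:9200/_all/_settings" \
  -H 'Content-Type: application/json' -d'
{
  "settings": { "index.unassigned.node_left.delayed_timeout": "5h" }
}'
# revert to the default (1m) after the upgrade
curl -X PUT "http://<master/data bind IP>:9200/_all/_settings" \
  -H 'Content-Type: application/json' -d'
{
  "settings": { "index.unassigned.node_left.delayed_timeout": "1m" }
}'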

This blog series talks about the Kafka + Elasticsearch architecture.

This blog shares ways to achieve high data reliability as well as to scale resources. As we can see, the Kafka message queue also benefits data reliability besides throttling.

There are several ways to install: binary, RPM, or on Kubernetes. The package is self-contained with Java; you can also set ES_JAVA_HOME to use an external Java.

Install using the archive. The Elasticsearch .tar.gz package does not include the systemd unit (you have to create it yourself). To manage Elasticsearch as a service easily, use the Debian or RPM package instead.

It is advisable to change the default locations of the config directory, the data directory, and the logs directory. Usually the data directory is mounted on a separate disk.

Before launching, go to edit $ES_HOME/config/elasticsearch.yml. The configuration files should contain settings which are node-specific (such as node.name, node.role and storage paths), or settings which a node requires in order to be able to join a cluster, such as cluster.name and network.host.

The config path can be changed by ES_PATH_CONF env variable.

Important elasticsearch settings

For example:

# cluster name
cluster.name: chengdol-es
# node name
node.name: master
# ip to access, the host public IP
# or using interface name such as _eth1_
network.host: 9.30.94.85

# a list of master-eligible nodes in the cluster
# Each address can be either an IP address or a hostname
# that resolves to one or more IP addresses via DNS.
discovery.seed_hosts:
- 192.168.1.10:9300
# port default 9300
- 192.168.1.11
- seeds.mydomain.com
# ipv6
- "[0:0:0:0:0:ffff:c0a8:10c]:9301"

To form a production cluster, you need to specify the settings below; for node roles, see this document about how to statically specify master and data nodes.

cluster.name
network.host
discovery.seed_hosts
cluster.initial_master_nodes
# specify dedicated node role
# lower versions have a different syntax
node.roles: [ master ]
node.roles: [ data ]

The master node is responsible for lightweight cluster-wide actions such as creating or deleting an index, tracking which nodes are part of the cluster, and deciding which shards to allocate to which nodes.

High availability (HA) clusters require at least three master-eligible nodes, at least two of which are not voting-only nodes. Such a cluster will be able to elect a master node even if one of the nodes fails.

Data nodes hold the shards that contain the documents you have indexed. Data nodes handle data related operations like CRUD, search, and aggregations. These operations are I/O-, memory-, and CPU-intensive. It is important to monitor these resources and to add more data nodes if they are overloaded.

Important system settings:

  • disable swapping
  • increase file descriptors
  • ensure sufficient virtual memory
  • JVM DNS cache settings
  • temporary directory not mounted with noexec
  • TCP retransmission timeout

Regarding JVM settings in a production environment, see this blog (and the sketch after this list):

  • set Xmx and Xms the same
  • java heap size <= 50% host memory capacity
  • heap <= 30GB
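
A minimal sketch of applying these rules on a 64 GB host, assuming a drop-in file under config/jvm.options.d/ (supported since Elasticsearch 7.7; on older versions edit jvm.options directly):

# config/jvm.options.d/heap.options
# Xms equals Xmx, heap <= 50% of host memory and <= 30GB
-Xms30g
-Xmx30g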

Check elasticsearch version

# 9200 is the default port
# on browser or kibana dev console
curl -XGET "http://<master/data bind IP>:9200"
# response from Elasticsearch server
{
"name" : "master",
"cluster_name" : "chengdol-es",
"cluster_uuid" : "XIRbI3QxRq-ZXNuGDRqDFQ",
"version" : {
"number" : "7.11.1",
"build_flavor" : "default",
"build_type" : "tar",
"build_hash" : "ff17057114c2199c9c1bbecc727003a907c0db7a",
"build_date" : "2021-02-15T13:44:09.394032Z",
"build_snapshot" : false,
"lucene_version" : "8.7.0",
"minimum_wire_compatibility_version" : "6.8.0",
"minimum_index_compatibility_version" : "6.0.0-beta1"
},
"tagline" : "You Know, for Search"
}

Check cluster health and number of master and data nodes:

curl -X GET "http://<master/data bind IP>:9200/_cluster/health?pretty"
# response example
{
"cluster_name" : "chengdol-es",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 2,
"number_of_data_nodes" : 1,
"active_primary_shards" : 0,
"active_shards" : 0,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 100.0
}

Besides the master and data node roles, there are ingest nodes, remote-eligible nodes, and coordinating nodes (which can be dedicated).
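
For example, a dedicated coordinating-only node is declared with an empty role list (assuming the 7.x node.roles syntax):

# elasticsearch.yml on a coordinating-only node
node.roles: [ ]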

Indices

  • [x] how to create and configure an index
  • [x] how to check an index, its status and config
  • [x] how to create actual document data: check the document API
  • [x] how to reroute shards, through the cluster APIs

ES 10 concepts: understand what indices, documents, fields, mappings, shards, primary and replica shards, data nodes, master nodes, and so on are.

First, an index is some type of data organization mechanism, allowing the user to partition data a certain way. The second concept relates to replicas and shards, the mechanism Elasticsearch uses to distribute data around the cluster.

Schema - MySQL => Databases => Tables => Rows => Columns (transactional, joins)
Mapping - Elasticsearch => Indices => Types => Documents => Fields (relevance, high-performance full-text search)

So just remember, Indices organize data logically, but they also organize data physically through the underlying shards. When you create an index, you can define how many shards you want. Each shard is an independent Lucene index that can be hosted anywhere in your cluster.

Index modules: the settings for an index that control all aspects related to it, for example:

index.number_of_shards: The number of primary shards that an index should have
index.number_of_replicas: The number of replicas each primary shard has. Defaults to 1.
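
For example, these settings can be passed when creating an index (the index name and values here are arbitrary):

curl -X PUT "http://<master/data bind IP>:9200/my-index" \
  -H 'Content-Type: application/json' -d'
{
  "settings": {
    "index.number_of_shards": 3,
    "index.number_of_replicas": 1
  }
}'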

Index template: tell Elasticsearch how to configure an index when it is created, Elasticsearch applies templates to new indices based on an index pattern that matches the index name. Templates are configured prior to index creation and then when an index is created either manually or through indexing a document, the template settings are used as a basis for creating the index. If a new data stream or index matches more than one index template, the index template with the highest priority is used.

There are two types of templates, index templates and component templates (note that older versions only have index templates, see this legacy index template); a template actually contains the index module settings.

Get index template details through API.
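
A minimal sketch of creating a composable index template and fetching it back (names and values are arbitrary; this is the 7.8+ _index_template API, older versions use the legacy _template endpoint):

curl -X PUT "http://<master/data bind IP>:9200/_index_template/logs-template" \
  -H 'Content-Type: application/json' -d'
{
  "index_patterns": ["logs-*"],
  "priority": 10,
  "template": {
    "settings": {
      "number_of_shards": 2,
      "number_of_replicas": 1
    }
  }
}'
# get the template details
curl -X GET "http://<master/data bind IP>:9200/_index_template/logs-template?pretty"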

The Elasticsearch APIs cover, for example, cluster status, indices, documents, shards, and reroute; use them to examine the node types (master and data nodes), shard distribution, and CPU load statistics.

Let’s see an example to display doc content:

# cat indices
curl -X GET "172.20.21.30:9200/_cat/indices?format=json" | jq
# search index
# get list of docs and its ids
curl -X GET "172.20.21.30:9200/<index name>/_search?format=json" | jq
# get doc via its id
curl -X GET "172.20.21.30:9200/<index name>/_doc/<doc id>?format=json" | jq

To upload single or bulk document data to Elasticsearch, see the document API. You can download samples here: sample data; let's try the accounts data:

# the accounts.json does not have index and type in it, so specify in curl command
# /bank/account is the index and type
# es will create index bank and type account for you automatically
curl -s -H "Content-Type: application/x-ndjson" \
-XPOST 172.20.21.30:9200/bank/account/_bulk?pretty \
--data-binary "@accounts.json"; echo

# display indices
curl -XGET 172.20.21.30:9200/_cat/indices
# check doc id 1 of index `bank`
curl -XGET 172.20.21.30:9200/bank/account/1?pretty

Query data (the equivalent can also be run in the Kibana dev console):

# query accounts from CA
curl -XGET "172.20.21.30:9200/bank/account/_search?pretty" \
  -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": {
      "state": "CA"
    }
  }
}'

In the response message, each match has a _score field, which tells you how relevant the match is.

Plugins

Elasticsearch provides a variety of plugins to extend the system, for example, snapshot plugin, see here.

# list all installed plug-ins
bin/elasticsearch-plugin list
# example
bin/elasticsearch-plugin install analysis-icu
# api
localhost:9200/_cat/plugins

Kibana

Kibana getting started and download; it is written in Node.js, no other dependencies needed.

Do a quick configuration, check my vagrant Kibana provision file and start in background:

server.port: 5601
server.host: "172.20.21.30"

server.name: "${KIBANA_SERVER_NAME}"
# 2 es nodes
elasticsearch.hosts: ["http://172.20.21.30:9200", "http://172.20.21.31:9200"]

pid.file: ${KIBANA_HOME}/kibana.pid
kibana.index: ".kibana"

Access by http://172.20.21.30:5601 in firefox browser.

Kibana has built-in sample data that you can play with. Go to add sample data, then move to Analytics -> Discover to query and analyze the data. You need to know KQL to query documents; Dashboard is also helpful.
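
A few example KQL queries you can type into the Discover search bar (the field names here assume the built-in sample web logs data set):

response : 404
response : 404 and extension : "css"
message : *timeout*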

The Kibana dev console can issue HTTP requests to explore the ES APIs; command + enter runs a request (for more shortcuts see the help menu, very useful!).

# each command has a play button
GET /_cat/indices?v
GET /_cat/nodes?v

Alternatively, the data can be ingested from Logstash, see below and my Vagrant demo. You need to create an Index Pattern to load and query the data.

Also you can install plug-ins for Kibana:

bin/kibana-plugin install <plugin>
bin/kibana-plugin list
bin/kibana-plugin remove <plugin>

Discover

I usually use Discover to filter and check log messages, and use Dashboard to make graphs, such as Area, Bar, etc., to extract data patterns.

How to draw a graph easily from Discover:

  1. In Discover, query and filter to get the target log records.
  2. In the leftside panel, right click one of the selected fields -> Visualize.

I want to highlight the Area graph: it shows you the proportion of the target field's values along the timeline. For example, in the graph settings, the horizontal axis is @timestamp, and the vertical axis uses count, broken down by the selected field of the message.

There is saved object management in the Discover, Dashboard, and Index pattern sections; you can save items and manage them, as well as export/import them from another Kibana instance.

Logstash

Logstash ingests data into Elasticsearch or other downstream consumers, see the introduction. It is usually paired with Beats.

Logstash offers self-contained architecture-specific downloads that include AdoptOpenJDK 11, the latest long term support (LTS) release of JDK. Use the JAVA_HOME environment variable if you want to use a JDK other than the version that is bundled.

Logstash has two types of configuration files: pipeline configs:

  • /etc/logstash/conf.d

and logstash settings:

  • startup.options
  • logstash.yml
  • jvm.options
  • log4j2.properties
  • pipelines.yml

There is a pipeline.workers setting in the logstash.yml file, and some input plugins such as UDP have their own workers setting; what's the difference? Read this post, A History of Logstash Output Workers. The input and (filter + output) stages are separate pools with separate worker thread settings: pipeline.workers is for the (filter + output) part, and its default value equals the number of CPU cores.
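
A minimal sketch of the two knobs (the values are arbitrary): the pipeline-level workers in logstash.yml, and the input-plugin-level workers on a UDP input:

# logstash.yml: threads for the filter + output stages
pipeline.workers: 8

# pipeline config: threads for the UDP input itself
input {
  udp {
    port    => 5514
    workers => 4
  }
}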

Getting started with Logstash

bin/logstash --help

# config file syntax check
# -f: config file
# -t/--config.test_and_exit: parse the config file and exit
bin/logstash -f first-pipeline.conf --config.test_and_exit
# start logstash
# -r/--config.reload.automatic: avoids restarting logstash when the conf file changes
bin/logstash -f first-pipeline.conf --config.reload.automatic

Ad-hoc pipeline config:

cd /usr/share/logstash
## test running
bin/logstash -e 'input { stdin { } } output { stdout {} }'
bin/logstash -e 'input { stdin { } } output { elasticsearch { hosts => ["<master/data node ip>:9200"] } }'
## then type something to send

There are a few important parts when configuring the processing pipeline (input, filter, output, and codecs); for example, an output section:

# this output can be used for testing
output {
  # can have multiple output plugins
  elasticsearch {}
  kafka {}
  if "VERBOSE" in [message] {
    file {}
  }
  else {
    stdout { codec => rubydebug }
  }
}

For stdout plugin, the output can be examined from journalctl -ex -u logstash -f.

Read Beats Data

To read data from Beats:

  1. Input: where is the data from? logs? beats?
  2. Filter: how should we parse the data? grok filters, geoip filters, etc.
  3. Output: where should we store the logs? backend? Elasticsearch?

Go to /etc/logstash/conf.d, create new file for example, beats.conf:

input {
  ## in Beats side, listening on port 5043
  beats {
    port => "5043"
  }
}

filter {
  if [type] == "syslog" {
    ## grok filter
    grok {
      match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
    }
    date {
      match => [ "syslog_timestamp", "MMM d HH:mm:ss", "MMM dd HH:mm:ss" ]
    }
  }
}

output {
  elasticsearch {
    ## Elasticsearch address
    hosts => [ "9.30.94.85:9200" ]
    ## write to index
    ## %{[@metadata][beat]}: these are field and sub_field in message
    index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
  }
}

In the above output, %{[@metadata][beat]} is a field access; please see Accessing event data and fields in Logstash, and the Logstash data types.
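
A small sketch of using such field references in a filter, assuming the events come from Beats:

filter {
  # [@metadata][beat] is the nested field reference syntax
  if [@metadata][beat] == "filebeat" {
    mutate {
      # %{...} is sprintf-style interpolation of a field value
      add_field => { "ingest_source" => "%{[@metadata][beat]}" }
    }
  }
}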

Then run a command to test the file's validity:

# --config.test_and_exit: parses configuration
# file and reports any errors.
bin/logstash -f beats.conf --config.test_and_exit

# The --config.reload.automatic flag enables automatic
# config reloading so that you don't have to stop and
# restart Logstash every time you modify the configuration file.
bin/logstash -f beats.conf --config.reload.automatic

Beats

https://www.elastic.co/beats/ Beats, written in Go, can output data to Elasticsearch, Logstash, and Redis. But usually we send the data to Logstash (for pre-processing) and then forward it to Elasticsearch.

Each Beat has a config YAML file with a detailed configuration guideline. For example, in the config YAML file, comment out the Elasticsearch output and use the Logstash output.

  • Filebeat: text log files
  • Heartbeat: uptime
  • Metricbeat: OS and applications
  • Packetbeat: network monitoring
  • Winlogbeat: windows event log
  • Libbeat: write your own

X-Pack

X-Pack is an Elastic Stack extension that provides security, alerting, monitoring, reporting, machine learning, and many other capabilities. By default, when you install Elasticsearch, X-Pack is installed; it is open source now.

Check X-Pack; you will see the availability and status of each component:

curl -XGET http://9.30.94.85:9200/_xpack

Later I also used the Python Terraform package, combined with Python Click and other custom Python packages, to build glue code.

Introduction

To alter a planet for the purpose of sustaining life.

This article from IBM can give you a good overview:

  • Terraform vs Kubernetes
  • Terraform vs Ansible

Terraform removes the manual build process and adopts a declarative approach to deploy infrastructure as code: reusable, idempotent, and consistently repeatable deployments.

Using the gcloud API or client binary can do the same work as Terraform, so what are the benefits?

  • cloud agnostic, multi-cloud portable.
  • Unified workflow: If you are already deploying infrastructure to Google Cloud with Terraform, your resources can fit into that workflow.
  • Full lifecycle management: Terraform doesn’t only create resources, it updates, and deletes tracked resources without requiring you to inspect the API to identify those resources.
  • Graph of relationships: Terraform understands dependency relationships between resources.

The Terraform documentation is very informative and well structured. Terraform is to cloud infra on public clouds what Helm is to applications on k8s: it greatly simplifies operational complexity, enables fast automated deployment, and provides reuse, versioning, and other features. But you do need to understand the purpose of the resources each cloud provider offers and how they fit together.

Github Repo

resource files in github: https://github.com/ned1313/Getting-Started-Terraform

Composition

  • Terraform executable: download from the web (or build a Terraform docker image)
  • Terraform files: written in the HashiCorp configuration language DSL
  • Terraform plugins: interact with the providers: AWS, GCP, Azure, etc.
  • Terraform state file: JSON; don't touch it, but you can view it to get deployment details

You can have multiple Terraform (.tf) files; when you run terraform it stitches them together to form a single configuration. For example, split variables, outputs, resources, and tfvars into separate files.

The tfvars file is named terraform.tfvars by default; otherwise you need to specify the file path when running plan. This tfvars file is usually generated from some metadata configuration and then combined with the variable declaration file.
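
For example, a terraform.tfvars file is just a set of variable assignments (the variable names here are hypothetical and must match variable declarations in your .tf files):

# terraform.tfvars
project         = "my-gcp-project"
environment_tag = "dev"
cidrs           = ["10.1.0.0/24", "10.1.1.0/24"]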

Commands

To execute terraform commands, build a docker image and mount the cloud auth credentials when starting the container. BTW, if you run on a Google Compute VM, the SDK will inject the host auth into the container automatically. If you update the Terraform files with a different configuration, rerun init, plan, and apply.

# list all commands
terraform --help
terraform version

# create workspace, see below section
terraform workspace

# linter
terraform validate

# show a tree of providers in the main and sub-modules
# for example, google, kubernetes, random, null, locals they are all providers
terraform providers

# init will download plugin, for example, aws, gcp or azure..
terraform init

# will show you the diff if you update your terraform file
# load terraform.tfvars by default, if not, need to specify
terraform plan -out plan.tfplan


# will generate a tfstate file
# performs creation with as much parallelism as possible
# --auto-approve: for scripts, non-interactive
terraform apply "plan.tfplan" [--auto-approve]

# -state: output state to specific file
# when run different env with a single definition file
terraform apply -state=qa-env.tfstate -var environment=qa "plan.tfplan"

# Manually mark a resource as tainted, forcing a destroy and recreate
# on the next plan/apply.
terraform taint <google_compute_instance.vm_instance>

# output terraform state or plan file in a human-readable form
# show what has been created
terraform show

# show output variable value
# useful for scripts to extract outputs from your configuration
terraform output [output name]

# refresh the state file against the real infrastructure
terraform refresh

# show objects being managed by state file
terraform state list

# destroy Terraform-managed infrastructure
terraform destroy [--auto-approve]

Syntax

Hashicorp configuration language, basic block syntax:

block_type label_one [label_two] {
key = value

embedded_block {
key = value
}
}

How do you know the resource names? Find the provider, then search its resources: https://www.terraform.io/docs/configuration/resources.html. There is also the random provider, for example for generating random numbers.

Provider

Terraform supports multiple providers, all written in Go. https://www.terraform.io/docs/providers/index.html

Provisioner

Provisioners handle configuration steps after the infrastructure is deployed, for example using Ansible or a shell script as the provisioner.

Provisioners can run at the creation or destruction stage; you can also have multiple provisioners in one resource, and they execute in order within the resource.

Provisioner can be local or remote:

  • file: copies a file from local to the remote VM instance.
  • local-exec: executes a command locally on the machine running Terraform, not on the VM instance itself.
  • remote-exec: executes on the remote VM instance.

Terraform treats provisioners differently from other arguments. Provisioners only run when a resource is created; adding a provisioner does not force that resource to be destroyed and recreated. Use terraform taint to tell Terraform to recreate the instance.
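
A minimal sketch of creation-time and destroy-time provisioners on a resource (the required arguments are trimmed; remote-exec would additionally need a connection block with SSH details):

resource "google_compute_instance" "vm_instance" {
  # ... required arguments (name, machine_type, zone, disk, network) omitted ...

  # runs once, right after the instance is created
  provisioner "local-exec" {
    command = "echo ${self.name} created >> instances.txt"
  }

  # runs when the instance is destroyed
  provisioner "local-exec" {
    when    = destroy
    command = "echo instance destroyed >> instances.txt"
  }
}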

Functions

https://www.terraform.io/docs/configuration/functions.html

You can experiment with functions in Terraform console, this can help with troubleshooting.

# first run terraform init
# it will auto load tfvars file and variables
terraform console
> lower("HELLO")
> merge(map1, map2)
> file(path)
> min(2,5,90)
> timestamp()
# modulo
> 34 % 2
> cidrsubnet(var.network_address_space, 8, 0)
# lookup value in a map
> lookup(local.common_tags, "Bill", "Unknown")

For example:

variable network_info {
default = "10.1.0.0/16"
}

# split network range by adding 8 bits, fetch the first one subnet
# 10.1.0.0/24
cidr_block = cidrsubnet(var.network_info, 8, 0)

Resource arguments

https://www.terraform.io/docs/configuration/resources.html#meta-arguments common ones:

  • depends_on: make sure terraform creates things in right order
  • count: create similar resources
  • for_each: create resources not similar
  • provider: which provider should create the resource

For example:

resource "aws_instance" "web" {
# index start from 0
count = 3
tags {
Name = "web-${count.index + 1}"
}

depends_on = [aws_iam_role_policy.custom_name]
}

resource "aws_s3_bucket" "storage" {
for_each = {
food = "public-read"
cash = "private"
}
# access key and value
bucket = "${each.key}-${var.bucket_suffix}"
acl = each.value
}

Variables

There are other ways to use variables rather than specifying them in a single .tf file.

The scenario: we need development, QA (Quality Assurance)/UAT (User Acceptance Testing), and production environments; how do we implement this with one configuration and multiple inputs?

The variable values can come from (precedence from low to high):

  • environment variable: TF_VAR_<var name>.
  • file: terraform.tfvars or specify by -var-file in terraform command.
  • terraform command flags -var.

You can override variables according to this precedence and select values based on the environment, for example:

# specify default value in tf file
variable "env_name" {
type = string
default = "development"
}

# or specify in tfvars file
env_name = "uat"

# or specify in command line
terraform plan -var 'env_name=production'

Variable types:

  • string, the default type if no explicitly specified
  • bool: true, false
  • number (integer/decimal)
  • list (index start from 0)
  • map, value type can be number, string and bool

For example:

variable "project" {
type = string
}

variable "web_instance_count" {
type = number
default = 1
}

# list
variable "cidrs" { default = [] }

# map
variable "machine_types" {
# map type and key is string
type = map(string)
default = {
dev = "f1-micro"
test = "n1-highcpu-32"
prod = "n1-highcpu-32"
}
}

machine_resource = lookup(var.machine_types, var.environment_name)

Terraform uses the same ${} interpolation syntax as bash:

# local variable definition
locals {
# random_integer is a terraform resource
tags = "${var.bucket_name_prefix}-${var.environment_tag}-${random_integer.rand.result}"
}

# use
resource "aws_instance" "example" {
tags = local.tags
}

Workspace

Workspaces are the recommended way of working with multiple environments, covering for example:

  • state management
  • variables data
  • credentials management

State file example: we have three environments, dev, QA, and prod; put each into a separate folder, and when running the command, specify the input and output:

# for dev environment
# -state: where to write state file
# -var-file: load file
terraform plan -state="./dev/dev.state" \
-var-file="common.tfvars" \
-var-file="./dev/dev.tfvars"

Workspace example: there is a terraform.workspace built-in variable that indicates the workspace you are currently in; use it in a map variable to select the right value for each environment (no need to create separate folders for different environments anymore).

# create dev workspace and switch to it
# similar to a git branch
terraform workspace new dev
# show workspace
terraform workspace list
terraform plan -out dev.tfplan
terraform apply "dev.tfplan"

# now create QA workspace
terraform workspace new QA

# switch workspace
terraform workspace select dev

Special terraform variable to get workspace name

locals {
env_name = lower(terraform.workspace)
}

Managing secrets

HashiCorp Vault is for this purpose. It can hand over credentials from the cloud provider to Terraform and set a TTL on the secrets.

Or you can use environment variables to specify the credentials; Terraform will pick them up automatically, but bear in mind to use the right env var names. For example:

# note these are different from the TF_VAR_<var name> convention above; they are secrets
export AWS_ACCESS_KEY_ID=xxx
export AWS_SECRET_ACCESS_KEY=xxx

Module

Modules make code reuse easier: https://www.terraform.io/docs/configuration/modules.html

The Terraform registry is a similar concept to Helm and Docker registries: https://registry.terraform.io/. Use a module block to invoke local or remote modules.

  1. root module
  2. support versioning
  3. provider inheritance

Module components (see the sketch after this list):

  • variable inputs
  • resources
  • output values (the calling module takes these in)
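
A hedged sketch of calling a local module and consuming its output (the module path, input, and output names are hypothetical):

# invoke the module
module "web_server" {
  source         = "./modules/web_server"
  # input variable declared inside the module
  instance_count = 2
}

# expose a value the module outputs
output "web_ips" {
  value = module.web_server.ip_addresses
}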

Google Cloud Platform

Good tutorial: https://learn.hashicorp.com/tutorials/terraform/google-cloud-platform-build

If terraform apply hits permission issues, add the service account used to IAM and grant it the roles, then retry the apply command.

The network of a VM instance built with Terraform has no SSH-allow firewall rule; you have to add it yourself: https://cloud.google.com/compute/docs/troubleshooting/troubleshooting-ssh

Terraform provisions GKE with an additional node pool: https://learn.hashicorp.com/terraform/kubernetes/provision-gke-cluster

Resources

Commonly used resource types for the Terraform resource block:

  • google_compute_network
  • google_compute_instance
  • google_compute_address
  • google_storage_bucket
  • google_container_cluster
  • google_container_node_pool

Vagrant Jenkins server git repo for testing purpose: https://github.com/chengdol/vagrant-jenkins

Certified Jenkins Engineer

Certified Jenkins Engineer (CJE): https://github.com/bmuschko/cje-crash-course

Jenkins

Jenkins is not a build system, it is a structured model. Other interesting projects: Tekton, Jenkins X (k8s involved).

Installing Jenkins: you must have a compatible OpenJDK installed.

## can use war file:
java -jar jenkins.war

Or use the RPM install for CentOS/RedHat, then use systemctl to start/enable Jenkins: https://www.jenkins.io/doc/book/installing/#red-hat-centos

## you will see the default jenkins port is 8080
ps aux | grep jenkins

Then you can open the web interface at <node ip>:8080. If the UI shows up in Chinese, it is probably the browser's language setting; changing Chrome's language to English fixes it.

If the setup wizard fails to install some plugins, you can fix this later in Manage Plugins.

Jenkins uses the file system to store everything; when using systemd, the configuration is in /var/lib/jenkins. You can back up this folder, or if you wipe it out and run systemctl restart jenkins, Jenkins goes back to its initial state.
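
For example, a simple file-system backup of the Jenkins home (the path assumes the RPM/systemd install):

sudo systemctl stop jenkins
sudo tar czf /tmp/jenkins-home-$(date +%F).tar.gz -C /var/lib jenkins
sudo systemctl start jenkins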

You don't even need the Jenkins UI to create a project: you can create the directory and files in the Jenkins working directory, then go to Manage Jenkins -> Reload Configuration from Disk.

I once ran into something strange: the Jenkins Credentials configuration disappeared... Restarting did not bring it back. I ended up stopping the daemon, wiping all files under the workspace, and restarting to re-initialize, which restored it. In hindsight I probably broke the configuration while setting up Credentials; it can be changed back via Manage Jenkins -> Configure Credentials.

Creating app build

Freestyle project -> pipeline (a series of freestyle projects); freestyle projects are not recommended.

Jenkins Workspace: you can see it in the console output or by clicking the Workspace icon in your project dashboard. Everything runs in the project workspace. Every build overrides the previous one. You can use the tar command to back up and restore this workspace, or clean the workspace.

Note: if you use a pipeline script directly in the Jenkins configuration instead of SCM, no workspace is created.

To store build artifacts, use a Post-build Action in the configuration. For example, if you want to archive some jar or zip files, then after the build is done these archives will show up on the build page.

Build trend

To the right of the Build History list there is a trend button; click it to see the build time history statistics and distribution.

Testing and Continuous integration

Now start with the pipeline job type. After creating a pipeline job, you will see a pipeline syntax button at the bottom of the page; it contains the necessary resources to get started. You can also use Copy from to copy a pipeline configuration from another project for a quick start.

Add slave nodes

Manage Jenkins -> Manage Nodes and Clouds. To add slaves, usually use SSH to launch the agent nodes. (If a node is not discovered, an error is shown; troubleshoot according to the error message.)

Before adding a slave node to the Jenkins master we need to prepare the node. We need to install Java on the slave node. Jenkins will install a client program on the slave node.

To run the client program, install the same Java version used on the Jenkins master. You also need to install and configure the necessary tools on the slave node.

yum install -y java-1.8.0-openjdk

When configuring the added node agent, choose a Host Key Verification Strategy.

Pipeline steps

This is the scripted pipeline syntax, not recommended! Please use declarative pipeline directives! What are the differences between them: https://www.jenkins.io/doc/book/pipeline/#pipeline-syntax-overview

If you set up the Jenkinsfile directly in the Jenkins configure UI, the commonly used steps (in the snippet generator) are:

  • node: Allocate node. Jenkins uses a master <-> agent model; you can configure tasks to be executed on an agent node.

  • stage: stage

  • stash: stash some files to be used later in build

  • unstash: restore files previous stashed

  • parallel: execute in parallel (if multiple slave nodes are registered, parallel will run the tasks on them concurrently; this is typically used for testing, for example testing different environments and configurations, see the declarative pipeline demo code below)

  • git: Git

  • dir: Change Current Directory

  • sh: Shell Script

  • step: General Build Step

  • emailext: Extended Email

Triggering auto build

In the pipeline configuration there are Build Triggers to choose from:

  • Build after other projects are built
  • Build periodically
  • Poll SCM
  • Disable this project
  • Quiet period
  • Trigger builds remotely

Email Notification

Use emailext (Extended Email); you can wrap it in a Groovy function and pass in the email subject and body when calling it.

Managing plugins

Manage Jenkins -> Manage Plugins, then you can select and install plugins in the Available section. For example:

  • Pipeline (if this is not installed, the UI will not show the pipeline's dynamic stage view)
  • Html publisher (commonly used to publish the HTML report after unit tests; the HTML files are actually generated by the tests themselves, and the publisher then renders them)
    publishHTML([allowMissing: true,
    alwaysLinkToLastBuild: true,
    keepAll: true,
    reportDir: "$WORKSPACE/cognitive-designer-api/DSJsonApiServletJUnitTests/build/reports/tests/payloadtests",
    reportFiles: 'index.html',
    reportName: 'Payload Test',
    reportTitles: ''])
  • Green Balls (show green color for success)
  • Blue Ocean (embedded site with new Jenkins UI)
  • Job DSL: Allowing jobs to be defined in a programmatic form in a human readable file.

Pipeline compatible plugins: https://github.com/jenkinsci/pipeline-plugin/blob/master/COMPATIBILITY.md

During the initial Jenkins setup, some plugins may fail to install; you can install them yourself under Manage Plugins and then restart Jenkins (restart Jenkins only when no jobs are running; the restart method differs by installation type, or choose "restart Jenkins after install" when installing the plugin), for example: systemctl restart jenkins. This clears the plugin failure warnings on the dashboard.

Continuous delivery

In Blue Ocean, you can run multiple builds in parallel. If more than one build runs on the same agent, the workspace path is distinguished by a suffix (@count number). Whether you can run multiple builds on one agent depends on how you design your pipeline and tasks.

The Blue Ocean UI also displays parallel stages intuitively, which makes them easy to inspect.

Trigger builds remotely

Configure the pipeline so it can be triggered remotely by URL; optionally set a pipeline trigger token in the pipeline configuration UI (it can be empty).

You also need to know the user token, set from the current user's profile menu; you must keep your user token somewhere, for example store it as a credential secret text, so you can refer to the token like this:

TRIGGER_USER = "chengdol.example.com"
TRIGGER_USER_TOKEN = credentials('<token id>')

Then in the upstream pipeline script, trigger other pipeline by running curl command:

## can be an http or https connection
## --user is the jenkins user and its token or password
## token=${PIPELINE_TRIGGER_TOKEN} can be omitted if it is empty
curl --user ${TRIGGER_USER}:${TRIGGER_USER_TOKEN} --request POST http/https://<url>/job/${PIPELINE_NAME}/buildWithParameters?token=${PIPELINE_TRIGGER_TOKEN}\\&para1=val1\\&para2=val2

You don't need to specify all parameters in the URL; the parameters' default values will be used if they are not specified.

Notice that para1 and para2 must exist in the parameters section of the triggered pipeline, otherwise you cannot use them. So far, based on testing, I can pass the string, bool, and file parameter types.

Check Status of Another pipeline

reference: https://gist.github.com/paul-butcher/dc68adc1c05ca970158c18206756dab1

curl --user ${LOGIN_USER}:${LOGIN_USER_TOKEN} --request GET http/https://<url>/job/${PIPELINE_NAME}/<build number>/api/json

Then you can parse the returned JSON; the fields of interest include artifacts and result (SUCCESS, FAILURE, ABORTED).
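
For example, extracting just the result field with jq (assuming jq is installed):

curl -s --user ${LOGIN_USER}:${LOGIN_USER_TOKEN} \
  --request GET https://<url>/job/${PIPELINE_NAME}/<build number>/api/json \
  | jq -r '.result'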

Flyweight executor

Flyweight executors reside on the Jenkins master and are used to execute code outside of a node allocation. Others are heavyweight executors. Flyweight executors are not counted toward executor capacity.

For example, we use flyweight executor for pause, in Jenkins script: https://stackoverflow.com/questions/44737036/jenkins-pipeline-with-code-pause-for-input

## see below declarative pipeline demo code
input "waiting for approval, move to staging stage..."

The job will be paused; you can go to Paused for Input to decide what to do next: proceed or abort. (In Blue Ocean, the pause interface is clearer.)

Declarative pipeline

Please use the Groovy style to write declarative pipelines!

github demo: https://github.com/sixeyed/jenkins-pipeline-demos

Declarative Pipelines: https://www.jenkins.io/doc/book/pipeline/syntax/

There is an introduction video about declarative pipelines here; the Jenkinsfile lives in source control! https://www.jenkins.io/solutions/pipeline/

Using Blue Ocean to set up a pipeline from GitHub needs a personal access token (you must check the repo and user options): https://www.youtube.com/watch?v=FhDomw6BaHU

In the Jenkins UI, go to pipeline syntax, then the declarative directive generator; it will help you generate code for a declarative pipeline: https://www.jenkins.io/doc/book/pipeline/getting-started/#directive-generator

Jenkins parameter variables can be accessed with both the params and env prefixes, see this issue.
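
For example, assuming the pipeline declares a string parameter named DEPLOY_ENV, both prefixes resolve to the same value:

// both lines print the same build parameter value
echo "Deploying to ${params.DEPLOY_ENV}"
echo "Deploying to ${env.DEPLOY_ENV}"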

This is just a basic structure demo:

pipeline {
// default agent specify
agent any
// pipeline level env var, global scope
environment {
// you can put release number here
// referred by env.RELEASE
RELEASE = '1.1.3'
}
// can have multiple stages
stages {
// list tool version
stage('Audit tools') {
steps {
sh '''
git version
docker version
'''
}
}
stage('Build') {
// agent specify
agent any
// stage level env var, stage scope
environment {
USER = 'root'
}
steps {
echo "this is Build stage"
// executable in your repo
sh 'chmod +x ./build.sh'
// 把jenkins中一个名为api-key的密匙的值 放入 API_KEY这个环境变量中
// 且这个API_KEY仅在block中可见
withCredentials([string(credentialsId: 'api-key', variable: 'API_KEY')]) {
sh '''
./build.sh
'''
}
}
}
// can have different type
stage('Test') {
environment {
LOG_LEVEL = "INFO"
}
// parallel tasks
parallel {
// they can be running on different agent
// depends on you agent setting
stage('test1')
{
steps {
// show current stage name test1
echo "parallel ${STAGE_NAME}"
// switch to ./src directory
dir('./gradle') {
sh '''
./gradlew -p xxx test1
'''
}
}
}
stage('test2')
{
steps {
echo "parallel ${STAGE_NAME}"
}
}
stage('test3')
{
steps {
echo "parallel ${STAGE_NAME}"
}
}
}
}
stage('Deploy') {
// waiting for user input before deploying
input {
message "Continue Deploy?"
ok "Do it!"
parameters {
string(name: 'TARGET', defaultValue: 'PROD', description: 'target environment')
}
}
steps {
echo "this is Deploy with ${env.RELEASE}"
// groovy code block
// potential security hole, jenkins will not make it easy for you
script {
// you need to approve use of these class/method
if (Math.random() > 0.5) {
throw new Exception()
}
// you can use try/catch block for security reason
}
// if fail, this wouldn't get executed
// write 'passed' into file test-results.txt
writeFile file: 'test-results.txt', text: 'passed'
}
}
}

post {
// will always be executed
always {
echo "prints whether deploy happened or not, success or failure."
}
// others like: success, failure, cleanup, etc
success {
// archive files
archiveArtifacts 'test-results.txt'
// slack notifation
slackSend channel: '#chengdol-private',
message: "Release ${env.RELEASE}, success: ${currentBuild.fullDisplayName}."
}
failure {
slackSend channel: '#chengdol-private',
color: 'danger',
message: "Release ${env.RELEASE}, FAILED: ${currentBuild.fullDisplayName}."
}
}
}

If you don't want to check out SCM in stages that run on the same agent, you can use this option:

options {
skipDefaultCheckout true
}

Configure slack

This is a slightly out-of-date video; actually each pipeline can configure its own token and choose the channel separately. https://www.youtube.com/watch?v=TWwvxn2-J7E

First install the Slack Notification plugin. Then go to Manage Jenkins -> Configure System and scroll down to the bottom; you will see the Slack section, see the question marks for explanations.

Then go to your target Slack channel, select Add an app, search for Jenkins CI, and add it to Slack. Follow the instructions to get the secret token, add this token to the Jenkins credentials, and use it in the above Slack configuration.

After everything is set, try Test Connection; you will see a message in your Slack channel.

Reusable

Reusable functions and libraries are written in Groovy. https://www.eficode.com/blog/jenkins-groovy-tutorial

Let’s see some demos:

pipeline {
agent any
// options
parameters {
booleanParam(name: 'RC', defaultValue: false, description: 'Is this a Release Candidate?')
}
environment {
VERSION = "0.1.0"
VERSION_RC = "rc.2"
}
stages {
stage('Audit tools') {
steps {
// call function
auditTools()
}
}
stage('Build') {
environment {
// call function
VERSION_SUFFIX = getVersionSuffix()
}
steps {
echo "Building version: ${VERSION} with suffix: ${VERSION_SUFFIX}"
sh 'dotnet build -p:VersionPrefix="${VERSION}" --version-suffix "${VERSION_SUFFIX}" ./m3/src/Pi.Web/Pi.Web.csproj'
}
}
stage('Unit Test') {
steps {
// switch directory
dir('./m3/src') {
sh '''
dotnet test --logger "trx;LogFileName=Pi.Math.trx" Pi.Math.Tests/Pi.Math.Tests.csproj
dotnet test --logger "trx;LogFileName=Pi.Runtime.trx" Pi.Runtime.Tests/Pi.Runtime.Tests.csproj
'''
mstest testResultsFile:"**/*.trx", keepLongStdio: true
}
}
}
stage('Smoke Test') {
steps {
sh 'dotnet ./m3/src/Pi.Web/bin/Debug/netcoreapp3.1/Pi.Web.dll'
}
}
stage('Publish') {
// condition
when {
expression { return params.RC }
}
steps {
sh 'dotnet publish -p:VersionPrefix="${VERSION}" --version-suffix "${VERSION_RC}" ./m3/src/Pi.Web/Pi.Web.csproj -o ./out'
archiveArtifacts('out/')
}
}
}
}

// groovy methods, can run straight groovy code
def String getVersionSuffix() {
if (params.RC) {
return env.VERSION_RC
} else {
return env.VERSION_RC + '+ci.' + env.BUILD_NUMBER
}
}

def void auditTools() {
sh '''
git version
docker version
dotnet --list-sdks
dotnet --list-runtimes
'''
}

Shared library

Demo structure and code: https://github.com/sixeyed/jenkins-pipeline-demo-library. Invoke the shared library at the head of the Jenkinsfile:

// this is dynamic reference, explicitly specify the library in jenkins file
library identifier: 'jenkins-pipeline-demo-library@master', retriever: modernSCM(
[$class: 'GitSCMSource',
remote: 'https://github.com/sixeyed/jenkins-pipeline-demo-library.git',
// if the repo is private, you can have credential here
credentialsId: '<credential id>'])

pipeline {
agent any
stages {
stage('Audit tools') {
environment {
// pass parameters as map
VERSION_SUFFIX = getVersionSuffix rcNumber: env.VERSION_RC, isReleaseCandidate: params.RC
}
steps {
auditTools()
}
}
}
}

You can add Global Pipeline Libraries in Configure Jenkins for any pipeline to use. In that case, you can set a default shared library and then call the relevant functions directly in the Jenkinsfile.

Shared pipelines

You can put the shared pipeline into a shared library, for example:

library identifier: 'jenkins-pipeline-demo-library@master', 
retriever: modernSCM([$class: 'GitSCMSource', remote: 'https://github.com/sixeyed/jenkins-pipeline-demo-library.git'])

crossPlatformBuild repoName: 'sixeyed/pi-psod-pipelines',
linuxContext: 'm4',
windowsContext: 'm4'

Shared library steps live under the vars folder. In Groovy, we can add a method named call to a class and then invoke the method without using the name call; crossPlatformBuild is actually the file name, and inside the file there is a call method.
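
A minimal sketch of such a global variable: a hypothetical file vars/sayHello.groovy in the shared library defines call(), and the pipeline then invokes it by the file name:

// vars/sayHello.groovy
def call(Map config = [:]) {
    echo "Hello, ${config.name ?: 'world'}"
}

// usage in a Jenkinsfile (after loading the library)
sayHello name: 'chengdol'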

Multi-branch pipeline

https://www.jenkins.io/doc/book/pipeline/multibranch/ Jenkins automatically discovers, manages and executes Pipelines for branches which contain a Jenkinsfile in source control.

Orphaned item strategy: for a deleted branch, you can discard or keep its item.

Pipeline development tools

Validating pipeline syntax

First enable anonymous read access in Configure Global Security: https://www.jenkins.io/doc/book/pipeline/development/#linter

Issue curl command:

## if not success, it will show you the overall problems with your jenkins file
curl -X POST -F "jenkinsfile=<[jenkins file path]" http://<IP>:8080/pipeline-model-converter/validate

Visual Studio Code has a Jenkins linter plugin; you need to configure it with the linter URL.

Restart or replay

In every build's interface there is restart from stage: you can select which stage to restart (sometimes a stage fails due to external reasons). There is also replay: you can edit your Jenkinsfile and library and then rerun; the changes only live in the current build (after it succeeds, check your updates into source control).

Unit test

https://github.com/jenkinsci/JenkinsPipelineUnit

  • Supports running pipelines and library methods
  • Can mock steps and validate calls

Jenkins with Docker

Learning steps: first use Docker for the agents, then use Docker for both master and agents, and finally hand management over to K8s.

This part is very interesting: adding containers makes things extremely flexible; as long as a build agent supports Docker and has it installed, Jenkins can run containers on top of it. https://www.jenkins.io/doc/book/pipeline/docker/

https://hub.docker.com/r/jenkins/jenkins: you can run several Jenkins versions in parallel on one machine, for purposes like testing new features, testing upgrades, and so forth. But you may need to customize the Jenkins docker image and expose a different port.

  • containers as build agents
  • customizing the build container
  • using the docker pipeline plugin

Jenkins master and slave with docker: https://medium.com/@prashant.vats/jenkins-master-and-slave-with-docker-b993dd031cbd

From agent syntax: https://www.jenkins.io/doc/book/pipeline/syntax/#agent

agent {
  docker {
    image 'myregistry.com/node'
    // can pass arguments to docker run
    args '-v /tmp:/tmp'
    // the node must be pre-configured to have docker
    label 'my-defined-label'
    // optionally set the registry to pull the image from
    registryUrl 'https://myregistry.com/'
    registryCredentialsId 'myPredefinedCredentialsInJenkins'
  }
}

If there is no label option, Jenkins will dynamically provision on a node, and it will fail if Docker is not installed; you can set a docker label to filter: https://www.jenkins.io/doc/book/pipeline/docker/#specifying-a-docker-label

I got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: https://stackoverflow.com/questions/47854463/docker-got-permission-denied-while-trying-to-connect-to-the-docker-daemon-socke

I also solved a permission problem: in my experiments the container's default user is jenkins, which has no root permission and cannot run programs inside the container; the fix is to add args '-u root' to the configuration above.

In addition, Jenkins mounts the workspace into the container; find it through environment variables:

sh 'printenv'
sh 'ls -l "$WORKSPACE"

Additionally, you can use agent with Dockerfile and install Docker Pipeline plugin.

Concepts

Now, let’s understand the basic concepts of Helm: https://helm.sh/docs/intro/using_helm/

Official document: to install helm on the control node, download the corresponding binary and untar it onto the execution path, or use a container and mount the necessary k8s credentials.
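A minimal sketch of the binary install on Linux (the version string is only an example; pick the release you need):

## download the helm binary and put it on the PATH
curl -fsSL https://get.helm.sh/helm-v3.2.0-linux-amd64.tar.gz | tar xz
sudo mv linux-amd64/helm /usr/local/bin/helm
helm version --short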

Package manager analogy:

  • helm (charts)
  • apt (deb)
  • yum (rpm)
  • maven (Jar)
  • npm (node modules)
  • pip (python packages)

Helm v3.2.0

Helm3 does not have Tiller server, see what’s new in Helm 3

Plugins

Sixteen useful Kubernetes Helm Charts tools

Tillerless

For helm2, the in-cluster Tiller server may not be stable or secure; a workaround is to run Tiller locally and let it talk to the remote k8s cluster via the kubectl config.

## install tillerless plugin
helm plugin install https://github.com/rimusz/helm-tiller

A good practice is to have helm, the helm plugins, kubectl and the cloud SDK in one container, for example:

FROM python:3.7.7-alpine3.11

USER root

## version number
ENV KUBCTL_VER="1.16.11"
ENV KUBCTL="/bin/kubectl"
ENV HELM_VER="v2.17.0"
ENV HELMFILE_VER="v0.119.0"
ENV HELMDIFF_VER="v3.1.1"
ENV GCLOUD_SDK_VER="318.0.0"

# Install fetch deps
RUN apk add --no-cache \
ca-certificates \
bash \
jq \
git \
curl \
unzip \
tar \
libgit2 \
openssl-dev \
libffi-dev \
gcc \
musl-dev \
python3-dev \
make \
openssh \
tini \
shadow \
su-exec \
vim


RUN curl -L https://storage.googleapis.com/kubernetes-release/release/v${KUBCTL_VER}/bin/linux/amd64/kubectl \
-o $KUBCTL && chmod 0755 $KUBCTL

# install helm && plugins && helmfile
RUN curl -L https://storage.googleapis.com/kubernetes-helm/helm-${HELM_VER}-linux-amd64.tar.gz | tar xz \
&& cp linux-amd64/helm /bin && rm -rf linux-amd64 \
\
&& curl -O -L https://github.com/roboll/helmfile/releases/download/${HELMFILE_VER}/helmfile_linux_amd64 \
&& mv helmfile_linux_amd64 /usr/local/bin/helmfile && chmod 755 /usr/local/bin/helmfile


# Install GCloud SDK
RUN curl -L https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/google-cloud-sdk-${GCLOUD_SDK_VER}-linux-x86_64.tar.gz | tar zx \
&& mv google-cloud-sdk /usr/local/ && /usr/local/google-cloud-sdk/install.sh -q \
&& /usr/local/google-cloud-sdk/bin/gcloud components install beta --quiet

ENV PATH="/usr/local/google-cloud-sdk/bin:${PATH}"

RUN helm init -c \
&& helm plugin install https://github.com/chartmuseum/helm-push \
&& helm plugin install https://github.com/databus23/helm-diff --version ${HELMDIFF_VER} \
&& helm plugin install https://github.com/rimusz/helm-tiller

# Test
RUN kubectl version --client \
&& helm version --client \
&& helm plugin list \
&& helmfile -v \
&& gcloud version

ENTRYPOINT ["/bin/bash","-c", "tail -f /dev/null"]

Note that HELM_VER < 2.17.0 no longer works because the default stable repo is gone, so bump the HELM_VER environment variable to 2.17.0.

Then run it:

# go to the tillerless folder that contains the dockerfile above
docker build -f tillerless.dockerfile -t tillerless:1.0 .

# mount whatever you need: the gcloud config, the kubectl config and the chart source;
# -w sets the default workspace path inside the container.
# Tillerless env vars: tiller stores release data in secrets by default,
# switch it to configmaps here.
docker run -d --name=xxx \
  -v ~/.config:/root/.config \
  -v ~/.kube:/root/.kube \
  -v $(pwd)/../envoy-proxy:/envoy-proxy \
  -w /envoy-proxy \
  -e HELM_TILLER_STORAGE=configmap \
  -e HELM_HOST=127.0.0.1:44134 \
  --entrypoint=/bin/bash \
  tillerless:1.0 \
  -c "tail -f /dev/null"

The first time you exec into the container, kubectl may not work; exit, run kubectl once on the host, and exec back in.

If you switch the k8s context, stop and restart tillerless to pick up the change.

## export if they are gone
export HELM_TILLER_STORAGE=configmap
export HELM_HOST=127.0.0.1:44134
## by default tiller namespace is kube-system
helm tiller start [tiller namespace]
helm list
helm install..
helm delete..

exit

or

## export if they are gone
export HELM_TILLER_STORAGE=configmap
export HELM_HOST=127.0.0.1:44134
## by default tiller namespace is kube-system
helm tiller start-ci [tiller namespace]
helm list
helm install..
helm delete..

helm tiller stop

or

helm tiller run <command>

Overview

helm3 does not ship with a default repo; we usually add https://kubernetes-charts.storage.googleapis.com/ as the stable repo. helm2 can skip this step since it has a default stable repo.

## add stable repo to local repo
## 'stable' is your custom repo name
helm repo add stable https://kubernetes-charts.storage.googleapis.com/
## display local repo list
helm repo list
## remove repo 'stable'
helm repo remove stable

## create the chart scaffold
helm create <chart name>

## install charts
## Make sure we get the latest list of charts
helm repo update
helm install stable/mysql --generate-name
helm install <release name> stable/mysql -n <namespace>
helm install <path to unpacked/packed chart>

## show status of your release
helm status <release name>

Whenever you install a chart, a new release is created. So one chart can be installed multiple times into the same cluster. Each can be independently managed and upgraded.
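For example (release names are arbitrary, and the namespace is assumed to exist):

## the same chart installed twice gives two independent releases
helm install mysql-a stable/mysql -n demo
helm install mysql-b stable/mysql -n demo
helm ls -n demo
## each one can be upgraded or removed on its own
helm uninstall mysql-b -n demo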

## show deployed release
helm ls -n <namespace>

## uninstall
## with --keep-history, you can check the status of release
## or even undelete it
helm uninstall <release name> [--keep-history] -n <namespace>

Install order

Helm installs resources in a certain order (click to see the list). Alternatively, you can split the chart into several parts or use an init container.

Chart file structure

https://helm.sh/docs/topics/charts/#the-chart-file-structure

<chart name>/
  Chart.yaml          # A YAML file containing information about the chart
  LICENSE             # OPTIONAL: A plain text file containing the license for the chart
  README.md           # OPTIONAL: A human-readable README file
  values.yaml         # The default configuration values for this chart
  values.schema.json  # OPTIONAL: A JSON Schema imposing a structure on values.yaml;
                      # values.yaml must follow this structure or it will not pass validation
  charts/             # other dependent charts
  requirements.yaml   # other dependent charts (helm2 only)
  crds/               # Custom Resource Definitions
  templates/          # A directory of templates that, when combined with values,
                      # will generate valid Kubernetes manifest files
    xxx.yaml
    _xx.tpl           # functions (template helpers)
    NOTES.txt         # OPTIONAL: short usage notes shown after helm install

To drop a dependency into your charts/ directory, use the helm pull command
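For example, assuming a parent chart called guestbook (chart name and version are only illustrative):

## fetch a chart archive straight into the parent chart's charts/ folder
helm pull stable/mysql --version 1.6.9 --destination ./guestbook/charts
ls ./guestbook/charts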

  1. Chart.yaml: apiVersion is v2 for helm3 and v1 for helm2; appVersion is the application version; version is the chart version, bumped for example when the chart files or structure change; keywords is used by helm search; type distinguishes application and library charts.

Managing dependencies

Package the chart into an archive; you could use tar, but helm has a dedicated command for this purpose:

## creates an archive with a .tgz suffix
## and appends the chart version (from Chart.yaml) to the archive name
helm package <chart_name>

Publish charts in repos: chartmuseum is to charts what Docker Hub (or a private Docker registry) is to images; you can run a private chartmuseum on your own host (there is a dedicated installation package).
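A rough sketch of a local chartmuseum plus the helm-push plugin installed in the image above (image tag, port, repo name and chart archive are just examples):

## run a local ChartMuseum backed by a host directory
docker run -d --name chartmuseum -p 8090:8080 \
  -e STORAGE=local -e STORAGE_LOCAL_ROOTDIR=/charts \
  -v $(pwd)/chart-storage:/charts \
  chartmuseum/chartmuseum:latest

## register it as a repo and push a packaged chart
## (newer versions of the plugin register the command as 'helm cm-push')
helm repo add myrepo http://localhost:8090
helm push guestbook-1.2.2.tgz myrepo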

## go to the dir that contains the chart archives
## this will generate an index.yaml file
helm repo index .
## for security, charts can be signed and verified
## verification needs the provenance (.prov) file
helm package --sign
helm verify <chart>
## verify at install time
helm install --verify ...

Regarding dependencies, you could even keep only a charts folder that holds all the dependent chart archives, with no templates directory at all. But that makes version management hard, so it is better to declare the dependencies in Chart.yaml. The declaration can also restrict the version range using semver syntax: ~1.2.3, ^0.3.4, 1.2-3.4.5

## will download dependency charts archive to your charts folder
## according to the definition in Chart.yaml
helm dependency update <chart name>
## list dependency, their version, repo and status
helm dependency list <chart name>

You can also use conditions and tags to control whether a dependency is needed; for example, in the Chart.yaml file (an install example follows the values file below):

apiVersion: v2
name: guestbook
appVersion: "2.0"
description: A Helm chart for Guestbook 2.0
version: 1.2.2
type: application
dependencies:
  - name: backend
    version: ~1.2.2
    repository: http://localhost:8080
    condition: backend.enabled
    tags:
      - api
  - name: frontend
    version: ^1.2.0
    repository: http://localhost:8080
  - name: database
    version: ~1.2.2
    repository: http://localhost:8080
    condition: database.enabled
    tags:
      - api

Then in values.yaml file:

## can be true or false
backend:
  enabled: true
database:
  enabled: true
tags:
  api: true
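With the Chart.yaml and values.yaml above, the dependencies can then be toggled at install time; the release and chart paths below are only placeholders:

## turn off the database dependency only (a condition wins over tags)
helm install guestbook ./guestbook --set database.enabled=false
## or disable everything tagged 'api'
helm install guestbook ./guestbook --set tags.api=false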

Using existing charts

Helm web: https://hub.helm.sh/

## add and remove repos
helm repo add ...
helm repo remove ...
## list repo names and URLs
helm repo list

## search for the chart you want, for example
## mysql, nfs, mongodb, prometheus, redis, dashboard, wordpress

## for mysql, you need to specify a storage provisioner
## see the inspect readme or values
helm search [hub | repo] <keyword>

## inspect the chart, like docker inspect
## readme: usage (the online page is clearer)
## values: default config
## chart: Chart.yaml
helm inspect [all | readme | values | chart] <chart name>

## the same as 'helm inspect values'
helm show values

## download a chart without its dependencies
## for example, to look into the source code
helm fetch <chart name>

## download the dependencies specified in Chart.yaml
## specify the unpacked chart name
helm dependency update <chart name>

Customizing existing charts

If you want to override a child chart's values.yaml, do it in your parent chart's values.yaml. This is very common; for example, if one of your dependencies is the mongodb chart and you want to change its default configuration:

## 'mongodb' is child chart name
mongodb:
  persistence:
    size: 100Mi

The child chart's values.yaml can also override the parent's, but that is rarely used and the usage is quite tricky.
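The same parent-side override can also be passed on the command line instead of editing values.yaml (names and the size are examples):

## override the child chart's value at install time
helm install my-guestbook ./guestbook --set mongodb.persistence.size=200Mi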

Chart template guide

https://helm.sh/docs/chart_template_guide/getting_started/ Helm Chart templates are written in the Go template language, with the addition of 50 or so add-on template functions from the Sprig library and a few other specialized functions.

Template and values

https://helm.sh/docs/topics/charts/#templates-and-values

Where do the configuration values come from? Precedence from low to high, top to bottom:

  1. values.yaml (default use)
  2. other-file.yaml: helm install -f <other-file.yaml> ...
  3. command: helm install --set key=val ...

Helm template built-in objects:

  1. Chart.yaml: .Chart.Name (use upper case)
  2. Release data: .Release.Name
  3. K8s data: .Capabilities.KubeVersion
  4. File data: .Files.Get conf.ini
  5. Template data: .Template.Name

In values.yaml:

  1. use _ instead of -
  2. wrap decimal numbers in quotes, e.g. "2.0"; integers do not need quotes

Using placeholders is the most basic operation; now let's look at functions and logic.

  1. Use functions and pipelines; they are interchangeable: https://helm.sh/docs/chart_template_guide/functions_and_pipelines/ Commonly used functions and the corresponding pipelines:
function usage                --    pipeline usage
================================================================
default default_value value   --    value | default default_value
quote value                   --    value | quote
upper value                   --    value | upper
trunc 20 value                --    value | trunc 20
trimSuffix "-" value          --    value | trimSuffix "-"
b64enc value                  --    value | b64enc
randAlphaNum 10               --    value | randAlphaNum 10
toYaml value                  --    value | toYaml
printf format value           --    list value | join "-"
  2. Modify the scope using with to simplify the directives, so you don't have to spell out long reference chains.
  3. Control whitespace and indentation: use - to remove whitespace (a newline is treated as whitespace!)
{{- with ... -}}
...
{{- end }}
## indent the value by 6 spaces
{{ indent 6 .Values.tcp }}
  4. Logical operators and flow control: if-else and loops.
  5. Use variables; define the variable first:
{{- $defaultPortNum := .Values.defaultPortNum -}}
{{ $defaultPortNum }}
## $ refers to the global (root) scope
{{ $.Release.Name }}
  6. Use sub-templates: define the function in the _helpers.tpl file, then use include:
{{ include "fun_name" . | indent 4}}

Debug template

Locally rendering template: https://helm.sh/docs/helm/helm_template/ https://helm.sh/docs/chart_template_guide/debugging/

Usually run the static check first, then the dynamic check.

## static
## works without k8s cluster
## you can also specify which values yaml file
helm template <chart dir or archive file> [--debug] | less
## for helm2
helm template --tiller-namespace tiller --values ./xxx/values.second.yaml --debug <chart dir or archive file> |less

## dynamic
## real helm install but without commit
## can generate a release name as [release]
helm install [release] <chart> --dry-run --debug 2>&1 | less

Helm commands

https://helm.sh/docs/helm/

## install with specified release name
helm install [release name] [chart] -n <namespace> --values <path to values yaml>
## check release status
helm list -n <namespace>
## display yaml files
helm get manifest [release] -n <namespace> | less

## check release specification and revision numbers
helm status [release] -n <namespace>

## get all info
## helm2: helm get [release]
helm get all [release] -n <namespace>

## upgrade
helm upgrade [release] [chart] -n <namespace>
## check revision
helm history [release] -n <namespace>
## rollback
## revision number can get from helm history
helm rollback [release] [revision] -n <namespace>

## if the helm install was aborted, check helm list and then uninstall the broken release
## helm2: helm delete --purge [release]
helm uninstall [release] -n <namespace>

PluralSight Supplement

github: https://github.com/phcollignon/helm3

Helm context

Helm uses the same configuration as kubectl:

## helm env, repos, config, cache info
helm env
## check helm version
helm version --short
## helm uses the same current context
kubectl config view

Helm stores release configuration and history in k8s as secrets. In helm3 they are stored in the corresponding namespace of each release.

## in your working namespace
kubectl get secret -n <ns>
## helm secret is something like:
sh.helm.release.v1.demomysql.v1 helm.sh/release.v1 1 110s
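To list only those release secrets, the labels helm puts on them can be used as a selector; the label names below are my assumption from inspecting a helm3 cluster, so verify them in your own environment:

## all helm-owned release secrets in the namespace
kubectl get secret -n <ns> -l owner=helm
## only the revisions of one release
kubectl get secret -n <ns> -l name=demomysql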

Improved upgrade strategy: 3-way strategic merge patches. In Helm3, Helm considers the old manifest, its live state, and the new manifest when generating a patch.

In helm2, the helm client uses the gRPC protocol to reach the Tiller server (in production a secure connection is required, so set up TLS/SSL), and Tiller (which needs a privileged service account) calls the K8s API to instantiate the charts. In helm3 there is no Tiller, so this security issue goes away.

//TODO This summary comes from the Essentials chapter of the LPIC-1 course on PluralSight. Note: in April 2020 PluralSight ran a promotion with free registration, and the lockdown was a good chance to catch up on some learning.

Environment: CentOS 7 or Red Hat Enterprise Linux.

Essentials

Reading OS data

# system version
# softlink actually
cat /etc/os-release
cat /etc/system-release
cat /etc/redhat-release

# kernel release number
uname -r
cat /proc/version

Shutdown

Send message to others

# send to individual user terminal
write dsadm
> xxx

# send to all user in terminals
wall < message.txt

Shutdown system and prompt

# reboot now
shutdown -r now
# halt/poweroff in 10 mins and use wall send message to login users
shutdown -h 10 "The system is going down in 10 min"
# cancel shutdown
shutdown -c

Changing runlevels

What is a runlevel in Linux? https://www.liquidweb.com/kb/linux-runlevels-explained/ For example, runlevel 1 allows only the root user with no network enabled; it is also called rescue.target and is useful for operations that need isolation. Runlevel 3 is the default multi-user mode with network enabled (most machines sit in this state). Runlevel 5 is the desktop interface combined with runlevel 3.

# show current runlevel
who -r
runlevel

# systemd targets correspond to the old runlevels
# default runlevel (target)
systemctl get-default
# set default runlevel (target)
systemctl set-default multi-user.target
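To switch the target (runlevel) at runtime without changing the default, systemctl isolate can be used; a quick sketch following the mapping described above:

# switch targets on the fly; the default target stays unchanged
systemctl isolate rescue.target       # roughly runlevel 1
systemctl isolate multi-user.target   # roughly runlevel 3
systemctl isolate graphical.target    # roughly runlevel 5
# confirm
who -r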

More about systemd, see my systemd blog.

Manage processes

# show processes on the current shell
# options with a dash are UNIX-style options
ps -f
# -e means all processes
ps -ef --forest
# -F show full format column
ps -F -p $(pgrep sshd)
# kill all sleep processes
pkill sleep

# BSD options
ps aux

$$ is the PID of the currently running process (here, the shell itself):

cd /proc/$$

# we can interrogate this directory
# current dir
ls -l cwd
# current exe
ls -l exe

Do you still remember the options of the top command? For example, switching the memory display unit, or choosing whether to sort by CPU or memory usage…

Process priority

If something runs in the foreground and prevents you from doing anything else, use ctrl+z to suspend it (it stays in memory but takes no CPU time), then put it in the background.

sleep 10000
^Z
[1]+ Stopped sleep 10000

# use job command, `+` means current focus
jobs
[1]+ Stopped sleep 10000

# use bg command to put current focus in background
bg
[1]+ sleep 10000 &

# check that it is running in the background
jobs
[1]+ Running sleep 10000 &

# fg brings the current focus back to the foreground

If you run sleep 1000 & in a bash shell and then exit that shell, the sleep process is handed over to the init process; check via ps -F -p $(pgrep sleep) and you will see its PPID is now 1. Running jobs in another bash shell will not show the background processes of the previous shell.
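A quick way to reproduce this re-parenting without opening a second shell (the sleep duration is arbitrary):

# start a background sleep inside a throwaway shell that exits immediately
bash -c 'sleep 1000 &'
# the orphaned sleep is handed over to init: PPID should now be 1
# (or the per-user systemd instance on newer systems)
ps -F -p $(pgrep -f "sleep 1000")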

# show PRI(priority) and NI(nice) number
ps -l

F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
4 S 0 23785 23781 0 80 0 - 28891 do_wai pts/1 00:00:00 bash
0 S 0 24859 23785 0 80 0 - 26987 hrtime pts/1 00:00:00 sleep
0 S 0 24861 23785 0 80 0 - 26987 hrtime pts/1 00:00:00 sleep
...

The PRI value for real-time processes is in [60,99] and in [100,139] for user processes; the bigger the value, the higher the priority. The NI value is in [-20,19]; the higher (nicer) it is, the less CPU time the process gets. Given the same PRI, NI decides how much of the resources each process receives.

For example, if you have a build task that is not urgent and you don't want it hogging resources in the background, you can give it a higher nice value.

# set nice value to 19
nice -n 19 sleep 1000 &
# reset nice value
renice -n 10 -p <pid>

Note that only root can set a negative nice value or lower an existing nice value. Root can also edit /etc/security/limits.conf to set nice limits for different users/groups, as sketched below.
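A sketch of such limits.conf entries (the user and group names are made up; the format is domain, type, item, value, and the nice item caps how far priority may be raised):

# /etc/security/limits.conf
# members of group 'builders' may only raise priority up to nice 5
@builders   hard   nice    5
# user 'dsadm' may raise priority up to nice -10
dsadm       -      nice   -10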

Monitor linux performance

This is important; the usual focus is the network, the disks, and the CPU.

List the contents of the procps-ng package. procps is the package that bundles a number of small, useful utilities that report process information via the /proc filesystem; it includes the programs ps, top, vmstat, w, kill, free, slabtop, and skill.

# see executable files under procps package via rpm
rpm -ql procps-ng | grep "^/usr/bin/"

/usr/bin/free
/usr/bin/pgrep
/usr/bin/pkill
/usr/bin/pmap
...

# check the source package of top command
rpm -qf $(which top)

procps-ng-3.3.10-17.el7_5.2.x86_64

Two more handy commands: pmap and pwdx.

# pmap shows the memory map of a process
# for example, the current running process
pmap $$
# you can also see the shared libraries used by the process

# show current working directory of process
pwdx $$
pwdx $(pgrep sshd)
# actually the output is from /proc/<pid>/cwd, it is a softlink

Load average analysis

# check how long the system has been running
# load average is not normalized for the number of CPUs: if you know how many CPUs
# the machine has, the load average tells you how busy it is; once the value exceeds
# the CPU count, processes have to queue or wait
# this command actually reads its data from /proc/uptime and /proc/loadavg
uptime
18:53:14 up 39 days, 3:50, 1 user, load average: 0.00, 0.01, 0.05

# check how many cpu
# the number of cpu is equal to processor number
# but you may have less cores, see /proc/cpuinfo
lscpu

# the same as w
w
18:59:29 up 12 days, 23:40, 3 users, load average: 0.04, 0.26, 0.26
USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
root pts/0 9.160.1.111 08:47 6:46m 0.03s 0.03s -bash
...

To keep monitoring the load or a command's output:

# execute a program periodically, showing output fullscreen
# here, run uptime every 4 seconds
watch -n 4 uptime

# graphic representation of the system load average
# if you kick off a tar at the same time, you will see loadavg change noticeably
tload
# -b: batch mode, dump the state of all processes
# -n2: run 2 iterations
top -b -n2 > file.txt

# run 3 times, 5 seconds apart
# reports information about processes, memory, paging, block IO, traps, disks and cpu activity
vmstat 5 3
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
1 0 520 90576 4176 1601932 0 0 4 188 18 19 0 1 93 4 2
0 0 520 90460 4176 1601956 0 0 0 46 514 348 0 0 98 2 1
0 0 520 88972 4176 1603692 0 0 0 542 707 589 0 1 97 2 1

sysstat toolkit

The package contains many performance measurement tools. Install sysstat (a bundle of commands: iostat, mpstat, pidstat, sar, etc.).

yum install -y sysstat

# then check the executables it provides
rpm -ql sysstat | grep "^/usr/bin"

/usr/bin/cifsiostat
/usr/bin/iostat
/usr/bin/mpstat
/usr/bin/nfsiostat-sysstat
/usr/bin/pidstat
/usr/bin/sadf
/usr/bin/sar
/usr/bin/tapestat

The config file for sysstat can be found by:

# -q: query
# -c: config file
rpm -qc sysstat

After installation, sysstat actually relies on cron behind the scenes to collect the data; the configuration is in /etc/sysconfig/sysstat, where you can set the retention period (28 days by default).
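An excerpt of what that file typically contains (variable names as I recall them on CentOS 7, so treat this as an assumption and check your own copy):

# /etc/sysconfig/sysstat
HISTORY=28          # days of sar data to keep under /var/log/sa
COMPRESSAFTER=31    # compress data files older than this many days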

# cron config for sysstat
cat /etc/cron.d/sysstat

# Run system activity accounting tool every 10 minutes
*/10 * * * * root /usr/lib64/sa/sa1 1 1
# 0 * * * * root /usr/lib64/sa/sa1 600 6 &
# Generate a daily summary of process accounting at 23:53
53 23 * * * root /usr/lib64/sa/sa2 -A

start and enable:

systemctl start sysstat
systemctl enable sysstat

Now let's look at the tools shipped with sysstat:

# show stats in megabytes
# run 3 times, 5 seconds apart
iostat -m 5 3
# others
pidstat
mpstat

Let's look at sar (system activity report), which gathers statistics and historical data. By analyzing the bottlenecks over a day (cpu/memory/disk/network/loadavg) you can schedule tasks better, for example after noticing that CPU and memory usage peak at certain hours. The course does not go deep into interpreting this data; you need to understand what each part means and what values look abnormal.

The sar data lives under /var/log/sa, one file per day, overwritten periodically.

# sar for a specific processor, cpu 0 / cpu 1
# check %idle
sar -P 0/1

# default: show CPU utilization
# %user: time spent in user space
# %system: time spent in kernel space
sar -u
# interval 1sec and show 5 times
sar -u ALL 1 5

# show memory utilization
sar -r

# show disk utilization
sar -b

# network activity
sar -n DEV

# load average
# interval 5sec and show 2 times
sar -q 5 2

# read the file for day 23 (sa23), from 18:00:00 to 19:00:00
sar -n DEV -s 18:00:00 -e 19:00:00 -f /var/log/sa/sa23

To graph sar data, you can use ksar: https://www.cyberciti.biz/tips/identifying-linux-bottlenecks-sar-graphs-with-ksar.html

Log and logrotate

Auditing login events: this is quite useful for seeing which user logged in and when; w shows which users are currently active.

# see user login info
lastlog | grep -v "Never"

Username Port From Latest
root pts/0 9.65.239.28 Fri Apr 24 17:51:48 -0700 2020
fyre pts/0 Fri Apr 24 17:52:00 -0700 2020

# check system reboot info
# The last command reads data from the wtmp log and displays it in a terminal window.
last reboot
# check still login user, the same as `w`
last | grep still

Auditing root access: check su/sudo usage in the /var/log/secure file; there are actually several secure files there, distinguished by date.

# there are several secure and auditing files
cd /var/log
# secure file
# grep works too; here awk picks out the sudo events
awk '/sudo/ { print $5, $6, $14 }' secure

I will put together dedicated notes on awk; it is quite useful.

journalctl is a commonly used tool for querying the system log. Some docker logs can also be seen there.

# show last 10 lines
journalctl -n 10
# follow logs appended in real time
journalctl -f
# -u: systemd unit
journalctl -u sshd
# timestamp
journalctl --since "10 minutes ago"
journalctl --since "2020-04-26 13:00:00"

Selinux

O'Reilly had a related course; the link is still in my work email. For now it is enough to know what SELinux is and how to turn it on and off. SELINUX= can take one of three values: enforcing - the SELinux security policy is enforced; permissive - SELinux prints warnings instead of enforcing; disabled - no SELinux policy is loaded.

# see if selinux is permissive, enforcing or disabled
getenforce
# more clear
sestatus

If SELinux starts out disabled, you have to go to the config file /etc/selinux/config, set it to permissive, and reboot. Likewise you cannot disable it with setenforce; disabling can only be done in the config file, followed by a reboot.
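A minimal sketch of that change (edit the file by hand or with sed, then reboot):

# switch from disabled to permissive in the config file
sudo sed -i 's/^SELINUX=.*/SELINUX=permissive/' /etc/selinux/config
grep ^SELINUX= /etc/selinux/config
sudo reboot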

# setenforce [ Enforcing 1 | Permissive 0 ]
# once SELinux is enabled you can toggle the mode with setenforce, but the change is not permanent
setenforce 0
setenforce 1

Show the SELinux labels; the -Z flag is also useful with other commands.

# see user selinux config
id -Z
# user, role, type
unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
# see files selinux config
/bin/ls -Z
# see process selinux config
ps -Zp $(pgrep sshd)
LABEL PID TTY STAT TIME COMMAND
system_u:system_r:kernel_t:s0 968 ? Ss 0:00 /usr/sbin/sshd -D
unconfined_u:unconfined_r:unconfined_t:s0 1196 ? Ss 0:00 sshd: root@pts/0

Reference: https://docs.oracle.com/javase/tutorial/essential/exceptions/definition.html

Code that fails to honor the Catch or Specify Requirement will not compile.

Not all exceptions are subject to the Catch or Specify Requirement. To understand why, we need to look at the three basic categories of exceptions, only one of which is subject to the Requirement.

The Three Kinds of Exceptions

The first kind of exception is the checked exception.

Checked exceptions are subject to the Catch or Specify Requirement. All exceptions are checked exceptions, except for those indicated by Error, RuntimeException, and their subclasses.

The second kind of exception is the error. These are exceptional conditions that are external to the application, and that the application usually cannot anticipate or recover from. Errors are not subject to the Catch or Specify Requirement. Errors are those exceptions indicated by Error and its subclasses.

The third kind of exception is the runtime exception. Runtime exceptions are not subject to the Catch or Specify Requirement. Runtime exceptions are those indicated by RuntimeException and its subclasses.

Errors and runtime exceptions are collectively known as unchecked exceptions.

Unchecked exceptions can also be caught, as long as the catch clause matches, and they can also be declared with throws, but we don't have to do that: https://stackoverflow.com/questions/8104407/cant-java-unchecked-exceptions-be-handled-using-try-catch-block

Better Understanding on Checked Vs. Unchecked Exceptions

try
{
    // code that could throw an exception
}
// check in order
catch (IOException | SQLException ex)
{
    logger.log(ex);
    throw ex;
}
catch (IndexOutOfBoundsException e)
{
    System.err.println("IndexOutOfBoundsException: " + e.getMessage());
}
// The finally block always executes when the try block exits.
finally
{
    if (out != null) {
        System.out.println("Closing PrintWriter");
        out.close();
    } else {
        System.out.println("PrintWriter not open");
    }
}

The finally block allows the programmer to avoid having cleanup code accidentally bypassed by a return, continue, or break. Putting cleanup code in a finally block is always a good practice, even when no exceptions are anticipated.

The try-with-resources statement ensures that each resource is closed at the end of the statement. Any object that implements java.lang.AutoCloseable, which includes all objects which implement java.io.Closeable, can be used as a resource.

static String readFirstLineFromFile(String path) throws IOException
{
    try (BufferedReader br = new BufferedReader(new FileReader(path)))
    {
        // try block
        return br.readLine();
    }
}

Note: A try-with-resources statement can have catch and finally blocks just like an ordinary try statement. In a try-with-resources statement, any catch or finally block is run after the resources declared have been closed.

Throw exception

declare throws exception for method

public void writeList() throws IOException {}

throw an exception

public void test() {
    if (size == 0) {
        throw new EmptyStackException();
    }
}

Create custom exception: https://www.baeldung.com/java-new-custom-exception

// create custom exception
public class IncorrectFileNameException extends Exception {
    public IncorrectFileNameException(String errorMessage) {
        super(errorMessage);
    }
}

About Deque, my earlier takeaway was: if you use it as a Stack, stick to the Stack methods such as push, pop, peek; if you use it as a Queue, stick to the Queue methods such as add/offer, poll/remove, peek. For the implementation I generally use ArrayDeque, a resizable double-ended array.

Today a question suddenly occurred to me: for a Queue or a Stack built on a Deque, how does Java know the correct order to yield elements in an enhanced for loop? And if the Queue and Stack methods are mixed, what does peek return, and in what order does the iterator produce the elements?

Let's look at the ArrayDeque source code:

transient int head;
transient int tail;

There are two pointers here, head and tail; when the ArrayDeque is empty, head and tail coincide.

For push, it is head that moves: head is decremented by 1 before being used, with a circular (modulus) decrement.

/**
 * Pushes an element onto the stack represented by this deque. In other
 * words, inserts the element at the front of this deque.
 *
 * <p>This method is equivalent to {@link #addFirst}.
 *
 * @param e the element to push
 * @throws NullPointerException if the specified element is null
 */
public void push(E e)
{
    addFirst(e);
}

/**
 * Inserts the specified element at the front of this deque.
 *
 * @param e the element to add
 * @throws NullPointerException if the specified element is null
 */
public void addFirst(E e)
{
    if (e == null)
        throw new NullPointerException();
    final Object[] es = elements;
    es[head = dec(head, es.length)] = e;
    if (head == tail)
        grow(1);
}

/**
 * Circularly decrements i, mod modulus.
 * Precondition and postcondition: 0 <= i < modulus.
 */
static final int dec(int i, int modulus)
{
    if (--i < 0) i = modulus - 1;
    return i;
}

For add/offer, tail is used first and then incremented by 1, with a circular (modulus) increment.

/**
 * Inserts the specified element at the end of this deque.
 *
 * <p>This method is equivalent to {@link #addLast}.
 *
 * @param e the element to add
 * @return {@code true} (as specified by {@link Collection#add})
 * @throws NullPointerException if the specified element is null
 */
public boolean add(E e)
{
    addLast(e);
    return true;
}

/**
 * Inserts the specified element at the end of this deque.
 *
 * <p>This method is equivalent to {@link #add}.
 *
 * @param e the element to add
 * @throws NullPointerException if the specified element is null
 */
public void addLast(E e)
{
    if (e == null)
        throw new NullPointerException();
    final Object[] es = elements;
    es[tail] = e;
    if (head == (tail = inc(tail, es.length)))
        grow(1);
}

/**
 * Circularly increments i, mod modulus.
 * Precondition and postcondition: 0 <= i < modulus.
 */
static final int inc(int i, int modulus)
{
    if (++i >= modulus) i = 0;
    return i;
}

peek always reads the value at the head pointer:

/**
 * Retrieves, but does not remove, the head of the queue represented by
 * this deque, or returns {@code null} if this deque is empty.
 *
 * <p>This method is equivalent to {@link #peekFirst}.
 *
 * @return the head of the queue represented by this deque, or
 *         {@code null} if this deque is empty
 */
public E peek()
{
    return peekFirst();
}

The iterator always walks from the head pointer to the tail pointer:

/**
 * Returns an iterator over the elements in this deque. The elements
 * will be ordered from first (head) to last (tail). This is the same
 * order that elements would be dequeued (via successive calls to
 * {@link #remove}) or popped (via successive calls to {@link #pop}).
 *
 * @return an iterator over the elements in this deque
 */
public Iterator<E> iterator()
{
    return new DeqIterator();
}

So the picture is now clear; let's look at an example. First push [1 2 3] in order, at which point peek is 3; then add/offer [4 5 6], and peek is still 3; the iterator then yields: [3 2 1 4 5 6]

Deque<Integer> dq = new ArrayDeque<>();
dq.push(1);
dq.push(2);
dq.push(3);
// now peek is 3
dq.add(4);
dq.add(5);
dq.add(6);
// now peek is still 3
for (int ele : dq)
{
    System.out.println(ele);
}
// 3 2 1 4 5 6
