New features:

  • Using cockpit web interface
  • Using enhanced firewall: nftables
  • Managing NFSv4 with nfsconf
  • Layered storage management with stratis
  • Data de-duplication and compression with VDO (virtual data optimizer)

The resources feel quite scattered; what you don't know, you just don't know 😢. RedHat provides interactive lab exercises:

Openshift lab interactive exercise:

Fast Ping Test

This is the new shorthand format.

# last decimal represents 24 bits
# the same as 127.0.0.1
ping 127.1
# 1.0.0.1
ping 1.1

Cockpit Web Console

Available since CentOS 7.5. It is essentially a simplified desktop. You can check logs, create accounts, monitor the network, start services, and so on.

# install
sudo yum install -y cockpit-211.3-1.el8.x86_64

# start socket only not cockpit.service
sudo systemctl enable --now cockpit.socket
systemctl status cockpit.socket
# see port opened: 9090
sudo ss -tnlp

# still inactive
systemctl status cockpit.service

Set a root password (the user vagrant is privileged), run:

sudo passwd root

With port forwarding set up for 9090, view the Cockpit UI at localhost:9090 and log in as root with the password you set. After login, cockpit.service is active:

systemctl status cockpit.service

There is a terminal in the web UI; you can work with it just like a normal SSH terminal.

The dashboard plugin:

# see plugins
# you can see yum packages installed and available
yum list cockpit*

yum info cockpit-dashboard
yum install -y cockpit-dashboard

With the cockpit-dashboard plugin installed, you can connect to remote machines (with cockpit installed and cockpit.socket running); the dashboard acts like a control plane.

Other plugins, such as cockpit-machines, are used to manage virtual guests.
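
As a minimal sketch (assuming the remote machine is another CentOS 8 host reachable over SSH), preparing it so the dashboard can add it might look like this:

# on the remote host: install cockpit and start only the socket
sudo yum install -y cockpit
sudo systemctl enable --now cockpit.socket
# if firewalld is running, open the predefined cockpit service (port 9090)
sudo firewall-cmd --add-service=cockpit --permanent
sudo firewall-cmd --reload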

Enhancing Firewall

RedHat 8 Getting started with nftables: it is the designated successor to the iptables, ip6tables, arptables, and ebtables tools. Stick to one command; do not mix them. firewalld commands can be replaced by nftables.

nftables (the nft command) is the default kernel firewall in CentOS 8: a single command for IPv4, IPv6, ARP, and bridge filters. nftables does not have any predefined tables; tables are created by firewalld or by our own scripts.

First yum install nftables, then run the following as sudo or root.

systemctl disable --now firewalld
reboot
# list all tables, nothing is there.
nft list tables

Now start and enable firewalld, the tables will be created:

systemctl enable --now firewalld
# list all tables
nft list tables

table ip filter
table ip6 filter
table bridge filter
table ip security
table ip raw
table ip mangle
table ip nat
table ip6 security
table ip6 raw
table ip6 mangle
table ip6 nat
table bridge nat
table inet firewalld
table ip firewalld
table ip6 firewalld

Some common commands:

# list all tables
nft list tables

# list tables with specific protocol family
nft list tables ip
# check detail of ip filter
nft list table ip filter

Let’s see the demo code to build nftables:

  • create chains
  • create rules
# disable firewalld
systemctl disable --now firewalld ; reboot
nft list tables
# inet will work both for ipv4 and ipv6
# create a new table `filter`
nft add table inet filter

# INPUT is the chain name; it does not have to be called INPUT
# here we add an INPUT chain to the inet filter table
# basic chain types: filter, route, nat
# basic hook types: prerouting, input, forward, output, postrouting, ingress
# a lower priority number is evaluated earlier
nft add chain inet filter INPUT \
{ type filter hook input priority 0 \; policy accept \;}

Add SSH inbound to our system, set rules:

# add rule to inet filter table INPUT chain
nft add rule inet filter INPUT iif lo accept
# allow traffic back to system with specified state
nft add rule inet filter INPUT ct state \
established,related accept
nft add rule inet filter INPUT tcp dport 22 accept
# drop everything that is not explicitly allowed
nft add rule inet filter INPUT counter drop

Persisting nftables rules

# store rules
nft list ruleset > /root/myrules
# clear table
nft flush table inet filter
# delete table
nft delete table inet filter
# restore rules
nft -f /root/myrules

Using systemd service unit:

# the systemd service unit for nftables use /etc/sysconfig/nftables.conf
nft list ruleset > /etc/sysconfig/nftables.conf
nft flush table inet filter
nft delete table inet filter
systemctl enable --now nftables

NFSv4

CentOS 8 uses NFSv4.2 for the NFS server. The new tool nfsconf writes to /etc/nfs.conf. The goal here: enable and use NFSv4 only, manage inbound TCP connections with the firewall, and look at the SELinux NFS configuration.

Install nfs package for both server and clients:

yum install -y nfs-utils

By default NFSv2 is disabled and NFSv3 and above are enabled; we will disable NFSv3 and keep NFSv4 only, with TCP port 2049 opened. Seen this way, the NFS in my previous project used the default settings and was not secure.

We can edit /etc/nfs.conf or use nfsconf commands:

nfsconf --set nfsd vers4 y
nfsconf --set nfsd tcp y
# close udp and nfsv3
nfsconf --set nfsd vers3 n
nfsconf --set nfsd udp n

Start nfs server daemon:

systemctl enable --now nfs-server.service

Check port opened:

ss -tlp -4

State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 128 0.0.0.0:sunrpc 0.0.0.0:* users:(("rpcbind",pid=8920,fd=4),("systemd",pid=1,fd=76))
LISTEN 0 128 0.0.0.0:mountd 0.0.0.0:* users:(("rpc.mountd",pid=8936,fd=8))
LISTEN 0 128 0.0.0.0:ssh 0.0.0.0:* users:(("sshd",pid=917,fd=5))
LISTEN 0 128 0.0.0.0:54425 0.0.0.0:* users:(("rpc.statd",pid=8925,fd=9))
LISTEN 0 64 0.0.0.0:nfs 0.0.0.0:*

We don't need sunrpc with NFSv4, so mask both the service and the socket:

systemctl mask --now rpc-statd rpcbind.service rpcbind.socket

Then we have only the nfs and mountd ports, and only the nfs port needs a firewalld setting:

ss -tl -4

State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 128 0.0.0.0:mountd 0.0.0.0:*
LISTEN 0 128 0.0.0.0:ssh 0.0.0.0:*
LISTEN 0 64 0.0.0.0:nfs 0.0.0.0:*

Let’s create some shared files:

mkdir /share
# copy *.txt under /usr/share/doc to /share
# {} represents the content find finds
# \; is used for find command, escape in bash
find /usr/share/doc -name '*.txt' -exec cp {} /share \;

Edit the /etc/exports file:

# Here only rw; in my previous work, we used (rw,insecure,async,no_root_squash)
# the other default options are sufficient here
/share *(rw)

# launch
exportfs -rav
# check options applied
exportfs -v

/share <world>(sync,wdelay,hide,no_subtree_check,sec=sys,rw,secure,root_squash,no_all_squash)

Configure firewall:

firewall-cmd --add-service=nfs --permanent

Then go to the client and mount the /share folder. The course later covers SELinux support for NFS, which I don't need for now and didn't fully understand.
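
On the client side, a minimal sketch (the hostname nfs-server and mount point /mnt/share are placeholders):

# mount the export over NFSv4 only
sudo mkdir -p /mnt/share
sudo mount -t nfs4 nfs-server:/share /mnt/share
# or persist it in /etc/fstab:
# nfs-server:/share  /mnt/share  nfs4  defaults  0 0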

Storage Management Stratis

Stratis resources:

The explanation in the video is a bit better than the RedHat exercise; for mounting, it uses a persistent /etc/fstab configuration.

When creating the filesystem, the author chooses xfs; what is the difference compared to ext4?
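
A minimal Stratis sketch of the flow described above (assuming a spare disk /dev/sdb; pool and filesystem names are made up):

sudo yum install -y stratisd stratis-cli
sudo systemctl enable --now stratisd
# layered: a pool on top of block devices, then filesystems (always xfs) on top of the pool
sudo stratis pool create mypool /dev/sdb
sudo stratis filesystem create mypool myfs
sudo mkdir /mystratis
sudo mount /dev/stratis/mypool/myfs /mystratis
# for /etc/fstab, reference the filesystem UUID and add x-systemd.requires=stratisd.service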

Virtual Data Optimizer

It uses block-level deduplication, compression, and thin provisioning to save space.

Example Use Case: To reduce the amount of operational and storage costs in data centers, we use the deduplication and compression features in VDO to decrease the footprint of data.
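
A minimal VDO sketch (the device /dev/sdb and logical size are placeholders; the logical size may exceed the physical size because of deduplication and compression):

sudo yum install -y vdo kmod-kvdo
sudo vdo create --name=vdo1 --device=/dev/sdb --vdoLogicalSize=10T
# -K skips zeroing the thin-provisioned blocks at mkfs time
sudo mkfs.xfs -K /dev/mapper/vdo1
sudo mkdir /myvdo
sudo mount /dev/mapper/vdo1 /myvdo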

Revisited in March 2023 because of SPLA.

Intro

Cassandra quick start with a single Docker node. You can extend it to test any Cassandra client, for example gocql: build the app with the golang image and bring it up in the same Docker network.

Cassandra basics to understand concepts such as ring (Cassandra cluster, masterless), horizontal scaling (aka scale-out), partitioning, RF (replication factor), CL (consistency level), quorum (RF/2 + 1), CAP (Cassandra is AP by default).

How the quorum is calculated: quorum = RF / 2 + 1 (integer division), so RF = 3 gives a quorum of 2.

Replication Strategy

Keyspace: a namespace that defines data replication on nodes; one keyspace may have multiple related tables, and the replication strategy is keyspace-wide:

  • SimpleStrategy for a single data center
  • NetworkTopologyStrategy for multiple data centers

Data center can have multiple racks.

For example, one data center has 2 racks; rack1 has 2 nodes and rack2 has one node. With SimpleStrategy and a replication factor of 1, rack1 owns 50% of the data (each node in rack1 owns 25%) and rack2 owns 50%; since rack2 contains only 1 node, that node owns 50%.
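
A minimal sketch of how the two strategies look when creating a keyspace (keyspace and data center names are made up):

# single data center: SimpleStrategy with one replication factor
cqlsh -e "create keyspace demo_single with replication = {'class':'SimpleStrategy', 'replication_factor': 3};"
# multiple data centers: NetworkTopologyStrategy with a replication factor per data center
cqlsh -e "create keyspace demo_multi with replication = {'class':'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 2};"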

Tunable Consistency

Coordinator Node: the node a client connects to to perform actions. Each connection to Cassandra may have a different coordinator node; any node can be the coordinator.

You can configure consistency on a cluster, datacenter, or per individual read or write operation. see this doc for details.

Consistency level for write:

  • ONE, TWO, THREE
  • QUORUM(majority of nodes succeeds)
  • ALL(must all good)
  • ANY(include coordinator itself).

Hinted Handoff: when a replica node is unavailable, the data is written to the coordinator node, and the coordinator will repeatedly try to write it to the unavailable node until it succeeds.

Consistency level for read: how many nodes to consult to return the most current data to caller.

  • SERIAL: see this doc
  • ONE, TWO, THREE
  • QUORUM(majority of nodes succeeds)
  • ALL (must all good)

Read Repair: when a write to a node fails but the node comes back online later, a read that can compare data from multiple replicas will rewrite the previously failed data to that node. Running nodetool repair periodically will resolve inconsistencies in the cluster.
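
For example (a minimal sketch, assuming the pluralsight keyspace used later in these notes):

# repair only this node's primary ranges for one keyspace
nodetool repair -pr pluralsight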

Achieving strong consistency:

  • write consistency + read consistency > replication factor
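
A quick sketch of the rule above, assuming RF = 3: QUORUM writes (2) plus QUORUM reads (2) gives 2 + 2 = 4 > 3, so every read overlaps the latest write. In cqlsh the level is set per session, for example:

cqlsh -e "CONSISTENCY QUORUM; SELECT * FROM pluralsight.courses;"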

Multiple data center consistency level:

  • EACH_QUORUM
  • LOCAL_QUORUM: local means current coordinator node data center
  • LOCAL_ONE: like ONE, but satisfied within the local data center

Partition, Composite and Clustering Key

To write CQL correctly, especially ORDER BY, you need to understand how to define the primary key and use partition, composite, and clustering keys efficiently:

https://www.baeldung.com/cassandra-keys

CQL

A single Cassandra docker node is enough for CQL.

Cassandra Query Language:

With the cqlsh script you can specify a remote Cassandra node and port to connect to; by default it connects to localhost on port 9042.

Keyspace -> Tables -> partitions -> row.

In brief, each table requires a unique primary key. The first field listed is the partition key, since its hashed value is used to determine the node that stores the data. If several fields are wrapped in parentheses, the partition key is composite; otherwise the first field alone is the partition key. Any fields listed after the partition key are called clustering columns: they store data in ascending or descending order within the partition for fast retrieval of similar values. All the fields together form the primary key.

## help
cqlsh> help

Documented shell commands:
===========================
CAPTURE CLS COPY DESCRIBE EXPAND LOGIN SERIAL SOURCE UNICODE
CLEAR CONSISTENCY DESC EXIT HELP PAGING SHOW TRACING

## specified
cqlsh> help consistency;

CQL help topics:
================
AGGREGATES CREATE_KEYSPACE DROP_TRIGGER TEXT
ALTER_KEYSPACE CREATE_MATERIALIZED_VIEW DROP_TYPE TIME
ALTER_MATERIALIZED_VIEW CREATE_ROLE DROP_USER TIMESTAMP
ALTER_TABLE CREATE_TABLE FUNCTIONS TRUNCATE
ALTER_TYPE CREATE_TRIGGER GRANT TYPES
ALTER_USER CREATE_TYPE INSERT UPDATE
APPLY CREATE_USER INSERT_JSON USE
ASCII DATE INT UUID
BATCH DELETE JSON
BEGIN DROP_AGGREGATE KEYWORDS
BLOB DROP_COLUMNFAMILY LIST_PERMISSIONS
BOOLEAN DROP_FUNCTION LIST_ROLES
COUNTER DROP_INDEX LIST_USERS
CREATE_AGGREGATE DROP_KEYSPACE PERMISSIONS
CREATE_COLUMNFAMILY DROP_MATERIALIZED_VIEW REVOKE
CREATE_FUNCTION DROP_ROLE SELECT
DROP_TABLE SELECT_JSON

Create a keyspace with:

create keyspace pluralsight with replication = {'class':'SimpleStrategy', 'replication_factor':1};

Create a table in this keyspace with:

use pluralsight;
create table courses (id varchar primary key);

Optionally attempt to create the table again with:

create table if not exists courses (id varchar primary key);

(and note that you will not get an error as long as the ‘if not exists’ is present)

Add a few columns to the courses table with:

alter table courses add duration int;
alter table courses add released timestamp;
alter table courses add author varchar;

Add a comment to the table with:

alter table courses with comment = 'A table of courses';

View the complete table and all its default properties with:

-- describe
desc table courses;

Drop and recreate a more complete courses table with:

drop table courses;

create table courses (
id varchar primary key,
name varchar,
author varchar,
audience int,
duration int,
cc boolean,
released timestamp
) with comment = 'A table of courses';

(Note that when entering the lines as above cqlsh will automatically detect a multi-line CQL statement)

Exit cqlsh:

exit

Load course data by running a series of CQL commands from an external file

cat courses.cql | cqlsh

Verify that the CQL commands in the file were indeed executed:

use pluralsight;
desc tables;
select * from courses;

(The ‘desc tables’ should show a single ‘courses’ table, and the ‘select’ statement should show 5 rows of sample data.)

The ‘expand’ cqlsh command will display the query results in a ‘one column per line’ format:

-- pretty format
expand on;
select * from courses;
expand off;

You can display the time a piece of data was written with the ‘writetime’ function:

select id, cc, writetime(cc) from courses where id = 'advanced-javascript';

We can update this cc column with an ‘update’ statement:

update courses set cc = true where id = 'advanced-javascript';

Now re-run the select statement containing the ‘writetime’ function and notice that the time has changed. You can prove to yourself that this write time is stored on a per column basis by selecting this for a different column:

select id, name, writetime(name) from courses where id = 'advanced-javascript';

Note that this writetime value is the same as the one returned by our first ‘cc’ query.

Cassandra also provides a function for returning the token associated with a partition key:

select id, token(id) from courses;

If you try to select from a column other than the primary key, you’ll get an error:

select * from courses where author = 'Cory House';

(We’ll show how to do this in a later module.)

Let’s create a users table:

create table users (
id varchar primary key,
first_name varchar,
last_name varchar,
email varchar,
password varchar
) with comment = 'A table of users';

Then we’ll insert and “upsert” two rows of data:

insert into users (id, first_name, last_name) values ('john-doe', 'John', 'Doe');
update users set first_name = 'Jane', last_name = 'Doe' where id = 'jane-doe';
select * from users;

(Note that the net effect of the insert and update are the same.)

Now we’ll add a new ‘reset_token’ column to this table, and add a value to this column with a TTL:

alter table users add reset_token varchar;
update users using ttl 120 set reset_token = 'abc123' where id = 'john-doe';

We can retrieve the time remaining for a ttl with the ‘ttl’ query function:

select ttl(reset_token) from users where id = 'john-doe';

We can turn on tracing and do a select to see that there are currently no tombstones:

tracing on;
select * from users where id = 'john-doe';

(Re-run this several times until the 2 minutes have elapsed; the token_value will be gone, and tracing will show a tombstone.)

Turn off tracing:

tracing off;

Create a ratings table with two counter columns:

create table ratings (
course_id varchar primary key,
ratings_count counter,
ratings_total counter
) with comment = 'A table of course ratings';

Now let’s increment both counter columns to represent receiving a new course rating of 4:

update ratings set ratings_count = ratings_count + 1, ratings_total = ratings_total + 4 where course_id = 'nodejs-big-picture';
select * from ratings;

(The select should show the data we just upserted.)

Now let’s add a second course rating of 3:

update ratings set ratings_count = ratings_count + 1, ratings_total = ratings_total + 3 where course_id = 'nodejs-big-picture';
select * from ratings;
exit

This should show the new values of “2” and “7” for ratings_count and ratings_total respectively.

Drop and re-create “ratings” to use with the “avg” aggregate function

drop table ratings;

create table ratings (
course_id varchar,
user_id varchar,
rating int,
primary key (course_id, user_id)
);

Insert a few sample ratings

insert into ratings (course_id, user_id, rating) values ('cassandra-developers', 'user1', 4);
insert into ratings (course_id, user_id, rating) values ('cassandra-developers', 'user2', 5);
insert into ratings (course_id, user_id, rating) values ('cassandra-developers', 'user3', 4);
insert into ratings (course_id, user_Id, rating) values ('advanced-python', 'user1', 5);

You can select the average for a single course (across users):

select course_id, avg(rating) from ratings where course_id = 'cassandra-developers';
select course_id, avg(rating) from ratings where course_id = 'advanced-python';

However, you can’t apply aggregate functions across partition keys:

select course_id, avg(rating) from ratings;  -- incorrect results

Multi-Row Partition

Composite Key

Previously we had only one column in the primary key, and that column is the partition key. But it could also be:

-- composite key
PRIMARY KEY (partition_key, clustering_key, ...)

partition_key can also be composite.

There is no join operation in Cassandra.

Drop this table and create a new one to hold both course and module data

drop table courses;
create table courses (
id varchar,
name varchar,
author varchar,
audience int,
duration int,
cc boolean,
released timestamp,
module_id int,
module_name varchar,
module_duration int,
primary key (id, module_id)
) with comment = 'A table of courses and modules';

Insert data for the course, plus the first two modules

insert into courses (id, name, author, audience, duration, cc, released, module_id, module_name, module_duration)
values ('nodejs-big-picture','Node.js: The Big Picture','Paul O''Fallon', 1, 3240, true, '2019-06-03',1,'Course Overview',70);

insert into courses (id, name, author, audience, duration, cc, released, module_id, module_name, module_duration)
values ('nodejs-big-picture','Node.js: The Big Picture','Paul O''Fallon', 1, 3240, true, '2019-06-03',2,'Considering Node.js',900);

Select the data we just inserted

-- get same result
select * from courses;
select * from courses where id = 'nodejs-big-picture';

Now we can include both id and module_id in our where clause

select * from courses where id = 'nodejs-big-picture' and module_id = 2;

We can’t select by just module, unless we enable ‘ALLOW FILTERING’

-- if no partition_key, performance downgrade
select * from courses where module_id = 2; // fails
select * from courses where module_id = 2 allow filtering; // succeeds

Now insert the remaining modules for the course

insert into courses (id, name, author, audience, duration, cc, released, module_id, module_name, module_duration)
values ('nodejs-big-picture','Node.js: The Big Picture','Paul O''Fallon', 1, 3240, true, '2019-06-03', 3, 'Thinking Asynchronously', 1304);

insert into courses (id, name, author, audience, duration, cc, released, module_id, module_name, module_duration)
values ('nodejs-big-picture','Node.js: The Big Picture','Paul O''Fallon', 1, 3240, true, '2019-06-03', 4, 'Defining an Application and Managing Dependencies', 525);

insert into courses (id, name, author, audience, duration, cc, released, module_id, module_name, module_duration)
values ('nodejs-big-picture','Node.js: The Big Picture','Paul O''Fallon', 1, 3240, true, '2019-06-03', 5, 'Assembling a Development Toolset', 489);

We can also use module_id as part of an “in” clause

select * from courses where id = 'nodejs-big-picture' and module_id in (2,3,4);

And we can order by module_id

select * from courses where id = 'nodejs-big-picture' order by module_id desc;

We can “select distinct” just the id, but not the id and course name:

select distinct id from courses;         // succeeds
select distinct id, name from courses; // fails

Static Columns

Static columns are static within a partition: they hold the data common to the whole partition.

From cqlsh, drop and recreate the courses table, using static columns

use pluralsight;
drop table courses;
create table courses (
id varchar,
name varchar static,
author varchar static,
audience int static,
duration int static,
cc boolean static,
released timestamp static,
module_id int,
module_name varchar,
module_duration int,
primary key (id, module_id)
) with comment = 'A table of courses and modules';

Insert just the course data, and select it back

insert into courses (id, name, author, audience, duration, cc, released)
values ('nodejs-big-picture','Node.js: The Big Picture','Paul O''Fallon', 1, 3240, true, '2019-06-03');

select * from courses where id = 'nodejs-big-picture';

Now insert the module data for the first two modules

insert into courses (id, module_id, module_name, module_duration)
values ('nodejs-big-picture',1,'Course Overview',70);

insert into courses (id, module_id, module_name, module_duration)
values ('nodejs-big-picture',2,'Considering Node.js',900);

Selecting from courses now returns both course and module data in each row

select * from courses where id = 'nodejs-big-picture';
select * from courses where id = 'nodejs-big-picture' and module_id = 2;

Insert the third module, but also change the name of the course. Select all rows to show the course title changed everywhere.

insert into courses (id, name, module_id, module_name, module_duration)
values ('nodejs-big-picture', 'The Big Node.js Picture', 3, 'Thinking Asynchronously', 1304);

select * from courses where id = 'nodejs-big-picture';

Insert the fourth module, and fix the course name

insert into courses (id, name, module_id, module_name, module_duration)
values ('nodejs-big-picture', 'Node.js: The Big Picture', 4, 'Defining an Application and Managing Dependencies', 525);

Insert the remaining course module

insert into courses (id, module_id, module_name, module_duration)
values ('nodejs-big-picture', 5, 'Assembling a Development Toolset', 489);

The ‘in’ and ‘order by’ clauses work the same as before

select * from courses where id = 'nodejs-big-picture' and module_id in (2,3,4);

select * from courses where id = 'nodejs-big-picture' order by module_id desc;

Select course info, repeated based on the number of modules in the course

select id, name, author, audience, duration, cc, released from courses;

Now “select distinct” course info and only get one row back

select distinct id, name, author, audience, duration, cc, released from courses;

Select just the module information for the course

select module_id, module_name, module_duration from courses where id = 'nodejs-big-picture';

Load module-level course data by running a series of CQL commands from an external file

cat data/courses2.cql | cqlsh

Select module information for the ‘advanced-javascript’ course

use pluralsight;
select module_id, module_name, module_duration from courses where id = 'advanced-javascript';

Select module information for the ‘advanced-python’ course

select module_id, module_name, module_duration from courses where id = 'advanced-python';

Select just the course-level information for all 5 courses

select distinct id, name, author from courses;

Time Series Data

Launch our one Cassandra node and (when it’s ready) load our sample course data

cat data/courses2.cql | cqlsh

From cqlsh, create a new table to hold course page views

use pluralsight;
create table course_page_views (
course_id varchar,
view_id timeuuid,
primary key (course_id, view_id)
) with clustering order by (view_id desc);

Insert a row into this table, using “now()” to create a timeuuid with the current date/time. Include a one year TTL.

insert into course_page_views (course_id, view_id)
values ('nodejs-big-picture', now()) using TTL 31536000;

Insert another row into the table with a manually generated v1 UUID (also with a TTL)

insert into course_page_views (course_id, view_id)
values ('nodejs-big-picture', bb9807aa-fb68-11e9-8f0b-362b9e155667) using TTL 31536000;

Insert two more rows using “now()”

insert into course_page_views (course_id, view_id)
values ('nodejs-big-picture', now()) using TTL 31536000;

insert into course_page_views (course_id, view_id)
values ('nodejs-big-picture', now()) using TTL 31536000;

Select the rows, and then use dateOf() to extract the date/time portion of the view_id

select * from course_page_views;
select dateOf(view_id) from course_page_views where course_id = 'nodejs-big-picture';

Reverse the date order of the results

select dateOf(view_id) from course_page_views where course_id = 'nodejs-big-picture' order by view_id asc;

Select only those dates based on Timeuuids that span a 2 day range

select dateOf(view_id) from course_page_views where course_id = 'nodejs-big-picture'
and view_id >= maxTimeuuid('2019-10-30 00:00+0000')
and view_id < minTimeuuid('2019-11-02 00:00+0000');

-- adjust these dates as necessary to match a more current date range

Truncate the table, and add a static column

truncate course_page_views;
alter table course_page_views add last_view_id timeuuid static;

Now insert three rows, using “now()” for both Timeuuids (with TTLs)

insert into course_page_views (course_id, last_view_id, view_id)
values ('nodejs-big-picture', now(), now()) using TTL 31536000;

insert into course_page_views (course_id, last_view_id, view_id)
values ('nodejs-big-picture', now(), now()) using TTL 31536000;

insert into course_page_views (course_id, last_view_id, view_id)
values ('nodejs-big-picture', now(), now()) using TTL 31536000;

Selecting all rows shows different view_ids but the same last_view_id for all rows

select * from course_page_views;

Use ‘select distinct’ to get just the latest page view for this course

select distinct course_id, last_view_id from course_page_views;

For just one course, this can also be accomplished with the view_id and a LIMIT clause

select course_id, view_id from course_page_views where course_id = 'nodejs-big-picture' limit 1;

However, a ‘limit’ won’t work across multiple courses. Insert multiple views for another course.

insert into course_page_views (course_id, last_view_id, view_id)
values ('advanced-javascript', now(), now()) using TTL 31536000;

insert into course_page_views (course_id, last_view_id, view_id)
values ('advanced-javascript', now(), now()) using TTL 31536000;

insert into course_page_views (course_id, last_view_id, view_id)
values ('advanced-javascript', now(), now()) using TTL 31536000;

Select latest view_id from each course, using the limit clause

select course_id, view_id from course_page_views where course_id = 'nodejs-big-picture' limit 1;
select course_id, view_id from course_page_views where course_id = 'advanced-javascript' limit 1;

Retrieve the latest course page view for all courses with ‘select distinct’ and the static column

select distinct course_id, last_view_id from course_page_views;

Select all the individual views for each course, one at a time

select course_id, view_id from course_page_views where course_id = 'nodejs-big-picture';
select course_id, view_id from course_page_views where course_id = 'advanced-javascript';

I don't need the rest of the course for now; I'll come back to it later.

An opinionated CI/CD platform built on top of Kubernetes.

Introduction from Cloudbees:

Terraform is recommended going forward. The K8s cluster must be compatible with jx.

Prerequisites:

  • Know how classic Jenkins pipeline works
  • Know how to operate on Kubernetes
  • Know how to run Docker container
  • Know how to use Helm

All JX projects are deployed to Kubernetes using Docker and Helm.

Aerial View

Jenkin X: https://jenkins-x.io/

Traditional CI/CD pipelines like classic Jenkins require heavy customization and are not cloud-native.

Jenkins X is an opinionated, cloud-native CI/CD pipeline built on top of Kubernetes. Jenkins X uses Tekton (a Kubernetes-native pipeline engine) to do the job.

Setup Jenkins X

Create a cluster with Terraform on GCP.

Steps please see my github repository: https://github.com/chengdol/jenkins-X-deployment

First install gcloud SDK and Terraform at local.

## init the environment, configuration and link to the target gcloud project
gcloud init
## we need kubectl to interact with K8s cluster
gcloud components install kubectl

Then create a main.tf file to provision the Jenkins X cluster on GCP; go to https://registry.terraform.io/ and search for jx.


Create First App on Jenkins X Pipeline

Key components:

  • Application source code
  • Dockerfile: all services are stored in Docker images
  • Helm chart: all Docker images are wrapped in Helm packages
  • Jenkins X file: Defines the build pipeline
jx create quickstart

JX commands recap:

## out of box workflow for many language projects
jx create quickstart
## import an existing project to jx
jx import

## watch pipeline for a project
jx get activity -f <project name> -w

## view logs for a build pipeline
jx get build logs <project name>

Environment with GitOps

Introduce GitOps:

Jenkins X adopts GitOps, where Git is the single-source-of-truth for our environment.

## list all available environments
jx get env
## create new env
jx create env

## promote an application version to the next environment
jx promote

Pull Requests and ChatOps

Jenkins X streamlines the pull request workflow.

Prow: https://jenkins-x.io/docs/reference/components/prow/

  • Kubernetes CI/CD system
  • Orchestrates Jenkins X pipelines via GitHub events
  • Automates interactions with pull requests
  • Enables ChatOps-driven development
  • GitHub only, to be superseded by Lighthouse

A GitHub webhook will call Prow; the webhook is actually an HTTP POST request containing the event payload, and Prow will then execute the pipeline. Conversely, Prow can call the GitHub API.

Introduce ChatOps:

jx create pullrequest
jx get previews
jx delete previews

Creating Custom QuickStarts and Build Packs

Build Packs: https://buildpacks.io/, the Jenkins X project template, powered by Draft, contains:

  • Dockerfile
  • Production and Preview Helm charts
  • Jenkins X pipeline

Quick start workflow: jx create quickstart -> choose quick start projects -> generate Vanilla project -> detects Build packs -> modifies Jenkins X project

So basically, we use a quickstart to create a vanilla project skeleton, and then Jenkins X-ify the project with build packs (which generate language-specific Jenkins X template files). jx import of an existing project also uses build packs to do the job.

## list repositories containing quickstarts
jx get quickstartlocation

## create new quickstart repository
jx create quickstartlocation

## delete new quickstart repository
jx delete quickstartlocation

## edit current buildpack location
jx edit buildpack

Customize Jenkins X Pipeline

The Jenkins X pipeline definition is a YAML file; it makes use of inheritance to reduce repetition.

## validate pipeline syntax
jx step syntax validate pipeline

## show full pipeline
jx step syntax effective

Versioning and Releasing App

Jenkins X adopts Semantic versioning, split into 3 components.

  • Major: breaking changes
  • Minor: non-breaking changes
  • Patch: bug fixes

Patch number is auto-incremented, Major/Minor versions are set manually.

Custom Domain and TLS

With nip.io, the IP-based hostname is not human readable and prevents TLS, so it is insecure, for example when external users access a deployed application. Jenkins X allows custom domains and HTTPS.

External DNS:

  • Makes Kubernetes resources discoverable on public DNS servers
  • Automates creation of DNS records

Certificate Manager:

  • Issues SSL certificates for our applications
  • Leverages Let's Encrypt behind the scenes

First buy domain from Google Domains web site then use it in Google Cloud settings.

Course git repo, this repo has useful Vault commands: https://github.com/ned1313/Getting-Started-Vault

My Vault vagrant demo: https://github.com/chengdol/InfraTree/tree/master/vagrant-vault

Questions

Vault vs K8s secrets? Examples of what Vault can do that K8s secrets cannot:

  • With Vault you can rotate secrets and have secrets with short TTLs
  • With Vault you can access secrets across namespaces (or outside the K8s cluster)
  • Vault can provide a PKI for signing certs (enabling, for example, automation of cert generation for mTLS)
  • Vault can use LDAP, OAuth, IAM, etc. as identity providers

Introduction

Vault web site: https://www.vaultproject.io/

Secure, store and tightly control access to tokens, passwords, certificates, encryption keys for protecting secrets and other sensitive data using a UI, CLI, or HTTP API. Note that the API path is not the same as the path shown in the UI!

Vault works well with Consul, for example, set Consul as storage backend.

Start a Vault server:

## same as consul in development mode, don't use this in production!
## 0.0.0.0:8200, used for vagrant port forwarding access from the host
vault server -dev-listen-address 0.0.0.0:8200 -dev

Output as below:

WARNING! dev mode is enabled! In this mode, Vault runs entirely in-memory
and starts unsealed with a single unseal key. The root token is already
authenticated to the CLI, so you can immediately begin using Vault.

You may need to set the following environment variable:

$ export VAULT_ADDR='http://0.0.0.0:8200'

The unseal key and root token are displayed below in case you want to
seal/unseal the Vault or re-authenticate.

## these two are critical
Unseal Key: pLNBmZQRHvspdy5unZcTjm1jOVQ81Z0pO6ywYHNP1zQ=
Root Token: s.hfnmMkgG7cggWDOmPfHC1jIe

Development mode should NOT be used in production installations!

In production mode, the unseal key is used to unseal the Vault server; otherwise you cannot log in as root.

Note that VAULT_ADDR here is local; the Vault server can of course be somewhere else, just set the corresponding address.

## Vault API access
export VAULT_ADDR='http://0.0.0.0:8200'
export VAULT_TOKEN=s.ttlDcetbJe3uLt0FF5rSidg3

## logging in to the vault server requires VAULT_TOKEN
vault login

Vault web UI access: http://localhost:8200

So, just like Consul, there are 3 ways to interact with Vault server: UI, API, CLI (actually running API under the hood).

secret is a pre-existing secrets engine path in Vault's storage; you can see it in the UI:

#Write a secret
vault kv put secret/hg2g answer=42
#For Linux
# marvin.json is a json file
curl --header "X-Vault-Token: $VAULT_TOKEN" --request POST \
--data @marvin.json $VAULT_ADDR/v1/secret/data/marvin

marvin.json is as follows:

{
"data": {
"paranoid": true,
"status": "bored"
}
}
#Get a secret
vault kv get secret/hg2g
#specify format
vault kv get -format=json secret/hg2g
vault kv get -format=yaml secret/hg2g
#For Linux
#Install jq if necessary
sudo yum install jq -y
curl --header "X-Vault-Token: $VAULT_TOKEN" $VAULT_ADDR/v1/secret/data/marvin | jq

#Put a new secret in and a new value for an existing secret
vault kv put secret/hg2g answer=54 ford=prefect
vault kv get secret/hg2g

#Delete the secrets
vault kv delete secret/hg2g
vault kv get secret/hg2g

#For Linux
curl --header "X-Vault-Token: $VAULT_TOKEN" --request DELETE $VAULT_ADDR/v1/secret/data/marvin

Working with Secrets

Secret lifecycle:

  • Create
  • Read
  • Update
  • Delete (soft or hard unrecoverable)
  • Destroy

There are version 1 and version 2 of the KV secrets engine; version 2 adds versioning and lets you recover deleted secrets, but has lower performance than version 1 if you need to scale. The secret folder is the secrets engine created by default (version 2); you can check its configuration in the UI.

Demo code about secrets lifecycle: https://github.com/ned1313/Getting-Started-Vault/blob/master/m3/m3-secretslifecycle.sh

Every time you update the key, the version increments by 1:

## pick value by version
vault kv get -version=3 secret/hg2g
## from API
curl -X GET --header "X-Vault-Token: $VAULT_TOKEN" $VAULT_ADDR/v1/secret/data/hg2g?version=3 | jq .data.data

If you delete version 3, you still can get version 1 or 2:

vault kv delete secret/hg2g
vault kv get -version=2 secret/hg2g
## undelete version 3
## -versions not -version, you can undelete multiple: -versions=2,3
vault kv undelete -versions=3 secret/hg2g

## API
#For Linux
curl --header "X-Vault-Token: $VAULT_TOKEN" --request POST \
$VAULT_ADDR/v1/secret/undelete/hg2g --data '{"versions": [2]}'

Destroy, can no longer undelete:

#Destroy the secrets
vault kv destroy -versions=1,2 secret/hg2g

#For Linux
## metadata is still there
curl --header "X-Vault-Token: $VAULT_TOKEN" --request POST \
$VAULT_ADDR/v1/secret/destroy/hg2g --data '{"versions": [1,2]}'

#Remove all data about secrets
vault kv metadata delete secret/hg2g
vault kv get secret/hg2g

#For Linux
curl --header "X-Vault-Token: $VAULT_TOKEN" --request DELETE \
$VAULT_ADDR/v1/secret/metadata/hg2g

Create New Secret Engine

Demo about the secrets engine, which I will not paste here. If you use the API, there are also some JSON configuration files; see the parent directory of this script, e.g. dev-b.json: https://github.com/ned1313/Getting-Started-Vault/blob/master/m3/m3-secretengine.sh
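
A minimal sketch of enabling a new KV secrets engine (the path dev-a is made up and not necessarily what the linked demo uses):

## list currently enabled secrets engines
vault secrets list
## enable a KV version 2 engine at a custom path
vault secrets enable -path=dev-a -version=2 kv
## write and read against the new path
vault kv put dev-a/creds user=arthur
vault kv get dev-a/creds
## disable it again (removes all data at that path)
vault secrets disable dev-a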

Create Mysql DB Secret Engine

Vault official reference: https://www.vaultproject.io/docs/secrets

Vault acts like a middle layer here, sitting between the user and the MySQL instance, storing and passing credentials and requests. This demo spins up a Bitnami MySQL instance on Azure and links it with the local Vault MySQL secrets engine; Vault then asks MySQL to generate a dynamic user and grants that temporary user permissions, and we can operate on the MySQL database through that temporary user.

The dynamic user's lifecycle is also managed by Vault, as well as the permissions the user has. https://github.com/ned1313/Getting-Started-Vault/blob/master/m3/m3-mysqlengine.sh

Besides the MySQL example, Vault secrets can be used for certificates, SSH credentials, etc. Secrets engines make Vault extensible; there are different plugins to handle different needs.

Controlling Access

Similar to RBAC in K8s: authentication methods control login, policies control what one is allowed to do, and client tokens are managed.

Vault Auth Method

Vault has internal authentication (called userpass) and supports multiple external authentication methods. https://github.com/ned1313/Getting-Started-Vault/blob/master/m4/m4-basicauth.sh

## list current auth methods
## by default, we only have token auth method
vault auth list
## enable new auth method
vault auth enable userpass
## create user
vault write auth/userpass/users/arthur password=dent
## list user
vault list auth/userpass/users

You will see the changes in UI Access section.

Once you create a user/password, then you can run for example:

## here we log in with user/password, not with a token
vault login -method=userpass username=arthur
## then input password
Password (will be hidden):
## this warning appears because the root token was exported earlier during the experiment
WARNING! The VAULT_TOKEN environment variable is set! This takes precedence
over the value set by this command. To use the value set by this command,
unset the VAULT_TOKEN environment variable or set it to the token displayed
below.

Success! You are now authenticated. The token information displayed below
is already stored in the token helper. You do NOT need to run "vault login"
again. Future Vault requests will automatically use this token.

Key Value
--- -----
token s.z52FGzn78XynKlxeS0Akt0t7
token_accessor 90LaRNNtsCUEg6yDd9ldRi3v
token_duration 768h
token_renewable true
token_policies ["default"]
identity_policies []
policies ["default"]
token_meta_username arthur


## will show you where is the token from, not root token any more
## see `path` field where is the token from
vault token lookup

A token represents who you are and what you can do; for example, the user itself cannot update its password via its token:

## update password
vault write auth/userpass/users/arthur/password password=tricia
## output
Error writing data to auth/userpass/users/arthur/password: Error making API request.

URL: PUT http://0.0.0.0:8200/v1/auth/userpass/users/arthur/password
Code: 403. Errors:

* 1 error occurred:
* permission denied

You can do this after exporting the root token.

Delete auth:

#Remove account
vault delete auth/userpass/users/arthur

Two new concepts are introduced here: LDAP and Active Directory. What are the differences between LDAP and Active Directory? Active Directory is a database-based system that provides authentication, directory, policy, and other services in a Windows environment.

LDAP (Lightweight Directory Access Protocol) is an application protocol for querying and modifying items in directory service providers like Active Directory, which supports a form of LDAP.

Short answer: AD is a directory services database, and LDAP is one of the protocols you can use to talk to it.

See the example in the Vault Policy section below. Case: AD is set up externally as the authentication source, and LDAP auth is enabled in Vault. A user then logs in to Vault with AD credentials; through LDAP, Vault talks to AD, AD answers so Vault can determine the policy describing what the user can do, and the user gets the right token from Vault to access the store.

Vault Policy

Similar to K8s role. https://github.com/ned1313/Getting-Started-Vault/blob/master/m4/m4-activedirectory.sh

In this example, we remotely log in to a Vault server as root, create a devkv store with key/value pairs, and then create a dev policy from an HCL file that lets a user access devkv.

Enable LDAP auth and configure it against the remote Active Directory. Then assign the dev policy to the developers group from LDAP.

Then the user adent logs in to Vault with the ldap method. The user is in the developers group, so it gets a token with the permissions specified by the dev policy.
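
A minimal sketch of the policy part (the devkv path and dev policy name come from the description above; the HCL itself is illustrative, not the course's exact file):

## write the policy from an inline HCL document
vault policy write dev - <<EOF
path "devkv/*" {
  capabilities = ["create", "read", "update", "delete", "list"]
}
EOF
## map the LDAP group "developers" to that policy
vault write auth/ldap/groups/developers policies=dev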

Client Token

https://www.vaultproject.io/docs/concepts/tokens

Wrapping Response

https://github.com/ned1313/Getting-Started-Vault/blob/master/m4/m4-tokenwrapping.sh

Operating Vault Server

This mainly covers production Vault server setup.

  • Vault server architecture
  • Storage backend options
  • Vault server operations

The example here uses Consul as the storage backend (didn't we say it's not intended for this 😂). A Consul agent runs on the Vault host and communicates with the remote Consul servers (gossip TCP: 8301, RPC TCP: 8300); there can be multiple Consul servers for HA. Vault interacts with the local Consul agent on port 8500, and external access to Vault uses the default port 8200: https://github.com/ned1313/Getting-Started-Vault/tree/master/m5

Pay attention to the Consul configuration here: previously we always used dev mode, but here it is production mode, with Consul wrapped as a service and run as a daemon, in the consul folder:

Note: paste the .hcl and .service file contents into the files created by consul-deploy.sh. The useradd command in the script is also worth borrowing. consul keygen is used to generate the encryption string used in both the server and client .hcl files.

Consul agent setup:

OK, this brings up something interesting: creating a custom systemd service. I had never done this before; it turns out you can define systemd services yourself! https://medium.com/@benmorel/creating-a-linux-service-with-systemd-611b5c8b91d6
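
A minimal sketch of such a unit, written from bash (paths and flags are illustrative, not the exact course files):

sudo tee /etc/systemd/system/consul.service > /dev/null <<EOF
[Unit]
Description=Consul agent
After=network-online.target

[Service]
User=consul
ExecStart=/usr/local/bin/consul agent -config-dir=/etc/consul.d
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now consul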

Vault server config: Vault is also set up as a systemd service, in the vault folder:

The .hcl file configures Consul as the storage backend; this capability is built in, so not much extra setup is needed.

Server Operations

Operator command: https://www.vaultproject.io/docs/commands/operator

Set up a production Vault server; it starts sealed and needs to be unsealed: https://github.com/ned1313/Getting-Started-Vault/blob/master/m5/m5-serveroperations.sh

Auditing

Everything is audited; sensitive data is hashed unless you explicitly turn that off. https://github.com/ned1313/Getting-Started-Vault/tree/master/m6

Auditing device type:

  • File, JSON format
  • Syslog
  • Socket: TCP/UDP
## enable auditing device
vault audit enable [type] [-path=vault_path]
## log_raw=true means sensitive data is not hashed
vault audit enable file file_path=/var/log/vault/vault_audit.log log_raw=true
vault audit enable -path=file2 file file_path=/var/log/vault/vault_audit2.log
vault audit enable syslog tag="vault" facility="LOCAL7"

## disable
vault audit disable [vault_path]
## disable file2 created above
vault audit disable file2
## list
vault audit list -detailed

Lab Environment Setup

Consul is easy to install: it is just an executable binary, put it in /usr/local/bin: https://www.consul.io/downloads

I modified the course demo a bit and built a Consul lab cluster via Vagrant: https://github.com/chengdol/InfraTree/tree/master/vagrant-consul

Glossary: https://github.com/chengdol/InfraTree/blob/master/vagrant-consul/glossary.md

Introduction

Challenges in managing services:

  • Service discovery
  • Failure Detection
  • Multi-Data Center
  • Service configuration

In an application service architecture there is usually an API tier to add flexibility and provide additional services; for example, the following application can be consumed directly through its API:

Consul is distributed.

These services need to discover each other. As internal structures get more and more complex, for example with many internal load balancers, Consul can come into play, for example by providing internal DNS for service discovery.

Failure detection: Consul runs a lightweight Consul agent (in server or client mode) on each node in your environment. The agent checks all services running locally.

Reactive configuration via the key/value store, reflecting changes quickly in near real time. Multi-data-center aware.

Consul vs other software, see here. Especially Consul vs Istio, see here.

Consul UI online demo: https://demo.consul.io

Monitor Nodes

The example in this chapter shows a nice modeling approach: install Docker inside the Vagrant virtual machines, run some services as containers (here the Nginx web servers and the HAProxy LB), and then expose those ports on localhost (which modifies the machine's iptables). This avoids a lot of installation and configuration work on the virtual machines.

Start consul server agent:

# -dev: development agent, server mode will be turned on this agent, for quick start
# in production, don't use -dev

# -advertise: specify one ipv4 interface
# -client: specify client access ip, usually 0.0.0.0
consul agent -dev -bind 0.0.0.0 -advertise 172.20.20.31 -client 127.0.0.1

# log output
==> Starting Consul agent...
Version: 'v1.8.0'
Node ID: '95b60a36-f350-8a2b-b1cb-54f7b79657dc'
Node name: 'consul-server'
Datacenter: 'dc1' (Segment: '<all>')
Server: true (Bootstrap: false)
Client Addr: [127.0.0.1] (HTTP: 8500, HTTPS: -1, gRPC: 8502, DNS: 8600)
Cluster Addr: 172.20.20.31 (LAN: 8301, WAN: 8302)
Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false, Auto-Encrypt-TLS: false

==> Log data will now stream in as it occurs:
...
2020-06-19T23:59:25.729Z [INFO] agent.server: New leader elected: payload=consul-server
...

[ ] I changed the Vagrantfile a bit. I suspect the routing table is the problem; from the MacOS host I cannot reach the virtual machines on the private network via their private IPs: https://stackoverflow.com/questions/23497855/unable-to-connect-to-vagrant-private-network-from-host

So I added a ui VM to expose the Consul UI with port forwarding, but it still does not work; from the log, port 8500 is bound to 127.0.0.1:

Client Addr: [127.0.0.1] (HTTP: 8500, HTTPS: -1, gRPC: -1, DNS: 8600)
Cluster Addr: 172.20.20.41 (LAN: 8301, WAN: 8302)

First, I thought of changing Client Addr to 172.20.20.41, because that is the private IP I set in the Vagrantfile:

consul agent -config-file /vagrant/ui.consul.json -advertise 172.20.20.41 -client 172.20.20.41

But it still did not work; localhost:8500 on the host could not connect. To confirm the -client flag was used correctly, I checked with netstat whether the port was listening on that interface. Then I realized it was probably an iptables issue: traffic on that interface was not being forwarded, so I just changed it to 0.0.0.0 (which specifies "any IPv4 address at all"):

# /vagrant/ui.consul.json set ui is true
consul agent -config-file /vagrant/ui.consul.json -advertise 172.20.20.41 -client 0.0.0.0

# output
==> Starting Consul agent...
Version: 'v1.8.0'
Node ID: '10ccbe63-bef0-3cf6-b24b-e0a53bdef213'
Node name: 'ui'
Datacenter: 'dc1' (Segment: '')
Server: false (Bootstrap: false)
Client Addr: [0.0.0.0] (HTTP: 8500, HTTPS: -1, gRPC: -1, DNS: 8600)
Cluster Addr: 172.20.20.41 (LAN: 8301, WAN: 8302)
Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false, Auto-Encrypt-TLS: false

==> Log data will now stream in as it occurs:
...
==> Consul agent running!
...
2020-06-20T03:38:12.476Z [INFO] agent: (LAN) joining: lan_addresses=[172.20.20.31]
2020-06-20T03:38:12.477Z [WARN] agent.client.manager: No servers available
2020-06-20T03:38:12.477Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No known Consul servers"
2020-06-20T03:38:12.480Z [INFO] agent.client.serf.lan: serf: EventMemberJoin: consul-server 172.20.20.31
2020-06-20T03:38:12.480Z [INFO] agent: (LAN) joined: number_of_nodes=1
...

Alternatively, client_addr can be defined in the config JSON:

{
"retry_join": ["172.20.20.31"],
"data_dir": "/tmp/consul",
"client_addr": "0.0.0.0"
}

Although the web UI is exposed through the ui virtual machine, all the information comes from the Consul server! This is similar to the K8s NodePort pattern. It can also be accessed via the HTTP API: https://www.consul.io/api-docs

http://localhost:8500/v1/catalog/nodes
# format readable
http://localhost:8500/v1/catalog/nodes?pretty

DNS queries: go to the ui node; when we run the consul agent, the DNS port is 8600:

# query node
dig @localhost -p 8600 consul-server.node.consul
# query service
dig @localhost -p 8600 consul.service.consul
# query service record, will show you the server port, such as 8300
dig @localhost -p 8600 consul.service.consul SRV

The RPC Protocol is deprecated and support was removed in Consul 0.8. Please use the HTTP API, which has support for all features of the RPC Protocol.

Consul Commands

Two useful commands are mentioned here; they were originally implemented over RPC, but that has changed:

# can specify target point
# provide debug info
consul info [-http-addr=172.20.20.31:8500]
# get log messages; this way you can view any other agent's log from one agent
consul monitor [-http-addr=172.20.20.31:8500]

Here 172.20.20.31 is the Consul server; you must start it with -client 0.0.0.0, otherwise the port is bound to the loopback interface and cannot be accessed.

Other commands:

# maintain node
# enable maintenance; the service will not show in Consul DNS
# -service: maintain for a specific service
consul maint -enable -reason "Because..."
consul maint
consul maint -disable

# validate config file
# the config file must be complete! it cannot be split into several parts!
consul validate [config file]

# show members
consul members

# similar to docker/k8s exec
consul exec uptime

Note that consul exec is disabled by default: https://www.consul.io/docs/agent/options.html#disable_remote_exec This command is quite dangerous; it is equivalent to SSHing to a node and running a command line. For example, if a node serves traffic from a Docker container, you could exec to the node and docker stop xxx.

BTW, gracefully exiting the consul process will not cause a warning or error in the UI display. If you force-kill it, the node will be marked as critical.

Service Discovery

One way to register a service with Consul is to use a service definition: https://www.consul.io/docs/agent/services For example, register the LB service with Consul; the benefit, as mentioned earlier, is that Consul will promptly update the HAProxy config based on what the other agents report about the web nginx instances, as we will see next:

Registering a service does not mean the service is healthy; we also need a health check. For example:

{
"service": {
"name": "web",
"port": 8080,
"check": {
"http": "http://localhost:8080",
"interval": "10s"
}
}
}

Then launch the Consul client agent with one more config file, web.service.json, for service registration; for example, on the web1 node:

consul agent -config-file /vagrant/common.json \
-advertise 172.20.20.21 \
-config-file /vagrant/web.service.json

Then check the Consul UI: you will see the node is good but the service is unhealthy because no nginx is running yet, so create nginx on the web1 node:

/vagrant/setup.web.sh

Then refresh the web page, everything is good.

You can dig the web service from the ui node; this is so-called internal service discovery, not facing the public. For the LB this data can be used to direct traffic; that is the benefit of Consul's built-in DNS: there is no extra setup and it also provides health checks, which is very convenient. The public-facing LB is also registered in Consul, so if the LB goes down it is detected immediately.

dig @localhost -p 8600 web.service.consul SRV
# you will see exactly the number of web service running

Besides querying DNS with dig, the Consul HTTP API can also do it:

# services list
curl http://localhost:8500/v1/catalog/services?pretty
# service web detail
curl http://localhost:8500/v1/catalog/service/web?pretty
# health check
# see the Status field: passing or critical
curl http://localhost:8500/v1/health/service/web?pretty

Earlier we used a service definition to register the service; that is just one approach, and you can also register via the HTTP API. There are also some tools for automatic registration: https://www.consul.io/downloads_tools
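
A minimal sketch of registering via the HTTP API (the agent endpoint takes a flat payload, without the outer "service" wrapper used in the definition file; the values mirror the web example above):

curl -X PUT http://localhost:8500/v1/agent/service/register \
  -d '{"Name": "web", "Port": 8080, "Check": {"HTTP": "http://localhost:8080", "Interval": "10s"}}'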

LB Dynamic Config

HAProxy: The Reliable, High Performance TCP/HTTP Load Balancer. HAProxy config file haproxy.cfg example:

global
maxconn 4096

defaults
mode http
timeout connect 5s
timeout client 50s
timeout server 50s

listen http-in
bind *:80
server web1 172.20.20.21:8080
server web2 172.20.20.22:8080

Port 8080 is where the nginx web service comes from; bind *:80 exposes the port the outside world uses to reach the backend web servers through the LB. That is also why the LB health check output in Consul is, surprisingly, welcome to Nginx!, because that is the page returned by the backend.

In the demo we run the HAProxy container in the lb machine. How to verify it is up and running? On any machine:

dig @localhost -p 8600 lb.service.consul SRV
# the lb record will show

Now let’s verify LB is actually working:

# try several times, LB will cycling through backend servers
# you will see different ip returned
curl http://localhost/ip.html

If you shut down one web server at this point, and HAProxy's own health check is not enabled, requests are still sent to the dead server and the user gets a 503 error. This is a problem with many LBs: they need their own health checks configured. But with Consul's DNS, since each server's health check is already integrated, Consul returns only healthy servers. So we can feed information to the LB from Consul dynamically.

Consul Template

Consul Template uses the Go template format: https://github.com/hashicorp/consul-template This is not only for configuring the LB; any application with a config file can utilize this tool!

Workflow: consul-template listens for changes from Consul; as changes occur they are pushed to the consul-template daemon (running in the lb machine). The consul-template daemon generates a new HAProxy config file from a template, and then we tell Docker to restart HAProxy (or have HAProxy reload its config).

This is the haproxy.ctmpl file

global
maxconn 4096

defaults
mode http
timeout connect 5s
timeout client 50s
timeout server 50s

listen http-in
bind *:80{{range service "web"}}
server {{.Node}} {{.Address}}:{{.Port}}{{end}}

stats enable
stats uri /haproxy
stats refresh 5s

This part makes HAProxy display its statistics report on the web! The statistics page is quite intuitive, but I cannot see it here because of the routing issue; access it from http://<Load balancer IP>/haproxy:

stats enable
stats uri /haproxy
stats refresh 5s

Next, install consul-template on the lb machine and run some tests with the template file:

# dry run
consul-template -template /vagrant/provision/haproxy.ctmpl -dry

Meanwhile, go to the web1 machine and run docker stop/start web; you will see the real-time updates in the output of the consul-template command above.

Then create the consul-template config file lb.consul-template.hcl, which tells consul-template how to do its job.

consul-template -config /vagrant/provision/lb.consul-template.hcl
# you will see the haproxy.cfg is replaced by new one

Then we can run the daemon in the background on the lb machine:

(consul-template -config /vagrant/provision/lb.consul-template.hcl >/dev/null 2>&1)&

Open the Consul UI, then in a terminal go to the web1 or web2 machine, stop/start the Docker container, and watch the updates. Also, on the lb machine, run the command below to see that the LB still works well; it will not send you to the unhealthy server:

curl http://localhost/ip.html

Other tools

  • Envconsul: Envconsul provides a convenient way to launch a subprocess with environment variables populated from HashiCorp Consul and Vault. Earlier we generated config files for a process; here Envconsul sets environment variables for the process and launches it for us.

  • confd confd is a lightweight configuration management tool

  • fabio fabio is a fast, modern, zero-conf load balancing HTTP(S) and TCP router for deploying applications managed by consul

Reactive Configuration

One of the primary use cases is updating app configuration: for example, when services change, inject the changes into Consul key/value pairs and have them pushed into our application.

Note that the key/value store should not be used as a database; it is not intended for that! But it works almost exactly like etcd! https://etcd.io/

Go to Consul UI to add key/value pairs, create a folder path /prod/portal/haproxy, then create key/value pair in it:

maxconn 2048
stats enable
timeout-client 50s
timeout-connect 5s
timeout-server 50s

SSH to the ui node and read the stored key/values:

# list all pairs
curl http://localhost:8500/v1/kv/?recurse'&'pretty

# add key/value via HTTP API
# /prod/portal/haproxy is path we created before
curl -X PUT -d '50s' http://localhost:8500/v1/kv/prod/portal/haproxy/timeout-server
# delete
curl -X DELETE http://localhost:8500/v1/kv/prod/portal/haproxy/timeout-server
# get one
curl -X GET http://localhost:8500/v1/kv/prod/portal/haproxy/timeout-server?pretty
curl -X GET http://localhost:8500/v1/kv/prod/portal/haproxy/timeout-server?raw

The API will return JSON data, you can use jq to parse it.
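
The same operations are also available through the consul kv CLI, for example:

# write, read and delete the same key
consul kv put prod/portal/haproxy/timeout-server 50s
consul kv get prod/portal/haproxy/timeout-server
consul kv delete prod/portal/haproxy/timeout-server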

Update the LB config template haproxy.ctmpl as:

global
maxconn {{key "prod/portal/haproxy/maxconn"}}

defaults
mode http
timeout connect {{key "prod/portal/haproxy/timeout-connect"}}
timeout client {{key "prod/portal/haproxy/timeout-client"}}
timeout server {{key "prod/portal/haproxy/timeout-server"}}

listen http-in
bind *:80{{range service "web"}}
server {{.Node}} {{.Address}}:{{.Port}}{{end}}

stats {{key "prod/portal/haproxy/stats"}}
stats uri /haproxy
stats refresh 5s

Then make the consul-template process reload its configuration without killing it:

1
2
# the HUP signal makes consul-template reload
killall -HUP consul-template

Then you will see the haproxy.cfg file is regenerated!

Let's talk about why this key/value setup is so important: sometimes you do not know the right parameter values in advance. In production you may want to tune parameters in real time, for example maxconn in the LB here; in practice you may have to lower it because of the machine's CPU, memory, and other constraints. You could adjust it with consul maint or other means, but that would be a pain and the change would take time to converge across the infrastructure.

Using the key/value store really gives you reactive configuration!

Blocking query

https://www.consul.io/api-docs/features/blocking

A blocking query is used to wait for a potential change using long polling. Not all endpoints support blocking, but each endpoint uniquely documents its support for blocking queries in the documentation.

Endpoints that support blocking queries return an HTTP header named X-Consul-Index. This is a unique identifier representing the current state of the requested resource.

Use curl -v to check HEADER info to see if it has X-Consul-Index.

This feature can be used, for example, by your own app to long-poll the Consul API and wait for changes, listening to Consul reactively. This saves far more resources than periodic polling. For example:

1
curl -v http://localhost:8500/v1/kv/prod/portal/haproxy/stats?index=<X-Consul-Index value in header>'&'wait=40s

Whenever a change occurs, the X-Consul-Index value changes.
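A minimal long-polling sequence in shell might look like this (a sketch; it assumes the stats key created earlier and reads the index from the headers of a plain GET):

## fetch the current X-Consul-Index from the response headers
INDEX=$(curl -s -o /dev/null -D - http://localhost:8500/v1/kv/prod/portal/haproxy/stats \
  | grep -i 'x-consul-index' | awk '{print $2}' | tr -d '\r')

## this call blocks for up to 40s and returns early only if the key changes
curl -s "http://localhost:8500/v1/kv/prod/portal/haproxy/stats?index=${INDEX}&wait=40s"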

Health Check

Gossip pool via Serf and edge-triggered updates, peer to peer. Serf: https://www.serfdom.io/ (in the UI, each node shows a Serf health status).

If you kill and restart the Consul agent on one node, you will see log entries something like:

1
2
serf: EventMemberFailed ...
serf: EventMemberJoin ...

There are LAN gossip and WAN gossip.

Information disseminated:

  • Membership (discovery, joining) - joining the cluster entails only knowing the address of one other node (not required to be a server)
  • Failure detection - affords distributed health checks, no need for centralized health checking
  • Event broadcast - i.e. leader elected, custom events
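You can inspect the gossip pool membership from any agent with the standard Consul CLI, as a quick check of what the list above describes:

## LAN gossip pool as seen by the local agent
consul members
## WAN gossip pool (server nodes across datacenters)
consul members -wan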

System-Level Check

Very similar to a Kubernetes liveness probe.

https://www.consul.io/docs/agent/checks.html One of the primary roles of the agent is management of system-level and application-level health checks. A health check is considered to be application-level if it is associated with a service. If not associated with a service, the check monitors the health of the entire node.

The previous sections used service checks; here we add node status checks, for example disk usage, memory usage, etc.

Update the common.json config file; it takes effect on the lb and web machines. Note that this part of the configuration has changed in recent Consul versions:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
{
"retry_join": [
"172.20.20.31"
],
"data_dir": "/tmp/consul",
"client_addr": "0.0.0.0",
"enable_script_checks": true,
"checks": [
{
"id": "check_cpu_utilization",
"name": "CPU Utilization",
"args": ["/vagrant/provision/hc/cpu_utilization.sh"],
"interval": "10s"
},
{
"id": "check_mem_utilization",
"name": "MEM Utilization",
"args": ["/vagrant/provision/hc/mem_utilization.sh"],
"interval": "10s"
},
{
"id": "check_hdd_utilization",
"name": "HDD Utilization",
"args": ["/vagrant/provision/hc/hdd_utilization.sh"],
"interval": "10s"
}
]
}

Let’s see the mem_utilization.sh file:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
AVAILABLE_RAM=`grep MemAvailable /proc/meminfo | awk '{print $2}'`
TOTAL_RAM=`grep MemTotal /proc/meminfo | awk '{print $2}'`
RAM_UTILIZATION=$(echo "scale = 2; 100-$AVAILABLE_RAM/$TOTAL_RAM*100" | bc)
RAM_UTILIZATION=${RAM_UTILIZATION%.*}

echo "RAM: ${RAM_UTILIZATION}%, ${AVAILABLE_RAM} available of ${TOTAL_RAM} total "

if (( $RAM_UTILIZATION > 95 ));
then
exit 2
fi

if (( $RAM_UTILIZATION > 70 ));
then
exit 1
fi

exit 0
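
Consul interprets a script check's exit code as 0 = passing, 1 = warning, and any other value (such as the 2 above) = critical. The hdd_utilization.sh check could follow the same pattern; here is a sketch (the thresholds and the choice of the root filesystem are assumptions, not taken from the demo code):

#!/bin/bash
# report and threshold-check root filesystem usage
HDD_UTILIZATION=$(df --output=pcent / | tail -1 | tr -d ' %')

echo "HDD: ${HDD_UTILIZATION}% of / used"

if (( HDD_UTILIZATION > 95 )); then
  exit 2   # critical
fi

if (( HDD_UTILIZATION > 70 )); then
  exit 1   # warning
fi

exit 0     # passing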

The system-level health checks will be displayed in the Consul UI. For a stress test, install the stress tool on the web1 machine (in the demo code it is already added):

1
2
# install
sudo apt-get install stress

Run a CPU stress test; you will then see in the Consul UI that the node is unhealthy and has been cycled out of the LB:

1
stress -c 1

Watching the Consul UI for web1, you will see the CPU check fail:

1
2
3
4
5
6
7
CPU: 100%
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
vagrant 3122 97.2 0.0 7312 100 pts/0 R+ 21:26 0:14 stress -c 1
root 822 1.4 10.7 578220 53828 ? Ssl 21:19 0:06 /usr/bin/docker daemon --raw-logs
vagrant 2121 0.6 11.1 785204 55636 ? Sl 21:20 0:02 consul agent -config-file /vagrant/config/common.json -config-file /vagrant/config/web.service.json -advertise 172.20.20.21
vagrant 3099 0.2 1.1 23012 5724 pts/0 Ss 21:26 0:00 -bash
root 1 0.1 0.7 33604 3792 ? Ss 21:19 0:00 /sbin/init

Once it recovers, the node puts itself back into the pool. This feature is very useful for getting early warning about nodes that may run into trouble: for example, if a web server is overloaded and detected as unhealthy, it is removed from the LB, and once it recovers it is automatically added back.

Recently I have been working on Kubernetes Operators, using Go to implement the operator logic. After getting a brief understanding of Go's values and philosophy, along with its basic syntax and structure, this book is my next step.

There are additional, comprehensive resources on Go web site: https://go.dev/

Two other Chinese Go programming books also look good:

Today I ran into a problem: how to pass a variable produced in one stage to another stage. One solution is to use a global variable, for example, in a declarative pipeline:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
// variable to be used
def jobBaseName

stage ('Construct Img name') {
// this is from scripted pipeline syntax
jobBaseName = sh(
script: "echo ${BUILD_TAG} | awk '{print tolower($0)}' | sed 's/jenkins-//'",
returnStdout: true
)
}

stage ('Build Target Container') {
sh "ssh -i ~/ssh_keys/key.key user@somehost 'cd /dockerdata/build/${BUILD_TAG} && docker build -t localrepo/${jobBaseName}:${BUILD_NUMBER} .'"
}

Others pass the variable by writing it to a file in one stage and reading it back in another, but this requires that the stages run on the same node agent.

In addition, regarding environment variables and build parameters in Jenkins, there are a few things to note: https://stackoverflow.com/questions/50398334/what-is-the-relationship-between-environment-and-parameters-in-jenkinsfile-param

Basically it works as follows:

  • env contains all environment variables, for example: env.BUILD_NUMBER
  • Jenkins pipeline automatically creates a global variable for each environment variable
  • params contains all build parameters, for example: params.WKC_BUILD_NUMBER
  • Jenkins also automatically creates an environment variable for each build parameter (and as a consequence of second point a global variable).

Environment variables can be overridden or unset (via Groovy script block) but params is an immutable Map and cannot be changed. Best practice is to always use params when you need to get a build parameter.

Where does this information come from? When configuring the pipeline, check Pipeline Syntax -> Global Variables Reference.

The pandemic has been getting serious recently and I have been working from home. With everyone driving less, my insurance company refunded $11 of my car premium last month. I feel the pandemic is unlikely to end this year and WFH will continue for a long time; these days I only go out to buy groceries, so I wanted to switch to cheaper insurance.

My current insurer is Progressive; before that it was Farmers. I have always driven carefully, so I have never had an accident.

Here are some articles introducing car insurance in the US: https://www.bangli.us/post/3842 https://www.guruin.com/guides/car-insurance https://www.dealmoon.com/guide/773024

I eventually switched to GEICO, which cut the price nearly in half. I applied directly in the GEICO app, and renewing and reviewing the policy there is also convenient.

I have started studying Operators recently, and since I happen to have this book, I planned to go through it quickly as a quick-start recap. After reading it once, my strongest impression is that if K8s is the operating system of the cloud, then an Operator is a cloud application's own management tool, the "application SRE" the book talks about.

Book accompanying git repo: https://github.com/chengdol/chapters/ (this is forked from the original). A few more recommended books from O'Reilly:

  • Programming Kubernetes (dive deeper into API)
  • Extending Kubernetes

My K8s operator-sdk demo git repo, a step-by-step guide to setting up a Go-based operator and deploying it in a K8s cluster: https://github.com/chengdol/k8s-operator-sdk-demo

Some other resources are collected in this blog post: Kubernetes Operator Learning

Chapter 1 Introduction

Operators grew out of work at CoreOS during 2015 and 2016, and the experience of building Operators there carries on at Red Hat.

An Operator continues to monitor its application as it runs, and can back up data, recover from failures, and upgrade the application over time, automatically.

An Operator is a custom Kubernetes controller watching a CR type and taking application-specific actions to make reality match the spec in that resource.

Making an Operator means creating a CRD and providing a program that runs in a loop watching CRs of that kind.

The Operator pattern arose in response to infrastructure engineers and developers wanting to extend Kubernetes to provide features specific to their sites and software.

Reading this raised a question for me: how do Helm and Operators relate?

Chapter 2 Running Operators

This is the most basic Operator demo, an etcd cluster, and it is very instructive. Note the creation order of each resource in the example below: https://github.com/kubernetes-operators-book/chapters/tree/master/ch03

First, you need cluster-wide privileges:

1
2
3
4
5
6
7
8
9
10
11
12
## need cluster wide privilege
kubectl describe clusterrole cluster-admin

## good
Name: cluster-admin
Labels: kubernetes.io/bootstrapping=rbac-defaults
Annotations: rbac.authorization.kubernetes.io/autoupdate: true
PolicyRule:
Resources Non-Resource URLs Resource Names Verbs
--------- ----------------- -------------- -----
*.* [] [] [*]
[*] [] [*]

Start with etcd as the 'hello world' example. From the book:

you’ll deploy the etcd Operator, then have it create an etcd cluster according to your specifications. You will have the Operator recover from failures and perform a version upgrade while the etcd API continues to service read and write requests, showing how an Operator automates the lifecycle of a piece of foundation software.

A CRD is akin to a schema for a CR, defining the CR’s fields and the types of values those fields contain:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
name: etcdclusters.etcd.database.coreos.com
spec:
group: etcd.database.coreos.com
names:
kind: EtcdCluster
listKind: EtcdClusterList
plural: etcdclusters
shortNames:
- etcdclus
- etcd
singular: etcdcluster
scope: Namespaced
version: v1beta2
versions:
- name: v1beta2
served: true
storage: true

The CR’s group, version, and kind together form the fully qualified name of a Kubernetes resource type. That canonical name must be unique across a cluster.

Defining an Operator Service Account:

1
2
3
4
apiVersion: v1
kind: ServiceAccount
metadata:
name: etcd-operator-sa

Defining role:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: etcd-operator-role
rules:
- apiGroups:
- etcd.database.coreos.com
resources:
- etcdclusters
- etcdbackups
- etcdrestores
verbs:
- '*'
- apiGroups:
- ""
resources:
- pods
- services
- endpoints
- persistentvolumeclaims
- events
verbs:
- '*'
- apiGroups:
- apps
resources:
- deployments
verbs:
- '*'
- apiGroups:
- ""
resources:
- secrets
verbs:
- get

Defining the RoleBinding, which assigns the role to the service account for the etcd Operator:

1
2
3
4
5
6
7
8
9
10
11
12
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: etcd-operator-rolebinding
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: etcd-operator-role
subjects:
- kind: ServiceAccount
name: etcd-operator-sa
namespace: default

The Operator is a custom controller running in a pod, and it watches the EtcdCluster CR you defined earlier.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
apiVersion: apps/v1
kind: Deployment
metadata:
name: etcd-operator
spec:
selector:
matchLabels:
app: etcd-operator
replicas: 1
template:
metadata:
labels:
app: etcd-operator
spec:
containers:
- name: etcd-operator
image: quay.io/coreos/etcd-operator:v0.9.4
command:
- etcd-operator
- --create-crd=false
env:
- name: MY_POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
- name: MY_POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
imagePullPolicy: IfNotPresent
serviceAccountName: etcd-operator-sa

Declaring an etcd cluster:

1
2
3
4
5
6
7
apiVersion: etcd.database.coreos.com/v1beta2
kind: EtcdCluster
metadata:
name: example-etcd-cluster
spec:
size: 3
version: 3.1.10

After creating the CR, the Operator generates the 3 replica pods (the pod definition is written by the Operator's logic).

This example etcd cluster is a first-class citizen, an EtcdCluster in your cluster’s API. Since it’s an API resource, you can get the etcd cluster spec and status directly from Kubernetes.

1
2
## etcdcluster is a resource just like pod/deploy/sts
kubectl describe etcdcluster example-etcd-cluster

The etcd Operator creates a Kubernetes service in the etcd cluster’s namespace:

1
kubectl get services --selector etcd_cluster=example-etcd-cluster

Run the etcd client on the cluster and use it to connect to the client service and interact with the etcd API.

1
kubectl run --rm -i --tty etcdctl --image quay.io/coreos/etcd --restart=Never -- /bin/sh

From the etcd container’s shell, create and read a key-value pair in etcd with etcdctl’s put and get verbs:

1
2
3
4
5
6
7
export ETCDCTL_API=3
export ETCDCSVC=http://example-etcd-cluster-client:2379
etcdctl --endpoints $ETCDCSVC put foo bar
etcdctl --endpoints $ETCDCSVC get foo

## check etcd cluster general health
etcdctl --endpoints http://example-etcd-cluster-client:2379 cluster-health

You can try deleting an etcd pod or upgrading the version (edit the CR file, then apply it) and watch the Operator recover the cluster's health.
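For example, to exercise the recovery path (a sketch: the member pod name is a placeholder, and it assumes the Operator labels member pods with etcd_cluster=example-etcd-cluster, as it does the service):

## list the member pods the Operator created
kubectl get pods -l etcd_cluster=example-etcd-cluster
## delete one member by name (the actual name will differ in your cluster)
kubectl delete pod example-etcd-cluster-<member-id>
## watch the Operator restore the cluster back to 3 members
kubectl get pods -l etcd_cluster=example-etcd-cluster -w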

A kubectl trick for the upgrade:

1
2
kubectl patch etcdcluster example-etcd-cluster --type='json' \
-p '[{"op": "replace", "path": "/spec/version", "value":"3.3.12"}]'

Chapter 3 Operators at the Kubernetes Interface

Operators extend two key Kubernetes concepts: resources and controllers. The Kubernetes API includes a mechanism, the CRD, for defining new resources.

These two passages make the difference between a generic controller and an Operator clear:

The actions the ReplicaSet controller takes are intentionally general and application agnostic. It does not, should not, and truly cannot know the particulars of startup and shutdown sequences for every application that might run on a Kubernetes cluster.

An Operator is the application-specific combination of CRs and a custom controller that does know all the details about starting, scaling, recovering, and managing its application.

Every Operator has one or more custom controllers implementing its application-specific management logic.

An Operator, in turn, can be limited to a namespace, or it can maintain its operand across an entire cluster.

For example, cluster-scoped Operators:

Istio operator: https://github.com/istio/operator cert-manager: https://github.com/jetstack/cert-manager

A service account is a special type of cluster user for authorizing programs instead of people. An Operator is a program that uses the Kubernetes API, and most Operators should derive their access rights from a service account.

Chapter 4 The Operator Framework

This chapter introduced the three pillars of the Operator Framework: the Operator SDK for building and developing Operators; Operator Lifecycle Manager for distributing, installing, and upgrading them; and Operator Metering for measuring Operator performance and resource consumption.

The Red Hat Operator Framework makes it simpler to create and distribute Operators. It makes building Operators easier with a software development kit (SDK) that automates much of the repetitive implementation work. The Framework also provides mechanisms for deploying and managing Operators. Operator Lifecycle Manager (OLM) is an Operator that installs, manages, and upgrades other Operators. Operator Metering is a metrics system that accounts for Operators’ use of cluster resources.

Operator SDK: https://github.com/operator-framework/operator-sdk The SDK currently includes first-class support for constructing Operators in the Go programming language, with support for other languages planned. The SDK also offers what might be described as an adapter architecture for Helm charts or Ansible playbooks.

Operator Lifecycle Manager takes the Operator pattern one level up the stack: it’s an Operator that acquires, deploys, and manages Operators on a Kubernetes cluster.

Operator Metering is a system for analyzing the resource usage of the Operators running on Kubernetes clusters.

Install the Operator SDK: https://sdk.operatorframework.io/docs/install-operator-sdk/ Check whether your K8s version is compatible with the current Operator SDK. For example, when I experimented, my K8s version was 1.13.2, which supports CRD API version apiextensions.k8s.io/v1beta1, while recent Operator SDK releases generate CRDs with API version apiextensions.k8s.io/v1. The book uses Operator SDK version 0.11.0.
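A quick way to record both sides of that compatibility check before generating a project (standard commands; the output format varies by version):

## client and cluster versions
kubectl version --short
## SDK version in use
operator-sdk version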

Chapter 5 Sample Application: Visitors Site

In the chapters that follow, we’ll create Operators to deploy this application using each of the approaches provided by the Operator SDK (Helm, Ansible, and Go), and explore the benefits and drawbacks of each.

Reading this, I wondered how Helm handles this problem, especially for dependencies within the same chart: "When deploying applications through manifests, awareness of these relationships is required to ensure that the values line up."

The manifest-based installation for this demo: https://github.com/kubernetes-operators-book/chapters/tree/master/ch05 Now deploy it manually in the correct order:

1
2
3
kubectl create -f database.yaml
kubectl create -f backend.yaml
kubectl create -f frontend.yaml

Deletion:

1
2
3
kubectl delete -f database.yaml
kubectl delete -f backend.yaml
kubectl delete -f frontend.yaml

Chapter 6 Adapter Operators

You would have to create CRDs to specify the interface for end users. Kubernetes controllers would not only need to be written with the Operator’s domain-specific logic, but also be correctly hooked into a running cluster to receive the proper notifications. Roles and service accounts would need to be created to permit the Operator to function in the capacity it needs. An Operator is run as a pod inside of a cluster, so an image would need to be built, along with its accompanying deployment manifest.

This chapter is mainly about building Adapter Operators on top of existing Helm charts or Ansible playbooks: The Operator SDK provides a solution to both these problems through its Adapter Operators. Through the command-line tool, the SDK generates the code necessary to run technologies such as Helm and Ansible in an Operator.

First understand the role of CRDs.

  • A CRD is the specification of what constitutes a CR. In particular, the CRD defines the allowed configuration values and the expected output that describes the current state of the resource.
  • A CRD is created when a new Operator project is generated by the SDK.
  • The SDK prompts the user for two pieces of information about the CRD during project creation: kind, api-version

Official operator SDK sample: https://github.com/operator-framework/operator-sdk-samples

Helm Operator

demo git repo to generate helm operator: https://github.com/kubernetes-operators-book/chapters/tree/master/ch06/visitors-helm

A Helm Operator can deploy each instance of an application with a different version of values.yaml. The Operator SDK generates Kubernetes controller code for a Helm Operator when it is passed the --type=helm argument. As a prerequisite, be sure to install the Helm command-line tools on your machine.

New Chart

Generate a blank helm chart structure within the operator project code:

1
2
3
OPERATOR_NAME=visitors-helm-operator
operator-sdk new $OPERATOR_NAME --api-version=example.com/v1 --kind=VisitorsApp --type=helm

At this point, everything is in place to begin to implement your chart.

Several directories are created:

  • build: it contains Dockerfile for operator image
  • deploy: crds definition, role and rolebinding, service account
  • helm-charts: helm chart structure for your app
  • watches.yaml: maps each CR type to the specific Helm chart that is used to handle it.

Existing Chart

The helm install command actually has many parameters for customization, such as choosing the values YAML file, but here it is not that flexible: the default values.yaml is used.

Be sure to validate the templates beforehand, for example with Helm 3:

1
helm template <chart dir or archive file> [--debug] | less

Check that each rendered manifest is correctly formatted; helm template does not report formatting issues as errors.
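Since helm template only renders, one way to catch malformed manifests is to let the API server validate them in dry-run mode (a sketch, assuming access to a cluster; newer kubectl versions use --dry-run=client):

helm template <chart dir or archive file> | kubectl apply --dry-run -f -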

Generate the Helm Operator on top of an existing Helm chart archive. On OpenShift, run oc login first, otherwise operator-sdk cannot get the cluster info:

1
2
3
4
5
OPERATOR_NAME=visitors-helm-operator
## download existing chart archive
wget https://github.com/kubernetes-operators-book/chapters/releases/download/1.0.0/visitors-helm.tgz
## generate helm operator
operator-sdk new $OPERATOR_NAME --api-version=example.com/v1 --kind=VisitorsApp --type=helm --helm-chart=./visitors-helm.tgz
  • --helm-chart: a URL to a chart archive, the repository and name of a remote chart, or the location of a local directory
  • --helm-chart-repo: Specifies a remote repository URL for the chart
  • --helm-chart-version: Tells the SDK to fetch a specific version of the chart. If this is omitted, the latest available version is used.

You will see that deploy/crds/example.com_v1_visitorsapp_cr.yaml has exactly the same fields as values.yaml in the Helm chart.

Before running the chart, the Operator will map the values found in the custom resource’s spec field to the values.yaml file.

The generated CRD and role (which is extremely permissive) can be used directly, but they may not satisfy specific requirements such as validation constraints and permission restrictions, so adjust them yourself.

Ansible Operator

More or less the same as the Helm Operator generation. Generate a blank Ansible Operator project:

1
2
OPERATOR_NAME=visitors-ansible-operator
operator-sdk new $OPERATOR_NAME --api-version=example.com/v1 --kind=VisitorsApp --type=ansible

Test Operator

An Operator is delivered as a normal container image. However, during the development and testing cycle, it is often easier to skip the image creation process and simply run the Operator outside of the cluster. This is used during development and testing; it does not deploy a real Operator deployment, only a local process, but it behaves the same as the real thing. This applies only to the Helm and Ansible types.

1
2
3
4
5
6
## go to root path of operator project
## set the full path to the chart in the `chart` field of the copy
cp watches.yaml local-watches.yaml
kubectl apply -f deploy/crds/*_crd.yaml
## start operator process
operator-sdk up local --watches-file ./local-watches.yaml

The process is up and running; next, apply your CR yaml:

1
2
kubectl apply -f deploy/crds/*_cr.yaml
kubectl delete -f deploy/crds/*_cr.yaml

You will see the logs change, as well as the application being updated in the K8s cluster. Once the test is complete, end the running process by pressing Ctrl-C.

During development, repeat this process to test changes. On each iteration, be sure to restart the Operator process to pick up any changes to the Helm or Ansible files.

Deploy Operator

Running an Operator outside of the cluster, is convenient for testing and debugging purposes, but production Operators run as Kubernetes deployments.

  1. Build the operator image. The Operator SDK’s build command chains to the underlying Docker daemon to build the Operator image, and takes the full image name and version when run:
1
operator-sdk build jdob/visitors-operator:0.1

You can check the Dockerfile; no additional changes are needed. The ${HOME} is consistent with the path in watches.yaml.

Once built, push the image to an externally accessible repository.

  2. Configure the deployment. Update the deploy/operator.yaml file that the SDK generates with the name of the image.

  3. Deploy the CRD.

  4. Deploy the service account and role.

  5. Deploy the Operator deployment.

Chapter 7 Operators in Go with the Operator SDK

The reference code for this chapter does not put all the logic into one file; instead it is split per resource, with the shared parts factored out separately, which makes it a very useful reference. I summarized the implementation here: https://github.com/chengdol/k8s-operator-sdk-demo

The Operator SDK provides that flexibility by making it easy for developers to use the Go programming language, including its ecosystem of external libraries, in their Operators, and to write the actual business logic of the Operator.

While you can write all these pieces manually, the Operator SDK provides commands that will automate the creation of much of the supporting code, allowing you to focus on implementing the actual business logic of the Operator.

We will explore the files that need to be edited with custom application logic and discuss some common practices for Operator development.

Create Go Based Operator

The book's description of how to create a Go-based operator from the command line is unclear, so I referred to the Red Hat documentation:

This statement is too vague: "In particular, the Operator code must be located in your $GOPATH." The key question is how to set $GOPATH:

If you run go env | grep GOPATH, you will see it already has the default value $HOME/go, but you still need to export it in the bash environment:

1
2
3
4
5
6
7
8
9
10
11
export GOPATH=$HOME/go
export PATH=$PATH:$GOPATH/bin
#export GO111MODULE=on

OPERATOR_NAME=visitors-operator
## this path must match the import path used later in the controller!
OPERATOR_PATH=$GOPATH/src/github.com/jdob
mkdir -p $OPERATOR_PATH
cd $OPERATOR_PATH
## no --type specified, default is go
operator-sdk new $OPERATOR_NAME

When operator-sdk new printed an error, I exported GO111MODULE=on, but after redoing the steps the error disappeared.

The generation can take a few minutes as all of the Go dependencies are downloaded.

Add CRDs

You can add new CRDs to an Operator using the SDK's add api command. Run it from the Operator project root directory to generate the CRD (this implies one Operator can have multiple CRDs):

1
2
3
cd $OPERATOR_PATH/$OPERATOR_NAME
operator-sdk add api --api-version=example.com/v1 --kind=VisitorsApp
## from command outputs, you will see what files are generated

Three files are important:

  • deploy/crds/*cr.yaml
  • deploy/crds/*crd.yaml
  • pkg/apis/example/v1/visitorsapp_types.go: contains a number of struct objects that the Operator codebase leverages

For example, in pkg/apis/example/v1/visitorsapp_types.go edit the Spec and Status struct:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
// VisitorsAppSpec defines the desired state of VisitorsApp
// +k8s:openapi-gen=true
type VisitorsAppSpec struct {
// INSERT ADDITIONAL SPEC FIELDS - desired state of cluster
// Important: Run "operator-sdk generate k8s" to regenerate code after modifying this file
// Add custom validation using kubebuilder tags: https://book.kubebuilder.io/beyond_basics/generating_crd.html

Size int32 `json:"size"`
Title string `json:"title"`
}

// VisitorsAppStatus defines the observed state of VisitorsApp
// +k8s:openapi-gen=true
type VisitorsAppStatus struct {
// INSERT ADDITIONAL STATUS FIELD - define observed state of cluster
// Important: Run "operator-sdk generate k8s" to regenerate code after modifying this file
// Add custom validation using kubebuilder tags: https://book.kubebuilder.io/beyond_basics/generating_crd.html

BackendImage string `json:"backendImage"`
FrontendImage string `json:"frontendImage"`
}

After editing, run

1
2
## After any change to a *_types.go file, you need to update any generated code
operator-sdk generate k8s

Then customize the deploy/crds/example_v1_visitorsapp_crd.yaml file to reflect the struct content, for example: https://github.com/chengdol/chapters/tree/master/ch07/visitors-operator/deploy/crds

I did not specifically modify the RBAC here; the default Operator permissions are used: https://github.com/chengdol/chapters/tree/master/ch07/visitors-operator/deploy

Write Control Logic

Inside of the Operator pod itself, you need a controller to watch for changes to CRs and react accordingly. Similar to adding a CRD, you use the SDK to generate the controller’s skeleton code.

1
2
## generate controller code skeleton
operator-sdk add controller --api-version=example.com/v1 --kind=VisitorsApp

The file pkg/controller/visitorsapp/visitorsapp_controller.go will be created; this is the controller file that implements the Operator's custom logic.

More information on K8s controller: https://kubernetes.io/docs/concepts/architecture/controller/

There are mainly two functions to customize: add and Reconcile. One sets up the watches, i.e. tells K8s which resources to monitor; the other contains the control logic. While the bulk of the Operator logic resides in the controller's Reconcile function, the add function establishes the watches that will trigger reconcile events: https://github.com/chengdol/chapters/tree/master/ch07/visitors-operator/pkg/controller/visitorsapp

The first watch listens for changes to the primary resource that the controller monitors, i.e. the custom kind. The second watch, or more accurately, series of watches, listens for changes to any child resources the Operator created to support the primary resource, i.e. the other resources created indirectly for the custom kind, such as Deployments, StatefulSets, Services, etc.

Reconcile function The Reconcile function, also known as the reconcile loop, is where the Operator’s logic resides: https://github.com/chengdol/chapters/blob/master/ch07/visitors-operator/pkg/controller/visitorsapp/visitorsapp_controller.go

The Reconcile function returns two objects: a ReconcileResult instance and an error. There are several possibilities:

1
2
3
4
return reconcile.Result{}, nil
return reconcile.Result{}, err
return reconcile.Result{Requeue: true}, nil
return reconcile.Result{RequeueAfter: time.Second*5}, nil

Since Go-based Operators make heavy use of the Go Kubernetes libraries, it may be useful to review: https://pkg.go.dev/k8s.io/api the core/v1 and apps/v1 modules are frequently used to interact with the common Kubernetes resources.

Updating the status value is mentioned here; it corresponds to the status section at the bottom of the resource YAML:

1
2
instance.Status.BackendImage = "example"
err := r.client.Status().Update(context.TODO(), instance)

As I mentioned at the beginning of this chapter's notes, the author splits the logic for different resources into separate Go files; it is worth studying how they are written.

About child resource deletion: if the child resource's owner type is correctly set to the primary resource, then when the parent is deleted, Kubernetes garbage collection automatically cleans up all of its child resources.

It is important to understand that when Kubernetes deletes a resource, it still calls the Reconcile function.

There are times, however, where specific cleanup logic is required. The approach in such instances is to block the deletion of the primary resource through the use of a finalizer. A finalizer is simply a series of strings on a resource; it is essentially just a marker.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
finalizer := "visitors.example.com"

beingDeleted := instance.GetDeletionTimestamp() != nil
if beingDeleted {
if contains(instance.GetFinalizers(), finalizer) {

// Perform finalization logic. If this fails, leave the finalizer
// intact and requeue the reconcile request to attempt the clean
// up again without allowing Kubernetes to actually delete
// the resource.

instance.SetFinalizers(remove(instance.GetFinalizers(), finalizer))
err := r.client.Update(context.TODO(), instance)
if err != nil {
return reconcile.Result{}, err
}
}
return reconcile.Result{}, nil
}

Idempotency

It is critical that Operators are idempotent. Multiple calls to reconcile an unchanged resource must produce the same effect each time.

  1. Before creating child resources, check to see if they already exist. Remember, Kubernetes may call the reconcile loop for a variety of reasons beyond when a user first creates a CR. Your controller should not duplicate the CR’s children on each iteration through the loop.

  2. Changes to a resource’s spec (in other words, its configuration values) trigger the reconcile loop. Therefore, it is often not enough to simply check for the existence of expected child resources. The Operator also needs to verify that the child resource configuration matches what is defined in the parent resource at the time of reconciliation.

  3. Reconciliation is not necessarily called for each change to the resource. It is possible that a single reconciliation may contain multiple changes. The Operator must be careful to ensure the entire state of the CR is represented by all of its child resources.

  4. Just because an Operator does not need to make changes during a reconciliation request doesn’t mean it doesn’t need to update the CR’s Status field. Depending on what values are captured in the CR’s status, it may make sense to update these even if the Operator determines it doesn’t need to make any changes to the existing resources.

Operator Impact

If the Operator incorrectly handles operations, they can negatively affect the performance of the entire cluster.

Test Operator

If the Operator tests reveal errors, the same errors will show up when the built image runs!

The process running the Operator may be outside of the cluster, but Kubernetes will treat it as it does any other controller.

Go to the root project directory:

1
2
3
4
5
6
## deploy CRD
kubectl apply -f deploy/crds/*_crd.yaml
## start operator in local mode
operator-sdk up local --namespace default
## deploy CR
kubectl apply -f deploy/crds/*_cr.yaml

The Operator SDK uses credentials from the kubectl configuration file to connect to the cluster and attach the Operator. The running process acts as if it were an Operator pod running inside of the cluster and writes logging information to standard output.

Chapter 8 Operator Lifecycle Manager

This chapter is fairly conceptual; I recommend reading it several times. OLM git repo: https://github.com/operator-framework/operator-lifecycle-manager

Once you have written an Operator, it's time to turn your attention to its installation and management. As there are multiple steps involved in deploying an Operator, a management layer becomes necessary to facilitate the process. In short, OLM is the thing that manages Operators.

OLM's benefits extend beyond installation into Day 2 operations, including managing upgrades to existing Operators, providing a means to convey Operator stability through version channels, and the ability to aggregate multiple Operator hosting sources into a single interface. OLM ships with OpenShift out of the box but not with plain K8s. OLM itself is implemented via CRDs; on OpenShift, run oc get crd to see the related CRDs.

  1. ClusterServiceVersion You can think of a CSV as analogous to a Linux package, such as a Red Hat Package Manager (RPM) file.

Much like how a deployment describes the “pod template” for the pods it creates, a CSV contains a “deployment template” for the deployment of the Operator pod.

  2. CatalogSource A CatalogSource contains information for accessing a repository of Operators. OLM provides a utility API named packagemanifests for querying catalog sources, which provides a list of Operators and the catalogs in which they are found.
1
kubectl -n olm get packagemanifests
  3. Subscription End users create a subscription to install, and subsequently update, the Operators that OLM provides. A subscription is made to a channel, which is a stream of Operator versions, such as “stable” or “nightly.”

To continue with the earlier analogy to Linux packages, a subscription is equivalent to a command that installs a package, such as yum install.

  4. InstallPlan A subscription creates an InstallPlan, which describes the full list of resources that OLM will create to satisfy the CSV’s resource requirements.

  5. OperatorGroup An Operator belonging to an OperatorGroup will not react to custom resource changes in a namespace not indicated by the group.

Installing OLM

Version v0.11.0; I used K8s v1.13.2, and the most recent OLM releases are no longer compatible with it: https://github.com/operator-framework/operator-lifecycle-manager/releases

1
2
kubectl apply -f https://github.com/operator-framework/operator-lifecycle-manager/releases/download/0.11.0/crds.yaml
kubectl apply -f https://github.com/operator-framework/operator-lifecycle-manager/releases/download/0.11.0/olm.yaml

After applying, the CRDs for OLM are created and the OLM pods are up and running in the olm namespace. OLM can interact with OperatorHub.io, just as Helm works with Helm Hub and Docker with Docker Hub. The book uses an example to show how to deploy the etcd Operator from OperatorHub.

The rest of the chapter mainly covers how to publish your own Operator, which I do not need at the moment.

Chapter 9 Operator Philosophy

Let’s try to connect those tactics to the strategic ideas that underpin them to understand an existential question: what are Operators for?

An Operator reduces human intervention bugs by automating the regular chores that keep its application running. Operators: Kubernetes Application Reliability Engineering

Somewhat thought-provoking: You can build Operators that not only run and upgrade an application, but respond to errors or slowing performance.

Control loops in Kubernetes watch resources and react when they don’t match some desired state. Operators let you customize a control loop for resources that represent your application. The first Operator concerns are usually automatic deployment and self-service provisioning of the operand. Beyond that first level of the maturity model, an Operator should know its application’s critical state and how to repair it. The Operator can then be extended to observe key application metrics and act to tune, repair, or report on them.

Site Reliability Engineering lists the four golden signals as latency, traffic, errors, and saturation.

Highly Successful Operators:

  1. An Operator should run as a single Kubernetes deployment.
  2. Operators should define new custom resource types on the cluster.
  3. Operators should use appropriate Kubernetes abstractions whenever possible.
  4. Operator termination should not affect the operand.
  5. Operators should be thoroughly tested, including chaos testing.

Appendix

Running an Operator as a Deployment Inside a Cluster

Please see my git repo for more details.

1
2
3
## build operator image
## go to project root directory
operator-sdk build image:tag

Then docker push the image to a Docker registry and replace the image placeholder in the operator.yaml file. Then apply the CR yaml.
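A sketch of those steps on the command line. The image name reuses the build example above; the REPLACE_IMAGE placeholder and the deploy/ file names follow the operator-sdk 0.x layout, so check your generated files and adjust if they differ:

## push the image built earlier
docker push jdob/visitors-operator:0.1
## substitute the image placeholder in the generated deployment manifest
sed -i 's|REPLACE_IMAGE|jdob/visitors-operator:0.1|g' deploy/operator.yaml
## then apply CRD, RBAC, the Operator deployment, and finally the CR
kubectl apply -f deploy/crds/*_crd.yaml
kubectl apply -f deploy/service_account.yaml -f deploy/role.yaml -f deploy/role_binding.yaml
kubectl apply -f deploy/operator.yaml
kubectl apply -f deploy/crds/*_cr.yaml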

The other two appendixes in the book cover CRD validation and RBAC control settings.

Sometimes there is this need: after a pipeline finishes, there are newly generated or modified files, and the changes need to be checked in to a remote GitHub repository, which is really just git add/commit/push. How do you do this in Jenkins?

Note that the GitHub repository here is secured, e.g. GitHub Enterprise. Usually we set up SSH credential access (SSH Username with private key); the credential is added to Jenkins Credentials Management in advance. When configuring the pipeline, the last step is setting SCM -> Git: besides entering the Repository URL, you also add the SSH credential so Jenkins can check out the code normally. Of course, checking out code inside the pipeline steps also works, for example using the git or checkout snippets.

For checking in code, you can also use a snippet, for example:

  • withCredentials: binds credentials to variables. This snippet provides access to the credential through environment variables, but for git SSH credential access here you would still need to configure git to use that variable, which I have not figured out.

  • sshagent: requires installing the plugin https://plugins.jenkins.io/ssh-agent/ and passing the credential to it. Then put the git operations inside this snippet. For example:

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    steps {
    sshagent(['<credential id>']) {
    // fetch a branch, edit and check in the code
    sh '''
    ## or git pull other repository
    git fetch
    git checkout $TARGET_BRANCH
    git reset --hard origin/$TARGET_BRANCH
    git pull

    CHECKOUT_BRANCH="feature/${TARGET_BRANCH}-${COMPONENT_NAME}-${COMPONENT_VERSION}"
    echo "Creating feature branch: $CHECKOUT_BRANCH"
    git checkout -b $CHECKOUT_BRANCH

    sed -i "/.*version.*/c\ version: $COMPONENT_VERSION" files/$COMPONENT_NAME.yaml
    git add files/$COMPONENT_NAME.yaml
    ## list file changes
    git status
    git -c user.name="unibot" -c user.email="unibot@il.example.com" commit -m "Update ${COMPONENT_NAME} to ${COMPONENT_VERSION}"
    git push --set-upstream origin $CHECKOUT_BRANCH
    '''
    }
    }

    See the reference code here: https://github.com/jenkinsci/pipeline-examples/blob/master/pipeline-examples/push-git-repo/pushGitRepo.groovy

If I don't have permission to install the sshagent plugin, another good option is to set up a dedicated node with a pre-configured SSH credential, then run the git check-in task on that node.
