GitHub external provisioner.

The external provisioner can be backed by many types of filesystems; here we focus on nfs-client.

Note that this project contains both nfs-client and nfs directories: nfs-client means we already have an NFS server and consume it from clients, while nfs means we don't have an NFS server but export another filesystem over NFS.

A very similar tool is Rook; see my blog Rook Storage Orchestrator.

Rook is heavier, while this project is lightweight.

  1. Set up the NFS server

  2. Install nfs-utils on all worker nodes. See my blog NFS Server and Client Setup, the Server Setup chapter.

  3. Set up the NFS provisioner. This blog gives clear instructions on adopting the nfs-client provisioner: openshift dynamic NFS persistent volume using NFS-client-provisioner.

Notes:

  1. I found a bug in this project: when applying the RBAC manifests, you need to specify -n test-1, otherwise the Role is created in test-1 but the RoleBinding is created in the default namespace.

  2. The NFS provisioner is globally scoped (not limited to one namespace).

  3. The NFS_SERVER env in deployment.yaml can be hostname or IP address.

  4. If several pods use the same PVC, they share the same PV.

  5. You can customize the storage class as needed, for example, set the reclaim policy to Retain instead of Delete; see the doc and the example below.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
## default is delete
reclaimPolicy: Retain
## allow resizing the volume by editing the corresponding PVC object
## cannot shrink
allowVolumeExpansion: true
volumeBindingMode: Immediate
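As a hedged example of consuming such a dynamic provisioner (the class name managed-nfs-storage and the namespace test-1 are assumptions based on the nfs-client-provisioner defaults, not something stated above):

# a minimal sketch: claim storage from the nfs-client provisioner's class
kubectl apply -n test-1 -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-claim
spec:
  storageClassName: managed-nfs-storage
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Mi
EOF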

//TODO [ ] read official document [ ] udemy course

This course is mainly about Anthos; since service mesh is an important component of it, the course covers a lot of service mesh content, and covers it well. Quick Labs and slides are from the PluralSight Anthos special. The service mesh labs are worth revisiting to see how things are operated in GCloud, and the slides can be downloaded too.

Istio is an implementation of a service mesh that improves application resilience as you connect, manage, and secure microservices. It provides operational control and performance insights for a network of containerized applications, and it can work across environments (think about Google Anthos)!

A service mesh decouples the important network functions below from the applications:

  • authn
  • authz
  • latency
  • fault tolerance
  • circuit breaking
  • quota
  • rate limiting
  • load balancing
  • logging
  • metrics
  • distributed tracing
  • topology

To summarize, there are three parts:

  • Traffic control
  • Observability (dashboard: prometheus, grafana, jaeger, kiali)
  • Security

Istio uses envoy and sidecar pattern in the K8s pods.

Istio main components:

  • Pilot: the control plane that manages the distributed proxies across environments and pushes service communication policies, much like a software-defined network.
    • service discovery
    • traffic management
    • intelligent routing
    • resiliency
  • Mixer: collects info and sends telemetry, logs, and traces to your system of choice (Prometheus, InfluxDB, Stackdriver, etc.)
  • Citadel: policy management, service-to-service auth[n,z] using mutual TLS, credential management.

How does Istio work? For example, the life of a request in the mesh:

  1. service A comes up.
  2. envoy is deployed with it and fetches service information, routing and configuration policy from Pilot.
  3. If Citadel is being used, TLS certs are securely distributed as well.
  4. service A calls service B.
  5. client-side envoy intercepts the call.
  6. envoy consults config to know how/where to route call to service B.
  7. envoy forwards to appropriate instance of service B, the envoy on server side intercepts the request.
  8. server-side envoy checks with Mixer to validate the call should be allowed.
  9. server-side envoy forwards request to service B for response.
  10. envoy forwards response to the original caller, the response is intercepted by envoy on the caller side.
  11. envoy reports telemetry to Mixer, which in turn notifies appropriate plugins.
  12. client-side envoy forwards response to service A
  13. client-side envoy reports telemetry to Mixer, which in turn notifies appropriate plugins.

On December 4, 2019, I got my DJI Mavic 2 Pro; the other accessories are arriving one after another. I'm very happy to start a new branch of life experience, seeing the world from an angle I never had before. I plan to spend a few years building my own channel, sharing bird's-eye views of beautiful scenery and recording my footprints. There is a lot to learn: besides maintenance and piloting, there is also filming, editing, and more, in which I have almost no experience. Let's go!

Here I will record some tips about drone filming and other related topics, and of course my works. A reminder: a drone is a complex device, and learning before flying is important; otherwise you may damage the product, hurt people, or damage other property.

02/01/2020 Still no maiden flight 😑, busy with other very important things! Spring has already come. 04/09/2020 Still no maiden flight 😌, the COVID-19 pandemic broke out and the important things are still not done! Sigh... 05/16/2021 Finally the maiden flight 😒

Battery Guide

Be sure to read DJI's battery manual carefully; there are quite a few precautions. To summarize:

Usage

  1. Keep it away from any liquid
  2. Do not use non-official DJI batteries
  3. Do not charge an abnormal battery
  4. Do not install or remove the battery from the drone while it is turned on
  5. Operating temperature range is -10 to 40 °C; if it overheats, let it cool down naturally before use
  6. Keep it away from strong electromagnetic environments, which may damage the battery control board
  7. Do not put heavy pressure on it
  8. Do not fully exhaust the battery
  9. Keep batteries isolated from each other to prevent short-circuiting the terminals
  10. Make sure the battery is fully charged before flight

Charging

  1. Use a DJI-certified charger
  2. Turn off the battery before charging
  3. Do not leave the battery unattended while charging
  4. Do not charge immediately after flight, to avoid high temperature; the ideal charging temperature is 22 to 28 °C
  5. The DJI charger cuts off automatically when full, but it is better to disconnect promptly yourself

Storage

  1. For long-term storage, charge the battery to 40 ~ 60%
  2. Do not leave the battery in a car on hot days
  3. The battery automatically heats up and discharges itself to 60% after 10 days of no use
  4. Fully charge and discharge the battery at least once every 3 months
  5. Battery storage temperature is -10 to 45 °C
  6. The battery hibernates after long disuse; charge it to wake it up
  7. For long-term storage, remove the battery from the drone

Travel

  1. Before boarding a plane, discharge the battery below 30%, which can be done by flying

Safety Notes

Flight Environment

  1. Fly away from complex electromagnetic environments
  2. Flying above 6000 m may affect performance
  3. Flight environment temperature: -10 to 40 °C
  4. Ambient wind speed under 10 m/s
  5. Check whether drone flight is allowed at your location

Pre-flight Checks

  1. Check that the sensors are unobstructed
  2. Remote controller and drone batteries are fully charged
  3. Battery is installed correctly and firmly
  4. Propeller arms are unfolded correctly
  5. Propellers are undamaged and mounted firmly
  6. Camera is clean and uncontaminated; extension and rotation are unhindered
  7. Calibrate the compass only when the device asks for it
  8. Be familiar with the selected flight mode; understand each function and the flight alerts

Flight Operations

  1. In intelligent flight modes, stay away from highly reflective surfaces such as water or snow, which can interfere with the sensors
  2. After landing, shut off the motors first, then power off the battery and the remote controller in that order

Log

05/16/2021 Today I walked through this tutorial: DJI mavic 2 pro beginner guide.

Set up the account and some basic configuration, then did the maiden flight in the yard: simple take off/landing. It is quite loud; next time I'll try a park. I haven't mastered landing well yet (I used the autonomous mode), and I haven't tried video/photos either.

05/17/2021

The charger hub will not charge batteries in parallel, but one by one. Note that it is best to face the flat side of the antennas toward the drone; in other words, the antennas should actually sit at an angle to the display.

[x] Multiple flight modes: turn on
[x] Return to home (the take-off point) altitude: 30 ~ 60 m, the autonomous return altitude setting
[x] Turn off beginner mode

[x] Calibrate IMU with the drone folded (suggested once out of the box); calibration runs through several poses
[x] Enable visual obstacle avoidance, and the others in advanced settings
[x] Aircraft battery settings: 15% threshold, RTH
[x] Gimbal settings: gimbal auto calibration
[x] File index mode: continuous
[x] Set center point: cross

05/18/2021

Propeller vocabulary noted. Holding the descend stick all the way down triggers landing. Yaw: turn left/right.

The top-left gear wheel controls the gimbal's up/down view; you can also long-press the screen to move the gimbal up/down/left/right. The top-right gear wheel adjusts the shutter speed.

05/19/2021

Revisited what I have learned, hands-on.

05/22/2021

Flying at the park, recorded videos, tried tracker mode. [x] Register the drone: FAADroneZone.FAA.gov, see [here](https://youtu.be/ToRAN1-vDTM?t=1079).

05/27/2021

Tried landing on my hand, which makes recovery easier, especially on bad terrain. Taking off from the hand works the same way.

When I was working on securing a Docker registry, I followed the instructions, but running docker push always gave the x509: certificate signed by unknown authority error, which means the self-signed certificate is not recognized by the Docker daemon.

To get more detailed information this time, we need to check the Docker daemon log.

How to Enable Debug Mode?

By default, debug mode is off; see the "enable debugging" section here: https://docs.docker.com/config/daemon/

Edit the daemon.json file, which is usually located in /etc/docker/. You may need to create this file if it is not there.

{
"debug": true
}

Then send a HUP signal to the daemon to cause it to reload its configuration. On Linux hosts, use the following command:

sudo kill -SIGHUP $(pidof dockerd)

Where is The Log?

https://stackoverflow.com/questions/30969435/where-is-the-docker-daemon-log

Ubuntu (old using upstart ) - /var/log/upstart/docker.log
Ubuntu (new using systemd ) - sudo journalctl -fu docker.service
Amazon Linux AMI - /var/log/docker
Boot2Docker - /var/log/docker.log
Debian GNU/Linux - /var/log/daemon.log
CentOS - /var/log/daemon.log | grep docker
CoreOS - journalctl -u docker.service
Fedora - journalctl -u docker.service
Red Hat Enterprise Linux Server - /var/log/messages | grep docker

On Red Hat, from the /var/log/messages file I can clearly see that the Docker daemon picks up certificates under the /etc/docker/certs.d/<domain, no port number!> directory.
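As a sketch of that layout (myregistry.example.com is a placeholder for your registry domain):

# place the self-signed CA where the daemon looks for it
mkdir -p /etc/docker/certs.d/myregistry.example.com
cp ca.crt /etc/docker/certs.d/myregistry.example.com/ca.crt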

If your OS uses systemd, the journalctl command can help, but container output is also dumped there; see this issue: https://github.com/moby/moby/issues/23339.

You can filter it by (works fine in Red Hat):

journalctl -fu docker _TRANSPORT=stdout + OBJECT_EXE=docker

This post is all about securing servers with SSL/TLS certificates, based on the Udemy course SSL Complete Guide.

The quality of SSL setups varies; having SSL set up doesn't mean your site is well secured. HTTPS may not work correctly or may be sub-optimal; you can test it here: https://www.ssllabs.com/index.html

If you click the lock icon to the left of the website address, it shows whether the connection is secure, along with its certificates, cookies, and so on. Clicking further into the certificate icon shows the root CA, intermediate CA, and the certificate itself.

To install Wireshark on Mac, download the stable-version dmg package and double-click to install.

You can use Chrome Inspect -> Network to see traffic, or use Wireshark.

For example, select one item in the Network tab and check the header information to get the remote server's IP address, or just use the host or nslookup commands.

Interestingly, in the Network tab I see that Chrome sometimes uses an IPv6 address to talk to the server, for example for Facebook and some other sites; see this question.

A rough summary: openssl has commands to generate a private key and a (self-signed) certificate in one shot (or separately: private key -> CSR (certificate signing request) -> self-signed certificate), usually in PEM format. These two items live on the web server, and the private key and certificate can also be merged into one PEM file. Note that the public key is not used directly here, but it can be derived from the private key (the private key actually already embeds the public key information). Also, the certificate the web server gives the client is only the certificate, never the private key, even if they sit in the same PEM file.

On the client side, a self-signed certificate requires configuring the OS to trust it, but if you use Let's Encrypt (or another well-known CA) it is very likely already supported. Think of the TLS handshake: after receiving the web server's certificate, the client verifies layer by layer up to the root CA (downloading all related certificates). Since the client already ships with the well-known root CA certificates and thus knows the CA's public key, it can tell whether that root CA is legitimate. If it is, a symmetric key is generated and exchanged for the subsequent data transfer.

For the concrete commands, see <<Set up Secure Docker Registry Container>>. To automate this process, e.g. automatically requesting, renewing, and revoking certificates for a site, you need some automation such as certbot, or cert-manager in K8s; see the chapter below.

Now let's look at crt/key files and their contents. Generally you see files like tls.crt and tls.key, where tls.key holds the private key, for example:

-----BEGIN RSA PRIVATE KEY-----
///xxxxx
-----END RSA PRIVATE KEY-----

tls.crt is the certificate. It is a PEM-formatted file; PEM means Privacy Enhanced Mail (concatenated certificate container files), and it can have different extensions: tls.cert, tls.cer, tls.pem, etc. PEM is a container format that may include just the public certificate, or an entire certificate chain including public key, private key, and root certificates. Confusingly, it may also encode a CSR. For example, here it contains multiple certificate blocks:

// can also contain private key
-----BEGIN RSA PRIVATE KEY-----
(Your Private Key: your_domain_name.key)
-----END RSA PRIVATE KEY-----
// trust chain
// the usual order, top to bottom: your domain cert -> intermediate cert -> root cert
-----BEGIN CERTIFICATE-----
(Your Primary SSL certificate: your_domain_name.crt)
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
(Your Intermediate certificate: DigiCertCA.crt)
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
(Your Root certificate: TrustedRoot.crt)
-----END CERTIFICATE-----

Read these articles: PEM, DER, CRT, and CER: X.509 Encodings and Conversions; Creating a .pem File for SSL Certificate Installations.

Decode PEM encoded ssl/tls certificate to verify it contains correct information:

# if there are multiple certificate blocks in one file, only the first is parsed
openssl x509 -in <crt file> -text -noout

Cert-manager

Used in K8s to secure, for example, ingress; search for this post: <<Cert-manager Light Note>>.

Certbot

Get free HTTPS certificates forever; it uses the Let's Encrypt CA to automatically refresh the certificate for your website: https://letsencrypt.org/docs/ https://certbot.eff.org/

How it works in detail: https://letsencrypt.org/how-it-works/ This is accomplished by running a certificate management agent on the web server.

Encryption

Symmetric encryption: the same key is used by both sides, for example AES; this is the algorithm embedded in SSL for the HTTPS protocol. Asymmetric encryption: for example RSA.

Hash

How does hash work to verify data integrity:

data + hash(data)  --(send)-->  data + hash(data)
                                  |          |
                                  +--hash--->+  (compare if they are the same)

Notice that in databases passwords are stored hashed, not as plain text.

Hash algorithms: MD5, SHA

  • MD5: 128 bits, echo 123 | md5

For SHA, use the shasum command on Linux or an online tool.

  • SHA-1: 160 bits
  • SHA-256: 256 bits
  • SHA-512: 512 bits
## SHA-256
shasum -a 256 -t test.txt
  • HMAC: can be used with md5 or sha. In cryptography, an HMAC (sometimes expanded as either keyed-hash message authentication code or hash-based message authentication code) is a specific type of message authentication code (MAC) involving a cryptographic hash function and a secret cryptographic key. It may be used to simultaneously verify both the data integrity and the authenticity of a message, as with any MAC. Any cryptographic hash function, such as SHA-256 or SHA-3, may be used in the calculation of an HMAC.
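For a quick hands-on check (the message and key below are arbitrary placeholders), openssl can compute an HMAC directly:

# HMAC-SHA256 of a message with a secret key
echo -n "hello" | openssl dgst -sha256 -hmac "mysecret"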

Asymmetric Keys

Encryption

data --(public key: encryption)--> code  ====(transmit)====>  code --(private key: decryption)--> data   (owner side)

Usually (but not necessarily) the keys are interchangeable, in the sense that if key A encrypts a message, then key B can decrypt it, and if key B encrypts a message, then key A can decrypt it. While common, this property is not essential to asymmetric encryption.

Signature

sender:                                   receiver:
    data                                      data + encrypted hash
      | hash                                    |              |
      v                                         | hash         | decrypt with public key
  hash value                                    v              v
      | encrypt with private key            hash value     hash value
      v                                           \           /
data + encrypted hash  ====(send)====>             (compare)

Signing ensures the data was sent by the owner of the private key and has not been modified in between.
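A minimal sketch with openssl (file names are assumptions; the key pair could be the private.pem/public.pem generated in the Chain of Trust section below):

# sign: hash data.txt with SHA-256 and encrypt the digest with the private key
openssl dgst -sha256 -sign private.pem -out data.sig data.txt
# verify: decrypt the signature with the public key and compare digests
openssl dgst -sha256 -verify public.pem -signature data.sig data.txt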

What is the difference between a digest and a signature? https://www.ibm.com/support/knowledgecenter/SSFKSJ_9.2.0/com.ibm.mq.sec.doc/q009810_.htm A message digest is a fixed-size numeric representation of the contents of a message, computed by a hash function. A message digest can be encrypted by the sender's private key, forming a digital signature.

More readings: How does a public key verify a signature?

PKI

Public key infrastructure is a set of roles, policies, hardware, software and procedures needed to create, manage, distribute, use, store and revoke digital certificates and manage public-key encryption. The purpose of a PKI is to facilitate the secure electronic transfer of information for a range of network activities such as e-commerce, internet banking and confidential email.

Certificate

A file with some contents:

  1. certificate owner
  2. certificate issuer
  3. signature (RSA created, made by issuer)
  4. public key (from the owner; we then use this public key for HTTPS)

Self-signed certificate: issued and signed by the owner. The basic rule is: we trust the CA (the issuer), and therefore the certificate owner.

Why do we need intermediate CAs?

There are not many public root CAs because of the problem of trust. Anybody can create their own root CA, but nobody will trust it. That's why there is a limited set of global root CAs that are trusted worldwide by operating systems and browsers. You can view the list of such global CAs with their root certificates in any browser or OS.

Such root CAs have certificates with long validity periods, and their main responsibility is simply to create a "source of trust". That's why they don't issue certificates to end users: it avoids additional work and minimizes the risk that their private keys will be compromised. Intermediate CA certificates don't necessarily need to be in the list of trusted certificates in the OS or browser; they simply need to be issued by a trusted root CA.

Chain of Trust

Let's look at the openssl commands to generate an RSA private key and public key (self-signed certificates are not covered here; see later):

## check help for sub-command genrsa
openssl genrsa -h

## generate the private key file private.pem, encrypted with aes256
## will ask you to input a pass phrase
## note: private.pem actually already contains the public key information!
openssl genrsa -aes256 -out private.pem

## generate the public key from the above private key
## will ask you for the private key's pass phrase
openssl rsa -in private.pem -outform PEM -pubout -out public.pem

Root CAs in OS

How does web browser trust the root CAs and certificates? The OS ships a list of trusted certificates, in Mac search Keychain Access.

In Linux, see this link: on Red Hat/CentOS, all trusted certificate authorities are included in /etc/pki/ca-trust/extracted/openssl/ca-bundle.trust.crt. Just add your new certificate authority file(s) to the /etc/pki/ca-trust/source/anchors directory, then run /bin/update-ca-trust to update the certificate authority bundle.
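For example, on Red Hat/CentOS (my-root-ca.crt is a placeholder file name):

# trust a new CA system-wide
cp my-root-ca.crt /etc/pki/ca-trust/source/anchors/
update-ca-trust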

Verify chain of trust

        root CA                 |     intermediate CA           |        end user
--------------------------------------------------------------------------------------------------
self-signed certificate         | signed by root CA             | signed by intermediate CA
--------------------------------------------------------------------------------------------------
signature: encrypted by the     | signature: encrypted by the   | signature: encrypted by the
private key of the root CA      | private key of the root CA    | private key of the intermediate CA

CSR: certificate signing request. The root CA receives a CSR from the intermediate CA; the intermediate CA's certificate is signed by the root CA, and the root CA also provides the issuer info for the intermediate CA. Similarly, the end user's certificate is signed by the intermediate CA.

The web server (the end user) sends you its certificate and all intermediate certificates. Then your side starts the verification process from the end-user certificate back up to the top intermediate certificate and then the root certificate.

How verification works here: To verify a certificate, a browser obtains a sequence of certificates, each one having signed the next certificate in the sequence, connecting the signing CA's root to the server's certificate.

There is an online tool to check the certificate chain: https://www.geocerts.com/ssl-checker.
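You can also verify a chain locally with openssl; a sketch with placeholder file names:

# root.pem: the trusted root CA; intermediate.pem: untrusted intermediates; server.pem: the end-user cert
openssl verify -CAfile root.pem -untrusted intermediate.pem server.pem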

Create Self-signed Certificate

Openssl Essential, several ways to generate certificates

Before creating a certificate, you need a CSR, and before the CSR, you first need to generate the asymmetric keys (because the certificate needs to include the signature from upstream as well as your public key).

Choose the common name (CN) according to the main domain where the certificate will be used (for example, for a secure Docker registry, the CN is the registry address). Actually, CN is deprecated and the Subject Alternative Name (SAN) is used instead.

What is the /etc/ssl/certs directory? It is actually a symlink to /etc/pki/tls/certs.

Generate self-signed certificate, see this post

# this is CN certificate, if you want to have SAN(Subject Alternative Name), see below
openssl req \
-newkey rsa:4096 -nodes -x509 -sha256 \
-keyout key.pem -out cert.pem -days 365 \
-subj "/C=US/ST=CA/L=San Jose/O=Company Name/OU=Org/CN=<domain>"
  • -nodes: short for No DES, if you don’t want to protect your private key with a passphrase.
  • Add -subj '/CN=localhost' to suppress questions about the contents of the certificate (replace localhost with your desired domain).
  • For anyone else using this in automation, here’s all of the common parameters for the subject: -subj "/C=US/ST=CA/L=San Jose/O=Company Name/OU=Org/CN=<domain>"
  • Remember to use -sha256 to generate SHA-256-based certificate.

Note that sometimes a cert file may contain multiple CERTIFICATE blocks, including intermediate CAs. When constructing a TLS secret in Kubernetes, you can use it directly.

To generate a SAN certificate, see this post. The main point is how to construct the san.cnf file for openssl to use. SAN is mainly for covering multiple names with one certificate; note this is different from a wildcard TLS certificate, which is mainly for covering subdomains.
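A hedged sketch of such a san.cnf (all domain names are placeholders; the sections follow the usual openssl req config layout):

# san.cnf
[req]
distinguished_name = req_distinguished_name
x509_extensions = v3_req
prompt = no
[req_distinguished_name]
CN = example.com
[v3_req]
subjectAltName = @alt_names
[alt_names]
DNS.1 = example.com
DNS.2 = www.example.com

# generate a self-signed SAN certificate with this config
openssl req -x509 -newkey rsa:4096 -nodes -sha256 -days 365 \
-keyout key.pem -out cert.pem -config san.cnf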

SSL/TLS and HTTPS

Both are cryptographic protocols used in HTTPS:

  • SSL: Secure Socket Layer
  • TLS: Transport Layer Security.

Difference between SSL and TLS: TLS is an updated, more secure version of SSL. https://www.globalsign.com/en/blog/ssl-vs-tls-difference/ It's important to note that certificates are not dependent on protocols. When you hear "SSL/TLS certificate", it may be more accurate to call them "certificates for use with SSL and TLS", since the protocols are determined by your server configuration, not the certificates themselves.

Go to SSL Labs to check which version of TLS a web server uses: input the web server address, scan, then click the IP icon.

Why is RSA not used for bulk data encryption?

  1. too slow.
  2. Bi-directional data encryption would require RSA key pairs on both sides. We encrypt data with a symmetric key after setting up the secure connection.

Why would RSA key pairs be needed on both sides? Because the keys are interchangeable: anyone who has the public key can decrypt data encrypted by the private key!

Establish TLS Session

You can see the verbose output from the curl command:

# -I: fetch header only
# -v: verbose
curl -Iv https://www.google.com
  1. establish the TCP session
  2. establish the TLS session (negotiate the protocol)
  3. the web server sends its certificate (intermediates and others) to the browser
  4. the browser generates a symmetric key, secures it with the server's public key, and sends it to the server; or a Diffie–Hellman key exchange is used
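To watch this exchange by hand, openssl s_client dumps the certificate chain and the negotiated protocol and cipher (www.google.com is just an example target):

# shows the certificate chain, negotiated TLS version, and cipher suite
openssl s_client -connect www.google.com:443 -servername www.google.com </dev/null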

Let's look at Wireshark for a Wikipedia connection: the top 3 packets are the TCP handshake, then you see the TLS client hello, then the TLS server hello.

In the client hello, there is a lot of information to negotiate with the server; here you see the supported TLS versions and cipher suites.

In the server hello, you see that the server has selected one of the cipher suites. TLS_ECDHE.._SHA256 means it uses Diffie–Hellman (ECDHE) key exchange and SHA-256 as the hash.

Other Readings

The resources on this site are well worth a careful read: https://www.cloudflare.com/learning/

–>> How Does SSL Work? The main use case for SSL/TLS is securing communications between a client and a server, but it can also secure email, VoIP, and other communications over unsecured networks.

–>> TLS handshakes During a TLS handshake, the two communicating sides exchange messages to acknowledge each other, verify each other, establish the encryption algorithms they will use, and agree on session keys.

SSL handshakes are now called TLS handshakes, although the “SSL” name is still in wide use.

A TLS handshake also happens whenever any other communications use HTTPS, including API calls and DNS over HTTPS queries.

It describes two different TLS handshake flows: one where the public-private key pair participates directly, the other the Diffie–Hellman handshake. For the first flow, it mentions that the client extracts the public key from the server certificate: https://stackoverflow.com/questions/17143606/how-to-save-public-key-from-a-certificate-in-pem-format
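A sketch of that extraction (example.com is a placeholder):

# fetch the server certificate, then extract the public key from it
openssl s_client -connect example.com:443 -servername example.com </dev/null 2>/dev/null | openssl x509 -outform PEM > server.pem
openssl x509 -in server.pem -pubkey -noout > pubkey.pem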

–>> How does a proxy handle TLS handshakes (remember Envoy?) HTTPS knows how to tunnel the TLS handshake even through the proxy. In other words, TLS/SSL through a proxy is implemented via an HTTP CONNECT tunnel; see especially the second answer and its comments:

So, the proxy is not MITM’ing the HTTPS connection, by replacing the server’s certificate with its own - it’s simply passing the HTTPS connection straight through between the client and the server. Is that right?

Normally, when HTTPS is done through a proxy, this is done with the CONNECT mechanism: the client talks to the proxy and asks it to provide a bidirectional tunnel for bytes with the target system. In that case, the certificate that the client sees is really from the server, not from the proxy. In that situation, the proxy is kept on the outside of the SSL/TLS session – it can see that some SSL/TLS is taking place, but it has no access to the encryption keys.

Diffie–Hellman

Diffie–Hellman uses a one-way function, for example the mod operation. Here a and b are the private keys on the two sides, while g and p are public. A and B are the mod results, and K is the final value both sides can derive and use to encrypt the data.

Elliptic-curve cryptography can be used in Diffie–Hellman (as in ECDHE).
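A toy walk-through with tiny textbook numbers (p=23 and g=5 are illustrative values, not secure parameters), runnable with plain bash arithmetic:

# public: p=23 (prime), g=5 (generator); private: a on one side, b on the other
a=6; b=15
A=$(( 5**a % 23 ))      # one side sends A = 5^6 mod 23 = 8
B=$(( 5**b % 23 ))      # the other sends B = 5^15 mod 23 = 19
echo $(( B**a % 23 ))   # K = 19^6 mod 23 = 2
echo $(( A**b % 23 ))   # K = 8^15 mod 23 = 2, the same shared secret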

Custom Domain

Purchase a custom domain and use free hosting to set up our website.

//TODO [ ] https://www.youtube.com/watch?v=kQYQ_3ayz8w&list=PLvadQtO-ihXt5k8XME2iv0cKpKhcYqe7i&index=5

Commonly used networking inspection commands: ss, lsof, netstat, ifconfig, hostname, ip, route, iptables, nc, ping, arp, curl, wget, host, nslookup, dig.

These notes are mainly from the Network chapter of the LPIC-1 course on PluralSight, plus the LFCE Advanced Networking training. Some iptables content from YouTube was added later. Environment: CentOS 7 Enterprise Linux or RedHat.

Frequently asked question: what happens when you hit a URL in the browser?

About domain name: www.microsoft.com.:

  • root domain: .
  • top-level domain: com
  • second-level domain: microsoft
  • third-level domain: www

The above is the most basic flow. If HTTPS is used, you can also describe the TLS handshake; with a proxy in between there will be a CONNECT tunnel, with a load balancer there may be TLS termination, and so on.

Ip vs Ifconfig

ifconfig is obsolete; use ip instead. I have a dedicated post on the ip command.

IPv4: 32 bits long, dotted decimal
IPv6: 128 bits long, quad hex

Hostname

# show full hostname
hostname -f

# node hostname
uname -n

# query and change the system hostname and related settings
hostnamectl

Static hostname: halos1.fyre.xxx.com
Icon name: computer-vm
Chassis: vm
Machine ID: f7bbe4af93974cbfa5c55b68c011d41c
Boot ID: 4e30e7107fa441a9b3ad70d0b784782d
Virtualization: kvm
Operating System: Red Hat Enterprise Linux Server 7.6 (Maipo)
CPE OS Name: cpe:/o:redhat:enterprise_linux:7.6:GA:server
Kernel: Linux 3.10.0-957.10.1.el7.x86_64
Architecture: x86-64

# show domain name
# The chances are unless we have a web server running on our computer, we will not have any dns domain
# name. By default, there is no web server running on a system and hence there is no result when we
# type “dnsdomainname” on the terminal and hit enter.
dnsdomainname
# this will not be persistent
# the static hostname is still unchanged but transient hostname is xxx.example.com
# you can see transient name by hostnamectl
hostname xxx.example.com

# this will be persistent in
# /etc/hostname
hostnamectl set-hostname xxx.example.com

# set a pretty hostname (here it includes a ' character); persisted in
# /etc/machine-info
hostnamectl set-hostname "xxx'ok.example.com"

Notice that the order of the entries we add to the /etc/hosts file is important! Put the fully qualified hostname first, then the aliases; otherwise some scenarios will break!

# /etc/hosts
<ip> <fully qualified domain name: FQDN> <aliases>

Besides the local hosts file, let's look at the DNS settings; I have a blog post about this. The dig command (DNS lookup utility) is used to check responses and look up hostnames from a DNS server.

# use default dns server
# -t A: type A record
dig www.pluralsight.com -t A
# use specified dns server, for example, google dns server 8.8.8.8
dig www.pluralsight.com @8.8.8.8 -t A

Output:

<<>> DiG 9.9.4-RedHat-9.9.4-61.el7_5.1 <<>> www.pluralsight.com @8.8.8.8
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 14726
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;www.pluralsight.com. IN A

# 59, 186, 186 are the TTLs (in seconds, they keep changing)
;; ANSWER SECTION:
www.pluralsight.com. 59 IN CNAME www.pluralsight.com.cdn.cloudflare.net.
www.pluralsight.com.cdn.cloudflare.net. 186 IN A 104.19.162.127
www.pluralsight.com.cdn.cloudflare.net. 186 IN A 104.19.161.127

# server now is 8.8.8.8
;; Query time: 60 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Sun Apr 12 13:03:48 PDT 2020
;; MSG SIZE rcvd: 132

Add short format +short to return the IP address only:

# only show resolved output
dig +short www.pluralsight.com @8.8.8.8

How to check a DNS record's TTL: you can set a TTL on a DNS record that defines how long a resolver is supposed to cache the DNS query before it expires. TTL is typically used to reduce the load on your authoritative name servers and to speed up DNS queries for clients.

# A is the record type; this checks the local dns resolver
dig A google.com

# other type
# AAAA: ipv6
dig AAAA google.com
# canonical name
dig cname google.com

# get authoritative dns server
# NS: name server
dig +short NS google.com
# check by authoritative dns server
dig A google.com @ns1.google.com.

# only show ttl
dig +nocmd +noall +answer +ttlid A google.com
# human-readable
dig +nocmd +noall +answer +ttlunits A google.com

Network services

04/12/2020 So far I have only inspected these configurations, never changed them.

Display and set IP address

ip -4 addr
ip addr show eth0
# not persistent
ip addr add 192.168.1.50/24 dev eth0

I haven't fully understood the concrete usage of these configurations. The NetworkManager tool is not a cure-all either and doesn't apply everywhere; it can be used to make persistent changes so we don't lose them.

# check status
systemctl status NetworkManager
# if not active, start it
systemctl start NetworkManager

# nmcli command
# command-line tool for controlling NetworkManager
# show all connections
nmcli connection show
# pretty format
nmcli -p connection show eth0

# terminal graph interface
nmtui
# then edit a connection, select network interface
# config ipv4 ip address/gateway.
systemctl restart network

Traditional network service, more flexible and common.

systemctl status network

The network configuration is read from scripts under /etc/sysconfig/network-scripts/.

ifcfg-eth0  ifcfg-eth1  ifcfg-lo ...

These files contain the ready-made configuration; for more details see this link: https://www.computernetworkingnotes.com/rhce-study-guide/network-configuration-files-in-linux-explained.html

TYPE=Ethernet
BOOTPROTO=dhcp
NAME=eth0
DEVICE=eth0
ONBOOT=yes
...

After editing the ifcfg-xx file, bring down and up that interface:

ifdown eth0
ifup eth0

Routing

[ ] iptables vs routing tables: what is the difference, and when is each used? See this question and the diagram in the comments.

Display routing tables

# see below
ip r
# route and netstat make the meaning of each column clearer
# -n: displays the results as IP addresses only and does not attempt to perform a DNS lookup
netstat -rn
# -e: display as netstat format
route -n [-ee]

Explaining the host routing table (since this host is not a router): the column names are explained in man route, e.g. the meaning of the Flags letters. The order of entries in the routing table does not matter; the longest prefix always takes priority.

# in short, the routing table answers: for a given destination, which interface is the exit and who is the next hop

# Destination is a `network name` or `host name`
# it is compared with the result of AND-ing the outgoing packet's destination IP with the (Genmask) mask
# if it matches, the packet is sent out via Iface (interface)
# if several entries match after applying the mask, the longest matching destination wins

# 0.0.0.0 in Destination is the default gateway; its network mask is also 0.0.0.0. Any IP AND-ed with
# 0.0.0.0 yields 0.0.0.0, so every IP with no other match goes to the default gateway

# Gateway: the gateway address, e.g. 192.168.0.1; 0.0.0.0 means unspecified or `none`, sometimes
# shown as *.
# this assumes that the network is locally connected, as there is no intermediate hop.
Kernel IP routing table
Destination     Gateway         Genmask          Flags  Metric  Ref  Use  Iface
0.0.0.0         192.168.0.1     0.0.0.0          UG     100     0    0    ens4
# note this entry is a host IP, not a network name
192.168.0.1     0.0.0.0         255.255.255.255  UH     100     0    0    ens4
192.168.9.0     0.0.0.0         255.255.255.0    U      0       0    0    docker0

Compare with the ip r command; the display is a bit different:

# proto [type]: routing protocol identifier of this route
# scope link: communication within the device's network segment is allowed over this link

# default gateway
default via 192.168.0.1 dev ens4 proto dhcp metric 100
192.168.0.1 dev ens4 proto dhcp scope link metric 100
# docker0
192.168.9.0/24 dev docker0 proto kernel scope link src 192.168.9.1

Adding routes: forward all traffic that matches no other route to 192.168.56.104 via eth0. For example, the current machine cannot reach the external network but 192.168.56.104 can; 192.168.56.104 then also needs to be configured as a router.

# this command is not persistent
# instead of `default`, a specific network like 192.168.1.0/24 can be used
ip route add default via 192.168.56.104 dev eth0

To make it persistent, edit the corresponding eth0 file under /etc/sysconfig/network-scripts/, or add your own script, then restart the network with systemctl restart network.
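A minimal sketch of the conventional RHEL/CentOS route-<interface> file (gateway and interface taken from the example above):

# /etc/sysconfig/network-scripts/route-eth0
default via 192.168.56.104 dev eth0
# then: systemctl restart network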

Configuring a linux system as router:

# now let's configure machine 192.168.56.104 as a router
vim /etc/sysctl.conf
# add this line to enable ipv4 forward
net.ipv4.ip_forward=1
# reload
sysctl -p

On a past project I needed to check performance in the DataStage Ops Console, but the OpenShift worker nodes could not be reached from outside; access was only possible through the infra node's routing. So I first exposed the service with a nodePort, then mapped the infra node to the corresponding worker node port, and finally used MASQUERADE for the outside traffic.

# this is operating on nat iptables
# run in infra node
# DNAT: destination nat
iptables -t nat -A PREROUTING -p tcp --dport 32160 -j DNAT --to-destination <worker private IP>:32160
iptables -t nat -A POSTROUTING -j MASQUERADE
iptables -t nat -nvL

Allowing access to the internet via NAT, so return traffic can get back to the private network.

Note that this routing part does not involve the firewall yet; the firewall is inactive.

# -t nat: working on nat table
# -A POSTROUTING: appending to post routing chain
# -o eth0: outbound via eth0, eth0 connects to internet
# -j MASQUERADE: jump to MASQUERADE rule

# not persistent, see iptables section below
iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE

Then if you check iptables -t nat -nvL you will see the POSTROUTING chain with the new rule added.

Firewall

Many Linux systems actually implement the firewall with iptables (see the next section); the firewalld service also modifies iptables under the hood.

Packet filtering (both iptables and firewalld can implement this). Firewall zone: a concept for managing incoming traffic more transparently. Zones are connected to networking interfaces or assigned a range of source addresses, and you manage the firewall rules for each zone independently.

The configuration commands are similar in style to kubectl/oc.

systemctl start firewalld

# show default zone
firewall-cmd --get-default-zone
# show active zones, will see interfaces apply to it
firewall-cmd --get-active-zones
# show available zones
firewall-cmd --get-zones

# permanently remove interface eth0 from public zone
firewall-cmd --permanent --zone=public --remove-interface=eth0
# permanently add eth0 to external zone
firewall-cmd --permanent --zone=external --add-interface=eth0
# permanently add eth1 to internal zone
firewall-cmd --permanent --zone=internal --add-interface=eth1

# change default zone
firewall-cmd --set-default-zone=external
# after updating, restart to take effect
systemctl restart firewalld

The rest mainly covered firewall configuration: you can add or remove services, ports, etc. for different zones. The default service definition files are in the /usr/lib/firewalld/services directory, while user-created service files go in /etc/firewalld/services/.

Iptables

iptables can also implement the firewall functionality via the filter table.

There are currently five independent tables:

  • filter: This is the default table (if no -t option is passed). It contains the built-in chains INPUT (for packets destined to local sockets), FORWARD (for packets being routed through the box), and OUTPUT (for locally-generated packets).
  • nat: This table is consulted when a packet that creates a new connection is encountered. It consists of three built-ins: PREROUTING (for altering packets as soon as they come in), OUTPUT (for altering locally-generated packets before routing), and POSTROUTING (for altering packets as they are about to go out). IPv6 NAT support is available since kernel 3.7.
  • mangle: This table is used for specialized packet alteration.
  • raw: This table is used mainly for configuring exemptions from connection tracking in combination with the NOTRACK target.
  • security: This table is used for Mandatory Access Control (MAC) networking rules.
# list 3 basic chain in filter table: INPUT, FORWARD, OUTPUT
# INPUT: traffic comes in firewall
# FORWARD: traffic pass through firewall
# OUTPUT: traffic leaving firewall
iptables [-t filter] -L

# policy ACCEPT: default policy is ACCEPT if no specific rules
# other policies: DROP, REJECT(will send ICMP rejecter to sender)
# by default, most system won't have any rules
Chain INPUT (policy ACCEPT)
target prot opt source destination

Chain FORWARD (policy ACCEPT)
target prot opt source destination

Chain OUTPUT (policy ACCEPT)
target prot opt source destination

Changing the default policies. Note: you can add your own rules to add functionality, but do not casually change the default ACCEPT policy; otherwise, if anything goes wrong, you won't be able to connect at all.

# set default policy to DROP
# accept any traffic for INPUT and OUTPUT

# rules are like cases in a switch: matched from top to bottom, so order matters!
# -A: append
iptables -A INPUT -j ACCEPT
iptables -A OUTPUT -j ACCEPT
# setting DROP here is safe because we just appended the ACCEPT rules above
iptables -P INPUT DROP
iptables -P OUTPUT DROP
iptables -P FORWARD DROP

# accept any loopback traffic
# loopback traffic never leaves machine
# -i: in-interface
# -o: out-interface
iptables -A INPUT -i lo -j ACCEPT
iptables -A OUTPUT -o lo -j ACCEPT

# -v: verbose
# -n: numeric output
# --line-numbers: show rules index
iptables -nvL --line-numbers

# keep current traffic, for example, current ssh connection
iptables -A INPUT -j ACCEPT -m conntrack --ctstate ESTABLISHED,RELATED
iptables -A OUTPUT -j ACCEPT -m conntrack --ctstate ESTABLISHED,RELATED

# remove rule by index from --line-numbers
# -D: delete rule
# this removes the earlier blanket ACCEPT rules; existing connections stay alive thanks to conntrack with ESTABLISHED
iptables -D INPUT 1
iptables -D OUTPUT 1

# up to this point, no new traffic can come in or go out
# add filter rules to iptables firewall for inbound and outbound traffic
# others can ping me
iptables -A INPUT -j ACCEPT -p icmp --icmp-type 8
# I can ping others
iptables -A OUTPUT -j ACCEPT -p icmp --icmp-type 8
# others can ssh in
# add comment
iptables -A INPUT -j ACCEPT -p tcp --dport 22 -m comment --comment "allow ssh from all"

# I can access others
# I was confused here at the time: why is no INPUT rule for port 80 needed? (replies come back via the ESTABLISHED conntrack rule above)
iptables -A OUTPUT -j ACCEPT -p tcp --dport 80
iptables -A OUTPUT -j ACCEPT -p tcp --dport 443
# DNS
iptables -A OUTPUT -j ACCEPT -p tcp --dport 53
iptables -A OUTPUT -j ACCEPT -p udp --dport 53
# NTP (note: NTP actually uses UDP port 123)
iptables -A OUTPUT -j ACCEPT -p udp --dport 123
# save current config
# can edit in this output file
iptables-save > orgset
iptables-restore < orgset

# drop if not match
# put this last, otherwise everything is dropped immediately; not needed if the default policy is already DROP
iptables -A INPUT -j DROP
# not acting as a router
iptables -A FORWARD -j DROP

# -I: insert
# put this rule at the first position of the INPUT chain
iptables -I INPUT 1 -p tcp --dport 80 -j ACCEPT

# clear rules in all chains
iptables -F [chain name]

Now let's look at the iptables service, which wraps iptables as a systemctl service and makes the usage more structured.

yum install -y iptables-services

Under the /etc/sysconfig directory there are the iptables and iptables-config files. If you set these two values to yes, iptables will save the configuration automatically into the iptables file, which is easy to maintain.

# Save current firewall rules on stop.
# Value: yes|no, default: no
# Saves all firewall rules to /etc/sysconfig/iptables if firewall gets stopped
# (e.g. on system shutdown).
IPTABLES_SAVE_ON_STOP="yes"

# Save current firewall rules on restart.
# Value: yes|no, default: no
# Saves all firewall rules to /etc/sysconfig/iptables if firewall gets
# restarted.
IPTABLES_SAVE_ON_RESTART="yes"

Monitoring Network

Measure network performance and find bottlenecks.

# shows the IPs along the path, e.g. with a VPN you can check whether the path is correct
tracepath www.google.com

traceroute vs tracepath: https://askubuntu.com/questions/114264/what-are-the-significant-differences-between-tracepath-and-traceroute some options of traceroute need root privileges, and it has more features than tracepath.

Display network status

# shows how many errors and dropped packets there are, to judge whether the network has problems
ip -s -h link
ip -s -h link show eth0

netstat command can also do the same thing.

netstat -i

Kernel Interface table
Iface    MTU    RX-OK     RX-ERR  RX-DRP  RX-OVR  TX-OK    TX-ERR  TX-DRP  TX-OVR  Flg
eth0     1500   16008695  0       5       0       8446165  0       0       0       BMRU
eth1     1500   461914    0       12      0       35082    0       0       0       BMRU
lo       65536  277761    0       0       0       277761   0       0       0       LRU

The sysstat command was also introduced; it needs to be installed via yum, and afterwards it collects daily historical system data for review. It is an important system monitoring tool. Another command is nmap, used to scan ports:

yum install -y nmap
# check what ports in your system is opening
nmap scanme.nmap.org
# list interface and routes information
nmap -iflist

Can use ss command (similar to netstat) to show listening tcp ports:

# show listening ipv4 tcp sockets in numeric format
ss -ltn -4

# *:* means listening from any address and any port
State    Recv-Q  Send-Q  Local Address:Port  Peer Address:Port
LISTEN   0       64      *:2049              *:*
LISTEN   0       128     *:36168             *:*
LISTEN   0       128     *:111               *:*

# list current active connections
ss -t

# my Mac sshes into the current host; the first row is that connection
# here State is ESTAB; if the handshake got no response it would show SYN-SENT
State    Recv-Q  Send-Q  Local Address:Port    Peer Address:Port
ESTAB    0       128     9.30.166.179:ssh      9.160.91.147:62991
ESTAB    0       0       9.30.166.179:54556    54.183.140.32:https

Network Basics

This part mainly walks through the basic concepts via experiments. Set up the lab environment with VirtualBox; installing and using wireshark and tcpdump inside the VMs is very clear, with no other noise. For the lab you can have one primary and two secondary machines: the primary can reach the outside (Adapter 1 set to NAT, Adapters 2/3 set to Internal Network), while the secondary machines reach the primary, and thus the outside indirectly (their Adapter 1 set to Internal Network, connected to the primary's Internal Network). Then you can run all kinds of ip, route, and iptables experiments.

Network topology: LAN, WAN (bus, star, ring, full mesh)
Network devices: adapter, switch, router, firewall
OSI model

subnetting: a logically grouped collection of devices on the same network
subnet mask: network portion / host portion
special addresses: network address (all 0s in host portion), broadcast (all 1s in host portion), loopback 127.0.0.1
classful subnets: class A/B/C, they are inefficient

VLSM: variable-length subnet mask, for example x.x.x.x/25
NAT: one-to-one or many-to-one mapping
ARP: address resolution protocol (IP -> MAC), broadcast on the bus to ask who has the MAC for a particular IP
DNS: maps hostname to IP, UDP protocol

IP packet: can be fragmented and reassembled by routers and hosts. Fragmentation really hurts throughput, since every IP packet carries a header. Also note that some IP encryption (VPN) adds extra length to the IP packet, causing fragmentation. TTL: time to live in the IP header; this is how traceroute works.

Routing table: static (paths defined by an admin) or dynamic (paths programmatically defined by routing protocol software such as Quagga on Linux)

TCP: connection-oriented; three-way handshake for connection establishment/termination; data transfer. Ports: a system can have more than one IP, and ports are only unique per IP; well-known ports are 0-1024. Flow control: maintained by the receiver. Congestion control: the sender slows down. Error detection and retransmission.

UDP: send it and forget it. Used by DNS (dig, host commands) and VoIP.

  1. Set up an HTTP service on the server host
yum install -y httpd
# if firewall is on
firewall-cmd --permanent --add-port=80/tcp
firewall-cmd --reload
# set page content
echo "hello world" > /var/www/html/index.html
systemctl enable httpd
systemctl start httpd
  2. Get the web page from the other host
wget http://<ip or hostname>/index.html
  3. Install tcpdump and wireshark on the other host
yum install -y tcpdump wireshark wireshark-gnome
# if you have desktop in linux, start wireshark
wireshark &

Check the arp cache

# '?' means stale
arp -a
ip neighbor
# delete arp cache
arp -d 192.168.1.1

Specify the size of the data and the total number of pings:

# -c 1: ping once
# -s 1472: 1472 bytes of payload (not the total IP length; headers get appended)
# so it may exceed the 1500 MTU and then the packet will be fragmented
ping -c 1 -s 1472 192.168.1.1
# -t set TTL
ping -c 2 -t 5 192.168.0.1

Create a large file to transfer:

# fast allocate file
# -l5G: length of file is 5G
fallocate -l5G test.bin
# then use scp to copy it over the network
scp ...
# you can check wireshark to see the tcp window scaling graph
# will see slow start and speed up

Traffic control settings can simulate a bad network: e.g. while transferring a file with scp, set tc to bad performance and then restore it; you will find the transmission rate goes back up. Check the wireshark window scaling graph and the IO graph. See: Linux 下 TC 命令原理及详解

tc qdisc add dev eth1 root netem delay 3000ms loss 5%
# remove the above policy
tc qdisc del dev eth1 root

Let's look at the statistics: after performance recovers, the TCP congestion window size grows quickly.

The IO graph shows the TCP window size and update points.

Network Troubleshooting

The network is not reachable; for example, ping does not get through.

# check subnet and gateway, then
ip route
# check interface, state DOWN? NO-CARRIER? then
ip addr
# check MAC mapping in layer 2, then
arp -a
# is layer 1 ok? link detected: no?
# note: virtual machines do not have this statistic! Only real NICs do; I've run into this before
# port speed can also be checked here
ethtool eth0

No route to host: for example during scp; go to the host server and check whether the port is open

ss -lnt4

Looking at the client side in wireshark, it turned out to likely be a firewall issue! The port was blocked.

This post is for system design; revisit it frequently to refresh. The notes are mainly from https://www.educative.io/ and YouTube channels.

Some system designs are mainly about combining functional components sensibly:

  1. Design Instagram
  2. Design Dropbox
  3. Design Twitter: post tweets (photos, videos), follow others, favorite tweets, generate a timeline of top tweets; low latency, highly available, consistency can take a hit

storage: text + photo + video. ingress (write): newly generated storage per second. egress (read): read volume per second.

read-heavy system. data sharding: user id -> tweet id -> (creation time + tweet id, sorted by time); query all servers and aggregate.

cache for hot users and tweets

  1. Designing Twitter Search

  2. Designing a Web Crawler (BFS, modular, url frontier, DNS, fetcher, DIS, content filter, extractor, url filter)

  3. Designing Facebook Messenger: each chat server serves a group of users; the LB maps a user to its chat server; chat servers communicate with each other to send/receive messages. Message handling: long polling to receive messages; a hashtable keeps track of online users; if the recipient is offline, notify the sender of delivery failure. Message ordering: timestamps alone are not enough; use a sequence number with every message for each user.

database: needs high-frequency row writes/reads, quick small updates, and range-based search: HBase, a column-oriented key-value NoSQL database. Partition by UserID for low latency.

Some designs mainly involve data structures and algorithms:

  1. Typeahead suggestion (trie, reference)

  2. API rate limiter (dynamic sliding window)

  3. Designing Facebook's Newsfeed (offline feed generation): contains updates, posts, videos, photos from all the people a user follows. An average user has 200 followees; with 300M DAU, fetching 5 times a day and 1KB per post, you can derive the traffic. Cache each user's news feed in memory for quick fetches. Feed generation: retrieve, rank, store; generated offline by dedicated servers, with Map<UserID, LinkedHashMap/TreeMap<PostID, PostItem>> plus LastGenerateTime kept in memory; use an LRU cache for users, or learn a user's activity pattern to help decide whose newsfeed to generate. Feed publishing: push to notify, pull for serving.

  4. Designing Yelp (querying; objects don't change often; QuadTree). My understanding of partitioning here: it partitions the quadtree. Location ids are read from the DB and mapped via hashing to different quadtree servers (this mapping is effectively the quadtree index and can be used to rebuild a quadtree server's data after it fails); each server then builds its own quadtree. The quadtree servers sit behind an aggregator server (which has its own replicas), so every request queries all quadtree servers and aggregates the returned data. Each quadtree server also keeps a local mapping from the location ids it holds to the DB servers containing each location id's info; this mapping is also implemented via hashing.

  5. Designing Uber backend (requirements, objects do change often, QuadTree)

  6. Design Ticketmaster (first come, first served; highly concurrent; financial transactions need ACID)

CAP Theorem

CAP theorem states that it is impossible for a distributed software system to simultaneously provide more than two out of three of the following guarantees (CAP): Consistency, Availability, and Partition tolerance.

When we design a distributed system, trading off among CAP is almost the first thing we want to consider.

Thinking process

  1. requirements clarification
  2. back-of-the-envelope estimation: scale, storage, bandwidth.
  3. system interface definition
  4. defining data model
  5. high level design
  6. detailed design
  7. identifying and resolving bottlenecks

Crucial Components

The notes here are mainly organized around the following:

  1. Database (book: 7 weeks 7 databases)
  2. Cache system (redis, memcache)
  3. Message queue (kafka && zookeeper or others)
  4. Load balancer (nginx, Round Robin approach)
  5. Log systems
  6. monitor system
  7. My domain of knowledge: k8s, docker, micro-services

Key Characteristics of Distributed Systems

Scalability: scaling without performance loss (though in practice there will be some). Reliability: keep delivering services when some components fail. Availability: reliable implies available, but not vice versa. Efficiency: latency and bandwidth (throughput). Manageability: ease of diagnosing and understanding problems when they occur.

Common Technical Knowledge

Ways to refer to backups:

Standby replicas; failover to other healthy copies; duplicates; backup (spare); redundancy (redundant secondary copy)

NoSQL Database:

An Introduction To NoSQL Databases. Big data (social networks, search engines): traditional methods of processing and storage are inadequate.

  1. Key-value stores: Redis, Dynamo (redis can also be cache)
  2. Document database: MongoDB, Couchbase
  3. Wide-column database: Cassandra, HBase
  4. Graph database: Neo4J

Advantages of NoSQL databases: no fixed data model (no pre-defined schema), unstructured data, easy to scale up and down (horizontal data sharding), high performance with big data.

Advantages of SQL databases: relational data, normalization (eliminates redundancy), SQL, data integrity, ACID compliance.

Consistent Hashing (with virtual replicas)

https://www.youtube.com/watch?v=ffE1mQWxyKM Using a hash-mod strategy is not efficient: think about adding a new server; what used to be 20 % 3 = 2 now becomes 20 % 4 = 0. We would have to re-organize all the existing mappings.

https://www.youtube.com/watch?v=zaRkONvyGr8 Consistent hashing can be used in many situations, like distributed cache, load balancing, database, etc.

For example, we have n servers. Hash the request to get its location on the ring, find the server with a hash value equal to or larger than it, and send the request to that server (moving clockwise). But the servers may not be distributed evenly on the ring, or the requests may not be uniform (so a server's load factor is not 1/n); we can use virtual replicas, which can be implemented with additional hash functions.

With consistent hashing, adding or removing servers does not cause much overhead. A newly added server grabs objects from its neighboring servers; when a server is removed, all of its objects move to the next server after it.

Long Polling

https://www.jianshu.com/p/d3f66b1eb748?from=timeline&isappinstalled=0 Like regular polling, it belongs to the pull model.

A side note: the push model also keeps a persistent connection, but the server pushes new information to the client as soon as it is available, without regard for the client's processing capacity, which is a drawback; long polling is more flexible for the client (because the client requests first).

This is a variation of the traditional polling technique that allows the server to push information to a client whenever the data is available. With long polling, the client requests information from the server exactly as in normal polling, but with the expectation that the server may not respond immediately (the connection is kept open). That's why this technique is sometimes referred to as a "Hanging GET".

Each Long-Poll request has a timeout. The client has to reconnect periodically after the connection is closed due to timeouts or receive the disconnect from server.

If the client suddenly becomes unavailable, how is that detected? How is the connection kept alive? My guess is that while the connection is held, no extra sync is needed to check whether server and client are both alive (I recall TCP has a mechanism to check whether a connection is healthy). If the server sends a message and receives no acknowledgement, the client is gone and the connection is torn down.

Data Sharding

https://medium.com/@jeeyoungk/how-sharding-works-b4dec46b3f6 Horizontal partitioning is also called data sharding.

Web Server vs Application Server

https://stackoverflow.com/questions/936197/what-is-the-difference-between-application-server-and-web-server

Proxy Server

https://www.educative.io/courses/grokking-the-system-design-interview/N8G9MvM4OR2 A proxy server is an intermediate server between the client and the back-end server.

Typically, proxies are used to filter requests, log requests, or sometimes transform requests (by adding/removing headers, encrypting/decrypting, or compressing a resource). Another advantage of a proxy server is that its cache can serve a lot of requests.

  1. open (forwarding) proxy: hide clients
  2. reverse proxy: hide servers

Map Reduce

We can have a Map-Reduce (MR) setup; these MR jobs calculate the frequencies of all searched terms in the past hour.

Exponential Moving Average (EMA)

In EMA, we give more weight to the latest data. It’s also known as the exponentially weighted moving average.
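The standard recurrence, for reference: EMA_t = α · x_t + (1 − α) · EMA_(t−1), where x_t is the latest data point and α ∈ (0, 1] is the smoothing factor; the larger α is, the more weight the latest data gets.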

Some Design Bottlenecks

  1. Data compression: is it needed, and how to choose?

  2. Capacity estimation: consider both metadata and content. High-level estimations mainly include: storage per day, storage over years, incoming bandwidth, outgoing bandwidth. These mainly derive from: total users, daily active users (DAU), size of each request, how many entries each user produces, and data growth. Sometimes it is better to estimate a particular quantity separately.

  3. Read-heavy or write-heavy? Bandwidth: ingress is the total newly generated data per day, per second; egress is the total user browsing or download volume per second.

  4. Which characteristics does the database need for the scenario? E.g. quick small updates, ACID, range-based search, etc.

  5. Also consider the peak-time read and write throughput.

  6. Handling hot users in the database: how to design the database to mitigate this problem.

  7. We may need an aggregator server for fetching and processing data from different DBs or caches.

  8. Monitoring system, collect metrics: daily peak, latency. From these we will realize whether we need more replication, load balancing, or caching.

  9. A load balancer can sit between client and web server, between web server and application server (or cache), and between application server and database. A load balancer can be a single point of failure; it needs redundancy to take over when the main one is down.

  10. Load balancer: Round Robin approach, or something more intelligent.

  11. Cache policy: LRU, 80-20 rule.

Other System Design Videos:

Introduce to System Design

Introduce to System Design. It also recommends the book <<Designing Data Intensive Applications>>, which covers these topics in more depth.

  1. ask good questions: which features do we care about, which not? how much to scale (data, requests, latency)
  2. don't use buzzwords (be clear about the tech you use)
  3. clear and organized thinking
  4. drive discussion (80% I talk)

Things to consider:

  1. Features
  2. API
  3. Availability
  4. Latency
  5. Scalability
  6. Durability
  7. Class Diagram
  8. Security and Privacy
  9. Cost-effective

Concepts to know:

  1. Vertical vs horizontal scaling
  2. CAP theorem
  3. ACID vs BASE
  4. Partitioning/Sharding
  5. Consistent Hashing
  6. Optimistic vs pessimistic locking
  7. Strong vs eventual consistency
  8. RelationalDB vs NoSQL
  9. Types of NoSQL: key-value, wide-column, document-based, graph-based
  10. Caching
  11. Data center/racks/hosts
  12. CPU/memory/Hard drives/Network bandwidth
  13. Random vs sequential read/writes to disk
  14. HTTP vs http2 vs WebSocket
  15. TCP/IP model
  16. ipv4 vs ipv6
  17. TCP vs UDP
  18. DNS lookup
  19. Http & TLS
  20. Public key infrastructure and certificate authority(CA)
  21. Symmetric vs asymmetric encryption
  22. Load Balancer
  23. CDNs & Edges
  24. Bloom filters and Count-Min sketch
  25. Paxos
  26. Leader election
  27. Design patterns and Object-oriented design
  28. Virtual machines and containers
  29. Pub-sub architecture
  30. MapReduce
  31. Multithreading, locks, synchronization, CAS(compare and set)

Tools:

  1. Cassandra
  2. MongoDB/Couchbase
  3. Mysql
  4. Memcached
  5. Redis
  6. Zookeeper
  7. Kafka
  8. NGINX
  9. HAProxy
  10. Solr, Elastic search
  11. Amazon S3
  12. Docker, Kubernetes, Mesos
  13. Hadoop/Spark and HDFS

Design Spotify | Apple Music | Youtube Music

Design Spotify | Apple Music | Youtube Music

  1. scope: what to cover and what you are not going to cover
  2. key components (a concrete analysis of how Spotify works, e.g. storage, transport protocol conversion, low latency, CDN)
  3. data model
  4. scaling

If the data size is high, consider compressing the audio data quality; users use different devices and network conditions; distribution via CDN.

Scaling groups: stateless servers, for example, can be managed with k8s and containers.

Check CPU information on Linux, just like checking memory by viewing /proc/meminfo:

cat /proc/cpuinfo

## each processor has a dedicated description
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 61
model name : Intel Core Processor (Broadwell, IBRS)
stepping : 2
microcode : 0x1
cpu MHz : 2199.996
cache size : 4096 KB
physical id : 0
siblings : 1
core id : 0
cpu cores : 1
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl eagerfpu pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch ssbd ibrs ibpb tpr_shadow vnmi flexpriority ept vpid fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt spec_ctrl
bogomips : 4399.99
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual

Count number of processing units

cat /proc/cpuinfo | grep processor | wc -l

To get the actual number of cores:

cat /proc/cpuinfo | grep 'core id'

Note: The number of processors shown by /proc/cpuinfo might not be the actual number of cores on the processor. For example a processor with 2 cores and hyperthreading would be reported as a processor with 4 cores.
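To separate physical cores from hyperthreads, a small sketch (assuming a single socket; lscpu reports the same breakdown):

# count unique physical cores (assumes a single socket)
grep 'core id' /proc/cpuinfo | sort -u | wc -l
# or let lscpu report the sockets/cores/threads breakdown
lscpu | grep -E 'Thread\(s\) per core|Core\(s\) per socket|Socket\(s\)'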

About LinkedList operation time complexity:

Adding to either end of a linked list does not require a traversal, as long as you keep a reference to both ends of the list. This is what Java does for its add and addFirst/addLast methods.

Same goes for parameterless remove and removeFirst/removeLast methods - they operate on list ends.

remove(int) and remove(Object) operations, on the other hand, are not O(1). They require traversal, so their cost is O(n).

When issuing k8s instructions from inside a container, we usually use the curl command (if you have the kubectl binary in the container's execution path, you can use kubectl as well).

First you need credentials and api server information:

## MY_POD_NAMESPACE
NAMESPACE=$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace)
K8S=https://$KUBERNETES_SERVICE_HOST:$KUBERNETES_SERVICE_PORT
CACERT=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)

You can get all of these from environment variables; when creating the pod, k8s has already injected this information into the containers.

Of course, if the service account you use does not have full privilege, the API access is limited.

Then, for example, get the details of the current pod:

POD_NAME="$MY_POD_NAME"
NS="$MY_POD_NAMESPACE"
OUT_FILE=$(mktemp /tmp/pod-schedule.XXXX)

## http_code is the return status code
http_code=$(curl -w "%{http_code}" -sS --cacert $CACERT -H "Content-Type: application/json" -H "Accept: application/json, */*" -H "Authorization: Bearer $TOKEN" "$K8S/api/v1/namespaces/$NS/pods/$POD_NAME" -o $OUT_FILE)

if [[ $http_code -ne 200 ]]; then
echo "{\"result\": \"Failure\", \"httpReturnCode\":$http_code}" |${JQ} '.'
exit 1
fi
image=$(cat $OUT_FILE |jq '.spec.containers[] | select(.name=="xxx") | .image')

How do I know the curl path to request?

kubectl get pod -v 10

This will show you verbose messages (curl under the hood); from them you can get the path and use it in your own curl command.

Not all kubectl commands map cleanly to curl, for example kubectl exec; that still takes some effort to figure out.

references: https://blog.openshift.com/executing-commands-in-pods-using-k8s-api/ https://docs.okd.io/latest/rest_api/api/v1.Pod.html#Post-api-v1-namespaces-namespace-pods-name-exec
