While learning Linux storage recently, I looked into several commands for creating large files, and in particular how they differ. You can refer to this question: Quickly create a large file on a Linux system.

cd /tmp
# slow
dd if=/dev/zero of=./ddfile bs=1G count=3
# fastest
truncate -s 3G ./trunfile
# fast
fallocate -l 3G ./fallfile

# sync, depends on your needs
# without sync, file data may still be in memory
sync

# ls -ltrh
# in logical view, they are all 3GB
-rw-r--r--. 1 root root 3.0G Dec 30 07:01 ddfile
-rw-r--r--. 1 root root 3.0G Dec 30 07:02 trunfile
-rw-r--r--. 1 root root 3.0G Dec 30 07:02 fallfile

# check physical storage
# the truncated file takes almost no space
# because it is sparse
df -h .

# time test
# several seconds
time /bin/cp ./ddfile ~
time /bin/cp ./fallfile ~

# near instant
time /bin/cp ./trunfile ~

Note that the file cache still exists after sync; to drop the file cache completely, run echo 3 > /proc/sys/vm/drop_caches.

So in scenarios that need real disk allocation, such as testing network or I/O performance, do not use truncate. dd is the most time-consuming and uses the most CPU and I/O; fallocate is somewhat better than dd: it creates large files quickly and genuinely occupies the space.
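
A quick way to see the difference is to compare the apparent size with the actually allocated blocks; a minimal sketch:

truncate -s 1G sparse.img
fallocate -l 1G alloc.img
# both report 1G apparent size
ls -lh sparse.img alloc.img
# sparse.img takes ~0, alloc.img takes 1.0G
du -h sparse.img alloc.img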

In fact dd can also create a sparse file, with the seek option:

dd if=/dev/zero of=sparse_file bs=1 count=0 seek=3G
# Blocks is 0
stat sparse_file
Size: 3221225472 Blocks: 0 IO Block: 4096 regular file

dd can also be used to write directly to a disk device:

dd if=/dev/urandom of=/dev/sdb1 bs=1M count=2048

You can also deallocate the empty blocks of a file (dig holes) with the fallocate command:

dd if=/dev/zero of=./empty bs=1G count=3
# see the block number before and after change
stat ./empty
Size: 3221225472 Blocks: 6291544 IO Block: 4096 regular file

# deallocate empty block
fallocate -d ./empty

stat ./empty
Size: 3221225472 Blocks: 0 IO Block: 4096 regular file

Recently our project codebase migrated to the GoB/Gerrit system. After code is submitted, CI performs code linting and reports errors; it is best to self-check locally before submitting, but local linting does not seem to be integrated yet. After some observation, you can build your own linting environment to check the syntax and format of py, yaml or other files changed between any 2 commits.

Lint Steps

For convenience, a python virtualenv is used here; you could also use a docker environment, mount the whole repo and process it there. Work in a python virtual env, for example virtualenv -p python3 venv:

# activate the venv first
# use the same yamllint and pylint versions as the team
# different versions may produce different output
pip install yamllint==1.17.0
pip install pylint==2.12.2

# filter and get updated yaml and python file name list
# diff between HEAD~1 and HEAD, order matters!
# the two revisions are endpoints, not a range!
YAML_FILES=$(git diff --name-only --diff-filter=ACMR HEAD~1 HEAD | grep -E "(.+\.yaml$|.+\.yml$)" || echo "")
PY_FILES=$(git diff --name-only --diff-filter=ACMR HEAD~1 HEAD | grep -E ".+\.py" || echo "")

# .pylintrc and .yamllint
# should be in the code repo
# -r: do not run if input is empty
# xargs places its input at the end by default
echo $PY_FILES | xargs -r pylint -sn --rcfile=.pylintrc >> linter_results_tmp
echo $YAML_FILES | xargs -r yamllint -c .yamllint >> linter_results_tmp

cat linter_results_tmp && rm -f linter_results_tmp
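
The same pattern works between any two commits, not just the last one; a sketch, where BASE and TOPIC are placeholder revisions:

# lint files changed between two arbitrary commits
BASE=origin/master
TOPIC=HEAD
git diff --name-only --diff-filter=ACMR "$BASE" "$TOPIC" | grep -E '\.py$' | xargs -r pylint -sn --rcfile=.pylintrc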

Disable PyLint

You may need to update the .pylintrc settings to skip warnings/errors; edit the disable line to add error codes or symbolic names, for example:

disable=C0103,missing-docstring,too-many-branches

Or disable the pylint check inline:

if __name__ == '__main__':
    run()  # click command, pylint: disable=E1120

Or, more readably, use the symbolic name:

if __name__ == '__main__':
    run()  # click command, pylint: disable=no-value-for-parameter

And disable at function level, for example:

def wont_raise_pylint():
    # pylint: disable=W0212
    some_module._protected_member()
    some_module._protected_member()

And disable in a file or a bigger scope; everything after the comment is disabled by the rule, and you can enable it again later:

# pylint: disable=use-implicit-booleaness-not-comparison
...
# pylint: enable=use-implicit-booleaness-not-comparison

[x] ordinary files can also be formatted and mounted
[x] terraform mount disk /dev/sdb, why this name?
[x] do an experiment, using vagrant to mount an extra disk
[x] blkid, lsblk use cases
[x] fstab mount configuration

For the Vagrant demo please see vagrant-storage. This is for the VirtualBox provider; at the time of writing this is an experimental feature.

After launching the vagrant machine:

# up
VAGRANT_EXPERIMENTAL="disks" vagrant up
# down
vagrant destroy -f
# check block device, their name, type, size and mount point
# sdb and sdc are the additional disks added in Vagrantfile
lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 40G 0 disk
├─sda1 8:1 0 1G 0 part /boot
└─sda2 8:2 0 31G 0 part
  ├─centos_centos7-root 253:0 0 29G 0 lvm /
  └─centos_centos7-swap 253:1 0 2G 0 lvm [SWAP]
sdb 8:16 0 2G 0 disk
sdc 8:32 0 2G 0 disk

# /dev/sda is not used up, let's add one more partition of 3GB
fdisk /dev/sda
Command (m for help): n
Partition type:
# we already have 2 primary partitions, sda1 and sda2
p primary (2 primary, 0 extended, 2 free)
e extended
Select (default p):
Using default response p
Partition number (3,4, default 3):
First sector (67108864-83886079, default 67108864):
Using default value 67108864
Last sector, +sectors or +size{K,M,G} (67108864-83886079, default 83886079): +3GB
Partition 3 of type Linux and of size 2.8 GiB is set

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.

WARNING: Re-reading the partition table failed with error 16: Device or resource busy.
The kernel still uses the old table. The new table will be used at
the next reboot or after you run partprobe(8) or kpartx(8)
Syncing disks.

# re-read partition table for /dev/sda
partprobe -s /dev/sda

# now see lsblk, sda3
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 40G 0 disk
├─sda1 8:1 0 1G 0 part /boot
├─sda2 8:2 0 31G 0 part
│ ├─centos_centos7-root 253:0 0 29G 0 lvm /
│ └─centos_centos7-swap 253:1 0 2G 0 lvm [SWAP]
└─sda3 8:3 0 2.8G 0 part
sdb 8:16 0 2G 0 disk
sdc 8:32 0 2G 0 disk

# format sda3
mkfs -t ext4 /dev/sda3

# see type
blkid | grep sda3
/dev/sda3: UUID="7d4a365c-1639-41de-a7c7-ebe79ea2830c" TYPE="ext4"

# mount to /data3 folder
mkdir /data3
mount /dev/sda3 /data3
mount | grep sda3
# check lsblk
# now see lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 40G 0 disk
├─sda1 8:1 0 1G 0 part /boot
├─sda2 8:2 0 31G 0 part
│ ├─centos_centos7-root 253:0 0 29G 0 lvm /
│ └─centos_centos7-swap 253:1 0 2G 0 lvm [SWAP]
└─sda3 8:3 0 2.8G 0 part /data3
sdb 8:16 0 2G 0 disk
sdc 8:32 0 2G 0 disk

# check space and inode usage
cd /data3
df -k .
df -i .

If you copy a big file to /data3, its Use% may not change because the flush has not happened yet; run sync to flush the filesystem buffers.
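
For example, a minimal sketch of the effect, reusing the ddfile created earlier:

cp /tmp/ddfile /data3
# Use% may still lag behind
df -h /data3
# flush filesystem buffers to disk
sync
df -h /data3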

# format whole disk sdb without partition
# use btrfs and mount with compress
mkfs -t btrfs /dev/sdb

mkdir /datab
mount /dev/sdb -o compress /datab

mount | grep sdb
/dev/sdb on /datab type btrfs (rw,relatime,seclabel,compress=zlib,space_cache,subvolid=5,subvol=/)

# check lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 40G 0 disk
├─sda1 8:1 0 1G 0 part /boot
├─sda2 8:2 0 31G 0 part
│ ├─centos_centos7-root 253:0 0 29G 0 lvm /
│ └─centos_centos7-swap 253:1 0 2G 0 lvm [SWAP]
└─sda3 8:3 0 2.8G 0 part /data3
sdb 8:16 0 2G 0 disk /datab
sdc 8:32 0 2G 0 disk

For creating logical volumes, please see LVM section. For creating loop device, please see Loop Device section.

From Linkedin Learn Linux: Storage Systems.

First, you need to understand what happens after a new disk is added to the system, and what must be done before the new disk can be used; you can refer to this article. The main tools are fdisk (partition), mkfs (make filesystem), and mount or fstab (make accessible).

How is the device file of a newly added disk named? For the naming conventions of device files, see here, for example /dev/sda, /dev/sdb, etc.

Overall, you can inspect a block storage system in this order (a combined walkthrough sketch follows the list):

  • lsblk, blkid: get an overview of block devices (filesystem, disk, partition, mount point, etc.)
  • /etc/fstab: check whether mounts are persistent and whether the mount options are appropriate
  • mount: see what the defaults mount option actually expands to
  • df -k/-i: check the space and inode usage of a mount point
  • dd: test I/O performance (combined with iperf3 for NFS or similar distributed storage)
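
Put together, a quick walkthrough sketch of the order above:

# devices, filesystems, mount points
lsblk -f
blkid
# persistence and mount options
cat /etc/fstab
# resolved options of the root mount
mount | grep ' / '
# space and inode usage
df -k / && df -i /
# rough write throughput, bypassing the page cache
dd if=/dev/zero of=/tmp/ddtest bs=1M count=1024 oflag=direct
rm -f /tmp/ddtest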

Partition

lsblk and blkid are used to identify block storage (Linux also has character devices). One thing to note when using lsblk, for example:

NAME                    MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 32G 0 disk
├─sda1 8:1 0 1G 0 part /boot
└─sda2 8:2 0 31G 0 part
  ├─centos_centos7-root 253:0 0 29G 0 lvm /
  └─centos_centos7-swap 253:1 0 2G 0 lvm [SWAP]

Distinguish the different TYPE values (also visible from the indentation structure); only entries with a MOUNTPOINT value show up in the mount command output. You will also see corresponding entries in /etc/fstab.

Check storage partition:

# list partitions and see type of a disk
# `p` to see part
# `n` to add new part
# `d` to delete part
# `w` to confirm
# `q` to quit
# /dev/sda is disk type
fdisk /dev/sda

# or
ls /dev/sda*

Formatting

Formatting partition or disk is to make a filesystem on it.

# format the /dev/sdb1 partition
# -t: type
# force reformat: ext4 uses -F, xfs uses -f
mkfs -t ext4 -F /dev/sdb1
mkfs -t xfs -f /dev/sdb1

Mounting

Mounting means associating a filesystem with a directory. Normally we mount onto an empty directory, otherwise the content of the existing directory will be hidden.

mount /dev/sdb1 /data

# non-disk filesystem, system will do special mounts for you
# -t: type
mount -t proc proc /proc
mount -t sysfs sysfs /sys
mount -t debugfs debugfs /sys/kernel/debug
# NFS
mount server:dir /nfs-data

The persistent mounts are in the /etc/fstab file. If you type the mount command in a shell, you will see several mount points like proc, sysfs and debugfs above; these are special mounts not in /etc/fstab, and systemd mounts them automatically on boot.
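
A typical fstab entry for the /dev/sda3 example above might look like this sketch (the UUID is the one reported by blkid):

# device                                    mount point  fstype  options   dump fsck
UUID=7d4a365c-1639-41de-a7c7-ebe79ea2830c   /data3       ext4    defaults  0 2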

For the mount options specific to each filesystem type, see the fstab and mount man pages.

You can unmount with the umount command. A filesystem cannot be unmounted while in use, for example when files are open or a process has its working directory in it. You can check with the lsof command.

umount /data

Filesystem Types

Commonly used ones: ext2/3/4, xfs, btrfs. They have different properties.

man 5 filesystems
man 5 ext4
man 5 xfs
man 5 btrfs

Note that sometimes you cannot create a file because of "no space left on device", yet df -k . shows the filesystem is not full; checking df -i . may help, since you may have used up the inode capacity even though lots of space remains.
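
A sketch of that failure mode: inodes can run out while blocks remain free (the loop is left commented out since it really creates 100k files):

df -i /tmp && df -k /tmp
# each empty file consumes an inode but almost no blocks
# for i in $(seq 1 100000); do touch /tmp/inode_test_$i; done
# cleanup: rm -f /tmp/inode_test_*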

LVM

Logical volume manager: a layer above physical partitions that allows multiple partitions to appear as one. It provides for growing a filesystem by adding physical space, and can stripe and mirror.

There are 3 levels of abstraction, one onto another:

  • physical volumes: disk or disk partitions
  • volume groups: collections of physical volumes
  • logical volumes: partitioning up a volume group

Assume we have new disk partitions /dev/sdc1, /dev/sdc2, /dev/sdc3:

lvm
# show all physical volume
> pvs -a
# find lv on physical volume
> pvck /dev/sda2
# show volume groups
> vgs

# create physical volume
pvcreate /dev/sdc1
pvcreate /dev/sdc2
pvcreate /dev/sdc3

# list physical volume
# you can see volume group name
pvscan

# display detail
pvdisplay /dev/sdc1

Let's create a volume group:

# -c n: clustered no
# vg1: group name
vgcreate -c n vg1 /dev/sdc1 /dev/sdc2
pvscan
# add /dev/sdc3 to group
vgextend vg1 /dev/sdc3
pvscan
# remove unused pv
vgreduce vg1 /dev/sdc3
pvscan
# remove group
vgremove vg1
pvscan

Then create a logical volume:

# create a logical volume 600M from group vg1
# -L: size
# -n: lv name
lvcreate -L 600M -n apple vg1
# format and mount it
mkfs -t ext4 /dev/vg1/apple
mkdir /apple && mount /dev/vg1/apple /apple
// apple spans 2 physical volumes
sdc 8:32 0 2G 0 disk
├─sdc1 8:33 0 500M 0 part
│ └─vg1-apple 253:2 0 600M 0 lvm /apple
├─sdc2 8:34 0 500M 0 part
│ └─vg1-apple 253:2 0 600M 0 lvm /apple
└─sdc3 8:35 0 500M 0 part
umount /apple
# extend
# note: cannot exceed the total size of the volume group
# -r also extends the filesystem automatically
lvextend -L +200M -r /dev/vg1/apple

# shrink: first resize the filesystem,
# then the lv
fsadm -e resize /dev/vg1/apple 300M
lvresize --size 300M /dev/vg1/apple

# remove and reduce for logical volume
lvremove
lvreduce

Now it is clear that in the Vagrantfile demo there are already 2 logical volumes from the start, root and swap:

NAME                    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda 8:0 0 40G 0 disk
├─sda1 8:1 0 1G 0 part /boot
├─sda2 8:2 0 31G 0 part
│ ├─centos_centos7-root 253:0 0 29G 0 lvm /
│ └─centos_centos7-swap 253:1 0 2G 0 lvm [SWAP]

You can view the info of these 2 LVs; it shows that /dev/sda2 is a physical volume and centos_centos7 is the volume group name. Each abstraction level of logical volumes has a corresponding command:

# show physical volume info
pvdisplay
# show lv info
lvdisplay
# show volume group info
vgdisplay

Swapping

Swapping is bad for many applications and services and hurts performance, so consider whether it should be turned off. Usually swapping is set in the /etc/fstab file, for example in the Vagrant machine:

// as mentioned, /dev/mapper/centos_centos7-swap
// is a logical volume
/dev/mapper/centos_centos7-swap swap swap defaults 0 0

To disable it, just comment out that line and run swapoff /dev/mapper/centos_centos7-swap or swapoff -a.

A partition or file can be configured as swap space:

# a partition
mkswap /dev/sdc2
# on and off, not persistent
swapon /dev/sdc2
swapoff /dev/sdc2

# check swap components
swapon -s

Use a file as swap space (a loop device is a similar case):

# generate 1G file
# or use fallocate (a sparse file from truncate will not work for swap)
dd if=/dev/zero of=/tmp/myswap bs=1G count=1
chown root:root /tmp/myswap
chmod 0600 /tmp/myswap
# enable swap
mkswap /tmp/myswap
swapon /tmp/myswap
# off
swapoff /tmp/myswap

Then if you check free -h, swap space is extended by 1G.

Loop Device

This brings us to pseudo-devices, i.e. the commonly used /dev/null, /dev/zero, /dev/full, /dev/random, etc. They are all character-based devices.

What is a loop device (wiki): a pseudo-device that makes a file accessible as a block device. Files of this kind are often used for CD and ISO images, etc., so after mounting them you can access the content.

In my blog <<What's New in CentOS 8>>, I have mentioned the commands:

# shrink or extend file to specific size
# much faster than 'dd' command
truncate -s 1g /tmp/loop.img
# you can make partition on it or skip this step
# for example, make 2 partitions 200M and 400M each
fdisk /tmp/loop.img

# create loopback device from a file, will associate file with /dev/loop0
# -f: find unused loop device
# -P: force kernel to scan partition table on the newly created loop device
losetup -fP /tmp/loop.img
# now from lsblk, you will see the partitions under /dev/loop0

# format
mkfs -t ext4 /dev/loop0p1
mkfs -t xfs /dev/loop0p2

# mount it
mkdir /loop{0,1}
mount /dev/loop0p1 /loop0
mount /dev/loop0p2 /loop1
mount | grep loop0
# unmount
umount /loop0
umount /loop1

Or without the losetup command:

truncate -s 1g /tmp/loop.img2
mkdir /loop2
# must first format the file
mkfs -t xfs /tmp/loop.img2
# -o loop: mount via a loop device
# associates /dev/loop1 with the file
# no need for the losetup command
mount -o loop /tmp/loop.img2 /loop2

# unmount
umount /loop2

So let's see:

// see first unused loop device
// losetup -f
/dev/loop2

// losetup -a
/dev/loop0: [64768]:16777282 (/tmp/loop.img)
/dev/loop1: [64768]:16777842 (/tmp/loop.img2)

// lsblk, only show loop part
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 1G 0 loop
├─loop0p1 259:0 0 200M 0 loop /loop0
└─loop0p2 259:1 0 400M 0 loop /loop1
loop1 7:1 0 1G 0 loop /loop2

// blkid
/dev/loop0p1: UUID="293db313-904b-40a7-9e05-de2ea6f7e12a" TYPE="ext4"
/dev/loop0p2: UUID="2d382bc4-8323-45d1-927b-17bbd1e8880d" TYPE="xfs"
/dev/loop0: PTTYPE="dos"
/dev/loop1: UUID="40282d5f-1d4e-495c-a480-78470237f8e2" TYPE="xfs"

RAID Partitioning

This is software RAID: combining multiple disks to improve performance and/or reliability; we can have striping, redundancy and other features.

# level1: mirror
# level5: redundancy
mdadm --create --verbose /dev/md/myraid --level=5 --raid-devices=3 /dev/sdd{1,2,3}
mkfs -t ext4 /dev/md/myraid
mkdir /mydir && mount /dev/md/myraid /mydir
# check
lsblk -o name,size,fstype,type
# cancel
umount /mydir
mdadm --stop /dev/md/myraid

SSHFS

A filesystem client based on SSH. FUSE-based, in user space, unprivileged. It communicates securely over SSH, using the standard SSH port (you can specify other ports).

Similar to an NFS mount, but implemented over SSH. This is quite convenient: for example, when you need to run programs on a bastion host, you can mount the code repo from the develop host into the bastion, so no scp is needed.

# on client install
sudo apt-get install sshfs
# or
yum install -y fuse-sshfs

mkdir /sshdir
# mount remote root home directory
# the connection may be flaky
sshfs [user]@<hostname or ip>:[dir] /sshdir [options]

# check on /sshdir side host
cd /sshdir && df -h .
mount | grep ssh

# unmount
fusermount -u /sshdir
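
A concrete invocation sketch (user, host, path and port are placeholders):

# reconnect options help with flaky links
sshfs -p 22 -o reconnect,ServerAliveInterval=15 dev@devhost:/home/dev/repo /sshdir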

The takeaways from this course are the boot stages, kernel upgrades, and the kinds and usage of Linux logging, especially journald, which comes from systemd. I supplemented this with content from other articles, including loginctl.

Boot

Linux booting process:

  • firmware stage (BIOS or UEFI)
  • boot loader stage (grub2)
  • kernel stage (ramdisk -> root filesystem)
  • initialization stage (systemd)

The /boot directory is about the kernel. The grub configuration file:

# provides the boot menu and executes the kernel
# -N: show line numbers
sudo less -N /boot/grub2/grub.cfg

The course also covered how to customize grub's kernel menu to add a custom boot entry; the grub menu is displayed on the boot screen. You can also edit the kernel line at boot time to enter the systemd rescue or emergency target, refer here:

Kernel

Upgrade kernel version for CentOS:

# uname -r
sudo yum list installed kernel-*
# see new kernel available
sudo yum list available kernel

# update
sudo yum update -y kernel
# then reboot and check kernel version
sudo reboot

The above steps usually do not help much because the official repo lacks the latest version. We need a third-party repo; see this article for help: How to Upgrade Linux Kernel in CentOS 7.

Because we use an SSH session to upgrade the kernel, we are not able to select the kernel version on the boot menu; we can do it by configuring grub2:

# check kernel index list, index starts from 0
sudo awk -F\' '$1=="menuentry " {print $2}' /etc/grub2.cfg

CentOS Linux (5.4.125-1.el7.elrepo.x86_64) 7 (Core)
CentOS Linux (3.10.0-1160.31.1.el7.x86_64) 7 (Core)
CentOS Linux (3.10.0-1160.25.1.el7.x86_64) 7 (Core)
CentOS Linux (0-rescue-adbe471f40421bfbf841690042db23fd) 7 (Core)

Switch to 5.4.125-1 version:

# set kernel index 0
sudo grub2-set-default 0
# reconfig boot loader code
sudo grub2-mkconfig -o /boot/grub2/grub.cfg
sudo reboot

After rebooting, check the kernel version:

uname -r

Switching back to the old kernel version is easy:

# set kernel index 2, see above index list
sudo grub2-set-default 2
# reconfig boot loader code
sudo grub2-mkconfig -o /boot/grub2/grub.cfg
sudo reboot

Note that the kernel version and the OS version are different things; for example, to check the CentOS OS version:

cat /etc/centos-release
# or
rpm -qa centos-release

Linux Logging

Linux has 2 logging systems; they can run in parallel, or you can use the journal alone.

  • rsyslog (persistent logs, can log remotely)
  • journald (nonpersistent by default)

syslog vs rsyslog vs syslog-ng: Basically, they are all the same, in the way they all permit the logging of data from different types of systems in a central repository, each project trying to improve the previous one with more reliability and functionalities.

Different logs serve different purposes: some for failed jobs, some for cron jobs, etc. rsyslog is a daemon:

systemctl status rsyslog

/etc/rsyslog.conf is the configuration file; see the section under #### RULES ####. For example, anything besides mail, authpriv and cron is logged to /var/log/messages. In the file below, cron.* means cron messages of all priorities are logged (debug, info, notice, warn, err, crit, alert, emerg); cron.warn would log warn and above:

#### RULES ####

# Log all kernel messages to the console.
# Logging much else clutters up the screen.
#kern.* /dev/console

# Log anything (except mail) of level info or higher.
# Don't log private authentication messages!
*.info;mail.none;authpriv.none;cron.none /var/log/messages

# The authpriv file has restricted access.
authpriv.* /var/log/secure

# Log all the mail messages in one place.
mail.* -/var/log/maillog


# Log cron stuff
cron.* /var/log/cron

# Everybody gets emergency messages
*.emerg :omusrmsg:*

# Save news errors of level crit and higher in a special file.
uucp,news.crit /var/log/spooler

# Save boot messages also to boot.log
local7.* /var/log/boot.log

The log rotation config is in /etc/logrotate.conf, and there is a cron job for rotation: /etc/cron.daily/logrotate.

If you want to log message to system log file, use logger command:

# you will see it in /var/log/messages
logger "hello"
# -p: priority
logger -p local4.info "This is an info message from local 4"

How to search and view rsyslog logs: see this article. Linux uses a set of configuration files, directories, programs, commands and daemons to create, store and recycle these log messages. The default location for log files in Linux is /var/log.

If you check with ls -ltr -S /var/log, the lastlog file may appear huge, way bigger than the disk space; it is a sparse file.

At the heart of the logging mechanism is the rsyslog daemon. This service is responsible for listening to log messages from different parts of a Linux system and routing the message to an appropriate log file in the /var/log directory. It can also forward log messages to another Linux server.

/var/log/messages can be viewed normally with vim and similar tools. The who and last commands actually use /var/run/utmp and /var/run/wtmp.

journalctl has the same logs as rsyslogd; from here, a persistent journal can replace rsyslogd.

# persist journald by making a dir
sudo mkdir -p /var/log/journal
sudo systemctl restart systemd-journald
# you will see journal records here
ls -l /var/log/journal

Or enable in /etc/systemd/journald.conf, set Storage=persistent.

You can specify date ranges:

journalctl --since "2020-12-11 15:44:32"
# a time left off defaults to 00:00:00 midnight
journalctl --since "2020-10-01" --until "2020-10-03 03:00"
journalctl --since yesterday
journalctl --since 09:00 --until "1 hour ago"

Some useful commands:

# list boots
journalctl --list-boots
# check last boot journal
# -b -1: last boot
sudo reboot
sudo journalctl -b -1

# combine
journalctl -u nginx.service -u php-fpm.service --since today

# pid, uid, gid
journalctl _PID=8088
journalctl _UID=33 --since today

# -F: show available values
journalctl -F _GID
journalctl -F _UID

# check executable
journalctl /usr/bin/bash

# display only kernel message
journalctl -k

# by priority, can use number or name
#0: emerg
#1: alert
#2: crit
#3: err
#4: warning
#5: notice
#6: info
#7: debug
journalctl -p err -b

# the same as tail -n/-f
journalctl -n 10
journalctl -f

# disk usage
journalctl --disk-usage
# shrink
sudo journalctl --vacuum-size=1G
sudo journalctl --vacuum-time=1years

You can use the right arrow key to see the full entry if it is too long.

# print all on stdout, no pager with less
journalctl --no-pager

# -o output format
#cat: Displays only the message field itself.
#export: A binary format suitable for transferring or backing up.
#json: Standard JSON with one entry per line.
#json-pretty: JSON formatted for better human-readability
#json-sse: JSON formatted output wrapped to make add server-sent event compatible
#short: The default syslog style output
#short-iso: The default format augmented to show ISO 8601 wallclock timestamps.
#short-monotonic: The default format with monotonic timestamps.
#short-precise: The default format with microsecond precision
#verbose: Shows every journal field available for the entry, including those usually hidden internally.
journalctl -b -u nginx -o json
journalctl -b -u nginx -o json-pretty

Linux Session

Other capabilities, like log management and user sessions, are handled by separate daemons and management utilities (journald/journalctl and logind/loginctl respectively).

Get info about a user and the processes they are running:

# list sessions
loginctl list-sessions

# session status
# you can see the user action history
loginctl session-status [session id]
loginctl show-session [session id]
loginctl kill-session [session id]

Version

Different versions may have different syntax and options in unit files, so check the version first:

systemctl --version

To see systemd service unit configuration:

man 5 systemd.service

Systemd

How To Use Systemctl to Manage Systemd Services and Units

History:

SysV Init -> Upstart -> Systemd

systemd, the system and service manager, is an init system used to bootstrap the user space and to manage system processes after boot. Use the systemctl command to manage services on a systemd-enabled system.

The fundamental purpose of an init system is to initialize the components that must be started after the Linux kernel is booted (traditionally known as “userland” components). The init system is also used to manage services and daemons for the server at any point while the system is running.

Main commands, using nginx as the example. Some commands need sudo if you are a non-root user, since they affect the state of the operating system; you can leave off the .service suffix in the commands.

# start on boot
# This hooks it up to a certain boot “target”
# causing it to be triggered when that target is started.
sudo systemctl enable nginx.service
sudo systemctl disable nginx.service

# start and stop
sudo systemctl start nginx.service
sudo systemctl stop nginx.service

# when change the configuration of service
# To attempt to reload the service without
# interrupting normal functionality
sudo systemctl reload nginx.service
# if reload is not available, restart instead
sudo systemctl reload-or-restart nginx.service

# status overview, you can see:
# unit file path
# drop-in
# enabled or disabled at vendor and custom
# up time
# Cgroup
systemctl status nginx.service

# find overridden config files for all units
# check unit drop in config snippets
sudo systemd-delta

enable will create a soft link into the location on disk where systemd looks for autostart files (usually /etc/systemd/system/some_target.target.wants).

The exit code can be used for shell script:

# may need sudo
systemctl is-active nginx.service
systemctl is-enabled nginx.service
systemctl is-failed nginx.service

Check system states managed by systemd:

# list enabled units only
systemctl [list-units]

# list all of the units that systemd has loaded or attempted to load into memory
# include not currently active
systemctl list-units --all [--state=active|inactive|failed] [--type=service|target]
# list failed daemons, useful when a VM rebooted but some daemons did not come up
systemctl list-units --state failed

# show every available units installed on the system
systemctl list-unit-files

list-unit-files state column: The state will usually be enabled, disabled, static, or masked. In this context, static means that the unit file does not contain an [Install] section, which is used to enable a unit. As such, these units cannot be enabled. Usually, this means that the unit performs a one-off action or is used only as a dependency of another unit and should not be run by itself.
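
For example, inspecting unit file states (a sketch; dbus is typically a static unit on systemd distros):

systemctl list-unit-files --type=service | head
# often prints "static"
systemctl is-enabled dbus.service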

# after mask, cannot enable or start service
sudo systemctl mask nginx.service
sudo systemctl unmask nginx.service

To see the full content and path of a unit file: vanilla unit files (don't touch them; override them if needed in other places, for example /etc/systemd/system) are in /usr/lib/systemd/system, and customized ones are in the /etc/systemd/system folder:

# show unit file content and path
# if has overriding snippet, will show them as well
systemctl cat nginx.service

# list dependencies
systemctl list-dependencies nginx.service [--all] [--reverse] [--before] [--after]

# low-level detail of unit
# all key=values
# -p: display a single property
systemctl show nginx.service [-p ExecStart]

# edit unit file
sudo systemctl edit --full nginx.service
# append unit file snippet
sudo systemctl edit nginx.service
# then reload to pick up changes
sudo systemctl daemon-reload

You can also further override by creating override.conf file in /etc/systemd/system/xxx.service.d folder, there are some principles for overriding, see here.
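
A sketch of the drop-in approach (nginx and RestartSec=5 are just example values):

sudo mkdir -p /etc/systemd/system/nginx.service.d
# put the snippet in /etc/systemd/system/nginx.service.d/override.conf:
#   [Service]
#   RestartSec=5
sudo systemctl daemon-reload
# shows the base unit plus the snippet
systemctl cat nginx.service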

In systemd, service and other unit files can be tied to a target.

Targets are special unit files that describe a system state or synchronization point. Like other units, the files that define targets can be identified by their suffix, which in this case is .target. Targets do not do much by themselves, but are instead used to group other units together.

This can be used in order to bring the system to certain states, much like other init systems use runlevels (the system's runlevel can still be displayed).

For instance, there is a swap.target that is used to indicate that swap is ready for use. Units that are part of this process can sync with this target by indicating in their configuration that they are WantedBy= or RequiredBy= the swap.target. Units that require swap to be available can specify this condition using the Wants=, Requires=, and After= specifications to indicate the nature of their relationship.
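
For instance, a hypothetical unit that must not start until swap is available could declare (a sketch):

[Unit]
Description=Hypothetical service that needs swap
Requires=swap.target
After=swap.target

[Install]
WantedBy=multi-user.target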

# all available targets
systemctl list-unit-files --type=target

# current default target
systemctl get-default

# set default target
sudo systemctl set-default multi-user.target
sudo systemctl set-default runlevel3.target

# see what units are tied to a target
systemctl list-dependencies multi-user.target

Unlike runlevels, multiple targets can be active at one time. An active target indicates that systemd has attempted to start all of the units tied to the target and has not tried to tear them down again.

# show all active targets
systemctl list-units --type=target

This is similar to changing the runlevel in other init systems. For instance, if you are operating in a graphical environment with graphical.target active, you can shut down the graphical system and put the system into a multi-user command line state by isolating the multi-user.target. Since graphical.target depends on multi-user.target but not the other way around, all of the graphical units will be stopped.

# check units that will be kept alive
systemctl list-dependencies multi-user.target
# transition
sudo systemctl isolate multi-user.target

Stopping and rebooting the system; note that shutdown, reboot, poweroff are actually softlinks to systemctl!

sudo systemctl poweroff
sudo systemctl reboot
# boot into rescue mode (single-user)
sudo systemctl rescue

These all alert logged in users that the event is occurring.

Interestingly, these are all softlinks to the same binary, yet invoking them has different effects; the trick is using $0 to decide the behavior, see here.

lrwxrwxrwx. 1 root root          16 Jun  6  2020 halt -> ../bin/systemctl
lrwxrwxrwx. 1 root root          16 Jun  6  2020 poweroff -> ../bin/systemctl
lrwxrwxrwx. 1 root root          16 Jun  6  2020 reboot -> ../bin/systemctl
lrwxrwxrwx. 1 root root          16 Jun  6  2020 runlevel -> ../bin/systemctl
lrwxrwxrwx. 1 root root          16 Jun  6  2020 shutdown -> ../bin/systemctl
lrwxrwxrwx. 1 root root          16 Jun  6  2020 telinit -> ../bin/systemctl
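
The same trick in a plain shell script would look like this sketch:

#!/bin/sh
# dispatch on the name the program was invoked as
case "$(basename "$0")" in
  reboot)   echo "would reboot" ;;
  poweroff) echo "would power off" ;;
  *)        echo "unknown alias: $0" ;;
esac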

Systemd Journal

How To Use Journalctl to View and Manipulate Systemd Logs

The journal is implemented with the journald daemon, which handles all of the messages produced by the kernel, initrd, services, etc.

This covers the usage of the systemd service journal; for a more detailed introduction to the journal, see my blog <<Linux System Admin>>.

Set the time zone used for timestamps:

# available time zone
timedatectl list-timezones
# set time zone
sudo timedatectl set-timezone America/Los_Angeles
# check
timedatectl status

# or using UTC
# --utc: show timestamps in UTC
journalctl --utc

To see log of a specific service:

# full log
journalctl -e

# -u: unit, but some log may be missing in ExecStartPre
# -b: limit to current boot
# -e: show ending logs
# -r: show in reverse order
# -f: tail log
journalctl -u nginx.service [--since] [--until] [-e]

# combine related units
# good for debug
journalctl -u nginx.service -u php-fpm.service --since today

Service Unit File

Understanding Systemd Units and Unit Files

Here only focus on .service unit. Systemd manages a broad range of resources, such as .target, .socket, .device, .mount, .swap, etc. They may have different section blocks.

Section block names are case-sensitive. Non-standard section names [X-name] have an X- prefix.

[Unit]
#This is generally used for defining metadata for the unit
# and configuring the relationship of the unit to other units
Description=
Documentation=

Requires=
BindsTo=

Wants=
Before=
After=

Conflicts=
Conditionxxx=
Assertxxx=

[Service]
# provide configuration that is only applicable for services
Environment=
Type=simple(default)|forking|oneshot|dbus|notify|idle

PIDFile=

# deprecated syntax
# if true, User and Group only applied to ExecStart
PermissionsStartOnly=true

ExecStartPre=

ExecStart=
ExecStartPost=
ExecReload=
ExecStop=
ExecStopPost=

User=
Group=

RestartSec=
Restart=always|on-success|on-failure|on-abnormal|on-abort|on-watchdog
TimeoutSec=

[Install]
# only units that can be enabled will have this section
WantedBy=
RequiredBy=

Alias=
Also=
DefaultInstance=

There are more key=value pairs than these; add new ones as you encounter them.

Systemd does not use a shell to execute commands and does not perform $PATH lookup, so, for example in ExecStartPre, you must specify a shell context to run a shell command:

ExecStartPre=/bin/sh -c 'pgrep process_name > /var/run/process_name.pid'

And because there is no $PATH lookup, you must use the absolute path to the binary or executable.

Setting PermissionsStartOnly=true means that User & Group are only applied to ExecStart. Switching to the new syntax gives:

ExecStartPre=+/bin/bash -c '/bin/journalctl -b -u ntpdate | /bin/grep -q -e "adjust time server" -e "step time server"'
ExecStartPre=+/bin/mkdir -p /path/to/somedir
ExecStartPre=-/usr/bin/<command that may fail>
ExecStart=/path/to/myservice
ExecStop=+/bin/kill -INT ${MAINPID}
ExecReload=+/bin/kill -INT ${MAINPID} && /path/to/myservice

A command prefixed with + is executed with higher privilege, as root. The - prefix means that if the command fails, the remaining exec steps are not interrupted; otherwise the unit is considered failed.

Also note that each ExecStartPre runs in isolation.
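
A sketch of what isolation means in practice: shell state does not carry over between Exec* lines.

# each Exec* line is a separate process; FOO dies with its shell
ExecStartPre=/bin/sh -c 'FOO=bar'
# prints "unset"
ExecStart=/bin/sh -c 'echo "${FOO:-unset}"'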

If you need to check the log of the latest daemon start, the following shows all logs instead of only one unit's; I found that using -u to specify the unit can miss some logs:

# use `b` and `f` to move page
# -e: tail
# -x: more details
journalctl -ex

Systemd with multiple ExecStart: the service Type determines how many ExecStart= lines are allowed.
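
As far as I know, only Type=oneshot permits more than one ExecStart=; a sketch:

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/echo step-1
ExecStart=/bin/echo step-2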

The project at the time used cloud-init to configure the local system after boot, replacing configuration previously done with Ansible (it can also do some more basic configuration before ansible, such as setting up network and SSH), bringing the machine to a usable state after boot. Like Ansible, it is a configuration management tool; Ansible is push-based, cloud-init is pull-based.

LXD/LXC container can be used with cloud-init.

Cloud-init

cloud-init official document, User data config example.

This passage explains it well: Cloud images are operating system templates and every instance starts out as an identical clone of every other instance. It is the user data that gives every cloud instance its personality, and cloud-init is the tool that applies user data to your instances automatically.

To use cloud-init, need to install packages, for example in CentOS:

yum install -y cloud-init
# you can see these services
systemctl cat cloud-init-local
systemctl cat cloud-init
systemctl cat cloud-config.service
systemctl cat cloud-final.service

See this IBM post for how to install cloud-init on CentOS.

All the major cloud providers support cloud-init. In infra as code, cloud-init can be set up by passing a cloud-init.tpl metadata file into user-data in the metadata of a Terraform instance resource. When the instance boots, it configures itself accordingly.

data "template_file" "setup" {
  # template file
  template = file("${path.module}/cloud-init.tpl")
  # pass vars for rendering
  vars = {
    foo        = "/dev/sdb"
    foo_config = base64encode(data.template_file.foo_config.rendered)
  }
}

# instance definition
resource "google_compute_instance" "backup" {
  # pass it in
  metadata = {
    user-data = data.template_file.setup.rendered
  }
}

If you are working on gcloud, go to the instance detail page; Custom metadata -> user data will display the rendered script.

The key point is how to write this cloud-init.tpl metadata file. Notice that it must include this line at the very beginning, with no space after #:

#cloud-config
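
A minimal user-data sketch (package_update, packages and runcmd are standard cloud-config modules; the values are placeholders):

#cloud-config
package_update: true
packages:
  - htop
runcmd:
  - echo "hello from cloud-init" >> /var/log/hello.log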

Debug Cloud-init

Troubleshooting VM provisioning with cloud-init

The log of cloud-init is in /var/log/cloud-init.log. It will show you errors if something failed.

Last time I also hit a problem where the #cloud-config format was wrong, so cloud-init could not parse the file and the user metadata was never executed. In this case the log file is not very telling; you need to check the /var/log/boot.log file, where comparison revealed this error:

Unhandled non-multipart (text/x-not-multipart) userdata ...

This means the format was wrong. The problem was stuck for several hours at the time because nobody noticed this spot.

Others

When constructing a user's password, a hashed value is needed: openssl passwd. See Why is the output of "openssl passwd" different each time?

# -1: MD5
openssl passwd -1
# -salt: add salt
openssl passwd -1 -salt yoursalt
# from stdin
echo 'admin' | openssl passwd -1 -stdin -salt yoursalt

Release Tag

When to use a tag: every time you push to production, tag the release. A release tag points to a specific commit in history.

# fetch all remote tags
git fetch --tags
# list all tags
git tag [--list]
# delete tag
git tag -d <tag name>
# Delete a tag from the server with push tags
git push --delete origin <tag name>

# in your current branch tag latest commit
# create annotated tag, add more info
# will open editor
git tag -a 0.1.0

# tag specific commit
git tag -a 0.1.0 <commit hash>

# push tag to remote
git push origin <branch> --tags

This git command is very useful:

## can see commit detail, tag annotation, and so on.
git show [tag|or anything]

References: what is a tag (annotated vs non-annotated); move a tag on a branch to a different commit.

Release Branch

Generally you only need tags for releases. But when you need to make changes to a production release without affecting master, you need a release branch, for example to make a hotfix. A release branch can be updated with new commits.

Use cases for release branches:

  • Manual QA.
  • Long running releases.
  • On demand hot fixes.

The workflow is like:

# we have some hot fixes on the release branch, then create a release tag on it for production
# finally merge into master
master --- + --- + --- + --- + --------------------- + --->
                              \                     /
                               \  hotfix    hotfix /
                                \---- + ------ + -/
                             release branch (release tag)

[x] This is why we have a double-commit back to the master branch: we put the release tags on the master branch.

You can also create release branch on the fly from release tags, the workflow:

# on master branch
# checkout to a tag (detached HEAD)
git checkout v1.1
# create release branch on release tag
git checkout -b rb1.1

# or in one command to checkout a tag to a new branch
git checkout tags/v1.1 -b rb1.1

# make fix
<hot fix>

# commit
git commit -m "hotfix"
git tag -a v1.1.1 -m "hotfix"
# merge
git checkout master
# or rebase/squash or fast-forward
git merge rb1.1 -m "merged hotfix"
# delete branch, the tag is still there, because you merged to master
git branch -d rb1.1

The source code management tool currently in use is gitlab. Besides regular git functions, it runs extra CI/CD operations on every merge request. Here are notes on the related syntax, summarized from the course Continuous Delivery with GitLab.

CI: code and feature integration, combining updates into the existing code base and testing with automation. CD: delivery can mean deployment, the process of building and deploying the app, for example uploading the artifact to somewhere customers can download it.

Gitlab uses pipelines to do both CI and CD, defined in the .gitlab-ci.yml file on your branch.

Tips

[x] To navigate the source code in a gitlab repo, try launching the Web IDE, which shows a structure tree of the files on the left side.
[x] Use a snippet to share a code or file block for issue solving, the same as gPaste.
[x] The To-do list shows events where someone mentions you.
[x] A milestone is a goal that needs to be tracked.
[x] A merge request (pull request in github), once merged, can auto-close the issue, depending on settings.

Setup self-managed Gitlab

You can experiment with gitlab community edition locally by bringing up a gitlab server through Vagrant. For example, in this Vagrantfile there are 2 VMs; one VM is for configuring the docker gitlab runner:

# -*- mode: ruby -*-
# vi: set ft=ruby :

# server static ip
GITLAB_IP = "192.168.50.10"
# worker static ip
GITLAB_RUNNER_IP = "192.168.50.11"

Vagrant.configure("2") do |config|
  # gitlab server VM
  config.vm.define "server", primary: true do |server|
    server.vm.hostname = "gitlab"
    server.vm.box = "bento/ubuntu-16.04"
    ## private network
    server.vm.network "private_network", ip: GITLAB_IP

    server.vm.provider "virtualbox" do |v|
      v.memory = 2048
      v.cpus = 2
    end
  end

  # gitlab runner VM with docker installed
  config.vm.define "runner", primary: true do |runner|
    runner.vm.hostname = "runner"
    runner.vm.box = "bento/ubuntu-16.04"
    ## private network
    runner.vm.network "private_network", ip: GITLAB_RUNNER_IP

    runner.vm.provider "virtualbox" do |v|
      v.memory = 1024
      v.cpus = 2
    end
  end
end

Vagrant quick commands:

vagrant up
# ssh to server
vagrant ssh [server]
# ssh to worker
vagrant ssh runner
# destroy the VMs
vagrant destroy -f

Install the packages, referencing the steps from there; note it uses enterprise edition, while we use community edition.

# Update package manager and install prerequisites
sudo apt-get update
sudo apt-get install -y curl openssh-server ca-certificates
# we don't need email in this case, so skip it

# Set up gitlab apt repository
curl https://packages.gitlab.com/install/repositories/gitlab/gitlab-ce/script.deb.sh | sudo bash

# Install gitlab
# this is the IP address in vagrant file
# gitlab-ce is community edition
sudo EXTERNAL_URL="http://192.168.50.10" apt-get install gitlab-ce

After install, go to the browser and hit http://192.168.50.10, reset the root password and log in as root with the reset password.

Experiment

[1] Create a new project hello world (you can also create it when setting up a jenkins pipeline). Use the root user to create a private project, and check the add README option.

[2] Create an admin user, so the root user is no longer needed. Grant the new user admin, and edit the password; it will be used as a temporary password next time you log in. Sign out and sign in again with the new admin user.

[3] Set up SSH for your user. The same process as setting up SSH on github: go to settings -> SSH keys.

[4] Create a new project under the admin user, set to private scope.

[5] Create another vagrant VM as a gitlab client, to avoid messing up the system git global configuration; then vagrant ssh and git clone the project.

Go to the project dashboard; in the left menu: the CI/CD tab is what we will focus on; the Operations tab is where gitlab integrates other systems in your stack, for example kubernetes; Settings -> CI/CD is about configuration.

CI/CD

[x] SonarQube, code quality testing tool.

.gitlab-ci.yml accomplishes both CI and CD through its stage design. With different conditions, different CI/CD can be applied to specific branches. There is one pipeline before and one after each MR, targeting the branch state before and after the MR. A jenkins pipeline double-commit to the master branch was set up, because changes to gitlab-ci.yml are only checked in to master, so changes must be reflected in master.

CI test levels; each of them is a stage in the pipeline, which should fail early and fail often:

  • syntax and linting
  • unit and integration
  • acceptance

A Gitlab runner is similar to jenkins; it supports running on a VM, a bare metal system, a docker container or kubernetes. Here we use docker, so install docker first; you can reference here.

Here we install docker on the gitlab server VM. [x] You can spin up another VM with 2GB, install docker and run the gitlab runner container there, but make sure the VMs can ping each other, just like what I did in the Vagrantfile.

This docker install is for Ubuntu; for CentOS or other linux distros please see the different ways to install docker:

sudo apt-get update

sudo apt-get install -y \
apt-transport-https \
ca-certificates \
curl \
gnupg-agent \
software-properties-common

# add docker's official GPG key
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -

# add stable repository
sudo add-apt-repository \
"deb [arch=amd64] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) \
stable"

# install docker
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io

# verify install good
sudo docker run hello-world

Install the docker gitlab runner; the reference is here:

# name is gitlab-runner
# -v: will create folder automatically
sudo docker run -d --name gitlab-runner --restart always \
-v /srv/gitlab-runner/config:/etc/gitlab-runner \
-v /var/run/docker.sock:/var/run/docker.sock \
gitlab/gitlab-runner:latest

Then register the runner to your gitlab project: go to the gitlab project Settings -> CI/CD -> Runners, expand to see the registration token.

# later gitlab-runner is command
# register is argument
sudo docker exec -it gitlab-runner gitlab-runner register

# command prompt:
Enter the GitLab instance URL (for example, https://gitlab.com/):
# from runner expand
http://192.168.50.10/
Enter the registration token:
# from runner expand
K5G9S5e5wmcdoANUGLF4
Enter a description for the runner:
[5922b65a9261]: docker
Enter tags for the runner (comma-separated):
# gitlab-ci will refer this tag
docker-tag
Registering runner... succeeded runner=K5G9S5e5
Enter an executor: docker, docker-ssh, virtualbox, docker+machine, docker-ssh+machine, custom, parallels, shell, ssh, kubernetes:
docker
Enter the default Docker image (for example, ruby:2.6):
# this can be overrided later
alpine:latest

Then reload the gitlab runner page; you will see the registered runner there. Click the runner name to see details. This runner is locked to this project, but you can alter that (the edit icon right near the runner).

Create .gitlab-ci.yml in your repo to specify the pipeline. If you create it in the web IDE, you can choose a template for it, for example the bash template; for more advanced syntax please see the gitlab-ci doc:

---
# will override the image alpine:latest above
image: busybox:latest

# global variables, used as ${CHART_NAME}
variables:
  CHART_NAME: xxxx
  VERSION_NUM: xxxx

# specify order or skip some stages
stages:
  - test
  - build
  - deploy

before_script:
  - echo "Before script section"
  - echo "For example you might run an update here or install a build dependency"
  - echo "Or perhaps you might print out some debugging details"

after_script:
  - echo "After script section"
  - echo "For example you might do some cleanup here"

# execute in order if no stages list
build1:
  # tags means run on the docker runner I installed above, tagged as `docker-tag`
  tags:
    - docker-tag
  stage: build
  script:
    - echo "Do your build here"

test1:
  tags:
    - docker-tag
  stage: test
  script:
    - echo "Do a test here"
    - echo "For example run a test suite"

test2:
  tags:
    - docker-tag
  stage: test
  script:
    - echo "Do another parallel test here"
    - echo "For example run a lint test"

deploy1:
  tags:
    - docker-tag
  stage: deploy
  script:
    - echo "Do your deploy here"

On the Pipeline page, CI Lint is the tool to edit and validate the .gitlab-ci yaml file syntax. You can also use Settings -> CI/CD -> Environment variables, expanded, to set the env variables.

[x] Where is the run-dev-check.sh script hosted? It is git cloned from another repo.

script:
  - git clone -v $CLOUDSIMPLE_CI_REPO_URL
  - ci-cd/common-jobs/run-dev-check.sh

Parameter expansion

This is the common way to handle string and number data in scripts. It can replace external programs like sed and cut, speeding things up significantly. As our experience with scripting grows, the ability to effectively manipulate strings and numbers will prove extremely valuable.

It is very helpful for checking variable values (e.g. whether a parameter starts with a dash, otherwise treat it as an argument) and for extraction (e.g. stripping the suffix from a filename). This manual covers all the cases, though it may be hard to digest; try things out and it becomes clear: Shell parameter expansion. A pipeline can achieve the same effect, but it is more cumbersome.

Here is a decent summary in Chinese: Shell扩展(Shell Expansions)-参数扩展(Shell Parameter Expansion). It has one mistake though: it is $$ that holds the PID of the current shell.

Note the difference between null and unset variables. set -u makes it an error to use an undefined (unset) variable. null here just means empty, e.g. var=, where the variable exists but has no value:

var=
# empty
echo $var
# true
[[ -z $var ]]
# false
[[ -n $var ]]

set -u
unset var
# error: unbound variable
echo $var

Bash’s various forms of parameter expansion can also distinguish between unset and null values:

# `w` can be literal or another variable

# if a is unset, value of w is used
a=${a-w}
# if a is unset or null(empty), value of w is used
# for example, used for positional parameter passed from outside
a=${a:-w}

# if a is unset or null(empty), value of w is assigned to a
# cannot be used to assign positional parameters, e.g. ${3:=hello}
${a:=w}

# if a is unset or null(empty), w is written to stderr
# and script exits with err
a=${a:?w}

Expansions that return variable names:

# return variable names starting with prefix
# these 2 are identical
${!prefix*}
${!prefix@}
# return all BASH prefixed variables
echo ${!BASH*}

Indirect parameter expansion:

parameter="var"
var="hello"

# echo is hello
echo ${!parameter}

Substring expansion, with examples from the Shell parameter expansion link above:

$ string=01234567890abcdefgh
# 7 is start index
$ echo ${string:7}
7890abcdefgh
# 0 is the number of chars to take
$ echo ${string:7:0}

$ echo ${string:7:2}
78
$ echo ${string:7:-2}
7890abcdef
# the space is required, to avoid confusion with :-
$ echo ${string: -7}
bcdefgh
$ echo ${string: -7:0}

$ echo ${string: -7:2}
bc
$ echo ${string: -7:-2}
bcdef

# set $1 positional parameter
$ set -- 01234567890abcdefgh
$ echo ${1:7}
7890abcdefgh
$ echo ${1:7:0}

$ echo ${1:7:2}
78
$ echo ${1:7:-2}
7890abcdef
$ echo ${1: -7}
bcdefgh
$ echo ${1: -7:0}

$ echo ${1: -7:2}
bc
# -2: leave off 2 chars from the end
$ echo ${1: -7:-2}
bcdef

# array
$ array[0]=01234567890abcdefgh
$ echo ${array[0]:7}
7890abcdefgh
$ echo ${array[0]:7:0}

$ echo ${array[0]:7:2}
78
$ echo ${array[0]:7:-2}
7890abcdef
$ echo ${array[0]: -7}
bcdefgh
$ echo ${array[0]: -7:0}

$ echo ${array[0]: -7:2}
bc
$ echo ${array[0]: -7:-2}
bcdef

Other common usages, mainly string operations, especially on pathnames; much faster than extraction with cut!

# get string length
${#string}
# note, the number of positional parameters
${#@}

# check whether the first positional parameter starts with -;
# the expansion deletes the shortest matching part
# operating on ${1}, for example ${1} is --verbose
# get: -verbose
${1#-}
# same as above, but deletes the longest matching part
# (with a literal pattern like - the result is the same; see the extglob version below)
${1##-}

# get the filename from a download url
# \ is used to escape / in path
${1##*\/}
# get path of a url
${1%\/*}

This can also be used for substring containment checks:

# empty the whole string if substring target is inside
[ ! -z "${1##*target*}" ]
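
For example, in a condition (a sketch): if the substring is present, the ## expansion eats the whole string, leaving it empty.

haystack="hello-target-world"
if [ -z "${haystack##*target*}" ]; then
  echo "contains target"
fi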

Actually # or ## can be followed by an extended glob pattern, which is more powerful, for example:

${1#+(-)}
${1##+(-)}

# remove leading space or blank
# note that double [[]] wrapper!
shopt -s extglob
${1##*([[:blank:]]|[[:space:]])}
# remove trailing space or blank
${1%%*([[:blank:]]|[[:space:]])}

# remove .tar.gz or .tgz suffix
${1%%@(.tgz|.tar.gz)}

Parameter substitution, i.e. search and replace: ${parameter/pattern/string} replaces pattern in parameter with string.

foo=JPG.JPG 
# replace first match
# jpg.JPG
echo ${foo/JPG/jpg}
# replace all matches
# jpg.jpg
echo ${foo//JPG/jpg}
# replace only start
# jpg.JPG
echo ${foo/#JPG/jpg}
# replace only end
# JPG.jpg
echo ${foo/%JPG/jpg}

Case conversion is also available via parameter expansion (bash 4+).
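
A quick sketch of the case-conversion expansions, assuming bash 4 or later:

foo="Hello World"
echo "${foo^^}"   # HELLO WORLD
echo "${foo,,}"   # hello world
bar="hello"
echo "${bar^}"    # Hello: uppercase only the first character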

Shell Globs

A glob is a wildcard that is processed by the shell and expands into a list of arguments.

A glob is like a regular expression but less expressive and easier to use. Globs match file names, for example ls [0-9]?file*.txt, whereas regular expressions match text, for example ls | grep '[0-9].file.*\.txt'. Sometimes the functionality can look blurred depending on how you use it. Both can be used in if conditions ([[ == ]] for globs, [[ =~ ]] for regular expressions) and in case conditions.

In ls [0-9]?file*.txt, ls does not support regular expressions; the shell expands the glob and the result is used by ls. In grep '^A.*\.txt' *.txt, grep applies a regular expression to the content of the files whose names the shell expanded from the glob.

Shell expansion types and execution order (precedence high to low, top to bottom):

  1. brace expansion touch file{1..2}
  2. tilde expansion ls ~
  3. parameter and variable expansion ${1:1:1}, ${PATH}
  4. command substitution $() or ``
  5. word splitting
  6. arithmetic expansion echo $((11 + 22))
  7. filename expansion echo file{1..2}.*
  8. quote removal echo "$USER"

Wildcards

ls *.txt
# ? is any one char
ls file?.txt
ls file??.txt

Character Set

Note that on Linux, depending on the locale setting, this range actually does not include a:

ls /usr/sbin/[A-Z]*
# the default dictionary collation order is actually
# aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ
# so [A-Z] does not include a

The solution is to use a POSIX character class (see the next section). The standards introduced a concept called a locale, which can be adjusted to select the character set needed for a particular location. We can see the language setting of our system using the following command:

echo $LANG
# usually is
en_US.UTF-8
# so the correct way to write [A-Z]* above is
ls /usr/sbin/[[:upper:]]*

Note the difference from brace expansion {}: brace expansion generates strings, while a character set is a kind of match:

# character set
# match one of them
ls [123abc]file.txt
ls file[0-9].txt
ls file[a-z9].txt

# the result depends on the bash settings, the same issue as mentioned above,
# but here it is the LC_COLLATE value that changes it
# locale: in a bash terminal, set LC_COLLATE=C (collation)
ls file[A-G].txt
# ! is inversion: not in a-z
ls file[!a-z].txt
# put at end to match !, it is a special char
ls file[a-z!].txt
ls file[az-].txt

Character classes

# [:upper:] is the character class
# put char class in char set []
ls file[[:upper:]?].txt
ls file[[:lower:]?].txt
ls file[![:lower:][:space:]].txt

Other useful classes:

# numbers
[:digit:]
# upper and lower case
[:alpha:]
# upper and lower and numbers
[:alnum:]
# upper
[:upper:]
# lower
[:lower:]
# space, tab, carriage return, newline, vertical tab, and form feed.
# is superset of [:blank:]
[:space:]
# space and tab characters
[:blank:]

Shell globbing Options

Use the shopt command to set glob features, e.g. nullglob, extglob, etc.

shopt -s extglob: needed when using extended pattern matching operators, see here. shopt -s nocasematch: make bash match case-insensitively in case or [[ ]] conditions. This came from the Chinese edition of a bash tutorial. shopt is a bash built-in setting, unlike set, which comes from POSIX.

Extended Globs

You need to enable it first:

shopt | grep extglob
shopt -s extglob

For example, create test cases:

touch file1.png photo.jpg photo photo.png file.png photo.png.jpg
rm -f file1.png photo.jpg photo photo.png file.png photo.png.jpg
# @(match): match one or others
# match photo.jpg
ls photo@(.jpg)
ls @(file)
# photo.jpg or photo.png
ls photo@(.jpg|.png)

# ?(match): match 0 or 1
ls photo?(.jpg|.png)

# +(match): match 1 or more
ls photo+(.jpg|.png)

# *(match): match 0 or more
ls photo*(.jpg|.png)

# !(match): invert match
# all files whose names do NOT match: a file/photo prefix, anything, then a .jpg or .png suffix
!(+(file|photo)*+(.jpg|.png))

Mainly used on the command line, in if-condition pattern matching ([[ == ]]) and in case conditions; faster than regular expression matching.

Brace Expansion

Used for things like for-loop counters and creating file pre/suffixes.

# create file1.txt file2.txt file4.txt
touch file{1,2,4}.txt
touch file{1..1000}.txt

# expands from left to right, all pairwise combinations
echo {a..c}{10..15}

# specify increase step
echo {1..100..2}
# can pad heading 0
echo {0001..10..2}
echo {10..0}

echo {a..z..2}

# can be nested
echo file-201{1..9}-{0{0..9},1{1..2}}-{1..30}.{tar,bak}.{tgz,bz2}
# create folder structure
mkdir -p 20{10..20}/{01..12}

For easy copying and renaming of files:

# match before and after ,
# file file.bkp
cp -f a/long/path/file{,.bkp}

Regular Expression

Note the difference between regular expressions and globs: regular expressions match text, while globs are expanded by the shell. Places where regular expressions are used:

  • grep
  • sed
  • awk
  • if [[ =~ ]]
  • vim for search
  • less for search
  • find -regex
  • locate -regex

Regular Expression Info: POSIX regular expressions come in two flavors, basic regular expressions (BRE) and extended regular expressions (ERE). ERE syntax; note that ERE has the () {} ? + | constructs that BRE does not:

  • . matches one char
  • [ ] character set
  • \ escape single char
  • | alternation: match to occur from among a set of expressions
  • ( ) pattern grouping, for example, separate | with others: ^(AA|BB|CC)+
  • ? * + { } repetition operators
  • ^abc leading anchor
  • abc$ trailing anchor
  • [^abc] negates the pattern, ^ must appear at the beginning

Use ERE whenever possible! The support in GNU tools is:

  1. grep -E ERE, grep [-G] default is BRE
  2. sed -E ERE, sed default BRE
  3. awk only supports ERE
  4. [[ =~ ]] ERE
# match any char, zero to 3 occurrences
.{,3}
# match any char, 3 to 7 occurrences
.{3,7}
# match any char, 3 or more occurrences
.{3,}

Backreferences

A pattern stored in a buffer to be recalled later, with a limit of nine: \1 to \9:

# \1 is (ss) pattern
(ss).*\1
(ss).*\1.*\1

# radar
# opapo
^(.)(.).\2\1$

POSIX ERE does not support backreferences, but the GNU version supports them.
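
For example, with GNU grep, a sketch using the palindrome pattern above:

echo radar | grep -E '^(.)(.).\2\1$'   # matches
echo opapo | grep -E '^(.)(.).\2\1$'   # matches
echo hello | grep -E '^(.)(.).\2\1$'   # no match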

Bash Extended Regexp

Used in [[ =~ ]] in if conditions; it is simpler to write than extended globs but less efficient.

BASH_REMATCH: regular expression match, the matched text is placed into array BASH_REMATCH:

[[ abcdef =~ b.d ]]
# the matched is bcd in BASH_REMATCH[0]
echo ${BASH_REMATCH[0]}
# if no match BASH_REMATCH[0] is null

${BASH_REMATCH[0]} is very useful because it stores the matched content; if the pattern combines multiple ( ) groups, each group is stored in ${BASH_REMATCH[n]}, n = 0/1/2/3…
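
A sketch of capturing groups into the array:

# groups land in BASH_REMATCH[1], BASH_REMATCH[2], ...
if [[ "2020-12-11" =~ ^([0-9]{4})-([0-9]{2})-([0-9]{2})$ ]]; then
  echo "year=${BASH_REMATCH[1]} month=${BASH_REMATCH[2]} day=${BASH_REMATCH[3]}"
fi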

Grep EREs

Grep is "global regular expression print". Stick to grep -E 'xxxx'. grep -E -w matches only a whole word. grep -E -x matches only a whole line, the same as using anchors. grep -E -o returns only the text that matches the expression. grep -E -q is quiet mode, used to verify the existence of the search item, handy in the days before [[ =~ ]].
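
Quick demos of those flags (a sketch):

echo "a word here" | grep -E -w 'word'        # whole word
echo "exact line"  | grep -E -x 'exact line'  # whole line
echo "abc123def"   | grep -E -o '[0-9]+'      # prints 123
grep -E -q 'root' /etc/passwd && echo found   # exit status only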

Sed EREs

See my Sed blog.

Awk EREs

Awk only supports ERE by default; see my dedicated awk blog.
