How Linux Works, 2nd Edition

Great book for all Linux developers and administrators! Just note for future quick revisit!

Chapter1. The Big picture

The most effective way to understand how an operating system works is through abstraction—a fancy way of saying that you can ignore most of the details.

The kernel is software residing in memory that tells the CPU what to do. The kernel manages the hardware and acts primarily as an interface between the hardware and any running program.

Processes—the running programs that the kernel manages—collectively make up the system’s upper level, called user space.

+-------------------------------------------------------------------------+
|                                                                         |
|  User process                                                           |
|                                                                         |
|   +-------------------+   +-----------------+    +---------------+      |
|   |  GUI              |   |  Servers        |    |  Shell        |      |
|   |                   |   |                 |    |               |      |
|   +-------------------+   +-----------------+    +---------------+      |
|                                                                         |
+-------------------------------------------------------------------------+


+-------------------------------------------------------------------------+
|                                                                         |
|  Linux kernel                                                           |
|  +--------------+     +--------------------------+                      |
|  |  system calls|     |   process management     |                      |
|  +--------------+     +--------------------------+                      |
|                                                                         |
|  +---------------------+        +-------------------------------+       |
|  |  device driver      |        |   memory management           |       |
|  +---------------------+        +-------------------------------+       |
+-------------------------------------------------------------------------+

+-------------------------------------------------------------------------+
|                                                                         |
|  Hardware                                                               |
|                                                                         |
|  +-------------------+   +-------------------+   +---------------+      |
|  |   CPU             |   |    RAM            |   |  Disk         |      |
|  +-------------------+   +-------------------+   +---------------+      |
|  +---------------------+                                                |
|  |   Network           |                                                |
|  +---------------------+                                                |
+-------------------------------------------------------------------------+

There is a critical difference between the ways that the kernel and user processes run: The kernel runs in kernel mode, and the user processes run in user mode. Code running in kernel mode has unrestricted access to the processor and main memory. This is a powerful but dangerous privilege that allows a kernel process to easily crash the entire system. The area that only the kernel can access is called kernel space.

User mode, in comparison, restricts access to a (usually quite small) subset of memory and safe CPU operations. User space refers to the parts of main memory that the user processes can access. If a process makes a mistake and crashes, the consequences are limited and can be cleaned up by the kernel. This means that if your web browser crashes, it probably won’t take down the scientific computation that you’ve been running in the background for days.

Hardware

A CPU is just an operator on memory; it reads its instructions and data from the memory and writes data back out to the memory.

You’ll often hear the term state in reference to memory, processes, the kernel, and other parts of a computer system. Strictly speaking, a state is a particular arrangement of bits. For example, if you have four bits in your memory, 0110, 0001, and 1011 represent three different states.

The term image refers to a particular physical arrangement of bits.

Kernel

Nearly everything that the kernel does revolves around main memory. One of the kernel’s tasks is to split memory into many subdivisions, and it must maintain certain state information about those subdivisions at all times. Each process gets its own share of memory, and the kernel must ensure that each process keeps to its share.

The kernel is in charge of managing tasks in four general system areas: Processes. The kernel is responsible for determining which processes are allowed to use the CPU.

Memory. The kernel needs to keep track of all memory—what is currently allocated to a particular process, what might be shared between processes, and what is free.

Device drivers. The kernel acts as an interface between hardware (such as a disk) and processes. It’s usually the kernel’s job to operate the hardware.

System calls and support. Processes normally use system calls to communicate with the kernel.

The act of one process giving up control of the CPU to another process is called a context switch.

The kernel is responsible for context switching. To understand how this works, let’s think about a situation in which a process is running in user mode but its time slice is up. Here’s what happens:

The CPU (the actual hardware) interrupts the current process based on an internal timer, switches into kernel mode, and hands control back to the kernel.
The kernel records the current state of the CPU and memory, which will be essential to resuming the process that was just interrupted.
The kernel performs any tasks that might have come up during the preceding time slice (such as collecting data from input and output, or I/O, operations).
The kernel is now ready to let another process run. The kernel analyzes the list of processes that are ready to run and chooses one.
The kernel prepares the memory for this new process, and then prepares the CPU.
The kernel tells the CPU how long the time slice for the new process will last.
The kernel switches the CPU into user mode and hands control of the CPU to the process.

The context switch answers the important question of when the kernel runs. The answer is that it runs between process time slices during a context switch.

Modern CPUs include a memory management unit (MMU) that enables a memory access scheme called virtual memory. When using virtual memory, a process does not directly access the memory by its physical location in the hardware. Instead, the kernel sets up each process to act as if it had an entire machine to itself. When the process accesses some of its memory, the MMU intercepts the access and uses a memory address map to translate the memory location from the process into an actual physical memory location on the machine. The kernel must still initialize and continuously maintain and alter this memory address map. For example, during a context switch, the kernel has to change the map from the outgoing process to the incoming process.

The implementation of a memory address map is called a page table.

The kernel’s role with devices is pretty simple. A device is typically accessible only in kernel mode because improper access (such as a user process asking to turn off the power) could crash the machine. Another problem is that different devices rarely have the same programming interface, even if the devices do the same thing, such as two different network cards. Therefore, device drivers have traditionally been part of the kernel.

There are several other kinds of kernel features available to user processes. For example, system calls (or syscalls) perform specific tasks that a user process alone cannot do well or at all. For example, the acts of opening, reading, and writing files all involve system calls.

Other than init, all user processes on a Linux system start as a result of fork(), and most of the time, you also run exec() to start a new program instead of running a copy of an existing process.

User Space

As mentioned earlier, the main memory that the kernel allocates for user processes is called user space. Because a process is simply a state (or image) in memory, user space also refers to the memory for the entire collection of running processes.

Users

A user is an entity that can run processes and own files. A user is associated with a username. For example, a system could have a user named billyjoe. However, the kernel does not manage the usernames; instead, it identifies users by simple numeric identifiers called userids.

Users exist primarily to support permissions and boundaries.

In addition, as powerful as the root user is, it still runs in the operating system’s user mode, not kernel mode.

Groups are sets of users. The primary purpose of groups is to allow a user to share file access to other users in a group.

Chapter 2. Basic Commands and Directory Hierarchy

Some resources: <<UNIX for the Impatient>> <<Learning the UNIX Operating System>>

The shell is one of the most important parts of a Unix system. A shell is a program that runs commands. The shell also serves as a small programming environment.

Many important parts of the system are actually shell scripts—text files that contain a sequence of shell commands.

There are many different Unix shells, but all derive several of their features from the Bourne shell (/bin/sh), a standard shell developed at Bell Labs for early versions of Unix. Every Unix system needs the Bourne shell in order to function correctly, as you will see throughout this book.

Linux uses an enhanced version of the Bourne shell called bash or the “Bourne-again” shell. The bash shell is the default shell on most Linux distributions, and /bin/sh is normally a link to bash on a Linux system.

cat command: The command is called cat because it performs concatenation when it prints the contents of more than one file.

Pressing CTRL-D on an empty line stops the current standard input entry from the terminal (and often terminates a program). Don’t confuse this with CTRL-C, which terminates a program regardless of its input or output.

Unix filenames do not need extensions and often do not carry them.

shell globs don’t match dot files unless you explicitly use a pattern such as .*. This is why rm -rf ./* doesn’t remove hidden objects.

You can run into problems with globs because .* matches . and .. (the current and parent directories)

The shell can store temporary variables, called shell variables, containing the values of text strings. Shell variables are very useful for keeping track of values in scripts, and some shell variables control the way the shell behaves.

An environment variable is like a shell variable, but it’s not specific to the shell. All processes on Unix systems have environment variable storage. The main difference between environment and shell variables is that the operating system passes all of your shell’s environment variables to programs that the shell runs (for example, the sub-script), whereas shell variables cannot be accessed in the commands that you run.

Assign an environment variable with the shell’s export command. For example, if you’d like to make the $STUFF shell variable into an environment variable, use the following:

1 2	STUFF=123 export STUFF

PATH is a special environment variable that contains the command path (or path for short). A command path is a list of system directories that the shell searches when trying to locate a command.

resource: <<Learning the vi and Vim Editor>>

Some kill process ways.There are many types of signals. The default is TERM, or terminate.

1
2
3

kill -STOP pid
kill -CONT pid
kill -KILL pid  # the same as kill -9 pid

To see if you’ve accidentally suspended any processes on your current terminal, run the jobs command.

You can detach a process from the shell and put it in the “background” with the ampersand &. The best way to make sure that a background process doesn’t bother you is to redirect its output (and possibly input).

Some executable files have an s in the owner permissions listing instead of an x. This indicates that the executable is setuid, meaning that when you execute the program, it runs as though the file owner is the user instead of you. Many programs use this setuid bit to run as root in order to get the privileges they need to change system files. One example is the passwd program, which needs to change the /etc/passwd file.

Directories also have permissions. You can list the contents of a directory if it’s readable, but you can only access a file in a directory if the directory is executable. (One common mistake people make when setting the permissions of directories is to accidentally remove the execute permission when using absolute modes.)

You can specify a set of default permissions with the umask (user file-creation mode mask) shell command, which applies a predefined set of permissions to any new file you create. In general, use umask 022 if you want everyone to be able to see all of the files and directories that you create, and use umask 077 if you don’t. (You’ll need to put the umask command with the desired mode in one of your startup files to make your new default permissions apply to later sessions).

How to calculate the umask?

For directories, the base permissions are (rwxrwxrwx) 0777 and for files they are 0666 (rw-rw-rw).

You can simply subtract the umask from the base permissions to determine the final permission for file as follows: 666 – 022 = 644 subtract to get permissions of new file (666-022) : 644 (rw-r–r–)

You can simply subtract the umask from the base permissions to determine the final permission for directory as follows: 777 – 022 = 755 Subtract to get permissions of new directory (777-022) : 755 (rwxr-xr-x)

Another compression program in Unix is bzip2, whose compressed files end with .bz2. While marginally slower than gzip, bzip2 often compacts text files a little more, and it is therefore increasingly popular in the distribution of source code.

The bzip2 compression/decompression option for tar is j:

1 2	tar jcvf xx.bz2 file... tar jxvf xx.bz2

Linux Directory Hierarchy Essentials

Simplified overview of the hierarchy

                                                  +---------+
                                                  |   /     |
                                                  +-----+---+
                                                        |
    +-------------+-------------+-----------+----------------------+-----------+-----------+----------+
    |             |             |           |           |          |           |           |          |
    |             |             |           |           |          |           |           |          |
    |             |             |           |           |          |           |           |          |
    v             v             v           v           v          v           v           v          v
+---+----+    +---+----+   +----+---+  +----+---+  +----+---+  +---+---+  +----+---+  +----+----+ +---------+
|  /bin  |    |  /dev  |   |  /etc  |  |   /usr |  | /home  |  | /lib  |  |  /sbin |  |   /tmp  | |  /var   |
+--------+    +--------+   +--------+  +----+---+  +--------+  +-------+  +--------+  +---------+ +----+----+
                                            |                                                          |
                                            |                                                     +----+-----+
                                            |                                                     |          |
              +---------+----------+--------------------+----------+                              |          |
              |         |          |        |           |          |                              |          |
              v         v          v        v           v          v                              v          v
         +----+--+  +---+--+  +----+---+  +-+----+  +---+---+  +---+---+                     +----+---+  +---+----+
         | bin/  |  | man/ |  |  lib/  |  |local/|  | sbin/ |  | share/|                     | log/   |  |  /tmp  |
         +-------+  +------+  +--------+  +------+  +-------+  +-------+                     +--------+  +--------+

/bin Contains ready-to-run programs (also known as an executables), including most of the basic Unix commands such as ls and cp. Most of the programs in /bin are in binary format, having been created by a C compiler, but some are shell scripts in modern systems.
/dev Contains device files.
/etc This core system configuration directory contains the user password, boot, device, networking, and other setup files. Many items in /etc are specific to the machine’s hardware.
/home Holds personal directories for regular users.
/lib An abbreviation for library, this directory holds library files containing code that executables can use.
/proc Provides system statistics through a browsable directory-and-file interface. The /proc directory contains information about currently running processes as well as some kernel parameters.
/sys This directory is similar to /proc in that it provides a device and system interface.
/sbin The place for system executables. Programs in /sbin directories relate to system management.
/tmp A storage area for smaller, temporary files that you don’t care much about. If something is extremely important, don’t put it in /tmp because most distributions clear /tmp when the machine boots and some even remove its old files periodically. Also, don’t let /tmp fill up with garbage because its space is usually shared with something critical
/usr Although pronounced “user,” this subdirectory has no user files. Instead, it contains a large directory hierarchy, including the bulk of the Linux system. Many of the directory names in /usr are the same as those in the root directory (like /usr/bin and /usr/lib), and they hold the same type of files. (The reason that the root directory does not contain the complete system is primarily historic—in the past, it was to keep space requirements low for the root.)
/var The variable subdirectory, where programs record runtime information. System logging, user tracking, caches, and other files that system programs create and manage are here.
/boot Contains kernel boot loader files. These files pertain only to the very first stage of the Linux startup procedure.
/media A base attachment point for removable media such as flash drives that is found in many distributions.
/opt This may contain additional third-party software.

Kernel Location

On Linux systems, the kernel is normally in /vmlinuz or /boot/vmlinuz. A boot loader loads this file into memory and sets it in motion when the system boots.

Once the boot loader runs and sets the kernel in motion, the main kernel file is no longer used by the running system. However, you’ll find many modules that the kernel can load and unload on demand during the course of normal system operation. Called loadable kernel modules, they are located under /lib/modules.

Chapter 3. Devices

It’s important to understand how the kernel interacts with user space when presented with new devices. The udev system enables user-space programs to automatically configure and use new devices.

udev (userspace /dev) is a device manager for the Linux kernel. As the successor of devfsd and hotplug, udev primarily manages device nodes in the /dev directory.

Device Files

It is easy to manipulate most devices on a Unix system because the kernel presents many of the device I/O interfaces to user processes as files. These device files are sometimes called device nodes. Not only can a programmer use regular file operations to work with a device, but some devices are also accessible to standard programs like cat. However, not all devices or device capabilities are accessible with standard file I/O.

Device files are in the /dev directory, and running ls /dev reveals more than a few files in /dev.

if run

ls -l
brw-rw----     1 root disk 8, 1 Sep  6 08:37 sda1
crw-rw-rw-     1 root root 1, 3 Sep  6 08:37 null
prw-r--r--     1 root root    0 Mar  3 19:17 fdata
srw-rw-rw-     1 root root    0 Dec 18 07:43 log

if the first char in file mode is b, c, p, or s, the file is a device. These letters stand for block, character, pipe, and socket, respectively.

The numbers before the dates in the first two lines are the major and minor device numbers that help the kernel identify the device. Similar devices usually have the same major number.

Block device

Programs access data from a block device in fixed chunks. The sda1 in the preceding example is a disk device, a type of block device.

Character device

Character devices work with data streams. Printers directly attached to your computer are represented by character devices. It’s important to note that during character device interaction, the kernel cannot back up and reexamine the data stream after it has passed data to a device or process.

Pipe device

Named pipes are like character devices, with another process at the other end of the I/O stream instead of a kernel driver.

Socket device

Sockets are special-purpose interfaces that are frequently used for interprocess communication.

Not all devices have device files because the block and character device I/O interfaces are not appropriate in all cases. For example, network interfaces don’t have device files. It is theoretically possible to interact with a network interface using a single character device, but because it would be exceptionally difficult, the kernel uses other I/O interfaces.

The sysfs Device Path

To provide a uniform view for attached devices based on their actual hardware attributes, the Linux kernel offers the sysfs interface through a system of files and directories. The base path for devices is /sys/devices (this is a real directory!).

ls -ltr /sys/devices/

total 0
drwxr-xr-x 21 root root 0 Apr 25 23:18 virtual
drwxr-xr-x  3 root root 0 Apr 25 23:18 tracepoint
drwxr-xr-x 10 root root 0 Apr 25 23:18 system
drwxr-xr-x  3 root root 0 Apr 25 23:18 software
drwxr-xr-x  8 root root 0 Apr 25 23:18 pnp0
drwxr-xr-x  9 root root 0 Apr 25 23:18 platform
drwxr-xr-x 15 root root 0 Apr 25 23:18 pci0000:00
drwxr-xr-x  5 root root 0 Apr 25 23:18 msr
drwxr-xr-x  6 root root 0 Apr 25 23:18 LNXSYSTM:00
drwxr-xr-x  3 root root 0 Apr 25 23:18 breakpoint

The /dev file is there so that user processes can use the device, whereas the /sys/devices path is used to view information and manage the device. In /dev you can run:

udevadm info --query=all --name=/dev/null

P: /devices/virtual/mem/null
N: null
E: DEVMODE=0666
E: DEVNAME=/dev/null
E: DEVPATH=/devices/virtual/mem/null
E: MAJOR=1
E: MINOR=3
E: SUBSYSTEM=mem

this command will show the sysfs location /devices/virtual/mem/null

dd and Devices

The program dd is extremely useful when working with block and character devices. This program’s sole function is to read from an input file or stream and write to an output file or stream, possibly doing some encoding conversion on the way.

I am not using it.

Device Name Summary

Not necessarily as described below, may be some variations:

Hard Disks: /dev/sd*

Most hard disks attached to current Linux systems correspond to device names with an sd prefix, such as /dev/sda, /dev/sdb, and so on. These devices represent entire disks; the kernel makes separate device files, such as /dev/sda1 and /dev/sda2, for the partitions on a disk.

The sd portion of the name stands for SCSI disk.

Linux assigns devices to device files in the order in which its drivers encounter devices. This may cause problem when you remove one disk and insert another, because the device name changed for old disk. Most modern Linux systems use the Universally Unique Identifier (UUID) for persistent disk device access.

CD and DVD Drives: /dev/sr*

Linux recognizes most optical storage drives as the SCSI devices /dev/sr0, /dev/sr1, and so on.

PATA Hard Disks: /dev/hd*
Terminals: /dev/tty*, /dev/pts/*, and /dev/tty

Terminals are devices for moving characters between a user process and an I/O device, usually for text output to a terminal screen.

Pseudoterminal devices are emulated terminals that understand the I/O features of real terminals.

Two common terminal devices are /dev/tty1 (the first virtual console) and /dev/pts/0 (the first pseudoterminal device). The /dev/tty device is the controlling terminal of the current process.

teletypewriter, tty in shorthand

I am always confused, at least you need to know shell is the command line interpreter! What is the difference between Terminal, Console, Shell, and Command Line?

Linux has two primary display modes: text mode and an X Window System server (graphics mode, usually via a display manager). Although Linux systems traditionally booted in text mode, most distributions now use kernel parameters and interim graphical display mechanisms to completely hide text mode as the system is booting. In such cases, the system switches over to full graphics mode near the end of the boot process.

OK, skip rest of the content in Chapter 3.

Chapter 4. Disks and Filesystems

Schematic of a typical Linux disk:

Partitions are subdivisions of the whole disk. On Linux, they’re denoted with a number after the whole block device, and therefore have device names such as /dev/sda1 and /dev/sdb3.

Partitions are defined on a small area of the disk called a partition table.

The next layer after the partition is the filesystem, the database of files and directories that you’re accustomed to interacting with in user space.

To access data on a disk, the Linux kernel uses the system of layers like this:

Notice that you can work with the disk through the filesystem as well as directly through the disk devices.

Partitioning Disk Devices

You can view RedHat Doc for more information about partition

Let’s view the partition table:

parted -l

Model: ATA WDC WD3200AAJS-2 (scsi)
Disk /dev/sda: 320GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number   Start   End    Size   Type      File system    Flags
 1       1049kB  316GB  316GB  primary   ext4           boot
 2       316GB   320GB  4235MB extended
 5       316GB   320GB  4235MB logical   linux-swap(v1)

Model: FLASH Drive UT_USB20 (scsi)
Disk /dev/sdf: 4041MB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start   End     Size     File system  Name        Flags
 1      17.4kB  1000MB  1000MB                myfirst
 2      1000MB  4040MB  3040MB                mysecond

There are 2 different partition tables: MBR (msdos) and GPT (gpt). The MBR table in this example contains primary, extended, and logical partitions.

Changing Partition Tables

You can use parted command to change partition. Check /proc/partitions can get full partition information.

cat /proc/partitions

major minor  #blocks  name
 252        0  262144000 vda
 252        1    1048576 vda1
 252        2  260976640 vda2
 253        0  252706816 dm-0
 253        1    8257536 dm-1

Filesystems

The last link between the kernel and user space for disks is typically the file-system; this is what you’re accustomed to interacting with when you run commands such as ls and cd. As previously mentioned, the filesystem is a form of database; it supplies the structure to transform a simple block device into the sophisticated hierarchy of files and subdirectories that users can understand.

Filesystem Types

The Fourth Extended filesystem (ext4) is the current iteration of a line of filesystems native to Linux. The Second Extended filesystem (ext2) was a longtime default for Linux systems inspired by traditional Unix filesystems such as the Unix File System (UFS) and the Fast File System (FFS). The Third Extended filesystem (ext3) added a journal feature (a small cache outside the normal filesystem data structure) to enhance data integrity and hasten booting. The ext4 filesystem is an incremental improvement with support for larger files than ext2 or ext3 support and a greater number of subdirectories.

Create a Filesystems

Once you’re done with the partitioning process, you’re ready to create filesystems. As with partitioning, you’ll do this in user space because a user-space process can directly access and manipulate a block device.

For example, you can create an ext4 partition on /dev/sdf2

1	mkfs -t ext4 /dev/sdf2

Filesystem creation is a task that you should only need to perform after adding a new disk or repartitioning an old one. You should create a filesystem just once for each new partition that has no preexisting data (or that has data that you want to remove). Creating a new filesystem on top of an existing filesystem will effectively destroy the old data.

It turns out that mkfs is only a frontend for a series of filesystem creation programs:

ls -l /sbin/mkfs.*

-rwxr-xr-x. 1 root root 375240 Mar  7  2017 /sbin/mkfs.btrfs
-rwxr-xr-x  1 root root  37080 Jul 12  2018 /sbin/mkfs.cramfs
-rwxr-xr-x  4 root root  96384 Apr 10  2018 /sbin/mkfs.ext2
-rwxr-xr-x  4 root root  96384 Apr 10  2018 /sbin/mkfs.ext3
-rwxr-xr-x  4 root root  96384 Apr 10  2018 /sbin/mkfs.ext4
-rwxr-xr-x  1 root root  37184 Jul 12  2018 /sbin/mkfs.minix
-rwxr-xr-x. 1 root root 368504 Feb 27  2018 /sbin/mkfs.xfs

Mounting a Filesystem

On Unix, the process of attaching a filesystem is called mounting. When the system boots, the kernel reads some configuration data and mounts root (/) based on the configuration data.

When mounting a filesystem, the common terminology is mount a device on a mount point.

To see current system mount status:

mount 
...
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
/dev/mapper/rhel-root on / type xfs (rw,relatime,attr2,inode64,noquota)
mqueue on /dev/mqueue type mqueue (rw,relatime)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime)
/dev/vda1 on /boot type xfs (rw,relatime,attr2,inode64,noquota)
tmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,size=800956k,mode=700)
...

There are 3 key fields:

The filesystem’s device, such as a disk partition; where the actual file-system data resides
The filesystem type
The mount point—that is, the place in the current system’s directory hierarchy where the filesystem will be attached.

For example, to mount the Fourth Extended filesystem /dev/sdf2 on /home/extra, use this command:

1	mount -t ext4 /dev/sdf2 /home/extra

To unmount (detach) a filesystem, use the umount command:

1	umount mountpoint

Filesystem UUID

You can identify and mount filesystems by their Universally Unique Identifier (UUID), a software standard. The UUID is a type of serial number, and each one should be different.

For example, if you know the UUID of /dev/sdf2 is a9011c2b-1c03-4288-b3fe-8ba961ab0898, so you can mount it as:

1	mount UUID=a9011c2b-1c03-4288-b3fe-8ba961ab0898 /home/extra

Here no -t ext4 option, because mount know that.

To view a list of devices and the corresponding filesystems and UUIDs on your system, use the blkid (block ID) program:

blkid

/dev/sdf2: UUID="a9011c2b-1c03-4288-b3fe-8ba961ab0898" TYPE="ext4"
/dev/sda1: UUID="70ccd6e7-6ae6-44f6-812c-51aab8036d29" TYPE="ext4"
/dev/sda5: UUID="592dcfd1-58da-4769-9ea8-5f412a896980" TYPE="swap"
/dev/sde1: SEC_TYPE="msdos" UUID="3762-6138" TYPE="vfat"

For one thing, they’re the preferred way to automatically mount filesystems in /etc/fstab at boot time.

Disk Buffering, Caching, and Filesystems

Linux, like other versions of Unix, buffers writes to the disk. This means that the kernel usually doesn’t immediately write changes to filesystems when processes request changes. Instead it stores the changes in RAM until the kernel can conveniently make the actual change to the disk. This buffering system is transparent to the user and improves performance.

This is the reason why before we remove the USB, we need to unmount it in case of data lose.

When you unmount a filesystem with umount, the kernel automatically synchronizes with the disk. At any other time, you can force the kernel to write the changes in its buffer to the disk by running the sync command.

The /etc/fstab Filesystem Table

I encounter this when write /etc/fstab file with NFS when developing k8s.

/dev/mapper/rhel-root   /                       xfs     defaults        0 0
UUID=a44461e9-e1d7-45fd-a387-255fafd14746 /boot                   xfs     defaults        0 0
/dev/mapper/rhel-swap   swap                    swap    defaults        0 0
halos1.fyre.ibm.com:/data /mnt nfs defaults,timeo=10,retrans=3,rsize=1048576,wsize=1048576 0 0

To mount filesystems at boot time and take the drudgery out of the mount command, Linux systems keep a permanent list of filesystems and options in /etc/fstab.

The device or UUID. Most current Linux systems no longer use the device in /etc/fstab, preferring the UUID.
The mount point. Indicates where to attach the filesystem.
The filesystem type.
Options. Use long mount options separated by commas.
Backup information for use by the dump command. You should always use a 0 in this field.
The filesystem integrity test order. To ensure that fsck always runs on the root first, always set this to 1 for the root filesystem and 2 for any other filesystems on a hard disk. Use 0 to disable the bootup check for everything else, including CD-ROM drives, swap, and the /proc file-system

You can also try to mount all entries at once in /etc/fstab that do not contain the noauto option with this command:

mount -a

Let’s see some commonly use options:

defaults. This uses the mount defaults: read-write mode, enable device files, executables, the setuid bit, and so on. Use this when you don’t want to give the filesystem any special options but you do want to fill all fields in /etc/fstab.
noauto. This option tells a mount -a command to ignore the entry.

Filesystem Capacity

To view the size and utilization of your currently mounted filesystems, use the df command.

df -BM

Filesystem                1M-blocks   Used Available Use% Mounted on
/dev/mapper/rhel-root       245640M 75636M   170005M  31% /
devtmpfs                      7931M     0M     7931M   0% /dev
tmpfs                         7943M     0M     7943M   0% /dev/shm
tmpfs                         7943M   835M     7109M  11% /run
tmpfs                         7943M     0M     7943M   0% /sys/fs/cgroup
/dev/vda1                     1014M   183M      832M  19% /boot
...

Checking and Repairing Filesystems

Filesystem errors are usually due to a user shutting down the system in a rude way (for example, by pulling out the power cord). In such cases, the filesystem cache in memory may not match the data on the disk, and the system also may be in the process of altering the filesystem when you happen to give the computer a kick. Although a new generation of filesystems supports journals to make filesystem corruption far less common, you should always shut the system down properly. And regardless of the filesystem in use, filesystem checks are still necessary every now and to maintain sanity.

The tool to check a filesystem is fsck.

In the worst cases, you can try:

You can try to extract the entire filesystem image from the disk with dd and transfer it to a partition on another disk of the same size.
You can try to patch the filesystem as much as possible, mount it in read-only mode, and salvage what you can.
You can try debugfs.

Special-Purpose Filesystems

Not all filesystems represent storage on physical media. Specifically, most versions of Unix have filesystems that serve as system interfaces. That is, rather than serving only as a means to store data on a device, a filesystem can represent system information such as process IDs and kernel diagnostics.

The special filesystem types in common use on Linux include the following:

proc. Mounted on /proc. The name proc is actually an abbreviation for process. Each numbered directory inside /proc is actually the process ID of a current process on the system; the files in those directories represent various aspects of the processes. The file /proc/self represents the current process.
sysfs. Mounted on /sys.
tmpfs. Mounted on /run and other locations. With tmpfs, you can use your physical memory and swap space as temporary storage, stored in volatile memory instead of a persistent storage device.

Swap Space

Not every partition on a disk contains a filesystem. It’s also possible to augment the RAM on a machine with disk space. The disk area used to store memory pages is called swap space (or just swap for short).

you can use free command to see the swap usage:

free -m

              total        used        free      shared  buff/cache   available
Mem:          32010       10894        3992        1605       17123       18811
Swap:          8063          64        7999

you can use a disk partition and a regular file as swap space, for disk:

Ensure partition is empty
Run mkswap dev, dev is the partition device
Execute swapon dev to register the space with the kernel.
Register in /etc/fstab file

Use these commands to create an empty file, initialize it as swap, and add it to the swap pool:

1
2
3

dd if=/dev/zero of=swap_file bs=1024k count=num_mb
mkswap swap_file
swapon swap_file

Here, swap_file is the name of the new swap file, and num_mb is the desired size, in megabytes.

To remove a swap partition or file from the kernel’s active pool, use the swapoff command.

Note some administrators configure certain systems with no swap space at all. For example, high-performance network servers should never dip into swap space and should avoid disk access if at all possible.

It’s dangerous to do this on a general-purpose machine. If a machine completely runs out of both real memory and swap space, the Linux kernel invokes the out-of-memory (OOM) killer to kill a process in order to free up some memory. You obviously don’t want this to happen to your desktop applications. On the other hand, high-performance servers include sophisticated monitoring and load-balancing systems to ensure that they never reach the danger zone.

Looking Forward: Disks and User Space

In disk-related components on a Unix system, the boundaries between user space and the kernel can be difficult to characterize. As you’ve seen, the kernel handles raw block I/O from the devices, and user-space tools can use the block I/O through device files. However, user space typically uses the block I/O only for initializing operations such as partitioning, file-system creation, and swap space creation.

In normal use, user space uses only the filesystem support that the kernel provides on top of the block I/O.

Chapter 5. How the Linux Kernel Boots

You’ll learn how the kernel moves into memory up to the point where the first user process starts.

A simplified view of the boot process looks like this:

The machine’s BIOS or boot firmware loads and runs a boot loader.
The boot loader finds the kernel image on disk, loads it into memory, and starts it.
The kernel initializes the devices and its drivers.
The kernel mounts the root filesystem.
The kernel starts a program called init with a process ID of 1. This point is the user space start.
init sets the rest of the system processes in motion.
At some point, init starts a process allowing you to log in, usually at the end or near the end of the boot.

Startup Messages

There are two ways to view the kernel’s boot and runtime diagnostic messages:

Look at the kernel system log file. You’ll often find this in /var/log/ kern.log, but depending on how your system is configured, it might also be lumped together with a lot of other system logs in /var/log/messages or elsewhere.
Use the dmesg command, but be sure to pipe the output to less because there will be much more than a screen’s worth. The dmesg command uses the kernel ring buffer, which is of limited size, but most newer kernels have a large enough buffer to hold boot messages for a long time.

Kernel Initialization and Boot Options

Upon startup, the Linux kernel initializes in this general order:

CPU inspection
Memory inspection
Device bus discovery
Device discovery
Auxiliary kernel subsystem setup (networking, and so on)
Root filesystem mount
User space start

The following memory management messages are a good indication that the user-space handoff is about to happen because this is where the kernel protects its own memory from user-space processes:

[    0.972934] Freeing unused kernel memory: 1844k freed
[    0.973411] Write protecting the kernel read-only data: 12288k
[    0.975623] Freeing unused kernel memory: 832k freed
[    0.977405] Freeing unused kernel memory: 676k freed

Kernel Parameters

I just encountered an issue about kernel parameters for Db2… Let’s see.

When running the Linux kernel, the boot loader passes in a set of text-based kernel parameters that tell the kernel how it should start. The parameters specify many different types of behavior, such as the amount of diagnostic output the kernel should produce and device driver–specific options.

You can view the kernel parameters from your system’s boot by looking at the /proc/cmdline file:

1	BOOT_IMAGE=/vmlinuz-3.10.0-862.14.4.el7.x86_64 root=/dev/mapper/rhel-root ro crashkernel=auto rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap rhgb quiet elevator=noop LANG=en_US.UTF-8

The root=/dev/mapper/rhel-root is where root filesystem resides.

Boot Loader

Other boot loader intro

At the start of the boot process, before the kernel and init start, a boot loader starts the kernel. The task of a boot loader sounds simple: It loads the kernel into memory, and then starts the kernel with a set of kernel parameters.

Kernel and its parameters are usually somewhere on the root filesystem.

On PCs, boot loaders use the Basic Input/Output System (BIOS) or Unified Extensible Firmware Interface (UEFI) to access disks. Nearly all disk hardware has firmware that allows the BIOS to access attached storage hardware with Linear Block Addressing (LBA). Although it exhibits poor performance, this mode of access does allow universal access to disks. Boot loaders are often the only programs to use the BIOS for disk access; the kernel uses its own high-performance drivers.

Most modern boot loaders can read partition tables and have built-in support for read-only access to filesystems.

Boot loader tasks

Select among multiple kernels.
Switch between sets of kernel parameters.
Allow the user to manually override and edit kernel image names and parameters
Provide support for booting other operating systems.

Boot loader typres

GRUB. A near-universal standard on Linux systems (mainly talks about this)
LILO. One of the first Linux boot loaders.
LOADLIN. Boots a kernel from MS-DOS

GRUB Introduction

GRUB stands for Grand Unified Boot Loader. We’ll cover GRUB 2.

This section talks about GRUB menu and look into some boot options, actually, if you check /boot directory, you will see kernel image file and initial RAM filesystem:

...
-rwxr-xr-x. 1 root root  6381872 Mar 21  2018 vmlinuz-3.10.0-862.el7.x86_64
-rw-r--r--. 1 root root   304926 Mar 21  2018 symvers-3.10.0-862.el7.x86_64.gz
drwx------. 5 root root       97 Oct  1  2018 grub2
-rw-------  1 root root 21096334 Oct  1  2018 initramfs-3.10.0-862.9.1.el7.x86_64.img
...

Not interested in the rest of the content in this chapter.

Chapter 6. How User Space Starts

The point where the kernel starts its first user-space process, init, is significant—not just because that’s where the memory and CPU are finally ready for normal system operation, but because that’s where you can see how the rest of the system builds up as a whole.

User space is far more modular. It’s much easier to see what goes into the user space startup and operation.

User space starts in roughly this order:

init
Essential low-level services such as udevd and syslogd
Network configuration
Mid- and high-level services (cron, printing, and so on)
Login prompts, GUIs, and other high-level applications

Introduction to init

wiki init

The init program is a user-space program like any other program on the Linux system, and you’ll find it in /sbin along with many of the other system binaries. Its main purpose is to start and stop the essential service processes on the system, but newer versions have more responsibilities.

In my vm /sbin directory:

1	lrwxrwxrwx 1 root root 22 Oct 1 2018 init -> ../lib/systemd/systemd

There are three major implementations of init in Linux distributions:

System V init. A traditional sequenced init (Sys V, usually pronounced “sys-five”). Red Hat Enterprise Linux and several other distributions use this version.
systemd. The emerging standard for init. Many distributions have moved to systemd, and most that have not yet done so are planning to move to it.
Upstart. The init on Ubuntu installations. However, as of this writing, Ubuntu has also planned to migrate to systemd.

There are many different implementations of init because System V init and other older versions relied on a sequence that performed only one startup task at a time. systemd and Upstart attempt to remedy the performance issue by allowing many services to start in parallel thereby speeding up the boot process.

System V Runlevels

wiki Runlevel

At any given time on a Linux system, a certain base set of processes is running. In System V init, this state of the machine is called its runlevel, which is denoted by a number from 0 through 6. A system spends most of its time in a single runlevel, but when you shut the machine down, init switches to a different runlevel in order to terminate the system services in an orderly fashion and to tell the kernel to stop.

You can check your system’s runlevel with the who -r command:

1
2
3

who -r

run-level 3  2019-04-17 13:49

Runlevels serve various purposes, but the most common one is to distinguish between system startup, shutdown, single-user mode, and console mode states.

But runlevels are becoming a thing of the past. Even though all three init versions in this book support them, systemd and Upstart consider runlevels obsolete as end states for the system.

Identifying Your init

If your system has /usr/lib/systemd and /etc/systemd directories, you have systemd.
If you have an /etc/init directory that contains several .conf files, you’re probably running Upstart
If neither of the above is true, but you have an /etc/inittab file, you’re probably running System V init.

Here I focus on systemd

systemd

The systemd init is one of the newest init implementations on Linux. In addition to handling the regular boot process, systemd aims to incorporate a number of standard Unix services such as cron and inetd. One of its most significant features is its ability to defer the start of services and operating system features until they are necessary.

Let’s outline what happens when systemd runs at boot time:

systemd loads its configuration.
systemd determines its boot goal, which is usually named default.target.
systemd determines all of the dependencies of the default boot goal, dependencies of these dependencies, and so on.
systemd activates the dependencies and the boot goal.
After boot, systemd can react to system events (such as uevents) and activate additional components.

Units and Unit Types

One of the most interesting things about systemd is that it does not just operate processes and services; it can also mount filesystems, monitor network sockets, run timers, and more. Each type of capability is called a unit type, and each specific capability is called a unit. When you turn on a unit, you activate it.

The default boot goal is usually a target unit that groups together a number of service and mount units as dependencies.

understand systemd unit and unit files

systemd Dependencies

To accommodate the need for flexibility and fault tolerance, systemd offers a myriad of dependency types and styles:

Requires Strict dependencies. When activating a unit with a Requires dependency unit, systemd attempts to activate the dependency unit. If the dependency unit fails, systemd deactivates the dependent unit.
Wants. Dependencies for activation only. Upon activating a unit, systemd activates the unit’s Wants dependencies, but it doesn’t care if those dependencies fail.
Requisite. Units that must already be active.
Conflicts. Negative dependencies. When activating a unit with a Conflict dependency, systemd automatically deactivates the dependency if it is active.

There are many other dependency syntax, like ordering, conditional, etc…

systemd Configuration

The systemd configuration files are spread among many directories across the system, so you typically won’t find the files for all of the units on a system in one place.

That said, there are two main directories for systemd configuration: the system unit directory (globally configured, usually /usr/lib/systemd/system) and a system configuration directory (local definitions, usually /etc/systemd/system).

Note: Avoid making changes to the system unit directory because your distribution will maintain it for you. Make your local changes to the system configuration directory.

To see the system unit and configuration directories on your system, use the following commands:

1	pkg-config systemd --variable=systemdsystemunitdir

Let’s see Unit files in /usr/lib/systemd/system, there is a sshd.service file:

[Unit]
Description=OpenSSH server daemon
Documentation=man:sshd(8) man:sshd_config(5)
After=network.target sshd-keygen.service
Wants=sshd-keygen.service

[Service]
Type=notify
EnvironmentFile=/etc/sysconfig/sshd
ExecStart=/usr/sbin/sshd -D $OPTIONS
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
Restart=on-failure
RestartSec=42s

[Install]
WantedBy=multi-user.target

The [Unit] section gives some details about the unit and contains description and dependency information.

You’ll find the details about the service in the [Service] section, including how to prepare, start, and reload the service.

During normal operation, systemd ignores the [Install] section. However, consider the case when sshd.service is disabled on your system and you would like to turn it on. When you enable a unit, systemd reads the [Install] section.

The [Install] section is usually responsible for the the .wants and .requires directories in the system configuration directory (/etc/systemd/system), see:

basic.target.wants                                       getty.target.wants           remote-fs.target.wants
default.target                                           local-fs.target.wants        sockets.target.wants
default.target.wants                                     multi-user.target.wants      sysinit.target.wants
dev-virtio\x2dports-org.qemu.guest_agent.0.device.wants  network-online.target.wants  system-update.target.wants

the $OPTIONS in unit file is the variable, also specifier is another variable-like feature often found in unit files, like %n and %H.

systemd Operation

You’ll interact with systemd primarily through the systemctl command, which allows you to activate and deactivate services, list status, reload the configuration, and much more.

List of active units:

systemctl

  UNIT                                           LOAD   ACTIVE SUB       DESCRIPTION
...
  sys-kernel-debug.mount                         loaded active mounted   Debug File System
  var-lib-nfs-rpc_pipefs.mount                   loaded active mounted   RPC Pipe File System
  brandbot.path                                  loaded active waiting   Flexible branding
...

List all units, includes inactives:

1	systemctl --all

Get status of a unit:

1	systemctl status sshd.service

To activate, deactivate, and restart units, use the systemd start, stop, and restart commands. However, if you’ve changed a unit configuration file, you can tell systemd to reload the file in one of two ways:

1	systemctl reload unit #Reloads just the configuration for unit.

1	systemctl daemon-reload #Reloads all unit configurations.

systemd Process Tracking and Synchronization

systemd wants a reasonable amount of information and control over every process that it starts. The main problem that it faces is that a service can start in different ways; it may fork new instances of itself or even daemonize and detach itself from the original process.

To minimize the work that a package developer or administrator needs to do in order to create a working unit file, systemd uses control groups (cgroups), an optional Linux kernel feature that allows for finer tracking of a process hierarchy.

systemd On-Demand and Resource-Parallelized Startup

One of systemd’s most significant features is its ability to delay a unit startup until it is absolutely needed.

systemd Auxiliary Programs

When starting out with systemd, you may notice the exceptionally large number of programs in /lib/systemd. These are primarily support programs for units. For example, udevd is part of systemd, and you’ll find it there as systemd-udevd. Another, the systemd-fsck program, works as a middleman between systemd and fsck.

Shutting Down Your System

init controls how the system shuts down and reboots. The commands to shut down the system are the same regardless of which version of init you run. The proper way to shut down a Linux machine is to use the shutdown command.

to shutdown machine immediately:

1	shutdown -h now

to reboot the machine now:

1	shutdown -r now

When system shutdown time finally arrives, shutdown tells init to begin the shutdown process. On systemd, it means activating the shutdown units; and on System V init, it means changing the runlevel to 0 or 6.

The Initial RAM Filesystem

The initramfs is in /boot directory.

ls -ltr | grep init

-rw-------. 1 root root 55376391 Apr 13  2018 initramfs-0-rescue-e57cfe9136e9430587366e04f14195e1.img
-rw-------. 1 root root 13131435 Apr 13  2018 initramfs-3.10.0-862.el7.x86_64kdump.img
-rw-------  1 root root 21098233 Jul 23  2018 initramfs-3.10.0-862.el7.x86_64.img
-rw-------  1 root root 21134858 Oct  1  2018 initramfs-3.10.0-862.14.4.el7.x86_64.img
-rw-------  1 root root 21096334 Oct  1  2018 initramfs-3.10.0-862.9.1.el7.x86_64.img

The problem stems from the availability of many different kinds of storage hardware. Remember, the Linux kernel does not talk to the PC BIOS or EFI interfaces to get data from disks, so in order to mount its root file-system, it needs driver support for the underlying storage mechanism.

The workaround is to gather a small collection of kernel driver modules along with a few other utilities into an archive. The boot loader loads this archive into memory before running the kernel.

Chapter 7. System Configuration

When you first look in the /etc directory, you might feel a bit overwhelmed. Although most of the files that you see affect a system’s operations to some extent, a few are fundamental.

The Structure of /etc

Most system configuration files on a Linux system are found in /etc. Historically, each program had one or more configuration files there, and because there are so many packages on a Unix system, /etc would accumulate files quickly.

The trend for many years now has been to place system configuration files into subdirectories under /etc. There are still a few individual configuration files in /etc, but for the most part, if you run ls -F /etc, you’ll see that most of the items there are now subdirectories.

What kind of configuration files are found in /etc? The basic guideline is that customizable configurations for a single machine. And you’ll often find that noncustomizable system configuration files may be found elsewhere, as with the prepackaged systemd unit files in /usr/lib/systemd.

System Logging

Most system programs write their diagnostic output to the syslog service. The traditional syslogd daemon waits for messages and, depending on the type of message received, funnels the output to a file, the screen, users, or some combination of these, or just ignores it.

The System Logger

Most Linux distributions run a new version of syslogd called rsyslogd that does much more than simply write log messages to files. For example, in my vm:

systemctl status rsyslog

   rsyslog.service - System Logging Service
   Loaded: loaded (/usr/lib/systemd/system/rsyslog.service; enabled; vendor preset: enabled)
   Active: active (running) since Wed 2019-04-17 13:49:14 PDT; 2 weeks 6 days ago
...

Many of the files in /var/log aren’t maintained by the system logger. The only way to know for sure which ones belong to rsyslogd is to look at its configuration file.

Configuration Files

The base rsyslogd configuration file is /etc/rsyslog.conf, but you’ll find certain configurations in other directories, such as /etc/rsyslog.d.

It talks about the syntax in the configuration file: The configuration format is a blend of traditional rules and rsyslog-specific extensions. One rule of thumb is that anything beginning with a dollar sign ($) is an extension.

User Management Files

Unix systems allow for multiple independent users. At the kernel level, users are simply numbers (user IDs).

The /etc/passwd File

The plaintext file /etc/passwd maps usernames to user IDs.

root:x:0:0:Superuser:/root:/bin/sh
######
daemon:*:1:1:daemon:/usr/sbin:/bin/sh
#or
daemon:x:2:2:daemon:/sbin:/sbin/nologin
######
bin:*:2:2:bin:/bin:/bin/sh
sys:*:3:3:sys:/dev:/bin/sh
nobody:*:65534:65534:nobody:/home:/bin/false
juser:x:3119:1000:J. Random User:/home/juser:/bin/bash
beazley:x:143:1000:David Beazley:/home/beazley:/bin/bash

The fields are as follows:

The username.
The user’s encrypted password. On most Linux systems, the password is not actually stored in the passwd file, but rather, in the shadow file . Normal users do not have read permission for shadow. The second field in passwd or shadow is the encrypted password, Unix passwords are never stored as clear text.
An x in the second passwd file field indicates that the encrypted password is stored in the shadow file. A * indicates that the user cannot log in, and if the field is blank (that is, you see two colons in a row, like ::), no password is required to log in. (Beware of blank passwords. You should never have a user without a password.)
The user ID (UID), which is the user’s representation in the kernel.
The group ID (GID). This should be one of the numbered entries in the /etc/group file. Groups determine file permissions and little else. This group is also called the user’s primary group.
The user’s real name. You’ll sometimes find commas in this field, denoting room and telephone numbers.
The user’s home directory.
The user’s shell (the program that runs when the user runs a terminal session).

Special Users

The superuser (root) always has UID 0 and GID 0. Some users, such as daemon, have no login privileges. The nobody user is an underprivileged user. Some processes run as nobody because the nobody user cannot write to anything on the system.

The users that cannot log in are called pseudo-users. Although they can’t log in, the system can start processes with their user IDs. Pseudo-users such as nobody are usually created for security reasons.

The /etc/shadow File

The shadow password file /etc/shadow on a Linux system normally contains user authentication information, including the encrypted passwords and password expiration information that correspond to the users in /etc/passwd.

Regular users interact with /etc/passwd using the passwd command. By default, passwd changes the user’s password. The passwd command is an suid-root program, because only the superuser can change the /etc/passwd file.

1	-rwsr-xr-x. 1 root root 27832 Jan 29 2014 /usr/bin/passwd

in /etc/shells file have multiple shell types:

> /bin/sh
> /bin/bash
> /sbin/nologin
> /usr/bin/sh
> /usr/bin/bash
> /usr/sbin/nologin
> /bin/ksh
> /bin/rksh
>

Because /etc/passwd is plaintext, the superuser may use any text editor to make changes. To add a user, simply add an appropriate line and create a home directory for the user; to delete, do the opposite. However, to edit the file, you’ll most likely want to use the vipw program

Use adduser and userdel to add and remove users. Run passwd user as the superuser.

Working with Groups

Groups in Unix offer a way to share files with certain users but deny access to all others. The idea is that you can set read or write permission bits for a particular group, excluding everyone else.

The /etc/group file defines the group IDs:

root:*:0:juser
daemon:*:1:
bin:*:2:
disk:*:6:juser,beazley
nogroup:*:65534:
user:*:1000:

The group name.
The group password. This is hardly ever used, nor should you use it. Use * or any other default value.
The group ID (a number). The GID must be unique within the group file. This number goes into a user’s group field in that user’s /etc/passwd entry.
An optional list of users that belong to the group. In addition to the users listed here, users with the corresponding group ID in their passwd file entries also belong to the group.

Linux distributions often create a new group for each new user added, with the same name as the user.

Setting the Time

Unix machines depend on accurate timekeeping. The kernel maintains the system clock, which is the clock that is consulted when you run commands like date.

PC hardware has a battery-backed real-time clock (RTC). The RTC isn’t the best clock in the world, but it’s better than nothing. The kernel usually sets its time based on the RTC at boot time, and you can reset the system clock to the current hardware time with hwclock.

You should not try to fix the time drift with hwclock because time-based system events can get lost or mangled. Usually it’s best to keep your system time correct with a network time daemon.

Network Time

If your machine is permanently connected to the Internet, you can run a Network Time Protocol (NTP) daemon to maintain the time using a remote server. Many distributions have built-in support for an NTP daemon, but it may not be enabled by default. You might need to install an ntpd package to get it to work.

Scheduling Recurring Tasks with cron

The Unix cron service runs programs repeatedly on a fixed schedule. Most experienced administrators consider cron to be vital to the system because it can perform automatic system maintenance. For example, cron runs log file rotation utilities to ensure that your hard drive doesn’t fill up with old log files. You should know how to use cron because it’s just plain useful.

Also see cronjob in k8s doc.

You can run any program with cron at whatever times suit you. The program running through cron is called a cron job. To install a cron job, you’ll create an entry line in your crontab file, usually by running the crontab command.

for example:

1	15 09 * * * /home/juser/bin/spmake

Minute (0 through 59). The cron job above is set for minute 15.
Hour (0 through 23). The job above is set for the ninth hour.
Day of month (1 through 31).
Month (1 through 12).
Day of week (0 through 7). The numbers 0 and 7 are Sunday.

A * in any field means to match every value. The preceding example runs spmake daily because the day of month, month, and day of week fields are all filled with stars, which cron reads as “run this job every day, of every month, of every week.”

also can be 5th and the 14th day of each month:

1	15 09 5,14 * * /home/juser/bin/spmake

Installing Crontab Files

Each user can have his or her own crontab file, which means that every system may have multiple crontabs, usually found in /var/spool/cron/ folder. the crontab command installs, lists, edits, and removes a user’s crontab.

The easiest way to install a crontab is to put your crontab entries into a file and then use crontab file to install file as your current crontab.

Actually, there is a default place for every user crontab file includes root. Once you create a crontab file for the user, the corresponding folder is put under /var/spool/cron/`.

For example, run as root, I want to set a recurring task for user dsadm:

1	crontab -u dsadm -e

Then edit like this:

1	00 21 * * * /home/dsadm/test.sh > /tmp/cron-log 2>&1

after run the job, go to /tmp folder you will see the log file.

to list the dsadm cron job:

1	crontab -l -u dsadm

to remove cron job for dsadm

1	crontab -r -u dsadm

System Crontab Files

Linux distributions normally have an /etc/crontab file. You can also edit here, but the format is a little bit difference:

# Example of job definition:
# .---------------- minute (0 - 59)
# |  .------------- hour (0 - 23)
# |  |  .---------- day of month (1 - 31)
# |  |  |  .------- month (1 - 12) OR jan,feb,mar,apr ...
# |  |  |  |  .---- day of week (0 - 6) (Sunday=0 or 7) OR sun,mon,tue,wed,thu,fri,sat
# |  |  |  |  |
# *  *  *  *  * user-name  command to be executed

Understanding User IDs and User Switching

We’ve discussed how setuid programs such as sudo and su allow you to change users:

1 2	---s--x--x 1 root root 143248 May 28 2018 /usr/bin/sudo -rwsr-xr-x 1 root root 32184 Jul 12 2018 /usr/bin/su

In reality, every process has more than one user ID. When you run a setuid program, Linux sets the effective user ID to the program’s owner during execution, but it keeps your original user ID in the real user ID.

Think of the effective user ID as the actor and the real user ID as the owner. The real user ID defines the user that can interact with the running process—most significantly, which user can kill and send signals to a process. For example, if user A starts a new process that runs as user B (based on setuid permissions), user A still owns the process and can kill it.

On normal Linux systems, most processes have the same effective user ID and real user ID. I verify this by a test.sh script run as dsadm, the euser and ruser are the same:

1	-rwsr-xr-x 1 root root 56 May 11 22:17 test.sh

By default, ps and other system diagnostic programs show the effective user ID.

In conductor container, I run many su commands, you can see this, the euser and ruser are different:

ps -eo pid,euser,ruser,comm

1574 root     dsadm    su
2735 root     dsadm    su
4535 root     dsadm    su

PAM

In 1995 Sun Microsystems proposed a new standard called Pluggable Authentication Modules (PAM), a system of shared libraries for authentication. To authenticate a user, an application hands the user to PAM to determine whether the user can successfully identify itself.

Because there are many kinds of authentication scenarios, PAM employs a number of dynamically loadable authentication modules. Each module performs a specific task; for example, the pam_unix.so module can check a user’s password.

PAM Configuration

You’ll normally find PAM’s application configuration files in the /etc/pam.d directory (older systems may use a single /etc/pam.conf file).

Let’s see an example:

1	auth requisite pam_shells.so

Each configuration line has three fields: function type, control argument, and module:

Function type. The function that a user application asks PAM to perform. Here, it’s auth, the task of authenticating the user.
Control argument. This setting controls what PAM does after success or failure of its action for the current line (requisite in this example).
Module. The authentication module that runs for this line, determining what the line actually does. Here, the pam_shells.so module checks to see whether the user’s current shell is listed in /etc/shells.

PAM configuration is detailed on the pam.conf(5) manual page:

1	man 5 pam.conf

Chapter 8. A Closer Look at Processes and Resource Utilization

This chapter takes you deeper into the relationships between processes, the kernel, and system resources.

Many of the tools that you see in this chapter are often thought of as performance-monitoring tools. They’re particularly helpful if your system is slowing to a crawl and you’re trying to figure out why.

Tracking Processes

The top program is often more useful than ps because it displays the current system status as well as many of the fields in a ps listing, and it updates the display every second.

You can send commands to top with keystrokes. When you enter top command:

Tasks: 382 total,   2 running, 380 sleeping,   0 stopped,   0 zombie
%Cpu(s):  1.2 us,  0.5 sy,  0.0 ni, 96.4 id,  0.4 wa,  0.0 hi,  0.2 si,  1.3 st
KiB Mem :  8009536 total,   441360 free,   930236 used,  6637940 buff/cache
KiB Swap:        0 total,        0 free,        0 used.  6284448 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
10438 root      20   0  519148 282140  19392 S   4.6  3.5 586:19.28 kube-apiserver
10449 root      20   0  275308  80184  14584 S   4.3  1.0 516:46.86 kube-controller
 9691 root      20   0 1648968 159104  30708 S   2.6  2.0 492:06.83 kubelet
10206 root      20   0   10.1g  56568   7760 S   2.0  0.7 262:08.45 etcd
19459 root      20   0   82100  53780   9392 S   1.7  0.7 209:35.52 calico-node
...

If you see task %CPU is larger than 100, it must be a multi-thread and leverages multi-core. You can use H to toggle thread display rather than task, you will see multi-thead and each of them %CPU.

Note: if you want to see memory in MB, GB… Typing shift + e/E cycle through.

then type followings:

Spacebar: Updates the display immediately.
H: show threads in stead of tasks.
M: Sorts by current resident memory usage.
T: Sorts by total (cumulative) CPU usage.
P: Sorts by current CPU usage (the default).
u: Displays only one user’s processes.
f: Selects different statistics to display and sort. (use arrow to move and space to select)
?: Displays a usage summary for all top commands.

Finding Open Files by lsof

One use for this command is when a disk cannot be unmounted because (unspecified) files are in use. The listing of open files can be consulted (suitably filtered if necessary) to identify the process that is using the files.

The lsof command lists open files and the processes using them. lsof doesn’t stop at regular files, it can list network resources, dynamic libraries, pipes, and more.

For example: Display entries for open files in /usr directory and successors.

1	lsof /usr/*

List open files for a particular PID:

1	lsof -p 1623

Tracing Program Execution and System Calls

The most common use is to start a program using strace, which prints a list of system calls made by the program. This is useful if the program continually crashes, or does not behave as expected; for example using strace may reveal that the program is attempting to access a file which does not exist or cannot be read.

The strace (system call trace) and ltrace (library trace) commands can help you discover what a program attempts to do. These tools produce extraordinarily large amounts of output, but once you know what to look for, you’ll have more tools at your disposal for tracking down problems.

For example:

1	strace cat not_a_file

you get errors in open("not_a_file", O_RDONLY) line:

execve("/usr/bin/cat", ["cat", "not_a_file"], [/* 23 vars */]) = 0
brk(NULL)                               = 0x81f000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fc85159d000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
...
...
close(3)                                = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
open("not_a_file", O_RDONLY)            = -1 ENOENT (No such file or directory)
write(2, "cat: ", 5cat: )                    = 5
write(2, "not_a_file", 10not_a_file) 
...

Threads

In Linux, some processes are divided into pieces called threads.

To display the thread information in ps, add the m option. For example:

1	ps axm -o pid,tid,command

PID   TID   COMMAND
1891     - db2ckpwd 0
    -  1891 -
1892     - db2ckpwd 0
    -  1892 -
3501     - db2fmp ( ,1,0,0,0,0,0,00000000,0,0,0,0000000000000000,0000000000000000,00000000,00000000,00000000,0000000
    -  3501 -
    -  3502 -
    -  3503 -
...

The main thread ID is the as the process ID

Introduction to Resource Monitoring

To monitor one or more specific processes over time, use the -p option to top, with this syntax:

1	top -p <pid>

Adjusting Process Priorities

You can change the way the kernel schedules a process in order to give the process more or less CPU time than other processes.

The kernel runs each process according to its scheduling priority, which is a number between –20 and 20, with –20 being the foremost priority.

ps axl

F   UID   PID  PPID PRI  NI    VSZ   RSS WCHAN  STAT TTY        TIME COMMAND
4  1000     1     0  20   0  15120  1596 do_wai Ss   ?          0:00 /bin/bash /opt/IBM/InformationServer/initScripts
4     0  1882     1  20   0 1225076 48936 futex_ Sl  ?          0:00 db2wdog 0 [db2inst1]
4  1000  1884  1882  20   0 10302284 1699292 futex_ Sl ?       58:25 db2sysc 0
5     0  1890  1882  20   0 1227236 18292 do_msg S   ?          0:13 db2ckpwd 0

top

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 1884 db2inst1  20   0    9.8g   1.6g   1.6g S   1.3 10.4  58:25.82 db2sysc
    1 db2inst1  20   0   15120   1596   1360 S   0.0  0.0   0:00.01 startcontainer.
 1882 root      20   0 1225076  48936  33072 S   0.0  0.3   0:00.13 db2syscr

PR is the priority value. NI (nice value), high nice value means nicer, more likely to give up CPU time.

Alter the nice value:

1	renice <value> <pid>

Load Averages

The load average is the average number of processes currently ready to run. Keep in mind that most processes on your system are usually waiting for input (from the keyboard, mouse, or network, for example), meaning that most processes are not ready to run and should contribute nothing to the load average. Only processes that are actually doing something affect the load average.

1 2	# uptime ... up 91 days, ... load average: 0.08, 0.03, 0.01

The three numbers are the load averages for the past 1 minute, 5 minutes, and 15 minutes, respectively. An average of only 0.01 processes have been running across all processors for the past 15 minutes.

If a load average goes up to around 1, a single process is probably using the CPU nearly all of the time. To identify that process, use the top command; the process will usually rise to the the top of the display.

If you have two cores, a load average of 1 means that only one of the cores is likely active at any given time, and a load average of 2 means that both cores have just enough to do all of the time.

A high load average does not necessarily mean that your system is having trouble. A system with enough memory and I/O resources can easily handle many running processes. If your load average is high and your system still responds well, don’t panic

However, if you sense that the system is slow and the load average is high, you might be running into memory performance problems.

Memory

CPU has a memory management unit (MMU) that translates the virtual memory addresses used by processes into real ones. The kernel assists the MMU by breaking the memory used by processes into smaller chunks called pages.

The kernel maintains a data structure, called a page table, that contains a mapping of a processes’ virtual page addresses to real page addresses in memory. As a process accesses memory, the MMU translates the virtual addresses used by the process into real addresses based on the kernel’s page table.

A user process does not actually need all of its pages to be immediately available in order to run. The kernel generally loads and allocates pages as a process needs them; this system is known as on-demand paging or just demand paging.

Page Faults

If a memory page is not ready when a process wants to use it, the process triggers a page fault.

MINOR PAGE FAULTS A minor page fault occurs when the desired page is actually in main memory but the MMU doesn’t know where it is. This can happen when the process requests more memory or when the MMU doesn’t have enough space to store all of the page locations for a process. In this case, the kernel tells the MMU about the page and permits the process to continue. Minor page faults aren’t such a big deal, and many occur as a process runs. Unless you need maximum performance from some memory-intensive program, you probably shouldn’t worry about them.
MAJOR PAGE FAULTS A major page fault occurs when the desired memory page isn’t in main memory at all, which means that the kernel must load it from the disk or some other slow storage mechanism. Some major page faults are unavoidable, such as those that occur when you load the code from disk when running a program for the first time.

Let’s see the page faults:

# /usr/bin/time netstat > /dev/null

0.05user 0.02system 0:01.74elapsed 4%CPU (0avgtext+0avgdata 2556maxresident)k
1752inputs+0outputs (3major+781minor)pagefaults 0swaps

There are 3 major page faults and 781 minor page faults when running netstat program. The major page faults occurred when the kernel needed to load the program from the disk for the first time. If you ran the command again, you probably wouldn’t get any major page faults because the kernel would have cached the pages from the disk:

# /usr/bin/time netstat > /dev/null

0.04user 0.02system 0:01.61elapsed 3%CPU (0avgtext+0avgdata 2552maxresident)k
0inputs+0outputs (0major+783minor)pagefaults 0swaps

Note that time here is not the shell built-in time command! If you run
1
2
> type -a time
>

you will see

1
2
3

> time is a shell keyword
> time is /usr/bin/time
>

see this doc

If you’d rather see the number of page faults of processes as they’re running, use top or ps. When running top, use f to add the displayed fields and space to display the nMaj and nMin.

# top

  PID USER      %CPU  PR  NI    VIRT    RES    SHR S %MEM     TIME+ COMMAND                                nMaj nMin
 1303 dsadm      1.7  20   0 4753196  82704  13412 S  0.5 158:31.05 java                                      0  25k
 1929 dsadm      0.3  20   0  203344   2556   2116 S  0.0   2:16.80 ResTrackApp                               0 1028

When using ps, you can use a custom output format to view the page faults for a particular process:

# ps -o pid,min_flt,maj_flt 1

  PID  MINFL  MAJFL
    1   2059      6

Monitoring CPU and Memory Performance

Among the many tools available to monitor system performance, the vmstat command is one of the oldest, with minimal overhead. You’ll find it handy for getting a high-level view of how often the kernel is swapping pages in and out, how busy the CPU is, and IO utilization.

vmstat 2

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0      0 174452   1064 6874152    0    0    17    55    4    0  2  2 95  0  1
 2  0      0 173900   1064 6874168    0    0     0    16 10362 12869  5  3 90  0  2
 3  0      0 173932   1064 6874180    0    0     0   291 9110 10761  2  1 95  1  1
 0  0      0 174056   1064 6874180    0    0     0    59 9126 12447  3  3 92  0  1
 0  0      0 174228   1064 6874184    0    0     0   167 7100 9601  1  1 97  0  1

Not easy to understand, can dig deeper into it by reading vmstat(8) manual page.

I/O Monitoring

Like vmstat and netstat (talk later), we have iostat.

iostat 2 -d -p ALL

Linux 3.10.0-862.14.4.el7.x86_64 (dstest1.fyre.ibm.com)         05/21/2019      _x86_64_        (8 CPU)

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
vda              18.12        87.72       201.61  270042131  620650624
vda1              0.00         0.00         0.00      11165       2270
vda2             13.59        87.71       201.61  270022634  620648353
dm-0             17.99        86.13       201.61  265157850  620648353
dm-1              0.06         1.58         0.00    4860836          0

This means update every 2 seconds, show device only and show all partitions.

If you need to dig even deeper to see I/O resources used by individual processes, the iotop tool can help. Using iotop is similar to using top.

iotop

Total DISK READ:         4.76 K/s | Total DISK WRITE:     333.31 K/s
  TID  PRIO  USER       DISK READ DISK  WRITE  SWAPIN     IO> COMMAND
  260 be/3 root          0.00 B/s  38.09 K/s  0.00 %  6.98 % [jbd2/sda1-8]
 2611 be/4 juser         4.76 K/s  10.32 K/s  0.00 %  0.21 % zeitgeist-daemon
 ...

It shows TID (thread ID) instead of PID, PRIO (priority) indicates the IO priority, be/3 is more important than be/4. The kernel uses the scheduling class to add more control for I/O scheduling. You’ll see three scheduling classes from iotop:

be Best-effort. The kernel does its best to fairly schedule I/O for this class. Most processes run under this I/O scheduling class.
rt Real-time. The kernel schedules any real-time I/O before any other class of I/O, no matter what.
idle Idle. The kernel performs I/O for this class only when there is no other I/O to be done. There is no priority level for the idle scheduling class.

Per-Process Monitoring

The pidstat utility allows you to see the resource consumption of a process over time in the style of vmstat.

# pidstat -p 27946 1

Linux 3.10.0-862.14.4.el7.x86_64 (myk8s1.fyre.ibm.com)  05/21/2019      _x86_64_        (4 CPU)

08:16:27 PM   UID       PID    %usr %system  %guest    %CPU   CPU  Command
08:16:28 PM  1002     27946    0.00    0.00    0.00    0.00     3  tail
08:16:29 PM  1002     27946    0.00    0.00    0.00    0.00     3  tail
08:16:30 PM  1002     27946    0.00    0.00    0.00    0.00     3  tail

The CPU column tells you about this process is running on which CPU.

Chapter 9. Network and Configuration

Note that the ifconfig command, as well some of the others you’ll see later in this chapter (such as route and arp), has been technically supplanted with the newer ip command. The ip command can do more than the old commands, and it is preferable when writing scripts. However, most people still use the old commands when manually working with the network, and these commands can also be used on other versions of Unix. For this reason, we’ll use the old-style commands.

Routes and the Kernel Routing Table

Let’s see the routing table by route command, -n means show numerical address instead of hostname:

# route -n

Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         9.30.94.1       0.0.0.0         UG    0      0        0 eth1
9.30.94.0       0.0.0.0         255.255.254.0   U     0      0        0 eth1
169.254.0.0     0.0.0.0         255.255.0.0     U     1002   0        0 eth0
169.254.0.0     0.0.0.0         255.255.0.0     U     1003   0        0 eth1
172.16.0.0      0.0.0.0         255.255.0.0     U     0      0        0 eth0
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
192.168.0.0     0.0.0.0         255.255.255.0   U     0      0        0 *
192.168.0.2     0.0.0.0         255.255.255.255 UH    0      0        0 calib4daf4f1db0
192.168.0.3     0.0.0.0         255.255.255.255 UH    0      0        0 cali987b4d0c33f
192.168.1.0     172.16.182.156  255.255.255.0   UG    0      0        0 eth0
192.168.2.0     172.16.182.187  255.255.255.0   UG    0      0        0 eth0

The Destination column tells you a network prefix (outside network), and the Genmask column is the netmask corresponding to that network. Each network has a U under its Flags column, indicating that the route is active (“up”).

There is a G in the Flags column, meaning that communication for this network must be sent through the gateway in the Gateway column, for example, for networl 0.0.0.0/0 send through it’s gateway 9.30.94.1. If no G in Flags, indicating that the network is directly connected in some way.

An entry for 0.0.0.0/0 in the routing table has special significance because it matches any address on the Internet. This is the default route, and the address configured under the Gateway column (in the route -n output) in the default route is the default gateway.

Basic ICMP and DNS Tools

ping

# ping baidu.com

PING baidu.com (123.125.114.144) 56(84) bytes of data.
64 bytes from 123.125.114.144 (123.125.114.144): icmp_seq=1 ttl=40 time=212 ms
64 bytes from 123.125.114.144 (123.125.114.144): icmp_seq=2 ttl=40 time=212 ms
64 bytes from 123.125.114.144 (123.125.114.144): icmp_seq=3 ttl=40 time=212 ms
^C
--- baidu.com ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2002ms
rtt min/avg/max/mdev = 212.335/212.495/212.578/0.393 ms

56(84) bytes of data means send a 56 bytes packet (84 bytes when include header). icmp_seq is sequence number, sometimes you will have gap, usually means there’s some kind of connectivity problem. time is round-trip time.

traceroute

One of the best things about traceroute is that it reports return trip times at each step in the route:

## -n will not do hostname lookup for IP in output
# traceroute -n google.com

traceroute to google.com (172.217.1.206), 30 hops max, 60 byte packets
 1  9.30.94.3  0.626 ms  0.742 ms  0.845 ms
 2  9.30.156.13  0.529 ms  0.801 ms  0.918 ms
 3  9.55.129.109  0.668 ms  0.852 ms 9.55.129.105  0.515 ms
 4  9.55.187.13  0.476 ms  0.255 ms  0.425 ms
 5  9.55.128.6  0.326 ms  0.433 ms  0.294 ms
 6  9.64.38.194  0.941 ms  0.898 ms  0.843 ms
 7  9.64.3.86  18.220 ms  18.230 ms  18.229 ms
 8  9.64.3.85  31.521 ms  31.527 ms  31.563 ms
 9  9.17.3.35  31.803 ms  31.613 ms  31.875 ms
 ...

DNS and host

To find the IP address behind a domain name, use the host command:

# host www.google.com

www.google.com has address 172.217.11.228
www.google.com has IPv6 address 2607:f8b0:400f:801::2004

You can also use host in reverse: Enter an IP address instead of a hostname to try to discover the hostname behind the IP address. But don’t expect this to work reliably. Many hostnames can represent a single IP address, and DNS doesn’t know how to determine which hostname should correspond to an IP address.

Kernel Network Interfaces

Network interfaces have names that usually indicate the kind of hardware underneath, such as eth0 (the first Ethernet card in the computer) and wlan0 (a wireless interface).

eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 9.30.94.85  netmask 255.255.254.0  broadcast 9.30.95.255
        ether 00:20:09:1e:5e:55  txqueuelen 1000  (Ethernet)
        RX packets 17164860  bytes 9046289828 (8.4 GiB)
        RX errors 0  dropped 6  overruns 0  frame 0
        TX packets 11669220  bytes 9566003426 (8.9 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

UP,RUNNING means this interface is active.

Resolving Hostnames

On most systems, you can override hostname lookups with the /etc/hosts file. Usually resolution will first check this file before resort to DNS server.

The traditional configuration file for DNS servers is /etc/resolv.conf:

## this is the search pattern:
search fyre.ibm.com. svl.ibm.com.
nameserver 172.16.200.52
nameserver 172.16.200.50

172.16.200.52 and 172.16.200.50 are the DNS server IP.

netstat command

This netstat command is extremely important and common in use. Ususally I use netstat -tunlp, let’s dig deeper into it:

-t: show TCP connection.
-u: show UDP connection.
-n: show numerical addresses.
-l: show only listening sockets.
-p: show PID belongs to.

Instead of ifconfig to see the interface, you can use:

# netstat -i

Kernel Interface table
Iface      MTU    RX-OK RX-ERR RX-DRP RX-OVR    TX-OK TX-ERR TX-DRP TX-OVR Flg
cali9550  1440 60880456      0      6 0      37444050      0      0      0 BMRU
cali986f  1440        0      0      0 0             0      0      0      0 BMRU
docker0   1500        0      0      0 0             0      0      0      0 BMU
eth0      1500 5606028968      0      0 0      33647018      0      0      0 BMRU
eth1      1500 60880456      0      6 0      37444050      0      0      0 BMRU
lo       65536 431426103      0      0 0      431426103      0      0      0 LRU

Instead of route -n to see route table, you can use:

1	# netstat -rn

Show TCP connections (not include listening sockets):

1	# netstat -tn

To see well-known ports translate into names. check /etc/services file:

...
http            80/tcp          www www-http    # WorldWideWeb HTTP
http            80/udp          www www-http    # HyperText Transfer Protocol
http            80/sctp                         # HyperText Transfer Protoco
...

On Linux, only processes running as the superuser can use ports 1 through 1023. All user processes may listen on and create connections from ports 1024 and up.

I skip the rest of this chapter, majority is concept

Chapter 10. Network Applications and Services

Let’s mainly focus on some important commands here:

curl command

curl is a command line tool to transfer data to or from a server, using any of the supported protocols (HTTP, FTP, IMAP, POP3, SCP, SFTP, SMTP, TFTP, TELNET, LDAP or FILE). curl is powered by Libcurl. This tool is preferred for automation, since it is designed to work without user interaction. curl can transfer multiple file at once.

you can refer this article.

Diagnostic Tools

lsof(list open files) can track open files, but it can also list the programs currently using or listening to ports. Please read more when you need this tool.

tcpdump, a command tool version of wireshark.

netcat(or nc) I used it before for developing PXEngine, we use TCP to replace ssh connection between conductor and compute containers. netcat can connect to remote TCP/UDP ports, specify a local port, listen on ports, scan ports, redirect standard I/O to and from network connections, and more.

I remember I use nc to listen on a port and on other side connect to that port and transfer data.

## install
yum install -y nc
apt install -y netcat

## -l: listening mode
## -p: port
nc -l -p 1234
## client
nc localhost 1234

netcat can be used for TCP, UDP, Unix-domain sockets.

nmap scans all ports on a machine or network of machines looking for open ports, and it lists the ports it finds.

# nmap myk8s1.fyre.ibm.com

Starting Nmap 6.40 ( http://nmap.org ) at 2019-05-22 23:33 PDT
Nmap scan report for myk8s1.fyre.ibm.com (9.30.94.85)
Host is up (0.00024s latency).
Not shown: 995 closed ports
PORT     STATE SERVICE
22/tcp   open  ssh
111/tcp  open  rpcbind
179/tcp  open  bgp
2049/tcp open  nfs
5000/tcp open  upnp

Nmap done: 1 IP address (1 host up) scanned in 0.20 seconds

Chapter 11. Introduction to Shell Scripts

A shell script is a series of commands written in a file. The #! part is called a shebang.

When writing scripts and working on the command line, just remember what happens whenever the shell runs a command:

Before running the command, the shell looks for variables, globs, and other substitutions and performs the substitutions if they appear.
The shell passes the results of the substitutions to the command.

if you use single quote:

1	grep 'r.*t' /etc/passwd

This will prevent sheel from expanding the * in current directory.

1	grep 'r.*t /etc/passwd'

This will fail, because things wrapped by single/double quote treat as one parameter.

Double quotes (") work just like single quotes, except that the shell expands variables that appear within double quotes. It will not expand globs like * in double quotes!

Just like I saw, use shift to forward arguments passed in:

#!/bin/sh
echo $1
shift
echo $1

$# is the number of arguments passed in, used in loop to pick up parameters. $@ represents all of the script arguments. $$ holds the PID of current shell.

bad message should go to standard error, just like redriect standard error to standard output:

1	echo $0: bad option ... 1>&2

$? exit code: If you intend to use the exit code of a command, you must use or store the code immediately after running the command.

if condition

Let’s see an example, these 2 are good:

1 2	if [ "$1" = hi ]; then if [ x"$1" = x"hi" ]; then

Here, "" is vital, since user may not input $1, if no double quotes, it could be:

1	if [ = hi ]; then

the test ([) command aborts immediately.

Note that the stuff follows if is a command! so we have ; before then.

So you can use other commands instead of [ command, cool!

#!/bin/sh
if grep -q daemon /etc/passwd; then
    echo The daemon user is in the passwd file.
else
    echo There is a big problem. daemon is not in the passwd file.
fi

Let’s see && and || and test condition:

#!/bin/sh
if [ "$1" = hi ] || [ "$1" = bye ]; then
    echo 'The first argument was "'$1'"'
fi

The -a and -o flags are the logical and and or operators in test:

1	[ "$1" = hi -o "$1" = ho ]

test command

There are dozens of test operations, all of which fall into three general categories: file tests, string tests, and arithmetic tests.

file filter

-f: regular file return 0 -e: file exist return 0 -s: not empty file return 0 -d: directory return 0 -h: softlink return 0

File permission: -r: readable -w: writable -x: executable -u: setuid -g: setgid -k: sticky

The test command follows symbolic links (except for the -h test). That is, if link is a symbolic link to a regular file, [ -f link ] returns an exit code of true (0).

Finally, three binary operators (tests that need two files as arguments) are used in file tests, but they’re not terribly common. [ file1 -nt file2 ]: if file1 has a newer modification date than file2 return 0 [ file1 -ot file2 ]: if file1 has a older modification date than file2 return 0 [ file1 -ef file2 ]: compares two files and returns true if they share inode numbers and devices.

string test

=: equal !=: not equal -z: empty string return 0 -n: not empty return 0

arithmetic test

-eq: equal to -ne: not equal to -lt: less than -gt: greater than -le: less than or equal to -ge: greater than or equal to

case condition

The case keyword forms another conditional construct that is exceptionally useful for matching strings, it can do pattern matching:

#!/bin/sh
case $1 in
    bye)
        echo Fine, bye.
        ;;
    hi|hello)
        echo Nice to see you.
        ;;
    what*)
        echo Whatever.
        ;;
    *)
        echo 'Huh?'
        ;;
esac

Each case must end with a double semicolon (;;) or you risk a syntax error.

loop

for loop:

#!/bin/sh
for str in one two three four; do
    echo $str
done

while loop:

#!/bin/sh
FILE=/tmp/whiletest.$$;
echo firstline > $FILE
while tail -10 $FILE | grep -q firstline; do
    # add lines to $FILE until tail -10 $FILE no longer prints "firstline"
    echo -n Number of lines in $FILE:' '
    wc -l $FILE | awk '{print $1}'
    echo newline >> $FILE
done

rm -f $FILE

In fact, if you find that you need to use while, you should probably be using a language like awk or Python instead.

Command Substitution

You can use a command’s output as an argument to another command, or you can store the command output in a shell variable by enclosing a command in $().

Temporary File Management

Note the mktemp command:

#!/bin/sh
TMPFILE1=$(mktemp /tmp/im1.XXXXXX)
TMPFILE2=$(mktemp /tmp/im2.XXXXXX)

cat /proc/interrupts > $TMPFILE1
sleep 2
cat /proc/interrupts > $TMPFILE2
diff $TMPFILE1 $TMPFILE2
rm -f $TMPFILE1 $TMPFILE2

If the script is aborted, the temporary files could be left behind. In the preceding example, pressing CTRL-C before the second cat command leaves a temporary file in /tmp. Avoid this if possible. Instead, use the trap command to create a signal handler to catch the signal that CTRL-C generates and remove the temporary files, as in this handler:

#!/bin/sh
TMPFILE1=$(mktemp /tmp/im1.XXXXXX)
TMPFILE2=$(mktemp /tmp/im2.XXXXXX)
trap "rm -f $TMPFILE1 $TMPFILE2; exit 1" INT

You must use exit in the handler to explicitly end script execution, or the shell will continue running as usual after running the signal handler.

Note that in startcontainer.sh we also have trap and we use shell function there, now I understand!

Important Shell Script Utilities

basename

This one strip the extension of file name:

1
2
3

# basename example.html .html

example

This one git rid of directory in full path:

1
2
3

# basename /usr/local/bin/example

example

awk

The awk command is not a simple single-purpose command; it’s actually a powerful programming language. Unfortunately, awk usage is now something of a lost art, having been replaced by larger languages such as Python.

sed

The sed program (sed stands for stream editor) is an automatic text editor that takes an input stream (a file or the standard input), alters it according to some expression, and prints the results to standard output.

expr

The expr command is a clumsy, slow way of doing math. If you find yourself using it frequently, you should probably be using a language like Python instead of a shell script.

Subshells

An entirely new shell process that you can create just to run a command or two. The new shell has a copy of the original shell’s environment, and when the new shell exits, any changes you made to its shell environment disappear, leaving the initial shell to run as normal.

Using a subshell to make a single-use alteration to an environment variable is such a common task:

1	# (PATH=/usr/confusing:$PATH; ./runprogram.sh)

Chapter 12. Moving Files Across the Network

Quick copy browser

Go to target directory run:

1	python -m SimpleHTTPServer

This usually open 8000 port on your machine, then go to another machine open:

1 2	# use ifconfig to check the source machine IP 192.168.1.29:8000

you can see the content there.

rsync

Actually you can first enable Mac ssh access then use rsync to backup files: System Preference -> Sharing -> check remote login

To get rsync working between two hosts, the rsync program must be installed on both the source and destination, and you’ll need a way to access one machine from the other.

Copy files to remote home:

1 2	rsync files remote: rsync files user@remote:

If rsync isn’t in the remote path but is on the system, use --rsync-path=path to manually specify its location.

Unless you supply extra options, rsync copies only files. You will see:

1	skipping directory xxx

To transfer entire directory hierarchies, complete with symbolic links, permissions, modes, and devices—use the -a option.

1	rsync -nv files -a dir user@remote:

-n: dry-run, this is vital when you are not sure. -vv: verbose mode

To make an exact replica of the source directory, you must delete files in the destination directory that do not exist in the source directory:

1	rsync -v --delete -a dir user@remote:

Please use -n dry-run to see what will be deleted before performing command.

Be particular careful with tailing slash after dir:

1	rsync -a dir/ user@remote:dest

This will copy all files under dir to dest folder in remote instead of copy dir into dest.

You can also --exclude=, --exclude-from= and --include= in command.

To speed operation, rsync uses a quick check to determine whether any files on the transfer source are already on the destination. The quick check uses a combination of the file size and its last-modified date.

When the files on the source side are not identical to the files on the destination side, rsync transfers the source files and overwrites any files that exist on the remote side. The default behavior may be inadequate, though, because you may need additional reassurance that files are indeed the same before skipping over them in transfers, or you may want to put in some extra safeguards.

--checksum (abbreviation: -c) Compute checksums (mostly unique signatures) of the files to see if they’re the same. This consumes additional I/O and CPU resources during transfers, but if you’re dealing with sensitive data or files that often have uniform sizes, this option is a must. (This will focus on file content, not date stamp)
--ignore-existing Doesn’t clobber files already on the target side.
--backup (abbreviation: -b) Doesn’t clobber files already on the target but rather renames these existing files by adding a ~ suffix to their names before transferring the new files.
--suffix=s Changes the suffix used with --backup from ~ to s.
--update (abbreviation: -u) Doesn’t clobber any file on the target that has a later date than the corresponding file on the source.

You can also compress the dir when transfer:

1	rsync -az dir user@remote:

You can also reverse the process:

1	rsync -a user@remote:dir dest

The rest of this chapter talks samba for file sharing, I skip it.

Chapter 13. User Environments

Startup files play an important role at this point, because they set defaults for the shell and other interactive programs. They determine how the system behaves when a user logs in.

I see vi theme config in ~/.bashrc file.

The Command Path

The most important part of any shell startup file is the command path. The path should cover the directories that contain every application of interest to a regular user. At the very least, the path should contain these components, in order:

1
2
3

/usr/local/bin
/usr/bin
/bin

If the application is on another directory, use symbolic link to /usr/local/bin or you defined bin folder.

The prompt

I never use this so far, usually prompt shows hostname, username, current directory and sign ($ or #). you can change the color and more.

Alias

This is common use, sometimes I use shell functions too.

Permission mask

It depends on your needs:

1	umask 022/077

Startup file order

These startup files are used to create environment. Each script has a specific use and affects the login environment differently. Every subsequent script executed can override the values assigned by previous scripts.

The two main shell instance types are interactive and noninteractive, but of those, only interactive shells are of interest because noninteractive shells (such as those that run shell scripts) usually don’t read any startup files.

Interactive shells are the ones that you use to run commands from a terminal, they can be classified as login or non-login.

I know there are lots of startup files under each user’s home directory or in other system folder, how do they take effect? In what order: Reference Doc Difference between Login shell and Non login shell

Logging in remotely with SSH also gives you a login shell.

You can tell if a shell is a login shell by running echo $0; if the first character is a -, the shell’s a login shell.

When Bash is invoked as a Login shell:

Login process calls /etc/profile
/etc/profile calls the scripts in /etc/profile.d/
Login process calls ~/.bash_profile, ~/.bash_login and ~/.profile. running only the first one that it sees.

When bash is invoked as a Non-login shell;

Non-login process(shell) calls /etc/bashrc
then calls ~/.bashrc

Non-Login shells created using the below command syntax: examples: # su | # su USERNAME

Note that I can run bash or sh or csh in terminal, it will give me a new simple prompt without user profile or setting…

It seems if you use non-login like su dsadm, the export env vars are still there in env scope, I think the reason is it’s not login! still use current environment. But if you run su - dsadm, it is gone.