Great book for all Linux developers and administrators! These are just notes for a future quick revisit.
Chapter 1. The Big Picture
The most effective way to understand how an operating system works is through abstraction—a fancy way of saying that you can ignore most of the details.
The kernel
is software residing in memory that tells the CPU what to do. The kernel manages the hardware and acts primarily as an interface between the hardware and any running program.
Processes—the running programs that the kernel manages—collectively make up the system’s upper level, called user space.
There is a critical difference between the ways that the kernel and user processes run: The kernel runs in kernel mode, and the user processes run in user mode. Code running in kernel mode has unrestricted access to the processor and main memory. This is a powerful but dangerous privilege that allows a kernel process to easily crash the entire system. The area that only the kernel can access is called kernel space.
User mode, in comparison, restricts access to a (usually quite small) subset of memory and safe CPU operations. User space refers to the parts of main memory that the user processes can access. If a process makes a mistake and crashes, the consequences are limited and can be cleaned up by the kernel. This means that if your web browser crashes, it probably won’t take down the scientific computation that you’ve been running in the background for days.
Hardware
A CPU is just an operator on memory; it reads its instructions and data from the memory and writes data back out to the memory.
You’ll often hear the term state
in reference to memory, processes, the kernel, and other parts of a computer system. Strictly speaking, a state is a particular arrangement of bits. For example, if you have four bits in your memory, 0110, 0001, and 1011 represent three different states.
The term image
refers to a particular physical arrangement of bits.
Kernel
Nearly everything that the kernel does revolves around main memory. One of the kernel’s tasks is to split memory into many subdivisions, and it must maintain certain state information about those subdivisions at all times. Each process gets its own share of memory, and the kernel must ensure that each process keeps to its share.
The kernel is in charge of managing tasks in four general system areas:
- Processes. The kernel is responsible for determining which processes are allowed to use the CPU.
- Memory. The kernel needs to keep track of all memory—what is currently allocated to a particular process, what might be shared between processes, and what is free.
- Device drivers. The kernel acts as an interface between hardware (such as a disk) and processes. It’s usually the kernel’s job to operate the hardware.
- System calls and support. Processes normally use system calls to communicate with the kernel.
The act of one process giving up control of the CPU to another process is called a context switch.
The kernel is responsible for context switching. To understand how this works, let’s think about a situation in which a process is running in user mode but its time slice is up. Here’s what happens:
- The CPU (the actual hardware) interrupts the current process based on an internal timer, switches into kernel mode, and hands control back to the kernel.
- The kernel records the current state of the CPU and memory, which will be essential to resuming the process that was just interrupted.
- The kernel performs any tasks that might have come up during the preceding time slice (such as collecting data from input and output, or I/O, operations).
- The kernel is now ready to let another process run. The kernel analyzes the list of processes that are ready to run and chooses one.
- The kernel prepares the memory for this new process, and then prepares the CPU.
- The kernel tells the CPU how long the time slice for the new process will last.
- The kernel switches the CPU into user mode and hands control of the CPU to the process.
The context switch answers the important question of when the kernel runs. The answer is that it runs between process time slices during a context switch.
Modern CPUs include a memory management unit (MMU)
that enables a memory access scheme called virtual memory
. When using virtual memory, a process does not directly access the memory by its physical location in the hardware. Instead, the kernel sets up each process to act as if it had an entire machine to itself. When the process accesses some of its memory, the MMU intercepts the access and uses a memory address map to translate the memory location from the process into an actual physical memory location on the machine. The kernel must still initialize and continuously maintain and alter this memory address map. For example, during a context switch, the kernel has to change the map from the outgoing process to the incoming process.
The implementation of a memory address map is called a page table.
The kernel’s role with devices is pretty simple. A device is typically accessible only in kernel mode because improper access (such as a user process asking to turn off the power) could crash the machine. Another problem is that different devices rarely have the same programming interface, even if the devices do the same thing, such as two different network cards. Therefore, device drivers have traditionally been part of the kernel.
There are several other kinds of kernel features available to user processes. For example, system calls
(or syscalls) perform specific tasks that a user process alone cannot do well or at all. For example, the acts of opening, reading, and writing files all involve system calls.
Other than init, all user processes on a Linux system start as a result of fork(), and most of the time, you also run exec() to start a new program instead of running a copy of an existing process.
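A rough way to watch this from user space, assuming strace is installed (it comes up again in Chapter 8), is to trace a shell running a command; the shell clones itself and the child calls execve:

strace -f -e trace=clone,execve sh -c ls 2>&1 | head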
User Space
As mentioned earlier, the main memory that the kernel allocates for user processes is called user space. Because a process is simply a state (or image) in memory, user space also refers to the memory for the entire collection of running processes.
Users
A user
is an entity that can run processes and own files. A user is associated with a username. For example, a system could have a user named billyjoe. However, the kernel does not manage the usernames; instead, it identifies users by simple numeric identifiers called userids.
Users exist primarily to support permissions and boundaries.
In addition, as powerful as the root user is, it still runs in the operating system’s user mode, not kernel mode.
Groups are sets of users. The primary purpose of groups is to allow a user to share file access with other users in a group.
Chapter 2. Basic Commands and Directory Hierarchy
Some resources:
- UNIX for the Impatient
- Learning the UNIX Operating System
The shell
is one of the most important parts of a Unix system. A shell is a program that runs commands. The shell also serves as a small programming environment.
Many important parts of the system are actually shell scripts
—text files that contain a sequence of shell commands.
There are many different Unix shells, but all derive several of their features from the Bourne shell
(/bin/sh), a standard shell developed at Bell Labs for early versions of Unix. Every Unix system needs the Bourne shell in order to function correctly, as you will see throughout this book.
Linux uses an enhanced version of the Bourne shell called bash
or the “Bourne-again” shell. The bash shell is the default shell on most Linux distributions, and /bin/sh is normally a link to bash on a Linux system.
cat
command: The command is called cat because it performs concatenation when it prints the contents of more than one file.
Pressing CTRL-D
on an empty line stops the current standard input entry from the terminal (and often terminates a program). Don’t confuse this with CTRL-C
, which terminates a program regardless of its input or output.
Unix filenames do not need extensions and often do not carry them.
Shell globs don’t match dot files unless you explicitly use a pattern such as .* (this is why rm -rf ./* doesn’t remove hidden objects). You can run into problems with globs because .* matches . and .. (the current and parent directories).
The shell can store temporary variables, called shell variables
, containing the values of text strings. Shell variables are very useful for keeping track of values in scripts, and some shell variables control the way the shell behaves.
An environment variable
is like a shell variable, but it’s not specific to the shell. All processes on Unix systems have environment variable storage. The main difference between environment and shell variables is that the operating system passes all of your shell’s environment variables
to programs that the shell runs (for example, the sub-script), whereas shell variables cannot be accessed in the commands that you run.
Assign an environment variable with the shell’s export
command. For example, if you’d like to make the $STUFF
shell variable into an environment variable, use the following:
STUFF=123
export STUFF
PATH
is a special environment variable that contains the command path
(or path for short). A command path is a list of system directories that the shell searches when trying to locate a command.
Resource:
- Learning the vi and Vim Editor
Some ways to kill a process. There are many types of signals; the default is TERM, or terminate. For example, to freeze (suspend) a process instead of terminating it:

kill -STOP pid
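A stopped process stays in memory and can be resumed or ended later (pid is a placeholder):

kill -CONT pid    # continue (resume) a stopped process
kill pid          # send the default TERM signal
kill -KILL pid    # last resort; the kernel removes the process unconditionally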
To see if you’ve accidentally suspended any processes on your current terminal, run the jobs
command.
You can detach a process from the shell and put it in the “background” with the ampersand &
. The best way to make sure that a background process doesn’t bother you is to redirect its output (and possibly input).
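A minimal sketch (the command and log filename are arbitrary):

make all > build.log 2>&1 &    # run in the background with output redirected
jobs                           # list background and suspended jobs on this terminal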
Some executable files have an s
in the owner permissions
listing instead of an x. This indicates that the executable is setuid
, meaning that when you execute the program, it runs as though the file owner is the user instead of you. Many programs use this setuid
bit to run as root in order to get the privileges they need to change system files. One example is the passwd
program, which needs to change the /etc/passwd
file.
Directories also have permissions. You can list the contents of a directory if it’s readable, but you can only access a file in a directory if the directory is executable
. (One common mistake people make when setting the permissions of directories is to accidentally remove the execute permission when using absolute modes.)
You can specify a set of default permissions with the umask (user file-creation mode mask)
shell command, which applies a predefined set of permissions to any new file you create. In general, use umask 022
if you want everyone to be able to see all of the files and directories that you create, and use umask 077
if you don’t. (You’ll need to put the umask command with the desired mode in one of your startup files to make your new default permissions apply to later sessions).
How do you calculate the umask? For directories, the base permissions are 0777 (rwxrwxrwx), and for files they are 0666 (rw-rw-rw-). You can simply subtract the umask from the base permissions to determine the final permissions:
- New file: 666 - 022 = 644 (rw-r--r--)
- New directory: 777 - 022 = 755 (rwxr-xr-x)
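A quick way to verify this on your own system (a sketch; run it in a scratch directory):

umask 022                 # new files become 644, new directories 755
touch newfile
mkdir newdir
ls -ld newfile newdir     # compare the resulting permission bits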
Another compression program in Unix is bzip2
, whose compressed files end with .bz2
. While marginally slower than gzip, bzip2 often compacts text files a little more, and it is therefore increasingly popular in the distribution of source code.
The bzip2
compression/decompression option for tar is j
:
tar jcvf xx.bz2 file...
Linux Directory Hierarchy Essentials
Simplified overview of the hierarchy
- /bin: Contains ready-to-run programs (also known as executables), including most of the basic Unix commands such as ls and cp. Most of the programs in /bin are in binary format, having been created by a C compiler, but some are shell scripts in modern systems.
- /dev: Contains device files.
- /etc: This core system configuration directory contains the user password, boot, device, networking, and other setup files. Many items in /etc are specific to the machine’s hardware.
- /home: Holds personal directories for regular users.
- /lib: An abbreviation for library, this directory holds library files containing code that executables can use.
- /proc: Provides system statistics through a browsable directory-and-file interface. The /proc directory contains information about currently running processes as well as some kernel parameters.
- /sys: This directory is similar to /proc in that it provides a device and system interface.
- /sbin: The place for system executables. Programs in /sbin directories relate to system management.
- /tmp: A storage area for smaller, temporary files that you don’t care much about. If something is extremely important, don’t put it in /tmp, because most distributions clear /tmp when the machine boots and some even remove its old files periodically. Also, don’t let /tmp fill up with garbage, because its space is usually shared with something critical.
- /usr: Although pronounced “user,” this subdirectory has no user files. Instead, it contains a large directory hierarchy, including the bulk of the Linux system. Many of the directory names in /usr are the same as those in the root directory (like /usr/bin and /usr/lib), and they hold the same type of files. (The reason the root directory does not contain the complete system is primarily historic: in the past, it kept space requirements low for the root.)
- /var: The variable subdirectory, where programs record runtime information. System logging, user tracking, caches, and other files that system programs create and manage are here.
- /boot: Contains kernel boot loader files. These files pertain only to the very first stage of the Linux startup procedure.
- /media: A base attachment point for removable media such as flash drives, found in many distributions.
- /opt: May contain additional third-party software.
Kernel Location
On Linux systems, the kernel is normally in /vmlinuz
or /boot/vmlinuz
. A boot loader loads this file into memory and sets it in motion when the system boots.
Once the boot loader runs and sets the kernel in motion, the main kernel file is no longer used by the running system. However, you’ll find many modules that the kernel can load and unload on demand during the course of normal system operation. Called loadable kernel modules, they are located under /lib/modules
.
Chapter 3. Devices
It’s important to understand how the kernel interacts with user space when presented with new devices. The udev
system enables user-space programs to automatically configure and use new devices.
udev
(userspace /dev) is a device manager for the Linux kernel. As the successor of devfsd and hotplug, udev primarily manages device nodes in the /dev directory.
Device Files
It is easy to manipulate most devices on a Unix system because the kernel presents many of the device I/O interfaces to user processes as files. These device files are sometimes called device nodes
. Not only can a programmer use regular file operations to work with a device, but some devices are also accessible to standard programs like cat
. However, not all devices or device capabilities are accessible with standard file I/O.
Device files are in the /dev
directory, and running ls /dev
reveals more than a few files in /dev
.
If you run ls -l, and the first character of the file mode is b, c, p, or s, the file is a device. These letters stand for block, character, pipe, and socket, respectively.
The numbers before the dates in the first two lines are the major
and minor
device numbers that help the kernel identify the device. Similar devices usually have the same major number.
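For example, a block device and a character device typically show up like this (illustrative listing; dates and owners will differ, but /dev/sda1 is conventionally major 8, minor 1, and /dev/null is major 1, minor 3):

brw-rw---- 1 root disk 8, 1 Sep  6 08:37 sda1
crw-rw-rw- 1 root root 1, 3 Sep  6 08:37 null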
Block device
Programs access data from a block device in fixed chunks. The sda1 in the preceding example is a disk device, a type of block device.
Character device
Character devices work with data streams. Printers directly attached to your computer are represented by character devices. It’s important to note that during character device interaction, the kernel cannot back up and reexamine the data stream after it has passed data to a device or process.
Pipe device
Named pipes are like character devices, with another process at the other end of the I/O stream instead of a kernel driver.
Socket device
Sockets are special-purpose interfaces that are frequently used for interprocess communication
.
Not all devices have device files because the block and character device I/O interfaces are not appropriate in all cases. For example,
network interfaces
don’t have device files. It is theoretically possible to interact with a network interface using a single character device, but because it would be exceptionally difficult, the kernel uses other I/O interfaces.
The sysfs Device Path
To provide a uniform view for attached devices based on their actual hardware attributes, the Linux kernel offers the sysfs
interface through a system of files and directories. The base path for devices is /sys/devices
(this is a real directory!).
ls -ltr /sys/devices/
The /dev
file is there so that user processes can use the device, whereas the /sys/devices
path is used to view information and manage the device. In /dev
you can run:
udevadm info --query=all --name=/dev/null

This command will show the sysfs location /devices/virtual/mem/null.
dd and Devices
The program dd
is extremely useful when working with block and character devices. This program’s sole function is to read from an input file or stream and write to an output file or stream, possibly doing some encoding conversion on the way.
I am not using it.
Device Name Summary
Not necessarily exactly as described below; there may be some variations:
- Hard Disks: /dev/sd*
Most hard disks attached to current Linux systems correspond to device names with an sd
prefix, such as /dev/sda
, /dev/sdb
, and so on. These devices represent entire disks; the kernel makes separate device files, such as /dev/sda1
and /dev/sda2
, for the partitions on a disk.
The
sd
portion of the name stands for SCSI disk.
Linux assigns devices to device files in the order in which its drivers encounter devices. This may cause problems when you remove one disk and insert another, because the device names can change. Most modern Linux systems use the Universally Unique Identifier (UUID) for persistent disk device access.
- CD and DVD Drives: /dev/sr*
Linux recognizes most optical storage drives as the SCSI devices /dev/sr0, /dev/sr1, and so on.
-
PATA Hard Disks: /dev/hd*
-
Terminals: /dev/tty*, /dev/pts/*, and /dev/tty
Terminals are devices for moving characters between a user process and an I/O device, usually for text output to a terminal screen.
Pseudoterminal
devices are emulated terminals that understand the I/O features of real terminals.
Two common terminal devices are /dev/tty1
(the first virtual console) and /dev/pts/0
(the first pseudoterminal device). The /dev/tty
device is the controlling terminal of the current process.
tty is shorthand for teletypewriter.
I am always confused here; at least you need to know that the shell is the command line interpreter! What is the difference between Terminal, Console, Shell, and Command Line?
Linux has two primary display modes: text mode
and an X Window System server
(graphics mode, usually via a display manager). Although Linux systems traditionally booted in text mode, most distributions now use kernel parameters and interim graphical display mechanisms to completely hide text mode as the system is booting. In such cases, the system switches over to full graphics mode near the end of the boot process.
OK, I’ll skip the rest of the content in Chapter 3.
Chapter 4. Disks and Filesystems
Schematic of a typical Linux disk:
Partitions
are subdivisions of the whole disk. On Linux, they’re denoted with a number after the whole block device, and therefore have device names such as /dev/sda1
and /dev/sdb3
.
Partitions are defined on a small area of the disk called a partition table
.
The next layer after the partition is the filesystem
, the database of files and directories that you’re accustomed to interacting with in user space.
To access data on a disk, the Linux kernel uses a system of layers.
Notice that you can work with the disk through the filesystem as well as directly through the disk devices.
Partitioning Disk Devices
You can view the Red Hat documentation for more information about partitioning.
Let’s view the partition table:
parted -l
There are 2 different partition tables: MBR (msdos) and GPT (gpt). The MBR table in this example contains primary, extended, and logical partitions.
Changing Partition Tables
You can use the parted command to change partitions. Checking /proc/partitions gives full partition information:

cat /proc/partitions
Filesystems
The last link between the kernel and user space for disks is typically the file-system; this is what you’re accustomed to interacting with when you run commands such as ls
and cd
. As previously mentioned, the filesystem is a form of database; it supplies the structure to transform a simple block device into the sophisticated hierarchy of files and subdirectories that users can understand.
Filesystem Types
- The
Fourth Extended filesystem (ext4)
is the current iteration of a line of filesystems native to Linux. TheSecond Extended filesystem (ext2)
was a longtime default for Linux systems inspired by traditional Unix filesystems such as the Unix File System (UFS) and the Fast File System (FFS). TheThird Extended filesystem (ext3)
added a journal feature (a small cache outside the normal filesystem data structure) to enhance data integrity and hasten booting. The ext4 filesystem is an incremental improvement with support for larger files than ext2 or ext3 support and a greater number of subdirectories.
Creating a Filesystem
Once you’re done with the partitioning process, you’re ready to create filesystems. As with partitioning, you’ll do this in user space because a user-space process can directly access and manipulate a block device.
For example, you can create an ext4 filesystem on /dev/sdf2:

mkfs -t ext4 /dev/sdf2
Filesystem creation is a task that you should only need to perform after adding a new disk or repartitioning an old one. You should create a filesystem just once for each new partition that has no preexisting data (or that has data that you want to remove). Creating a new filesystem on top of an existing filesystem will effectively destroy the old data.
It turns out that mkfs is only a frontend for a series of filesystem creation programs:
ls -l /sbin/mkfs.*
Mounting a Filesystem
On Unix, the process of attaching a filesystem is called mounting
. When the system boots, the kernel reads some configuration data and mounts root (/) based on the configuration data.
When mounting a filesystem, the common terminology is to mount a device on a mount point.
To see current system mount status:
mount
There are 3 key fields:
- The filesystem’s device, such as a disk partition; where the actual file-system data resides
- The filesystem type
- The mount point—that is, the place in the current system’s directory hierarchy where the filesystem will be attached.
For example, to mount the Fourth Extended filesystem /dev/sdf2 on /home/extra, use this command:
mount -t ext4 /dev/sdf2 /home/extra
To unmount (detach) a filesystem, use the umount command:
umount mountpoint
Filesystem UUID
You can identify and mount filesystems by their Universally Unique Identifier (UUID)
, a software standard. The UUID is a type of serial number, and each one should be different.
For example, if you know that the UUID of /dev/sdf2 is a9011c2b-1c03-4288-b3fe-8ba961ab0898, you can mount it as:

mount UUID=a9011c2b-1c03-4288-b3fe-8ba961ab0898 /home/extra
There is no -t ext4 option here, because mount can determine the filesystem type on its own.
To view a list of devices and the corresponding filesystems and UUIDs on your system, use the blkid (block ID) program:
blkid
For one thing, they’re the preferred way to automatically mount filesystems in /etc/fstab
at boot time.
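The kernel also exposes persistent names under /dev/disk; a quick way to see the UUID-to-device mapping:

ls -l /dev/disk/by-uuid/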
Disk Buffering, Caching, and Filesystems
Linux, like other versions of Unix, buffers writes to the disk. This means that the kernel usually doesn’t immediately write changes to filesystems when processes request changes. Instead it stores the changes in RAM until the kernel can conveniently make the actual change to the disk. This buffering system is transparent to the user and improves performance.
This is why you need to unmount a USB drive before removing it; otherwise you risk data loss.
When you unmount a filesystem with umount, the kernel automatically synchronizes with the disk. At any other time, you can force the kernel to write the changes in its buffer to the disk by running the sync
command.
The /etc/fstab Filesystem Table
I encountered this when writing an /etc/fstab entry for NFS while working on Kubernetes.

/dev/mapper/rhel-root / xfs defaults 0 0
To mount filesystems at boot time and take the drudgery out of the mount command, Linux systems keep a permanent list of filesystems and options in /etc/fstab. Each entry has six fields, in order:
- The device or UUID. Most current Linux systems no longer use the device in /etc/fstab, preferring the UUID.
- The mount point. Indicates where to attach the filesystem.
- The filesystem type.
- Options. Use long mount options separated by commas.
- Backup information for use by the dump command. You should always use a 0 in this field.
- The filesystem integrity test order. To ensure that fsck always runs on the root first, always set this to 1 for the root filesystem and 2 for any other filesystems on a hard disk. Use 0 to disable the bootup check for everything else, including CD-ROM drives, swap, and the /proc filesystem.
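Putting the six fields together, an entry for the /home/extra example from earlier might look like this (the UUID is the one shown above; the options and check order are illustrative):

UUID=a9011c2b-1c03-4288-b3fe-8ba961ab0898 /home/extra ext4 defaults 0 2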
You can also try to mount all entries at once in /etc/fstab that do not contain the noauto option with this command:
mount -a
Let’s see some commonly used options:
- defaults: This uses the mount defaults: read-write mode, enable device files, executables, the setuid bit, and so on. Use this when you don’t want to give the filesystem any special options but you do want to fill all fields in /etc/fstab.
- noauto: This option tells a mount -a command to ignore the entry.
Filesystem Capacity
To view the size and utilization of your currently mounted filesystems, use the df
command.
df -BM
Checking and Repairing Filesystems
Filesystem errors are usually due to a user shutting down the system in a rude way (for example, by pulling out the power cord). In such cases, the filesystem cache in memory may not match the data on the disk, and the system also may be in the process of altering the filesystem when you happen to give the computer a kick. Although a new generation of filesystems supports journals to make filesystem corruption far less common, you should always shut the system down properly. And regardless of the filesystem in use, filesystem checks are still necessary every now and then to maintain sanity.
The tool to check a filesystem is fsck
.
In the worst cases, you can try:
- You can try to extract the entire filesystem image from the disk with dd and transfer it to a partition on another disk of the same size.
- You can try to patch the filesystem as much as possible, mount it in read-only mode, and salvage what you can.
- You can try debugfs.
Special-Purpose Filesystems
Not all filesystems represent storage on physical media. Specifically, most versions of Unix have filesystems that serve as system interfaces. That is, rather than serving only as a means to store data on a device, a filesystem can represent system information such as process IDs and kernel diagnostics.
The special filesystem types in common use on Linux include the following:
- proc: Mounted on /proc. The name proc is actually an abbreviation for process. Each numbered directory inside /proc is actually the process ID of a current process on the system; the files in those directories represent various aspects of the processes. The file /proc/self represents the current process.
- sysfs: Mounted on /sys.
- tmpfs: Mounted on /run and other locations. With tmpfs, you can use your physical memory and swap space as temporary storage, stored in volatile memory instead of a persistent storage device.
Swap Space
Not every partition on a disk contains a filesystem. It’s also possible to augment the RAM on a machine with disk space. The disk area used to store memory pages is called swap space
(or just swap for short).
You can use the free command to see the swap usage:

free -m
You can use either a disk partition or a regular file as swap space. For a disk partition:
- Ensure the partition is empty.
- Run mkswap dev, where dev is the partition device.
- Execute swapon dev to register the space with the kernel.
- Add an entry to the /etc/fstab file.
Use these commands to create an empty file, initialize it as swap, and add it to the swap pool:
dd if=/dev/zero of=swap_file bs=1024k count=num_mb
Here, swap_file
is the name of the new swap file, and num_mb
is the desired size, in megabytes.
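The dd command only creates the file; to actually use it as swap, continue with the same placeholder name:

mkswap swap_file     # write the swap signature to the file
swapon swap_file     # register the space with the kernel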
To remove a swap partition or file from the kernel’s active pool, use the swapoff
command.
Note some administrators configure certain systems with no swap space at all. For example, high-performance network servers should never dip into swap space and should avoid disk access if at all possible.
It’s dangerous to do this on a general-purpose machine. If a machine completely runs out of both real memory and swap space, the Linux kernel invokes the out-of-memory (OOM)
killer to kill a process in order to free up some memory. You obviously don’t want this to happen to your desktop applications. On the other hand, high-performance servers include sophisticated monitoring and load-balancing systems to ensure that they never reach the danger zone.
Looking Forward: Disks and User Space
In disk-related components on a Unix system, the boundaries between user space and the kernel can be difficult to characterize. As you’ve seen, the kernel handles raw block I/O from the devices, and user-space tools can use the block I/O through device files. However, user space typically uses the block I/O only for initializing operations such as partitioning, file-system creation, and swap space creation.
In normal use, user space uses only the filesystem support that the kernel provides on top of the block I/O.
Chapter 5. How the Linux Kernel Boots
You’ll learn how the kernel moves into memory up to the point where the first user process starts.
A simplified view of the boot process looks like this:
- The machine’s BIOS or boot firmware loads and runs a boot loader.
- The boot loader finds the kernel image on disk, loads it into memory, and starts it.
- The kernel initializes the devices and its drivers.
- The kernel mounts the root filesystem.
- The kernel starts a program called init with a process ID of 1. This point is the user space start.
- init sets the rest of the system processes in motion.
- At some point, init starts a process allowing you to log in, usually at the end or near the end of the boot.
Startup Messages
There are two ways to view the kernel’s boot and runtime diagnostic messages:
- Look at the kernel system log file. You’ll often find this in /var/log/kern.log, but depending on how your system is configured, it might also be lumped together with a lot of other system logs in /var/log/messages or elsewhere.
- Use the dmesg command, but be sure to pipe the output to less because there will be much more than a screen’s worth. The dmesg command uses the kernel ring buffer, which is of limited size, but most newer kernels have a large enough buffer to hold boot messages for a long time.
Kernel Initialization and Boot Options
Upon startup, the Linux kernel initializes in this general order:
- CPU inspection
- Memory inspection
- Device bus discovery
- Device discovery
- Auxiliary kernel subsystem setup (networking, and so on)
- Root filesystem mount
- User space start
The following memory management messages are a good indication that the user-space handoff is about to happen because this is where the kernel protects its own memory from user-space processes:
[    0.972934] Freeing unused kernel memory: 1844k freed
Kernel Parameters
I just encountered an issue about kernel parameters for Db2… Let’s see.
When running the Linux kernel, the boot loader passes in a set of text-based kernel parameters that tell the kernel how it should start. The parameters specify many different types of behavior, such as the amount of diagnostic output the kernel should produce and device driver–specific options.
You can view the kernel parameters from your system’s boot by looking at the /proc/cmdline
file:
BOOT_IMAGE=/vmlinuz-3.10.0-862.14.4.el7.x86_64 root=/dev/mapper/rhel-root ro crashkernel=auto rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap rhgb quiet elevator=noop LANG=en_US.UTF-8
The root=/dev/mapper/rhel-root parameter tells the kernel where the root filesystem resides.
Boot Loader
At the start of the boot process, before the kernel and init start, a boot loader starts the kernel. The task of a boot loader sounds simple: It loads the kernel into memory, and then starts the kernel with a set of kernel parameters.
The kernel and its parameters are usually somewhere on the root filesystem.
On PCs, boot loaders use the Basic Input/Output System (BIOS)
or Unified Extensible Firmware Interface (UEFI)
to access disks. Nearly all disk hardware has firmware that allows the BIOS to access attached storage hardware with Linear Block Addressing (LBA)
. Although it exhibits poor performance, this mode of access does allow universal access to disks. Boot loaders are often the only programs to use the BIOS for disk access; the kernel uses its own high-performance drivers.
Most modern boot loaders can read partition tables and have built-in support for read-only access to filesystems.
Boot loader tasks
- Select among multiple kernels.
- Switch between sets of kernel parameters.
- Allow the user to manually override and edit kernel image names and parameters
- Provide support for booting other operating systems.
Boot loader types
- GRUB. A near-universal standard on Linux systems (mainly talks about this)
- LILO. One of the first Linux boot loaders.
- LOADLIN. Boots a kernel from MS-DOS
GRUB Introduction
GRUB stands for Grand Unified Boot Loader. We’ll cover GRUB 2.
This section talks about the GRUB menu and looks at some boot options. Actually, if you check the /boot directory, you will see the kernel image file and the initial RAM filesystem:

...
Not interested in the rest of the content in this chapter.
Chapter 6. How User Space Starts
The point where the kernel starts its first user-space process, init, is significant—not just because that’s where the memory and CPU are finally ready for normal system operation, but because that’s where you can see how the rest of the system builds up as a whole.
User space is far more modular. It’s much easier to see what goes into the user space startup and operation.
User space starts in roughly this order:
- init
- Essential low-level services such as udevd and syslogd
- Network configuration
- Mid- and high-level services (cron, printing, and so on)
- Login prompts, GUIs, and other high-level applications
Introduction to init
The init program is a user-space program like any other program on the Linux system, and you’ll find it in /sbin
along with many of the other system binaries. Its main purpose is to start and stop the essential service processes on the system, but newer versions have more responsibilities.
In my VM’s /sbin directory:

lrwxrwxrwx 1 root root 22 Oct 1 2018 init -> ../lib/systemd/systemd
There are three major implementations of init in Linux distributions:
- System V init. A traditional sequenced init (Sys V, usually pronounced “sys-five”). Red Hat Enterprise Linux and several other distributions use this version.
- systemd. The emerging standard for init. Many distributions have moved to systemd, and most that have not yet done so are planning to move to it.
- Upstart. The init on Ubuntu installations. However, as of this writing, Ubuntu has also planned to migrate to systemd.
There are many different implementations of init because System V init and other older versions relied on a sequence that performed only one startup task at a time. systemd and Upstart attempt to remedy the performance issue by allowing many services to start in parallel, thereby speeding up the boot process.
System V Runlevels
At any given time on a Linux system, a certain base set of processes is running. In System V init, this state of the machine is called its runlevel
, which is denoted by a number from 0 through 6. A system spends most of its time in a single runlevel, but when you shut the machine down, init switches to a different runlevel in order to terminate the system services in an orderly fashion and to tell the kernel to stop.
You can check your system’s runlevel with the who -r
command:
who -r
Runlevels serve various purposes, but the most common one is to distinguish between system startup, shutdown, single-user mode, and console mode states.
But runlevels are becoming a thing of the past. Even though all three init versions in this book support them, systemd and Upstart consider runlevels obsolete as end states for the system.
Identifying Your init
- If your system has /usr/lib/systemd and /etc/systemd directories, you have systemd.
- If you have an /etc/init directory that contains several .conf files, you’re probably running Upstart.
- If neither of the above is true, but you have an /etc/inittab file, you’re probably running System V init.
Here I focus on systemd
systemd
The systemd init is one of the newest init implementations on Linux. In addition to handling the regular boot process, systemd aims to incorporate a number of standard Unix services such as cron and inetd. One of its most significant features is its ability to defer the start of services and operating system features until they are necessary.
Let’s outline what happens when systemd runs at boot time:
- systemd loads its configuration.
- systemd determines its boot goal, which is usually named default.target.
- systemd determines all of the dependencies of the default boot goal, dependencies of these dependencies, and so on.
- systemd activates the dependencies and the boot goal.
- After boot, systemd can react to system events (such as uevents) and activate additional components.
Units and Unit Types
One of the most interesting things about systemd
is that it does not just operate processes and services; it can also mount filesystems, monitor network sockets, run timers, and more. Each type of capability is called a unit type
, and each specific capability is called a unit
. When you turn on a unit, you activate it.
The default boot goal is usually a target unit
that groups together a number of service
and mount
units as dependencies.
See also: understanding systemd units and unit files.
systemd Dependencies
To accommodate the need for flexibility and fault tolerance, systemd offers a myriad of dependency types and styles:
- Requires: Strict dependencies. When activating a unit with a Requires dependency unit, systemd attempts to activate the dependency unit. If the dependency unit fails, systemd deactivates the dependent unit.
- Wants: Dependencies for activation only. Upon activating a unit, systemd activates the unit’s Wants dependencies, but it doesn’t care if those dependencies fail.
- Requisite: Units that must already be active.
- Conflicts: Negative dependencies. When activating a unit with a Conflict dependency, systemd automatically deactivates the dependency if it is active.
There are many other kinds of dependency syntax as well, such as ordering and conditional dependencies.
systemd Configuration
The systemd configuration files are spread among many directories across the system, so you typically won’t find the files for all of the units on a system in one place.
That said, there are two main directories for systemd configuration: the system unit directory (globally configured, usually /usr/lib/systemd/system
) and a system configuration directory (local definitions, usually /etc/systemd/system
).
Note: Avoid making changes to the system unit directory because your distribution will maintain it for you. Make your local changes to the system configuration directory.
To see the system unit and configuration directories on your system, use the following commands:
pkg-config systemd --variable=systemdsystemunitdir
pkg-config systemd --variable=systemdsystemconfdir
Let’s look at the unit files in /usr/lib/systemd/system. For example, there is an sshd.service file:
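A representative sshd.service from a Red Hat-style system looks roughly like this (a sketch; exact options vary by distribution and version):

[Unit]
Description=OpenSSH server daemon
After=network.target sshd-keygen.service
Wants=sshd-keygen.service

[Service]
Type=notify
EnvironmentFile=/etc/sysconfig/sshd
ExecStart=/usr/sbin/sshd -D $OPTIONS
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
Restart=on-failure

[Install]
WantedBy=multi-user.target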
The [Unit] section gives some details about the unit and contains description and dependency information.
You’ll find the details about the service in the [Service] section, including how to prepare, start, and reload the service.
During normal operation, systemd ignores the [Install] section. However, consider the case when sshd.service is disabled on your system and you would like to turn it on. When you enable a unit, systemd reads the [Install] section.
The [Install] section is usually responsible for the .wants and .requires directories in the system configuration directory (/etc/systemd/system); see:

basic.target.wants getty.target.wants remote-fs.target.wants
The $OPTIONS in the unit file is a variable; a specifier is another variable-like feature often found in unit files, such as %n and %H.
systemd Operation
You’ll interact with systemd primarily through the systemctl
command, which allows you to activate and deactivate services, list status, reload the configuration, and much more.
List active units:

systemctl

List all units, including inactive ones:

systemctl --all

Get the status of a unit:

systemctl status sshd.service
To activate, deactivate, and restart units, use the systemctl start, stop, and restart commands. However, if you’ve changed a unit configuration file, you can tell systemd to reload the file in one of two ways:

systemctl reload unit          # reloads just the configuration for unit
systemctl daemon-reload        # reloads all unit configurations
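A few other everyday systemctl operations (these create or remove the symlinks described by a unit’s [Install] section):

systemctl enable sshd.service      # start the unit at boot
systemctl disable sshd.service     # stop starting it at boot
systemctl is-active sshd.service   # prints active or inactive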
systemd Process Tracking and Synchronization
systemd wants a reasonable amount of information and control over every process that it starts. The main problem that it faces is that a service can start in different ways; it may fork new instances of itself or even daemonize and detach itself from the original process.
To minimize the work that a package developer or administrator needs to do in order to create a working unit file, systemd uses control groups (cgroups)
, an optional Linux kernel feature that allows for finer tracking of a process hierarchy.
systemd On-Demand and Resource-Parallelized Startup
One of systemd’s most significant features is its ability to delay a unit startup until it is absolutely needed.
systemd Auxiliary Programs
When starting out with systemd, you may notice the exceptionally large number of programs in /lib/systemd
. These are primarily support programs for units. For example, udevd
is part of systemd, and you’ll find it there as systemd-udevd
. Another, the systemd-fsck
program, works as a middleman between systemd and fsck.
Shutting Down Your System
init controls how the system shuts down and reboots. The commands to shut down the system are the same regardless of which version of init you run. The proper way to shut down a Linux machine is to use the shutdown
command.
To shut down the machine immediately:

shutdown -h now

To reboot the machine now:

shutdown -r now
When system shutdown time finally arrives, shutdown tells init to begin the shutdown process. On systemd, it means activating the shutdown units; and on System V init, it means changing the runlevel to 0 or 6.
The Initial RAM Filesystem
The initramfs is in the /boot directory.

ls -ltr | grep init
The problem stems from the availability of many different kinds of storage hardware. Remember, the Linux kernel does not talk to the PC BIOS or EFI interfaces to get data from disks, so in order to mount its root file-system, it needs driver support for the underlying storage mechanism.
The workaround is to gather a small collection of kernel driver modules along with a few other utilities into an archive. The boot loader loads this archive into memory before running the kernel.
Chapter 7. System Configuration
When you first look in the /etc
directory, you might feel a bit overwhelmed. Although most of the files that you see affect a system’s operations to some extent, a few are fundamental.
The Structure of /etc
Most system configuration files on a Linux system are found in /etc
. Historically, each program had one or more configuration files there, and because there are so many packages on a Unix system, /etc would accumulate files quickly.
The trend for many years now has been to place system configuration files into subdirectories under /etc
. There are still a few individual configuration files in /etc, but for the most part, if you run ls -F /etc
, you’ll see that most of the items there are now subdirectories.
What kind of configuration files are found in /etc? The basic guideline is that customizable configurations for a single machine belong in /etc. Noncustomizable system configuration files are often found elsewhere, as with the prepackaged systemd unit files in /usr/lib/systemd.
System Logging
Most system programs write their diagnostic output to the syslog
service. The traditional syslogd daemon waits for messages and, depending on the type of message received, funnels the output to a file, the screen, users, or some combination of these, or just ignores it.
The System Logger
Most Linux distributions run a new version of syslogd called rsyslogd
that does much more than simply write log messages to files. For example, in my VM:

systemctl status rsyslog
Many of the files in /var/log
aren’t maintained by the system logger. The only way to know for sure which ones belong to rsyslogd is to look at its configuration file.
Configuration Files
The base rsyslogd configuration file is /etc/rsyslog.conf
, but you’ll find certain configurations in other directories, such as /etc/rsyslog.d
.
It talks about the syntax in the configuration file:
The configuration format is a blend of traditional rules and rsyslog-specific
extensions. One rule of thumb is that anything beginning with a dollar sign ($) is an extension.
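As a sketch, the traditional-style rules look roughly like this (selectors and paths resemble the Red Hat defaults; yours may differ):

# facility.priority                          action (file, user, or remote host)
kern.*                                       /dev/console
*.info;mail.none;authpriv.none;cron.none     /var/log/messages
authpriv.*                                   /var/log/secure
cron.*                                       /var/log/cron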
User Management Files
Unix systems allow for multiple independent users. At the kernel level, users are simply numbers (user IDs).
The /etc/passwd File
The plaintext file /etc/passwd maps usernames to user IDs.
root:x:0:0:Superuser:/root:/bin/sh
The fields are as follows:
- The username.
- The user’s encrypted password. On most Linux systems, the password is not actually stored in the passwd file, but rather in the shadow file. Normal users do not have read permission for shadow. The second field in passwd or shadow is the encrypted password; Unix passwords are never stored as clear text.
- An x in the second passwd file field indicates that the encrypted password is stored in the shadow file. A * indicates that the user cannot log in, and if the field is blank (that is, you see two colons in a row, like ::), no password is required to log in. (Beware of blank passwords. You should never have a user without a password.)
- The user ID (UID), which is the user’s representation in the kernel.
- The group ID (GID). This should be one of the numbered entries in the /etc/group file. Groups determine file permissions and little else. This group is also called the user’s primary group.
- The user’s real name. You’ll sometimes find commas in this field, denoting room and telephone numbers.
- The user’s home directory.
- The user’s shell (the program that runs when the user runs a terminal session).
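Putting the fields together, a typical entry looks something like this (a made-up user):

juser:x:3119:1000:J. Random User:/home/juser:/bin/bash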
Special Users
The superuser (root) always has UID 0 and GID 0. Some users, such as daemon, have no login privileges. The nobody user is an underprivileged user. Some processes run as nobody because the nobody user cannot write to anything on the system.
The users that cannot log in are called pseudo-users
. Although they can’t log in, the system can start processes with their user IDs. Pseudo-users such as nobody are usually created for security reasons.
The /etc/shadow File
The shadow password file /etc/shadow on a Linux system normally contains user authentication information, including the encrypted passwords and password expiration information that correspond to the users in /etc/passwd.
Regular users interact with /etc/passwd using the passwd command. By default, passwd changes the user’s password. The passwd command is an suid-root program, because only the superuser can change the /etc/passwd file.
-rwsr-xr-x. 1 root root 27832 Jan 29 2014 /usr/bin/passwd
The /etc/shells file lists multiple shell types:
/bin/sh
/bin/bash
/sbin/nologin
/usr/bin/sh
/usr/bin/bash
/usr/sbin/nologin
/bin/ksh
/bin/rksh
Because /etc/passwd is plaintext, the superuser may use any text editor to make changes. To add a user, simply add an appropriate line and create a home directory for the user; to delete, do the opposite. However, to edit the file, you’ll most likely want to use the vipw program.
Use adduser and userdel to add and remove users. Run passwd user as the superuser.
Working with Groups
Groups in Unix offer a way to share files with certain users but deny access to all others. The idea is that you can set read or write permission bits for a particular group, excluding everyone else.
The /etc/group
file defines the group IDs:
root:*:0:juser
- The group name.
- The group password. This is hardly ever used, nor should you use it. Use * or any other default value.
- The group ID (a number). The GID must be unique within the group file. This number goes into a user’s group field in that user’s /etc/passwd entry.
- An optional list of users that belong to the group. In addition to the users listed here, users with the corresponding group ID in their passwd file entries also belong to the group.
Linux distributions often create a new group for each new user added, with the same name as the user.
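To see which groups a user belongs to (juser is a placeholder username):

id juser     # numeric UID, primary GID, and all group memberships
groups       # group names for the current user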
Setting the Time
Unix machines depend on accurate timekeeping. The kernel maintains the system clock, which is the clock that is consulted when you run commands like date.
PC hardware has a battery-backed real-time clock (RTC)
. The RTC isn’t the best clock in the world, but it’s better than nothing. The kernel usually sets its time based on the RTC at boot time, and you can reset the system clock to the current hardware time with hwclock
.
You should not try to fix the time drift with hwclock
because time-based system events can get lost or mangled. Usually it’s best to keep your system time correct with a network time daemon.
Network Time
If your machine is permanently connected to the Internet, you can run a Network Time Protocol (NTP)
daemon to maintain the time using a remote server. Many distributions have built-in support for an NTP daemon, but it may not be enabled by default. You might need to install an ntpd package to get it to work.
Scheduling Recurring Tasks with cron
The Unix cron service runs programs repeatedly on a fixed schedule. Most experienced administrators consider cron to be vital to the system because it can perform automatic system maintenance. For example, cron runs log file rotation utilities to ensure that your hard drive doesn’t fill up with old log files. You should know how to use cron because it’s just plain useful.
Also see CronJob in the Kubernetes documentation.
You can run any program with cron
at whatever times suit you. The program running through cron is called a cron job
. To install a cron job, you’ll create an entry line in your crontab
file, usually by running the crontab
command.
For example:

15 09 * * * /home/juser/bin/spmake
- Minute (0 through 59). The cron job above is set for minute 15.
- Hour (0 through 23). The job above is set for the ninth hour.
- Day of month (1 through 31).
- Month (1 through 12).
- Day of week (0 through 7). The numbers 0 and 7 are Sunday.
A *
in any field means to match every value. The preceding example runs spmake daily because the day of month, month, and day of week fields are all filled with stars, which cron reads as “run this job every day, of every month, of every week.”
It can also run on the 5th and the 14th day of each month:

15 09 5,14 * * /home/juser/bin/spmake
Installing Crontab Files
Each user can have his or her own crontab file, which means that every system may have multiple crontabs, usually found in the /var/spool/cron/ directory. The crontab command installs, lists, edits, and removes a user’s crontab.
The easiest way to install a crontab is to put your crontab entries into a file and then use crontab file
to install file as your current crontab.
Actually, there is a default place for every user’s crontab file, including root’s. Once you create a crontab for a user, the corresponding file is put under /var/spool/cron/.
For example, running as root, I want to set a recurring task for the user dsadm:

crontab -u dsadm -e
Then edit like this:
00 21 * * * /home/dsadm/test.sh > /tmp/cron-log 2>&1
After the job runs, go to the /tmp directory and you will see the log file.
To list the dsadm cron jobs:

crontab -l -u dsadm
To remove the cron jobs for dsadm:

crontab -r -u dsadm
System Crontab Files
Linux distributions normally have an /etc/crontab file. You can also add entries here, but the format is a little different: each line has an extra field, before the command, that specifies the user the job should run as.

# Example of job definition:
Understanding User IDs and User Switching
We’ve discussed how setuid
programs such as sudo
and su
allow you to change users:
---s--x--x 1 root root 143248 May 28 2018 /usr/bin/sudo
In reality, every process has more than one user ID. When you run a setuid program, Linux sets the effective user ID to the program’s owner during execution, but it keeps your original user ID in the real user ID.
Think of the effective user ID as the actor and the real user ID as the owner. The real user ID defines the user that can interact with the running process—most significantly, which user can kill and send signals to a process. For example, if user A starts a new process that runs as user B (based on setuid permissions), user A still owns the process and can kill it.
On normal Linux systems, most processes have the same effective user ID and real user ID. I verified this with a test.sh script run as dsadm; the euser and ruser are the same:

-rwsr-xr-x 1 root root 56 May 11 22:17 test.sh
By default, ps
and other system diagnostic programs show the effective user ID
.
In the conductor container I run many su commands; you can see that the euser and ruser are different:

ps -eo pid,euser,ruser,comm
PAM
In 1995 Sun Microsystems proposed a new standard called Pluggable Authentication Modules (PAM)
, a system of shared libraries for authentication. To authenticate a user, an application hands the user to PAM to determine whether the user can successfully identify itself.
Because there are many kinds of authentication scenarios, PAM employs a number of dynamically loadable authentication modules
. Each module performs a specific task; for example, the pam_unix.so
module can check a user’s password.
PAM Configuration
You’ll normally find PAM’s application configuration files in the /etc/pam.d
directory (older systems may use a single /etc/pam.conf
file).
Let’s see an example:
auth requisite pam_shells.so
Each configuration line has three fields: function type, control argument, and module:
- Function type: The function that a user application asks PAM to perform. Here, it’s auth, the task of authenticating the user.
- Control argument: This setting controls what PAM does after success or failure of its action for the current line (requisite in this example).
- Module: The authentication module that runs for this line, determining what the line actually does. Here, the pam_shells.so module checks to see whether the user’s current shell is listed in /etc/shells.
PAM configuration is detailed on the pam.conf(5) manual page:
man 5 pam.conf
Chapter 8. A Closer Look at Processes and Resource Utilization
This chapter takes you deeper into the relationships between processes, the kernel, and system resources.
Many of the tools that you see in this chapter are often thought of as performance-monitoring tools. They’re particularly helpful if your system is slowing to a crawl and you’re trying to figure out why.
Tracking Processes
The top
program is often more useful than ps
because it displays the current system status as well as many of the fields in a ps
listing, and it updates the display every second.
You can send commands to top with keystrokes. When you run the top command, you'll see something like:
1 | Tasks: 382 total, 2 running, 380 sleeping, 0 stopped, 0 zombie |
If you see a task whose %CPU is larger than 100, it must be multi-threaded and leveraging multiple cores. You can press H to toggle between thread and task display; you will then see the individual threads and each one's %CPU.
Note: if you want to see memory in MB, GB, and so on, typing Shift+E (or e) cycles through the units.
Then try the following keystrokes:
1 | Spacebar: Updates the display immediately. |
Finding Open Files with lsof
One use for this command is when a disk cannot be unmounted because (unspecified) files are in use. The listing of open files can be consulted (suitably filtered if necessary) to identify the process that is using the files.
The lsof command lists open files and the processes using them. lsof doesn't stop at regular files; it can also list network resources, dynamic libraries, pipes, and more.
For example:
Display entries for open files in the /usr directory and its subdirectories:
1 | lsof /usr/* |
List open files for a particular PID:
1 | lsof -p 1623 |
Tracing Program Execution and System Calls
The most common use is to start a program using strace
, which prints a list of system calls
made by the program. This is useful if the program continually crashes, or does not behave as expected; for example using strace may reveal that the program is attempting to access a file which does not exist or cannot be read.
The strace
(system call trace) and ltrace
(library trace) commands can help you discover what a program attempts to do. These tools produce extraordinarily large amounts of output, but once you know what to look for, you’ll have more tools at your disposal for tracking down problems.
For example:
1 | strace cat not_a_file |
You get an error on the open("not_a_file", O_RDONLY) line:
1 | execve("/usr/bin/cat", ["cat", "not_a_file"], [/* 23 vars */]) = 0 |
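The trace above is truncated here; toward the end of the output, the failing call typically looks like this (standard strace output for a missing file, shown as a hedged illustration):
1 | open("not_a_file", O_RDONLY) = -1 ENOENT (No such file or directory) |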
Threads
In Linux, some processes are divided into pieces called threads.
To display the thread information in ps
, add the m option.
For example:
1 | ps axm -o pid,tid,command |
1 | PID TID COMMAND |
The main thread's TID is the same as the process ID.
Introduction to Resource Monitoring
To monitor one or more specific processes over time, use the -p
option to top
, with this syntax:
1 | top -p <pid> |
Adjusting Process Priorities
You can change the way the kernel schedules a process in order to give the process more or less CPU time than other processes.
The kernel runs each process according to its scheduling priority, which is a number between –20
and 20
, with –20
being the foremost priority.
1 | ps axl |
1 | top |
PR is the priority value. NI is the nice value; a high nice value means the process is nicer, that is, more likely to give up CPU time to other processes.
Alter the nice value:
1 | renice <value> <pid> |
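As a hedged usage sketch (the PID here is hypothetical; note that only root can set a negative nice value):
1 | renice 20 28883             # make PID 28883 as nice as possible |
2 | sudo renice -n -5 -p 28883  # raise its priority; requires root |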
Load Averages
The load average
is the average number of processes currently ready to run. Keep in mind that most processes on your system are usually waiting for input (from the keyboard, mouse, or network, for example), meaning that most processes are not ready to run and should contribute nothing to the load average. Only processes that are actually doing something affect the load average.
1 | # uptime |
The three numbers are the load averages for the past 1 minute, 5 minutes, and 15 minutes, respectively. An average of only 0.01 processes have been running across all processors for the past 15 minutes.
If a load average goes up to around 1, a single process is probably using the CPU nearly all of the time. To identify that process, use the top command; the process will usually rise to the top of the display.
If you have two cores, a load average of 1 means that only one of the cores is likely active at any given time, and a load average of 2 means that both cores have just enough to do all of the time.
A high load average does not necessarily mean that your system is having trouble. A system with enough memory and I/O resources can easily handle many running processes. If your load average is high and your system still responds well, don't panic.
However, if you sense that the system is slow and the load average is high, you might be running into memory performance problems.
Memory
The CPU has a memory management unit (MMU) that translates the virtual memory addresses used by processes into real ones. The kernel assists the MMU by breaking the memory used by processes into smaller chunks called pages.
The kernel maintains a data structure, called a page table, that contains a mapping of a process's virtual page addresses to real page addresses in memory. As a process accesses memory, the MMU translates the virtual addresses used by the process into real addresses based on the kernel's page table.
A user process does not actually need all of its pages to be immediately available in order to run. The kernel generally loads and allocates pages as a process needs them; this system is known as on-demand paging
or just demand paging
.
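You can check the page size the kernel uses on your machine (commonly 4096 bytes on x86-64; the exact value is whatever getconf reports):
1 | getconf PAGE_SIZE |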
Page Faults
If a memory page is not ready when a process wants to use it, the process triggers a page fault
.
- Minor page faults. A minor page fault occurs when the desired page is actually in main memory but the MMU doesn't know where it is. This can happen when the process requests more memory or when the MMU doesn't have enough space to store all of the page locations for a process. In this case, the kernel tells the MMU about the page and permits the process to continue. Minor page faults aren't such a big deal, and many occur as a process runs. Unless you need maximum performance from some memory-intensive program, you probably shouldn't worry about them.
- Major page faults. A major page fault occurs when the desired memory page isn't in main memory at all, which means that the kernel must load it from the disk or some other slow storage mechanism. Some major page faults are unavoidable, such as those that occur when you load the code from disk when running a program for the first time.
Let’s see the page faults:
1 | # /usr/bin/time netstat > /dev/null |
There are 3 major page faults and 781 minor page faults when running the netstat program. The major page faults occurred when the kernel needed to load the program from the disk for the first time. If you ran the command again, you probably wouldn't get any major page faults because the kernel would have cached the pages from the disk:
1 | # /usr/bin/time netstat > /dev/null |
Note that time here is not the shell built-in time command! If you run:
1 | type -a time |
you will see:
1 | time is a shell keyword |
2 | time is /usr/bin/time |
See this doc.
If you’d rather see the number of page faults of processes as they’re running, use top
or ps
. When running top
, use f
to add the displayed fields and space to display the nMaj
and nMin
.
1 | # top |
When using ps
, you can use a custom output format to view the page faults for a particular process:
1 | # ps -o pid,min_flt,maj_flt 1 |
Monitoring CPU and Memory Performance
Among the many tools available to monitor system performance, the vmstat command is one of the oldest, with minimal overhead. You'll find it handy for getting a high-level view of how often the kernel is swapping pages in and out, how busy the CPU is, and I/O utilization.
1 | vmstat 2 |
The output is not easy to understand at first; you can dig deeper by reading the vmstat(8) manual page.
I/O Monitoring
Like vmstat and netstat (discussed later), we have iostat.
1 | iostat 2 -d -p ALL |
This means: update every 2 seconds, show only the device report (-d), and show all partitions (-p ALL).
If you need to dig even deeper to see I/O resources used by individual processes, the iotop
tool can help. Using iotop
is similar to using top.
1 | iotop |
It shows TID (thread ID) instead of PID. PRIO (priority) indicates the I/O priority; be/3 is more important than be/4. The kernel uses the scheduling class
to add more control for I/O scheduling. You’ll see three scheduling classes from iotop
:
- be Best-effort. The kernel does its best to fairly schedule I/O for this class. Most processes run under this I/O scheduling class.
- rt Real-time. The kernel schedules any real-time I/O before any other class of I/O, no matter what.
- idle Idle. The kernel performs I/O for this class only when there is no other I/O to be done. There is no priority level for the idle scheduling class.
Per-Process Monitoring
The pidstat
utility allows you to see the resource consumption of a process over time in the style of vmstat
.
1 | # pidstat -p 27946 1 |
The CPU column tells you which CPU the process is running on.
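pidstat can report more than CPU usage. As a hedged reminder of flags I believe it supports, -r shows memory (page fault and RSS) statistics, -d shows disk I/O, and -w shows context switching; for example:
1 | pidstat -r -p 27946 1 |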
Chapter 9. Network and Configuration
Note that the ifconfig command, as well as some of the others you'll see later in this chapter (such as route and arp), has been technically supplanted by the newer ip command. The ip command can do more than the old commands, and it is preferable when writing scripts. However, most people still use the old commands when manually working with the network, and these commands can also be used on other versions of Unix. For this reason, we'll use the old-style commands.
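For reference, here is a rough (hedged) mapping from the old commands to their ip equivalents:
1 | ip addr show    # roughly equivalent to ifconfig |
2 | ip route show   # roughly equivalent to route -n |
3 | ip neigh show   # roughly equivalent to arp -n |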
Routes and the Kernel Routing Table
Let’s see the routing table by route
command, -n
means show numerical address instead of hostname:
1 | # route -n |
The Destination
column tells you a network prefix (outside network), and the Genmask
column is the netmask corresponding to that network. Each network has a U
under its Flags
column, indicating that the route is active (“up”).
A G in the Flags column means that communication for this network must be sent through the gateway in the Gateway column; for example, traffic for the network 0.0.0.0/0 is sent through its gateway, 9.30.94.1. If there is no G in Flags, the network is directly connected in some way.
An entry for 0.0.0.0/0
in the routing table has special significance because it matches any address on the Internet. This is the default route, and the address configured under the Gateway column (in the route -n
output) in the default route is the default gateway
.
Basic ICMP and DNS Tools
ping
1 | # ping baidu.com |
56(84) bytes of data means a 56-byte packet is sent (84 bytes including the header).
icmp_seq is the sequence number; if you see gaps in the sequence, it usually means there's some kind of connectivity problem.
time is the round-trip time.
traceroute
One of the best things about traceroute
is that it reports return trip times at each step in the route:
1 | ## -n will not do hostname lookup for IP in output |
DNS and host
To find the IP address behind a domain name, use the host
command:
1 | # host www.google.com |
You can also use host
in reverse: Enter an IP address instead of a hostname to try to discover the hostname behind the IP address. But don’t expect this to work reliably. Many hostnames can represent a single IP address, and DNS doesn’t know how to determine which hostname should correspond to an IP address.
Kernel Network Interfaces
Network interfaces have names that usually indicate the kind of hardware underneath, such as eth0
(the first Ethernet card in the computer) and wlan0
(a wireless interface).
1 | eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 |
UP,RUNNING
means this interface is active.
Resolving Hostnames
On most systems, you can override hostname lookups with the /etc/hosts
file.
Usually resolution checks this file first before resorting to a DNS server.
The traditional configuration file for DNS servers is /etc/resolv.conf
:
1 | ## this is the search pattern: |
172.16.200.52 and 172.16.200.50 are the DNS server IPs.
netstat command
This netstat command is extremely important and commonly used. Usually I use netstat -tunlp; let's dig deeper into it:
- -t: show TCP connections.
- -u: show UDP connections.
- -n: show numerical addresses.
- -l: show only listening sockets.
- -p: show the PID and program the socket belongs to.
Instead of ifconfig
to see the interface, you can use:
1 | # netstat -i |
Instead of route -n
to see route table, you can use:
1 | # netstat -rn |
Show TCP connections (not include listening sockets):
1 | # netstat -tn |
To see how well-known ports translate into names, check the /etc/services file:
1 | ... |
On Linux, only processes running as the superuser can use ports 1 through 1023. All user processes may listen on and create connections from ports 1024 and up.
I skip the rest of this chapter; the majority is conceptual.
Chapter 10. Network Applications and Services
Let’s mainly focus on some important commands here:
curl command
curl is a command-line tool to transfer data to or from a server, using any of the supported protocols (HTTP, FTP, IMAP, POP3, SCP, SFTP, SMTP, TFTP, TELNET, LDAP, or FILE). curl is powered by libcurl. This tool is preferred for automation, since it is designed to work without user interaction. curl can transfer multiple files at once.
You can refer to this article.
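A couple of hedged usage examples (the URL is hypothetical):
1 | curl -O http://example.com/file.tar.gz    # save with the remote file name |
2 | curl -s http://example.com/ > page.html   # silent mode, redirect the body to a file |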
Diagnostic Tools
lsof
(list open files) can track open files, but it can also list the programs currently using or listening to ports. Please read more when you need this tool.
tcpdump, a command-line version of Wireshark.
netcat (or nc): I used it before when developing PXEngine; we used TCP to replace the ssh connection between the conductor and compute containers. netcat can connect to remote TCP/UDP ports, specify a local port, listen on ports, scan ports, redirect standard I/O to and from network connections, and more.
I remember using nc to listen on a port on one side, then connecting to that port from the other side to transfer data.
1 | ## install |
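The install block above is truncated. As a hedged sketch of the listen-and-connect pattern I described (the port and file names are hypothetical; some netcat variants want -l -p 4444 instead of -l 4444):
1 | nc -l 4444 > received.txt          # on the listening side |
2 | nc otherhost 4444 < tosend.txt     # on the connecting side |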
netcat
can be used for TCP, UDP, Unix-domain sockets.
nmap
scans all ports on a machine or network of machines looking for open ports, and it lists the ports it finds.
1 | # nmap myk8s1.fyre.ibm.com |
Chapter 11. Introduction to Shell Scripts
A shell script is a series of commands written in a file.
The #!
part is called a shebang
.
When writing scripts and working on the command line, just remember what happens whenever the shell runs a command:
- Before running the command, the shell looks for variables, globs, and other substitutions and performs the substitutions if they appear.
- The shell passes the results of the substitutions to the command.
If you use single quotes:
1 | grep 'r.*t' /etc/passwd |
This prevents the shell from expanding the * against filenames in the current directory.
1 | grep 'r.*t /etc/passwd' |
This will fail, because anything wrapped in single or double quotes is treated as one parameter.
Double quotes (") work just like single quotes, except that the shell expands variables that appear within double quotes. It will not expand globs like *
in double quotes!
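A quick sketch of the difference (the /home/dsadm result assumes you are logged in as dsadm):
1 | echo '$HOME *'   # prints the literal text: $HOME * |
2 | echo "$HOME *"   # expands the variable but not the glob: /home/dsadm * |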
Just as I saw before, use shift to walk through the arguments passed in:
1 | #!/bin/sh |
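The script body is truncated above; here is a minimal sketch of such a shift loop (my own reconstruction, not necessarily the original script):
1 | #!/bin/sh |
2 | # print each argument in turn, consuming them with shift |
3 | while [ $# -gt 0 ]; do |
4 |     echo "argument: $1" |
5 |     shift |
6 | done |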
$# is the number of arguments passed in; it's used in the loop to pick up the parameters.
$@ represents all of the script arguments.
$$ holds the PID of the current shell.
Bad messages should go to standard error; just redirect standard output to standard error:
1 | echo $0: bad option ... 1>&2 |
$?
exit code: If you intend to use the exit code of a command, you must use or store the code immediately after running the command.
if condition
Let’s see an example, these 2 are good:
1 | if [ "$1" = hi ]; then |
Here, the double quotes ("") are vital, because the user may not supply $1; without them, the test could become:
1 | if [ = hi ]; then |
and the test ([) command aborts immediately due to the error.
Note that the thing following if is a command! That's why we have a ; before then.
So you can use other commands instead of the [ command, cool!
1 | #!/bin/sh |
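The block above is truncated; here is a hedged sketch of what using another command as the condition can look like (grep's exit code drives the if):
1 | #!/bin/sh |
2 | # use grep itself as the test: exit code 0 means a match was found |
3 | if grep -q daemon /etc/passwd; then |
4 |     echo "The daemon user is in the passwd file." |
5 | else |
6 |     echo "There is no daemon user." |
7 | fi |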
Let’s see &&
and ||
and test condition:
1 | #!/bin/sh |
The -a
and -o
flags are the logical and
and or
operators in test:
1 | [ "$1" = hi -o "$1" = ho ] |
test command
There are dozens of test operations, all of which fall into three general categories: file tests, string tests, and arithmetic tests.
file tests
-f: regular file returns 0
-e: file exists returns 0
-s: non-empty file returns 0
-d: directory returns 0
-h: symbolic link returns 0
File permission:
-r: readable
-w: writable
-x: executable
-u: setuid
-g: setgid
-k: sticky
The test command follows symbolic links (except for the -h test). That is, if link is a symbolic link to a regular file, [ -f link ] returns an exit code of true (0).
Finally, three binary operators (tests that need two files as arguments) are used in file tests, but they’re not terribly common.
[ file1 -nt file2 ]: returns 0 if file1 has a newer modification date than file2
[ file1 -ot file2 ]: returns 0 if file1 has an older modification date than file2
[ file1 -ef file2 ]: compares two files and returns true if they share inode numbers and devices
string test
=: equal
!=: not equal
-z: empty string returns 0
-n: non-empty string returns 0
arithmetic test
-eq: equal to
-ne: not equal to
-lt: less than
-gt: greater than
-le: less than or equal to
-ge: greater than or equal to
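A small hedged sketch combining the three categories (the log path just reuses the cron example from earlier):
1 | #!/bin/sh |
2 | # file test, string test, and arithmetic test together |
3 | if [ -s /tmp/cron-log ] && [ "$1" != "" ] && [ $# -ge 1 ]; then |
4 |     echo "log is non-empty and at least one argument was given" |
5 | fi |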
case condition
The case keyword forms another conditional construct that is exceptionally useful for matching strings; it can do pattern matching:
1 | #!/bin/sh |
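The block above is truncated; a minimal hedged sketch of the pattern-matching style:
1 | #!/bin/sh |
2 | # match the first argument against several patterns |
3 | case $1 in |
4 |     bye)    echo fine ;; |
5 |     hi)     echo nice to see you ;; |
6 |     what*)  echo whatever ;; |
7 |     *)      echo 'huh?' ;; |
8 | esac |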
Each case must end with a double semicolon (;;) or you risk a syntax error.
loop
for loop:
1 | #!/bin/sh |
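The body is truncated above; a hedged sketch of a typical for loop:
1 | #!/bin/sh |
2 | # iterate over a fixed list of words |
3 | for str in one two three four; do |
4 |     echo $str |
5 | done |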
while loop:
1 | #!/bin/sh |
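And a hedged sketch of a while loop using plain shell arithmetic:
1 | #!/bin/sh |
2 | # count from 1 to 5 |
3 | i=1 |
4 | while [ $i -le 5 ]; do |
5 |     echo "iteration $i" |
6 |     i=$((i + 1)) |
7 | done |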
In fact, if you find that you need to use while, you should probably be using a language like awk or Python instead.
Command Substitution
You can use a command’s output as an argument to another command, or you can store the command output in a shell variable by enclosing a command in $()
.
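A quick hedged example of command substitution:
1 | TODAY=$(date +%Y-%m-%d)      # capture a command's output in a variable |
2 | echo "backup-$TODAY.tar.gz"  # use it as part of another argument |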
Temporary File Management
Note the mktemp
command:
1 | #!/bin/sh |
If the script is aborted, the temporary files could be left behind. In the preceding example, pressing CTRL-C
before the second cat command leaves a temporary file in /tmp
. Avoid this if possible. Instead, use the trap
command to create a signal handler to catch the signal that CTRL-C
generates and remove the temporary files, as in this handler:
1 | #!/bin/sh |
You must use exit in the handler to explicitly end script execution, or the shell will continue running as usual after running the signal handler.
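Since both mktemp blocks above are truncated, here is a hedged sketch of the whole pattern (the file contents chosen are just an example):
1 | #!/bin/sh |
2 | # create a temp file, and remove it if the script is interrupted |
3 | TMPFILE=$(mktemp /tmp/im1.XXXXXX) |
4 | trap "rm -f $TMPFILE; exit 1" INT |
5 | cat /proc/interrupts > $TMPFILE |
6 | sleep 2 |
7 | diff $TMPFILE /proc/interrupts |
8 | rm -f $TMPFILE |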
Note that in startcontainer.sh we also have a trap, and we use a shell function there; now I understand!
Important Shell Script Utilities
basename
This one strips the extension from a file name:
1 | # basename example.html .html |
This one gets rid of the directory portion of a full path:
1 | # basename /usr/local/bin/example |
awk
The awk
command is not a simple single-purpose command; it’s actually a powerful programming language. Unfortunately, awk usage is now something of a lost art, having been replaced by larger languages such as Python.
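A single hedged example of the most common awk usage, printing one field per line:
1 | awk -F: '{print $1}' /etc/passwd   # print just the user names |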
sed
The sed program (sed stands for stream editor) is an automatic text editor that takes an input stream (a file or the standard input), alters it according to some expression, and prints the results to standard output.
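A hedged sed example (the file name is hypothetical); this substitutes text on each line and prints the result to standard output:
1 | sed 's/foo/bar/g' input.txt   # replace every foo with bar |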
expr
The expr
command is a clumsy, slow way of doing math. If you find yourself using it frequently, you should probably be using a language like Python instead of a shell script.
Subshells
A subshell is an entirely new shell process that you can create just to run a command or two. The new shell has a copy of the original shell's environment, and when the new shell exits, any changes you made to its shell environment disappear, leaving the initial shell to run as normal.
Using a subshell to make a single-use alteration to an environment variable is a common task:
1 | # (PATH=/usr/confusing:$PATH; ./runprogram.sh) |
Chapter 12. Moving Files Across the Network
Quick copy via browser
Go to the target directory and run:
1 | python -m SimpleHTTPServer |
This usually opens port 8000 on your machine; then go to another machine and open:
1 | # use ifconfig to check the source machine IP |
You can see the directory content there.
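Note: on Python 3 the module was renamed; the equivalent one-liner (with an optional port argument) is:
1 | python3 -m http.server 8000 |
Then open http://<source machine IP>:8000/ in a browser on the other machine.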
rsync
Actually you can first enable Mac ssh access and then use rsync to back up files:
System Preferences -> Sharing -> check Remote Login
To get rsync
working between two hosts, the rsync program must be installed on both the source and destination, and you’ll need a way to access one machine from the other.
Copy files to remote home:
1 | rsync files remote: |
If rsync
isn’t in the remote path but is on the system, use --rsync-path=path
to manually specify its location.
Unless you supply extra options, rsync
copies only files. You will see:
1 | skipping directory xxx |
To transfer entire directory hierarchies, complete with symbolic links, permissions, modes, and devices, use the -a option.
1 | rsync -nv files -a dir user@remote: |
-n: dry run; this is vital when you are not sure.
-vv: verbose mode
To make an exact replica of the source directory, you must delete files in the destination directory that do not exist in the source directory:
1 | rsync -v --delete -a dir user@remote: |
Please use -n
dry-run to see what will be deleted before performing command.
Be particularly careful with a trailing slash after dir:
1 | rsync -a dir/ user@remote:dest |
This copies all the files under dir into the dest folder on the remote host, instead of copying dir itself into dest.
You can also use --exclude=, --exclude-from=, and --include= in the command.
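For example, a hedged sketch excluding a directory by name (the .git name is just an illustration):
1 | rsync -a --exclude=.git dir user@remote: |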
To speed operation, rsync
uses a quick check to determine whether any files on the transfer source are already on the destination. The quick check uses a combination of the file size and its last-modified date.
When the files on the source side are not identical to the files on the destination side, rsync
transfers the source files and overwrites any files that exist on the remote side. The default behavior may be inadequate, though, because you may need additional reassurance that files are indeed the same before skipping over them in transfers, or you may want to put in some extra safeguards.
- --checksum (abbreviation: -c): Compute checksums (mostly unique signatures) of the files to see if they're the same. This consumes additional I/O and CPU resources during transfers, but if you're dealing with sensitive data or files that often have uniform sizes, this option is a must. (This focuses on file content, not the date stamp.)
- --ignore-existing: Doesn't clobber files already on the target side.
- --backup (abbreviation: -b): Doesn't clobber files already on the target but rather renames these existing files by adding a ~ suffix to their names before transferring the new files.
- --suffix=s: Changes the suffix used with --backup from ~ to s.
- --update (abbreviation: -u): Doesn't clobber any file on the target that has a later date than the corresponding file on the source.
You can also compress the data during transfer:
1 | rsync -az dir user@remote: |
You can also reverse the process:
1 | rsync -a user@remote:dir dest |
The rest of this chapter talks about Samba for file sharing; I skip it.
Chapter 13. User Environments
Startup files play an important role at this point, because they set defaults for the shell and other interactive programs. They determine how the system behaves when a user logs in.
I see vi theme config in ~/.bashrc
file.
The Command Path
The most important part of any shell startup file is the command path. The path should cover the directories that contain every application of interest to a regular user. At the very least, the path should contain these components, in order:
1 | /usr/local/bin |
If an application lives in another directory, create a symbolic link to it in /usr/local/bin or in a bin folder that you define.
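Two hedged sketches of how I'd wire this up (the application path and directory names are hypothetical):
1 | export PATH=$HOME/bin:$PATH                       # in ~/.bashrc, put a personal bin first |
2 | ln -s /opt/someapp/bin/someapp /usr/local/bin/    # or link the tool into /usr/local/bin |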
The prompt
I never use this so far; usually the prompt shows the hostname, username, current directory, and a sign ($ or #). You can change the color and more.
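A minimal hedged example of customizing the bash prompt in ~/.bashrc:
1 | PS1='\u@\h:\w\$ '   # user@host:current-directory followed by $ (or # for root) |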
Alias
This is in common use; sometimes I use shell functions too.
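For example (my own typical entries, not from the book):
1 | alias ll='ls -l' |
2 | mkcd() { mkdir -p "$1" && cd "$1"; }   # a shell function, since aliases can't take arguments |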
Permission mask
It depends on your needs:
1 | umask 022/077 |
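As a reminder of what those two masks mean in practice:
1 | umask 022   # new files 644, new directories 755: others can read |
2 | umask 077   # new files 600, new directories 700: private to you |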
Startup file order
These startup files are used to create environment. Each script has a specific use and affects the login environment differently. Every subsequent script executed can override the values assigned by previous scripts.
The two main shell instance types
are interactive
and noninteractive
, but of those, only interactive shells are of interest because noninteractive shells (such as those that run shell scripts) usually don’t read any startup files.
Interactive shells are the ones that you use to run commands from a terminal; they can be classified as login or non-login.
I know there are lots of startup files under each user's home directory or in other system folders; how do they take effect, and in what order? Reference doc: Difference between Login shell and Non login shell.
Logging in remotely with SSH
also gives you a login shell.
You can tell if a shell is a login shell by running echo $0
; if the first character is a -
, the shell’s a login shell.
When Bash is invoked as a login shell:
- The login process calls /etc/profile.
- /etc/profile calls the scripts in /etc/profile.d/.
- The login process calls ~/.bash_profile, ~/.bash_login, and ~/.profile, running only the first one that it sees.
Login shells are created by explicitly telling su (or sudo) to log in:
examples: # su - | # su -l | # su --login | # su USERNAME - | # su -l USERNAME | # su --login USERNAME | # sudo -i
When Bash is invoked as a non-login shell:
- The non-login process (shell) calls /etc/bashrc.
- /etc/bashrc then calls ~/.bashrc.
Non-login shells are created using the command syntax below:
examples: # su | # su USERNAME
Note that I can run bash, sh, or csh in a terminal; it gives me a new simple prompt without my user profile or settings…
It seems that if you switch users with a non-login command like su dsadm, the exported environment variables are still visible in the env output; I think the reason is that it's not a login shell, so it still uses the current environment. But if you run su - dsadm, they are gone.
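A hedged way to check this yourself (assuming a dsadm user exists and you are root):
1 | export MYVAR=hello |
2 | su dsadm -c 'echo $MYVAR'     # non-login shell: prints hello |
3 | su - dsadm -c 'echo $MYVAR'   # login shell: prints an empty line |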