Understanding the difference between sourcing and executing a script is important; otherwise you will be confused when something doesn't run as you expect.

Source Script

The file does not need to be executable (chmod -x), but it should be a valid shell script. We usually use source to load shell functions and export environment variables into the current shell process.

For example, both syntaxes work:

source ./xx.env
. ./xx.env

Note that source executes the script in the current shell context and does not spawn a new process! When the script finishes, you are still in the current process.

Note that . is not an alias for source, but rather the other way around. source is a bash extension, while . works in any POSIX compatible shell.

You can also put the file's directory in $PATH so that you don't need to specify the path on the command line.

Execute Script

The file must be executable (chmod +x) and you must have the right permission to run it. You also need to specify the path, even in the current directory, or put its directory in $PATH:

./xx.sh

The current shell spawns a new shell to run the script. The script runs in the new shell, and all changes to the environment exist only in the new shell. After the script finishes, all those changes are destroyed.

Summary

The execution method runs the script as another process, so variables and functions defined in the child script are not accessible in the parent shell.

The source method executes the child script in the parent shell's process, so the parent can access the variables and functions defined in the child script. Also note that if the sourced script calls exit, it exits the parent shell as well, which does not happen with the execution method.
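
A quick demonstration (vars.sh is a made-up example file):

cat > vars.sh <<'EOF'
GREETING=hello
greet() { echo "greetings, $GREETING"; }
EOF
chmod +x vars.sh

./vars.sh          # executes in a child shell; nothing visible happens here
echo "$GREETING"   # empty: the child's variables died with it
. ./vars.sh        # sources the script into the current shell
echo "$GREETING"   # hello
greet              # greetings, hello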

exec Command

The exec built-in replaces the current shell process with the given command instead of forking a child, so the command keeps the shell's PID. This command is very common in the entry scripts of Docker containers:
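
A minimal entrypoint sketch (the setup path is hypothetical; the point is that exec makes the application replace the shell as PID 1, so signals such as the SIGTERM sent by docker stop reach it directly):

#!/bin/sh
# entrypoint.sh (hypothetical): do setup in the shell, then hand over the process
set -e
/opt/xx/initScripts/setup.sh   # hypothetical setup step
exec "$@"                      # replace the shell with the real application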

There is also the su-exec command, but it is not a shell built-in: https://github.com/ncopa/su-exec

Kubernetes version 1.13.2

In my article <<Linux IPC>>, I mentioned that there is a workaround to set IPC kernel parameters using sysctl in a Kubernetes cluster if SYS_RESOURCE is not allowed.

Clarification

From the Kubernetes documentation, we see:

Sysctls are grouped into safe and unsafe sysctls. This means that setting a safe sysctl for one pod:

  • must not have any influence on any other pod on the node
  • must not allow to harm the node’s health
  • must not allow to gain CPU or memory resources outside of the resource limits of a pod.

By far, most of the namespaced sysctls are not necessarily considered safe (please check the latest Kubernetes documentation to confirm). Currently the safe set contains:

  • kernel.shm_rmid_forced,
  • net.ipv4.ip_local_port_range,
  • net.ipv4.tcp_syncookies.

This list will be extended in future Kubernetes versions when the kubelet supports better isolation mechanisms.

All safe sysctls are enabled by default (you can use them directly without additional kubelet configuration).

All unsafe sysctls are disabled by default and must be allowed manually by the cluster admin on a per-node basis. Pods with disallowed unsafe sysctls will be scheduled but will fail to launch.

If you describe the failed pod, you can see the reason.

A number of sysctls are namespaced in today’s Linux kernels. This means that they can be set independently for each pod on a node. Only namespaced sysctls are configurable via the pod securityContext within Kubernetes.

The following sysctls are known to be namespaced. This list could change in future versions of the Linux kernel.

  • kernel.shm*
  • kernel.msg*
  • kernel.sem
  • fs.mqueue.*
  • net.*

Sysctls with no namespace are called node-level sysctls. If you need to set them, you must manually configure them on each node's operating system, or use a DaemonSet with privileged containers.

As with other node-level settings, it is recommended to use the taints and tolerations feature on nodes to schedule those pods onto the right nodes.

Use the pod securityContext to configure namespaced sysctls. The securityContext applies to all containers in the same pod.

Configure kubelet

If you need to use unsafe sysctls, you must configure the kubelet on the target node (the node where the pod needing the unsafe sysctls will reside). Edit the 10-kubeadm.conf file in /etc/systemd/system/kubelet.service.d/ and add:

Environment="KUBELET_UNSAFE_SYSCTLS=--allowed-unsafe-sysctls='kernel.shm*,kernel.sem,kernel.msg*'"

Here I need kernel.shm*, kernel.sem and kernel.msg*.
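
Note that a systemd Environment variable takes effect only if it is referenced on the service's ExecStart line; in the same drop-in, append $KUBELET_UNSAFE_SYSCTLS there. A sketch (the exact ExecStart contents vary by kubeadm version):

ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS $KUBELET_UNSAFE_SYSCTLS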

then run:

systemctl daemon-reload
systemctl restart kubelet

To verify the change, check that --allowed-unsafe-sysctls appears in the kubelet command line:

ps aux | grep kubelet

A brief digression: the kubelet service unit file is /etc/systemd/system/kubelet.service.

Then you can edit the pod YAML file to add the sysctls option:
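
A sketch of the pod spec (the pod and image names are mine; note that the pod securityContext requires each sysctl to be listed individually with an explicit value, since wildcards such as kernel.shm* are only accepted by the kubelet flag):

apiVersion: v1
kind: Pod
metadata:
  name: xmeta-pod
spec:
  hostIPC: false
  securityContext:
    sysctls:
    - name: kernel.shmmax
      value: "17179869184"
    - name: kernel.msgmnb
      value: "65536"
    - name: kernel.sem
      value: "4096 1024000 250 4096"
  containers:
  - name: db2
    image: xmeta-db2:latest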

Sometimes you also need to disable hostIPC; if you don't, you will run into a problem here.

After everything is done, get into the container to check the kernel parameter values, for example:

sysctl -a | grep -i kernel.sem

Resources

kubernetes 1.4 new feature: support sysctls configure kernel parameters in k8s cluster

I haven't had time to learn Docker systematically so far, but I still gain a lot from daily tasks. This post is a brief summary of what I did to convert a container from running as root to running as non-root, a security requirement from our customers.

I will describe the development workflow rather than the details of how to set and modify the configurations inside the container.

At the beginning I have a base image, let's say root_is_engine.tar.gz; load it:

docker load -i root_is_engine.tar.gz

you can check the loaded image by running:

docker images

Now I am going to create a container from this image. But wait: I don't want the container to run any script or program when it starts; I just want it to hang without doing anything.

That means I need to override the default entrypoint (the entrypoint sets the command and parameters executed first when spinning up a container) in the docker run command:

docker run --detach \
--cap-add=SYS_ADMIN \
--privileged=false --name=${ENGINE_HOST} --hostname=${ENGINE_HOST} \
--restart=always --add-host="${SERVICES_HOST} ${DB2_XMETA_HOST} ${ENGINE_HOST}":${ENGINE_HOST_IP} \
-p 8449:8449 \
-v ${DEDICATED_ENGINE_VOLPATH}/${ENGINE_HOST}/EngineClients/db2_client/dsadm:/home/dsadm \
--entrypoint=/bin/sh \
${DOCKER_IMAGE_TAG_ENGINE}:${DOCKER_IMAGE_VERSION} \
-c 'tail -f /dev/null'

Note that the arguments to your entrypoint are placed at the end of the docker run command: -c 'tail -f /dev/null'.

Run the docker ps command; under the COMMAND column you can see the entrypoint is what I specified:

CONTAINER ID   IMAGE               COMMAND                    CREATED      STATUS      PORTS                    NAMES
b462f6123684   is-engine-image:1   "/bin/sh -c 'tail -f..."   2 days ago   Up 2 days   0.0.0.0:8449->8449/tcp   is-en-conductor-0.en-cond

Then get into the container by running:

docker exec -it <container id or container name> [bash|sh]

Then if you check the process status, you can see the init process with PID 1 is running the tail command on the /dev/null device file:

ps aux

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0   5968   616 ?        Ss   16:47   0:00 tail -f /dev/null
root        27  1.5  0.0  13420  1992 pts/0    Ss   16:50   0:00 bash
root        45  0.0  0.0  53340  1864 pts/0    R+   16:50   0:00 ps aux

OK, now I can make changes: for example, create and switch to an ordinary user with a specified user ID and group ID, grant privileges, modify the owner and permissions of some files, and run and update the startup script line by line to see whether the applications set up correctly as non-root.

Note that if you have a mounted path from the host machine, you may need to chown it to the correct uid and gid; otherwise the ordinary user in the container may hit permission-denied issues.
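
A sketch of the kind of changes involved (uid/gid 1000 is my assumption; adjust the paths to your environment):

# inside the container: create the ordinary user and hand over the product tree
groupadd -g 1000 dsadm
useradd -u 1000 -g 1000 -m dsadm
chown -R dsadm:dsadm /opt/xx
chmod 755 /opt/xx/initScripts/startcontainer.sh
su - dsadm    # switch and rerun the startup steps as non-root

# on the host: fix ownership of the mounted path so uid 1000 can write to it
chown -R 1000:1000 ${DEDICATED_ENGINE_VOLPATH}/${ENGINE_HOST}/EngineClients/db2_client/dsadm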

After running some tests successfully, I need to commit the changes into a new image.

  • First, understand how docker commit works: what will be committed into the new image?

    The container has a writable layer that stacks on top of the image layers. This writable layer allows you to make changes to the container since the lower layers in the image are read-only.

    The Docker documentation says: "It can be useful to commit a container's file changes or settings into a new image."

    The commit operation will not include any data contained in volumes mounted inside the container.

    By default, the container being committed and its processes will be paused while the image is committed. This reduces the likelihood of encountering data corruption during the process of creating the commit. If this behavior is undesired, set the --pause option to false.

    The --change option will apply Dockerfile instructions to the image that is created. Supported Dockerfile instructions: CMD|ENTRYPOINT|ENV|EXPOSE|LABEL|ONBUILD|USER|VOLUME|WORKDIR

  • How about processes in the running container?

    Processes exist only in the running container: when you start a container from an image, the processes start there; when the container stops, there are no processes anymore, only the files in the container's filesystem.

Note that before committing, you need to quiesce the services and remove the mount path content to unlink all broken symbolic links.

Also remember to put the old entrypoint back:

docker commit \
--change 'ENTRYPOINT ["/bin/bash", "-c", "/opt/xx/initScripts/startcontainer.sh"]' \
<container ID> is-engine-image:1

Note that the podman commit format is different.

Note that you may use /bin/sh instead of /bin/bash.

OK, now run the new image as a non-root user:

docker run --detach \
--user 1000 \
--cap-add=SYS_ADMIN \
--privileged=false --name=${ENGINE_HOST} --hostname=${ENGINE_HOST} \
--restart=always --add-host="${SERVICES_HOST} ${DB2_XMETA_HOST} ${ENGINE_HOST}":${ENGINE_HOST_IP} \
-p 8449:8449 \
-v ${DEDICATED_ENGINE_VOLPATH}/${ENGINE_HOST}/EngineClients/db2_client/dsadm:/home/dsadm \
${DOCKER_IMAGE_TAG_ENGINE}:${DOCKER_IMAGE_VERSION}

Let’s see the processes in the non-root container:

USER       PID %CPU %MEM     VSZ   RSS TTY      STAT START   TIME COMMAND
dsadm        1  0.0  0.0   13288  1604 ?        Ss   18:29   0:00 /bin/bash /opt/xx/initScripts/startcontainer.sh
dsadm      540  0.0  0.0    5968   620 ?        S    18:29   0:00 tail -f /dev/null
dsadm      568  0.1  0.3 2309632 24792 ?        Sl   18:29   0:00 /opt/xx/../../jdk
dsadm      589  2.5  0.0   13420  2012 pts/0    Ss   18:36   0:00 bash
dsadm      610  0.0  0.0   53340  1868 pts/0    R+   18:36   0:00 ps aux

If everything is in good shape, save the image in tar.gz format; of course, you can apply a new tag before saving:

docker save is-engine-image:1 | gzip > ~/nonroot_is_engine.tar.gz

Note the pipe through gzip to compress the image.

Prequel

Recently I was dealing with Linux kernel parameters, which were new to me; in my case they are the key to the performance of the database (DB2).

According to DB2 Kernel parameter requirements (Linux), the database manager uses a formula to automatically tune kernel parameter settings and eliminate the need for manual updates to these settings.

So the DB2 relies on good kernel parameters used for IPC to perform well.

When instances are started, if an interprocess communication (IPC) kernel parameter is below the enforced minimum value, the database manager updates it to the enforced minimum value.

There are several Linux IPC kernel parameters that need to be adjusted for DB2:

kernel.shmmni (SHMMNI)
kernel.shmmax (SHMMAX)
kernel.shmall (SHMALL)
kernel.sem (SEMMNI)
kernel.sem (SEMMSL)
kernel.sem (SEMMNS)
kernel.sem (SEMOPM)
kernel.msgmni (MSGMNI)
kernel.msgmax (MSGMAX)
kernel.msgmnb (MSGMNB)

At the time, we had to run DB2 as root in the container (because it tunes kernel parameters), so to minimize the root user's privileges we decided to remove some Linux capabilities: SYS_RESOURCE and SYS_ADMIN. These removed caps might impact the kernel parameter tuning, so we ran the test suite to expose failures and errors on the xmeta pods. In retrospect, we should have checked which capabilities the DB2 processes actually used, for example with the pscap or getpcaps commands.
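
To see which capabilities a running process holds (assuming the DB2 process has PID 12345, a made-up value):

getpcaps 12345                    # print the process's capability sets (libcap)
grep Cap /proc/12345/status       # raw capability bitmasks
capsh --decode=0000001fffffffff   # decode a bitmask into capability names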

For example, if you check the capabilities(7) manual for CAP_SYS_RESOURCE, you can see:

CAP_SYS_RESOURCE

* raise msg_qbytes limit for a System V message queue above
the limit in /proc/sys/kernel/msgmnb (see msgop(2) and
msgctl(2));
* use F_SETPIPE_SZ to increase the capacity of a pipe above
the limit specified by /proc/sys/fs/pipe-max-size;
* override /proc/sys/fs/mqueue/queues_max limit when creating
POSIX message queues (see mq_overview(7));

Without granting SYS_RESOURCE, msgmnb (and maybe other kernel parameters) cannot be raised properly (actually I doubt this after comparing the results with and without SYS_RESOURCE).

About IPC

Let's first understand what IPC is.

IPC Mechanisms on Linux - Introduction

The original post seems to be dying, so I forward it here. (Going through it, I can still remember some terms from the 402 Operating System course, but I have forgotten the details.)

Inter-Process Communication (or IPC for short) refers to mechanisms provided by the kernel to allow processes to communicate with each other. On modern systems, IPCs form the web that binds together the processes within a large-scale software architecture.

The Linux kernel provides the following IPC mechanisms:

  1. Signals
  2. Anonymous Pipes
  3. Named Pipes or FIFOs
  4. SysV Message Queues
  5. POSIX Message Queues
  6. SysV Shared memory
  7. POSIX Shared memory
  8. SysV semaphores
  9. POSIX semaphores
  10. FUTEX locks
  11. File-backed and anonymous shared memory using mmap
  12. UNIX Domain Sockets
  13. Netlink Sockets
  14. Network Sockets
  15. Inotify mechanisms
  16. FUSE subsystem
  17. D-Bus subsystem

While the above list seems like a lot, each IPC mechanism described above is tailored to work best for a particular use-case scenario.

  • SIGNALS Signals are the cheapest form of IPC provided by Linux. Their primary use is to notify processes of state changes or events that occur within the kernel or other processes. We use signals in the real world to convey messages with the least overhead - think of hand and body gestures. For example, in a crowded gathering, we raise a hand to gain attention, wave a hand at a friend to greet, and so on.

    On Linux, the kernel notifies a process when an event or state change occurs by interrupting the process's normal flow of execution and invoking one of the signal handler functions registered by the process, or one of the default signal dispositions supplied by the kernel, for the said event.

  • ANONYMOUS PIPES Anonymous pipes (or simply pipes, for short) provide a mechanism for one process to stream data to another. A pipe has two ends associated with a pair of file descriptors - making it a one-to-one messaging or communication mechanism. One end of the pipe is the read-end which is associated with a file-descriptor that can only be read, and the other end is the write-end which is associated with a file descriptor that can only be written. This design means that pipes are essentially half-duplex.

    Anonymous pipes can be setup and used only between processes that share parent-child relationship. Generally the parent process creates a pipe and then forks child processes. Each child process gets access to the pipe created by the parent process via the file descriptors that get duplicated into their address space. This allows the parent to communicate with its children, or the children to communicate with each other using the shared pipe.

    Pipes are generally used to implement Producer-Consumer design amongst processes - where one or more processes would produce data and stream them on one end of the pipe, while other processes would consume the data stream from the other end of the pipe.

  • NAMED PIPES OR FIFO Named pipes (or FIFOs) are variants of pipes that allow communication between processes that are not related to each other. The processes communicate using named pipes by opening a special file known as a FIFO file. One process opens the FIFO file for writing while the other process opens the same file for reading. Thus any data written by the former process gets streamed through a pipe to the latter process. The FIFO file on disk acts as the contract between the two processes that wish to communicate.
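
    For example, a toy demo with two shells:

    mkfifo /tmp/myfifo          # create the FIFO file on disk
    echo "hello" > /tmp/myfifo  # shell 1: blocks until a reader opens the FIFO
    cat < /tmp/myfifo           # shell 2: receives "hello" through the pipe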

  • MESSAGE QUEUES Message queues are synonymous with mailboxes. One process writes a message packet on the message queue and exits. Another process can access the message packet from the same message queue at a later point in time. The advantage of message queues over pipes/FIFOs is that the sender (or writer) processes do not have to wait for the receiver (or reader) processes to connect. Think of communication using pipes as similar to two people communicating over the phone, while message queues are similar to two people communicating using mail or other messaging services.

    There are two standard specifications for message queues.

    • SysV message queues. The AT&T SysV message queues support message channeling. Each message packet sent by a sender carries a message number. The receivers can choose to receive messages that match a particular message number, all messages excluding a particular message number, or all messages.

    • POSIX message queues. The POSIX message queues support message priorities. Each message packet sent by the senders carry a priority number along with the message payload. The messages get ordered based on the priority number in the message queue. When the receiver tries to read a message at a later point in time, the messages with higher priority numbers get delivered first. POSIX message queues also support asynchronous message delivery using threads or signal based notification.

    Linux supports both of the above standards for message queues.

  • SHARED MEMORY As the name implies, this IPC mechanism allows one process to share a region of memory in its address space with another. This allows two or more processes to communicate data more efficiently amongst themselves with minimal kernel intervention.

    There are two standard specifications for Shared memory.

    • SysV Shared memory. Many applications even today use this mechanism for historical reasons. It follows some of the artifacts of SysV IPC semantics.

    • POSIX Shared memory. The POSIX specifications provide a more elegant approach towards implementing shared memory interface. On Linux, POSIX Shared memory is actually implemented by using files backed by RAM-based filesystem. I recommend using this mechanism over the SysV semantics due to a more elegant file based semantics.

  • SEMAPHORES Semaphores are locking and synchronization mechanisms used most widely when processes share resources. Linux supports both SysV semaphores and POSIX semaphores. POSIX semaphores provide a simpler and more elegant implementation and thus are more widely used than SysV semaphores on Linux.

  • FUTEXES Futexes are high-performance low-overhead locking mechanisms provided by the kernel. Direct use of futexes is highly discouraged in system programs. Futexes are used internally by POSIX threading API for condition variables and its mutex implementations.

  • UNIX DOMAIN SOCKETS UNIX Domain Sockets provide a mechanism for implementing applications that communicate using the Client-Server architecture. They support both stream and datagram oriented communication, are full-duplex and support a variety of options. They are very widely used for developing many large-scale frameworks.

  • NETLINK SOCKETS Netlink sockets are similar to UNIX Domain Sockets in their API semantics, but they are used mainly for two purposes: for communication between a process in user space and a thread in kernel space, and for communication amongst processes in user space using broadcast mode.

  • NETWORK SOCKETS Based on the same API semantics like UNIX Domain Sockets, Network Sockets API provide mechanisms for communication between processes that run on different hosts on a network. Linux has rich support for features and various protocol stacks for using network sockets API. For all kinds of network programming and distributed programming - network socket APIs form the core interface.

  • INOTIFY MECHANISMS The Inotify API on Linux provides a method for processes to know of any changes on a monitored file or a directory asynchronously. By adding a file to inotify watch-list, a process will be notified by the kernel on any changes to the file like open, read, write, changes to file stat, deleting a file and so on.

  • FUSE SUBSYSTEM FUSE provides a method to implement a fully functional filesystem in user-space. Various operations on the mounted FUSE filesystem would trigger functions registered by the user-space filesystem handler process. This technique can also be used as an IPC mechanism to implement Client-Server architecture without using socket API semantics.

  • D-BUS SUBSYSTEM D-Bus is a high-level IPC mechanism built generally on top of socket API that provides a mechanism for multiple processes to communicate with each other using various messaging patterns. D-Bus is a standards specification for processes communicating with each other and very widely used today by GUI implementations on Linux following Freedesktop.org specifications.

You can use the ipcs command to show information about IPC facilities: shared memory segments, message queues, and semaphore arrays.

ipcs -l

------ Shared Memory Limits --------
max number of segments = 4096 // SHMMNI
max seg size (kbytes) = 32768 // SHMMAX
max total shared memory (kbytes) = 8388608 // SHMALL
min seg size (bytes) = 1

------ Semaphore Limits --------
max number of arrays = 1024 // SEMMNI
max semaphores per array = 250 // SEMMSL
max semaphores system wide = 256000 // SEMMNS
max ops per semop call = 32 // SEMOPM
semaphore max value = 32767

------ Messages: Limits --------
max queues system wide = 1024 // MSGMNI
max size of message (bytes) = 65536 // MSGMAX
default max size of queue (bytes) = 65536 // MSGMNB

Also, you can use the sysctl command to view kernel parameters:

#sysctl -a | grep -i shmmni
kernel.shmmni = 4096

or

#sysctl kernel.shmmni
kernel.shmmni = 4096

Modify Kernel Parameters

From this post Db2 Modify Kernel Parameters.

Modify the kernel parameters that you have to adjust by editing the /etc/sysctl.conf file. If this file does not exist, create it. The following lines are examples of what must be placed into the file:

kernel.shmmni=4096
kernel.shmmax=17179869184
kernel.shmall=8388608
#kernel.sem=<SEMMSL> <SEMMNS> <SEMOPM> <SEMMNI>
kernel.sem=4096 1024000 250 4096
kernel.msgmni=16384
kernel.msgmax=65536
kernel.msgmnb=65536

Reload settings from the default file /etc/sysctl.conf:

sysctl -p

On Red Hat, the rc.sysinit initialization script reads the /etc/sysctl.conf file automatically after each reboot.

You can also make a non-persistent change (it will not survive a reboot), for example:

sysctl -w kernel.shmmni=4096
sysctl -w kernel.sem="4096 1024000 250 4096"

Or write it directly to the procfs file:

echo "4096 1024000 250 4096" > /proc/sys/kernel/sem

Experience and good learnings from the past.

  • Be familiar with the product; ask for quota/env to use it like the customer does, and learn the customer's experience.

  • Understand why we need the feature; don't rush into the implementation.

  • External dependencies are always risks.

  • Different levels of testing are necessary: unit test, functional test, integration test, e2e test and UAT, and CUJ test (if it is a customer-facing product).

  • Consider sharing weekly status on your projects with your manager/skip manager, so they know what you are working on.

  • Show more proactivity and leadership, such as driving meetings to make progress and figuring out action items, and confirm that stakeholders understand their roles before your absence.

  • Always note down meeting summaries, share them with stakeholders, and link them in the calendar tab.

  • Know what is going on around you; set up bi-weekly syncs with colleagues to understand what they are working on. This can inspire solutions for your own projects.

  • Always consider buffers on project timeline.

  • Document your projects' progress in as much detail as possible: time, context, screenshots, links, what has been done so far, TODOs. This can help you be better at multitasking.

  • Share domain knowledge and contribute to team tech talks.

  • Apply suggestions from the engineer survival guide and the staff engineer path.

A great book for all Linux developers and administrators! These are just notes for a future quick revisit.

Chapter 1. The Big Picture

The most effective way to understand how an operating system works is through abstraction—a fancy way of saying that you can ignore most of the details.

The kernel is software residing in memory that tells the CPU what to do. The kernel manages the hardware and acts primarily as an interface between the hardware and any running program.

Processes—the running programs that the kernel manages—collectively make up the system’s upper level, called user space.

+-------------------------------------------------------------------------+
|                                                                         |
|                             User processes                              |
|                                                                         |
|   +-------------------+   +-----------------+   +---------------+       |
|   |        GUI        |   |     Servers     |   |     Shell     |       |
|   +-------------------+   +-----------------+   +---------------+       |
|                                                                         |
+-------------------------------------------------------------------------+

+-------------------------------------------------------------------------+
|                                                                         |
|                              Linux kernel                               |
|                                                                         |
|   +--------------+            +--------------------------+              |
|   | system calls |            |    process management    |              |
|   +--------------+            +--------------------------+              |
|                                                                         |
|   +---------------------+   +-------------------------------+           |
|   |    device driver    |   |       memory management       |           |
|   +---------------------+   +-------------------------------+           |
+-------------------------------------------------------------------------+

+-------------------------------------------------------------------------+
|                                                                         |
|                                Hardware                                 |
|                                                                         |
|   +-------------------+   +-------------------+   +---------------+     |
|   |        CPU        |   |        RAM        |   |     Disk      |     |
|   +-------------------+   +-------------------+   +---------------+     |
|   +---------------------+                                               |
|   |       Network       |                                               |
|   +---------------------+                                               |
+-------------------------------------------------------------------------+

There is a critical difference between the ways that the kernel and user processes run: The kernel runs in kernel mode, and the user processes run in user mode. Code running in kernel mode has unrestricted access to the processor and main memory. This is a powerful but dangerous privilege that allows a kernel process to easily crash the entire system. The area that only the kernel can access is called kernel space.

User mode, in comparison, restricts access to a (usually quite small) subset of memory and safe CPU operations. User space refers to the parts of main memory that the user processes can access. If a process makes a mistake and crashes, the consequences are limited and can be cleaned up by the kernel. This means that if your web browser crashes, it probably won’t take down the scientific computation that you’ve been running in the background for days.

Hardware

A CPU is just an operator on memory; it reads its instructions and data from the memory and writes data back out to the memory.

You’ll often hear the term state in reference to memory, processes, the kernel, and other parts of a computer system. Strictly speaking, a state is a particular arrangement of bits. For example, if you have four bits in your memory, 0110, 0001, and 1011 represent three different states.

The term image refers to a particular physical arrangement of bits.

Kernel

Nearly everything that the kernel does revolves around main memory. One of the kernel’s tasks is to split memory into many subdivisions, and it must maintain certain state information about those subdivisions at all times. Each process gets its own share of memory, and the kernel must ensure that each process keeps to its share.

The kernel is in charge of managing tasks in four general system areas:

Processes. The kernel is responsible for determining which processes are allowed to use the CPU.

Memory. The kernel needs to keep track of all memory—what is currently allocated to a particular process, what might be shared between processes, and what is free.

Device drivers. The kernel acts as an interface between hardware (such as a disk) and processes. It’s usually the kernel’s job to operate the hardware.

System calls and support. Processes normally use system calls to communicate with the kernel.

The act of one process giving up control of the CPU to another process is called a context switch.

The kernel is responsible for context switching. To understand how this works, let’s think about a situation in which a process is running in user mode but its time slice is up. Here’s what happens:

  1. The CPU (the actual hardware) interrupts the current process based on an internal timer, switches into kernel mode, and hands control back to the kernel.
  2. The kernel records the current state of the CPU and memory, which will be essential to resuming the process that was just interrupted.
  3. The kernel performs any tasks that might have come up during the preceding time slice (such as collecting data from input and output, or I/O, operations).
  4. The kernel is now ready to let another process run. The kernel analyzes the list of processes that are ready to run and chooses one.
  5. The kernel prepares the memory for this new process, and then prepares the CPU.
  6. The kernel tells the CPU how long the time slice for the new process will last.
  7. The kernel switches the CPU into user mode and hands control of the CPU to the process.

The context switch answers the important question of when the kernel runs. The answer is that it runs between process time slices during a context switch.

Modern CPUs include a memory management unit (MMU) that enables a memory access scheme called virtual memory. When using virtual memory, a process does not directly access the memory by its physical location in the hardware. Instead, the kernel sets up each process to act as if it had an entire machine to itself. When the process accesses some of its memory, the MMU intercepts the access and uses a memory address map to translate the memory location from the process into an actual physical memory location on the machine. The kernel must still initialize and continuously maintain and alter this memory address map. For example, during a context switch, the kernel has to change the map from the outgoing process to the incoming process.

The implementation of a memory address map is called a page table.

The kernel’s role with devices is pretty simple. A device is typically accessible only in kernel mode because improper access (such as a user process asking to turn off the power) could crash the machine. Another problem is that different devices rarely have the same programming interface, even if the devices do the same thing, such as two different network cards. Therefore, device drivers have traditionally been part of the kernel.

There are several other kinds of kernel features available to user processes. In particular, system calls (or syscalls) perform specific tasks that a user process alone cannot do well or at all. For example, the acts of opening, reading, and writing files all involve system calls.

Other than init, all user processes on a Linux system start as a result of fork(), and most of the time, you also run exec() to start a new program instead of running a copy of an existing process.
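
A toy illustration from the shell (my example, not the book's): a subshell is a fork() of the current shell, while the exec built-in uses exec() to run a new program in the same process:

echo $$                             # PID of the current shell
( echo $BASHPID )                   # a subshell: a fork()ed copy with its own PID
bash -c 'echo $$; exec sleep 10'    # sleep replaces that bash process, keeping its PID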

User Space

As mentioned earlier, the main memory that the kernel allocates for user processes is called user space. Because a process is simply a state (or image) in memory, user space also refers to the memory for the entire collection of running processes.

Users

A user is an entity that can run processes and own files. A user is associated with a username. For example, a system could have a user named billyjoe. However, the kernel does not manage the usernames; instead, it identifies users by simple numeric identifiers called userids.

Users exist primarily to support permissions and boundaries.

In addition, as powerful as the root user is, it still runs in the operating system’s user mode, not kernel mode.

Groups are sets of users. The primary purpose of groups is to allow a user to share file access to other users in a group.

Chapter 2. Basic Commands and Directory Hierarchy

Some resources: <<UNIX for the Impatient>> <<Learning the UNIX Operating System>>

The shell is one of the most important parts of a Unix system. A shell is a program that runs commands. The shell also serves as a small programming environment.

Many important parts of the system are actually shell scripts—text files that contain a sequence of shell commands.

There are many different Unix shells, but all derive several of their features from the Bourne shell (/bin/sh), a standard shell developed at Bell Labs for early versions of Unix. Every Unix system needs the Bourne shell in order to function correctly, as you will see throughout this book.

Linux uses an enhanced version of the Bourne shell called bash or the “Bourne-again” shell. The bash shell is the default shell on most Linux distributions, and /bin/sh is normally a link to bash on a Linux system.

cat command: The command is called cat because it performs concatenation when it prints the contents of more than one file.

Pressing CTRL-D on an empty line stops the current standard input entry from the terminal (and often terminates a program). Don’t confuse this with CTRL-C, which terminates a program regardless of its input or output.

Unix filenames do not need extensions and often do not carry them.

Shell globs don't match dot files unless you explicitly use a pattern such as .*. This is why rm -rf ./* doesn't remove hidden objects.

You can run into problems with globs because .* matches . and .. (the current and parent directories).
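
A quick way to see this:

echo .*      # note that . and .. are matched too
rm -rf .*    # dangerous for exactly this reason, though GNU rm refuses to remove . and ..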

The shell can store temporary variables, called shell variables, containing the values of text strings. Shell variables are very useful for keeping track of values in scripts, and some shell variables control the way the shell behaves.

An environment variable is like a shell variable, but it’s not specific to the shell. All processes on Unix systems have environment variable storage. The main difference between environment and shell variables is that the operating system passes all of your shell’s environment variables to programs that the shell runs (for example, the sub-script), whereas shell variables cannot be accessed in the commands that you run.

Assign an environment variable with the shell’s export command. For example, if you’d like to make the $STUFF shell variable into an environment variable, use the following:

STUFF=123
export STUFF
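
You can see the difference by asking a child process to print the variable:

STUFF=123
sh -c 'echo "STUFF is $STUFF"'   # prints "STUFF is ": a shell variable is not inherited
export STUFF
sh -c 'echo "STUFF is $STUFF"'   # prints "STUFF is 123": the environment is inherited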

PATH is a special environment variable that contains the command path (or path for short). A command path is a list of system directories that the shell searches when trying to locate a command.

resource: <<Learning the vi and Vim Editor>>

Some ways to kill a process. There are many types of signals; the default is TERM, or terminate.

kill -STOP pid
kill -CONT pid
kill -KILL pid # the same as kill -9 pid

To see if you’ve accidentally suspended any processes on your current terminal, run the jobs command.

You can detach a process from the shell and put it in the “background” with the ampersand &. The best way to make sure that a background process doesn’t bother you is to redirect its output (and possibly input).
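
For example (long_task.sh is a hypothetical script):

./long_task.sh > /tmp/long_task.log 2>&1 &   # detach; redirect stdout and stderr
jobs                                         # list background and suspended jobs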

Some executable files have an s in the owner permissions listing instead of an x. This indicates that the executable is setuid, meaning that when you execute the program, it runs as though the file owner is the user instead of you. Many programs use this setuid bit to run as root in order to get the privileges they need to change system files. One example is the passwd program, which needs to change the /etc/passwd file.

Directories also have permissions. You can list the contents of a directory if it’s readable, but you can only access a file in a directory if the directory is executable. (One common mistake people make when setting the permissions of directories is to accidentally remove the execute permission when using absolute modes.)

You can specify a set of default permissions with the umask (user file-creation mode mask) shell command, which applies a predefined set of permissions to any new file you create. In general, use umask 022 if you want everyone to be able to see all of the files and directories that you create, and use umask 077 if you don’t. (You’ll need to put the umask command with the desired mode in one of your startup files to make your new default permissions apply to later sessions).

How to calculate the umask?

For directories, the base permissions are 0777 (rwxrwxrwx), and for files they are 0666 (rw-rw-rw-).

You can simply subtract the umask from the base permissions to determine the final permissions for a new file: 666 - 022 = 644 (rw-r--r--).

Similarly, subtract the umask from the base permissions to determine the final permissions for a new directory: 777 - 022 = 755 (rwxr-xr-x).
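
A quick check in a scratch directory:

umask 022
touch newfile && mkdir newdir
ls -ld newfile newdir
# -rw-r--r-- ... newfile   (666 - 022 = 644)
# drwxr-xr-x ... newdir    (777 - 022 = 755)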

Another compression program in Unix is bzip2, whose compressed files end with .bz2. While marginally slower than gzip, bzip2 often compacts text files a little more, and it is therefore increasingly popular in the distribution of source code.

The bzip2 compression/decompression option for tar is j:

tar jcvf xx.bz2 file...
tar jxvf xx.bz2

Linux Directory Hierarchy Essentials

Simplified overview of the hierarchy

                                       +---------+
                                       |    /    |
                                       +----+----+
                                            |
    +---------+---------+---------+--------++--------+---------+---------+---------+
    |         |         |         |         |        |         |         |         |
    v         v         v         v         v        v         v         v         v
+-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+
|  /bin | |  /dev | |  /etc | |  /usr | | /home | |  /lib | | /sbin | |  /tmp | |  /var |
+-------+ +-------+ +-------+ +---+---+ +-------+ +-------+ +-------+ +-------+ +---+---+
                                  |                                                 |
   +--------+--------+--------+---+----+--------+                               +---+----+
   |        |        |        |        |        |                               |        |
   v        v        v        v        v        v                               v        v
+------+ +------+ +------+ +------+ +------+ +------+                       +------+ +------+
| bin/ | | man/ | | lib/ | |local/| | sbin/| |share/|                       | log/ | | tmp/ |
+------+ +------+ +------+ +------+ +------+ +------+                       +------+ +------+

  • /bin Contains ready-to-run programs (also known as executables), including most of the basic Unix commands such as ls and cp. Most of the programs in /bin are in binary format, having been created by a C compiler, but some are shell scripts on modern systems.

  • /dev Contains device files.

  • /etc This core system configuration directory contains the user password, boot, device, networking, and other setup files. Many items in /etc are specific to the machine’s hardware.

  • /home Holds personal directories for regular users.

  • /lib An abbreviation for library, this directory holds library files containing code that executables can use.

  • /proc Provides system statistics through a browsable directory-and-file interface. The /proc directory contains information about currently running processes as well as some kernel parameters.

  • /sys This directory is similar to /proc in that it provides a device and system interface.

  • /sbin The place for system executables. Programs in /sbin directories relate to system management.

  • /tmp A storage area for smaller, temporary files that you don't care much about. If something is extremely important, don't put it in /tmp, because most distributions clear /tmp when the machine boots and some even remove its old files periodically. Also, don't let /tmp fill up with garbage, because its space is usually shared with something critical.

  • /usr Although pronounced “user,” this subdirectory has no user files. Instead, it contains a large directory hierarchy, including the bulk of the Linux system. Many of the directory names in /usr are the same as those in the root directory (like /usr/bin and /usr/lib), and they hold the same type of files. (The reason that the root directory does not contain the complete system is primarily historic—in the past, it was to keep space requirements low for the root.)

  • /var The variable subdirectory, where programs record runtime information. System logging, user tracking, caches, and other files that system programs create and manage are here.

  • /boot Contains kernel boot loader files. These files pertain only to the very first stage of the Linux startup procedure.

  • /media A base attachment point for removable media such as flash drives, found in many distributions.

  • /opt This may contain additional third-party software.

Kernel Location

On Linux systems, the kernel is normally in /vmlinuz or /boot/vmlinuz. A boot loader loads this file into memory and sets it in motion when the system boots.

Once the boot loader runs and sets the kernel in motion, the main kernel file is no longer used by the running system. However, you’ll find many modules that the kernel can load and unload on demand during the course of normal system operation. Called loadable kernel modules, they are located under /lib/modules.

Chapter 3. Devices

It’s important to understand how the kernel interacts with user space when presented with new devices. The udev system enables user-space programs to automatically configure and use new devices.

udev (userspace /dev) is a device manager for the Linux kernel. As the successor of devfsd and hotplug, udev primarily manages device nodes in the /dev directory.

Device Files

It is easy to manipulate most devices on a Unix system because the kernel presents many of the device I/O interfaces to user processes as files. These device files are sometimes called device nodes. Not only can a programmer use regular file operations to work with a device, but some devices are also accessible to standard programs like cat. However, not all devices or device capabilities are accessible with standard file I/O.

Device files are in the /dev directory, and running ls /dev reveals more than a few files in /dev.

If you run:

ls -l
brw-rw---- 1 root disk 8, 1 Sep 6 08:37 sda1
crw-rw-rw- 1 root root 1, 3 Sep 6 08:37 null
prw-r--r-- 1 root root 0 Mar 3 19:17 fdata
srw-rw-rw- 1 root root 0 Dec 18 07:43 log

If the first character in the file mode is b, c, p, or s, the file is a device. These letters stand for block, character, pipe, and socket, respectively.

The numbers before the dates in the first two lines are the major and minor device numbers that help the kernel identify the device. Similar devices usually have the same major number.

Block device

Programs access data from a block device in fixed chunks. The sda1 in the preceding example is a disk device, a type of block device.

Character device

Character devices work with data streams. Printers directly attached to your computer are represented by character devices. It’s important to note that during character device interaction, the kernel cannot back up and reexamine the data stream after it has passed data to a device or process.

Pipe device

Named pipes are like character devices, with another process at the other end of the I/O stream instead of a kernel driver.

Socket device

Sockets are special-purpose interfaces that are frequently used for interprocess communication.

Not all devices have device files because the block and character device I/O interfaces are not appropriate in all cases. For example, network interfaces don’t have device files. It is theoretically possible to interact with a network interface using a single character device, but because it would be exceptionally difficult, the kernel uses other I/O interfaces.

The sysfs Device Path

To provide a uniform view for attached devices based on their actual hardware attributes, the Linux kernel offers the sysfs interface through a system of files and directories. The base path for devices is /sys/devices (this is a real directory!).

ls -ltr /sys/devices/

total 0
drwxr-xr-x 21 root root 0 Apr 25 23:18 virtual
drwxr-xr-x 3 root root 0 Apr 25 23:18 tracepoint
drwxr-xr-x 10 root root 0 Apr 25 23:18 system
drwxr-xr-x 3 root root 0 Apr 25 23:18 software
drwxr-xr-x 8 root root 0 Apr 25 23:18 pnp0
drwxr-xr-x 9 root root 0 Apr 25 23:18 platform
drwxr-xr-x 15 root root 0 Apr 25 23:18 pci0000:00
drwxr-xr-x 5 root root 0 Apr 25 23:18 msr
drwxr-xr-x 6 root root 0 Apr 25 23:18 LNXSYSTM:00
drwxr-xr-x 3 root root 0 Apr 25 23:18 breakpoint

The /dev file is there so that user processes can use the device, whereas the /sys/devices path is used to view information and manage the device. In /dev you can run:

udevadm info --query=all --name=/dev/null

P: /devices/virtual/mem/null
N: null
E: DEVMODE=0666
E: DEVNAME=/dev/null
E: DEVPATH=/devices/virtual/mem/null
E: MAJOR=1
E: MINOR=3
E: SUBSYSTEM=mem

This command shows the sysfs location /devices/virtual/mem/null.

dd and Devices

The program dd is extremely useful when working with block and character devices. This program’s sole function is to read from an input file or stream and write to an output file or stream, possibly doing some encoding conversion on the way.

I am not using it.

Device Name Summary

Names are not necessarily as described below; there may be variations:

  • Hard Disks: /dev/sd*

Most hard disks attached to current Linux systems correspond to device names with an sd prefix, such as /dev/sda, /dev/sdb, and so on. These devices represent entire disks; the kernel makes separate device files, such as /dev/sda1 and /dev/sda2, for the partitions on a disk.

The sd portion of the name stands for SCSI disk.

Linux assigns devices to device files in the order in which its drivers encounter them. This can cause problems when you remove one disk and insert another, because the device names may change. Most modern Linux systems use the Universally Unique Identifier (UUID) for persistent disk device access.

  • CD and DVD Drives: /dev/sr*

Linux recognizes most optical storage drives as the SCSI devices /dev/sr0, /dev/sr1, and so on.

  • PATA Hard Disks: /dev/hd*

  • Terminals: /dev/tty*, /dev/pts/*, and /dev/tty

Terminals are devices for moving characters between a user process and an I/O device, usually for text output to a terminal screen.

Pseudoterminal devices are emulated terminals that understand the I/O features of real terminals.

Two common terminal devices are /dev/tty1 (the first virtual console) and /dev/pts/0 (the first pseudoterminal device). The /dev/tty device is the controlling terminal of the current process.

tty is shorthand for teletypewriter.

It's easy to get confused here; at a minimum you need to know that the shell is the command line interpreter! See: What is the difference between Terminal, Console, Shell, and Command Line?

Linux has two primary display modes: text mode and an X Window System server (graphics mode, usually via a display manager). Although Linux systems traditionally booted in text mode, most distributions now use kernel parameters and interim graphical display mechanisms to completely hide text mode as the system is booting. In such cases, the system switches over to full graphics mode near the end of the boot process.

OK, I'll skip the rest of Chapter 3.

Chapter 4. Disks and Filesystems

Schematic of a typical Linux disk:

Partitions are subdivisions of the whole disk. On Linux, they’re denoted with a number after the whole block device, and therefore have device names such as /dev/sda1 and /dev/sdb3.

Partitions are defined on a small area of the disk called a partition table.

The next layer after the partition is the filesystem, the database of files and directories that you’re accustomed to interacting with in user space.

To access data on a disk, the Linux kernel uses a system of layers.

Notice that you can work with the disk through the filesystem as well as directly through the disk devices.

Partitioning Disk Devices

You can view the Red Hat documentation for more information about partitioning.

Let’s view the partition table:

parted -l

Model: ATA WDC WD3200AAJS-2 (scsi)
Disk /dev/sda: 320GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number  Start   End    Size    Type      File system     Flags
 1      1049kB  316GB  316GB   primary   ext4            boot
 2      316GB   320GB  4235MB  extended
 5      316GB   320GB  4235MB  logical   linux-swap(v1)

Model: FLASH Drive UT_USB20 (scsi)
Disk /dev/sdf: 4041MB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start   End     Size    File system  Name      Flags
 1      17.4kB  1000MB  1000MB               myfirst
 2      1000MB  4040MB  3040MB               mysecond

There are 2 different partition tables: MBR (msdos) and GPT (gpt). The MBR table in this example contains primary, extended, and logical partitions.

Changing Partition Tables

You can use the parted command to change partitions. Checking /proc/partitions gives the full partition information:

cat /proc/partitions

major minor    #blocks  name
  252     0  262144000  vda
  252     1    1048576  vda1
  252     2  260976640  vda2
  253     0  252706816  dm-0
  253     1    8257536  dm-1

Filesystems

The last link between the kernel and user space for disks is typically the filesystem; this is what you're accustomed to interacting with when you run commands such as ls and cd. As previously mentioned, the filesystem is a form of database; it supplies the structure to transform a simple block device into the sophisticated hierarchy of files and subdirectories that users can understand.

Filesystem Types

  • The Fourth Extended filesystem (ext4) is the current iteration of a line of filesystems native to Linux. The Second Extended filesystem (ext2) was a longtime default for Linux systems inspired by traditional Unix filesystems such as the Unix File System (UFS) and the Fast File System (FFS). The Third Extended filesystem (ext3) added a journal feature (a small cache outside the normal filesystem data structure) to enhance data integrity and hasten booting. The ext4 filesystem is an incremental improvement with support for larger files than ext2 or ext3 support and a greater number of subdirectories.

Creating a Filesystem

Once you’re done with the partitioning process, you’re ready to create filesystems. As with partitioning, you’ll do this in user space because a user-space process can directly access and manipulate a block device.

For example, you can create an ext4 filesystem on the partition /dev/sdf2:

mkfs -t ext4 /dev/sdf2

Filesystem creation is a task that you should only need to perform after adding a new disk or repartitioning an old one. You should create a filesystem just once for each new partition that has no preexisting data (or that has data that you want to remove). Creating a new filesystem on top of an existing filesystem will effectively destroy the old data.

It turns out that mkfs is only a frontend for a series of filesystem creation programs:

ls -l /sbin/mkfs.*

-rwxr-xr-x. 1 root root 375240 Mar 7 2017 /sbin/mkfs.btrfs
-rwxr-xr-x 1 root root 37080 Jul 12 2018 /sbin/mkfs.cramfs
-rwxr-xr-x 4 root root 96384 Apr 10 2018 /sbin/mkfs.ext2
-rwxr-xr-x 4 root root 96384 Apr 10 2018 /sbin/mkfs.ext3
-rwxr-xr-x 4 root root 96384 Apr 10 2018 /sbin/mkfs.ext4
-rwxr-xr-x 1 root root 37184 Jul 12 2018 /sbin/mkfs.minix
-rwxr-xr-x. 1 root root 368504 Feb 27 2018 /sbin/mkfs.xfs

Mounting a Filesystem

On Unix, the process of attaching a filesystem is called mounting. When the system boots, the kernel reads some configuration data and mounts root (/) based on the configuration data.

When mounting a filesystem, the common terminology is mount a device on a mount point.

To see the current mount status of the system:

mount 
...
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
/dev/mapper/rhel-root on / type xfs (rw,relatime,attr2,inode64,noquota)
mqueue on /dev/mqueue type mqueue (rw,relatime)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime)
/dev/vda1 on /boot type xfs (rw,relatime,attr2,inode64,noquota)
tmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,size=800956k,mode=700)
...

There are 3 key fields:

  • The filesystem’s device, such as a disk partition; where the actual file-system data resides
  • The filesystem type
  • The mount point—that is, the place in the current system’s directory hierarchy where the filesystem will be attached.

For example, to mount the Fourth Extended filesystem /dev/sdf2 on /home/extra, use this command:

mount -t ext4 /dev/sdf2 /home/extra

To unmount (detach) a filesystem, use the umount command:

umount mountpoint

Filesystem UUID

You can identify and mount filesystems by their Universally Unique Identifier (UUID), a software standard. The UUID is a type of serial number, and each one should be different.

For example, if you know the UUID of /dev/sdf2 is a9011c2b-1c03-4288-b3fe-8ba961ab0898, you can mount it as:

mount UUID=a9011c2b-1c03-4288-b3fe-8ba961ab0898 /home/extra

There is no -t ext4 option here, because mount can figure that out from the UUID.

To view a list of devices and the corresponding filesystems and UUIDs on your system, use the blkid (block ID) program:

blkid

/dev/sdf2: UUID="a9011c2b-1c03-4288-b3fe-8ba961ab0898" TYPE="ext4"
/dev/sda1: UUID="70ccd6e7-6ae6-44f6-812c-51aab8036d29" TYPE="ext4"
/dev/sda5: UUID="592dcfd1-58da-4769-9ea8-5f412a896980" TYPE="swap"
/dev/sde1: SEC_TYPE="msdos" UUID="3762-6138" TYPE="vfat"

For one thing, UUIDs are the preferred way to automatically mount filesystems in /etc/fstab at boot time.

Disk Buffering, Caching, and Filesystems

Linux, like other versions of Unix, buffers writes to the disk. This means that the kernel usually doesn’t immediately write changes to filesystems when processes request changes. Instead it stores the changes in RAM until the kernel can conveniently make the actual change to the disk. This buffering system is transparent to the user and improves performance.

This is why, before removing a USB drive, we need to unmount it to avoid data loss.

When you unmount a filesystem with umount, the kernel automatically synchronizes with the disk. At any other time, you can force the kernel to write the changes in its buffer to the disk by running the sync command.

The /etc/fstab Filesystem Table

I encountered this file when writing NFS entries into /etc/fstab while developing on Kubernetes:

/dev/mapper/rhel-root                       /      xfs   defaults  0 0
UUID=a44461e9-e1d7-45fd-a387-255fafd14746   /boot  xfs   defaults  0 0
/dev/mapper/rhel-swap                       swap   swap  defaults  0 0
halos1.fyre.ibm.com:/data                   /mnt   nfs   defaults,timeo=10,retrans=3,rsize=1048576,wsize=1048576  0 0

To mount filesystems at boot time and take the drudgery out of the mount command, Linux systems keep a permanent list of filesystems and options in /etc/fstab.

  • The device or UUID. Most current Linux systems no longer use the device in /etc/fstab, preferring the UUID.

  • The mount point. Indicates where to attach the filesystem.

  • The filesystem type.

  • Options. Use long mount options separated by commas.

  • Backup information for use by the dump command. You should always use a 0 in this field.

  • The filesystem integrity test order. To ensure that fsck always runs on the root first, always set this to 1 for the root filesystem and 2 for any other filesystems on a hard disk. Use 0 to disable the bootup check for everything else, including CD-ROM drives, swap, and the /proc filesystem.

You can also try to mount all entries at once in /etc/fstab that do not contain the noauto option with this command:

mount -a

Let's see some commonly used options:

  • defaults. This uses the mount defaults: read-write mode, enable device files, executables, the setuid bit, and so on. Use this when you don’t want to give the filesystem any special options but you do want to fill all fields in /etc/fstab.

  • noauto. This option tells a mount -a command to ignore the entry.

Filesystem Capacity

To view the size and utilization of your currently mounted filesystems, use the df command.

df -BM

Filesystem             1M-blocks    Used  Available  Use%  Mounted on
/dev/mapper/rhel-root    245640M  75636M    170005M   31%  /
devtmpfs                   7931M      0M      7931M    0%  /dev
tmpfs                      7943M      0M      7943M    0%  /dev/shm
tmpfs                      7943M    835M      7109M   11%  /run
tmpfs                      7943M      0M      7943M    0%  /sys/fs/cgroup
/dev/vda1                  1014M    183M       832M   19%  /boot
...

Checking and Repairing Filesystems

Filesystem errors are usually due to a user shutting down the system in a rude way (for example, by pulling out the power cord). In such cases, the filesystem cache in memory may not match the data on the disk, and the system also may be in the process of altering the filesystem when you happen to give the computer a kick. Although a new generation of filesystems supports journals to make filesystem corruption far less common, you should always shut the system down properly. And regardless of the filesystem in use, filesystem checks are still necessary every now and then to maintain sanity.

The tool to check a filesystem is fsck.

In the worst cases, you can try:

  • You can try to extract the entire filesystem image from the disk with dd and transfer it to a partition on another disk of the same size.

  • You can try to patch the filesystem as much as possible, mount it in read-only mode, and salvage what you can.

  • You can try debugfs.

Special-Purpose Filesystems

Not all filesystems represent storage on physical media. Specifically, most versions of Unix have filesystems that serve as system interfaces. That is, rather than serving only as a means to store data on a device, a filesystem can represent system information such as process IDs and kernel diagnostics.

The special filesystem types in common use on Linux include the following:

  • proc. Mounted on /proc. The name proc is actually an abbreviation for process. Each numbered directory inside /proc is actually the process ID of a current process on the system; the files in those directories represent various aspects of the processes. The file /proc/self represents the current process.

  • sysfs. Mounted on /sys.

  • tmpfs. Mounted on /run and other locations. With tmpfs, you can use your physical memory and swap space as temporary storage, stored in volatile memory instead of a persistent storage device.

Swap Space

Not every partition on a disk contains a filesystem. It’s also possible to augment the RAM on a machine with disk space. The disk area used to store memory pages is called swap space (or just swap for short).

You can use the free command to see swap usage:

free -m

total used free shared buff/cache available
Mem: 32010 10894 3992 1605 17123 18811
Swap: 8063 64 7999

You can use either a disk partition or a regular file as swap space. For a disk partition:

  1. Ensure the partition is empty.
  2. Run mkswap dev, where dev is the partition device.
  3. Execute swapon dev to register the space with the kernel.
  4. Register the entry in /etc/fstab so it persists across reboots (see the sketch below).
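As a rough sketch of the partition route (assuming /dev/sdb1 is a hypothetical empty partition):

mkswap /dev/sdb1      # write a swap signature to the partition
swapon /dev/sdb1      # register the space with the kernel

## matching /etc/fstab entry, mirroring the example earlier:
## /dev/sdb1 swap swap defaults 0 0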

Use these commands to create an empty file, initialize it as swap, and add it to the swap pool:

dd if=/dev/zero of=swap_file bs=1024k count=num_mb
mkswap swap_file
swapon swap_file

Here, swap_file is the name of the new swap file, and num_mb is the desired size, in megabytes.

To remove a swap partition or file from the kernel’s active pool, use the swapoff command.

Note that some administrators configure certain systems with no swap space at all. For example, high-performance network servers should never dip into swap space and should avoid disk access if at all possible.

It’s dangerous to do this on a general-purpose machine. If a machine completely runs out of both real memory and swap space, the Linux kernel invokes the out-of-memory (OOM) killer to kill a process in order to free up some memory. You obviously don’t want this to happen to your desktop applications. On the other hand, high-performance servers include sophisticated monitoring and load-balancing systems to ensure that they never reach the danger zone.

Looking Forward: Disks and User Space

In disk-related components on a Unix system, the boundaries between user space and the kernel can be difficult to characterize. As you’ve seen, the kernel handles raw block I/O from the devices, and user-space tools can use the block I/O through device files. However, user space typically uses the block I/O only for initializing operations such as partitioning, file-system creation, and swap space creation.

In normal use, user space uses only the filesystem support that the kernel provides on top of the block I/O.

Chapter 5. How the Linux Kernel Boots

You’ll learn how the kernel moves into memory up to the point where the first user process starts.

A simplified view of the boot process looks like this:

  1. The machine’s BIOS or boot firmware loads and runs a boot loader.
  2. The boot loader finds the kernel image on disk, loads it into memory, and starts it.
  3. The kernel initializes the devices and its drivers.
  4. The kernel mounts the root filesystem.
  5. The kernel starts a program called init with a process ID of 1. This point is the user space start.
  6. init sets the rest of the system processes in motion.
  7. At some point, init starts a process allowing you to log in, usually at the end or near the end of the boot.

Startup Messages

There are two ways to view the kernel’s boot and runtime diagnostic messages:

  • Look at the kernel system log file. You’ll often find this in /var/log/kern.log, but depending on how your system is configured, it might also be lumped together with a lot of other system logs in /var/log/messages or elsewhere.

  • Use the dmesg command, but be sure to pipe the output to less because there will be much more than a screen’s worth. The dmesg command uses the kernel ring buffer, which is of limited size, but most newer kernels have a large enough buffer to hold boot messages for a long time.
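For example:

dmesg | less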

Kernel Initialization and Boot Options

Upon startup, the Linux kernel initializes in this general order:

  1. CPU inspection
  2. Memory inspection
  3. Device bus discovery
  4. Device discovery
  5. Auxiliary kernel subsystem setup (networking, and so on)
  6. Root filesystem mount
  7. User space start

The following memory management messages are a good indication that the user-space handoff is about to happen because this is where the kernel protects its own memory from user-space processes:

[    0.972934] Freeing unused kernel memory: 1844k freed
[    0.973411] Write protecting the kernel read-only data: 12288k
[    0.975623] Freeing unused kernel memory: 832k freed
[    0.977405] Freeing unused kernel memory: 676k freed

Kernel Parameters

I just encountered an issue about kernel parameters for Db2… Let’s see.

When running the Linux kernel, the boot loader passes in a set of text-based kernel parameters that tell the kernel how it should start. The parameters specify many different types of behavior, such as the amount of diagnostic output the kernel should produce and device driver–specific options.

You can view the kernel parameters from your system’s boot by looking at the /proc/cmdline file:

BOOT_IMAGE=/vmlinuz-3.10.0-862.14.4.el7.x86_64 root=/dev/mapper/rhel-root ro crashkernel=auto rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap rhgb quiet elevator=noop LANG=en_US.UTF-8

The root=/dev/mapper/rhel-root parameter tells the kernel where the root filesystem resides.

Boot Loader

Other boot loader intro

At the start of the boot process, before the kernel and init start, a boot loader starts the kernel. The task of a boot loader sounds simple: It loads the kernel into memory, and then starts the kernel with a set of kernel parameters.

The kernel and its parameters are usually somewhere on the root filesystem.

On PCs, boot loaders use the Basic Input/Output System (BIOS) or Unified Extensible Firmware Interface (UEFI) to access disks. Nearly all disk hardware has firmware that allows the BIOS to access attached storage hardware with Linear Block Addressing (LBA). Although it exhibits poor performance, this mode of access does allow universal access to disks. Boot loaders are often the only programs to use the BIOS for disk access; the kernel uses its own high-performance drivers.

Most modern boot loaders can read partition tables and have built-in support for read-only access to filesystems.

Boot loader tasks

  1. Select among multiple kernels.
  2. Switch between sets of kernel parameters.
  3. Allow the user to manually override and edit kernel image names and parameters.
  4. Provide support for booting other operating systems.

Boot loader types

  • GRUB. A near-universal standard on Linux systems (this note mainly covers GRUB)
  • LILO. One of the first Linux boot loaders.
  • LOADLIN. Boots a kernel from MS-DOS

GRUB Introduction

GRUB stands for Grand Unified Boot Loader. We’ll cover GRUB 2.

This section covers the GRUB menu and some boot options. In fact, if you check the /boot directory, you will see the kernel image files and the initial RAM filesystem:

...
-rwxr-xr-x. 1 root root 6381872 Mar 21 2018 vmlinuz-3.10.0-862.el7.x86_64
-rw-r--r--. 1 root root 304926 Mar 21 2018 symvers-3.10.0-862.el7.x86_64.gz
drwx------. 5 root root 97 Oct 1 2018 grub2
-rw------- 1 root root 21096334 Oct 1 2018 initramfs-3.10.0-862.9.1.el7.x86_64.img
...

Not interested in the rest of the content in this chapter.

Chapter 6. How User Space Starts

The point where the kernel starts its first user-space process, init, is significant—not just because that’s where the memory and CPU are finally ready for normal system operation, but because that’s where you can see how the rest of the system builds up as a whole.

User space is far more modular. It’s much easier to see what goes into the user space startup and operation.

User space starts in roughly this order:

  1. init
  2. Essential low-level services such as udevd and syslogd
  3. Network configuration
  4. Mid- and high-level services (cron, printing, and so on)
  5. Login prompts, GUIs, and other high-level applications

Introduction to init

wiki init

The init program is a user-space program like any other program on the Linux system, and you’ll find it in /sbin along with many of the other system binaries. Its main purpose is to start and stop the essential service processes on the system, but newer versions have more responsibilities.

In my VM’s /sbin directory:

lrwxrwxrwx  1 root root          22 Oct  1  2018 init -> ../lib/systemd/systemd

There are three major implementations of init in Linux distributions:

  • System V init. A traditional sequenced init (Sys V, usually pronounced “sys-five”). Red Hat Enterprise Linux and several other distributions use this version.
  • systemd. The emerging standard for init. Many distributions have moved to systemd, and most that have not yet done so are planning to move to it.
  • Upstart. The init on Ubuntu installations. However, as of this writing, Ubuntu has also planned to migrate to systemd.

There are many different implementations of init because System V init and other older versions relied on a sequence that performed only one startup task at a time. systemd and Upstart attempt to remedy the performance issue by allowing many services to start in parallel, thereby speeding up the boot process.

System V Runlevels

wiki Runlevel

At any given time on a Linux system, a certain base set of processes is running. In System V init, this state of the machine is called its runlevel, which is denoted by a number from 0 through 6. A system spends most of its time in a single runlevel, but when you shut the machine down, init switches to a different runlevel in order to terminate the system services in an orderly fashion and to tell the kernel to stop.

You can check your system’s runlevel with the who -r command:

who -r

run-level 3 2019-04-17 13:49

Runlevels serve various purposes, but the most common one is to distinguish between system startup, shutdown, single-user mode, and console mode states.

But runlevels are becoming a thing of the past. Even though all three init versions in this book support them, systemd and Upstart consider runlevels obsolete as end states for the system.

Identifying Your init

  • If your system has /usr/lib/systemd and /etc/systemd directories, you have systemd.

  • If you have an /etc/init directory that contains several .conf files, you’re probably running Upstart

  • If neither of the above is true, but you have an /etc/inittab file, you’re probably running System V init.

Here I focus on systemd.

systemd

The systemd init is one of the newest init implementations on Linux. In addition to handling the regular boot process, systemd aims to incorporate a number of standard Unix services such as cron and inetd. One of its most significant features is its ability to defer the start of services and operating system features until they are necessary.

Let’s outline what happens when systemd runs at boot time:

  1. systemd loads its configuration.
  2. systemd determines its boot goal, which is usually named default.target.
  3. systemd determines all of the dependencies of the default boot goal, dependencies of these dependencies, and so on.
  4. systemd activates the dependencies and the boot goal.
  5. After boot, systemd can react to system events (such as uevents) and activate additional components.

Units and Unit Types

One of the most interesting things about systemd is that it does not just operate processes and services; it can also mount filesystems, monitor network sockets, run timers, and more. Each type of capability is called a unit type, and each specific capability is called a unit. When you turn on a unit, you activate it.

The default boot goal is usually a target unit that groups together a number of service and mount units as dependencies.

understand systemd unit and unit files

systemd Dependencies

To accommodate the need for flexibility and fault tolerance, systemd offers a myriad of dependency types and styles:

  • Requires. Strict dependencies. When activating a unit with a Requires dependency, systemd attempts to activate the dependency unit. If the dependency unit fails, systemd deactivates the dependent unit.

  • Wants. Dependencies for activation only. Upon activating a unit, systemd activates the unit’s Wants dependencies, but it doesn’t care if those dependencies fail.

  • Requisite. Units that must already be active.

  • Conflicts. Negative dependencies. When activating a unit with a Conflict dependency, systemd automatically deactivates the dependency if it is active.

There are many other dependency syntaxes, such as ordering and conditional dependencies.

systemd Configuration

The systemd configuration files are spread among many directories across the system, so you typically won’t find the files for all of the units on a system in one place.

That said, there are two main directories for systemd configuration: the system unit directory (globally configured, usually /usr/lib/systemd/system) and a system configuration directory (local definitions, usually /etc/systemd/system).

Note: Avoid making changes to the system unit directory because your distribution will maintain it for you. Make your local changes to the system configuration directory.

To see the system unit and configuration directories on your system, use the following commands:

pkg-config systemd --variable=systemdsystemunitdir
pkg-config systemd --variable=systemdsystemconfdir

Let’s look at the unit files in /usr/lib/systemd/system; there is an sshd.service file:

[Unit]
Description=OpenSSH server daemon
Documentation=man:sshd(8) man:sshd_config(5)
After=network.target sshd-keygen.service
Wants=sshd-keygen.service

[Service]
Type=notify
EnvironmentFile=/etc/sysconfig/sshd
ExecStart=/usr/sbin/sshd -D $OPTIONS
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
Restart=on-failure
RestartSec=42s

[Install]
WantedBy=multi-user.target

The [Unit] section gives some details about the unit and contains description and dependency information.

You’ll find the details about the service in the [Service] section, including how to prepare, start, and reload the service.

During normal operation, systemd ignores the [Install] section. However, consider the case when sshd.service is disabled on your system and you would like to turn it on. When you enable a unit, systemd reads the [Install] section.

The [Install] section is usually responsible for the .wants and .requires directories in the system configuration directory (/etc/systemd/system), see:

basic.target.wants                                       getty.target.wants           remote-fs.target.wants
default.target local-fs.target.wants sockets.target.wants
default.target.wants multi-user.target.wants sysinit.target.wants
dev-virtio\x2dports-org.qemu.guest_agent.0.device.wants network-online.target.wants system-update.target.wants

The $OPTIONS in the unit file is a variable. Specifiers, such as %n and %H, are another variable-like feature often found in unit files.

systemd Operation

You’ll interact with systemd primarily through the systemctl command, which allows you to activate and deactivate services, list status, reload the configuration, and much more.

List of active units:

systemctl

UNIT LOAD ACTIVE SUB DESCRIPTION
...
sys-kernel-debug.mount loaded active mounted Debug File System
var-lib-nfs-rpc_pipefs.mount loaded active mounted RPC Pipe File System
brandbot.path loaded active waiting Flexible branding
...

List all units, including inactive ones:

systemctl --all

Get status of a unit:

systemctl status sshd.service

To activate, deactivate, and restart units, use the systemctl start, stop, and restart commands. However, if you’ve changed a unit configuration file, you can tell systemd to reload the file in one of two ways:

systemctl reload unit  # Reloads just the configuration for unit.
systemctl daemon-reload  # Reloads all unit configurations.

systemd Process Tracking and Synchronization

systemd wants a reasonable amount of information and control over every process that it starts. The main problem that it faces is that a service can start in different ways; it may fork new instances of itself or even daemonize and detach itself from the original process.

To minimize the work that a package developer or administrator needs to do in order to create a working unit file, systemd uses control groups (cgroups), an optional Linux kernel feature that allows for finer tracking of a process hierarchy.

systemd On-Demand and Resource-Parallelized Startup

One of systemd’s most significant features is its ability to delay a unit startup until it is absolutely needed.

systemd Auxiliary Programs

When starting out with systemd, you may notice the exceptionally large number of programs in /lib/systemd. These are primarily support programs for units. For example, udevd is part of systemd, and you’ll find it there as systemd-udevd. Another, the systemd-fsck program, works as a middleman between systemd and fsck.

Shutting Down Your System

init controls how the system shuts down and reboots. The commands to shut down the system are the same regardless of which version of init you run. The proper way to shut down a Linux machine is to use the shutdown command.

To shut down the machine immediately:

shutdown -h now

To reboot the machine immediately:

shutdown -r now

When system shutdown time finally arrives, shutdown tells init to begin the shutdown process. On systemd, it means activating the shutdown units; and on System V init, it means changing the runlevel to 0 or 6.

The Initial RAM Filesystem

The initramfs files live in the /boot directory.

ls -ltr | grep init

-rw-------. 1 root root 55376391 Apr 13 2018 initramfs-0-rescue-e57cfe9136e9430587366e04f14195e1.img
-rw-------. 1 root root 13131435 Apr 13 2018 initramfs-3.10.0-862.el7.x86_64kdump.img
-rw------- 1 root root 21098233 Jul 23 2018 initramfs-3.10.0-862.el7.x86_64.img
-rw------- 1 root root 21134858 Oct 1 2018 initramfs-3.10.0-862.14.4.el7.x86_64.img
-rw------- 1 root root 21096334 Oct 1 2018 initramfs-3.10.0-862.9.1.el7.x86_64.img

The problem stems from the availability of many different kinds of storage hardware. Remember, the Linux kernel does not talk to the PC BIOS or EFI interfaces to get data from disks, so in order to mount its root file-system, it needs driver support for the underlying storage mechanism.

The workaround is to gather a small collection of kernel driver modules along with a few other utilities into an archive. The boot loader loads this archive into memory before running the kernel.

Chapter 7. System Configuration

When you first look in the /etc directory, you might feel a bit overwhelmed. Although most of the files that you see affect a system’s operations to some extent, a few are fundamental.

The Structure of /etc

Most system configuration files on a Linux system are found in /etc. Historically, each program had one or more configuration files there, and because there are so many packages on a Unix system, /etc would accumulate files quickly.

The trend for many years now has been to place system configuration files into subdirectories under /etc. There are still a few individual configuration files in /etc, but for the most part, if you run ls -F /etc, you’ll see that most of the items there are now subdirectories.

What kind of configuration files are found in /etc? The basic guideline is that /etc holds customizable configuration for a single machine. Noncustomizable system configuration files are found elsewhere, as with the prepackaged systemd unit files in /usr/lib/systemd.

System Logging

Most system programs write their diagnostic output to the syslog service. The traditional syslogd daemon waits for messages and, depending on the type of message received, funnels the output to a file, the screen, users, or some combination of these, or just ignores it.

The System Logger

Most Linux distributions run a newer version of syslogd called rsyslogd that does much more than simply write log messages to files. For example, on my VM:

systemctl status rsyslog

rsyslog.service - System Logging Service
Loaded: loaded (/usr/lib/systemd/system/rsyslog.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2019-04-17 13:49:14 PDT; 2 weeks 6 days ago
...

Many of the files in /var/log aren’t maintained by the system logger. The only way to know for sure which ones belong to rsyslogd is to look at its configuration file.

Configuration Files

The base rsyslogd configuration file is /etc/rsyslog.conf, but you’ll find certain configurations in other directories, such as /etc/rsyslog.d.

The configuration format is a blend of traditional rules and rsyslog-specific extensions. One rule of thumb is that anything beginning with a dollar sign ($) is an extension.

User Management Files

Unix systems allow for multiple independent users. At the kernel level, users are simply numbers (user IDs).

The /etc/passwd File

The plaintext file /etc/passwd maps usernames to user IDs.

root:x:0:0:Superuser:/root:/bin/sh
######
daemon:*:1:1:daemon:/usr/sbin:/bin/sh
#or
daemon:x:2:2:daemon:/sbin:/sbin/nologin
######
bin:*:2:2:bin:/bin:/bin/sh
sys:*:3:3:sys:/dev:/bin/sh
nobody:*:65534:65534:nobody:/home:/bin/false
juser:x:3119:1000:J. Random User:/home/juser:/bin/bash
beazley:x:143:1000:David Beazley:/home/beazley:/bin/bash

The fields are as follows:

  • The username.

  • The user’s encrypted password. On most Linux systems, the password is not actually stored in the passwd file, but rather in the shadow file. Normal users do not have read permission for shadow. The second field in passwd or shadow is the encrypted password; Unix passwords are never stored as clear text.

  • An x in the second passwd file field indicates that the encrypted password is stored in the shadow file. A * indicates that the user cannot log in, and if the field is blank (that is, you see two colons in a row, like ::), no password is required to log in. (Beware of blank passwords. You should never have a user without a password.)

  • The user ID (UID), which is the user’s representation in the kernel.

  • The group ID (GID). This should be one of the numbered entries in the /etc/group file. Groups determine file permissions and little else. This group is also called the user’s primary group.

  • The user’s real name. You’ll sometimes find commas in this field, denoting room and telephone numbers.

  • The user’s home directory.

  • The user’s shell (the program that runs when the user runs a terminal session).

Special Users

The superuser (root) always has UID 0 and GID 0. Some users, such as daemon, have no login privileges. The nobody user is an underprivileged user. Some processes run as nobody because the nobody user cannot write to anything on the system.

The users that cannot log in are called pseudo-users. Although they can’t log in, the system can start processes with their user IDs. Pseudo-users such as nobody are usually created for security reasons.

The /etc/shadow File

The shadow password file /etc/shadow on a Linux system normally contains user authentication information, including the encrypted passwords and password expiration information that correspond to the users in /etc/passwd.

Regular users interact with /etc/passwd using the passwd command. By default, passwd changes the user’s password. The passwd command is an suid-root program, because only the superuser can change the /etc/passwd file.

-rwsr-xr-x. 1 root root 27832 Jan 29  2014 /usr/bin/passwd

The /etc/shells file lists the valid shell types:

/bin/sh
/bin/bash
/sbin/nologin
/usr/bin/sh
/usr/bin/bash
/usr/sbin/nologin
/bin/ksh
/bin/rksh

Because /etc/passwd is plaintext, the superuser may use any text editor to make changes. To add a user, simply add an appropriate line and create a home directory for the user; to delete, do the opposite. However, to edit the file, you’ll most likely want to use the vipw program.

Use adduser and userdel to add and remove users. Run passwd user as the superuser.
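A quick sketch (juser is a hypothetical username):

adduser juser       # create the user (useradd on some distributions)
passwd juser        # set the password, run as the superuser
userdel -r juser    # remove the user along with the home directory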

Working with Groups

Groups in Unix offer a way to share files with certain users but deny access to all others. The idea is that you can set read or write permission bits for a particular group, excluding everyone else.

The /etc/group file defines the group IDs:

root:*:0:juser
daemon:*:1:
bin:*:2:
disk:*:6:juser,beazley
nogroup:*:65534:
user:*:1000:
  • The group name.

  • The group password. This is hardly ever used, nor should you use it. Use * or any other default value.

  • The group ID (a number). The GID must be unique within the group file. This number goes into a user’s group field in that user’s /etc/passwd entry.

  • An optional list of users that belong to the group. In addition to the users listed here, users with the corresponding group ID in their passwd file entries also belong to the group.

Linux distributions often create a new group for each new user added, with the same name as the user.
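A quick sketch of a typical group workflow (webdev, juser, and /srv/www are hypothetical):

groupadd webdev             # create the group
usermod -aG webdev juser    # add juser to the group as a supplementary group
chgrp webdev /srv/www       # give the group ownership of a directory
chmod 2775 /srv/www         # group-writable; setgid so new files inherit the group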

Setting the Time

Unix machines depend on accurate timekeeping. The kernel maintains the system clock, which is the clock that is consulted when you run commands like date.

PC hardware has a battery-backed real-time clock (RTC). The RTC isn’t the best clock in the world, but it’s better than nothing. The kernel usually sets its time based on the RTC at boot time, and you can reset the system clock to the current hardware time with hwclock.
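For example, hwclock can synchronize in both directions:

hwclock --hctosys    # set the system clock from the RTC
hwclock --systohc    # write the system clock back to the RTC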

You should not try to fix the time drift with hwclock because time-based system events can get lost or mangled. Usually it’s best to keep your system time correct with a network time daemon.

Network Time

If your machine is permanently connected to the Internet, you can run a Network Time Protocol (NTP) daemon to maintain the time using a remote server. Many distributions have built-in support for an NTP daemon, but it may not be enabled by default. You might need to install an ntpd package to get it to work.

Scheduling Recurring Tasks with cron

The Unix cron service runs programs repeatedly on a fixed schedule. Most experienced administrators consider cron to be vital to the system because it can perform automatic system maintenance. For example, cron runs log file rotation utilities to ensure that your hard drive doesn’t fill up with old log files. You should know how to use cron because it’s just plain useful.

Also see cronjob in k8s doc.

You can run any program with cron at whatever times suit you. The program running through cron is called a cron job. To install a cron job, you’ll create an entry line in your crontab file, usually by running the crontab command.

For example:

15 09 * * * /home/juser/bin/spmake
  • Minute (0 through 59). The cron job above is set for minute 15.

  • Hour (0 through 23). The job above is set for the ninth hour.

  • Day of month (1 through 31).

  • Month (1 through 12).

  • Day of week (0 through 7). The numbers 0 and 7 are Sunday.

A * in any field means to match every value. The preceding example runs spmake daily because the day of month, month, and day of week fields are all filled with stars, which cron reads as “run this job every day, of every month, of every week.”

The job can also run only on the 5th and the 14th day of each month:

15 09 5,14 * * /home/juser/bin/spmake

Installing Crontab Files

Each user can have his or her own crontab file, which means that every system may have multiple crontabs, usually found in the /var/spool/cron/ directory. The crontab command installs, lists, edits, and removes a user’s crontab.

The easiest way to install a crontab is to put your crontab entries into a file and then use crontab file to install file as your current crontab.

There is a default location for every user’s crontab file, including root’s: once you create a crontab for a user, the corresponding file is placed under /var/spool/cron/.

For example, running as root, I want to set up a recurring task for user dsadm:

crontab -u dsadm -e

Then edit like this:

00 21 * * * /home/dsadm/test.sh > /tmp/cron-log 2>&1

After the job runs, go to the /tmp directory and you will see the log file.

To list dsadm’s cron jobs:

crontab -l -u dsadm

To remove dsadm’s crontab:

crontab -r -u dsadm

System Crontab Files

Linux distributions normally have an /etc/crontab file. You can also add entries here, but the format is a little different, with an extra user-name field:

# Example of job definition:
# .---------------- minute (0 - 59)
# | .------------- hour (0 - 23)
# | | .---------- day of month (1 - 31)
# | | | .------- month (1 - 12) OR jan,feb,mar,apr ...
# | | | | .---- day of week (0 - 6) (Sunday=0 or 7) OR sun,mon,tue,wed,thu,fri,sat
# | | | | |
# * * * * * user-name command to be executed
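For example, an entry with the extra user-name field might look like this (the script path is hypothetical):

42 6 * * * root /usr/local/bin/cleansystem > /dev/null 2>&1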

Understanding User IDs and User Switching

We’ve discussed how setuid programs such as sudo and su allow you to change users:

---s--x--x 1 root root 143248 May 28  2018 /usr/bin/sudo
-rwsr-xr-x 1 root root 32184 Jul 12 2018 /usr/bin/su

In reality, every process has more than one user ID. When you run a setuid program, Linux sets the effective user ID to the program’s owner during execution, but it keeps your original user ID in the real user ID.

Think of the effective user ID as the actor and the real user ID as the owner. The real user ID defines the user that can interact with the running process—most significantly, which user can kill and send signals to a process. For example, if user A starts a new process that runs as user B (based on setuid permissions), user A still owns the process and can kill it.

On normal Linux systems, most processes have the same effective user ID and real user ID. I verified this with a test.sh script run as dsadm; the euser and ruser are the same:

-rwsr-xr-x 1 root  root        56 May 11 22:17 test.sh

By default, ps and other system diagnostic programs show the effective user ID.

In the conductor container, I run many su commands; you can see here that the euser and ruser are different:

ps -eo pid,euser,ruser,comm

1574 root dsadm su
2735 root dsadm su
4535 root dsadm su

PAM

In 1995 Sun Microsystems proposed a new standard called Pluggable Authentication Modules (PAM), a system of shared libraries for authentication. To authenticate a user, an application hands the user to PAM to determine whether the user can successfully identify itself.

Because there are many kinds of authentication scenarios, PAM employs a number of dynamically loadable authentication modules. Each module performs a specific task; for example, the pam_unix.so module can check a user’s password.

PAM Configuration

You’ll normally find PAM’s application configuration files in the /etc/pam.d directory (older systems may use a single /etc/pam.conf file).

Let’s see an example:

auth       requisite     pam_shells.so

Each configuration line has three fields: function type, control argument, and module:

  • Function type. The function that a user application asks PAM to perform. Here, it’s auth, the task of authenticating the user.

  • Control argument. This setting controls what PAM does after success or failure of its action for the current line (requisite in this example).

  • Module. The authentication module that runs for this line, determining what the line actually does. Here, the pam_shells.so module checks to see whether the user’s current shell is listed in /etc/shells.

PAM configuration is detailed on the pam.conf(5) manual page:

man 5 pam.conf

Chapter 8. A Closer Look at Processes and Resource Utilization

This chapter takes you deeper into the relationships between processes, the kernel, and system resources.

Many of the tools that you see in this chapter are often thought of as performance-monitoring tools. They’re particularly helpful if your system is slowing to a crawl and you’re trying to figure out why.

Tracking Processes

The top program is often more useful than ps because it displays the current system status as well as many of the fields in a ps listing, and it updates the display every second.

You can send commands to top with keystrokes. When you run the top command:

Tasks: 382 total,   2 running, 380 sleeping,   0 stopped,   0 zombie
%Cpu(s): 1.2 us, 0.5 sy, 0.0 ni, 96.4 id, 0.4 wa, 0.0 hi, 0.2 si, 1.3 st
KiB Mem : 8009536 total, 441360 free, 930236 used, 6637940 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 6284448 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
10438 root 20 0 519148 282140 19392 S 4.6 3.5 586:19.28 kube-apiserver
10449 root 20 0 275308 80184 14584 S 4.3 1.0 516:46.86 kube-controller
9691 root 20 0 1648968 159104 30708 S 2.6 2.0 492:06.83 kubelet
10206 root 20 0 10.1g 56568 7760 S 2.0 0.7 262:08.45 etcd
19459 root 20 0 82100 53780 9392 S 1.7 0.7 209:35.52 calico-node
...

If you see a task’s %CPU larger than 100, the process is multithreaded and is using multiple cores. Press H to toggle between thread and task display; you will then see each thread and its %CPU.

Note: if you want to see memory in MB, GB, and so on, press e (task area) or E (summary area) to cycle through the units.

Then try the following keystrokes:

Spacebar: Updates the display immediately.
H: Shows threads instead of tasks.
M: Sorts by current resident memory usage.
T: Sorts by total (cumulative) CPU usage.
P: Sorts by current CPU usage (the default).
u: Displays only one user’s processes.
f: Selects different statistics to display and sort. (use the arrows to move and space to select)
?: Displays a usage summary for all top commands.

Finding Open Files with lsof

One use for this command is when a disk cannot be unmounted because (unspecified) files are in use. The listing of open files can be consulted (suitably filtered if necessary) to identify the process that is using the files.

The lsof command lists open files and the processes using them. lsof doesn’t stop at regular files; it can list network resources, dynamic libraries, pipes, and more.

For example, to display entries for open files in the /usr directory and below:

lsof /usr/*

List open files for a particular PID:

lsof -p 1623
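lsof can also filter by network port or by mount point, which is handy for the stuck-umount case above. A quick sketch (the port and path are arbitrary):

lsof -i :22    # list processes with sockets on port 22
lsof /mnt      # list open files on the filesystem mounted at /mnt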

Tracing Program Execution and System Calls

The most common use is to start a program using strace, which prints a list of system calls made by the program. This is useful if the program continually crashes, or does not behave as expected; for example using strace may reveal that the program is attempting to access a file which does not exist or cannot be read.

The strace (system call trace) and ltrace (library trace) commands can help you discover what a program attempts to do. These tools produce extraordinarily large amounts of output, but once you know what to look for, you’ll have more tools at your disposal for tracking down problems.

For example:

strace cat not_a_file

You get an error on the open("not_a_file", O_RDONLY) line:

execve("/usr/bin/cat", ["cat", "not_a_file"], [/* 23 vars */]) = 0
brk(NULL) = 0x81f000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fc85159d000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
...
...
close(3) = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
open("not_a_file", O_RDONLY) = -1 ENOENT (No such file or directory)
write(2, "cat: ", 5cat: ) = 5
write(2, "not_a_file", 10not_a_file)
...

Threads

In Linux, some processes are divided into pieces called threads.

To display the thread information in ps, add the m option. For example:

ps axm -o pid,tid,command

PID   TID   COMMAND
1891 - db2ckpwd 0
- 1891 -
1892 - db2ckpwd 0
- 1892 -
3501 - db2fmp ( ,1,0,0,0,0,0,00000000,0,0,0,0000000000000000,0000000000000000,00000000,00000000,00000000,0000000
- 3501 -
- 3502 -
- 3503 -
...

The main thread’s TID is the same as the process ID.

Introduction to Resource Monitoring

To monitor one or more specific processes over time, use the -p option to top, with this syntax:

top -p <pid>

Adjusting Process Priorities

You can change the way the kernel schedules a process in order to give the process more or less CPU time than other processes.

The kernel runs each process according to its scheduling priority, which is a number between –20 and 20, with –20 being the foremost priority.

ps axl

F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
4 1000 1 0 20 0 15120 1596 do_wai Ss ? 0:00 /bin/bash /opt/IBM/InformationServer/initScripts
4 0 1882 1 20 0 1225076 48936 futex_ Sl ? 0:00 db2wdog 0 [db2inst1]
4 1000 1884 1882 20 0 10302284 1699292 futex_ Sl ? 58:25 db2sysc 0
5 0 1890 1882 20 0 1227236 18292 do_msg S ? 0:13 db2ckpwd 0

top

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1884 db2inst1 20 0 9.8g 1.6g 1.6g S 1.3 10.4 58:25.82 db2sysc
1 db2inst1 20 0 15120 1596 1360 S 0.0 0.0 0:00.01 startcontainer.
1882 root 20 0 1225076 48936 33072 S 0.0 0.3 0:00.13 db2syscr

PR is the priority value. NI is the nice value; a higher nice value means the process is nicer, that is, more willing to give up CPU time to other processes.

Alter the nice value:

renice <value> <pid>
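For example, to make a process as nice as possible (8324 is a hypothetical PID):

renice 20 8324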

Load Averages

The load average is the average number of processes currently ready to run. Keep in mind that most processes on your system are usually waiting for input (from the keyboard, mouse, or network, for example), meaning that most processes are not ready to run and should contribute nothing to the load average. Only processes that are actually doing something affect the load average.

# uptime
... up 91 days, ... load average: 0.08, 0.03, 0.01

The three numbers are the load averages for the past 1 minute, 5 minutes, and 15 minutes, respectively. An average of only 0.01 processes have been running across all processors for the past 15 minutes.

If a load average goes up to around 1, a single process is probably using the CPU nearly all of the time. To identify that process, use the top command; the process will usually rise to the top of the display.

If you have two cores, a load average of 1 means that only one of the cores is likely active at any given time, and a load average of 2 means that both cores have just enough to do all of the time.

A high load average does not necessarily mean that your system is having trouble. A system with enough memory and I/O resources can easily handle many running processes. If your load average is high and your system still responds well, don’t panic.

However, if you sense that the system is slow and the load average is high, you might be running into memory performance problems.

Memory

The CPU has a memory management unit (MMU) that translates the virtual memory addresses used by processes into real ones. The kernel assists the MMU by breaking the memory used by processes into smaller chunks called pages.

The kernel maintains a data structure, called a page table, that contains a mapping of a process’s virtual page addresses to real page addresses in memory. As a process accesses memory, the MMU translates the virtual addresses used by the process into real addresses based on the kernel’s page table.

A user process does not actually need all of its pages to be immediately available in order to run. The kernel generally loads and allocates pages as a process needs them; this system is known as on-demand paging or just demand paging.

Page Faults

If a memory page is not ready when a process wants to use it, the process triggers a page fault.

  • MINOR PAGE FAULTS A minor page fault occurs when the desired page is actually in main memory but the MMU doesn’t know where it is. This can happen when the process requests more memory or when the MMU doesn’t have enough space to store all of the page locations for a process. In this case, the kernel tells the MMU about the page and permits the process to continue. Minor page faults aren’t such a big deal, and many occur as a process runs. Unless you need maximum performance from some memory-intensive program, you probably shouldn’t worry about them.

  • MAJOR PAGE FAULTS A major page fault occurs when the desired memory page isn’t in main memory at all, which means that the kernel must load it from the disk or some other slow storage mechanism. Some major page faults are unavoidable, such as those that occur when you load the code from disk when running a program for the first time.

Let’s see the page faults:

# /usr/bin/time netstat > /dev/null

0.05user 0.02system 0:01.74elapsed 4%CPU (0avgtext+0avgdata 2556maxresident)k
1752inputs+0outputs (3major+781minor)pagefaults 0swaps

There are 3 major page faults and 781 minor page faults when running netstat program. The major page faults occurred when the kernel needed to load the program from the disk for the first time. If you ran the command again, you probably wouldn’t get any major page faults because the kernel would have cached the pages from the disk:

# /usr/bin/time netstat > /dev/null

0.04user 0.02system 0:01.61elapsed 3%CPU (0avgtext+0avgdata 2552maxresident)k
0inputs+0outputs (0major+783minor)pagefaults 0swaps

Note that time here is not the shell built-in time command! If you run

type -a time

you will see

time is a shell keyword
time is /usr/bin/time

see this doc

If you’d rather see the number of page faults of processes as they’re running, use top or ps. When running top, press f to select additional display fields and use the spacebar to enable nMaj and nMin.

# top

PID USER %CPU PR NI VIRT RES SHR S %MEM TIME+ COMMAND nMaj nMin
1303 dsadm 1.7 20 0 4753196 82704 13412 S 0.5 158:31.05 java 0 25k
1929 dsadm 0.3 20 0 203344 2556 2116 S 0.0 2:16.80 ResTrackApp 0 1028

When using ps, you can use a custom output format to view the page faults for a particular process:

# ps -o pid,min_flt,maj_flt 1

PID MINFL MAJFL
1 2059 6

Monitoring CPU and Memory Performance

Among the many tools available to monitor system performance, the vmstat command is one of the oldest, with minimal overhead. You’ll find it handy for getting a high-level view of how often the kernel is swapping pages in and out, how busy the CPU is, and I/O utilization.

vmstat 2

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
2 0 0 174452 1064 6874152 0 0 17 55 4 0 2 2 95 0 1
2 0 0 173900 1064 6874168 0 0 0 16 10362 12869 5 3 90 0 2
3 0 0 173932 1064 6874180 0 0 0 291 9110 10761 2 1 95 1 1
0 0 0 174056 1064 6874180 0 0 0 59 9126 12447 3 3 92 0 1
0 0 0 174228 1064 6874184 0 0 0 167 7100 9601 1 1 97 0 1

The output is not easy to understand at first; you can dig deeper into it by reading the vmstat(8) manual page.

I/O Monitoring

Like vmstat and netstat (covered later), we have iostat for I/O statistics.

iostat 2 -d -p ALL

Linux 3.10.0-862.14.4.el7.x86_64 (dstest1.fyre.ibm.com) 05/21/2019 _x86_64_ (8 CPU)

Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
vda 18.12 87.72 201.61 270042131 620650624
vda1 0.00 0.00 0.00 11165 2270
vda2 13.59 87.71 201.61 270022634 620648353
dm-0 17.99 86.13 201.61 265157850 620648353
dm-1 0.06 1.58 0.00 4860836 0

This means: update every 2 seconds, show the device report only (-d), and show all partitions (-p ALL).

If you need to dig even deeper to see I/O resources used by individual processes, the iotop tool can help. Using iotop is similar to using top.

iotop

Total DISK READ: 4.76 K/s | Total DISK WRITE: 333.31 K/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
260 be/3 root 0.00 B/s 38.09 K/s 0.00 % 6.98 % [jbd2/sda1-8]
2611 be/4 juser 4.76 K/s 10.32 K/s 0.00 % 0.21 % zeitgeist-daemon
...

It shows TID (thread ID) instead of PID. PRIO (priority) indicates the I/O priority; be/3 is more important than be/4. The kernel uses the scheduling class to add more control for I/O scheduling. You’ll see three scheduling classes from iotop:

  • be Best-effort. The kernel does its best to fairly schedule I/O for this class. Most processes run under this I/O scheduling class.

  • rt Real-time. The kernel schedules any real-time I/O before any other class of I/O, no matter what.

  • idle Idle. The kernel performs I/O for this class only when there is no other I/O to be done. There is no priority level for the idle scheduling class.

Per-Process Monitoring

The pidstat utility allows you to see the resource consumption of a process over time in the style of vmstat.

# pidstat -p 27946 1

Linux 3.10.0-862.14.4.el7.x86_64 (myk8s1.fyre.ibm.com) 05/21/2019 _x86_64_ (4 CPU)

08:16:27 PM UID PID %usr %system %guest %CPU CPU Command
08:16:28 PM 1002 27946 0.00 0.00 0.00 0.00 3 tail
08:16:29 PM 1002 27946 0.00 0.00 0.00 0.00 3 tail
08:16:30 PM 1002 27946 0.00 0.00 0.00 0.00 3 tail

The CPU column tells you which CPU the process is running on.

Chapter 9. Network and Configuration

Note that the ifconfig command, as well as some of the others you’ll see later in this chapter (such as route and arp), has been technically supplanted by the newer ip command. The ip command can do more than the old commands, and it is preferable when writing scripts. However, most people still use the old commands when manually working with the network, and these commands can also be used on other versions of Unix. For this reason, we’ll use the old-style commands.

Routes and the Kernel Routing Table

Let’s see the routing table with the route command; -n means show numerical addresses instead of hostnames:

# route -n

Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 9.30.94.1 0.0.0.0 UG 0 0 0 eth1
9.30.94.0 0.0.0.0 255.255.254.0 U 0 0 0 eth1
169.254.0.0 0.0.0.0 255.255.0.0 U 1002 0 0 eth0
169.254.0.0 0.0.0.0 255.255.0.0 U 1003 0 0 eth1
172.16.0.0 0.0.0.0 255.255.0.0 U 0 0 0 eth0
172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0
192.168.0.0 0.0.0.0 255.255.255.0 U 0 0 0 *
192.168.0.2 0.0.0.0 255.255.255.255 UH 0 0 0 calib4daf4f1db0
192.168.0.3 0.0.0.0 255.255.255.255 UH 0 0 0 cali987b4d0c33f
192.168.1.0 172.16.182.156 255.255.255.0 UG 0 0 0 eth0
192.168.2.0 172.16.182.187 255.255.255.0 UG 0 0 0 eth0

The Destination column tells you a network prefix (outside network), and the Genmask column is the netmask corresponding to that network. Each network has a U under its Flags column, indicating that the route is active (“up”).

A G in the Flags column means that communication for this network must be sent through the gateway in the Gateway column; for example, traffic for the network 0.0.0.0/0 is sent through its gateway 9.30.94.1. No G in the Flags column indicates that the network is directly connected in some way.

An entry for 0.0.0.0/0 in the routing table has special significance because it matches any address on the Internet. This is the default route, and the address configured under the Gateway column (in the route -n output) in the default route is the default gateway.

Basic ICMP and DNS Tools

ping

# ping baidu.com

PING baidu.com (123.125.114.144) 56(84) bytes of data.
64 bytes from 123.125.114.144 (123.125.114.144): icmp_seq=1 ttl=40 time=212 ms
64 bytes from 123.125.114.144 (123.125.114.144): icmp_seq=2 ttl=40 time=212 ms
64 bytes from 123.125.114.144 (123.125.114.144): icmp_seq=3 ttl=40 time=212 ms
^C
--- baidu.com ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2002ms
rtt min/avg/max/mdev = 212.335/212.495/212.578/0.393 ms

56(84) bytes of data means each packet carries 56 bytes of payload (84 bytes including headers). icmp_seq is the sequence number; sometimes you will see a gap in the sequence, which usually means there’s some kind of connectivity problem. time is the round-trip time.

traceroute

One of the best things about traceroute is that it reports return trip times at each step in the route:

## -n will not do hostname lookup for IP in output
# traceroute -n google.com

traceroute to google.com (172.217.1.206), 30 hops max, 60 byte packets
1 9.30.94.3 0.626 ms 0.742 ms 0.845 ms
2 9.30.156.13 0.529 ms 0.801 ms 0.918 ms
3 9.55.129.109 0.668 ms 0.852 ms 9.55.129.105 0.515 ms
4 9.55.187.13 0.476 ms 0.255 ms 0.425 ms
5 9.55.128.6 0.326 ms 0.433 ms 0.294 ms
6 9.64.38.194 0.941 ms 0.898 ms 0.843 ms
7 9.64.3.86 18.220 ms 18.230 ms 18.229 ms
8 9.64.3.85 31.521 ms 31.527 ms 31.563 ms
9 9.17.3.35 31.803 ms 31.613 ms 31.875 ms
...

DNS and host

To find the IP address behind a domain name, use the host command:

1
2
3
4
# host www.google.com

www.google.com has address 172.217.11.228
www.google.com has IPv6 address 2607:f8b0:400f:801::2004

You can also use host in reverse: Enter an IP address instead of a hostname to try to discover the hostname behind the IP address. But don’t expect this to work reliably. Many hostnames can represent a single IP address, and DNS doesn’t know how to determine which hostname should correspond to an IP address.

Kernel Network Interfaces

Network interfaces have names that usually indicate the kind of hardware underneath, such as eth0 (the first Ethernet card in the computer) and wlan0 (a wireless interface).

eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
inet 9.30.94.85 netmask 255.255.254.0 broadcast 9.30.95.255
ether 00:20:09:1e:5e:55 txqueuelen 1000 (Ethernet)
RX packets 17164860 bytes 9046289828 (8.4 GiB)
RX errors 0 dropped 6 overruns 0 frame 0
TX packets 11669220 bytes 9566003426 (8.9 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

UP,RUNNING means this interface is active.

Resolving Hostnames

On most systems, you can override hostname lookups with the /etc/hosts file. Resolution usually checks this file first before resorting to the DNS server.

The traditional configuration file for DNS servers is /etc/resolv.conf:

## this is the search pattern:
search fyre.ibm.com. svl.ibm.com.
nameserver 172.16.200.52
nameserver 172.16.200.50

172.16.200.52 and 172.16.200.50 are the DNS server IPs.

netstat command

The netstat command is extremely important and commonly used. Usually I use netstat -tunlp; let’s dig deeper into the flags:

  • -t: show TCP connections.
  • -u: show UDP connections.
  • -n: show numerical addresses.
  • -l: show only listening sockets.
  • -p: show the PID of the owning process.

Instead of ifconfig, you can see the interfaces with:

# netstat -i

Kernel Interface table
Iface MTU RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
cali9550 1440 60880456 0 6 0 37444050 0 0 0 BMRU
cali986f 1440 0 0 0 0 0 0 0 0 BMRU
docker0 1500 0 0 0 0 0 0 0 0 BMU
eth0 1500 5606028968 0 0 0 33647018 0 0 0 BMRU
eth1 1500 60880456 0 6 0 37444050 0 0 0 BMRU
lo 65536 431426103 0 0 0 431426103 0 0 0 LRU

Instead of route -n, you can see the routing table with:

# netstat -rn

Show TCP connections (not including listening sockets):

# netstat -tn

To see how well-known ports translate into names, check the /etc/services file:

...
http 80/tcp www www-http # WorldWideWeb HTTP
http 80/udp www www-http # HyperText Transfer Protocol
http 80/sctp # HyperText Transfer Protoco
...

On Linux, only processes running as the superuser can use ports 1 through 1023. All user processes may listen on and create connections from ports 1024 and up.

I skip the rest of this chapter; the majority is conceptual.

Chapter 10. Network Applications and Services

Let’s mainly focus on some important commands here:

curl command

curl is a command-line tool to transfer data to or from a server, using any of the supported protocols (HTTP, FTP, IMAP, POP3, SCP, SFTP, SMTP, TFTP, TELNET, LDAP, or FILE). curl is powered by libcurl. This tool is preferred for automation, since it is designed to work without user interaction. curl can transfer multiple files at once.

You can refer to this article.
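A quick sketch of common usage (the URL is arbitrary):

## download a file, following redirects and keeping the remote filename
curl -L -O https://example.com/file.tar.gz
## show only the response headers
curl -I https://example.com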

Diagnostic Tools

lsof (list open files) can track open files, but it can also list the programs currently using or listening to ports. Please read more when you need this tool.

tcpdump is a command-line counterpart of Wireshark.

netcat (or nc): I used it before when developing PXEngine; we used TCP to replace the ssh connection between the conductor and compute containers. netcat can connect to remote TCP/UDP ports, specify a local port, listen on ports, scan ports, redirect standard I/O to and from network connections, and more.

I remember using nc to listen on a port on one side, then connecting to that port from the other side to transfer data.

## install
yum install -y nc
apt install -y netcat

## -l: listening mode
## -p: port
nc -l -p 1234
## client
nc localhost 1234

netcat can be used with TCP, UDP, and Unix-domain sockets.

nmap scans all ports on a machine or network of machines looking for open ports, and it lists the ports it finds.

# nmap myk8s1.fyre.ibm.com

Starting Nmap 6.40 ( http://nmap.org ) at 2019-05-22 23:33 PDT
Nmap scan report for myk8s1.fyre.ibm.com (9.30.94.85)
Host is up (0.00024s latency).
Not shown: 995 closed ports
PORT STATE SERVICE
22/tcp open ssh
111/tcp open rpcbind
179/tcp open bgp
2049/tcp open nfs
5000/tcp open upnp

Nmap done: 1 IP address (1 host up) scanned in 0.20 seconds

Chapter 11. Introduction to Shell Scripts

A shell script is a series of commands written in a file. The #! part is called a shebang.

When writing scripts and working on the command line, just remember what happens whenever the shell runs a command:

  1. Before running the command, the shell looks for variables, globs, and other substitutions and performs the substitutions if they appear.
  2. The shell passes the results of the substitutions to the command.

If you use single quotes:

grep 'r.*t' /etc/passwd

This prevents the shell from expanding the * against files in the current directory.

grep 'r.*t /etc/passwd'

This will fail, because everything wrapped in single or double quotes is treated as a single parameter.

Double quotes (") work just like single quotes, except that the shell expands variables that appear within double quotes. It will not expand globs like * in double quotes!

As I have seen before, use shift to step through the arguments passed in:

#!/bin/sh
echo $1
shift
echo $1

$# is the number of arguments passed in, often used in a loop to pick up parameters. $@ represents all of the script arguments. $$ holds the PID of the current shell.
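A tiny sketch showing these variables together:

#!/bin/sh
echo "Number of arguments: $#"
echo "All arguments: $@"
echo "PID of this shell: $$"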

Bad messages should go to standard error; redirect standard output to standard error like this:

echo $0: bad option ... 1>&2

$? holds the exit code: if you intend to use the exit code of a command, you must use or store the code immediately after running the command.

if condition

Let’s see an example; these two forms are both good:

if [ "$1" = hi ]; then
if [ x"$1" = x"hi" ]; then

Here, "" is vital, since user may not input $1, if no double quotes, it could be:

if [ = hi ]; then

in which case the test ([) command aborts immediately.

Note that what follows if is a command! That is why we have a ; before then.

So you can use other commands instead of the [ command, cool!

#!/bin/sh
if grep -q daemon /etc/passwd; then
    echo The daemon user is in the passwd file.
else
    echo There is a big problem. daemon is not in the passwd file.
fi

Let’s see && and || with test conditions:

#!/bin/sh
if [ "$1" = hi ] || [ "$1" = bye ]; then
    echo 'The first argument was "'$1'"'
fi

The -a and -o flags are the logical and and or operators in test:

[ "$1" = hi -o "$1" = ho ]

test command

There are dozens of test operations, all of which fall into three general categories: file tests, string tests, and arithmetic tests.

file tests

  • -f: regular file (returns 0)
  • -e: file exists (returns 0)
  • -s: file is not empty (returns 0)
  • -d: directory (returns 0)
  • -h: symbolic link (returns 0)

File permission tests:

  • -r: readable
  • -w: writable
  • -x: executable
  • -u: setuid
  • -g: setgid
  • -k: sticky

The test command follows symbolic links (except for the -h test). That is, if link is a symbolic link to a regular file, [ -f link ] returns an exit code of true (0).

Finally, three binary operators (tests that need two files as arguments) are used in file tests, but they’re not terribly common:

  • [ file1 -nt file2 ]: returns 0 if file1 has a newer modification date than file2
  • [ file1 -ot file2 ]: returns 0 if file1 has an older modification date than file2
  • [ file1 -ef file2 ]: returns 0 if the two files share the same inode numbers and devices

string test

=: equal
!=: not equal
-z: empty string returns 0
-n: not empty returns 0

arithmetic test

-eq: equal to
-ne: not equal to
-lt: less than
-gt: greater than
-le: less than or equal to
-ge: greater than or equal to
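
A small sketch combining an arithmetic test with $# from earlier:

#!/bin/sh
if [ $# -lt 1 ]; then
echo $0: at least one argument required 1>&2
exit 1
fi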

case condition

The case keyword forms another conditional construct that is exceptionally useful for matching strings, it can do pattern matching:

#!/bin/sh
case $1 in
bye)
echo Fine, bye.
;;
hi|hello)
echo Nice to see you.
;;
what*)
echo Whatever.
;;
*)
echo 'Huh?'
;;
esac

Each case must end with a double semicolon (;;) or you risk a syntax error.

loop

for loop:

#!/bin/sh
for str in one two three four; do
echo $str
done

while loop:

#!/bin/sh
FILE=/tmp/whiletest.$$;
echo firstline > $FILE
while tail -10 $FILE | grep -q firstline; do
# add lines to $FILE until tail -10 $FILE no longer prints "firstline"
echo -n Number of lines in $FILE:' '
wc -l $FILE | awk '{print $1}'
echo newline >> $FILE
done

rm -f $FILE

In fact, if you find that you need to use while, you should probably be using a language like awk or Python instead.

Command Substitution

You can use a command’s output as an argument to another command, or you can store the command output in a shell variable by enclosing a command in $().
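
For example:

NOW=$(date)
echo "The current time is $NOW."
# or feed one command's output straight into another:
echo "/etc/passwd has $(wc -l < /etc/passwd) lines."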

Temporary File Management

Note the mktemp command:

#!/bin/sh
TMPFILE1=$(mktemp /tmp/im1.XXXXXX)
TMPFILE2=$(mktemp /tmp/im2.XXXXXX)

cat /proc/interrupts > $TMPFILE1
sleep 2
cat /proc/interrupts > $TMPFILE2
diff $TMPFILE1 $TMPFILE2
rm -f $TMPFILE1 $TMPFILE2

If the script is aborted, the temporary files could be left behind. In the preceding example, pressing CTRL-C before the second cat command leaves a temporary file in /tmp. Avoid this if possible. Instead, use the trap command to create a signal handler to catch the signal that CTRL-C generates and remove the temporary files, as in this handler:

#!/bin/sh
TMPFILE1=$(mktemp /tmp/im1.XXXXXX)
TMPFILE2=$(mktemp /tmp/im2.XXXXXX)
trap "rm -f $TMPFILE1 $TMPFILE2; exit 1" INT

You must use exit in the handler to explicitly end script execution, or the shell will continue running as usual after running the signal handler.
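
Putting the two snippets together, a minimal complete sketch:

#!/bin/sh
TMPFILE1=$(mktemp /tmp/im1.XXXXXX)
TMPFILE2=$(mktemp /tmp/im2.XXXXXX)
# on SIGINT (CTRL-C): remove the temporary files, then explicitly exit
trap "rm -f $TMPFILE1 $TMPFILE2; exit 1" INT

cat /proc/interrupts > $TMPFILE1
sleep 2
cat /proc/interrupts > $TMPFILE2
diff $TMPFILE1 $TMPFILE2
rm -f $TMPFILE1 $TMPFILE2   # normal-path cleanup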

Note that in startcontainer.sh we also have trap and we use shell function there, now I understand!

Important Shell Script Utilities

basename

This one strips the extension from a file name:

# basename example.html .html

example

This one gets rid of the directory part of a full path:

# basename /usr/local/bin/example

example

awk

The awk command is not a simple single-purpose command; it’s actually a powerful programming language. Unfortunately, awk usage is now something of a lost art, having been replaced by larger languages such as Python.

sed

The sed program (sed stands for stream editor) is an automatic text editor that takes an input stream (a file or the standard input), alters it according to some expression, and prints the results to standard output.
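
A couple of typical one-liners (file.txt is a hypothetical input; results go to standard output, the file itself is untouched):

sed 's/old/new/g' file.txt   # replace every old with new
sed -n '1,5p' file.txt       # print only lines 1 through 5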

expr

The expr command is a clumsy, slow way of doing math. If you find yourself using it frequently, you should probably be using a language like Python instead of a shell script.
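
Compare the two ways of doing the same addition; the shell's own $(( )) arithmetic expansion avoids forking an external process:

expr 1 + 2        # prints 3, runs the external expr program
echo $((1 + 2))   # prints 3 with shell built-in arithmetic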

Subshells

A subshell is an entirely new shell process that you can create just to run a command or two. The new shell has a copy of the original shell's environment, and when the new shell exits, any changes you made to its shell environment disappear, leaving the initial shell to run as normal.

Using a subshell to make a single-use alteration to an environment variable is a common task:

# (PATH=/usr/confusing:$PATH; ./runprogram.sh)

Chapter 12. Moving Files Across the Network

Quick copy via browser

Go to the target directory and run:

python -m SimpleHTTPServer

This usually opens port 8000 on your machine; then on another machine, open:

# use ifconfig to check the source machine IP
192.168.1.29:8000

You can see the directory content there.
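
Note that SimpleHTTPServer is the Python 2 module name; with Python 3 the equivalent is:

python3 -m http.server        # also defaults to port 8000
python3 -m http.server 9000   # or pick an explicit port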

rsync

Actually, you can first enable Mac SSH access and then use rsync to back up files: System Preferences -> Sharing -> check Remote Login.

To get rsync working between two hosts, the rsync program must be installed on both the source and destination, and you’ll need a way to access one machine from the other.

Copy files to remote home:

rsync files remote:
rsync files user@remote:

If rsync isn’t in the remote path but is on the system, use --rsync-path=path to manually specify its location.

Unless you supply extra options, rsync copies only files. You will see:

skipping directory xxx

To transfer entire directory hierarchies, complete with symbolic links, permissions, modes, and devices, use the -a option:

rsync -nva dir user@remote:

-n: dry-run; this is vital when you are not sure what will happen. -v: verbose mode

To make an exact replica of the source directory, you must delete files in the destination directory that do not exist in the source directory:

rsync -v --delete -a dir user@remote:

Please use -n (dry-run) to see what will be deleted before performing the command.

Be particularly careful with a trailing slash after dir:

rsync -a dir/ user@remote:dest

This will copy all files under dir into the dest folder on the remote, instead of copying dir itself into dest.

You can also use --exclude=, --exclude-from=, and --include= in the command.

To speed operation, rsync uses a quick check to determine whether any files on the transfer source are already on the destination. The quick check uses a combination of the file size and its last-modified date.

When the files on the source side are not identical to the files on the destination side, rsync transfers the source files and overwrites any files that exist on the remote side. The default behavior may be inadequate, though, because you may need additional reassurance that files are indeed the same before skipping over them in transfers, or you may want to put in some extra safeguards.

  • --checksum (abbreviation: -c) Compute checksums (mostly unique signatures) of the files to see if they’re the same. This consumes additional I/O and CPU resources during transfers, but if you’re dealing with sensitive data or files that often have uniform sizes, this option is a must. (This will focus on file content, not date stamp)

  • --ignore-existing Doesn’t clobber files already on the target side.

  • --backup (abbreviation: -b) Doesn’t clobber files already on the target but rather renames these existing files by adding a ~ suffix to their names before transferring the new files.

  • --suffix=s Changes the suffix used with --backup from ~ to s.

  • --update (abbreviation: -u) Doesn’t clobber any file on the target that has a later date than the corresponding file on the source.
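
For example, combining a few of the options above (a sketch; run with -n first to preview):

rsync -nva --update dir user@remote:   # preview, skipping files that are newer on the remote
rsync -va -c dir user@remote:          # compare by checksum instead of size/date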

You can also compress the data during transfer:

rsync -az dir user@remote:

You can also reverse the process:

rsync -a user@remote:dir dest

The rest of this chapter covers Samba for file sharing; I skip it.

Chapter 13. User Environments

Startup files play an important role at this point, because they set defaults for the shell and other interactive programs. They determine how the system behaves when a user logs in.

I see vi theme config in the ~/.bashrc file.

The Command Path

The most important part of any shell startup file is the command path. The path should cover the directories that contain every application of interest to a regular user. At the very least, the path should contain these components, in order:

/usr/local/bin
/usr/bin
/bin

If an application lives in another directory, create a symbolic link to it in /usr/local/bin or in your own bin folder.

The prompt

I never use this so far; usually the prompt shows hostname, username, current directory, and a sign ($ or #). You can change the color and more.

Alias

This is in common use; sometimes I use shell functions too.

Permission mask

It depends on your needs:

umask 022/077
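
For reference, what each mask yields for newly created files and directories:

umask 022   # files 644 (rw-r--r--), directories 755 (rwxr-xr-x)
umask 077   # files 600 (rw-------), directories 700 (rwx------)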

Startup file order

These startup files are used to create the environment. Each script has a specific use and affects the login environment differently. Every subsequent script executed can override the values assigned by previous scripts.

The two main shell instance types are interactive and noninteractive, but of those, only interactive shells are of interest because noninteractive shells (such as those that run shell scripts) usually don’t read any startup files.

Interactive shells are the ones you use to run commands from a terminal; they can be classified as login or non-login.

I know there are lots of startup files under each user's home directory and in other system folders; how do they take effect, and in what order? Reference: Difference between Login shell and Non login shell

Logging in remotely with SSH also gives you a login shell.

You can tell if a shell is a login shell by running echo $0; if the first character is a -, the shell’s a login shell.
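
For example (shopt is a bash built-in, so the second line is bash-only):

echo $0                                                # "-bash" = login shell, "bash" = non-login
shopt -q login_shell && echo login || echo non-login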

When Bash is invoked as a Login shell:

  1. Login process calls /etc/profile
  2. /etc/profile calls the scripts in /etc/profile.d/
  3. Login process calls ~/.bash_profile, ~/.bash_login and ~/.profile. running only the first one that it sees.

Login shells are created by explicitly requesting a login. Examples: # su - | # su -l | # su --login | # su USERNAME - | # su -l USERNAME | # su --login USERNAME | # sudo -i

When bash is invoked as a non-login shell:

  1. Non-login process(shell) calls /etc/bashrc
  2. then calls ~/.bashrc

Non-login shells are created using the following command syntax. Examples: # su | # su USERNAME

Note that if I run bash or sh or csh in a terminal, it gives me a new bare prompt without my user profile or settings…

It seems that with a non-login invocation like su dsadm, the exported env vars are still there in the environment scope; I think the reason is that it's not a login shell, so it still uses the current environment. But if you run su - dsadm, they are gone.

First, understand which hardlink problems softlinks solve:

  • Hard links cannot span physical devices or partitions.
  • Hard links cannot reference directories, only files.

Hard links to the same file are identified by the same inode value; if a file has multiple hard links, the file is deleted only after all of them (in any order) are removed. You can use stat <file> to check the inode and hard link count of a file.
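
A quick sketch with hypothetical file names:

echo data > file
ln file hardlink   # no -s: creates a hard link
stat file          # shows "Links: 2" and the same inode as hardlink
rm file            # the data is still reachable through hardlink
rm hardlink        # the last link is gone, now the data is freed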

A symbolic link, also known as a symlink or a soft link, is a special kind of file (entry) that points to the actual file or directory on a disk (like a shortcut in Windows).

Symbolic links are used all the time to link libraries and often used to link files and folders on mounted NFS (Network File System) shares.

Generally, the ln syntax is similar to the cp or mv syntax:

cp source destination
ln -s source destination

But if the destination is a directory, the soft link will be created inside that directory.

Create Softlink

For example, to create a symbolic link to a file or directory:

-s: create a soft link
-n: treat LINK_NAME as a normal file if it is a symbolic link to a directory. That is, if LINK_NAME already exists and is a softlink to a directory, it is still treated as a plain file, so the new link will not be created inside the directory that softlink points to; usually combined with -f to replace an existing softlink.
-f: remove the existing link first; otherwise, if the link already exists, you get an error like this:

# you need to remove the link first
# but with -f, no need
ln: failed to create symbolic link ‘xxxx’: File exists

The link path can be absolute or relative; relative is often more desirable since the path may change. The softlink size is the length of the link path, as seen in ls -l:

# link a file
ln -nfs <path to file> <path to link>
# link a directory
ln -nfs <path to dir> <path to link>

Delete Softlink

There are 2 ways to undo or delete the soft link:

unlink <link name>
rm [-rf] <link name>

Note that the rm command just removes the link; it will not delete the link target.

Broken Softlink

Find broken symbolic link:

find . -xtype l

If you want to delete in one go:

find . -xtype l -delete

A bit of explanation: -xtype l matches broken links, because it tests the type of the file the link points to (unlike -type); -delete deletes the matches directly, with no need for xargs or -exec.

Ownership of Softlink

How to change ownership from symbolic links? Change permissions for a symbolic link

On most systems, softlink ownership does not matter; usually the link permission is 777 (e.g., lrwxrwxrwx), and when the link is used, the permissions of the link target are checked.

To change the ownership or permission of the softlink itself, use chown/chmod -h <link>; without -h, chown/chmod changes the ownership of the link target.

Resources:

SymLink – HowTo: Create a Symbolic Link – Linux How can I find broken symlinks How to delete broken symlinks in one go?

Daily

Pushback

Pattern: common ground -> issue -> alternative

Create pushback

When the advice is not right to you:

I definitely see the value in doing x, and agree with you that ...

However, in past experience x rarely gave any good result because ...

For the sake of time, how about we focus on y instead?

If the advice does not align with stakeholder’s goal:

I agree with you that approach x has its benefits such as ...

but when I discussed those points with the project manager, he seemed to be
pretty committed to approach y because...

I see arguments on both sides, but to avoid complications maybe we should go
with approach y, I **am concerned that** too much back and forth will delay the
project timeline.

If one still insists, ask:

why do you feel strongly about this?
what's the **rationale** behind your point?

Limited bandwidth but new task comes from others:

Thanks for telling me about this project, it is indeed relevant to our work on x.

But I am not sure if it is realistic with my workload because ...
I have other priorities at the moment.
I am expecting x to happen soon, so the next few weeks are not looking good.
I have a full plate.
I am **maxed out** on what I can take.

I don't think I will have time to pick this up soon; I'll unassign it for now in
case someone else has cycles to pick it up (or until I have cycles later to pick
it up)

Since I probably won't be able to **commit too much time**, would you mind asking
y to lead/drive this one?
Do you agree it makes more sense for y to take the lead on this one as y spent
a month on it already?
Would you mind if I just focus on a specific part of the project?
Any chance we can postpone the project?
let me double check with my manager, but I really don't expect anything can be
**pushed** on my side.

Reprioritize x over y on limited bandwidth:

I understand that x is important and urgent

Assuming everything else goes as planned, I expect it to take a week, leaving
no room for y.

Would you mind if we push y to next week?

A more confrontational case, when a colleague throws his weight around (bosses people around); to make them back off:

Thank you for your suggestions, they will be definitely taken into account in
future development.

However, pls keep in mind the current method was extensively discussed and
collectively decided, and any proposal will require sign-offs from all relevant
teams.

Pls feel free to reach out to those teams and kick off the process if needed.

Defend pushback

Pattern: create common interests/share credits

Move from "no, because" -> "yes, if" (what you need in order to do it), to get more concrete info and avoid being perfunctory /pərˈfʌŋktəri/ or ambiguous (ambiguity /ˌæmbɪˈɡjuːəti/)

We understand this is a difficult thing to do.

But if you were to make it happen, can you estimate and quantify how much work
is needed? Doable or not, we should have an idea.

We might indeed go another route if that makes more sense in comparison.

Deal with info gap, overstating, exaggeration:

Could you please specify what the issue is and help us quantify what you need to
make it work?

I believe this topic will be discussed in x meeting, and we would like to have a
good answer when it is brought up.

This is getting attention from senior/leadership and we would like to prepare a
solid answer in case it comes up again.

Challenge in meeting

Pattern: Disclaimer -> invitation -> buffer(for who will jump in)

weigh in (influence) / chime /tʃaɪm/ in (interject); throw one's weight around (boss people around) / throw one's weight behind (back fully, stronger than support, used for seniors and leaders)

I haven't been as close to that topic as some of my teammates, so x, please feel
free to jump in.

I'm not as familiar with this topic as some of my colleagues. X, please feel
free to jump in.

My understanding is that ...
I haven't spent as much time as I would like on this topic, but I know x and y
are the experts on this so please feel free to weigh in.

What I learned so far is that...
I still need to bring this up to **a few people** to get a **more complete picture**, if
anyone has a better idea please chime in.

My current understanding is that...
This was only discussed briefly last time and **things are constantly changing**,
so x please correct me if I am wrong.

What I remember from the last meeting is that...
I have a **rough idea** after talking to x about the potential concerns, and x pls
add more details to my interpretation.

What I understood is that ..., x would you like to share your thoughts?

If no one can help:

I'm not sure **off the top of my head**(without careful thought or investigation), 
can I **double check and get back to you**?

I need some time to **organize** my thoughts a little, can we come back to this
later?

Old Work Item

It concluded its run in May 2025.

It’s been a while since I last worked on this project.

It’s been over half a year since I last contributed to this project.

Code Ramp up

I’m having trouble tracing the flow of this code. Can you guide me through how this particular block works?

Where is the entry point for this feature or workflow in the code?

let’s trace the execution flow of this function

We can step through this code line by line to see what’s happening

Let’s follow the data as it moves through the system.

Let’s debug this issue by inspecting the variable values at each step.

How does this class interact with the rest of the system?

Which part of the code handles the communication between these modules?

I’m trying to track this function’s execution path. What are the key entry and exit points?

Lack of Context

I realize I don’t have enough technical context yet to ask the right questions.

I am still getting up to speed on this project, so I’m mostly here to actively listen.

I am ramping up, but I’m starting to get a good sense of the system by making small contributions to some tasks: It conveys that you’re in the process of learning and getting up to speed on a new project or role, and that you may not be fully productive yet.

I am not in a position to object or offer any input, due to my lack of background.

Understand Big picture

What is the main goal of the project/system? (why) How does it fit into the overall architecture/product? (what are the new components?) What are the biggest challenges or risks we foresee? (priorities) Are there any similar projects or systems we can learn from? (leverage)

Understand technical details

Can you explain the data flow or how the control plane gets involved? (how things work) What technologies are used and why were they chosen? (tech insight) What are the key performance indicators (KPIs) for the system? Are there any existing design documents or specifications I can review? (get up to speed)

Hands-on

Can you walk me through the deployment process for this component? How does this component interact with other services or APIs? What are the testing strategies and how can I contribute to them? Are there any specific tools or frameworks used for development or debugging?

KT

Explain sth like I’m Five.

it is overly simplified and there's a lot more to the technical details; the gist of it is xxxx

When lost in the KT

I'm starting to feel this session will be very helpful and I am a bit overwhelmed; can we move to a meeting room so I can record this knowledge presentation for my digestion and later revisits?

I appreciate the overview; I understand the high-level goal and overall context, but I am still a bit fuzzy on some of the technical details. To make sure I am on the right track, could we dive deeper into xxx? For example, maybe draw something very specific (fake data in a table) to help me visualize and understand the interaction better.

I want to highlight the knowledge gap, as these are all new to me and I am eager to learn more about them. It would be helpful for me to check the code and run a codelab on critical components if possible, so I can come prepared with more specific questions.

I’d like to take some time to review what we discussed, and I might have more specific questions afterward. Would it be possible to schedule a follow-up or share any documentation I can review in the meantime?

Since this is a new domain /dəˈmeɪn/ for me, I'll need some time to absorb everything.

As this is a new area for me, I’m finding it a bit challenging to fully grasp all the technical details right away. I’d like to dig into this further to get a better understanding and come back with more targeted questions. Is there any additional information or documentation that could help me get up to speed?

How to control the meeting

Bring the topic back:

let's hold off on that for a minute, we will come back to it soon.

that's a great point, but can we save that for a different meeting where we can
discuss this in detail? We need to focus on our main issue now.

let's take it offline

Move to another topic:

Let's move on to the question about ...
let's put this aside for now, I propose we move on/get to another subject.
we can now shift/switch gears and talk about ...

For unimportant points:

Since we are running short on time, let's quickly go through a few major points.

Just to do a quick time check, with 10 min left we should run through the remaining
points.

Let's spend a couple of mins on these first few points so that we can concentrate
on the last point.

For important points:

can you pls elaborate on that?
can you pls expand on that?

while we are on this topic/Speaking of which, can you pls talk a bit more about...

Defend blame game

Shifting responsibility, passing blame: shift the blame / pass the buck / blame game

  • Cherry picking: selecting only the most favorable facts and dodging the unfavorable ones; somewhat negative
  • Spinning facts: interpreting facts with a bias (toward a favorable one); misleading

Echo (agree along):

We definitely see your point on xxx / thanks for pointing it out

In some sense we agree with your point because ...
there is some truth in what you said about

To deal with cherry picking

there are other contributing factors here, and we would like to highlight ...

this does not show us the full picture/
this is not the whole picture because we are missing ...

we cannot overlook the importance of xxx

that's an isolated data point and does not tell the whole story
(a single case does not represent the whole)

things are more complicated/nuanced than that/than how they are being described
let's not oversimplify things

First understand the word "nuance":

  • things are nuanced: call to look deeper, consider multiple perspectives, and avoid oversimplified conclusions or solutions
  • has a nuanced understanding of xx: knows the ins and outs of xx

To deal with spin:

I tend to interpret this a different way/
I'm leaning towards a different interpretation of what happened

we are putting too much emphasis on xxx
(making a mountain out of a molehill)

we found strong evidence to believe this is not due to xx, but rather due to xxx

while it is tempting to believe that xxx, I think we need to be more realistic
about it.

Last, give action items.

Discussion Wrap-up

I’ll make sure to follow up on the action items we discussed.

I’ll keep you updated on the progress.

I’ll be sure to keep you posted, have a great rest of the day.

Question about project

example: the capacity BM project on hold

Customer Empathy

Customer Empathy refers to the ability to understand and share the feelings, needs, and pain points of customers when they interact with products or services.

For example, you can say:

We just showed customer empathy when they didn't want to follow the compliance.

We need to approach this with more customer empathy, simplifying the onboarding
process and making it more intuitive.

End meeting

  • Thanks everyone, that’s all for now, Let’s follow up on these action items and meet again on xxx.

  • Thanks for the productive discussion, I need to head back to work now. I’ll follow up on the action items we discussed.

  • I appreciate everyone’s input, I’ll get started on xxx and reach out if I have any questions.

  • We hope you had a blast(an enjoyable experience) with all the content the team prepared.

Lead time

Lead time refers to the amount of time between the initiation of a process and its completion. In various contexts, especially in project management, supply chain, and manufacturing, lead time can mean the time it takes from the start of a process to the delivery of a final product or service.

First-party Service

Refers to a service that is developed, managed, and offered by the cloud provider itself.

Third-party Service

Services or apps offered by companies other than the cloud provider, often running on top of the cloud.

Management/control/data plane

In terms of cloud computing

  • Management Plane: overall management and administration of the cloud environment; provides tools for users to interact with the cloud. Examples: web consoles (AWS Management Console, Azure portal), CLIs (AWS CLI, Azure CLI), APIs.
  • Control Plane: makes decisions about how data flows through the network; handles routing, security policies, and other network functions. Examples: SDN controllers, load balancers, firewalls.
  • Data Plane: where the actual data processing and forwarding happens; executes computations, stores data, and moves data packets. Examples: virtual machines (VMs), storage services, databases.

Dial in

He is not on.

I can barely hear you.

(didn't hear clearly) Sorry, I missed some points. I didn't hear you. Is your voice breaking up?

Who do we have on the call? Do you want to go first?

This is from SVL; we have x1, x2 and myself, and we are waiting for x3 to dial /daɪl/ in. Only three of us are here.

we are waiting for more folks to join.

xx is on her way; I even saw her.

Meeting

Too much info for me to grasp, xxxx

We should lay out the current playbook so that they can really see the gain.

Am I the only one confused?

I’m just stating that we need to lay out the switchover to another plan in a way that makes them understand the switch is straightforward.

I don’t know it offhand (not immediately available in one's memory)

minority opinion: held by a smaller number of people.

Are there new service offerings on the horizon? This shows your forward-thinking mindset and desire to contribute to innovation.

Are there specific features or less-known capabilities of our tools?

I don’t want to get sidetracked, I want to stay on track

Getting everybody to agree is hard

Making this offline in lieu of (instead of) online, as it is the Quiet week

Did they freeze? (video stuck/disconnected)

I have unpredictable/non-stop meetings today.

we don’t always get things right at the beginning, so we need to evolve it.

we need to do sth to fix/ease the pain.

any questions or anything we are missing?

I think it’s a great start, it covers a lot of content, awesome.

besides what we have already gone through, can you go back to the xxx slide?

for steps 2-5, how long does that typically take? It doesn't seem to be labor-intensive work

Thanks for your quick turnaround: the time taken to complete a request and get back

It is fungible/'fʌndʒəbl/: replaceable by another identical item;

can you pls weigh in on this item, as it is blocking our development until we get an answer.

open to any ideas folks have about how that could work

they are individual users, not company users

I just want to send a reminder to everyone who is working on new features or modifying old ones, when you introduce or modify A resource please ensure B is met from the get-go. This is also true for any new data sets / objects introduced in Q stack and P stack. B is a mandatory compliance and we get flagged each time we violate it.

it could also result in a situation where manually deleting data to fix the violation is not in our control and we have to reach out to partner teams for it which adds additional toil on the folks who have to deal with these violations.

Please ensure you have built a mechanism to handle X for all your resources.

For questions/doubts/explanations, please do not hesitate /ˈhezɪteɪt/ to reach out to xxx.

Since we are a pretty small team, we can work closely.

I will set up another call early next week to converge on a proposed solution

I don’t remember that any more, and it is in a messy area with many experiments

I will give you my 2 cents on xxxx.

shout/ʃaʊt/ out to xxx: thanks to

sorry, I must have misspoke (misspeak) / misread!!!

We are setting up a war room chat and waiting for green light for rollout.

This time does not work for me.

Get started in a minute or 2, just give others the opportunity to join the call.

Why don’t we get started? We have a bunch of material to cover and we need to leave time for questions.

  1. Introduce yourself
  2. Articulate High level design and questions
  • feel free to interrupt if you have any questions.
  • let me know if you have questions so far.
  • I will be happy to answer your questions.
  3. If you can, provide code fragments /ˈfræɡmənt/, resources
  4. Chat normally
  5. Bye, close the meeting

Something came up at the last minute, sorry (to cancel the meeting)

he is running late, let's wait 5 more minutes; he has something unplanned /ˌʌnˈplænd/ …

Any questions before we close (end the meeting).

Describe Problem

owing to/due to the important customer issue we’ve observed in region A and multiple ongoing stabilization efforts, we’ve held back from upgrading to 1.69 in the region.

please create a follow up ticket to track xxx

this pg is stocked out (out of stock); another still has nodes in stock

the calls are fairly sparse/spɑːrs/ on xx side.

Note how the contrast is linked, with no linking verb or preposition: I think this slipped past regression test because we don't have a long-standing PC in stg; we should have a PC that doesn't get deleted each regression run, which we run expansion/contraction on.

this may have ripple effect in other projects in-progress within SVL.

some obscure problems came up.

the question is less about our individual role and more about how it fits into the e2e process, and being able to go through the process during the testing as opposed to just seeing the use case in the document.

the playbook will be one-time, at the time of row creation.

that’s where the role of ops comes in, or we bring the responsibility to the network team; this is a responsibility shift we should be aware of, and it is happening for the very first time.

the pg pool update is more dynamic.

that understanding we could get from Nixon.

I just think we need to go through some cycles of this to be comfortable with it and so far we just reviewed the use cases in the document.

we are open to getting involved in the testing wherever you see fit.

some of the files must have the setuid bit set in order to function properly.

Jaijeet can take the topic.

Kannu covered a lot we did last week.

(Instead of always saying "I don't know" 😂) It's unknown to me. It's a complete black box to me. I don't have an answer to this question. I don't have answers to all of these now. I cannot make any promises now. It's hard to give a binary yes-or-no answer.

I am not qualified to do that. Here’s the document to figure it out, can you try it for me?

I ran into the same issue.

I don’t want to clutter/mess up/screw up the cluster…

Is there any workaround for this? Run something without killing my cluster?

let us talk this over on Monday - seems convoluted (hard to follow).

not sure whether it’s possible or not

So what happens right now … what is happening is that … How come the output file is missing …

Let me know if there are any issues. What is your opinion, Badal?

Elaborate the context; open a git issue if necessary. We need to understand the problem exactly (without discrepancy /dɪsˈkrepənsi/).

I would say don’t do that. we do have this fix. I will take a look at it today.

They didn’t articulate /ɑːrˈtɪkjuleɪt/ what needs to be tested.

I haven’t got time to sit down with Deep to discuss…

That’s what I am discussing with you. We don’t have anything to lose.

we are still facing the issue. Something is failing intermittently; it's hard to figure out what the problem is.

Network hiccups/'hikʌp/, glitch/ɡlɪtʃ/, transient issue, self-recovered

What am I supposed to conclude?

I take what I said back again

I am still struggling with this issue myself.

Nothing is working, we need to engage UI team, API team.

I am able to open the defect (ticket/issue) but not able to assign.

It is much easier said than done.

It is a big list, but by no means is it an exhaustive list

Make sure that it stays a useful tool and doesn’t become a maintenance headache/nightmare.

check out(pull)/in(push) the code (github).

Other things are more or less similar. There are a few notable differences from …

go/read/step through … sth is populated from … (one thing is filled in from another)

analogy /əˈnælədʒi/ to sth: draw a comparison to sth. After updates, take the appropriate next steps.

This is an evolving project. As such (for that reason), be sure to consult its GitHub repository to find the latest installation instructions

It is commonly understood as the elapsed time or lag time between a request and its completion.

Let’s give it some soak/soʊk/ time.

I don’t get your question? You get the point!

From the technical perspective, … As opposed to running locally using docker compose, use kubernetes right away so you can spot the problems early.

Sorry for the confusion, …

If there is no task with high variety, I can finish it by …

we have some technical/engineering difficulties/complexity.

Progress

Please don’t undo the changes we are trying to do

Based on the meeting today, I’m holding onto this until we’re clear on what we want to do around xxx

I’ve been swamped the last couple of days and have fallen a bit behind.

Let me merge those CLs into my client and I am going to rebuild my env.

your CL is rock solid!

I have some tickets that could spill over to the next sprint

there is a slowdown as I am oncall this week. We lost some bandwidth due to oncall in the last 2 weeks.

This task will be put on me; the other task, anyone can pick up.

give me another couple of days and we should have an answer.

starting this thread to close the external dependency for…

I sent the message over the holiday because it was convenient for me; you are not obligated /ˈɑːblɪɡeɪtɪd/ to respond on a holiday.

please let us know if there are any concerns or deviation from xxx: the action of departing from an established course or accepted standard

not much movement since last week, let me ping people.

Ack, give me 1 hour please, in the middle of live debugging.

I have some pending items on my plate.

It’s overkill for me.

I was primarily working on … I would suggest…

I didn’t make much progress yesterday.

I am going to mimic what you did … I will continue on the effort to …

we need to do that standalone … I haven’t started, it will not take too long …

This is an ad hoc task. (Ad hoc tasks are work items that can be created unprepared and are not initially part of a modeled process flow)

Anything else on your side?

That’s my status. That’s all on my side.

(Asking when something will be done) When do you think you can put these in place? When do you expect to have a draft for review?

targeting/ETA Monday EOD. I am going to close it / push (slip) it to the next sprint

definitely (clearly, without doubt)

This is our major blocker. We plan to finish it by 8/28, or the schedule needs to be extended. Please let me know ASAP as I have to report to …

I am done, can I go/drop? The things left for me are … it’s not urgent.

I will let someone know and ask … so we get immediate attention. more or less the same/similar Is this still something that needs attention? It is unclear from the comments. Please advise.

Also DM(direct message) me the cluster info if this still needs attention.

I need to go back and see all the notes. go over your notes

There may be some potential issues that we cannot see right now. To be complete, let me mention …

XX seems to be waiting on some inputs. Do you think a discuss with XX will help?

I am not sure what we need to proceed here.

Let’s put the debug details aside (set them aside) and proceed with … For now we leave the … out until we figure it out.

Meanwhile (in parallel): doing something at the same time

Not sure how far you got with debugging? I need some time, since I am not admin.

Most of my time is taken up doing sth… could you please help with the … I cannot find …

I am actively working on x, y and c and I don't have bandwidth to do … I have taken over the ownership of … I am not fully aware of the process.

its structure makes absolutely no sense to a new user.

it has been a while since I checked it… (pattern for having done something some time ago) = it's been a while

since then, … it is on the roadmap: it is part of a planned strategy or schedule for development or implementation.

Office Talk

Hi team, just a heads-up, I will be OOO from sept 8th to 24th, inclusive. Please DM if anything urgent meanwhile. thanks

We are also committed to avoiding unnecessary bureaucracy as we adopt AI.

It seems like an onerous/ˈōnərəs/ requirement.

dogfooding our new product will help us understand the problems.

I believe this is long overdue, but huge thanks for helping me!

We kept crossing paths today.

I am extremely skeptical, typically the last person to adopt new technology: the last adopter

where are your projections coming from? (an estimate or forecast of a future situation or trend)

waypoint: a stopping place on a journey

ramifications of imminent/ˈɪmɪnənt/ improvements of AI developments: a consequence of an action or event, especially when complex or unwelcome

bespoke pattern: made for a particular customer or user

ROI (return on investment) is uneven when you use AI tools

The inflection point (turning point) of AI development has passed

we need to be upskilling: improving our skills

We will phase out the old framework: withdraw sth gradually

You will get the hang /hæŋ/ of it: learn the knack

The operation fails for several reasons, not least (= particularly) because of the change landed without unit tests.

I am not familiar with the system, so I am afraid the change might have unforeseen repercussions /ˌriːpərˈkʌʃn/: negative consequences.

This way, we can have one spot for all the ops operations: single location or position

we can still get by and have acceptable performance: survive with minimal resources, manage to pass.

your hunch/hʌntʃ/ was right: a feeling or guess based on intuition rather than known facts

it is after/off hours

he will be on leave next week

Sorry to bug you

Do you go by xxx: is used when asking someone if they prefer to be called or known by a particular name.

examine/outline sth, iterating over time (repeating the cycle)… sth is discernible /dɪˈsɜːrnəbl/ in … (there are traces to follow)

It seems we have different readings/understandings on … I will sync up / sit with you later… We are on the same page now. Hopefully we can walk through the doc, fill in the gaps, and identify any others.

I’m not familiar with this… I’ll have a look later today and will let you know if I figure out how it works.

Pushing someone firmly: Sorry for the push. When you have some time, please do check and let us know. Please let us know. Any luck with xxx?

I'll just stop by. Can you come over?

Hi chengdol, can you come to my office for a sec?

My bad for not being clear; I was specifically asking about … Apologies for the long delay…

One quick question, …

if you have bandwidth you can start to work on …

Can you give me a quick overview about …

Distracted by the conversations/oncalls … will find a quiet room to dial in

Thanks, this is a big call. we have a hard stop at 11:00 AM (for meeting)

you raised/gave a bunch of information…

can I drop off (hang up) or do I need to stay here?

Is the system back online? Getting the system up today.

users are allowed to elevate their privileges to root. we don’t want to escalate this issue. we will raise this question to XX team and get approval for not using it.

I got lost at this part… the … part feels a little bit abstract to me … please don't deviate from my question. I don't know whether … will fit in.

Makes sense? He can leverage me when things are not going well.

Never mind, I got what I needed (when you find the answer yourself before anyone replies to your question)

who may know better about this.

(When someone says sorry) That's OK.

How can I get plugged/plʌɡ/ into the project/team/community?

Get ourselves educated. This is a necessity.

Use the navigation /ˌnævɪˈɡeɪʃn/ on the left (the left-hand menu) to …

Just in case the above link vanishes some day, I am capturing the main piece of the solution below.

Sorry if it’s off topic, …

A rather unusual situation, …

kubernetes and docker will part ways (split up and go separate directions)

off hours, in-hours, work time: describing working vs. non-working time.

Issue

Were you able to get around the issue?

I don’t have a sense of xxx

They don’t know what they don’t know.

We got this feature request which I believe that falls in the charter of your team. I have done some initial digging in and added my notes to the ticket.

it sounds very unsatisfactory

The test is worthwhile

This gives me more exposure to understand the project

Presumably (based on inference) it was caused by xxx

I did this inadvertently(without intention)

are you able to do an oncall handoff call?

I am not able to get into the technical side.

any issue seen in other regions because of the same trigger?

we have a path to recover the inconsistent node state.

I will disengage myself from this issue.

It is just a lab, so security is not that paramount/ˈpærəmaʊnt/ here.

I will spend my part time on it

we can follow up with you on that.

could you please suggest steps to remediate?

please advise?

It might have slipped/slɪp/ from my TODO

We will not be able to immediately pick this up, I will add it to our backlog/ hotlist.

I wonder what the implications might be for us?

This issue doesn’t appear to be related to xxx, but an underlying xxx issue with networking…

We should investigate why the network performance is so bad.

This likely also explains some of the poor performance seen for the NFS mounts.

I’m going to remove my name from this one as the software appears(seems) to be functioning as designed, but is limited by the underlying infrastructure.

If additional information comes up and anything else you want me to look at let me know.

Per previous discussion, assigning this to you for now.

Not mission critical.

xxx incurs/introduces overhead.

Provide context and a shared language to describe …

pinpoint the problem.

There is a high likelihood that …

can easily carry out the tasks on their own.

On the flip side, … On the contrary/ˈkɑːntreri/, … in contrast, … Conversely, …

One-off operation: one-time

Countering security breaches, reliable defenses.

(When you didn't understand what the other person said:) What did you say? Sorry, I don't come from this background. I am not in the context… It is not clear what you are asking.

Sick

I was “under the weather” (feeling unwell) yesterday and could not make it to our call. Thanks for capturing this.

AFK for 1 hour.

have some personal errands/ˈerənd/

Departure for doctor appointment, will be off for 4 hours

Discussion/Analysis

Think out loud, (then give your idea)

I haven’t looked too closely at it, xx

I tried to go through the whole thread; here is a high level summary of the situation.

of course there are some points you know better than anyone else, and I will definitely ask you when they come up.

I’d also recommend that we don’t start the implementation until the design is approved to avoid throw-away work

what you are suggesting holds good for xxx

This proposal has been discarded.

TBH, I also only have limited visibility on legal side.

will we lose the customer's trust?

do all these pieces of work have to proceed sequentially? Or can we do the UI work in parallel with the BE (backend)?

the state of the experiment and hence the visibility will be determined per user based on their group membership.

the advantages outweigh the overhead

what is the priority and severity/sə’vɛrəti/ of this customer case?

we inadvertently(unintentionally) patch XXX on this region: or we patch xxx without realizing it.

The proposals help alleviate /əˈliːvieɪt/ the following pain points.

we have already evaluated xxx with the external team and we are in violation.

Reliability is not the only question here; it's the regional isolation that's more important.

we have already decided that this must be done, so let's not get into the question of whether or not it's required; the how part is what we have to figure out.

thanks, that answers the first question.

They are in addition to the above.

We can have a whiteboard session(the idea flies) to discuss it…

is it because the technical difficulty/engineering complexity or other things?

it is not just possible, it is happening.

the PM has resistance against the proposal because …

it is a long/medium term plan, it does not have funding yet

stable, reliable, sustainable, scalable, available, maintainable robust, resilient/rɪˈzɪliənt/

decision criteria: testability, maintainability, scalability, backward compatibility, reusability, level of effort required

I am not sure if it is worth the trouble.

it is not foolproof, but we should reduce the proliferation /prəˌlɪfəˈreɪʃn/ of PII in notes; the tainted /ˈteɪntɪd/ data makes it harder to manage compliance.

I think it’s a fair point.

I will check once on xxx side.

I will create some dummy/'dʌmi/ data for testing as per my scenario.

The steps to set available nodes to zero at the beginning of region bring-up, and to open it up as part of go-live tasks, are missing from the checklist.

This should go to whoever is responsible for xxx.

Defer to xxx (answer to a question or decision)

Add this to postmortem, please do conduct postmortem and analysis at your earliest convenience.

let me structure this a little bit and talk to you.

I will catch you up later. (fill the gap)

I will try to get that out(succeed in uttering, publishing, or releasing something) by tomorrow, if not by Monday for sure.

human-in-the-loop model/pattern (need human intervention)

Irrespective of that, on the engineering side we are working on a plan to evaluate the impact based on both iperf and xxx

The first option seems a non-starter, since we cannot get blanket access to all clouds.

The transition to support and ops is complete and they are onboarded

We can leverage other teams’ achievements to speed up the process. = We can take advantage of other teams’ achievements to speed up the process.

We have different readings on …; let's have xxx explain it

let’s take one step back

Here is a guess/suspicion/speculation /ˌspekjuˈleɪʃn/; I remain skeptical /ˈskeptɪkl/

Looks like we once again run into the … issue caused by …

The theory didn’t hold.

… is not designed that way; I suggest reading … to recap what the purpose of … is and how it interplays/interacts with …

IMO/IMHO, … The doc is based on an incorrect assumption; specifically, see … Consequently, I don't believe … is the most plausible alternative for solving the issue of …

How to push back request effectively

Requires a delicate/ˈdelɪkət/ balance of assertiveness, diplomacy, and providing well-reasoned arguments.

  1. Understand the Request: Make sure you fully understand the PM’s request and the reasons/values behind it. Ask clarifying questions to get a clear picture of the requirements and the expected outcomes.
  • Could you clarify the main objectives and expected outcomes for this feature? I want to make sure we’re on the same page.

  • How often does that happen? (if it is low frequency and has manual fix, then it does not have high priority)

  • What’s current workaround or solution.

  2. Gather Data and Facts: Before the meeting, gather data, metrics, and facts to support your position. This could include technical constraints, resource limitations, potential risks, or user feedback.
  • “cannot provide any promise or funding this now, get back to you later after gathering more metrics, data to see if it worth the effort and how much effort is needed.”
  • “I’ve reviewed the performance metrics, and our current infrastructure may not support the scalability requirements of this feature.”
  3. Acknowledge the Request: Start by acknowledging the PM’s request and showing appreciation for their input. This sets a positive tone for the discussion.
  • “I understand that this feature could be beneficial for our users and aligns with our product goals.”
  • “I appreciate your suggestion to add this feature, as it could potentially enhance user engagement.”
  4. Express Concerns Diplomatically: Clearly and respectfully express your concerns about the request, focusing on the potential challenges, risks, or impacts on the team and project.
  • “However, I have some concerns about the technical complexity of implementing this feature within our current timeline.”
  • “However, given our current workload and the technical challenges involved, I’m concerned that we may not be able to deliver this feature within the proposed timeline.”
  5. Provide Alternatives: Offer alternative solutions or compromises that address the PM’s needs while accommodating the team’s constraints or concerns.
  • “Instead of implementing the feature as initially proposed, what if we consider a phased approach or explore alternative solutions that could achieve similar outcomes more efficiently?”
  • “Instead of a full-fledged feature, what if we start with a simpler version or a prototype to test the concept and gather user feedback?”
  • “I am also interested in it, but I would prefer to hold off until we have a clear understanding of the longer-term plan; otherwise the effort might not be aligned with the future design. We can keep it as it is for now.”
  6. Seek Collaboration: Encourage collaboration and open dialogue by inviting the PM and other stakeholders to discuss and brainstorm solutions together.
  • “follow up meeting with more stakeholders.”
  • “I believe that by working together, we can find a solution that meets our objectives while considering the team’s capacity and the project’s constraints.”
  • “I can spend partial time on it: 30%”
  7. Be Open to Feedback: Be receptive to feedback and be prepared to adjust your position based on new information or perspectives presented during the discussion.
  • “I’m open to feedback and would appreciate your insights on how we can best address this feature request while considering our current priorities and constraints.”

Testing

  • From the ground up to support a continuous integration-based methodology.

  • This new approach allows for iterative integration and testing

  • We’ve significantly reduced turnaround time(from start to finish) and operational toil while improving predictability.

  • scale, stress and performance testing is in-progress.

  • we will be starting a complete teardown of the testing env; the ones below will be dismantled.

Launch

This feature has been top of mind for customers/account team, xxx

This cross-functional project brought many teams together to deliver this impactful feature. A big thank-you to the xx team, partner teams, and leadership for their contributions in making this accomplishment a reality.

If we miss any team/person, please drop us a note or add them in the thread.

Leaving

For other people

I am grateful for the times we could interact, although they were rare

It was very nice working with you; all the best in your next stage!

People will miss you!

please carry on the good work!

whom can I reach out to now, I mean your replacement?

We will be having a farewell lunch for xx at xx on Monday, Feb 3rd. Let's meet, wish him all the best in his new role, and thank him for all the GREAT work he did for our team. Please RSVP by Thursday.

Thanks for all the work you did… I will surely miss your expertise.

Myself

Resignation letter to manager Dear XXX,

Please accept this message as notification that I am going to leave my position; my last day will be July 24, two weeks from today.

I have enjoyed my time at XXX and will miss working with you and the team. I’m proud of the work we’ve done. Thank you for your professional guidance and support, I wish you and XXX the best success in the future.

Please let me know what to expect as far as my final work schedule, accrued vacation leave, and my employee benefits.

I’m also happy to help assist in training my replacement during the transition and make it as smooth as possible.

Sincerely,

I was previously working at XXX in San Jose as a software engineer, I’m particularly interested in Linux, Cloud Computing, especially the container orchestration. In my free time I enjoy hiking, playing piano, and writing on my blog.

Resignation letter to peers Dear colleagues and friends, this Friday, Jan 24th will be my last day at xx, but this is not a note to say goodbye. This is a note to say thank you.

I am so thankful for my time here at xxx and for all of the wonderful people I have had the privilege to work with.

I have learned so much and been inspired by many of you over the years. I am so grateful for all the gifts the people of xx have given me and I can only hope in some way I have inspired some of you as well.

I take my leave from xx because of a unique opportunity to leverage all I have learned, and build some really cool xx solutions right here in my beloved Chicago. Leaving IBM and all the people I love here was the hard part, even with the exciting opportunity ahead of me.

from Book

This book is a broad overview of living on the Linux command line

Another goal is to acquaint/əˈkweɪnt/ you with the Unix way of thinking

This book is divided into four parts

ordinary tasks that are commonly performed from the command line; working directory = current directory; You will thank yourself later; reveal /rɪˈviːl/ more details; examine text files (a word worth using); here are some all-time favorites: …

As we gain Linux experience,… Without a proper understanding of expansion, the shell will always be a source of mystery and confusion.

… does the trick.

How to go far

  1. Early draft
  2. Comments and discussion about content, presentation
  3. synthesizing many ideas
  4. respectful discussion and disagreement
  5. lose the ego
  6. learn to take criticism
  7. learn from mistakes

How to feedback

  1. Show the intent(do you have a few minutes)
  2. Show the data
  3. Show the impact
  4. Leave room for questions
  5. Ask for feedback proactively (peers, manager, team)

When I was working on setting up non-root worker containers or pods, in order to grant the non-root user password-less su privileges, I got into the PAM module in RHEL. Let's spend some time understanding it.

Pluggable authentication modules (PAMs) are a common framework for authentication and authorization.

There are many programs on your system that use PAM modules, like su, passwd, ssh, login, and other services. PAM's main focus is to authenticate your users.

PAM or Pluggable Authentication Modules are the management layer that sits between Linux applications and the Linux native authentication system.

For full details, please refer: USING PLUGGABLE AUTHENTICATION MODULES (PAM) CHINESE VERSION

Each PAM-aware application or service has a file in the /etc/pam.d/ directory. Each file in this directory has the same name as the service to which it controls access. For example, the login program defines its service name as login and installs the /etc/pam.d/login PAM configuration file.

What I have done is add a non-root user, for example demo, to wheel group (usually wheel group is pre-existing), operate this as root user:

usermod -a -G wheel demo

Why the wheel group? What does wheel stand for in computing? In computing, the term wheel refers to a user account with a wheel bit, a system setting that provides additional special system privileges that empower a user to execute restricted commands that ordinary user accounts cannot access.

Modern Unix systems generally use user groups as a security protocol to control access privileges. The wheel group is a special user group used on some Unix systems to control access to the su or sudo command, which allows a user to masquerade as another user (usually the super user).

By default it permits root access to the system if the applicant user is a member of the wheel group.

Check demo group information:

id demo

uid=1010(demo) gid=1010(demo) groups=1010(demo),10(wheel)

You can also go to /etc/group file to check group members of wheel:

wheel:x:10:demo

Then edit the /etc/pam.d/su file to uncomment this directive:

# Uncomment the following line to implicitly trust users in the "wheel" group.
auth sufficient pam_wheel.so trust use_uid

What is pam_wheel.so? The pam_wheel PAM module enforces the so-called wheel group. By default it permits root access (su -) to the system if the applicant user is a member of the wheel group. If no group with this name exists, the module uses the group with group ID 0.

Now the user demo can su to other users (including root) without a password. You can also run a command as another user:

su - <another> -c "<command>"
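
For example, as demo you can verify the password-less escalation (an illustrative check, assuming the pam_wheel trust line above is active):

# run id as root; with pam_wheel trust there is no password prompt
su - root -c "id"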

If you also want password-less sudo for wheel group users, edit the /etc/sudoers file via visudo as root, like this:

## Allows people in group wheel to run all commands
#%wheel ALL=(ALL) ALL

## Same thing without a password
%wheel ALL=(ALL) NOPASSWD: ALL
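
As demo, a quick way to verify the NOPASSWD rule (illustrative):

sudo whoami
# expected output: root, with no password prompt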

PAM file format

Each PAM configuration file, such as /etc/pam.d/su, contains a group of directives that define the modules to use and any control flags or arguments for them.

The directives all have a simple syntax that identifies the module purpose (interface) and the configuration settings for the module.

module_interface	control_flag	module_name	module_arguments

  • module_interface — for example auth. The auth interface authenticates users: it requests and verifies the validity of a password. Modules with this interface can also set credentials, such as group memberships. (The other three interfaces are account, password, and session.)
  • module_name — the module to invoke, such as pam_wheel.so.
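
Mapping the pam_wheel directive from /etc/pam.d/su onto this syntax:

# module_interface   control_flag   module_name    module_arguments
  auth               sufficient     pam_wheel.so   trust use_uid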

All PAM modules generate a success or failure result when called. Control flags tell PAM what to do with the result. Modules can be listed (stacked) in a particular order, and the control flags determine how important the success or failure of a particular module is to the overall goal of authenticating the user to the service.

  • sufficient — The module result is ignored if it fails. However, if the result of a module flagged sufficient is successful and no previous modules flagged required have failed, then no other results are required and the user is authenticated to the service.

Let’s see an example:

[root@MyServer ~]# cat /etc/pam.d/setup

auth      sufficient   pam_rootok.so
auth      include      system-auth
account   required     pam_permit.so
session   required     pam_permit.so

Here the modules are stacked; evaluation proceeds from top to bottom, one directive at a time:

  • auth sufficient pam_rootok.so — This line uses the pam_rootok.so module to check whether the current user is root, by verifying that their UID is 0. If this test succeeds, no other modules are consulted and the command is executed. If this test fails, the next module is consulted.

  • auth include system-auth — This line includes the content of the /etc/pam.d/system-auth module and processes this content for authentication.

For some modules, PAM uses arguments to pass information to the pluggable module during authentication. For pam_wheel:

  • trust: The pam_wheel module will return PAM_SUCCESS instead of PAM_IGNORE if the user is a member of the wheel group

  • use_uid: The check for wheel membership will be done against the current uid instead of the original one

When deploying DS, I found that the compute pod assigned to the second node was hanging in CreateContainer status. I SSHed into that node and found node memory heavily occupied by other processes, so command responses were extremely slow (in fact the SSH lag was caused by CPU contention: sshd had to compete with the other processes), and %CPU was also high for the swapping daemon (I did not record it at the time, but I suspect it was the [kswapd] daemon; note that [kswapd] does not only work on swap, it also reclaims page cache. Given the %CPU it was consuming, it was probably continuously trying, and failing, to reclaim cache).

The memory usage on the bad node:

free
              total        used        free      shared  buff/cache   available
Mem:        8168772      105152      295732     7491216     7767888      273448
Swap:             0           0           0

See how low the available size is, compared with the good node:

free
              total        used        free      shared  buff/cache   available
Mem:        8168772      123504     7041388      270836     1003880     7424612
Swap:             0           0           0

You can see that the shared (memory used mostly by tmpfs) and buff/cache parts are huge on the bad node, so I needed to flush and clean them. (At the time I did not yet understand the exact meaning of these columns, especially shared, buff/cache, and available. I should also have used memory-leak analysis tools to find which call stack was causing the memory pressure or leak.)
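
A quick sanity check of the bad node's numbers (all in kB) shows both that the columns add up and why so little memory was reclaimable:

# used + free + buff/cache = 105152 + 295732 + 7767888 = 8168772 = total
# available is only 273448 because most of buff/cache is tmpfs/shmem
# ("shared" = 7491216), which is in use and cannot simply be dropped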

In hindsight, I should have investigated why shared and buff/cache were abnormally high on this node, and why the [kswapd] reclaim kept failing; the likely reason is that the tmpfs (shared) pages were still in use, which deserved investigation. Below I describe how to release the cache by writing to drop_caches, but as I recall it did not help much at the time; the second comment on this thread may help: Why can't I release memory cache by /proc/sys/vm/drop_caches
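
Before trying to drop anything, it is worth confirming that the "shared" number really is shmem/tmpfs-backed; free derives it from the Shmem field of /proc/meminfo:

grep -E 'MemAvailable|^Cached|^Shmem' /proc/meminfo
# a Shmem value close to free's "shared" column confirms tmpfs usage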

# -t: type — list only tmpfs mounts, with a grand total
df -t tmpfs --total -h
# check one tmpfs mount for deleted-but-still-open files pinning memory
lsof -nP +L1 /dev/shm | grep DEL

As for the general solution of intentionally releasing cache, this post is a good reference. If you have to clear the disk cache, the following command is the safest in enterprise and production environments, as it clears only the PageCache:

# modify kernel behavior through the proc file
sync; echo 1 > /proc/sys/vm/drop_caches

What is the sync command? It flushes any data buffered in memory out to disk.

More aggressively, clear dentries and inodes:

sync; echo 2 > /proc/sys/vm/drop_caches

Clear PageCache, dentries and inodes:

# clear PageCache, dentries, and inodes (all of the above)
sync; echo 3 > /proc/sys/vm/drop_caches

It is not recommended to use this in production unless you know what you are doing, as it clears PageCache, dentries, and inodes. Right after you run drop_caches, the server will get busy re-populating memory with inodes and dentries; the original kernel documentation recommends not running this command outside of a testing or debugging environment. But if you are a home user, or your server is getting too busy and is almost filling up its memory, you need to be able to trade off the benefits against the risk.

  • What is dirty cache? Dirty cache refers to data which has not yet been committed to the database (or disk) and is currently held in memory. In short, the new data is in memory and differs from what is on the database/disk.

  • What is clean cache? Clean cache refers to data which has been committed to the database (or disk) and is currently held in memory. This is the state we desire, where everything is in sync.

  • What are dentries and inodes? A filesystem is represented in memory using dentries and inodes. Inodes are the objects that represent the underlying files (and also directories). A dentry is an object with a string name (d_name), a pointer to an inode (d_inode), and a pointer to the parent dentry (d_parent).

  • What is drop_caches? Writing to it causes the kernel to drop clean caches, as well as reclaimable slab objects like dentries and inodes. Once dropped, their memory becomes free. It does not kill any process.
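
A minimal before/after check, run as root (a sketch; note that the tmpfs-backed "shared" pages stay in use, which matches my memory that drop_caches did not help much in the incident above):

free -h                             # note the buff/cache column
sync                                # flush dirty pages so clean caches can be reclaimed
echo 1 > /proc/sys/vm/drop_caches   # drop clean PageCache only
free -h                             # buff/cache should shrink; "shared" will not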

BTW, if you want to clean the swap space (not applicable here, since swap is not enabled anyway): swapoff -a moves all swapped-out pages back into RAM, and swapon -a re-enables the swap areas:

swapoff -a && swapon -a