Linux Storage System

  • [x] An ordinary file can also be formatted and mounted
  • [x] Terraform mounts a disk as /dev/sdb; why this name?
  • [x] Experiment: attach an extra disk with Vagrant
  • [x] blkid, lsblk use cases
  • [x] fstab mount settings

For the Vagrant demo, see vagrant-storage. It targets the VirtualBox provider; at the time of writing, disk support is an experimental feature.

To launch and tear down the Vagrant machine:

# up
VAGRANT_EXPERIMENTAL="disks" vagrant up
# down
vagrant destroy -f

Once the machine is up:

# check block devices: name, type, size and mount point
# sdb and sdc are the additional disks added in the Vagrantfile
lsblk
NAME                    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                       8:0    0   40G  0 disk
├─sda1                    8:1    0    1G  0 part /boot
└─sda2                    8:2    0   31G  0 part
  ├─centos_centos7-root 253:0    0   29G  0 lvm  /
  └─centos_centos7-swap 253:1    0    2G  0 lvm  [SWAP]
sdb                       8:16   0    2G  0 disk
sdc                       8:32   0    2G  0 disk

# /dev/sda still has unallocated space, so add one more partition of about 3GB
fdisk /dev/sda
Command (m for help): n
Partition type:
# sda1 and sda2 are the 2 existing primary partitions
   p   primary (2 primary, 0 extended, 2 free)
   e   extended
Select (default p):
Using default response p
Partition number (3,4, default 3):
First sector (67108864-83886079, default 67108864):
Using default value 67108864
Last sector, +sectors or +size{K,M,G} (67108864-83886079, default 83886079): +3GB
Partition 3 of type Linux and of size 2.8 GiB is set

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.

WARNING: Re-reading the partition table failed with error 16: Device or resource busy.
The kernel still uses the old table. The new table will be used at
the next reboot or after you run partprobe(8) or kpartx(8)
Syncing disks.

# re-read partition table for /dev/sda
partprobe -s /dev/sda

# lsblk now shows sda3
NAME                    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                       8:0    0   40G  0 disk
├─sda1                    8:1    0    1G  0 part /boot
├─sda2                    8:2    0   31G  0 part
│ ├─centos_centos7-root 253:0    0   29G  0 lvm  /
│ └─centos_centos7-swap 253:1    0    2G  0 lvm  [SWAP]
└─sda3                    8:3    0  2.8G  0 part
sdb                       8:16   0    2G  0 disk
sdc                       8:32   0    2G  0 disk

# format sda3
mkfs -t ext4 /dev/sda3

# see type
blkid | grep sda3
/dev/sda3: UUID="7d4a365c-1639-41de-a7c7-ebe79ea2830c" TYPE="ext4"

# mount to /data3 folder
mkdir /data3
mount /dev/sda3 /data3
mount | grep sda3
# lsblk now shows the mount point of sda3
NAME                    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                       8:0    0   40G  0 disk
├─sda1                    8:1    0    1G  0 part /boot
├─sda2                    8:2    0   31G  0 part
│ ├─centos_centos7-root 253:0    0   29G  0 lvm  /
│ └─centos_centos7-swap 253:1    0    2G  0 lvm  [SWAP]
└─sda3                    8:3    0  2.8G  0 part /data3
sdb                       8:16   0    2G  0 disk
sdc                       8:32   0    2G  0 disk

# check space and inode usage
cd /data3
df -k .
df -i .
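
The fdisk session above asks for +3GB but reports a partition of 2.8 GiB. The arithmetic can be sketched as follows, assuming 512-byte sectors and assuming fdisk reads the GB suffix as 10^9 bytes (which is consistent with the reported size); the sector numbers are copied from the session:

```shell
#!/bin/sh
# Sector numbers copied from the fdisk session; 512-byte sectors are an assumption.
sector_size=512
first_free=67108864
last_sector=83886079

# Unpartitioned space at the end of /dev/sda, in whole GiB.
free_gib=$(( (last_sector - first_free + 1) * sector_size / 1024 / 1024 / 1024 ))
echo "free space: ${free_gib} GiB"

# "+3GB" taken as 3 * 10^9 bytes, expressed in GiB (2^30 bytes).
requested_gib=$(awk 'BEGIN { printf "%.1f", 3 * 1000^3 / 1024^3 }')
echo "+3GB is about ${requested_gib} GiB"
```

Under the same assumption, asking for +3G instead would request 3 * 2^30 bytes.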

If you copy a big file to /data3, the Use% reported by df may not change right away because the data has not been flushed yet; run sync to flush the file system buffers.

# format whole disk sdb without partition
# use btrfs and mount with compress
mkfs -t btrfs /dev/sdb

mkdir /datab
mount /dev/sdb -o compress /datab

mount | grep sdb
/dev/sdb on /datab type btrfs (rw,relatime,seclabel,compress=zlib,space_cache,subvolid=5,subvol=/)

# check lsblk
NAME                    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                       8:0    0   40G  0 disk
├─sda1                    8:1    0    1G  0 part /boot
├─sda2                    8:2    0   31G  0 part
│ ├─centos_centos7-root 253:0    0   29G  0 lvm  /
│ └─centos_centos7-swap 253:1    0    2G  0 lvm  [SWAP]
└─sda3                    8:3    0  2.8G  0 part /data3
sdb                       8:16   0    2G  0 disk /datab
sdc                       8:32   0    2G  0 disk

To create logical volumes, see the LVM section. To create a loop device, see the Loop Device section.

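From the LinkedIn Learning course Linux: Storage Systems.

First, understand what happens after a new disk is added to the system and what has to be done before the new disk can be used; see this article for reference. The main tools are fdisk (partition), mkfs (make a filesystem), and mount or fstab (make it accessible).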

For how the device file of a newly added disk is named (for example /dev/sda, /dev/sdb), see the naming conventions of device files here.
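
The lsblk output earlier in this post (sdb at 8:16, sdc at 8:32) hints at how these names map to device numbers. A minimal sketch, assuming the usual sd layout of 16 minor numbers per disk under major 8, which only holds for sda through sdp; sd_minor is a hypothetical helper name:

```shell
#!/bin/sh
# sd_minor DISK_LETTER [PARTITION]
# Prints the expected minor number for /dev/sd<letter><partition>,
# assuming 16 minors per disk (sda..sdp under major 8).
sd_minor() {
    letter=$1
    partition=${2:-0}
    # Position of the letter in the alphabet, starting at 0 for 'a'.
    index=$(( $(printf '%d' "'$letter") - 97 ))
    echo $(( index * 16 + partition ))
}

sd_minor b      # whole disk sdb
sd_minor c 1    # first partition of sdc
```

Compare the result with the MAJ:MIN column of lsblk on your own machine.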

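Overall, a block storage system can be inspected in this order:

  • lsblk, blkid: get an overview of the block devices (filesystem, disk, partition, mount point, etc.)
  • /etc/fstab: check whether mounts are persistent and whether the mount options are appropriate
  • mount: see what the defaults mount option actually expands to
  • df -k/-i: check space and inode usage of a mount point
  • dd: test I/O performance (or combine with iperf3 for distributed storage such as NFS)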
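
For the last step, a rough dd throughput check might look like the sketch below. The scratch directory and the 16 MiB size are arbitrary choices for illustration; point work_dir at the mount under test and use a much larger size for a meaningful number:

```shell
#!/bin/sh
# Hypothetical scratch location; replace with a directory on the mount under test.
work_dir=$(mktemp -d)
test_file="$work_dir/dd-test.bin"

# Sequential write: 16 blocks of 1 MiB. conv=fsync makes dd flush the data
# before it reports, so the page cache does not inflate the rate.
dd if=/dev/zero of="$test_file" bs=1M count=16 conv=fsync 2>&1 | tail -n 1

# Sequential read of the same file (may be served from the page cache).
dd if="$test_file" of=/dev/null bs=1M 2>&1 | tail -n 1

written_bytes=$(wc -c < "$test_file")
echo "wrote $written_bytes bytes"
rm -rf "$work_dir"
```

dd prints its throughput summary on stderr, which is why the sketch redirects it before keeping the last line.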

Partition

lsblk and blkid are used to identify block storage devices (Linux also has character devices). One thing to note when reading lsblk output, for example:

NAME                    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                       8:0    0   32G  0 disk
├─sda1                    8:1    0    1G  0 part /boot
└─sda2                    8:2    0   31G  0 part
  ├─centos_centos7-root 253:0    0   29G  0 lvm  /
  └─centos_centos7-swap 253:1    0    2G  0 lvm  [SWAP]

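Distinguish the different TYPE values (the tree indentation shows the same hierarchy). Only devices with a value in MOUNTPOINT appear in the output of the mount command, and the corresponding entries can also be found in /etc/fstab.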

Check storage partition:

# list partitions and see the type of a disk (/dev/sda is of type disk)
# interactive fdisk commands:
# `p` to print the partition table
# `n` to add a new partition
# `d` to delete a partition
# `w` to write the changes and exit
# `q` to quit without saving
fdisk /dev/sda

# or
ls /dev/sda*

Formatting

Formatting a partition or disk means creating a filesystem on it.

# format the /dev/sdb1 partition as ext4 (or xfs)
# -t: filesystem type
# force reformat: -F for ext4, -f for xfs
mkfs -t ext4 -F /dev/sdb1
mkfs -t xfs -f /dev/sdb1

Mounting

Mounting associates a filesystem with a directory. Normally we mount onto an empty directory; otherwise the existing content of the directory is hidden while the filesystem is mounted.

mount /dev/sdb1 /data

# non-disk filesystem, system will do special mounts for you
# -t: type
mount -t proc proc /proc
mount -t sysfs sysfs /sys
mount -t debugfs debugfs /sys/kernel/debug
# NFS
mount server:dir /nfs-data
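
Since mounting over a non-empty directory hides its content, it can be worth checking the target first. A minimal sketch; is_empty_dir is a hypothetical helper and the temporary directories stand in for real mount points:

```shell
#!/bin/sh
# is_empty_dir DIR: succeeds when DIR exists and contains no entries
# (hidden files count as entries).
is_empty_dir() {
    [ -d "$1" ] && [ -z "$(ls -A "$1")" ]
}

# Demo on throwaway directories standing in for mount points.
empty_dir=$(mktemp -d)
used_dir=$(mktemp -d)
touch "$used_dir/.hidden"

is_empty_dir "$empty_dir" && echo "$empty_dir: safe to mount over"
is_empty_dir "$used_dir" || echo "$used_dir: not empty, content would be hidden"
```

In practice the check would run just before the mount command, for example `is_empty_dir /data && mount /dev/sdb1 /data`.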

Persistent mounts are defined in /etc/fstab. If you run the mount command in a shell, you will see several more mount points, such as the proc, sysfs and debugfs mounts above. These are special mounts that are not in /etc/fstab; systemd mounts them automatically at boot.

For mount options specific to a filesystem type, see the fstab and mount man pages.
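
To make a mount such as /data3 persistent, an /etc/fstab line keyed by UUID is a common choice. The sketch below builds such a line from a line of blkid output; fstab_entry is a hypothetical helper, and the trailing `0 2` (no dump, fsck after the root filesystem) is an assumption that suits ext4, while xfs and btrfs entries usually end in `0 0`:

```shell
#!/bin/sh
# fstab_entry BLKID_LINE MOUNT_POINT [OPTIONS]
# Turns one line of blkid output into an /etc/fstab line keyed by UUID.
fstab_entry() {
    blkid_line=$1
    mount_point=$2
    options=${3:-defaults}
    uuid=$(printf '%s\n' "$blkid_line" | sed -n 's/.* UUID="\([^"]*\)".*/\1/p')
    fstype=$(printf '%s\n' "$blkid_line" | sed -n 's/.* TYPE="\([^"]*\)".*/\1/p')
    # Fields: device, mount point, type, options, dump, fsck pass.
    printf 'UUID=%s %s %s %s 0 2\n' "$uuid" "$mount_point" "$fstype" "$options"
}

# blkid line for /dev/sda3 as shown earlier in this post.
fstab_entry '/dev/sda3: UUID="7d4a365c-1639-41de-a7c7-ebe79ea2830c" TYPE="ext4"' /data3
```

After appending the line to /etc/fstab, running `mount -a` is a quick way to confirm the entry is accepted before the next reboot.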

You can unmount with the umount command. A filesystem cannot be unmounted while it is in use, for example when files on it are open or a process has its working directory there. You can check this with the lsof command.

umount /data

Filesystem Types

Commonly used ones are ext2/3/4, xfs and btrfs. They have different properties.

man 5 filesystems
man 5 ext4
man 5 xfs
man 5 btrfs

Note that sometimes you cannot create a file because there is no space left on the device, yet df -k . shows the filesystem is not full. Checking df -i . may help: you can run out of inodes even when plenty of space remains.
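
When df -i shows the inodes are nearly used up, counting entries per directory helps locate where they went, since every file and directory consumes one inode. A rough sketch (hard links get counted more than once); count_entries is a hypothetical helper and the demo directory is a throwaway:

```shell
#!/bin/sh
# count_entries DIR: number of filesystem entries under DIR (DIR included),
# without crossing into other mounted filesystems.
count_entries() {
    find "$1" -xdev | wc -l
}

# Demo on a throwaway directory holding 3 empty files.
demo_dir=$(mktemp -d)
touch "$demo_dir/a" "$demo_dir/b" "$demo_dir/c"
entry_count=$(count_entries "$demo_dir")
echo "$demo_dir holds $entry_count entries"
rm -rf "$demo_dir"
```

Running count_entries over each top-level directory of the affected mount point usually narrows the culprit down quickly.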

LVM

LVM (Logical Volume Manager) is a layer above physical partitions that allows multiple partitions to appear as one. It lets you grow a filesystem by adding physical space, and it can stripe and mirror.

There are 3 levels of abstraction, each built on the previous one:

  • physical volumes: disk or disk partitions
  • volume groups: collections of physical volumes
  • logical volumes: partitioning up a volume group

Assume we have new disk partitions /dev/sdc1, /dev/sdc2, /dev/sdc3:

# enter the interactive lvm shell; commands typed at its prompt are shown with '>'
lvm
# show all physical volumes
> pvs -a
# check the LVM metadata on a physical volume
> pvck /dev/sda2
# show volume groups
> vgs

# create physical volume
pvcreate /dev/sdc1
pvcreate /dev/sdc2
pvcreate /dev/sdc3

# list physical volume
# you can see volume group name
pvscan

# display detail
pvdisplay /dev/sdc1

Let's create a volume group:

# -c n: the volume group is not clustered
# vg1: volume group name
vgcreate -c n vg1 /dev/sdc1 /dev/sdc2
pvscan
# add /dev/sdc3 to group
vgextend vg1 /dev/sdc3
pvscan
# remove unused pv
vgreduce vg1 /dev/sdc3
pvscan
# remove group
vgremove vg1
pvscan

Then create a logical volume:

# create a 600M logical volume from group vg1
# (vg1 must exist; recreate it if you ran vgremove above)
# -L: size
# -n: lv name
lvcreate -L 600M -n apple vg1
# format and mount it
mkfs -t ext4 /dev/vg1/apple
mkdir /apple && mount /dev/vg1/apple /apple

// lsblk output: apple spans 2 physical volumes
sdc             8:32   0    2G  0 disk
├─sdc1          8:33   0  500M  0 part
│ └─vg1-apple 253:2    0  600M  0 lvm  /apple
├─sdc2          8:34   0  500M  0 part
│ └─vg1-apple 253:2    0  600M  0 lvm  /apple
└─sdc3          8:35   0  500M  0 part

umount /apple
# extend
# note: the new size cannot exceed the total size of the volume group it belongs to
# -r also resizes the filesystem automatically
lvextend -L +200M -r /dev/vg1/apple

# shrink: first resize the filesystem,
# then the logical volume
fsadm -e resize /dev/vg1/apple 300M
lvresize --size 300M /dev/vg1/apple

# lvreduce can also shrink a logical volume
# remove the logical volume when it is no longer needed
lvremove /dev/vg1/apple

Now it is clear that in the Vagrantfile demo there were already 2 logical volumes from the start, root and swap:

NAME                    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                       8:0    0   40G  0 disk
├─sda1                    8:1    0    1G  0 part /boot
├─sda2                    8:2    0   31G  0 part
│ ├─centos_centos7-root 253:0    0   29G  0 lvm  /
│ └─centos_centos7-swap 253:1    0    2G  0 lvm  [SWAP]

You can inspect these 2 logical volumes. The output above means that /dev/sda2 is a physical volume and centos_centos7 is the volume group name. Each LVM abstraction layer has its corresponding command:

# show physical volume info
pvdisplay
# show lv info
lvdisplay
# show volume group info
vgdisplay
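
The device-mapper names seen in lsblk, such as centos_centos7-root, join the volume group and the logical volume with a single hyphen, and a hyphen inside either name is doubled. A small sketch that splits such a name; split_dm_name is a hypothetical helper and my--vg-data is a made-up example:

```shell
#!/bin/sh
# split_dm_name NAME: prints "<volume group> <logical volume>" for a
# device-mapper name such as centos_centos7-root.
split_dm_name() {
    # Hide doubled hyphens behind '/', which cannot appear in LVM names.
    masked=$(printf '%s' "$1" | sed 's|--|/|g')
    vg=${masked%%-*}
    lv=${masked#*-}
    # Restore the literal hyphens in each part.
    vg=$(printf '%s' "$vg" | sed 's|/|-|g')
    lv=$(printf '%s' "$lv" | sed 's|/|-|g')
    printf '%s %s\n' "$vg" "$lv"
}

split_dm_name centos_centos7-root
split_dm_name my--vg-data    # made-up group named "my-vg"
```

On a real system, the vgdisplay and lvdisplay commands above report the same names directly.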

Swapping

Swapping is bad for many applications and services because it hurts performance, so consider whether it should be disabled. Swap is usually configured in /etc/fstab; for example, in the Vagrant machine:

// as mentioned, /dev/mapper/centos_centos7-swap
// is a logical volume
/dev/mapper/centos_centos7-swap swap swap defaults 0 0

To disable it, comment out that line and run swapoff /dev/mapper/centos_centos7-swap or swapoff -a.
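
Commenting out the swap line can be scripted. The sketch below works on fstab text from stdin so it can be tried safely; disable_swap_entries is a hypothetical helper, and the root line in the sample (including its xfs type) is an assumed example rather than a copy of the real file:

```shell
#!/bin/sh
# disable_swap_entries: reads fstab text on stdin and comments out every
# active line whose filesystem type (3rd field) is swap.
disable_swap_entries() {
    awk '$1 !~ /^#/ && $3 == "swap" { print "#" $0; next } { print }'
}

# Sample input; the root line is an assumed example.
sample_fstab='/dev/mapper/centos_centos7-root / xfs defaults 0 0
/dev/mapper/centos_centos7-swap swap swap defaults 0 0'

printf '%s\n' "$sample_fstab" | disable_swap_entries
```

To apply it for real, keep a backup first, for example `cp /etc/fstab /etc/fstab.bak && disable_swap_entries < /etc/fstab.bak > /etc/fstab`, then run `swapoff -a`.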

A partition or file can be configured as swap space:

# a partition
mkswap /dev/sdc2
# on and off, not persistent
swapon /dev/sdc2
swapoff /dev/sdc2

# check swap components
swapon -s

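Using a file as swap space (a loop device is a similar case):

# generate a 1G file filled with zeros
# avoid truncate here: it creates a file with holes, which swapon rejects;
# fallocate may also fail on some filesystems
dd if=/dev/zero of=/tmp/myswap bs=1M count=1024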
chown root:root /tmp/myswap
chmod 0600 /tmp/myswap
# enable swap
mkswap /tmp/myswap
swapon /tmp/myswap
# off
swapoff /tmp/myswap

If you check free -h while the swap file is enabled, the swap space has grown by 1G.
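
The swapon man page notes that swap files with holes are rejected, which is why the file is written with dd. The sketch below contrasts a sparse file made by truncate with a fully written one; the file names and the 4M size are arbitrary, and the allocated sizes printed by du depend on the filesystem:

```shell
#!/bin/sh
work_dir=$(mktemp -d)

# Sparse file: the size is set but no data blocks need to be written.
truncate -s 4M "$work_dir/sparse.img"
# Fully written file: dd writes 4 MiB of zeros.
dd if=/dev/zero of="$work_dir/full.img" bs=1M count=4 2>/dev/null

sparse_size=$(stat -c %s "$work_dir/sparse.img")
full_size=$(stat -c %s "$work_dir/full.img")
echo "apparent sizes: sparse=$sparse_size full=$full_size"

# Allocated size in KiB; the sparse file typically reports far less.
du -k "$work_dir/sparse.img" "$work_dir/full.img"

rm -rf "$work_dir"
```

Both files have the same apparent size, so ls -l alone cannot tell them apart.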

Loop Device

Pseudo-devices deserve a mention here, namely the commonly used /dev/null, /dev/zero, /dev/full, /dev/random, etc. They are all character devices.

A loop device (see the wiki) is a pseudo-device that makes a file accessible as a block device. Files of this kind are often used for CD and ISO images; after mounting them you can access their content.

In my post What's New in CentOS 8, I mentioned these commands:

# shrink or extend a file to a specific size
# much faster than 'dd' command
truncate -s 1g /tmp/loop.img
# you can make partition on it or skip this step
# for example, make 2 partitions 200M and 400M each
fdisk /tmp/loop.img

# create a loopback device from a file; this associates the file with /dev/loop0
# -f: find the first unused loop device
# -P: force the kernel to scan the partition table on the newly created loop device
losetup -fP /tmp/loop.img
# now from lsblk, you will see the partitions under /dev/loop0

# format
mkfs -t ext4 /dev/loop0p1
mkfs -t xfs /dev/loop0p2

# mount it
mkdir /loop{0,1}
mount /dev/loop0p1 /loop0
mount /dev/loop0p2 /loop1
mount | grep loop0
# unmount
umount /loop0
umount /loop1

Or without losetup command:

truncate -s 1g /tmp/loop.img2
mkdir /loop2
# the file must be formatted first
mkfs -t xfs /tmp/loop.img2
# -o loop: mount option that associates a free loop device
# (here /dev/loop1) with the file,
# so no separate losetup command is needed
mount -o loop /tmp/loop.img2 /loop2

# unmount
umount /loop2

So let's look at the result:

// see first unused loop device
// losetup -f
/dev/loop2

// losetup -a
/dev/loop0: [64768]:16777282 (/tmp/loop.img)
/dev/loop1: [64768]:16777842 (/tmp/loop.img2)

// lsblk, only show loop part
NAME      MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0       7:0    0    1G  0 loop
├─loop0p1 259:0    0  200M  0 loop /loop0
└─loop0p2 259:1    0  400M  0 loop /loop1
loop1       7:1    0    1G  0 loop /loop2

// blkid
/dev/loop0p1: UUID="293db313-904b-40a7-9e05-de2ea6f7e12a" TYPE="ext4"
/dev/loop0p2: UUID="2d382bc4-8323-45d1-927b-17bbd1e8880d" TYPE="xfs"
/dev/loop0: PTTYPE="dos"
/dev/loop1: UUID="40282d5f-1d4e-495c-a480-78470237f8e2" TYPE="xfs"

RAID Partitioning

This section covers software RAID: combining multiple disks to improve performance and/or reliability, with features such as striping and redundancy.

# level 1: mirroring
# level 5: striping with parity (redundancy)
mdadm --create --verbose /dev/md/myraid --level=5 --raid-devices=3 /dev/sdd{1,2,3}
mkfs -t ext4 /dev/md/myraid
mkdir /mydir && mount /dev/md/myraid /mydir
# check
lsblk -o name,size,fstype,type
# unmount and stop the array
umount /mydir
mdadm --stop /dev/md/myraid
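
How much space an array offers depends on its level. A back-of-the-envelope sketch for equally sized members, ignoring metadata overhead; raid_usable_mb is a hypothetical helper and the 500M member size is an assumed example:

```shell
#!/bin/sh
# raid_usable_mb LEVEL DEVICES SIZE_MB
# Approximate usable capacity for equally sized members.
raid_usable_mb() {
    level=$1
    devices=$2
    size_mb=$3
    case $level in
        0) echo $(( devices * size_mb )) ;;        # striping, no redundancy
        1) echo "$size_mb" ;;                      # every member holds a full copy
        5) echo $(( (devices - 1) * size_mb )) ;;  # one member's worth of parity
        *) echo "unsupported level: $level" >&2; return 1 ;;
    esac
}

raid_usable_mb 5 3 500   # three assumed 500M partitions in RAID 5
```

The real figure reported by lsblk will be slightly lower because mdadm reserves space for its own metadata.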

SSHFS

SSHFS is a filesystem client based on SSH. It is FUSE-based and runs in user space, so it needs no special privileges. Communication is secured by SSH and uses the standard SSH port (you can specify another port).

It is similar to an NFS mount but implemented over SSH, which is quite convenient. For example, if a program has to run on a bastion host, you can mount the code repo from the development host onto the bastion, so there is no need for scp.

# install on the client
sudo apt-get install sshfs
# or
yum install -y fuse-sshfs

mkdir /sshdir
# mount a remote directory, for example root's home directory
# the connection may be flaky
sshfs [user]@<hostname or ip>:[dir] /sshdir [options]

# check on the host where /sshdir is mounted
cd /sshdir && df -h .
mount | grep ssh

# unmount
fusermount -u /sshdir