Local Persistent Volume

Introduction

Recap from the Kubernetes 1.14 "Local Persistent Volumes GA" announcement: a local persistent volume represents a local disk directly attached to a single Kubernetes node. With the Local Persistent Volume plugin, Kubernetes workloads can consume high-performance local storage using the same volume APIs that app developers have become accustomed to.

An important difference from hostPath: the Kubernetes scheduler understands which node a Local Persistent Volume belongs to. With HostPath volumes, a pod referencing a HostPath volume may be moved by the scheduler to a different node, resulting in data loss. With Local Persistent Volumes, the Kubernetes scheduler ensures that a pod using a Local Persistent Volume is always scheduled to the same node.

While HostPath volumes may be referenced via a Persistent Volume Claim (PVC) or directly inline in a pod definition, Local Persistent Volumes can only be referenced via a PVC. This provides additional security benefits since Persistent Volume objects are managed by the administrator, preventing Pods from being able to access any path on the host.
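As a rough sketch, the administrator-managed PV and the claim a pod would reference might look like the following (names, capacity, device path, and hostname are placeholders; on OpenShift the local-storage operator used later creates the PV objects automatically):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-local-pv              # placeholder name
spec:
  capacity:
    storage: 500Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  storageClassName: local-sc
  local:
    path: /dev/vdc                    # local disk (or an already-mounted path) on the node
  nodeAffinity:                       # this is what lets the scheduler pin pods to the right node
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - worker0.jc-portworx.os.xx.com
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-local-pvc             # placeholder name
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: local-sc
  resources:
    requests:
      storage: 5Gi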

Additional benefits include support for formatting of block devices during mount, and volume ownership using fsGroup.

Note: in practice, emptyDir + fsGroup can achieve a hostPath-like effect. On Red Hat CoreOS, emptyDir is backed by /sysroot. For example, with multiple pods using emptyDir on the same node, I touched a file in each pod's emptyDir (compute-0 and compute-3); logging into the node and running the find command shows them:

/sysroot/ostree/deploy/rhcos/var/lib/kubelet/pods/68e65ed4-4e62-4588-9269-8947dea9dd46/volumes/kubernetes.io~empty-dir/compute-dedicated-scratch/compute-0

/sysroot/ostree/deploy/rhcos/var/lib/kubelet/pods/92b26b37-92f3-4609-83cb-da8cb8727ca2/volumes/kubernetes.io~empty-dir/compute-dedicated-scratch/compute-3
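For reference, the kind of pod used for that emptyDir test could be sketched like this (the image, GID, and command are placeholders; only the volume name matches the paths above):

apiVersion: v1
kind: Pod
metadata:
  name: compute-0                     # a second, identical pod produced the compute-3 file
spec:
  securityContext:
    fsGroup: 1000                     # placeholder GID; emptyDir contents become group-owned by it
  containers:
  - name: scratch-user
    image: registry.access.redhat.com/ubi8/ubi-minimal    # placeholder image
    command: ['/bin/sh', '-c', 'touch /scratch/$(hostname) && tail -f /dev/null']
    volumeMounts:
    - name: compute-dedicated-scratch
      mountPath: /scratch
  volumes:
  - name: compute-dedicated-scratch
    emptyDir: {}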

Also note that local storage provisioning creates exactly one PV per attached disk on each node, and each PV is consumed by a single PVC in its entirety: even if the PV is 500G, a PVC requesting only 5G will occupy the whole PV. (Not sure whether this will be improved in the future.)

Steps

Only tested on OCP 4.3.

Reference: OpenShift documentation on persistent storage using local volumes. Deploying the local-storage Operator involves:

  1. Install the local-storage Operator (by default it is installed in the local-storage namespace).
  2. Provision the local storage with a LocalVolume CR.
  3. Create a PersistentVolumeClaim for the local volume and attach it to a pod.

After the operator is deployed:

## get the hostname of each worker node
## a -l label selector can be used to filter worker nodes if needed
oc describe node | grep hostname
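If only the worker nodes matter, the -l filter mentioned in the comment can be used; one way to do it (the worker role label is standard on OCP):

## limit to worker nodes via the standard role label
oc describe node -l node-role.kubernetes.io/worker | grep hostname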
apiVersion: "local.storage.openshift.io/v1"
kind: "LocalVolume"
metadata:
name: "local-disks"
namespace: "local-storage"
spec:
nodeSelector:
nodeSelectorTerms:
- matchExpressions:
- key: kubernetes.io/hostname
operator: In
values:
## put hostname above here
- worker0.jc-portworx.os.xx.com
- worker1.jc-portworx.os.xx.com
- worker2.jc-portworx.os.xx.com
storageClassDevices:
- storageClassName: "local-sc"
volumeMode: Filesystem
## The file system that will be formatted when the local volume is mounted
fsType: xfs
devicePaths:
## use blkid command to get this devicePath
- /dev/vdc

For example:

blkid
## unrelated output removed for clarity
## the first column is the device path
/dev/vdc: LABEL="mdvol" UUID="fdac344f-8d5f-48bd-9101-99cb416bb93d" TYPE="xfs"

Let's check /dev/vdc:

lsblk /dev/vdc

NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vdc 252:32 0 500G 0 disk
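
With the device path confirmed, save the LocalVolume CR above to a file and apply it (the filename is arbitrary):

oc apply -f local-volume.yaml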

After applying the LocalVolume CR, check the status of the local-storage namespace: you should see the local diskmaker and provisioner pods up and running, and the corresponding PVs ready as well.
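The listing below was gathered with commands along these lines:

oc get all -n local-storage
## PVs are cluster-scoped
oc get pv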

NAME                                          READY   STATUS    RESTARTS   AGE
pod/local-disks-local-diskmaker-6787r         1/1     Running   0          52m
pod/local-disks-local-diskmaker-jvwnq         1/1     Running   0          52m
pod/local-disks-local-diskmaker-lfzq9         1/1     Running   0          52m
pod/local-disks-local-provisioner-fzgs2       1/1     Running   0          52m
pod/local-disks-local-provisioner-mqd86       1/1     Running   0          52m
pod/local-disks-local-provisioner-t2bvz       1/1     Running   0          52m
pod/local-storage-operator-7f8dbfb95c-7brlv   1/1     Running   0          16h

## PV
local-pv-38162728   500Gi   RWO   Delete   Available   local-sc   7m45s
local-pv-64bcf276   500Gi   RWO   Delete   Available   local-sc   7m45s
local-pv-bd2d227    500Gi   RWO   Delete   Available

If everything is set, we can consume the local storage provisioned under local-sc. Here I use volumeClaimTemplates instead of creating separate PVCs (separate PVCs are not really an option here, because which PV a claim binds to depends on the node the pod lands on, which is not known in advance).

Notice that if there is one PV per node, one PVC will consume the whole PV. So with a StatefulSet using a volume claim template, we can only have one pod per node.
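This works because the StorageClass created by the operator uses delayed binding, so a PVC is only bound once the scheduler has picked a node for the pod. It typically looks something like this (reconstructed for illustration, not copied from the cluster):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-sc
provisioner: kubernetes.io/no-provisioner   # local PVs are pre-created; nothing is provisioned dynamically
volumeBindingMode: WaitForFirstConsumer     # binding waits until a pod using the PVC is scheduled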

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: test-local-sc
spec:
  selector:
    matchLabels:
      app: nginx # has to match .spec.template.metadata.labels
  replicas: 3 # by default is 1
  template:
    metadata:
      labels:
        app: nginx # has to match .spec.selector.matchLabels
    spec:
      terminationGracePeriodSeconds: 2
      securityContext: {}
      serviceAccount: wkc-iis-sa
      serviceAccountName: wkc-iis-sa
      containers:
      - name: nginx
        image: xxx.swg.com/compute-image:b994-11_7_1_1-b191
        securityContext:
          allowPrivilegeEscalation: true
          privileged: false
          readOnlyRootFilesystem: false
          runAsNonRoot: true
          runAsUser: 10032
        command: ['/bin/sh', '-c', 'tail -f /dev/null']
        volumeMounts:
        - name: my-scratch
          mountPath: /opt/xx/Scratch2
  volumeClaimTemplates:
  - metadata:
      name: my-scratch
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: local-sc
      resources:
        requests:
          storage: 5Gi
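
Once the pods are up, each claim generated from the template should be Bound to one of the local PVs; a quick check:

## PVC names follow <template name>-<pod name>, e.g. my-scratch-test-local-sc-0
oc get pvc
oc get pv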

Now let's check /dev/vdc again with lsblk; you will see that it is now associated with the pod.
