Kubernetes Operators

I've started digging into Operators recently, and since I happen to have this book, the plan was a quick pass through it as a recap/quick start. Having finished a first read, my strongest impression is this: if Kubernetes is the cloud's operating system, then an Operator is a cloud application's own management tool, what the book calls an "application SRE."

Book accompanying git repo: https://github.com/chengdol/chapters/ (forked from the original). The book also recommends a few more O'Reilly titles:

  • Programming Kubernetes (dive deeper into API)
  • Extending Kubernetes

My K8s operator-sdk demo git repo, a step-by-step guide to setting up a Go-based Operator and deploying it in a K8s cluster: https://github.com/chengdol/k8s-operator-sdk-demo

Some other resources are collected in this blog post: Kubernetes Operator Learning

Chapter 1 Introduction

Operators grew out of work at CoreOS during 2015 and 2016, and user experience with the Operators built there continues to inform the work at Red Hat.

An Operator continues to monitor its application as it runs, and can back up data, recover from failures, and upgrade the application over time, automatically.

An Operator is a custom Kubernetes controller watching a CR type and taking application-specific actions to make reality match the spec in that resource.

Making an Operator means creating a CRD and providing a program that runs in a loop watching CRs of that kind.

The Operator pattern arose in response to infrastructure engineers and developers wanting to extend Kubernetes to provide features specific to their sites and software.

A question came to mind at this point: how do Helm and Operators relate?

Chapter 2 Running Operators

This is the most basic Operator demo, an etcd cluster, and it is quite instructive. Note the creation order of the resources in the example below: https://github.com/kubernetes-operators-book/chapters/tree/master/ch03

First you need cluster-wide privileges:

```shell
## need cluster-wide privilege
kubectl describe clusterrole cluster-admin

Name:         cluster-admin
Labels:       kubernetes.io/bootstrapping=rbac-defaults
Annotations:  rbac.authorization.kubernetes.io/autoupdate: true
PolicyRule:
  Resources  Non-Resource URLs  Resource Names  Verbs
  ---------  -----------------  --------------  -----
  *.*        []                 []              [*]
             [*]                []              [*]
```

Start with etcd as the "hello world" example. From the book:

you’ll deploy the etcd Operator, then have it create an etcd cluster according to your specifications. You will have the Operator recover from failures and perform a version upgrade while the etcd API continues to service read and write requests, showing how an Operator automates the lifecycle of a piece of foundation software.

A CRD is akin to a schema for a CR, defining the CR’s fields and the types of values those fields contain:

```yaml
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: etcdclusters.etcd.database.coreos.com
spec:
  group: etcd.database.coreos.com
  names:
    kind: EtcdCluster
    listKind: EtcdClusterList
    plural: etcdclusters
    shortNames:
    - etcdclus
    - etcd
    singular: etcdcluster
  scope: Namespaced
  version: v1beta2
  versions:
  - name: v1beta2
    served: true
    storage: true
```

The CR’s group, version, and kind together form the fully qualified name of a Kubernetes resource type. That canonical name must be unique across a cluster.
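The composition of that fully qualified name can be sketched in Go. This is purely illustrative: the `GroupVersionKind` struct here mimics the real type in k8s.io/apimachinery, but is a standalone toy.

```go
package main

import "fmt"

// GroupVersionKind mirrors the identifier Kubernetes uses for a resource
// type. (Illustrative only; the real type lives in k8s.io/apimachinery.)
type GroupVersionKind struct {
	Group   string
	Version string
	Kind    string
}

// APIVersion renders the group/version pair exactly as it appears in a
// manifest's apiVersion field.
func (g GroupVersionKind) APIVersion() string {
	return g.Group + "/" + g.Version
}

func main() {
	gvk := GroupVersionKind{
		Group:   "etcd.database.coreos.com",
		Version: "v1beta2",
		Kind:    "EtcdCluster",
	}
	// Together these three parts uniquely name the resource type
	// cluster-wide, matching the CRD above.
	fmt.Println(gvk.APIVersion(), gvk.Kind)
}
```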

Defining an Operator Service Account:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: etcd-operator-sa
```

Defining the role:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: etcd-operator-role
rules:
- apiGroups:
  - etcd.database.coreos.com
  resources:
  - etcdclusters
  - etcdbackups
  - etcdrestores
  verbs:
  - '*'
- apiGroups:
  - ""
  resources:
  - pods
  - services
  - endpoints
  - persistentvolumeclaims
  - events
  verbs:
  - '*'
- apiGroups:
  - apps
  resources:
  - deployments
  verbs:
  - '*'
- apiGroups:
  - ""
  resources:
  - secrets
  verbs:
  - get
```

Defining the rolebinding, which assigns the role to the service account for the etcd Operator:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: etcd-operator-rolebinding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: etcd-operator-role
subjects:
- kind: ServiceAccount
  name: etcd-operator-sa
  namespace: default
```

The Operator is a custom controller running in a pod, and it watches the EtcdCluster CR you defined earlier.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: etcd-operator
spec:
  selector:
    matchLabels:
      app: etcd-operator
  replicas: 1
  template:
    metadata:
      labels:
        app: etcd-operator
    spec:
      containers:
      - name: etcd-operator
        image: quay.io/coreos/etcd-operator:v0.9.4
        command:
        - etcd-operator
        - --create-crd=false
        env:
        - name: MY_POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: MY_POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        imagePullPolicy: IfNotPresent
      serviceAccountName: etcd-operator-sa
```

Declaring an etcd cluster:

```yaml
apiVersion: etcd.database.coreos.com/v1beta2
kind: EtcdCluster
metadata:
  name: example-etcd-cluster
spec:
  size: 3
  version: 3.1.10
```

After the CR is created, the Operator generates three etcd pods (the pod definitions are produced by the Operator's logic).

This example etcd cluster is a first-class citizen, an EtcdCluster in your cluster’s API. Since it’s an API resource, you can get the etcd cluster spec and status directly from Kubernetes.

```shell
## etcdcluster is a resource just like pod/deploy/sts
kubectl describe etcdcluster example-etcd-cluster
```

The etcd Operator creates a Kubernetes service in the etcd cluster’s namespace:

```shell
kubectl get services --selector etcd_cluster=example-etcd-cluster
```

Run the etcd client on the cluster and use it to connect to the client service and interact with the etcd API.

```shell
kubectl run --rm -i --tty etcdctl --image quay.io/coreos/etcd --restart=Never -- /bin/sh
```

From the etcd container’s shell, create and read a key-value pair in etcd with etcdctl’s put and get verbs:

```shell
export ETCDCTL_API=3
export ETCDCSVC=http://example-etcd-cluster-client:2379
etcdctl --endpoints $ETCDCSVC put foo bar
etcdctl --endpoints $ETCDCSVC get foo

## check etcd cluster general health
## (the book uses `cluster-health`, an etcdctl v2 verb;
## with ETCDCTL_API=3 the equivalent is `endpoint health`)
etcdctl --endpoints http://example-etcd-cluster-client:2379 endpoint health
```

You can try deleting an etcd pod, or upgrading the version (edit the CR file, then apply it), and watch the Operator restore the cluster's health.

A kubectl trick for the upgrade:

```shell
## note: the version must be a JSON string, not a bare 3.3.12
kubectl patch etcdcluster example-etcd-cluster --type='json' \
  -p '[{"op": "replace", "path": "/spec/version", "value": "3.3.12"}]'
```

Chapter 3 Operators at the Kubernetes Interface

Operators extend two key Kubernetes concepts: resources and controllers. The Kubernetes API includes a mechanism, the CRD, for defining new resources.

These two passages make the difference between a generic controller and an Operator clear:

The actions the ReplicaSet controller takes are intentionally general and application agnostic. It does not, should not, and truly cannot know the particulars of startup and shutdown sequences for every application that might run on a Kubernetes cluster.

An Operator is the application-specific combination of CRs and a custom controller that does know all the details about starting, scaling, recovering, and managing its application.

Every Operator has one or more custom controllers implementing its application-specific management logic.

An Operator, in turn, can be limited to a namespace, or it can maintain its operand across an entire cluster.

For example, cluster-scoped Operators:

  • Istio operator: https://github.com/istio/operator
  • cert-manager: https://github.com/jetstack/cert-manager

A service account is a special type of cluster user for authorizing programs instead of people. An Operator is a program that uses the Kubernetes API, and most Operators should derive their access rights from a service account.

Chapter 4 The Operator Framework

This chapter introduced the three pillars of the Operator Framework: the Operator SDK for building and developing Operators; Operator Lifecycle Manager for distributing, installing, and upgrading them; and Operator Metering for measuring Operator performance and resource consumption.

The Red Hat Operator Framework makes it simpler to create and distribute Operators. It makes building Operators easier with a software development kit (SDK) that automates much of the repetitive implementation work. The Framework also provides mechanisms for deploying and managing Operators. Operator Lifecycle Manager (OLM) is an Operator that installs, manages, and upgrades other Operators. Operator Metering is a metrics system that accounts for Operators’ use of cluster resources.

Operator SDK: https://github.com/operator-framework/operator-sdk The SDK currently includes first-class support for constructing Operators in the Go programming language, with support for other languages planned. The SDK also offers what might be described as an adapter architecture for Helm charts or Ansible playbooks.

Operator Lifecycle Manager takes the Operator pattern one level up the stack: it’s an Operator that acquires, deploys, and manages Operators on a Kubernetes cluster.

Operator Metering is a system for analyzing the resource usage of the Operators running on Kubernetes clusters.

Install the Operator SDK: https://sdk.operatorframework.io/docs/install-operator-sdk/ Check whether your K8s version is compatible with the current operator-sdk. For example, when I experimented with K8s version 1.13.2, the supported CRD api version was apiextensions.k8s.io/v1beta1, while recent operator-sdk releases generate CRDs with api version apiextensions.k8s.io/v1. The book uses operator-sdk version 0.11.0.

Chapter 5 Sample Application: Visitors Site

In the chapters that follow, we’ll create Operators to deploy this application using each of the approaches provided by the Operator SDK (Helm, Ansible, and Go), and explore the benefits and drawbacks of each.

Reading this, I wondered how Helm handles this problem, especially for dependencies within the same chart: "When deploying applications through manifests, awareness of these relationships is required to ensure that the values line up."

The manifest-based installation for this demo: https://github.com/kubernetes-operators-book/chapters/tree/master/ch05 Now deploy it manually in the correct order:

```shell
kubectl create -f database.yaml
kubectl create -f backend.yaml
kubectl create -f frontend.yaml
```

Deletion:

```shell
kubectl delete -f database.yaml
kubectl delete -f backend.yaml
kubectl delete -f frontend.yaml
```

Chapter 6 Adapter Operators

You would have to create CRDs to specify the interface for end users. Kubernetes controllers would not only need to be written with the Operator’s domain-specific logic, but also be correctly hooked into a running cluster to receive the proper notifications. Roles and service accounts would need to be created to permit the Operator to function in the capacity it needs. An Operator is run as a pod inside of a cluster, so an image would need to be built, along with its accompanying deployment manifest.

This chapter is mainly about building Adapter Operators on top of existing Helm charts or Ansible playbooks: "The Operator SDK provides a solution to both these problems through its Adapter Operators. Through the command-line tool, the SDK generates the code necessary to run technologies such as Helm and Ansible in an Operator."

First understand the role of CRDs.

  • A CRD is the specification of what constitutes a CR. In particular, the CRD defines the allowed configuration values and the expected output that describes the current state of the resource.
  • A CRD is created when a new Operator project is generated by the SDK.
  • The SDK prompts the user for two pieces of information about the CRD during project creation: kind, api-version

Official operator SDK sample: https://github.com/operator-framework/operator-sdk-samples

Helm Operator

demo git repo to generate helm operator: https://github.com/kubernetes-operators-book/chapters/tree/master/ch06/visitors-helm

A Helm Operator can deploy each instance of an application with a different version of values.yaml. The Operator SDK generates Kubernetes controller code for a Helm Operator when it is passed the --type=helm argument. As a prerequisite, be sure to install the Helm command-line tools on your machine.

New Chart

Generate a blank helm chart structure within the operator project code:

```shell
OPERATOR_NAME=visitors-helm-operator
operator-sdk new $OPERATOR_NAME --api-version=example.com/v1 --kind=VisitorsApp --type=helm
```

At this point, everything is in place to begin to implement your chart.

Several directories and files are created:

  • build: contains the Dockerfile for the operator image
  • deploy: CRD definitions, role and rolebinding, service account
  • helm-charts: helm chart structure for your app
  • watches.yaml: maps each CR type to the specific Helm chart that is used to handle it

Existing Chart

The helm install command actually accepts many parameters for customization, such as choosing a values YAML file, but here it is not that flexible: the default values.yaml is used.

Always validate the templates beforehand; for example, with Helm 3:

```shell
helm template <chart dir or archive file> [--debug] | less
```

Check whether each rendering is formatted correctly; helm template does not error out on formatting issues.

Generate a Helm Operator on top of an existing helm chart archive. On OpenShift, run oc login first, otherwise operator-sdk cannot obtain the cluster info:

```shell
OPERATOR_NAME=visitors-helm-operator
## download existing chart archive
wget https://github.com/kubernetes-operators-book/chapters/releases/download/1.0.0/visitors-helm.tgz
## generate helm operator
operator-sdk new $OPERATOR_NAME --api-version=example.com/v1 --kind=VisitorsApp --type=helm --helm-chart=./visitors-helm.tgz
```
  • --helm-chart: a URL to a chart archive, the repository and name of a remote chart, or the location of a local directory
  • --helm-chart-repo: specifies a remote repository URL for the chart
  • --helm-chart-version: tells the SDK to fetch a specific version of the chart; if omitted, the latest available version is used

You will see deploy/crds/example.com_v1_visitorsapp_cr.yaml has the fields exactly the same as values.yaml in helm chart.

Before running the chart, the Operator will map the values found in the custom resource’s spec field to the values.yaml file.
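That spec-to-values mapping can be sketched as a recursive overlay, with CR spec values winning over chart defaults. This is an assumption about the mechanics for illustration; `mergeValues` is a hypothetical helper, and the real merge is implemented inside the SDK's Helm operator.

```go
package main

import "fmt"

// mergeValues overlays the CR's spec fields onto the chart's default
// values. Nested maps are merged recursively; scalars from spec win.
// (A sketch; the real Helm Operator performs this internally.)
func mergeValues(defaults, spec map[string]interface{}) map[string]interface{} {
	out := map[string]interface{}{}
	for k, v := range defaults {
		out[k] = v
	}
	for k, v := range spec {
		if sub, ok := v.(map[string]interface{}); ok {
			if dsub, ok := out[k].(map[string]interface{}); ok {
				out[k] = mergeValues(dsub, sub)
				continue
			}
		}
		out[k] = v
	}
	return out
}

func main() {
	defaults := map[string]interface{}{
		"size":    1,
		"backend": map[string]interface{}{"title": "default"},
	}
	spec := map[string]interface{}{"size": 3} // fields from the CR's spec
	merged := mergeValues(defaults, spec)
	fmt.Println(merged["size"]) // spec overrides the chart default
}
```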

The generated CRD and role (extremely permissive) can be used directly, but they may not meet your specific requirements, such as constraints or tighter permissions, so adjust them as needed.

Ansible Operator

More or less the same as Helm operator generation. Generate blank Ansible operator project:

```shell
OPERATOR_NAME=visitors-ansible-operator
operator-sdk new $OPERATOR_NAME --api-version=example.com/v1 --kind=VisitorsApp --type=ansible
```

Test Operator

An Operator is delivered as a normal container image. However, during the development and testing cycle, it is often easier to skip the image creation process and simply run the Operator outside of the cluster. This is used for development and testing: it does not deploy a real Operator Deployment, just a local process, but the observed behavior is the same as the real thing. This mode applies to the Helm and Ansible types.

```shell
## go to root path of operator project
## set full path in `chart` field to chart
cp watches.yaml local-watches.yaml
kubectl apply -f deploy/crds/*_crd.yaml
## start operator process
operator-sdk up local --watches-file ./local-watches.yaml
```

The process is up and running, next is to apply your cr yaml:

```shell
kubectl apply -f deploy/crds/*_cr.yaml
kubectl delete -f deploy/crds/*_cr.yaml
```

You will see the logs change, along with the application updating in the K8s cluster. Once the test is complete, end the running process by pressing Ctrl-C.

During development, repeat this process to test changes. On each iteration, be sure to restart the Operator process to pick up any changes to the Helm or Ansible files.

Deploy Operator

Running an Operator outside of the cluster, is convenient for testing and debugging purposes, but production Operators run as Kubernetes deployments.

  1. Build the operator image. The Operator SDK's build command chains to the underlying Docker daemon to build the Operator image, and takes the full image name and version when run:

```shell
operator-sdk build jdob/visitors-operator:0.1
```

You can check the Dockerfile; no additional changes are needed. The ${HOME} path is consistent with the path in watches.yaml.

Once built, push the image to an externally accessible repository.

  2. Configure the deployment. Update the deploy/operator.yaml file that the SDK generates with the name of the image.

  3. Deploy the CRD.

  4. Deploy the service account and role.

  5. Deploy the Operator deployment.

Chapter 7 Operators in Go with the Operator SDK

The reference code for this chapter does not put all the logic into one file; instead it is split per resource, with the shared parts factored out separately, which makes it a great reference. I summarized an implementation here: https://github.com/chengdol/k8s-operator-sdk-demo

The Operator SDK provides that flexibility by making it easy for developers to use the Go programming language, including its ecosystem of external libraries, in their Operators, writing the actual business logic of the Operator.

While you can write all these pieces manually, the Operator SDK provides commands that will automate the creation of much of the supporting code, allowing you to focus on implementing the actual business logic of the Operator.

We will explore the files that need to be edited with custom application logic and discuss some common practices for Operator development.

Create Go Based Operator

The book's description of how to create a go-based Operator from the command line is unclear, so I referred to Red Hat's documentation instead.

This statement is too vague: "In particular, the Operator code must be located in your $GOPATH". The key question is how to set $GOPATH:

Running go env | grep GOPATH shows there is already a default value, $HOME/go, but it still needs to be exported in the bash environment:

```shell
export GOPATH=$HOME/go
export PATH=$PATH:$GOPATH/bin
#export GO111MODULE=on

OPERATOR_NAME=visitors-operator
## this path must match the import paths used later in the controller!
OPERATOR_PATH=$GOPATH/src/github.com/jdob
mkdir -p $OPERATOR_PATH
cd $OPERATOR_PATH
## no --type specified, default is go
operator-sdk new $OPERATOR_NAME
```

In response to an error from operator-sdk new I exported GO111MODULE=on, but when I redid the steps later the error was gone.

The generation can take a few minutes as all of the Go dependencies are downloaded.

Add CRDs

You can add new CRDs to an Operator using the SDK's add api command. Run it from the Operator project root directory to generate a CRD. This shows that one Operator can have multiple CRDs.

```shell
cd $OPERATOR_PATH/$OPERATOR_NAME
operator-sdk add api --api-version=example.com/v1 --kind=VisitorsApp
## from command outputs, you will see what files are generated
```

Three files are important:

  • deploy/crds/*cr.yaml
  • deploy/crds/*crd.yaml
  • pkg/apis/example/v1/visitorsapp_types.go: contains a number of struct objects that the Operator codebase leverages

For example, in pkg/apis/example/v1/visitorsapp_types.go edit the Spec and Status struct:

```go
// VisitorsAppSpec defines the desired state of VisitorsApp
// +k8s:openapi-gen=true
type VisitorsAppSpec struct {
	// INSERT ADDITIONAL SPEC FIELDS - desired state of cluster
	// Important: Run "operator-sdk generate k8s" to regenerate code after modifying this file
	// Add custom validation using kubebuilder tags: https://book.kubebuilder.io/beyond_basics/generating_crd.html

	Size  int32  `json:"size"`
	Title string `json:"title"`
}

// VisitorsAppStatus defines the observed state of VisitorsApp
// +k8s:openapi-gen=true
type VisitorsAppStatus struct {
	// INSERT ADDITIONAL STATUS FIELD - define observed state of cluster
	// Important: Run "operator-sdk generate k8s" to regenerate code after modifying this file
	// Add custom validation using kubebuilder tags: https://book.kubebuilder.io/beyond_basics/generating_crd.html

	BackendImage  string `json:"backendImage"`
	FrontendImage string `json:"frontendImage"`
}
```

After editing, run:

```shell
## After any change to a *_types.go file, you need to update any generated code
operator-sdk generate k8s
```

Then customize the deploy/crds/example_v1_visitorsapp_crd.yaml file to reflect the struct content, for example: https://github.com/chengdol/chapters/tree/master/ch07/visitors-operator/deploy/crds

I did not specifically modify RBAC here; the default Operator permissions are used: https://github.com/chengdol/chapters/tree/master/ch07/visitors-operator/deploy

Write Control Logic

Inside of the Operator pod itself, you need a controller to watch for changes to CRs and react accordingly. Similar to adding a CRD, you use the SDK to generate the controller’s skeleton code.

```shell
## generate controller code skeleton
operator-sdk add controller --api-version=example.com/v1 --kind=VisitorsApp
```

The file pkg/controller/visitorsapp/visitorsapp_controller.go will be created; this is the controller file that implements the Operator's custom logic.

More information on K8s controller: https://kubernetes.io/docs/concepts/architecture/controller/

Two functions mainly need customization: add and Reconcile. One sets up the watches, i.e. tells K8s which resources to monitor; the other holds the control logic. "While the bulk of the Operator logic resides in the controller's Reconcile function, the add function establishes the watches that will trigger reconcile events": https://github.com/chengdol/chapters/tree/master/ch07/visitors-operator/pkg/controller/visitorsapp

The first watch listens for changes to the primary resource that the controller monitors, i.e. the custom kind you defined. The second watch, or more accurately, series of watches, listens for changes to any child resources the Operator created to support the primary resource, i.e. the other resources created indirectly for the custom kind, such as Deployments, StatefulSets, Services, and so on.

The Reconcile function, also known as the reconcile loop, is where the Operator's logic resides: https://github.com/chengdol/chapters/blob/master/ch07/visitors-operator/pkg/controller/visitorsapp/visitorsapp_controller.go

The Reconcile function returns two objects: a ReconcileResult instance and an error. There are several possibilities:

```go
return reconcile.Result{}, nil
return reconcile.Result{}, err
return reconcile.Result{Requeue: true}, nil
return reconcile.Result{RequeueAfter: time.Second * 5}, nil
```

Since Go-based Operators make heavy use of the Go Kubernetes libraries, it may be useful to review https://pkg.go.dev/k8s.io/api; the core/v1 and apps/v1 modules are frequently used to interact with the common Kubernetes resources.

Updating a status value is mentioned here; it corresponds to the status section at the bottom of a resource's YAML:

```go
instance.Status.BackendImage = "example"
err := r.client.Status().Update(context.TODO(), instance)
```

As I mentioned at the start of this chapter, the author separates the logic for different resources into different Go files; it is worth studying closely how this is done.

On child resource deletion: "If the child resource's owner type is correctly set to the primary resource, when the parent is deleted, Kubernetes garbage collection will automatically clean up all of its child resources."

It is important to understand that when Kubernetes deletes a resource, it still calls the Reconcile function.

There are times, however, where specific cleanup logic is required. The approach in such instances is to block the deletion of the primary resource through the use of a finalizer. A finalizer is simply a series of strings on a resource; essentially just a marker.

```go
finalizer := "visitors.example.com"

beingDeleted := instance.GetDeletionTimestamp() != nil
if beingDeleted {
	if contains(instance.GetFinalizers(), finalizer) {

		// Perform finalization logic. If this fails, leave the finalizer
		// intact and requeue the reconcile request to attempt the clean
		// up again without allowing Kubernetes to actually delete
		// the resource.

		instance.SetFinalizers(remove(instance.GetFinalizers(), finalizer))
		err := r.client.Update(context.TODO(), instance)
		if err != nil {
			return reconcile.Result{}, err
		}
	}
	return reconcile.Result{}, nil
}
```
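The snippet above calls `contains` and `remove` helpers that the book's repo defines elsewhere; a self-contained version of those two string-slice helpers might look like this:

```go
package main

import "fmt"

// contains reports whether the finalizer list already includes s.
func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

// remove returns a copy of list with every occurrence of s dropped.
func remove(list []string, s string) []string {
	out := make([]string, 0, len(list))
	for _, v := range list {
		if v != s {
			out = append(out, v)
		}
	}
	return out
}

func main() {
	finalizers := []string{"visitors.example.com", "other"}
	fmt.Println(contains(finalizers, "visitors.example.com"))
	fmt.Println(remove(finalizers, "visitors.example.com"))
}
```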

Idempotency

It is critical that Operators are idempotent. Multiple calls to reconcile an unchanged resource must produce the same effect each time.

  1. Before creating child resources, check to see if they already exist. Remember, Kubernetes may call the reconcile loop for a variety of reasons beyond when a user first creates a CR. Your controller should not duplicate the CR’s children on each iteration through the loop.

  2. Changes to a resource’s spec (in other words, its configuration values) trigger the reconcile loop. Therefore, it is often not enough to simply check for the existence of expected child resources. The Operator also needs to verify that the child resource configuration matches what is defined in the parent resource at the time of reconciliation.

  3. Reconciliation is not necessarily called for each change to the resource. It is possible that a single reconciliation may contain multiple changes. The Operator must be careful to ensure the entire state of the CR is represented by all of its child resources.

  4. Just because an Operator does not need to make changes during a reconciliation request doesn’t mean it doesn’t need to update the CR’s Status field. Depending on what values are captured in the CR’s status, it may make sense to update these even if the Operator determines it doesn’t need to make any changes to the existing resources.
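Points 1 and 2 above reduce to an "ensure" pattern: check existence, compare spec, and only then act. A minimal sketch, with a plain map standing in for the API server and `ensureDeployment` as a hypothetical helper (not SDK API):

```go
package main

import "fmt"

// fakeCluster stands in for the API server: deployment name -> replicas.
type fakeCluster map[string]int

// ensureDeployment is idempotent: it creates the child resource if it is
// missing, updates it if its spec drifted from the parent CR, and
// otherwise does nothing, no matter how often reconcile is called.
func ensureDeployment(c fakeCluster, name string, desiredReplicas int) string {
	current, exists := c[name]
	switch {
	case !exists:
		c[name] = desiredReplicas
		return "created"
	case current != desiredReplicas:
		c[name] = desiredReplicas
		return "updated"
	default:
		return "unchanged"
	}
}

func main() {
	c := fakeCluster{}
	fmt.Println(ensureDeployment(c, "visitors-backend", 3)) // first reconcile: creates
	fmt.Println(ensureDeployment(c, "visitors-backend", 3)) // repeat call: no duplicate
	fmt.Println(ensureDeployment(c, "visitors-backend", 5)) // spec changed: updates
}
```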

Operator Impact

If the Operator incorrectly handles operations, they can negatively affect the performance of the entire cluster.

Test Operator

If the Operator has errors during testing, the same errors will appear when it runs after the image is built!

The process running the Operator may be outside of the cluster, but Kubernetes will treat it as it does any other controller.

Go to the root project directory:

```shell
## deploy CRD
kubectl apply -f deploy/crds/*_crd.yaml
## start operator in local mode
operator-sdk up local --namespace default
## deploy CR
kubectl apply -f deploy/crds/*_cr.yaml
```

The Operator SDK uses credentials from the kubectl configuration file to connect to the cluster and attach the Operator. The running process acts as if it were an Operator pod running inside of the cluster and writes logging information to standard output.

Chapter 8 Operator Lifecycle Manager

This chapter is fairly concept-heavy; it is worth reading multiple times. OLM git repo: https://github.com/operator-framework/operator-lifecycle-manager

Once you have written an Operator, it's time to turn your attention to its installation and management. As there are multiple steps involved in deploying an Operator, a management layer becomes necessary to facilitate the process. In short, OLM is the thing that manages Operators.

OLM's benefits extend beyond installation into Day 2 operations, including managing upgrades to existing Operators, providing a means to convey Operator stability through version channels, and the ability to aggregate multiple Operator hosting sources into a single interface. OLM ships with OpenShift out of the box, but vanilla K8s does not include it. OLM itself is also implemented via CRDs; on OpenShift, run oc get crd to see the related CRDs.

  1. ClusterServiceVersion You can think of a CSV as analogous to a Linux package, such as a Red Hat Package Manager (RPM) file.

Much like how a deployment describes the “pod template” for the pods it creates, a CSV contains a “deployment template” for the deployment of the Operator pod.

  2. CatalogSource A CatalogSource contains information for accessing a repository of Operators. OLM provides a utility API named packagemanifests for querying catalog sources, which provides a list of Operators and the catalogs in which they are found.

```shell
kubectl -n olm get packagemanifests
```
  3. Subscription End users create a subscription to install, and subsequently update, the Operators that OLM provides. A subscription is made to a channel, which is a stream of Operator versions, such as “stable” or “nightly.”

To continue with the earlier analogy to Linux packages, a subscription is equivalent to a command that installs a package, such as yum install.

  4. InstallPlan A subscription creates an InstallPlan, which describes the full list of resources that OLM will create to satisfy the CSV’s resource requirements.

  5. OperatorGroup An Operator belonging to an OperatorGroup will not react to custom resource changes in a namespace not indicated by the group.

Installing OLM

Use version v0.11.0; I am on k8s v1.13.2, and the latest versions are no longer compatible with it: https://github.com/operator-framework/operator-lifecycle-manager/releases

```shell
kubectl apply -f https://github.com/operator-framework/operator-lifecycle-manager/releases/download/0.11.0/crds.yaml
kubectl apply -f https://github.com/operator-framework/operator-lifecycle-manager/releases/download/0.11.0/olm.yaml
```

After applying, the CRDs for OLM are created and the OLM pods are up and running in the olm namespace. OLM can interact with OperatorHub.io, much like Helm with Helm Hub or Docker with Docker Hub. The book walks through an example of deploying the etcd Operator from OperatorHub.

The rest of the chapter is mainly about publishing your own Operator, which I don't need for now.

Chapter 9 Operator Philosophy

Let’s try to connect those tactics to the strategic ideas that underpin them to understand an existential question: what are Operators for?

An Operator reduces human intervention bugs by automating the regular chores that keep its application running. Operators: Kubernetes Application Reliability Engineering

This is inspiring: "You can build Operators that not only run and upgrade an application, but respond to errors or slowing performance."

Control loops in Kubernetes watch resources and react when they don’t match some desired state. Operators let you customize a control loop for resources that represent your application. The first Operator concerns are usually automatic deployment and self-service provisioning of the operand. Beyond that first level of the maturity model, an Operator should know its application’s critical state and how to repair it. The Operator can then be extended to observe key application metrics and act to tune, repair, or report on them.
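That control-loop idea can be reduced to a toy model: observe the gap between desired and actual state, and take one corrective step at a time until they converge. Purely illustrative, no Kubernetes API involved.

```go
package main

import "fmt"

// reconcile drives actual toward desired one step at a time, the way a
// controller's loop converges an operand toward its spec. (A toy model.)
func reconcile(desired, actual int) int {
	if actual < desired {
		return actual + 1 // e.g., start one more replica
	}
	if actual > desired {
		return actual - 1 // e.g., stop one replica
	}
	return actual // reality already matches the spec: do nothing
}

func main() {
	desired, actual := 3, 0
	for actual != desired {
		actual = reconcile(desired, actual)
		fmt.Println("actual replicas:", actual)
	}
}
```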

Site Reliability Engineering lists the four golden signals as latency, traffic, errors, and saturation.

Highly Successful Operators:

  1. An Operator should run as a single Kubernetes deployment.
  2. Operators should define new custom resource types on the cluster.
  3. Operators should use appropriate Kubernetes abstractions whenever possible.
  4. Operator termination should not affect the operand.
  5. Operators should be thoroughly tested, including chaos testing.

Appendix

Running an Operator as a Deployment Inside a Cluster

Please see my git repo for more details.

```shell
## build operator image
## go to project root directory
operator-sdk build image:tag
```

Then docker push the image to a registry, replace the image placeholder in the operator.yaml file, and apply the CR YAML.

The book's other two appendixes cover CRD validation and RBAC settings.
