Python for DevOps

A good description of daily life in DevOps:

One time I was in the ocean and a wave crashed on top of me and took my breath away as it pulled me deeper into the ocean. Just as I started to recover my breath, another wave dropped on top of me and extracted much of my remaining energy and pulled me even deeper into the ocean. Just as I started to recover, yet another wave crashed down on top of me. The more I would fight the waves and the ocean, the more energy I drained. I seriously wondered if I would die at that moment. I couldn’t breathe, my body ached and I was terrified I was going to drown.

Being close to death helped me focus on the only thing that could save me, which was conserving my energy and using the waves not fighting them.

Install and Configure

Usually python2 is pre-installed, so you need to install python3; you can refer to this blog post: Install Python 3.7 on centos 7. This installation will not disturb the pre-installed python2 (it is a dependency of some other packages).

install pip3

curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
## install pip3
python3 get-pip.py
## or install pip
python get-pip.py

then you can use pip3 to install other packages; plain pip still uses python2.

To make the python command execute python3, you can use an alias.

Installing packages using pip and virtual environments

Chapter 1 Introduction

Install ipython or ipython3, a powerful interactive shell: https://ipython.org/index.html#

pip install ipython
## or
pip3 install ipython

then run it as

ipython
## or
ipython3

Python variables use dynamic typing. In practice, this means a variable can be reassigned to values of different types or classes.
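A quick sketch of what reassignment across types looks like:

```python
x = 42          # starts as an int
x = "hello"     # rebound to a str
x = [1, 2, 3]   # rebound to a list
print(type(x))  # <class 'list'>
```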

Built-in Functions

  • print
  • range

Note: use spaces instead of tabs to indent.
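For example, range produces a sequence of integers, handy for loops:

```python
for i in range(3):
    print(i)  # prints 0, 1, 2 on separate lines

## range(start, stop, step) is lazy; wrap it in list() to materialize it
print(list(range(2, 10, 2)))  # [2, 4, 6, 8]
```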

Functions

Functions are objects, so you can put them in a list:

>>> def double(x): return x * 2
>>> def triple(x): return x * 3
>>> functions = [double, triple]
>>> for function in functions:
...     print(function(3))

Wrapping functions with decorators: some Python command line build tools use this pattern, for example the click package:

pip install click
'''An example of using the click package to develop a command line tool'''

import click

@click.command()
@click.argument('name')
def hello(name):
    '''Say hello to name'''
    print(f"Hello {name}")

if __name__ == '__main__':
    hello()

call it:

$ python simple_cli.py Sue
Hello Sue

Lambda functions work just like in Java, where you use a lambda to create a comparator for a priority queue.

sorted(items, key=lambda item: item[1])
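For instance, sorting a list of pairs by their second element (`items` here is a made-up example):

```python
items = [('a', 3), ('b', 1), ('c', 2)]

## the lambda picks the sort key out of each pair
print(sorted(items, key=lambda item: item[1]))  # [('b', 1), ('c', 2), ('a', 3)]
```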

RE package

The re module uses \ to introduce special characters for matching, for example \., \n, etc. To avoid confusion with regular string escape sequences, raw strings are recommended when defining regular expressions. Raw strings are prepended with an r before the first quotation mark.

Similar to grep:

import re

re.search(r'Rostam', cc_list)
re.search(r'Chr[a-z][a-z]', cc_list)
re.search(r'[A-Za-z]{6}', cc_list)
re.search(r'[A-Za-z]+@[a-z]+\.[a-z]+', cc_list)

Lazy Evaluation

Generators have a small memory footprint: they generate values as needed rather than all at once, so they do not take much memory.

Create a generator by using () instead of [] (as in a list comprehension):

gen_o_nums = (x for x in range(100))
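A quick sketch of consuming it lazily:

```python
gen_o_nums = (x for x in range(100))

## values are produced one at a time
print(next(gen_o_nums))  # 0
print(next(gen_o_nums))  # 1

## the rest can still be consumed without ever building a full list
print(sum(gen_o_nums))   # 2 + 3 + ... + 99 = 4949
```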

More IPYTHON features

Using IPython to run Unix shell commands: add ! before the command, though some common commands (such as ls) work without it because IPython defines default aliases for them:

In [2]: !ls -l

can assign to a variable:

In [6]: res = !df -h | head -n 7

## list format
In [7]: res
Out[7]:
['Filesystem Size Used Avail Use% Mounted on',
'/dev/mapper/rhel-root 241G 73G 169G 31% /',
'devtmpfs 3.9G 0 3.9G 0% /dev',
'tmpfs 3.9G 0 3.9G 0% /dev/shm',
'tmpfs 3.9G 403M 3.5G 11% /run',
'tmpfs 3.9G 0 3.9G 0% /sys/fs/cgroup',
'/dev/vda1 1014M 208M 807M 21% /boot']

In [8]: res.grep("dev")
Out[8]:
['/dev/mapper/rhel-root 241G 73G 169G 31% /',
'devtmpfs 3.9G 0 3.9G 0% /dev',
'tmpfs 3.9G 0 3.9G 0% /dev/shm',
'/dev/vda1 1014M 208M 807M 21% /boot']

magic commands:

## run the cell as bash
In [13]: %%bash
## write the cell contents into a file
In [14]: %%writefile test.sh

Make IPython import shell alias

#!/usr/bin/env python
import re
import os.path

c = get_config()
with open(os.path.expanduser('~/.bashrc')) as bashrc:
    for line in bashrc:
        if not line.startswith('alias'):
            continue
        parts = re.match(r'^alias (\w+)=([\'"]?)(.+)\2$', line.strip())
        if not parts:
            continue
        source, _, target = parts.groups()
        c.AliasManager.user_aliases.append((source, target))

Drop this code into ~/.ipython/profile_default/ipython_config.py, the configuration file for IPython (analogous to .bashrc and .vimrc).

How to import shell functions from .bashrc? Or write Python functions instead.

Chapter 2 Automating Text and Files

In the DevOps world, you are continually parsing, searching, and changing the text in files, whether it’s searching application logs or propagating configuration files.

read regular file:

## don't need to close explicitly
with open("/root/DS/tmp.txt", "r") as handler:
    data = handler.read()

## one char
data[0]
## file size
len(data)

or use

## this will split the file into lines by `\n`
with open("/root/DS/tmp.txt", "r") as handler:
    data = handler.readlines()
## i-th line
data[i]

Different operating systems use different escaped characters to represent line-endings. Unix systems use \n and Windows systems use \r\n. Python converts these to \n when you open a file as text. If you are opening a binary file, such as a jpeg image, you are likely to corrupt the data by this conversion if you open it as text. You can, however, read binary files by appending a b to mode:

file_path = 'bookofdreamsghos00lang.pdf'
with open(file_path, 'rb') as open_file:
    btext = open_file.read()

write file:

content = '''export a=123
export b=456
'''
with open("/root/DS/.envrc", "w") as handler:
    handler.write(content)

The open function creates the file if it does not already exist and overwrites it if it does. If you want to append instead, use the a mode rather than w. For binary files, use wb or ab.
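A small sketch of append mode (using a temporary file here so the example is self-contained; the path and contents are made up):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), ".envrc")

## "w" creates or overwrites the file
with open(path, "w") as handler:
    handler.write("export a=123\n")

## "a" appends instead of overwriting
with open(path, "a") as handler:
    handler.write("export c=789\n")

with open(path) as handler:
    print(handler.read())
```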

JSON

import json

## json.load() deserializes fp (a .read()-supporting text file or binary file
## containing a JSON document) to a Python object
with open('xx.json', 'r') as handler:
    data = json.load(handler)

## json.loads() does the same for other objects: it deserializes s (a str,
## bytes or bytearray instance containing a JSON document) to a Python object

## in IPython, the %pprint magic toggles pretty printing, so printing data looks good
## or pretty-print explicitly:
print(json.dumps(data, indent=2))

## update
data["workerNodeHosts"][0]["name"] = "localhost"

## write file
with open('xx.json', 'w') as handler:
    json.dump(data, handler, indent=2)

## the same distinction as load() and loads():
## json.dump() writes to a file, json.dumps() returns a string

Actually, you can also use the data pretty printer module:

import pprint

pprint.pprint(data)

YAML

The most commonly used library for parsing YAML files in Python is PyYAML. It is not in the Python Standard Library, but you can install it using pip:

pip install pyyaml

import yaml
## read
with open("xx.yml", "r") as handler:
    data = yaml.safe_load(handler)

## Python loads the data as a dict, so you can edit it

print(yaml.dump(data, indent=2))
## write
with open("xx.yml", "w") as handler:
    yaml.dump(data, handler, indent=2)

XML

Historically, many web systems used XML to transport data. One use is for RSS feeds. RSS (Really Simple Syndication) feeds are used to track and notify users of updates to websites. These feeds have been used to track the publication of articles from various sources. RSS uses XML formatted pages. Python offers the xml library for dealing with XML documents. It maps the XML document’s hierarchical structure to a tree-like data structure.

import xml.etree.ElementTree as ET

tree = ET.parse('/tmp/test.xml')
root = tree.getroot()
for child in root:
    print(child.tag, child.attrib)

CSV

data stored as comma-separated values.

In [16]: import csv

In [17]: file_path = '/tmp/user.csv'

In [18]: with open(file_path, newline='') as handler:
    ...:     off_reader = csv.reader(handler, delimiter=',')
    ...:     for _ in range(5):
    ...:         print(next(off_reader))
    ...:

The pandas package is a mainstay of data science work. Pandas has many more methods for analyzing and manipulating table-like data, and there are many books on its use. You should be aware that it is available if you need to do data analysis.
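As a small taste, assuming pandas is installed (pip install pandas) — the CSV content below is made up and held in memory so the sketch is self-contained:

```python
import io

import pandas as pd

## a tiny in-memory CSV standing in for a file like /tmp/user.csv
csv_data = io.StringIO("name,age\nsue,35\nbob,42\n")

df = pd.read_csv(csv_data)
print(df.head())   # first rows, like the csv.reader loop above
print(df.shape)    # (2, 2): two rows, two columns
```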

Search Text

One widely used format is the Common Log Format (CLF). A variety of log analysis tools can understand this format:

<IP Address> <Client Id> <User Id> <Time> <Request> <Status> <Size>
127.0.0.1 - swills [13/Nov/2019:14:43:30 -0800] "GET /assets/234 HTTP

Here are some examples:

line = '127.0.0.1 - swills [13/Nov/2019:14:43:30 -0800] "GET /assets/234 HTTP/1.0" 200 2326'
## use name groups
r = r'(?P<IP>\d+\.\d+\.\d+\.\d+) - (?P<User>\w+) \[(?P<Time>\d\d/\w{3}/\d{4}:\d{2}:\d{2}:\d{2} [-+]\d{4})\] (?P<Request>".+")'
m = re.search(r, line)

In [11]: m.group('IP')
Out[11]: '127.0.0.1'

In [12]: m.group('Time')
Out[12]: '13/Nov/2019:14:43:30 -0800'

In [13]: m.group('User')
Out[13]: 'swills'

In [14]: m.group('Request')
Out[14]: '"GET /assets/234 HTTP/1.0"'

Note: Python automatically allocates and frees memory. The Python garbage collector can be controlled using the gc package, though this is rarely needed.

For large files: if the file contains data that can be processed one line at a time, the task is easy with Python. You can read one line at a time, process the line, and then move to the next. Lines no longer referenced are removed from memory automatically by Python’s garbage collector, freeing up memory.

In [23]: with open('big-data.txt', 'r') as source_file:
    ...:     with open('big-data-corrected.txt', 'w') as target_file:
    ...:         for line in source_file:
    ...:             target_file.write(line)

Chapter 3 Command Line

Python offers tools for interacting with systems and shells. You should become familiar with the sys, os, and subprocess modules, as all of them are essential tools.

import sys

## little or big endian
sys.byteorder
## python object size
sys.getsizeof(1)
## platform
sys.platform
## python version
sys.version_info.major
sys.version_info.minor

The most common usage of the os module is to get settings from environment variables.

import os

## pwd and cd
os.getcwd()
os.chdir('/tmp')
## get and set env var
os.environ.get('HOME')
os.environ['HOME'] = '/tmp'
## login user
os.getlogin()

With subprocess you can run your favorite shell command or other command line software and collect its output from within Python. For the majority of use cases, you should use the subprocess.run function to spawn processes.

import subprocess

## universal_newlines=True (aliased as text=True since Python 3.7) decodes output to str
sub = subprocess.run(['ls', '-ltr'], capture_output=True, universal_newlines=True)
sub.stdout
sub.stderr
print(sub.stdout)
## with check=True, an exception is raised on a non-zero exit code
sub = subprocess.run(['ls', '/non'], capture_output=True, universal_newlines=True, check=True)

Creating Command Line Tools

Usually you invoke a Python script by:

python xx.py

or you can eliminate typing python by adding #!/usr/bin/env python (or python3) as the first line of the script, then chmod the script to be executable:

./xx.py

The simplest and most basic way to process arguments from the command line is to use the argv attribute of the sys module:

#!/usr/bin/env python
"""
Simple command line tool using sys.argv
"""
import sys

if __name__ == '__main__':
    ## sys.argv is a list; sys.argv[0] is the script name
    print(sys.argv[0])

    if '--help' in sys.argv:
        help_message = f"Usage: {sys.argv[0]} ..."
        print(help_message)
        sys.exit()
    ## you can get an argument's index; its value is the next element
    if '--namespace' in sys.argv:
        idx = sys.argv.index('--namespace')
        namespace = sys.argv[idx + 1]

This is not enough; we need an argument parser! Luckily there are modules and packages designed for the creation of command line tools. These packages provide frameworks to design the user interface for your module when running in a shell. Three popular solutions are argparse, click, and fire. All three include ways to define required arguments and optional flags, and means to display help documentation. The first, argparse, is part of the Python standard library, and the other two are third-party packages that need to be installed separately (using pip).

argparse

There is a dedicated tutorial for this; I skimmed it briefly. It does take more work on your part, but you get lots of control.

Automatically generates help and usage messages and issues errors when users give the program invalid arguments.

#!/usr/bin/env python
"""
Command line tool using argparse
"""
import argparse

parser = argparse.ArgumentParser(description='Process some integers.')
parser.add_argument('integers', metavar='N', type=int, nargs='+',
                    help='an integer for the accumulator')

## If the name begins with a dash, it is treated as an optional flag argument;
## otherwise it is treated as a positional argument.
parser.add_argument('--sum', dest='accumulate', action='store_const',
                    const=sum, default=max,
                    help='sum the integers (default: find the max)')

args = parser.parse_args()
print(args.accumulate(args.integers))

You can also define sub-commands, like git stash ....
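A minimal sketch of sub-commands using argparse's add_subparsers (the command names here are invented for illustration):

```python
import argparse

parser = argparse.ArgumentParser(description='Tool with sub-commands')
subparsers = parser.add_subparsers(dest='command')

## hypothetical "push" sub-command with its own flag
push = subparsers.add_parser('push', help='push something')
push.add_argument('--force', action='store_true')

## hypothetical "pull" sub-command
subparsers.add_parser('pull', help='pull something')

## normally you'd call parser.parse_args() on the real sys.argv
args = parser.parse_args(['push', '--force'])
print(args.command, args.force)  # push True
```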

click

It uses Python Function Decorators to bind the command line interface directly with your functions.

Python decorators are a special syntax for functions that take other functions as arguments. Python functions are objects, so any function can take a function as an argument. The decorator syntax provides a clean and easy way to do this.
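A minimal hand-rolled sketch of the pattern (the names logged and add are illustrative, not from click):

```python
def logged(func):
    ## the wrapper adds behavior around the original function
    def wrapper(*args, **kwargs):
        print(f"calling {func.__name__}")
        return func(*args, **kwargs)
    return wrapper

@logged
def add(a, b):
    return a + b

result = add(1, 2)  # prints "calling add"
print(result)       # 3
```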

#!/usr/bin/env python
"""
Simple Click example
"""
import click

@click.command()
@click.option('--greeting', default='Hiya', help='How do you want to greet?')
@click.option('--name', default='Tammy', help='Who do you want to greet?')
def greet(greeting, name):
    print(f"{greeting} {name}")


if __name__ == '__main__':
    greet()

Please refer to the click documentation.

fire

See the fire documentation.

for example:

#!/usr/bin/env python
"""
Simple Fire example
"""
import fire

def greet(greeting='Hiya', name='Tammy'):
    print(f"{greeting} {name}")

def goodbye(goodbye='Bye', name='Tammy'):
    print(f"{goodbye} {name}")

if __name__ == '__main__':
    fire.Fire()

An exciting feature of fire is the ability to enter an interactive mode easily. By using the --interactive flag, fire opens an IPython shell with the object and functions of your script available:

./fire_example.py <command> <args> -- --interactive

Overall, we recommend click for most use cases. It balances ease and control. In the case of complex interfaces where you want to separate the UI code from the business logic, argparse is the way to go. And if you need to quickly add a command line interface to code that does not have one, fire is right for you.

Implementing plugins

Once you’ve implemented your application’s command line user interface, you might want to consider a plugin system. Plugins are pieces of code supplied by the user of your program to extend its functionality.

A key part of any plugin system is plugin discovery: your program needs to know which plugins are available to load and run. Create a file named add_plugins.py:

#!/usr/bin/env python
import fire
import pkgutil
import importlib

def find_and_run_plugins(plugin_prefix):
    plugins = {}

    # Discover and load plugins
    print(f"Discovering plugins with prefix: {plugin_prefix}")
    # pkgutil.iter_modules returns all modules available on the current sys.path
    for _, name, _ in pkgutil.iter_modules():
        # Check if the module uses our plugin prefix
        if name.startswith(plugin_prefix):
            # Use importlib to load the module, saving it in a dict for later use.
            module = importlib.import_module(name)
            plugins[name] = module

    # Run plugins
    for name, module in plugins.items():
        print(f"Running plugin {name}")
        # Call the run method on the plugin.
        module.run()

if __name__ == '__main__':
    fire.Fire()

Then you can write plugin modules, for example module1.py, and put them in a directory on sys.path. If you run ./add_plugins.py find_and_run_plugins module, it will search for, load, and run the module1.py plugin.
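A minimal sketch of such a plugin (the module name and message are made up; the only contract is that it exposes a run() function):

```python
"""module1.py -- a hypothetical plugin for add_plugins.py"""

def run():
    print("module1 plugin is running")
```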

Turbocharging Python with Command Line Tools

Here are the raw ingredients that will be used to make several solutions:

  • Click Framework
  • Python CUDA Framework
  • Numba Framework
  • Scikit-learn Machine Learning Framework

These are tools for speeding up performance.

Chapter 4 Useful Linux Utilities

This chapter will go through some common patterns in the shell and will include some useful Python commands that should enhance the ability to interact with a machine.

As a seasoned performance engineer once said, it depends on what is measured and how.

disk utility

If we had to work in an isolated environment, with a server that doesn’t have access to the internet, or one we don’t control and therefore can’t install packages on, the dd tool can help.

This will measure the throughput of the new device; in this example, the throughput is 1.4 GB/s:

dd if=/dev/zero of=<new device> count=10 bs=100M

10+0 records in
10+0 records out
10506731520 bytes (11 GB) copied, 3.12099 s, 1.4 GB/s

How to get IOPS, updated every 1 second:

iostat -d <device> 1

Another common test tool is fio; you may need to install this package. It can help clarify the performance behavior of a device in a read-heavy or write-heavy environment (and can even adjust the percentages of reads vs. writes).

network utility

  • ssh tunneling (ssh port forwarding)

For example, the server hello.com can only be accessed by ssh, and its port 3345 is not exposed. Let’s forward hello.com:3345 to a local port on my machine: https://www.youtube.com/watch?v=AtuAdk4MwWw

ssh -f -L 12333:hello.com:3345 root@hello.com -N

  • -f means run in the background
  • -L is the forwarding rule
  • -N means don’t open a remote shell
  • root@hello.com is the username and address of the server

This technique can also be used to bypass a firewall for some ports. Then we can access the server via localhost:12333. Question: if the port is already blocked by the firewall, how does ssh manage to connect?
