Python3 Quick Start

Posted on 2019-08-05 Edited on 2021-12-27 In Python

Python build-ins modules: https://docs.python.org/3/library/functions.html#open

Template

With Docstring, you can use help() command to get module information, for example:

## "words" is the script name: words.py
import words
help(words)
## fetch_words is a function inside words.py script
help(words.fetch_words)

Acutally help() works on every object.

This is a python script named words.py with demonstration for Docstring:

"""Retrieve and print words from a URL.

Usage:

   python3 words.py <URL>
"""

import sys
from urllib.request import urlopen


def fetch_words(url):
    """Fetch a list of words from a URL.

    Args:
        url: The URL of a UTF-8 text document.

    Returns:
        A list of strings containing the words from
        the document.
    """
    ## can use with-block here
    story = urlopen(url)
    story_words = []
    for line in story:
        line_words = line.decode('utf8').split()
        for word in line_words:
            story_words.append(word)
    ## file like object, still need to close
    story.close()
    return story_words


def print_items(items):
    """Print items one per line.

    Args:
        An iterable series of printable items.
    """

    for item in items:
        print(item)


def main(url):
    """Print each word from a text document from at a URL.

    Args:
        url: The URL of a UTF-8 text document.
    """

    words = fetch_words(url)
    print_items(words)


if __name__ == '__main__':
    ## pass parameter from command line
    ## argv[0] is the module name
    main(sys.argv[1])

On Linux, if you add shebang #!/usr/bin/env python3 at the top of the script, then you can run it by:

1 2	chmod +x words.py ./words.py http://sixty-north.com/c/t.txt

后面会专门学习一下python script方面的知识, see my blog <<Python3 Scripting>>.

Exception

try statements do not create a new scope! the variables in try block can be seen from outside try block:

import sys

def convert(s):
    try:
        number = ''
        for token in s:
            number += DIGIT_MAP[token]
        return int(number)
    except (KeyError, TypeError) as e:
        ## !r is a shortcut to call the repr of the value supplied.
        print(f"Conversion error: {e!r}", file=sys.stderr)
        ## re-raising exception
        raise
    else:
        print("I will be called only if exception didn't occur!")
    finally:
        ## will execute no matter normal or not
        pass


def sqrt(x):
    """Compute square roots using the method 
    of Heron of Alexandria.

    Args:
        x: The number for which the square root 
            is to be computed.

    Returns:
        The square root of x.

      Raises:
          ValueError: If x is negative.
    """
    if x < 0:
        ## upper level can catch this exception
        raise ValueError(
            "Cannot compute square root of "
            f"negative number {x}")

    guess = x
    i = 0
    while guess * guess != x and i < 20:
        guess = (guess + x / guess) / 2.0
        i += 1
    return guess

Common exception types:

indexError: index out of boundary
keyError: mapping
TypeError: usually avoiding check this, increase function usability.
valueError: int(“hello”)
OSError: os module open file

Modularity

Import module and attribute, 掌握import的语法, module normally is a single python source file, e.g. hello.py. When import, it is represented by module objects.

## import custom module
import hello
## Hello is a class in hello.py
from hello import Hello

## system modules
import math
math.factorial(10)

from math import factorial
factorial(10)

from math import factorial as fac
fac(10)

# other import forms
from math import (factorial, exp)
from math import *

In the interactive python console, use help to explore modules:

# you can search for modules, keywords, symbols, topics
help()
>>> math
>>> urllib.request

# or check math module by first import it
import math
help(math)
## request is a nested module after urllib
import urllib.request
## or
from urllib import request as req
help(urllib.request)

You can check the attributes of an object:

## show all methods
dir(math)
dir(time.ctime)
## show type
type(math)
type(time.ctime)

Commonly used modules:

requests: simple http library
urllib: urllib is a package that collects several modules for working with URLs
sys: access argv, stdin, stdout, etc.
time: principally for working with unix time stamps
datetime: UTC datetime, Unix timestamp, timedetla
pprint: pretty print
os: interface to system services
itertools: iteration processing
contextlib: use with with statement
typing: type hints, built-in after python 3.5
functools: functools.wraps() 用来 copy original function metadata

最近在做project的时候遇到几个新的modules:

threading: threading operation
subprocess: spawn new processes

time vs datetime modules: https://stackoverflow.com/questions/7479777/difference-between-python-datetime-vs-time-modules the time module is principally for working with unix time stamps; expressed as a floating point number taken to be seconds since the unix epoch. the datetime module can support many of the same operations, but provides a more object oriented set of types, and also has some limited support for time zones.

Function

Function name in python uses lowercase and - as delimiter. def keywork bind a function to a name, function in Python is treated as object.

Extended arguments, for example: *args(act as tuple), **kwargs(act as dict). This is called parameters packing, these applies to all types of callables, for example lambda.

The parameter type order must follow: regular positional -> *args -> keyword -> **kwargs, for example:

def print_args(arg1, arg2, *args, kwarg1, kwarg2, **kwargs):
    print(arg1)
    print(arg2)
    print(args)
    print(kwarg1)
    print(kwarg2)
    print(kwargs)

print_args(1, 2, 3, 4, 5, kwarg1 = 6, kwarg2 = 7, kwarg3 = 8, kwarg4 = 9)

The parameters after * must be passed by key word:

def function_name(para1, *, para2 = "hello", para3):
    pass

function_name(1, para2 = "world", para3 = 567)

Correspondingly, we have extended call syntax, unpacking when pass the parameters to function call, * is for tuple or list, ** is for dict.

Unpacking parameters:

def fun(a, b, c, d): 
    print(a, b, c, d) 

my_list = [1, 2, 3, 4] 
  
# Unpacking list into four arguments 
# 在变量前加单星号表示将元组（列表、集合）拆分为单个元素
# 双星号同上，区别是目标为字典，字典前加单星号的话可以得到“键”
fun(*my_list)

Positional-only arguments:

1
2
3

## no kwarg can be used here
def function_name(x, /):
    print(x)

If no return, then will implicitly return None.

def function_name(para1 = "hello", para2 = 34):
    ## rebind the global variable to local
    global count
    ## return None
    return

Notice that always use immutable value for default value!! Default value的赋值会在最初执行函数的时候运行一次，之后调用不会再重新赋值，看样子是一直存在内存里了。

def append_word(org=[]):
    org.append("!")
    return org
## if you call multiple times with default value, the "!" get accumulated
print(append_word()) # ["!"]
print(append_word()) # ["!", "!"]
print(append_word()) # ["!", "!", "!"]

Another example:

import time
def show_time(t = time.ctime()):
    print(t)

## the print will not get updated
show_time()
show_time()

Function is also an object, can be used as parameters:

def print_card(words):
    banner = "+" + "-" * (len(words) + 2) + "+"
    output = "| " + words + " |"
    lines = [banner, output, banner]
    print("\n".join(lines))
    print()

def print_words(printer, words):
    printer(words)

## print_card passed as parameter
print_words(print_card, "hello, world!")

*args and **kwargs 可以用来argument forwarding:

1
2
3

def trace(f, *args, **kwargs):
    res = f(*args, **kwargs)
    return res

Special Functions

Detect whether a module is run as a script or imported as a module.

1
2
3

# only execute function when it is run as a script
if __name__ == "__main__":
    main()

Functional Programming

The special function __call__, 使用后class object可以当做function来调用，__call__就相当于定义了一个调用接口，并且加上其他数据结构，可以实现caching的效果 stateful.

You can use timeit module timeit method to measure exection time.

## resolver.py file
import socket

class Resolver:
    def __init__(self):
        self._cache = {}

    def __call__(self, host):
        if host not in self._cache:
            self._cache[host] = socket.gethostbyname(host)
        return self._cache[host]

    def clear(self):
        self._cache.clear()

    def has_host(self, host):
        return host in self._cache

Run in REPL:

from resolver import Resolver
res = Resolver()
## just like call function
res("www.google.com")
## call second time, the execution time reduces a lot
res("www.google.com")

How to know object is callable, use callable() function to test.

Lambda

Create anonymous callable objects, syntax: lambda [args]: expr, the args are separated by commas or empty, the body is a single expression.

1
2
3

## the key is assigned a callable function
## similar to Java
sorted(name_list, key=lambda name: name.split()[-1])

Functional-style Tools

map(): maps function to a sequence, lazy implementation, return iterator. filter(): remove elements from sequence which don’t meet some criteria, lazily. functools.reduce(): 2-arguments function with a sequence, reduce the sequence to one result.

Local Function

functions defined inside function.

## 实现了和前面lambda类似的功能
def sort_by_last_letter(strings):
    def last_letter(s):
        return s[-1]
    return sorted(strings, key=last_letter)
sort_by_last_letter(["hesd", "sddn", "pplea"])

Name resulation in the scope is checked by LEGB rule: Local -> Enclosing (the containing function) -> Global -> Build-in:

g = "global"
def outer(p = "param"):
    l = "local"
    def inner():
        print(g, p, l)
    inner()

## global param local
outer()
## this call is wrong!
outer.inner()

Local function usage cases:

define one-off functions close to their use.
code organization and readability.
similar to lambda but more general, can have mutliple expressions.

Local function can be returned, working with Closure(在返回local function时，对其需要的资源进行保留，防止被垃圾回收, keep enclosing-scope objects alive):

def enclosing():
    x = "closed over"
    def local_func():
        print(x)
    return local_func
lf = enclosing()
## call it
lf()

## check closure env
## (<cell at 0x106739e88: str object at 0x1067b5ab0>,)
lf.__closure__

Function factories, return other functions, returned function use both their own arguments as well as arguments to the factory.

def raise_to(exp):
    def raise_to_exp(x):
        return pow(x, exp)
    return raise_to_exp

square = raise_to(2)
square(9)

nonlocal is like global keyword, to name binding in enclosing scope. 有点类似于local function使用的全局变量，但只针对同一个local function.

Function Decorators

Allow you to modify existing functions or methods without changing their definition (在原函数中加入上下文). Decorators can be:

Local function 以及 closure 结合使用.
class, the class must implement __call__(), all class define variables are gave to decorated function.
instance of a class, can control decorator behavior via instance variable.

You can think decorator as a function accepting a function and returning a function (callable).

这里举一个local function作为decorator的例子，其他类型decorator暂时没用到:

## f: the target decorated function
def escape_unicode(f):
    ## local function wrap
    def wrap(*args, **kwargs):
        ## in below, f is `city_name`
        res = f(*args, **kwargs)
        return ascii(res)
    ## wrap uses a closure to access f after escape_unicode returns
    return wrap

## original function
def city_name():
    return "Tomの"

## the city_name pass to decorator
@escape_unicode
def city_name():
    return "Tomの"

## now unicode is translated to ascii
print(city_name())

You can have multiple decorators, act in order 3->2->1:

@decorator1
@decorator2
@decorator3
def function():
    pass

Keep original function metadata, for example __name__ and __doc__, using functooks.wraps():

import functools

def noop(f):
    ## 
    @functools.wraps(f)
    def noop_wrapper():
        return f()
    return noop_wrapper

@noop
def hello():
    """Print a dummy message
    """
    print("hello. world!")

## check metadata is there
help(hello)
hello.__name__
hello.__doc__

Parameterized decorator的一个用途是检查传入原函数的参数，比如这里检查第二个参数不能为负数:

def check_non_negative(index):
     ## real decorating part                                      
     def validator(f):                                                       
         def wrap(*args):                                                    
             if args[index] < 0:                                             
                 raise ValueError(                                           
                     'Argument {} must be non-negative.'.format(index))      
             return f(*args)                                                 
         return wrap                                                         
     return validator                                                        

## 这里实际上是调用了check_non_negative，返回的结果作为decorator 
## 和上面的用法不一样了                                                     
@check_non_negative(1)                                                      
def create_list(value, size):                                               
    return [value] * size

## good
create_list('a', 3) 
## bad
create_list('a', -4)

Basic

Unlike other programming languages, Python has no command for declaring a variable, python is dynamic type.

# explicit conversion
int("234")
# toward 0 cutting
int("-342.134")
# 2 is the base
int("100", 2)
float("34.56")

# special
float("nan")
float("inf")
float("-inf")

None 
True, False
# True
bool(3.4)
bool("False")
# False
bool(0)
bool([])
bool(())
bool("")

Operators

Python will not perform implicit type conversion, for example "123" + 56 is wrong. Exception is in if and while condition.

Notice that == vs is when compare strings, == compare the value but is compares the identity equality, you can check the unique number by id(). And comparsion by value can be controlled programatically.

The logic operators are and, or and not. 这里注意它们会返回最后eval的值，可以利用这个特点:

## 999
"hello" and 999
### "world"
0 or 'world'

Check if an object is None using is operator. Function parameters and return are transferred using pass-by-object-reference.

Notice that sequence of the same type also support comparison, just like string comparison, item by item from left to right

1 2	(3, 99, 5, 2) < (5, 7, 3) [5, 3, 1] > [1, 9, 2]

Control Flows

Python does not have switch statement, there are several ways to mimic it: Python switch replacements

## condition
if x > 100 and x <= 999:
    pass
elif x >= 34 or x < -23:
    pass
elif not x:
    pass
else:
    pass

## ternary
res = "big" if x >= 100 else "small"

## loop
while True:
    pass

for item in iterable:
    pass

break/continue

String

Unicode characters.

Python does not have a character data type, a single character is simply a string with a length of 1. Square brackets can be used to access elements of the string or slice string.

Escape will work on both " and '.

The same as Java, string in Python is immutable.

## raw string, no escape
x = r"\n\r\x 234\n\r"
type(x[2])

## list str methods
help(str) ## have definition
dir(str)

## string length
len(str)
## more efficient than "+"
items = "+".join(["hello", "world"])
items.split("+")
"   abc   ".strip()

## _ is the dummy var that will not be used
up, _, down = "good:bad".partition(":")

## operator
"123" + "456"
"123" * 5

Python f-Strings is better and concise then format():

a = 23
b = "apple"
print(f"I have {a} {b}s")
## others
print("{1} and {0}".format(a, b))
print("%d and %s" % (a, b))

# Example of accessing the attributes of an object
import os
file = "/etc"
info = os.stat(file)
print("file {0} uid {1.st_uid}, size {1.st_size}".format(file, info))

# Example of specifying field width and precision
# Print table of powers of two and their square roots
import math
x = 1
for i in range(10):
    x = x * 2
    y = math.sqrt(x)
    print("{0:4}{1:10}{2:10.3f}".format(i, x, y))
    # print("%4d%10d%10.3f" % (i, x, y))

Bytes

In python3, Strings are represented by sequences of unicodes, but textual data in Linux is a sequence of bytes, we need to use encode() and decode() to convert python string to/from bytes.

You get byte object from HTTP request, need to convert to str to use.

x = "hello world"
type(x)
##convert to byte stream
data = x.encode("utf-8")
## convert back
x = data.decode("utf-8")

## byte string
b = b"hello world"
type(b)

List

## can have comma at end
a = [1, 2, "abc", 34, 76.887, "jark", -84.124]
a = []
## slice
a[1: -1: 3]
a[: 5: 3]
a[2:]
## reverse the sequence
a[::-1]
## remove element from list
def a[3]

a.append(998)
## join list, not append, they are different!
a += [1, 2, 3]
a.append([1, 2, 3])

a.pop()
a.index("abc")
a.count(34)
a.remove("abc")
a.insert(3, "apple")
a.reverse()
## reverse sort
a.sort(reverse = True)
## sort by length of each object
a.sort(key=len)

## copy the list, but they are all shallow copy!!
a[:]
a.copy()
list(a)

除了list自带的sort and reverse, out-of-place functions: sorted(), reversed() can also be used, they create a new list, reversed() will return a reversed iterator.

Dict

d = dict()
d = {}
## add or update
d["one"] = 34
del d["one"]

## key can be string, number and bool
d = {"k1": "v1", "k2", 123}
d = dict(a="343", b="uuys", c=123)
## can even be generated by tuple
a = [("sure", 643), (98, "341"), ("name", "monkey")]
d = dict(a)

The copy of dict is shallow.

1 2	d.copy() dict(d)

merge dict:

1 2	## if keys are overlapped, the value will be updated by the merged one d.update({"away": 998})

iterate dict via foreach loop:

1
2
3

for k in d.keys()
for v in d.values()
for k, v in d.items()

Use in and not in to check the existence.

Set

Immutable collection with unique immutable objects.

## s = () is tuple!
s = set()
s = {3, 676, 1, 34, 89}
type(s)

s.add(99)
s.update({5, 232, 89, -45})
s.remove(3)

Use in and not in to check the existence.

The copy of dict is shallow.

1 2	s.copy() set(d)

Set 有很多代数运算法则可以使用:

s.union(t)
s.intersection(t)
s.difference(t)
s.symmetric_difference(t)
s.issubset(t)
s.issuperset(t)
s.isdisjoint(t)

Tuple

Tuples are unchangeable, or immutable as it also is called.

## useless, because immutable
t1 = ()

t1 = ("apple", "banana", "cherry")
## create one item tuple, must append comma!
t2= ("apple",)
type(t2)

## this is also tuple, used in function return
t2 = 1, 2, 4, "ok"
## for example
def min_max(items):
    return min(items), max(items)

## tuple unpacking
lower, upper = min_max([2, 3, 7, 4, 34])
## can be nested unpacking
(a, (b, c)) = (1, (2, 3))
## swap value
a, b = b, a

Other operations:

t3 = ("hello", 10.23, 99)
len(t3)
t3 += ("world!")
t3 * 2

Range

1
2
3

range(stop)
range(start, stop)
range(start, stop, step)

Used usually for loop counter:

1 2	for i in range(10): pass

Other usages, for example, generate a list:

1	list(range(0, 10, 2))

Enumerate

If you want to have index pair with the item, use enumerate():

1
2
3

t = [3, 35, 546, 76, 123]
for i, v in enumerate(t):
    print(f"i = {i}, v = {v}")

Iteration and Iterables

Comprehensions with filtering

## list comprehension
[expr(item) for item in iterable if filtering(item)]
## set comprehension, no meaningful order
{expr(item) for item in iterable if filtering(item)}
## dict
{key_expr(item): val_expr(item) for item in iterable if filtering(item)}

## can also be mixed and nested
## this will create a list with 3 identical dict that value is None
## _ here is just for count
[{letter: None for letter in "ABCDEFG"} for _ in "123" if int(_) % 2 != 0]

## y is nested
## x is outer loop
[(x, y) for x in range(3) for y in range(5)]

Iterable can be passed to iter() to produce a iterator. Iterator can be passed to next() to get the next value in the sequence.

iterable = [1, 2, 3, 4, 5, 6, 7]
iterator = iter(iterable)
next(iterator)
next(iterator)

Generator function, stateful, laziness with yield:

def gen():
    yield 0
    a = 1
    b = 2
    while True:
        yield a
        a, b = b, a + b

for i in gen():
    print(i)
    if i > 1000:
        break

Generator expression, can save big memory than list comprehension:

(expr(item) for item in iterable if filtering(item))
## if used as function parameter, no parenthesis is needed
## 多余的()可以省掉
sum(x * x for x in range(10000000) if x % 2 == 0)

Generator is only single use object, 用完就没了，需要重新造一个。

There are several aggregation functions: bool: any(), all(); sync unlimited number of iterables zip():

1 2	for i, j in zip([1,2,3],[4,5,6]): print(f"output: {i}, {j}")

Class

Python does not have public, private, protected key word, everything is public.

Polymorphism is implemented by late binding in Python, 并不是和Java 通过继承实现多态，Python中你可以对一个Object 调用任意method，只要它有这个method就行，用的时候才会检查。并且继承在Python中主要用来分享共用的方法，当然也可以来多态, 但是继承在python中用得不多。

class Parent:
    """ A Parent class """

    ## no number() in this class, but child class has
    def get_treble_number(self):
        """ description """
        ## editor会报错，但构造parent()不会，除非你调用这个方法才会出错
        return self.get_number() * 3

## inheritance
class Child(Parent):
    """ A Child class """

    ## class variable, shared among all instances
    class_variable = 0
    ## initializer not quite a constructor
    ## self is similar to this in Java
    def __init__(self, number):
        """ description """

        ## _ prefix variable is class property, treated as private, although you can access it
        self._number = number
        ## still needs self to call method
        self._lst = self._gen_list()
        Child.class_variable += 1

    def _gen_list(self):
         """ description """
        return [None] + [item for item in range(1, 10)]

    def get_number(self):
        """ description """

        return self._number

    def get_double_number(self):
         """ description """

        ## method local variable
        double = self._number * 2
        return double

Parent.__doc__ can be used to access doc string.

注意，class method的第一个parameter self 可以是任意其他的名字，就是个标记而已，比如:

1
2
3

class Test:
    def __init__(other, number):
        other._number = number

File I/O and Resource Management

Check default encoding, if not specified in open(), will use this default encoding format.

1 2	import sys sys.getdefaultencoding()

Open file with options, for example:

1
2
3

## mode can be any combination of `crwa`|`tb`
## r+ read/write
f = open('words.txt', mode = 'rwt', encoding = 'utf-8')

Some useful methods after open the file:

## read all return string
f.read()
## read one line
f.readline()
## read all lines return a list
f.readlines()

## use print, f is the file descriptor
print("wirte this line", file = f)
## write lines into file
f.writelines(["first\n", "second\n"])
## rewind to beginning
f.seek(0)
## close file after use, to flush updates to disk
f.close(0)

For reading file, you can also use loop:

1
2
3

for line in f:
    ## will not print \n, or you can use print(line.strip())
    sys.stdout.wirte(line)

Use with-block to auto close the resource (or you can use finally block)，不仅仅是用在file上，比如网络上的读写也可以，它们背后的实现都遵循了同样的规则，所以可以使用with-block:

1 2	with open(...) as f: pass

注意这个as 是可以省略的，在threading module的lock使用中，就没有as.

Data Structure

这里主要是和Java 对比，一些常用的数据结构，比如Stack, Queue, Deque, priorityQueue, Map, etc.

Queue

https://www.geeksforgeeks.org/queue-in-python/ There are 3 ways to implement queue in python:

list, quite slow by append() and pop(0), need shift all elements.
collections module import deque, can be used as queue.
queue module import Queue, A maxsize of zero 0 means a infinite queue. (this can be a synchronzied queue for thread programming)

Deque

see Queue section:

collections module import deque, can be used as queue.

Stack

https://www.geeksforgeeks.org/stack-in-python/

There are 3 ways to implement stack in python:

list, but when growing it has speed issues.
collections module import deque, can be used as stack (use this one).
queue module import LifoQueue, A maxsize of zero 0 means a infinite stack.

Priority Queue

https://www.geeksforgeeks.org/priority-queue-in-python/

By using heap data structure to implement Priority Queues, Time complexity: Insert Operation: O(log(n)) Delete Operation: O(log(n))

Min heap can be implemented by heapq module: https://www.geeksforgeeks.org/heap-queue-or-heapq-in-python/

Max heap is not implemented in heapq, the alternative way is invert the priority: https://stackoverflow.com/questions/2501457/what-do-i-use-for-a-max-heap-implementation-in-python

from queue import PriorityQueue, min heap too: https://www.geeksforgeeks.org/priority-queue-using-queue-and-heapdict-module-in-python

Map

use primitive type dict.