https://en.wikipedia.org/wiki/Unix_domain_socket
A Unix domain socket or IPC socket (inter-process communication socket) is a data communications endpoint for exchanging data between processes executing on the same host operating system.
IPC is a general concept; there are multiple ways to implement it.
The API for Unix domain sockets is similar to that of an Internet socket, but rather than using an underlying network protocol, all communication occurs entirely within the operating system kernel. Unix domain sockets may use the file system as their address name space. (Some operating systems, like Linux, offer additional namespaces.) Processes reference Unix domain sockets as file system inodes, so two processes can communicate by opening the same socket.
Valid socket types in the UNIX domain are:
SOCK_STREAM (compare to TCP) – for a stream-oriented socket
SOCK_DGRAM (compare to UDP) – for a datagram-oriented socket that preserves message boundaries (as on most UNIX implementations, UNIX domain datagram sockets are always reliable and don’t reorder datagrams)
SOCK_SEQPACKET (compare to SCTP) – for a sequenced-packet socket that is connection-oriented, preserves message boundaries, and delivers messages in the order that they were sent
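A minimal sketch of a SOCK_STREAM Unix domain socket in Python (the socket path /tmp/demo.sock and the payload are arbitrary; both endpoints run in one process here for brevity, though normally they would be two processes):

import os
import socket

SOCK_PATH = "/tmp/demo.sock"
if os.path.exists(SOCK_PATH):
    os.unlink(SOCK_PATH)

# "server": bind a stream socket to a filesystem path (creates a socket inode)
server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
server.bind(SOCK_PATH)
server.listen(1)

# "client": any process that can open the same path can talk to the server
client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
client.connect(SOCK_PATH)
client.sendall(b"hello over UDS")

conn, _ = server.accept()
print(conn.recv(1024))    # b'hello over UDS'

for s in (conn, client, server):
    s.close()
os.unlink(SOCK_PATH)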
A network socket is a software structure within a network node of a computer network that serves as an endpoint for sending and receiving data across the network. The structure and properties of a socket are defined by an application programming interface (API) for the networking architecture. Sockets are created only during the lifetime of a process of an application running in the node.
A socket address is composed of:
protocol type
IP address
port number
On Unix-like operating systems and Microsoft Windows, the command-line tools netstat or ss are used to list established sockets and related information.
Several types of Internet socket are available:
Datagram sockets: connectionless sockets, which use UDP.
Stream sockets: connection-oriented sockets, which use TCP.
Raw sockets: give direct access to IP packets, bypassing the transport layer.
Berkeley Sockets
https://en.wikipedia.org/wiki/Berkeley_sockets
Berkeley sockets is an application programming interface (API) for Internet sockets and Unix domain sockets, used for inter-process communication (IPC). It is a unified abstraction over the socket types described above.
The term POSIX sockets is essentially synonymous with Berkeley sockets, but they are also known as BSD sockets.
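The unified abstraction is visible in code: switching between an Internet socket and a Unix domain socket is mostly a matter of changing the address family and the address form. A sketch (www.httpbin.org is borrowed from the wget examples further below):

import socket

# Same Berkeley sockets API, different address families / socket types:
tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)   # Internet stream (TCP)
udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)    # Internet datagram (UDP)
uds = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)   # Unix domain stream

# An AF_INET address is a (host, port) tuple rather than a filesystem path
tcp.connect(("www.httpbin.org", 80))
tcp.sendall(b"GET /ip HTTP/1.1\r\nHost: www.httpbin.org\r\nConnection: close\r\n\r\n")
print(tcp.recv(4096).decode(errors="replace"))
for s in (tcp, udp, uds):
    s.close()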
If you already understand HTTP, the HTTP CONNECT method, and proxies, then the Wikipedia description is quite clear.
WebSocket Wiki
WebSocket is a computer communications protocol, providing full-duplex communication channels over a single TCP connection.
WebSocket is distinct from HTTP (which is half-duplex; each request is initiated by the client). Both protocols are located at layer 7 in the OSI model and depend on TCP at layer 4. Although they are different, RFC 6455 states that WebSocket "is designed to work over HTTP ports 443 and 80 as well as to support HTTP proxies and intermediaries," thus making it compatible with the HTTP protocol. To achieve compatibility, the WebSocket handshake uses the HTTP/1.1 Upgrade header to change from the HTTP protocol to the WebSocket protocol.
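The Upgrade handshake itself is plain HTTP/1.1 text. A sketch of both sides' headers, using only the standard library (the path /chat and host example.com are placeholders; the GUID is fixed by RFC 6455):

import base64
import hashlib
import os

# Client: send a random 16-byte key, base64-encoded
key = base64.b64encode(os.urandom(16)).decode()
request_headers = (
    "GET /chat HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "Upgrade: websocket\r\n"
    "Connection: Upgrade\r\n"
    f"Sec-WebSocket-Key: {key}\r\n"
    "Sec-WebSocket-Version: 13\r\n\r\n"
)
print(request_headers)

# Server: proves it speaks WebSocket by hashing the key with the fixed GUID
GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"
accept = base64.b64encode(hashlib.sha1((key + GUID).encode()).digest()).decode()
response_headers = (
    "HTTP/1.1 101 Switching Protocols\r\n"
    "Upgrade: websocket\r\n"
    "Connection: Upgrade\r\n"
    f"Sec-WebSocket-Accept: {accept}\r\n\r\n"
)
print(response_headers)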
Some proxies, such as Envoy, support WebSocket; others may not.
The WebSocket handshake uses the ws:// or wss:// (WebSocket Secure) URI schemes. Practical challenges:
lots of proxies and transparent proxies don't support it yet
challenging for L7 load balancers (e.g., idle timeouts can kill long-lived connections)
stateful, so difficult to scale horizontally
Do you have to use WebSockets? (alternatives)
It is important to note that WebSocket is not the only HTTP-based realtime solution; there are other ways to achieve real time, such as EventSource (server-sent events) and long polling.
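For instance, a minimal server-sent events endpoint is just a long-lived HTTP response with Content-Type: text/event-stream. A sketch using only the standard library (port 8000 and the one-second tick are arbitrary choices):

import time
from http.server import BaseHTTPRequestHandler, HTTPServer

class SSEHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/event-stream")
        self.send_header("Cache-Control", "no-cache")
        self.end_headers()
        # Push one event per second; a browser-side EventSource
        # automatically reconnects if the connection drops.
        for i in range(10):
            self.wfile.write(f"data: tick {i}\n\n".encode())
            self.wfile.flush()
            time.sleep(1)

HTTPServer(("0.0.0.0", 8000), SSEHandler).serve_forever()

On the client side, new EventSource("http://host:8000/").onmessage would receive each tick.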
A variant of functions that enables concurrency via cooperative multitasking: a task yields control when it is waiting for external resources, i.e., it switches to another task only once its current work has to wait on I/O, which is why coroutines are well suited to IO-bound tasks. All tasks can run in one thread in one process (this simulates multithreading within a single thread — it is still essentially single-threaded; note that the Python threading module behaves similarly because of the GIL), which is better than multi-threading and multi-processing in some situations (switching between kernel mode and user mode costs system time and resources, whereas coroutine switches are controlled by the user program and never trap into the kernel).
to analyze function bytecode, the Python dis module can help, for example:
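A minimal sketch — dis.dis prints one line per bytecode instruction of a plain function:

import dis

def add(a, b):
    return a + b

dis.dis(add)   # e.g. LOAD_FAST a, LOAD_FAST b, BINARY_OP (+), RETURN_VALUE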
# Coroutines with aiohttp
import asyncio
from aiohttp import request

URLS = [...]  # list of URLs to fetch

async def fetch(url: str) -> str:
    async with request("GET", url) as r:
        return await r.text("utf-8")

async def main():
    coros = [fetch(url) for url in URLS]
    ## gather will run tasks concurrently
    ## *coros: unpacking awaitable objects
    results = await asyncio.gather(*coros)
    for result in results:
        print(f"{result[:20]!r}")

asyncio.run(main())
Wget is a free utility for non-interactive download of files from the Web. It supports HTTP, HTTPS, and FTP protocols, as well as retrieval through HTTP proxies. It is fault-tolerant and has a big overlap with curl; both are heavily used.
wget -h and the man page are informative.
Note that the set of options wget supports may differ across Linux distributions and versions.
Continue download
## assume the download was interrupted, leaving an incomplete file in the current folder
## -c, --continue: resume getting a partially-downloaded file
wget <url> -c
## other options:
## --no-check-certificate: don't check the server certificate against the available certificate authorities
## --connect-timeout: seconds; TCP connections that take longer to establish will be aborted
## -e, --execute: execute a .wgetrc-style command, e.g. setting a proxy variable
## --auth-no-challenge: for servers that never send HTTP authentication challenges; not recommended
wget --no-check-certificate --connect-timeout 10 -e http_proxy="envoy:10000" http://www.httpbin.org/ip --auth-no-challenge --user xxx --password xxx
## for https, use the https_proxy flag
wget --no-check-certificate --connect-timeout 10 -e https_proxy="envoy:10000" https://www.httpbin.org/ip --auth-no-challenge --user xxx --password xxx
# Use the output in a browser on another machine
echo "$(ifconfig | grep -A 10 ^en0 | grep inet | grep -v inet6 | cut -d" " -f2)":8000
# Launch the server
# port number: 8000
# --directory: old versions of python may not support this option
python3 -m http.server 8000 --bind 0.0.0.0 --directory <absolute path to shared folder>
USER root
# pip install
RUN pip install SimpleHTTPAuthServer
# to set up a self-signed certificate, a key and cert need to be pre-generated
# from the source code, it uses the .ssh folder to store the pem files
RUN mkdir -p /root/.ssh
RUN openssl req \
    -newkey rsa:2048 -new -nodes -x509 -days 3650 \
    -keyout /root/.ssh/key.pem -out /root/.ssh/cert.pem \
    -subj "/C=US/ST=CA/L=San Jose/O=GOOG/OU=Org/CN=localhost"
Docker Compose vs Docker Swarm
The difference lies mostly in the backend: docker-compose deploys containers on a single Docker host, whereas Docker Swarm deploys them across multiple nodes.
Thinking back, my teammates should also have used docker compose when setting up DataStage locally; it is especially good for web development:
## build or rebuild service images
## if no service name is specified, all images will be built
docker-compose build [--no-cache] [service name]
## run services
## will build images if not built yet
## -d: detach
## --no-deps: just bring up the specified service
docker-compose up [-d | --no-deps] [service name]
## shutdown, similar to docker rm -f ...
## -v, --volumes: remove the mounted data volumes
## --rmi all: remove all images
docker-compose down [-v] [--rmi all]
## docker-compose.yml can read env variables from the current running environment
## these env variables control which files are loaded, e.g. development, staging, production
## export APP_ENV=development

## the folders and files below (dockerfile, etc.) are assumed to exist
## docker-compose version
version: "3.7"
services:
  web:
    container_name: web
    build:
      ## relative path is based on the docker-compose.yml directory
      ## can be a url
      context: .
      ## args used in the dockerfile, must be declared with ARG in the dockerfile
      ## only visible during the build process!
      args:
        - buildno=1
      ## relative path to context
      dockerfile: ./web/web.dockerfile
    ## specify the built image and tag
    ## if no image is given here, docker-compose will use its own naming convention
    image: webapp:v1
    ## dependencies
    ## docker-compose will start mongo and redis first
    depends_on:
      - mongo
      - redis
    ## host: container
    ## external access uses the host port
    ## containers reaching each other's ports don't need to expose them here
    ports:
      - "80:80"
      - "443:443"
      - "9090-9100:8080-8100"
    ## add env vars from a file
    env_file:
      - web.env/app.${APP_ENV}.env
      - /opt/runtime_opts.env
    ## other form of env vars
    environment:
      RACK_ENV: development
      ## booleans must be quoted
      SHOW: 'true'
      ## value is taken from the current running environment
      SESSION_SECRET:
    ## short syntax
    ## https://docs.docker.com/compose/compose-file/#short-syntax-3
    volumes:
      ## host path, relative to docker-compose.yml
      - "./web:/opt/web"
      ## named volume
      - "mydata:/opt/data"
    ## overwrite WORKDIR in Dockerfile
    working_dir: /opt/web
    ## containers in the same network are reachable by each other
    networks:
      - nodeapp-network
  mongo:
    container_name: mongo
    build:
      context: .
      dockerfile: mongo/mongo-dockerfile
    ports:
      - "27017"
    env_file:
      - mongo.env/mongo.${APP_ENV}.env
    networks:
      nodeapp-network:
        ## can be accessed by `mongo` or `db` in nodeapp-network
        aliases:
          - db
  redis:
    container_name: redis
    ## pull the image from elsewhere
    image: redis:latest

networks:
  nodeapp-network:
    ## docker defaults to the bridge driver on a single host
    driver: bridge

## named volumes
volumes:
  ## the 'local' driver is used by default
  mydata:
  dbdata:
More about environment variables:
By default, the docker-compose command looks for a file named .env in the directory where you run the command. By passing the file as an argument, you can store it anywhere and name it appropriately, for example, .env.ci, .env.dev, .env.prod. Passing the file path is done using the --env-file option:
docker-compose --env-file ./config/.env.dev up
.env contains key=value pairs, e.g. APP_ENV=development.
Storage
An image is a set of read-only layers (shared), whereas a container has its own unique thin read-write layer, which is ephemeral.
Volumes
Stored in a part of the host filesystem which is managed by Docker (/var/lib/docker/volumes/ on Linux). Non-Docker processes should not modify this part of the filesystem. Volumes are the best way to persist data in Docker.
A given volume can be mounted into multiple containers simultaneously. When no running container is using a volume, the volume is still available to Docker and is not removed automatically.
When you mount a volume, it may be named or anonymous.
Volumes also support the use of volume drivers, which allow you to store your data on remote hosts or cloud providers, among other possibilities.
If you need to specify volume driver options, you must use --mount; the -v form is rather limited. The example below is deliberately simple — --mount actually accepts many more configuration options:
## docker will create the volume myvol2 automatically if it does not exist
docker run -d \
  --name devtest \
  -v myvol2:/app[:ro] \
  nginx:latest

## or
docker run -d \
  --name devtest \
  --mount source=myvol2,target=/app[,readonly] \
  nginx:latest
If the container has files or directories in the directory to be mounted (such as /app/ above), the directory's contents are copied into the volume; other containers that use the volume also have access to the pre-populated content.
Bind mounts
May be stored anywhere on the host system. They may even be important system files or directories. Non-Docker processes on the Docker host or a Docker container can modify them at any time.
The file or directory does not need to exist on the Docker host already. It is created on demand if it does not yet exist.
Bind mounts are very performant, but they rely on the host machine’s filesystem having a specific directory structure available.
docker run -d \
  -it \
  --name devtest \
  -v "$(pwd)"/target:/app[:ro] \
  nginx:latest

## or
docker run -d \
  -it \
  --name devtest \
  --mount type=bind,source="$(pwd)"/target,target=/app[,readonly] \
  nginx:latest

## check the Mounts section
docker inspect devtest
If you bind-mount into a non-empty directory on the container, the directory’s existing contents are obscured by the bind mount.
tmpfs
https://docs.docker.com/storage/tmpfs/
Only Linux has this option; it is useful for temporarily storing sensitive files.
Stored in the host system’s memory only, and are never written to the host system’s filesystem.
docker run -d \
  -it \
  --name tmptest \
  --tmpfs /app \
  nginx:latest

## or
docker run -d \
  -it \
  --name tmptest \
  --mount type=tmpfs,destination=/app,tmpfs-mode=1770,tmpfs-size=1024 \
  nginx:latest
networks:
  frontend:
    # Use a custom driver
    driver: custom-driver-1
  backend:
    # Use a custom driver which takes special options
    driver: custom-driver-2
    driver_opts:
      foo: "1"
      bar: "2"
Resource
Similar to K8s, there are quota settings under the deploy key, but compose file v3 does not support them outside of swarm mode: they are ignored by docker-compose up and docker-compose run. They can, however, be converted to the v2 form, for example:
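A sketch of what the conversion could look like (the service name and values are placeholders; in v3 the limits live under deploy.resources and are only honored by docker stack deploy, while compose file v2.2+ puts them at the service level, where docker-compose does enforce them):

## v3: ignored by docker-compose up / docker-compose run
services:
  web:
    deploy:
      resources:
        limits:
          cpus: "0.50"
          memory: 512M

## v2 (2.2+) equivalent: enforced by docker-compose
services:
  web:
    cpus: 0.5
    mem_limit: 512m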
//TODO
[ ] https tunnel setup
[ ] docker image build and test
Introduction
http://www.squid-cache.org/
Squid is a caching proxy for the Web supporting HTTP, HTTPS, FTP, and more. It reduces bandwidth and improves response times by caching and reusing frequently-requested web pages. Squid has extensive access controls and makes a great server accelerator.
noip.com offers free hostname + domain <-> public IP mapping. To make this hostname reach your laptop behind the router's public IP, you need to configure the router to forward that traffic to a specific port on the laptop.
CONNECT Method
CONNECT is mainly used to establish a tunnel. Tunneling can allow communication using a protocol that normally wouldn't be supported on the restricted network. A tunnel is only a channel; it can carry various transport protocols inside, and it does not have to be ssl/tls. For example, suppose you access a server through a forward proxy using HTTPS. If the proxy is a benign middleman, it cannot know what the encrypted traffic contains, so it cannot peek into and dissect packets the way it can with HTTP; instead, the client sends a CONNECT HTTP request to set up a tunnel through the proxy for communicating with the server.
–>> When should one use CONNECT
With SSL(HTTPS), only the two remote end-points understand the requests, and the proxy cannot decipher them. Hence, all it does is open that tunnel using CONNECT, and lets the two end-points (webserver and client) talk to each other directly.
–>> MDN web docs: CONNECT
Some proxy servers might need authority to create a tunnel. See also the Proxy-Authorization header.
For example, the CONNECT method can be used to access websites that use SSL (HTTPS). The client asks an HTTP Proxy server to tunnel the TCP connection to the desired destination. The server then proceeds to make the connection on behalf of the client. Once the connection has been established by the server, the Proxy server continues to proxy the TCP stream to and from the client.
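To make the mechanics concrete, here is a minimal sketch in Python of tunneling TLS through an HTTP proxy with CONNECT (the proxy address envoy:10000 is reused from the wget examples above and is an assumption; any CONNECT-capable proxy would do):

import socket
import ssl

PROXY = ("envoy", 10000)      # hypothetical proxy host and port
TARGET = "www.httpbin.org"

# 1. Open a plain TCP connection to the proxy and ask it to open a tunnel.
sock = socket.create_connection(PROXY)
sock.sendall(f"CONNECT {TARGET}:443 HTTP/1.1\r\nHost: {TARGET}:443\r\n\r\n".encode())
reply = sock.recv(4096)
# e.g. "HTTP/1.1 200 Connection established"
assert b" 200 " in reply.split(b"\r\n", 1)[0]

# 2. From now on the proxy only relays bytes; the TLS handshake happens
#    end-to-end between client and server, so the proxy cannot decipher it.
ctx = ssl.create_default_context()
tls = ctx.wrap_socket(sock, server_hostname=TARGET)
tls.sendall(b"GET /ip HTTP/1.1\r\nHost: www.httpbin.org\r\nConnection: close\r\n\r\n")
print(tls.recv(4096).decode(errors="replace"))
tls.close()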
This article is quite good:
–>> MDN web docs: proxy servers and tunneling
There are two types of proxies: forward proxies (or tunnel, or gateway) and reverse proxies (used to control and protect access to a server for load-balancing, authentication, decryption or caching).
Forward proxies can hide the identities of clients whereas reverse proxies can hide the identities of servers.
The HTTP protocol specifies a request method called CONNECT. It starts two-way communications with the requested resource and can be used to open a tunnel. This is how a client behind an HTTP proxy can access websites using SSL (i.e. HTTPS, port 443). Note, however, that not all proxy servers support the CONNECT method or limit it to port 443 only.
Basic Authorization
A quick note on the difference between authz and authn:
authz: authorization — what you are allowed to do.
authn: authentication — who you are.
This section covers HTTP's basic authentication mechanism.
https://developer.mozilla.org/en-US/docs/Web/HTTP/Authentication
HTTP provides a general framework for access control and authentication. This page is an introduction to the HTTP framework for authentication, and shows how to restrict access to your server using the HTTP “Basic” schema.
Basic authentication is not secure: the credentials are sent in plain text, merely base64-encoded, and can be decoded, for example:
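A sketch with the standard library (xxx:xxx reuses the placeholder credentials from the wget examples above):

import base64

# The Authorization header is just "Basic " + base64("user:password")
token = base64.b64encode(b"xxx:xxx").decode()
print(f"Authorization: Basic {token}")     # Authorization: Basic eHh4Onh4eA==

# Anyone who can read the header can trivially reverse it
print(base64.b64decode(token).decode())    # xxx:xxx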