System Design

This blog is for system design, please revisit frequently to refresh. The notes are mainly from https://www.educative.io/ and Youtube channel.

有的系统设计主要是各功能部件合理组合:

  1. Design Instagarm
  2. Design Dropbox
  3. Design Twitter post tweets(photos, videos), follow others, favorite tweets generate timeline of top tweets low latancy highly available consistency can take a hit

storage: text + photo + video ingress (write): new generated storage / sec egress (read): read volume / sec

read heavy system data sharding: user id -> tweet id -> (creation time + tweet id, sort by time) query all servers and aggregate

cache for hot users and tweets

  1. Designing Twitter Search

  2. Designing a Web Crawler (BFS, modular, url frontier, DNS, fetcher, DIS, content filter, extractor, url filter)

  3. Designing Facebook Messenger each chat server serves a bunch of users, LB maps user to it’s chat server, chat server commuicate with each other to send/receive message message handling: long polling to receive message hashtable keep track of online user, if offline, notify delivery failure to sender handle message order: 单独靠timestamp不行,use sequence number with every message for each user

database: support high frequence write/read row, quick small updates, range based search: HBase, column-oriented key-value NoSQL database partition by UserID, low latency

有的主要涉及到了数据结构和算法:

  1. Typeahead suggestion (trie, reference)

  2. API rate limiter (dynamic sliding window)

  3. Designing Facebook’s Newsfeed (offline feed generation) contain updates, posts, video, photos from all people user follows user average has 200 followers, 300M DAU, fetch 5 times a day, 1KB each post, so can get traffic. cache each users’ news feed in mem for quick fetch. feed generation: retrieve, rank, store offline generate by dedicated servers, Map<UserID, LikedHashMap/TreeMap<PostID, PostItem>> + LastGenerateTime in memory, LRU cache for user or find user’s activity pattern to help generate newsfeed feed publishing: push to notify, pull for serving

  4. Designing Yelp (querying, objects don’t change often, QuadTree) 解释一下我的理解,这里partition讲的是partition quadtree. 从DB中读location id, 通过hashing map to different quadtree server (这个mapping其实就是quadtree index,可以在quadtree server fail后用来重新构造它的数据),然后各自构造自己的quadtree.这些quadtree servers有一个aggregator server(它有自己的copies)。于是每次request要去所有quadtree server查询,然后聚合返回的数据。对于每个quadtree server,它所包含的location id也有一个本地的mapping, to know which DB servers contains this locatio id info. 这个mapping也使用的hashing实现。

  5. Designing Uber backend (requirements, objects do change often, QuadTree)

  6. Design Ticketmaster (first come first serve, highly concurrent, financial transactions ACID)

CAP Theorem

CAP theorem states that it is impossible for a distributed software system to simultaneously provide more than two out of three of the following guarantees (CAP): Consistency, Availability, and Partition tolerance.

When we design a distributed system, trading off among CAP is almost the first thing we want to consider.

Thinking process

  1. requirements clarification
  2. back of the envelope estimation: scale, storate, bandwidth.
  3. system interface definition
  4. defining data model
  5. high level design
  6. detailed design
  7. identifying and resolving bottlenecks

Crucial Components

这里的笔记主要根据以下几点展开:

  1. Database (book: 7 weeks 7 databases)
  2. Cache system (redis, memcache)
  3. Message queue (kafka && zookeeper or others)
  4. Load balancer (nginx, Round Robin approach)
  5. Log systems
  6. monitor system
  7. My domain of knowledge k8s, docker, micro-services

Key Characteristics of Distributed Systems

Scalability: scaling without performance loss (but actually will). Reliability: keep delivering services when some components fail. Availability: reliable means available, but not vice versa Efficiency: latency and (throughput)bandwidth. Manageability: ease of diagnosing and understanding problems when they occur.

常用技术知识

备份的说法:

Standby replicas Failover to other healthy copies Duplicates Backup (spare) Redundancy (redundant secondary copy)

NoSQL Database:

An Introduction To NoSQL Databases Big Data: social network, search engine, traditional methods of processing and storage are inadequate.

  1. Key-value stores: Redis, Dynamo (redis can also be cache)
  2. Document database: MongoDB, Couchbase
  3. Wide-column database: Cassandra, HBase
  4. Graph database: Neo4J

Advantage of NOSQL database: no data models(no pre-defind schema), unstructed , easy to scale up and down (horizontal data sharding), high performance with big data.

Advantage of SQL database: relational data, normalization (eliminate redundancy), SQL, data integrity, ACID compliance.

Consistent Hashing (with virtual replicas)

https://www.youtube.com/watch?v=ffE1mQWxyKM Using hash mod strategy is not efficient, think about that add a new server, then original 20 % 3 = 2 now is 20 % 4 = 0. We have to re-organize all the existing mappings.

https://www.youtube.com/watch?v=zaRkONvyGr8 Consistent hashing can be used in many situations, like distributed cache, load balancing, database, etc.

For example, we have n servers. Hash the request and get the location of it in the ring, find the server with hash value equal or larger than it and send this request to that server (clockwise move). But server may not distributed in ring evenly or the requests is not uniformly (thus server load factor is not 1/n), so we can use virtual replicas, this can implement by other hash function.

With contsistent hashing, add or remove servers will not cause much overhead. The new added server will grab objects from its near servers and removed server, all original objects will move to next server after the removed one.

Long Polling (轮询)

https://www.jianshu.com/p/d3f66b1eb748?from=timeline&isappinstalled=0 和一般的polling都属于pull(拉模式)。

题外话: push模式其实也是建立了一个持久的connection,但server一旦有新的信息就会push给client,而不会去在乎client的处理能力,这是一个缺点, long polling对于client要更灵活一些(因为client会request first)。

This is a variation of the traditional polling technique that allows the server to push information to a client whenever the data is available. With Long-Polling, the client requests information from the server exactly as in normal polling, but with the expectation that the server may not respond immediately (keep the connection connected). That’s why this technique is sometimes referred to as a Hanging GET.

Each Long-Poll request has a timeout. The client has to reconnect periodically after the connection is closed due to timeouts or receive the disconnect from server.

如果client突然unavailable了,如何检测呢?这个connection是如何保持的?我猜想的是connection保持期间,并不需要额外的sync查看server client是否健在(我记得TCP有一个机制会检测这个connection是否健康?)。如果server 发送了message未收到acknowledge则说明client不在了,则connection中断。

Data Sharding

https://medium.com/@jeeyoungk/how-sharding-works-b4dec46b3f6 Horizontal partitioning is also called as Data Sharding

Web Server vs Application Server

https://stackoverflow.com/questions/936197/what-is-the-difference-between-application-server-and-web-server

proxy server

https://www.educative.io/courses/grokking-the-system-design-interview/N8G9MvM4OR2 A proxy server is an intermediate server between the client and the back-end server.

Typically, proxies are used to filter requests, log requests, or sometimes transform requests (by adding/removing headers, encrypting/decrypting, or compressing a resource). Another advantage of a proxy server is that its cache can serve a lot of requests.

  1. open (forwarding) proxy: hide clients
  2. reverse proxy: hide servers

Map Reduce

We can have a Map-Reduce (MR) set-up These MR jobs will calculate frequencies of all searched terms in the past hour.

Exponential Moving Average (EMA)

In EMA, we give more weight to the latest data. It’s also known as the exponentially weighted moving average.

Some Design Bottlenecks

  1. data compression 需要吗, 如何选择?

  2. capacity estimation: metadata + content 两方面都要考虑,high level estimations 主要包括: storage for each day, storage for years, incoming bandwidth, outgoing bandwidth. 这些主要来自于: Total user, Daily active user (DAU), size of each request, how many entries each user produce, data growth, 有时对某个量单独估计比较好。

  3. read heavy or wirte heavy? bandwidth, ingress: 每日新增数据总量/秒; egress: 用户浏览或下载总量/秒.

  4. database需要有哪些符合场景的特点? 比如quick small updates, ACID, range based search, etc.

  5. how about consider the peak time read and wirte throughput.

  6. hot user in database handle, 怎么设计database去减轻这个问题.

  7. we may need aggregator server for fetching and process data from different DB or caches.

  8. monitoring system, collect metrics: daily peak, latency. we will realize if we need more replication, load balancing, or caching.

  9. load balancer can sit: between client and web server, web server and application server (or cahce), application server and database. load balancer can be single point of failure, need redundancy to take over when main is down.

  10. load balancer: Round Robin approach, or more intelligent.

  11. cache policy, LRU, 80-20 rule.

Other System Design Videos:

Introduce to System Design

Introduce to System Design 同样推荐了这本书<<Designing Data Intensive Applications>>, 会对这些topics有更深入的讲解。

  1. ask good question: which features care about, which not? how much to scale (data, request, latency)
  2. don’t use buzzword (be clear about the tech you use)
  3. clear and organized thinking
  4. drive discussion (80% I talk)

Things to consider:

  1. Features
  2. API
  3. Availability
  4. Latency
  5. Scalability
  6. Durability
  7. Class Diagram
  8. Security and Privacy
  9. Cost-effective

Concepts to know:

  1. Vertical vs horizontal scaling
  2. CAP theorem
  3. ACID vs BASE
  4. Partitioning/Sharding
  5. Consistent Hashing
  6. Optimistic vs pessimistic locking
  7. Strong vs eventual consistency
  8. RelationalDB vs NoSQL
  9. Types of NoSQL Key value Wide column Document-based Graph-based
  10. Caching
  11. Data center/racks/hosts
  12. CPU/memory/Hard drives/Network bandwidth
  13. Random vs sequential read/writes to disk
  14. HTTP vs http2 vs WebSocket
  15. TCP/IP model
  16. ipv4 vs ipv6
  17. TCP vs UDP
  18. DNS lookup
  19. Http & TLS
  20. Public key infrastructure and certificate authority(CA)
  21. Symmetric vs asymmetric encryption
  22. Load Balancer
  23. CDNs & Edges
  24. Bloom filters and Count-Min sketch
  25. Paxos
  26. Leader election
  27. Design patterns and Object-oriented design
  28. Virtual machines and containers
  29. Pub-sub architecture
  30. MapReduce
  31. Multithreading, locks, synchronization, CAS(compare and set)

Tools:

  1. Cassandra
  2. MongoDB/Couchbase
  3. Mysql
  4. Memcached
  5. Redis
  6. Zookeeper
  7. Kafka
  8. NGINX
  9. HAProxy
  10. Solr, Elastic search
  11. Amazon S3
  12. Docker, Kubernetes, Mesos
  13. Hadoop/Spark and HDFS

Design Spotify| Apple Muisc | Youtube Music

Design Spotify| Apple Muisc | Youtube Music

  1. scope: cover and what else you are not going to cover
  2. key components (具体分析了一下spotify工作的过程,比如存储,传输protocol转换,low latency, CDN)
  3. data model
  4. scaling

if data size is high, consider compress audio data quality, user use different device and network condition distribution CDN

scaling group 比如一些stateless servers可以使用k8s, containers去管理。

0%