⚙️ Software✍️ Khoa📅 20/04/2026☕ 7 phút đọc

System Design: CAP, PACELC và Consistency Foundations

1. CAP Theorem — Chi tiết

1.1 Định nghĩa

Eric Brewer (2000), chứng minh bởi Gilbert & Lynch (2002):

Một distributed system không thể đồng thời đảm bảo cả 3:

Consistency: Mọi read nhận được write gần nhất hoặc error

Availability: Mọi request nhận được response (không phải error), nhưng có thể stale

Partition Tolerance: System tiếp tục hoạt động dù có network partition

1.2 Tại sao phải chọn P?

Network Partition = một số nodes không liên lạc được với nhau

          Node A ══════╳══════ Node B
          (viết data)   ↑       (chưa nhận)
                   partition

Trong thực tế: network partition LUÔN CÓ THỂ xảy ra
→ Phải chọn P (không thể bỏ P)
→ Trade-off thực sự chỉ là: C vs A khi có partition

1.3 CP vs AP — Trade-off thực sự

CP (Consistency + Partition Tolerance):
  Khi có partition → từ chối serve request (return error)
  → Đảm bảo không trả về stale data
  → System có thể unavailable tạm thời

  Ví dụ: ZooKeeper, HBase, etcd, Consul
  Use case: bank transactions, distributed locks,
            bất kỳ thứ gì cần correctness > availability

AP (Availability + Partition Tolerance):
  Khi có partition → vẫn serve request (có thể stale data)
  → Luôn available
  → Data có thể inconsistent tạm thời → eventual consistency

  Ví dụ: Cassandra, DynamoDB, CouchDB, DNS
  Use case: shopping cart, user profiles,
            social media feeds, bất kỳ thứ gì cần availability > correctness

CA (Consistency + Availability) — KHÔNG TỒN TẠI trong distributed systems:
  Chỉ đạt được trong single-node system (không có partition)
  Ví dụ: PostgreSQL single-node (nhưng không phải distributed)

1.4 CAP trong thực tế — Không chỉ là binary

CAP thực tế là spectrum, không phải binary choice:

                    Consistency
                         ▲
                         │
              etcd    ZooKeeper
              HBase   Spanner
                 \      │
                  \     │
                   \    │
    ────────────────────────────────► Availability
                   /    │
                  /     │
                 /      │
           Cassandra  DynamoDB
           CouchDB    Riak

Database thực tế cho phép CHỌN per-operation:
  Cassandra:
    QUORUM reads/writes → strong consistency
    ONE reads/writes → high availability, eventual consistency

  DynamoDB:
    Strongly consistent reads (higher cost)
    Eventually consistent reads (default, cheaper)

2. PACELC — Mở rộng CAP

Daniel Abadi (2012) — CAP chỉ nói về khi có partition, còn bình thường thì sao?

PACELC:
  If Partition:  chọn A (availability) hoặc C (consistency)
  Else (bình thường): chọn L (latency) hoặc C (consistency)

  "PA/EL": High availability khi partition, Low latency bình thường
    → Cassandra, DynamoDB
    → Replicate async → thấp latency, có thể đọc stale

  "PC/EC": Consistent khi partition, Consistent bình thường
    → VoltDB, Megastore
    → Replicate sync → high latency, always consistent

  "PA/EC": High availability khi partition, Consistent bình thường
    → MongoDB (với default settings)

Tại sao PACELC quan trọng hơn CAP?
  Trong thực tế, partitions hiếm (vài lần/năm)
  Latency vs Consistency là trade-off HÀNG NGÀY
  PACELC mô tả tốt hơn behavior thực tế của databases

3. Consistency Models — Từ Strong đến Eventual

Strong ──────────────────────────────────────────────► Weak
  │                                                      │
Linearizability  Sequential  Causal  Read-your-writes  Eventual

3.1 Linearizability (Strongest)

= Real-time ordering

Mọi operation có vẻ xảy ra tại một thời điểm duy nhất
giữa invoke và complete của nó.

Timeline:
  Client A: ──[Write x=1]──────────────────
  Client B: ──────────────[Read x]──────►  → phải thấy x=1

  Client A: ──[Write x=1]──────────────────
  Client B: ──[Read x]──►                  → có thể thấy x=0 hoặc x=1
                                             (đọc trước write complete)

Implementation: consensus protocols (Raft, Paxos)
Cost: high latency (must coordinate)
Used in: etcd, ZooKeeper, Google Spanner

3.2 Sequential Consistency

Kết quả giống như tất cả operations được thực thi theo một thứ tự nào đó,
và operations của mỗi process xuất hiện đúng thứ tự trong sequence đó.

Khác linearizability: KHÔNG yêu cầu real-time ordering
→ Write có thể "appear" sau khi complete
→ Nhưng tất cả processes thấy CÙNG ordering

Client A: Write(x=1), Write(x=2)
Client B: Read(x)=2, Read(x)=1  ← KHÔNG hợp lệ! (B thấy khác thứ tự)
Client B: Read(x)=1, Read(x)=2  ← hợp lệ
Client B: Read(x)=2, Read(x)=2  ← hợp lệ (thấy x=2 ngay từ đầu)

3.3 Causal Consistency

Các operations có quan hệ nhân-quả phải được thấy theo đúng thứ tự.
Các operations không liên quan có thể thấy theo thứ tự khác nhau.

Client A: Write(x=1)
Client B: Read(x)=1, Write(y="based on x=1")
Client C: Read(y)="based on x=1" → PHẢI thấy Read(x)=1 (causal dependency)
Client D: có thể thấy Write(x=1) và Write(y) theo thứ tự bất kỳ
          (nếu D không biết về causal link)

Implementation: vector clocks, causal+ consistency (COPS system)
Used in: MongoDB causal sessions, some Cassandra configs

3.4 Read-Your-Writes (Session Consistency)

Sau khi client write, client đó luôn thấy write của mình.
Clients khác không được đảm bảo.

Client A: Write(x=1) → Read(x) = 1  ✅ (guaranteed)
Client B: Read(x) = 0 hoặc 1        (không guaranteed)

Implementation:
  - Sticky sessions (luôn read từ same replica)
  - Write token / replication position tracking
  - Read from primary sau khi write

Common need: User submit form → reload page → thấy changes

3.5 Eventual Consistency (Weakest)

Nếu không có writes mới, cuối cùng tất cả replicas
sẽ converge về cùng giá trị.

"Cuối cùng" = không có time bound guarantee

Điều KHÔNG được đảm bảo:
  - Khi nào converge
  - Thứ tự operations khác clients thấy
  - Read-your-writes

Điều được đảm bảo:
  - Nếu hệ thống "yên tĩnh" đủ lâu → tất cả nodes nhất quán

Used in: DNS propagation, Cassandra (default), DynamoDB (default)
         shopping cart, social media likes, analytics counters

4. Eventual Consistency — Patterns & Techniques

4.1 Conflict Resolution

Khi 2 nodes accept writes concurrently → conflict khi merge:

Last-Write-Wins (LWW):
  Dùng timestamp → write mới nhất thắng
  Pro: đơn giản
  Con: clock skew → sai kết quả, lost updates

  Client A: Write(x=1) at t=100
  Client B: Write(x=2) at t=99  ← mất! (dù write sau về thực tế)

Multi-Value (keep conflicts):
  Giữ tất cả conflicting versions → application resolve
  Amazon Dynamo: trả về list of values khi có conflict
  Shopping cart: merge (union) các items

CRDTs (Conflict-free Replicated Data Types):
  Data structures designed để merge luôn đúng,
  bất kể thứ tự operations.

  G-Counter (Grow-only counter):
    Mỗi node có counter riêng
    [Node1=5, Node2=3, Node3=7]
    Total = sum = 15
    Merge = max của mỗi node's counter
    → Không bao giờ conflict!

  PN-Counter (Positive-Negative):
    P-counter + N-counter → increment + decrement
    Value = sum(P) - sum(N)

  LWW-Element-Set:
    Set với timestamp per element
    Add và remove đều có timestamp
    Most recent operation wins per element

  OR-Set (Observed-Remove Set):
    Giải quyết Add-Remove conflict
    Mỗi element có unique tag khi add
    Remove xóa tất cả observed tags
    → Concurrent add+remove: add thắng (element vẫn ở trong set)

  Used in: Redis CRDT (Enterprise), Riak, collaborative editors (Yjs, Automerge)

4.2 Read Repair & Hinted Handoff

Read Repair (Cassandra):
  Client đọc từ N replicas → so sánh versions
  Nếu khác nhau → gửi write để sync replica stale
  → Tự động heal inconsistency khi có reads

  Client ──READ──► Replica 1: x=1 (latest)
         ──READ──► Replica 2: x=0 (stale)
         ──READ──► Replica 3: x=1 (latest)
  Client thấy x=1 (quorum)
  → Background: gửi x=1 đến Replica 2

Hinted Handoff:
  Node B down → Node A nhận write thay B
  Lưu "hint": "khi B back online, gửi data này"
  Khi B online → A deliver hints → B caught up

Anti-Entropy (Merkle Trees):
  Background process so sánh data giữa replicas
  Dùng Merkle tree để efficient diff:
    Hash từng leaf (data block)
    Hash từng internal node = hash(children)
    So sánh root hash → nếu khác → drill down tìm diff
  → O(log n) để tìm ra sự khác biệt
  → Sync chỉ phần khác, không cần full data transfer

4.3 Quorum

N = total replicas
W = replicas phải confirm write
R = replicas phải respond read

Strong Consistency: R + W > N

Ví dụ N=3:
  W=2, R=2: R+W=4 > 3 → strong (overlap đảm bảo ít nhất 1 node có latest)
  W=3, R=1: R+W=4 > 3 → strong (all writes confirmed, any read works)
  W=1, R=1: R+W=2 < 3 → eventual (fast but may read stale)
  W=1, R=3: R+W=4 > 3 → strong (read all, take latest)

Cassandra consistency levels:
  ONE:    W=1, R=1  → highest availability, eventual
  QUORUM: W=N/2+1, R=N/2+1 → balanced
  ALL:    W=N, R=N  → strongest, lowest availability

Latency vs Consistency:
  Tăng W → write latency tăng (chờ nhiều nodes hơn)
  Tăng R → read latency tăng
  Giảm W, R → lower latency, risk stale reads