Advanced System Design Trade-offs: When "It depends" Is the Right Answer
At the Senior level, you design systems. At the Staff level, you defend trade-offs and navigate constraints that a Senior engineer does not see. This is where battlefield experience makes the difference.
System design is not about drawing the most elegant architecture. It is about choosing the architecture that best fits the specific constraints: team size, timeline, budget, traffic pattern, consistency requirements.
1. Back-of-Envelope Estimation: Mental Math, Google Style
1.1 Why is it needed?
Interviewer: "Design a URL shortener"
You: "Use MySQL"
Interviewer: "OK, how many requests can MySQL handle?"
You: "Uh..."
Estimation helps you:
→ Choose the right technology (Redis vs MySQL vs DynamoDB)
→ Size infrastructure (how many servers? how much storage?)
→ Identify bottlenecks BEFORE you build
→ Prove feasibility to stakeholders
1.2 Magic Numbers to Remember
Latency:
L1 cache: 1 ns
L2 cache: 4 ns
RAM access: 100 ns
SSD random read: 16 µs
HDD random read: 2 ms (roughly 100x slower than SSD)
Round trip same DC: 0.5 ms
Round trip US→EU: 75 ms
Throughput:
MySQL: ~5K-10K QPS (single node, depends on query)
PostgreSQL: ~5K-15K QPS
Redis: ~100K-500K QPS (single node)
Kafka: ~100K-1M msg/s (per partition)
Storage:
1 character: 1 byte (ASCII) or 1-4 bytes (UTF-8)
1 UUID: 36 bytes (string) or 16 bytes (binary)
1 timestamp: 8 bytes
1 email: ~30 bytes average
1 tweet: ~300 bytes (140 chars + metadata)
1 image: ~300 KB (compressed JPEG)
1 video minute: ~50 MB (720p)
Scale:
1 million = 10^6 ≈ 1 MB of data at 1 byte per record
1 billion = 10^9 ≈ 1 GB of data at 1 byte per record
1 day = 86,400 seconds ≈ 100K seconds (easy to remember)
1.3 Estimation Framework
Example: URL Shortener
Step 1: Traffic estimation
→ 100M URLs created/month = ~40 URLs/second (write)
→ Read:write ratio = 100:1
→ Read: 4,000 URLs/second
→ Peak: 3x average = 12K QPS read
Step 2: Storage estimation
→ URL record: short_url(7B) + long_url(200B) + created(8B) = 215B, ≈ 250B with overhead
→ 100M/month × 12 months × 5 years = 6B records
→ 6B × 250B = 1.5 TB
→ With indexes: ~2-3 TB total
Step 3: Bandwidth
→ Read: 4,000 QPS × 250B = 1 MB/s (about 3 MB/s at peak)
→ Manageable for a single server
Step 4: Technology decision
→ 12K QPS read → MySQL is OK (with a cache)
→ Or Redis as the cache layer: 12K QPS is trivial for Redis
→ 3 TB storage → a single MySQL server is enough (partition later)
Conclusion: Single MySQL + Redis cache. Simple. Enough for 5 years.
Don't over-engineer.
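The arithmetic in the four steps above can be reproduced in a few lines of Go. This is only a sketch: the `Estimate` type and `EstimateShortener` function are names invented for illustration, and a 30-day month is assumed.

```go
package main

import "fmt"

// secondsPerMonth assumes a 30-day month (an approximation).
const secondsPerMonth = 30 * 24 * 3600

// Estimate holds the derived numbers for the URL shortener example.
type Estimate struct {
	WriteQPS     float64
	ReadQPS      float64
	PeakReadQPS  float64
	Records      float64
	StorageBytes float64
	ReadBytesSec float64
}

// EstimateShortener follows Steps 1-3: traffic, storage, bandwidth.
func EstimateShortener(urlsPerMonth, readWriteRatio, peakFactor, recordBytes, years float64) Estimate {
	writeQPS := urlsPerMonth / secondsPerMonth
	readQPS := writeQPS * readWriteRatio
	records := urlsPerMonth * 12 * years
	return Estimate{
		WriteQPS:     writeQPS,
		ReadQPS:      readQPS,
		PeakReadQPS:  readQPS * peakFactor,
		Records:      records,
		StorageBytes: records * recordBytes,
		ReadBytesSec: readQPS * recordBytes,
	}
}

func main() {
	e := EstimateShortener(100e6, 100, 3, 250, 5)
	fmt.Printf("write QPS:     %.0f\n", e.WriteQPS)
	fmt.Printf("read QPS:      %.0f (peak %.0f)\n", e.ReadQPS, e.PeakReadQPS)
	fmt.Printf("records:       %.1fB\n", e.Records/1e9)
	fmt.Printf("storage:       %.1f TB\n", e.StorageBytes/1e12)
	fmt.Printf("read traffic:  %.2f MB/s\n", e.ReadBytesSec/1e6)
}
```

The exact write rate comes out slightly under 40/s; rounding up, as the text does, keeps the estimate on the safe side.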
2. Consistency Models — Deep Dive
2.1 Spectrum of Consistency
Strong ←───────────────────────────────────────────────────→ Weak
Linearizable  →  Sequential    →  Causal       →  Eventual
     ↑               ↑               ↑               ↑
  Banking        Inventory        Social         Analytics
  Payment        Stock level      Feed           Page views
2.2 When to use which?
Linearizability (strongest):
Every read sees the most recent write, as if there were a single copy of the data.
Use cases:
→ Bank balance (must never go negative)
→ Distributed lock (only 1 holder)
→ Leader election
Trade-off: SLOW (requires coordination), low throughput
Tools: etcd, ZooKeeper (linearizable writes; reads can be stale unless you sync first), CockroachDB (serializable)
Causal consistency:
Causally related operations → everyone sees them in the same order.
Concurrent operations → may be seen in different orders.
Use cases:
→ Social media: a reply must appear AFTER the original post
→ Chat: message order within conversation
→ Document editing (Google Docs style)
Trade-off: stronger than eventual, cheaper than linearizable
Eventual consistency:
All replicas WILL converge... eventually.
Use cases:
→ View count, like count (being off for a few seconds is OK)
→ DNS propagation
→ Shopping cart (Amazon famously uses this)
→ CDN content
Trade-off: Fast, highly available, but a client may not see
its own write on the next read.
Trick: Eventual + "read-your-writes" consistency
→ A client always sees ITS OWN writes
→ Implement: read from the primary after a write, or use sticky sessions
→ Best of both worlds for many use cases
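One way to implement "read from the primary after a write" is to remember each user's last write time and send that user's reads to the primary for a short window. The sketch below assumes the window (`lagWindow`) is a safe upper bound on replication lag; the `Router` type and the 5-second value are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Router decides whether a read may be served by a replica.
type Router struct {
	mu        sync.Mutex
	lastWrite map[string]time.Time
	lagWindow time.Duration // assumed upper bound on replication lag
}

func NewRouter(lagWindow time.Duration) *Router {
	return &Router{lastWrite: make(map[string]time.Time), lagWindow: lagWindow}
}

// RecordWrite notes that userID wrote at time now.
func (r *Router) RecordWrite(userID string, now time.Time) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.lastWrite[userID] = now
}

// Target returns "primary" if the user wrote within lagWindow, else "replica".
func (r *Router) Target(userID string, now time.Time) string {
	r.mu.Lock()
	defer r.mu.Unlock()
	if t, ok := r.lastWrite[userID]; ok && now.Sub(t) < r.lagWindow {
		return "primary"
	}
	return "replica"
}

// demo replays one write followed by three reads at fixed timestamps.
func demo() []string {
	r := NewRouter(5 * time.Second)
	t0 := time.Date(2024, 3, 15, 10, 0, 0, 0, time.UTC)
	r.RecordWrite("alice", t0)
	return []string{
		r.Target("alice", t0.Add(1*time.Second)),  // just wrote
		r.Target("alice", t0.Add(10*time.Second)), // window has passed
		r.Target("bob", t0),                       // never wrote
	}
}

func main() {
	fmt.Println(demo()) // [primary replica replica]
}
```

Only users who just wrote pay the cost of a primary read; everyone else keeps the latency and availability of the replicas. In a multi-instance deployment the last-write marker would have to live somewhere shared (a cookie or token, for example), which this sketch leaves out.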
3. Multi-Region Architecture
3.1 Patterns
Active-Passive (simple):
┌─────────────┐                  ┌─────────────┐
│  Region A   │    async repl    │  Region B   │
│  (active)   │ ───────────────→ │  (standby)  │
│ All traffic │                  │ No traffic  │
└─────────────┘                  └─────────────┘
Pros: Simple, strong consistency within region
Cons: Failover = downtime (minutes), wasted standby capacity
Use when: there is a DR requirement and an RPO of minutes is acceptable
Active-Active (complex):
┌──────────┐                  ┌──────────┐
│ Region A │ ←─── sync ─────→ │ Region B │
│  Serves  │       or         │  Serves  │
│ traffic  │      async       │ traffic  │
└──────────┘                  └──────────┘
Pros: No failover downtime, utilize all capacity, low latency
Cons: Conflict resolution, complex, eventual consistency
Use when: Global users, ultra-high availability (99.99%+)
3.2 Data Strategies for Multi-Region
Strategy 1: Shard by geography
→ User in VN → data in the Singapore region
→ User in the US → data in the US region
→ Each user belongs to 1 "home region"
→ Cross-region reads for social features
→ Simplest option; avoids conflicts
Strategy 2: Conflict-free Replicated Data Types (CRDTs)
→ Data structures that merge conflicts automatically
→ Counter: increments in region A and B → merge = sum
→ Set: add in A, add in B → merge = union
→ Shopping cart (Amazon-style)
→ Use when: every region needs to write the same data
Strategy 3: Last-Write-Wins (LWW)
→ Every write carries a timestamp; the newest wins
→ Simple, but can LOSE DATA
→ Use only when data loss is acceptable
→ Profile updates, preferences
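The difference between Strategy 2 and Strategy 3 shows up in a small sketch. The counter below is a grow-only counter (G-Counter), one common CRDT: each region increments only its own slot, merge takes the per-region maximum, and the value is the sum of the slots. The `LWW` register keeps whichever write has the newer timestamp. All type and function names are illustrative, and clock skew between regions is ignored.

```go
package main

import "fmt"

// GCounter is a grow-only counter CRDT with one slot per region.
type GCounter map[string]uint64

// Inc increments the slot owned by region.
func (c GCounter) Inc(region string, n uint64) { c[region] += n }

// Merge takes the per-region maximum, so merging is commutative,
// associative, and idempotent (safe to repeat).
func (c GCounter) Merge(other GCounter) {
	for region, v := range other {
		if v > c[region] {
			c[region] = v
		}
	}
}

// Value is the sum over all regions.
func (c GCounter) Value() uint64 {
	var sum uint64
	for _, v := range c {
		sum += v
	}
	return sum
}

// LWW is a last-write-wins register.
type LWW struct {
	Value     string
	Timestamp int64 // e.g. Unix milliseconds
}

// MergeLWW keeps the write with the newer timestamp and discards the other.
func MergeLWW(a, b LWW) LWW {
	if b.Timestamp > a.Timestamp {
		return b
	}
	return a
}

func main() {
	a, b := GCounter{}, GCounter{}
	a.Inc("A", 3)
	b.Inc("B", 2)
	a.Merge(b)
	a.Merge(b)             // merging twice changes nothing
	fmt.Println(a.Value()) // 5: no increment is lost

	x := LWW{Value: "theme=dark", Timestamp: 100}
	y := LWW{Value: "theme=light", Timestamp: 101}
	fmt.Println(MergeLWW(x, y).Value) // theme=light: x's write is discarded
}
```

The counter keeps both regions' increments, while the register silently drops one of two concurrent writes, which is why LWW suits only data where that loss is acceptable.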
3.3 Geo-Routing
          ┌─── VN users ──→ SG Region
          │
DNS/LB ───┼─── EU users ──→ EU Region
          │
          └─── US users ──→ US Region
Options:
→ DNS-based (Route 53 Geo): Simple, but failover is delayed by roughly the DNS TTL
→ Anycast (Cloudflare): Fastest, IP-level routing
→ Application-level: a token/cookie carries a region hint
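The application-level option can be sketched as a small lookup: prefer the region hint from the token or cookie, fall back to the country's home region, and skip regions that are unhealthy. The country codes, region names, fallback order, and `PickRegion` itself are assumptions made for this example.

```go
package main

import "fmt"

// homeRegion maps a country code to its preferred region (illustrative).
var homeRegion = map[string]string{
	"VN": "sg",
	"DE": "eu",
	"FR": "eu",
	"US": "us",
}

// fallbackOrder lists where to send traffic when a region is unhealthy.
var fallbackOrder = map[string][]string{
	"sg": {"us", "eu"},
	"eu": {"us", "sg"},
	"us": {"eu", "sg"},
}

const defaultRegion = "us"

// PickRegion prefers a valid region hint, then the country's home region,
// then the default; if that region is unhealthy it tries the fallbacks.
func PickRegion(hint, country string, healthy map[string]bool) string {
	region := hint
	if _, known := fallbackOrder[region]; !known {
		region = homeRegion[country] // empty if the country is unmapped
	}
	if region == "" {
		region = defaultRegion
	}
	if healthy[region] {
		return region
	}
	for _, r := range fallbackOrder[region] {
		if healthy[r] {
			return r
		}
	}
	return region // nothing is healthy: keep the preferred region
}

func main() {
	all := map[string]bool{"sg": true, "eu": true, "us": true}
	fmt.Println(PickRegion("", "VN", all))   // sg
	fmt.Println(PickRegion("eu", "VN", all)) // eu (the hint wins)

	sgDown := map[string]bool{"eu": true, "us": true}
	fmt.Println(PickRegion("", "VN", sgDown)) // us (first healthy fallback)
}
```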
4. Large-Scale Migration Strategies
4.1 Strangler Fig Pattern
Instead of rewriting everything at once (Big Bang = Big Risk):
Phase 1: [Old System] ← 100% traffic
Phase 2: [Old System] ← 90% traffic
[New System] ← 10% traffic (1 feature)
Phase 3: [Old System] ← 50%
[New System] ← 50%
Phase 4: [New System] ← 100% traffic
[Old System] ← shutdown 🎉
The name comes from the strangler fig: a plant that grows around a host tree
and gradually replaces it entirely. There is no need to cut down the old tree first.
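In practice the pattern is usually a routing facade in front of both systems, with features moved over one at a time. A minimal sketch, assuming routing by URL path prefix (the `Facade` type and the paths are hypothetical):

```go
package main

import (
	"fmt"
	"strings"
)

// Facade sits in front of both systems and picks a backend per request path.
type Facade struct {
	migrated []string // path prefixes already served by the new system
}

// Migrate marks one more feature (path prefix) as moved to the new system.
func (f *Facade) Migrate(prefix string) {
	f.migrated = append(f.migrated, prefix)
}

// Backend returns "new" for migrated prefixes and "old" for everything else.
func (f *Facade) Backend(path string) string {
	for _, p := range f.migrated {
		if strings.HasPrefix(path, p) {
			return "new"
		}
	}
	return "old"
}

func main() {
	f := &Facade{}
	fmt.Println(f.Backend("/profile/42")) // old

	f.Migrate("/profile/") // Phase 2: one feature moves over
	fmt.Println(f.Backend("/profile/42")) // new
	fmt.Println(f.Backend("/orders/7"))   // old
}
```

When the list of migrated prefixes covers every route, the old system receives no traffic and can be shut down.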
4.2 Zero-Downtime Database Migration
Step 1: Dual-write
  App writes → Old DB + New DB (async or via CDC)
  App reads → Old DB only
Step 2: Shadow read
  App writes → Old DB + New DB
  App reads → Old DB + New DB (compare results, log diff)
  Fix inconsistencies
Step 3: Switch read
  App writes → Old DB + New DB
  App reads → New DB (fall back to Old DB on error)
Step 4: Stop dual-write
  App writes → New DB only
  App reads → New DB only
  Keep Old DB read-only (rollback safety, 1-2 weeks)
Step 5: Decommission
  Drop Old DB connection, shutdown
Every step has a rollback plan. If step 3 fails → go back to step 2.
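Step 2 (dual-write plus shadow read) is the part that catches inconsistencies, so it is worth sketching. The `Store` interface, `MemStore`, and `Migrator` below are hypothetical stand-ins; a real setup would log mismatches to a metrics system rather than collect them in a slice.

```go
package main

import (
	"errors"
	"fmt"
)

// Store is a hypothetical minimal interface over both databases.
type Store interface {
	Get(key string) (string, error)
	Put(key, value string) error
}

// MemStore is an in-memory stand-in for a real database.
type MemStore map[string]string

func (m MemStore) Get(key string) (string, error) {
	v, ok := m[key]
	if !ok {
		return "", errors.New("not found")
	}
	return v, nil
}

func (m MemStore) Put(key, value string) error {
	m[key] = value
	return nil
}

// Migrator implements dual-write with shadow reads.
type Migrator struct {
	Old, New   Store
	Mismatches []string // keys whose results differed between the stores
}

// Put writes to the old DB first (still the source of truth), then the new one.
// A failed write to the new DB is recorded but does not fail the request.
func (m *Migrator) Put(key, value string) error {
	if err := m.Old.Put(key, value); err != nil {
		return err
	}
	if err := m.New.Put(key, value); err != nil {
		m.Mismatches = append(m.Mismatches, key)
	}
	return nil
}

// Get serves from the old DB and compares against the new DB on the side.
func (m *Migrator) Get(key string) (string, error) {
	oldVal, oldErr := m.Old.Get(key)
	newVal, newErr := m.New.Get(key)
	if (oldErr == nil) != (newErr == nil) || oldVal != newVal {
		m.Mismatches = append(m.Mismatches, key)
	}
	return oldVal, oldErr
}

func main() {
	oldDB := MemStore{"legacy": "row written before dual-write started"}
	m := &Migrator{Old: oldDB, New: MemStore{}}

	m.Put("k1", "v1")
	m.Get("k1")     // both stores agree
	m.Get("legacy") // missing in the new DB: needs a backfill

	fmt.Println(m.Mismatches) // [legacy]
}
```

The mismatch on `legacy` illustrates why dual-write alone is not enough: rows written before Step 1 still need a backfill before reads can be switched.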
4.3 Feature Flags for Migration
import "context"

// A feature flag controls the migration.
// featureflag, metrics, newProfileService, oldProfileService, and Profile
// are assumed to be defined elsewhere in the codebase.
func GetUserProfile(ctx context.Context, userID string) (*Profile, error) {
	if featureflag.IsEnabled("use-new-profile-service", userID) {
		// New service: users are migrated gradually.
		profile, err := newProfileService.Get(ctx, userID)
		if err != nil {
			// Fall back to the old service if the new one fails.
			metrics.Inc("new_profile_fallback")
			return oldProfileService.Get(ctx, userID)
		}
		return profile, nil
	}
	return oldProfileService.Get(ctx, userID)
}
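The text does not show how `featureflag.IsEnabled` decides. One common approach, sketched here as an assumption rather than the actual implementation, is a percentage rollout: hash the flag name and user ID into a stable bucket from 0 to 99 and enable the flag when the bucket falls below the configured percentage. The `rollout` table and the 10% value are made up.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// rollout maps a flag name to the percentage of users (0-100) that get it.
var rollout = map[string]uint32{
	"use-new-profile-service": 10,
}

// bucket hashes flag+userID into a stable value in [0, 100).
func bucket(flag, userID string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(flag + ":" + userID))
	return h.Sum32() % 100
}

// IsEnabled reports whether userID falls inside the flag's rollout percentage.
// Unknown flags have percentage 0 and are therefore always off.
func IsEnabled(flag, userID string) bool {
	return bucket(flag, userID) < rollout[flag]
}

func main() {
	enabled := 0
	for i := 0; i < 10000; i++ {
		if IsEnabled("use-new-profile-service", fmt.Sprintf("user-%d", i)) {
			enabled++
		}
	}
	fmt.Printf("%d of 10000 users enabled (target: about 10%%)\n", enabled)
}
```

Because the bucket depends only on the flag and the user ID, a given user stays on the same side of the flag between requests, and raising the percentage only adds users.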
5. Data Modeling Decisions
5.1 Decision Matrix
Relational (PostgreSQL, MySQL):
✅ Complex queries, JOINs, transactions
✅ ACID compliance
✅ Well-understood, mature tooling
❌ Horizontal scaling is hard (sharding complexity)
→ Use for: Core business data, financial transactions
Document (MongoDB, DynamoDB):
✅ Flexible schema, easy horizontal scaling
✅ Good for hierarchical data
❌ No or limited JOINs (denormalize, or join on the application side)
❌ Reads are often eventually consistent by default (stronger reads are opt-in)
→ Use for: Product catalogs, user profiles, content
Columnar (ClickHouse, BigQuery):
✅ Blazing fast aggregation queries
✅ Excellent compression
❌ Slow single-row operations
❌ Not for OLTP
→ Use for: Analytics, time-series, logs
Graph (Neo4j, Neptune):
✅ Relationship-heavy queries (friends of friends)
✅ Pattern matching
❌ Niche, smaller ecosystem
→ Use for: Social networks, recommendation, fraud detection
Time-series (TimescaleDB, InfluxDB):
✅ Optimized for time-ordered data
✅ Built-in downsampling, retention policies
→ Use for: Metrics, IoT sensor data, stock prices
6. API Design at Scale
6.1 Backward Compatibility — Hyrum's Law
Hyrum's Law: "With enough users, every observable
behavior of an API will be depended on by somebody."
Consequences:
→ Error message changes → clients that parse error messages break
→ Response field order changes → someone is comparing JSON strings
→ A new field is added → clients with strict deserialization fail
Principles:
→ Additive changes are usually OK: add a field, add an endpoint
→ Breaking changes NEED versioning
→ Document the contract, not just the behavior
→ Breaking change = new version + deprecation notice + sunset date
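The strict-deserialization case is easy to reproduce with Go's `encoding/json`. By default `json.Unmarshal` ignores unknown fields, while a decoder with `DisallowUnknownFields` rejects them. The `User` type and the payload are invented for the example.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// User is the client's view of the response (v1 fields only).
type User struct {
	ID   int    `json:"id"`
	Name string `json:"name"`
}

// DecodeTolerant ignores fields it does not know (encoding/json's default).
func DecodeTolerant(data []byte) (User, error) {
	var u User
	err := json.Unmarshal(data, &u)
	return u, err
}

// DecodeStrict rejects unknown fields, so an additive server change breaks it.
func DecodeStrict(data []byte) (User, error) {
	var u User
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields()
	err := dec.Decode(&u)
	return u, err
}

func main() {
	// The server added "email" in a later release.
	v2 := []byte(`{"id": 1, "name": "An", "email": "an@example.com"}`)

	u, err := DecodeTolerant(v2)
	fmt.Println(u.Name, err) // An <nil>

	_, err = DecodeStrict(v2)
	fmt.Println(err != nil) // true: the strict client broke
}
```

This is why "additive" is only safe relative to the clients you actually have: a tolerant reader survives the new field, a strict one does not.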
6.2 Pagination at Scale
Offset-based (simple, flawed):
GET /items?offset=1000&limit=20
❌ Large offset → slow query (MySQL: OFFSET 1000000 = scan 1M rows)
❌ Data changes between pages → missed/duplicate items
Cursor-based (scalable):
GET /items?cursor=eyJpZCI6MTAwMH0&limit=20
Response:
{
"items": [...],
"next_cursor": "eyJpZCI6MTAyMH0",
"has_more": true
}
✅ Query time does not grow with page depth (an index seek, not a scan of skipped rows)
✅ No duplicate/missing items
❌ Cannot jump to page N (but most UIs don't need it)
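Keyset-based (best for sorted data):
GET /items?created_after=2024-03-15T10:00:00Z&limit=20
  WHERE created_at > '2024-03-15T10:00:00Z'
  ORDER BY created_at ASC LIMIT 20
✅ Uses the index efficiently
✅ Consistent performance
❌ If created_at is not unique, rows sharing the boundary timestamp can be skipped; add a tie-breaker such as id to the sort key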
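The cursor values in the example above are unpadded base64 of a small JSON object (`eyJpZCI6MTAwMH0` decodes to `{"id":1000}`). A minimal sketch of encoding, decoding, and turning a cursor into a keyset query follows; the `Cursor` type, the `items` table, and its columns are assumptions.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// Cursor is the opaque position handed to clients.
type Cursor struct {
	ID int64 `json:"id"`
}

// Encode serializes the cursor as unpadded URL-safe base64 JSON.
func (c Cursor) Encode() string {
	b, _ := json.Marshal(c) // a struct with one int64 field cannot fail to marshal
	return base64.RawURLEncoding.EncodeToString(b)
}

// DecodeCursor parses a cursor string; an empty string means "first page".
func DecodeCursor(s string) (Cursor, error) {
	var c Cursor
	if s == "" {
		return c, nil
	}
	b, err := base64.RawURLEncoding.DecodeString(s)
	if err != nil {
		return c, fmt.Errorf("bad cursor: %w", err)
	}
	if err := json.Unmarshal(b, &c); err != nil {
		return c, fmt.Errorf("bad cursor: %w", err)
	}
	return c, nil
}

// PageQuery returns a parameterized keyset query for the page after c.
// The table and column names are illustrative.
func PageQuery(c Cursor, limit int) (string, []any) {
	return "SELECT id, name FROM items WHERE id > ? ORDER BY id ASC LIMIT ?",
		[]any{c.ID, limit}
}

func main() {
	c, err := DecodeCursor("eyJpZCI6MTAwMH0")
	fmt.Println(c.ID, err) // 1000 <nil>

	query, args := PageQuery(c, 20)
	fmt.Println(query, args)

	// The last row of this page had id 1020, so that becomes next_cursor.
	fmt.Println(Cursor{ID: 1020}.Encode()) // eyJpZCI6MTAyMH0
}
```

Keeping the cursor opaque lets the server change what it contains (for example, adding a timestamp plus id tie-breaker) without breaking clients, as long as they only pass it back unchanged.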
6.3 Deprecation Strategy
Lifecycle:
ACTIVE → DEPRECATED → SUNSET → REMOVED
DEPRECATED (6 months):
→ Add header: Deprecation: true
→ Add header: Sunset: Sat, 01 Mar 2025 00:00:00 GMT
→ Log usage to track migration progress
→ Email/notify consumers
SUNSET (3 months):
→ Return a warning in the response body
→ Throttle the deprecated endpoint (lower its rate limit)
→ Contact the remaining consumers directly
REMOVED:
→ Return 410 Gone with a message explaining how to migrate
→ After 1 month → 404
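The DEPRECATED and REMOVED stages map naturally onto HTTP middleware. The sketch below uses only Go's standard library; the `Deprecated` and `Gone` helpers, the `/v1/items` path, and the migration URL are hypothetical, and the header values mirror the ones listed above.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// Deprecated wraps a handler and advertises its deprecation on every response.
func Deprecated(sunset time.Time, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Deprecation", "true")
		w.Header().Set("Sunset", sunset.UTC().Format(http.TimeFormat))
		next.ServeHTTP(w, r)
	})
}

// Gone is served once the endpoint has been removed.
func Gone(migrationURL string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		http.Error(w, "This endpoint was removed. See "+migrationURL, http.StatusGone)
	})
}

// demo exercises both stages in memory and returns the Sunset header
// of the deprecated endpoint and the status code of the removed one.
func demo() (string, int) {
	sunset := time.Date(2025, 3, 1, 0, 0, 0, 0, time.UTC)
	ok := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "ok")
	})

	rec1 := httptest.NewRecorder()
	Deprecated(sunset, ok).ServeHTTP(rec1, httptest.NewRequest("GET", "/v1/items", nil))

	rec2 := httptest.NewRecorder()
	Gone("https://example.com/migrate").ServeHTTP(rec2, httptest.NewRequest("GET", "/v1/items", nil))

	return rec1.Header().Get("Sunset"), rec2.Code
}

func main() {
	header, code := demo()
	fmt.Println(header) // Sat, 01 Mar 2025 00:00:00 GMT
	fmt.Println(code)   // 410
}
```

Usage logging and throttling from the list above would slot into the same wrapper, which keeps the deprecation policy in one place instead of scattered across handlers.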
7. Capacity Planning
7.1 Framework
Step 1: Current baseline
→ Current QPS, storage, compute usage
→ Growth rate (monthly/quarterly)
Step 2: Project forward
→ "If we grow 3x in 12 months, what do we need?"
→ Compute: CPU cores, memory
→ Storage: disk, backup
→ Network: bandwidth, connections
Step 3: Identify cliff
→ "At what level will the system break?"
→ Database: max connections, disk IOPS
→ Application: memory limit, CPU saturation
→ Network: bandwidth limit
Step 4: Plan ahead
→ Buffer: plan for 2x expected peak
→ Lead time: ordering hardware/instances
→ Scaling strategy: horizontal vs vertical
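Steps 2 to 4 reduce to compound-growth arithmetic. The sketch below assumes smooth month-over-month growth; the function names and the sample figures (4,000 QPS today, a cliff at 20,000 QPS) are made up for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// MonthlyRate converts "grow factor-x in N months" to a compound monthly rate.
func MonthlyRate(factor, months float64) float64 {
	return math.Pow(factor, 1/months) - 1
}

// MonthsUntil returns how many whole months of compound growth it takes for
// current load to reach limit. It returns 0 if the limit is already reached
// and -1 if the load is not growing.
func MonthsUntil(current, limit, monthlyRate float64) int {
	if current >= limit {
		return 0
	}
	if monthlyRate <= 0 {
		return -1
	}
	return int(math.Ceil(math.Log(limit/current) / math.Log(1+monthlyRate)))
}

// RequiredCapacity applies the headroom buffer to the projected peak.
func RequiredCapacity(currentPeak, growthFactor, buffer float64) float64 {
	return currentPeak * growthFactor * buffer
}

func main() {
	rate := MonthlyRate(3, 12) // "3x in 12 months"
	fmt.Printf("monthly growth: %.1f%%\n", rate*100) // 9.6%

	// Hypothetical: 4,000 QPS today, the database falls over at 20,000 QPS.
	fmt.Println(MonthsUntil(4000, 20000, rate)) // 18

	// Plan for 2x the expected peak after 3x growth.
	fmt.Println(RequiredCapacity(4000, 3, 2)) // 24000
}
```

Comparing the months-until-cliff figure with the lead time for new hardware or a re-architecture shows how early the work has to start.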
7.2 Cost-Aware Architecture
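At the Staff level, you also have to think about cost:
"Solution A: 99.99% availability, $50K/month"
"Solution B: 99.9% availability, $8K/month"
Difference: 0.09% = ~39 minutes/month of extra downtime
Savings: $42K/month = $504K/year
Business question: does 39 minutes of downtime per month
cost more than $504K/year?
→ E-commerce at $1M revenue/day: 39 min/month ≈ 7.8 hours/year ≈ $324K/year in direct lost revenue if sales are spread evenly. That is below $504K, so A pays off only when outages hit peak hours or carry extra costs (SLA penalties, churn). On direct revenue alone, break-even is roughly $1.55M/day.
→ Internal tool: NO, choose B
→ This is how a Staff Engineer thinks.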
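The comparison can be checked numerically. The sketch assumes a 30-day month and revenue spread evenly across the day, which understates the damage of a peak-hour outage; the function names are invented. Under those assumptions, the direct revenue lost at $1M/day comes to about $324K per year, so the decision also hinges on peak-hour concentration and indirect costs.

```go
package main

import "fmt"

// minutesPerMonth assumes a 30-day month.
const minutesPerMonth = 30 * 24 * 60

// DowntimeMinutes returns the downtime per month implied by an availability.
func DowntimeMinutes(availability float64) float64 {
	return (1 - availability) * minutesPerMonth
}

// AnnualDowntimeCost estimates lost revenue per year, assuming revenue is
// spread evenly over the day (a simplification).
func AnnualDowntimeCost(availability, revenuePerDay float64) float64 {
	revenuePerMinute := revenuePerDay / (24 * 60)
	return DowntimeMinutes(availability) * 12 * revenuePerMinute
}

func main() {
	extraMin := DowntimeMinutes(0.999) - DowntimeMinutes(0.9999)
	fmt.Printf("extra downtime: %.1f min/month\n", extraMin) // 38.9

	extraLoss := AnnualDowntimeCost(0.999, 1e6) - AnnualDowntimeCost(0.9999, 1e6)
	fmt.Printf("extra lost revenue at $1M/day: $%.0f/year\n", extraLoss) // 324000

	fmt.Printf("extra infra cost of A: $%d/year\n", (50_000-8_000)*12) // 504000
}
```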
8. System Design Interview — Staff Level Framework
Framework (35-40 minutes):
1. Requirements (5 min): DRIVE the conversation
→ Functional: "What does the system need to do?"
→ Non-functional: Scale? Latency? Consistency?
→ Constraints: "Existing infra? Team size?"
→ Do NOT assume. ASK.
2. High-level Design (5 min)
→ Sketch components on the whiteboard
→ "There are 2 approaches; I'll explain the trade-offs..."
→ Pick an approach and explain WHY
3. Deep Dive (15-20 min)
→ The interviewer will choose the focus area
→ Data model, API design, scaling strategy
→ State the TRADE-OFFS, not just "I'd use X"
4. Scale & Edge Cases (5 min)
→ "If traffic grows 100x, where is the bottleneck?"
→ "Failure mode: what happens if the DB goes down?"
→ Monitoring & alerting strategy
Staff signals interviewers look for:
✅ Clarify before designing (don't assume)
✅ State the trade-offs for EVERY decision
✅ Acknowledge uncertainty: "I'd need to benchmark, but..."
✅ Think about cost, team, timeline, not just tech
✅ Listen & adjust when the interviewer pushes back
✅ Drive the conversation; don't wait to be asked
💡 Remember: Staff-level system design = "I chose X over Y because Z, and here's when we'd reconsider." Not "Use Kafka because it's good." 🎯