📦 Cache ✍️ Khoa 📅 22/04/2026 ☕ 21 min read

Cache: Interview & Big Picture

Caching shows up in almost every system design interview. Not because it is a silver bullet, but because deciding what to cache, where, for how long, and how to invalidate it reveals a lot about how an engineer thinks about consistency, performance, and operational complexity.

This post pulls together the big picture: caching in system design interviews (session cache, leaderboards with Redis sorted sets, distributed rate limiter, cache warming), sizing calculations, and the questions interviewers actually want to hear you ask, because the right question often matters more than the right answer.

1. Session Cache & Distributed Session

Theory: What is the problem?

In the old days sessions were simple: user logs in → server creates a session → stores it in memory → done. Life was good. Then horizontal scaling arrived and broke all of it.

With 3 servers, a user logs in on Server A and the next request lands on Server B. Server B has no idea who they are and forces another login. The user gets angry, churns, and leaves a 1-star review. The Product Manager cries.

Three classic solutions:

Sticky session (session affinity): The load balancer remembers that this user must always go to Server A. Simple, no code changes. But when Server A dies, every session on it dies too. This is not distributed; it only dodges the problem.

Session replication: Every server syncs sessions with every other server. Sounds nice, works badly: with N servers each write fans out to N-1 peers, so cluster-wide traffic grows as O(N²). Avoid it for large systems.

Centralized session store (the usual answer): All servers read and write sessions in one shared place, typically Redis. Servers are stateless, Redis is stateful. A request can land on any server and see the same session.

User Request
     │
     ▼
[Load Balancer]
   /    |    \
  S1    S2    S3   ← Stateless, unaware of each other
   \    |    /
     ▼  ▼  ▼
    [Redis Cluster]  ← Sessions live here

Deep dive: Designing the Session Store

Data model:

Key:   session:{session_id}
Value: {
  user_id: "usr_123",
  roles: ["admin", "editor"],
  created_at: 1700000000,
  last_active: 1700003600,
  metadata: { ip: "1.2.3.4", user_agent: "..." }
}
TTL: 86400 (24 hours)

Creating a session:

import json
import time
import uuid
from datetime import timedelta

import redis

r = redis.Redis(host='redis-cluster', port=6379)

SESSION_TTL = timedelta(hours=24)

def create_session(user_id: str, roles: list) -> str:
    session_id = str(uuid.uuid4())
    now = int(time.time())
    session_data = {
        "user_id": user_id,
        "roles": roles,
        "created_at": now,
        "last_active": now,
    }
    r.setex(f"session:{session_id}", SESSION_TTL, json.dumps(session_data))
    return session_id

def get_session(session_id: str) -> dict | None:
    data = r.get(f"session:{session_id}")
    if not data:
        return None  # Session expired or never existed
    session = json.loads(data)
    # Sliding TTL: reset the expiry on every access
    r.expire(f"session:{session_id}", SESSION_TTL)
    return session

Sliding vs Fixed TTL, the practical trade-offs:

	Fixed TTL	Sliding TTL
UX	Session expires even while in use	Session stays alive while in continuous use
Security	Better	Risky if a token is stolen and used continuously
Redis load	Low	One extra `EXPIRE` per request
Use when	Banking, healthcare	Social media, e-commerce
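A common middle ground between the two columns is a sliding TTL with an absolute cap: the session is extended on each access but can never outlive a maximum lifetime. The sketch below is one way to compute the value to pass to `EXPIRE`; the function name `next_ttl` and the parameters `idle_ttl` and `max_lifetime` are assumptions for illustration, not part of the design above.

```python
def next_ttl(created_at: float, now: float, idle_ttl: int, max_lifetime: int) -> int:
    """Return the TTL (seconds) to set on this access.

    0 means the session has reached its absolute lifetime and should be
    deleted instead of extended.
    """
    remaining_lifetime = int(created_at + max_lifetime - now)
    if remaining_lifetime <= 0:
        return 0
    # Slide the expiry forward, but never past the absolute cap.
    return min(idle_ttl, remaining_lifetime)
```

With `idle_ttl=1800` and `max_lifetime=86400`, an active user keeps getting 30 more minutes until the 24-hour mark, which limits how long a stolen session can be kept alive.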

Session invalidation, the part people tend to forget:

def logout(session_id: str):
    r.delete(f"session:{session_id}")

def logout_all_devices(user_id: str):
    # Requires a reverse index: user → set of session ids, maintained with
    # SADD user_sessions:{user_id} <session_id> whenever a session is created.
    session_ids = r.smembers(f"user_sessions:{user_id}")
    pipe = r.pipeline()
    for sid in session_ids:
        # smembers returns bytes unless the client uses decode_responses=True
        sid = sid.decode() if isinstance(sid, bytes) else sid
        pipe.delete(f"session:{sid}")
    pipe.delete(f"user_sessions:{user_id}")
    pipe.execute()

Interview insight: Interviewers often ask "how do you force logout on all devices?" It separates people who think about edge cases from people who only think about the happy path. Always maintain a user_sessions:{user_id} index.

Common trade-off questions:

Redis vs JWT? → JWT is stateless and needs no lookup, but it cannot be revoked before it expires unless you add a server-side denylist, which brings state back. Redis is stateful and can revoke immediately. Need instant logout → Redis. Internal microservice auth → JWT.
Redis as a single point of failure? → Run replicas with automatic failover: Redis Sentinel for a primary/replica setup, or Redis Cluster, which has failover built in.
Session hijacking? → Bind the session to IP + User-Agent. If they change, invalidate and require re-auth. IP addresses change legitimately on mobile networks, so a strict IP check will also log out some real users.

2. Leaderboard with Redis Sorted Sets

Theory: Why Sorted Sets?

A leaderboard looks simple, but the naive approach falls over quickly.

Naive approach with SQL:

SELECT user_id, score, RANK() OVER (ORDER BY score DESC) as rank
FROM scores
WHERE game_id = 'game_xyz'
ORDER BY score DESC
LIMIT 100;

With 10 million users and no index on score, this query sorts O(N log N) rows on every call. Page load takes 5 seconds. Users leave. The boss asks why.

Redis Sorted Set: Every member has a score, and Redis keeps members ordered automatically. Insert/update is O(log N). A range query is O(log N + K), where K is the number of elements returned. With 10 million users it is still fast.

Internal structure: Redis uses a skip list plus a hash table. The skip list gives fast range queries; the hash table gives O(1) lookup by member. You do not need the internals to pass an interview, but they help when asked "why is a Redis Sorted Set fast?"
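To make the "ordered structure plus hash table" idea concrete, here is a toy in-memory model. It is only an illustration: `MiniSortedSet` is a made-up class, and it uses a sorted Python list where Redis uses a skip list, so inserts here cost O(N) rather than O(log N). What it does show is why two structures are kept side by side: the dict answers "what is this member's score?" in O(1), and the ordered list answers rank and range questions.

```python
import bisect

class MiniSortedSet:
    """Toy model of a sorted set: dict for lookup, sorted list for order."""

    def __init__(self):
        self._score_of = {}   # member -> score (the "hash table")
        self._ordered = []    # sorted list of (score, member) tuples

    def zadd(self, member: str, score: float) -> None:
        old = self._score_of.get(member)
        if old is not None:
            # Remove the stale (score, member) pair before re-inserting.
            idx = bisect.bisect_left(self._ordered, (old, member))
            del self._ordered[idx]
        self._score_of[member] = score
        bisect.insort(self._ordered, (score, member))

    def zscore(self, member: str):
        return self._score_of.get(member)

    def zrevrank(self, member: str):
        score = self._score_of.get(member)
        if score is None:
            return None
        idx = bisect.bisect_left(self._ordered, (score, member))
        return len(self._ordered) - 1 - idx  # 0 = highest score

    def zrevrange(self, start: int, stop: int):
        # Non-negative, inclusive indexes only (Redis also accepts negatives).
        descending = self._ordered[::-1]
        return [(member, score) for score, member in descending[start:stop + 1]]
```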

Deep dive: Designing the Leaderboard

Basic operations:

LEADERBOARD_KEY = "leaderboard:game_xyz"

# Update a score (ZADD keeps the set sorted)
def update_score(user_id: str, score: float):
    r.zadd(LEADERBOARD_KEY, {user_id: score})

# Get the top 100 (0-based indexes; withscores returns the scores too)
def get_top_100() -> list:
    return r.zrevrange(LEADERBOARD_KEY, 0, 99, withscores=True)

# Get a user's rank (0-based in Redis, so add 1)
def get_rank(user_id: str) -> int | None:
    rank = r.zrevrank(LEADERBOARD_KEY, user_id)
    return rank + 1 if rank is not None else None

# Get a user's score
def get_score(user_id: str) -> float | None:
    return r.zscore(LEADERBOARD_KEY, user_id)

# Get the neighborhood: users around the current rank (±window, default 5)
def get_neighborhood(user_id: str, window: int = 5) -> list:
    rank = r.zrevrank(LEADERBOARD_KEY, user_id)
    if rank is None:
        return []
    start = max(0, rank - window)
    end = rank + window
    return r.zrevrange(LEADERBOARD_KEY, start, end, withscores=True)

Incremental scoring (more common than absolute):

# Add to the score instead of setting an absolute value
def add_points(user_id: str, points: float):
    r.zincrby(LEADERBOARD_KEY, points, user_id)

Multi-dimensional leaderboards, a more realistic problem:

leaderboard:global          → global top
leaderboard:country:vn      → top in Vietnam
leaderboard:weekly:2024-48  → top this week
leaderboard:friends:{user}  → top among friends

def update_score_all_dimensions(user_id: str, country: str, points: float):
    current_week = get_current_week_key()  # e.g., "2024-48"
    
    pipe = r.pipeline()
    pipe.zincrby("leaderboard:global", points, user_id)
    pipe.zincrby(f"leaderboard:country:{country}", points, user_id)
    pipe.zincrby(f"leaderboard:weekly:{current_week}", points, user_id)
    # redis-py pipelines wrap commands in MULTI/EXEC by default, so the three
    # increments run as one isolated unit. Redis does not roll back if one
    # command fails, and on Redis Cluster the keys must share a hash slot.
    pipe.execute()
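The code above calls `get_current_week_key()` without showing it. A minimal sketch, assuming the `"2024-48"` format means ISO year and ISO week number (that reading of the format is an assumption), could look like this. The optional `today` parameter is added here only to make the function testable.

```python
from datetime import date
from typing import Optional

def get_current_week_key(today: Optional[date] = None) -> str:
    """Return a key like '2024-48' built from the ISO year and ISO week."""
    today = today or date.today()
    iso_year, iso_week, _weekday = today.isocalendar()
    # Use the ISO year, not today.year: the last days of December can
    # belong to week 1 of the following year.
    return f"{iso_year}-{iso_week:02d}"
```

Using the ISO year matters around New Year: 30 December 2024 falls in ISO week 1 of 2025, so mixing the calendar year with the ISO week would produce a key that collides with the first week of 2024.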

Weekly leaderboard expiry:

def setup_weekly_leaderboard(week_key: str):
    key = f"leaderboard:weekly:{week_key}"
    # Expire after 8 days (1 week + buffer for viewing results).
    # EXPIRE is a no-op on a missing key, so call this after the first write.
    r.expire(key, timedelta(days=8))

Friends leaderboard, the most interesting problem:

def get_friends_leaderboard(user_id: str) -> list:
    friends = get_friend_ids(user_id)  # From the DB or another cache
    friends.append(user_id)  # Include the user themselves
    
    # ZUNIONSTORE could build a temporary sorted set from several keys,
    # but for a friends list a simpler approach is to pipeline ZSCORE:
    pipe = r.pipeline()
    for friend_id in friends:
        pipe.zscore("leaderboard:global", friend_id)
    scores = pipe.execute()
    
    result = [
        (fid, score) for fid, score in zip(friends, scores)
        if score is not None
    ]
    result.sort(key=lambda x: x[1], reverse=True)
    return result

Interview insight: When the interviewer asks "which feature is hardest?", the answer is the friends leaderboard, because it combines the social graph with real-time ranking, and the social graph usually does not fit in Redis alone.

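Scale issues and solutions:

Problem	When it happens	Solution
Redis memory	>10M users × many leaderboards	Keep only the top N, archive the rest
Write hotspot	Traffic spikes (game show, flash sale)	Queue updates, batch writes
Friends leaderboard N+1	Large friends lists	Pre-compute or cap the number of friends
Tie-breaking	Two users with the same score	Score = (points × 10^10) + (10^10 - 1 - timestamp)

Tie-breaking trick:

import time

# Unix timestamps currently have 10 digits, so reserve 10^10 for them
# (multiplying by 10^9 would let the timestamp spill into the points).
TS_SPAN = 10_000_000_000

def compute_score_with_tiebreak(points: int) -> int:
    # Whoever reaches the score first ranks higher: an earlier timestamp
    # leaves a larger remainder. Redis stores scores as doubles, so the
    # result is exact only below 2^53 (points up to roughly 900,000).
    timestamp = int(time.time())
    return points * TS_SPAN + (TS_SPAN - 1 - timestamp)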
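A composite score is only useful if it can be decoded for display and if the ordering really holds, so here is a small self-contained check. It assumes a 10^10 slot for the timestamp (Unix timestamps are 10 digits today and stay below 10^10 until the year 2286); `TS_SPAN`, `encode_score`, and `decode_points` are illustrative names, not part of any Redis API.

```python
TS_SPAN = 10**10  # room for a 10-digit Unix timestamp

def encode_score(points: int, achieved_at: int) -> int:
    """Pack points and time into one number; earlier times sort higher."""
    return points * TS_SPAN + (TS_SPAN - 1 - achieved_at)

def decode_points(score: float) -> int:
    """Recover the points from a score read back from the sorted set."""
    return int(score) // TS_SPAN

first = encode_score(100, 1_700_000_000)   # reached 100 points first
second = encode_score(100, 1_700_000_500)  # reached 100 points 500s later
higher = encode_score(101, 1_700_000_500)  # more points, but later
```

Here `first > second` (same points, earlier wins) and `higher > first` (more points always wins). One caveat of this scheme: `ZINCRBY` no longer works directly on such a score, because the increment has to refresh the timestamp part as well.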

3. Distributed Rate Limiter

Theory: Why distributed?

A rate limiter on a single server is easy: keep an in-memory counter. The trouble starts with horizontal scaling: a user sends 100 requests/s spread over 10 servers, each server sees only 10, and none of them knows the total is 100. The rate limit is bypassed entirely.

Four classic algorithms:

Fixed Window Counter: Split time into fixed windows (e.g., 1 minute) and count requests in the current window. Simple, but it has a boundary problem: a user can send 2× the limit within a couple of seconds that straddle a window boundary.

Window 1 (00:00 - 01:00): 100 requests ← limit
Window 2 (01:00 - 02:00): 100 requests ← limit
But 00:30 - 01:30: 200 requests → Bypass!
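The boundary problem is easy to reproduce with an in-memory fixed-window counter. This is a single-process sketch for illustration only (the class name `FixedWindowCounter` is made up, and timestamps are passed in explicitly so the run is deterministic):

```python
from collections import defaultdict

class FixedWindowCounter:
    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        self.counts = defaultdict(int)  # window index -> request count

    def allow(self, now: float) -> bool:
        bucket = int(now // self.window)
        if self.counts[bucket] >= self.limit:
            return False
        self.counts[bucket] += 1
        return True

limiter = FixedWindowCounter(limit=100, window_seconds=60)

# 150 attempts one second before the boundary, 150 just after it.
late_burst = sum(limiter.allow(59.0) for _ in range(150))
early_burst = sum(limiter.allow(60.5) for _ in range(150))
```

Each window accepts exactly its limit, so 200 requests get through in 1.5 seconds even though the configured limit is 100 per minute.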

Sliding Window Log: Store the timestamp of every request and count the requests in [now - window, now]. Exact, but memory is O(requests), which does not scale under high traffic.

Sliding Window Counter (hybrid): Combines the two above. It estimates the count from the current and previous windows, weighting the previous one by how much of it still overlaps the sliding window. The error is small as long as requests in the previous window were spread roughly evenly, and memory is O(1). This is the algorithm used most in practice.

Rate = prev_count × (1 - elapsed/window) + curr_count
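A quick worked example of the formula, as a pure function (the name `estimate_sliding_count` is just for illustration). Take a 60-second window, 80 requests in the previous window, 30 so far in the current one, and 15 seconds elapsed: the previous window still overlaps the sliding window by 75%, so it contributes 80 × 0.75 = 60.

```python
def estimate_sliding_count(prev_count: int, curr_count: int,
                           elapsed: float, window: float) -> float:
    """prev_count × (1 - elapsed/window) + curr_count"""
    overlap_with_prev = 1 - elapsed / window
    return prev_count * overlap_with_prev + curr_count

estimate = estimate_sliding_count(prev_count=80, curr_count=30, elapsed=15, window=60)
# 80 × 0.75 + 30 = 90.0
```

With a limit of 100 this request passes; with the same counts at 5 seconds elapsed the estimate would be about 103.3 and the request would be rejected.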

Token Bucket: A bucket holds up to N tokens and refills at R tokens/second. Each request consumes 1 token. It allows bursts (good for APIs) but is more complex to implement in a distributed setup.
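For comparison, here is a minimal single-process token bucket. It is a sketch under simple assumptions: state lives in memory, the caller passes the current time explicitly, and refill is computed lazily on each call instead of by a background timer. A distributed version would have to keep `tokens` and `updated_at` in shared storage and update them atomically, which is where the extra complexity comes from.

```python
class TokenBucket:
    def __init__(self, capacity: float, refill_rate: float, now: float = 0.0):
        self.capacity = capacity        # maximum burst size (N)
        self.refill_rate = refill_rate  # tokens added per second (R)
        self.tokens = capacity          # start full
        self.updated_at = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Lazy refill: add the tokens earned since the last call, capped at capacity.
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.updated_at = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

bucket = TokenBucket(capacity=5, refill_rate=1.0)
burst = [bucket.allow(now=0.0) for _ in range(6)]       # 5 allowed, 6th rejected
after_wait = [bucket.allow(now=2.0) for _ in range(3)]  # 2 tokens refilled
```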

Deep dive: Sliding Window Counter with Redis

import redis
import time

r = redis.Redis()

def is_allowed(user_id: str, limit: int, window_seconds: int) -> bool:
    now = time.time()
    
    # Keys for the current and previous fixed windows
    curr_window = int(now // window_seconds)
    prev_window = curr_window - 1
    
    curr_key = f"rate:{user_id}:{curr_window}"
    prev_key = f"rate:{user_id}:{prev_window}"
    
    # Note: INCR runs before the decision, so rejected requests are counted too
    pipe = r.pipeline()
    pipe.get(prev_key)
    pipe.incr(curr_key)
    pipe.expire(curr_key, window_seconds * 2)
    results = pipe.execute()
    
    prev_count = int(results[0] or 0)
    curr_count = results[1]
    
    # Fraction of the current window that has already elapsed
    elapsed_in_window = now % window_seconds
    weight = elapsed_in_window / window_seconds
    
    # Estimated number of requests in the sliding window
    estimated = prev_count * (1 - weight) + curr_count
    
    return estimated <= limit

Lua script for atomic operations, closer to production grade:

-- rate_limiter.lua
local key_curr = KEYS[1]
local key_prev = KEYS[2]
local limit = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local weight = tonumber(ARGV[4])

local prev_count = tonumber(redis.call('GET', key_prev)) or 0
local curr_count = tonumber(redis.call('INCR', key_curr))
redis.call('EXPIRE', key_curr, window * 2)

local estimated = prev_count * (1 - weight) + curr_count

if estimated > limit then
    -- Rollback increment
    redis.call('DECR', key_curr)
    return 0
end
return 1

# Load the script once, then call it by SHA.
# (redis-py's r.register_script(script) is an alternative that reloads on NOSCRIPT.)
with open('rate_limiter.lua', 'r') as f:
    script = f.read()

sha = r.script_load(script)

def is_allowed_atomic(user_id: str, limit: int, window: int) -> bool:
    now = time.time()
    curr_window = int(now // window)
    elapsed = now % window
    weight = elapsed / window
    
    result = r.evalsha(
        sha,
        2,  # number of KEYS
        f"rate:{user_id}:{curr_window}",
        f"rate:{user_id}:{curr_window - 1}",
        limit, window, now, weight
    )
    return bool(result)

Conventional response headers (do not skip them):

def add_rate_limit_headers(response, user_id: str, limit: int, window: int):
    # get_remaining and get_window_reset_time are helpers not shown here
    remaining = get_remaining(user_id, limit, window)
    reset_time = get_window_reset_time(window)
    
    response.headers['X-RateLimit-Limit'] = str(limit)
    response.headers['X-RateLimit-Remaining'] = str(max(0, remaining))
    response.headers['X-RateLimit-Reset'] = str(reset_time)
    # Retry-After is mainly meaningful on rejected (429) responses
    response.headers['Retry-After'] = str(max(0, reset_time - int(time.time())))
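The reset time used in those headers is not defined in the snippet. One plausible sketch, assuming windows are aligned to multiples of `window` seconds since the Unix epoch (the same alignment `int(now // window)` produces in the limiter code), is below. Unlike the call above, this version takes `now` as an explicit argument so it can be tested; `retry_after_seconds` is an extra hypothetical helper.

```python
import math

def get_window_reset_time(window: int, now: float) -> int:
    """Unix time at which the current fixed window ends."""
    return (int(now // window) + 1) * window

def retry_after_seconds(window: int, now: float) -> int:
    """Whole seconds a client should wait, rounded up, never negative."""
    return max(0, math.ceil(get_window_reset_time(window, now) - now))
```

For a 60-second window at `now = 125.0`, the current window is [120, 180), so the reset time is 180 and the client should wait 55 seconds.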

Multi-tier rate limiting, closer to real systems:

Per IP:     1000 req/min   (block bots, DDoS)
Per User:   100 req/min    (fair use)
Per API key: 10000 req/min (business tier)
Global:     1M req/min     (protect downstream)

def check_all_tiers(ip: str, user_id: str, api_key: str) -> tuple[bool, str]:
    checks = [
        (f"ip:{ip}", 1000, 60, "IP rate limit exceeded"),
        (f"user:{user_id}", 100, 60, "User rate limit exceeded"),
        (f"key:{api_key}", 10000, 60, "API key rate limit exceeded"),
        ("global", 1_000_000, 60, "Global rate limit exceeded"),
    ]
    for identifier, limit, window, message in checks:
        if not is_allowed(identifier, limit, window):
            return False, message
    return True, "OK"

Interview insight: The best rate limiter question is "what happens when Redis is down?" The answer reveals which risk you accept. Option 1: fail open (let everything through), prioritizing availability. Option 2: fail closed (block everything), prioritizing protection. There is no right answer, only the right trade-off for the context.
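In code, the choice between the two options often comes down to a single flag around the limiter call. The wrapper below is a sketch: `check_with_fallback` is a made-up name, and the default exception types are Python built-ins chosen so the example runs without Redis. With redis-py you would likely pass its own connection and timeout exception classes instead.

```python
from typing import Callable, Tuple, Type

def check_with_fallback(
    check: Callable[[], bool],
    fail_open: bool,
    errors: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> bool:
    """Run the rate-limit check; if the store is unreachable, apply the policy."""
    try:
        return check()
    except errors:
        # fail_open=True  → allow the request (availability first)
        # fail_open=False → reject the request (protection first)
        return fail_open

def _simulated_outage() -> bool:
    raise ConnectionError("rate limit store unreachable")

allowed_when_open = check_with_fallback(_simulated_outage, fail_open=True)
allowed_when_closed = check_with_fallback(_simulated_outage, fail_open=False)
```

A public read API might run with `fail_open=True`, while a login or payment endpoint might prefer `fail_open=False`.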

4. Cache Warming Strategies

Theory: Cold start problem

Every cache starts cold, completely empty. After a fresh deploy, or after an incident that forces a cache flush, all traffic goes straight to the database. This is known as a cache stampede or thundering herd.

Thundering herd pattern:

New deploy → Cache empty
→ 10,000 requests/second hit the DB at once
→ DB overloaded, latency climbs
→ Requests time out and retry
→ Even more requests pour in
→ DB dies
→ The whole system dies
→ You get paged at 3 a.m.

Three warming strategies:

Lazy warming (cache-aside): The first request misses the cache → reads the DB → populates the cache. The simplest option, no setup needed. But every first user after a cold start pays the high latency.
Eager warming (pre-warming): Populate the cache proactively before traffic arrives. You need to know what to warm, usually from historical data or business knowledge.
Lazy + protection (mutex/probabilistic): Still lazy, but the thundering herd is held back with a lock or with probabilistic early expiration.

Deep dive: Concrete patterns

Pattern 1: Script-based pre-warming

import asyncio
import json
from datetime import timedelta

# Run before a deploy or after a flush.
# `db` is the application's (blocking) database client; `r` is the Redis client.
async def warm_cache():
    print("Starting cache warm-up...")
    
    # Warm the top N products (business critical)
    top_products = db.query("""
        SELECT product_id FROM products
        ORDER BY view_count DESC
        LIMIT 10000
    """)
    
    batch_size = 100
    for i in range(0, len(top_products), batch_size):
        batch = top_products[i:i+batch_size]
        # The query above selects product_id, so read that column from each row
        products = db.get_products_by_ids([p.product_id for p in batch])
        
        pipe = r.pipeline()
        for product in products:
            pipe.setex(
                f"product:{product.id}",
                timedelta(hours=1),
                json.dumps(product.to_dict())
            )
        pipe.execute()
        
        # Throttle so the warm-up does not overload the DB
        await asyncio.sleep(0.1)
    
    print(f"Warmed {len(top_products)} products")

Pattern 2: Mutex to avoid the thundering herd

def get_product_with_mutex(product_id: str) -> dict:
    cache_key = f"product:{product_id}"
    lock_key = f"lock:{cache_key}"
    
    # Cache hit → return immediately
    cached = r.get(cache_key)
    if cached:
        return json.loads(cached)
    
    # Cache miss → try to acquire the lock
    acquired = r.set(lock_key, "1", nx=True, ex=5)  # lock expires after 5s
    
    if acquired:
        # We hold the lock → read the DB and populate the cache
        try:
            data = db.get_product(product_id)
            r.setex(cache_key, timedelta(hours=1), json.dumps(data))
            return data
        finally:
            # Caveat: if the DB call outlives the 5s expiry, this can delete a
            # lock held by another worker; a unique token per holder avoids that
            r.delete(lock_key)
    else:
        # Someone else is populating → wait briefly and retry
        time.sleep(0.1)
        cached = r.get(cache_key)
        if cached:
            return json.loads(cached)
        # Fallback: read the DB directly if the cache is still empty
        return db.get_product(product_id)

Pattern 3: Probabilistic early expiration (PER), more elegant than a mutex

import json
import math
import random
import threading

def get_with_per(key: str, fetch_fn, ttl: int, beta: float = 1.0):
    """
    PER: Probabilistic Early Recomputation (a simplified heuristic).
    Recompute before expiry with a probability that rises as the TTL runs out.
    Refreshes need no lock; a total miss still calls fetch_fn directly.
    """
    data = r.get(key)
    
    if data:
        cached = json.loads(data)
        remaining_ttl = r.ttl(key)
        
        # The probability of recomputing grows as remaining_ttl shrinks.
        # beta tunes how aggressive early refresh is; beta * 100 is the decay scale in seconds.
        if random.random() < math.exp(-remaining_ttl / (beta * 100)):
            # Recompute in the background without blocking this request
            threading.Thread(target=refresh_cache, args=(key, fetch_fn, ttl)).start()
        
        return cached['value']
    
    # Total miss → fetch and cache
    value = fetch_fn()
    r.setex(key, ttl, json.dumps({'value': value}))
    return value

def refresh_cache(key: str, fetch_fn, ttl: int):
    value = fetch_fn()
    r.setex(key, ttl, json.dumps({'value': value}))
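The exponential-decay check above is a simplification. The variant usually cited for probabilistic early recomputation (often called XFetch) also takes into account how long a recompute takes: refresh when `now - delta * beta * ln(rand) >= expires_at`, where `delta` is the measured recompute time. The sketch below shows only that decision rule; the function name is illustrative and the `rand` parameter exists only so the behaviour can be tested deterministically.

```python
import math
import random
from typing import Callable

def should_recompute_early(
    now: float,
    expires_at: float,
    delta: float,              # seconds the last recompute took
    beta: float = 1.0,         # > 1 refreshes earlier, < 1 refreshes later
    rand: Callable[[], float] = random.random,
) -> bool:
    # random.random() is in [0, 1); flip it to (0, 1] so log() never sees 0.
    u = 1.0 - rand()
    # log(u) <= 0, so the subtracted term pushes "now" forward by a random
    # amount that scales with how expensive the recompute is.
    return now - delta * beta * math.log(u) >= expires_at
```

Slow-to-compute values (large `delta`) start refreshing earlier, and each caller rolls its own dice, so refreshes are spread out instead of all landing at the expiry instant.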

Pattern 4: Read-through with background refresh

import json
import threading
import time
from dataclasses import dataclass, asdict
from typing import Any, Callable

@dataclass
class CacheEntry:
    value: Any
    expires_at: float
    stale_at: float  # Refresh starts here, before the entry expires

def get_stale_while_revalidate(
    key: str,
    fetch_fn: Callable,
    ttl: int,
    stale_ttl: int  # How much longer stale data may still be served
) -> Any:
    """
    Pattern: stale-while-revalidate
    Serve stale data immediately and trigger an async refresh.
    """
    cached = r.get(key)
    
    if cached:
        entry = CacheEntry(**json.loads(cached))
        now = time.time()
        
        if now < entry.stale_at:
            return entry.value  # Fresh → return
        
        if now < entry.expires_at:
            # Stale but not expired → serve it and refresh asynchronously
            threading.Thread(target=background_refresh, args=(key, fetch_fn, ttl, stale_ttl)).start()
            return entry.value
    
    # Expired or miss → fetch synchronously. sync_fetch_and_cache is not shown;
    # it would call fetch_fn and store the entry like background_refresh does.
    return sync_fetch_and_cache(key, fetch_fn, ttl, stale_ttl)

def background_refresh(key, fetch_fn, ttl, stale_ttl):
    value = fetch_fn()
    now = time.time()
    entry = CacheEntry(
        value=value,
        expires_at=now + ttl + stale_ttl,
        stale_at=now + ttl
    )
    r.setex(key, ttl + stale_ttl, json.dumps(asdict(entry)))
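The whole pattern rests on two timestamps per entry, so it can help to look at the state logic without Redis or threads. In this sketch `build_entry` and `classify` are hypothetical helpers that mirror how `stale_at` and `expires_at` are set and compared above.

```python
def build_entry(value, now: float, ttl: int, stale_ttl: int) -> dict:
    return {
        "value": value,
        "stale_at": now + ttl,                # fresh until here
        "expires_at": now + ttl + stale_ttl,  # servable (stale) until here
    }

def classify(entry: dict, now: float) -> str:
    if now < entry["stale_at"]:
        return "fresh"    # serve as is
    if now < entry["expires_at"]:
        return "stale"    # serve, and refresh in the background
    return "expired"      # fetch synchronously

entry = build_entry("v1", now=1000.0, ttl=60, stale_ttl=30)
states = [classify(entry, t) for t in (1030.0, 1070.0, 1090.0)]
```

With `ttl=60` and `stale_ttl=30`, the entry is fresh for the first minute, stale but servable for the next 30 seconds, and expired from second 90 on.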

Warming strategy by data type:

Data type	Strategy	Reason
Top N items (products, articles)	Eager pre-warm from analytics	Predictable, high value
User profile	Lazy + mutex	No way to know in advance who will log in
Config/feature flags	Eager, refresh on deploy	Must have, small
Search results	Lazy + PER	Too many combinations
Leaderboard	Eager from a DB snapshot	Needed from first request
Real-time data (price, stock)	No warming, short TTL	Staleness is a bigger problem than a cold cache

5. Sizing Calculation

Theory: Why calculate at all?

A sizing calculation in an interview is not about exact numbers. It demonstrates that you have a mental model of scale, know which questions to ask, and will not propose a 1TB cache for 10K users.

A 4-step framework:

Traffic: QPS, peak multiplier
Data size: bytes per item
Hit rate: % of requests served from cache
Memory: data size × number of items to cache

Deep dive: Worked examples

Scenario 1: E-commerce product cache

Assumptions (always state them before calculating):
- 10M products
- 1M DAU, peak 100K concurrent users
- 80/20 rule: 20% of products = 80% of traffic (hot items)
- Each product object: ~2KB (JSON)
- Target cache hit rate: 95%
- Peak traffic: 10× normal

Step 1: Number of items to cache

Hot items = 20% × 10M = 2M products
Caching the hot items → hit rate ~80-85%

To reach 95%, cache more:
Another 10% of products = 1M more items
Total: 3M items

Step 2: Memory

Memory = 3M items × 2KB/item
       = 6M KB
       = ~6GB raw data

Redis overhead: keys + pointers + metadata ≈ 30-40% overhead
Total Redis memory: 6GB × 1.35 ≈ 8GB

With replication (1 primary + 2 replicas): 8GB × 3 = 24GB total
→ 3 Redis nodes, 16GB RAM each

Step 3: Throughput

QPS = 1M DAU × 50 requests/day / 86400s ≈ 580 QPS average
Peak = 580 × 10 = 5,800 QPS

Redis throughput: ~100K ops/s on a single node (simple operations)
→ 1 Redis node is enough for throughput, but a cluster is still needed for reliability

Step 4: Network bandwidth

Cache responses = 5,800 QPS × 95% hit rate ≈ 5,500 QPS from cache
Data per response = 2KB
Bandwidth = 5,500 × 2KB = 11MB/s ≈ 88 Mbps
→ Not a concern on a 1Gbps NIC
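The arithmetic in the memory, throughput, and bandwidth steps fits in one small function, which makes it easy to change an assumption and see what moves. This is a back-of-the-envelope sketch: the function and parameter names are made up, 2KB is taken as 2,000 bytes, and the overhead factor is the midpoint of the 30-40% estimate.

```python
def size_product_cache(items: int, item_bytes: int, overhead: float, copies: int,
                       dau: int, requests_per_user_per_day: int,
                       peak_multiplier: float, hit_rate: float) -> dict:
    raw_gb = items * item_bytes / 1e9
    per_node_gb = raw_gb * (1 + overhead)     # data plus Redis bookkeeping
    avg_qps = dau * requests_per_user_per_day / 86_400
    peak_qps = avg_qps * peak_multiplier
    cache_qps = peak_qps * hit_rate           # requests answered by the cache
    return {
        "raw_gb": raw_gb,
        "per_node_gb": per_node_gb,
        "total_gb": per_node_gb * copies,     # primary + replicas
        "avg_qps": avg_qps,
        "peak_qps": peak_qps,
        "bandwidth_mbps": cache_qps * item_bytes * 8 / 1e6,
    }

estimate = size_product_cache(
    items=3_000_000, item_bytes=2_000, overhead=0.35, copies=3,
    dau=1_000_000, requests_per_user_per_day=50,
    peak_multiplier=10, hit_rate=0.95,
)
```

Unrounded, this gives about 8.1GB per node, 24.3GB across three copies, roughly 5,790 peak QPS, and about 88 Mbps, which matches the rounded figures above.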

Step 5: TTL strategy

Hot items (top 10K):     TTL = 5 minutes (balance freshness vs hit rate)
Normal items (top 3M):   TTL = 1 hour
Cold items:              Not cached

Scenario 2: Session cache

Assumptions:
- 5M DAU
- Each user has 1 active session
- Each session: 500 bytes
- Peak concurrent sessions: 10% of DAU = 500K sessions
- Session TTL: 24 hours (sliding)

Memory = 500K sessions × 500 bytes = 250MB
(Upper bound: with a 24-hour TTL, up to 5M sessions can be resident at once = 2.5GB)
→ Very small either way, not the bottleneck

Throughput:
- 5M DAU × 50 requests/day = 250M requests/day
- 250M / 86400 ≈ 2,900 QPS session validation
→ A single Redis node is enough

Conclusion: The session cache is not a scaling problem.
          The real concerns are session replication and failover.

Scenario 3: Rate limiter

Assumptions:
- 100K API keys
- Each key: limit 1000 req/min
- Sliding window: 2 Redis keys/user (curr + prev window)

Memory per user = 2 keys × (key overhead ~50 bytes + value ~8 bytes) = ~120 bytes
Total = 100K × 120 bytes = 12MB → Negligible

Throughput (worst case, every key at its limit):
- 100K keys × 1000 req/min / 60s = 1.67M req/s  ← This is the real problem
- Each request needs 3 Redis ops (GET prev, INCR curr, EXPIRE)
- Total: 5M Redis ops/s
→ At ~100K ops/s per node that is about 50 shards; folding the 3 ops into one Lua call brings it to about 17
→ Needs Redis Cluster, sharded by API key
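The node count is very sensitive to two inputs: how many Redis operations each request costs, and what one node can sustain. A short sketch makes that visible. The function name is illustrative, and the 100K ops/s per-node figure is the rough single-node number used in the product cache scenario, not a benchmark.

```python
import math

def rate_limiter_load(api_keys: int, limit_per_min: int,
                      ops_per_request: int, node_ops_per_sec: int):
    """Worst case: every key sends requests at its full limit."""
    requests_per_sec = api_keys * limit_per_min / 60
    ops_per_sec = api_keys * limit_per_min * ops_per_request / 60
    nodes = math.ceil(ops_per_sec / node_ops_per_sec)
    return requests_per_sec, ops_per_sec, nodes

# Three separate commands per request (GET, INCR, EXPIRE)
_, ops_naive, nodes_naive = rate_limiter_load(100_000, 1000, 3, 100_000)

# One Lua script call per request
_, ops_lua, nodes_lua = rate_limiter_load(100_000, 1000, 1, 100_000)
```

Real traffic rarely has every key at its limit at the same moment, so these are upper bounds; the point is that the rate limiter is throughput-bound, not memory-bound.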

Interview insight: Once the numbers are done, identify the real bottleneck. Product cache → memory. Rate limiter → throughput. Session → reliability/failover. Knowing what the limit is matters more than exact figures.

Quick reference table:

	Small	Medium	Large
Data unit	KB	MB	GB
Users	1K	1M	100M
QPS	100	10K	1M
Redis nodes	1	3	10-50
Strategy	Single	Replication	Cluster

6. Good Questions to Ask the Interviewer

Theory: Why do questions matter more than answers?

A good question shows you have worked on real production systems, where requirements are vague, trade-offs are real, and "it depends" is not evasion but the truth.

The interviewer usually has a specific environment in mind. Your questions help to:

Reveal that you know what you need to know
Narrow the design space down to the part that matters
Avoid building the wrong thing perfectly

Questions by scenario

Before starting any cache design:

"What's the current read/write ratio?"
→ Read-heavy → cache aggressively
→ Write-heavy → caching can be harmful (stale data everywhere)

"What's the acceptable staleness for this data?"
→ User profile: a few seconds is OK
→ Bank balance: zero tolerance
→ News feed: a few minutes is fine

"What's the scale we're designing for — today or 5 years?"
→ Over-engineering for 5 years out at a startup = waste
→ Under-designing for growth = pain

"Is this data user-specific or shared across users?"
→ Shared (product catalog): caching is very effective
→ User-specific (feed): cache per user, more complex

Cache invalidation:

"Who owns this data? Which service writes it?"
→ If several services write, invalidation gets much harder

"Do we need strong consistency or eventual consistency?"
→ Shopping cart: needs consistency
→ Like count: eventual is OK, nobody cares if it lags by a few seconds

"What happens if cache and DB are out of sync?"
→ Revenue impact? → Strong consistency
→ UX degradation? → Eventual OK

"How often does this data change?"
→ Rarely changes (config) → cache long and aggressively
→ Changes often (real-time price) → cache briefly or not at all

Availability and failure:

"What's the expected behavior when Redis is unavailable?"
→ Fail open (serve from DB, slower but works)?
→ Fail closed (return error, protect DB)?
→ Depends on: DB capacity, business criticality

"What's our RTO/RPO for cache layer?"
→ Redis Sentinel: failover time depends on down-after-milliseconds, often tens of seconds
→ Redis Cluster: failover time depends on cluster-node-timeout (default 15s), so seconds rather than sub-second
→ Cost implication of each option

"Do we have budget for standby replicas?"
→ Do not ask about money directly; ask about reliability expectations

Operations (senior-level signals):

"How do we monitor cache effectiveness?"
→ Hit rate, eviction rate, memory usage, latency percentiles
→ If the interviewer has not thought of it → you are adding value
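Hit rate is the first number to put on a dashboard. Redis exposes cumulative `keyspace_hits` and `keyspace_misses` counters in the stats section of `INFO`, so a sketch of the calculation is a few lines. The function name is illustrative, and the input is assumed to be a dict like the one redis-py returns from `r.info("stats")`.

```python
from typing import Optional

def hit_rate(stats: dict) -> Optional[float]:
    """Fraction of key lookups served from cache, or None with no traffic yet."""
    hits = stats.get("keyspace_hits", 0)
    misses = stats.get("keyspace_misses", 0)
    total = hits + misses
    if total == 0:
        return None
    return hits / total

sample = {"keyspace_hits": 950, "keyspace_misses": 50, "evicted_keys": 12}
rate = hit_rate(sample)  # 0.95
```

Because the counters are cumulative since server start, a dashboard would normally compute the rate from the difference between two samples rather than from the raw totals.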

"What's our cache warming strategy after a deployment?"
→ Skip it and you get a thundering herd after every deploy
→ Shows you think about day-2 operations

"How do we handle cache poisoning?"
→ Validate before caching
→ TTL as a safety net
→ Circuit breaker if the hit rate drops suddenly

Questions that separate senior from mid-level:

Mid-level asks	Senior asks
"Should we use Redis or Memcached?"	"What's the write pattern, and do we need atomic operations or pub/sub?"
"How long should the TTL be?"	"What's the business SLA for data freshness, and what is the impact of stale data?"
"Where do we cache?"	"Where does the cache sit in the request path, and who is responsible for invalidation?"
"How much memory do we need?"	"Is the system's bottleneck memory, throughput, or network, and which one does the cache affect most?"
"Can the cache go down?"	"When the cache layer fails, how does the traffic pattern change, and can our DB absorb it?"

Questions not to ask

❌ "What should we use, Redis or Elasticsearch?"
   → Category error, Elasticsearch is not a cache

❌ "Does caching help performance?"
   → Too obvious, adds no value

❌ "How many servers are enough?"
   → Too early, before the requirements are known

❌ "Should we use a cache?"
   → If the question is about caching, the answer is obviously yes

A framework for asking questions

Whenever you are unsure what to ask, walk through this framework:

1. WHAT: "What is the critical path we need to optimize?"
2. WHO: "Who reads and who writes this data?"
3. WHEN: "When does the data change, and when must the change be reflected?"
4. WHAT IF: "What happens on a cache miss? Stale cache? Cache down?"
5. HOW MUCH: "Concrete scale: users, QPS, data size?"

Summary

Caching is not magic. It is a trade-off machine: it trades consistency for performance, operational complexity for scale, and memory for latency.

A good interview is not about knowing every answer. It is about asking the right questions, identifying the right trade-offs, and communicating clearly why you chose one approach over another.

Checklist for every cache design:

What to cache (and more importantly: what NOT to cache)
Where to cache (client, CDN, application, database)
How long to cache (TTL strategy, sliding vs fixed)
How to invalidate (event-driven, TTL-only, manual)
What happens on miss (fallback, thundering herd protection)
What happens on failure (fail open vs fail closed)
How to monitor (hit rate, eviction, latency)
How to warm (pre-warm, lazy, hybrid)

A well-implemented cache is invisible: the system is fast and stable, and nobody asks why. A badly implemented cache is most visible at 3 a.m.