Chapter 1
Foundations
Requirements · Scale · Latency · Consistency · Tradeoffs
Requirements Framing
Always open here. Jumping to architecture without this is the #1 failure signal at Staff level.
- Functional requirements — what the system does (send message, stream video, process payment)
- Non-functional requirements — how well it does it (latency, availability, consistency, security)
- DAU (Daily Active Users) / MAU (Monthly Active Users) estimation — drives architecture decisions (50M DAU = very different from 500K DAU)
- Latency targets — define p50 (50th percentile) and p99 (99th percentile) explicitly before designing
- Consistency model — strong consistency (banking) vs eventual consistency (social feeds)
- Reliability / SLA (Service Level Agreement) — 99.9% uptime ≈ 8.76 hrs downtime/year; 99.99% ≈ 52.6 mins/year
- Security & privacy — PII (Personally Identifiable Information) handling, compliance (GDPR (General Data Protection Regulation), HIPAA (Health Insurance Portability and Accountability Act)), data residency
Interview tip: Before I design anything, let me clarify requirements. I'll separate functional from non-functional, then nail down scale so my architecture choices are proportionate.
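The SLA numbers above are worth being able to derive on the spot. A minimal sketch of the downtime-budget arithmetic (plain Kotlin, no Android dependencies):

```kotlin
// Downtime budget implied by an availability SLA, assuming a 365-day year.
fun downtimeMinutesPerYear(availability: Double): Double =
    (1.0 - availability) * 365 * 24 * 60

fun main() {
    println(downtimeMinutesPerYear(0.999))   // ≈ 525.6 min ≈ 8.76 hrs
    println(downtimeMinutesPerYear(0.9999))  // ≈ 52.6 min
}
```

Each extra "nine" cuts the budget by 10x — which is why 99.99% usually forces multi-region redundancy, not just better code.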
Latency — p50 (50th percentile), p95 (95th percentile), p99 (99th percentile), p999 (99.9th percentile)
Percentile latency measures how fast requests complete for a given % of users. Think of 100 users making the same request, sorted fastest to slowest — p99 is roughly the wait time of the 99th-fastest user; only 1 of the 100 waited longer.
| Metric | Meaning | Who cares |
|---|---|---|
| p50 | Median — half of requests are faster, half are slower. Typical user experience. | Product metrics, dashboards |
| p95 | 95% of requests complete within this time. Catches most slow outliers. | API (Application Programming Interface) SLA agreements |
| p99 | 99% complete within this time. 1 in 100 users is slower. Reveals tail latency. | Infrastructure, Staff interviews |
| p999 | 99.9% complete within this time. 1 in 1000. Used for payments, trading, safety systems. | Financial, real-time systems |
Interview tip: At 50M DAU, the 1% of requests slower than p99 means ~500,000 users/day having a bad experience. Optimizing only p50 can mask severe tail latency caused by GC (Garbage Collection) pauses, DB (Database) lock contention, or cold cache misses.
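To make the table concrete, here is a minimal nearest-rank percentile sketch. Note this is one of several common percentile definitions; real monitoring systems often interpolate or use histogram buckets instead:

```kotlin
// Nearest-rank percentile over observed request latencies (ms):
// p99 of 100 sorted samples is the 99th value.
fun percentile(latenciesMs: List<Long>, p: Double): Long {
    require(latenciesMs.isNotEmpty() && p in 0.0..100.0)
    val sorted = latenciesMs.sorted()
    val rank = Math.ceil(p / 100.0 * sorted.size).toInt().coerceAtLeast(1)
    return sorted[rank - 1]
}

fun main() {
    val samples = (1L..100L).toList()   // pretend latencies: 1 ms .. 100 ms
    println(percentile(samples, 50.0))  // 50 — the median experience
    println(percentile(samples, 99.0))  // 99 — the tail
}
```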
Consistency Models
Understanding consistency models is crucial for making the right architectural decisions.
| Model | Guarantee | Use Case |
|---|---|---|
| Strong consistency | Every read sees the most recent write. Slower. | Banking, payments, inventory |
| Eventual consistency | All nodes converge to same value eventually. Faster. | Social feeds, likes, read receipts |
| Causal consistency | Operations that are causally related are seen in order. | Chat ordering, collaborative editing |
| Read-your-writes | A user always sees their own writes immediately. | Profile updates, settings |
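Read-your-writes is the model you most often implement by hand on mobile. A minimal sketch — the `ProfileStore` name and the pending-writes overlay are illustrative, not a real library API:

```kotlin
// Read-your-writes on the client: overlay the user's own pending writes on
// top of possibly-stale server reads, so the user always sees their writes.
class ProfileStore(private val fetchRemote: (String) -> String?) {
    private val pendingWrites = mutableMapOf<String, String>()

    fun write(key: String, value: String) {
        pendingWrites[key] = value  // applied locally first
        // ...enqueue for upload; remove from pendingWrites once the server acks...
    }

    // The local overlay wins over whatever the server returns.
    fun read(key: String): String? = pendingWrites[key] ?: fetchRemote(key)
}
```

The rest of the system can stay eventually consistent; only the author's own view needs this guarantee.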
Scale Estimation — DAU (Daily Active Users) to RPS (Requests Per Second)
Every Staff interview expects you to derive RPS (Requests Per Second) from DAU (Daily Active Users) before drawing any architecture. Use this formula once and move on — do not spend more than 2 minutes on it.
- Peak multiplier — assume 3x average for peak traffic; 10x for viral events
- Read:write ratio — social feed ~100:1; chat ~1:1; payments ~10:1 read-heavy
- Mobile-specific — factor in retry storms on reconnect; 10% of requests may be retries
Example: Social Feed App
━━━━━━━━━━━━━━━━━━━━━━━━━━━
50M DAU (Daily Active Users)
x 3 sessions/day
x 20 requests/session
= 3,000,000,000 requests/day
/ 86,400 seconds/day
≈ 35,000 RPS (Requests Per Second) (avg) → ~100,000 RPS peak (3x)
━━━━━━━━━━━━━━━━━━━━━━━━━━━
Storage: 50M users x 500 bytes/event x 100 events/day
= 2.5 TB (Terabyte)/day → plan for tiered cold storage
Interview tip: State your assumptions out loud: 50M DAU (Daily Active Users), 3 sessions/day, 20 req/session. Round to 35k RPS (Requests Per Second). Say: this means I need horizontal scaling and a CDN (Content Delivery Network) layer. Then move on — don't get stuck on the maths.
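The same derivation as a sketch, so the constants are easy to swap while practising (plain Kotlin; the numbers mirror the worked example above):

```kotlin
// Back-of-envelope DAU → RPS: requests/day spread over 86,400 seconds.
fun avgRps(dau: Long, sessionsPerDay: Int, requestsPerSession: Int): Double =
    dau.toDouble() * sessionsPerDay * requestsPerSession / 86_400

fun main() {
    val avg = avgRps(50_000_000, 3, 20)
    println(avg)        // ≈ 34,722 — round to 35k RPS
    println(avg * 3)    // ≈ 104,167 — ~100k RPS at 3x peak
}
```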
Canonical Mobile Architecture Diagram
Draw this in every App-design interview. Adjust layers to the question — don't draw what you don't need.
┌─────────────────────────────────────────────┐
│ UI (User Interface) Layer (Compose / View)│
│ Observes StateFlow. Emits user events only.│
└──────────────────────┬──────────────────────┘
│ uiState / events
┌──────────────────────▼──────────────────────┐
│ ViewModel (viewModelScope) │
│ Holds UiState. Delegates to UseCases. │
└──────────────────────┬──────────────────────┘
│
┌──────────────────────▼──────────────────────┐
│ Domain Layer (UseCases) │
│ Pure Kotlin. Zero Android imports. │
└──────────────────────┬──────────────────────┘
│
┌──────────────────────▼──────────────────────┐
│ Repository (decides: local or network?) │
│ Cache-aside: L1 (Level 1) → L2 (Level 2) → Network │
└───────────┬─────────────────────┬───────────┘
│ │
┌───────────▼───────────┐ ┌───────▼───────────┐
│ Room DB (L2 cache) │ │ Retrofit / OkHttp│
│ Outbox table │ │ (REST / SSE / WS)│
└───────────────────────┘ └───────────────────┘
Interview tip: Dependency rule — arrows point inward only: UI (User Interface) → ViewModel → UseCase → Repository → Data. No layer imports the layer above it. Domain has zero Android imports.
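The Repository box can be sketched as a cache-aside lookup. This is illustrative Kotlin: `LocalStore` and `RemoteApi` are hypothetical stand-ins for a Room DAO and a Retrofit service, and real code would be suspend/Flow-based:

```kotlin
// Cache-aside matching the diagram: L1 in-memory map → L2 local store → network.
interface LocalStore {
    fun load(id: String): String?
    fun save(id: String, value: String)
}
interface RemoteApi { fun fetch(id: String): String }

class FeedRepository(private val db: LocalStore, private val api: RemoteApi) {
    private val memCache = mutableMapOf<String, String>()  // L1

    fun get(id: String): String {
        memCache[id]?.let { return it }                    // L1 hit
        db.load(id)?.let { memCache[id] = it; return it }  // L2 hit → warm L1
        val fresh = api.fetch(id)                          // miss → network
        db.save(id, fresh)                                 // write back to L2
        memCache[id] = fresh                               // and to L1
        return fresh
    }
}
```

The key interview point: the caller never knows which layer answered — the Repository owns that decision.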
Trade-off Vocabulary
Staff engineers articulate trade-offs with precision. Interviewers specifically listen for this vocabulary — answering 'it depends' without naming the axes and the deciding constraint is the most common Senior ceiling in design interviews.
- Name both sides explicitly — say 'the trade-off is X vs. Y' before stating your choice
- State the constraint that tips the balance — 'given our 50M DAU and battery sensitivity, I choose Y'
- Acknowledge the cost — 'the downside is Z, which I'd mitigate by...'
- Never say 'it depends' without immediately naming what it depends on and how each case resolves
| Trade-off | Axis A | Axis B | Classic Android Example |
|---|---|---|---|
| Latency vs. Throughput | How fast one request completes | How many requests complete per second | p99 cache hit (latency priority) vs. batch DB write (throughput priority) |
| Consistency vs. Availability | Every read sees the most recent write | System stays up during a network partition | Payment status (strong required) vs. read receipts (eventual OK) |
| Battery vs. Freshness | Fewer wakelocks and background work | Data is more up-to-date | WorkManager periodic sync vs. FCM push-on-change |
| Memory vs. Speed | Smaller in-memory footprint | Faster access without disk I/O | LruCache bounded size vs. unbounded in-memory map |
| Complexity vs. Performance | Simpler code, easier to maintain | More performant under load | Room + Paging vs. raw SQLite cursor with manual windowing |
| Offline-first vs. Simplicity | Works without a network connection | Less sync logic to write and maintain | CRDT merge vs. last-write-wins |
| Mobile: Real-time vs. Battery | Aggressive transport for live UX | Background-friendly transport | WebSocket always-on vs. FCM + HTTP-on-demand |
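The Memory vs. Speed row can be made concrete with a size-bounded LRU. This sketch uses `java.util.LinkedHashMap`'s access-order mode rather than Android's `LruCache`, purely to stay self-contained:

```kotlin
// Bounded LRU: access-order LinkedHashMap evicts the least-recently-used
// entry once size exceeds maxEntries — bounded memory, O(1) access.
class BoundedLru<K, V>(private val maxEntries: Int) :
    LinkedHashMap<K, V>(16, 0.75f, /* accessOrder = */ true) {
    override fun removeEldestEntry(eldest: MutableMap.MutableEntry<K, V>?) =
        size > maxEntries
}

fun main() {
    val cache = BoundedLru<String, Int>(2)
    cache["a"] = 1; cache["b"] = 2
    cache["a"]       // touch "a", so "b" is now least recently used
    cache["c"] = 3   // evicts "b"
    println(cache.keys)  // [a, c]
}
```

The unbounded `mutableMapOf` alternative is faster to write and never evicts — which is exactly the trade-off: on mobile, unbounded caches become OOM (Out Of Memory) crashes.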
Interview tip: The Staff-level template: 'The trade-off here is [A] vs [B]. Given [constraint], I'd choose [X], accepting [cost], which I'd mitigate by [approach].' Use this structure every time you make an architectural decision. Senior engineers make a choice. Staff engineers name the trade-off, the constraint, the cost, and the mitigation.