Chapter 1

Foundations

Requirements · Scale · Latency · Consistency · Trade-offs

Requirements Framing

Always open with requirements. Jumping straight to architecture without them is the #1 failure signal at Staff level.

  • Functional requirements: what the system does (send message, stream video, process payment)
  • Non-functional requirements: how well it does it (latency, availability, consistency, security)
  • DAU (Daily Active Users) / MAU (Monthly Active Users) estimation: drives architecture decisions (50M DAU calls for a very different design than 500K DAU)
  • Latency targets: define p50 (50th percentile) and p99 (99th percentile) explicitly before designing
  • Consistency model: strong consistency (banking) vs eventual consistency (social feeds)
  • Reliability / SLA (Service Level Agreement): 99.9% uptime ≈ 8.76 hrs downtime/year; 99.99% ≈ 52.6 mins/year
  • Security & privacy: PII (Personally Identifiable Information) handling, compliance (GDPR (General Data Protection Regulation), HIPAA (Health Insurance Portability and Accountability Act)), data residency
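
The SLA downtime figures follow from simple arithmetic. A minimal Kotlin sketch, assuming a 365-day year (525,600 minutes) and a hypothetical helper named `downtimeMinutesPerYear`:

```kotlin
// Assumes a 365-day year: 365 * 24 * 60 = 525,600 minutes.
val MINUTES_PER_YEAR = 365 * 24 * 60

/** Allowed downtime per year, in minutes, for an availability SLA given as a percentage (e.g. 99.9). */
fun downtimeMinutesPerYear(availabilityPercent: Double): Double {
    require(availabilityPercent in 0.0..100.0) { "availability must be between 0 and 100" }
    val unavailableFraction = (100.0 - availabilityPercent) / 100.0
    return unavailableFraction * MINUTES_PER_YEAR
}
```

`downtimeMinutesPerYear(99.9)` returns about 525.6 minutes (≈ 8.76 hours) and `downtimeMinutesPerYear(99.99)` about 52.6 minutes: each extra nine cuts the budget by 10x.
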
Senior: Asks about functional requirements; skips non-functional or needs prompting
Staff: Opens with both functional and non-functional; derives a scale estimate proactively; names the consistency model
Principal: Frames requirements in terms of business constraints and org-wide trade-offs; challenges assumptions in the prompt

Interview tip: Before I design anything, let me clarify requirements. I'll separate functional from non-functional, then nail down scale so my architecture choices are proportionate.

Latency — p50 (50th percentile), p95 (95th percentile), p99 (99th percentile), p999 (99.9th percentile)

A percentile latency is the time within which a given percentage of requests complete. Think of 100 users making the same request and sort their wait times: p99 is the 99th-fastest time, so only 1 user in 100 waited longer.

Metric | Meaning | Who cares
p50 | Median: half of requests are faster, half are slower. Typical user experience. | Product metrics, dashboards
p95 | 95% of requests complete within this time. Catches most slow outliers. | API (Application Programming Interface) SLAs
p99 | 99% of requests complete within this time; 1 in 100 is slower. Reveals tail latency. | Infrastructure, Staff interviews
p999 | 99.9% of requests complete within this time; 1 in 1,000 is slower. Used for payments, trading, safety systems. | Financial, real-time systems
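
To make the definitions concrete, here is a minimal sketch of the nearest-rank percentile method over raw samples. The function name `percentile` is hypothetical, and nearest-rank is one of several percentile definitions; monitoring systems usually aggregate into histograms rather than sorting raw samples.

```kotlin
import kotlin.math.ceil

/**
 * Nearest-rank percentile: the smallest sample such that at least p% of all samples are <= it.
 * Sorting raw samples is fine for a sketch; production monitoring usually aggregates into histograms.
 */
fun percentile(samplesMs: List<Double>, p: Double): Double {
    require(samplesMs.isNotEmpty()) { "need at least one sample" }
    require(p > 0.0 && p <= 100.0) { "p must be in (0, 100]" }
    val sorted = samplesMs.sorted()
    // 1-based nearest rank; multiplying p by n before dividing keeps values like 99.0 * 100 / 100.0 exact.
    val rank = ceil(p * sorted.size / 100.0).toInt()
    return sorted[rank - 1]
}
```

With 98 requests at 20 ms and 2 at 900 ms, p50 is 20 ms but p99 is 900 ms: the median hides the tail entirely.
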
Senior: Mentions p50 and p99; knows they differ
Staff: Sets an explicit p99 SLA target before designing; ties tail latency to architectural decisions like caching
Principal: Defines the latency budget across the full stack (client, network, server) and allocates it per layer

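Interview tip: At 50M DAU, the p99 tail is not an edge case. If every user made a single request, 500,000 users/day would land in the slowest 1%; because each user makes many requests, far more than that hit the tail at least once. Optimizing only p50 can mask severe tail latency caused by GC (Garbage Collection) pauses, DB (Database) lock contention, or cold cache misses.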
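
The tail compounds per user. A minimal sketch, assuming each request independently has a 1% chance of being slower than p99 (real slow requests are often correlated, so treat this as a rough model); the helper names are hypothetical:

```kotlin
import kotlin.math.pow

/** Probability that a user sees at least one request slower than the given percentile (as a fraction). */
fun probabilityOfTailHit(requestsPerUser: Int, percentile: Double = 0.99): Double =
    1.0 - percentile.pow(requestsPerUser) // complement of "every request was fast"

/** Expected number of users per day who hit the tail at least once. */
fun usersHittingTail(dau: Long, requestsPerUser: Int, percentile: Double = 0.99): Double =
    dau * probabilityOfTailHit(requestsPerUser, percentile)
```

With one request per user, `usersHittingTail(50_000_000, 1)` is about 500,000. With 60 requests/day (3 sessions × 20 requests, as in the scale-estimation example in this chapter), `probabilityOfTailHit(60)` is about 0.45: roughly 45% of users, not 1%, see at least one p99-slow request each day.
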

Consistency Models

The consistency model you pick for each kind of data constrains your storage, replication, and caching choices, so name it explicitly before designing.

Model | Guarantee | Use case
Strong consistency | Every read sees the most recent write. Slower, because replicas must coordinate. | Banking, payments, inventory
Eventual consistency | All replicas converge to the same value eventually. Faster. | Social feeds, likes, read receipts
Causal consistency | Causally related operations are seen by everyone in causal order; concurrent ones may appear in any order. | Chat ordering, collaborative editing
Read-your-writes | A user always sees their own writes immediately. | Profile updates, settings
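
Read-your-writes can be provided on the client even when the backend is only eventually consistent. A minimal sketch, assuming a hypothetical `ProfileStore` that overlays the user's own unconfirmed writes on the last server response (synchronous and single-threaded for brevity):

```kotlin
// Hypothetical types for illustration only.
data class Profile(val userId: String, val displayName: String)

class ProfileStore {
    private val serverSnapshot = mutableMapOf<String, Profile>() // last (possibly stale) server read
    private val pendingWrites = mutableMapOf<String, Profile>()  // this user's writes not yet confirmed

    fun onServerRead(profile: Profile) {
        serverSnapshot[profile.userId] = profile
    }

    fun write(profile: Profile) {
        pendingWrites[profile.userId] = profile
    }

    /** Server acknowledged the write: promote it into the snapshot and drop the overlay. */
    fun onWriteConfirmed(userId: String) {
        pendingWrites.remove(userId)?.let { serverSnapshot[userId] = it }
    }

    /** Pending writes win over the server snapshot, so the user always sees what they wrote. */
    fun read(userId: String): Profile? = pendingWrites[userId] ?: serverSnapshot[userId]
}
```

A stale replica response arriving after the edit no longer hides the user's change. The sketch does not guard against a stale read arriving after confirmation; a real implementation would compare versions before overwriting the snapshot.
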

Scale Estimation — DAU (Daily Active Users) to RPS (Requests Per Second)

Every Staff interview expects you to derive RPS (Requests Per Second) from DAU (Daily Active Users) before drawing any architecture: RPS ≈ DAU × sessions/day × requests/session ÷ 86,400 seconds/day. Apply the formula once and move on; do not spend more than 2 minutes on it.

  • Peak multiplier — assume 3x average for peak traffic; 10x for viral events
  • Read:write ratio — social feed ~100:1; chat ~1:1; payments ~10:1 read-heavy
  • Mobile-specific — factor in retry storms on reconnect; 10% of requests may be retries
Example: Social Feed App
━━━━━━━━━━━━━━━━━━━━━━━━━━━
50M DAU (Daily Active Users)
x  3 sessions/day
x 20 requests/session
= 3,000,000,000 requests/day
/ 86,400 seconds/day
≈ 35,000 RPS (Requests Per Second) (avg)   →  ~100,000 RPS peak (3x)
━━━━━━━━━━━━━━━━━━━━━━━━━━━
Storage: 50M users x 500 bytes/event x 100 events/day
       = 2.5 TB (Terabyte)/day  →  plan for tiered cold storage
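
The same arithmetic as a minimal Kotlin sketch; the function names are hypothetical, and every input is an assumption you would state out loud:

```kotlin
// Back-of-envelope estimate; inputs mirror the social-feed example's assumptions.
val SECONDS_PER_DAY = 86_400.0

/** Average requests per second from daily active users and per-user request volume. */
fun averageRps(dau: Long, sessionsPerDay: Int, requestsPerSession: Int): Double =
    dau.toDouble() * sessionsPerDay * requestsPerSession / SECONDS_PER_DAY

/** Peak RPS: 3x average for normal peaks; pass ~10.0 to plan for viral events. */
fun peakRps(avgRps: Double, peakMultiplier: Double = 3.0): Double = avgRps * peakMultiplier

/** Raw bytes written per day, before replication, compression, or indexes. */
fun storageBytesPerDay(users: Long, bytesPerEvent: Int, eventsPerUserPerDay: Int): Double =
    users.toDouble() * bytesPerEvent * eventsPerUserPerDay
```

`averageRps(50_000_000, 3, 20)` gives about 34,722 (round to 35k), `peakRps` of that is about 104,167, and `storageBytesPerDay(50_000_000, 500, 100)` gives 2.5e12 bytes, i.e. 2.5 TB/day.
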

Interview tip: State your assumptions out loud: 50M DAU (Daily Active Users), 3 sessions/day, 20 req/session. Round to 35k RPS (Requests Per Second), then say: "This means I need horizontal scaling and a CDN (Content Delivery Network) layer." Then move on; don't get stuck on the maths.

Canonical Mobile Architecture Diagram

Draw this in every app-design interview. Adjust the layers to the question; don't draw what you don't need.

┌─────────────────────────────────────────────┐
│  UI (User Interface) Layer  (Compose / View)│
│  Observes StateFlow. Emits user events only.│
└──────────────────────┬──────────────────────┘
                       │ uiState / events
┌──────────────────────▼──────────────────────┐
│  ViewModel  (viewModelScope)                │
│  Holds UiState. Delegates to UseCases.      │
└──────────────────────┬──────────────────────┘
                       │
┌──────────────────────▼──────────────────────┐
│  Domain Layer  (UseCases)                   │
│  Pure Kotlin. Zero Android imports.         │
└──────────────────────┬──────────────────────┘
                       │
┌──────────────────────▼──────────────────────┐
│  Repository  (decides: local or network?)   │
│  Cache-aside: L1 memory → L2 Room → network │
└───────────┬─────────────────────┬───────────┘
            │                     │
┌───────────▼───────────┐ ┌───────▼───────────┐
│  Room DB (L2 cache)   │ │  Retrofit / OkHttp│
│  Outbox table         │ │  (REST / SSE / WS)│
└───────────────────────┘ └───────────────────┘
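
The Repository's cache-aside read path can be sketched as follows. `LocalStore` and `RemoteApi` are hypothetical stand-ins for a Room DAO and a Retrofit service; real ones would expose suspend functions, kept synchronous here so the sketch runs without coroutines.

```kotlin
// Hypothetical types for illustration; not Room or Retrofit APIs.
data class Article(val id: String, val title: String)

interface LocalStore {           // L2: on-disk cache
    fun get(id: String): Article?
    fun put(article: Article)
}

interface RemoteApi {            // network: source of truth
    fun fetch(id: String): Article
}

/** Cache-aside read path: L1 (memory) -> L2 (disk) -> network, filling each cache on the way back. */
class ArticleRepository(
    private val local: LocalStore,
    private val remote: RemoteApi
) {
    // L1: in-memory map for the process lifetime. Unbounded here; a real app would bound it (e.g. an LRU).
    private val memory = HashMap<String, Article>()

    fun get(id: String): Article {
        memory[id]?.let { return it }                          // L1 hit
        local.get(id)?.let { memory[id] = it; return it }      // L2 hit: promote into L1
        val fresh = remote.fetch(id)                           // miss: go to the network
        local.put(fresh)                                       // fill L2
        memory[id] = fresh                                     // fill L1
        return fresh
    }
}
```

Only a miss in both caches touches the network; an L2 hit is promoted into memory so the next read skips disk I/O.
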

Interview tip: State the dependency rule: arrows point inward only, UI (User Interface) → ViewModel → UseCase → Repository → Data. No layer imports the layer above it, and the domain layer has zero Android imports.
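
One common way to honour this rule is dependency inversion: the domain layer declares the repository interface it needs and the data layer implements it, so data depends on domain and never the reverse. A minimal sketch with hypothetical `Message`, `MessageRepository`, and `GetRecentMessagesUseCase` names; the filtering rule is invented for illustration.

```kotlin
// ---- domain layer: pure Kotlin, no Android imports; it owns the interface it depends on ----
data class Message(val id: Long, val text: String)

interface MessageRepository {
    fun recent(limit: Int): List<Message>
}

class GetRecentMessagesUseCase(private val repository: MessageRepository) {
    /** Hypothetical business rule kept out of the ViewModel: hide blank messages, newest first. */
    operator fun invoke(limit: Int): List<Message> =
        repository.recent(limit)
            .filter { it.text.isNotBlank() }
            .sortedByDescending { it.id }
}

// ---- data layer: depends inward on the domain interface ----
class InMemoryMessageRepository(private val messages: List<Message>) : MessageRepository {
    override fun recent(limit: Int): List<Message> = messages.takeLast(limit)
}
```

A ViewModel would receive `GetRecentMessagesUseCase` through its constructor and call it like a function; swapping `InMemoryMessageRepository` for a Room-backed implementation changes nothing in the domain layer.
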

Trade-off Vocabulary

Staff engineers articulate trade-offs with precision. Interviewers specifically listen for this vocabulary — answering 'it depends' without naming the axes and the deciding constraint is the most common Senior ceiling in design interviews.

  • Name both sides explicitly — say 'the trade-off is X vs. Y' before stating your choice
  • State the constraint that tips the balance — 'given our 50M DAU and battery sensitivity, I choose Y'
  • Acknowledge the cost — 'the downside is Z, which I'd mitigate by...'
  • Never say 'it depends' without immediately naming what it depends on and how each case resolves
Trade-off | Axis A | Axis B | Classic Android example
Latency vs. Throughput | How fast one request completes | How many requests complete per second | p99 cache hit (latency priority) vs. batch DB write (throughput priority)
Consistency vs. Availability | Every read sees the most recent write | System stays up during a network partition | Payment status (strong required) vs. read receipts (eventual OK)
Battery vs. Freshness | Fewer wakelocks and less background work | Data is more up-to-date | WorkManager periodic sync vs. FCM (Firebase Cloud Messaging) push-on-change
Memory vs. Speed | Smaller in-memory footprint | Faster access without disk I/O | LruCache bounded size vs. unbounded in-memory map
Complexity vs. Performance | Simpler code, easier to maintain | More performant under load | Room + Paging vs. raw SQLite cursor with manual windowing
Offline-first vs. Simplicity | Works without a network connection | Less sync logic to write and maintain | CRDT (Conflict-free Replicated Data Type) merge vs. last-write-wins
Mobile: Real-time vs. Battery | Aggressive transport for live UX | Background-friendly transport | WebSocket always-on vs. FCM + HTTP-on-demand
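
The offline-first row is easiest to see with a counter edited on two devices while offline. A minimal sketch, assuming hypothetical `LwwRegister` and `GCounter` types; a grow-only counter (G-Counter) is one of the simplest CRDTs and stands in here for the richer types real apps need.

```kotlin
/** Last-write-wins register: keeps whichever value has the later timestamp; the other write is discarded. */
data class LwwRegister(val value: Int, val timestampMs: Long) {
    // Ties keep the local value; real systems add a deterministic tie-breaker such as a device id.
    fun merge(other: LwwRegister): LwwRegister = if (other.timestampMs > timestampMs) other else this
}

/** Grow-only counter CRDT: each device increments only its own slot; merge takes the per-device max. */
data class GCounter(val counts: Map<String, Int> = emptyMap()) {
    fun increment(deviceId: String, by: Int = 1): GCounter {
        require(by >= 0) { "a G-Counter can only grow" }
        return GCounter(counts + (deviceId to ((counts[deviceId] ?: 0) + by)))
    }

    fun merge(other: GCounter): GCounter =
        GCounter((counts.keys + other.counts.keys).associateWith { maxOf(counts[it] ?: 0, other.counts[it] ?: 0) })

    val value: Int get() = counts.values.sum()
}
```

If a phone and a tablet each add 1 offline, merging their `LwwRegister`s yields 1 (one increment is silently lost, and the outcome depends on device clocks), while merging their `GCounter`s yields 2 regardless of merge order. That correctness is what the extra sync logic buys.
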
Senior: Makes a choice and defends it when challenged
Staff: Names the trade-off explicitly before deciding; states the constraint that tips the balance; acknowledges the cost of their choice
Principal: Frames trade-offs in terms of org constraints (team size, platform maturity, compliance obligations) and designs for the org's actual situation, not a generic system

Interview tip: The Staff-level template: 'The trade-off here is [A] vs [B]. Given [constraint], I'd choose [X], accepting [cost], which I'd mitigate by [approach].' Use this structure every time you make an architectural decision. Senior engineers make a choice. Staff engineers name the trade-off, the constraint, the cost, and the mitigation.
