Chapter 5

Data & Persistence

Caching L1/L2/L3 · Room · Paging 3 · Offline-First · Sync · Message Reliability

Caching: L1, L2, L3 Architecture

Caching is multi-layered. Always describe all three layers. Each trades speed for capacity and durability.

  1. Check L1 (memory). On a hit, return immediately.
  2. On a miss, check L2 (disk/Room). On a hit, populate L1 and return.
  3. On a miss, fetch from the network, populate L2 and then L1, and return.
  4. On a write, update L1 and L2 immediately and sync to the server asynchronously.

Layer | What It Is | Speed | Survives Process Death? | Android Example
L1 (in-memory) | Data in RAM inside the running process | ~ns | No | LruCache (least-recently-used cache), HashMap, StateFlow value
L2 (disk cache) | Data written to local storage | ~ms | Yes | Room database, OkHttp disk cache, DataStore
L3 (network/CDN) | Data from a remote server or a CDN (Content Delivery Network) edge | 100 ms+ | Yes (remote) | Retrofit API calls, CDN-served images
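
The lookup order above can be sketched in plain Kotlin. This is a minimal illustration rather than a production cache: LayeredCache, DiskStore, RemoteSource and MapDiskStore are hypothetical names, L1 is a plain HashMap where a real app would more likely use LruCache, the disk layer is a map standing in for Room, and nothing here is thread-safe.

```kotlin
// Hypothetical types for illustration; none of these come from a library.
interface DiskStore<K : Any, V : Any> {            // L2 stand-in (Room, DataStore)
    fun get(key: K): V?
    fun put(key: K, value: V)
}

fun interface RemoteSource<K : Any, V : Any> {     // L3 stand-in (e.g. a Retrofit call)
    fun fetch(key: K): V
}

// In-memory fake so the sketch runs without Android.
class MapDiskStore<K : Any, V : Any> : DiskStore<K, V> {
    private val rows = HashMap<K, V>()
    override fun get(key: K): V? = rows[key]
    override fun put(key: K, value: V) { rows[key] = value }
}

class LayeredCache<K : Any, V : Any>(
    private val disk: DiskStore<K, V>,
    private val remote: RemoteSource<K, V>,
) {
    private val memory = HashMap<K, V>()           // L1

    fun get(key: K): V {
        memory[key]?.let { return it }             // 1. L1 hit
        disk.get(key)?.let { cached ->             // 2. L2 hit: repopulate L1
            memory[key] = cached
            return cached
        }
        val fresh = remote.fetch(key)              // 3. network, then fill L2 and L1
        disk.put(key, fresh)
        memory[key] = fresh
        return fresh
    }

    // 4. Write path: update both local layers, then hand off to an async sync step.
    fun put(key: K, value: V, enqueueSync: (K, V) -> Unit) {
        memory[key] = value
        disk.put(key, value)
        enqueueSync(key, value)
    }

    // Simulates process death: L1 is lost, L2 survives.
    fun clearMemory() = memory.clear()
}
```

Calling get twice for the same key should reach the remote source only once, and calling clearMemory before a third get should be served from the disk layer without another fetch.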

Interview tip: Say: 'I use a three-layer cache — in-memory LruCache for hot data, Room for warm data that survives process death, and the network as source of truth. I invalidate with ETags for REST and TTL for feeds.'

Cache Invalidation Strategies

Knowing when to invalidate cache is as important as knowing how to cache.

Strategy | How It Works | Use Case
TTL (Time-To-Live) | Entry expires after a fixed duration | News feeds, product listings
ETag (entity tag) / Last-Modified | Server returns a validator (an opaque ETag or a modification time); the client echoes it in If-None-Match or If-Modified-Since, and the server answers 304 Not Modified if nothing changed | REST APIs: avoid re-downloading unchanged data
Stale-while-revalidate | Serve the stale entry immediately, refresh in the background | Profile screens: fast UX, eventual freshness
Write-through | Every write goes to the cache and the server as part of the same operation | User preferences, critical settings
LRU (least recently used) eviction | Drop the least recently used entry when capacity is reached | Image caches, in-memory stores
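
Two of the strategies above, TTL expiry and LRU eviction, combine naturally in one small in-memory cache. The sketch below is an assumption-laden illustration: TtlLruCache is a hypothetical class, the clock is injected so expiry can be exercised deterministically, and the class is not thread-safe.

```kotlin
// Hypothetical TtlLruCache: TTL expiry plus LRU eviction, built on java.util.LinkedHashMap.
class TtlLruCache<K : Any, V : Any>(
    private val maxEntries: Int,
    private val ttlMillis: Long,
    private val now: () -> Long = System::currentTimeMillis,
) {
    private class Entry<T>(val value: T, val storedAt: Long)

    // accessOrder = true keeps iteration order least-recently-used first,
    // so the "eldest" entry is always the LRU one.
    private val entries = object : LinkedHashMap<K, Entry<V>>(16, 0.75f, true) {
        override fun removeEldestEntry(eldest: MutableMap.MutableEntry<K, Entry<V>>?): Boolean =
            size > maxEntries
    }

    fun put(key: K, value: V) {
        entries[key] = Entry(value, now())
    }

    // Returns null on a miss or when the entry is older than the TTL.
    fun get(key: K): V? {
        val entry = entries[key] ?: return null
        if (now() - entry.storedAt >= ttlMillis) {
            entries.remove(key)
            return null
        }
        return entry.value
    }
}
```

With maxEntries = 2, reading "a" and then inserting a third key should evict "b", because "a" was touched more recently; once the injected clock passes the TTL, "a" should read as a miss as well.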

Room & Local Persistence

Key concepts for local data storage in Android.

  • Room + SQLite: the primary structured store; use transactions for atomicity
  • DAO (Data Access Object) with Flow: a Room DAO can return Flow<List<T>>, which re-emits whenever a table the query reads is modified
  • Database transactions: wrap multi-table writes in withTransaction to prevent partial state
  • Migrations: always write forward migrations; test with MigrationTestHelper on every version bump
  • DataStore: replaces SharedPreferences; coroutine-safe and Flow-based, with type safety via Proto DataStore
  • Indices: index the columns used in WHERE clauses, declared via @Entity(indices = [Index(...)]) or @ColumnInfo(index = true); critical for query performance on large tables
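
As a rough illustration of the points above, the declaration fragment below shows an indexed entity, a DAO that returns a Flow, and a multi-table write wrapped in withTransaction. The schema (notes, note_tags) and every class and function name are hypothetical. It assumes Room 2.5 or later (for @Upsert) with the Kotlin extensions on the classpath, and it only compiles inside an Android project with the Room annotation processor configured.

```kotlin
import androidx.room.ColumnInfo
import androidx.room.Dao
import androidx.room.Database
import androidx.room.Entity
import androidx.room.Index
import androidx.room.PrimaryKey
import androidx.room.Query
import androidx.room.RoomDatabase
import androidx.room.Upsert
import androidx.room.withTransaction
import kotlinx.coroutines.flow.Flow

// Hypothetical schema: a notes table plus a per-note tag table.
// The index covers the column used in the WHERE clause below.
@Entity(tableName = "notes", indices = [Index(value = ["updated_at"])])
data class NoteEntity(
    @PrimaryKey val id: String,
    val body: String,
    @ColumnInfo(name = "updated_at") val updatedAt: Long,
)

@Entity(tableName = "note_tags", primaryKeys = ["note_id", "tag"])
data class NoteTagEntity(
    @ColumnInfo(name = "note_id") val noteId: String,
    val tag: String,
)

@Dao
interface NoteDao {
    // Re-emits whenever the notes table is modified.
    @Query("SELECT * FROM notes WHERE updated_at > :since ORDER BY updated_at DESC")
    fun observeUpdatedSince(since: Long): Flow<List<NoteEntity>>

    @Upsert
    suspend fun upsertNote(note: NoteEntity)

    @Upsert
    suspend fun upsertTags(tags: List<NoteTagEntity>)
}

@Database(entities = [NoteEntity::class, NoteTagEntity::class], version = 1)
abstract class AppDatabase : RoomDatabase() {
    abstract fun noteDao(): NoteDao
}

// Both writes commit together or not at all.
suspend fun saveNoteWithTags(db: AppDatabase, note: NoteEntity, tags: List<String>) {
    db.withTransaction {
        db.noteDao().upsertNote(note)
        db.noteDao().upsertTags(tags.map { NoteTagEntity(noteId = note.id, tag = it) })
    }
}
```

Because the Flow is tied to table invalidation rather than to the result set, collectors may see repeated identical lists; applying distinctUntilChanged() downstream is a common way to filter those out.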

Recommended Libraries

  • Room: Jetpack's SQLite abstraction. Type-safe queries, compile-time checks, Flow support. Standard for Android persistence.
  • SQLDelight: Square's SQL-first database library. Write SQL, generate Kotlin. Kotlin Multiplatform (KMP) support.
  • SQLCipher: AES-256 encryption for Room/SQLite. Protects data at rest. Essential for sensitive data.
  • DataStore: Jetpack's replacement for SharedPreferences. Coroutine-safe, Flow-based. Use Proto DataStore for typed data.

Paging 3 — Deep Dive

Paging 3 is the correct way to load large datasets. Know the internals, not just the API (Application Programming Interface).

  • Cursor pagination preferred over offset: results stay consistent under concurrent inserts, whereas offset-based pagination can skip or duplicate items when the list changes during scroll
  • Error handling: surface LoadState.Error in header/footer items; retry via lazyPagingItems.retry()
  • Prepend vs append: append loads the page after the last loaded item; prepend loads the page before the first loaded item (e.g. newer messages at the top of a chat list)

Component | Role
PagingSource | Defines how to load a page. Implement load() to fetch from the network or the database.
RemoteMediator | Bridges network and database: loads from the network into Room while a PagingSource reads from Room. Needed when the list must work offline.
Pager | Creates a Flow<PagingData<T>> from a PagingSource plus an optional RemoteMediator.
PagingData | Snapshot of paged data. Exposed as a flow from the ViewModel (typically with cachedIn(viewModelScope)) and collected in the UI.
LazyPagingItems | Compose adapter. Call collectAsLazyPagingItems() in a composable.
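
The cursor-versus-offset point can be made concrete with a small plain-Kotlin model of a newest-first feed. Everything here is hypothetical (Message, Page, loadByOffset, loadByCursor); it models the server side of pagination, not the Paging 3 API itself.

```kotlin
data class Message(val id: Long, val text: String)
data class Page(val items: List<Message>, val nextCursor: Long?)

// `feed` is assumed to be sorted newest-first (descending id).

// Offset pagination: position-based, so it shifts when rows are inserted at the top.
fun loadByOffset(feed: List<Message>, offset: Int, limit: Int): List<Message> =
    feed.drop(offset).take(limit)

// Cursor pagination: "everything older than the last item I saw".
fun loadByCursor(feed: List<Message>, afterId: Long?, limit: Int): Page {
    require(limit > 0) { "limit must be positive" }
    val items = feed.filter { afterId == null || it.id < afterId }.take(limit)
    // A full page suggests there may be more; a short page ends the list.
    val next = if (items.size == limit) items.last().id else null
    return Page(items, next)
}
```

If the first page returned ids 5 and 4 and a new message with id 6 then arrives, loadByOffset(feed, 2, 2) returns 4 and 3, repeating item 4, while loadByCursor(feed, 4, 2) returns 3 and 2. In a PagingSource, the cursor would play the role of the load key, with nextCursor passed as nextKey in LoadResult.Page.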

Offline-First Architecture

The gold standard for mobile apps that need to work without connectivity.

  • Write to the local DB first, sync to the server asynchronously: never block the UI on the network
  • Optimistic UI: show the result immediately; roll back on server error
  • Outbox pattern: persist pending operations in a queue (a Room table); drain it with WorkManager
  • Conflict resolution: last-write-wins (simple), server-wins (safe), custom merge (chat, docs)
  • Eventual consistency: client and server may temporarily diverge; this is acceptable for most mobile apps

Outbox Step | Detail
1. User action | Write to the local DB and the outbox table atomically in one Room transaction
2. Optimistic UI | Render from the local DB immediately, so the user sees no loading state
3. Background drain | WorkManager picks up outbox entries once connectivity returns and retries with backoff
4. Server confirm | On success: mark as sent and update the local record with the server-assigned ID
5. Conflict | On conflict: apply the resolution strategy; notify the user if data was overwritten

Senior | Knows to show cached data while fetching
Staff | Designs full offline-first: atomic Room + outbox write, optimistic UI, WorkManager drain, conflict resolution
Principal | Defines the sync contract for the platform: reliability guarantees, conflict policies, sync latency SLA
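
The outbox steps above can be sketched without Android dependencies. This is a simplified model under stated assumptions: OutboxStore, MessageApi and drain are hypothetical names, two in-memory maps guarded by one lock stand in for the two Room tables and their shared transaction, and drain represents a single pass that a WorkManager worker might run.

```kotlin
import java.util.UUID

enum class SendState { PENDING, SENT }

data class LocalMessage(
    val clientId: String,
    val text: String,
    val state: SendState,
    val serverId: String? = null,
)

data class OutboxEntry(val clientId: String, val text: String)

// Hypothetical server call: returns the server-assigned ID or throws on failure.
fun interface MessageApi {
    fun send(entry: OutboxEntry): String
}

class OutboxStore {
    private val messages = LinkedHashMap<String, LocalMessage>()
    private val outbox = LinkedHashMap<String, OutboxEntry>()

    // Step 1: message row and outbox row are written together.
    @Synchronized
    fun enqueue(text: String): LocalMessage {
        val clientId = UUID.randomUUID().toString()
        val message = LocalMessage(clientId, text, SendState.PENDING)
        messages[clientId] = message
        outbox[clientId] = OutboxEntry(clientId, text)
        return message
    }

    @Synchronized
    fun pending(): List<OutboxEntry> = outbox.values.toList()

    // Step 4: record the server ID and remove the outbox row together.
    @Synchronized
    fun markSent(clientId: String, serverId: String) {
        val current = messages[clientId] ?: return
        messages[clientId] = current.copy(state = SendState.SENT, serverId = serverId)
        outbox.remove(clientId)
    }

    // Step 2: the UI renders from here, whether or not the send has happened.
    @Synchronized
    fun all(): List<LocalMessage> = messages.values.toList()
}

// Step 3: one drain pass. Returns true when the outbox is empty afterwards.
fun drain(store: OutboxStore, api: MessageApi): Boolean {
    for (entry in store.pending()) {
        try {
            store.markSent(entry.clientId, api.send(entry))
        } catch (e: Exception) {
            return false // leave the entry queued; the caller reschedules with backoff
        }
    }
    return true
}
```

Stopping at the first failure preserves send order, which matters for chat; a design that tolerates reordering could skip the failed entry and continue instead.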

Synchronization Strategies

Different sync strategies for different use cases.

Strategy | How It Works | Best For
Short polling | Client requests every N seconds | Simple, low-frequency data (dashboards)
Long polling | Server holds the request open until data is available | Near real-time without WebSocket infrastructure
Push (FCM/WebSocket) | Server pushes changes to the client | Chat, notifications; FCM (Firebase Cloud Messaging) is battery-efficient because it reuses a shared system connection
Delta sync | Client sends its last sync cursor (lastSyncedTimestamp, as issued by the server); server returns only the records changed since then | Large datasets, frequent incremental syncs
Background sync | WorkManager runs the sync once the NetworkType.CONNECTED constraint is met | Offline-first apps, outbox draining
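
Delta sync is the least obvious row in the table, so here is a minimal model of it. FakeSyncServer, SyncClient, Record and DeltaResponse are hypothetical; the "timestamp" is a server-side version counter, an assumption chosen so that ordering never depends on a client clock. Deletions, which would need tombstones, are left out.

```kotlin
data class Record(val id: String, val value: String, val updatedAt: Long)
data class DeltaResponse(val changed: List<Record>, val cursor: Long)

class FakeSyncServer {
    private val records = HashMap<String, Record>()
    private var version = 0L // server-side logical clock

    fun write(id: String, value: String) {
        version += 1
        records[id] = Record(id, value, updatedAt = version)
    }

    // Returns only the records changed after the given cursor, plus the new cursor.
    fun changesSince(cursor: Long): DeltaResponse =
        DeltaResponse(
            changed = records.values.filter { it.updatedAt > cursor }.sortedBy { it.updatedAt },
            cursor = version,
        )
}

class SyncClient(private val server: FakeSyncServer) {
    private val local = HashMap<String, Record>()

    // The cursor always comes from the server's response, never from the device clock.
    var lastSyncedTimestamp = 0L
        private set

    // Applies the delta locally and returns how many records changed.
    fun sync(): Int {
        val delta = server.changesSince(lastSyncedTimestamp)
        delta.changed.forEach { local[it.id] = it }
        lastSyncedTimestamp = delta.cursor
        return delta.changed.size
    }

    fun valueOf(id: String): String? = local[id]?.value
}
```

After two server writes the first sync transfers two records, an immediate second sync transfers none, and a later update to one record transfers only that record.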

Multi-Device Sync

User sends a message on their phone — their tablet must reflect it. This requires a device-aware sync model, not just client-server sync.

  • deviceId — each device has a unique ID; server uses it to exclude the sender from push fanout
  • Server timestamp — server assigns authoritative timestamp; never trust client clocks for ordering
  • lastSyncedTimestamp per device — each device tracks its own sync cursor; delta sync fetches only events after that cursor
  • Deduplication on receive — if device receives its own message via push, check clientMessageId against local outbox; discard if already SENT
  • Conflict window — two devices editing the same record simultaneously; resolve with last-write-wins (simple), CRDT (collaborative docs), or server-merge

Senior | Syncs one device
Staff | Handles multi-device fanout with deduplication and ordered delivery
Principal | Designs the event stream topology: push fanout vs pull per device, consistency guarantees
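
The fanout and deduplication rules above can be sketched as follows. ChatEvent, fanoutTargets and DeviceTimeline are hypothetical names, and the sketch assumes every event carries a clientMessageId and a server-assigned timestamp.

```kotlin
data class ChatEvent(
    val clientMessageId: String,
    val senderDeviceId: String,
    val serverTimestamp: Long,
    val text: String,
)

// Server side: push to every registered device of the user except the sender.
fun fanoutTargets(registeredDevices: Set<String>, event: ChatEvent): Set<String> =
    registeredDevices - event.senderDeviceId

// Client side: one timeline per device, deduplicated and ordered by server time.
class DeviceTimeline {
    private val seenIds = HashSet<String>()
    private val events = ArrayList<ChatEvent>()

    // Returns false when the event was already applied (e.g. an echo of our own send).
    fun receive(event: ChatEvent): Boolean {
        if (!seenIds.add(event.clientMessageId)) return false
        events.add(event)
        events.sortBy { it.serverTimestamp } // never order by client clocks
        return true
    }

    fun texts(): List<String> = events.map { it.text }
}
```

Excluding the sender on the server and deduplicating on the client are deliberately redundant: the second check still protects the timeline when a push is redelivered or a delta sync overlaps with a push.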

Message Reliability

Critical for chat, payments — any system where duplicate or lost messages are unacceptable.

  • clientMessageId = UUID — client generates before send; server deduplicates on it
  • Idempotency keys — same key = same result; always safe to retry
  • At-least-once + server dedup — safer than at-most-once for message delivery
  • Exponential backoff — never retry immediately on failure
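
A small sketch of why at-least-once delivery with server-side deduplication is safe to retry, together with a capped exponential backoff schedule. IdempotentServer and backoffDelayMillis are hypothetical, and the base and cap values are arbitrary assumptions. Production code would usually add random jitter to the delay, which is omitted here to keep the output deterministic.

```kotlin
import kotlin.math.min

// Server that deduplicates on clientMessageId: a retry returns the original result.
class IdempotentServer {
    private val processed = HashMap<String, String>() // clientMessageId -> serverId
    var storedMessages = 0
        private set

    fun send(clientMessageId: String, text: String): String =
        processed.getOrPut(clientMessageId) {
            storedMessages += 1 // only the first delivery stores anything
            "srv-$storedMessages"
        }
}

// Delay before retry number `attempt` (0-based): base, 2x base, 4x base, ... up to the cap.
fun backoffDelayMillis(attempt: Int, baseMillis: Long = 1_000, capMillis: Long = 60_000): Long {
    require(attempt >= 0) { "attempt must not be negative" }
    require(baseMillis > 0 && capMillis > 0) { "delays must be positive" }
    var delay = baseMillis
    for (i in 0 until attempt) {
        if (delay >= capMillis) break // stop doubling once the cap is reached
        delay *= 2
    }
    return min(delay, capMillis)
}
```

Sending the same clientMessageId twice yields the same server ID and stores one message, so a client that never received the first response can simply retry after the computed delay.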

Interview tip: Message state machine: PENDING → SENT → DELIVERED → READ (and FAILED from any state on terminal error)
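
The state machine in the tip can be encoded as a transition table. The sketch below makes one assumption beyond the text: READ and FAILED are treated as terminal, so FAILED is reachable from every non-terminal state and nothing leaves READ. MessageState, canTransition and transition are hypothetical names.

```kotlin
enum class MessageState { PENDING, SENT, DELIVERED, READ, FAILED }

// Assumption: READ and FAILED are terminal states with no outgoing transitions.
private val allowedTransitions: Map<MessageState, Set<MessageState>> = mapOf(
    MessageState.PENDING to setOf(MessageState.SENT, MessageState.FAILED),
    MessageState.SENT to setOf(MessageState.DELIVERED, MessageState.FAILED),
    MessageState.DELIVERED to setOf(MessageState.READ, MessageState.FAILED),
    MessageState.READ to emptySet(),
    MessageState.FAILED to emptySet(),
)

fun canTransition(from: MessageState, to: MessageState): Boolean =
    to in allowedTransitions.getValue(from)

// Returns the new state, or throws IllegalArgumentException for an illegal move.
fun transition(from: MessageState, to: MessageState): MessageState {
    require(canTransition(from, to)) { "Illegal transition: $from -> $to" }
    return to
}
```

Rejecting illegal moves in one place means a late or duplicated receipt, such as DELIVERED arriving after READ, cannot move a message backwards.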

CRDT & Conflict Resolution — G-Set, LWW-Register, OT, RGA

When two devices edit the same data simultaneously, you need a conflict resolution strategy. Last-Write-Wins (LWW) is simple but lossy. CRDTs (Conflict-free Replicated Data Types) are mathematically merge-safe — each type suits different use cases.

  • CRDTs guarantee that replicas which have received the same set of updates converge to the same state, regardless of the order in which the updates arrived or of network partitions along the way
  • Vector clocks vs wall clocks: never use the wall clock for causality; use a Lamport timestamp or a vector clock
  • Yjs and Automerge (2.0 released in 2023) are mature open-source CRDT libraries. Be careful when naming adopters: Figma, for example, describes its multiplayer system as CRDT-inspired with a central server rather than as a CRDT library deployment
  • For mobile offline-first: an LWW-Register per field covers most use cases; reach for a full CRDT only for collaborative editing
  • Three-way merge: git-style; find the common ancestor, then merge the diffs of both branches; used by some sync engines

Strategy | Mechanism | Data Loss? | Use Case
Last-Write-Wins (LWW) | Highest timestamp wins; the other write is discarded | Yes: concurrent edits are lost | User profile fields, settings
Server-wins | Server version is always authoritative | Yes: the client edit is discarded | Read-heavy data, inventory
G-Set (CRDT) | Grow-only set: merge is the union of all adds, with no deletes | No: monotonic | Reaction sets, tag collections
LWW-Register (CRDT) | Per-field LWW ordered by a logical timestamp (e.g. a Lamport counter with the replica ID as tie-break), not by the wall clock | Partial: only a concurrent write to the same field is lost, not the whole record | User profile with concurrent field edits
Operational Transformation (OT) | Transform concurrent operations against each other before applying them | No, but complex to implement | Google Docs-style text editing
RGA (Replicated Growable Array) / CRDT text (e.g. Yjs, Automerge) | Each character has a unique ID, so concurrent inserts merge deterministically | No: fully concurrent | Collaborative text editing

Senior | Uses last-write-wins with server timestamps
Staff | Knows when LWW is insufficient; proposes CRDT or OT for collaborative features
Principal | Evaluates Yjs/Automerge for the platform; defines the sync model guarantees in the architecture spec
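
The two simplest CRDTs from the table fit in a few lines each. This is a teaching sketch, not a replacement for a library: GSet, Stamp and LwwRegister are hypothetical names, and the register orders writes by a Lamport-style counter with the replica ID as tie-break, which is one common choice among several.

```kotlin
// Grow-only set: merge is set union, which is commutative, associative and idempotent.
data class GSet<T>(val elements: Set<T> = emptySet()) {
    fun add(element: T): GSet<T> = GSet(elements + element)
    fun merge(other: GSet<T>): GSet<T> = GSet(elements + other.elements)
}

// Logical timestamp: counter first, replica ID second, giving a total order.
data class Stamp(val counter: Long, val replicaId: String) : Comparable<Stamp> {
    override fun compareTo(other: Stamp): Int =
        compareValuesBy(this, other, Stamp::counter, Stamp::replicaId)
}

// Last-writer-wins register for a single field.
data class LwwRegister<T>(val value: T, val stamp: Stamp) {
    // A local write advances the counter past everything this replica has seen.
    fun set(newValue: T, replicaId: String): LwwRegister<T> =
        LwwRegister(newValue, Stamp(stamp.counter + 1, replicaId))

    // Merge keeps the write with the greater stamp, so both sides pick the same winner.
    fun merge(other: LwwRegister<T>): LwwRegister<T> =
        if (other.stamp > stamp) other else this
}
```

When two replicas write concurrently from the same base, both merges pick the same winner regardless of merge direction. One of the two writes is still discarded, which is the partial data loss noted in the table.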

Interview tip: Say: 'For a collaborative document feature I'd use a CRDT, specifically Yjs or Automerge, because they merge concurrent edits without a central server arbitrating. For simpler cases like profile edits I'd use an LWW-Register per field, ordered by logical clocks rather than wall-clock timestamps.'
