Chapter 10
Scale & Cross-Platform
SDK Design · Design Systems · Accessibility · i18n · KMP · Image Pipelines
SDK / Library Design
Expected at Stripe, Palo Alto Networks, Anthropic, and any company shipping a developer-facing SDK.
- Minimal API surface — expose only what is necessary; every public API is a contract you must maintain
- Backward compatibility — use @Deprecated with replacement; never remove public APIs in minor versions
- Semantic versioning — MAJOR.MINOR.PATCH; breaking changes = major bump
- Binary compatibility — use Binary Compatibility Validator (Kotlin) to catch ABI breaks in CI
- ProGuard consumer rules — ship consumer-rules.pro so consumers don't need to add keep rules
- Initialization — support both manual init and auto-init via ContentProvider (like Firebase); see the sketch after this list
- Avoid leaking internal types — internal classes must not appear in public API signatures
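A minimal sketch of the auto-init pattern, assuming a hypothetical MySdk entry point: the library declares the provider in its own manifest, the manifest merger adds it to the host app, and onCreate() runs before Application.onCreate().

```kotlin
import android.content.ContentProvider
import android.content.ContentValues
import android.database.Cursor
import android.net.Uri

// Hypothetical SDK init provider. ContentProviders are created before
// Application.onCreate(), so this gives zero-config initialization.
class MySdkInitProvider : ContentProvider() {
    override fun onCreate(): Boolean {
        context?.let { MySdk.init(it.applicationContext) } // MySdk is a stand-in
        return true
    }

    // A pure init provider never serves data; the rest are stubs.
    override fun query(uri: Uri, projection: Array<String>?, selection: String?,
                       selectionArgs: Array<String>?, sortOrder: String?): Cursor? = null
    override fun getType(uri: Uri): String? = null
    override fun insert(uri: Uri, values: ContentValues?): Uri? = null
    override fun delete(uri: Uri, selection: String?, selectionArgs: Array<String>?): Int = 0
    override fun update(uri: Uri, values: ContentValues?, selection: String?,
                        selectionArgs: Array<String>?): Int = 0
}
```

Newer libraries often route this through androidx.startup instead, which shares one provider across many initializers and keeps cold-start cost visible.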
Design System Architecture
L8+ engineers are expected to think about design systems, not just use them.
- Token-based theming — define semantic tokens (colorPrimary, spacingMd, typographyHeadline) not raw values; map tokens to platform values per theme
- MaterialTheme extension — extend Compose MaterialTheme with custom tokens via CompositionLocal; see the sketch after this list
- Component versioning — breaking design changes = new component (Button vs ButtonV2) until migration is complete
- Multi-theme support — light, dark, high-contrast; tokens map differently per theme
- Shared across platforms — with KMP, design tokens can be shared; platform renders natively
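A minimal sketch of the CompositionLocal pattern, assuming a hypothetical AppTokens token set; real design systems typically generate these from a shared token source.

```kotlin
import androidx.compose.foundation.isSystemInDarkTheme
import androidx.compose.material3.MaterialTheme
import androidx.compose.runtime.Composable
import androidx.compose.runtime.CompositionLocalProvider
import androidx.compose.runtime.staticCompositionLocalOf
import androidx.compose.ui.graphics.Color
import androidx.compose.ui.unit.Dp
import androidx.compose.ui.unit.dp

// Hypothetical semantic token set; tokens map differently per theme
data class AppTokens(val colorPrimary: Color, val spacingMd: Dp)

private val LightTokens = AppTokens(colorPrimary = Color(0xFF1B6EF3), spacingMd = 16.dp)
private val DarkTokens = AppTokens(colorPrimary = Color(0xFF8AB4F8), spacingMd = 16.dp)

val LocalAppTokens = staticCompositionLocalOf { LightTokens }

@Composable
fun AppTheme(darkTheme: Boolean = isSystemInDarkTheme(), content: @Composable () -> Unit) {
    CompositionLocalProvider(LocalAppTokens provides if (darkTheme) DarkTokens else LightTokens) {
        MaterialTheme(content = content) // custom tokens ride alongside MaterialTheme
    }
}
```

Components then read LocalAppTokens.current.colorPrimary instead of raw hex values, so retheming is a token-table change rather than a code change.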
Accessibility (a11y)
Google tests this at every level. Airbnb has one of the strongest a11y cultures in the industry.
- Content descriptions — every icon, image, and non-text element needs a meaningful description
- Semantic properties in Compose — use Modifier.semantics { } to provide role, state, and actions to TalkBack; see the sketch after this list
- Touch target size — minimum 48x48dp; use Modifier.minimumInteractiveComponentSize()
- Color contrast — 4.5:1 for normal text; 3:1 for large text (WCAG AA standard)
- Focus order — ensure TalkBack traversal order matches visual order; use isTraversalGroup and traversalIndex to correct
- Screen reader testing — test with TalkBack on a real device before shipping
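A sketch of these properties on a hypothetical FavoriteToggle composable: role, state description, and minimum touch target are all declared explicitly.

```kotlin
import androidx.compose.foundation.clickable
import androidx.compose.material.icons.Icons
import androidx.compose.material.icons.filled.Favorite
import androidx.compose.material.icons.outlined.FavoriteBorder
import androidx.compose.material3.Icon
import androidx.compose.material3.minimumInteractiveComponentSize
import androidx.compose.runtime.Composable
import androidx.compose.ui.Modifier
import androidx.compose.ui.semantics.Role
import androidx.compose.ui.semantics.contentDescription
import androidx.compose.ui.semantics.semantics
import androidx.compose.ui.semantics.stateDescription

@Composable
fun FavoriteToggle(favorited: Boolean, onToggle: () -> Unit) {
    Icon(
        imageVector = if (favorited) Icons.Filled.Favorite else Icons.Outlined.FavoriteBorder,
        contentDescription = null, // description is supplied at the semantics level below
        modifier = Modifier
            .minimumInteractiveComponentSize()                   // enforce the 48dp touch target
            .clickable(role = Role.Checkbox, onClick = onToggle) // TalkBack announces "checkbox"
            .semantics {
                contentDescription = "Favorite"
                stateDescription = if (favorited) "Favorited" else "Not favorited"
            }
    )
}
```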
Internationalization (i18n) & Localization
Building apps for a global audience.
- RTL layout support — use start/end not left/right; test with Arabic or Hebrew locale; Compose handles RTL automatically
- Plurals — use the plurals resource type; never concatenate strings for counts; see the sketch after this list
- String formatting — use getString(R.string.x, arg); never concatenate translated strings with hardcoded text
- Locale-aware formatting — dates, times, currencies must use system locale; never hardcode format strings
- Pseudo-localization — enable pseudolocales (en-XA / ar-XB) in the debug build and set one as the device language to catch layout truncation and hardcoded strings early
- Font scaling — test at 200% font size; use sp for text; ensure layouts don't break
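A sketch of locale-safe formatting, assuming a hypothetical R.plurals.photo_count resource with one/other quantities.

```kotlin
import android.content.Context
import java.text.DateFormat
import java.text.NumberFormat
import java.util.Date

// Assumes a resource like:
// <plurals name="photo_count">
//   <item quantity="one">%d photo</item>
//   <item quantity="other">%d photos</item>
// </plurals>
fun formatPhotoLabel(context: Context, count: Int): String =
    context.resources.getQuantityString(R.plurals.photo_count, count, count)

// Locale-aware values: never hand-roll these format strings
fun formatPrice(amount: Double): String =
    NumberFormat.getCurrencyInstance().format(amount) // "$9.99" in en-US, "9,99 €" in fr-FR

fun formatToday(): String =
    DateFormat.getDateInstance(DateFormat.MEDIUM).format(Date())
```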
Scalable Image Pipeline
Beyond 'use Coil'. Relevant at Instagram, Airbnb, Netflix — any image-heavy app.
- Memory cache (L1) — in-memory LruCache keyed by URL + size; bounded by available RAM
- Disk cache (L2) — DiskLruCache; keyed by URL hash; bounded by configured disk quota; see the two-tier lookup sketch after this list
- Transformations — resize, crop, circle-crop applied before caching; cache stores transformed result not original
- Priority queuing — visible items load first; prefetch off-screen items at lower priority
- Animated images — WebP preferred over GIF (smaller); with Coil, register an animated decoder on the ImageLoader (ImageDecoderDecoder on API 28+, GifDecoder below)
- Placeholder strategy — dominant color placeholder vs BlurHash vs skeleton loader; all better than blank space
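A sketch of the L1/L2 lookup order; DiskCache and fetchAndDecode are hypothetical stand-ins for a real disk LRU wrapper and the network/decode stage.

```kotlin
import android.graphics.Bitmap
import android.util.LruCache

interface DiskCache { // stand-in for a real DiskLruCache wrapper
    suspend fun get(key: String): Bitmap?
    suspend fun put(key: String, bitmap: Bitmap)
}

class ImagePipeline(private val disk: DiskCache) {
    // L1: bounded by a fraction of the app heap, measured in KB
    private val memory = object : LruCache<String, Bitmap>(
        (Runtime.getRuntime().maxMemory() / 1024 / 8).toInt()
    ) {
        override fun sizeOf(key: String, value: Bitmap) = value.byteCount / 1024
    }

    suspend fun load(url: String, width: Int, height: Int): Bitmap {
        val key = "$url-${width}x$height" // key includes size, not just URL
        memory.get(key)?.let { return it }                    // L1 hit
        disk.get(key)?.let { memory.put(key, it); return it } // L2 hit, promote to L1
        return fetchAndDecode(url, width, height).also {      // miss: network + decode
            memory.put(key, it) // cache the transformed result, not the original
            disk.put(key, it)
        }
    }

    private suspend fun fetchAndDecode(url: String, w: Int, h: Int): Bitmap =
        TODO("download, downsample to w x h, apply transformations")
}
```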
Kotlin Multiplatform (KMP) — Staff-Level Deep Dive
KMP shares business logic across Android, iOS, desktop, and web while keeping UI fully native. In 2025–2026 this is increasingly in scope for Staff-level platform strategy discussions at companies like Netflix, among Touchlab's clients, and on any team that ships on both mobile platforms.
- Shared code targets: business logic, domain UseCases, data models, repository interfaces, validation, network client (Ktor), local storage (SQLDelight), and analytics events
- Native-only code: UI layer (Compose Multiplatform on Android, SwiftUI on iOS), platform APIs (camera, biometrics, BLE, push, file system), and navigation stacks
- expect / actual — declare an API in commonMain with expect; each platform provides an actual implementation; used for platform-specific clocks, UUID generation, crypto, file IO (see the sketch after this list)
- Ktor for KMP networking — multiplatform HTTP client; uses OkHttp engine on Android, Darwin (NSURLSession) on iOS; same coroutine-based API across platforms; serialization via kotlinx.serialization (see the client sketch after the table below)
- SQLDelight for KMP persistence — generates type-safe Kotlin APIs from SQL; SQLite on Android/iOS/JVM; multiplatform transactions, migrations, reactive queries via coroutines
- Compose Multiplatform (CMP) — JetBrains extension of Jetpack Compose that targets Android, iOS (Beta), Desktop, and Web; shares UI code beyond just logic; appropriate when the team wants near-100% code share
- Module structure — :shared (commonMain + androidMain + iosMain) produces an Android AAR and an iOS Framework (XCFramework); iOS team consumes via CocoaPods or Swift Package Manager
- KMP vs Flutter — KMP: native rendering, native feel, existing team skills; Flutter: single codebase including UI, own rendering engine (non-native feel), strong for new products
- KMP vs React Native — KMP keeps native UI, RN uses JS bridge or JSI; KMP is better for performance-critical paths; RN is better for web-to-mobile teams
- When to recommend KMP — existing Android/iOS teams, complex business logic that must stay in sync (e.g. pricing rules, validation), gradual adoption possible (start with one UseCase)
- When NOT to recommend KMP — small team with only Android engineers, timeline pressure, UI-heavy app where shared logic savings are minimal, or CMP iOS Beta stability is unacceptable
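A minimal expect/actual sketch for UUID generation; the three declarations live in separate source-set files, as the comments indicate.

```kotlin
// commonMain/Uuid.kt: the contract shared code compiles against
expect fun randomUuid(): String

// androidMain/Uuid.android.kt: JVM-backed implementation
actual fun randomUuid(): String = java.util.UUID.randomUUID().toString()

// iosMain/Uuid.ios.kt: Foundation-backed implementation
actual fun randomUuid(): String = platform.Foundation.NSUUID().UUIDString
```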
| Layer | Shared (commonMain) | Native (androidMain / iosMain) |
|---|---|---|
| Network client | Ktor HttpClient (platform engine injected) | OkHttp engine (Android), Darwin engine (iOS) |
| Persistence | SQLDelight queries & migrations | SQLiteDriver (Android), NativeSqliteDriver (iOS) |
| Business logic | UseCases, domain models, validation | — |
| Platform APIs | expect declarations | actual: Camera, Biometrics, Push, BLE |
| UI | Compose Multiplatform (optional) | Compose (Android), SwiftUI (iOS) |
| DI | Koin (multiplatform) | Hilt (Android only, if not using Koin) |
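A sketch of a shared Ktor client in commonMain, assuming a hypothetical Profile DTO and endpoint; with no engine named explicitly, HttpClient() resolves whichever engine artifact the platform source set depends on (ktor-client-okhttp on Android, ktor-client-darwin on iOS).

```kotlin
import io.ktor.client.HttpClient
import io.ktor.client.call.body
import io.ktor.client.plugins.contentnegotiation.ContentNegotiation
import io.ktor.client.request.get
import io.ktor.serialization.kotlinx.json.json
import kotlinx.serialization.Serializable

@Serializable
data class Profile(val id: String, val name: String) // hypothetical DTO

// One client definition shared by all platforms
val client = HttpClient {
    install(ContentNegotiation) { json() } // kotlinx.serialization for JSON bodies
}

suspend fun fetchProfile(id: String): Profile =
    client.get("https://api.example.com/profiles/$id").body()
```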
Recommended Libraries
- Ktor — Multiplatform async HTTP client. OkHttp engine on Android, Darwin on iOS. Coroutine-based. Best for KMP networking.
- SQLDelight — Generates type-safe Kotlin from SQL. Multiplatform SQLite. Reactive queries via coroutines. Standard for KMP persistence.
- Koin — Lightweight DI framework with multiplatform support. Works in commonMain. Alternative to Hilt for KMP projects.
- Compose Multiplatform — JetBrains extension of Compose for Android, iOS (Beta), Desktop, Web. Shares UI across platforms.
- kotlinx.serialization — Multiplatform JSON/Protobuf serialization. No reflection. Works in commonMain alongside Ktor.
ML/AI on Android — On-Device Inference
TFLite, ML Kit, and on-device LLM inference are Staff-level topics at most major Android shops in 2025. You are expected to reason about when to run inference on-device vs server-side, and what the performance and privacy trade-offs are.
- TensorFlow Lite (TFLite) — run quantized ML models on-device; no network required; model bundled in assets or downloaded via Firebase Model Delivery (see the interpreter sketch after this list)
- ML Kit — Google's on-device ML SDK; pre-built models for text recognition, face detection, barcode scanning, translation; wraps TFLite; zero ML expertise required
- NNAPI (Neural Networks API) — Android hardware abstraction layer for ML; routes inference to GPU, DSP, or NPU when available; TFLite and ML Kit use it automatically
- Model quantization — INT8 quantization reduces model size 4x and speeds up inference 2–4x; quality loss is typically <1% for vision models; required for mobile deployment
- INT8 vs FP16 — INT8 is faster on NNAPI/NPU and uses less memory; FP16 retains more precision; FP32 is the full-precision training format — never ship FP32 to mobile
- On-device vs server inference — on-device: no latency, no cost per call, privacy-preserving, works offline, but limited model size; server: larger models, always up to date, but adds RTT and cost
- MediaPipe — Google's on-device ML framework for real-time pipelines (pose estimation, hand landmarks, face mesh); hardware accelerated; multiplatform
- Firebase ML — model hosting with versioning; A/B test model versions; deliver model updates OTA without app release
- On-device LLM — LiteRT (formerly TensorFlow Lite) under Google AI Edge runs Gemma 2B/7B on the Pixel 8+ NPU; MediaPipe LLM Inference API; typical token throughput: 20–40 tok/s on Pixel 8 Pro
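A minimal sketch of custom TFLite inference, assuming a bundled model.tflite with a float input and a 1000-class output; the asset name and tensor shapes are illustrative.

```kotlin
import android.content.Context
import org.tensorflow.lite.Interpreter
import java.io.FileInputStream
import java.nio.MappedByteBuffer
import java.nio.channels.FileChannel

// Memory-map the model bundled in assets (no copy onto the heap)
fun loadModel(context: Context, assetName: String = "model.tflite"): MappedByteBuffer {
    val fd = context.assets.openFd(assetName)
    return FileInputStream(fd.fileDescriptor).channel
        .map(FileChannel.MapMode.READ_ONLY, fd.startOffset, fd.declaredLength)
}

fun classify(context: Context, input: FloatArray): FloatArray {
    // GPU/NNAPI delegates would be added via Interpreter.Options in production
    val interpreter = Interpreter(loadModel(context))
    val output = Array(1) { FloatArray(1000) } // assumed 1000-class output tensor
    interpreter.run(arrayOf(input), output)    // single synchronous inference
    interpreter.close()
    return output[0]
}
```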
| Approach | Model Size | Latency | Privacy | Use When |
|---|---|---|---|---|
| ML Kit (pre-built) | Built-in / ~10MB | <10ms most tasks | On-device, no data leaves | Barcode, face, OCR, translation — standard tasks |
| TFLite custom model | 0.5MB–50MB quantized | 10–200ms | On-device | Custom classification, NLP, anomaly detection |
| MediaPipe | Varies | Real-time (camera) | On-device | Pose, hand, face tracking in live video |
| On-device LLM (Gemma 2B) | ~1.5GB INT4 | 20–40 tok/s on Pixel 8 Pro | On-device | Chat, summarization without server cost |
| Server inference (Gemini API) | Unlimited | 100–300ms + RTT | Data sent to server | Complex reasoning, large context, latest model |
Recommended Libraries
- ML Kit — Google's on-device ML SDK. Pre-built models for text, face, barcode, translation. No ML expertise needed.
- TensorFlow Lite — Run quantized TF models on-device. NNAPI/GPU delegate for acceleration. Flexible for custom models.
- MediaPipe — Real-time on-device ML pipelines. Pose, hand, face, object detection. Multiplatform, hardware accelerated.
- Google AI Edge (LiteRT) — On-device LLM inference. Runs Gemma 2B/7B on NPU. MediaPipe LLM Inference API.
- Firebase ML — Host, version, and A/B test TFLite models. Deliver model updates OTA without app release.
Interview tip: When asked about ML features, immediately frame it as a make-vs-buy and on-device-vs-server decision. ML Kit for standard tasks (barcode, OCR) is almost always right — built-in, maintained by Google, zero cost per call. Custom TFLite is warranted when no pre-built model covers your use case. Server inference is warranted when model quality matters more than latency and privacy. Saying 'it depends on privacy requirements and offline needs' scores Staff-level points.
GenAI Integration Patterns for Android
In 2025, integrating LLMs into Android apps is a Staff-level expectation at FAANG and most tier-1 shops. The patterns differ significantly from standard API calls — streaming responses, token budgets, on-device vs server routing, and prompt security all apply.
- Streaming responses — LLMs emit tokens, not complete responses; use SSE or streaming HTTP to render progressively; prevents 5–30 second blank-screen wait
- Token streaming to Android — Gemini API supports streamGenerateContent; parse Server-Sent Events; append each token to a StateFlow&lt;String&gt;; Compose LazyColumn auto-scrolls as the text grows (see the ViewModel sketch after this list)
- Prompt injection — user input that attempts to override system prompt instructions; mitigate by never concatenating user content directly into system prompts; use role-separated message format
- Context window budget — LLMs have token limits (Gemini Flash: 1M tokens, but cost scales); send only relevant context; summarize conversation history beyond N turns
- On-device LLM routing — use Gemma 2B on-device for short, privacy-sensitive tasks; route to Gemini server API for complex reasoning; routing decision can be heuristic (message length, topic classification)
- Grounding — LLMs hallucinate; ground responses with retrieved context (RAG); for Android: fetch user's relevant data before prompt; include it explicitly in the prompt as context
- Function calling — Gemini/Claude support structured function call responses; parse the JSON response to trigger native Android actions (open camera, make payment) from LLM output
- Rate limiting and cost control — LLM API calls cost money per token; implement per-user rate limits; debounce streaming calls; cache identical prompts (semantic caching if needed)
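A sketch of the streaming pattern, assuming a hypothetical streamCompletion source that wraps the provider's SSE or chunked-HTTP endpoint and emits token chunks.

```kotlin
import androidx.lifecycle.ViewModel
import androidx.lifecycle.viewModelScope
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.MutableStateFlow
import kotlinx.coroutines.flow.StateFlow
import kotlinx.coroutines.launch

class ChatViewModel(
    private val streamCompletion: (prompt: String) -> Flow<String> // emits token chunks
) : ViewModel() {
    private val _reply = MutableStateFlow("")
    val reply: StateFlow<String> = _reply // Compose collects this and recomposes per chunk

    fun send(prompt: String) {
        _reply.value = ""
        viewModelScope.launch {
            streamCompletion(prompt).collect { token ->
                _reply.value += token // text grows progressively; no blank-screen wait
            }
        }
    }
}
```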
Interview tip: For any AI feature question, structure your answer around: (1) on-device vs server — privacy and latency, (2) streaming vs batch — user experience, (3) prompt security — injection prevention, (4) cost control — token budget and caching. These four dimensions show Staff-level thinking.
API Gateway & Edge Architecture
Staff engineers reason about the full request path, not just what happens inside the app.
- CDN — serve static assets and cacheable API responses from edge nodes; reduces origin load and latency globally
- BFF (Backend For Frontend) — a gateway layer tailored to mobile; aggregates multiple service calls into one mobile-optimised response; reduces round trips and over-fetching
- Rate limiting at gateway — protects origin from DDoS and runaway clients; return 429 with a Retry-After header; the client should respect it (see the interceptor sketch after this list)
- Edge auth — validate JWT at edge before request reaches origin; fail fast, save origin compute
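A sketch of a client honoring the 429 + Retry-After contract, as a hypothetical OkHttp interceptor; a production version would cap retries and add jitter.

```kotlin
import okhttp3.Interceptor
import okhttp3.Response

class RetryAfterInterceptor : Interceptor {
    override fun intercept(chain: Interceptor.Chain): Response {
        val response = chain.proceed(chain.request())
        if (response.code != 429) return response

        // Retry-After is in seconds; fall back conservatively if absent
        val delaySec = response.header("Retry-After")?.toLongOrNull() ?: 5L
        response.close() // must close before re-proceeding on the same chain
        Thread.sleep(delaySec * 1000) // interceptors run on OkHttp's background threads
        return chain.proceed(chain.request())
    }
}
```

The diagram below traces the full request path from the app to the data stores.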
Mobile App
│ HTTPS
▼
CDN / Edge Cache (CloudFront, Fastly)
├── cache static assets, API responses with Cache-Control
├── edge auth, rate limiting, geo-routing
│
▼
API Gateway (Kong, AWS API GW, custom)
├── auth token validation (JWT verify)
├── rate limiting per user/IP
├── request aggregation / BFF (Backend For Frontend)
├── protocol translation (REST → gRPC to internal services)
│
▼
Service Layer (microservices / monolith)
│
▼
Data Stores (DB, cache, object store)
Cost Awareness
Principal engineers think in cost. Every architectural choice has a dollar cost at scale.
| Decision | Cheaper Option | More Expensive Option | Cost Driver |
|---|---|---|---|
| Real-time transport | SSE — stateless HTTP, scales with standard infra | WebSocket — requires sticky sessions or connection broker | Server connection state |
| Data format | Protobuf — 3–10x smaller payload | JSON — verbose | Egress bandwidth at 50M DAU |
| Update delivery | Push (FCM) — server pushes only on change | Polling — client hits server every N seconds regardless | Origin server compute + DB reads |
| Caching | CDN edge cache — serve from edge, zero origin cost | No cache — every request hits origin | Origin compute + DB cost |
| Image storage | WebP at CDN — compressed, edge-served | Original PNG served from origin | Storage + egress |
| Search | Client-side filter on cached list (see the sketch below the table) | Server search on every keystroke | Server compute + DB query cost |
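A sketch of the cheaper search option, assuming a hypothetical SearchState holding an already-cached item list; debouncing keeps even local filtering cheap, and no request leaves the device.

```kotlin
import kotlinx.coroutines.FlowPreview
import kotlinx.coroutines.flow.MutableStateFlow
import kotlinx.coroutines.flow.combine
import kotlinx.coroutines.flow.debounce

@OptIn(FlowPreview::class)
class SearchState(items: MutableStateFlow<List<String>>) {
    val query = MutableStateFlow("")

    // Debounced, in-memory filtering: zero server compute per keystroke
    val results = query
        .debounce(150)
        .combine(items) { q, list ->
            if (q.isBlank()) list else list.filter { it.contains(q, ignoreCase = true) }
        }
}
```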