Chapter 10

Scale & Cross-Platform

SDK Design · Design Systems · Accessibility · i18n · KMP · Image Pipelines

SDK / Library Design

Expected at Stripe, Palo Alto Networks, Anthropic, and any company shipping a developer-facing SDK.

  • Minimal API surface — expose only what is necessary; every public API is a contract you must maintain
  • Backward compatibility — use @Deprecated with a ReplaceWith migration path (see the sketch after this list); never remove public APIs in minor versions
  • Semantic versioning — MAJOR.MINOR.PATCH; breaking changes = major bump
  • Binary compatibility — use Binary Compatibility Validator (Kotlin) to catch ABI breaks in CI
  • ProGuard consumer rules — ship consumer-rules.pro so consumers don't need to add keep rules
  • Initialization — support both manual init and auto-init via ContentProvider (like Firebase)
  • Avoid leaking internal types — internal classes must not appear in public API signatures
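
A minimal sketch of the deprecation pattern, assuming a hypothetical AnalyticsClient SDK class:

```kotlin
// Hypothetical SDK surface: the old API stays, compiles with a warning,
// and the IDE can auto-migrate callers via ReplaceWith.
class AnalyticsClient {

    @Deprecated(
        message = "Use track(event, properties) instead; removed in 3.0.0",
        replaceWith = ReplaceWith("track(eventName, emptyMap())"),
        level = DeprecationLevel.WARNING, // escalate to ERROR one major version before removal
    )
    fun logEvent(eventName: String) = track(eventName, emptyMap())

    fun track(event: String, properties: Map<String, String>) {
        // ... send to the analytics pipeline
    }
}
```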

Design System Architecture

L8+ engineers are expected to think about design systems, not just use them.

  • Token-based theming — define semantic tokens (colorPrimary, spacingMd, typographyHeadline) not raw values; map tokens to platform values per theme
  • MaterialTheme extension — extend Compose MaterialTheme with custom tokens via CompositionLocal (see the sketch after this list)
  • Component versioning — breaking design changes = new component (Button vs ButtonV2) until migration is complete
  • Multi-theme support — light, dark, high-contrast; tokens map differently per theme
  • Shared across platforms — with KMP, design tokens can be shared; platform renders natively
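
A minimal sketch of token-based theming via CompositionLocal; the AppTokens names and values are illustrative, not from a real design system:

```kotlin
import androidx.compose.material3.MaterialTheme
import androidx.compose.runtime.Composable
import androidx.compose.runtime.CompositionLocalProvider
import androidx.compose.runtime.staticCompositionLocalOf
import androidx.compose.ui.graphics.Color
import androidx.compose.ui.unit.Dp
import androidx.compose.ui.unit.dp

// Semantic tokens: names describe intent, not raw values.
data class AppTokens(
    val colorBadge: Color,
    val spacingMd: Dp,
)

private val LightTokens = AppTokens(colorBadge = Color(0xFFB3261E), spacingMd = 16.dp)
private val DarkTokens = AppTokens(colorBadge = Color(0xFFF2B8B5), spacingMd = 16.dp)

val LocalAppTokens = staticCompositionLocalOf { LightTokens }

@Composable
fun AppTheme(darkTheme: Boolean, content: @Composable () -> Unit) {
    // The same token resolves to different values per theme;
    // components only ever read the token, never a raw value.
    CompositionLocalProvider(LocalAppTokens provides if (darkTheme) DarkTokens else LightTokens) {
        MaterialTheme(content = content)
    }
}

// Usage inside any composable under AppTheme:
//   Spacer(Modifier.height(LocalAppTokens.current.spacingMd))
```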

Accessibility (a11y)

Google tests this at every level. Airbnb has one of the strongest a11y cultures in the industry.

  • Content descriptions — every icon, image, and non-text element needs a meaningful description
  • Semantic properties in Compose — use Modifier.semantics { } to provide role, state, and actions to TalkBack (see the sketch after this list)
  • Touch target size — minimum 48x48dp; use Modifier.minimumInteractiveComponentSize()
  • Color contrast — 4.5:1 for normal text; 3:1 for large text (WCAG AA standard)
  • Focus order — ensure TalkBack traversal order matches visual order; use isTraversalGroup and traversalIndex to correct
  • Screen reader testing — test with TalkBack on a real device before shipping
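
A minimal sketch of a toggle row that TalkBack treats as a single switch, assuming Material 3 (SettingRow is an illustrative name):

```kotlin
import androidx.compose.foundation.layout.Row
import androidx.compose.foundation.selection.toggleable
import androidx.compose.material3.Checkbox
import androidx.compose.material3.Text
import androidx.compose.material3.minimumInteractiveComponentSize
import androidx.compose.runtime.Composable
import androidx.compose.ui.Modifier
import androidx.compose.ui.semantics.Role

@Composable
fun SettingRow(label: String, checked: Boolean, onToggle: (Boolean) -> Unit) {
    Row(
        modifier = Modifier
            .toggleable(
                value = checked,
                role = Role.Switch,       // TalkBack announces "switch, on/off"
                onValueChange = onToggle, // the whole row is the touch target
            )
            .minimumInteractiveComponentSize(), // enforces the 48dp minimum
    ) {
        // onCheckedChange = null merges the checkbox into the row's semantics,
        // so TalkBack focuses the row once instead of twice.
        Checkbox(checked = checked, onCheckedChange = null)
        Text(label)
    }
}
```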

Internationalization (i18n) & Localization

Building apps for a global audience.

  • RTL layout support — use start/end not left/right; test with Arabic or Hebrew locale; Compose handles RTL automatically
  • Plurals — use the plurals resource type; never concatenate strings for counts (see the sketch after this list)
  • String formatting — use getString(R.string.x, arg); never concatenate translated strings with hardcoded text
  • Locale-aware formatting — dates, times, currencies must use system locale; never hardcode format strings
  • Pseudo-localization — set pseudoLocalesEnabled true in the debug build type, then switch to the en-XA or ar-XB pseudolocale to catch layout truncation and hardcoded strings early
  • Font scaling — test at 200% font size; use sp for text; ensure layouts don't break
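
A minimal sketch of both rules, assuming a hypothetical R.plurals.item_count resource:

```kotlin
import android.content.Context
import java.text.NumberFormat
import java.util.Locale
// plus your app's generated R class

// res/values/strings.xml (sketch):
// <plurals name="item_count">
//     <item quantity="one">%d item</item>
//     <item quantity="other">%d items</item>
// </plurals>
fun itemLabel(context: Context, count: Int): String =
    // Translators supply per-language plural rules; never build "count + " items""
    context.resources.getQuantityString(R.plurals.item_count, count, count)

fun priceLabel(amountCents: Long): String =
    // Locale-aware currency formatting; never hardcode a "$%.2f" pattern
    NumberFormat.getCurrencyInstance(Locale.getDefault()).format(amountCents / 100.0)
```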

Scalable Image Pipeline

Beyond 'use Coil'. Relevant at Instagram, Airbnb, Netflix — any image-heavy app.

  • Memory cache (L1) — in-memory LruCache keyed by URL + size; bounded by available RAM
  • Disk cache (L2) — DiskLruCache; keyed by URL hash; bounded by configured disk quota (both tiers sketched after this list)
  • Transformations — resize, crop, circle-crop applied before caching; cache stores transformed result not original
  • Priority queuing — visible items load first; prefetch off-screen items at lower priority
  • Animated images — WebP preferred over GIF (smaller); with Coil, add an animated-image decoder from the coil-gif artifact (ImageDecoderDecoder on API 28+, GifDecoder below) to the ImageLoader
  • Placeholder strategy — dominant-color placeholder vs BlurHash vs skeleton; all better than blank space
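
A sketch of configuring both cache tiers, assuming Coil 2.x (the sizes are illustrative, not recommendations):

```kotlin
import android.content.Context
import coil.ImageLoader
import coil.disk.DiskCache
import coil.memory.MemoryCache

fun buildImageLoader(context: Context): ImageLoader =
    ImageLoader.Builder(context)
        // L1: in-memory, bounded by a fraction of available RAM
        .memoryCache {
            MemoryCache.Builder(context)
                .maxSizePercent(0.25)
                .build()
        }
        // L2: on disk, bounded by an explicit quota
        .diskCache {
            DiskCache.Builder()
                .directory(context.cacheDir.resolve("image_cache"))
                .maxSizeBytes(250L * 1024 * 1024) // 250 MB
                .build()
        }
        .build()
```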

Kotlin Multiplatform (KMP) — Staff-Level Deep Dive

KMP shares business logic across Android, iOS, desktop, and web while keeping UI fully native. In 2025–2026 this is increasingly in scope for Staff-level platform strategy discussions at companies like Netflix, Touchlab clients, and any team that ships on both mobile platforms.

  • Shared code targets: business logic, domain UseCases, data models, repository interfaces, validation, network client (Ktor), local storage (SQLDelight), and analytics events
  • Native-only code: UI layer (Compose Multiplatform on Android, SwiftUI on iOS), platform APIs (camera, biometrics, BLE, push, file system), and navigation stacks
  • expect / actual — declare an API in commonMain with expect; each platform provides an actual implementation; used for platform-specific clocks, UUID generation, crypto, file IO (see the sketch after the table below)
  • Ktor for KMP networking — multiplatform HTTP client; uses OkHttp engine on Android, Darwin (NSURLSession) on iOS; same coroutine-based API across platforms; serialization via kotlinx.serialization
  • SQLDelight for KMP persistence — generates type-safe Kotlin APIs from SQL; SQLite on Android/iOS/JVM; multiplatform transactions, migrations, reactive queries via coroutines
  • Compose Multiplatform (CMP) — JetBrains extension of Jetpack Compose that targets Android, iOS (Beta), Desktop, Web; shares UI code beyond just logic; appropriate when the team wants near-100% code share
  • Module structure — :shared (commonMain + androidMain + iosMain) produces an Android AAR and an iOS Framework (XCFramework); iOS team consumes via CocoaPods or Swift Package Manager
  • KMP vs Flutter — KMP: native rendering, native feel, existing team skills; Flutter: single codebase including UI, own rendering engine (non-native feel), strong for new products
  • KMP vs React Native — KMP keeps native UI, RN uses JS bridge or JSI; KMP is better for performance-critical paths; RN is better for web-to-mobile teams
  • When to recommend KMP — existing Android/iOS teams, complex business logic that must stay in sync (e.g. pricing rules, validation), gradual adoption possible (start with one UseCase)
  • When NOT to recommend KMP — small team with only Android engineers, timeline pressure, UI-heavy app where shared logic savings are minimal, or CMP iOS Beta stability is unacceptable
Layer          | Shared (commonMain)                        | Native (androidMain / iosMain)
Network client | Ktor HttpClient (platform engine injected) | OkHttp engine (Android), Darwin engine (iOS)
Persistence    | SQLDelight queries & migrations            | AndroidSqliteDriver (Android), NativeSqliteDriver (iOS)
Business logic | UseCases, domain models, validation        | —
Platform APIs  | expect declarations                        | actual: Camera, Biometrics, Push, BLE
UI             | Compose Multiplatform (optional)           | Compose (Android), SwiftUI (iOS)
DI             | Koin (multiplatform)                       | Hilt (Android only, if not using Koin)
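
A minimal expect/actual sketch for UUID generation, one of the platform gaps in the table above (each snippet lives in its own source set):

```kotlin
// commonMain — shared code depends only on this declaration
expect fun randomUuid(): String

// androidMain — backed by java.util.UUID
actual fun randomUuid(): String = java.util.UUID.randomUUID().toString()

// iosMain — backed by Foundation's NSUUID via Kotlin/Native interop
actual fun randomUuid(): String = platform.Foundation.NSUUID().UUIDString
```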

Recommended Libraries

  • Ktor — multiplatform async HTTP client. OkHttp engine on Android, Darwin on iOS. Coroutine-based. Best for KMP networking (see the client sketch after this list).
  • SQLDelight — generates type-safe Kotlin from SQL. Multiplatform SQLite. Reactive queries via coroutines. Standard for KMP persistence.
  • Koin — lightweight DI framework with multiplatform support. Works in commonMain. Alternative to Hilt for KMP projects.
  • Compose Multiplatform — JetBrains extension of Compose for Android, iOS (Beta), Desktop, Web. Shares UI across platforms.
  • kotlinx.serialization — multiplatform JSON/Protobuf serialization. No reflection. Works in commonMain alongside Ktor.
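
A minimal sketch of a shared Ktor client in commonMain, assuming Ktor 2.x with kotlinx.serialization; ApiService, Profile, and the endpoint are illustrative:

```kotlin
import io.ktor.client.HttpClient
import io.ktor.client.call.body
import io.ktor.client.engine.HttpClientEngine
import io.ktor.client.plugins.contentnegotiation.ContentNegotiation
import io.ktor.client.request.get
import io.ktor.serialization.kotlinx.json.json
import kotlinx.serialization.Serializable

@Serializable
data class Profile(val id: String, val name: String)

// The engine is injected per platform, so this class compiles in commonMain.
class ApiService(engine: HttpClientEngine) {
    private val client = HttpClient(engine) {
        install(ContentNegotiation) { json() } // kotlinx.serialization for JSON
    }

    suspend fun profile(id: String): Profile =
        client.get("https://api.example.com/profiles/$id").body()
}

// androidMain: ApiService(OkHttp.create())
// iosMain:     ApiService(Darwin.create())
```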
  • Senior — knows KMP exists and what can be shared. Can set up a :shared module with Ktor + SQLDelight.
  • Staff — designs the shared/native boundary, chooses KMP vs Flutter vs RN with trade-off reasoning, sets up expect/actual for platform APIs, and advises the iOS team on XCFramework integration.
  • Principal — owns the multi-platform strategy across the org. Decides when to adopt CMP vs native UI per product line, defines module structure conventions, and evaluates build performance implications of the shared module graph.

ML/AI on Android — On-Device Inference

TFLite, ML Kit, and on-device LLM inference are Staff-level topics at most major Android shops in 2025. You are expected to reason about when to run inference on-device vs server-side, and what the performance and privacy trade-offs are.

  • TensorFlow Lite (TFLite) — run quantized ML models on-device; no network required; model bundled in assets or downloaded via Firebase Model Delivery (see the inference sketch after the table below)
  • ML Kit — Google's on-device ML SDK; pre-built models for text recognition, face detection, barcode scanning, translation; wraps TFLite; zero ML expertise required
  • NNAPI (Neural Networks API) — Android hardware abstraction layer for ML; routes inference to GPU, DSP, or NPU when available; TFLite and ML Kit use it automatically
  • Model quantization — INT8 quantization reduces model size 4x and speeds up inference 2–4x; quality loss is typically <1% for vision models; required for mobile deployment
  • INT8 vs FP16 — INT8 is faster on NNAPI/NPU and uses less memory; FP16 retains more precision; FP32 is the full-precision training format — never ship FP32 to mobile
  • On-device vs server inference — on-device: no latency, no cost per call, privacy-preserving, works offline, but limited model size; server: larger models, always up to date, but adds RTT and cost
  • MediaPipe — Google's on-device ML framework for real-time pipelines (pose estimation, hand landmarks, face mesh); hardware accelerated; multiplatform
  • Firebase ML — model hosting with versioning; A/B test model versions; deliver model updates OTA without app release
  • On-device LLM — Google AI Edge (LiteRT, the successor to TensorFlow Lite) runs Gemma 2B/7B on the Pixel 8+ NPU; MediaPipe LLM Inference API; typical token throughput: 20–40 tok/s on Pixel 8 Pro
Approach                      | Model Size           | Latency                    | Privacy                   | Use When
ML Kit (pre-built)            | Built-in / ~10MB     | <10ms for most tasks       | On-device, no data leaves | Barcode, face, OCR, translation — standard tasks
TFLite custom model           | 0.5MB–50MB quantized | 10–200ms                   | On-device                 | Custom classification, NLP, anomaly detection
MediaPipe                     | Varies               | Real-time (camera)         | On-device                 | Pose, hand, face tracking in live video
On-device LLM (Gemma 2B)      | ~1.5GB INT4          | 20–40 tok/s on Pixel 8 Pro | On-device                 | Chat, summarization without server cost
Server inference (Gemini API) | Unlimited            | 100–300ms + RTT            | Data sent to server       | Complex reasoning, large context, latest model
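
A minimal sketch of custom TFLite inference with the NNAPI delegate, assuming the tensorflow-lite and tensorflow-lite-support dependencies; the model name and NUM_CLASSES are illustrative:

```kotlin
import android.content.Context
import org.tensorflow.lite.Interpreter
import org.tensorflow.lite.nnapi.NnApiDelegate
import org.tensorflow.lite.support.common.FileUtil
import java.nio.ByteBuffer

// Illustrative output size for a 10-class classifier.
const val NUM_CLASSES = 10

fun classify(context: Context, input: ByteBuffer): FloatArray {
    val delegate = NnApiDelegate() // routes to NPU/GPU/DSP when the device supports it
    val options = Interpreter.Options().addDelegate(delegate)
    // In production, create the interpreter once and reuse it across calls.
    Interpreter(FileUtil.loadMappedFile(context, "model.tflite"), options).use { interpreter ->
        val output = Array(1) { FloatArray(NUM_CLASSES) }
        interpreter.run(input, output) // synchronous — call off the main thread
        delegate.close()
        return output[0]
    }
}
```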

Recommended Libraries

  • ML Kit — Google's on-device ML SDK. Pre-built models for text, face, barcode, translation. No ML expertise needed.
  • TensorFlow Lite — run quantized TF models on-device. NNAPI/GPU delegate for acceleration. Flexible for custom models.
  • MediaPipe — real-time on-device ML pipelines. Pose, hand, face, object detection. Multiplatform, hardware accelerated.
  • Google AI Edge (LiteRT) — on-device LLM inference. Runs Gemma 2B/7B on NPU. MediaPipe LLM Inference API.
  • Firebase ML — host, version, and A/B test TFLite models. Deliver model updates OTA without app release.
  • Senior — knows ML Kit exists, can integrate a pre-built model. Understands the on-device vs server trade-off at a surface level.
  • Staff — chooses between ML Kit, custom TFLite, and server inference with explicit reasoning about latency, cost, and privacy. Designs a model update pipeline via Firebase ML. Knows quantization options.
  • Principal — designs the org's on-device AI strategy: which models run on NPU vs server, how model versioning integrates with app release cadence, and how to build feedback loops that improve models without sending raw user data off-device.

Interview tip: When asked about ML features, immediately frame it as a make-vs-buy and on-device-vs-server decision. ML Kit for standard tasks (barcode, OCR) is almost always right — built-in, maintained by Google, zero cost per call. Custom TFLite is warranted when no pre-built model covers your use case. Server inference is warranted when model quality matters more than latency and privacy. Saying 'it depends on privacy requirements and offline needs' scores Staff-level points.

GenAI Integration Patterns for Android

In 2025, integrating LLMs into Android apps is a Staff-level expectation at FAANG and most tier-1 shops. The patterns differ significantly from standard API calls — streaming responses, token budgets, on-device vs server routing, and prompt security all apply.

  • Streaming responses — LLMs emit tokens, not complete responses; use SSE or streaming HTTP to render progressively; prevents 5–30 second blank-screen wait
  • Token streaming to Android — the Gemini API supports streamGenerateContent; parse the server-sent events; append each token to a StateFlow<String> and scroll the LazyColumn as the text grows (see the sketch after this list)
  • Prompt injection — user input that attempts to override system prompt instructions; mitigate by never concatenating user content directly into system prompts; use role-separated message format
  • Context window budget — LLMs have token limits (Gemini Flash: 1M tokens, but cost scales); send only relevant context; summarize conversation history beyond N turns
  • On-device LLM routing — use Gemma 2B on-device for short, privacy-sensitive tasks; route to Gemini server API for complex reasoning; routing decision can be heuristic (message length, topic classification)
  • Grounding — LLMs hallucinate; ground responses with retrieved context (RAG); for Android: fetch user's relevant data before prompt; include it explicitly in the prompt as context
  • Function calling — Gemini/Claude support structured function call responses; parse the JSON response to trigger native Android actions (open camera, make payment) from LLM output
  • Rate limiting and cost control — LLM API calls cost money per token; implement per-user rate limits; debounce streaming calls; cache identical prompts (semantic caching if needed)
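
A minimal sketch of the streaming pattern, assuming the com.google.ai.client.generativeai Android SDK; the model name, class shape, and lack of error handling are illustrative simplifications:

```kotlin
import com.google.ai.client.generativeai.GenerativeModel
import kotlinx.coroutines.flow.MutableStateFlow
import kotlinx.coroutines.flow.StateFlow

class StreamingChat(apiKey: String) {
    private val model = GenerativeModel(modelName = "gemini-1.5-flash", apiKey = apiKey)

    private val _reply = MutableStateFlow("")
    val reply: StateFlow<String> = _reply // collect in Compose, e.g. collectAsStateWithLifecycle()

    suspend fun send(prompt: String) {
        _reply.value = ""
        // Each emitted chunk carries the next tokens; the UI re-renders as the text grows.
        model.generateContentStream(prompt).collect { chunk ->
            _reply.value += chunk.text.orEmpty()
        }
    }
}
```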
  • Senior — can integrate the Gemini API, handle streaming, and display tokens progressively in Compose.
  • Staff — designs on-device vs server routing, implements prompt injection safeguards, manages the context window budget, and adds cost monitoring per user.
  • Principal — owns the org's GenAI platform: model selection strategy, cost allocation, privacy architecture (what data can enter which model), and feedback loop design.

Interview tip: For any AI feature question, structure your answer around: (1) on-device vs server — privacy and latency, (2) streaming vs batch — user experience, (3) prompt security — injection prevention, (4) cost control — token budget and caching. These four dimensions show Staff-level thinking.

API Gateway & Edge Architecture

Staff engineers reason about the full request path, not just what happens inside the app.

  • CDN — serve static assets and cacheable API responses from edge nodes; reduces origin load and latency globally
  • BFF (Backend For Frontend) — a gateway layer tailored to mobile; aggregates multiple service calls into one mobile-optimised response; reduces round trips and over-fetching
  • Rate limiting at gateway — protects origin from DDoS and runaway clients; return 429 with a Retry-After header; the client should respect it (see the interceptor sketch after the diagram below)
  • Edge auth — validate JWT at edge before request reaches origin; fail fast, save origin compute
Mobile App
    │  HTTPS
    ▼
CDN / Edge Cache  (CloudFront, Fastly)
├── cache static assets, API responses with Cache-Control
├── edge auth, rate limiting, geo-routing
    │
    ▼
API Gateway  (Kong, AWS API GW, custom)
├── auth token validation (JWT verify)
├── rate limiting per user/IP
├── request aggregation / BFF (Backend For Frontend)
├── protocol translation (REST → gRPC to internal services)
    │
    ▼
Service Layer  (microservices / monolith)
    │
    ▼
Data Stores  (DB, cache, object store)
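
A minimal sketch of client-side 429 handling, assuming OkHttp; the retry count and fallback delay are illustrative choices:

```kotlin
import okhttp3.Interceptor
import okhttp3.OkHttpClient
import okhttp3.Response

// Respects the gateway's Retry-After header instead of hammering the origin.
class RetryAfterInterceptor(private val maxRetries: Int = 2) : Interceptor {
    override fun intercept(chain: Interceptor.Chain): Response {
        var response = chain.proceed(chain.request())
        var attempts = 0
        while (response.code == 429 && attempts < maxRetries) {
            val delaySeconds = response.header("Retry-After")?.toLongOrNull() ?: 1L
            response.close() // release the connection before retrying
            Thread.sleep(delaySeconds * 1000) // interceptors run off the main thread for async calls
            response = chain.proceed(chain.request())
            attempts++
        }
        return response
    }
}

// Registration:
//   OkHttpClient.Builder().addInterceptor(RetryAfterInterceptor()).build()
```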
  • Senior — knows the app talks to an API.
  • Staff — describes CDN, gateway, and BFF layers and their purpose.
  • Principal — designs the mobile API platform, including BFF, versioning, and observability across layers.

Cost Awareness

Principal engineers think in cost. Every architectural choice has a dollar cost at scale.

Decision            | Cheaper Option                                      | More Expensive Option                                        | Cost Driver
Real-time transport | SSE — stateless HTTP, scales with standard infra    | WebSocket — requires sticky sessions or a connection broker | Server connection state
Data format         | Protobuf — 3–10x smaller payload                    | JSON — verbose                                               | Egress bandwidth at 50M DAU
Update delivery     | Push (FCM) — server pushes only on change           | Polling — client hits server every N seconds regardless     | Origin server compute + DB reads
Caching             | CDN edge cache — serve from edge, zero origin cost  | No cache — every request hits origin                         | Origin compute + DB cost
Image storage       | WebP at CDN — compressed, edge-served               | Original PNG served from origin                              | Storage + egress
Search              | Client-side filter on cached list                   | Server search on every keystroke                             | Server compute + DB query cost
