Chapter 9

Quality & Reliability

Testing · Observability · Feature Flags · CI/CD · A/B Testing · Push Notifications

Testing Strategy

A balanced testing strategy for reliable mobile apps.

Test UseCases independently — zero Android deps means pure JUnit; no instrumentation needed
Fake repositories, not mocks — fakes are more maintainable; mocks couple tests to implementation
Turbine — library for testing Kotlin Flow; replaces runBlocking + toList() patterns
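
The first two points can be illustrated with a minimal sketch. All names here (`Article`, `ArticleRepository`, `GetPublishedArticlesUseCase`, `FakeArticleRepository`) are hypothetical and exist only for illustration. The use case depends on an interface, so a test can hand it an in-memory fake and assert on the returned data rather than on which calls were made.

```kotlin
// Hypothetical domain types for illustration; none of these names come from a real library.
data class Article(val id: String, val title: String, val isDraft: Boolean)

interface ArticleRepository {
    fun getArticles(): List<Article>
}

// The use case has no Android imports, so it can run in a plain JVM unit test.
class GetPublishedArticlesUseCase(private val repository: ArticleRepository) {
    operator fun invoke(): List<Article> =
        repository.getArticles()
            .filterNot { it.isDraft }
            .sortedBy { it.title }
}

// A fake is a small working implementation backed by in-memory state.
// Tests set up data and assert on results instead of verifying interactions.
class FakeArticleRepository(initial: List<Article> = emptyList()) : ArticleRepository {
    private val articles = initial.toMutableList()

    fun add(article: Article) {
        articles += article
    }

    override fun getArticles(): List<Article> = articles.toList()
}
```

In a real project the assertions would sit in a JUnit test method. Because the fake only implements the interface, the use case can change how it calls the repository internally without breaking the test.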

Test Type	Scope	Tools	Target share / coverage
Unit	Single class/function in isolation	JUnit 5, MockK, Turbine (Flow testing)	70%+
Integration	Multiple classes together; real Room DB	Robolectric, in-memory Room	20%
UI / E2E	Full screen or user journey	Espresso, Compose UI Tests	~10%
Screenshot	Visual regression detection	Paparazzi (offline), Shot	Key screens
Performance	Startup time, frame rate regressions	Macrobenchmark, Benchmark library	Critical paths

JUnit 5 — Modern testing framework for JVM. Parameterized tests, extensions, better assertions.
MockK — Kotlin-first mocking library. Coroutine support, relaxed mocks. Preferred over Mockito for Kotlin.
Turbine — Flow testing library by CashApp. Clean API for asserting emissions, errors, completion.
Espresso — Android UI testing framework. Synchronizes with UI thread. Standard for instrumented tests.
Compose UI Test — Testing library for Jetpack Compose. Semantic matchers, test rules, screenshot testing.
Paparazzi — Snapshot testing without emulator. Fast, runs on JVM. Good for design system validation.
Macrobenchmark — Jetpack library for startup/scroll performance testing. Integrates with CI.

Observability

Monitoring and understanding production behavior.

Crashlytics / Sentry — crash reporting with breadcrumbs; group similar crashes; alert on spikes
Firebase Performance / Datadog RUM — network latency, screen rendering time, custom traces
Structured logging — include severity, requestId, userId (hashed), feature name in every log line
Distributed tracing — attach mobile request ID to API calls; correlate with backend spans in Datadog/Jaeger
Error budgets — define an acceptable error rate per feature (e.g. a 99.9% crash-free session target leaves a 0.1% budget); alert when 50% of the budget is consumed
ANR rate — track separately from crash rate; Play Console flags apps with elevated ANR rates through Android vitals. The published bad-behavior threshold for user-perceived ANR rate is 0.47% of daily active users overall (8% on a single device model), so teams typically target well below 0.5%.
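
The error-budget arithmetic can be sketched as follows. This is an illustrative calculation, not a feature of any monitoring product: `SessionStats`, `ErrorBudget`, and the 50% alert level are assumptions chosen to match the example figures in the list above.

```kotlin
// Hypothetical types for illustration; the default alert level mirrors the 50% figure above.
data class SessionStats(val totalSessions: Long, val crashedSessions: Long)

class ErrorBudget(private val allowedErrorRate: Double) {
    init {
        require(allowedErrorRate > 0.0 && allowedErrorRate < 1.0) {
            "allowedErrorRate must be in (0, 1)"
        }
    }

    // Fraction of the budget consumed: observed error rate divided by allowed error rate.
    // A value of 1.0 or more means the budget is exhausted.
    fun consumed(stats: SessionStats): Double {
        if (stats.totalSessions == 0L) return 0.0
        val observedErrorRate = stats.crashedSessions.toDouble() / stats.totalSessions
        return observedErrorRate / allowedErrorRate
    }

    fun shouldAlert(stats: SessionStats, alertAt: Double = 0.5): Boolean =
        consumed(stats) >= alertAt
}
```

With an allowed error rate of 0.001 (a 99.9% crash-free target), 600 crashed sessions out of 1,000,000 consume roughly 60% of the budget, which crosses the 50% alert level.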

Feature Flags

Controlling feature rollout and risk management.

Firebase Remote Config — free; integrates with A/B Testing; suitable for most teams
LaunchDarkly — enterprise-grade; user targeting, percentage rollouts, flag dependencies
Gradual rollout — 1% → 5% → 10% → 25% → 50% → 100%; monitor metrics at each stage
Kill switch — disable any feature server-side without a release; critical for high-risk launches
Flag hygiene — remove flags within 2 sprints of full rollout; stale flags = hidden tech debt
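
A percentage rollout with a kill switch might be evaluated on the client roughly as below. This is a sketch under assumptions: `FlagConfig`, `rolloutBucket`, and `isFeatureEnabled` are hypothetical names, and in practice a provider such as Remote Config or LaunchDarkly would supply the values and usually perform the evaluation itself.

```kotlin
// Hypothetical flag model; a real flag provider would deliver these values from the server.
data class FlagConfig(
    val killSwitch: Boolean,   // true forces the feature off for everyone
    val rolloutPercent: Int    // intended range 0..100
)

// Stable bucket in 0..99 derived from the flag key and user ID.
// String.hashCode() is specified by the JVM, so the result is the same on every run.
fun rolloutBucket(userId: String, flagKey: String): Int =
    Math.floorMod("$flagKey:$userId".hashCode(), 100)

fun isFeatureEnabled(userId: String, flagKey: String, config: FlagConfig): Boolean {
    // The kill switch wins over any rollout percentage.
    if (config.killSwitch) return false
    return rolloutBucket(userId, flagKey) < config.rolloutPercent.coerceIn(0, 100)
}
```

Because a user's bucket never changes, raising the percentage only adds users: anyone enabled at 5% stays enabled at 10% and beyond, which keeps metrics comparable between stages.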

Firebase Remote Config — Free feature flags with A/B testing. Integrates with Firebase Analytics. Good for most teams.
LaunchDarkly — Enterprise feature flag platform. User targeting, flag dependencies, audit logs. Production-grade for large teams.
Statsig — Feature flags with built-in experimentation and product analytics. Growing alternative to LaunchDarkly.

A/B Testing

Meta runs thousands of simultaneous experiments. L7+ engineers are expected to understand how assignment, exposure logging, and guard rails work at that scale.

User bucketing — hash(userId + experimentId) % 100 for consistent, stable assignment
Server-side assignment — bucketing done on server; client receives assigned variant; prevents client-side manipulation
Experiment logging — log exposure event the moment user sees the variant (not on app launch)
Holdout groups — reserve 1–5% of users excluded from all experiments to measure aggregate experiment impact
Mutual exclusion — ensure user is not simultaneously in conflicting experiments (e.g. two experiments modifying the same button)
Guard rails — define metrics that must not regress (crash rate, payment success rate) regardless of experiment outcome
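
The bucketing, holdout, and exposure-logging points can be sketched together. This is a simplified illustration, not Meta's system: `bucket`, `Experiment`, `ExperimentAssigner`, and `ExposureLogger` are hypothetical names, SHA-256 is one reasonable choice of hash, and in production the assignment would run on the server as described above.

```kotlin
import java.security.MessageDigest

// Maps (userId, salt) to a stable bucket in 0..99 using SHA-256.
fun bucket(userId: String, salt: String): Int {
    val digest = MessageDigest.getInstance("SHA-256")
        .digest("$userId:$salt".toByteArray(Charsets.UTF_8))
    // Fold the first four bytes into an Int, then reduce to 0..99.
    val value = digest.take(4).fold(0) { acc, b -> (acc shl 8) or (b.toInt() and 0xFF) }
    return Math.floorMod(value, 100)
}

enum class Variant { CONTROL, TREATMENT }

// Hypothetical experiment definition; field names are illustrative.
data class Experiment(val id: String, val treatmentPercent: Int)

class ExperimentAssigner(private val holdoutPercent: Int) {
    // Returns null for users in the global holdout, who take part in no experiments.
    fun assign(userId: String, experiment: Experiment): Variant? {
        if (bucket(userId, "global-holdout") < holdoutPercent) return null
        return if (bucket(userId, experiment.id) < experiment.treatmentPercent) {
            Variant.TREATMENT
        } else {
            Variant.CONTROL
        }
    }
}

// Records an exposure only the first time a user actually sees a variant.
class ExposureLogger {
    private val seen = mutableSetOf<String>()
    val events = mutableListOf<String>()

    fun onVariantShown(userId: String, experiment: Experiment, variant: Variant) {
        if (seen.add("$userId|${experiment.id}")) {
            events += "exposure user=$userId experiment=${experiment.id} variant=$variant"
        }
    }
}
```

Salting the hash with the experiment ID keeps assignments independent across experiments, while the fixed holdout salt keeps the same users out of every experiment. `onVariantShown` is meant to be called from the screen that renders the variant, not at app launch.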

CI/CD

Safe and reliable release processes.

Pipeline — every PR: lint → unit tests → instrumented tests (emulator) → build
Release track — Internal (team) → Closed Testing (QA) → Open Testing (beta) → Production
Staged rollout — release to 1% → 5% → 25% → 100% in Play Console; halt on metric regression
Modularization + build caching — Gradle remote cache + parallel compilation reduces CI time
Baseline Profiles in CI — regenerate and commit profiles on every release build
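
The "halt on metric regression" rule for a staged rollout can be expressed as a small decision function that a release script might call between stages. This is a sketch only: `ReleaseHealth`, `RolloutDecision`, `decideNextStep`, and both default thresholds are assumptions for illustration, and the function does not call the Play Console or any real API.

```kotlin
// Stages mirror the staged-rollout percentages listed above.
val rolloutStages = listOf(1, 5, 25, 100)

sealed class RolloutDecision {
    data class Advance(val toPercent: Int) : RolloutDecision()
    object Halt : RolloutDecision()
    object Complete : RolloutDecision()
}

// Hypothetical health snapshot for the build currently being rolled out.
data class ReleaseHealth(val crashFreeRate: Double, val anrRate: Double)

fun decideNextStep(
    currentPercent: Int,
    health: ReleaseHealth,
    minCrashFreeRate: Double = 0.995, // assumed threshold, tune per app
    maxAnrRate: Double = 0.005        // assumed threshold, tune per app
): RolloutDecision {
    // Any regression stops the rollout before more users receive the build.
    val regressed = health.crashFreeRate < minCrashFreeRate || health.anrRate > maxAnrRate
    if (regressed) return RolloutDecision.Halt
    val next = rolloutStages.firstOrNull { it > currentPercent }
    return if (next == null) RolloutDecision.Complete else RolloutDecision.Advance(next)
}
```

Returning a decision object rather than acting directly keeps the rule testable on the JVM; the pipeline step that consumes it can then promote or pause the release.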

Push Notifications

Effective push notification implementation.

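FCM high-priority — can wake the device from Doze; reserve for time-sensitive, user-visible alerts (messages, calls), since FCM may deprioritize high-priority messages that do not result in a notification
Data messages (silent push) — trigger background sync without a visible notification
Notification channels — required on Android 8.0+ (API 26); group by type; users control settings per channel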
Deep links from notifications — explicit Intent + TaskStackBuilder to reconstruct correct back stack
Push vs Pull — push is battery-efficient; pull is simpler; hybrid = push for trigger + pull for data
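
The hybrid pattern (push for trigger, pull for data) can be sketched without Android dependencies. The payload is modeled as a `Map<String, String>`, which is the shape FCM exposes through `RemoteMessage.getData()`. Everything else is hypothetical: `ConversationApi`, `PushSyncHandler`, and the `type` and `conversation_id` keys are illustrative names, not part of FCM.

```kotlin
// Hypothetical backend client; a real app would make a network call here.
interface ConversationApi {
    fun fetchMessages(conversationId: String): List<String>
}

// The push carries only a small hint; the client then pulls the full data from the API.
class PushSyncHandler(private val api: ConversationApi) {
    private val cache = mutableMapOf<String, List<String>>()

    // Returns true when the push triggered a pull; unknown or malformed payloads are ignored.
    fun onDataMessage(data: Map<String, String>): Boolean {
        if (data["type"] != "new_message") return false
        val conversationId = data["conversation_id"] ?: return false
        cache[conversationId] = api.fetchMessages(conversationId)
        return true
    }

    fun cachedMessages(conversationId: String): List<String> =
        cache[conversationId].orEmpty()
}
```

Keeping the payload to an identifier avoids size limits and stale content: the pull always returns the server's current state, even if pushes arrive late or out of order.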

Firebase Cloud Messaging (FCM) — Google's push notification service. Free, reliable, integrates with Firebase Analytics.
Firebase Crashlytics — Crash reporting with AI-powered insights. Tracks ANRs, non-fatal errors. Standard for Android.
Firebase Analytics — App analytics with user properties, events, funnels. Foundation for A/B testing and Remote Config.