Chapter 9
Quality & Reliability
Testing · Observability · Feature Flags · A/B Testing · CI/CD · Push Notifications
Testing Strategy
A balanced testing strategy for reliable mobile apps.
- Test UseCases independently — zero Android deps means pure JUnit; no instrumentation needed
- Fake repositories, not mocks — fakes are more maintainable; mocks couple tests to implementation
- Turbine — library for testing Kotlin Flow; replaces runBlocking + toList() patterns
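To make the "fakes, not mocks" advice concrete, here is a minimal sketch that assumes a hypothetical `UserRepository` and `GetActiveUsersUseCase`. The fake is a small working in-memory implementation, so the test asserts on the result rather than on which methods were called. The test is written as a plain function so the sketch runs without dependencies. In a real project it would be a JUnit 5 `@Test` method.

```kotlin
// Hypothetical domain types for illustration only.
data class User(val id: String, val name: String, val isActive: Boolean)

interface UserRepository {
    fun getUsers(): List<User>
}

// Pure Kotlin use case: no Android imports, so it can be tested on the plain JVM.
class GetActiveUsersUseCase(private val repository: UserRepository) {
    operator fun invoke(): List<User> =
        repository.getUsers().filter { it.isActive }.sortedBy { it.name }
}

// Fake: a real (but in-memory) implementation whose state the test controls directly.
class FakeUserRepository(initial: List<User> = emptyList()) : UserRepository {
    private val users = initial.toMutableList()

    fun add(user: User) {
        users += user
    }

    override fun getUsers(): List<User> = users.toList()
}

// Would be annotated with @Test under JUnit 5; asserts on output, not on interactions.
fun getActiveUsers_returnsOnlyActiveUsersSortedByName() {
    val repo = FakeUserRepository(
        listOf(
            User("1", "Zoe", isActive = true),
            User("2", "Adam", isActive = false),
            User("3", "Bea", isActive = true),
        )
    )
    val useCase = GetActiveUsersUseCase(repo)
    check(useCase().map { it.name } == listOf("Bea", "Zoe"))

    // Changing the fake's state is a plain method call, with no stubbing setup.
    repo.add(User("4", "Cal", isActive = true))
    check(useCase().map { it.name } == listOf("Bea", "Cal", "Zoe"))
}
```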
| Test Type | Scope | Tools | Target % |
|---|---|---|---|
| Unit | Single class/function in isolation | JUnit 5, MockK, Turbine (Flow testing) | 70%+ |
| Integration | Multiple classes together; real Room DB | Robolectric, in-memory Room | 20% |
| UI / E2E | Full screen or user journey | Espresso, Compose UI Tests | ~10% |
| Screenshot | Visual regression detection | Paparazzi (JVM, no emulator), Shot | Key screens |
| Performance | Startup time, frame rate regressions | Macrobenchmark, Benchmark library | Critical paths |
Recommended Libraries
- JUnit 5 — Modern testing framework for JVM. Parameterized tests, extensions, better assertions.
- MockK — Kotlin-first mocking library. Coroutine support, relaxed mocks. Preferred over Mockito for Kotlin.
- Turbine — Flow testing library by CashApp. Clean API for asserting emissions, errors, completion.
- Espresso — Android UI testing framework. Synchronizes with UI thread. Standard for instrumented tests.
- Compose UI Test — Testing library for Jetpack Compose. Semantics-based finders and assertions, test rules (createComposeRule), node image capture via captureToImage().
- Paparazzi — Snapshot testing without emulator. Fast, runs on JVM. Good for design system validation.
- Macrobenchmark — Jetpack library for startup/scroll performance testing. Integrates with CI.
Observability
Monitoring and understanding production behavior.
- Crashlytics / Sentry — crash reporting with breadcrumbs; group similar crashes; alert on spikes
- Firebase Performance / Datadog RUM — network latency, screen rendering time, custom traces
- Structured logging — include severity, requestId, userId (hashed), feature name in every log line
- Distributed tracing — attach mobile request ID to API calls; correlate with backend spans in Datadog/Jaeger
- Error budgets — define an acceptable error rate per feature (e.g. a crash rate below 0.1% of sessions, i.e. at least 99.9% crash-free sessions); alert when 50% of the budget is consumed
- ANR rate — track separately from crash rate. Android vitals in Play Console treats a user-perceived ANR rate above 0.47% overall (or above 8% on any single device model) as bad behavior, which can reduce the app's visibility on Google Play; teams typically aim well below that threshold.
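A minimal sketch of a structured log record and an error-budget check. It assumes a hypothetical salt for hashing user IDs with SHA-256 and a hypothetical `X-Request-ID` header (a common convention, not a standard) for correlating mobile requests with backend spans. Field names and thresholds are illustrative and do not follow any vendor's schema.

```kotlin
import java.security.MessageDigest
import java.util.UUID

enum class Severity { DEBUG, INFO, WARN, ERROR }

// Hypothetical salt: hashing lets logs be grouped per user without exposing the raw ID.
val USER_HASH_SALT = "hypothetical-app-salt"

fun hashUserId(userId: String): String =
    MessageDigest.getInstance("SHA-256")
        .digest((USER_HASH_SALT + userId).toByteArray(Charsets.UTF_8))
        .take(8) // first 8 bytes -> 16 hex characters, enough for grouping
        .joinToString("") { (it.toInt() and 0xff).toString(16).padStart(2, '0') }

// One request ID per API call: sent as a header and repeated in every log line for that call.
fun newRequestId(): String = UUID.randomUUID().toString()

// Hypothetical header name; the backend tracer must be configured to read the same header.
fun requestHeaders(requestId: String): Map<String, String> = mapOf("X-Request-ID" to requestId)

// Every log line carries the same fields so the log pipeline can index and filter on them.
fun structuredLog(
    severity: Severity,
    feature: String,
    requestId: String,
    userId: String,
    message: String,
): Map<String, String> = linkedMapOf(
    "severity" to severity.name,
    "feature" to feature,
    "requestId" to requestId,
    "userHash" to hashUserId(userId),
    "message" to message,
)

// Fraction of the error budget used in a window (1.0 = budget exhausted).
// Example: an allowed crash rate of 0.1% over 10,000 sessions permits 10 crashing sessions;
// 5 observed gives 0.5, i.e. 50% consumed.
fun errorBudgetConsumed(crashedSessions: Long, totalSessions: Long, allowedCrashRate: Double): Double {
    require(totalSessions > 0 && allowedCrashRate > 0.0)
    return crashedSessions / (allowedCrashRate * totalSessions)
}

fun shouldAlert(consumed: Double, alertThreshold: Double = 0.5): Boolean = consumed >= alertThreshold
```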
Feature Flags
Controlling feature rollout and risk management.
- Firebase Remote Config — free; integrates with A/B Testing; suitable for most teams
- LaunchDarkly — enterprise-grade; user targeting, percentage rollouts, flag dependencies
- Gradual rollout — 1% → 5% → 10% → 25% → 50% → 100%; monitor metrics at each stage
- Kill switch — disable any feature server-side without a release; critical for high-risk launches
- Flag hygiene — remove flags within 2 sprints of full rollout; stale flags = hidden tech debt
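The gradual-rollout and kill-switch ideas can be sketched with a stable hash. This is not the API of Firebase Remote Config or LaunchDarkly, which provide their own percentage targeting. `FlagConfig` and its fields are hypothetical stand-ins for whatever the flag service delivers.

```kotlin
import java.security.MessageDigest

// Hypothetical flag definition as it might be delivered by a remote config service.
data class FlagConfig(
    val key: String,
    val rolloutPercent: Int,          // raised stage by stage: 1, 5, 10, 25, 50, 100
    val killSwitch: Boolean = false,  // true turns the feature off for everyone, no release needed
)

// Stable bucket in 0..99 for a (flag, user) pair: the same inputs always give the same bucket.
fun rolloutBucket(flagKey: String, userId: String): Int {
    val bytes = MessageDigest.getInstance("SHA-256")
        .digest("$flagKey:$userId".toByteArray(Charsets.UTF_8))
    // First 4 bytes read as an unsigned 32-bit value (held in a Long), then reduced to 0..99.
    val value = bytes.take(4).fold(0L) { acc, b -> (acc shl 8) or (b.toLong() and 0xffL) }
    return (value % 100).toInt()
}

fun isEnabled(config: FlagConfig, userId: String): Boolean {
    if (config.killSwitch) return false // the kill switch overrides any rollout percentage
    return rolloutBucket(config.key, userId) < config.rolloutPercent.coerceIn(0, 100)
}
```

Because a user is enabled exactly when their bucket is below the percentage, everyone enabled at 5% stays enabled at 25%. Raising the percentage only adds users, so metrics observed at each stage remain comparable.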
Recommended Libraries
- Firebase Remote Config — Free feature flags with A/B testing. Integrates with Firebase Analytics. Good for most teams.
- LaunchDarkly — Enterprise feature flag platform. User targeting, flag dependencies, audit logs. Production-grade for large teams.
- Statsig — Feature flags with built-in experimentation and product analytics. Growing alternative to LaunchDarkly.
A/B Testing Architecture at Scale
Meta runs thousands of simultaneous experiments. L7+ engineers are expected to understand this.
- User bucketing — hash(userId + experimentId) % 100 for consistent, stable assignment
- Server-side assignment — bucketing done on server; client receives assigned variant; prevents client-side manipulation
- Experiment logging — log exposure event the moment user sees the variant (not on app launch)
- Holdout groups — reserve 1–5% of users excluded from all experiments to measure aggregate experiment impact
- Mutual exclusion — ensure user is not simultaneously in conflicting experiments (e.g. two experiments modifying the same button)
- Guard rails — define metrics that must not regress (crash rate, payment success rate) regardless of experiment outcome
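The bucketing, holdout, mutual-exclusion and exposure-logging points can be combined in one sketch. The layer model (experiments in one layer own disjoint bucket ranges), the 2% holdout, and all class names are hypothetical assumptions, not a description of Meta's system. A separator is added to the hashed string so that `"ab" + "c"` and `"a" + "bc"` do not collide.

```kotlin
import java.security.MessageDigest

// Stable bucket in 0 until buckets; the salt keeps different layers and experiments independent.
fun bucket(salt: String, userId: String, buckets: Int = 100): Int {
    val bytes = MessageDigest.getInstance("SHA-256")
        .digest("$salt:$userId".toByteArray(Charsets.UTF_8))
    val value = bytes.take(4).fold(0L) { acc, b -> (acc shl 8) or (b.toLong() and 0xffL) }
    return (value % buckets).toInt()
}

// Hypothetical model: experiments in one layer own disjoint bucket ranges,
// so a user lands in at most one of them (mutual exclusion).
data class Experiment(val id: String, val bucketRange: IntRange, val variants: List<String>)
data class Layer(val id: String, val experiments: List<Experiment>)
data class Assignment(val experimentId: String, val variant: String)

val HOLDOUT_PERCENT = 2 // hypothetical global holdout: these users see no experiments

fun isInGlobalHoldout(userId: String): Boolean = bucket("global-holdout", userId) < HOLDOUT_PERCENT

// Runs on the server; the client only receives the resulting Assignment.
fun assign(layer: Layer, userId: String): Assignment? {
    if (isInGlobalHoldout(userId)) return null
    val layerBucket = bucket(layer.id, userId)
    val experiment = layer.experiments.firstOrNull { layerBucket in it.bucketRange } ?: return null
    val variant = experiment.variants[bucket(experiment.id, userId, experiment.variants.size)]
    return Assignment(experiment.id, variant)
}

// Client side: log exposure once, when the variant is actually shown, not at app launch.
class ExposureLogger {
    val events = mutableListOf<String>()
    private val alreadyLogged = mutableSetOf<String>()

    fun onVariantShown(userId: String, assignment: Assignment) {
        if (alreadyLogged.add("$userId:${assignment.experimentId}")) {
            events += "exposure exp=${assignment.experimentId} variant=${assignment.variant}"
        }
    }
}
```

Salting each hash with the layer or experiment ID keeps assignments independent. A user's bucket in one layer says nothing about their bucket in another, or about their variant within an experiment.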
CI/CD & Release Strategy
Safe and reliable release processes.
- Pipeline — every PR: lint → unit tests → build (app + test APKs) → instrumented tests (emulator)
- Release track — Internal (team) → Closed Testing (QA) → Open Testing (beta) → Production
- Staged rollout — release to 1% → 5% → 25% → 100% in Play Console; halt on metric regression
- Modularization + build caching — Gradle remote cache + parallel compilation reduces CI time
- Baseline Profiles in CI — regenerate and commit profiles on every release build
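The "halt on metric regression" rule for staged rollouts can be written as a small gate that a release script might evaluate before raising the percentage in Play Console. This sketch rests on several assumptions: the `StageHealth` inputs, the 0.2-percentage-point threshold and the stage list are all hypothetical. The code does not call any Play Console or Play Developer API.

```kotlin
// Hypothetical health snapshot for the current stage, e.g. exported from crash reporting.
data class StageHealth(val crashFreeSessions: Double, val baselineCrashFreeSessions: Double)

val ROLLOUT_STAGES = listOf(1, 5, 25, 100) // percent of users, matching the staged rollout above

sealed interface RolloutDecision {
    data class Advance(val toPercent: Int) : RolloutDecision
    object Halt : RolloutDecision
    object Complete : RolloutDecision
}

// Advance one stage only if crash-free sessions have not dropped more than maxDrop below the
// previous release's baseline (0.002 = 0.2 percentage points, a hypothetical threshold).
fun nextDecision(currentPercent: Int, health: StageHealth, maxDrop: Double = 0.002): RolloutDecision {
    val stage = ROLLOUT_STAGES.indexOf(currentPercent)
    require(stage >= 0) { "Unknown rollout stage: $currentPercent%" }
    if (health.baselineCrashFreeSessions - health.crashFreeSessions > maxDrop) {
        return RolloutDecision.Halt
    }
    return when (stage) {
        ROLLOUT_STAGES.lastIndex -> RolloutDecision.Complete
        else -> RolloutDecision.Advance(ROLLOUT_STAGES[stage + 1])
    }
}
```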
Push Notifications
Effective push notification implementation.
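- FCM high-priority — can wake the device from Doze for immediate delivery; reserve for time-sensitive, user-visible alerts (messages, calls), since high-priority messages that do not lead to a visible notification may be deprioritized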
- Data messages (silent push) — triggers background sync without visible notification
- Notification channels — required on Android 8.0 (API 26)+, where a notification posted without a channel is not shown; group by type; users control settings per channel
- Deep links from notifications — explicit Intent + TaskStackBuilder to reconstruct correct back stack
- Push vs Pull — push is battery-efficient; pull is simpler; hybrid = push for trigger + pull for data
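The hybrid push/pull pattern can be sketched in plain Kotlin. The data message carries only a small trigger (here a hypothetical collection name and server version), and the app pulls the actual data from its own API. `SyncTrigger`, `SyncApi` and `SyncCoordinator` are hypothetical names. On Android the trigger would be parsed in `FirebaseMessagingService.onMessageReceived`, and the pull would usually be handed to WorkManager rather than run inline.

```kotlin
// Hypothetical payload of a data (silent) message: a trigger only, not the data itself.
data class SyncTrigger(val collection: String, val serverVersion: Long)

// Hypothetical API the app pulls from after receiving a trigger.
interface SyncApi {
    fun fetchSince(collection: String, sinceVersion: Long): List<String>
}

class SyncCoordinator(private val api: SyncApi) {
    private val localVersions = mutableMapOf<String, Long>()
    val store = mutableMapOf<String, MutableList<String>>()

    // Returns true if a pull happened. Stale or duplicate triggers are ignored,
    // which keeps the handler idempotent.
    fun onPushTrigger(trigger: SyncTrigger): Boolean {
        val local = localVersions[trigger.collection] ?: 0L
        if (trigger.serverVersion <= local) return false
        val items = api.fetchSince(trigger.collection, local)
        store.getOrPut(trigger.collection) { mutableListOf() }.addAll(items)
        localVersions[trigger.collection] = trigger.serverVersion
        return true
    }
}
```

Comparing versions before pulling matters because push delivery is best-effort, so duplicate or out-of-order triggers should not cause redundant network calls.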
Recommended Libraries
- Firebase Cloud Messaging (FCM) — Google's push notification service. Free, reliable, integrates with Firebase Analytics.
- Firebase Crashlytics — Crash reporting with AI-powered insights. Tracks ANRs, non-fatal errors. Standard for Android.
- Firebase Analytics — App analytics with user properties, events, funnels. Foundation for A/B testing and Remote Config.