Critical · Staff · 2022

OkCredit Android

The ANR Epidemic

OkCredit's ANR rate was 3x above Google's bad-behaviour threshold. Cold start was failing 4% of sessions. Seven root causes. All main thread. All fixable.

The Incident

OkCredit, a B2B bookkeeping app popular with small businesses in India, found itself in Google Play's 'bad behaviour' zone. Its ANR rate was well above the 0.47% threshold, and its cold start failure rate was 4% against a target of under 1.2%. The app was frequently started in the background to handle offline service calls, and at P99 each of those background starts was 2.3x slower than a foreground launch. Systematic main-thread profiling identified seven independent root causes.

Evidence from the Scene

  • ANR rate flagged by Google Play as above the 0.47% bad-behaviour threshold
  • Cold start failure rate: 4% (target: under 1.2%)
  • Background-woken starts (from BroadcastReceiver / Service) were 2.3x slower than foreground at P99
  • Firebase Performance Monitoring's ContentProvider appeared first in startup traces
  • WorkManager's ContentProvider initialized even on cold starts that had nothing to do with background sync
  • SharedPreferences.apply() calls were found inside onPause() and onStop() lifecycle callbacks
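The 2.3x figure compares P99 startup times, not averages, so it describes the slow tail of background-woken starts. Below is a minimal sketch of computing such a ratio from raw startup samples. It assumes the nearest-rank percentile definition (Android Vitals or OkCredit's own tooling may define percentiles differently), and `StartupPercentiles` is a hypothetical helper, not part of any library.

```java
import java.util.Arrays;

// Hypothetical helper (not a library API): percentiles over startup durations in ms.
final class StartupPercentiles {
    private StartupPercentiles() {}

    // Nearest-rank percentile: the smallest sample such that at least p% of
    // all samples are less than or equal to it.
    static long percentile(long[] samplesMs, double p) {
        if (samplesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        long[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
        return sorted[rank - 1];
    }

    // How many times slower background-woken starts are than foreground
    // starts at P99 (a result of 2.3 would mean "2.3x slower").
    static double p99Ratio(long[] backgroundMs, long[] foregroundMs) {
        return (double) percentile(backgroundMs, 99) / percentile(foregroundMs, 99);
    }
}
```

Feed it one array of background-woken start durations and one of foreground start durations. Because only the tail is compared, a regression that affects a minority of starts can show up here even when median startup looks unchanged.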

The Suspects

3 of these are the real root causes. The others are plausible-sounding distractors.

Firebase Performance Monitoring ContentProvider adding latency to every app start

WorkManager ContentProvider initializing eagerly even when no background work was scheduled

SharedPreferences.apply() in lifecycle callbacks causing ANRs during system pressure

Room database migration running on the main thread on first launch after update

Glide AppGlideModule generating code that runs on Application.onCreate()

The Verdict

Real Root Causes

  • Firebase Performance Monitoring ContentProvider adding latency to every app start

    Firebase Performance registers a ContentProvider. Android creates the app's ContentProviders before calling Application.onCreate() on every process start, including background wakes for BroadcastReceivers. Removing Firebase Performance and replacing it with in-house instrumentation took this cost off the startup critical path.

  • WorkManager ContentProvider initializing eagerly even when no background work was scheduled

    WorkManager's default initialization uses a ContentProvider that runs on every app start. Switching to On-Demand Initialization eliminated it from cold start entirely. This meant removing the default initializer from the manifest so that WorkManager is initialized only when the app first needs it.

  • SharedPreferences.apply() in lifecycle callbacks causing ANRs during system pressure

    apply() returns immediately and writes to disk asynchronously. However, Android blocks the main thread until pending apply() writes finish at certain lifecycle transitions, such as an Activity stopping or a Service starting or stopping. Under memory pressure, this wait can exceed the ANR timeout. Moving writes to background threads (using commit() there) or to DataStore resolved the ANR spike.
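The blocking behaviour described in the SharedPreferences item can be modelled in plain Java. This is a simplified sketch, not Android source. The hypothetical `PendingWrites` class stands in for the framework's internal queue of apply() writes, and `waitToFinish()` stands in for the wait the framework performs at lifecycle transitions. The model shows why a slow apply() write turns into main-thread time. It also shows why a write issued on a separate background thread, which is the shape of the commit()-on-a-worker fix, is not part of that wait.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical, simplified model (not Android source) of apply() writes that
// the framework waits on at lifecycle transitions.
final class PendingWrites {
    private final ExecutorService diskThread = Executors.newSingleThreadExecutor();
    private final List<Future<?>> pending = new ArrayList<>();

    // Like apply(): returns immediately, but the write is tracked as pending.
    synchronized void apply(Runnable diskWrite) {
        pending.add(diskThread.submit(diskWrite));
    }

    // Like the lifecycle-transition wait: blocks the caller (the main thread,
    // in a real app) until every pending write has finished.
    // Returns how long the caller was blocked, in milliseconds.
    long waitToFinish() {
        long start = System.nanoTime();
        List<Future<?>> snapshot;
        synchronized (this) {
            snapshot = new ArrayList<>(pending);
            pending.clear();
        }
        for (Future<?> write : snapshot) {
            try {
                write.get();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                break;
            } catch (ExecutionException e) {
                throw new IllegalStateException("disk write failed", e.getCause());
            }
        }
        return (System.nanoTime() - start) / 1_000_000L;
    }

    void shutdown() {
        diskThread.shutdown();
    }
}
```

In the model, a write submitted to an unrelated executor never enters `pending`, so `waitToFinish()` returns almost immediately. The real fix has the same shape: commit() on a background thread does the disk I/O there instead of leaving it as a pending apply() write.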

Plausible But Wrong

  • Room database migration running on the main thread on first launch after update

    Room migrations on the main thread are a known ANR source. Here, though, the evidence points specifically to ContentProvider chains and SharedPreferences.apply(), not to database work.

  • Glide AppGlideModule generating code that runs on Application.onCreate()

    Glide's generated code adds some startup overhead but was not among the seven root causes identified in OkCredit's profiling. The ContentProvider chain was the dominant startup cost.

Summary

OkCredit's ANR and startup crisis came from seven independent main-thread violations. Each was fixable and none was catastrophic on its own, but together they pushed the app into Google Play's bad-behaviour zone. The biggest wins were removing the Firebase Performance SDK (replaced with custom instrumentation), switching WorkManager to On-Demand Initialization, and moving SharedPreferences writes off lifecycle callbacks. Post-fix: ANR rate dropped 60% to 0.03% (15x below the threshold), and cold start failure rate fell from 4% to 1.2%. The work was featured at Google I/O 2022 as a case study in ANR remediation at production scale.
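On-Demand Initialization replaces a cost paid on every process start with a cost paid only on first use. On a device, WorkManager's documented route has two steps. First, remove its default initializer from the merged manifest. Second, have the Application implement `Configuration.Provider`; `WorkManager.getInstance(context)` then initializes WorkManager on its first call. The plain-Java sketch below illustrates the same initialize-once-on-first-use pattern with a hypothetical `OnDemand<T>` wrapper. It is not WorkManager's API.

```java
import java.util.function.Supplier;

// Hypothetical wrapper (not a library API): builds an expensive component on
// first use instead of unconditionally at process start.
final class OnDemand<T> {
    private final Supplier<T> factory;
    private volatile T instance; // volatile makes double-checked locking safe

    OnDemand(Supplier<T> factory) {
        this.factory = factory;
    }

    boolean isInitialized() {
        return instance != null;
    }

    // First caller runs the factory; every later caller gets the same instance.
    T get() {
        T local = instance;
        if (local == null) {
            synchronized (this) {
                local = instance;
                if (local == null) {
                    local = factory.get();
                    if (local == null) {
                        throw new IllegalStateException("factory returned null");
                    }
                    instance = local;
                }
            }
        }
        return local;
    }
}
```

A cold start that never touches background sync never calls `get()`, so it never pays the initialization cost. That cost is exactly what the eager ContentProvider charged to every start.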

The Real Decision That Caused This

Integrating SDKs with ContentProviders without measuring their startup cost, and using SharedPreferences.apply() inside lifecycle callbacks without understanding Android's write-flush behaviour under system pressure.
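Measuring an SDK's startup cost before integrating it requires only lightweight span timing. Below is a minimal sketch of such a tracer; it assumes nothing about OkCredit's actual in-house instrumentation. The hypothetical `StartupTracer` records named spans with an injectable clock so it can be tested deterministically. On Android, start a span in `Application.attachBaseContext()` and end it at the top of `Application.onCreate()`. That span would cover the window in which the app's ContentProviders are created.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical in-house tracer (not a library API): times named startup spans.
final class StartupTracer {
    private final LongSupplier clockNanos; // injectable so tests can use a fake clock
    private final Map<String, Long> openSpans = new LinkedHashMap<>();
    private final Map<String, Long> durationsNanos = new LinkedHashMap<>();

    StartupTracer(LongSupplier clockNanos) {
        this.clockNanos = clockNanos;
    }

    StartupTracer() {
        this(System::nanoTime);
    }

    void begin(String name) {
        openSpans.put(name, clockNanos.getAsLong());
    }

    void end(String name) {
        Long start = openSpans.remove(name);
        if (start == null) {
            throw new IllegalStateException("span not started: " + name);
        }
        durationsNanos.put(name, clockNanos.getAsLong() - start);
    }

    long durationMillis(String name) {
        Long duration = durationsNanos.get(name);
        if (duration == null) {
            throw new IllegalStateException("span not finished: " + name);
        }
        return duration / 1_000_000L;
    }

    // Finished spans in completion order, for spotting what dominates startup.
    Map<String, Long> durationsNanos() {
        return new LinkedHashMap<>(durationsNanos);
    }
}
```

Suppose you compare the provider-window span with and without a candidate SDK, on both foreground and background-woken starts. That comparison shows what the SDK adds to every process start before it ships.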

Lesson Hint

Chapter 7 (Platform & Performance) covers ANR prevention, startup optimization, and StrictMode. Chapter 9 (Quality & Reliability) covers Android Vitals, ANR rate thresholds, and Play Store quality gates.
