critical · staff · 2026

WHOOP Android

The Infinite Suspend

WHOOP's app was burning users' phone batteries all night. Google Play flagged it. 15% of sessions had excessive wake locks. Two suspended coroutines were the culprits.

The Incident

In early 2026, Google Play's new Excessive Partial Wake Lock metric flagged WHOOP's app: 15% of user sessions exceeded 2 cumulative hours of CPU wake lock in a 24-hour period — triple the 5% threshold. The Play Store warning label was imminent. Two CoroutineWorker instances were the culprits, and both were doing the same thing: suspending indefinitely instead of returning when their preconditions weren't met.

Evidence from the Scene

  • Google Play Console: 15% of sessions flagging excessive partial wake lock (threshold: 5%)
  • Average WorkManager worker runtime: 35 seconds. P95: 2 minutes 27 seconds
  • Both offending workers were CoroutineWorkers that could suspend for minutes
  • One worker called StateFlow.filterNotNull().first(), which suspends until the flow emits a non-null value
  • The other worker waited for an event that was never emitted while the WHOOP strap was disconnected
  • Removing the strap from the charger before bed triggered both workers simultaneously

The Suspects

Two of these are the real root causes. The others are plausible-sounding distractors.

  • StateFlow.filterNotNull().first() suspending forever when the strap was disconnected (value permanently null)
  • Worker encoding 'wait for conditions to change' logic instead of 'check and exit' logic
  • No withTimeout() guard on the worker's suspending operations
  • PeriodicWorkRequest configured with too short an interval, scheduling workers every few minutes
  • PowerManager.WakeLock acquired manually and not released in a finally block

The Verdict

Real Root Causes

  • StateFlow.filterNotNull().first() suspending forever when the strap was disconnected (value permanently null)

    filterNotNull().first() suspends until the StateFlow emits a non-null value. When the WHOOP strap was disconnected, the sensor StateFlow's value was null and no new emission was pending, so the coroutine stayed suspended until WorkManager stopped the worker, holding a CPU wake lock the entire time. Fix: replayCache.firstOrNull() reads the current value without suspending, so the worker can see the null and exit in milliseconds. (On a StateFlow, the replay cache holds exactly the current value, so this is equivalent to reading .value.)

  • Worker encoding 'wait for conditions to change' logic instead of 'check and exit' logic

    Both workers were designed to wait inside their doWork() body for preconditions that might never be met. WorkManager workers should check their preconditions immediately and return Result.failure() or Result.retry() if not met — never suspend indefinitely waiting for state changes inside the worker body.
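The difference between the two reads can be reproduced outside Android. The following is a minimal sketch using only kotlinx.coroutines; `strapState`, `stillWaitingAfter`, and `snapshot` are hypothetical names introduced for illustration, not WHOOP's code, and a `String?` stands in for whatever the real sensor type is.

```kotlin
import kotlinx.coroutines.cancelAndJoin
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.delay
import kotlinx.coroutines.flow.MutableStateFlow
import kotlinx.coroutines.flow.StateFlow
import kotlinx.coroutines.flow.filterNotNull
import kotlinx.coroutines.flow.first
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking

// Wait-style read: starts filterNotNull().first() and reports whether it is
// still suspended after `probeMs` milliseconds.
suspend fun stillWaitingAfter(state: StateFlow<String?>, probeMs: Long): Boolean =
    coroutineScope {
        val waiting = launch { state.filterNotNull().first() }
        delay(probeMs)
        val suspended = waiting.isActive
        // Without this cancel, coroutineScope would itself wait for the child.
        waiting.cancelAndJoin()
        suspended
    }

// Check-style read: a StateFlow's replay cache holds its current value,
// so this returns immediately, null included.
fun snapshot(state: StateFlow<String?>): String? = state.replayCache.firstOrNull()

fun main() = runBlocking {
    // Hypothetical stand-in for the strap's sensor state; null means "disconnected".
    val strapState = MutableStateFlow<String?>(null)

    println("still waiting after 200 ms: ${stillWaitingAfter(strapState, 200)}")
    println("snapshot: ${snapshot(strapState)}")
}
```

With the value left at null, the first line reports that the wait-style read is still suspended, while the snapshot returns null straight away and leaves the decision to the caller.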

Plausible But Wrong

  • No withTimeout() guard on the worker's suspending operations

    A withTimeout() wrapper would have capped the damage, but it is a mitigation, not the fix: whenever the strap is disconnected, every run would still hold the wake lock for the full timeout before giving up. The real issue is the architectural pattern: workers must not suspend waiting for events. Fixing the pattern (check-then-exit) is the correct solution.

  • PeriodicWorkRequest configured with too short an interval, scheduling workers every few minutes

    Frequent scheduling would increase the number of wake events, which is a real battery concern, but it would not make any single run last long enough to produce the 2 minute 27 second P95 runtime observed here. The indefinite suspension is what drove the extreme wake lock duration.

  • PowerManager.WakeLock acquired manually and not released in a finally block

    Manual wake lock mismanagement is a classic battery drain cause — but WorkManager manages its own wake lock automatically. The issue here is WorkManager holding its internal wake lock while the worker's coroutine is suspended indefinitely.
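To see why a timeout only limits the damage, here is a small sketch of a bounded wait, again using plain kotlinx.coroutines. The `boundedWait` helper, the `strapState` flow, and the 500 ms budget are assumptions chosen for illustration; a real worker would likely use a much longer budget.

```kotlin
import kotlinx.coroutines.flow.MutableStateFlow
import kotlinx.coroutines.flow.StateFlow
import kotlinx.coroutines.flow.filterNotNull
import kotlinx.coroutines.flow.first
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withTimeoutOrNull
import kotlin.system.measureTimeMillis

// Mitigation only: the wait is bounded, but when no value ever arrives
// the caller still spends the entire timeout suspended.
suspend fun boundedWait(state: StateFlow<String?>, timeoutMs: Long): String? =
    withTimeoutOrNull(timeoutMs) { state.filterNotNull().first() }

fun main() = runBlocking {
    // Hypothetical stand-in for a disconnected strap: the value stays null.
    val strapState = MutableStateFlow<String?>(null)

    var result: String? = null
    val elapsedMs = measureTimeMillis { result = boundedWait(strapState, 500) }

    // result is null, and elapsedMs is roughly the full 500 ms budget.
    println("result=$result after about $elapsedMs ms")
}
```

In a worker, that elapsed time is time spent holding the wake lock for nothing, which is why the timeout caps the cost per run but does not remove it.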

Summary

WHOOP's two offending CoroutineWorkers shared the same anti-pattern: they encoded 'wait for state to change' logic inside the worker body instead of 'check state and exit'. When the WHOOP strap was disconnected, the StateFlow held a null value with no pending emission, so filterNotNull().first() suspended forever. The fix was one line: replace filterNotNull().first() with replayCache.firstOrNull() and return Result.failure() immediately if the result is null.

Results after the fix:

  • Average worker runtime: 35 seconds → 3 seconds
  • P95 worker runtime: 2 minutes 27 seconds → 4 seconds
  • Excessive wake lock sessions: 15% → under 1%

Published March 2026, the month Google began enforcing excessive wake lock as a Play Store quality gate.

The Real Decision That Caused This

Writing CoroutineWorkers that suspend waiting for external state instead of checking state synchronously and exiting — turning a WorkManager 'unit of work' into an indefinite event listener that holds a CPU wake lock.
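A check-and-exit worker might look roughly like the sketch below. It assumes an Android project with the androidx.work dependency. `StrapRepository`, `SensorReading`, `StrapSyncWorker`, and `upload` are hypothetical names invented for this example and do not reflect WHOOP's actual code.

```kotlin
import android.content.Context
import androidx.work.CoroutineWorker
import androidx.work.WorkerParameters
import kotlinx.coroutines.flow.MutableStateFlow
import kotlinx.coroutines.flow.StateFlow

// Hypothetical sensor payload; null in the flow means "strap disconnected".
data class SensorReading(val heartRate: Int)

// Hypothetical holder for the strap's latest reading.
object StrapRepository {
    private val _latestReading = MutableStateFlow<SensorReading?>(null)
    val latestReading: StateFlow<SensorReading?> = _latestReading
}

class StrapSyncWorker(
    appContext: Context,
    params: WorkerParameters,
) : CoroutineWorker(appContext, params) {

    override suspend fun doWork(): Result {
        // Check the precondition once, without suspending.
        // If the strap is disconnected, exit right away instead of waiting.
        val reading = StrapRepository.latestReading.replayCache.firstOrNull()
            ?: return Result.failure()

        upload(reading)
        return Result.success()
    }

    // Hypothetical placeholder for the real unit of work.
    private suspend fun upload(reading: SensorReading) {
        // ...
    }
}
```

Whether an unmet precondition should map to Result.failure() or Result.retry() depends on whether rescheduling is useful for that worker. Either way, doWork() returns promptly rather than listening for a state change.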

Lesson Hint

Chapter 6 (Concurrency) covers coroutine cancellation, StateFlow replay cache, and structured concurrency. Chapter 7 (Platform & Performance) covers WorkManager patterns, battery optimization, and Google Play's wake lock enforcement.
