A confounder is a hidden variable that drives both sides of a correlation. The textbook example: ice cream sales correlate with drownings. Neither causes the other. Summer drives both. If you forget to control for the season, you can confidently report a relationship that doesn't exist.
Consumer health data has a confounder most products ignore. It is called the menstrual cycle.
The mechanism
Over the course of any given month, a person who menstruates moves through hormonal states that change resting heart rate, body temperature, sleep architecture, glucose response, mood, appetite, hydration, perceived exertion, and inflammation markers. The shifts are not subtle. Resting heart rate rises by three to seven beats per minute in the luteal phase. Body temperature rises by about half a degree Celsius. Sleep efficiency drops measurably in the days before bleeding.
These shifts are biological reality, not noise. They are also systematic — the same person experiences them in the same direction each cycle, with timing predictable to within a day or two.
If you collect health data from a population that includes menstruating people, and you do not condition on cycle phase, every single one of those signals is contaminated. Not occasionally. Structurally.
What this does to a correlation
Suppose you are running a wellness study. You find that a particular intervention — a sleep app, a meal pattern, a meditation practice — correlates with improved heart-rate variability over four weeks. You publish.
The signal is real in the data. It may also be entirely an artifact of when in the cycle each participant happened to start the intervention. If the trial began in the follicular phase (when HRV trends higher anyway) and you measured outcomes in the next follicular phase, you have measured the cycle, not the intervention. A study that started a week earlier or later would produce a different number.
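The timing dependence is easy to demonstrate. Below is a toy simulation, not a claim about real physiology: it assumes a 28-day cycle and a sinusoidal cycle-driven HRV component, and the "intervention" does nothing at all. The estimated effect still flips sign depending on which cycle day the study starts.

```python
import math
import random

random.seed(0)

CYCLE = 28  # assumed average cycle length, in days

def hrv(day_in_cycle):
    """Toy HRV model: higher around the follicular phase, lower around
    the luteal phase. There is no intervention effect anywhere."""
    cycle_component = 5.0 * math.cos(2 * math.pi * day_in_cycle / CYCLE)
    return 50.0 + cycle_component + random.gauss(0, 1)

def naive_effect(start_day):
    """Mean HRV in the week after a 14-day null 'intervention' minus
    the mean in the week before it, for a given start day."""
    before = [hrv((start_day + d) % CYCLE) for d in range(-7, 0)]
    after = [hrv((start_day + 14 + d) % CYCLE) for d in range(7)]
    return sum(after) / 7 - sum(before) / 7

# Same null intervention, two start dates, opposite conclusions.
print(f"started day 0:  {naive_effect(0):+.2f}")
print(f"started day 14: {naive_effect(14):+.2f}")
```

With this model, a study launched in the early follicular phase reports a large negative "effect" and one launched two weeks later reports a large positive one, both from an intervention that does nothing.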
The naive fix is to "control for sex" by running male-only or female-only cohorts. That isn't a fix. A female cohort still has within-subject cycle variation. Pooling across phases hides it. The honest fix is to condition on phase — record where each measurement sits in the participant's cycle and either stratify the analysis or include phase as a covariate.
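The stratified version of the fix can be sketched with the same kind of toy model (again a hypothetical sinusoidal cycle component on a 28-day cycle, null intervention): the naive before/after comparison crosses phases and finds a spurious improvement, while comparing the follow-up week against the same phase one cycle earlier does not.

```python
import math
import random

random.seed(0)

CYCLE = 28  # assumed average cycle length, in days

def hrv(day):
    """Toy model: the cycle drives HRV; the intervention adds nothing."""
    return 50.0 + 5.0 * math.cos(2 * math.pi * (day % CYCLE) / CYCLE) + random.gauss(0, 1)

# Naive: luteal-phase baseline week vs. follicular-phase follow-up week.
baseline = [hrv(d) for d in range(14, 21)]   # luteal
followup = [hrv(d) for d in range(28, 35)]   # next follicular
naive = sum(followup) / 7 - sum(baseline) / 7

# Conditioned: follow-up vs. the SAME phase one cycle earlier.
same_phase_baseline = [hrv(d) for d in range(0, 7)]  # previous follicular
conditioned = sum(followup) / 7 - sum(same_phase_baseline) / 7

print(f"naive estimate:       {naive:+.2f}")
print(f"conditioned estimate: {conditioned:+.2f}")
```

The naive estimate comes out several beats per minute in favor of the null intervention; the same-phase estimate hovers around zero, which is the truth in this model.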
Almost no consumer health product does this. The infrastructure for it doesn't exist as a default. Most data pipelines treat the user as a single time series with no phase context, and most product teams do not know which of their findings would survive conditioning.
Why this matters now
Wearables collect billions of physiological measurements a day. They are increasingly used to train models that predict health states, recommend behaviors, and feed clinical decision support. If those models are trained on data that ignores the largest within-subject biological variable affecting half the species, the models will work in the sense of producing predictions, and they will be wrong in ways no validation set will catch — because the validation set shares the same blind spot.
This is not a fairness complaint, though it is also that. It is a correctness complaint. A correlation that does not survive conditioning on cycle phase is not a finding. It is a coincidence between an exposure and a hormonal state.
What honest practice looks like
I am building toward this in Fyll. The minimum bar I hold to:
- Every measurement is timestamped against the user's predicted cycle phase.
- Aggregations across users are stratified or weighted by phase composition, never pooled.
- Within-user trends are compared across the same phase, not across the calendar week.
- Any insight surfaced to the user names the phase the comparison was drawn from.
These are not exotic techniques. They are what Pearl would call basic conditioning. They cost more compute, more storage, and a few extra columns in every table. They make the math harder to publish because the sample sizes per phase are smaller. That is the trade. The alternative is a product that confidently tells women things about their bodies that are functions of timing rather than truth.
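A minimal pipeline sketch of the first two bullets, under loud assumptions: the schema, function names, and five-day menstrual window below are hypothetical, and a real system would use each user's observed cycle length rather than a fixed 28 days. Each reading is labeled with a predicted phase at ingestion, and cross-user aggregation happens within phase strata, never pooled.

```python
from collections import defaultdict
from datetime import date

CYCLE = 28  # assumed fixed cycle length; real pipelines should fit per user

def cycle_phase(reading_date, last_period_start):
    """Label a measurement with a coarse predicted phase
    from the hypothetical last-period-start field."""
    day = (reading_date - last_period_start).days % CYCLE
    if day < 5:
        return "menstrual"
    if day < 14:
        return "follicular"
    return "luteal"

def stratified_means(readings):
    """Aggregate across users within phase strata.
    readings: (user_id, reading_date, last_period_start, value) tuples."""
    buckets = defaultdict(list)
    for user_id, when, period_start, value in readings:
        buckets[cycle_phase(when, period_start)].append(value)
    return {phase: sum(vs) / len(vs) for phase, vs in buckets.items()}

readings = [
    ("u1", date(2024, 3, 3), date(2024, 3, 1), 51.0),
    ("u1", date(2024, 3, 10), date(2024, 3, 1), 55.0),
    ("u2", date(2024, 3, 20), date(2024, 3, 2), 47.0),
]
print(stratified_means(readings))
```

The extra columns this demands — a phase label per reading, a per-user cycle estimate — are exactly the "few extra columns in every table" mentioned above.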
The wider point
Most consumer health data is collected without recording the largest biological covariate driving its variance. The field has spent fifteen years optimizing devices, models, and dashboards on top of that omission. Some of what has been claimed will hold up under proper conditioning. Much of it will not. Sorting which is which is the work of the next decade.
Conditioning on cycle is the cleanest place to start, because the mechanism is well-understood, the timing is predictable, and the signal is large. Almost everything else in consumer health is harder.