Why Raw Hunting Averages Lie (And How We Fix It)
Before we can ask "does rain help duck hunting?" we have to solve a harder problem: the same data that contains the answer is contaminated by a dozen things that have nothing to do with rain. Here's how every OutdoorStats analysis separates the signal from the noise.
The problem: everything is correlated with everything
Imagine you plot daily rainfall against ducks-per-hunter across 25,000 hunt days. You'd find that rainy days have better harvest. Rain helps — case closed, right?
Not even close. Rain is more common in December and January, when harvest is already higher because of migrant birds. Coastal refuges are rainier AND have different bird populations than Sacramento Valley rice country. And rainy days might fall disproportionately on Wednesdays vs. Saturdays in a given season. Any of those patterns could create a fake correlation between rain and harvest.
This is the confounder problem. If you don't account for when and where a hunt happened, you can't say anything about conditions.
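To make the trap concrete, here is a minimal sketch with made-up numbers (not OutdoorStats data). Harvest in this toy dataset depends only on the month, and rain has no effect at all, yet rainy days still look better in the raw averages because rain is more common in the better month. The month labels, day counts, and harvest values are all assumptions chosen for illustration.

```python
from statistics import mean

# Hypothetical hunt-days: (month, rained, ducks_per_hunter).
# Harvest depends only on the month; rain has zero true effect.
hunt_days = (
    [("Nov", True, 1.5)] * 2 + [("Nov", False, 1.5)] * 8
    + [("Dec", True, 2.5)] * 8 + [("Dec", False, 2.5)] * 2
)

def rain_gap(rows):
    """Mean harvest on rainy days minus mean harvest on dry days."""
    wet = [h for _, rained, h in rows if rained]
    dry = [h for _, rained, h in rows if not rained]
    return mean(wet) - mean(dry)

# Pooled comparison: mixes the rain effect with the month effect.
raw_gap = rain_gap(hunt_days)

# Same comparison, but only between days in the same month.
within_month_gaps = {
    month: rain_gap([r for r in hunt_days if r[0] == month])
    for month in ("Nov", "Dec")
}

print(round(raw_gap, 2), within_month_gaps)
```

The pooled gap comes out at about 0.6 ducks per hunter, while the gap inside each month is exactly zero. The entire "rain effect" in the raw comparison is the month effect in disguise.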
The confounders we control for
Four variables dominate harvest variation, and none of them is a weather or moon-phase condition:
| Confounder | Why it matters | Magnitude |
|---|---|---|
| Refuge | Some areas consistently produce 3x the harvest of others | Huge |
| Week of season | Early and late season outperform mid-season by 40%+ | Large |
| Season (year) | Some years are simply better than others — flyway conditions, water levels, breeding success | Large |
| Day of week | Sundays average 0.4 fewer ducks/hunter than Saturdays — accumulated pressure from the day before | Moderate |
Together, these four factors explain the vast majority of day-to-day harvest variation. Weather and moon phase explain almost none of it — but you can only see that after removing the big stuff first.
Our approach: two-stage residuals
We strip confounders in two stages. Think of it as peeling layers off the data until only the weather signal remains.
Stage 1: Remove location and timing. Group every hunt-day by refuge, season, and week of season. Subtract the group average from each day's harvest. What's left is how each day differed from its own group — same refuge, same week, same year.
Stage 2: Remove day-of-week pressure. Take those stage-1 residuals and group them by refuge, season, and day of week. Subtract those group averages. This removes the systematic Wednesday/Saturday/Sunday pattern.
What survives both stages can't be explained by where you hunted, when in the season it was, what year it was, or what day of the week it was. That leaves conditions (weather, moon phase, cloud cover) and ordinary day-to-day noise as the only possible explanations.
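The two stages described above can be sketched in a few lines of plain Python. This is an illustration of the idea, not the site's production code: the field names (`refuge`, `season`, `week`, `dow`, `ducks_per_hunter`) and the helper functions are hypothetical, and the week 9 rows are invented so that stage 2 has more than one observation per day of the week.

```python
from collections import defaultdict
from statistics import mean

def subtract_group_means(rows, value_key, group_keys, out_key):
    """Store in row[out_key] the row's value minus the mean of its group."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in group_keys)].append(row[value_key])
    group_means = {g: mean(vals) for g, vals in groups.items()}
    for row in rows:
        g = tuple(row[k] for k in group_keys)
        row[out_key] = row[value_key] - group_means[g]
    return rows

def two_stage_residuals(rows):
    # Stage 1: remove the refuge x season x week-of-season average.
    subtract_group_means(rows, "ducks_per_hunter",
                         ("refuge", "season", "week"), "resid1")
    # Stage 2: remove the refuge x season x day-of-week average
    # of the stage-1 residuals.
    subtract_group_means(rows, "resid1",
                         ("refuge", "season", "dow"), "resid2")
    return rows

def make_row(week, dow, harvest):
    return {"refuge": "Gray Lodge", "season": 2019, "week": week,
            "dow": dow, "ducks_per_hunter": harvest}

# Hypothetical hunt-days for two weeks at one refuge.
rows = [
    make_row(8, "Wed", 2.4), make_row(8, "Sat", 2.3), make_row(8, "Sun", 1.6),
    make_row(9, "Wed", 3.0), make_row(9, "Sat", 2.8), make_row(9, "Sun", 2.6),
]

two_stage_residuals(rows)
for row in rows:
    print(row["week"], row["dow"],
          round(row["resid1"], 2), round(row["resid2"], 2))
```

Because every intermediate value is stored on the row, `resid1` and `resid2` can be inspected side by side, which is the auditability this approach is chosen for.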
A concrete example
Gray Lodge Wildlife Area, week 8 of the 2019 season. Three hunt days that week: Wednesday, Saturday, Sunday. The group average was 2.1 ducks per hunter.
Stage 1 asks: did each day beat or miss that 2.1? Say Wednesday came in at 2.4 (+0.3 residual), Saturday at 2.3 (+0.2), and Sunday at 1.6 (-0.5). We now have three numbers that describe how each day performed relative to its own context — not relative to some unrelated refuge three valleys over.
Stage 2 asks: how much of each residual is just the day of the week? Suppose Sundays at Gray Lodge in 2019 ran about 0.4 below the weekly average. Then that -0.5 residual from stage 1 becomes -0.1 after accounting for the Sunday penalty. Meanwhile, Wednesday's +0.3 might shrink to +0.15 after accounting for a slight midweek bonus.
Now compare those final residuals to the weather that day. If Wednesday was rainy and had a +0.15 residual while Saturday was dry and its +0.2 had shrunk to +0.05, the rain signal from that pair is +0.10, measured cleanly within the same refuge, same week, same year, and adjusted for day-of-week pressure.
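The arithmetic in this example can be checked step by step. The sketch below uses the numbers from the text; the day-of-week offsets are the values the example implies (Sunday -0.4, and +0.15 for both Wednesday and Saturday, since Saturday goes from +0.2 to +0.05). The variable names are hypothetical.

```python
# Numbers from the Gray Lodge example above.
group_avg = 2.1
harvest = {"Wed": 2.4, "Sat": 2.3, "Sun": 1.6}

# Day-of-week offsets implied by the example (an assumption for Saturday,
# which the text gives only as a before/after pair).
dow_offset = {"Wed": 0.15, "Sat": 0.15, "Sun": -0.4}

# Stage 1: how far each day landed from its refuge/season/week average.
stage1 = {day: harvest[day] - group_avg for day in harvest}

# Stage 2: subtract the typical residual for that day of the week.
stage2 = {day: stage1[day] - dow_offset[day] for day in harvest}

# Rainy Wednesday versus dry Saturday.
rain_signal = stage2["Wed"] - stage2["Sat"]

print({day: round(v, 2) for day, v in stage1.items()})
print({day: round(v, 2) for day, v in stage2.items()})
print(round(rain_signal, 2))
```

Rounded to two decimals, the stage-2 residuals are +0.15, +0.05, and -0.1, and the Wednesday-minus-Saturday gap is +0.10, matching the walkthrough.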
Why not just use a regression?
A linear regression with dummy variables for refuge, season, week, and day of week would do something similar. We use the residual approach because it's more transparent: you can inspect the intermediate values and see exactly what's being subtracted at each stage. With hunting data that spans 22 seasons and 37 refuges, we prefer being able to audit the method over marginal statistical efficiency.
Guarding against false positives
Every analysis on this site tests multiple variables. Our weather analysis tested five things: precipitation, wind, temperature, moon illumination, and clear-sky full moons. Run five tests at the usual 0.05 threshold, and there's roughly a 1-in-4 chance (about 23% if the tests are independent) that at least one looks significant by pure chance.
We handle this with the Bonferroni correction: divide the significance threshold by the number of tests. At five tests, a result needs a p-value below 0.01 (not the usual 0.05) to count. This is a conservative bar — it makes us more likely to miss real effects than to report fake ones. We'd rather understate a finding than overstate it.
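Both numbers in this section come from short formulas, sketched below. The false-positive figure assumes the five tests are independent, which is a simplification, and the p-values in the usage example are invented purely to show how the corrected threshold is applied.

```python
alpha = 0.05   # conventional per-test significance threshold
n_tests = 5    # precipitation, wind, temperature, moon illumination, clear-sky full moons

# Chance that at least one of n independent tests is a false positive.
familywise_error = 1 - (1 - alpha) ** n_tests

# Bonferroni: each individual test must clear alpha / n_tests instead.
bonferroni_threshold = alpha / n_tests

def passes_bonferroni(p_values, alpha=0.05):
    """Flag which p-values fall below the Bonferroni-corrected cutoff."""
    cutoff = alpha / len(p_values)
    return [p < cutoff for p in p_values]

print(round(familywise_error, 3), round(bonferroni_threshold, 3))

# Hypothetical p-values, for illustration only.
example_p_values = [0.003, 0.03, 0.2, 0.5, 0.04]
print(passes_bonferroni(example_p_values))
```

In the invented example, three p-values would pass an uncorrected 0.05 cutoff, but only the first clears the corrected 0.01 bar.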
How small an effect can we detect?
With 25,000 observations, we can reliably detect effects as small as 0.04 standard deviations — roughly +0.03 ducks per hunter per trip. That's tiny. If a weather condition had a meaningful impact on your hunting, we'd see it.
When we report that rain adds +0.06 ducks per trip, that's not an underestimate limited by sample size. It's the actual size of the signal. The data is big enough to tell us the truth, even when the truth is "this barely matters."
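For readers who want a feel for where a figure like 0.04 standard deviations can come from, here is a rough power calculation. It is a sketch under assumptions the text does not state: a simple two-group comparison (for example rainy versus dry days) with equal group sizes, 80% power, a two-sided test at the Bonferroni threshold of 0.01, and a normal approximation. The function name and defaults are hypothetical, and the actual calculation behind the site's number may differ.

```python
from math import sqrt
from statistics import NormalDist

def min_detectable_effect(n_total, alpha=0.01, power=0.80):
    """Smallest standardized mean difference detectable when n_total
    observations are split evenly into two groups (normal approximation)."""
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # two-sided test
    z_power = standard_normal.inv_cdf(power)
    n_per_group = n_total / 2
    return (z_alpha + z_power) * sqrt(2 / n_per_group)

mde_sd = min_detectable_effect(25_000)
print(round(mde_sd, 3))
```

Under those assumptions the result is about 0.043 standard deviations, in line with the 0.04 quoted above. An uneven split between rainy and dry days would push the detectable effect somewhat higher.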
The short version: Raw averages are contaminated by location, timing, and pressure. We remove those layer by layer until only the signal remains. Then we set a high bar for calling anything "real." The result is conservative, auditable, and honest — even when the findings are smaller than hunters expect.
This methodology applies to all OutdoorStats analysis posts. Data: 25,040 hunt-days across 37 CDFW wildlife areas, 2003–2026. Weather from Open-Meteo ERA5 reanalysis. Moon phase via PyEphem.
See it applied: Does Weather Actually Affect Duck Hunting? · The Full Moon Myth · Interactive map of all 37 areas