Mathing Bears Getting Screwed

Let’s begin this story where many stories start: somebody posted some shit on Twitter. Specifically, Warren Sharp stirred it up with this:

Now, if you chum the waters from the back of the Orca, sometimes you’re gonna need a bigger boat. Sometimes sharks appear. So let’s be a Great White, shall we, and take a big bite out of this. Because it started to get legs with this Tanis lady picking that ball up and running with it.

Tanis goes on to say, “Either the Bears have the worst luck in the history of history, or the NFL is going out of their way to screw over the bears (which seems most likely seeing how they were not awarded the comp picks when Cunningham was hired by ATL). Every team thinks the refs are screwing their team over. However there are stats that show it’s actually happening to the Bears. On a yearly basis. And you can’t say it’s coaching/play calling, bc there have been two (technically three) different head coaches for the bears in the last 4 years.”

Can you leave that alone? I decided not to. Bears Tax. It’s all that the regs on DBB and DBB2 ever talked about for years, how the NFL hates Chicago and how the Packers get their ring rimmed constantly, esp. in the 4th Q when the fix is sent in on carrier pigeon by Goodell himself. I mean, seriously, fuck that guy. FTP. All of it.

So, let’s preface first. There was a DBB reg named Data. Or that’s what we called him. Nice guy, Jeff at DBB used to let him guest post. He used to post “statistics” as conversation starters, and that was all well and good. His idea of statistics was to open up Excel and make a pivot table, and punch some information in, then draw a conclusion from said information, and the debate was on the way. Nice guy. It had no bearing on actual stats, but it was pat on the head nice. I offered to help him with actual statistical analysis, but he shooed me away (politely) and said that he was quite happy in his life doing his tables and drawing his conclusions, and not to mess with his Wizard of Oz magical ways. So, I did not.

But a Bears tax won’t be served by a pivot table. We need stats from Yahoo and we need to tranquilize this Lion and open its mouth and check its teeth. So, in case you were wondering, we shall now engage in ACTUAL STATISTICS.

GP calculating tip at restaurant for four separate parties

Our postulation: the NFL, or certain crews, have it out for the Chicago Bears. Either a Bears Tax exists, or these are just journalists who are bad at stats doing what the Twatter was made for: stirring the pot. Getting a big wooden spoon and swirling the shit until the scent attracts real sharks. Let’s take a bite of that swordfish on the line being towed. As Khan himself said, Sharp tasks me! He tasks me, and I shall have him!

Step 1 – Summary Stat

$$
\begin{aligned}
\text{Bears total beneficiary yards (2022-25)} &= 652 + 530 + 794 + 748 = 2{,}724 \\
\text{Bears total games} &= 17 + 17 + 17 + 19 = 70 \\
\text{Bears yds/game } (x_{\mathrm{CHI}}) &= 2{,}724 / 70 = 38.9143
\end{aligned}
$$
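
If you want to check the Step 1 arithmetic yourself, here is a minimal Python sketch. The variable names are mine, and the per-season split assumes the four numbers are listed in season order, which the totals do not depend on.

```python
# Bears beneficiary yards and games, in the order Step 1 lists them.
bears_yards = [652, 530, 794, 748]
bears_games = [17, 17, 17, 19]

total_yards = sum(bears_yards)
total_games = sum(bears_games)
x_chi = total_yards / total_games  # cumulative yds/game over the window

# Per-entry rates, just to see how lumpy the four seasons are.
season_rates = [y / g for y, g in zip(bears_yards, bears_games)]

print(total_yards, total_games, round(x_chi, 4))
print([round(r, 2) for r in season_rates])
```

The cumulative rate is total yards over total games, not the average of the four seasonal rates, so the 19-game season counts a little more than the others.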

Step 2 – We need a cross-team reference distribution. For each of the 32 teams I computed cumulative yds/game (total beneficiary yds ÷ total games over the four seasons). What do we see?

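$$
\begin{aligned}
\text{Unweighted mean: } \bar{x} &= \frac{1}{n}\sum_i x_i = \frac{1557.0305}{32} = 48.6572 \\
\text{Weighted mean: } \mu_w &= \frac{\sum Y}{\sum G} = \frac{110{,}739}{2{,}278} = 48.6124 \\
\text{SS deviations: } \sum_i (x_i - \bar{x})^2 &= 481.9894 \\
\text{Sample variance: } s^2 &= \frac{SS}{n-1} = \frac{481.9894}{31} = 15.5480 \\
\text{Sample SD: } s &= \sqrt{15.5480} = 3.9431
\end{aligned}
$$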

The two means differ by 0.04 because teams have different game counts (playoff teams play more).
Use the unweighted x̄ as the location parameter for the cross-team distribution since the SD is also computed across teams.
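
The 32 per-team figures are not reproduced in this post, so the sketch below only re-derives Step 2 from the aggregates quoted above. The helper `cross_team_summary` is a hypothetical name showing how you would get the same two numbers from a raw list of per-team rates if you had one.

```python
import math
import statistics


def cross_team_summary(rates):
    """Unweighted mean and sample SD (n-1 denominator) of per-team yds/game."""
    return statistics.mean(rates), statistics.stdev(rates)


# Aggregates as quoted in Step 2.
n_teams = 32
sum_of_rates = 1557.0305   # sum of the 32 per-team yds/game figures
total_yards = 110_739      # league-wide beneficiary yards
total_team_games = 2_278   # team-games across all 32 teams
ss = 481.9894              # sum of squared deviations from x_bar

x_bar = sum_of_rates / n_teams          # every team counts equally
mu_w = total_yards / total_team_games   # every game counts equally
variance = ss / (n_teams - 1)
sd = math.sqrt(variance)

print(round(x_bar, 4), round(mu_w, 4), round(variance, 4), round(sd, 4))
```

Dividing by n-1 rather than n is the usual sample-variance convention; with 32 teams the difference is small, but it keeps the SD consistent with the 3.9431 used in the z-score.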

Step 3 – Z-Score

$$
z = \frac{x_{\mathrm{CHI}} - \bar{x}}{s} = \frac{38.9143 - 48.6572}{3.9431} = -2.4709
$$

Step 4 – P-values

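The Bears weren’t pre-specified. We’re picking them because they’re the outlier. For one team named in advance, z = -2.47 gives a one-sided normal p of 0.006739. Correcting for 32 teams (i.e., asking “what’s P[some team this extreme]?”) and treating the teams as independent:

$$
\begin{aligned}
P(\text{any team } z \le -2.47) &= 1 - (1 - 0.006739)^{32} = 0.1946 \approx 1 \text{ in } 5 \\
\text{Bonferroni upper bound} &= 32 \times 0.006739 = 0.2156
\end{aligned}
$$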
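
Steps 3 and 4 fit in a few lines of standard-library Python. This is a sketch under the same assumptions the numbers above lean on: a normal reference distribution and 32 independent teams. `normal_cdf` is my own helper built on `math.erfc`, not something from a stats package.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, P(Z <= z), via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


x_chi, x_bar, s, n_teams = 38.9143, 48.6572, 3.9431, 32

z = (x_chi - x_bar) / s
p_one_team = normal_cdf(z)                    # p if the Bears had been named in advance
p_any_team = 1 - (1 - p_one_team) ** n_teams  # chance that some team is this low
bonferroni = min(1.0, n_teams * p_one_team)   # simple upper bound on p_any_team

print(round(z, 4), round(p_one_team, 6), round(p_any_team, 4), round(bonferroni, 4))
```

The Bonferroni figure always sits at or above the exact independent-teams figure, which is why it is described as an upper bound.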

Step 5 – Cumulative deficit interpretation.

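$$
\begin{aligned}
\text{Expected Bears yds at league rate} &= 70 \times 48.6124 = 3{,}402.87 \\
\text{Actual Bears yds} &= 2{,}724 \\
\text{Deficit} &= 3{,}402.87 - 2{,}724 = 678.87 \text{ yds over 4 seasons} \\
&\approx 170 \text{ yds/season} \\
&\approx 10 \text{ yds/game}
\end{aligned}
$$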
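
The deficit arithmetic, as a quick sketch. It uses the weighted league rate (48.6124), matching the "league rate" wording in Step 5; variable names are mine.

```python
league_rate = 48.6124   # weighted league mean, yds/game
bears_games = 70
bears_yards = 2724
seasons = 4

expected = bears_games * league_rate
deficit = expected - bears_yards
per_season = deficit / seasons
per_game = deficit / bears_games  # same as league_rate minus the Bears' 38.9143

print(round(expected, 2), round(deficit, 2), round(per_season, 1), round(per_game, 2))
```

The per-game deficit is just the gap between the league rate and the Bears' rate, so it lands a shade under 10 yds/game.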

Step 6 – What is this test rejecting?

The cross-team SD of 3.94 includes both sampling noise and legitimate between-team variation (scheme, pace, QB style). So z = -2.47 against this SD says: “the Bears are far below where teams typically end up, including the noise of 70 games and the spread of 32 different football operations.” Rejecting at p = 0.007 means rejecting “all 32 teams have identical underlying penalty-drawing rates” — a null that was already false on inspection (Minnesota at 57.7, Bears at 38.9, both with 70 games).

It does not mean rejecting “no anti-Chicago referee bias.” So we must continue forward.

A more aggressive test using only sampling variance (treating per-game yds as iid within each team and ignoring between-team scheme effects) would give a smaller p, but the inferential gap to “bias” gets wider, not narrower, because more of the variance gets attributed to legitimate team differences.

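Now we need a Bayesian wrapper on this:

$$
\begin{aligned}
P(\text{bias} \mid \text{data}) &= \frac{P(\text{data} \mid \text{bias})\,P(\text{bias})}{P(\text{data})} \\
&= \frac{P(\text{data} \mid \text{bias})\,P(\text{bias})}{P(\text{data} \mid \text{bias})\,P(\text{bias}) + P(\text{data} \mid \neg\text{bias})\,P(\neg\text{bias})}
\end{aligned}
$$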

If your prior P(systematic anti-Chicago bias) is 5%, and you generously assume P(z ≤ -2.47 | bias) = 0.50 vs. P(z ≤ -2.47 | no bias) = 0.195 (the look-elsewhere baseline: under “no bias” we still expect some team to be this much of an outlier about 1 time in 5):
$$
\begin{aligned}
\text{Posterior} &= \frac{0.50 \times 0.05}{0.50 \times 0.05 + 0.195 \times 0.95} \\
&= \frac{0.025}{0.21025} \\
&= 0.119
\end{aligned}
$$

So the data moves the bias hypothesis from a 5% prior to ~12% posterior. Real movement, but nothing close to “proven.”
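
The Bayes update is small enough to write as one function. The sketch below also shows the odds form, which makes it obvious how little work the data is doing. The three inputs are the guesses stated above, not measured quantities, and the function name is mine.

```python
def posterior(prior, p_data_given_bias, p_data_given_no_bias):
    """Bayes' rule for two hypotheses: bias vs. no bias."""
    numerator = p_data_given_bias * prior
    return numerator / (numerator + p_data_given_no_bias * (1 - prior))


prior = 0.05
p_bias = 0.50      # assumed P(data | bias)
p_no_bias = 0.195  # look-elsewhere baseline, P(data | no bias)

post = posterior(prior, p_bias, p_no_bias)

# Odds form: posterior odds = prior odds x likelihood ratio.
likelihood_ratio = p_bias / p_no_bias
post_odds = (prior / (1 - prior)) * likelihood_ratio
post_from_odds = post_odds / (1 + post_odds)

print(round(post, 3), round(likelihood_ratio, 2), round(post_from_odds, 3))
```

A likelihood ratio of roughly 2.6 is modest evidence, which is why a 5% prior only climbs to about 12%.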

The data is moderately suggestive, not damning, and the dominant alternative explanation — that the Bears have run an anemic offense for most of this window and bad offenses don’t draw flags — costs nothing extra in plausibility.

Oh GP, you fucker, you say, sure… Jussie was a shitbird and could not run an offense. Eberflus was so ill-regarded that other teams could see into their setups and adjust. Floose running the Cowboys D in such an inept manner showed his scheme just didn’t get breaks. But what about Caleb? What about BJ? That Tanis broad, she was on it, was she not? She points out that the Bears tax runs between regimes! Ha ha! Checkmate! Take your math and shove it you Grey Poupon eating pretentious math ass!

Well… hold on there Cowboy.

2025 only: League mean 50.47 yds/g, SD 6.83, Bears 39.37. Bears z = −1.63, one-sided p = 0.052, look-elsewhere p = 0.82. So the single season isn’t statistically remarkable: the Bears were lowest, but only 0.68 yds/g below Buffalo. What makes 2025 important isn’t the single-season significance; it’s that it removes the easiest counter-narrative.
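
To put the 2025 numbers next to the four-season numbers, here is a sketch that runs the same three-step test on both windows. `outlier_test` is a hypothetical helper; it assumes a normal reference distribution and 32 independent teams, same as before.

```python
import math


def outlier_test(x, mean, sd, n_teams=32):
    """Return (z, one-sided p, look-elsewhere p) for the lowest team."""
    z = (x - mean) / sd
    p_one = 0.5 * math.erfc(-z / math.sqrt(2))  # P(Z <= z)
    p_any = 1 - (1 - p_one) ** n_teams          # some team at least this low
    return z, p_one, p_any


windows = {
    "2022-25": outlier_test(38.9143, 48.6572, 3.9431),
    "2025 only": outlier_test(39.37, 50.47, 6.83),
}

for label, (z, p_one, p_any) in windows.items():
    print(f"{label:<10} z={z:6.2f}  p={p_one:.3f}  look-elsewhere p={p_any:.2f}")
```

The single season has a wider cross-team SD, so roughly the same yards-per-game gap buys a much weaker z.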

The asymmetry signal in 2025 is the cleaner one:

Penalty yards per game, 2025 (Net = Benef − Against):

Team        Against/g    Benef/g    Net/g
Bears         52.58       39.37    -13.21
Buffalo       50.26       40.05    -10.21
Denver        61.79       48.21    -13.58
Detroit       42.24       41.24     -1.00
SF            36.95       41.42     +4.47
LA Rams       34.55       51.55    +17.00
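
The net column is beneficiary yards per game minus yards per game called against the team. A small sketch that recomputes it from the six rows above and sorts from worst net to best; the tuple layout is my own.

```python
# (team, penalty yds/g against the team, beneficiary yds/g), 2025, from the table.
rows = [
    ("Bears", 52.58, 39.37),
    ("Buffalo", 50.26, 40.05),
    ("Denver", 61.79, 48.21),
    ("Detroit", 42.24, 41.24),
    ("SF", 36.95, 41.42),
    ("LA Rams", 34.55, 51.55),
]

net = {team: benef - against for team, against, benef in rows}

# Worst net first.
for team in sorted(net, key=net.get):
    print(f"{team:<8} {net[team]:+7.2f}")
```

Sorting by net rather than by beneficiary yards is the point of the asymmetry view: it puts a team that gets flagged a lot and draws few flags at the top of the list.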

Yahoo has the 2025 Bears 6th in total yds/g (379.2) and 9th in PPG (25.9). I’ll grant top-10. Caleb’s 388 rushing yards is good-not-elite (Lamar’s typically 700–900) but the scramble-draws-flags mechanism is real. And the Bears threw 574 pass attempts — lots of dropbacks where DPI/holding could be called. So you’ve largely killed the “low pace / low pass volume” version of the alternative. What survives:

Scheme-specific: Ben Johnson’s offense is precise and quick-throw oriented — fewer extended plays, fewer deep shots, fewer of the situations that draw DPI and defensive holding. Detroit ran a similar offense and was 3rd lowest in 2025.
Sack rate: Bears took 24 sacks, well below average. Lower QB-hit volume = fewer roughing-the-passer chances. The Caleb-doesn’t-get-hit story works partly against the scramble-draws-flags story.
Pure noise on a one-season sample. Sorry, but that’s what it is.

Tanis’s 2025 evidence kills the strongest version of the boring alternative (“Bears suck offensively, of course they don’t draw flags”). It does NOT kill the scheme-and-style version, and the look-elsewhere correction still applies to the cumulative case. If I redo the Bayes with prior 5%, P(data|bias) = 0.50, and revise P(data|no bias) down from 0.195 to maybe 0.12 (since the boring alternative is partly defanged):

$$
\text{Posterior} = \frac{0.50 \times 0.05}{0.50 \times 0.05 + 0.12 \times 0.95} = 0.180
$$

You’re at ~18% posterior on bias. That’s three-and-a-half times the prior, but it’s still a minority position. The data is suggestive. To push it toward damning, the test you’d want is referee-crew breakdown — if the Bears’ deficit holds across all 17 crews, that’s hard to explain without bias; if it concentrates in three of them, you’ve got something more specific and frankly more interesting than “the league hates Chicago.” So, that is the rabbit hole we must go down in a future article. As Hippy likes to say, “where the fuck do you find this stuff?”
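
Since every input to that posterior is a judgment call, it is worth seeing how far the answer moves when you change them. The sketch below is a sensitivity check, not a result: the priors other than 5% are hypothetical values I picked for illustration, and `lr_needed` is my own helper for "how strong would the evidence have to be?"

```python
def posterior(prior, p_data_given_bias, p_data_given_no_bias):
    """Bayes' rule for two hypotheses: bias vs. no bias."""
    numerator = p_data_given_bias * prior
    return numerator / (numerator + p_data_given_no_bias * (1 - prior))


def lr_needed(prior, target=0.5):
    """Likelihood ratio required to move `prior` up to `target`."""
    prior_odds = prior / (1 - prior)
    target_odds = target / (1 - target)
    return target_odds / prior_odds


revised = posterior(0.05, 0.50, 0.12)

# Hypothetical priors, crossed with the two P(data | no bias) values used above.
for prior in (0.01, 0.05, 0.10, 0.25):
    row = [posterior(prior, 0.50, p_nb) for p_nb in (0.195, 0.12)]
    print(f"prior={prior:.2f}  " + "  ".join(f"{p:.3f}" for p in row))

print(round(revised, 3), round(0.50 / 0.12, 2), round(lr_needed(0.05), 1))
```

From a 5% prior you need a likelihood ratio of about 19 to reach even odds, and the revised inputs give about 4.2. That gap is what a crew-level breakdown would have to close.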

Per-game data with referee crews — pulled from the team-by-game pages:

nflpenalties.com/team/chicago-bears?year=2022 (and the matching &view=games_for for opponent penalties)
Same URL pattern for 2023, 2024, and 2025

These give every Bears game with the date, opponent, ref crew, count, yards — both for penalties against the Bears and for penalties against their opponents.
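
For anyone following along, here is a sketch that just builds the eight page addresses from that URL pattern. It does not fetch or parse anything. The `https://` scheme is my assumption (the pattern above is written without one), and `team_urls` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Scheme is assumed; the path and query keys come from the pattern above.
BASE = "https://nflpenalties.com/team/chicago-bears"


def team_urls(year):
    """Return (penalties against the Bears, penalties against opponents) URLs."""
    against = f"{BASE}?{urlencode({'year': year})}"
    benefit = f"{BASE}?{urlencode({'year': year, 'view': 'games_for'})}"
    return against, benefit


for season in (2022, 2023, 2024, 2025):
    for url in team_urls(season):
        print(url)
```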

So, for article 1 I have tried to show that Jussie + Floose just made for bad years, and I cannot suss out a systemic bias from the data. No way some journos without a quant background can do it. Easier to chum the water than reel the shark in, isn’t it? It took a scuba tank and Quint’s old M1 Garand to finish that monster off. If you are gonna get out there in public and declare a Bears tax exists, then extraordinary claims require extraordinary proof. And so we shall venture forth looking for Part 2 Hypothesis, which we shall call the Billy Butcher hypothesis, being: “yeah, ok, but I bet some ref crews are rite cunts!” Onward we go.

Bear fans fighting refs every Sunday
