GP: the 3rd Dimension is Terror

You will want to read Pt. 1 and 2 before reading this article.

Aggregating the data from sources is a big PITA and so something has to go, and that is me summarizing too much. What I will say is that Pt. 1 looks at the high-level deficit and sees how many standard deviations we were off from other teams. Pt. 2 looks at crews and whether all crews or only some crews contributed to the Bears being bad.

Last article, I mentioned that Ben Johnson talked last camp (his 1st camp) about the culture, and all the talk was how he was gonna be the Caleb whisperer, and for the most part, Caleb became a better QB under BJ. But the team’s OL woes kept at him, and he didn’t implement his full suite of tricks, formations and isolated mismatches. And the Bears were, again, the lowest team in the league at getting flags thrown on their opponents. What we found were counterexamples of ref crews who seemed to be pro-Bears, but many krews called few penalties against Bears oppo.

My speculation is that for many years, Nagy and then Floose – Jussie back there – it was amazing to watch them even FIELD an offense. When a hobbled Darnell Mooney is your deep threat, and your OL has Jussie running for his life exactly 2 seconds after each snap, yeah, you ain’t getting a lot of DPI. You need a pocket to begin with to get roughing the passer.

I also stated that, organization-wide, the Bears were a historically inept organization. Kevin Byard, on the Thanksgiving Day meltdown, threw a water bottle at Floose’s head when he tried to address the team after the game. A bad analyst would state that Floose lost the team that day. A good analyst would state that Floose never had the team to begin with. Nobody bought in on that guy. Did you see Hard Knocks? Caleb is openly rolling his eyes having to listen to that guy, and Poles asked the HBO crew for shots of Floose looking thoughtfully at playbooks. As if.

So, to really bury this thing, we need some things.

Bears against per game / crew non-Bears baseline: 1.00x
Bears benefit per game / crew non-Bears baseline: 0.79x

The Bears get penalized at exactly the league-average rate by these same crews. They draw 21% fewer flags on their opponents than those same crews call on other teams’ opponents. The “flag-prone Bears” hypothesis is dead. The deficit lives entirely on the benefit side.

Per-crew breakdown — sorted by Ben/Base ratio (lowest = most asymmetric against Bears):

Crew             BG  Crew_G  Baseline  Agst/g  Ben/g  Agst/x  Ben/x
Scott Novak       5      66     46.06   56.40  21.00    1.22   0.46
Ron Torbert       4      69     51.30   54.50  23.75    1.06   0.46
Brad Rogers       6      65     53.36   46.17  29.17    0.87   0.55
John Hussey       4      67     46.87   55.25  27.50    1.18   0.59
Tra Blake         3      48     50.14   45.67  30.00    0.91   0.60
Alex Kemp         2      69     51.28   90.50  32.00    1.76   0.62
Shawn Hochuli     2      68     52.95   32.00  37.50    0.60   0.71
Alex Moore        2      16     64.86   61.50  47.50    0.95   0.73
Carl Cheffers     6      68     49.61   41.50  38.33    0.84   0.77
Clete Blakeman    5      68     50.88   49.80  40.80    0.98   0.80
Bill Vinovich     2      69     44.81   39.00  39.50    0.87   0.88
Adrian Hill       7      65     52.91   37.29  48.71    0.70   0.92
Clay Martin       5      68     48.10   60.00  45.80    1.25   0.95
Shawn Smith       2      68     47.34   67.00  47.50    1.42   1.00
Brad Allen        5      66     43.83   22.60  45.00    0.52   1.03
Land Clark        4      66     46.73   55.25  49.50    1.18   1.06
Alan Eck          3      48     45.01   79.00  48.00    1.76   1.07
Craig Wrolstad    3      69     47.61   42.67  56.67    0.90   1.19
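If you want to check my arithmetic, here is a rough Python sketch that recomputes the two ratio columns from the Baseline, Agst/g and Ben/g columns. The row values are copied from the table. The names (`ROWS`, `ratios`, `against_above`, `benefit_below`) are made up for this illustration and are not from whatever tooling produced the original numbers.

```python
# Rows copied from the table: (crew, baseline, against_per_game, benefit_per_game).
ROWS = [
    ("Scott Novak", 46.06, 56.40, 21.00),
    ("Ron Torbert", 51.30, 54.50, 23.75),
    ("Brad Rogers", 53.36, 46.17, 29.17),
    ("John Hussey", 46.87, 55.25, 27.50),
    ("Tra Blake", 50.14, 45.67, 30.00),
    ("Alex Kemp", 51.28, 90.50, 32.00),
    ("Shawn Hochuli", 52.95, 32.00, 37.50),
    ("Alex Moore", 64.86, 61.50, 47.50),
    ("Carl Cheffers", 49.61, 41.50, 38.33),
    ("Clete Blakeman", 50.88, 49.80, 40.80),
    ("Bill Vinovich", 44.81, 39.00, 39.50),
    ("Adrian Hill", 52.91, 37.29, 48.71),
    ("Clay Martin", 48.10, 60.00, 45.80),
    ("Shawn Smith", 47.34, 67.00, 47.50),
    ("Brad Allen", 43.83, 22.60, 45.00),
    ("Land Clark", 46.73, 55.25, 49.50),
    ("Alan Eck", 45.01, 79.00, 48.00),
    ("Craig Wrolstad", 47.61, 42.67, 56.67),
]


def ratios(rows):
    """Map each crew to (against_ratio, benefit_ratio) versus its own baseline."""
    return {crew: (agst / base, ben / base) for crew, base, agst, ben in rows}


crew_ratios = ratios(ROWS)

# Counts of crews on each side of their own baseline (unrounded ratios).
against_above = sum(1 for a, _ in crew_ratios.values() if a > 1.0)
benefit_below = sum(1 for _, b in crew_ratios.values() if b < 1.0)

# Print sorted by benefit ratio, lowest first, like the table.
for crew, (a, b) in sorted(crew_ratios.items(), key=lambda kv: kv[1][1]):
    print(f"{crew:<16} Agst/x {a:.2f}  Ben/x {b:.2f}")
print(against_above, "of", len(ROWS), "crews above baseline on the against side")
print(benefit_below, "of", len(ROWS), "crews below baseline on the benefit side")
```

The two counts at the end are the inputs a sign test needs. Shawn Smith rounds to 1.00 on the benefit side but sits a hair above 1 unrounded, so he does not count as below baseline.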

Sign tests on the new metrics
Bears Agst > crew baseline: 8 of 18 crews (p = 0.76) ← right at chance
Bears Ben < crew baseline: 13 of 18 crews (p = 0.048) ← directional
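For anyone who wants to reproduce those p-values, here is a minimal sketch. It assumes they are one-sided sign-test tails, P(X >= k) with X ~ Binomial(18, 0.5). That is my reading of the numbers, and it does reproduce both. The function name `sign_test_upper_tail` is made up for this example.

```python
from math import comb


def sign_test_upper_tail(successes, n):
    """One-sided sign test: P(X >= successes) for X ~ Binomial(n, 0.5)."""
    return sum(comb(n, k) for k in range(successes, n + 1)) / 2 ** n


# Counts taken from the sign-test lines: 8 of 18 and 13 of 18 crews.
p_against = sign_test_upper_tail(8, 18)
p_benefit = sign_test_upper_tail(13, 18)

print(f"against side: p = {p_against:.2f}")
print(f"benefit side: p = {p_benefit:.3f}")
```

A two-sided version would roughly double the benefit-side number, which is one more reason to read "directional" as just that and nothing stronger.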

This is the real finding.

What the test was supposed to distinguish:

Flag-prone Bears → Agst > 1, Ben ≈ 1 (both teams take normal flags from these crews; Bears just commit more)
Anti-Bears bias → Agst > 1 AND Ben < 1 (asymmetric on both sides)
Bears get fewer opponent flags specifically → Agst ≈ 1, Ben < 1 (asymmetric only on the benefit side)
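The three patterns can be written down as a tiny decision rule. This is only a sketch: the 10% tolerance for "≈ 1" is an assumption I picked for illustration, not a threshold the analysis used, and `classify` is a hypothetical helper name.

```python
def classify(against_ratio, benefit_ratio, tol=0.10):
    """Label an (Agst/x, Ben/x) pair with one of the three patterns.

    tol is an assumed tolerance: ratios within +/- tol of 1 count as "about 1".
    """
    against_high = against_ratio > 1 + tol
    benefit_low = benefit_ratio < 1 - tol
    if against_high and benefit_low:
        return "anti-Bears bias (asymmetric on both sides)"
    if against_high:
        return "flag-prone Bears"
    if benefit_low:
        return "fewer opponent flags only"
    return "no clear asymmetry"


# Aggregate ratios quoted earlier: 1.00x against, 0.79x benefit.
aggregate_label = classify(1.00, 0.79)
print(aggregate_label)
```

Tighten or loosen `tol` and individual crews can flip categories, so the labels are only as good as that choice.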

The data shows the third pattern, decisively. The Bears commit penalties at the same rate every other team does under these crews. The deficit is entirely about opponents not getting flagged when playing them. That kills the lazy “Bears just take a lot of flags” story — it’s empirically wrong.

Crews where the asymmetry is strongest (sample size matters):

Scott Novak (5 games): 0.46x benefit. His non-Bears games average 46 yds against per team; his Bears games give Chicago 21 yds of opponent calls per game.
Brad Rogers (6 games): 0.55x benefit. Calls fewer opponent penalties on Bears’ opponents than on others’ opponents.
John Hussey (4 games): 0.59x benefit, 1.18x against. Along with Novak (1.22x against), one of the few crews showing the textbook two-sided pattern at meaningful sample size.
Adrian Hill (7 games): 0.92x benefit but only 0.70x against. Close to neutral on the benefit side, and unusually lenient on the penalty side.
Brad Allen (5 games, retired after ’23): 1.03x benefit, 0.52x against — he was actively the most pro-Bears crew in the league before he left.

What this means for the bias hypothesis.

With the flag-prone-team alternative now empirically dead, the surviving non-bias explanations are scheme/personnel:

Bears defense doesn’t induce opponent penalties — Bears don’t pressure QBs in ways that trigger holding/false-start rates from opposing OLs. Possible – if you have a quiet front seven. Did the Bears have a poor front seven these years? Ask Waffle.
Bears offense doesn’t draw DPI — fewer deep shots or contested catches, so opposing DBs aren’t put in flag-drawing situations. Plausible for Fields/Bagent/Caleb-rookie eras; harder to credit for Ben Johnson’s 2025 attack.
Bears don’t play teams that commit a lot of penalties — schedule effect. Plausible but the NFC North includes Minnesota, the most-flagged opponent in the league. So let’s toss that one.

Can you give a bitch a break?

Sure. The 0.79x aggregate is striking enough that some combination of those factors has to be doing real work, OR there’s a genuine officiating tilt. Updated Bayesian: with prior 5%, P(asymmetric pattern | bias) ≈ 0.6, P(asymmetric pattern | scheme effects) ≈ 0.20:

Posterior = (0.6 × 0.05) / (0.6 × 0.05 + 0.20 × 0.95)
          = 0.03 / 0.22 ≈ 0.136
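The same update in a few lines of Python, in case you want to plug in your own prior. The three inputs (5% prior, 0.6 and 0.20 likelihoods) are the ones stated above. The function name `posterior` and the extra priors in the loop are made up for illustration.

```python
def posterior(prior, p_data_given_bias, p_data_given_scheme):
    """Bayes' rule for two competing explanations (bias vs. scheme effects)."""
    numerator = p_data_given_bias * prior
    return numerator / (numerator + p_data_given_scheme * (1 - prior))


p_bias = posterior(0.05, 0.6, 0.20)
print(f"posterior on bias: {p_bias:.3f}")

# Sensitivity to the prior; these alternative priors are hypothetical.
for prior in (0.01, 0.05, 0.10):
    print(f"prior {prior:.2f} -> posterior {posterior(prior, 0.6, 0.20):.3f}")
```

The likelihoods are judgment calls, so treat the output as a way to see how much the answer moves with the prior, not as a measurement.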

You have names now, you can hold onto “the Bears specifically don’t draw flags on their opponents when these particular crews are calling the game,” and that’s a much more salty question. But what you cannot do is show it’s anything more than those crews having a bias towards their back judges, and looking for DPI. We know that the NFL keeps changing what it wants with regards to DPI and how each crew is choosing to implement that across all games is probably the story. Not that the Bears get fucked.

Did we sink the Orca? Fuck no. What would the NFL be without third rate journos chumming the water for cheap engagement? Making wild claims that analysis does not support?

Roughly 14% on bias — modest update, because scheme effects remain a credible explanation for opponents-only deficits. And that’s where I fall. I fall on the team having a shit OL, an anemic DL; the swing is not across all crews. A WIDE variation exists, but when you put in the fix, maybe Goodell only bribes a few crews. If you want to hang on to the anti-Bears argument with your fingernails, you have enough data to die on that hill. But I don’t think so.

The next test, which I am not doing because now this is getting to be real work, would be benefit-side breakdowns by penalty type. If the Bears’ deficit in opponent DPI calls specifically tracks with their offensive air-yards profile, scheme wins. If it doesn’t track, if the Bears get short-changed on garden-variety opponent holding and false starts that aren’t scheme-dependent, then bias gets harder to explain away. I am pretty certain that were I to dig further into that hole, I would not find that.

Well, analysis done. Nothing stands out, esp. when you compare the crews to how they behave across the league. There is no there there. Would it be fun if there was? Maybe.

But I prefer the narrative that George is just a massive bonehead and good coaches just didn’t want to come here until Poles realized he was on his way out unless he convinced them to open the checkbook for a Ben Johnson. Which he did. He used Warren to tie to the move. It was savvy politics. Bears need to be a playoff team to help the case w. the Illinois legislature. Being an inept doormat had run its course. If they were competent they would have hired real head coaches instead of Trestman and Nagy. And now Ben Johnson is working towards a team that CAN draw DPI and false starts. Well, more on the DPI; not investing anything in the front seven is going to do nothing for drawing false starts.

Conclusion:

I made my arguments, showed that there is a statistical residual effect that crap journos can point to, but there are many other contradictory truths here that are more suggestive of a deeper problem that NFL officiating has, which is that krews are not consistently applying standards to DPI. Is this news, really? Because any true football fan has known this for years. Let me wrap up the findings then, with the risk-reward ratio. If that does not dispel the story then nothing will and you may want to join Irish at breaking into Area 51 because conspiracy is just your groove, dude. Get a Q-Anon tshirt.

Here is what you have to believe to think this is a real directive: The NFL risk – if any of these krews, made up of SEVEN men plus a dedicated replay official tied to that crew in New York – if any of those guys get a hankering to talk to the Guardian or Intercept about this, you blow the league up. Goodell better have a Deadpool class assassin at the ready since fucking a major market like Chicago will gain you all kinds of 2nd rate journalists picking up the story once the first-rate ones open that gate.

The reward: One flag per game. Either holding on the oppo OL or OPI on their wideouts. You have to imagine a conversation that singles out 10 of 15 crews, because the NFL psych profile won’t risk it on the other five, and you say to them “don’t call MORE penalties on the Bears, just DON’T call one penalty on their opponents that you otherwise would have. Don’t go more than one, or the stats will start to look too bad, just one flag and then go back to being a fair ref.” Maybe you just need the line judge or the back judge and not the whole crew. You have heard of point shaving, this is penalty shaving.

What does that buy you in terms of W-L? A single flag? Almost nothing. It’s so improbable to change a game the math isn’t worth showing. So, you risk the state of the whole league to just, perhaps, do a 2nd order point shave? Wouldn’t it be easier to just Art Schlichter that shit and get to a QB who needs money? Or a RB? Just about any player is a better risk than a ref, who isn’t paid that well, may be a lawyer himself in real life, and is likely to be much more savvy about leaking the story. Or can we just say that the Bears didn’t draw holding because the DL was not that good at getting sacks? Or the Bears offense didn’t draw DPI because for a lot of that time their QB was basically another running back? We live in insane times where all manner of insane conspiracies get chummed. It’s sad that it’s coming from people who hold themselves out as experts, because their “expertise” is shit.

Da Dos
