Let’s say I’m a professional coin flipper. I make my money going to tournaments, flipping coins against other pros. The League’s tightly regulated; coins are provided to competitors and are always fair. But there’s no regulation on headwear. That’s where I get my edge.
While the formats vary by continent, the one universal rule is that if I flip “heads” more times than my opponent, I win. So I’m looking for headwear that most improves my chances. My current go-to is an old ballcap, but I just bought a tiara from a retired colleague that he claims to have had average success with.
I’m not one to follow superstition, though. I’d like some assurance that this tiara is better than my ballcap before switching. From the back of my collectible trading card, I find that I’ve flipped with my ballcap 10000 times, and gotten heads 4980 of those times.
Within a 95% confidence interval, I’m flipping:
p = 4980/10000 = 49.8%
Z = 1.9599 (95% confidence)
E = Z * sqrt(p * (1 - p) / 10000) ≈ 1.0%
about 49.8% ± 1.0%
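Here's a quick check of that interval, a normal-approximation (Wald) sketch using only the numbers from my trading card:

```python
import math

heads, flips = 4980, 10000
p = heads / flips                  # sample proportion: 0.498
z = 1.9599                         # z-score for 95% confidence
margin = z * math.sqrt(p * (1 - p) / flips)

print(f"{p:.1%} ± {margin:.1%}")   # → 49.8% ± 1.0%
```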
Now I’ll test out my new tiara. I flip 100 times with it on: 44 heads. Not great, but clearly a small sample size. Let’s do another 100: 47 more, so now we’re 91/200. I spent a lot of money on this thing and I want to make sure it’s not better than my cap before throwing it away. And more data is always better than less; let me just flip it in batches of 100 to gather more data.
15 batches later… 851/1700 total flips! I knew the tiara was decent. Just slightly better than 50/50, and about 0.3% better than what I’ve done with my ballcap. (50.1% ± 2.4%)
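Out of curiosity, the batch-by-batch approach can be sketched as a simulation, under the assumption that the tiara's coin is perfectly fair: flip in batches of 100 (up to 17 batches, as above) and check after each batch whether the running heads rate has pulled above 50%.

```python
import random

# Simulate many runs of the batch-by-batch approach with a fair coin:
# flip 100 at a time, and note whether the running heads rate exceeds
# 50% at the end of at least one batch (the point where it "looks decent").
random.seed(42)
trials = 2000
looked_decent = 0

for _ in range(trials):
    heads = flips = 0
    for _batch in range(17):                 # up to 17 batches of 100
        heads += sum(random.random() < 0.5 for _ in range(100))
        flips += 100
        if heads / flips > 0.5:
            looked_decent += 1
            break

print(f"{looked_decent / trials:.0%} of fair coins look better than "
      "50/50 after some batch")
```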
Now, I know I’ve got wide error bounds for the results from the two pieces of headwear. I wouldn’t make the claim that the tiara is definitely better than the ballcap. But I’ve got a tournament coming up, and I’ve got to pick one. From my data, the tiara’s done better than the ballcap on average, so even if it’s not “proven” to be better, it’s more likely that the tiara’s better than the ballcap than the opposite, right? In the tournament, I seemed to be proven right. Of 10 flips, 6 were heads. Of course it’s not proof, but it’s a better performance than I would have expected from my old ballcap.
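Of course it's not proof, as I said, and a quick binomial tally (nothing here beyond the flips from the tournament) shows just how unsurprising 6 heads in 10 is, even for a fair coin:

```python
from math import comb

# Chance of 6 or more heads in 10 flips of a fair coin.
p_at_least_6 = sum(comb(10, k) for k in range(6, 11)) / 2**10
print(f"{p_at_least_6:.1%}")   # → 37.7%
```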
I decided I want to be systematic about this, so I visit a hat store. Starting with my tiara as my “control hat,” I’ll flip a coin wearing it and then an “experimental hat,” recording the results. When one proves statistically better than the other, I’ll use that one as the next control and move on to a new hat. If I make it through 1000 flips without finding a winner, I’ll just keep whichever has the higher average rate and move on.
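The procedure can be sketched like this, under the assumption that every hat-and-coin pairing is actually fair: flip both hats, run a two-proportion z-test after every pair of flips, and stop the moment p < 0.05. (The function name `compare_hats` and the n ≥ 30 warm-up before testing are my own choices, not anything from The League's rulebook.)

```python
import math
import random

def compare_hats(max_flips=1000, seed=0):
    """One hat-store comparison: control vs. new hat, both actually fair."""
    rng = random.Random(seed)
    heads = [0, 0]                             # [control hat, new hat]
    for n in range(1, max_flips + 1):
        for hat in (0, 1):
            heads[hat] += rng.random() < 0.5   # every coin is fair
        # Two-proportion z-test, checked after every pair of flips.
        pooled = (heads[0] + heads[1]) / (2 * n)
        se = math.sqrt(pooled * (1 - pooled) * 2 / n)
        if n < 30 or se == 0:
            continue
        z = (heads[1] - heads[0]) / (n * se)
        p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided p-value
        if p_value < 0.05:
            return "new hat wins" if heads[1] > heads[0] else "control wins"
    return "no difference"

# One comparison per hat, 100 hats, fresh seed each time.
results = [compare_hats(seed=s) for s in range(100)]
for outcome in ("new hat wins", "control wins", "no difference"):
    print(outcome, results.count(outcome))
```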
After walking through 100 hats, I ended up with a fetching beret that I’m pretty sure will win me some championships. Here’s a summary of the results:
16 times I found a hat that was statistically better (p<0.05) than the hat I was wearing. Even if each improvement was slight, I’ve got a hat that is 16 steps better than the one I started with.
60 times I tried a hat and found my existing hat was proven better. So 76 times there was a significant difference between hats! Even if 5% of those 100 are false positives, there’s still an overwhelming amount of support that my system works. It’s amazing that The League hasn’t locked down this loophole yet.
Only 24 times out of 100 was there no discernible difference between the effectiveness of the hats. Of those, I switched to the new hat 9 times because it was performing better on average, even if it wasn’t statistically significant.
I plan to repeat this set of experiments continually, leading me to eventually dominate the coin-flipping circuit and become world champion!