Estimation Culture

[Header image: Butterfly Nebula NGC 6302, captured by the Hubble Wide Field Camera 3 in 2009]

Reason recently published a lovely article by Davidson Heath about the history of statistical significance and estimation. I enjoyed it very much, but I'm afraid that even for people who use these tools in their day-to-day work, the message might be a little too abstract. So I decided an illustrative example might be helpful to readers who enjoyed the article.

The Problem: Is this Inventor a Superstar?

I'm a patent lawyer who works for a large corporation that employs many inventors. Over a period of years, the same inventors may submit multiple ideas, which may turn into multiple patent applications. Each of those patent applications is examined by the United States Patent & Trademark Office, with a certain probability of being granted a patent.

Let's say I wanted to understand whether invention disclosures from particular inventors were more likely to be granted as patents if filed as patent applications. How would the "statistical significance" approach differ from the "estimation" approach to answering that question? Heath's article is compelling, but it stands at some remove from how the work of answering this question would actually be done. So here is how each approach plays out in practice. (The numbers are fake, chosen deliberately to emphasize the point in the article.)

Two Solutions: Statistical Significance vs. Estimation

The Scenario:

  • Company Average: Historically, 50% of invention disclosures filed as applications get granted.
  • The "Star" Inventor: You have an inventor who has filed 10 applications and had 8 granted (80% success rate).

Is this inventor actually better at getting patents granted, or did they just get lucky?

1. The Statistical Significance Approach (The "Fisher" Way)

This approach treats the problem as a yes/no test. It starts with the assumption (Null Hypothesis) that the inventor is average (50% success rate) and asks: "Is the data weird enough to disprove that?"

It calculates a p-value. If the p-value is above an arbitrary cutoff (usually 0.05), it dismisses the result.

from scipy import stats

# DATA
avg_grant_rate = 0.50
inventor_apps = 10
inventor_grants = 8

# METHOD: Binomial Test
# Null Hypothesis: the inventor's true rate is 0.50
# Question: what is the probability of getting 8 (or more) grants out of 10 by pure luck?
# Exactly: (C(10,8) + C(10,9) + C(10,10)) / 2**10 = 56/1024 ≈ 0.0547
# Note: scipy.stats.binom_test was removed in SciPy 1.12; binomtest is the current API
p_value = stats.binomtest(inventor_grants, n=inventor_apps, p=avg_grant_rate, alternative='greater').pvalue

# OUTPUT / DECISION
print(f"P-value: {p_value:.4f}")  # approx 0.0547

if p_value < 0.05:
    print("Result: Statistically Significant. The inventor is better.")
else:
    print("Result: Not Significant. We cannot conclude this inventor is special.")

The Practical Outcome:

Because the p-value is about 0.055 (just barely above 0.05), a strict statistical significance test tells you "Not significant." You might then naively treat this inventor as average, and potentially miss out on high-performing talent, because the sample size (10 applications) was too small to satisfy the strict math of "significance." Many difficult-to-measure real-world scenarios have this property.
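To see how much the verdict depends on sample size rather than skill, here is a quick sketch of my own (not from Heath's article) that runs the same test on the same 80% grant rate at different sample sizes:

from scipy import stats

# Same observed 80% grant rate at different sample sizes.
# Only the amount of data changes; the underlying "skill" does not.
for apps, grants in [(5, 4), (10, 8), (20, 16), (40, 32)]:
    p = stats.binomtest(grants, n=apps, p=0.50, alternative='greater').pvalue
    verdict = "Significant" if p < 0.05 else "Not significant"
    print(f"{grants}/{apps} granted: p = {p:.4f} -> {verdict}")

# Approximate output:
# 4/5 granted: p = 0.1875 -> Not significant
# 8/10 granted: p = 0.0547 -> Not significant
# 16/20 granted: p = 0.0059 -> Significant
# 32/40 granted: p = 0.0001 -> Significant

The same inventor, with the same hit rate, flips from "ignore" to "significant" purely because more data arrived.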


2. The Estimation Approach (The "Gosset" Way)

This approach focuses on magnitude and uncertainty. It asks: "What is our best guess of this inventor's success rate, and what is the range of realistic possibilities?"

Instead of a yes/no, it gives you an estimate and a range of uncertainty that you can use to make a business decision (Cost vs. Benefit).

import statsmodels.stats.proportion as proportion

# DATA
inventor_apps = 10
inventor_grants = 8

# METHOD: Confidence Intervals & Effect Size
# What is the actual observed rate?
observed_rate = inventor_grants / inventor_apps # 0.80 (80%)

# Calculate the 95% Confidence Interval (the range of plausible true rates)
# Loosely: "the true rate is unlikely to fall outside this range"
# The Wilson method behaves better than the normal approximation at small sample sizes
conf_interval = proportion.proportion_confint(inventor_grants, inventor_apps, alpha=0.05, method='wilson')

# OUTPUT / DECISION
print(f"Estimated Success Rate: {observed_rate:.0%}")
print(f"Plausible Range (95% CI): {conf_interval[0]:.0%} to {conf_interval[1]:.0%}")

The Practical Outcome:

The output would look like:

  • Estimated Rate: 80% (30 points higher than average!)
  • Plausible Range: 49% to 94%

The Decision:

Even though the lower end of the range (49%) is technically close to the average (50%), the estimation approach shows you that the "center of gravity" for this inventor is way higher.

  • Fisher says: "Not proven. Ignore."
  • Gosset says: "Our best guess is that they are 30 points better than average. Even in the worst-case scenario, they are roughly average. In the best case, they are a superstar. This is a safe bet to invest in."
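If you want to go beyond an interval to an actual probability distribution over the inventor's true rate, a quick Bayesian sketch makes the "safe bet" concrete. This is my addition, not something Heath's article prescribes, and it assumes a flat Beta(1, 1) prior:

from scipy import stats

# Posterior for the inventor's true grant rate, assuming a flat Beta(1, 1) prior:
# Beta(1 + grants, 1 + rejections) = Beta(9, 3)
posterior = stats.beta(1 + 8, 1 + 2)

# Probability that the inventor's true rate beats the 50% company average
p_better = posterior.sf(0.50)  # survival function = P(true rate > 0.50)

print(f"P(true rate > 50%): {p_better:.1%}")  # approx 96.7%

On those assumptions, there is roughly a 97% chance this inventor genuinely beats the company average, which is exactly the kind of number a yes/no verdict throws away.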

Summary

  • Significance is a gatekeeper that often rejects useful signals (like your 8/10 inventor) just because the sample size is small.
  • Estimation quantifies the signal (80% success) and the noise (the confidence interval), allowing you to make a bet based on the likely return on investment, as sketched below.
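To close the loop on return on investment, here is a minimal sketch of how the estimate feeds a filing decision. The dollar figures (a $15,000 filing cost and a $100,000 value per granted patent) are hypothetical, chosen only for illustration:

# Hypothetical economics -- these dollar figures are illustrative, not real data
filing_cost = 15_000      # cost to prepare and prosecute one application
granted_value = 100_000   # business value of one granted patent

def expected_profit(grant_rate):
    # Expected value of filing one application at a given grant rate
    return grant_rate * granted_value - filing_cost

# Company average vs. the inventor's estimate and the ends of the 95% CI
for label, rate in [("Company average", 0.50),
                    ("Inventor (estimate)", 0.80),
                    ("Inventor (CI low)", 0.49),
                    ("Inventor (CI high)", 0.94)]:
    print(f"{label}: rate {rate:.0%}, expected profit ${expected_profit(rate):,.0f}")

Even at the pessimistic end of the interval ($34,000 expected profit per filing), the inventor's disclosures are roughly as good a bet as the company average ($35,000), and at the estimate ($65,000) they are far better. That is a decision the p-value alone could never support.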
