Skip to content

Latest commit

 

History

History
290 lines (213 loc) · 10.7 KB

File metadata and controls

290 lines (213 loc) · 10.7 KB

Understanding p-Values! A Python Guide for Data Science

Slide 1: What is a p-Value?

The p-value is a statistical measure used to assess the strength of evidence against a null hypothesis in hypothesis testing. It represents the probability of obtaining test results at least as extreme as the observed results, assuming the null hypothesis is true.

import numpy as np
from scipy import stats

# Generate random data
np.random.seed(42)
group1 = np.random.normal(100, 15, 50)
group2 = np.random.normal(105, 15, 50)

# Perform t-test
t_statistic, p_value = stats.ttest_ind(group1, group2)

print(f"T-statistic: {t_statistic:.4f}")
print(f"p-value: {p_value:.4f}")

Slide 2: Interpreting p-Values

A smaller p-value suggests stronger evidence against the null hypothesis. Typically, a p-value below a predetermined significance level (often 0.05) is considered statistically significant, leading to the rejection of the null hypothesis.

def interpret_p_value(p_value, alpha=0.05):
    if p_value <= alpha:
        return f"p-value ({p_value:.4f}) <= {alpha}, reject null hypothesis"
    else:
        return f"p-value ({p_value:.4f}) > {alpha}, fail to reject null hypothesis"

print(interpret_p_value(0.03))
print(interpret_p_value(0.07))

Slide 3: Calculating p-Values

p-Values are calculated using test statistics and their corresponding probability distributions. The choice of test statistic depends on the hypothesis test being performed.

import scipy.stats as stats

# Z-test example
sample_mean = 52
population_mean = 50
sample_size = 100
population_std = 10

z_score = (sample_mean - population_mean) / (population_std / np.sqrt(sample_size))
p_value = 2 * (1 - stats.norm.cdf(abs(z_score)))

print(f"Z-score: {z_score:.4f}")
print(f"p-value: {p_value:.4f}")

Slide 4: p-Values and Effect Size

While p-values indicate statistical significance, they don't provide information about the magnitude of the effect. Effect size measures, such as Cohen's d, complement p-values by quantifying the strength of the relationship between variables.

def cohens_d(group1, group2):
    n1, n2 = len(group1), len(group2)
    var1, var2 = np.var(group1, ddof=1), np.var(group2, ddof=1)
    pooled_std = np.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    return (np.mean(group1) - np.mean(group2)) / pooled_std

effect_size = cohens_d(group1, group2)
print(f"Cohen's d: {effect_size:.4f}")

Slide 5: Common Misconceptions about p-Values

p-Values are often misinterpreted. They do not represent the probability that the null hypothesis is true, nor do they indicate the probability of replication. p-Values simply quantify the compatibility between the data and the null hypothesis.

import matplotlib.pyplot as plt

def plot_p_value_distribution(n_simulations=10000):
    p_values = [stats.ttest_ind(np.random.normal(0, 1, 30), 
                                np.random.normal(0, 1, 30))[1] 
                for _ in range(n_simulations)]
    
    plt.hist(p_values, bins=50, edgecolor='black')
    plt.title('Distribution of p-values under the null hypothesis')
    plt.xlabel('p-value')
    plt.ylabel('Frequency')
    plt.show()

plot_p_value_distribution()

Slide 6: p-Values and Sample Size

As sample size increases, the likelihood of obtaining a statistically significant result (small p-value) also increases, even for small effect sizes. This highlights the importance of considering practical significance alongside statistical significance.

def simulate_p_values(effect_size, sample_sizes, n_simulations=1000):
    results = []
    for n in sample_sizes:
        significant_count = 0
        for _ in range(n_simulations):
            group1 = np.random.normal(0, 1, n)
            group2 = np.random.normal(effect_size, 1, n)
            _, p_value = stats.ttest_ind(group1, group2)
            if p_value < 0.05:
                significant_count += 1
        results.append(significant_count / n_simulations)
    return results

sample_sizes = [20, 50, 100, 200, 500, 1000]
small_effect = simulate_p_values(0.2, sample_sizes)
large_effect = simulate_p_values(0.8, sample_sizes)

plt.plot(sample_sizes, small_effect, label='Small effect (d=0.2)')
plt.plot(sample_sizes, large_effect, label='Large effect (d=0.8)')
plt.xlabel('Sample Size')
plt.ylabel('Proportion of Significant Results')
plt.legend()
plt.title('Effect of Sample Size on Statistical Significance')
plt.show()

Slide 7: Multiple Comparisons Problem

When performing multiple hypothesis tests, the probability of obtaining at least one false positive result increases. This is known as the multiple comparisons problem, and various correction methods exist to address it.

from statsmodels.stats.multitest import multipletests

# Simulate 20 p-values
np.random.seed(42)
p_values = np.random.uniform(0, 1, 20)

# Apply Bonferroni correction
bonferroni_results = multipletests(p_values, method='bonferroni')

# Apply Benjamini-Hochberg (FDR) correction
fdr_results = multipletests(p_values, method='fdr_bh')

print("Original p-values:", p_values)
print("Bonferroni-corrected:", bonferroni_results[1])
print("FDR-corrected:", fdr_results[1])

Slide 8: p-Values in Machine Learning

In machine learning, p-values can be used for feature selection, model comparison, and assessing the significance of model coefficients. However, their interpretation should be cautious due to the often large datasets and multiple testing scenarios.

from sklearn.feature_selection import f_regression
from sklearn.datasets import make_regression

# Generate synthetic data
X, y = make_regression(n_samples=100, n_features=20, noise=0.1, random_state=42)

# Perform F-test for feature selection
f_statistic, p_values = f_regression(X, y)

# Display features and their corresponding p-values
for i, (f, p) in enumerate(zip(f_statistic, p_values)):
    print(f"Feature {i+1}: F-statistic = {f:.4f}, p-value = {p:.4f}")

Slide 9: Alternatives to p-Values

While p-values are widely used, alternative approaches like confidence intervals, Bayesian methods, and effect sizes can provide more informative and robust statistical inferences.

def bayes_factor(t_statistic, n1, n2):
    import math
    from scipy.special import gamma

    df = n1 + n2 - 2
    r = 0.5  # Prior scale

    bf10 = math.exp(df/2 * math.log(1 + t_statistic**2/df) +
                    math.log(r) - 0.5*math.log(math.pi*df) +
                    math.lgamma((df+1)/2) - math.lgamma(df/2) -
                    (df+1)/2 * math.log(1 + r**2*t_statistic**2/df))
    
    return bf10

t_statistic = 2.5
n1, n2 = 30, 30
bf = bayes_factor(t_statistic, n1, n2)
print(f"Bayes Factor (BF10): {bf:.4f}")

Slide 10: p-Values in Reproducibility Crisis

The overreliance on p-values has contributed to the reproducibility crisis in science. p-Hacking, publication bias, and misinterpretation of p-values have led to false positive results and non-replicable findings.

def simulate_p_hacking(n_experiments=20, n_samples=30):
    significant_results = []
    for _ in range(n_experiments):
        group1 = np.random.normal(0, 1, n_samples)
        group2 = np.random.normal(0, 1, n_samples)
        _, p_value = stats.ttest_ind(group1, group2)
        if p_value < 0.05:
            significant_results.append(p_value)
            break
    return len(significant_results) > 0

n_simulations = 10000
p_hacked_results = sum(simulate_p_hacking() for _ in range(n_simulations))
print(f"Proportion of 'significant' results: {p_hacked_results/n_simulations:.4f}")

Slide 11: Best Practices for Using p-Values

To use p-values responsibly, consider preregistering studies, reporting effect sizes, using confidence intervals, and being transparent about all analyses performed. Avoid dichotomous thinking based solely on statistical significance.

import seaborn as sns

# Simulate data
np.random.seed(42)
x = np.random.normal(0, 1, 100)
y = 0.5 * x + np.random.normal(0, 1, 100)

# Calculate correlation and p-value
r, p = stats.pearsonr(x, y)

# Create scatter plot with regression line
plt.figure(figsize=(10, 6))
sns.regplot(x=x, y=y, ci=95)
plt.title(f"Correlation: r = {r:.4f}, p-value = {p:.4f}")
plt.xlabel("X")
plt.ylabel("Y")
plt.show()

Slide 12: Real-Life Example: A/B Testing

A/B testing is a common application of hypothesis testing and p-values in product development. Consider an online educational platform testing two different layouts for their course page.

import numpy as np
from scipy import stats

# Simulate click-through rates for two layouts
np.random.seed(42)
layout_a = np.random.binomial(1, 0.12, 1000)  # 12% CTR
layout_b = np.random.binomial(1, 0.15, 1000)  # 15% CTR

# Perform chi-square test
contingency_table = np.array([[sum(layout_a), len(layout_a) - sum(layout_a)],
                              [sum(layout_b), len(layout_b) - sum(layout_b)]])
chi2, p_value, _, _ = stats.chi2_contingency(contingency_table)

print(f"Chi-square statistic: {chi2:.4f}")
print(f"p-value: {p_value:.4f}")
print(f"Layout A CTR: {sum(layout_a)/len(layout_a):.4f}")
print(f"Layout B CTR: {sum(layout_b)/len(layout_b):.4f}")

Slide 13: Real-Life Example: Environmental Science

Researchers studying the impact of a new waste management system on local air quality can use p-values to determine if there's a significant difference in pollution levels before and after implementation.

import numpy as np
from scipy import stats

# Simulate air quality index (AQI) before and after implementation
np.random.seed(42)
aqi_before = np.random.normal(100, 15, 50)  # Mean AQI of 100
aqi_after = np.random.normal(90, 15, 50)   # Mean AQI of 90

# Perform paired t-test
t_statistic, p_value = stats.ttest_rel(aqi_before, aqi_after)

print(f"T-statistic: {t_statistic:.4f}")
print(f"p-value: {p_value:.4f}")
print(f"Mean AQI before: {np.mean(aqi_before):.2f}")
print(f"Mean AQI after: {np.mean(aqi_after):.2f}")

Slide 14: Additional Resources

For those interested in deepening their understanding of p-values and statistical inference, the following resources are recommended:

  1. Wasserstein, R. L., & Lazar, N. A. (2016). The ASA Statement on p-Values: Context, Process, and Purpose. The American Statistician, 70(2), 129-133. ArXiv: https://arxiv.org/abs/1603.00505
  2. Greenland, S., et al. (2016). Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations. European Journal of Epidemiology, 31(4), 337-350. ArXiv: https://arxiv.org/abs/1603.07532
  3. Ioannidis, J. P. A. (2005). Why Most Published Research Findings Are False. PLoS Medicine, 2(8), e124. ArXiv: https://arxiv.org/abs/1301.3718

These papers provide in-depth discussions on the proper interpretation and use of p-values in scientific research.