Introduction to scipy.stats
The scipy.stats module is SciPy's toolkit for statistical analysis.
It provides probability distributions, hypothesis tests, and summary statistics.
Setting Up
First, import the required modules:
import numpy as np
from scipy import stats
Example 1: Summary Statistics
You can calculate key descriptive statistics such as the mean, median, and mode in a few lines of code. NumPy provides the mean and median, and scipy.stats provides the mode.
data = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11]
mean = np.mean(data)
median = np.median(data)
mode = stats.mode(data, keepdims=True)
print("Mean:", mean)
print("Median:", median)
print("Mode:", mode.mode[0], "Frequency:", mode.count[0])
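For this dataset the mean is 7.2 and the median is 7.0. Both 2 and 7 appear twice; stats.mode reports a single value in that case, here the smaller one, 2, with a frequency of 2.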
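As a complementary sketch, stats.describe returns several descriptive statistics in one call. The dataset is the one from the example above; the variable name summary is only illustrative.

```python
import numpy as np
from scipy import stats

data = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11]

# describe() bundles count, range, mean, variance, skewness, and kurtosis
summary = stats.describe(data)

print("Observations:", summary.nobs)
print("Min and max:", summary.minmax)
print("Mean:", summary.mean)
# The variance uses ddof=1 by default, i.e. the sample variance
print("Sample variance:", summary.variance)
print("Skewness:", summary.skewness)
```

Note that np.var(data) uses ddof=0 by default, so it will not match summary.variance unless ddof=1 is passed.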
Example 2: Hypothesis Testing
You can also use scipy.stats to perform hypothesis testing, such as a one-sample t-test to compare a sample mean against a known value.
# Test if the mean of data is significantly different from 5
t_stat, p_value = stats.ttest_1samp(data, 5)
print("t-statistic:", t_stat)
print("p-value:", p_value)
If the p-value is below 0.05, we reject the null hypothesis that the population mean equals 5 at the 5% significance level. Otherwise, the data do not provide enough evidence that the mean differs from 5.
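To make the decision rule concrete, the sketch below recomputes the t-statistic by hand and compares the p-value with a significance level. The names alpha, popmean, t_manual, and p_manual are illustrative choices, and the 0.05 threshold is the convention used in the text rather than a fixed rule.

```python
import numpy as np
from scipy import stats

data = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11]
popmean = 5    # hypothesized population mean
alpha = 0.05   # significance level (a convention, not a requirement)

t_stat, p_value = stats.ttest_1samp(data, popmean)

# Manual version: t = (sample mean - popmean) / (sample std / sqrt(n))
n = len(data)
sample_mean = np.mean(data)
sample_std = np.std(data, ddof=1)  # ddof=1 gives the sample standard deviation
t_manual = (sample_mean - popmean) / (sample_std / np.sqrt(n))

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

print("t-statistic:", t_stat, "manual:", t_manual)
print("p-value:", p_value, "manual:", p_manual)

if p_value < alpha:
    print("Reject the null hypothesis: the mean differs from", popmean)
else:
    print("Fail to reject the null hypothesis")
```

For this small dataset the p-value is above 0.05, so the test does not reject the null hypothesis even though the sample mean (7.2) is larger than 5.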
Example 3: Probability Distributions
You can also use scipy.stats to work with probability distributions, such as evaluating the probability density function (PDF) of a normal distribution.
x = np.linspace(-3, 3, 100)
pdf = stats.norm.pdf(x, loc=0, scale=1)
print("First 5 PDF values:", pdf[:5])
In this example, we evaluate the PDF of a normal distribution with a mean of 0 and a standard deviation of 1 at 100 evenly spaced points between -3 and 3.
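Beyond the PDF, distribution objects in scipy.stats also expose the cumulative distribution function, its inverse, and random sampling. The following is a minimal sketch using the same standard normal distribution; the variable names and the seed passed to random_state are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

# "Freeze" the distribution so loc and scale are set once
standard_normal = stats.norm(loc=0, scale=1)

# cdf(x) gives P(X <= x)
p_below_1 = standard_normal.cdf(1.0)
within_one_sd = standard_normal.cdf(1.0) - standard_normal.cdf(-1.0)

# ppf is the inverse of cdf: the value below which 97.5% of the mass lies
z_975 = standard_normal.ppf(0.975)

# rvs draws random samples; random_state makes the draw reproducible
samples = standard_normal.rvs(size=1000, random_state=0)

print("P(X <= 1):", p_below_1)
print("P(-1 <= X <= 1):", within_one_sd)
print("97.5th percentile:", z_975)
print("Sample mean and std:", samples.mean(), samples.std(ddof=1))
```

The probability of falling within one standard deviation comes out at roughly 0.68, and the 97.5th percentile at roughly 1.96, the familiar values for a standard normal distribution.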
Key Takeaways
scipy.stats is your go-to toolkit for statistical analysis in Python. It provides powerful methods for:
- Summary statistics – quick descriptive insights
- Hypothesis testing – comparing groups or means
- Probability distributions – modeling and simulation