How to perform ANOVA on a grouped variable in Python

Augustin
21 Jun 2018
# load packages
import scipy.stats as stats

# `d` is assumed to be a pandas DataFrame with one column per group: 'A', 'B', 'C', 'D'
# stats.f_oneway takes the groups as input and returns the F statistic and the p-value
fvalue, pvalue = stats.f_oneway(d['A'], d['B'], d['C'], d['D'])
print(fvalue, pvalue)
# 17.492810457516338 2.639241146210922e-05

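The snippet above does not show the contents of `d`. As a minimal sketch, the example below computes the F statistic by hand from the between-group and within-group sums of squares and compares it with `stats.f_oneway`. The `groups` values and the helper `one_way_anova` are assumptions introduced for illustration; the values were picked so that the result matches the figures quoted above, and they are not taken from the original post.

```python
import numpy as np
from scipy import stats

# Hypothetical sample data (not from the original post):
# four treatment groups with five observations each.
groups = {
    "A": [25, 30, 28, 36, 29],
    "B": [45, 55, 29, 56, 40],
    "C": [30, 29, 33, 37, 27],
    "D": [54, 60, 51, 62, 73],
}


def one_way_anova(samples):
    """Return (F, p) for a one-way ANOVA computed from sums of squares."""
    arrays = [np.asarray(s, dtype=float) for s in samples]
    all_values = np.concatenate(arrays)
    grand_mean = all_values.mean()

    # Between-group sum of squares: spread of the group means around the grand mean.
    ss_between = sum(len(a) * (a.mean() - grand_mean) ** 2 for a in arrays)
    # Within-group sum of squares: spread of the observations around their group mean.
    ss_within = sum(((a - a.mean()) ** 2).sum() for a in arrays)

    df_between = len(arrays) - 1
    df_within = all_values.size - len(arrays)

    f_value = (ss_between / df_between) / (ss_within / df_within)
    # Upper-tail probability of the F distribution gives the p-value.
    p_value = stats.f.sf(f_value, df_between, df_within)
    return f_value, p_value


manual_f, manual_p = one_way_anova(groups.values())
scipy_f, scipy_p = stats.f_oneway(*groups.values())
print(manual_f, manual_p)
print(scipy_f, scipy_p)
```

Both print statements should report the same F statistic and p-value, which shows that `f_oneway` is the ratio of the between-group mean square to the within-group mean square.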
# get an R-like ANOVA table
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# reshape the d DataFrame from wide to long format for the statsmodels formula API
d_melt = pd.melt(d.reset_index(), id_vars=['index'], value_vars=['A', 'B', 'C', 'D'])
# rename the columns
d_melt.columns = ['index', 'treatments', 'value']
# fit an Ordinary Least Squares (OLS) model with treatments as a categorical predictor
model = ols('value ~ C(treatments)', data=d_melt).fit()
anova_table = sm.stats.anova_lm(model, typ=2)
print(anova_table)

|               | df   | sum_sq  | mean_sq  | F        | PR(>F)   |
|---------------|------|---------|----------|----------|----------|
| C(treatments) | 3.0  | 3010.95 | 1003.650 | 17.49281 | 0.000026 |
| Residual      | 16.0 | 918.00  | 57.375   | NaN      | NaN      |

# Note: if the data are balanced (equal sample size in each group), Type 1, 2, and 3 sums of squares
# (the typ parameter) produce the same results.