python anova

Solutions on MaxInterview for python anova by the best coders in the world

Léandre
07 Jan 2019
# load packages
import pandas as pd
import scipy.stats as stats
# d is a pandas DataFrame with one column per treatment (A, B, C, D)
# stats.f_oneway takes the groups as input and returns the F statistic and the p-value
fvalue, pvalue = stats.f_oneway(d['A'], d['B'], d['C'], d['D'])
print(fvalue, pvalue)
# 17.492810457516338 2.639241146210922e-05

# get the ANOVA table as R-like output
import statsmodels.api as sm
from statsmodels.formula.api import ols
# reshape the d DataFrame into long format, as the statsmodels formula API expects
d_melt = pd.melt(d.reset_index(), id_vars=['index'], value_vars=['A', 'B', 'C', 'D'])
# rename the columns
d_melt.columns = ['index', 'treatments', 'value']
# Ordinary Least Squares (OLS) model
model = ols('value ~ C(treatments)', data=d_melt).fit()
anova_table = sm.stats.anova_lm(model, typ=2)
anova_table

# |               | df   | sum_sq  | mean_sq  | F        | PR(>F)   |
# |---------------|------|---------|----------|----------|----------|
# | C(treatments) | 3.0  | 3010.95 | 1003.650 | 17.49281 | 0.000026 |
# | Residual      | 16.0 | 918.00  | 57.375   | NaN      | NaN      |

# note: if the data is balanced (equal sample size for each group), Type 1, 2, and 3 sums of squares
# (typ parameter) give the same results.

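To see how the columns of the ANOVA table relate to each other, here is a minimal sketch that computes the one-way F statistic by hand using only the standard library. The function name `one_way_anova` and the `toy` data are made up for illustration and are not part of SciPy or statsmodels; the last line only re-derives F from the sums of squares and degrees of freedom reported in the table above.

```python
def mean(values):
    return sum(values) / len(values)

def one_way_anova(groups):
    """Return (df_between, df_within, ss_between, ss_within, f_value)."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # between-group sum of squares: group size times the squared distance
    # of each group mean from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # within-group (residual) sum of squares
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    # F is the ratio of the two mean squares (sum_sq / df)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return df_between, df_within, ss_between, ss_within, f_value

# hypothetical toy data, NOT the dataset used above
toy = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
toy_result = one_way_anova(toy)
print(toy_result)

# re-derive F from the table above: mean_sq = sum_sq / df
f_from_table = (3010.95 / 3) / (918.00 / 16)
print(f_from_table)
```

The p-value is left out on purpose: it needs the F distribution, which is what `stats.f_oneway` provides.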
Andrea
31 Sep 2016
# I am using Python 3
# load packages
import pandas as pd
# load data file
d = pd.read_csv("https://reneshbedre.github.io/assets/posts/anova/onewayanova.txt", sep="\t")
# generate a boxplot to see the data distribution by treatment; a boxplot makes it easy to spot
# differences between treatments
d.boxplot(column=['A', 'B', 'C', 'D'], grid=False)
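If a plot is not convenient, the numbers behind a boxplot can be printed instead. This is a rough sketch using only the standard library; the helper `five_number_summary` and the `data` values are hypothetical (they are not the contents of onewayanova.txt), and plotting libraries may compute quartiles and whiskers slightly differently.

```python
from statistics import quantiles

# hypothetical measurements per treatment, NOT the values in onewayanova.txt
data = {
    "A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "B": [2.0, 4.0, 6.0, 8.0, 10.0],
}

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    # method="inclusive" interpolates linearly between the observed values
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

summaries = {name: five_number_summary(vals) for name, vals in data.items()}
for name, summary in summaries.items():
    print(name, summary)
```

Treatments whose boxes (Q1 to Q3) barely overlap are the ones most likely to differ, which is what the ANOVA then tests formally.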