Statistics
(ns stats
  (:require [fastmath.stats :as stats]
            [fastmath.dev.codox :as codox]))
Reference
fastmath.stats
Statistics functions.
- Descriptive statistics.
- Correlation / covariance
- Outliers
- Confidence intervals
- Extents
- Effect size
- Tests
- Histogram
- ACF/PACF
- Bootstrap (see fastmath.stats.bootstrap)
- Binary measures
Functions are backed by Apache Commons Math or SMILE libraries. All work with Clojure sequences.
##### Descriptive statistics
All in one function stats-map contains:
- :Size - size of the samples, (count ...)
- :Min - minimum value
- :Max - maximum value
- :Range - range of values
- :Mean - mean/average
- :Median - median, see also: median-3
- :Mode - mode, see also: modes
- :Q1 - first quartile, use: percentile, quartile
- :Q3 - third quartile, use: percentile, quartile
- :Total - sum of all samples
- :SD - sample standard deviation
- :Variance - variance
- :MAD - median-absolute-deviation
- :SEM - standard error of mean
- :LAV - lower adjacent value, use: adjacent-values
- :UAV - upper adjacent value, use: adjacent-values
- :IQR - interquartile range, (- q3 q1)
- :LOF - lower outer fence, (- q1 (* 3.0 iqr))
- :UOF - upper outer fence, (+ q3 (* 3.0 iqr))
- :LIF - lower inner fence, (- q1 (* 1.5 iqr))
- :UIF - upper inner fence, (+ q3 (* 1.5 iqr))
- :Outliers - list of outliers, samples which are outside outer fences
- :Kurtosis - kurtosis
- :Skewness - skewness
Note: percentile and quartile can have 10 different interpolation strategies. See docs
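The IQR and fence formulas used by stats-map can be sketched in plain Clojure. This is a minimal illustration rather than the library's implementation: the helper name `fences` is hypothetical, and the quartiles are assumed to be computed beforehand (for example with percentile).

```clojure
;; Hypothetical helper: derive IQR and fences from precomputed quartiles.
(defn fences
  [q1 q3]
  (let [iqr (- q3 q1)]
    {:IQR iqr
     :LIF (- q1 (* 1.5 iqr))    ; lower inner fence
     :UIF (+ q3 (* 1.5 iqr))    ; upper inner fence
     :LOF (- q1 (* 3.0 iqr))    ; lower outer fence
     :UOF (+ q3 (* 3.0 iqr))})) ; upper outer fence

(fences 2.0 6.0)
;; => {:IQR 4.0, :LIF -4.0, :UIF 12.0, :LOF -10.0, :UOF 18.0}
```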
->confusion-matrix
(->confusion-matrix tp fn fp tn)
(->confusion-matrix confusion-matrix)
(->confusion-matrix actual prediction)
(->confusion-matrix actual prediction encode-true)
Convert input to confusion matrix
L0
Count equal values in both seqs. Same as count=
L1
(L1 [vs1 vs2-or-val])
(L1 vs1 vs2-or-val)
Manhattan distance
L2
(L2 [vs1 vs2-or-val])
(L2 vs1 vs2-or-val)
Euclidean distance
L2sq
(L2sq [vs1 vs2-or-val])
(L2sq vs1 vs2-or-val)
Squared euclidean distance
LInf
(LInf [vs1 vs2-or-val])
(LInf vs1 vs2-or-val)
Chebyshev distance
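The four distances L1, L2, L2sq and LInf can be sketched with clojure.core alone. The function names below are hypothetical, and the sketch only covers two sequences of equal length; it does not cover the vs2-or-val form, where the second argument is a single value.

```clojure
;; Absolute element-wise differences between two equal-length seqs.
(defn abs-diffs [xs ys]
  (map (fn [x y] (Math/abs (double (- x y)))) xs ys))

(defn manhattan [xs ys] (reduce + (abs-diffs xs ys)))                    ; L1
(defn sq-euclidean [xs ys] (reduce + (map #(* % %) (abs-diffs xs ys))))  ; L2sq
(defn euclidean [xs ys] (Math/sqrt (sq-euclidean xs ys)))                ; L2
(defn chebyshev [xs ys] (reduce max (abs-diffs xs ys)))                  ; LInf

(manhattan [0 0] [3 4])    ;; => 7.0
(sq-euclidean [0 0] [3 4]) ;; => 25.0
(euclidean [0 0] [3 4])    ;; => 5.0
(chebyshev [0 0] [3 4])    ;; => 4.0
```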
acf
(acf data)
(acf data lags)
Calculate acf (autocorrelation function) for given number of lags or a list of lags.
If lags is omitted function returns maximum possible number of lags.
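For intuition, the sample autocorrelation at a single lag can be sketched as below. The helper `autocorrelation` is hypothetical and uses the common textbook normalization (lagged products of deviations divided by the sum of squared deviations); the library may normalize differently.

```clojure
;; Hypothetical helper: sample autocorrelation of data at lag k.
(defn autocorrelation
  [data k]
  (let [n     (count data)
        m     (/ (reduce + data) (double n))
        devs  (mapv #(- % m) data)
        denom (reduce + (map #(* % %) devs))
        ;; pair each deviation with the one k steps later
        numer (reduce + (map * devs (drop k devs)))]
    (/ numer denom)))

(autocorrelation [1 2 3 4 5] 0) ;; => 1.0
(autocorrelation [1 2 3 4 5] 1) ;; => 0.4
```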
acf-ci
(acf-ci data)
(acf-ci data lags)
(acf-ci data lags alpha)
acf with added confidence interval data.
:cis contains list of calculated ci for every lag.
ad-test-one-sample
(ad-test-one-sample xs)
(ad-test-one-sample xs distribution-or-ys)
(ad-test-one-sample xs distribution-or-ys {:keys [sides kernel bandwidth], :or {sides :one-sided-greater, kernel :gaussian}})
Anderson-Darling test
adjacent-values
(adjacent-values vs)
(adjacent-values vs estimation-strategy)
(adjacent-values vs q1 q3 m)
Lower and upper adjacent values (LAV and UAV).
Let Q1 be the 25th percentile and Q3 the 75th percentile. IQR is (- Q3 Q1).
- LAV is the smallest value which is greater than or equal to the LIF = (- Q1 (* 1.5 IQR)).
- UAV is the largest value which is lower than or equal to the UIF = (+ Q3 (* 1.5 IQR)).
- third value is a median of samples
Optional estimation-strategy argument can be set to change quantile calculations estimation type. See estimation-strategies.
ameasure
(ameasure [group1 group2])
(ameasure group1 group2)
Vargha-Delaney A measure for two populations a and b
binary-measures
(binary-measures tp fn fp tn)
(binary-measures confusion-matrix)
(binary-measures actual prediction)
(binary-measures actual prediction true-value)
Subset of binary measures. See binary-measures-all.
Following keys are returned: [:tp :tn :fp :fn :accuracy :fdr :f-measure :fall-out :precision :recall :sensitivity :specificity :prevalence]
binary-measures-all
(binary-measures-all tp fn fp tn)
(binary-measures-all confusion-matrix)
(binary-measures-all actual prediction)
(binary-measures-all actual prediction true-value)
Collection of binary measures.
Arguments:
- confusion-matrix - either map or sequence with [:tp :fn :fp :tn] values
or
- actual - list of ground truth values
- prediction - list of predicted values
- true-value - optional, true/false encoding, what is true in truth and prediction
true-value can be one of:
- nil - values are treated as booleans
- any sequence - values from sequence will be treated as true
- map - conversion will be done according to provided map (if there is no corresponding key, value is treated as false)
- any predicate
https://en.wikipedia.org/wiki/Precision_and_recall
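A few of the measures can be derived directly from the four confusion-matrix counts. The sketch below is an illustration only: `basic-measures` is a hypothetical helper, it uses the standard definitions from the page linked above, and it does not guard against division by zero.

```clojure
;; Hypothetical helper: selected measures from confusion-matrix counts.
;; Arguments follow the [tp fn fp tn] order; fn is renamed to fneg here
;; to avoid shadowing clojure.core/fn.
(defn basic-measures
  [tp fneg fp tn]
  (let [total (+ tp fneg fp tn)]
    {:accuracy    (/ (double (+ tp tn)) total)
     :precision   (/ (double tp) (+ tp fp))
     :recall      (/ (double tp) (+ tp fneg))   ; also called sensitivity
     :specificity (/ (double tn) (+ tn fp))}))

(basic-measures 8 2 4 6)
;; => {:accuracy 0.7, :precision 0.6666666666666666, :recall 0.8, :specificity 0.6}
```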
binomial-ci
(binomial-ci number-of-successes number-of-trials)
(binomial-ci number-of-successes number-of-trials method)
(binomial-ci number-of-successes number-of-trials method alpha)
Return confidence interval for a binomial distribution.
Possible methods are:
- :asymptotic (normal approximation, based on central limit theorem), default
- :agresti-coull
- :clopper-pearson
- :wilson
- :prop.test - one sample proportion test
- :cloglog
- :logit
- :probit
- :arcsine
- :all - apply all methods and return a map of triplets
Default alpha is 0.05
Returns a triple [lower ci, upper ci, p=successes/trials]
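As an illustration of the normal approximation idea behind :asymptotic, the sketch below computes a Wald interval and returns it in the same triple shape. It is written under assumptions: `wald-interval` is a hypothetical name, the z value is hard-coded for alpha = 0.05, and the library's result may differ in detail.

```clojure
;; Hypothetical helper: normal-approximation (Wald) interval for a proportion.
;; z is the standard normal quantile for 1 - alpha/2; the constant below
;; corresponds to alpha = 0.05 and keeps the sketch dependency-free.
(defn wald-interval
  [successes trials]
  (let [z          1.959963984540054
        p          (/ (double successes) trials)
        half-width (* z (Math/sqrt (/ (* p (- 1.0 p)) trials)))]
    [(- p half-width) (+ p half-width) p]))

(wald-interval 50 100)
;; roughly [0.402 0.598 0.5]
```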
binomial-ci-methods
binomial-test
(binomial-test xs)
(binomial-test xs maybe-params)
(binomial-test number-of-successes number-of-trials {:keys [alpha p ci-method sides], :or {alpha 0.05, p 0.5, ci-method :asymptotic, sides :two-sided}})
Binomial test
- alpha - significance level (default: 0.05)
- sides - one of: :two-sided (default), :one-sided-less (short: :one-sided) or :one-sided-greater
- ci-method - see binomial-ci-methods
- p - tested probability
bootstrap DEPRECATED
Deprecated: Please use fastmath.stats.bootstrap/bootstrap instead
(bootstrap vs)
(bootstrap vs samples)
(bootstrap vs samples size)
Generate set of samples of given size from provided data.
Default samples is 200, size defaults to sample size.
bootstrap-ci DEPRECATED
Deprecated: Please use fastmath.stats.bootstrap/ci-basic instead
(bootstrap-ci vs)
(bootstrap-ci vs alpha)
(bootstrap-ci vs alpha samples)
(bootstrap-ci vs alpha samples stat-fn)
Bootstrap method to calculate confidence interval.
Alpha defaults to 0.98, samples to 1000. Last parameter is statistical function used to measure, default: mean.
Returns ci and statistical function value.
brown-forsythe-test
(brown-forsythe-test xss)
(brown-forsythe-test xss params)
chisq-test
(chisq-test contingency-table-or-xs)
(chisq-test contingency-table-or-xs params)
Chi square test, a power divergence test for lambda 1.0
Power divergence test.
First argument should be one of:
- contingency table
- sequence of counts (for goodness of fit)
- sequence of data (for goodness of fit against distribution)
For goodness of fit there are two options:
- comparison of observed counts vs expected probabilities or weights (:p)
- comparison of data against given distribution (:p), in this case histogram from data is created and compared to distribution PDF in bins ranges. Use :bins option to control histogram creation.
Options are:
- :lambda - test type:
  - 1.0 - chisq-test
  - 0.0 - multinomial-likelihood-ratio-test
  - -1.0 - minimum-discrimination-information-test
  - -2.0 - neyman-modified-chisq-test
  - -0.5 - freeman-tukey-test
  - 2/3 - cressie-read-test - default
- :p - probabilities, weights or distribution object.
- :alpha - significance level (default: 0.05)
- :ci-sides - confidence interval sides (default: :two-sided)
- :sides - p-value sides (:two-sided, :one-sided-greater - default, :one-sided-less)
- :bootstrap-samples - number of samples to estimate confidence intervals (default: 1000)
- :ddof - delta degrees of freedom, adjustment for dof (default: 0.0)
- :bins - number of bins or estimator name for histogram
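To show how lambda selects the test, here is a sketch of the Cressie-Read power divergence statistic for observed and expected counts, next to Pearson's chi-square statistic. Both helper names are hypothetical, and the sketch is only valid for lambda other than 0 and -1, where the statistic is defined by a limit instead. It computes the statistic only, not the p-value or confidence interval.

```clojure
;; Hypothetical helper: 2/(lambda(lambda+1)) * sum(O * ((O/E)^lambda - 1)).
(defn power-divergence-stat
  [lambda observed expected]
  (let [terms (map (fn [o e]
                     (* o (- (Math/pow (/ (double o) e) lambda) 1.0)))
                   observed expected)]
    (* (/ 2.0 (* lambda (+ lambda 1.0)))
       (reduce + terms))))

;; Pearson's chi-square statistic, sum((O - E)^2 / E), for comparison.
(defn pearson-chisq
  [observed expected]
  (reduce + (map (fn [o e] (/ (* (- o e) (- o e)) (double e)))
                 observed expected)))

(power-divergence-stat 1.0 [10 20 30] [20 20 20]) ;; => 10.0
(pearson-chisq [10 20 30] [20 20 20])             ;; => 10.0
```

With lambda set to 1.0 the two statistics agree, which is why chisq-test is described as the lambda 1.0 case.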
ci
(ci vs)
(ci vs alpha)
T-student based confidence interval for given data. Alpha value defaults to 0.05.
Last value is mean.
cliffs-delta
(cliffs-delta [group1 group2])
(cliffs-delta group1 group2)
Cliff’s delta effect size for ordinal data.
coefficient-matrix
(coefficient-matrix vss)
(coefficient-matrix vss measure-fn)
(coefficient-matrix vss measure-fn symmetric?)
Generate coefficient (correlation, covariance, any two arg function) matrix from seq of seqs. Row order.
Default method: pearson-correlation
cohens-d
(cohens-d [group1 group2])
(cohens-d group1 group2)
(cohens-d group1 group2 method)
Cohen’s d effect size for two groups
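A minimal sketch of Cohen's d, assuming the pooled standard deviation is the unbiased one (the optional method argument may select a different pooling in the library). All helper names here are hypothetical.

```clojure
(defn avg [xs] (/ (reduce + xs) (double (count xs))))

;; Sample variance with the n-1 denominator.
(defn sample-variance [xs]
  (let [m (avg xs)]
    (/ (reduce + (map #(let [d (- % m)] (* d d)) xs))
       (dec (count xs)))))

;; Hypothetical helper: difference of means divided by pooled stddev.
(defn cohens-d-sketch
  [g1 g2]
  (let [n1 (count g1)
        n2 (count g2)
        pooled-var (/ (+ (* (dec n1) (sample-variance g1))
                         (* (dec n2) (sample-variance g2)))
                      (+ n1 n2 -2))]
    (/ (- (avg g1) (avg g2)) (Math/sqrt pooled-var))))

(cohens-d-sketch [1 2 3] [2 3 4]) ;; => -1.0
```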
cohens-d-corrected
(cohens-d-corrected [group1 group2])
(cohens-d-corrected group1 group2)
(cohens-d-corrected group1 group2 method)
Cohen’s d corrected for small group size
cohens-f
(cohens-f [group1 group2])
(cohens-f group1 group2)
(cohens-f group1 group2 type)
Cohens f, sqrt of Cohens f2.
Possible type values are: :eta (default), :omega and :epsilon.
cohens-f2
(cohens-f2 [group1 group2])
(cohens-f2 group1 group2)
(cohens-f2 group1 group2 type)
Cohens f2, by default based on eta-sq.
Possible type values are: :eta (default), :omega and :epsilon.
cohens-kappa
(cohens-kappa group1 group2)
(cohens-kappa contingency-table)
Cohens kappa
cohens-q
(cohens-q r1 r2)
(cohens-q group1 group2a group2b)
(cohens-q group1a group2a group1b group2b)
Comparison of two correlations.
Arity:
- 2 - compare two correlation values
- 3 - compare correlation of group1 and group2a with correlation of group1 and group2b
- 4 - compare correlation of first two arguments with correlation of last two arguments
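For the two-value arity, Cohen's q is conventionally the difference between the Fisher z-transformed correlations. The sketch below follows that textbook definition; the helper names are hypothetical and the library's implementation is not shown in this reference.

```clojure
;; Fisher z-transformation (atanh), written with Math/log because
;; java.lang.Math has no atanh.
(defn fisher-z [r]
  (* 0.5 (Math/log (/ (+ 1.0 r) (- 1.0 r)))))

;; Hypothetical helper: Cohen's q for two correlation coefficients.
(defn cohens-q-sketch [r1 r2]
  (- (fisher-z r1) (fisher-z r2)))

(cohens-q-sketch 0.5 0.5) ;; => 0.0
(cohens-q-sketch 0.5 0.0) ;; roughly 0.549
```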
cohens-u2
(cohens-u2 [group1 group2])
(cohens-u2 group1 group2)
(cohens-u2 group1 group2 estimation-strategy)
Cohen’s U2, the proportion of one of the groups that exceeds the same proportion in the other group.
cohens-u3
(cohens-u3 [group1 group2])
(cohens-u3 group1 group2)
(cohens-u3 group1 group2 estimation-strategy)
Cohen’s U3, the proportion of the second group that is smaller than the median of the first group.
cohens-w
(cohens-w group1 group2)
(cohens-w contingency-table)
Cohen’s W effect size for discrete data.
contingency-2x2-measures
(contingency-2x2-measures & args)
contingency-2x2-measures-all
(contingency-2x2-measures-all a b c d)
(contingency-2x2-measures-all map-or-seq)
(contingency-2x2-measures-all [a b] [c d])
contingency-table
(contingency-table & seqs)
Returns frequencies map of tuples built from seqs.
contingency-table->marginals
(contingency-table->marginals ct)
correlation
(correlation [vs1 vs2])
(correlation vs1 vs2)
Correlation of two sequences.
correlation-matrix
(correlation-matrix vss)
(correlation-matrix vss measure)
Generate correlation matrix from seq of seqs. Row order.
Possible measures: :pearson (default), :kendall, :spearman.
count=
(count= [vs1 vs2-or-val])
(count= vs1 vs2-or-val)
Count equal values in both seqs. Same as L0
covariance
(covariance [vs1 vs2])
(covariance vs1 vs2)
Covariance of two sequences.
covariance-matrix
(covariance-matrix vss)
Generate covariance matrix from seq of seqs. Row order.
cramers-c
(cramers-c group1 group2)
(cramers-c contingency-table)
Cramer’s C effect size for discrete data.
cramers-v
(cramers-v group1 group2)
(cramers-v contingency-table)
Cramer’s V effect size for discrete data.
cramers-v-corrected
(cramers-v-corrected group1 group2)
(cramers-v-corrected contingency-table)
Corrected Cramer’s V
cressie-read-test
(cressie-read-test contingency-table-or-xs)
(cressie-read-test contingency-table-or-xs params)
Cressie-Read test, a power divergence test for lambda 2/3
Power divergence test.
First argument should be one of:
- contingency table
- sequence of counts (for goodness of fit)
- sequence of data (for goodness of fit against distribution)
For goodness of fit there are two options:
- comparison of observed counts vs expected probabilities or weights (:p)
- comparison of data against given distribution (:p), in this case histogram from data is created and compared to distribution PDF in bins ranges. Use :bins option to control histogram creation.
Options are:
- :lambda - test type:
  - 1.0 - chisq-test
  - 0.0 - multinomial-likelihood-ratio-test
  - -1.0 - minimum-discrimination-information-test
  - -2.0 - neyman-modified-chisq-test
  - -0.5 - freeman-tukey-test
  - 2/3 - cressie-read-test - default
- :p - probabilities, weights or distribution object.
- :alpha - significance level (default: 0.05)
- :ci-sides - confidence interval sides (default: :two-sided)
- :sides - p-value sides (:two-sided, :one-sided-greater - default, :one-sided-less)
- :bootstrap-samples - number of samples to estimate confidence intervals (default: 1000)
- :ddof - delta degrees of freedom, adjustment for dof (default: 0.0)
- :bins - number of bins or estimator name for histogram
demean
(demean vs)
Subtract mean from sequence
dissimilarity
(dissimilarity method P-observed Q-expected)
(dissimilarity method P-observed Q-expected {:keys [bins probabilities? epsilon log-base power remove-zeros?], :or {probabilities? true, epsilon 1.0E-6, log-base m/E, power 2.0}})
Various PDF distances between two histograms (frequencies) or probabilities.
Q can be a distribution object. Then, histogram will be created out of P.
Arguments:
- method - distance method
- P-observed - frequencies, probabilities or actual data (when Q is a distribution or :bins is set)
- Q-expected - frequencies, probabilities or distribution object (when P is a data or :bins is set)
Options:
- :probabilities? - should P/Q be converted to a probabilities, default: true.
- :epsilon - small number which replaces 0.0 when division or logarithm is used
- :log-base - base for logarithms, default: e
- :power - exponent for :minkowski distance, default: 2.0
- :bins - number of bins or bins estimation method, see histogram.
The list of methods: :euclidean, :city-block, :manhattan, :chebyshev, :minkowski, :sorensen, :gower, :soergel, :kulczynski, :canberra, :lorentzian, :non-intersection, :wave-hedges, :czekanowski, :motyka, :tanimoto, :jaccard, :dice, :bhattacharyya, :hellinger, :matusita, :squared-chord, :euclidean-sq, :squared-euclidean, :pearson-chisq, :chisq, :neyman-chisq, :squared-chisq, :symmetric-chisq, :divergence, :clark, :additive-symmetric-chisq, :kullback-leibler, :jeffreys, :k-divergence, :topsoe, :jensen-shannon, :jensen-difference, :taneja, :kumar-johnson, :avg
See more: Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions by Sung-Hyuk Cha
durbin-watson
(durbin-watson rs)
Lag-1 Autocorrelation test for residuals
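The Durbin-Watson statistic is the sum of squared successive differences of the residuals divided by the sum of squared residuals. A minimal sketch, with a hypothetical helper name, that computes the statistic only:

```clojure
;; Hypothetical helper: Durbin-Watson statistic for a seq of residuals.
(defn durbin-watson-sketch
  [residuals]
  (let [diffs  (map - (rest residuals) residuals)
        sum-sq (fn [xs] (reduce + (map #(* % %) xs)))]
    (/ (double (sum-sq diffs)) (sum-sq residuals))))

(durbin-watson-sketch [1 -1 1 -1]) ;; => 3.0
```

Values close to 2 indicate little lag-1 autocorrelation; the alternating residuals above are negatively autocorrelated, so the statistic is above 2.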
epsilon-sq
(epsilon-sq [group1 group2])
(epsilon-sq group1 group2)
Less biased R2
estimate-bins
(estimate-bins vs)
(estimate-bins vs bins-or-estimate-method)
Estimate number of bins for histogram.
Possible methods are: :sqrt, :sturges, :rice, :doane, :scott, :freedman-diaconis (default).
The number returned is not higher than number of samples.
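Three of the simpler rules depend only on the number of samples n. The sketch below uses the textbook forms of the square-root, Sturges and Rice rules; the function names are hypothetical, and the library's versions may round or clamp differently (for example to the number of samples, as noted above).

```clojure
;; Textbook bin-count rules for n samples.
(defn sqrt-bins [n]
  (long (Math/ceil (Math/sqrt n))))

(defn sturges-bins [n]
  ;; ceil(log2 n) + 1
  (long (inc (Math/ceil (/ (Math/log n) (Math/log 2.0))))))

(defn rice-bins [n]
  ;; ceil(2 * n^(1/3))
  (long (Math/ceil (* 2.0 (Math/cbrt n)))))

(sqrt-bins 100)    ;; => 10
(sturges-bins 100) ;; => 8
(rice-bins 100)    ;; => 10
```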
estimation-strategies-list
List of estimation strategies for percentile/quantile functions.
eta-sq
(eta-sq [group1 group2])
(eta-sq group1 group2)
R2, coefficient of determination
extent
(extent vs)
Return extent (min, max, mean) values from sequence
f-test
(f-test xs ys)
(f-test xs ys {:keys [sides alpha], :or {sides :two-sided, alpha 0.05}})
Variance F-test of two samples.
- alpha - significance level (default: 0.05)
- sides - one of: :two-sided (default), :one-sided-less (short: :one-sided) or :one-sided-greater
fligner-killeen-test
(fligner-killeen-test xss)
(fligner-killeen-test xss {:keys [sides], :or {sides :one-sided-greater}})
freeman-tukey-test
(freeman-tukey-test contingency-table-or-xs)
(freeman-tukey-test contingency-table-or-xs params)
Freeman-Tukey test, a power divergence test for lambda -0.5
Power divergence test.
First argument should be one of:
- contingency table
- sequence of counts (for goodness of fit)
- sequence of data (for goodness of fit against distribution)
For goodness of fit there are two options:
- comparison of observed counts vs expected probabilities or weights (:p)
- comparison of data against given distribution (:p), in this case histogram from data is created and compared to distribution PDF in bins ranges. Use :bins option to control histogram creation.
Options are:
- :lambda - test type:
  - 1.0 - chisq-test
  - 0.0 - multinomial-likelihood-ratio-test
  - -1.0 - minimum-discrimination-information-test
  - -2.0 - neyman-modified-chisq-test
  - -0.5 - freeman-tukey-test
  - 2/3 - cressie-read-test - default
- :p - probabilities, weights or distribution object.
- :alpha - significance level (default: 0.05)
- :ci-sides - confidence interval sides (default: :two-sided)
- :sides - p-value sides (:two-sided, :one-sided-greater - default, :one-sided-less)
- :bootstrap-samples - number of samples to estimate confidence intervals (default: 1000)
- :ddof - delta degrees of freedom, adjustment for dof (default: 0.0)
- :bins - number of bins or estimator name for histogram
geomean
(geomean vs)
(geomean vs weights)
Geometric mean for positive values only with optional weights
glass-delta
(glass-delta [group1 group2])
(glass-delta group1 group2)
Glass’s delta effect size for two groups
harmean
(harmean vs)
(harmean vs weights)
Harmonic mean with optional weights
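The unweighted forms of the geometric mean (see geomean) and the harmonic mean can be sketched as follows. The helper names are hypothetical, weights are not handled, and inputs are assumed to be positive.

```clojure
;; Hypothetical helper: geometric mean as exp of the mean of logarithms.
(defn geomean-sketch [xs]
  (Math/exp (/ (reduce + (map #(Math/log (double %)) xs))
               (count xs))))

;; Hypothetical helper: harmonic mean as n divided by the sum of reciprocals.
(defn harmean-sketch [xs]
  (/ (count xs) (reduce + (map #(/ 1.0 %) xs))))

(geomean-sketch [1 4 16]) ;; approximately 4.0
(harmean-sketch [1 2 4])  ;; approximately 1.714
```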
hedges-g
(hedges-g [group1 group2])
(hedges-g group1 group2)
Hedges’s g effect size for two groups
hedges-g*
(hedges-g* [group1 group2])
(hedges-g* group1 group2)
Less biased Hedges’s g effect size for two groups, J term correction.
hedges-g-corrected
(hedges-g-corrected [group1 group2])
(hedges-g-corrected group1 group2)
Cohen’s d corrected for small group size
histogram
(histogram vs)
(histogram vs bins-or-estimate-method)
(histogram vs bins-or-estimate-method [mn mx])
(histogram vs bins-or-estimate-method mn mx)
Calculate histogram.
Estimation method can be a number, named method: :sqrt, :sturges, :rice, :doane, :scott, :freedman-diaconis (default) or a sequence of points used as intervals. In the latter case or when mn and mx values are provided - data will be filtered to fit in desired interval(s).
Returns map with keys:
- :size - number of bins
- :step - average distance between bins
- :bins - seq of pairs of range lower value and number of elements
- :min - min value
- :max - max value
- :samples - number of used samples
- :frequencies - a map containing counts for bin’s average
- :intervals - intervals used to create bins
- :bins-maps - seq of maps containing:
  - :min - lower bound
  - :max - upper bound
  - :step - actual distance between bins
  - :count - number of elements
  - :avg - average value
  - :probability - probability for bin
If difference between min and max values is 0, number of bins is set to 1.
hpdi-extent
(hpdi-extent vs)
(hpdi-extent vs size)
Highest Posterior Density interval + median.
size parameter is the target probability content of the interval.
inner-fence-extent
(inner-fence-extent vs)
(inner-fence-extent vs estimation-strategy)
Returns LIF, UIF and median
iqr
(iqr vs)
(iqr vs estimation-strategy)
Interquartile range.
jarque-bera-test
(jarque-bera-test xs)
(jarque-bera-test xs params)
(jarque-bera-test xs skew kurt {:keys [sides], :or {sides :one-sided-greater}})
Goodness of fit test whether skewness and kurtosis of data match normal distribution
jensen-shannon-divergence DEPRECATED
Deprecated: Use dissimilarity.
(jensen-shannon-divergence [vs1 vs2])
(jensen-shannon-divergence vs1 vs2)
Jensen-Shannon divergence of two sequences.
kendall-correlation
(kendall-correlation [vs1 vs2])
(kendall-correlation vs1 vs2)
Kendall’s correlation of two sequences.
kruskal-test
(kruskal-test xss)
(kruskal-test xss {:keys [sides], :or {sides :right}})
Kruskal-Wallis rank sum test.
ks-test-one-sample
(ks-test-one-sample xs)
(ks-test-one-sample xs distribution-or-ys)
(ks-test-one-sample xs distribution-or-ys {:keys [sides kernel bandwidth distinct?], :or {sides :two-sided, kernel :gaussian, distinct? true}})
One sample Kolmogorov-Smirnov test
ks-test-two-samples
(ks-test-two-samples xs ys)
(ks-test-two-samples xs ys {:keys [sides distinct?], :or {sides :two-sided, distinct? true}})
Two samples Kolmogorov-Smirnov test
kullback-leibler-divergence DEPRECATED
Deprecated: Use dissimilarity.
(kullback-leibler-divergence [vs1 vs2])
(kullback-leibler-divergence vs1 vs2)
Kullback-Leibler divergence of two sequences.
kurtosis
(kurtosis vs)
(kurtosis vs typ)
Calculate kurtosis from sequence.
Possible typ values: :G2 (default), :g2 (or :excess), :geary, :crow, :moors, :hogg or :kurt.
kurtosis-test
(kurtosis-test xs)
(kurtosis-test xs params)
(kurtosis-test xs kurt {:keys [sides type], :or {sides :two-sided, type :kurt}})
Normality test for kurtosis
levene-test
(levene-test xss)
(levene-test xss {:keys [sides statistic scorediff], :or {sides :one-sided-greater, statistic mean, scorediff abs}})
mad
Alias for median-absolute-deviation
mad-extent
(mad-extent vs)
-/+ median-absolute-deviation and median
mae
(mae [vs1 vs2-or-val])
(mae vs1 vs2-or-val)
Mean absolute error
mape
(mape [vs1 vs2-or-val])
(mape vs1 vs2-or-val)
Mean absolute percentage error
maximum
(maximum vs)
Maximum value from sequence.
mcc
(mcc group1 group2)
(mcc ct)
Matthews correlation coefficient also known as phi coefficient.
me
(me [vs1 vs2-or-val])
(me vs1 vs2-or-val)
Mean error
mean
(mean vs)
(mean vs weights)
Calculate mean of vs with optional weights.
mean-absolute-deviation
(mean-absolute-deviation vs)
(mean-absolute-deviation vs center)
Calculate mean absolute deviation
means-ratio
(means-ratio [group1 group2])
(means-ratio group1 group2)
(means-ratio group1 group2 adjusted?)
Means ratio
means-ratio-corrected
(means-ratio-corrected [group1 group2])
(means-ratio-corrected group1 group2)
Bias corrected means ratio
median
(median vs estimation-strategy)
(median vs)
Calculate median of vs. See median-3.
median-3
(median-3 a b c)
Median of three values. See median.
median-absolute-deviation
(median-absolute-deviation vs)
(median-absolute-deviation vs center)
(median-absolute-deviation vs center estimation-strategy)
Calculate MAD
minimum
(minimum vs)
Minimum value from sequence.
minimum-discrimination-information-test
(minimum-discrimination-information-test contingency-table-or-xs)
(minimum-discrimination-information-test contingency-table-or-xs params)
Minimum discrimination information test, a power divergence test for lambda -1.0
Power divergence test.
First argument should be one of:
- contingency table
- sequence of counts (for goodness of fit)
- sequence of data (for goodness of fit against distribution)
For goodness of fit there are two options:
- comparison of observed counts vs expected probabilities or weights (:p)
- comparison of data against given distribution (:p), in this case histogram from data is created and compared to distribution PDF in bins ranges. Use :bins option to control histogram creation.
Options are:
- :lambda - test type:
  - 1.0 - chisq-test
  - 0.0 - multinomial-likelihood-ratio-test
  - -1.0 - minimum-discrimination-information-test
  - -2.0 - neyman-modified-chisq-test
  - -0.5 - freeman-tukey-test
  - 2/3 - cressie-read-test - default
- :p - probabilities, weights or distribution object.
- :alpha - significance level (default: 0.05)
- :ci-sides - confidence interval sides (default: :two-sided)
- :sides - p-value sides (:two-sided, :one-sided-greater - default, :one-sided-less)
- :bootstrap-samples - number of samples to estimate confidence intervals (default: 1000)
- :ddof - delta degrees of freedom, adjustment for dof (default: 0.0)
- :bins - number of bins or estimator name for histogram
mode
(mode vs method)
(mode vs method opts)
(mode vs)
Find the value that appears most often in a dataset vs.
For sample from continuous distribution, three algorithms are possible:
- :histogram - calculated from histogram
- :kde - calculated from KDE
- :pearson - mode = mean-3(median-mean)
- :default - discrete mode
Histogram accepts optional :bins (see histogram). KDE method accepts :kde for kernel name (default :gaussian) and :bandwidth (auto). Pearson can accept :estimation-strategy for median.
See also modes.
modes
(modes vs method)
(modes vs method opts)
(modes vs)
Find the values that appear most often in a dataset vs.
Returns sequence with all most appearing values in increasing order.
See also mode.
modified-power-transformation
(modified-power-transformation xs)
(modified-power-transformation xs lambda)
(modified-power-transformation xs lambda alpha)
Modified power transformation (Box-Cox transformation) of data.
There is no scaling by geometric mean.
Arguments:
- lambda - power parameter (default: 0.0)
- alpha - shift parameter (optional)
moment
(moment vs)
(moment vs order)
(moment vs order {:keys [absolute? center mean? normalize?], :or {mean? true}})
Calculate moment (central or/and absolute) of given order (default: 2).
Additional parameters as a map:
- :absolute? - calculate sum as absolute values (default: false)
- :mean? - returns mean (proper moment) or just sum of differences (default: true)
- :center - value of center (default: nil = mean)
- :normalize? - apply normalization by standard deviation to the order power
mse
(mse [vs1 vs2-or-val])
(mse vs1 vs2-or-val)
Mean squared error
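Mean absolute error (see mae) and mean squared error can be sketched for two sequences of equal length as below. The helper names are hypothetical, and the single-value form of the second argument (vs2-or-val) is not covered.

```clojure
;; Hypothetical helper: mean of absolute differences.
(defn mae-sketch [xs ys]
  (/ (reduce + (map #(Math/abs (double (- %1 %2))) xs ys))
     (count xs)))

;; Hypothetical helper: mean of squared differences.
(defn mse-sketch [xs ys]
  (/ (reduce + (map #(let [d (double (- %1 %2))] (* d d)) xs ys))
     (count xs)))

(mae-sketch [1 2 3] [2 2 5]) ;; => 1.0
(mse-sketch [1 2 3] [2 2 5]) ;; => 1.6666666666666667
```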
multinomial-likelihood-ratio-test
(multinomial-likelihood-ratio-test contingency-table-or-xs)
(multinomial-likelihood-ratio-test contingency-table-or-xs params)
Multinomial likelihood ratio test, a power divergence test for lambda 0.0
Power divergence test.
First argument should be one of:
- contingency table
- sequence of counts (for goodness of fit)
- sequence of data (for goodness of fit against distribution)
For goodness of fit there are two options:
- comparison of observed counts vs expected probabilities or weights (:p)
- comparison of data against given distribution (:p), in this case histogram from data is created and compared to distribution PDF in bins ranges. Use :bins option to control histogram creation.
Options are:
- :lambda - test type:
  - 1.0 - chisq-test
  - 0.0 - multinomial-likelihood-ratio-test
  - -1.0 - minimum-discrimination-information-test
  - -2.0 - neyman-modified-chisq-test
  - -0.5 - freeman-tukey-test
  - 2/3 - cressie-read-test - default
- :p - probabilities, weights or distribution object.
- :alpha - significance level (default: 0.05)
- :ci-sides - confidence interval sides (default: :two-sided)
- :sides - p-value sides (:two-sided, :one-sided-greater - default, :one-sided-less)
- :bootstrap-samples - number of samples to estimate confidence intervals (default: 1000)
- :ddof - delta degrees of freedom, adjustment for dof (default: 0.0)
- :bins - number of bins or estimator name for histogram
neyman-modified-chisq-test
(neyman-modified-chisq-test contingency-table-or-xs)
(neyman-modified-chisq-test contingency-table-or-xs params)
Neyman modified chi square test, a power divergence test for lambda -2.0
Power divergence test.
First argument should be one of:
- contingency table
- sequence of counts (for goodness of fit)
- sequence of data (for goodness of fit against distribution)
For goodness of fit there are two options:
- comparison of observed counts vs expected probabilities or weights (:p)
- comparison of data against given distribution (:p), in this case histogram from data is created and compared to distribution PDF in bins ranges. Use :bins option to control histogram creation.
Options are:
- :lambda - test type:
  - 1.0 - chisq-test
  - 0.0 - multinomial-likelihood-ratio-test
  - -1.0 - minimum-discrimination-information-test
  - -2.0 - neyman-modified-chisq-test
  - -0.5 - freeman-tukey-test
  - 2/3 - cressie-read-test - default
- :p - probabilities, weights or distribution object.
- :alpha - significance level (default: 0.05)
- :ci-sides - confidence interval sides (default: :two-sided)
- :sides - p-value sides (:two-sided, :one-sided-greater - default, :one-sided-less)
- :bootstrap-samples - number of samples to estimate confidence intervals (default: 1000)
- :ddof - delta degrees of freedom, adjustment for dof (default: 0.0)
- :bins - number of bins or estimator name for histogram
normality-test
(normality-test xs)
(normality-test xs params)
(normality-test xs skew kurt {:keys [sides], :or {sides :one-sided-greater}})
Normality test based on skewness and kurtosis
omega-sq
(omega-sq [group1 group2])
(omega-sq group1 group2)
(omega-sq group1 group2 degrees-of-freedom)
Adjusted R2
one-way-anova-test
(one-way-anova-test xss)
(one-way-anova-test xss {:keys [sides], :or {sides :one-sided-greater}})
outer-fence-extent
(outer-fence-extent vs)
(outer-fence-extent vs estimation-strategy)
Returns LOF, UOF and median
outliers
(outliers vs)
(outliers vs estimation-strategy)
(outliers vs q1 q3)
Find outliers defined as values outside inner fences.
Let Q1 be the 25th percentile and Q3 the 75th percentile. IQR is (- Q3 Q1).
- LIF (Lower Inner Fence) equals (- Q1 (* 1.5 IQR)).
- UIF (Upper Inner Fence) equals (+ Q3 (* 1.5 IQR)).
Returns a sequence of outliers.
Optional estimation-strategy argument can be set to change quantile calculations estimation type. See estimation-strategies.
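The inner-fence rule can be sketched for the arity that takes precomputed quartiles. The helper name `outliers-sketch` is hypothetical; it only illustrates the filtering step and is not the library's implementation.

```clojure
;; Hypothetical helper: keep the values lying outside the inner fences,
;; given precomputed quartiles q1 and q3.
(defn outliers-sketch
  [vs q1 q3]
  (let [iqr (- q3 q1)
        lif (- q1 (* 1.5 iqr))
        uif (+ q3 (* 1.5 iqr))]
    (filter #(or (< % lif) (> % uif)) vs)))

(outliers-sketch [1 2 3 4 5 100] 2 5)
;; => (100)
```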
p-overlap
(p-overlap [group1 group2])
(p-overlap group1 group2)
(p-overlap group1 group2 {:keys [kde bandwidth min-iterations steps], :or {kde :gaussian, min-iterations 3, steps 500}})
Overlapping index, kernel density approximation
p-value
(p-value stat)
(p-value distribution stat)
(p-value distribution stat sides)
Calculate p-value for given distribution (default: N(0,1)), stat and sides (one of :two-sided, :one-sided-greater or :one-sided-less/:one-sided).
pacf
(pacf data)
(pacf data lags)
Calculate pacf (partial autocorrelation function) for given number of lags.
If lags is omitted function returns maximum possible number of lags.
pacf returns also lag 0 (which is 0.0).
pacf-ci
(pacf-ci data)
(pacf-ci data lags)
(pacf-ci data lags alpha)
pacf with added confidence interval data.
pearson-correlation
(pearson-correlation [vs1 vs2])
(pearson-correlation vs1 vs2)
Pearson’s correlation of two sequences.
pearson-r
(pearson-r [group1 group2])
(pearson-r group1 group2)
Pearson r correlation coefficient
percentile
(percentile vs p)
(percentile vs p estimation-strategy)
Calculate percentile of a vs.
Percentile p is from range 0-100.
See docs.
Optionally you can provide estimation-strategy to change interpolation methods for selecting values. Default is :legacy. See more here
See also quantile.
percentile-bc-extent
(percentile-bc-extent vs)
(percentile-bc-extent vs p)
(percentile-bc-extent vs p1 p2)
(percentile-bc-extent vs p1 p2 estimation-strategy)
Return bias corrected percentile range and mean for bootstrap samples. See https://projecteuclid.org/euclid.ss/1032280214
p - calculates extent of bias corrected p and 100-p (default: p=2.5)
Set estimation-strategy to :r7 to get the same result as in R coxed::bca.
percentile-bca-extent
(percentile-bca-extent vs)
(percentile-bca-extent vs p)
(percentile-bca-extent vs p1 p2)
(percentile-bca-extent vs p1 p2 estimation-strategy)
(percentile-bca-extent vs p1 p2 accel estimation-strategy)
Return bias corrected percentile range and mean for bootstrap samples. Also accounts for variance variations through the acceleration parameter. See https://projecteuclid.org/euclid.ss/1032280214
p - calculates extent of bias corrected p and 100-p (default: p=2.5)
Set estimation-strategy to :r7 to get the same result as in R coxed::bca.
percentile-extent
(percentile-extent vs)
(percentile-extent vs p)
(percentile-extent vs p1 p2)
(percentile-extent vs p1 p2 estimation-strategy)
Return percentile range and median.
p - calculates extent of p and 100-p (default: p=25)
percentiles
(percentiles vs)
(percentiles vs ps)
(percentiles vs ps estimation-strategy)
Calculate percentiles of a vs.
Percentiles are sequence of values from range 0-100.
See docs.
Optionally you can provide estimation-strategy to change interpolation methods for selecting values. Default is :legacy. See more here
See also quantile.
pi
(pi vs)
(pi vs size)
(pi vs size estimation-strategy)
Returns PI as a map, quantile intervals based on interval size.
Quantiles are (1-size)/2 and 1-(1-size)/2
pi-extent
(pi-extent vs)
(pi-extent vs size)
(pi-extent vs size estimation-strategy)
Returns PI extent, quantile intervals based on interval size + median.
Quantiles are (1-size)/2 and 1-(1-size)/2
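The two quantile levels follow directly from the interval size. A small sketch, with a hypothetical helper name, that computes only the levels and not the quantile values themselves:

```clojure
;; Hypothetical helper: quantile levels bounding an interval of the given size.
(defn interval-quantiles
  [size]
  (let [lower (/ (- 1.0 size) 2.0)]
    [lower (- 1.0 lower)]))

(interval-quantiles 0.5) ;; => [0.25 0.75]
```

A size of 0.5 therefore spans the first to the third quartile.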
pooled-stddev
(pooled-stddev groups)
(pooled-stddev groups method)
Calculate pooled standard deviation for samples and method
pooled-variance
(pooled-variance groups)
(pooled-variance groups method)
Calculate pooled variance for samples and method.
Methods:
- :unbiased - sqrt of weighted average of variances (default)
- :biased - biased version of :unbiased
- :avg - sqrt of average of variances
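As an illustration of the weighted average of variances, the sketch below pools sample variances with weights n-1, the usual unbiased pooling. This is an assumption about what :unbiased means; the helper names are hypothetical and the other methods are not shown.

```clojure
;; Sample variance with the n-1 denominator.
(defn sample-variance [xs]
  (let [m (/ (reduce + xs) (double (count xs)))]
    (/ (reduce + (map #(let [d (- % m)] (* d d)) xs))
       (dec (count xs)))))

;; Hypothetical helper: variances weighted by degrees of freedom (n - 1).
(defn pooled-variance-sketch [groups]
  (let [dofs (map #(dec (count %)) groups)]
    (/ (reduce + (map * dofs (map sample-variance groups)))
       (reduce + dofs))))

(pooled-variance-sketch [[1 2 3] [2 4 6]]) ;; => 2.5
```

Taking the square root of this value gives the corresponding pooled standard deviation.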
population-stddev
(population-stddev vs)
(population-stddev vs mu)
Calculate population standard deviation of vs.
See stddev.
population-variance
(population-variance vs)
(population-variance vs mu)
Calculate population variance of vs.
See variance.
population-wstddev
(population-wstddev vs weights)
Calculate population weighted standard deviation of vs
population-wvariance
(population-wvariance vs freqs)
Calculate population weighted variance of vs.
power-divergence-test
(power-divergence-test contingency-table-or-xs)
(power-divergence-test contingency-table-or-xs {:keys [lambda ci-sides sides p alpha bootstrap-samples ddof bins], :or {lambda m/TWO_THIRD, sides :one-sided-greater, ci-sides :two-sided, alpha 0.05, bootstrap-samples 1000, ddof 0}})
Power divergence test.
First argument should be one of:
- contingency table
- sequence of counts (for goodness of fit)
- sequence of data (for goodness of fit against distribution)
For goodness of fit there are two options:
- comparison of observed counts vs expected probabilities or weights (:p)
- comparison of data against given distribution (:p), in this case histogram from data is created and compared to distribution PDF in bins ranges. Use :bins option to control histogram creation.
Options are:
- :lambda - test type:
  - 1.0 - chisq-test
  - 0.0 - multinomial-likelihood-ratio-test
  - -1.0 - minimum-discrimination-information-test
  - -2.0 - neyman-modified-chisq-test
  - -0.5 - freeman-tukey-test
  - 2/3 - cressie-read-test - default
- :p - probabilities, weights or distribution object.
- :alpha - significance level (default: 0.05)
- :ci-sides - confidence interval sides (default: :two-sided)
- :sides - p-value sides (:two-sided, :one-sided-greater - default, :one-sided-less)
- :bootstrap-samples - number of samples to estimate confidence intervals (default: 1000)
- :ddof - delta degrees of freedom, adjustment for dof (default: 0.0)
- :bins - number of bins or estimator name for histogram
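To make the role of lambda concrete, here is a sketch of the Cressie-Read power divergence statistic, 2/(lambda (lambda + 1)) * sum(O_i ((O_i/E_i)^lambda - 1)), in plain Clojure. It is not the fastmath implementation: the function names are hypothetical, and the cases lambda = 0 and lambda = -1 (defined as limits) are deliberately left out.

```clojure
;; Power divergence statistic for observed and expected counts.
;; Valid for lambda other than 0 and -1, which are defined as limits.
(defn power-divergence-statistic
  [observed expected lambda]
  (let [l (double lambda)]
    (* (/ 2.0 (* l (+ l 1.0)))
       (reduce + (map (fn [o e]
                        (* o (- (Math/pow (/ (double o) e) l) 1.0)))
                      observed expected)))))

;; Pearson chi-squared statistic, for comparison with lambda = 1.0.
(defn pearson-chisq
  [observed expected]
  (reduce + (map (fn [o e] (/ (* (- o e) (- o e)) (double e)))
                 observed expected)))

(power-divergence-statistic [10 20 30] [20 20 20] 1.0) ;; => 10.0
(pearson-chisq [10 20 30] [20 20 20])                  ;; => 10.0
```

When observed and expected totals agree, lambda = 1.0 reproduces the Pearson chi-squared statistic, which is why that value is listed as chisq-test.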
power-transformation
(power-transformation xs)
(power-transformation xs lambda)
(power-transformation xs lambda alpha)
Power transformation of data.
All values should be positive.
Arguments:
- lambda - power parameter (default: 0.0)
- alpha - shift parameter (optional)
powmean
(powmean vs power)
(powmean vs weights power)
Generalized power mean
psnr
(psnr [vs1 vs2-or-val])
(psnr vs1 vs2-or-val)
(psnr vs1 vs2-or-val max-value)
Peak signal-to-noise ratio. max-value is the maximum possible value (default: max from vs1 and vs2)
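The usual definition of PSNR is 10 * log10(max-value^2 / MSE), expressed in decibels. The sketch below implements that definition in plain Clojure. Whether fastmath uses exactly this form and unit is an assumption, and the function names are hypothetical.

```clojure
;; Mean squared error between two equally sized sequences.
(defn mse-sketch
  [xs ys]
  (/ (reduce + (map (fn [x y] (let [d (- x y)] (* d d))) xs ys))
     (double (count xs))))

;; PSNR in decibels: 10 * log10(max-value^2 / MSE).
(defn psnr-sketch
  [xs ys max-value]
  (* 10.0 (Math/log10 (/ (* max-value max-value) (mse-sketch xs ys)))))

;; MSE is 1.0 here, so the result is 10 * log10(100) = 20.0
(psnr-sketch [0 0] [1 1] 10)
```

Identical sequences give an MSE of zero, so the ratio is unbounded in that case.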
quantile
(quantile vs q)
(quantile vs q estimation-strategy)
Calculate quantile of vs.
Quantile q is from the range 0.0-1.0.
See docs for interpolation strategy.
Optionally you can provide estimation-strategy to change the interpolation method for selecting values. Default is :legacy. See more here
See also percentile.
quantile-extent
(quantile-extent vs)
(quantile-extent vs q)
(quantile-extent vs q1 q2)
(quantile-extent vs q1 q2 estimation-strategy)
Return quantile range and median.
q - calculates extent of q and 1.0-q (default: q=0.25)
quantiles
(quantiles vs)
(quantiles vs qs)
(quantiles vs qs estimation-strategy)
Calculate quantiles of vs.
Quantiles are a sequence of values from the range 0.0-1.0.
See docs for interpolation strategy.
Optionally you can provide estimation-strategy to change the interpolation method for selecting values. Default is :legacy. See more here
See also percentiles.
r2
(r2 [vs1 vs2-or-val])
(r2 vs1 vs2-or-val)
(r2 vs1 vs2-or-val no-of-variables)
R2
r2-determination
(r2-determination [group1 group2])
(r2-determination group1 group2)
Coefficient of determination
rank-epsilon-sq
(rank-epsilon-sq xs)
Effect size for Kruskal-Wallis test
rank-eta-sq
(rank-eta-sq xs)
Effect size for Kruskal-Wallis test
remove-outliers
(remove-outliers vs)
(remove-outliers vs estimation-strategy)
(remove-outliers vs q1 q3)
Remove outliers defined as values outside inner fences.
Let Q1 be the 25th percentile and Q3 the 75th percentile. IQR is (- Q3 Q1).
- LIF (Lower Inner Fence) equals (- Q1 (* 1.5 IQR)).
- UIF (Upper Inner Fence) equals (+ Q3 (* 1.5 IQR)).
Returns a sequence without outliers.
Optional estimation-strategy argument can be set to change the estimation type used for quantile calculations. See estimation-strategies.
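The fence rule can be sketched in plain Clojure for the case where Q1 and Q3 are already known. This is an illustration only: `remove-outliers-sketch` is a hypothetical name, and treating values exactly on a fence as inliers is an assumption.

```clojure
;; Keep only values inside the inner fences computed from given quartiles.
(defn remove-outliers-sketch
  [vs q1 q3]
  (let [iqr (- q3 q1)
        lif (- q1 (* 1.5 iqr))   ; lower inner fence
        uif (+ q3 (* 1.5 iqr))]  ; upper inner fence
    (filter #(<= lif % uif) vs)))

;; IQR = 2.0, fences at -1.0 and 7.0
(remove-outliers-sketch [1 2 3 4 100] 2.0 4.0) ;; => (1 2 3 4)
```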
rescale
(rescale vs)
(rescale vs low high)
Linearly rescale data to the desired range, [0,1] by default
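A linear rescale maps the minimum to low and the maximum to high. A minimal plain-Clojure sketch, assuming the data is not constant (`rescale-sketch` is a hypothetical name, not the library function):

```clojure
;; Map values linearly so that min -> low and max -> high.
(defn rescale-sketch
  ([vs] (rescale-sketch vs 0.0 1.0))
  ([vs low high]
   (let [mn (apply min vs)
         mx (apply max vs)
         scale (/ (- high low) (double (- mx mn)))]
     (map #(+ low (* scale (- % mn))) vs))))

(rescale-sketch [2 4 6]) ;; => (0.0 0.5 1.0)
```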
rmse
(rmse [vs1 vs2-or-val])
(rmse vs1 vs2-or-val)
Root mean squared error
robust-standardize
(robust-standardize vs)
(robust-standardize vs q)
Normalize samples to have median = 0 and MAD = 1.
If the q argument is used, scaling is done by quantile difference (Q_q, Q_(1-q)). Set 0.25 for IQR.
rows->contingency-table
(rows->contingency-table xss)
rss
(rss [vs1 vs2-or-val])
(rss vs1 vs2-or-val)
Residual sum of squares
second-moment DEPRECATED
Deprecated: Use moment function
sem
(sem vs)
Standard error of mean
sem-extent
(sem-extent vs)
-/+ sem and mean
similarity
(similarity method P-observed Q-expected)
(similarity method P-observed Q-expected {:keys [bins probabilities? epsilon], :or {probabilities? true, epsilon 1.0E-6}})
Various PDF similarities between two histograms (frequencies) or probabilities.
Q can be a distribution object. Then, histogram will be created out of P.
Arguments:
- method - distance method
- P-observed - frequencies, probabilities or actual data (when Q is a distribution)
- Q-expected - frequencies, probabilities or distribution object (when P is data)
Options:
- :probabilities? - should P/Q be converted to probabilities, default: true.
- :epsilon - small number which replaces 0.0 when division or logarithm is used
- :bins - number of bins or bins estimation method, see histogram.
The list of methods: :intersection, :czekanowski, :motyka, :kulczynski, :ruzicka, :inner-product, :harmonic-mean, :cosine, :jaccard, :dice, :fidelity, :squared-chord
See more: Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions by Sung-Hyuk Cha
skewness
(skewness vs)
(skewness vs typ)
Calculate skewness from sequence.
Possible types: :G1 (default), :g1 (:pearson), :b1, :B1 (:yule), :B3, :skew, :mode, :bowley, :hogg or :median.
skewness-test
(skewness-test xs)
(skewness-test xs params)
(skewness-test xs skew {:keys [sides type], :or {sides :two-sided, type :g1}})
Normality test for skewness.
span
(span vs)
Width of the sample, maximum value minus minimum value
spearman-correlation
(spearman-correlation [vs1 vs2])
(spearman-correlation vs1 vs2)
Spearman’s correlation of two sequences.
standardize
(standardize vs)
Normalize samples to have mean = 0 and stddev = 1.
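Standardization subtracts the mean and divides by the standard deviation. The sketch below uses the sample (n-1) standard deviation. Which variant the library uses is an assumption here, and `standardize-sketch` is a hypothetical name.

```clojure
;; Z-scores: (x - mean) / sample standard deviation.
(defn standardize-sketch
  [vs]
  (let [n (count vs)
        m (/ (reduce + vs) (double n))
        sd (Math/sqrt (/ (reduce + (map #(let [d (- % m)] (* d d)) vs))
                         (dec n)))]
    (map #(/ (- % m) sd) vs)))

(standardize-sketch [1 2 3]) ;; => (-1.0 0.0 1.0)
```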
stats-map
(stats-map vs)
(stats-map vs estimation-strategy)
Calculate several statistics of vs and return as map.
Optional estimation-strategy argument can be set to change the estimation type used for quantile calculations. See estimation-strategies.
stddev
(stddev vs)
(stddev vs mu)
Calculate standard deviation of vs.
See population-stddev.
stddev-extent
(stddev-extent vs)
-/+ stddev and mean
sum
(sum vs)
Sum of all vs values.
t-test-one-sample
(t-test-one-sample xs)
(t-test-one-sample xs m)
One sample Student’s t-test
- alpha - significance level (default: 0.05)
- sides - one of: :two-sided, :one-sided-less (short: :one-sided) or :one-sided-greater
- mu - mean (default: 0.0)
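The statistic behind a one-sample Student's t-test is t = (mean - mu) / (sd / sqrt(n)) with n-1 degrees of freedom. A minimal sketch in plain Clojure, independent of fastmath (`t-statistic-sketch` is a hypothetical name; the p-value and confidence interval are omitted):

```clojure
;; t statistic and degrees of freedom for a one-sample test against mu.
(defn t-statistic-sketch
  [xs mu]
  (let [n (count xs)
        m (/ (reduce + xs) (double n))
        sd (Math/sqrt (/ (reduce + (map #(let [d (- % m)] (* d d)) xs))
                         (dec n)))]
    {:t (/ (- m mu) (/ sd (Math/sqrt (double n))))
     :df (dec n)}))

;; The sample mean equals mu, so the statistic is zero.
(t-statistic-sketch [1 2 3 4 5] 3.0) ;; => {:t 0.0, :df 4}
```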
t-test-two-samples
(t-test-two-samples xs ys)
(t-test-two-samples xs ys {:keys [paired? equal-variances?], :or {paired? false, equal-variances? false}, :as params})
Two samples Student’s t-test
- alpha - significance level (default: 0.05)
- sides - one of: :two-sided (default), :one-sided-less (short: :one-sided) or :one-sided-greater
- mu - mean (default: 0.0)
- paired? - unpaired or paired test, boolean (default: false)
- equal-variances? - unequal or equal variances, boolean (default: false)
trim
(trim vs)
(trim vs quantile)
(trim vs quantile estimation-strategy)
(trim vs low high nan)
Return trimmed data. Trimming is done using quantiles; the default quantile is 0.2.
trim-lower
(trim-lower vs)
(trim-lower vs quantile)
(trim-lower vs quantile estimation-strategy)
Trim data below given quantile, default: 0.2.
trim-upper
(trim-upper vs)
(trim-upper vs quantile)
(trim-upper vs quantile estimation-strategy)
Trim data above given quantile, default: 0.2.
tschuprows-t
(tschuprows-t group1 group2)
(tschuprows-t contingency-table)
Tschuprow's T effect size for discrete data
ttest-one-sample DEPRECATED
Deprecated: Use t-test-one-sample
ttest-two-samples DEPRECATED
Deprecated: Use t-test-two-samples
variance
(variance vs)
(variance vs mu)
Calculate variance of vs.
See population-variance.
variation
(variation vs)
Coefficient of variation CV = stddev / mean
weighted-kappa
(weighted-kappa contingency-table)
(weighted-kappa contingency-table weights)
Cohen’s weighted kappa for indexed contingency table
winsor
(winsor vs)
(winsor vs quantile)
(winsor vs quantile estimation-strategy)
(winsor vs low high nan)
Return winsorized data. Winsorization is done using quantiles; the default quantile is 0.2.
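Unlike trimming, winsorizing keeps every sample and clamps extreme values to the bounds. A sketch in plain Clojure for explicitly given bounds. `winsor-sketch` is a hypothetical name, and treating low and high as value bounds (rather than deriving them from quantiles as the library does) is a simplifying assumption.

```clojure
;; Clamp each value into the closed range [low, high].
(defn winsor-sketch
  [vs low high]
  (map #(max low (min high %)) vs))

(winsor-sketch [1 5 10 50 100] 5 50) ;; => (5 5 10 50 50)
```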
wmean DEPRECATED
Deprecated: Use mean
(wmean vs)
(wmean vs weights)
Weighted mean
wmedian
(wmedian vs ws)
(wmedian vs ws method)
Weighted median.
Calculation is done using interpolation. There are three methods:
- :linear - linear interpolation, default
- :step - step interpolation
- :average - average of ties
Based on spatstat.geom::weighted.quantile from R.
wmw-odds
(wmw-odds [group1 group2])
(wmw-odds group1 group2)
Wilcoxon-Mann-Whitney odds
wquantile
(wquantile vs ws q)
(wquantile vs ws q method)
Weighted quantile.
Calculation is done using interpolation. There are three methods:
- :linear - linear interpolation, default
- :step - step interpolation
- :average - average of ties
Based on spatstat.geom::weighted.quantile from R.
wquantiles
(wquantiles vs ws)
(wquantiles vs ws qs)
(wquantiles vs ws qs method)
Weighted quantiles.
Calculation is done using interpolation. There are three methods:
- :linear - linear interpolation, default
- :step - step interpolation
- :average - average of ties
Based on spatstat.geom::weighted.quantile from R.
wstddev
(wstddev vs freqs)
Calculate weighted (unbiased) standard deviation of vs
wvariance
(wvariance vs freqs)
Calculate weighted (unbiased) variance of vs.
yeo-johnson-transformation
(yeo-johnson-transformation xs)
(yeo-johnson-transformation xs lambda)
(yeo-johnson-transformation xs lambda alpha)
Yeo-Johnson transformation
Arguments:
- lambda - power parameter (default: 0.0)
- alpha - shift parameter (optional)
z-test-one-sample
(z-test-one-sample xs)
(z-test-one-sample xs m)
One sample z-test
- alpha - significance level (default: 0.05)
- sides - one of: :two-sided, :one-sided-less (short: :one-sided) or :one-sided-greater
- mu - mean (default: 0.0)
z-test-two-samples
(z-test-two-samples xs ys)
(z-test-two-samples xs ys {:keys [paired? equal-variances?], :or {paired? false, equal-variances? false}, :as params})
Two samples z-test
- alpha - significance level (default: 0.05)
- sides - one of: :two-sided (default), :one-sided-less (short: :one-sided) or :one-sided-greater
- mu - mean (default: 0.0)
- paired? - unpaired or paired test, boolean (default: false)
- equal-variances? - unequal or equal variances, boolean (default: false)
source: clay/stats.clj