fastmath.stats

Statistics functions.

Descriptive statistics
Correlation / covariance
Outliers
Confidence intervals
Extents
Effect size
Tests
Histogram
ACF/PACF
Bootstrap (see fastmath.stats.bootstrap)
Binary measures

Functions are backed by Apache Commons Math or SMILE libraries. All work with Clojure sequences.

Descriptive statistics

The all-in-one function stats-map returns a map with the following keys:

:Size - size of the samples, (count ...)
:Min - minimum value
:Max - maximum value
:Range - range of values
:Mean - mean/average
:Median - median, see also: median-3
:Mode - mode, see also: modes
:Q1 - first quartile, use: percentile, quartile
:Q3 - third quartile, use: percentile, quartile
:Total - sum of all samples
:SD - sample standard deviation
:Variance - variance
:MAD - median-absolute-deviation
:SEM - standard error of mean
:LAV - lower adjacent value, use: adjacent-values
:UAV - upper adjacent value, use: adjacent-values
:IQR - interquartile range, (- q3 q1)
:LOF - lower outer fence, (- q1 (* 3.0 iqr))
:UOF - upper outer fence, (+ q3 (* 3.0 iqr))
:LIF - lower inner fence, (- q1 (* 1.5 iqr))
:UIF - upper inner fence, (+ q3 (* 1.5 iqr))
:Outliers - list of outliers, i.e. samples lying outside the outer fences
:Kurtosis - kurtosis
:Skewness - skewness
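
The fence keys above are plain arithmetic on the quartiles. A minimal sketch of those formulas, assuming Q1 and Q3 are already computed; fences and outliers are hypothetical helpers, not fastmath functions:

```clojure
(defn fences
  "Hypothetical helper: IQR plus inner and outer fences from
  precomputed quartiles q1 and q3."
  [q1 q3]
  (let [iqr (- q3 q1)]
    {:IQR iqr
     :LIF (- q1 (* 1.5 iqr))
     :UIF (+ q3 (* 1.5 iqr))
     :LOF (- q1 (* 3.0 iqr))
     :UOF (+ q3 (* 3.0 iqr))}))

(defn outliers
  "Hypothetical helper: samples lying outside the outer fences."
  [vs q1 q3]
  (let [{:keys [LOF UOF]} (fences q1 q3)]
    (filter #(or (< % LOF) (> % UOF)) vs)))

(fences 1.0 3.0)
;; => {:IQR 2.0, :LIF -2.0, :UIF 6.0, :LOF -5.0, :UOF 9.0}
```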

Note: percentile and quartile support 10 different interpolation (estimation) strategies; see estimation-strategies-list.

->confusion-matrix

(->confusion-matrix tp fn fp tn)(->confusion-matrix confusion-matrix)(->confusion-matrix actual prediction)(->confusion-matrix actual prediction encode-true)

Converts input to a confusion matrix.

acf

(acf data)(acf data lags)

Calculates the ACF (autocorrelation function) for a given number of lags or for a list of lags.

If lags is omitted, the function returns the ACF for the maximum possible number of lags.

Examples

Usage

(acf (repeatedly 1000 r/grand) 5)
;;=> (1.0
;;=>  0.024818272833702658
;;=>  -0.016643111047625263
;;=>  -0.04387141411591324
;;=>  -0.017525857903339097
;;=>  -0.02298214632194487)
(acf (repeatedly 1000 r/grand) [10 20 100 500])
;;=> (-0.0376606395291497
;;=>  -0.034034408698036596
;;=>  -0.010242012560395726
;;=>  0.0045013783824463)
(acf [1 2 3 4 5 4 3 2 1])
;;=> (1.0
;;=>  0.5396825396825397
;;=>  -0.013492063492063475
;;=>  -0.4666666666666665
;;=>  -0.6269841269841269
;;=>  -0.3015873015873015
;;=>  -0.011904761904761935
;;=>  0.17777777777777773
;;=>  0.20317460317460315)
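
The values in the last example match the standard sample autocorrelation: the sum of products of deviations from the mean taken k steps apart, divided by the total sum of squared deviations. A minimal sketch of that formula for a single lag (acf-at-lag is a hypothetical name; fastmath's implementation may differ in detail):

```clojure
(defn acf-at-lag
  "Sketch of the sample autocorrelation at lag k:
  sum over t of (x_t - mean)(x_{t+k} - mean), divided by the
  total sum of squared deviations."
  [xs k]
  (let [xs     (vec xs)
        n      (count xs)
        mean   (/ (reduce + xs) (double n))
        dev    (mapv #(- % mean) xs)
        ss     (reduce + (map * dev dev))
        ;; pair each deviation with the one k steps ahead
        lagged (reduce + (map * dev (drop k dev)))]
    (/ lagged ss)))

(map #(acf-at-lag [1 2 3 4 5 4 3 2 1] %) [0 1 2])
;; ≈ (1.0 0.5397 -0.0135), as in the example above
```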

acf-ci

(acf-ci data)(acf-ci data lags)(acf-ci data lags alpha)

ACF with added confidence interval data.

:cis contains the list of calculated confidence intervals, one for every lag.

Examples

Usage

(acf-ci (repeatedly 1000 r/grand) 3)
;;=> {:acf
;;=>  (1.0 8.400697641833316E-4 0.0448042665012578 0.009808923241870098),
;;=>  :ci 0.06197950323045615,
;;=>  :cis (0.06197950323045615
;;=>        0.06197954697044274
;;=>        0.062103841288912796
;;=>        0.062109792420910005)}
(acf-ci [1 2 3 4 5 4 3 2 1] 3)
;;=> {:acf
;;=>  (1.0 0.5396825396825397 -0.013492063492063475 -0.4666666666666665),
;;=>  :ci 0.653321328180018,
;;=>  :cis (0.653321328180018
;;=>        0.8218653739461048
;;=>        0.8219599072345126
;;=>        0.9281841012727746)}
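
Working back from the numbers in the examples above, :ci appears to equal z/√n (z ≈ 1.96 for the default alpha of 0.05), and the k-th entry of :cis appears to follow Bartlett's formula z·sqrt((1 + 2·(r1² + ... + rk²))/n). The sketch below encodes that reading. It is inferred from the printed output, not taken from the fastmath source; it hardcodes z for alpha = 0.05 and uses the hypothetical name acf-cis:

```clojure
(def z-975
  "Standard normal 0.975 quantile, i.e. alpha = 0.05 (two-sided)."
  1.959963984540054)

(defn acf-cis
  "Sketch: Bartlett-style bounds for autocorrelations rs (lag 0 first)
  computed from n samples. Entry k uses r1^2 + ... + rk^2."
  [rs n]
  (let [r2   (map #(* % %) (rest rs))   ;; squared acf for lags 1, 2, ...
        cums (reductions + 0.0 r2)]     ;; 0, r1^2, r1^2 + r2^2, ...
    (map (fn [s] (* z-975 (Math/sqrt (/ (+ 1.0 (* 2.0 s)) n))))
         (take (count rs) cums))))

(acf-cis [1.0 0.5396825396825397 -0.013492063492063475 -0.4666666666666665] 9)
;; ≈ (0.6533 0.8219 0.8220 0.9282), matching :cis in the second example
```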

ad-test-one-sample

(ad-test-one-sample xs)(ad-test-one-sample xs distribution-or-ys)

(ad-test-one-sample xs distribution-or-ys {:keys [sides kernel bandwidth], :or {sides :one-sided-greater, kernel :gaussian}})

Anderson-Darling test

adjacent-values

(adjacent-values vs)(adjacent-values vs estimation-strategy)(adjacent-values vs q1 q3 m)

Lower and upper adjacent values (LAV and UAV).

Let Q1 be the 25th percentile and Q3 the 75th percentile. IQR is (- Q3 Q1).

LAV is the smallest value greater than or equal to LIF = (- Q1 (* 1.5 IQR)).
UAV is the largest value less than or equal to UIF = (+ Q3 (* 1.5 IQR)).
The third returned value is the median of the samples.

The optional estimation-strategy argument changes how quantiles are estimated. See estimation-strategies-list.

Examples

LAV, UAV

(adjacent-values [1 2 3 -1 -1 2 -1 11 111])
;;=> [-1.0 11.0 2.0]

Gaussian distribution LAV, UAV

(adjacent-values (repeatedly 1000000 r/grand))
;;=> [-2.7024365882934935 2.7023587415571386 5.342937622886728E-4]
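
Given the quartiles, LAV and UAV are simply the extreme samples that stay within the inner fences. A minimal sketch, assuming Q1 and Q3 are passed in by the caller as in the (adjacent-values vs q1 q3 m) arity; lav-uav is a hypothetical name, and the quartile values below are chosen for illustration rather than taken from fastmath:

```clojure
(defn lav-uav
  "Sketch: lower and upper adjacent values for precomputed quartiles."
  [vs q1 q3]
  (let [iqr    (- q3 q1)
        lif    (- q1 (* 1.5 iqr))
        uif    (+ q3 (* 1.5 iqr))
        inside (filter #(<= lif % uif) vs)]   ;; samples within the inner fences
    [(apply min inside) (apply max inside)]))

;; Illustrative quartiles Q1 = -1 and Q3 = 7 give inner fences -13 and 19,
;; so 111 is excluded and 11 becomes the UAV.
(lav-uav [1 2 3 -1 -1 2 -1 11 111] -1 7)
;; => [-1 11]
```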

ameasure

(ameasure [group1 group2])(ameasure group1 group2)

Vargha-Delaney A measure for two populations a and b

Examples

Usage

(let [t [10 10 20 20 20 30 30 30 40 50]
      c [10 20 30 40 40 50]]
  (ameasure t c))
;;=> 0.375
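
The A measure is the probability that a value drawn from the first group is larger than one drawn from the second, with ties counted as one half. A minimal sketch over all pairs (vd-a is a hypothetical name), which gives the same 0.375 as the example above:

```clojure
(defn vd-a
  "Sketch of the Vargha-Delaney A statistic: share of pairs (x from a,
  y from b) with x > y, counting ties as 1/2."
  [a b]
  (let [pairs (for [x a, y b] (compare x y))
        wins  (count (filter pos? pairs))
        ties  (count (filter zero? pairs))]
    (/ (+ wins (* 0.5 ties))
       (* (count a) (count b)))))

(vd-a [10 10 20 20 20 30 30 30 40 50] [10 20 30 40 40 50])
;; => 0.375
```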

binary-measures

(binary-measures tp fn fp tn)(binary-measures confusion-matrix)(binary-measures actual prediction)(binary-measures actual prediction true-value)

Subset of binary measures. See binary-measures-all.

The following keys are returned: [:tp :tn :fp :fn :accuracy :fdr :f-measure :fall-out :precision :recall :sensitivity :specificity :prevalence]

Examples

Usage

(binary-measures [true false true false true false true false]
                 [true false false true false false false true])
;;=> {:accuracy 0.375,
;;=>  :f-measure 0.28571428571428575,
;;=>  :fall-out 0.5,
;;=>  :fdr 0.6666666666666667,
;;=>  :fn 3,
;;=>  :fp 2,
;;=>  :precision 0.3333333333333333,
;;=>  :prevalence 0.5,
;;=>  :recall 0.25,
;;=>  :sensitivity 0.25,
;;=>  :specificity 0.5,
;;=>  :tn 2,
;;=>  :tp 1}

Treat 1 as true value.

(binary-measures [1 0 1 0 1 0 1 0] [1 0 0 1 0 0 0 1] [1])
;;=> {:accuracy 0.375,
;;=>  :f-measure 0.28571428571428575,
;;=>  :fall-out 0.5,
;;=>  :fdr 0.6666666666666667,
;;=>  :fn 3,
;;=>  :fp 2,
;;=>  :precision 0.3333333333333333,
;;=>  :prevalence 0.5,
;;=>  :recall 0.25,
;;=>  :sensitivity 0.25,
;;=>  :specificity 0.5,
;;=>  :tn 2,
;;=>  :tp 1}

Treat :a and :b as true value.

(binary-measures [:a :b :c :d :e :f :a :b]
                 [:a :b :a :b :a :f :d :b]
                 {:a true, :b true, :c false})
;;=> {:accuracy 0.5,
;;=>  :f-measure 0.6,
;;=>  :fall-out 0.75,
;;=>  :fdr 0.5,
;;=>  :fn 1,
;;=>  :fp 3,
;;=>  :precision 0.5,
;;=>  :prevalence 0.5,
;;=>  :recall 0.75,
;;=>  :sensitivity 0.75,
;;=>  :specificity 0.25,
;;=>  :tn 1,
;;=>  :tp 3}
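
The measures listed above are ratios of the four confusion-matrix counts. A minimal sketch for a subset of them; basic-binary-measures is a hypothetical name, and fneg stands in for the fn count because fn is already a Clojure core macro. With the counts from the first two examples (tp 1, fn 3, fp 2, tn 2) it yields the same values:

```clojure
(defn basic-binary-measures
  "Sketch: selected measures from confusion-matrix counts.
  fneg is the false-negative count (fn in the signatures above)."
  [tp fneg fp tn]
  (let [tp (double tp), fneg (double fneg), fp (double fp), tn (double tn)
        total     (+ tp fneg fp tn)
        precision (/ tp (+ tp fp))
        recall    (/ tp (+ tp fneg))]          ;; also called sensitivity
    {:accuracy    (/ (+ tp tn) total)
     :precision   precision
     :recall      recall
     :specificity (/ tn (+ tn fp))
     :fall-out    (/ fp (+ fp tn))
     :fdr         (/ fp (+ fp tp))
     :prevalence  (/ (+ tp fneg) total)
     :f-measure   (/ (* 2.0 precision recall) (+ precision recall))}))

(basic-binary-measures 1 3 2 2)
;; :accuracy 0.375, :precision ≈ 0.333, :recall 0.25, :f-measure ≈ 0.2857
```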

binary-measures-all

(binary-measures-all tp fn fp tn)(binary-measures-all confusion-matrix)(binary-measures-all actual prediction)(binary-measures-all actual prediction true-value)

Collection of binary measures.

Arguments:

confusion-matrix - either a map or a sequence with [:tp :fn :fp :tn] values
actual - list of ground truth values
prediction - list of predicted values
true-value - optional true/false encoding: what counts as true in both actual and prediction

true-value can be one of:

nil - values are treated as booleans
any sequence - values from the sequence are treated as true
map - conversion is done according to the provided map (if there is no corresponding key, the value is treated as false)
any predicate

https://en.wikipedia.org/wiki/Precision_and_recall
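
The true-value options boil down to turning every value into a boolean before counting. A minimal sketch of that step, based on the description above rather than the fastmath source; encoder and confusion-counts are hypothetical names. Maps and vectors are also functions in Clojure, so the map and sequence checks must come before the generic predicate case:

```clojure
(defn encoder
  "Sketch: builds a value -> boolean function from a true-value option."
  [true-value]
  (cond
    (nil? true-value)        boolean                          ;; plain truthiness
    (map? true-value)        #(boolean (get true-value %))    ;; missing key -> false
    (sequential? true-value) (let [s (set true-value)] #(contains? s %))
    (ifn? true-value)        #(boolean (true-value %))        ;; any predicate
    :else (throw (ex-info "Unsupported true-value" {:true-value true-value}))))

(defn confusion-counts
  "Sketch: tp/fn/fp/tn counts after encoding actual and predicted values."
  ([actual prediction] (confusion-counts actual prediction nil))
  ([actual prediction true-value]
   (let [enc (encoder true-value)]
     (reduce (fn [counts [a p]]
               (update counts
                       (case [(enc a) (enc p)]
                         [true true]   :tp
                         [true false]  :fn
                         [false true]  :fp
                         [false false] :tn)
                       inc))
             {:tp 0 :fn 0 :fp 0 :tn 0}
             (map vector actual prediction)))))

(confusion-counts [:a :b :c :d :e :f :a :b] [:a :b :a :b :a :f :d :b]
                  {:a true, :b true, :c false})
;; => {:tp 3, :fn 1, :fp 3, :tn 1}
```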

Examples

Usage

(binary-measures-all [true false true false true false true false]
                     [true false false true false false false true])
;;=> {:accuracy 0.375,
;;=>  :ba 0.375,
;;=>  :bm -0.25,
;;=>  :cn 4.0,
;;=>  :cp 4.0,
;;=>  :dor 0.3333333333333333,
;;=>  :f-beta #,
;;=>  :f-measure 0.28571428571428575,
;;=>  :f1-score 0.28571428571428575,
;;=>  :fall-out 0.5,
;;=>  :fdr 0.6666666666666667,
;;=>  :fm 0.28867513459481287,
;;=>  :fn 3,
;;=>  :fnr 0.75,
;;=>  :for 0.6,
;;=>  :fp 2,
;;=>  :fpr 0.5,
;;=>  :hit-rate 0.25,
;;=>  :jaccard 0.16666666666666666,
;;=>  :kappa -0.25,
;;=>  :lr+ 0.5,
;;=>  :lr- 1.5,
;;=>  :mcc -0.2581988897471611,
;;=>  :miss-rate 0.75,
;;=>  :mk -0.2666666666666666,
;;=>  :n 4.0,
;;=>  :npv 0.4,
;;=>  :p 4.0,
;;=>  :pcn 5.0,
;;=>  :pcp 3.0,
;;=>  :phi -0.2581988897471611,
;;=>  :pn 5.0,
;;=>  :pp 3.0,
;;=>  :ppv 0.3333333333333333,
;;=>  :precision 0.3333333333333333,
;;=>  :prevalence 0.5,
;;=>  :pt 0.5857864376269049,
;;=>  :recall 0.25,
;;=>  :selectivity 0.5,
;;=>  :sensitivity 0.25,
;;=>  :specificity 0.5,
;;=>  :tn 2,
;;=>  :tnr 0.5,
;;=>  :total 8.0,
;;=>  :tp 1,
;;=>  :tpr 0.25,
;;=>  :ts 0.16666666666666666}

Treat 1 as true value.

(binary-measures-all [1 0 1 0 1 0 1 0] [1 0 0 1 0 0 0 1] [1])
;;=> {:accuracy 0.375,
;;=>  :ba 0.375,
;;=>  :bm -0.25,
;;=>  :cn 4.0,
;;=>  :cp 4.0,
;;=>  :dor 0.3333333333333333,
;;=>  :f-beta #,
;;=>  :f-measure 0.28571428571428575,
;;=>  :f1-score 0.28571428571428575,
;;=>  :fall-out 0.5,
;;=>  :fdr 0.6666666666666667,
;;=>  :fm 0.28867513459481287,
;;=>  :fn 3,
;;=>  :fnr 0.75,
;;=>  :for 0.6,
;;=>  :fp 2,
;;=>  :fpr 0.5,
;;=>  :hit-rate 0.25,
;;=>  :jaccard 0.16666666666666666,
;;=>  :kappa -0.25,
;;=>  :lr+ 0.5,
;;=>  :lr- 1.5,
;;=>  :mcc -0.2581988897471611,
;;=>  :miss-rate 0.75,
;;=>  :mk -0.2666666666666666,
;;=>  :n 4.0,
;;=>  :npv 0.4,
;;=>  :p 4.0,
;;=>  :pcn 5.0,
;;=>  :pcp 3.0,
;;=>  :phi -0.2581988897471611,
;;=>  :pn 5.0,
;;=>  :pp 3.0,
;;=>  :ppv 0.3333333333333333,
;;=>  :precision 0.3333333333333333,
;;=>  :prevalence 0.5,
;;=>  :pt 0.5857864376269049,
;;=>  :recall 0.25,
;;=>  :selectivity 0.5,
;;=>  :sensitivity 0.25,
;;=>  :specificity 0.5,
;;=>  :tn 2,
;;=>  :tnr 0.5,
;;=>  :total 8.0,
;;=>  :tp 1,
;;=>  :tpr 0.25,
;;=>  :ts 0.16666666666666666}

Treat :a and :b as true value.

(binary-measures-all [:a :b :c :d :e :f :a :b]
                     [:a :b :a :b :a :f :d :b]
                     {:a true, :b true, :c false})
;;=> {:accuracy 0.5,
;;=>  :ba 0.5,
;;=>  :bm 0.0,
;;=>  :cn 4.0,
;;=>  :cp 4.0,
;;=>  :dor 1.0,
;;=>  :f-beta #,
;;=>  :f-measure 0.6,
;;=>  :f1-score 0.6,
;;=>  :fall-out 0.75,
;;=>  :fdr 0.5,
;;=>  :fm 0.6123724356957945,
;;=>  :fn 1,
;;=>  :fnr 0.25,
;;=>  :for 0.5,
;;=>  :fp 3,
;;=>  :fpr 0.75,
;;=>  :hit-rate 0.75,
;;=>  :jaccard 0.42857142857142855,
;;=>  :kappa 0.0,
;;=>  :lr+ 1.0,
;;=>  :lr- 1.0,
;;=>  :mcc 0.0,
;;=>  :miss-rate 0.25,
;;=>  :mk 0.0,
;;=>  :n 4.0,
;;=>  :npv 0.5,
;;=>  :p 4.0,
;;=>  :pcn 2.0,
;;=>  :pcp 6.0,
;;=>  :phi 0.0,
;;=>  :pn 2.0,
;;=>  :pp 6.0,
;;=>  :ppv 0.5,
;;=>  :precision 0.5,
;;=>  :prevalence 0.5,
;;=>  :pt ##NaN,
;;=>  :recall 0.75,
;;=>  :selectivity 0.25,
;;=>  :sensitivity 0.75,
;;=>  :specificity 0.25,
;;=>  :tn 1,
;;=>  :tnr 0.25,
;;=>  :total 8.0,
;;=>  :tp 3,
;;=>  :tpr 0.75,
;;=>  :ts 0.42857142857142855}

The :f-beta value is a function of beta. When beta equals 1.0, it returns the f1-score.

(let [fbeta (:f-beta (binary-measures-all
                      [true false true false true false true false]
                      [true false false true false false false true]))]
  [(fbeta 1.0) (fbeta 2.0) (fbeta 0.5)])
;;=> [0.28571428571428575 0.2631578947368421 0.3125]
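
The F-beta score combines precision P and recall R as (1 + β²)·P·R / (β²·P + R). A minimal sketch (f-beta-fn is a hypothetical name) that reproduces the three values above from P = 1/3 and R = 0.25:

```clojure
(defn f-beta-fn
  "Sketch: returns a function of beta computing
  (1 + beta^2) * P * R / (beta^2 * P + R) for fixed precision P and recall R."
  [precision recall]
  (fn [beta]
    (let [b2 (* beta beta)]
      (/ (* (+ 1.0 b2) precision recall)
         (+ (* b2 precision) recall)))))

(let [fbeta (f-beta-fn (/ 1.0 3.0) 0.25)]
  [(fbeta 1.0) (fbeta 2.0) (fbeta 0.5)])
;; ≈ [0.2857 0.2632 0.3125]
```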

binomial-ci

(binomial-ci number-of-successes number-of-trials)(binomial-ci number-of-successes number-of-trials method)(binomial-ci number-of-successes number-of-trials method alpha)

Returns a confidence interval for a binomial distribution.

Possible methods are:

:asymptotic - normal approximation, based on the central limit theorem (default)
:agresti-coull
:clopper-pearson
:wilson
:prop.test - one sample proportion test
:cloglog
:logit
:probit
:arcsine
:all - apply all methods and return a map of triplets

Default alpha is 0.05.

Returns a triple [lower-ci upper-ci p], where p = successes/trials.
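
For reference, the textbook normal-approximation (Wald) interval is p ± z·sqrt(p(1 - p)/n). A minimal sketch with z hardcoded for alpha = 0.05; wald-ci is a hypothetical name, and whether fastmath's :asymptotic method adds corrections or clamps to [0, 1] is not stated here, so treat this as an illustration only:

```clojure
(def z-975 1.959963984540054) ;; standard normal 0.975 quantile (alpha = 0.05)

(defn wald-ci
  "Sketch of the normal-approximation interval p ± z * sqrt(p (1 - p) / n).
  Returns [lower upper p]; no clamping to [0, 1] is applied."
  [successes trials]
  (let [p    (/ (double successes) trials)
        half (* z-975 (Math/sqrt (/ (* p (- 1.0 p)) trials)))]
    [(- p half) (+ p half) p]))

(wald-ci 50 100)
;; ≈ [0.4020 0.5980 0.5]
```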

binomial-ci-methods

binomial-test

(binomial-test xs)(binomial-test xs maybe-params)

(binomial-test number-of-successes number-of-trials {:keys [alpha p ci-method sides], :or {alpha 0.05, p 0.5, ci-method :asymptotic, sides :two-sided}})

Binomial test

alpha - significance level (default: 0.05)
sides - one of: :two-sided (default), :one-sided-less (short: :one-sided) or :one-sided-greater
ci-method - see binomial-ci-methods
p - tested probability

bootstrap

Deprecated. Please use fastmath.stats.bootstrap/bootstrap instead.

(bootstrap vs)(bootstrap vs samples)(bootstrap vs samples size)

Generates a set of samples of a given size from the provided data.

The number of samples defaults to 200; size defaults to the size of the data.

Examples

Usage

(bootstrap [1 2 3 4 1 2 3 1 2 1] 2 20)
;;=> ((2.0
;;=>   3.0
;;=>   1.0
;;=>   2.0
;;=>   3.0
;;=>   2.0
;;=>   3.0
;;=>   2.0
;;=>   2.0
;;=>   2.0
;;=>   2.0
;;=>   2.0
;;=>   2.0
;;=>   1.0
;;=>   2.0
;;=>   2.0
;;=>   3.0
;;=>   2.0
;;=>   1.0
;;=>   2.0)
;;=>  (2.0
;;=>   2.0
;;=>   1.0
;;=>   2.0
;;=>   2.0
;;=>   1.0
;;=>   2.0
;;=>   2.0
;;=>   3.0
;;=>   4.0
;;=>   1.0
;;=>   1.0
;;=>   4.0
;;=>   3.0
;;=>   3.0
;;=>   2.0
;;=>   4.0
;;=>   2.0
;;=>   1.0
;;=>   1.0))
(let [data [1 2 3 4 1 2 3 1 2 1]
      fdata (frequencies data)
      bdata (bootstrap data 5 1000)]
  {:source fdata, :bootstrapped (map frequencies bdata)})
;;=> {:bootstrapped ({1.0 392, 2.0 295, 3.0 216, 4.0 97}
;;=>                 {1.0 388, 2.0 286, 3.0 228, 4.0 98}
;;=>                 {1.0 378, 2.0 305, 3.0 223, 4.0 94}
;;=>                 {1.0 423, 2.0 298, 3.0 190, 4.0 89}
;;=>                 {1.0 406, 2.0 294, 3.0 202, 4.0 98}),
;;=>  :source {1 4, 2 3, 3 2, 4 1}}

bootstrap-ci

Deprecated. Please use fastmath.stats.bootstrap/ci-basic instead.

(bootstrap-ci vs)(bootstrap-ci vs alpha)(bootstrap-ci vs alpha samples)(bootstrap-ci vs alpha samples stat-fn)

Bootstrap method to calculate confidence interval.

Alpha defaults to 0.98 and samples to 1000. The last parameter is the statistical function to measure (default: mean).

Returns the confidence interval and the value of the statistical function.

Examples

Usage

(bootstrap-ci [-5 1 1 1 1 2 2 5 11 71])
;;=> [-5.498000000000001 17.700000000000003 9.0]
(bootstrap-ci [-5 1 1 1 1 2 2 5 11 71] 0.8)
;;=> [2.5999999999999996 15.7 9.0]
(bootstrap-ci [-5 1 1 1 1 2 2 5 11 71] 0.8 100000)
;;=> [2.5999999999999996 15.7 9.0]
(bootstrap-ci [-5 1 1 1 1 2 2 5 11 71] 0.98 1000 median)
;;=> [-20.5 3.0 1.5]

brown-forsythe-test

(brown-forsythe-test xss)(brown-forsythe-test xss params)

chisq-test

(chisq-test contingency-table-or-xs)(chisq-test contingency-table-or-xs params)

Chi square test, a power divergence test for lambda 1.0

Power divergence test.

First argument should be one of:

contingency table
sequence of counts (for goodness of fit)
sequence of data (for goodness of fit against distribution)

For goodness of fit there are two options:

comparison of observed counts vs expected probabilities or weights (:p)
comparison of data against a given distribution (:p); in this case a histogram is created from the data and compared to the distribution’s PDF over the bin ranges. Use the :bins option to control histogram creation.

Options are:

:lambda - test type:
- 1.0 - chisq-test
- 0.0 - multinomial-likelihood-ratio-test
- -1.0 - minimum-discrimination-information-test
- -2.0 - neyman-modified-chisq-test
- -0.5 - freeman-tukey-test
- 2/3 - cressie-read-test - default
:p - probabilities, weights or distribution object.
:alpha - significance level (default: 0.05)
:ci-sides - confidence interval sides (default: :two-sided)
:sides - p-value sides (:two-sided, :one-side-greater - default, :one-side-less)
:bootstrap-samples - number of samples to estimate confidence intervals (default: 1000)
:ddof - delta degrees of freedom, adjustment for dof (default: 0.0)
:bins - number of bins or estimator name for histogram
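
All of the tests listed under :lambda share the Cressie-Read power divergence statistic 2/(λ(λ + 1))·Σ O·((O/E)^λ - 1), with limits taken at λ = 0 and λ = -1. A minimal sketch of the statistic only (no p-value, bootstrap or :ddof handling); power-divergence-stat is a hypothetical name. For λ = 1 it equals Pearson's Σ(O - E)²/E:

```clojure
(defn power-divergence-stat
  "Sketch of the Cressie-Read statistic
  2 / (lambda (lambda + 1)) * sum O ((O / E)^lambda - 1)."
  [observed expected lambda]
  (let [lambda (double lambda)
        pairs  (map (fn [o e] [(double o) (double e)]) observed expected)]
    (cond
      ;; lambda -> 0: likelihood-ratio statistic 2 * sum O ln(O / E); empty cells add 0
      (zero? lambda)
      (* 2.0 (reduce + (for [[o e] pairs :when (pos? o)] (* o (Math/log (/ o e))))))
      ;; lambda -> -1: 2 * sum E ln(E / O), assuming sum O = sum E
      (== lambda -1.0)
      (* 2.0 (reduce + (for [[o e] pairs] (* e (Math/log (/ e o))))))
      :else
      (* (/ 2.0 (* lambda (+ lambda 1.0)))
         (reduce + (for [[o e] pairs]
                     (* o (- (Math/pow (/ o e) lambda) 1.0))))))))

(power-divergence-stat [10 20 30] [20 20 20] 1.0)
;; => 10.0, the same as Pearson's sum (O - E)^2 / E
```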

ci

(ci vs)(ci vs alpha)

Student’s t-based confidence interval for the given data. Alpha defaults to 0.05.

The last value is the mean.

Examples

Usage

(ci [-5 1 1 1 1 2 2 5 11 71])
;;=> [-6.842279963189242 24.84227996318924 9.0]
(ci [-5 1 1 1 1 2 2 5 11 71] 0.8)
;;=> [7.172484402810223 10.827515597189777 9.0]
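
The t-based interval is mean ± t·s/√n, where s is the sample standard deviation. A minimal sketch that takes the t quantile as an argument instead of computing it (t-ci is a hypothetical name); with the 0.975 quantile for 9 degrees of freedom it reproduces the first example:

```clojure
(defn t-ci
  "Sketch: [lower upper mean] with bounds mean ± t-crit * s / sqrt(n),
  where s is the sample standard deviation. The Student t quantile
  t-crit is supplied by the caller rather than computed."
  [vs t-crit]
  (let [n    (count vs)
        mean (/ (reduce + vs) (double n))
        s2   (/ (reduce + (map #(let [d (- % mean)] (* d d)) vs)) (dec n))
        half (* t-crit (Math/sqrt (/ s2 n)))]
    [(- mean half) (+ mean half) mean]))

;; 0.975 quantile of Student's t with 9 degrees of freedom ≈ 2.262157
(t-ci [-5 1 1 1 1 2 2 5 11 71] 2.262157162740992)
;; ≈ [-6.8423 24.8423 9.0]
```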

cliffs-delta

(cliffs-delta [group1 group2])(cliffs-delta group1 group2)

Cliff’s delta effect size for ordinal data.

Examples

Usage

(let [t [10 10 20 20 20 30 30 30 40 50]
      c [10 20 30 40 40 50]]
  (cliffs-delta t c))
;;=> -0.25
(let [t [:a :b :c :D :e :f] c [:a :z :X :y :x :y]] (cliffs-delta t c))
;;=> -0.4722222222222222
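
Cliff’s delta counts, over all pairs, how often a value from the first group is larger minus how often it is smaller, divided by the number of pairs; it also equals 2A - 1 in terms of the Vargha-Delaney A measure. A minimal sketch (hypothetical names) that reproduces both examples above, including the keyword one, because it only relies on compare:

```clojure
(defn- sign-compare
  "compare normalised to -1, 0 or 1 (compare on strings or keywords
  may return other magnitudes)."
  [x y]
  (let [c (compare x y)]
    (cond (pos? c) 1, (neg? c) -1, :else 0)))

(defn cliffs-delta-sketch
  "Sketch: (#(x > y) - #(x < y)) / (n * m) over all pairs x in a, y in b."
  [a b]
  (/ (double (reduce + (for [x a, y b] (sign-compare x y))))
     (* (count a) (count b))))

(cliffs-delta-sketch [10 10 20 20 20 30 30 30 40 50] [10 20 30 40 40 50])
;; => -0.25
```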

coefficient-matrix

(coefficient-matrix vss)(coefficient-matrix vss measure-fn)(coefficient-matrix vss measure-fn symmetric?)

Generates a coefficient matrix (correlation, covariance or any two-argument function) from a seq of seqs, in row order.

Default method: pearson-correlation

cohens-d

(cohens-d [group1 group2])(cohens-d group1 group2)(cohens-d group1 group2 method)

Cohen’s d effect size for two groups

Examples

Usage

(let [t [10 10 20 20 20 30 30 30 40 50]
      c [10 20 30 40 40 50]]
  (cohens-d t c))
;;=> -0.42208932839491475
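
The example value matches the textbook definition: the difference of means divided by the pooled standard deviation sqrt((SS1 + SS2)/(n1 + n2 - 2)), where SS is a group's sum of squared deviations. A minimal sketch with hypothetical names; the optional method argument is not modelled:

```clojure
(defn- mean [xs] (/ (reduce + xs) (double (count xs))))

(defn- sum-sq-dev
  "Sum of squared deviations from the mean."
  [xs]
  (let [m (mean xs)]
    (reduce + (map #(let [d (- % m)] (* d d)) xs))))

(defn cohens-d-sketch
  "Sketch: (mean1 - mean2) / pooled standard deviation."
  [g1 g2]
  (let [df        (- (+ (count g1) (count g2)) 2)
        pooled-sd (Math/sqrt (/ (+ (sum-sq-dev g1) (sum-sq-dev g2)) df))]
    (/ (- (mean g1) (mean g2)) pooled-sd)))

(cohens-d-sketch [10 10 20 20 20 30 30 30 40 50] [10 20 30 40 40 50])
;; ≈ -0.4221
```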

cohens-d-corrected

(cohens-d-corrected [group1 group2])(cohens-d-corrected group1 group2)(cohens-d-corrected group1 group2 method)

Cohen’s d corrected for small group size

Examples

Usage

(let [t [10 10 20 20 20 30 30 30 40 50]
      c [10 20 30 40 40 50]]
  (cohens-d-corrected t c))
;;=> -0.3990662741188285
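
The example value equals the Cohen’s d shown in the cohens-d example multiplied by the common small-sample factor 1 - 3/(4(n1 + n2) - 9). A minimal sketch of that factor (small-sample-correction is a hypothetical name):

```clojure
(defn small-sample-correction
  "Sketch: scales an effect size d by 1 - 3 / (4 (n1 + n2) - 9)."
  [d n1 n2]
  (* d (- 1.0 (/ 3.0 (- (* 4 (+ n1 n2)) 9)))))

(small-sample-correction -0.42208932839491475 10 6)
;; ≈ -0.3991
```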

cohens-f

(cohens-f [group1 group2])(cohens-f group1 group2)(cohens-f group1 group2 type)

Cohen’s f, the square root of Cohen’s f2.

Possible type values are: :eta (default), :omega and :epsilon.

cohens-f2

(cohens-f2 [group1 group2])(cohens-f2 group1 group2)(cohens-f2 group1 group2 type)

Cohen’s f2, by default based on eta-sq.

Possible type values are: :eta (default), :omega and :epsilon.

Examples

Usage

(let [t [10 10 20 20 20 30 30 30 40 50]
      c [-50 20 30 40 40 50 10 20 30 10]]
  {:default (cohens-f2 t c),
   :eta (cohens-f2 t c :eta),
   :omega (cohens-f2 t c :omega),
   :epsilon (cohens-f2 t c :epsilon)})
;;=> {:default 0.06779661016949151,
;;=>  :epsilon -0.050847457627118633,
;;=>  :eta 0.06779661016949151,
;;=>  :omega -0.04576271186440677}

cohens-kappa

(cohens-kappa group1 group2)(cohens-kappa contingency-table)

Cohen’s kappa

cohens-q

(cohens-q r1 r2)(cohens-q group1 group2a group2b)(cohens-q group1a group2a group1b group2b)

Comparison of two correlations.

Arity:

2 - compare two correlation values
3 - compare correlation of group1 and group2a with correlation of group1 and group2b
4 - compare correlation of first two arguments with correlation of last two arguments

Examples

Usage

(let [t [10 10 20 20 20 30 30 30 40 50]
      c [-50 20 30 40 40 50 10 20 30 10]
      d [5 2 3 4 4 5 4 2 3 -1]
      e (range 10)]
  {:arity-2 (cohens-q 0.5 -0.25),
   :arity-3 (cohens-q t c d),
   :arity-4 (cohens-q t c d e)})
;;=> {:arity-2 0.8047189562170503,
;;=>  :arity-3 0.9030140869391835,
;;=>  :arity-4 0.8369271764963739}
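
The arity-2 value is the difference of Fisher-transformed correlations, atanh(r1) - atanh(r2). A minimal sketch with hypothetical names that reproduces :arity-2 above:

```clojure
(defn fisher-z
  "Fisher transformation atanh(r) = 0.5 * ln((1 + r) / (1 - r))."
  [r]
  (* 0.5 (Math/log (/ (+ 1.0 r) (- 1.0 r)))))

(defn cohens-q-2
  "Sketch of the two-correlation case: atanh(r1) - atanh(r2)."
  [r1 r2]
  (- (fisher-z r1) (fisher-z r2)))

(cohens-q-2 0.5 -0.25)
;; ≈ 0.8047
```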

cohens-u2

(cohens-u2 [group1 group2])(cohens-u2 group1 group2)(cohens-u2 group1 group2 estimation-strategy)

Cohen’s U2, the proportion of one of the groups that exceeds the same proportion in the other group.

cohens-u3

(cohens-u3 [group1 group2])(cohens-u3 group1 group2)(cohens-u3 group1 group2 estimation-strategy)

Cohen’s U3, the proportion of the second group that is smaller than the median of the first group.

cohens-w

(cohens-w group1 group2)(cohens-w contingency-table)

Cohen’s W effect size for discrete data.

Examples

Usage

(let [a [:a :a :b :b :f :a :a :b :b :c :a :a :b :b :c :a :a :b :b :c]
      b [:b :f :a :a :b :b :y :z :c :b :b :c :a :a :b :b :c :a :a :b]]
  (cohens-w a b))
;;=> 1.0408329997330665

contingency-2x2-measures

(contingency-2x2-measures & args)

contingency-2x2-measures-all

(contingency-2x2-measures-all a b c d)(contingency-2x2-measures-all map-or-seq)(contingency-2x2-measures-all [a b] [c d])

contingency-table

(contingency-table & seqs)

Returns a frequencies map of tuples built from seqs.

contingency-table->marginals

(contingency-table->marginals ct)

correlation

(correlation [vs1 vs2])(correlation vs1 vs2)

Correlation of two sequences.

Examples

Correlation of uniform and gaussian distribution samples.

(correlation (repeatedly 100000 (partial r/grand 1.0 10.0))
             (repeatedly 100000 (partial r/drand -10.0 -5.0)))
;;=> -0.003303068749082862
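
Pearson’s correlation is the covariance divided by the product of the standard deviations. A minimal sketch (pearson is a hypothetical name, not the fastmath function):

```clojure
(defn pearson
  "Sketch of Pearson's correlation; the n - 1 factors of covariance
  and variances cancel, so plain sums of products are used."
  [xs ys]
  (let [mx (/ (reduce + xs) (double (count xs)))
        my (/ (reduce + ys) (double (count ys)))
        dx (map #(- % mx) xs)
        dy (map #(- % my) ys)]
    (/ (reduce + (map * dx dy))
       (Math/sqrt (* (reduce + (map * dx dx))
                     (reduce + (map * dy dy)))))))

(pearson [1 2 3 4] [2 4 6 8])
;; ≈ 1.0 (perfect linear relationship)
```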

correlation-matrix

(correlation-matrix vss)(correlation-matrix vss measure)

Generates a correlation matrix from a seq of seqs, in row order.

Possible measures: :pearson (default), :kendall, :spearman.

count=

(count= [vs1 vs2-or-val])(count= vs1 vs2-or-val)

Count equal values in both seqs. Same as L0

covariance

(covariance [vs1 vs2])(covariance vs1 vs2)

Covariance of two sequences.

Examples

Covariance of uniform and gaussian distribution samples.

(covariance (repeatedly 100000 (partial r/grand 1.0 10.0))
            (repeatedly 100000 (partial r/drand -10.0 -5.0)))
;;=> 0.04673681719722958

covariance-matrix

(covariance-matrix vss)

Generates a covariance matrix from a seq of seqs, in row order.

Examples

Usage

(covariance-matrix [[1 2 3 4 5 11] [3 2 3 2 3 4]])
;;=> ([12.666666666666668 1.8666666666666667]
;;=>  [1.8666666666666667 0.5666666666666667])
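
The matrix above holds pairwise sample covariances with an n - 1 denominator. A minimal sketch with hypothetical names that reproduces it:

```clojure
(defn covariance-sketch
  "Sketch: sample covariance with the n - 1 denominator."
  [xs ys]
  (let [n  (count xs)
        mx (/ (reduce + xs) (double n))
        my (/ (reduce + ys) (double n))]
    (/ (reduce + (map (fn [x y] (* (- x mx) (- y my))) xs ys))
       (dec n))))

(defn covariance-matrix-sketch
  "Sketch: all pairwise covariances of the rows of vss, in row order."
  [vss]
  (for [a vss] (mapv #(covariance-sketch a %) vss)))

(covariance-matrix-sketch [[1 2 3 4 5 11] [3 2 3 2 3 4]])
;; ≈ ([12.6667 1.8667] [1.8667 0.5667])
```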

cramers-c

(cramers-c group1 group2)(cramers-c contingency-table)

Cramer’s C effect size for discrete data.

cramers-v

(cramers-v group1 group2)(cramers-v contingency-table)

Cramer’s V effect size for discrete data.

Examples

Usage

(let [a [:a :a :b :b :f :a :a :b :b :c :a :a :b :b :c :a :a :b :b :c]
      b [:b :f :a :a :b :b :y :z :c :b :b :c :a :a :b :b :c :a :a :b]]
  (cramers-v a b))
;;=> 0.6009252125773316
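
Both Cohen’s W and Cramer’s V derive from the chi-square statistic of the contingency table: W = sqrt(χ²/n) and V = W / sqrt(min(r, c) - 1), where r and c count the distinct values in each sequence. A minimal sketch with hypothetical names; it reproduces the cohens-w and cramers-v examples, which use the same data:

```clojure
(defn chi-square-stat
  "Sketch: Pearson chi-square statistic of the contingency table built
  from two paired label sequences, summed over every row x column cell."
  [a b]
  (let [n     (double (count a))
        cells (frequencies (map vector a b))   ;; observed counts per [row col]
        rows  (frequencies a)
        cols  (frequencies b)]
    (reduce + (for [[r rn] rows, [c cn] cols]
                (let [e (/ (* rn cn) n)            ;; expected count
                      o (get cells [r c] 0)]
                  (/ (* (- o e) (- o e)) e))))))

(defn cohens-w-sketch [a b]
  (Math/sqrt (/ (chi-square-stat a b) (count a))))

(defn cramers-v-sketch [a b]
  (let [k (min (count (distinct a)) (count (distinct b)))]
    (/ (cohens-w-sketch a b) (Math/sqrt (dec k)))))
```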

cramers-v-corrected

(cramers-v-corrected group1 group2)(cramers-v-corrected contingency-table)

Corrected Cramer’s V

Examples

Usage

(let [a [:a :a :b :b :f :a :a :b :b :c :a :a :b :b :c :a :a :b :b :c]
      b [:b :f :a :a :b :b :y :z :c :b :b :c :a :a :b :b :c :a :a :b]]
  (cramers-v-corrected a b))
;;=> 0.3410563654946855

cressie-read-test

(cressie-read-test contingency-table-or-xs)(cressie-read-test contingency-table-or-xs params)

Cressie-Read test, a power divergence test for lambda 2/3

Power divergence test.

First argument should be one of:

contingency table
sequence of counts (for goodness of fit)
sequence of data (for goodness of fit against distribution)

For goodness of fit there are two options:

comparison of observed counts vs expected probabilities or weights (:p)
comparison of data against a given distribution (:p); in this case a histogram is created from the data and compared to the distribution’s PDF over the bin ranges. Use the :bins option to control histogram creation.

Options are:

:lambda - test type:
- 1.0 - chisq-test
- 0.0 - multinomial-likelihood-ratio-test
- -1.0 - minimum-discrimination-information-test
- -2.0 - neyman-modified-chisq-test
- -0.5 - freeman-tukey-test
- 2/3 - cressie-read-test - default
:p - probabilities, weights or distribution object.
:alpha - significance level (default: 0.05)
:ci-sides - confidence interval sides (default: :two-sided)
:sides - p-value sides (:two-sided, :one-side-greater - default, :one-side-less)
:bootstrap-samples - number of samples to estimate confidence intervals (default: 1000)
:ddof - delta degrees of freedom, adjustment for dof (default: 0.0)
:bins - number of bins or estimator name for histogram

demean

(demean vs)

Subtract mean from sequence

Examples

Usage

(demean [-5 1 1 1 1 2 2 5 11 71])
;;=> (-14.0 -8.0 -8.0 -8.0 -8.0 -7.0 -7.0 -4.0 2.0 62.0)

dissimilarity

(dissimilarity method P-observed Q-expected)

(dissimilarity method P-observed Q-expected {:keys [bins probabilities? epsilon log-base power], :or {probabilities? true, epsilon 1.0E-6, log-base m/E, power 2.0}})

Various PDF distances between two histograms (frequencies) or probability distributions.

Q can be a distribution object; in that case a histogram is created from P.

Arguments:

method - distance method
P-observed - frequencies, probabilities or actual data (when Q is a distribution)
Q-expected - frequencies, probabilities or a distribution object (when P is data)

Options:

:probabilities? - whether P/Q should be converted to probabilities, default: true.
:epsilon - small number which replaces 0.0 when division or logarithm is used
:log-base - base for logarithms, default: e
:power - exponent for :minkowski distance, default: 2.0
:bins - number of bins or bins estimation method, see histogram.

The list of methods: :euclidean, :city-block, :manhattan, :chebyshev, :minkowski, :sorensen, :gower, :soergel, :kulczynski, :canberra, :lorentzian, :non-intersection, :wave-hedges, :czekanowski, :motyka, :tanimoto, :jaccard, :dice, :bhattacharyya, :hellinger, :matusita, :squared-chord, :euclidean-sq, :squared-euclidean, :pearson-chisq, :chisq, :neyman-chisq, :squared-chisq, :symmetric-chisq, :divergence, :clark, :additive-symmetric-chisq, :kullback-leibler, :jeffreys, :k-divergence, :topsoe, :jensen-shannon, :jensen-difference, :taneja, :kumar-johnson, :avg

See more: Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions by Sung-Hyuk Cha
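
To make two of the listed methods concrete, here is a minimal sketch of :euclidean and :kullback-leibler on probabilities. The names are hypothetical, and the zero handling (skipping terms where p is 0, replacing a zero q with epsilon) is an assumption modelled on the :epsilon option, not necessarily what fastmath does:

```clojure
(defn ->probabilities
  "Normalizes frequencies so they sum to 1."
  [xs]
  (let [s (double (reduce + xs))] (map #(/ % s) xs)))

(defn euclidean
  "sqrt of the sum of squared differences."
  [p q]
  (Math/sqrt (reduce + (map (fn [a b] (let [d (- a b)] (* d d))) p q))))

(defn kullback-leibler
  "Sum of p * ln(p / q); zero p terms contribute nothing and a zero q
  is replaced by epsilon (an assumed convention)."
  [p q epsilon]
  (reduce + (map (fn [a b]
                   (if (zero? a) 0.0 (* a (Math/log (/ a (if (zero? b) epsilon b))))))
                 p q)))

(let [p (->probabilities [1 0])
      q (->probabilities [1 1])]
  [(euclidean p q) (kullback-leibler p q 1.0e-6)])
;; ≈ [0.7071 0.6931]
```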

epsilon-sq

(epsilon-sq [group1 group2])(epsilon-sq group1 group2)

Less biased R2

Examples

Usage

(let [t [10 10 20 20 20 30 30 30 40 50]
      c [-50 20 30 40 40 50 10 20 30 10]]
  (epsilon-sq t c))
;;=> -0.05357142857142856

estimate-bins

(estimate-bins vs)(estimate-bins vs bins-or-estimate-method)

Estimate number of bins for histogram.

Possible methods are: :sqrt :sturges :rice :doane :scott :freedman-diaconis (default).

The returned number is never higher than the number of samples.

Examples

Estimate number of bins for various methods. vs contains 1000 random samples from Log-Normal distribution.

(estimate-bins vs :sqrt)
;;=> 31
(estimate-bins vs :sturges)
;;=> 11
(estimate-bins vs :rice)
;;=> 20
(estimate-bins vs :doane)
;;=> 18
(estimate-bins vs :scott)
;;=> 66
(estimate-bins vs :freedman-diaconis)
;;=> 201
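
For orientation, the textbook :sqrt, :sturges and :rice rules give √n, log2(n) + 1 and 2·n^(1/3) bins. A minimal sketch with a hypothetical name; rounding up with ceil is an assumption, so results may differ by one from the examples above (fastmath returns 31 for :sqrt with 1000 samples, where ceil gives 32):

```clojure
(defn estimate-bins-sketch
  "Sketch of three textbook rules, capped at n as described above.
  Rounding with ceil is an assumption."
  [n method]
  (let [k (case method
            :sqrt    (Math/ceil (Math/sqrt n))
            :sturges (inc (Math/ceil (/ (Math/log n) (Math/log 2.0))))
            :rice    (Math/ceil (* 2.0 (Math/cbrt n))))]
    (min n (long k))))

(map #(estimate-bins-sketch 100 %) [:sqrt :sturges :rice])
;; => (10 8 10)
```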

estimation-strategies-list

List of estimation strategies for percentile/quantile functions.

Examples

List of estimation strategies for percentile

(sort (keys estimation-strategies-list))
;;=> (:legacy :r1 :r2 :r3 :r4 :r5 :r6 :r7 :r8 :r9)

eta-sq

(eta-sq [group1 group2])(eta-sq group1 group2)

R2, coefficient of determination

Examples

Usage

(let [t [10 10 20 20 20 30 30 30 40 50]
      c [-50 20 30 40 40 50 10 20 30 10]]
  (eta-sq t c))
;;=> 0.06349206349206347

extent

(extent vs)

Returns the extent (min, max, mean) of a sequence.

Examples

min/max and mean of gaussian distribution

(extent (repeatedly 100000 r/grand))
;;=> [-4.754024082311232 4.226469292832746 0.0036117613630082942]

f-test

(f-test xs ys)(f-test xs ys {:keys [sides alpha], :or {sides :two-sided, alpha 0.05}})

Variance F-test of two samples.

alpha - significance level (default: 0.05)
sides - one of: :two-sided (default), :one-sided-less (short: :one-sided) or :one-sided-greater

fligner-killeen-test

(fligner-killeen-test xss)(fligner-killeen-test xss {:keys [sides], :or {sides :one-sided-greater}})

freeman-tukey-test

(freeman-tukey-test contingency-table-or-xs)(freeman-tukey-test contingency-table-or-xs params)

Freeman-Tukey test, a power divergence test for lambda -0.5

Power divergence test.

First argument should be one of:

contingency table
sequence of counts (for goodness of fit)
sequence of data (for goodness of fit against distribution)

For goodness of fit there are two options:

comparison of observed counts vs expected probabilities or weights (:p)
comparison of data against a given distribution (:p); in this case a histogram is created from the data and compared to the distribution’s PDF over the bin ranges. Use the :bins option to control histogram creation.

Options are:

:lambda - test type:
- 1.0 - chisq-test
- 0.0 - multinomial-likelihood-ratio-test
- -1.0 - minimum-discrimination-information-test
- -2.0 - neyman-modified-chisq-test
- -0.5 - freeman-tukey-test
- 2/3 - cressie-read-test - default
:p - probabilities, weights or distribution object.
:alpha - significance level (default: 0.05)
:ci-sides - confidence interval sides (default: :two-sided)
:sides - p-value sides (:two-sided, :one-side-greater - default, :one-side-less)
:bootstrap-samples - number of samples to estimate confidence intervals (default: 1000)
:ddof - delta degrees of freedom, adjustment for dof (default: 0.0)
:bins - number of bins or estimator name for histogram

geomean

(geomean vs)

Geometric mean for positive values only

Examples

Geometric mean

(geomean [1 2 3 1 1 2 1 11 111])
;;=> 2.903203203730772
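
The geometric mean is the exponential of the mean of logarithms, which is why only positive values are allowed. A minimal sketch (hypothetical name) reproducing the example:

```clojure
(defn geomean-sketch
  "Sketch: exp of the mean of natural logarithms (positive values only)."
  [vs]
  (Math/exp (/ (reduce + (map #(Math/log %) vs)) (count vs))))

(geomean-sketch [1 2 3 1 1 2 1 11 111])
;; ≈ 2.9032
```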

glass-delta

(glass-delta [group1 group2])(glass-delta group1 group2)

Glass’s delta effect size for two groups

Examples

Usage

(let [t [10 10 20 20 20 30 30 30 40 50]
      c [10 20 30 40 40 50]]
  (glass-delta t c))
;;=> -0.3849741916091626
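
Glass’s delta divides the difference of means by the standard deviation of the second (control) group only. A minimal sketch with a hypothetical name; it reproduces the example above:

```clojure
(defn glass-delta-sketch
  "Sketch: (mean1 - mean2) / sample standard deviation of group 2."
  [g1 g2]
  (let [mean (fn [xs] (/ (reduce + xs) (double (count xs))))
        m2   (mean g2)
        sd2  (Math/sqrt (/ (reduce + (map #(let [d (- % m2)] (* d d)) g2))
                           (dec (count g2))))]
    (/ (- (mean g1) m2) sd2)))

(glass-delta-sketch [10 10 20 20 20 30 30 30 40 50] [10 20 30 40 40 50])
;; ≈ -0.3850
```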

harmean

(harmean vs)

Harmonic mean

Examples

Harmonic mean

(harmean [1 2 3 -1 -1 2 -1 11 111])
;;=> -15.880057803468203
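
The harmonic mean is n divided by the sum of reciprocals. It is normally used for positive values; with mixed signs, as in the example, the result can fall outside the range of the data. A minimal sketch (hypothetical name) reproducing the example:

```clojure
(defn harmean-sketch
  "Sketch: n divided by the sum of reciprocals."
  [vs]
  (/ (count vs) (reduce + (map #(/ 1.0 %) vs))))

(harmean-sketch [1 2 3 -1 -1 2 -1 11 111])
;; ≈ -15.8801
```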

hedges-g

(hedges-g [group1 group2])(hedges-g group1 group2)

Hedges’s g effect size for two groups

Examples

Usage

(let [t [10 10 20 20 20 30 30 30 40 50]
      c [10 20 30 40 40 50]]
  (hedges-g t c))
;;=> -0.42208932839491475

hedges-g*

(hedges-g* [group1 group2])(hedges-g* group1 group2)

Less biased Hedges’s g effect size for two groups, J term correction.

Examples

Usage

(let [t [10 10 20 20 20 30 30 30 40 50]
      c [10 20 30 40 40 50]]
  (hedges-g* t c))
;;=> -0.3989958628030656

hedges-g-corrected

(hedges-g-corrected [group1 group2])(hedges-g-corrected group1 group2)

Cohen’s d corrected for small group size

Examples

Usage

(let [t [10 10 20 20 20 30 30 30 40 50]
      c [10 20 30 40 40 50]]
  (hedges-g-corrected t c))
;;=> -0.3990662741188285

histogram

(histogram vs)(histogram vs bins-or-estimate-method)(histogram vs bins-or-estimate-method [mn mx])

Calculates a histogram.

Returns a map with keys:

:size - number of bins
:step - distance between bins
:bins - list of pairs: lower bound of the bin range and number of hits
:min - min value
:max - max value
:samples - number of used samples

For estimation methods check estimate-bins.

If the difference between the min and max values is 0, the number of bins is set to 1.

Examples

3 bins from uniform distribution.

(histogram (repeatedly 1000 rand) 3)
;;=> {:bins ([0.0011357582159668977 340]
;;=>         [0.3340889989822621 341]
;;=>         [0.6670422397485573 319]),
;;=>  :max 0.9999954805148524,
;;=>  :min 0.0011357582159668977,
;;=>  :samples 1000,
;;=>  :size 3,
;;=>  :step 0.3329532407662952}

3 bins from uniform distribution for given range.

(histogram (repeatedly 10000 rand) 3 [0.1 0.5])
;;=> {:bins
;;=>  ([0.1 1336] [0.23333333333333334 1370] [0.3666666666666667 1333]),
;;=>  :max 0.5,
;;=>  :min 0.1,
;;=>  :samples 4039,
;;=>  :size 3,
;;=>  :step 0.13333333333333333}

5 bins from normal distribution.

(histogram (repeatedly 10000 r/grand) 5)
;;=> {:bins ([-3.4805958085123243 188]
;;=>         [-2.0755018370184155 2289]
;;=>         [-0.6704078655245067 5299]
;;=>         [0.7346861059694016 2069]
;;=>         [2.139780077463311 155]),
;;=>  :max 3.544874048957219,
;;=>  :min -3.4805958085123243,
;;=>  :samples 10000,
;;=>  :size 5,
;;=>  :step 1.4050939714939088}

Estimate number of bins

(:size (histogram (repeatedly 10000 r/grand)))
;;=> 64

Estimate number of bins, Rice rule

(:size (histogram (repeatedly 10000 r/grand) :rice))
;;=> 44
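
Equal-width binning as described above can be sketched as follows. This is a minimal illustration with a hypothetical name: values outside [mn, mx] are dropped (consistent with :samples 4039 in the range example), the maximum is placed in the last bin, and a zero-width range falls back to one bin:

```clojure
(defn histogram-sketch
  "Sketch of equal-width binning over [mn, mx]. Values outside the range
  are dropped, the maximum falls into the last bin, and a zero-width
  range is collapsed to a single bin."
  ([vs bins] (histogram-sketch vs bins [(apply min vs) (apply max vs)]))
  ([vs bins [mn mx]]
   (let [bins     (if (== mn mx) 1 bins)
         in-range (filter #(<= mn % mx) vs)
         step     (/ (- mx mn) (double bins))
         bin-of   (fn [x] (if (zero? step)
                            0
                            (min (dec bins) (long (/ (- x mn) step)))))
         counts   (frequencies (map bin-of in-range))]
     {:size    bins
      :step    step
      :min     mn
      :max     mx
      :samples (count in-range)
      :bins    (map (fn [i] [(+ mn (* i step)) (get counts i 0)]) (range bins))})))

(:bins (histogram-sketch [0 1 2 3 4 5] 2))
;; => ([0.0 3] [2.5 3])
```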

hpdi-extent

(hpdi-extent vs)(hpdi-extent vs size)

Highest Posterior Density interval + median.

The size parameter is the target probability content of the interval.

inner-fence-extent

(inner-fence-extent vs)(inner-fence-extent vs estimation-strategy)

Returns LIF, UIF and median

iqr

(iqr vs)(iqr vs estimation-strategy)

Interquartile range.

Examples

IQR

(iqr (repeatedly 100000 r/grand))
;;=> 1.3535813918712551

jensen-shannon-divergence

Deprecated. Use dissimilarity instead.

(jensen-shannon-divergence [vs1 vs2])(jensen-shannon-divergence vs1 vs2)

Jensen-Shannon divergence of two sequences.

Examples

Jensen-Shannon divergence

(jensen-shannon-divergence (repeatedly 100 (fn* [] (r/irand 100)))
                           (repeatedly 100 (fn* [] (r/irand 100))))
;;=> 439.86198157468397

kendall-correlation

(kendall-correlation [vs1 vs2])(kendall-correlation vs1 vs2)

Kendall’s correlation of two sequences.

Examples

Kendall’s correlation of uniform and gaussian distribution samples.

(kendall-correlation (repeatedly 100000 (partial r/grand 1.0 10.0))
                     (repeatedly 100000 (partial r/drand -10.0 -5.0)))
;;=> -6.012940129401294E-4

kruskal-test

(kruskal-test xss)(kruskal-test xss {:keys [sides], :or {sides :right}})

Kruskal-Wallis rank sum test.

ks-test-one-sample

(ks-test-one-sample xs)(ks-test-one-sample xs distribution-or-ys)

(ks-test-one-sample xs distribution-or-ys {:keys [sides kernel bandwidth distinct?], :or {sides :two-sided, kernel :gaussian, distinct? true}})

One-sample Kolmogorov-Smirnov test

ks-test-two-samples

(ks-test-two-samples xs ys)(ks-test-two-samples xs ys {:keys [sides distinct?], :or {sides :two-sided, distinct? true}})

Two-sample Kolmogorov-Smirnov test

kullback-leibler-divergence

Deprecated. Use dissimilarity instead.

(kullback-leibler-divergence [vs1 vs2])(kullback-leibler-divergence vs1 vs2)

Kullback-Leibler divergence of two sequences.

Examples

Kullback-Leibler divergence.

(kullback-leibler-divergence (repeatedly 100 #(r/irand 100))
                             (repeatedly 100 #(r/irand 100)))
;;=> 1511.2029846076014

kurtosis

(kurtosis vs)(kurtosis vs typ)

Calculate kurtosis from sequence.

Possible values of typ: :G2 (default), :g2 (or :excess), :geary or :kurt.
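
The moment-based variants can be illustrated with the standard definitions. :kurt is m4/m2^2. :g2 is that minus 3 (excess kurtosis). :G2 is the sample-size-adjusted ((n+1)g2 + 6)(n-1)/((n-2)(n-3)). This is a plain-Clojure sketch under the assumption that fastmath follows these textbook formulas; it reproduces the example values below up to rounding. `central-moment` and `kurtosis-sketch` are hypothetical names, and :geary is not covered.

```clojure
;; Hypothetical helpers (not fastmath API); assumes standard moment formulas.
(defn central-moment
  "Population central moment of order k."
  [xs k]
  (let [n  (count xs)
        mu (/ (reduce + (map double xs)) n)]
    (/ (reduce + (map #(Math/pow (- % mu) k) xs)) n)))

(defn kurtosis-sketch
  [xs typ]
  (let [n    (count xs)
        m2   (central-moment xs 2)
        kurt (/ (central-moment xs 4) (* m2 m2)) ; m4 / m2^2
        g2   (- kurt 3.0)]                        ; excess kurtosis
    (case typ
      :kurt kurt
      :g2   g2
      :G2   (/ (* (- n 1) (+ (* (+ n 1) g2) 6.0))
               (* (- n 2) (- n 3))))))

(kurtosis-sketch [1 2 3 -1 -1 2 -1 11 111] :g2) ;; => ~3.98457
```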

Examples

Kurtosis

(kurtosis [1 2 3 -1 -1 2 -1 11 111])
;;=> 8.732515263272099
(kurtosis [1 2 3 -1 -1 2 -1 11 111] :G2)
;;=> 8.732515263272099
(kurtosis [1 2 3 -1 -1 2 -1 11 111] :g2)
;;=> 3.9845705132178515
(kurtosis [1 2 3 -1 -1 2 -1 11 111] :excess)
;;=> 3.9845705132178515
(kurtosis [1 2 3 -1 -1 2 -1 11 111] :kurt)
;;=> 6.984570513217852

L0

Count equal values in both seqs. Same as count==

L1

(L1 [vs1 vs2-or-val])(L1 vs1 vs2-or-val)

Manhattan distance

L2

(L2 [vs1 vs2-or-val])(L2 vs1 vs2-or-val)

Euclidean distance

L2sq

(L2sq [vs1 vs2-or-val])(L2sq vs1 vs2-or-val)

Squared Euclidean distance

levene-test

(levene-test xss)(levene-test xss {:keys [sides statistic scorediff], :or {sides :one-sided-greater, statistic mean, scorediff abs}})

LInf

(LInf [vs1 vs2-or-val])(LInf vs1 vs2-or-val)

Chebyshev distance
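
The L1, L2, L2sq and LInf distances share one pattern: element-wise differences followed by a reduction. The signatures also allow a single number as the second argument. The sketch below assumes that number is broadcast against every element of the first sequence. All names here (`as-seq`, `diffs`, `l1`, `l2sq`, `l2`, `linf`) are hypothetical and are not fastmath functions.

```clojure
;; Hypothetical sketch: a number as the second argument is broadcast.
(defn as-seq [v] (if (number? v) (repeat v) v))

(defn diffs [vs1 vs2-or-val]
  (map - vs1 (as-seq vs2-or-val)))        ; stops at the shorter input

(defn l1   [a b] (reduce + (map #(Math/abs (double %)) (diffs a b))))  ; Manhattan
(defn l2sq [a b] (reduce + (map #(* % %) (diffs a b))))                ; squared Euclidean
(defn l2   [a b] (Math/sqrt (l2sq a b)))                               ; Euclidean
(defn linf [a b] (reduce max (map #(Math/abs (double %)) (diffs a b)))) ; Chebyshev

(l1 [1 2 3] [4 0 3]) ;; => 5.0
```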

mad

Alias for median-absolute-deviation

mad-extent

(mad-extent vs)

Returns [median - MAD, median + MAD, median], where MAD is the median-absolute-deviation.

Examples

median absolute deviation from median for gaussian distribution

(mad-extent (repeatedly 100000 r/grand))
;;=> [-0.6730878283010525 0.6797538719277473 0.003333021813347429]

mae

(mae [vs1 vs2-or-val])(mae vs1 vs2-or-val)

Mean absolute error

mape

(mape [vs1 vs2-or-val])(mape vs1 vs2-or-val)

Mean absolute percentage error

maximum

(maximum vs)

Maximum value from sequence.

Examples

Maximum value

(maximum [1 2 3 -1 -1 2 -1 11 111])
;;=> 111.0

mcc

(mcc group1 group2)(mcc ct)

Matthews correlation coefficient, also known as the phi coefficient.

me

(me [vs1 vs2-or-val])(me vs1 vs2-or-val)

Mean error

mean

(mean vs)

Calculate mean of vs

Examples

Mean (average value)

(mean [1 2 3 -1 -1 2 -1 11 111])
;;=> 14.111111111111109

mean-absolute-deviation

(mean-absolute-deviation vs)(mean-absolute-deviation vs center)

Calculate mean absolute deviation

means-ratio

(means-ratio [group1 group2])(means-ratio group1 group2)(means-ratio group1 group2 adjusted?)

Means ratio

means-ratio-corrected

(means-ratio-corrected [group1 group2])(means-ratio-corrected group1 group2)

Bias-corrected means ratio

median

(median vs estimation-strategy)(median vs)

Calculate median of vs. See median-3.

Examples

Median (percentile 50%).

(median [1 2 3 -1 -1 2 -1 11 111])
;;=> 2.0

For three elements, use the faster median-3.

(median [7 1 4])
;;=> 4.0

median-3

(median-3 a b c)

Median of three values. See median.
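
The median of three values can be found without sorting by clamping c between min(a, b) and max(a, b). This is one common branch-free formulation, shown as an illustration only; whether median-3 is implemented this way is not stated here. Unlike median-3, the hypothetical `median-3-sketch` returns its inputs' type instead of a double.

```clojure
;; Hypothetical sketch: clamp c into [min(a,b), max(a,b)].
(defn median-3-sketch [a b c]
  (max (min a b) (min (max a b) c)))

(median-3-sketch 7 1 4) ;; => 4
```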

Examples

Median of 7 1 4

(median-3 7 1 4)
;;=> 4.0

median-absolute-deviation

(median-absolute-deviation vs)(median-absolute-deviation vs center)(median-absolute-deviation vs center estimation-strategy)

Calculate MAD

Examples

MAD

(median-absolute-deviation [1 2 3 -1 -1 2 -1 11 111])
;;=> 3.0

minimum

(minimum vs)

Minimum value from sequence.

Examples

Minimum value

(minimum [1 2 3 -1 -1 2 -1 11 111])
;;=> -1.0

minimum-discrimination-information-test

(minimum-discrimination-information-test contingency-table-or-xs)(minimum-discrimination-information-test contingency-table-or-xs params)

Minimum discrimination information test, a power divergence test for lambda -1.0

Power divergence test.

First argument should be one of:

contingency table
sequence of counts (for goodness of fit)
sequence of data (for goodness of fit against distribution)

For goodness of fit there are two options:

comparison of observed counts vs expected probabilities or weights (:p)
comparison of data against a given distribution (:p); in this case a histogram is created from the data and compared with the distribution PDF over the bin ranges. Use the :bins option to control histogram creation.

Options are:

:lambda - test type:
- 1.0 - chisq-test
- 0.0 - multinomial-likelihood-ratio-test
- -1.0 - minimum-discrimination-information-test
- -2.0 - neyman-modified-chisq-test
- -0.5 - freeman-tukey-test
- 2/3 - cressie-read-test - default
:p - probabilities, weights or distribution object.
:alpha - significance level (default: 0.05)
:ci-sides - confidence interval sides (default: :two-sided)
:sides - p-value sides (:two-sided, :one-sided-greater - default, :one-sided-less)
:bootstrap-samples - number of samples to estimate confidence intervals (default: 1000)
:ddof - delta degrees of freedom, adjustment for dof (default: 0.0)
:bins - number of bins or estimator name for histogram

mode

(mode vs method)(mode vs method opts)(mode vs)

Find the value that appears most often in a dataset vs.

The method argument selects the estimator. For samples from a continuous distribution, three estimators are available, plus the discrete default:

:histogram - calculated from a histogram
:kde - calculated from a kernel density estimate (KDE)
:pearson - Pearson's approximation, mode = mean - 3(mean - median)
:default - discrete mode

The :histogram method accepts an optional :bins (see histogram). The :kde method accepts :kde for the kernel name (default: :gaussian) and :bandwidth (default: automatic). The :pearson method accepts :estimation-strategy for the median.

Examples

Example

(mode [1 2 3 -1 -1 2 -1 11 111])
;;=> -1.0

Returns the lowest value when every element appears equally often.

(mode [5 1 2 3 4])
;;=> 1.0

modes

(modes vs method)(modes vs method opts)(modes vs)

Find the values that appear most often in a dataset vs.

Returns a sequence of all the most frequent values in increasing order.
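
The discrete behaviour of mode and modes can be reproduced by counting value frequencies. modes keeps every value that reaches the maximum count, in increasing order. mode returns the lowest of those values. This is a sketch matching the tie handling in the examples, not the fastmath source. `modes-sketch` and `mode-sketch` are hypothetical names.

```clojure
;; Hypothetical sketch of the discrete (:default) mode.
(defn modes-sketch [vs]
  (let [freqs (frequencies (map double vs))   ; value -> count
        top   (apply max (vals freqs))]       ; highest frequency
    (sort (for [[v c] freqs :when (= c top)] v))))

(defn mode-sketch [vs]
  (first (modes-sketch vs)))                  ; lowest most-frequent value

(modes-sketch [5 5 1 1 2 3 4 4]) ;; => (1.0 4.0 5.0)
```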

Examples

Example

(modes [1 2 3 -1 -1 2 -1 11 111])
;;=> (-1.0)

Returns all values that share the highest frequency.

(modes [5 5 1 1 2 3 4 4])
;;=> (1.0 4.0 5.0)

moment

(moment vs)(moment vs order)(moment vs order {:keys [absolute? center mean? normalize?], :or {mean? true}})

Calculate the moment (central and/or absolute) of a given order (default: 2).

Additional parameters as a map:

:absolute? - calculate sum as absolute values (default: false)
:mean? - returns mean (proper moment) or just sum of differences (default: true)
:center - value of center (default: nil = mean)
:normalize? - apply normalization by standard deviation to the order power
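
The :center, :absolute? and :mean? options can be read as one formula. The moment is the mean (or, with :mean? false, the plain sum) of (x - center)^order, using |x - center| when :absolute? is set. The plain-Clojure sketch below follows that reading and reproduces several of the examples that follow. `moment-sketch` is a hypothetical name. :normalize? is left out because its exact definition is not spelled out here.

```clojure
;; Hypothetical sketch of moment; :normalize? is not implemented.
(defn moment-sketch
  ([vs] (moment-sketch vs 2.0 {}))
  ([vs order] (moment-sketch vs order {}))
  ([vs order {:keys [absolute? center mean?] :or {mean? true}}]
   (let [xs     (map double vs)
         n      (count xs)
         center (or center (/ (reduce + xs) n))     ; default center: the mean
         terms  (map (fn [x]
                       (let [d (- x center)]
                         (Math/pow (if absolute? (Math/abs (double d)) d) order)))
                     xs)
         total  (reduce + terms)]
     (if mean? (/ total n) total))))               ; mean of terms or raw sum

(moment-sketch [3 7 5 9 -8] 3.0 {:center 0.0}) ;; => ~142.4
```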

Examples

Usage

(moment [3 7 5 9 -8])
;;=> 35.36
(moment [3 7 5 9 -8] 1.0)
;;=> 0.0
(moment [3 7 5 9 -8] 4.0)
;;=> 3417.171199999999
(moment [3 7 5 9 -8] 3.0)
;;=> -229.82399999999993
(moment [3 7 5 9 -8] 3.0 {:center 0.0})
;;=> 142.4
(moment [3 7 5 9 -8] 3.0 {:mean? false})
;;=> -1149.1199999999997
(moment [3 7 5 9 -8] 3.0 {:absolute? true})
;;=> 332.15039999999993
(moment [3 7 5 9 -8] 3.0 {:center -3.0})
;;=> 666.2
(moment [3 7 5 9 -8] 0.5 {:absolute? true})
;;=> 1.8986344545712772

mse

(mse [vs1 vs2-or-val])(mse vs1 vs2-or-val)

Mean squared error

multinomial-likelihood-ratio-test

(multinomial-likelihood-ratio-test contingency-table-or-xs)(multinomial-likelihood-ratio-test contingency-table-or-xs params)

Multinomial likelihood ratio test, a power divergence test for lambda 0.0

Power divergence test.

First argument should be one of:

contingency table
sequence of counts (for goodness of fit)
sequence of data (for goodness of fit against distribution)

For goodness of fit there are two options:

comparison of observed counts vs expected probabilities or weights (:p)
comparison of data against a given distribution (:p); in this case a histogram is created from the data and compared with the distribution PDF over the bin ranges. Use the :bins option to control histogram creation.

Options are:

:lambda - test type:
- 1.0 - chisq-test
- 0.0 - multinomial-likelihood-ratio-test
- -1.0 - minimum-discrimination-information-test
- -2.0 - neyman-modified-chisq-test
- -0.5 - freeman-tukey-test
- 2/3 - cressie-read-test - default
:p - probabilities, weights or distribution object.
:alpha - significance level (default: 0.05)
:ci-sides - confidence interval sides (default: :two-sided)
:sides - p-value sides (:two-sided, :one-sided-greater - default, :one-sided-less)
:bootstrap-samples - number of samples to estimate confidence intervals (default: 1000)
:ddof - delta degrees of freedom, adjustment for dof (default: 0.0)
:bins - number of bins or estimator name for histogram

neyman-modified-chisq-test

(neyman-modified-chisq-test contingency-table-or-xs)(neyman-modified-chisq-test contingency-table-or-xs params)

Neyman modified chi-square test, a power divergence test for lambda -2.0

Power divergence test.

First argument should be one of:

contingency table
sequence of counts (for goodness of fit)
sequence of data (for goodness of fit against distribution)

For goodness of fit there are two options:

comparison of observed counts vs expected probabilities or weights (:p)
comparison of data against a given distribution (:p); in this case a histogram is created from the data and compared with the distribution PDF over the bin ranges. Use the :bins option to control histogram creation.

Options are:

:lambda - test type:
- 1.0 - chisq-test
- 0.0 - multinomial-likelihood-ratio-test
- -1.0 - minimum-discrimination-information-test
- -2.0 - neyman-modified-chisq-test
- -0.5 - freeman-tukey-test
- 2/3 - cressie-read-test - default
:p - probabilities, weights or distribution object.
:alpha - significance level (default: 0.05)
:ci-sides - confidence interval sides (default: :two-sided)
:sides - p-value sides (:two-sided, :one-sided-greater - default, :one-sided-less)
:bootstrap-samples - number of samples to estimate confidence intervals (default: 1000)
:ddof - delta degrees of freedom, adjustment for dof (default: 0.0)
:bins - number of bins or estimator name for histogram

omega-sq

(omega-sq [group1 group2])(omega-sq group1 group2)

Adjusted R2

Examples

Usage

(let [t [10 10 20 20 20 30 30 30 40 50]
      c [-50 20 30 40 40 50 10 20 30 10]]
  (omega-sq t c))
;;=> -0.04795737122557726

one-way-anova-test

(one-way-anova-test xss)(one-way-anova-test xss {:keys [sides], :or {sides :one-sided-greater}})

outer-fence-extent

(outer-fence-extent vs)(outer-fence-extent vs estimation-strategy)

Returns LOF, UOF and median

outliers

(outliers vs)(outliers vs estimation-strategy)(outliers vs q1 q3)

Find outliers, defined as values outside the inner fences.

Let Q1 be the 25th percentile and Q3 the 75th percentile. IQR is (- Q3 Q1).

LIF (Lower Inner Fence) equals (- Q1 (* 1.5 IQR)).
UIF (Upper Inner Fence) equals (+ Q3 (* 1.5 IQR)).

Returns a sequence.

The optional estimation-strategy argument changes how the quantiles are estimated. See estimation-strategies.
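
The fence rule above can be written directly when Q1 and Q3 are already known, mirroring the (outliers vs q1 q3) arity. For the example data, Q1 = -1.0 and Q3 = 7.0 (see stats-map), so the fences are -13.0 and 19.0. In this sketch, values strictly outside the fences count as outliers. Whether fastmath uses strict comparison is an assumption. `inner-fences` and `outliers-sketch` are hypothetical names.

```clojure
;; Hypothetical sketch: inner fences from given quartiles.
(defn inner-fences [q1 q3]
  (let [iqr (- q3 q1)]
    [(- q1 (* 1.5 iqr)) (+ q3 (* 1.5 iqr))]))   ; [LIF UIF]

(defn outliers-sketch [vs q1 q3]
  (let [[lif uif] (inner-fences q1 q3)]
    (filter #(or (< % lif) (> % uif)) vs)))     ; strictly outside the fences

(outliers-sketch [1 2 3 -1 -1 2 -1 11 111] -1.0 7.0) ;; => (111)
```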

Examples

Outliers

(outliers [1 2 3 -1 -1 2 -1 11 111])
;;=> (111.0)

Gaussian distribution outliers

(count (outliers (repeatedly 3000000 r/grand)))
;;=> 20801

p-overlap

(p-overlap [group1 group2])(p-overlap group1 group2)(p-overlap group1 group2 {:keys [kde bandwidth min-iterations steps], :or {kde :gaussian, min-iterations 3, steps 500}})

Overlapping index, kernel density approximation

p-value

(p-value stat)(p-value distribution stat)(p-value distribution stat sides)

Calculate p-value for given distribution (default: N(0,1)), stat and sides (one of :two-sided, :one-sided-greater or :one-sided-less/:one-sided).

pacf

(pacf data)(pacf data lags)

Calculate pacf (partial autocorrelation function) for a given number of lags.

If lags is omitted, the function returns the maximum possible number of lags.

pacf also returns lag 0 (which is 0.0).

Examples

Usage

(pacf (repeatedly 1000 r/grand) 10)
;;=> (0.0
;;=>  -0.007493264913704033
;;=>  0.03950940288978749
;;=>  -0.027119682527161276
;;=>  -8.127647576342593E-4
;;=>  0.023184353076001057
;;=>  -0.07492493089581809
;;=>  -0.00415801721495847
;;=>  -0.029857518595778
;;=>  -0.025366182150073028
;;=>  0.02293649431732736)
(pacf [1 2 3 4 5 4 3 2 1])
;;=> (0.0
;;=>  0.5396825396825397
;;=>  -0.4299857803057234
;;=>  -0.388084834596935
;;=>  -0.2792571208141194
;;=>  0.17585056996358742
;;=>  -0.2652225487589841
;;=>  -0.17978918763554708
;;=>  -0.10771973872263883)

pacf-ci

(pacf-ci data)(pacf-ci data lags)(pacf-ci data lags alpha)

pacf with added confidence interval data.
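
The :ci values in the examples below agree with the usual large-sample bound z/sqrt(n). Here z is the 0.975 standard normal quantile, about 1.96 for alpha = 0.05. That gives 1.96/sqrt(1000), about 0.0620, and 1.96/sqrt(9), about 0.6533. The sketch computes only that bound, with the z value hard-coded. Treating it as exactly what pacf-ci does is an assumption based on those two examples. `pacf-ci-bound` is a hypothetical name.

```clojure
;; Hypothetical sketch: +/- bound for (partial) autocorrelations, alpha = 0.05.
(def z-975 1.959963984540054) ; standard normal 0.975 quantile (hard-coded)

(defn pacf-ci-bound [n]
  (/ z-975 (Math/sqrt (double n))))

(pacf-ci-bound 9) ;; => ~0.6533
```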

Examples

Usage

(pacf-ci (repeatedly 1000 r/grand) 3)
;;=> {:ci 0.06197950323045615,
;;=>  :pacf
;;=>  (0.0 -0.04930794881410651 -0.005276481462760803 -0.03643226574541986)}
(pacf-ci [1 2 3 4 5 4 3 2 1] 3)
;;=> {:ci 0.653321328180018,
;;=>  :pacf (0.0 0.5396825396825397 -0.4299857803057234 -0.388084834596935)}

pearson-correlation

(pearson-correlation [vs1 vs2])(pearson-correlation vs1 vs2)

Pearson’s correlation of two sequences.

Examples

Pearson’s correlation of Gaussian and uniform distribution samples.

(pearson-correlation (repeatedly 100000 (partial r/grand 1.0 10.0))
                     (repeatedly 100000 (partial r/drand -10.0 -5.0)))
;;=> -0.002465297102056219

pearson-r

(pearson-r [group1 group2])(pearson-r group1 group2)

Pearson r correlation coefficient
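
Pearson's r is the sum of products of deviations from the means, divided by the square root of the product of the summed squared deviations. For the t and c data used below, this gives 800 / sqrt(1440 * 7000), about 0.25198. Its square, 4/63, is about 0.06349, matching the r2-determination example later in this section. `pearson-r-sketch` is a hypothetical name for this plain-Clojure illustration.

```clojure
;; Hypothetical sketch: r = sum(dx*dy) / sqrt(sum(dx^2) * sum(dy^2)).
(defn pearson-r-sketch [xs ys]
  (let [mean (fn [v] (/ (reduce + (map double v)) (count v)))
        mx  (mean xs)
        my  (mean ys)
        dx  (map #(- % mx) xs)                  ; deviations from the means
        dy  (map #(- % my) ys)
        sxy (reduce + (map * dx dy))
        sxx (reduce + (map * dx dx))
        syy (reduce + (map * dy dy))]
    (/ sxy (Math/sqrt (* sxx syy)))))

(pearson-r-sketch [10 10 20 20 20 30 30 30 40 50]
                  [-50 20 30 40 40 50 10 20 30 10]) ;; => ~0.25198
```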

Examples

Usage

(let [t [10 10 20 20 20 30 30 30 40 50]
      c [-50 20 30 40 40 50 10 20 30 10]]
  (pearson-r t c))
;;=> 0.2519763153394848

percentile

(percentile vs p)(percentile vs p estimation-strategy)

Calculate a percentile of vs.

Percentile p is in the range 0-100.

See docs.

Optionally, provide estimation-strategy to change the interpolation method used to select values. The default is :legacy. See more here.

Examples

Percentile 25%

(percentile [1 2 3 -1 -1 2 -1 11 111] 25.0)
;;=> -1.0

Percentile 50% (median)

(percentile [1 2 3 -1 -1 2 -1 11 111] 50.0)
;;=> 2.0

Percentile 75%

(percentile [1 2 3 -1 -1 2 -1 11 111] 75.0)
;;=> 7.0

Percentile 90%

(percentile [1 2 3 -1 -1 2 -1 11 111] 90.0)
;;=> 111.0

Various estimation strategies.

(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :legacy)
;;=> 61.0
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :r1)
;;=> 11.0
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :r2)
;;=> 11.0
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :r3)
;;=> 11.0
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :r4)
;;=> 8.199999999999996
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :r5)
;;=> 25.999999999999858
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :r6)
;;=> 61.0
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :r7)
;;=> 9.399999999999999
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :r8)
;;=> 37.66666666666675
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :r9)
;;=> 34.75000000000007

percentile-bc-extent

(percentile-bc-extent vs)(percentile-bc-extent vs p)(percentile-bc-extent vs p1 p2)(percentile-bc-extent vs p1 p2 estimation-strategy)

Return the bias-corrected percentile range and the mean for bootstrap samples. See https://projecteuclid.org/euclid.ss/1032280214

p - calculates the extent of bias-corrected p and 100-p (default: p=2.5)

Set estimation-strategy to :r7 to get the same result as in R coxed::bca.

Examples

for samples from gaussian distribution

(percentile-bc-extent (repeatedly 100000 r/grand))
;;=> [-1.9686998976773415 1.9488455054357283 -0.001753000836894031]
(percentile-bc-extent (repeatedly 100000 r/grand) 10)
;;=> [-1.2644568505959903 1.2992608668913583 0.006974183450094526]
(percentile-bc-extent (repeatedly 100000 r/grand) 30 70)
;;=> [-0.5390348516759447 0.5110256349062376 7.918286580104579E-4]

percentile-bca-extent

(percentile-bca-extent vs)(percentile-bca-extent vs p)(percentile-bca-extent vs p1 p2)(percentile-bca-extent vs p1 p2 estimation-strategy)(percentile-bca-extent vs p1 p2 accel estimation-strategy)

Return the bias-corrected percentile range and the mean for bootstrap samples. Also accounts for variance variations through the acceleration parameter. See https://projecteuclid.org/euclid.ss/1032280214

p - calculates the extent of bias-corrected p and 100-p (default: p=2.5)

Set estimation-strategy to :r7 to get the same result as in R coxed::bca.

Examples

for samples from gaussian distribution

(percentile-bca-extent (repeatedly 100000 r/grand))
;;=> [-1.9611115428631776 1.9395600880977821 -0.0031832554346387315]
(percentile-bca-extent (repeatedly 100000 r/grand) 10)
;;=> [-1.2666928768037038 1.2838206326649084 0.0011417437187627867]
(percentile-bca-extent (repeatedly 100000 r/grand) 30 70)
;;=> [-0.5316035303243614 0.5241767534807082 0.001668106379069707]

percentile-extent

(percentile-extent vs)(percentile-extent vs p)(percentile-extent vs p1 p2)(percentile-extent vs p1 p2 estimation-strategy)

Return percentile range and median.

p - calculates extent of p and 100-p (default: p=25)

Examples

for samples from gaussian distribution

(percentile-extent (repeatedly 100000 r/grand))
;;=> [-0.6811300265122471 0.67725176529625 0.002258912209196587]
(percentile-extent (repeatedly 100000 r/grand) 10)
;;=> [-1.2775454067052694 1.2710171173948166 7.651774916386039E-4]
(percentile-extent (repeatedly 100000 r/grand) 30 70)
;;=> [-0.5247561462693245 0.5243627229342358 -0.001450827530566729]

percentiles

(percentiles vs)(percentiles vs ps)(percentiles vs ps estimation-strategy)

Calculate percentiles of vs.

ps is a sequence of values in the range 0-100.

See docs.

Optionally, provide estimation-strategy to change the interpolation method used to select values. The default is :legacy. See more here.

Examples

Usage

(percentiles [1 2 3 -1 -1 2 -1 11 111] [25 50 75 90])
;;=> [-1.0 2.0 7.0 111.0]

pi

(pi vs)(pi vs size)(pi vs size estimation-strategy)

Returns the PI (percentile interval) as a map: quantile interval bounds based on the interval size.

Quantiles are (1-size)/2 and 1-(1-size)/2

pi-extent

(pi-extent vs)(pi-extent vs size)(pi-extent vs size estimation-strategy)

Returns the PI extent (quantile interval bounds based on the interval size) and the median.

Quantiles are (1-size)/2 and 1-(1-size)/2

pooled-stddev

(pooled-stddev groups)(pooled-stddev groups method)

Calculate pooled standard deviation for samples and method

pooled-variance

(pooled-variance groups)(pooled-variance groups method)

Calculate pooled variance for samples and method.

Methods:

:unbiased - weighted average of variances (default)
:biased - biased version of :unbiased
:avg - average of variances
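
The :unbiased method is taken here to be the usual pooled estimate. It weights each group's sample variance s_i^2 by its degrees of freedom: sum((n_i - 1) * s_i^2) / sum(n_i - 1). That reading is an assumption about the method name. The square root of the result would be the corresponding pooled standard deviation. `sample-variance` and `pooled-variance-sketch` are hypothetical names.

```clojure
;; Hypothetical sketch of a degrees-of-freedom weighted pooled variance.
(defn sample-variance [xs]
  (let [n  (count xs)
        mu (/ (reduce + (map double xs)) n)]
    (/ (reduce + (map #(let [d (- % mu)] (* d d)) xs)) (dec n))))

(defn pooled-variance-sketch [groups]
  (/ (reduce + (map #(* (dec (count %)) (sample-variance %)) groups))
     (reduce + (map #(dec (count %)) groups))))

(pooled-variance-sketch [[1 2 3] [4 5 6 7]]) ;; => ~1.4
```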

population-stddev

(population-stddev vs)(population-stddev vs u)

Calculate population standard deviation of vs.

See stddev.

Examples

Population standard deviation.

(population-stddev [1 2 3 -1 -1 2 -1 11 111])
;;=> 34.4333315406403

population-variance

(population-variance vs)(population-variance vs u)

Calculate population variance of vs.

See variance.

Examples

Population variance

(population-variance [1 2 3 -1 -1 2 -1 11 111])
;;=> 1185.6543209876543

power-divergence-test

(power-divergence-test contingency-table-or-xs)

(power-divergence-test contingency-table-or-xs {:keys [lambda ci-sides sides p alpha bootstrap-samples ddof bins], :or {lambda m/TWO_THIRD, sides :one-sided-greater, ci-sides :two-sided, alpha 0.05, bootstrap-samples 1000, ddof 0}})

Power divergence test.

First argument should be one of:

contingency table
sequence of counts (for goodness of fit)
sequence of data (for goodness of fit against distribution)

For goodness of fit there are two options:

comparison of observed counts vs expected probabilities or weights (:p)
comparison of data against a given distribution (:p); in this case a histogram is created from the data and compared with the distribution PDF over the bin ranges. Use the :bins option to control histogram creation.

Options are:

:lambda - test type:
- 1.0 - chisq-test
- 0.0 - multinomial-likelihood-ratio-test
- -1.0 - minimum-discrimination-information-test
- -2.0 - neyman-modified-chisq-test
- -0.5 - freeman-tukey-test
- 2/3 - cressie-read-test - default
:p - probabilities, weights or distribution object.
:alpha - significance level (default: 0.05)
:ci-sides - confidence interval sides (default: :two-sided)
:sides - p-value sides (:two-sided, :one-sided-greater - default, :one-sided-less)
:bootstrap-samples - number of samples to estimate confidence intervals (default: 1000)
:ddof - delta degrees of freedom, adjustment for dof (default: 0.0)
:bins - number of bins or estimator name for histogram
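
The :lambda values above select members of the Cressie-Read power divergence family. The statistic is 2/(lambda(lambda+1)) * sum O_i((O_i/E_i)^lambda - 1), with limiting forms at lambda = 0 (2 sum O ln(O/E)) and lambda = -1 (2 sum E ln(E/O)). The sketch uses that textbook form for goodness of fit against expected weights. It is not taken from the fastmath source. It computes only the statistic: no p-values, confidence intervals, :ddof or histogram handling. `power-divergence-stat` is a hypothetical name.

```clojure
;; Hypothetical sketch: Cressie-Read statistic for observed counts vs weights p.
;; Zero observed counts are only handled in the lambda = 0 branch.
(defn power-divergence-stat [observed p lambda]
  (let [total    (reduce + observed)
        psum     (double (reduce + p))
        expected (map #(* total (/ % psum)) p)        ; weights -> expected counts
        lambda   (double lambda)]
    (cond
      (zero? lambda)                                   ; likelihood ratio (G) form
      (* 2.0 (reduce + (map (fn [o e] (if (zero? o) 0.0 (* o (Math/log (/ o e)))))
                            observed expected)))
      (== lambda -1.0)                                 ; minimum discrimination information
      (* 2.0 (reduce + (map (fn [o e] (* e (Math/log (/ e o)))) observed expected)))
      :else
      (* (/ 2.0 (* lambda (+ lambda 1.0)))
         (reduce + (map (fn [o e] (* o (- (Math/pow (/ o e) lambda) 1.0)))
                        observed expected))))))

(power-divergence-stat [10 20 30] [1 1 1] 1.0) ;; => ~10.0 (Pearson chi-square)
```

With lambda = 1 the formula reduces to Pearson's chi-square, sum (O - E)^2 / E, which is why the call above equals (100 + 0 + 100) / 20.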

powmean

(powmean vs power)

Generalized power mean
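
The generalized power mean is M_p = (mean(x^p))^(1/p). Its p = 0 limit is the geometric mean. With p = 1 it is the arithmetic mean: 133/9, about 14.78, for the example data below. This plain-Clojure sketch assumes positive inputs, as in the examples. `powmean-sketch` is a hypothetical name.

```clojure
;; Hypothetical sketch of the power mean; p = 0 uses the geometric mean.
(defn powmean-sketch [vs p]
  (let [xs (map double vs)
        n  (count xs)]
    (if (zero? p)
      (Math/exp (/ (reduce + (map #(Math/log %) xs)) n))       ; geometric mean
      (Math/pow (/ (reduce + (map #(Math/pow % p) xs)) n) (/ 1.0 p)))))

(powmean-sketch [1 2 3 1 1 2 1 11 111] 1.0) ;; => ~14.7778
```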

Examples

Power mean

(powmean [1 2 3 1 1 2 1 11 111] 0.0)
;;=> 2.903203203730772
(powmean [1 2 3 1 1 2 1 11 111] 0.1)
;;=> 3.2703950036489737
(powmean [1 2 3 1 1 2 1 11 111] 0.5)
;;=> 6.201625343593919
(powmean [1 2 3 1 1 2 1 11 111] 1.0)
;;=> 14.777777777777782
(powmean [1 2 3 1 1 2 1 11 111] 2.0)
;;=> 37.21260240533814
(powmean [1 2 3 1 1 2 1 11 111] 3.0)
;;=> 53.381150691705734
(powmean [1 2 3 1 1 2 1 11 111] 5.5)
;;=> 74.44312203513597

psnr

(psnr [vs1 vs2-or-val])(psnr vs1 vs2-or-val)(psnr vs1 vs2-or-val max-value)

Peak signal-to-noise ratio. max-value is the maximum possible value (default: the maximum of vs1 and vs2).

quantile

(quantile vs q)(quantile vs q estimation-strategy)

Calculate a quantile of vs.

Quantile q is in the range 0.0-1.0.

See docs for interpolation strategy.

Optionally, provide estimation-strategy to change the interpolation method used to select values. The default is :legacy. See more here.
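
Two of the interpolation rules can be sketched directly, assuming the standard definitions. :r7 (R's quantile type 7) interpolates at the 0-based position (n - 1)q of the sorted data. :legacy (as in Apache Commons Math) interpolates at the 1-based position q(n + 1), clamped to the minimum and maximum. Both reproduce the corresponding example values below. `quantile-sketch` is a hypothetical name. Unlike quantile, it requires the strategy argument and supports only these two.

```clojure
;; Hypothetical sketch of the :r7 and :legacy interpolation rules.
(defn quantile-sketch [vs q strategy]
  (let [xs (vec (sort (map double vs)))
        n  (count xs)
        interp (fn [h]                       ; h is a 0-based fractional index
                 (let [lo (long (Math/floor h))
                       hi (min (inc lo) (dec n))]
                   (+ (xs lo) (* (- h lo) (- (xs hi) (xs lo))))))]
    (case strategy
      :r7     (interp (* (dec n) q))
      :legacy (let [pos (* q (inc n))]       ; 1-based position
                (cond (< pos 1)  (xs 0)
                      (>= pos n) (xs (dec n))
                      :else      (interp (dec pos)))))))

(quantile-sketch [1 11 111 1111] 0.7 :legacy) ;; => ~611.0
(quantile-sketch [1 11 111 1111] 0.7 :r7)     ;; => ~211.0
```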

Examples

Quantile 0.25

(quantile [1 2 3 -1 -1 2 -1 11 111] 0.25)
;;=> -1.0

Quantile 0.5 (median)

(quantile [1 2 3 -1 -1 2 -1 11 111] 0.5)
;;=> 2.0

Quantile 0.75

(quantile [1 2 3 -1 -1 2 -1 11 111] 0.75)
;;=> 7.0

Quantile 0.9

(quantile [1 2 3 -1 -1 2 -1 11 111] 0.9)
;;=> 111.0

Various estimation strategies.

(quantile [1 11 111 1111] 0.7 :legacy)
;;=> 611.0
(quantile [1 11 111 1111] 0.7 :r1)
;;=> 111.0
(quantile [1 11 111 1111] 0.7 :r2)
;;=> 111.0
(quantile [1 11 111 1111] 0.7 :r3)
;;=> 111.0
(quantile [1 11 111 1111] 0.7 :r4)
;;=> 90.99999999999999
(quantile [1 11 111 1111] 0.7 :r5)
;;=> 410.99999999999983
(quantile [1 11 111 1111] 0.7 :r6)
;;=> 611.0
(quantile [1 11 111 1111] 0.7 :r7)
;;=> 210.99999999999966
(quantile [1 11 111 1111] 0.7 :r8)
;;=> 477.66666666666623
(quantile [1 11 111 1111] 0.7 :r9)
;;=> 460.99999999999966

quantile-extent

(quantile-extent vs)(quantile-extent vs q)(quantile-extent vs q1 q2)(quantile-extent vs q1 q2 estimation-strategy)

Return quantile range and median.

q - calculates extent of q and 1.0-q (default: q=0.25)

quantiles

(quantiles vs)(quantiles vs qs)(quantiles vs qs estimation-strategy)

Calculate quantiles of vs.

qs is a sequence of values in the range 0.0-1.0.

See docs for interpolation strategy.

Optionally, provide estimation-strategy to change the interpolation method used to select values. The default is :legacy. See more here.

Examples

Usage

(quantiles [1 2 3 -1 -1 2 -1 11 111] [0.25 0.5 0.75 0.9])
;;=> [-1.0 2.0 7.0 111.0]

r2

(r2 [vs1 vs2-or-val])(r2 vs1 vs2-or-val)

r2-determination

(r2-determination [group1 group2])(r2-determination group1 group2)

Coefficient of determination

Examples

Usage

(let [t [10 10 20 20 20 30 30 30 40 50]
      c [-50 20 30 40 40 50 10 20 30 10]]
  (r2-determination t c))
;;=> 0.06349206349206347

rank-epsilon-sq

(rank-epsilon-sq xs)

Effect size for Kruskal-Wallis test

rank-eta-sq

(rank-eta-sq xs)

Effect size for Kruskal-Wallis test

rescale

(rescale vs)(rescale vs low high)

Linearly rescale data to the desired range, [0, 1] by default.
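
Linear rescaling maps the data's [min, max] onto [low, high] with one scale factor: low + (x - min) * (high - low) / (max - min). This plain-Clojure sketch assumes that definition. `rescale-sketch` is a hypothetical name. Constant data (max = min) makes the scale factor infinite, so the results are non-finite.

```clojure
;; Hypothetical sketch of linear rescaling to [low, high].
(defn rescale-sketch
  ([vs] (rescale-sketch vs 0.0 1.0))
  ([vs low high]
   (let [mn    (double (apply min vs))
         mx    (double (apply max vs))
         scale (/ (- high low) (- mx mn))]   ; non-finite if all values are equal
     (map #(+ low (* scale (- % mn))) vs))))

(rescale-sketch [0 5 10])      ;; => (0.0 0.5 1.0)
(rescale-sketch [0 5 10] -1 1) ;; => (-1.0 0.0 1.0)
```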

rmse

(rmse [vs1 vs2-or-val])(rmse vs1 vs2-or-val)

Root mean squared error

robust-standardize

(robust-standardize vs)(robust-standardize vs q)

Normalize samples to have median = 0 and MAD = 1.

If the q argument is given, scaling uses the quantile difference Q_(1-q) - Q_q instead of MAD. Set q to 0.25 to scale by the IQR.
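
The default (MAD) case of robust standardization is (x - median) / MAD. MAD here is the raw median of absolute deviations from the median, with no consistency constant. That matches the median-absolute-deviation example earlier in this section. The q option is not covered. `median-of` and `robust-standardize-sketch` are hypothetical names.

```clojure
;; Hypothetical sketch: center by the median, scale by the raw MAD.
(defn median-of [xs]
  (let [s (vec (sort xs)) n (count s) m (quot n 2)]
    (if (odd? n) (double (s m)) (/ (+ (s (dec m)) (s m)) 2.0))))

(defn robust-standardize-sketch [vs]
  (let [med (median-of vs)
        mad (median-of (map #(Math/abs (- (double %) med)) vs))]
    (map #(/ (- % med) mad) vs)))

(robust-standardize-sketch [1 2 3 4 100]) ;; => (-2.0 -1.0 0.0 1.0 97.0)
```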

rows->contingency-table

(rows->contingency-table xss)

rss

(rss [vs1 vs2-or-val])(rss vs1 vs2-or-val)

Residual sum of squares
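
The symmetric error measures in this section (mae, mse, rmse and rss) all reduce the residuals vs1 - vs2. The sketch below assumes the standard definitions and sequences of equal length. Like the signatures, it accepts a single number as the second argument and broadcasts it. All names are hypothetical, not fastmath functions.

```clojure
;; Hypothetical sketches of common error measures on equal-length inputs.
(defn residuals [vs1 vs2-or-val]
  (let [vs2 (if (number? vs2-or-val) (repeat vs2-or-val) vs2-or-val)]
    (map #(- (double %1) (double %2)) vs1 vs2)))

(defn rss-sketch  [a b] (reduce + (map #(* % %) (residuals a b))))  ; residual sum of squares
(defn mse-sketch  [a b] (/ (rss-sketch a b) (count a)))             ; mean squared error
(defn rmse-sketch [a b] (Math/sqrt (mse-sketch a b)))               ; root mean squared error
(defn mae-sketch  [a b]                                             ; mean absolute error
  (/ (reduce + (map #(Math/abs (double %)) (residuals a b))) (count a)))

(rss-sketch [1 2 3 4] [1 3 2 6]) ;; => 6.0
```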

second-moment

Deprecated. Use the moment function instead.

sem

(sem vs)

Standard error of the mean
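
The SEM is the sample standard deviation (with an n - 1 denominator) divided by sqrt(n). For the example data, the stddev is about 36.522 and n = 9, so SEM is about 36.522 / 3, roughly 12.174, as shown below. `sem-sketch` is a hypothetical name for this plain-Clojure illustration.

```clojure
;; Hypothetical sketch: SEM = s / sqrt(n), s = sample standard deviation.
(defn sem-sketch [vs]
  (let [xs       (map double vs)
        n        (count xs)
        mu       (/ (reduce + xs) n)
        variance (/ (reduce + (map #(let [d (- % mu)] (* d d)) xs)) (dec n))]
    (/ (Math/sqrt variance) (Math/sqrt n))))

(sem-sketch [1 2 3 -1 -1 2 -1 11 111]) ;; => ~12.174
```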

Examples

SEM

(sem [1 2 3 -1 -1 2 -1 11 111])
;;=> 12.174021115615695

sem-extent

(sem-extent vs)

Returns [mean - SEM, mean + SEM, mean].

Examples

standard error of mean and mean for gaussian distribution

(sem-extent (repeatedly 100000 r/grand))
;;=> [6.802850148960229E-4 0.007013149643892707 0.003846717329394365]

similarity

(similarity method P-observed Q-expected)

(similarity method P-observed Q-expected {:keys [bins probabilities? epsilon], :or {probabilities? true, epsilon 1.0E-6}})

Various PDF similarities between two histograms (frequencies) or probabilities.

Q can be a distribution object; in that case a histogram is created from P.

Arguments:

method - distance method
P-observed - frequencies, probabilities or actual data (when Q is a distribution)
Q-expected - frequencies, probabilities or distribution object (when P is data)

Options:

:probabilities? - whether P and Q should be converted to probabilities (default: true)
:epsilon - small number which replaces 0.0 when division or logarithm is used
:bins - number of bins or bins estimation method, see histogram.

The list of methods: :intersection, :czekanowski, :motyka, :kulczynski, :ruzicka, :inner-product, :harmonic-mean, :cosine, :jaccard, :dice, :fidelity, :squared-chord

See more: Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions by Sung-Hyuk Cha
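
Two of the listed methods have simple closed forms in Cha's survey. :intersection is sum(min(p_i, q_i)), and :cosine is sum(p_i q_i) / (|p| |q|). The sketch converts both inputs to probabilities first, mirroring the :probabilities? true default. It skips the epsilon handling and distribution objects. `->probs`, `intersection-sim` and `cosine-sim` are hypothetical names.

```clojure
;; Hypothetical sketches of the intersection and cosine similarities.
(defn ->probs [xs]
  (let [s (double (reduce + xs))] (map #(/ % s) xs)))

(defn intersection-sim [P Q]
  (reduce + (map min (->probs P) (->probs Q))))

(defn cosine-sim [P Q]                       ; scale-invariant, so normalizing is harmless
  (let [p    (->probs P)
        q    (->probs Q)
        dot  (reduce + (map * p q))
        norm (fn [v] (Math/sqrt (reduce + (map * v v))))]
    (/ dot (* (norm p) (norm q)))))

(intersection-sim [2 2] [1 3]) ;; => 0.75
```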

skewness

(skewness vs)(skewness vs typ)

Calculate skewness from sequence.

Possible types: :G1 (default), :g1 (:pearson), :b1, :B1 (:yule), :B3, :skew, :mode or :median.
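
Three of the moment-based types follow textbook formulas. :g1 is m3 / m2^1.5. :G1 is the adjusted Fisher-Pearson coefficient, g1 * sqrt(n(n - 1)) / (n - 2). :b1 is g1 * ((n - 1)/n)^1.5. The sketch assumes these standard definitions; they reproduce the :g1, :G1 and :b1 example values below up to rounding. The other types are not covered. `central-moment` and `skewness-sketch` are hypothetical names.

```clojure
;; Hypothetical sketch of three skewness variants.
(defn central-moment
  "Population central moment of order k."
  [xs k]
  (let [n  (count xs)
        mu (/ (reduce + (map double xs)) n)]
    (/ (reduce + (map #(Math/pow (- % mu) k) xs)) n)))

(defn skewness-sketch [xs typ]
  (let [n  (count xs)
        g1 (/ (central-moment xs 3) (Math/pow (central-moment xs 2) 1.5))]
    (case typ
      :g1 g1                                            ; population (Pearson) skewness
      :G1 (* g1 (/ (Math/sqrt (* n (dec n))) (- n 2)))  ; adjusted Fisher-Pearson
      :b1 (* g1 (Math/pow (/ (dec n) (double n)) 1.5)))))

(skewness-sketch [1 2 3 -1 -1 2 -1 11 111] :G1) ;; => ~2.9427
```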

Examples

Skewness

(skewness [1 2 3 -1 -1 2 -1 11 111])
;;=> 2.94268445417954
(skewness [1 2 3 -1 -1 2 -1 11 111] :G1)
;;=> 2.94268445417954
(skewness [1 2 3 -1 -1 2 -1 11 111] :g1)
;;=> 2.4275908211830184
(skewness [1 2 3 -1 -1 2 -1 11 111] :pearson)
;;=> 2.4275908211830184
(skewness [1 2 3 -1 -1 2 -1 11 111] :b1)
;;=> 2.034448511531534
(skewness [1 2 3 -1 -1 2 -1 11 111] :B1)
;;=> 0.25
(skewness [1 2 3 -1 -1 2 -1 11 111] :yule)
;;=> 0.25
(skewness [1 2 3 -1 -1 2 -1 11 111] :B3)
;;=> 0.8449612403100772
(skewness [1 2 3 -1 -1 2 -1 11 111] :mode)
;;=> 0.4137529407252298
(skewness [1 2 3 -1 -1 2 -1 11 111] :median)
;;=> 0.9948324383613981
(skewness [1 2 3 -1 -1 2 -1 11 111] :skew)
;;=> 0.8091969403943394

span

(span vs)

Width of the sample, maximum value minus minimum value

spearman-correlation

(spearman-correlation [vs1 vs2])(spearman-correlation vs1 vs2)

Spearman’s correlation of two sequences.

Examples

Spearman’s correlation of Gaussian and uniform distribution samples.

(spearman-correlation (repeatedly 100000 (partial r/grand 1.0 10.0))
                      (repeatedly 100000 (partial r/drand -10.0 -5.0)))
;;=> -0.0020795592460891798

standardize

(standardize vs)

Normalize samples to have mean = 0 and stddev = 1.
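
Standardizing computes z-scores (x - mean) / s. The example values below match s being the sample standard deviation (n - 1 denominator): the first value is (1 - 14.111) / 36.522, about -0.359. `standardize-sketch` is a hypothetical name for this plain-Clojure illustration.

```clojure
;; Hypothetical sketch: z-scores with the sample standard deviation.
(defn standardize-sketch [vs]
  (let [xs (map double vs)
        n  (count xs)
        mu (/ (reduce + xs) n)
        sd (Math/sqrt (/ (reduce + (map #(let [d (- % mu)] (* d d)) xs)) (dec n)))]
    (map #(/ (- % mu) sd) xs)))

(first (standardize-sketch [1 2 3 -1 -1 2 -1 11 111])) ;; => ~-0.359
```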

Examples

Standardize

(standardize [1 2 3 -1 -1 2 -1 11 111])
;;=> (-0.3589915220998317
;;=>  -0.33161081278713267
;;=>  -0.30423010347443363
;;=>  -0.4137529407252298
;;=>  -0.4137529407252298
;;=>  -0.33161081278713267
;;=>  -0.4137529407252298
;;=>  -0.08518442897284138
;;=>  2.652886502297062)

stats-map

(stats-map vs)(stats-map vs estimation-strategy)

Calculate several statistics of vs and return as map.

The optional estimation-strategy argument changes how the quantiles are estimated. See estimation-strategies.

Examples

Stats

(stats-map [1 2 3 -1 -1 2 -1 11 111])
;;=> {:IQR 8.0,
;;=>  :Kurtosis 8.732515263272099,
;;=>  :LAV -1.0,
;;=>  :LIF -13.0,
;;=>  :LOF -25.0,
;;=>  :MAD 3.0,
;;=>  :Max 111.0,
;;=>  :Mean 14.11111111111111,
;;=>  :Median 2.0,
;;=>  :Min -1.0,
;;=>  :Mode -1.0,
;;=>  :Outliers (111.0),
;;=>  :Q1 -1.0,
;;=>  :Q3 7.0,
;;=>  :Range 112.0,
;;=>  :SD 36.522063346847084,
;;=>  :SEM 12.174021115615695,
;;=>  :Size 9,
;;=>  :Skewness 2.94268445417954,
;;=>  :Total 127.0,
;;=>  :UAV 11.0,
;;=>  :UIF 19.0,
;;=>  :UOF 31.0,
;;=>  :Variance 1333.8611111111113}

stddev

(stddev vs)(stddev vs u)

Calculate standard deviation of vs.

See population-stddev.

Examples

Standard deviation.

(stddev [1 2 3 -1 -1 2 -1 11 111])
;;=> 36.522063346847084

stddev-extent

(stddev-extent vs)

Returns [mean - stddev, mean + stddev, mean].

Examples

standard deviation from mean and mean for gaussian distribution

(stddev-extent (repeatedly 100000 r/grand))
;;=> [-0.9996022983939234 0.9951962468655247 -0.0022030257641994337]

sum

(sum vs)

Sum of all vs values.

Examples

Sum

(sum [1 2 3 -1 -1 2 -1 11 111])
;;=> 127.0

t-test-one-sample

(t-test-one-sample xs)(t-test-one-sample xs m)

One-sample Student’s t-test

alpha - significance level (default: 0.05)
sides - one of: :two-sided, :one-sided-less (short: :one-sided) or :one-sided-greater
mu - mean (default: 0.0)
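
The one-sample t statistic is t = (mean - mu) / (s / sqrt(n)), with df = n - 1. For the data 1..10 this gives an estimate of 5.5, a standard error of about 0.9574 and t of about 5.7446. Those values match the :stderr and :t fields in the ttest-one-sample examples later in this section. The p-value needs the Student t CDF and is omitted. The hypothetical `t-one-sample-sketch` takes mu positionally rather than in a parameter map.

```clojure
;; Hypothetical sketch of the one-sample t statistic (no p-value or CI).
(defn t-one-sample-sketch
  ([xs] (t-one-sample-sketch xs 0.0))
  ([xs mu]
   (let [ys (map double xs)
         n  (count ys)
         m  (/ (reduce + ys) n)
         s2 (/ (reduce + (map #(let [d (- % m)] (* d d)) ys)) (dec n))
         se (Math/sqrt (/ s2 n))]                ; standard error of the mean
     {:estimate m :stderr se :t (/ (- m mu) se) :df (dec n)})))

(:t (t-one-sample-sketch (range 1 11))) ;; => ~5.7446
```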

t-test-two-samples

(t-test-two-samples xs ys)(t-test-two-samples xs ys {:keys [paired? equal-variances?], :or {paired? false, equal-variances? false}, :as params})

Two-sample Student’s t-test

alpha - significance level (default: 0.05)
sides - one of: :two-sided (default), :one-sided-less (short: :one-sided) or :one-sided-greater
mu - mean (default: 0.0)
paired? - unpaired or paired test, boolean (default: false)
equal-variances? - unequal or equal variances, boolean (default: false)
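
With the defaults (paired? false, equal-variances? false), a two-sample t-test is Welch's test. Here t = (mean_x - mean_y) / sqrt(s_x^2/n_x + s_y^2/n_y), and the degrees of freedom come from the Welch-Satterthwaite formula. The sketch computes only the statistic and df, not the p-value. Its results match the :stderr, :t and :df values of the first ttest-two-samples example later in this section. `welch-sketch` is a hypothetical name.

```clojure
;; Hypothetical sketch of Welch's t statistic and Welch-Satterthwaite df.
(defn welch-sketch [xs ys]
  (let [stats (fn [v]
                (let [v  (map double v)
                      n  (count v)
                      m  (/ (reduce + v) n)
                      s2 (/ (reduce + (map #(let [d (- % m)] (* d d)) v)) (dec n))]
                  [n m (/ s2 n)]))               ; [n, mean, variance of the mean]
        [nx mx vx] (stats xs)
        [ny my vy] (stats ys)
        se (Math/sqrt (+ vx vy))]
    {:estimate (- mx my)
     :stderr   se
     :t        (/ (- mx my) se)
     :df       (/ (Math/pow (+ vx vy) 2)
                  (+ (/ (* vx vx) (dec nx)) (/ (* vy vy) (dec ny))))}))

(:df (welch-sketch (range 1 11) (range 7 21))) ;; => ~21.98
```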

trim

(trim vs)(trim vs quantile)(trim vs quantile estimation-strategy)(trim vs low high nan)

Return trimmed data. Trimming is done using quantiles; the default quantile is 0.2.

tschuprows-t

(tschuprows-t group1 group2)(tschuprows-t contingency-table)

Tschuprow’s T effect size for discrete data

Examples

Usage

(let [a [:a :a :b :b :f :a :a :b :b :c :a :a :b :b :c :a :a :b :b :c]
      b [:b :f :a :a :b :b :y :z :c :b :b :c :a :a :b :b :c :a :a :b]]
  (tschuprows-t a b))
;;=> 0.5288813325243744

ttest-one-sample

Deprecated. Use t-test-one-sample instead.

Examples

Usage

(ttest-one-sample [1 2 3 4 5 6 7 8 9 10])
;;=> {:alpha 0.05,
;;=>  :confidence-interval [3.3341494103317983 7.665850589668201],
;;=>  :df 9,
;;=>  :estimate 5.5,
;;=>  :level 0.95,
;;=>  :mu 0.0,
;;=>  :n 10,
;;=>  :p-value 2.7819601104828173E-4,
;;=>  :stat 5.744562646538029,
;;=>  :stderr 0.9574271077563381,
;;=>  :t 5.744562646538029,
;;=>  :test-type :two-sided}
(ttest-one-sample [1 2 3 4 5 6 7 8 9 10] {:alpha 0.2})
;;=> {:alpha 0.2,
;;=>  :confidence-interval [4.175850795053416 6.824149204946584],
;;=>  :df 9,
;;=>  :estimate 5.5,
;;=>  :level 0.8,
;;=>  :mu 0.0,
;;=>  :n 10,
;;=>  :p-value 2.7819601104828173E-4,
;;=>  :stat 5.744562646538029,
;;=>  :stderr 0.9574271077563381,
;;=>  :t 5.744562646538029,
;;=>  :test-type :two-sided}
(ttest-one-sample [1 2 3 4 5 6 7 8 9 10] {:sides :one-sided})
;;=> {:alpha 0.05,
;;=>  :confidence-interval [##-Inf 7.255072013309326],
;;=>  :df 9,
;;=>  :estimate 5.5,
;;=>  :level 0.95,
;;=>  :mu 0.0,
;;=>  :n 10,
;;=>  :p-value 0.9998609019944759,
;;=>  :stat 5.744562646538029,
;;=>  :stderr 0.9574271077563381,
;;=>  :t 5.744562646538029,
;;=>  :test-type :one-sided}
(ttest-one-sample [1 2 3 4 5 6 7 8 9 10] {:mu 5.0})
;;=> {:alpha 0.05,
;;=>  :confidence-interval [3.334149410331798 7.665850589668201],
;;=>  :df 9,
;;=>  :estimate 5.5,
;;=>  :level 0.95,
;;=>  :mu 5.0,
;;=>  :n 10,
;;=>  :p-value 0.6141172548083933,
;;=>  :stat 0.5222329678670935,
;;=>  :stderr 0.9574271077563381,
;;=>  :t 0.5222329678670935,
;;=>  :test-type :two-sided}
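
The :estimate, :stderr and :stat fields in these outputs follow the standard one-sample t formulas: the sample mean, s / sqrt(n) with the sample standard deviation s, and (mean - mu) / stderr. The hypothetical helper one-sample-t-sketch below recomputes them by hand as a cross-check; it omits the p-value and confidence interval, which need the Student t distribution.

```clojure
;; Hypothetical helper: recomputes the :estimate, :stderr, :stat and :df
;; fields of a one-sample t-test by hand. Not part of fastmath.
(defn one-sample-t-sketch [xs mu]
  (let [n    (count xs)
        mean (/ (reduce + xs) (double n))
        ;; sample variance with n - 1 in the denominator
        s2   (/ (reduce + (map #(let [d (- % mean)] (* d d)) xs)) (dec n))
        se   (Math/sqrt (/ s2 n))]
    {:estimate mean
     :stderr   se
     :stat     (/ (- mean mu) se)
     :df       (dec n)}))

(one-sample-t-sketch [1 2 3 4 5 6 7 8 9 10] 0.0)
(one-sample-t-sketch [1 2 3 4 5 6 7 8 9 10] 5.0)
```

Up to floating-point rounding, the results should agree with the :stderr (about 0.9574) and :stat (about 5.7446 and 0.5222) values shown above.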

ttest-two-samples

Deprecated: use t-test-two-samples instead.

Examples

Usage

(ttest-two-samples [1 2 3 4 5 6 7 8 9 10]
                   [7 8 9 10 11 12 13 14 15 16 17 18 19 20])
;;=> {:alpha 0.05,
;;=>  :confidence-interval [-11.052801725158163 -4.9471982748418375],
;;=>  :df 21.982212340188994,
;;=>  :equal-variances? false,
;;=>  :estimate -8.0,
;;=>  :estimated-mu [5.5 13.5],
;;=>  :level 0.95,
;;=>  :mu 0.0,
;;=>  :n [10 14],
;;=>  :nx 10,
;;=>  :ny 14,
;;=>  :p-value 1.8552818325118146E-5,
;;=>  :paired? false,
;;=>  :sides :two-sided,
;;=>  :stat -5.4349297638940595,
;;=>  :stderr 1.4719601443879746,
;;=>  :t -5.4349297638940595,
;;=>  :test-type :two-sided}
(ttest-two-samples [1 2 3 4 5 6 7 8 9 10]
                   [7 8 9 10 11 12 13 14 15 16 17 18 19 20 200])
;;=> {:alpha 0.05,
;;=>  :confidence-interval [-47.242899887102105 6.376233220435439],
;;=>  :df 14.164598953012467,
;;=>  :equal-variances? false,
;;=>  :estimate -20.43333333333333,
;;=>  :estimated-mu [5.5 25.93333333333333],
;;=>  :level 0.95,
;;=>  :mu 0.0,
;;=>  :n [10 15],
;;=>  :nx 10,
;;=>  :ny 15,
;;=>  :p-value 0.12451349808974498,
;;=>  :paired? false,
;;=>  :sides :two-sided,
;;=>  :stat -1.632902633201205,
;;=>  :stderr 12.51350381698818,
;;=>  :t -1.632902633201205,
;;=>  :test-type :two-sided}
(ttest-two-samples [1 2 3 4 5 6 7 8 9 10]
                   [7 8 9 10 11 12 13 14 15 16 17 18 19 20]
                   {:equal-variances? true})
;;=> {:alpha 0.05,
;;=>  :confidence-interval [-11.22324472988163 -4.77675527011837],
;;=>  :df 22.0,
;;=>  :equal-variances? true,
;;=>  :estimate -8.0,
;;=>  :estimated-mu [5.5 13.5],
;;=>  :level 0.95,
;;=>  :mu 0.0,
;;=>  :n [10 14],
;;=>  :nx 10,
;;=>  :ny 14,
;;=>  :p-value 3.690577215911943E-5,
;;=>  :paired? false,
;;=>  :sides :two-sided,
;;=>  :stat -5.147292847304685,
;;=>  :stderr 1.5542150480497916,
;;=>  :t -5.147292847304685,
;;=>  :test-type :two-sided}
(ttest-two-samples [1 2 3 4 5 6 7 8 9 10]
                   [200 11 200 11 200 11 200 11 200 11]
                   {:paired? true})
;;=> {:alpha 0.05,
;;=>  :confidence-interval [-171.66671936335894 -28.333280636641092],
;;=>  :df 9,
;;=>  :estimate -100.0,
;;=>  :level 0.95,
;;=>  :mu 0.0,
;;=>  :n 10,
;;=>  :p-value 0.011615504295919215,
;;=>  :paired? true,
;;=>  :stat -3.156496045715208,
;;=>  :stderr 31.680698645494967,
;;=>  :t -3.156496045715208,
;;=>  :test-type :two-sided}
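
The unpaired outputs above match the two textbook variants: Welch's test for unequal variances, with the Welch-Satterthwaite degrees of freedom, and the pooled-variance test with nx + ny - 2 degrees of freedom. The sketch below recomputes :stat, :stderr and :df for both under those assumptions; welch-t-sketch, pooled-t-sketch and n-mean-var are hypothetical names, not fastmath functions.

```clojure
;; Hypothetical helpers; not part of fastmath.
(defn- n-mean-var
  "Returns [n mean sample-variance] for a numeric collection."
  [xs]
  (let [n (count xs)
        m (/ (reduce + xs) (double n))]
    [n m (/ (reduce + (map #(let [d (- % m)] (* d d)) xs)) (dec n))]))

(defn welch-t-sketch
  "Unequal variances (Welch's t-test)."
  [xs ys]
  (let [[nx mx vx] (n-mean-var xs)
        [ny my vy] (n-mean-var ys)
        a  (/ vx nx)
        b  (/ vy ny)
        se (Math/sqrt (+ a b))]
    {:stat   (/ (- mx my) se)
     :stderr se
     ;; Welch-Satterthwaite approximation of the degrees of freedom
     :df     (/ (* (+ a b) (+ a b))
                (+ (/ (* a a) (dec nx)) (/ (* b b) (dec ny))))}))

(defn pooled-t-sketch
  "Equal variances (pooled Student's t-test)."
  [xs ys]
  (let [[nx mx vx] (n-mean-var xs)
        [ny my vy] (n-mean-var ys)
        df  (+ nx ny -2)
        sp2 (/ (+ (* (dec nx) vx) (* (dec ny) vy)) df)   ; pooled variance
        se  (Math/sqrt (* sp2 (+ (/ 1.0 nx) (/ 1.0 ny))))]
    {:stat (/ (- mx my) se) :stderr se :df df}))

(welch-t-sketch [1 2 3 4 5 6 7 8 9 10]
                [7 8 9 10 11 12 13 14 15 16 17 18 19 20])
```

For the first example this gives a statistic of about -5.435 with about 21.98 degrees of freedom, and the pooled version gives about -5.147 with 22, in line with the outputs shown.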

variance

(variance vs)(variance vs u)

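Calculate the sample variance of vs (the sum of squared deviations from the mean divided by n - 1). In the two-argument form, u is used as a precomputed mean.

See population-variance for the version that divides by n.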

Examples

Variance.

(variance [1 2 3 -1 -1 2 -1 11 111])
;;=> 1333.861111111111
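
The two variance flavours differ only in the denominator. The hypothetical helpers below (not the fastmath implementation) make that explicit; sample-variance-sketch should reproduce the example value above up to rounding.

```clojure
;; Hypothetical helpers; not the fastmath implementation.
(defn- sum-sq-dev
  "Sum of squared deviations from the mean."
  [vs]
  (let [m (/ (reduce + vs) (double (count vs)))]
    (reduce + (map #(let [d (- % m)] (* d d)) vs))))

(defn sample-variance-sketch [vs]
  (/ (sum-sq-dev vs) (dec (count vs))))      ; divide by n - 1

(defn population-variance-sketch [vs]
  (/ (sum-sq-dev vs) (count vs)))            ; divide by n

(sample-variance-sketch [1 2 3 -1 -1 2 -1 11 111])
```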

variation

(variation vs)

Coefficient of variation: CV = stddev / mean.
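
A minimal sketch of the formula, assuming stddev means the sample standard deviation (n - 1 denominator); cv-sketch is a hypothetical name. The ratio is undefined when the mean is zero.

```clojure
;; Hypothetical helper; assumes the sample standard deviation.
(defn cv-sketch [vs]
  (let [n  (count vs)
        m  (/ (reduce + vs) (double n))
        sd (Math/sqrt (/ (reduce + (map #(let [d (- % m)] (* d d)) vs)) (dec n)))]
    (/ sd m)))

(cv-sketch [1 2 3])
;; => 0.5
```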

weighted-kappa

(weighted-kappa contingency-table)(weighted-kappa contingency-table weights)

Cohen’s weighted kappa for an indexed contingency table.
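
Weighted kappa compares weighted observed agreement po with weighted chance agreement pe as (po - pe) / (1 - pe). The sketch below assumes a k x k table given as a vector of rows and linear agreement weights w_ij = 1 - |i - j| / (k - 1); fastmath's indexed table format and its weights options are not reproduced, and weighted-kappa-sketch is a hypothetical name.

```clojure
;; Hypothetical sketch, not the fastmath implementation.
;; table: k x k vector of rows (rater A) by columns (rater B), k >= 2.
(defn weighted-kappa-sketch [table]
  (let [k        (count table)
        row-sums (mapv #(reduce + %) table)
        col-sums (apply mapv + table)
        n        (double (reduce + row-sums))
        ;; linear agreement weight: 1 on the diagonal, 0 at the far corners
        w        (fn [i j] (- 1.0 (/ (Math/abs (long (- i j))) (double (dec k)))))
        cells    (for [i (range k) j (range k)] [i j])
        ;; weighted observed agreement
        po       (/ (reduce + (map (fn [[i j]] (* (w i j) (get-in table [i j]))) cells))
                    n)
        ;; weighted agreement expected from the marginals
        pe       (/ (reduce + (map (fn [[i j]] (* (w i j) (row-sums i) (col-sums j))) cells))
                    (* n n))]
    (/ (- po pe) (- 1.0 pe))))

(weighted-kappa-sketch [[5 0 0] [0 5 0] [0 0 5]])
;; => 1.0
```

Perfect agreement yields 1.0; agreement at exactly the chance level yields 0.0.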

winsor

(winsor vs)(winsor vs quantile)(winsor vs quantile estimation-strategy)(winsor vs low high nan)

Return winsorized data: values outside the range set by the quantiles are replaced by the nearest bound instead of being removed. The default quantile is 0.2.
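
In contrast with trimming, winsorizing clamps values to the bounds. The sketch below uses the same single quantile rule (R's type 7 linear interpolation) as an assumption; quantile-r7 and winsor-sketch are hypothetical names, and fastmath's default estimation strategy may give different bounds.

```clojure
;; Hypothetical helpers for illustration; not part of fastmath.
(defn quantile-r7
  "Quantile q of vs by linear interpolation between order statistics (R type 7)."
  [vs q]
  (let [xs (vec (sort vs))
        h  (* (dec (count xs)) q)
        lo (long (Math/floor h))
        hi (min (inc lo) (dec (count xs)))]
    (+ (xs lo) (* (- h lo) (- (xs hi) (xs lo))))))

(defn winsor-sketch
  "Clamp every value into [quantile q, quantile (1 - q)]."
  [vs q]
  (let [low  (quantile-r7 vs q)
        high (quantile-r7 vs (- 1.0 q))]
    (map #(max low (min high %)) vs)))

(winsor-sketch [1 2 3 4 5 6 7 8 9 10 100] 0.2)
```

With bounds 3.0 and 9.0, the values 1 and 2 become 3.0, while 10 and 100 become 9.0; the number of values is unchanged.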

wmean

(wmean vs)(wmean vs weights)

Weighted mean
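
The weighted mean is sum(w_i * x_i) / sum(w_i). A minimal sketch follows; wmean-sketch is a hypothetical name, and treating the one-argument form as equal weights (a plain mean) is an assumption about how the single-arity call behaves.

```clojure
;; Hypothetical helper; the one-argument arity assumes equal weights.
(defn wmean-sketch
  ([vs] (wmean-sketch vs (repeat (count vs) 1)))
  ([vs ws] (/ (reduce + (map * vs ws)) (double (reduce + ws)))))

(wmean-sketch [1 2 3] [1 1 2])
;; => 2.25
```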

wmedian

(wmedian vs ws)(wmedian vs ws method)

Weighted median.

Calculation is done using interpolation. There are three methods:

:linear - linear interpolation (default)
:step - step interpolation
:average - average of ties

Based on spatstat.geom::weighted.quantile from R.
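
To illustrate the idea behind the :step method, the sketch below takes the weighted median as the smallest value whose cumulative normalized weight reaches 0.5, so the result is always an observed value. This is a common step definition assumed for illustration, with positive weights; the helper names are hypothetical, and fastmath's exact handling of boundaries and ties is not confirmed here.

```clojure
;; Hypothetical sketch of a step-style weighted quantile; not fastmath code.
(defn wquantile-step-sketch [vs ws q]
  (let [pairs (sort-by first (map vector vs ws))
        total (double (reduce + ws))
        cum   (reductions + (map second pairs))]   ; cumulative weights in value order
    ;; first value whose normalized cumulative weight is at least q
    (some (fn [[x c]] (when (>= (/ c total) q) x))
          (map vector (map first pairs) cum))))

(defn wmedian-step-sketch [vs ws]
  (wquantile-step-sketch vs ws 0.5))

(wmedian-step-sketch [1 2 3 4] [1 1 1 5])
;; => 4
```

The heavy weight on 4 pulls the median up to it; with equal weights the same rule returns 2, the first value reaching half the total weight.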

wmw-odds

(wmw-odds [group1 group2])(wmw-odds group1 group2)

Wilcoxon-Mann-Whitney odds
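
One common definition of the WMW odds compares every cross-group pair: (P(X > Y) + 0.5 P(X = Y)) / (P(X < Y) + 0.5 P(X = Y)), with X drawn from group1 and Y from group2. The sketch below assumes that definition and direction; wmw-odds-sketch is a hypothetical name.

```clojure
;; Hypothetical sketch; the direction (group1 over group2) is an assumption.
(defn wmw-odds-sketch [group1 group2]
  (let [cmps   (for [x group1 y group2] (compare x y))
        wins   (count (filter pos? cmps))
        ties   (count (filter zero? cmps))
        losses (count (filter neg? cmps))]
    ;; ties count half for each side
    (/ (+ wins (* 0.5 ties)) (+ losses (* 0.5 ties)))))

(wmw-odds-sketch [2 3] [1 2])
;; => 7.0
```

Of the four pairs, three favour group1 and one is a tie, giving 3.5 / 0.5. If group1 wins every pair, the double division returns Infinity.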

wquantile

(wquantile vs ws q)(wquantile vs ws q method)

Weighted quantile.

Calculation is done using interpolation. There are three methods:

:linear - linear interpolation (default)
:step - step interpolation
:average - average of ties

Based on spatstat.geom::weighted.quantile from R.
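
One plausible reading of the :linear method is to sort the values, compute the cumulative normalized weight F at each value, and interpolate x linearly as a function of F. The sketch below implements that reading under stated assumptions (positive weights, 0 <= q <= 1, values below the first F map to the smallest value); it is not necessarily the exact rule used by spatstat or fastmath, and wquantile-linear-sketch is a hypothetical name.

```clojure
;; Hypothetical sketch, not the fastmath or spatstat implementation.
(defn wquantile-linear-sketch [vs ws q]
  (let [pairs (sort-by first (map vector vs ws))
        xs    (mapv first pairs)
        total (double (reduce + ws))
        ;; cumulative normalized weight at each sorted value
        fs    (mapv #(/ % total) (reductions + (map second pairs)))]
    (if (<= q (fs 0))
      (xs 0)
      (let [i  (first (filter #(>= (fs %) q) (range 1 (count fs))))
            f0 (fs (dec i)) f1 (fs i)
            x0 (xs (dec i)) x1 (xs i)]
        ;; linear interpolation between (f0, x0) and (f1, x1)
        (+ x0 (* (- x1 x0) (/ (- q f0) (- f1 f0))))))))

(wquantile-linear-sketch [1 2 3 4] [1 1 1 1] 0.6)
```

With equal weights the cumulative weights are 0.25, 0.5, 0.75 and 1.0, so q = 0.6 falls between the values 2 and 3 and yields approximately 2.4.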

wquantiles

(wquantiles vs ws)(wquantiles vs ws qs)(wquantiles vs ws qs method)

Weighted quantiles.

Calculation is done using interpolation. There are three methods:

:linear - linear interpolation (default)
:step - step interpolation
:average - average of ties

Based on spatstat.geom::weighted.quantile from R.

z-test-one-sample

(z-test-one-sample xs)(z-test-one-sample xs m)

One-sample z-test

alpha - significance level (default: 0.05)
sides - one of: :two-sided (default), :one-sided-less (short: :one-sided) or :one-sided-greater
mu - mean (default: 0.0)
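
The z statistic has the same form as the t statistic, (mean - mu) / (sd / sqrt(n)), but the p-value comes from the standard normal distribution. The two-sided sketch below assumes the sample standard deviation stands in for a known SD. Because java.lang.Math has no erf, it uses the Abramowitz and Stegun 7.1.26 approximation (absolute error below 1.5e-7). All names are hypothetical, and fastmath's exact choices are not confirmed here.

```clojure
;; Hypothetical sketch; not the fastmath implementation.
(defn erf-approx
  "Abramowitz & Stegun 7.1.26 approximation of erf (absolute error below 1.5e-7)."
  [x]
  (let [sign (if (neg? x) -1.0 1.0)
        ax   (Math/abs (double x))
        t    (/ 1.0 (+ 1.0 (* 0.3275911 ax)))
        poly (* t (+ 0.254829592
                     (* t (+ -0.284496736
                             (* t (+ 1.421413741
                                     (* t (+ -1.453152027
                                             (* t 1.061405429)))))))))]
    (* sign (- 1.0 (* poly (Math/exp (- (* ax ax))))))))

(defn normal-cdf [z]
  (* 0.5 (+ 1.0 (erf-approx (/ z (Math/sqrt 2.0))))))

(defn z-test-sketch
  "Two-sided one-sample z-test using the sample SD in place of a known SD."
  [xs mu]
  (let [n  (count xs)
        m  (/ (reduce + xs) (double n))
        sd (Math/sqrt (/ (reduce + (map #(let [d (- % m)] (* d d)) xs)) (dec n)))
        z  (/ (- m mu) (/ sd (Math/sqrt n)))]
    {:stat    z
     :p-value (* 2.0 (- 1.0 (normal-cdf (Math/abs (double z)))))}))

(z-test-sketch [1 2 3 4 5 6 7 8 9 10] 5.0)
```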

z-test-two-samples

(z-test-two-samples xs ys)(z-test-two-samples xs ys {:keys [paired? equal-variances?], :or {paired? false, equal-variances? false}, :as params})

Two-sample z-test

alpha - significance level (default: 0.05)
sides - one of: :two-sided (default), :one-sided-less (short: :one-sided) or :one-sided-greater
mu - mean (default: 0.0)
paired? - unpaired or paired test, boolean (default: false)
equal-variances? - unequal or equal variances, boolean (default: false)

Generated by Codox with RDash UI theme

Fastmath 2.4.0
