fastmath.stats

Statistics functions.

  • Descriptive statistics
  • Correlation / covariance
  • Outliers
  • Confidence intervals
  • Extents
  • Effect size
  • Student’s t-test
  • Histogram
  • ACF/PACF
  • Bootstrap
  • Binary measures

All functions are backed by Apache Commons Math or SMILE libraries. All work with Clojure sequences.

Descriptive statistics

The all-in-one function stats-map returns a map containing:

  • :Size - size of the samples, (count ...)
  • :Min - minimum value
  • :Max - maximum value
  • :Range - range of values
  • :Mean - mean/average
  • :Median - median, see also: median-3
  • :Mode - mode, see also: modes
  • :Q1 - first quartile, use: percentile, quantile
  • :Q3 - third quartile, use: percentile, quantile
  • :Total - sum of all samples
  • :SD - sample standard deviation
  • :Variance - variance
  • :MAD - median-absolute-deviation
  • :SEM - standard error of mean
  • :LAV - lower adjacent value, use: adjacent-values
  • :UAV - upper adjacent value, use: adjacent-values
  • :IQR - interquartile range, (- q3 q1)
  • :LOF - lower outer fence, (- q1 (* 3.0 iqr))
  • :UOF - upper outer fence, (+ q3 (* 3.0 iqr))
  • :LIF - lower inner fence, (- q1 (* 1.5 iqr))
  • :UIF - upper inner fence, (+ q3 (* 1.5 iqr))
  • :Outliers - list of outliers, samples which are outside outer fences
  • :Kurtosis - kurtosis
  • :Skewness - skewness

Note: percentile and quantile support 10 different interpolation strategies. See docs.
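
The fence formulas in the list above can be checked by hand. The following is a minimal sketch in plain Clojure, assuming Q1 and Q3 are already known; the helper name fences is hypothetical and not part of fastmath.stats.

```clojure
;; Hypothetical helper for illustration, not part of fastmath.stats.
(defn fences
  "Returns the IQR plus inner and outer fences for the given quartiles."
  [q1 q3]
  (let [iqr (- q3 q1)]
    {:IQR iqr
     :LIF (- q1 (* 1.5 iqr))    ; lower inner fence
     :UIF (+ q3 (* 1.5 iqr))    ; upper inner fence
     :LOF (- q1 (* 3.0 iqr))    ; lower outer fence
     :UOF (+ q3 (* 3.0 iqr))})) ; upper outer fence

(fences -1.0 7.0)
;;=> {:IQR 8.0, :LIF -13.0, :UIF 19.0, :LOF -25.0, :UOF 31.0}
```

The inputs -1.0 and 7.0 are the :Q1 and :Q3 values shown in the stats-map example for [1 2 3 -1 -1 2 -1 11 111].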

acf

(acf data)(acf data lags)

Calculate acf (autocorrelation function) for given number of lags or a list of lags.

If lags is omitted, the function returns the maximum possible number of lags.

See also acf-ci, pacf, pacf-ci
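
To make the quantity concrete, here is a minimal sketch of a sample autocorrelation in plain Clojure. It assumes the usual estimator (lagged products of the demeaned series divided by the lag-0 sum of squares); acf-sketch is a hypothetical name and this is not the fastmath implementation.

```clojure
;; Hypothetical helper for illustration, not the fastmath implementation.
(defn acf-sketch
  "Sample autocorrelation of xs for lags 0..max-lag."
  [xs max-lag]
  (let [n     (count xs)
        m     (/ (reduce + xs) (double n))
        d     (mapv #(- % m) xs)           ; demeaned series
        denom (reduce + (map * d d))]      ; lag-0 sum of squares
    (for [k (range (inc max-lag))]
      ;; map stops at the shorter sequence, so only overlapping pairs are used
      (/ (reduce + (map * d (drop k d))) denom))))

(acf-sketch [1 2 3 4 5 4 3 2 1] 2)
;; roughly (1.0 0.5397 -0.0135)
```

Under this assumption the sketch agrees with the first three values listed for (acf [1 2 3 4 5 4 3 2 1]) in the examples.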

Examples

Usage

(acf (repeatedly 1000 r/grand) 5)
;;=> (1.0
;;=>  0.0056672021105804715
;;=>  0.02683192836034792
;;=>  0.003505061419148288
;;=>  0.017117838382944242
;;=>  -0.014709355084377094)
(acf (repeatedly 1000 r/grand) [10 20 100 500])
;;=> (0.03608425929253231
;;=>  -0.04862331077397911
;;=>  0.0026191550786753507
;;=>  0.006099538382009882)
(acf [1 2 3 4 5 4 3 2 1])
;;=> (1.0
;;=>  0.5396825396825397
;;=>  -0.013492063492063475
;;=>  -0.4666666666666665
;;=>  -0.6269841269841269
;;=>  -0.3015873015873015
;;=>  -0.011904761904761935
;;=>  0.17777777777777773
;;=>  0.20317460317460315)

acf-ci

(acf-ci data lags)(acf-ci data lags alpha)

acf with added confidence interval data.

:cis contains the list of confidence intervals calculated for every lag.

Examples

Usage

(acf-ci (repeatedly 1000 r/grand) 3)
;;=> {:acf
;;=>  (1.0 0.001652674595128032 0.030605610528521364 -0.006874164003129496),
;;=>  :ci 0.06197950323045615,
;;=>  :cis (0.06197950323045615
;;=>        0.06197967251690712
;;=>        0.062037701604327464
;;=>        0.06204062757534195)}
(acf-ci [1 2 3 4 5 4 3 2 1] 3)
;;=> {:acf
;;=>  (1.0 0.5396825396825397 -0.013492063492063475 -0.4666666666666665),
;;=>  :ci 0.653321328180018,
;;=>  :cis (0.653321328180018
;;=>        0.8218653739461048
;;=>        0.8219599072345126
;;=>        0.9281841012727746)}

adjacent-values

(adjacent-values vs)(adjacent-values vs estimation-strategy)(adjacent-values vs q1 q3)

Lower and upper adjacent values (LAV and UAV).

Let Q1 be the 25th percentile and Q3 the 75th percentile. IQR is (- Q3 Q1).

  • LAV is the smallest value which is greater than or equal to the LIF = (- Q1 (* 1.5 IQR)).
  • UAV is the largest value which is lower than or equal to the UIF = (+ Q3 (* 1.5 IQR)).
  • the third returned value is the median of the samples

The optional estimation-strategy argument changes the estimation type used for quantile calculations. See estimation-strategies-list.
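
The LAV and UAV definitions above can be sketched in plain Clojure. This version assumes Q1 and Q3 are supplied by the caller and omits the median; adjacent-values-sketch is a hypothetical name, not part of fastmath.stats.

```clojure
;; Hypothetical helper for illustration, not part of fastmath.stats.
(defn adjacent-values-sketch
  "Returns [LAV UAV] for samples vs and the given quartiles."
  [vs q1 q3]
  (let [iqr (- q3 q1)
        lif (- q1 (* 1.5 iqr))
        uif (+ q3 (* 1.5 iqr))]
    [(apply min (filter #(>= % lif) vs))    ; smallest sample not below LIF
     (apply max (filter #(<= % uif) vs))])) ; largest sample not above UIF

(adjacent-values-sketch [1 2 3 -1 -1 2 -1 11 111] -1.0 7.0)
;;=> [-1 11]
```

The sketch returns the original sample values unchanged, whereas the examples in this reference show results as doubles.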

Examples

[LAV, UAV, median]

(adjacent-values [1 2 3 -1 -1 2 -1 11 111])
;;=> [-1.0 11.0 2.0]

Gaussian distribution [LAV, UAV, median]

(adjacent-values (repeatedly 1000000 r/grand))
;;=> [-2.698318697864716 2.6996062842439654 0.0016124036257504448]

ameasure

(ameasure group1 group2)

Vargha-Delaney A measure for two populations, group1 and group2.

Examples

Usage

(let [t [10 10 20 20 20 30 30 30 40 50]
      c [10 20 30 40 40 50]]
  (ameasure t c))
;;=> 0.20833333333333334

binary-measures

(binary-measures truth prediction)(binary-measures truth prediction true-value)

Subset of binary measures. See binary-measures-all.

Following keys are returned: [:tp :tn :fp :fn :accuracy :fdr :f-measure :fall-out :precision :recall :sensitivity :specificity :prevalance]

Examples

Usage

(binary-measures [true false true false true false true false]
                 [true false false true false false false true])
;;=> {:accuracy 0.375,
;;=>  :f-measure 0.28571428571428575,
;;=>  :fall-out 0.5,
;;=>  :fdr 0.6666666666666667,
;;=>  :fn 3,
;;=>  :fp 2,
;;=>  :precision 0.3333333333333333,
;;=>  :recall 0.25,
;;=>  :sensitivity 0.25,
;;=>  :specificity 0.5,
;;=>  :tn 2,
;;=>  :tp 1}

Treat 1 as true value.

(binary-measures [1 0 1 0 1 0 1 0] [1 0 0 1 0 0 0 1] [1])
;;=> {:accuracy 0.375,
;;=>  :f-measure 0.28571428571428575,
;;=>  :fall-out 0.5,
;;=>  :fdr 0.6666666666666667,
;;=>  :fn 3,
;;=>  :fp 2,
;;=>  :precision 0.3333333333333333,
;;=>  :recall 0.25,
;;=>  :sensitivity 0.25,
;;=>  :specificity 0.5,
;;=>  :tn 2,
;;=>  :tp 1}

Treat :a and :b as true value.

(binary-measures [:a :b :c :d :e :f :a :b]
                 [:a :b :a :b :a :f :d :b]
                 {:a true, :b true, :c false})
;;=> {:accuracy 0.5,
;;=>  :f-measure 0.6,
;;=>  :fall-out 0.75,
;;=>  :fdr 0.5,
;;=>  :fn 1,
;;=>  :fp 3,
;;=>  :precision 0.5,
;;=>  :recall 0.75,
;;=>  :sensitivity 0.75,
;;=>  :specificity 0.25,
;;=>  :tn 1,
;;=>  :tp 3}

binary-measures-all

(binary-measures-all truth prediction)(binary-measures-all truth prediction true-value)

Collection of binary measures.

  • truth - list of ground truth values
  • prediction - list of predicted values
  • true-value - optional, what is true in truth and prediction

true-value can be one of:

  • nil - values are treated as booleans
  • any sequence - values from the sequence will be treated as true
  • map - conversion will be done according to the provided map (if there is no corresponding key, the value is treated as false)

https://en.wikipedia.org/wiki/Precision_and_recall
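
As an illustration of how truth and prediction turn into the basic counts, here is a minimal sketch in plain Clojure. The name confusion-sketch and the is-true argument (any function, set, or map that decides what counts as true) are assumptions made for this example, not the fastmath API.

```clojure
;; Hypothetical helper for illustration, not part of fastmath.stats.
(defn confusion-sketch
  "Counts tp/tn/fp/fn and derives a few basic measures.
  is-true maps a raw value to a truthy or falsy result."
  [truth prediction is-true]
  (let [pairs (map (fn [t p] [(boolean (is-true t)) (boolean (is-true p))])
                   truth prediction)
        cnt   (frequencies pairs)
        tp    (get cnt [true true] 0)
        tn    (get cnt [false false] 0)
        fp    (get cnt [false true] 0)
        fneg  (get cnt [true false] 0)]
    {:tp tp :tn tn :fp fp :fn fneg
     :accuracy  (/ (+ tp tn) (double (+ tp tn fp fneg)))
     :precision (/ tp (double (+ tp fp)))
     :recall    (/ tp (double (+ tp fneg)))}))

;; A set treats its members as true; a map treats missing keys as false.
(confusion-sketch [1 0 1 0 1 0 1 0] [1 0 0 1 0 0 0 1] #{1})
(confusion-sketch [:a :b :c :d :e :f :a :b]
                  [:a :b :a :b :a :f :d :b]
                  {:a true, :b true, :c false})
```

The first call yields tp 1, tn 2, fp 2, fn 3 and the second tp 3, tn 1, fp 3, fn 1, which agree with the counts in the examples.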

Examples

Usage

(binary-measures-all [true false true false true false true false]
                     [true false false true false false false true])
;;=> {:accuracy 0.375,
;;=>  :bm -0.25,
;;=>  :cn 4.0,
;;=>  :cp 4.0,
;;=>  :dor 0.3333333333333333,
;;=>  :f-beta #,
;;=>  :f-measure 0.28571428571428575,
;;=>  :f1-score 0.28571428571428575,
;;=>  :fall-out 0.5,
;;=>  :fdr 0.6666666666666667,
;;=>  :fn 3,
;;=>  :fnr 0.75,
;;=>  :for 0.6,
;;=>  :fp 2,
;;=>  :fpr 0.5,
;;=>  :hit-rate 0.25,
;;=>  :lr+ 0.5,
;;=>  :lr- 1.5,
;;=>  :mcc -0.2581988897471611,
;;=>  :miss-rate 0.75,
;;=>  :mk -0.2666666666666666,
;;=>  :npv 0.4,
;;=>  :pcn 5.0,
;;=>  :pcp 3.0,
;;=>  :ppv 0.3333333333333333,
;;=>  :precision 0.3333333333333333,
;;=>  :prevalence 0.5,
;;=>  :recall 0.25,
;;=>  :selectivity 0.5,
;;=>  :sensitivity 0.25,
;;=>  :specificity 0.5,
;;=>  :tn 2,
;;=>  :tnr 0.5,
;;=>  :total 8.0,
;;=>  :tp 1,
;;=>  :tpr 0.25}

Treat 1 as true value.

(binary-measures-all [1 0 1 0 1 0 1 0] [1 0 0 1 0 0 0 1] [1])
;;=> {:accuracy 0.375,
;;=>  :bm -0.25,
;;=>  :cn 4.0,
;;=>  :cp 4.0,
;;=>  :dor 0.3333333333333333,
;;=>  :f-beta #,
;;=>  :f-measure 0.28571428571428575,
;;=>  :f1-score 0.28571428571428575,
;;=>  :fall-out 0.5,
;;=>  :fdr 0.6666666666666667,
;;=>  :fn 3,
;;=>  :fnr 0.75,
;;=>  :for 0.6,
;;=>  :fp 2,
;;=>  :fpr 0.5,
;;=>  :hit-rate 0.25,
;;=>  :lr+ 0.5,
;;=>  :lr- 1.5,
;;=>  :mcc -0.2581988897471611,
;;=>  :miss-rate 0.75,
;;=>  :mk -0.2666666666666666,
;;=>  :npv 0.4,
;;=>  :pcn 5.0,
;;=>  :pcp 3.0,
;;=>  :ppv 0.3333333333333333,
;;=>  :precision 0.3333333333333333,
;;=>  :prevalence 0.5,
;;=>  :recall 0.25,
;;=>  :selectivity 0.5,
;;=>  :sensitivity 0.25,
;;=>  :specificity 0.5,
;;=>  :tn 2,
;;=>  :tnr 0.5,
;;=>  :total 8.0,
;;=>  :tp 1,
;;=>  :tpr 0.25}

Treat :a and :b as true value.

(binary-measures-all [:a :b :c :d :e :f :a :b]
                     [:a :b :a :b :a :f :d :b]
                     {:a true, :b true, :c false})
;;=> {:accuracy 0.5,
;;=>  :bm 0.0,
;;=>  :cn 4.0,
;;=>  :cp 4.0,
;;=>  :dor 1.0,
;;=>  :f-beta #,
;;=>  :f-measure 0.6,
;;=>  :f1-score 0.6,
;;=>  :fall-out 0.75,
;;=>  :fdr 0.5,
;;=>  :fn 1,
;;=>  :fnr 0.25,
;;=>  :for 0.5,
;;=>  :fp 3,
;;=>  :fpr 0.75,
;;=>  :hit-rate 0.75,
;;=>  :lr+ 1.0,
;;=>  :lr- 1.0,
;;=>  :mcc 0.0,
;;=>  :miss-rate 0.25,
;;=>  :mk 0.0,
;;=>  :npv 0.5,
;;=>  :pcn 2.0,
;;=>  :pcp 6.0,
;;=>  :ppv 0.5,
;;=>  :precision 0.5,
;;=>  :prevalence 0.5,
;;=>  :recall 0.75,
;;=>  :selectivity 0.25,
;;=>  :sensitivity 0.75,
;;=>  :specificity 0.25,
;;=>  :tn 1,
;;=>  :tnr 0.25,
;;=>  :total 8.0,
;;=>  :tp 3,
;;=>  :tpr 0.75}

F-beta is a function. When beta is equal to 1.0, you get the f1-score.

(let [fbeta (:f-beta (binary-measures-all
                      [true false true false true false true false]
                      [true false false true false false false true]))]
  [(fbeta 1.0) (fbeta 2.0) (fbeta 0.5)])
;;=> [0.28571428571428575 0.7142857142857144 0.1785714285714286]

bootstrap

(bootstrap vs)(bootstrap vs samples)(bootstrap vs samples size)

Generate a set of samples of a given size from the provided data.

samples defaults to 50 and size defaults to 1000.
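
A minimal sketch of the idea in plain Clojure, assuming resampling is done uniformly with replacement; bootstrap-sketch is a hypothetical name and the library may draw its samples differently.

```clojure
;; Hypothetical helper for illustration, not the fastmath implementation.
(defn bootstrap-sketch
  "Returns `samples` sequences, each holding `size` values drawn
  uniformly with replacement from vs."
  [vs samples size]
  (let [v (vec vs)]
    (repeatedly samples
                (fn [] (doall (repeatedly size #(rand-nth v)))))))

;; Two resamples of 20 values each; the output differs on every call.
(bootstrap-sketch [1 2 3 4 1 2 3 1 2 1] 2 20)
```

Because values are drawn with replacement, each resample tends to preserve the relative frequencies of the source data, as the frequencies example illustrates.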

Examples

Usage

(bootstrap [1 2 3 4 1 2 3 1 2 1] 2 20)
;;=> ((1.0
;;=>   1.0
;;=>   2.0
;;=>   1.0
;;=>   4.0
;;=>   3.0
;;=>   2.0
;;=>   2.0
;;=>   1.0
;;=>   4.0
;;=>   2.0
;;=>   1.0
;;=>   4.0
;;=>   2.0
;;=>   1.0
;;=>   3.0
;;=>   1.0
;;=>   3.0
;;=>   3.0
;;=>   2.0)
;;=>  (1.0
;;=>   4.0
;;=>   1.0
;;=>   2.0
;;=>   1.0
;;=>   3.0
;;=>   2.0
;;=>   1.0
;;=>   1.0
;;=>   1.0
;;=>   1.0
;;=>   2.0
;;=>   2.0
;;=>   1.0
;;=>   2.0
;;=>   3.0
;;=>   1.0
;;=>   1.0
;;=>   2.0
;;=>   3.0))
(let [data [1 2 3 4 1 2 3 1 2 1]
      fdata (frequencies data)
      bdata (bootstrap data 5 1000)]
  {:source fdata, :bootstrapped (map frequencies bdata)})
;;=> {:bootstrapped ({1.0 413, 2.0 294, 3.0 204, 4.0 89}
;;=>                 {1.0 415, 2.0 285, 3.0 192, 4.0 108}
;;=>                 {1.0 406, 2.0 297, 3.0 189, 4.0 108}
;;=>                 {1.0 419, 2.0 300, 3.0 199, 4.0 82}
;;=>                 {1.0 401, 2.0 322, 3.0 196, 4.0 81}),
;;=>  :source {1 4, 2 3, 3 2, 4 1}}

bootstrap-ci

(bootstrap-ci vs)(bootstrap-ci vs alpha)(bootstrap-ci vs alpha samples)(bootstrap-ci vs alpha samples stat-fn)

Bootstrap method to calculate confidence interval.

Alpha defaults to 0.98 and samples to 1000. The last parameter is the statistical function to measure (default: mean).

Returns the confidence interval and the value of the statistical function.

Examples

Usage

(bootstrap-ci [-5 1 1 1 1 2 2 5 11 71])
;;=> [-5.796000000000005 17.8 9.0]
(bootstrap-ci [-5 1 1 1 1 2 2 5 11 71] 0.8)
;;=> [2.5999999999999996 15.280000000000005 9.0]
(bootstrap-ci [-5 1 1 1 1 2 2 5 11 71] 0.8 100000)
;;=> [2.5999999999999996 15.7 9.0]
(bootstrap-ci [-5 1 1 1 1 2 2 5 11 71] 0.98 1000 median)
;;=> [-5.0 2.0 1.5]

ci

(ci vs)(ci vs alpha)

Student’s t-based confidence interval for given data. Alpha defaults to 0.98.

Last value is mean.

Examples

Usage

(ci [-5 1 1 1 1 2 2 5 11 71])
;;=> [-10.759020390886263 28.759020390886263 9.0]
(ci [-5 1 1 1 1 2 2 5 11 71] 0.8)
;;=> [-0.6855907410547175 18.685590741054718 9.0]

cliffs-delta

(cliffs-delta group1 group2)

Cliff’s delta effect size
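
For orientation, Cliff’s delta is commonly defined as the number of pairs where a value from the first group exceeds one from the second, minus the number of pairs where it is smaller, divided by the total number of pairs. A minimal sketch under that definition (cliffs-delta-sketch is a hypothetical name, not the fastmath implementation):

```clojure
;; Hypothetical helper for illustration, not the fastmath implementation.
(defn cliffs-delta-sketch
  "(#(x > y) - #(x < y)) / (n1 * n2) over all pairs x in group1, y in group2."
  [group1 group2]
  (let [signs (for [x group1, y group2] (compare x y))] ; -1, 0 or 1 per pair
    (/ (reduce + signs)
       (double (* (count group1) (count group2))))))

(let [t [10 10 20 20 20 30 30 30 40 50]
      c [10 20 30 40 40 50]]
  (cliffs-delta-sketch t c))
;;=> -0.25
```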

Examples

Usage

(let [t [10 10 20 20 20 30 30 30 40 50]
      c [10 20 30 40 40 50]]
  (cliffs-delta t c))
;;=> -0.25

cohens-d

(cohens-d group1 group2)

Cohen’s d effect size for two groups

Examples

Usage

(let [t [10 10 20 20 20 30 30 30 40 50]
      c [10 20 30 40 40 50]]
  (cohens-d t c))
;;=> -0.42090943320131763

cohens-d-orig

(cohens-d-orig group1 group2)

Original version of Cohen’s d effect size for two groups

Examples

Usage

(let [t [10 10 20 20 20 30 30 30 40 50]
      c [10 20 30 40 40 50]]
  (cohens-d-orig t c))
;;=> -0.39372472247513574

correlation

(correlation vs1 vs2)

Correlation of two sequences.

Examples

Correlation of uniform and gaussian distribution samples.

(correlation (repeatedly 100000 (partial r/grand 1.0 10.0))
             (repeatedly 100000 (partial r/drand -10.0 -5.0)))
;;=> 0.004322084824287007

covariance

(covariance vs1 vs2)

Covariance of two sequences.

Examples

Covariance of uniform and gaussian distribution samples.

(covariance (repeatedly 100000 (partial r/grand 1.0 10.0))
            (repeatedly 100000 (partial r/drand -10.0 -5.0)))
;;=> 0.02860827181203021

covariance-matrix

(covariance-matrix vss)

Generate covariance matrix from seq of seqs. Row order.
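
The following plain-Clojure sketch shows one way to build such a matrix, assuming the sample covariance (division by n - 1) and equally long rows. The names covariance-sketch and covariance-matrix-sketch are hypothetical.

```clojure
;; Hypothetical helpers for illustration, not the fastmath implementation.
(defn covariance-sketch
  "Sample covariance of two equally long sequences."
  [xs ys]
  (let [n  (count xs)
        mx (/ (reduce + xs) (double n))
        my (/ (reduce + ys) (double n))]
    (/ (reduce + (map (fn [x y] (* (- x mx) (- y my))) xs ys))
       (dec n))))

(defn covariance-matrix-sketch
  "Entry [i j] is the covariance of row i and row j of vss."
  [vss]
  (for [a vss] (mapv #(covariance-sketch a %) vss)))

(covariance-matrix-sketch [[1 2 3 4 5 11] [3 2 3 2 3 4]])
;; roughly ([12.667 1.867] [1.867 0.567])
```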

Examples

Usage

(covariance-matrix [[1 2 3 4 5 11] [3 2 3 2 3 4]])
;;=> ([12.666666666666668 1.8666666666666667]
;;=>  [1.8666666666666667 0.5666666666666667])

demean

(demean vs)

Subtract mean from sequence

Examples

Usage

(demean [-5 1 1 1 1 2 2 5 11 71])
;;=> (-14.0 -8.0 -8.0 -8.0 -8.0 -7.0 -7.0 -4.0 2.0 62.0)

estimate-bins

(estimate-bins vs)(estimate-bins vs bins-or-estimate-method)

Estimate number of bins for histogram.

Possible methods are: :sqrt :sturges :rice :doane :scott :freedman-diaconis (default).
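
The three simplest rules depend only on the number of samples n. The sketch below uses the textbook formulas; the rounding choices are assumptions picked to reproduce the values shown in the examples for n = 1000, and the function names are hypothetical.

```clojure
;; Hypothetical helpers for illustration; rounding is an assumption.
(defn bins-sqrt
  "Square-root rule: sqrt(n), truncated."
  [n]
  (long (Math/sqrt n)))

(defn bins-sturges
  "Sturges' rule: 1 + log2(n), rounded up."
  [n]
  (long (Math/ceil (+ 1.0 (/ (Math/log n) (Math/log 2.0))))))

(defn bins-rice
  "Rice rule: 2 * cbrt(n), rounded up."
  [n]
  (long (Math/ceil (* 2.0 (Math/cbrt n)))))

[(bins-sqrt 1000) (bins-sturges 1000) (bins-rice 1000)]
;;=> [31 11 20]
```

The remaining methods (:doane, :scott, :freedman-diaconis) also depend on the spread or shape of the data and are not sketched here.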

Examples

Estimate number of bins for various methods. vs contains 1000 random samples from Log-Normal distribution.

(estimate-bins vs :sqrt)
;;=> 31
(estimate-bins vs :sturges)
;;=> 11
(estimate-bins vs :rice)
;;=> 20
(estimate-bins vs :doane)
;;=> 16
(estimate-bins vs :scott)
;;=> 34
(estimate-bins vs :freedman-diaconis)
;;=> 81

estimation-strategies-list

Examples

List of estimation strategies for percentile

(sort (keys estimation-strategies-list))
;;=> (:legacy :r1 :r2 :r3 :r4 :r5 :r6 :r7 :r8 :r9)

extent

(extent vs)

Return extent (min, max, mean) values from sequence

Examples

min/max and mean of gaussian distribution

(extent (repeatedly 100000 r/grand))
;;=> [-4.433795945695126 4.1230995512178925 -0.0030623577039036527]

glass-delta

(glass-delta group1 group2)

Glass’s delta effect size for two groups
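
Glass’s delta is usually defined as the difference of group means divided by the standard deviation of one group. The sketch below assumes the sample standard deviation of group2 is used, which is consistent with the value in the example; the names are hypothetical and this is not the fastmath implementation.

```clojure
;; Hypothetical helpers for illustration, not the fastmath implementation.
(defn mean-sketch [vs]
  (/ (reduce + vs) (double (count vs))))

(defn stddev-sketch
  "Sample standard deviation (division by n - 1)."
  [vs]
  (let [m (mean-sketch vs)]
    (Math/sqrt (/ (reduce + (map #(let [d (- % m)] (* d d)) vs))
                  (dec (count vs))))))

(defn glass-delta-sketch
  "Mean difference scaled by the standard deviation of group2."
  [group1 group2]
  (/ (- (mean-sketch group1) (mean-sketch group2))
     (stddev-sketch group2)))

(let [t [10 10 20 20 20 30 30 30 40 50]
      c [10 20 30 40 40 50]]
  (glass-delta-sketch t c))
;; roughly -0.385
```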

Examples

Usage

(let [t [10 10 20 20 20 30 30 30 40 50]
      c [10 20 30 40 40 50]]
  (glass-delta t c))
;;=> -0.3849741916091626

hedges-g

(hedges-g group1 group2)

Hedges’s g effect size for two groups

Examples

Usage

(let [t [10 10 20 20 20 30 30 30 40 50]
      c [10 20 30 40 40 50]]
  (hedges-g t c))
;;=> -0.3907787841275092

hedges-g*

(hedges-g* group1 group2)

Less biased Hedges’s g effect size for two groups

Examples

Usage

(let [t [10 10 20 20 20 30 30 30 40 50]
      c [10 20 30 40 40 50]]
  (hedges-g* t c))
;;=> -0.36946357772055416

histogram

(histogram vs)(histogram vs bins-or-estimate-method)(histogram vs bins [mn mx])

Calculate histogram.

Returns map with keys:

  • :size - number of bins
  • :step - distance between bins
  • :bins - list of pairs of range lower value and number of hits
  • :min - min value
  • :max - max value
  • :samples - number of used samples

For estimation methods check estimate-bins.
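
To show how the returned keys relate to each other, here is a minimal equal-width binning sketch in plain Clojure. It assumes a fixed number of bins over the full data range and places the maximum value in the last bin; histogram-sketch is a hypothetical name and the library may handle edges differently.

```clojure
;; Hypothetical helper for illustration, not the fastmath implementation.
;; Assumes vs contains at least two distinct values (step must be > 0).
(defn histogram-sketch
  [vs bins]
  (let [mn   (double (apply min vs))
        mx   (double (apply max vs))
        step (/ (- mx mn) bins)
        ;; bin index of x; the maximum is clamped into the last bin
        idx  (fn [x] (min (dec bins) (long (/ (- x mn) step))))
        hits (frequencies (map idx vs))]
    {:size    bins
     :step    step
     :min     mn
     :max     mx
     :samples (count vs)
     :bins    (for [i (range bins)]
                [(+ mn (* i step)) (get hits i 0)])}))

(:bins (histogram-sketch [0 1 2 3 4 5 6 7 8 9] 3))
;;=> ([0.0 3] [3.0 3] [6.0 4])
```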

Examples

3 bins from uniform distribution.

(histogram (repeatedly 1000 rand) 3)
;;=> {:bins ([3.852964808315207E-5 333]
;;=>         [0.3324743943078295 344]
;;=>         [0.6649102589675758 323]),
;;=>  :max 0.9973461236273221,
;;=>  :min 3.852964808315207E-5,
;;=>  :samples 1000,
;;=>  :size 3,
;;=>  :step 0.33243586465974634}

3 bins from uniform distribution for given range.

(histogram (repeatedly 10000 rand) 3 [0.1 0.5])
;;=> {:bins ([0.1 1325] [0.2333333333333334 1305] [0.3666666666666668 1362]),
;;=>  :max 0.5000000000000001,
;;=>  :min 0.1,
;;=>  :samples 3992,
;;=>  :size 3,
;;=>  :step 0.1333333333333334}

5 bins from normal distribution.

(histogram (repeatedly 10000 r/grand) 5)
;;=> {:bins ([-3.6207687808508426 156]
;;=>         [-2.189509070713405 2083]
;;=>         [-0.7582493605759675 5213]
;;=>         [0.6730103495614701 2348]
;;=>         [2.1042700596989077 200]),
;;=>  :max 3.535529769836345,
;;=>  :min -3.6207687808508426,
;;=>  :samples 10000,
;;=>  :size 5,
;;=>  :step 1.4312597101374376}

Estimate number of bins

(:size (histogram (repeatedly 10000 r/grand)))
;;=> 63

Estimate number of bins, Rice rule

(:size (histogram (repeatedly 10000 r/grand) :rice))
;;=> 44

iqr

(iqr vs)(iqr vs estimation-strategy)

Interquartile range.

Examples

IQR

(iqr (repeatedly 100000 r/grand))
;;=> 1.3440848153319385

jensen-shannon-divergence

(jensen-shannon-divergence vs1 vs2)

Jensen-Shannon divergence of two sequences.

Examples

Jensen-Shannon divergence

(jensen-shannon-divergence (repeatedly 100 (fn* [] (r/irand 100)))
                           (repeatedly 100 (fn* [] (r/irand 100))))
;;=> 569.0495492365783

kendall-correlation

(kendall-correlation vs1 vs2)

Kendall’s correlation of two sequences.

Examples

Kendall’s correlation of uniform and gaussian distribution samples.

(kendall-correlation (repeatedly 100000 (partial r/grand 1.0 10.0))
                     (repeatedly 100000 (partial r/drand -10.0 -5.0)))
;;=> 0.0013475046750467505

kullback-leibler-divergence

(kullback-leibler-divergence vs1 vs2)

Kullback-Leibler divergence of two sequences.

Examples

Kullback-Leibler divergence.

(kullback-leibler-divergence (repeatedly 100 (fn* [] (r/irand 100)))
                             (repeatedly 100 (fn* [] (r/irand 100))))
;;=> 2974.653698088623

kurtosis

(kurtosis vs)

Calculate kurtosis from sequence.

Examples

Kurtosis

(kurtosis [1 2 3 -1 -1 2 -1 11 111])
;;=> 8.732515263272099

mad-extent

(mad-extent vs)

-/+ median-absolute-deviation and median

Examples

median absolute deviation from median for gaussian distribution

(mad-extent (repeatedly 100000 r/grand))
;;=> [-0.6674624646061628 0.6788112117843635 0.005674373589100425]

maximum

(maximum vs)

Maximum value from sequence.

Examples

Maximum value

(maximum [1 2 3 -1 -1 2 -1 11 111])
;;=> 111.0

mean

(mean vs)

Calculate mean of vs

Examples

Mean (average value)

(mean [1 2 3 -1 -1 2 -1 11 111])
;;=> 14.111111111111109

median

(median vs)

Calculate median of vs. See median-3.

Examples

Median (percentile 50%).

(median [1 2 3 -1 -1 2 -1 11 111])
;;=> 2.0

For three elements use faster median-3.

(median [7 1 4])
;;=> 4.0

median-3

(median-3 a b c)

Median of three values. See median.

Examples

Median of [7 1 4]

(median-3 7 1 4)
;;=> 4.0

median-absolute-deviation

(median-absolute-deviation vs)

Calculate MAD
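
MAD is the median of the absolute deviations from the median. A minimal sketch in plain Clojure, assuming no scaling constant is applied (median-sketch and mad-sketch are hypothetical names):

```clojure
;; Hypothetical helpers for illustration, not the fastmath implementation.
(defn median-sketch [vs]
  (let [s (vec (sort vs))
        n (count s)
        h (quot n 2)]
    (if (odd? n)
      (double (s h))
      (/ (+ (s (dec h)) (s h)) 2.0))))

(defn mad-sketch
  "Median of absolute deviations from the median."
  [vs]
  (let [m (median-sketch vs)]
    (median-sketch (map #(Math/abs (double (- % m))) vs))))

(mad-sketch [1 2 3 -1 -1 2 -1 11 111])
;;=> 3.0
```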

Examples

MAD

(median-absolute-deviation [1 2 3 -1 -1 2 -1 11 111])
;;=> 3.0

minimum

(minimum vs)

Minimum value from sequence.

Examples

Minimum value

(minimum [1 2 3 -1 -1 2 -1 11 111])
;;=> -1.0

mode

(mode vs)

Find the value that appears most often in a dataset vs.

See also modes.

Examples

Example

(mode [1 2 3 -1 -1 2 -1 11 111])
;;=> -1.0

Returns lowest value when every element appears equally.

(mode [5 1 2 3 4])
;;=> 1.0

modes

(modes vs)

Find the values that appear most often in a dataset vs.

Returns sequence with all most appearing values in increasing order.

See also mode.
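
The relationship between modes and mode can be sketched with frequencies from clojure.core. The names below are hypothetical, and the sketch returns the original values rather than doubles.

```clojure
;; Hypothetical helpers for illustration, not the fastmath implementation.
(defn modes-sketch
  "All values sharing the highest count, in increasing order."
  [vs]
  (let [freqs (frequencies vs)
        top   (apply max (vals freqs))]
    (sort (for [[v c] freqs :when (= c top)] v))))

(defn mode-sketch
  "Lowest of the most frequent values."
  [vs]
  (first (modes-sketch vs)))

(modes-sketch [5 5 1 1 2 3 4 4])
;;=> (1 4 5)
(mode-sketch [5 1 2 3 4])
;;=> 1
```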

Examples

Example

(modes [1 2 3 -1 -1 2 -1 11 111])
;;=> (-1.0)

Returns all most frequent values, in increasing order, when several values appear equally often.

(modes [5 5 1 1 2 3 4 4])
;;=> (1.0 4.0 5.0)

moment

(moment vs)(moment vs order)(moment vs order {:keys [absolute? center mean?], :or {absolute? false, center nil, mean? true}})

Calculate moment (central and/or absolute) of given order (default: 2).

Additional parameters as a map:

  • :absolute? - calculate sum as absolute values (default: false)
  • :mean? - returns mean (proper moment) or just sum of differences (default: true)
  • :center - value around which the moment is calculated (default: nil = mean)
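
The options above can be read as the following plain-Clojure sketch. It is an interpretation of the documented parameters, not the library source, and moment-sketch is a hypothetical name.

```clojure
;; Hypothetical helper for illustration, not the fastmath implementation.
(defn moment-sketch
  ([vs] (moment-sketch vs 2.0 {}))
  ([vs order] (moment-sketch vs order {}))
  ([vs order {:keys [absolute? center mean?]
              :or   {absolute? false, center nil, mean? true}}]
   (let [n (count vs)
         c (or center (/ (reduce + vs) (double n)))  ; nil center means the mean
         f (fn [x]
             (let [d (double (- x c))]
               (Math/pow (if absolute? (Math/abs d) d) order)))
         s (reduce + (map f vs))]
     (if mean? (/ s n) s))))                         ; mean of powers or raw sum

(moment-sketch [3 7 5 9 -8])
;; roughly 35.36
(moment-sketch [3 7 5 9 -8] 3.0 {:center 0.0})
;; roughly 142.4
```

Note that a fractional order with negative differences yields NaN from Math/pow unless :absolute? is true.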

Examples

Usage

(moment [3 7 5 9 -8])
;;=> 35.36
(moment [3 7 5 9 -8] 1.0)
;;=> 0.0
(moment [3 7 5 9 -8] 4.0)
;;=> 3417.171199999999
(moment [3 7 5 9 -8] 3.0)
;;=> -229.82399999999993
(moment [3 7 5 9 -8] 3.0 {:center 0.0})
;;=> 142.4
(moment [3 7 5 9 -8] 3.0 {:mean? false})
;;=> -1149.1199999999997
(moment [3 7 5 9 -8] 3.0 {:absolute? true})
;;=> 332.15039999999993
(moment [3 7 5 9 -8] 3.0 {:center -3.0})
;;=> 666.2
(moment [3 7 5 9 -8] 0.5 {:absolute? true})
;;=> 1.8986344545712772

outliers

(outliers vs)(outliers vs estimation-strategy)(outliers vs q1 q3)

Find outliers, defined as values outside the inner fences.

Let Q1 be the 25th percentile and Q3 the 75th percentile. IQR is (- Q3 Q1).

  • LIF (Lower Inner Fence) equals (- Q1 (* 1.5 IQR)).
  • UIF (Upper Inner Fence) equals (+ Q3 (* 1.5 IQR)).

Returns sequence.

The optional estimation-strategy argument changes the estimation type used for quantile calculations. See estimation-strategies-list.

Examples

Outliers

(outliers [1 2 3 -1 -1 2 -1 11 111])
;;=> (111.0)

Gaussian distribution outliers

(count (outliers (repeatedly 3000000 r/grand)))
;;=> 20845

pacf

(pacf data)(pacf data lags)

Calculate pacf (partial autocorrelation function) for given number of lags.

If lags is omitted, the function returns the maximum possible number of lags.

pacf also returns lag 0 (which is 0.0).

See also acf, acf-ci, pacf-ci

Examples

Usage

(pacf (repeatedly 1000 r/grand) 10)
;;=> (0.0
;;=>  -0.026736190605719863
;;=>  0.010877194279486118
;;=>  0.032203194789526435
;;=>  -0.035726254389457764
;;=>  -0.002736273319801345
;;=>  -0.05322100355959638
;;=>  0.016135531529402884
;;=>  0.008998546289145027
;;=>  -0.0614874305488179
;;=>  0.010613174279058944)
(pacf [1 2 3 4 5 4 3 2 1])
;;=> (0.0
;;=>  0.5396825396825397
;;=>  -0.4299857803057234
;;=>  -0.388084834596935
;;=>  -0.2792571208141194
;;=>  0.17585056996358742
;;=>  -0.2652225487589841
;;=>  -0.17978918763554708
;;=>  -0.10771973872263883)

pacf-ci

(pacf-ci data lags)(pacf-ci data lags alpha)

pacf with added confidence interval data.

Examples

Usage

(pacf-ci (repeatedly 1000 r/grand) 3)
;;=> {:ci 0.06197950323045615,
;;=>  :pacf
;;=>  (0.0 -0.003069469436674989 0.019682898818580288 0.008559579552033444)}
(pacf-ci [1 2 3 4 5 4 3 2 1] 3)
;;=> {:ci 0.653321328180018,
;;=>  :pacf (0.0 0.5396825396825397 -0.4299857803057234 -0.388084834596935)}

pearson-correlation

(pearson-correlation vs1 vs2)

Pearson’s correlation of two sequences.

Examples

Pearson’s correlation of uniform and gaussian distribution samples.

(pearson-correlation (repeatedly 100000 (partial r/grand 1.0 10.0))
                     (repeatedly 100000 (partial r/drand -10.0 -5.0)))
;;=> -0.0013362357862912368

percentile

(percentile vs p)(percentile vs p estimation-strategy)

Calculate percentile of a vs.

Percentile p is from range 0-100.

See docs.

Optionally you can provide estimation-strategy to change interpolation methods for selecting values. Default is :legacy. See more here

See also quantile.

Examples

Percentile 25%

(percentile [1 2 3 -1 -1 2 -1 11 111] 25.0)
;;=> -1.0

Percentile 50% (median)

(percentile [1 2 3 -1 -1 2 -1 11 111] 50.0)
;;=> 2.0

Percentile 75%

(percentile [1 2 3 -1 -1 2 -1 11 111] 75.0)
;;=> 7.0

Percentile 90%

(percentile [1 2 3 -1 -1 2 -1 11 111] 90.0)
;;=> 111.0

Various estimation strategies.

(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :legacy)
;;=> 61.0
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :r1)
;;=> 11.0
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :r2)
;;=> 11.0
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :r3)
;;=> 11.0
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :r4)
;;=> 8.199999999999996
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :r5)
;;=> 25.999999999999858
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :r6)
;;=> 61.0
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :r7)
;;=> 9.399999999999999
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :r8)
;;=> 37.66666666666675
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :r9)
;;=> 34.75000000000007

percentile-extent

(percentile-extent vs)(percentile-extent vs p)(percentile-extent vs p1 p2)(percentile-extent vs p1 p2 estimation-strategy)

Return percentile range and median.

p - calculates extent of p and 100-p (default: p=25)

Examples

for samples from gaussian distribution

(percentile-extent (repeatedly 100000 r/grand))
;;=> [-0.6757379614999778 0.6743169986727071 -6.525193090059774E-4]
(percentile-extent (repeatedly 100000 r/grand) 10)
;;=> [-1.2857354257915978 1.2809543542935327 -0.005038343134417929]
(percentile-extent (repeatedly 100000 r/grand) 30 70)
;;=> [-0.5245075255533502 0.522954632601382 -6.971662739497372E-4]

percentiles

(percentiles vs ps)(percentiles vs ps estimation-strategy)

Calculate percentiles of a vs.

Percentiles are a sequence of values from the range 0-100.

See docs.

Optionally you can provide estimation-strategy to change interpolation methods for selecting values. Default is :legacy. See more here

See also quantile.

Examples

Usage

(percentiles [1 2 3 -1 -1 2 -1 11 111] [25 50 75 90])
;;=> [-1.0 2.0 7.0 111.0]

population-stddev

(population-stddev vs)(population-stddev vs u)

Calculate population standard deviation of vs.

See stddev.

Examples

Population standard deviation.

(population-stddev [1 2 3 -1 -1 2 -1 11 111])
;;=> 34.4333315406403

population-variance

(population-variance vs)(population-variance vs u)

Calculate population variance of vs.

See variance.
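
The difference between the two variances is only the divisor. A minimal sketch in plain Clojure (all names are hypothetical, and the optional u argument is not modeled):

```clojure
;; Hypothetical helpers for illustration, not the fastmath implementation.
(defn sum-sq-dev
  "Sum of squared deviations from the mean."
  [vs]
  (let [m (/ (reduce + vs) (double (count vs)))]
    (reduce + (map #(let [d (- % m)] (* d d)) vs))))

(defn variance-sketch
  "Sample variance: divide by n - 1."
  [vs]
  (/ (sum-sq-dev vs) (dec (count vs))))

(defn population-variance-sketch
  "Population variance: divide by n."
  [vs]
  (/ (sum-sq-dev vs) (count vs)))

(population-variance-sketch [1 2 3 -1 -1 2 -1 11 111])
;; roughly 1185.654
(variance-sketch [1 2 3 -1 -1 2 -1 11 111])
;; roughly 1333.861
```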

Examples

Population variance

(population-variance [1 2 3 -1 -1 2 -1 11 111])
;;=> 1185.6543209876543

quantile

(quantile vs q)(quantile vs q estimation-strategy)

Calculate quantile of a vs.

Quantile q is from range 0.0-1.0.

See docs for interpolation strategy.

Optionally you can provide estimation-strategy to change interpolation methods for selecting values. Default is :legacy. See more here

See also percentile.
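
As one concrete example of an interpolation strategy, the sketch below implements linear interpolation between order statistics, commonly known as R-7. That this corresponds to the :r7 strategy is an assumption based on the example values; quantile-r7-sketch is a hypothetical name.

```clojure
;; Hypothetical helper for illustration, not the fastmath implementation.
(defn quantile-r7-sketch
  "Quantile q (0.0-1.0) by linear interpolation between order statistics."
  [vs q]
  (let [s    (vec (sort vs))
        n    (count s)
        pos  (* q (dec n))              ; fractional 0-based position
        lo   (long (Math/floor pos))
        hi   (min (inc lo) (dec n))
        frac (- pos lo)]
    (+ (s lo) (* frac (- (s hi) (s lo))))))

(quantile-r7-sketch [1 11 111 1111] 0.7)
;; roughly 211.0
(quantile-r7-sketch [1 2 3 -1 -1 2 -1 11 111] 0.85)
;; roughly 9.4
```

A percentile p corresponds to the quantile (/ p 100.0), so the second call matches the 85.0 percentile example for :r7.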

Examples

Quantile 0.25

(quantile [1 2 3 -1 -1 2 -1 11 111] 0.25)
;;=> -1.0

Quantile 0.5 (median)

(quantile [1 2 3 -1 -1 2 -1 11 111] 0.5)
;;=> 2.0

Quantile 0.75

(quantile [1 2 3 -1 -1 2 -1 11 111] 0.75)
;;=> 7.0

Quantile 0.9

(quantile [1 2 3 -1 -1 2 -1 11 111] 0.9)
;;=> 111.0

Various estimation strategies.

(quantile [1 11 111 1111] 0.7 :legacy)
;;=> 611.0
(quantile [1 11 111 1111] 0.7 :r1)
;;=> 111.0
(quantile [1 11 111 1111] 0.7 :r2)
;;=> 111.0
(quantile [1 11 111 1111] 0.7 :r3)
;;=> 111.0
(quantile [1 11 111 1111] 0.7 :r4)
;;=> 90.99999999999999
(quantile [1 11 111 1111] 0.7 :r5)
;;=> 410.99999999999983
(quantile [1 11 111 1111] 0.7 :r6)
;;=> 611.0
(quantile [1 11 111 1111] 0.7 :r7)
;;=> 210.99999999999966
(quantile [1 11 111 1111] 0.7 :r8)
;;=> 477.66666666666623
(quantile [1 11 111 1111] 0.7 :r9)
;;=> 460.99999999999966

quantiles

(quantiles vs qs)(quantiles vs qs estimation-strategy)

Calculate quantiles of a vs.

Quantiles are a sequence of values from the range 0.0-1.0.

See docs for interpolation strategy.

Optionally you can provide estimation-strategy to change interpolation methods for selecting values. Default is :legacy. See more here

See also percentiles.

Examples

Usage

(quantiles [1 2 3 -1 -1 2 -1 11 111] [0.25 0.5 0.75 0.9])
;;=> [-1.0 2.0 7.0 111.0]

second-moment

Deprecated. Use the `moment` function instead.

sem

(sem vs)

Standard error of mean

Examples

SEM

(sem [1 2 3 -1 -1 2 -1 11 111])
;;=> 12.174021115615695

sem-extent

(sem-extent vs)

-/+ sem and mean

Examples

standard error of mean and mean for gaussian distribution

(sem-extent (repeatedly 100000 r/grand))
;;=> [-0.0041852769369816285 0.002121789728373999 -0.001031743604303815]

skewness

(skewness vs)

Calculate skewness from sequence.

Examples

Skewness

(skewness [1 2 3 -1 -1 2 -1 11 111])
;;=> 2.94268445417954

spearman-correlation

(spearman-correlation vs1 vs2)

Spearman’s correlation of two sequences.

Examples

Spearman’s correlation of uniform and gaussian distribution samples.

(spearman-correlation (repeatedly 100000 (partial r/grand 1.0 10.0))
                      (repeatedly 100000 (partial r/drand -10.0 -5.0)))
;;=> -2.2283008578278884E-4

standardize

(standardize vs)

Normalize samples to have mean = 0 and stddev = 1.
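
A minimal sketch of the transformation in plain Clojure, assuming the sample standard deviation (division by n - 1) is used, which is consistent with the example values; standardize-sketch is a hypothetical name.

```clojure
;; Hypothetical helper for illustration, not the fastmath implementation.
(defn standardize-sketch
  "Subtracts the mean and divides by the sample standard deviation."
  [vs]
  (let [n  (count vs)
        m  (/ (reduce + vs) (double n))
        sd (Math/sqrt (/ (reduce + (map #(let [d (- % m)] (* d d)) vs))
                         (dec n)))]
    (map #(/ (- % m) sd) vs)))

(standardize-sketch [1 2 3 -1 -1 2 -1 11 111])
;; first value roughly -0.359, last value roughly 2.653
```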

Examples

Standardize

(standardize [1 2 3 -1 -1 2 -1 11 111])
;;=> (-0.3589915220998317
;;=>  -0.33161081278713267
;;=>  -0.30423010347443363
;;=>  -0.4137529407252298
;;=>  -0.4137529407252298
;;=>  -0.33161081278713267
;;=>  -0.4137529407252298
;;=>  -0.08518442897284138
;;=>  2.652886502297062)

stats-map

(stats-map vs)(stats-map vs estimation-strategy)

Calculate several statistics of vs and return as map.

The optional estimation-strategy argument changes the estimation type used for quantile calculations. See estimation-strategies-list.

Examples

Stats

(stats-map [1 2 3 -1 -1 2 -1 11 111])
;;=> {:IQR 8.0,
;;=>  :Kurtosis 8.732515263272099,
;;=>  :LAV -1.0,
;;=>  :LIF -13.0,
;;=>  :LOF -25.0,
;;=>  :MAD 3.0,
;;=>  :Max 111.0,
;;=>  :Mean 14.11111111111111,
;;=>  :Median 2.0,
;;=>  :Min -1.0,
;;=>  :Mode -1.0,
;;=>  :Outliers (111.0),
;;=>  :Q1 -1.0,
;;=>  :Q3 7.0,
;;=>  :Range 112.0,
;;=>  :SD 36.522063346847084,
;;=>  :SEM 12.174021115615695,
;;=>  :Size 9,
;;=>  :Skewness 2.94268445417954,
;;=>  :Total 127.0,
;;=>  :UAV 11.0,
;;=>  :UIF 19.0,
;;=>  :UOF 31.0,
;;=>  :Variance 1333.8611111111113}

stddev

(stddev vs)(stddev vs u)

Calculate standard deviation of vs.

See population-stddev.

Examples

Standard deviation.

(stddev [1 2 3 -1 -1 2 -1 11 111])
;;=> 36.522063346847084

stddev-extent

(stddev-extent vs)

-/+ stddev and mean

Examples

standard deviation from mean and mean for gaussian distribution

(stddev-extent (repeatedly 100000 r/grand))
;;=> [-1.0008742722153459 0.994022466044159 -0.0034259030855934678]

sum

(sum vs)

Sum of all vs values.

Examples

Sum

(sum [1 2 3 -1 -1 2 -1 11 111])
;;=> 127.0

ttest-one-sample

(ttest-one-sample xs)(ttest-one-sample xs {:keys [alpha sides mu], :or {alpha 0.05, sides :two-sided, mu 0.0}})

One-sample Student’s t-test

  • alpha - significance level (default: 0.05)
  • sides - one of: :two-sided, :one-sided-less (short: :one-sided) or :one-sided-greater
  • mu - mean (default: 0.0)

Examples

Usage

(ttest-one-sample [1 2 3 4 5 6 7 8 9 10])
;;=> {:confidence-intervals [3.3341494103317983 7.665850589668201],
;;=>  :df 9,
;;=>  :estimated-mu 5.5,
;;=>  :p-value 2.781960110481859E-4,
;;=>  :t 5.744562646538029,
;;=>  :test-type :two-sided}
(ttest-one-sample [1 2 3 4 5 6 7 8 9 10] {:alpha 0.2})
;;=> {:confidence-intervals [4.175850795053416 6.824149204946584],
;;=>  :df 9,
;;=>  :estimated-mu 5.5,
;;=>  :p-value 2.781960110481859E-4,
;;=>  :t 5.744562646538029,
;;=>  :test-type :two-sided}
(ttest-one-sample [1 2 3 4 5 6 7 8 9 10] {:sides :one-sided})
;;=> {:confidence-intervals [##-Inf 7.255072013309326],
;;=>  :df 9,
;;=>  :estimated-mu 5.5,
;;=>  :p-value 0.9998609019944759,
;;=>  :t 5.744562646538029,
;;=>  :test-type :one-sided}
(ttest-one-sample [1 2 3 4 5 6 7 8 9 10] {:mu 5.0})
;;=> {:confidence-intervals [3.334149410331798 7.665850589668201],
;;=>  :df 9,
;;=>  :estimated-mu 5.5,
;;=>  :p-value 0.6141172548083933,
;;=>  :t 0.5222329678670935,
;;=>  :test-type :two-sided}

ttest-two-samples

(ttest-two-samples xs ys)(ttest-two-samples xs ys {:keys [alpha sides mu paired? equal-variances?], :or {alpha 0.05, sides :two-sided, mu 0.0, paired? false, equal-variances? false}, :as params})

Two-sample Student’s t-test

  • alpha - significance level (default: 0.05)
  • sides - one of: :two-sided, :one-sided-less (short: :one-sided) or :one-sided-greater
  • mu - mean (default: 0.0)
  • paired? - unpaired or paired test, boolean (default: false)
  • equal-variances? - unequal or equal variances, boolean (default: false)

Examples

Usage

(ttest-two-samples [1 2 3 4 5 6 7 8 9 10]
                   [7 8 9 10 11 12 13 14 15 16 17 18 19 20])
;;=> {:confidence-intervals [-11.052801725158163 -4.9471982748418375],
;;=>  :df 21.982212340188994,
;;=>  :estimated-mu [5.5 13.5],
;;=>  :p-value 1.8552818325118146E-5,
;;=>  :paired? false,
;;=>  :t -5.4349297638940595,
;;=>  :test-type :two-sided}
(ttest-two-samples [1 2 3 4 5 6 7 8 9 10]
                   [7 8 9 10 11 12 13 14 15 16 17 18 19 20 200])
;;=> {:confidence-intervals [-47.242899887102105 6.376233220435439],
;;=>  :df 14.164598953012467,
;;=>  :estimated-mu [5.5 25.93333333333333],
;;=>  :p-value 0.12451349808974498,
;;=>  :paired? false,
;;=>  :t -1.632902633201205,
;;=>  :test-type :two-sided}
(ttest-two-samples [1 2 3 4 5 6 7 8 9 10]
                   [7 8 9 10 11 12 13 14 15 16 17 18 19 20]
                   {:equal-variances? true})
;;=> {:confidence-intervals [-11.22324472988163 -4.77675527011837],
;;=>  :df 22.0,
;;=>  :estimated-mu [5.5 13.5],
;;=>  :p-value 3.690577215911943E-5,
;;=>  :paired? false,
;;=>  :t -5.147292847304685,
;;=>  :test-type :two-sided}
(ttest-two-samples [1 2 3 4 5 6 7 8 9 10]
                   [200 11 200 11 200 11 200 11 200 11]
                   {:paired? true})
;;=> {:confidence-intervals [-171.66671936335894 -28.333280636641092],
;;=>  :df 9,
;;=>  :estimated-mu -100.0,
;;=>  :p-value 0.011615504295919215,
;;=>  :paired? true,
;;=>  :t -3.156496045715208,
;;=>  :test-type :two-sided}

variance

(variance vs)(variance vs u)

Calculate variance of vs.

See population-variance.

Examples

Variance.

(variance [1 2 3 -1 -1 2 -1 11 111])
;;=> 1333.861111111111