fastmath.stats
Statistics functions.
- Descriptive statistics.
- Correlation / covariance
- Outliers
- Confidence intervals
- Extents
- Effect size
- Student’s t-test
- Histogram
- ACF/PACF
- Bootstrap
- Binary measures
All functions are backed by the Apache Commons Math or SMILE libraries, and all of them work with Clojure sequences.
Descriptive statistics
The all-in-one function stats-map returns a map containing:
- :Size - size of the samples, (count ...)
- :Min - minimum value
- :Max - maximum value
- :Range - range of values
- :Mean - mean/average
- :Median - median, see also: median-3
- :Mode - mode, see also: modes
- :Q1 - first quartile, use: percentile, quantile
- :Q3 - third quartile, use: percentile, quantile
- :Total - sum of all samples
- :SD - sample standard deviation
- :Variance - variance
- :MAD - median-absolute-deviation
- :SEM - standard error of mean
- :LAV - lower adjacent value, use: adjacent-values
- :UAV - upper adjacent value, use: adjacent-values
- :IQR - interquartile range, (- q3 q1)
- :LOF - lower outer fence, (- q1 (* 3.0 iqr))
- :UOF - upper outer fence, (+ q3 (* 3.0 iqr))
- :LIF - lower inner fence, (- q1 (* 1.5 iqr))
- :UIF - upper inner fence, (+ q3 (* 1.5 iqr))
- :Outliers - list of outliers, i.e. samples which are outside the inner fences
- :Kurtosis - kurtosis
- :Skewness - skewness
Note: percentile and quantile support 10 different interpolation (estimation) strategies; see estimation-strategies-list.
Categories
- Correlation: correlation covariance covariance-matrix jensen-shannon-divergence kendall-correlation kullback-leibler-divergence pearson-correlation spearman-correlation
- Effect size: ameasure cliffs-delta cohens-d cohens-d-orig glass-delta hedges-g hedges-g*
- Extents: adjacent-values bootstrap-ci ci extent mad-extent percentile-extent sem-extent stddev-extent
- Normalize: demean standardize
- Descriptive statistics: binary-measures binary-measures-all estimate-bins estimation-strategies-list histogram iqr kurtosis maximum mean median median-3 median-absolute-deviation minimum mode modes outliers percentile percentiles population-stddev population-variance quantile quantiles sem skewness stats-map stddev sum variance
- Hypothesis test: ttest-one-sample ttest-two-samples
- Time series: acf acf-ci pacf pacf-ci
Other vars: bootstrap moment second-moment
acf
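(acf data)
(acf data lags)
Autocorrelation function (ACF). When lags is a number, values for lags 0 to lags are returned; when lags is a sequence, only the listed lags are returned. Without lags, all lags from 0 to (dec (count data)) are used.
Examples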
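As a rough illustration, the textbook sample ACF can be written in plain Clojure. It divides the lagged cross-products of deviations from the mean by the total sum of squared deviations. `acf-sketch` is a hypothetical helper and is not part of fastmath. We assume fastmath uses this definition, but that is not confirmed here; the sketch does reproduce the lag 1 and lag 2 values of the `[1 2 3 4 5 4 3 2 1]` example.

```clojure
(defn acf-sketch
  "Hypothetical helper: sample autocorrelation of xs for every lag in lags."
  [xs lags]
  (let [xs    (vec xs)
        m     (/ (reduce + xs) (double (count xs)))
        d     (mapv #(- % m) xs)          ; deviations from the mean
        denom (reduce + (map * d d))]     ; total sum of squared deviations
    (for [k lags]
      ;; pair d_t with d_(t+k) over the overlapping part of the series
      (/ (reduce + (map * d (drop k d))) denom))))

(acf-sketch [1 2 3 4 5 4 3 2 1] (range 3))
```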
Usage
(acf (repeatedly 1000 r/grand) 5)
;;=> (1.0
;;=> 0.0056672021105804715
;;=> 0.02683192836034792
;;=> 0.003505061419148288
;;=> 0.017117838382944242
;;=> -0.014709355084377094)
(acf (repeatedly 1000 r/grand) [10 20 100 500])
;;=> (0.03608425929253231
;;=> -0.04862331077397911
;;=> 0.0026191550786753507
;;=> 0.006099538382009882)
(acf [1 2 3 4 5 4 3 2 1])
;;=> (1.0
;;=> 0.5396825396825397
;;=> -0.013492063492063475
;;=> -0.4666666666666665
;;=> -0.6269841269841269
;;=> -0.3015873015873015
;;=> -0.011904761904761935
;;=> 0.17777777777777773
;;=> 0.20317460317460315)
acf-ci
(acf-ci data lags)
(acf-ci data lags alpha)
acf with added confidence interval data.
:cis contains a list of calculated confidence intervals, one for every lag.
Examples
Usage
(acf-ci (repeatedly 1000 r/grand) 3)
;;=> {:acf
;;=> (1.0 0.001652674595128032 0.030605610528521364 -0.006874164003129496),
;;=> :ci 0.06197950323045615,
;;=> :cis (0.06197950323045615
;;=> 0.06197967251690712
;;=> 0.062037701604327464
;;=> 0.06204062757534195)}
(acf-ci [1 2 3 4 5 4 3 2 1] 3)
;;=> {:acf
;;=> (1.0 0.5396825396825397 -0.013492063492063475 -0.4666666666666665),
;;=> :ci 0.653321328180018,
;;=> :cis (0.653321328180018
;;=> 0.8218653739461048
;;=> 0.8219599072345126
;;=> 0.9281841012727746)}
adjacent-values
(adjacent-values vs)
(adjacent-values vs estimation-strategy)
(adjacent-values vs q1 q3)
Lower and upper adjacent values (LAV and UAV), followed by the median.
Let Q1 be the 25th percentile and Q3 the 75th percentile; IQR is (- Q3 Q1).
- LAV is the smallest value which is greater than or equal to LIF = (- Q1 (* 1.5 IQR)).
- UAV is the largest value which is less than or equal to UIF = (+ Q3 (* 1.5 IQR)).
- The third value is the median of the samples.
The optional estimation-strategy argument changes how quantiles are estimated; see estimation-strategies-list.
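The rule above can be sketched in plain Clojure for precomputed quartiles. `adjacent-values-sketch` is hypothetical and takes Q1 and Q3 as arguments, because quartile estimation depends on the chosen strategy. For even-length data, the sketch's median averages the two middle values; this is our assumption and may differ from fastmath.

```clojure
(defn adjacent-values-sketch
  "Hypothetical helper: [LAV UAV median] of xs, given quartiles q1 and q3."
  [xs q1 q3]
  (let [iqr    (- q3 q1)
        lif    (- q1 (* 1.5 iqr))        ; lower inner fence
        uif    (+ q3 (* 1.5 iqr))        ; upper inner fence
        sorted (vec (sort xs))
        n      (count sorted)
        h      (quot n 2)
        median (if (odd? n)
                 (sorted h)
                 (/ (+ (sorted (dec h)) (sorted h)) 2.0))]
    [(double (apply min (filter #(>= % lif) xs)))   ; LAV
     (double (apply max (filter #(<= % uif) xs)))   ; UAV
     (double median)]))

;; Q1 = -1.0 and Q3 = 7.0 are the quartiles stats-map reports for this data.
(adjacent-values-sketch [1 2 3 -1 -1 2 -1 11 111] -1.0 7.0)
;; => [-1.0 11.0 2.0]
```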
Examples
[LAV, UAV, median]
(adjacent-values [1 2 3 -1 -1 2 -1 11 111])
;;=> [-1.0 11.0 2.0]
Gaussian distribution [LAV, UAV, median]
(adjacent-values (repeatedly 1000000 r/grand))
;;=> [-2.698318697864716 2.6996062842439654 0.0016124036257504448]
ameasure
(ameasure group1 group2)
Vargha-Delaney A measure for two populations, group1 and group2.
Examples
Usage
(let [t [10 10 20 20 20 30 30 30 40 50]
c [10 20 30 40 40 50]]
(ameasure t c))
;;=> 0.20833333333333334
binary-measures
(binary-measures truth prediction)
(binary-measures truth prediction true-value)
Subset of binary measures. See binary-measures-all.
The following keys are returned: [:tp :tn :fp :fn :accuracy :fdr :f-measure :fall-out :precision :recall :sensitivity :specificity]. Prevalence is available as :prevalence from binary-measures-all.
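To show where these numbers come from, here is a plain Clojure sketch that derives a few of the listed measures from the confusion counts using their standard definitions. `confusion-sketch` and `measures-sketch` are hypothetical helpers and are not fastmath's implementation. They accept only boolean inputs (the true-value conversion is omitted) and reproduce the first example below.

```clojure
(defn confusion-sketch
  "Hypothetical helper: :tp :tn :fp :fn counts for boolean truth/prediction."
  [truth prediction]
  (merge {:tp 0, :tn 0, :fp 0, :fn 0}
         (frequencies
          (map (fn [t p]
                 (cond (and t p)             :tp
                       (and (not t) (not p)) :tn
                       p                     :fp    ; predicted true, actually false
                       :else                 :fn))  ; predicted false, actually true
               truth prediction))))

(defn measures-sketch
  "Hypothetical helper: a few of the measures listed above, from the counts."
  [truth prediction]
  (let [{tp :tp, tn :tn, fp :fp, fneg :fn} (confusion-sketch truth prediction)
        precision (/ tp (double (+ tp fp)))
        recall    (/ tp (double (+ tp fneg)))]
    {:tp tp, :tn tn, :fp fp, :fn fneg
     :accuracy    (/ (+ tp tn) (double (+ tp tn fp fneg)))
     :precision   precision
     :recall      recall
     :specificity (/ tn (double (+ tn fp)))
     :fall-out    (/ fp (double (+ fp tn)))
     :fdr         (/ fp (double (+ fp tp)))
     :f-measure   (/ (* 2.0 precision recall) (+ precision recall))}))

(measures-sketch [true false true false true false true false]
                 [true false false true false false false true])
```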
Examples
Usage
(binary-measures [true false true false true false true false]
[true false false true false false false true])
;;=> {:accuracy 0.375,
;;=> :f-measure 0.28571428571428575,
;;=> :fall-out 0.5,
;;=> :fdr 0.6666666666666667,
;;=> :fn 3,
;;=> :fp 2,
;;=> :precision 0.3333333333333333,
;;=> :recall 0.25,
;;=> :sensitivity 0.25,
;;=> :specificity 0.5,
;;=> :tn 2,
;;=> :tp 1}
Treat 1 as the true value.
(binary-measures [1 0 1 0 1 0 1 0] [1 0 0 1 0 0 0 1] [1])
;;=> {:accuracy 0.375,
;;=> :f-measure 0.28571428571428575,
;;=> :fall-out 0.5,
;;=> :fdr 0.6666666666666667,
;;=> :fn 3,
;;=> :fp 2,
;;=> :precision 0.3333333333333333,
;;=> :recall 0.25,
;;=> :sensitivity 0.25,
;;=> :specificity 0.5,
;;=> :tn 2,
;;=> :tp 1}
Treat :a and :b as true values.
(binary-measures [:a :b :c :d :e :f :a :b]
[:a :b :a :b :a :f :d :b]
{:a true, :b true, :c false})
;;=> {:accuracy 0.5,
;;=> :f-measure 0.6,
;;=> :fall-out 0.75,
;;=> :fdr 0.5,
;;=> :fn 1,
;;=> :fp 3,
;;=> :precision 0.5,
;;=> :recall 0.75,
;;=> :sensitivity 0.75,
;;=> :specificity 0.25,
;;=> :tn 1,
;;=> :tp 3}
binary-measures-all
(binary-measures-all truth prediction)
(binary-measures-all truth prediction true-value)
Collection of binary measures.
- truth - list of ground truth values
- prediction - list of predicted values
- true-value - optional, defines what counts as true in truth and prediction
true-value can be one of:
- nil - values are treated as booleans
- any sequence - values from the sequence are treated as true
- map - conversion is done according to the provided map (if there is no corresponding key, the value is treated as false)
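The three true-value rules can be sketched as a single conversion function in plain Clojure. `true-value-fn` is a hypothetical name, not a fastmath var. The map check comes before the generic sequence case because a Clojure map is also seqable.

```clojure
(defn true-value-fn
  "Hypothetical helper: returns a predicate turning raw values into booleans,
  following the true-value rules listed above."
  [true-value]
  (cond
    (nil? true-value) boolean                               ; values are already booleans
    (map? true-value) #(boolean (get true-value % false))   ; missing key -> false
    :else             (let [s (set true-value)]             ; any sequence: membership
                        #(contains? s %))))

(mapv (true-value-fn {:a true, :b true, :c false}) [:a :b :c :d :e :f :a :b])
;; => [true true false false false false true true]
```

Applying the same conversion to the predictions `[:a :b :a :b :a :f :d :b]` gives 3 true positives, matching `:tp 3` in the last example.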
Examples
Usage
(binary-measures-all [true false true false true false true false]
[true false false true false false false true])
;;=> {:accuracy 0.375,
;;=> :bm -0.25,
;;=> :cn 4.0,
;;=> :cp 4.0,
;;=> :dor 0.3333333333333333,
;;=> :f-beta #,
;;=> :f-measure 0.28571428571428575,
;;=> :f1-score 0.28571428571428575,
;;=> :fall-out 0.5,
;;=> :fdr 0.6666666666666667,
;;=> :fn 3,
;;=> :fnr 0.75,
;;=> :for 0.6,
;;=> :fp 2,
;;=> :fpr 0.5,
;;=> :hit-rate 0.25,
;;=> :lr+ 0.5,
;;=> :lr- 1.5,
;;=> :mcc -0.2581988897471611,
;;=> :miss-rate 0.75,
;;=> :mk -0.2666666666666666,
;;=> :npv 0.4,
;;=> :pcn 5.0,
;;=> :pcp 3.0,
;;=> :ppv 0.3333333333333333,
;;=> :precision 0.3333333333333333,
;;=> :prevalence 0.5,
;;=> :recall 0.25,
;;=> :selectivity 0.5,
;;=> :sensitivity 0.25,
;;=> :specificity 0.5,
;;=> :tn 2,
;;=> :tnr 0.5,
;;=> :total 8.0,
;;=> :tp 1,
;;=> :tpr 0.25}
Treat 1 as the true value.
(binary-measures-all [1 0 1 0 1 0 1 0] [1 0 0 1 0 0 0 1] [1])
;;=> {:accuracy 0.375,
;;=> :bm -0.25,
;;=> :cn 4.0,
;;=> :cp 4.0,
;;=> :dor 0.3333333333333333,
;;=> :f-beta #,
;;=> :f-measure 0.28571428571428575,
;;=> :f1-score 0.28571428571428575,
;;=> :fall-out 0.5,
;;=> :fdr 0.6666666666666667,
;;=> :fn 3,
;;=> :fnr 0.75,
;;=> :for 0.6,
;;=> :fp 2,
;;=> :fpr 0.5,
;;=> :hit-rate 0.25,
;;=> :lr+ 0.5,
;;=> :lr- 1.5,
;;=> :mcc -0.2581988897471611,
;;=> :miss-rate 0.75,
;;=> :mk -0.2666666666666666,
;;=> :npv 0.4,
;;=> :pcn 5.0,
;;=> :pcp 3.0,
;;=> :ppv 0.3333333333333333,
;;=> :precision 0.3333333333333333,
;;=> :prevalence 0.5,
;;=> :recall 0.25,
;;=> :selectivity 0.5,
;;=> :sensitivity 0.25,
;;=> :specificity 0.5,
;;=> :tn 2,
;;=> :tnr 0.5,
;;=> :total 8.0,
;;=> :tp 1,
;;=> :tpr 0.25}
Treat :a and :b as true values.
(binary-measures-all [:a :b :c :d :e :f :a :b]
[:a :b :a :b :a :f :d :b]
{:a true, :b true, :c false})
;;=> {:accuracy 0.5,
;;=> :bm 0.0,
;;=> :cn 4.0,
;;=> :cp 4.0,
;;=> :dor 1.0,
;;=> :f-beta #,
;;=> :f-measure 0.6,
;;=> :f1-score 0.6,
;;=> :fall-out 0.75,
;;=> :fdr 0.5,
;;=> :fn 1,
;;=> :fnr 0.25,
;;=> :for 0.5,
;;=> :fp 3,
;;=> :fpr 0.75,
;;=> :hit-rate 0.75,
;;=> :lr+ 1.0,
;;=> :lr- 1.0,
;;=> :mcc 0.0,
;;=> :miss-rate 0.25,
;;=> :mk 0.0,
;;=> :npv 0.5,
;;=> :pcn 2.0,
;;=> :pcp 6.0,
;;=> :ppv 0.5,
;;=> :precision 0.5,
;;=> :prevalence 0.5,
;;=> :recall 0.75,
;;=> :selectivity 0.25,
;;=> :sensitivity 0.75,
;;=> :specificity 0.25,
;;=> :tn 1,
;;=> :tnr 0.25,
;;=> :total 8.0,
;;=> :tp 3,
;;=> :tpr 0.75}
F-beta is a function. When beta is equal to 1.0, you get the f1-score.
(let [fbeta (:f-beta (binary-measures-all
[true false true false true false true false]
[true false false true false false false true]))]
[(fbeta 1.0) (fbeta 2.0) (fbeta 0.5)])
;;=> [0.28571428571428575 0.7142857142857144 0.1785714285714286]
bootstrap
(bootstrap vs)
(bootstrap vs samples)
(bootstrap vs samples size)
Generate a set of samples of the given size from the provided data, drawing with replacement.
samples defaults to 50 and size defaults to 1000.
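Resampling with replacement takes only a few lines of plain Clojure. This sketch mirrors the documented defaults, but it uses clojure.core's `rand-nth` rather than fastmath's random generator, so its draws will not match the outputs below. `bootstrap-sketch` is a hypothetical name.

```clojure
(defn bootstrap-sketch
  "Hypothetical helper: `samples` resamples of `size` values each, drawn
  uniformly with replacement from vs (returned as doubles)."
  ([vs] (bootstrap-sketch vs 50 1000))
  ([vs samples] (bootstrap-sketch vs samples 1000))
  ([vs samples size]
   (let [data (vec vs)]
     (repeatedly samples
                 (fn [] (repeatedly size #(double (rand-nth data))))))))

(bootstrap-sketch [1 2 3 4 1 2 3 1 2 1] 2 20)
```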
Examples
Usage
(bootstrap [1 2 3 4 1 2 3 1 2 1] 2 20)
;;=> ((1.0
;;=> 1.0
;;=> 2.0
;;=> 1.0
;;=> 4.0
;;=> 3.0
;;=> 2.0
;;=> 2.0
;;=> 1.0
;;=> 4.0
;;=> 2.0
;;=> 1.0
;;=> 4.0
;;=> 2.0
;;=> 1.0
;;=> 3.0
;;=> 1.0
;;=> 3.0
;;=> 3.0
;;=> 2.0)
;;=> (1.0
;;=> 4.0
;;=> 1.0
;;=> 2.0
;;=> 1.0
;;=> 3.0
;;=> 2.0
;;=> 1.0
;;=> 1.0
;;=> 1.0
;;=> 1.0
;;=> 2.0
;;=> 2.0
;;=> 1.0
;;=> 2.0
;;=> 3.0
;;=> 1.0
;;=> 1.0
;;=> 2.0
;;=> 3.0))
(let [data [1 2 3 4 1 2 3 1 2 1]
fdata (frequencies data)
bdata (bootstrap data 5 1000)]
{:source fdata, :bootstrapped (map frequencies bdata)})
;;=> {:bootstrapped ({1.0 413, 2.0 294, 3.0 204, 4.0 89}
;;=> {1.0 415, 2.0 285, 3.0 192, 4.0 108}
;;=> {1.0 406, 2.0 297, 3.0 189, 4.0 108}
;;=> {1.0 419, 2.0 300, 3.0 199, 4.0 82}
;;=> {1.0 401, 2.0 322, 3.0 196, 4.0 81}),
;;=> :source {1 4, 2 3, 3 2, 4 1}}
bootstrap-ci
(bootstrap-ci vs)
(bootstrap-ci vs alpha)
(bootstrap-ci vs alpha samples)
(bootstrap-ci vs alpha samples stat-fn)
Bootstrap method to calculate a confidence interval.
alpha defaults to 0.98 and samples to 1000. The last parameter, stat-fn, is the statistic to measure (default: mean).
Returns the confidence interval bounds followed by the value of stat-fn on vs.
Examples
Usage
(bootstrap-ci [-5 1 1 1 1 2 2 5 11 71])
;;=> [-5.796000000000005 17.8 9.0]
(bootstrap-ci [-5 1 1 1 1 2 2 5 11 71] 0.8)
;;=> [2.5999999999999996 15.280000000000005 9.0]
(bootstrap-ci [-5 1 1 1 1 2 2 5 11 71] 0.8 100000)
;;=> [2.5999999999999996 15.7 9.0]
(bootstrap-ci [-5 1 1 1 1 2 2 5 11 71] 0.98 1000 median)
;;=> [-5.0 2.0 1.5]
ci
(ci vs)
(ci vs alpha)
Student's t-based confidence interval for the given data. alpha is the confidence level and defaults to 0.98.
Returns [lower upper mean].
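Here is a minimal sketch of the Student's t interval in plain Clojure: mean -/+ t-crit x sd / sqrt(n). clojure.core has no t-distribution quantile, so the hypothetical `ci-sketch` takes the critical value as an argument. For a two-sided 98% interval on 10 values, that value is the 0.99 quantile of Student's t with 9 degrees of freedom (about 2.821437925). Under these assumptions the sketch agrees with the first `ci` example to about four decimal places.

```clojure
(defn ci-sketch
  "Hypothetical helper: [lower upper mean] = mean -/+ t-crit * sd / sqrt(n).
  t-crit is supplied by the caller."
  [vs t-crit]
  (let [n        (count vs)
        mean     (/ (reduce + vs) (double n))
        variance (/ (reduce + (map #(let [d (- % mean)] (* d d)) vs)) (dec n))
        half     (* t-crit (Math/sqrt (/ variance (double n))))]
    [(- mean half) (+ mean half) mean]))

;; 0.99 quantile of Student's t with 9 degrees of freedom
(ci-sketch [-5 1 1 1 1 2 2 5 11 71] 2.821437925)
```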
Examples
Usage
(ci [-5 1 1 1 1 2 2 5 11 71])
;;=> [-10.759020390886263 28.759020390886263 9.0]
(ci [-5 1 1 1 1 2 2 5 11 71] 0.8)
;;=> [-0.6855907410547175 18.685590741054718 9.0]
cliffs-delta
(cliffs-delta group1 group2)
Cliff’s delta effect size for two groups.
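Cliff's delta compares every pair of values across the two groups: it is (count of x > y minus count of x < y) divided by the number of pairs. The plain Clojure sketch below uses that textbook definition; `cliffs-delta-sketch` is a hypothetical helper. On the example data it gives (17 - 32) / 60 = -0.25, which agrees with the output below.

```clojure
(defn cliffs-delta-sketch
  "Hypothetical helper: (count x > y minus count x < y) / (n1 * n2)."
  [group1 group2]
  (let [signs (for [x group1, y group2]
                (cond (> x y) 1, (< x y) -1, :else 0))]
    (/ (reduce + signs) (double (* (count group1) (count group2))))))

(cliffs-delta-sketch [10 10 20 20 20 30 30 30 40 50] [10 20 30 40 40 50])
;; => -0.25
```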
Examples
Usage
(let [t [10 10 20 20 20 30 30 30 40 50]
c [10 20 30 40 40 50]]
(cliffs-delta t c))
;;=> -0.25
cohens-d
(cohens-d group1 group2)
Cohen’s d effect size for two groups.
Examples
Usage
(let [t [10 10 20 20 20 30 30 30 40 50]
c [10 20 30 40 40 50]]
(cohens-d t c))
;;=> -0.42090943320131763
cohens-d-orig
(cohens-d-orig group1 group2)
Original version of Cohen’s d effect size for two groups.
Examples
Usage
(let [t [10 10 20 20 20 30 30 30 40 50]
c [10 20 30 40 40 50]]
(cohens-d-orig t c))
;;=> -0.39372472247513574
correlation
(correlation vs1 vs2)
Correlation of two sequences.
Examples
Correlation of uniform and gaussian distribution samples.
(correlation (repeatedly 100000 (partial r/grand 1.0 10.0))
(repeatedly 100000 (partial r/drand -10.0 -5.0)))
;;=> 0.004322084824287007
covariance
(covariance vs1 vs2)
Covariance of two sequences.
Examples
Covariance of uniform and gaussian distribution samples.
(covariance (repeatedly 100000 (partial r/grand 1.0 10.0))
(repeatedly 100000 (partial r/drand -10.0 -5.0)))
;;=> 0.02860827181203021
covariance-matrix
(covariance-matrix vss)
Generate a covariance matrix from a seq of seqs, where each inner seq (row) is one variable.
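A covariance matrix built from pairwise sample covariances (n - 1 denominator) can be sketched in plain Clojure. `covariance-sketch` and `covariance-matrix-sketch` are hypothetical helpers. The n - 1 denominator is an assumption, but it reproduces the usage example below, for instance 12.6667 for the variance of the first row.

```clojure
(defn covariance-sketch
  "Hypothetical helper: sample covariance of xs and ys (n - 1 denominator)."
  [xs ys]
  (let [n  (count xs)
        mx (/ (reduce + xs) (double n))
        my (/ (reduce + ys) (double n))]
    (/ (reduce + (map #(* (- %1 mx) (- %2 my)) xs ys)) (dec n))))

(defn covariance-matrix-sketch
  "Hypothetical helper: entry [i j] is the covariance of rows i and j."
  [vss]
  (for [a vss] (mapv #(covariance-sketch a %) vss)))

(covariance-matrix-sketch [[1 2 3 4 5 11] [3 2 3 2 3 4]])
```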
Examples
Usage
(covariance-matrix [[1 2 3 4 5 11] [3 2 3 2 3 4]])
;;=> ([12.666666666666668 1.8666666666666667]
;;=> [1.8666666666666667 0.5666666666666667])
demean
(demean vs)
Subtract the mean from a sequence.
Examples
Usage
(demean [-5 1 1 1 1 2 2 5 11 71])
;;=> (-14.0 -8.0 -8.0 -8.0 -8.0 -7.0 -7.0 -4.0 2.0 62.0)
estimate-bins
(estimate-bins vs)
(estimate-bins vs bins-or-estimate-method)
Estimate the number of bins for a histogram.
Possible methods are: :sqrt :sturges :rice :doane :scott :freedman-diaconis (default).
Examples
Estimate number of bins for various methods.
vs contains 1000 random samples from a Log-Normal distribution.
(estimate-bins vs :sqrt)
;;=> 31
(estimate-bins vs :sturges)
;;=> 11
(estimate-bins vs :rice)
;;=> 20
(estimate-bins vs :doane)
;;=> 16
(estimate-bins vs :scott)
;;=> 34
(estimate-bins vs :freedman-diaconis)
;;=> 81
estimation-strategies-list
Map of the available estimation strategies for percentile and quantile calculations.
Examples
List of estimation strategies for percentile
(sort (keys estimation-strategies-list))
;;=> (:legacy :r1 :r2 :r3 :r4 :r5 :r6 :r7 :r8 :r9)
extent
(extent vs)
Return the extent (min, max, mean) of a sequence.
Examples
min/max and mean of gaussian distribution
(extent (repeatedly 100000 r/grand))
;;=> [-4.433795945695126 4.1230995512178925 -0.0030623577039036527]
glass-delta
(glass-delta group1 group2)
Glass’s delta effect size for two groups.
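Glass's delta scales the difference of means by the standard deviation of the second (control) group only. The plain Clojure sketch below follows that textbook definition with the sample (n - 1) standard deviation. `glass-delta-sketch` is hypothetical, and it reproduces the -0.38497... value in the usage example below.

```clojure
(defn glass-delta-sketch
  "Hypothetical helper: (mean group1 - mean group2) / sd(group2),
  using the sample standard deviation of the second group."
  [group1 group2]
  (let [mean (fn [vs] (/ (reduce + vs) (double (count vs))))
        m2   (mean group2)
        sd2  (Math/sqrt (/ (reduce + (map #(let [d (- % m2)] (* d d)) group2))
                           (double (dec (count group2)))))]
    (/ (- (mean group1) m2) sd2)))

(glass-delta-sketch [10 10 20 20 20 30 30 30 40 50] [10 20 30 40 40 50])
```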
Examples
Usage
(let [t [10 10 20 20 20 30 30 30 40 50]
c [10 20 30 40 40 50]]
(glass-delta t c))
;;=> -0.3849741916091626
hedges-g
(hedges-g group1 group2)
Hedges’s g effect size for two groups.
Examples
Usage
(let [t [10 10 20 20 20 30 30 30 40 50]
c [10 20 30 40 40 50]]
(hedges-g t c))
;;=> -0.3907787841275092
hedges-g*
(hedges-g* group1 group2)
Less biased Hedges’s g effect size for two groups.
Examples
Usage
(let [t [10 10 20 20 20 30 30 30 40 50]
c [10 20 30 40 40 50]]
(hedges-g* t c))
;;=> -0.36946357772055416
histogram
(histogram vs)
(histogram vs bins-or-estimate-method)
(histogram vs bins [mn mx])
Calculate a histogram.
Returns a map with keys:
- :size - number of bins
- :step - width of each bin
- :bins - list of pairs: lower bound of the bin and number of hits
- :min - min value
- :max - max value
- :samples - number of samples used
For estimation methods, see estimate-bins.
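A minimal equal-width histogram matching the map layout above can be written in plain Clojure. `histogram-sketch` is hypothetical. Its boundary handling is our assumption: values outside [mn, mx] are dropped, and the maximum is counted in the last bin. The step (mx - mn) / bins agrees with the :step values in the examples below.

```clojure
(defn histogram-sketch
  "Hypothetical helper: equal-width histogram with `bins` bins over [mn, mx]."
  [vs bins mn mx]
  (let [step (/ (- mx mn) bins)
        in   (filter #(<= mn % mx) vs)                 ; samples inside the range
        idx  (fn [v] (min (dec bins)                   ; the maximum goes to the last bin
                          (long (Math/floor (/ (- v mn) step)))))
        hits (frequencies (map idx in))]
    {:size    bins
     :step    step
     :min     mn
     :max     mx
     :samples (count in)
     :bins    (for [i (range bins)] [(+ mn (* i step)) (get hits i 0)])}))

(histogram-sketch [0.0 0.1 0.2 0.5 0.9 1.0] 2 0.0 1.0)
```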
Examples
3 bins from uniform distribution.
(histogram (repeatedly 1000 rand) 3)
;;=> {:bins ([3.852964808315207E-5 333]
;;=> [0.3324743943078295 344]
;;=> [0.6649102589675758 323]),
;;=> :max 0.9973461236273221,
;;=> :min 3.852964808315207E-5,
;;=> :samples 1000,
;;=> :size 3,
;;=> :step 0.33243586465974634}
3 bins from uniform distribution for given range.
(histogram (repeatedly 10000 rand) 3 [0.1 0.5])
;;=> {:bins ([0.1 1325] [0.2333333333333334 1305] [0.3666666666666668 1362]),
;;=> :max 0.5000000000000001,
;;=> :min 0.1,
;;=> :samples 3992,
;;=> :size 3,
;;=> :step 0.1333333333333334}
5 bins from normal distribution.
(histogram (repeatedly 10000 r/grand) 5)
;;=> {:bins ([-3.6207687808508426 156]
;;=> [-2.189509070713405 2083]
;;=> [-0.7582493605759675 5213]
;;=> [0.6730103495614701 2348]
;;=> [2.1042700596989077 200]),
;;=> :max 3.535529769836345,
;;=> :min -3.6207687808508426,
;;=> :samples 10000,
;;=> :size 5,
;;=> :step 1.4312597101374376}
Estimate number of bins
(:size (histogram (repeatedly 10000 r/grand)))
;;=> 63
Estimate number of bins, Rice rule
(:size (histogram (repeatedly 10000 r/grand) :rice))
;;=> 44
iqr
(iqr vs)
(iqr vs estimation-strategy)
Interquartile range.
Examples
IQR
(iqr (repeatedly 100000 r/grand))
;;=> 1.3440848153319385
jensen-shannon-divergence
(jensen-shannon-divergence vs1 vs2)
Jensen-Shannon divergence of two sequences.
Examples
Jensen-Shannon divergence
(jensen-shannon-divergence (repeatedly 100 (fn* [] (r/irand 100)))
(repeatedly 100 (fn* [] (r/irand 100))))
;;=> 569.0495492365783
kendall-correlation
(kendall-correlation vs1 vs2)
Kendall’s correlation of two sequences.
Examples
Kendall’s correlation of uniform and gaussian distribution samples.
(kendall-correlation (repeatedly 100000 (partial r/grand 1.0 10.0))
(repeatedly 100000 (partial r/drand -10.0 -5.0)))
;;=> 0.0013475046750467505
kullback-leibler-divergence
(kullback-leibler-divergence vs1 vs2)
Kullback-Leibler divergence of two sequences.
Examples
Kullback-Leibler divergence.
(kullback-leibler-divergence (repeatedly 100 (fn* [] (r/irand 100)))
(repeatedly 100 (fn* [] (r/irand 100))))
;;=> 2974.653698088623
kurtosis
(kurtosis vs)
Calculate kurtosis of a sequence.
Examples
Kurtosis
(kurtosis [1 2 3 -1 -1 2 -1 11 111])
;;=> 8.732515263272099
mad-extent
(mad-extent vs)
Median -/+ median-absolute-deviation, followed by the median: [(- median MAD) (+ median MAD) median].
Examples
median absolute deviation from median for gaussian distribution
(mad-extent (repeatedly 100000 r/grand))
;;=> [-0.6674624646061628 0.6788112117843635 0.005674373589100425]
maximum
(maximum vs)
Maximum value from a sequence.
Examples
Maximum value
(maximum [1 2 3 -1 -1 2 -1 11 111])
;;=> 111.0
mean
(mean vs)
Calculate the mean of vs.
Examples
Mean (average value)
(mean [1 2 3 -1 -1 2 -1 11 111])
;;=> 14.111111111111109
median
(median vs)
Calculate the median of vs. See median-3.
Examples
Median (percentile 50%).
(median [1 2 3 -1 -1 2 -1 11 111])
;;=> 2.0
For three elements, use the faster median-3.
(median [7 1 4])
;;=> 4.0
median-3
(median-3 a b c)
Median of three values. See median.
Examples
Median of [7 1 4]
(median-3 7 1 4)
;;=> 4.0
median-absolute-deviation
(median-absolute-deviation vs)
Calculate MAD, the median of the absolute deviations from the median.
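MAD takes the median twice: first of the data, then of the absolute deviations from that median. `median-sketch` and `mad-sketch` are hypothetical plain Clojure helpers. For even-length input they average the two middle values, which is our assumption; fastmath's percentile-based median may interpolate differently. On the example data they give a median of 2.0 and a MAD of 3.0.

```clojure
(defn median-sketch
  "Hypothetical helper: middle value, or mean of the two middle values."
  [vs]
  (let [s (vec (sort vs)), n (count s), h (quot n 2)]
    (double (if (odd? n) (s h) (/ (+ (s (dec h)) (s h)) 2.0)))))

(defn mad-sketch
  "Hypothetical helper: median of the absolute deviations from the median."
  [vs]
  (let [m (median-sketch vs)]
    (median-sketch (map #(Math/abs (- (double %) m)) vs))))

(mad-sketch [1 2 3 -1 -1 2 -1 11 111])
;; => 3.0
```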
Examples
MAD
(median-absolute-deviation [1 2 3 -1 -1 2 -1 11 111])
;;=> 3.0
minimum
(minimum vs)
Minimum value from a sequence.
Examples
Minimum value
(minimum [1 2 3 -1 -1 2 -1 11 111])
;;=> -1.0
mode
(mode vs)
Find the value that appears most often in the dataset vs.
See also modes.
Examples
Example
(mode [1 2 3 -1 -1 2 -1 11 111])
;;=> -1.0
Returns the lowest value when every element appears equally often.
(mode [5 1 2 3 4])
;;=> 1.0
modes
(modes vs)
Find the values that appear most often in the dataset vs.
Returns a sequence of all the most frequent values, in increasing order.
See also mode.
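Both functions can be sketched in plain Clojure with `frequencies`. `modes-sketch` keeps every value whose count equals the maximum, and `mode-sketch` takes the lowest of them. Both helpers are hypothetical, and unlike fastmath they return the input values as-is rather than as doubles.

```clojure
(defn modes-sketch
  "Hypothetical helper: all most frequent values, in increasing order."
  [vs]
  (let [freqs (frequencies vs)
        top   (apply max (vals freqs))]
    (sort (for [[v c] freqs :when (= c top)] v))))

(defn mode-sketch
  "Hypothetical helper: the lowest of the most frequent values."
  [vs]
  (first (modes-sketch vs)))

(modes-sketch [5 5 1 1 2 3 4 4])
;; => (1 4 5)
```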
Examples
Example
(modes [1 2 3 -1 -1 2 -1 11 111])
;;=> (-1.0)
Returns all of the most frequent values when several are tied.
(modes [5 5 1 1 2 3 4 4])
;;=> (1.0 4.0 5.0)
moment
(moment vs)
(moment vs order)
(moment vs order {:keys [absolute? center mean?], :or {absolute? false, center nil, mean? true}})
Calculate the moment (central and/or absolute) of the given order (default: 2).
Additional parameters as a map:
- :absolute? - sum the absolute values of the differences (default: false)
- :mean? - return the mean (a proper moment) or just the sum of the differences (default: true)
- :center - value to center on (default: nil, which means the mean)
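The options above combine straightforwardly. Take deviations from the center, optionally make them absolute, raise them to the order, then average or sum. `moment-sketch` is a hypothetical plain Clojure reading of that description, not fastmath's source. It reproduces the example values below, for instance 35.36 for the default second moment and 142.4 for the third moment about 0.

```clojure
(defn moment-sketch
  "Hypothetical helper following the documented options."
  ([vs] (moment-sketch vs 2.0 {}))
  ([vs order] (moment-sketch vs order {}))
  ([vs order {:keys [absolute? center mean?] :or {absolute? false, mean? true}}]
   (let [n      (count vs)
         center (or center (/ (reduce + vs) (double n)))   ; default center: the mean
         terms  (map (fn [v]
                       (let [d (double (- v center))]
                         (Math/pow (if absolute? (Math/abs d) d) (double order))))
                     vs)
         total  (reduce + terms)]
     (if mean? (/ total n) total))))                      ; mean or plain sum

(moment-sketch [3 7 5 9 -8] 3.0 {:center 0.0})
```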
Examples
Usage
(moment [3 7 5 9 -8])
;;=> 35.36
(moment [3 7 5 9 -8] 1.0)
;;=> 0.0
(moment [3 7 5 9 -8] 4.0)
;;=> 3417.171199999999
(moment [3 7 5 9 -8] 3.0)
;;=> -229.82399999999993
(moment [3 7 5 9 -8] 3.0 {:center 0.0})
;;=> 142.4
(moment [3 7 5 9 -8] 3.0 {:mean? false})
;;=> -1149.1199999999997
(moment [3 7 5 9 -8] 3.0 {:absolute? true})
;;=> 332.15039999999993
(moment [3 7 5 9 -8] 3.0 {:center -3.0})
;;=> 666.2
(moment [3 7 5 9 -8] 0.5 {:absolute? true})
;;=> 1.8986344545712772
outliers
(outliers vs)
(outliers vs estimation-strategy)
(outliers vs q1 q3)
Find outliers, defined as values outside the inner fences.
Let Q1 be the 25th percentile and Q3 the 75th percentile; IQR is (- Q3 Q1).
- LIF (lower inner fence) equals (- Q1 (* 1.5 IQR)).
- UIF (upper inner fence) equals (+ Q3 (* 1.5 IQR)).
Returns a sequence.
The optional estimation-strategy argument changes how quantiles are estimated; see estimation-strategies-list.
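Given the quartiles, filtering by the 1.5 x IQR fences given above takes only a few lines of plain Clojure. `outliers-sketch` is hypothetical and takes Q1 and Q3 directly. The values -1.0 and 7.0 are the quartiles stats-map reports for this data.

```clojure
(defn outliers-sketch
  "Hypothetical helper: values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."
  [vs q1 q3]
  (let [iqr (- q3 q1)
        lif (- q1 (* 1.5 iqr))     ; lower fence
        uif (+ q3 (* 1.5 iqr))]    ; upper fence
    (filter #(or (< % lif) (> % uif)) vs)))

(outliers-sketch [1 2 3 -1 -1 2 -1 11 111] -1.0 7.0)
;; => (111)
```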
Examples
Outliers
(outliers [1 2 3 -1 -1 2 -1 11 111])
;;=> (111.0)
Gaussian distribution outliers
(count (outliers (repeatedly 3000000 r/grand)))
;;=> 20845
pacf
(pacf data)
(pacf data lags)
Partial autocorrelation function (PACF). The value at lag 0 is reported as 0.0.
Examples
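One standard way to get partial autocorrelations is the Durbin-Levinson recursion applied to the sample ACF. We assume, without confirmation from the source, that fastmath's method is equivalent. The hypothetical `pacf-sketch` below reproduces the `[1 2 3 4 5 4 3 2 1]` example for lags 1 to 3 and mimics its leading 0.0 at lag 0.

```clojure
(defn pacf-sketch
  "Hypothetical helper: PACF for lags 0..max-lag via Durbin-Levinson."
  [xs max-lag]
  (let [xs    (vec xs)
        m     (/ (reduce + xs) (double (count xs)))
        d     (mapv #(- % m) xs)
        denom (reduce + (map * d d))
        r     (fn [k] (/ (reduce + (map * d (drop k d))) denom))] ; sample ACF at lag k
    ;; phi holds phi_(k-1,1) ... phi_(k-1,k-1), stored 0-based
    (loop [k 1, phi [], out [0.0]]
      (if (> k max-lag)
        out
        (let [numer (- (r k) (reduce + (map-indexed (fn [j p] (* p (r (- k j 1)))) phi)))
              den   (- 1.0 (reduce + (map-indexed (fn [j p] (* p (r (inc j)))) phi)))
              pkk   (/ numer den)
              ;; phi_(k,j) = phi_(k-1,j) - phi_(k,k) * phi_(k-1,k-j)
              phi   (conj (mapv (fn [j] (- (phi j) (* pkk (phi (- k j 2)))))
                                (range (dec k)))
                          pkk)]
          (recur (inc k) phi (conj out pkk)))))))

(pacf-sketch [1 2 3 4 5 4 3 2 1] 3)
```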
Usage
(pacf (repeatedly 1000 r/grand) 10)
;;=> (0.0
;;=> -0.026736190605719863
;;=> 0.010877194279486118
;;=> 0.032203194789526435
;;=> -0.035726254389457764
;;=> -0.002736273319801345
;;=> -0.05322100355959638
;;=> 0.016135531529402884
;;=> 0.008998546289145027
;;=> -0.0614874305488179
;;=> 0.010613174279058944)
(pacf [1 2 3 4 5 4 3 2 1])
;;=> (0.0
;;=> 0.5396825396825397
;;=> -0.4299857803057234
;;=> -0.388084834596935
;;=> -0.2792571208141194
;;=> 0.17585056996358742
;;=> -0.2652225487589841
;;=> -0.17978918763554708
;;=> -0.10771973872263883)
pacf-ci
(pacf-ci data lags)
(pacf-ci data lags alpha)
pacf with added confidence interval data.
Examples
Usage
(pacf-ci (repeatedly 1000 r/grand) 3)
;;=> {:ci 0.06197950323045615,
;;=> :pacf
;;=> (0.0 -0.003069469436674989 0.019682898818580288 0.008559579552033444)}
(pacf-ci [1 2 3 4 5 4 3 2 1] 3)
;;=> {:ci 0.653321328180018,
;;=> :pacf (0.0 0.5396825396825397 -0.4299857803057234 -0.388084834596935)}
pearson-correlation
(pearson-correlation vs1 vs2)
Pearson’s correlation of two sequences.
Examples
Pearson’s correlation of uniform and gaussian distribution samples.
(pearson-correlation (repeatedly 100000 (partial r/grand 1.0 10.0))
(repeatedly 100000 (partial r/drand -10.0 -5.0)))
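;;=> -0.0013362357862912368
percentile
(percentile vs p)
(percentile vs p estimation-strategy)
Calculate the p-th percentile of vs, where p is from the range 0-100. The optional estimation-strategy selects the interpolation method; see estimation-strategies-list.
Examples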
Percentile 25%
(percentile [1 2 3 -1 -1 2 -1 11 111] 25.0)
;;=> -1.0
Percentile 50% (median)
(percentile [1 2 3 -1 -1 2 -1 11 111] 50.0)
;;=> 2.0
Percentile 75%
(percentile [1 2 3 -1 -1 2 -1 11 111] 75.0)
;;=> 7.0
Percentile 90%
(percentile [1 2 3 -1 -1 2 -1 11 111] 90.0)
;;=> 111.0
Various estimation strategies.
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :legacy)
;;=> 61.0
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :r1)
;;=> 11.0
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :r2)
;;=> 11.0
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :r3)
;;=> 11.0
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :r4)
;;=> 8.199999999999996
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :r5)
;;=> 25.999999999999858
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :r6)
;;=> 61.0
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :r7)
;;=> 9.399999999999999
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :r8)
;;=> 37.66666666666675
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :r9)
;;=> 34.75000000000007
percentile-extent
(percentile-extent vs)
(percentile-extent vs p)
(percentile-extent vs p1 p2)
(percentile-extent vs p1 p2 estimation-strategy)
Return a percentile range and the median.
With a single p, the range spans the p-th to the (100-p)-th percentile (default: p = 25); with p1 and p2, it spans the p1-th to the p2-th percentile.
Examples
for samples from gaussian distribution
(percentile-extent (repeatedly 100000 r/grand))
;;=> [-0.6757379614999778 0.6743169986727071 -6.525193090059774E-4]
(percentile-extent (repeatedly 100000 r/grand) 10)
;;=> [-1.2857354257915978 1.2809543542935327 -0.005038343134417929]
(percentile-extent (repeatedly 100000 r/grand) 30 70)
;;=> [-0.5245075255533502 0.522954632601382 -6.971662739497372E-4]
percentiles
(percentiles vs ps)
(percentiles vs ps estimation-strategy)
Calculate the percentiles of vs for every p in ps.
Examples
Usage
(percentiles [1 2 3 -1 -1 2 -1 11 111] [25 50 75 90])
;;=> [-1.0 2.0 7.0 111.0]
population-stddev
(population-stddev vs)
(population-stddev vs u)
Calculate the population standard deviation of vs.
See stddev.
Examples
Population standard deviation.
(population-stddev [1 2 3 -1 -1 2 -1 11 111])
;;=> 34.4333315406403
population-variance
(population-variance vs)
(population-variance vs u)
Calculate the population variance of vs.
See variance.
Examples
Population variance
(population-variance [1 2 3 -1 -1 2 -1 11 111])
;;=> 1185.6543209876543
quantile
(quantile vs q)
(quantile vs q estimation-strategy)
Calculate the quantile q of vs, where q is from the range 0.0-1.0.
Optionally, you can provide estimation-strategy to change the interpolation method used to select values. The default is :legacy; see estimation-strategies-list.
See also percentile.
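A sketch of one interpolation rule: place q at the 1-based position q(n+1) in the sorted data, interpolate linearly between neighbours, and clamp to the minimum and maximum at the ends. Treating this as what :legacy does is our assumption. However, it reproduces the default outputs below (for example 7.0 for q = 0.75) and 611.0 for `[1 11 111 1111]` at 0.7. `quantile-legacy-sketch` is a hypothetical name.

```clojure
(defn quantile-legacy-sketch
  "Hypothetical helper: quantile q (0.0-1.0) at position q(n+1) with
  linear interpolation, clamped to the minimum and maximum."
  [vs q]
  (let [xs   (vec (sort vs))
        n    (count xs)
        pos  (* q (inc n))               ; 1-based position
        lo   (long (Math/floor pos))
        frac (- pos lo)]
    (cond
      (< pos 1.0) (double (xs 0))
      (>= pos n)  (double (xs (dec n)))
      :else       (let [a (xs (dec lo))  ; value at position lo
                        b (xs lo)]       ; value at position lo + 1
                    (+ a (* frac (- b a)))))))

(quantile-legacy-sketch [1 11 111 1111] 0.7)
```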
Examples
Quantile 0.25
(quantile [1 2 3 -1 -1 2 -1 11 111] 0.25)
;;=> -1.0
Quantile 0.5 (median)
(quantile [1 2 3 -1 -1 2 -1 11 111] 0.5)
;;=> 2.0
Quantile 0.75
(quantile [1 2 3 -1 -1 2 -1 11 111] 0.75)
;;=> 7.0
Quantile 0.9
(quantile [1 2 3 -1 -1 2 -1 11 111] 0.9)
;;=> 111.0
Various estimation strategies.
(quantile [1 11 111 1111] 0.7 :legacy)
;;=> 611.0
(quantile [1 11 111 1111] 0.7 :r1)
;;=> 111.0
(quantile [1 11 111 1111] 0.7 :r2)
;;=> 111.0
(quantile [1 11 111 1111] 0.7 :r3)
;;=> 111.0
(quantile [1 11 111 1111] 0.7 :r4)
;;=> 90.99999999999999
(quantile [1 11 111 1111] 0.7 :r5)
;;=> 410.99999999999983
(quantile [1 11 111 1111] 0.7 :r6)
;;=> 611.0
(quantile [1 11 111 1111] 0.7 :r7)
;;=> 210.99999999999966
(quantile [1 11 111 1111] 0.7 :r8)
;;=> 477.66666666666623
(quantile [1 11 111 1111] 0.7 :r9)
;;=> 460.99999999999966
quantiles
(quantiles vs qs)
(quantiles vs qs estimation-strategy)
Calculate the quantiles of vs.
qs is a sequence of values from the range 0.0-1.0.
Optionally, you can provide estimation-strategy to change the interpolation method used to select values. The default is :legacy; see estimation-strategies-list.
See also percentiles.
Examples
Usage
(quantiles [1 2 3 -1 -1 2 -1 11 111] [0.25 0.5 0.75 0.9])
;;=> [-1.0 2.0 7.0 111.0]
sem
(sem vs)
Standard error of the mean.
Examples
SEM
(sem [1 2 3 -1 -1 2 -1 11 111])
;;=> 12.174021115615695
sem-extent
(sem-extent vs)
Mean -/+ standard error of the mean, followed by the mean: [(- mean SEM) (+ mean SEM) mean].
Examples
standard error of mean and mean for gaussian distribution
(sem-extent (repeatedly 100000 r/grand))
;;=> [-0.0041852769369816285 0.002121789728373999 -0.001031743604303815]
skewness
(skewness vs)
Calculate skewness of a sequence.
Examples
Skewness
(skewness [1 2 3 -1 -1 2 -1 11 111])
;;=> 2.94268445417954
spearman-correlation
(spearman-correlation vs1 vs2)
Spearman’s correlation of two sequences.
Examples
Spearman’s correlation of uniform and gaussian distribution samples.
(spearman-correlation (repeatedly 100000 (partial r/grand 1.0 10.0))
(repeatedly 100000 (partial r/drand -10.0 -5.0)))
;;=> -2.2283008578278884E-4
standardize
(standardize vs)
Normalize samples to have mean = 0 and stddev = 1.
Examples
Standardize
(standardize [1 2 3 -1 -1 2 -1 11 111])
;;=> (-0.3589915220998317
;;=> -0.33161081278713267
;;=> -0.30423010347443363
;;=> -0.4137529407252298
;;=> -0.4137529407252298
;;=> -0.33161081278713267
;;=> -0.4137529407252298
;;=> -0.08518442897284138
;;=> 2.652886502297062)
stats-map
(stats-map vs)
(stats-map vs estimation-strategy)
Calculate several statistics of vs and return them as a map.
The optional estimation-strategy argument changes how quantiles are estimated; see estimation-strategies-list.
Examples
Stats
(stats-map [1 2 3 -1 -1 2 -1 11 111])
;;=> {:IQR 8.0,
;;=> :Kurtosis 8.732515263272099,
;;=> :LAV -1.0,
;;=> :LIF -13.0,
;;=> :LOF -25.0,
;;=> :MAD 3.0,
;;=> :Max 111.0,
;;=> :Mean 14.11111111111111,
;;=> :Median 2.0,
;;=> :Min -1.0,
;;=> :Mode -1.0,
;;=> :Outliers (111.0),
;;=> :Q1 -1.0,
;;=> :Q3 7.0,
;;=> :Range 112.0,
;;=> :SD 36.522063346847084,
;;=> :SEM 12.174021115615695,
;;=> :Size 9,
;;=> :Skewness 2.94268445417954,
;;=> :Total 127.0,
;;=> :UAV 11.0,
;;=> :UIF 19.0,
;;=> :UOF 31.0,
;;=> :Variance 1333.8611111111113}
stddev
(stddev vs)
(stddev vs u)
Calculate the standard deviation of vs.
See population-stddev.
Examples
Standard deviation.
(stddev [1 2 3 -1 -1 2 -1 11 111])
;;=> 36.522063346847084
stddev-extent
(stddev-extent vs)
Mean -/+ standard deviation, followed by the mean: [(- mean SD) (+ mean SD) mean].
Examples
standard deviation from mean and mean for gaussian distribution
(stddev-extent (repeatedly 100000 r/grand))
;;=> [-1.0008742722153459 0.994022466044159 -0.0034259030855934678]
ttest-one-sample
(ttest-one-sample xs)
(ttest-one-sample xs {:keys [alpha sides mu], :or {alpha 0.05, sides :two-sided, mu 0.0}})
One-sample Student’s t-test.
- alpha - significance level (default: 0.05)
- sides - one of: :two-sided, :one-sided-less (short: :one-sided) or :one-sided-greater
- mu - hypothesized mean (default: 0.0)
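The test statistic itself is t = (mean - mu) / (sd / sqrt(n)) with n - 1 degrees of freedom. The hypothetical `one-sample-t-sketch` computes only that, since p-values and intervals need a t-distribution CDF that clojure.core does not provide. It reproduces the :t and :df values in the examples below, for instance 0.5222... with mu = 5.0.

```clojure
(defn one-sample-t-sketch
  "Hypothetical helper: t = (mean - mu) / (sd / sqrt n), df = n - 1."
  ([xs] (one-sample-t-sketch xs 0.0))
  ([xs mu]
   (let [n    (count xs)
         mean (/ (reduce + xs) (double n))
         sd   (Math/sqrt (/ (reduce + (map #(let [d (- % mean)] (* d d)) xs))
                            (double (dec n))))]
     {:estimated-mu mean
      :df           (dec n)
      :t            (/ (- mean mu) (/ sd (Math/sqrt (double n))))})))

(one-sample-t-sketch [1 2 3 4 5 6 7 8 9 10] 5.0)
```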
Examples
Usage
(ttest-one-sample [1 2 3 4 5 6 7 8 9 10])
;;=> {:confidence-intervals [3.3341494103317983 7.665850589668201],
;;=> :df 9,
;;=> :estimated-mu 5.5,
;;=> :p-value 2.781960110481859E-4,
;;=> :t 5.744562646538029,
;;=> :test-type :two-sided}
(ttest-one-sample [1 2 3 4 5 6 7 8 9 10] {:alpha 0.2})
;;=> {:confidence-intervals [4.175850795053416 6.824149204946584],
;;=> :df 9,
;;=> :estimated-mu 5.5,
;;=> :p-value 2.781960110481859E-4,
;;=> :t 5.744562646538029,
;;=> :test-type :two-sided}
(ttest-one-sample [1 2 3 4 5 6 7 8 9 10] {:sides :one-sided})
;;=> {:confidence-intervals [##-Inf 7.255072013309326],
;;=> :df 9,
;;=> :estimated-mu 5.5,
;;=> :p-value 0.9998609019944759,
;;=> :t 5.744562646538029,
;;=> :test-type :one-sided}
(ttest-one-sample [1 2 3 4 5 6 7 8 9 10] {:mu 5.0})
;;=> {:confidence-intervals [3.334149410331798 7.665850589668201],
;;=> :df 9,
;;=> :estimated-mu 5.5,
;;=> :p-value 0.6141172548083933,
;;=> :t 0.5222329678670935,
;;=> :test-type :two-sided}
ttest-two-samples
(ttest-two-samples xs ys)
(ttest-two-samples xs ys {:keys [alpha sides mu paired? equal-variances?], :or {alpha 0.05, sides :two-sided, mu 0.0, paired? false, equal-variances? false}, :as params})
Two-sample Student’s t-test.
- alpha - significance level (default: 0.05)
- sides - one of: :two-sided, :one-sided-less (short: :one-sided) or :one-sided-greater
- mu - mean (default: 0.0)
- paired? - unpaired or paired test, boolean (default: false)
- equal-variances? - unequal or equal variances, boolean (default: false)
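With the defaults (unpaired, unequal variances), the statistic is Welch's t. The degrees of freedom come from the Welch-Satterthwaite formula. `welch-t-sketch` is a hypothetical plain Clojure helper that assumes mu = 0.0 and computes only t and df. For the first example below it gives t of about -5.435 and df of about 21.98, matching the documented output.

```clojure
(defn welch-t-sketch
  "Hypothetical helper: unpaired, unequal-variance (Welch) t statistic
  and Welch-Satterthwaite degrees of freedom, assuming mu = 0."
  [xs ys]
  (let [stats      (fn [vs]
                     (let [n (count vs)
                           m (/ (reduce + vs) (double n))
                           v (/ (reduce + (map #(let [d (- % m)] (* d d)) vs))
                                (double (dec n)))]
                       [n m (/ v n)]))   ; size, mean, squared standard error
        [nx mx sx] (stats xs)
        [ny my sy] (stats ys)]
    {:estimated-mu [mx my]
     :t            (/ (- mx my) (Math/sqrt (+ sx sy)))
     :df           (/ (* (+ sx sy) (+ sx sy))
                      (+ (/ (* sx sx) (dec nx)) (/ (* sy sy) (dec ny))))}))

(welch-t-sketch [1 2 3 4 5 6 7 8 9 10] [7 8 9 10 11 12 13 14 15 16 17 18 19 20])
```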
Examples
Usage
(ttest-two-samples [1 2 3 4 5 6 7 8 9 10]
[7 8 9 10 11 12 13 14 15 16 17 18 19 20])
;;=> {:confidence-intervals [-11.052801725158163 -4.9471982748418375],
;;=> :df 21.982212340188994,
;;=> :estimated-mu [5.5 13.5],
;;=> :p-value 1.8552818325118146E-5,
;;=> :paired? false,
;;=> :t -5.4349297638940595,
;;=> :test-type :two-sided}
(ttest-two-samples [1 2 3 4 5 6 7 8 9 10]
[7 8 9 10 11 12 13 14 15 16 17 18 19 20 200])
;;=> {:confidence-intervals [-47.242899887102105 6.376233220435439],
;;=> :df 14.164598953012467,
;;=> :estimated-mu [5.5 25.93333333333333],
;;=> :p-value 0.12451349808974498,
;;=> :paired? false,
;;=> :t -1.632902633201205,
;;=> :test-type :two-sided}
(ttest-two-samples [1 2 3 4 5 6 7 8 9 10]
[7 8 9 10 11 12 13 14 15 16 17 18 19 20]
{:equal-variances? true})
;;=> {:confidence-intervals [-11.22324472988163 -4.77675527011837],
;;=> :df 22.0,
;;=> :estimated-mu [5.5 13.5],
;;=> :p-value 3.690577215911943E-5,
;;=> :paired? false,
;;=> :t -5.147292847304685,
;;=> :test-type :two-sided}
(ttest-two-samples [1 2 3 4 5 6 7 8 9 10]
[200 11 200 11 200 11 200 11 200 11]
{:paired? true})
;;=> {:confidence-intervals [-171.66671936335894 -28.333280636641092],
;;=> :df 9,
;;=> :estimated-mu -100.0,
;;=> :p-value 0.011615504295919215,
;;=> :paired? true,
;;=> :t -3.156496045715208,
;;=> :test-type :two-sided}
variance
(variance vs)
(variance vs u)
Calculate the variance of vs.
See population-variance.
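The sample and population variants differ only in the denominator: n - 1 for variance, stddev and sem, and n for the population versions. The hypothetical plain Clojure helpers below illustrate this. They reproduce the example values for `[1 2 3 -1 -1 2 -1 11 111]`: variance 1333.86..., population variance 1185.65..., and SEM 12.17....

```clojure
(defn sum-sq-dev
  "Hypothetical helper: sum of squared deviations from the mean."
  [vs]
  (let [m (/ (reduce + vs) (double (count vs)))]
    (reduce + (map #(let [d (- % m)] (* d d)) vs))))

(defn variance-sketch [vs]            ; sample variance, n - 1 denominator
  (/ (sum-sq-dev vs) (dec (count vs))))

(defn population-variance-sketch [vs] ; population variance, n denominator
  (/ (sum-sq-dev vs) (count vs)))

(defn sem-sketch [vs]                 ; sample standard deviation / sqrt(n)
  (Math/sqrt (/ (variance-sketch vs) (double (count vs)))))

(variance-sketch [1 2 3 -1 -1 2 -1 11 111])
```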
Examples
Variance.
(variance [1 2 3 -1 -1 2 -1 11 111])
;;=> 1333.861111111111