fastmath.stats
Statistics functions.
- Descriptive statistics.
- Correlation / covariance
- Outliers
- Confidence intervals
- Extents
- Effect size
- Student’s t-test
- Histogram
- ACF/PACF
- Bootstrap
- Binary measures
All functions are backed by Apache Commons Math or SMILE libraries. All work with Clojure sequences.
Descriptive statistics
All in one function stats-map contains:
- :Size - size of the samples, (count ...)
- :Min - minimum value
- :Max - maximum value
- :Range - range of values
- :Mean - mean/average
- :Median - median, see also: median-3
- :Mode - mode, see also: modes
- :Q1 - first quartile, use: percentile, quantile
- :Q3 - third quartile, use: percentile, quantile
- :Total - sum of all samples
- :SD - sample standard deviation
- :Variance - variance
- :MAD - median-absolute-deviation
- :SEM - standard error of mean
- :LAV - lower adjacent value, use: adjacent-values
- :UAV - upper adjacent value, use: adjacent-values
- :IQR - interquartile range, (- q3 q1)
- :LOF - lower outer fence, (- q1 (* 3.0 iqr))
- :UOF - upper outer fence, (+ q3 (* 3.0 iqr))
- :LIF - lower inner fence, (- q1 (* 1.5 iqr))
- :UIF - upper inner fence, (+ q3 (* 1.5 iqr))
- :Outliers - list of outliers, samples which are outside outer fences
- :Kurtosis - kurtosis
- :Skewness - skewness
Note: percentile and quantile support 10 different interpolation (estimation) strategies. See estimation-strategies-list.
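The fence formulas listed above can be sketched in plain Clojure. This is only an illustration: `fences` is a hypothetical helper, not a function of this namespace, and it expects Q1 and Q3 to be supplied by the caller. The input values are the :Q1 and :Q3 from the stats-map example further down the page.

```clojure
;; Sketch only: `fences` is a hypothetical helper, not part of fastmath.stats.
(defn fences
  "Returns IQR plus inner and outer fences for the given quartiles."
  [q1 q3]
  (let [iqr (- q3 q1)]
    {:IQR iqr
     :LIF (- q1 (* 1.5 iqr))   ; lower inner fence
     :UIF (+ q3 (* 1.5 iqr))   ; upper inner fence
     :LOF (- q1 (* 3.0 iqr))   ; lower outer fence
     :UOF (+ q3 (* 3.0 iqr))})) ; upper outer fence

(fences -1.0 7.0)
;;=> {:IQR 8.0, :LIF -13.0, :UIF 19.0, :LOF -25.0, :UOF 31.0}
```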
Categories
- Correlation: correlation covariance covariance-matrix jensen-shannon-divergence kendall-correlation kullback-leibler-divergence pearson-correlation spearman-correlation
- Effect size: ameasure cliffs-delta cohens-d cohens-d-orig glass-delta hedges-g hedges-g*
- Extents: adjacent-values bootstrap-ci ci extent mad-extent percentile-extent sem-extent stddev-extent
- Normalize: demean standardize
- Descriptive statistics: binary-measures binary-measures-all estimate-bins estimation-strategies-list histogram iqr kurtosis maximum mean median median-3 median-absolute-deviation minimum mode modes outliers percentile percentiles population-stddev population-variance quantile quantiles sem skewness stats-map stddev sum variance
- Hypothesis test: ttest-one-sample ttest-two-samples
- Time series: acf acf-ci pacf pacf-ci
Other vars: bootstrap moment second-moment
acf
(acf data)
(acf data lags)
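This page gives no formula for acf. The documented output for [1 2 3 4 5 4 3 2 1] is consistent with the usual sample autocorrelation (lagged products of deviations from the mean, divided by the lag-0 sum of squares), so a minimal sketch under that assumption could look as follows. `acf-sketch` is a hypothetical name and accepts only a maximum lag, not a list of lags.

```clojure
;; Sketch only: `acf-sketch` is hypothetical, not the library implementation.
(defn acf-sketch
  "Sample autocorrelation of xs for lags 0..max-lag (inclusive)."
  [xs max-lag]
  (let [v     (mapv double xs)
        m     (/ (reduce + v) (count v))
        d     (mapv #(- % m) v)            ; deviations from the mean
        denom (reduce + (map * d d))]      ; lag-0 sum of squares
    (for [k (range (inc max-lag))]
      ;; pair every deviation with the one k steps later
      (/ (reduce + (map * d (drop k d))) denom))))

(acf-sketch [1 2 3 4 5 4 3 2 1] 2)
;; lag 0 is 1.0, lag 1 is roughly 0.5397, lag 2 is roughly -0.0135
```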
Examples
Usage
(acf (repeatedly 1000 r/grand) 5)
;;=> (1.0
;;=> 0.0056672021105804715
;;=> 0.02683192836034792
;;=> 0.003505061419148288
;;=> 0.017117838382944242
;;=> -0.014709355084377094)
(acf (repeatedly 1000 r/grand) [10 20 100 500])
;;=> (0.03608425929253231
;;=> -0.04862331077397911
;;=> 0.0026191550786753507
;;=> 0.006099538382009882)
(acf [1 2 3 4 5 4 3 2 1])
;;=> (1.0
;;=> 0.5396825396825397
;;=> -0.013492063492063475
;;=> -0.4666666666666665
;;=> -0.6269841269841269
;;=> -0.3015873015873015
;;=> -0.011904761904761935
;;=> 0.17777777777777773
;;=> 0.20317460317460315)
acf-ci
(acf-ci data lags)
(acf-ci data lags alpha)
acf with added confidence interval data.
:cis contains the list of calculated confidence intervals for every lag.
Examples
Usage
(acf-ci (repeatedly 1000 r/grand) 3)
;;=> {:acf
;;=> (1.0 0.001652674595128032 0.030605610528521364 -0.006874164003129496),
;;=> :ci 0.06197950323045615,
;;=> :cis (0.06197950323045615
;;=> 0.06197967251690712
;;=> 0.062037701604327464
;;=> 0.06204062757534195)}
(acf-ci [1 2 3 4 5 4 3 2 1] 3)
;;=> {:acf
;;=> (1.0 0.5396825396825397 -0.013492063492063475 -0.4666666666666665),
;;=> :ci 0.653321328180018,
;;=> :cis (0.653321328180018
;;=> 0.8218653739461048
;;=> 0.8219599072345126
;;=> 0.9281841012727746)}
adjacent-values
(adjacent-values vs)
(adjacent-values vs estimation-strategy)
(adjacent-values vs q1 q3)
Lower and upper adjacent values (LAV and UAV).
Let Q1 be the 25th percentile and Q3 the 75th percentile. IQR is (- Q3 Q1).
- LAV is the smallest value which is greater than or equal to the LIF = (- Q1 (* 1.5 IQR)).
- UAV is the largest value which is lower than or equal to the UIF = (+ Q3 (* 1.5 IQR)).
- The third returned value is the median of the samples.
Optional estimation-strategy argument can be set to change the estimation type used for quantile calculations. See estimation-strategies-list.
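A minimal sketch of the LAV/UAV definition, assuming Q1 and Q3 are already known (as in the three-argument arity). `adjacent-values-sketch` is a hypothetical name; it omits the median that the library returns as a third value.

```clojure
;; Sketch only: hypothetical helper, quartiles must be passed in.
(defn adjacent-values-sketch
  [vs q1 q3]
  (let [iqr (- q3 q1)
        lif (- q1 (* 1.5 iqr))
        uif (+ q3 (* 1.5 iqr))
        xs  (map double vs)]
    [(reduce min (filter #(>= % lif) xs))    ; LAV: smallest value >= LIF
     (reduce max (filter #(<= % uif) xs))])) ; UAV: largest value <= UIF

(adjacent-values-sketch [1 2 3 -1 -1 2 -1 11 111] -1.0 7.0)
;;=> [-1.0 11.0]
```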
Examples
[LAV, UAV]
(adjacent-values [1 2 3 -1 -1 2 -1 11 111])
;;=> [-1.0 11.0 2.0]
Gaussian distribution [LAV, UAV]
(adjacent-values (repeatedly 1000000 r/grand))
;;=> [-2.698318697864716 2.6996062842439654 0.0016124036257504448]
ameasure
(ameasure group1 group2)
Vargha-Delaney A measure for two groups, group1 and group2.
Examples
Usage
(let [t [10 10 20 20 20 30 30 30 40 50]
c [10 20 30 40 40 50]]
(ameasure t c))
;;=> 0.20833333333333334
binary-measures
(binary-measures truth prediction)
(binary-measures truth prediction true-value)
Subset of binary measures. See binary-measures-all.
Following keys are returned: [:tp :tn :fp :fn :accuracy :fdr :f-measure :fall-out :precision :recall :sensitivity :specificity :prevalance]
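To make the basic keys concrete, here is a sketch that counts the confusion-matrix cells for boolean inputs and derives three of the measures from them. `basic-measures` is a hypothetical helper; it does not handle the true-value argument and will fail with a division by zero when a denominator is empty.

```clojure
;; Sketch only: hypothetical helper for boolean truth/prediction sequences.
(defn basic-measures
  [truth prediction]
  (let [pairs (map vector truth prediction)
        cnt   (fn [t p] (count (filter #(= [t p] %) pairs)))
        tp    (cnt true true)     ; predicted true, actually true
        tn    (cnt false false)   ; predicted false, actually false
        fp    (cnt false true)    ; predicted true, actually false
        fneg  (cnt true false)]   ; predicted false, actually true
    {:tp tp :tn tn :fp fp :fn fneg
     :accuracy  (double (/ (+ tp tn) (count pairs)))
     :precision (double (/ tp (+ tp fp)))
     :recall    (double (/ tp (+ tp fneg)))}))

(basic-measures [true false true false true false true false]
                [true false false true false false false true])
;; :tp 1, :tn 2, :fp 2, :fn 3, :accuracy 0.375, :recall 0.25
```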
Examples
Usage
(binary-measures [true false true false true false true false]
[true false false true false false false true])
;;=> {:accuracy 0.375,
;;=> :f-measure 0.28571428571428575,
;;=> :fall-out 0.5,
;;=> :fdr 0.6666666666666667,
;;=> :fn 3,
;;=> :fp 2,
;;=> :precision 0.3333333333333333,
;;=> :recall 0.25,
;;=> :sensitivity 0.25,
;;=> :specificity 0.5,
;;=> :tn 2,
;;=> :tp 1}
Treat 1 as the true value.
(binary-measures [1 0 1 0 1 0 1 0] [1 0 0 1 0 0 0 1] [1])
;;=> {:accuracy 0.375,
;;=> :f-measure 0.28571428571428575,
;;=> :fall-out 0.5,
;;=> :fdr 0.6666666666666667,
;;=> :fn 3,
;;=> :fp 2,
;;=> :precision 0.3333333333333333,
;;=> :recall 0.25,
;;=> :sensitivity 0.25,
;;=> :specificity 0.5,
;;=> :tn 2,
;;=> :tp 1}
Treat :a and :b as true values.
(binary-measures [:a :b :c :d :e :f :a :b]
[:a :b :a :b :a :f :d :b]
{:a true, :b true, :c false})
;;=> {:accuracy 0.5,
;;=> :f-measure 0.6,
;;=> :fall-out 0.75,
;;=> :fdr 0.5,
;;=> :fn 1,
;;=> :fp 3,
;;=> :precision 0.5,
;;=> :recall 0.75,
;;=> :sensitivity 0.75,
;;=> :specificity 0.25,
;;=> :tn 1,
;;=> :tp 3}
binary-measures-all
(binary-measures-all truth prediction)
(binary-measures-all truth prediction true-value)
Collection of binary measures.
- truth - list of ground truth values
- prediction - list of predicted values
- true-value - optional, what is true in truth and prediction
true-value can be one of:
- nil - values are treated as booleans
- any sequence - values from the sequence will be treated as true
- map - conversion will be done according to the provided map (if there is no corresponding key, the value is treated as false)
Examples
Usage
(binary-measures-all [true false true false true false true false]
[true false false true false false false true])
;;=> {:accuracy 0.375,
;;=> :bm -0.25,
;;=> :cn 4.0,
;;=> :cp 4.0,
;;=> :dor 0.3333333333333333,
;;=> :f-beta #,
;;=> :f-measure 0.28571428571428575,
;;=> :f1-score 0.28571428571428575,
;;=> :fall-out 0.5,
;;=> :fdr 0.6666666666666667,
;;=> :fn 3,
;;=> :fnr 0.75,
;;=> :for 0.6,
;;=> :fp 2,
;;=> :fpr 0.5,
;;=> :hit-rate 0.25,
;;=> :lr+ 0.5,
;;=> :lr- 1.5,
;;=> :mcc -0.2581988897471611,
;;=> :miss-rate 0.75,
;;=> :mk -0.2666666666666666,
;;=> :npv 0.4,
;;=> :pcn 5.0,
;;=> :pcp 3.0,
;;=> :ppv 0.3333333333333333,
;;=> :precision 0.3333333333333333,
;;=> :prevalence 0.5,
;;=> :recall 0.25,
;;=> :selectivity 0.5,
;;=> :sensitivity 0.25,
;;=> :specificity 0.5,
;;=> :tn 2,
;;=> :tnr 0.5,
;;=> :total 8.0,
;;=> :tp 1,
;;=> :tpr 0.25}
Treat 1 as the true value.
(binary-measures-all [1 0 1 0 1 0 1 0] [1 0 0 1 0 0 0 1] [1])
;;=> {:accuracy 0.375,
;;=> :bm -0.25,
;;=> :cn 4.0,
;;=> :cp 4.0,
;;=> :dor 0.3333333333333333,
;;=> :f-beta #,
;;=> :f-measure 0.28571428571428575,
;;=> :f1-score 0.28571428571428575,
;;=> :fall-out 0.5,
;;=> :fdr 0.6666666666666667,
;;=> :fn 3,
;;=> :fnr 0.75,
;;=> :for 0.6,
;;=> :fp 2,
;;=> :fpr 0.5,
;;=> :hit-rate 0.25,
;;=> :lr+ 0.5,
;;=> :lr- 1.5,
;;=> :mcc -0.2581988897471611,
;;=> :miss-rate 0.75,
;;=> :mk -0.2666666666666666,
;;=> :npv 0.4,
;;=> :pcn 5.0,
;;=> :pcp 3.0,
;;=> :ppv 0.3333333333333333,
;;=> :precision 0.3333333333333333,
;;=> :prevalence 0.5,
;;=> :recall 0.25,
;;=> :selectivity 0.5,
;;=> :sensitivity 0.25,
;;=> :specificity 0.5,
;;=> :tn 2,
;;=> :tnr 0.5,
;;=> :total 8.0,
;;=> :tp 1,
;;=> :tpr 0.25}
Treat :a and :b as true values.
(binary-measures-all [:a :b :c :d :e :f :a :b]
[:a :b :a :b :a :f :d :b]
{:a true, :b true, :c false})
;;=> {:accuracy 0.5,
;;=> :bm 0.0,
;;=> :cn 4.0,
;;=> :cp 4.0,
;;=> :dor 1.0,
;;=> :f-beta #,
;;=> :f-measure 0.6,
;;=> :f1-score 0.6,
;;=> :fall-out 0.75,
;;=> :fdr 0.5,
;;=> :fn 1,
;;=> :fnr 0.25,
;;=> :for 0.5,
;;=> :fp 3,
;;=> :fpr 0.75,
;;=> :hit-rate 0.75,
;;=> :lr+ 1.0,
;;=> :lr- 1.0,
;;=> :mcc 0.0,
;;=> :miss-rate 0.25,
;;=> :mk 0.0,
;;=> :npv 0.5,
;;=> :pcn 2.0,
;;=> :pcp 6.0,
;;=> :ppv 0.5,
;;=> :precision 0.5,
;;=> :prevalence 0.5,
;;=> :recall 0.75,
;;=> :selectivity 0.25,
;;=> :sensitivity 0.75,
;;=> :specificity 0.25,
;;=> :tn 1,
;;=> :tnr 0.25,
;;=> :total 8.0,
;;=> :tp 3,
;;=> :tpr 0.75}
F-beta is a function. When beta is equal to 1.0, you get f1-score.
(let [fbeta (:f-beta (binary-measures-all
[true false true false true false true false]
[true false false true false false false true]))]
[(fbeta 1.0) (fbeta 2.0) (fbeta 0.5)])
;;=> [0.28571428571428575 0.7142857142857144 0.1785714285714286]
bootstrap
(bootstrap vs)
(bootstrap vs samples)
(bootstrap vs samples size)
Generate a set of samples of given size from provided data.
samples (number of generated samples) defaults to 50, size (length of each sample) defaults to 1000.
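A minimal sketch of the idea, assuming values are drawn uniformly with replacement (the example below draws 20 values from 10 source elements, which suggests replacement, but the page does not state the sampling method). `bootstrap-sketch` is a hypothetical name.

```clojure
;; Sketch only: assumes uniform resampling with replacement.
(defn bootstrap-sketch
  "Returns `samples` vectors, each with `size` values drawn from vs."
  [vs samples size]
  (let [v (mapv double vs)]
    (repeatedly samples
                (fn [] (vec (repeatedly size #(rand-nth v)))))))

;; two resamples of 20 values each; the content differs from run to run
(bootstrap-sketch [1 2 3 4 1 2 3 1 2 1] 2 20)
```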
Examples
Usage
(bootstrap [1 2 3 4 1 2 3 1 2 1] 2 20)
;;=> ((1.0
;;=> 1.0
;;=> 2.0
;;=> 1.0
;;=> 4.0
;;=> 3.0
;;=> 2.0
;;=> 2.0
;;=> 1.0
;;=> 4.0
;;=> 2.0
;;=> 1.0
;;=> 4.0
;;=> 2.0
;;=> 1.0
;;=> 3.0
;;=> 1.0
;;=> 3.0
;;=> 3.0
;;=> 2.0)
;;=> (1.0
;;=> 4.0
;;=> 1.0
;;=> 2.0
;;=> 1.0
;;=> 3.0
;;=> 2.0
;;=> 1.0
;;=> 1.0
;;=> 1.0
;;=> 1.0
;;=> 2.0
;;=> 2.0
;;=> 1.0
;;=> 2.0
;;=> 3.0
;;=> 1.0
;;=> 1.0
;;=> 2.0
;;=> 3.0))
(let [data [1 2 3 4 1 2 3 1 2 1]
fdata (frequencies data)
bdata (bootstrap data 5 1000)]
{:source fdata, :bootstrapped (map frequencies bdata)})
;;=> {:bootstrapped ({1.0 413, 2.0 294, 3.0 204, 4.0 89}
;;=> {1.0 415, 2.0 285, 3.0 192, 4.0 108}
;;=> {1.0 406, 2.0 297, 3.0 189, 4.0 108}
;;=> {1.0 419, 2.0 300, 3.0 199, 4.0 82}
;;=> {1.0 401, 2.0 322, 3.0 196, 4.0 81}),
;;=> :source {1 4, 2 3, 3 2, 4 1}}
bootstrap-ci
(bootstrap-ci vs)
(bootstrap-ci vs alpha)
(bootstrap-ci vs alpha samples)
(bootstrap-ci vs alpha samples stat-fn)
Bootstrap method to calculate confidence interval.
Alpha defaults to 0.98, samples to 1000. Last parameter is statistical function used to measure, default: mean.
Returns ci and statistical function value.
Examples
Usage
(bootstrap-ci [-5 1 1 1 1 2 2 5 11 71])
;;=> [-5.796000000000005 17.8 9.0]
(bootstrap-ci [-5 1 1 1 1 2 2 5 11 71] 0.8)
;;=> [2.5999999999999996 15.280000000000005 9.0]
(bootstrap-ci [-5 1 1 1 1 2 2 5 11 71] 0.8 100000)
;;=> [2.5999999999999996 15.7 9.0]
(bootstrap-ci [-5 1 1 1 1 2 2 5 11 71] 0.98 1000 median)
;;=> [-5.0 2.0 1.5]
ci
(ci vs)
(ci vs alpha)
Confidence interval for given data based on Student’s t-distribution. Alpha value defaults to 0.98.
Last value is mean.
Examples
Usage
(ci [-5 1 1 1 1 2 2 5 11 71])
;;=> [-10.759020390886263 28.759020390886263 9.0]
(ci [-5 1 1 1 1 2 2 5 11 71] 0.8)
;;=> [-0.6855907410547175 18.685590741054718 9.0]
cliffs-delta
(cliffs-delta group1 group2)
Cliff’s delta effect size
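Cliff's delta is commonly defined as the number of pairs where the first group's value is larger, minus the number of pairs where it is smaller, divided by the total number of pairs. The sketch below follows that definition; `cliffs-delta-sketch` is a hypothetical name, and the definition is assumed rather than taken from the library source. It reproduces the documented result for the example data.

```clojure
;; Sketch only: textbook definition, hypothetical helper name.
(defn cliffs-delta-sketch
  [group1 group2]
  (let [signs (for [a group1, b group2] (compare a b)) ; -1, 0 or 1 per pair
        gt    (count (filter pos? signs))
        lt    (count (filter neg? signs))]
    (/ (double (- gt lt))
       (* (count group1) (count group2)))))

(cliffs-delta-sketch [10 10 20 20 20 30 30 30 40 50]
                     [10 20 30 40 40 50])
;;=> -0.25
```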
Examples
Usage
(let [t [10 10 20 20 20 30 30 30 40 50]
c [10 20 30 40 40 50]]
(cliffs-delta t c))
;;=> -0.25
cohens-d
(cohens-d group1 group2)
Cohen’s d effect size for two groups
Examples
Usage
(let [t [10 10 20 20 20 30 30 30 40 50]
c [10 20 30 40 40 50]]
(cohens-d t c))
;;=> -0.42090943320131763
cohens-d-orig
(cohens-d-orig group1 group2)
Original version of Cohen’s d effect size for two groups
Examples
Usage
(let [t [10 10 20 20 20 30 30 30 40 50]
c [10 20 30 40 40 50]]
(cohens-d-orig t c))
;;=> -0.39372472247513574
correlation
(correlation vs1 vs2)
Correlation of two sequences.
Examples
Correlation of uniform and gaussian distribution samples.
(correlation (repeatedly 100000 (partial r/grand 1.0 10.0))
(repeatedly 100000 (partial r/drand -10.0 -5.0)))
;;=> 0.004322084824287007
covariance
(covariance vs1 vs2)
Covariance of two sequences.
Examples
Covariance of uniform and gaussian distribution samples.
(covariance (repeatedly 100000 (partial r/grand 1.0 10.0))
(repeatedly 100000 (partial r/drand -10.0 -5.0)))
;;=> 0.02860827181203021
covariance-matrix
(covariance-matrix vss)
Generate covariance matrix from seq of seqs. Row order.
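As an illustration, the matrix can be built from pairwise sample covariances (divisor n-1). The helper names below are hypothetical and the n-1 divisor is an assumption, although it agrees with the documented output for the example input.

```clojure
;; Sketch only: hypothetical helpers, sample covariance with divisor (n - 1).
(defn covariance-sketch
  [xs ys]
  (let [n  (count xs)
        mx (/ (reduce + xs) (double n))
        my (/ (reduce + ys) (double n))]
    (/ (reduce + (map (fn [x y] (* (- x mx) (- y my))) xs ys))
       (dec n))))

(defn covariance-matrix-sketch
  "One row per input sequence, one column per input sequence."
  [vss]
  (for [a vss]
    (mapv #(covariance-sketch a %) vss)))

(covariance-matrix-sketch [[1 2 3 4 5 11] [3 2 3 2 3 4]])
;; roughly ([12.667 1.867] [1.867 0.567])
```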
Examples
Usage
(covariance-matrix [[1 2 3 4 5 11] [3 2 3 2 3 4]])
;;=> ([12.666666666666668 1.8666666666666667]
;;=> [1.8666666666666667 0.5666666666666667])
demean
(demean vs)
Subtract mean from sequence
Examples
Usage
(demean [-5 1 1 1 1 2 2 5 11 71])
;;=> (-14.0 -8.0 -8.0 -8.0 -8.0 -7.0 -7.0 -4.0 2.0 62.0)
estimate-bins
(estimate-bins vs)
(estimate-bins vs bins-or-estimate-method)
Estimate number of bins for histogram.
Possible methods are: :sqrt, :sturges, :rice, :doane, :scott, :freedman-diaconis (default).
Examples
Estimate number of bins for various methods.
vs contains 1000 random samples from Log-Normal distribution.
(estimate-bins vs :sqrt)
;;=> 31
(estimate-bins vs :sturges)
;;=> 11
(estimate-bins vs :rice)
;;=> 20
(estimate-bins vs :doane)
;;=> 16
(estimate-bins vs :scott)
;;=> 34
(estimate-bins vs :freedman-diaconis)
;;=> 81
estimation-strategies-list
Examples
List of estimation strategies for percentile
(sort (keys estimation-strategies-list))
;;=> (:legacy :r1 :r2 :r3 :r4 :r5 :r6 :r7 :r8 :r9)
extent
(extent vs)
Return extent (min, max, mean) values from sequence
Examples
min/max and mean of gaussian distribution
(extent (repeatedly 100000 r/grand))
;;=> [-4.433795945695126 4.1230995512178925 -0.0030623577039036527]
glass-delta
(glass-delta group1 group2)
Glass’s delta effect size for two groups
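A sketch of the calculation, assuming the difference of means is scaled by the sample standard deviation of the second group. That choice of group is an assumption; it agrees with the documented result for the example data. `glass-delta-sketch` is a hypothetical name.

```clojure
;; Sketch only: assumes group2 plays the role of the control group.
(defn glass-delta-sketch
  [group1 group2]
  (let [mean (fn [xs] (/ (reduce + xs) (double (count xs))))
        m1   (mean group1)
        m2   (mean group2)
        ;; sample variance of group2, divisor (n - 1)
        v2   (/ (reduce + (map #(let [d (- % m2)] (* d d)) group2))
                (dec (count group2)))]
    (/ (- m1 m2) (Math/sqrt v2))))

(glass-delta-sketch [10 10 20 20 20 30 30 30 40 50]
                    [10 20 30 40 40 50])
;; roughly -0.385
```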
Examples
Usage
(let [t [10 10 20 20 20 30 30 30 40 50]
c [10 20 30 40 40 50]]
(glass-delta t c))
;;=> -0.3849741916091626
hedges-g
(hedges-g group1 group2)
Hedges’s g effect size for two groups
Examples
Usage
(let [t [10 10 20 20 20 30 30 30 40 50]
c [10 20 30 40 40 50]]
(hedges-g t c))
;;=> -0.3907787841275092
hedges-g*
(hedges-g* group1 group2)
Less biased Hedges’s g effect size for two groups
Examples
Usage
(let [t [10 10 20 20 20 30 30 30 40 50]
c [10 20 30 40 40 50]]
(hedges-g* t c))
;;=> -0.36946357772055416
histogram
(histogram vs)
(histogram vs bins-or-estimate-method)
(histogram vs bins [mn mx])
Calculate histogram.
Returns map with keys:
- :size - number of bins
- :step - distance between bins
- :bins - list of pairs of range lower value and number of hits
- :min - min value
- :max - max value
- :samples - number of used samples
For estimation methods check estimate-bins.
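The shape of the returned map can be illustrated with a simplified equal-width binning in plain Clojure. `histogram-sketch` is a hypothetical helper: it takes a fixed bin count only, does not support an explicit range, and puts the maximum value into the last bin, which is an assumption about edge handling.

```clojure
;; Sketch only: equal-width bins over [min, max], hypothetical helper.
(defn histogram-sketch
  [vs bins]
  (let [xs   (map double vs)
        mn   (reduce min xs)
        mx   (reduce max xs)
        step (/ (- mx mn) bins)
        ;; bin index, with the maximum value clamped into the last bin
        idx  (fn [x] (min (dec bins) (long (/ (- x mn) step))))
        hits (frequencies (map idx xs))]
    {:size    bins
     :step    step
     :min     mn
     :max     mx
     :samples (count xs)
     :bins    (for [i (range bins)]
                [(+ mn (* i step)) (get hits i 0)])}))

(:bins (histogram-sketch [0 1 2 3 4 5 6 7 8 9] 3))
;;=> ([0.0 3] [3.0 3] [6.0 4])
```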
Examples
3 bins from uniform distribution.
(histogram (repeatedly 1000 rand) 3)
;;=> {:bins ([3.852964808315207E-5 333]
;;=> [0.3324743943078295 344]
;;=> [0.6649102589675758 323]),
;;=> :max 0.9973461236273221,
;;=> :min 3.852964808315207E-5,
;;=> :samples 1000,
;;=> :size 3,
;;=> :step 0.33243586465974634}
3 bins from uniform distribution for given range.
(histogram (repeatedly 10000 rand) 3 [0.1 0.5])
;;=> {:bins ([0.1 1325] [0.2333333333333334 1305] [0.3666666666666668 1362]),
;;=> :max 0.5000000000000001,
;;=> :min 0.1,
;;=> :samples 3992,
;;=> :size 3,
;;=> :step 0.1333333333333334}
5 bins from normal distribution.
(histogram (repeatedly 10000 r/grand) 5)
;;=> {:bins ([-3.6207687808508426 156]
;;=> [-2.189509070713405 2083]
;;=> [-0.7582493605759675 5213]
;;=> [0.6730103495614701 2348]
;;=> [2.1042700596989077 200]),
;;=> :max 3.535529769836345,
;;=> :min -3.6207687808508426,
;;=> :samples 10000,
;;=> :size 5,
;;=> :step 1.4312597101374376}
Estimate number of bins
(:size (histogram (repeatedly 10000 r/grand)))
;;=> 63
Estimate number of bins, Rice rule
(:size (histogram (repeatedly 10000 r/grand) :rice))
;;=> 44
iqr
(iqr vs)
(iqr vs estimation-strategy)
Interquartile range.
Examples
IQR
(iqr (repeatedly 100000 r/grand))
;;=> 1.3440848153319385
jensen-shannon-divergence
(jensen-shannon-divergence vs1 vs2)
Jensen-Shannon divergence of two sequences.
Examples
Jensen-Shannon divergence
(jensen-shannon-divergence (repeatedly 100 (fn* [] (r/irand 100)))
(repeatedly 100 (fn* [] (r/irand 100))))
;;=> 569.0495492365783
kendall-correlation
(kendall-correlation vs1 vs2)
Kendall’s correlation of two sequences.
Examples
Kendall’s correlation of uniform and gaussian distribution samples.
(kendall-correlation (repeatedly 100000 (partial r/grand 1.0 10.0))
(repeatedly 100000 (partial r/drand -10.0 -5.0)))
;;=> 0.0013475046750467505
kullback-leibler-divergence
(kullback-leibler-divergence vs1 vs2)
Kullback-Leibler divergence of two sequences.
Examples
Kullback-Leibler divergence.
(kullback-leibler-divergence (repeatedly 100 (fn* [] (r/irand 100)))
(repeatedly 100 (fn* [] (r/irand 100))))
;;=> 2974.653698088623
kurtosis
(kurtosis vs)
Calculate kurtosis from sequence.
Examples
Kurtosis
(kurtosis [1 2 3 -1 -1 2 -1 11 111])
;;=> 8.732515263272099
mad-extent
(mad-extent vs)
-/+ median-absolute-deviation and median
Examples
median absolute deviation from median for gaussian distribution
(mad-extent (repeatedly 100000 r/grand))
;;=> [-0.6674624646061628 0.6788112117843635 0.005674373589100425]
maximum
(maximum vs)
Maximum value from sequence.
Examples
Maximum value
(maximum [1 2 3 -1 -1 2 -1 11 111])
;;=> 111.0
mean
(mean vs)
Calculate mean of vs
Examples
Mean (average value)
(mean [1 2 3 -1 -1 2 -1 11 111])
;;=> 14.111111111111109
median
(median vs)
Calculate median of vs. See median-3.
Examples
Median (percentile 50%).
(median [1 2 3 -1 -1 2 -1 11 111])
;;=> 2.0
For three elements use faster median-3.
(median [7 1 4])
;;=> 4.0
median-3
(median-3 a b c)
Median of three values. See median.
Examples
Median of [7 1 4]
(median-3 7 1 4)
;;=> 4.0
median-absolute-deviation
(median-absolute-deviation vs)
Calculate MAD
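MAD is usually computed as the median of absolute deviations from the median. A minimal sketch under that assumption follows; both helper names are hypothetical, and the median uses simple middle-element selection (averaging the two middle values for even counts) rather than the library's estimation strategies.

```clojure
;; Sketch only: hypothetical helpers, plain median without estimation strategies.
(defn median-sketch
  [xs]
  (let [s (vec (sort (map double xs)))
        n (count s)
        h (quot n 2)]
    (if (odd? n)
      (s h)
      (/ (+ (s (dec h)) (s h)) 2.0))))

(defn mad-sketch
  [xs]
  (let [m (median-sketch xs)]
    ;; median of absolute deviations from the median
    (median-sketch (map #(Math/abs (double (- % m))) xs))))

(mad-sketch [1 2 3 -1 -1 2 -1 11 111])
;;=> 3.0
```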
Examples
MAD
(median-absolute-deviation [1 2 3 -1 -1 2 -1 11 111])
;;=> 3.0
minimum
(minimum vs)
Minimum value from sequence.
Examples
Minimum value
(minimum [1 2 3 -1 -1 2 -1 11 111])
;;=> -1.0
mode
(mode vs)
Find the value that appears most often in a dataset vs.
See also modes.
Examples
Example
(mode [1 2 3 -1 -1 2 -1 11 111])
;;=> -1.0
Returns lowest value when every element appears equally.
(mode [5 1 2 3 4])
;;=> 1.0
modes
(modes vs)
Find the values that appear most often in a dataset vs.
Returns a sequence with all most frequent values in increasing order.
See also mode.
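The behaviour described above can be sketched with frequencies from clojure.core. `modes-sketch` is a hypothetical name; it throws on empty input, and taking the first element of its result gives the lowest of the tied values.

```clojure
;; Sketch only: hypothetical helper built on clojure.core/frequencies.
(defn modes-sketch
  [vs]
  (let [freqs (frequencies (map double vs))
        top   (apply max (vals freqs))]        ; highest occurrence count
    (sort (for [[v c] freqs :when (= c top)] v))))

(modes-sketch [5 5 1 1 2 3 4 4])
;;=> (1.0 4.0 5.0)
```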
Examples
Example
(modes [1 2 3 -1 -1 2 -1 11 111])
;;=> (-1.0)
Returns all tied values when several values share the highest count.
(modes [5 5 1 1 2 3 4 4])
;;=> (1.0 4.0 5.0)
moment
(moment vs)
(moment vs order)
(moment vs order {:keys [absolute? center mean?], :or {absolute? false, center nil, mean? true}})
Calculate moment (central and/or absolute) of given order (default: 2).
Additional parameters as a map:
- :absolute? - calculate sum as absolute values (default: false)
- :mean? - returns mean (proper moment) or just sum of differences (default: true)
- :center - center value (default: nil = mean)
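The options can be read as the following computation: subtract the center, optionally take the absolute value, raise to the given order, then sum or average. The sketch below is an interpretation of the description above, not the library source; `moment-sketch` is a hypothetical name.

```clojure
;; Sketch only: hypothetical re-implementation of the documented options.
(defn moment-sketch
  ([vs] (moment-sketch vs 2.0 {}))
  ([vs order] (moment-sketch vs order {}))
  ([vs order {:keys [absolute? center mean?]
              :or   {absolute? false center nil mean? true}}]
   (let [xs (map double vs)
         n  (count xs)
         c  (if (nil? center) (/ (reduce + xs) n) center) ; nil means "use the mean"
         f  (fn [x]
              (let [d (double (- x c))]
                (Math/pow (if absolute? (Math/abs d) d) (double order))))
         s  (reduce + (map f xs))]
     (if mean? (/ s n) s))))

(moment-sketch [3 7 5 9 -8])                    ; roughly 35.36
(moment-sketch [3 7 5 9 -8] 3.0 {:center 0.0})  ; roughly 142.4
```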
Examples
Usage
(moment [3 7 5 9 -8])
;;=> 35.36
(moment [3 7 5 9 -8] 1.0)
;;=> 0.0
(moment [3 7 5 9 -8] 4.0)
;;=> 3417.171199999999
(moment [3 7 5 9 -8] 3.0)
;;=> -229.82399999999993
(moment [3 7 5 9 -8] 3.0 {:center 0.0})
;;=> 142.4
(moment [3 7 5 9 -8] 3.0 {:mean? false})
;;=> -1149.1199999999997
(moment [3 7 5 9 -8] 3.0 {:absolute? true})
;;=> 332.15039999999993
(moment [3 7 5 9 -8] 3.0 {:center -3.0})
;;=> 666.2
(moment [3 7 5 9 -8] 0.5 {:absolute? true})
;;=> 1.8986344545712772
outliers
(outliers vs)
(outliers vs estimation-strategy)
(outliers vs q1 q3)
Find outliers defined as values outside the inner fences.
Let Q1 be the 25th percentile and Q3 the 75th percentile. IQR is (- Q3 Q1).
- LIF (Lower Inner Fence) equals (- Q1 (* 1.5 IQR)).
- UIF (Upper Inner Fence) equals (+ Q3 (* 1.5 IQR)).
Returns sequence.
Optional estimation-strategy argument can be set to change the estimation type used for quantile calculations. See estimation-strategies-list.
Examples
Outliers
(outliers [1 2 3 -1 -1 2 -1 11 111])
;;=> (111.0)
Gaussian distribution outliers
(count (outliers (repeatedly 3000000 r/grand)))
;;=> 20845
pacf
(pacf data)
(pacf data lags)
Examples
Usage
(pacf (repeatedly 1000 r/grand) 10)
;;=> (0.0
;;=> -0.026736190605719863
;;=> 0.010877194279486118
;;=> 0.032203194789526435
;;=> -0.035726254389457764
;;=> -0.002736273319801345
;;=> -0.05322100355959638
;;=> 0.016135531529402884
;;=> 0.008998546289145027
;;=> -0.0614874305488179
;;=> 0.010613174279058944)
(pacf [1 2 3 4 5 4 3 2 1])
;;=> (0.0
;;=> 0.5396825396825397
;;=> -0.4299857803057234
;;=> -0.388084834596935
;;=> -0.2792571208141194
;;=> 0.17585056996358742
;;=> -0.2652225487589841
;;=> -0.17978918763554708
;;=> -0.10771973872263883)
pacf-ci
(pacf-ci data lags)
(pacf-ci data lags alpha)
pacf with added confidence interval data.
Examples
Usage
(pacf-ci (repeatedly 1000 r/grand) 3)
;;=> {:ci 0.06197950323045615,
;;=> :pacf
;;=> (0.0 -0.003069469436674989 0.019682898818580288 0.008559579552033444)}
(pacf-ci [1 2 3 4 5 4 3 2 1] 3)
;;=> {:ci 0.653321328180018,
;;=> :pacf (0.0 0.5396825396825397 -0.4299857803057234 -0.388084834596935)}
pearson-correlation
(pearson-correlation vs1 vs2)
Pearson’s correlation of two sequences.
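For reference, Pearson's coefficient is the sum of products of deviations divided by the square root of the product of the two sums of squared deviations. A minimal sketch in plain Clojure, with `pearson-sketch` as a hypothetical name:

```clojure
;; Sketch only: textbook Pearson correlation, hypothetical helper name.
(defn pearson-sketch
  [xs ys]
  (let [mean (fn [v] (/ (reduce + v) (double (count v))))
        mx   (mean xs)
        my   (mean ys)
        dx   (map #(- % mx) xs)
        dy   (map #(- % my) ys)
        sxy  (reduce + (map * dx dy))
        sxx  (reduce + (map * dx dx))
        syy  (reduce + (map * dy dy))]
    (/ sxy (Math/sqrt (* sxx syy)))))

;; perfectly linear relation
(pearson-sketch [1 2 3 4] [2 4 6 8])
;;=> 1.0
```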
Examples
Pearson’s correlation of uniform and gaussian distribution samples.
(pearson-correlation (repeatedly 100000 (partial r/grand 1.0 10.0))
(repeatedly 100000 (partial r/drand -10.0 -5.0)))
;;=> -0.0013362357862912368
percentile
(percentile vs p)
(percentile vs p estimation-strategy)
Examples
Percentile 25%
(percentile [1 2 3 -1 -1 2 -1 11 111] 25.0)
;;=> -1.0
Percentile 50% (median)
(percentile [1 2 3 -1 -1 2 -1 11 111] 50.0)
;;=> 2.0
Percentile 75%
(percentile [1 2 3 -1 -1 2 -1 11 111] 75.0)
;;=> 7.0
Percentile 90%
(percentile [1 2 3 -1 -1 2 -1 11 111] 90.0)
;;=> 111.0
Various estimation strategies.
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :legacy)
;;=> 61.0
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :r1)
;;=> 11.0
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :r2)
;;=> 11.0
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :r3)
;;=> 11.0
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :r4)
;;=> 8.199999999999996
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :r5)
;;=> 25.999999999999858
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :r6)
;;=> 61.0
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :r7)
;;=> 9.399999999999999
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :r8)
;;=> 37.66666666666675
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :r9)
;;=> 34.75000000000007
percentile-extent
(percentile-extent vs)
(percentile-extent vs p)
(percentile-extent vs p1 p2)
(percentile-extent vs p1 p2 estimation-strategy)
Return percentile range and median.
p - calculates extent of p and 100-p (default: p=25)
Examples
for samples from gaussian distribution
(percentile-extent (repeatedly 100000 r/grand))
;;=> [-0.6757379614999778 0.6743169986727071 -6.525193090059774E-4]
(percentile-extent (repeatedly 100000 r/grand) 10)
;;=> [-1.2857354257915978 1.2809543542935327 -0.005038343134417929]
(percentile-extent (repeatedly 100000 r/grand) 30 70)
;;=> [-0.5245075255533502 0.522954632601382 -6.971662739497372E-4]
percentiles
(percentiles vs ps)
(percentiles vs ps estimation-strategy)
Examples
Usage
(percentiles [1 2 3 -1 -1 2 -1 11 111] [25 50 75 90])
;;=> [-1.0 2.0 7.0 111.0]
population-stddev
(population-stddev vs)
(population-stddev vs u)
Calculate population standard deviation of vs.
See stddev.
Examples
Population standard deviation.
(population-stddev [1 2 3 -1 -1 2 -1 11 111])
;;=> 34.4333315406403
population-variance
(population-variance vs)
(population-variance vs u)
Calculate population variance of vs.
See variance.
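The difference between the two variants is only the divisor: n for the population variance, n-1 for the sample variance. A sketch with hypothetical helper names, which agrees with the documented results for the example data:

```clojure
;; Sketch only: hypothetical helpers showing the two divisors.
(defn sum-sq-dev
  "Sum of squared deviations from the mean."
  [vs]
  (let [m (/ (reduce + vs) (double (count vs)))]
    (reduce + (map #(let [d (- % m)] (* d d)) vs))))

(defn population-variance-sketch [vs] (/ (sum-sq-dev vs) (count vs)))
(defn sample-variance-sketch [vs] (/ (sum-sq-dev vs) (dec (count vs))))

(population-variance-sketch [1 2 3 -1 -1 2 -1 11 111]) ; roughly 1185.65
(sample-variance-sketch [1 2 3 -1 -1 2 -1 11 111])     ; roughly 1333.86
```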
Examples
Population variance
(population-variance [1 2 3 -1 -1 2 -1 11 111])
;;=> 1185.6543209876543
quantile
(quantile vs q)
(quantile vs q estimation-strategy)
Calculate quantile of vs.
Quantile q is from range 0.0-1.0.
Optionally you can provide estimation-strategy to change the interpolation method for selecting values. Default is :legacy. See estimation-strategies-list.
See also percentile.
Examples
Quantile 0.25
(quantile [1 2 3 -1 -1 2 -1 11 111] 0.25)
;;=> -1.0
Quantile 0.5 (median)
(quantile [1 2 3 -1 -1 2 -1 11 111] 0.5)
;;=> 2.0
Quantile 0.75
(quantile [1 2 3 -1 -1 2 -1 11 111] 0.75)
;;=> 7.0
Quantile 0.9
(quantile [1 2 3 -1 -1 2 -1 11 111] 0.9)
;;=> 111.0
Various estimation strategies.
(quantile [1 11 111 1111] 0.7 :legacy)
;;=> 611.0
(quantile [1 11 111 1111] 0.7 :r1)
;;=> 111.0
(quantile [1 11 111 1111] 0.7 :r2)
;;=> 111.0
(quantile [1 11 111 1111] 0.7 :r3)
;;=> 111.0
(quantile [1 11 111 1111] 0.7 :r4)
;;=> 90.99999999999999
(quantile [1 11 111 1111] 0.7 :r5)
;;=> 410.99999999999983
(quantile [1 11 111 1111] 0.7 :r6)
;;=> 611.0
(quantile [1 11 111 1111] 0.7 :r7)
;;=> 210.99999999999966
(quantile [1 11 111 1111] 0.7 :r8)
;;=> 477.66666666666623
(quantile [1 11 111 1111] 0.7 :r9)
;;=> 460.99999999999966
quantiles
(quantiles vs qs)
(quantiles vs qs estimation-strategy)
Calculate quantiles of vs.
qs is a sequence of quantiles, each from range 0.0-1.0.
Optionally you can provide estimation-strategy to change the interpolation method for selecting values. Default is :legacy. See estimation-strategies-list.
See also percentiles.
Examples
Usage
(quantiles [1 2 3 -1 -1 2 -1 11 111] [0.25 0.5 0.75 0.9])
;;=> [-1.0 2.0 7.0 111.0]
sem
(sem vs)
Standard error of mean
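Standard error of the mean is the sample standard deviation divided by the square root of the sample size. A minimal sketch, with `sem-sketch` as a hypothetical name:

```clojure
;; Sketch only: hypothetical helper, sample standard deviation with divisor (n - 1).
(defn sem-sketch
  [vs]
  (let [n        (count vs)
        m        (/ (reduce + vs) (double n))
        variance (/ (reduce + (map #(let [d (- % m)] (* d d)) vs))
                    (dec n))]
    (/ (Math/sqrt variance) (Math/sqrt (double n)))))

(sem-sketch [1 2 3 -1 -1 2 -1 11 111])
;; roughly 12.174
```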
Examples
SEM
(sem [1 2 3 -1 -1 2 -1 11 111])
;;=> 12.174021115615695
sem-extent
(sem-extent vs)
-/+ sem and mean
Examples
standard error of mean and mean for gaussian distribution
(sem-extent (repeatedly 100000 r/grand))
;;=> [-0.0041852769369816285 0.002121789728373999 -0.001031743604303815]
skewness
(skewness vs)
Calculate skewness from sequence.
Examples
Skewness
(skewness [1 2 3 -1 -1 2 -1 11 111])
;;=> 2.94268445417954
spearman-correlation
(spearman-correlation vs1 vs2)
Spearman’s correlation of two sequences.
Examples
Spearman’s correlation of uniform and gaussian distribution samples.
(spearman-correlation (repeatedly 100000 (partial r/grand 1.0 10.0))
(repeatedly 100000 (partial r/drand -10.0 -5.0)))
;;=> -2.2283008578278884E-4
standardize
(standardize vs)
Normalize samples to have mean = 0 and stddev = 1.
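In other words, each value has the mean subtracted and is then divided by the standard deviation. The sketch below assumes the sample standard deviation (divisor n-1), which agrees with the documented output; `standardize-sketch` is a hypothetical name.

```clojure
;; Sketch only: hypothetical helper, assumes sample standard deviation.
(defn standardize-sketch
  [vs]
  (let [n  (count vs)
        m  (/ (reduce + vs) (double n))
        sd (Math/sqrt (/ (reduce + (map #(let [d (- % m)] (* d d)) vs))
                         (dec n)))]
    (map #(/ (- % m) sd) vs)))

(first (standardize-sketch [1 2 3 -1 -1 2 -1 11 111]))
;; roughly -0.359
```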
Examples
Standardize
(standardize [1 2 3 -1 -1 2 -1 11 111])
;;=> (-0.3589915220998317
;;=> -0.33161081278713267
;;=> -0.30423010347443363
;;=> -0.4137529407252298
;;=> -0.4137529407252298
;;=> -0.33161081278713267
;;=> -0.4137529407252298
;;=> -0.08518442897284138
;;=> 2.652886502297062)
stats-map
(stats-map vs)
(stats-map vs estimation-strategy)
Calculate several statistics of vs and return as map.
Optional estimation-strategy argument can be set to change the estimation type used for quantile calculations. See estimation-strategies-list.
Examples
Stats
(stats-map [1 2 3 -1 -1 2 -1 11 111])
;;=> {:IQR 8.0,
;;=> :Kurtosis 8.732515263272099,
;;=> :LAV -1.0,
;;=> :LIF -13.0,
;;=> :LOF -25.0,
;;=> :MAD 3.0,
;;=> :Max 111.0,
;;=> :Mean 14.11111111111111,
;;=> :Median 2.0,
;;=> :Min -1.0,
;;=> :Mode -1.0,
;;=> :Outliers (111.0),
;;=> :Q1 -1.0,
;;=> :Q3 7.0,
;;=> :Range 112.0,
;;=> :SD 36.522063346847084,
;;=> :SEM 12.174021115615695,
;;=> :Size 9,
;;=> :Skewness 2.94268445417954,
;;=> :Total 127.0,
;;=> :UAV 11.0,
;;=> :UIF 19.0,
;;=> :UOF 31.0,
;;=> :Variance 1333.8611111111113}
stddev
(stddev vs)
(stddev vs u)
Calculate standard deviation of vs.
See population-stddev.
Examples
Standard deviation.
(stddev [1 2 3 -1 -1 2 -1 11 111])
;;=> 36.522063346847084
stddev-extent
(stddev-extent vs)
-/+ stddev and mean
Examples
standard deviation from mean and mean for gaussian distribution
(stddev-extent (repeatedly 100000 r/grand))
;;=> [-1.0008742722153459 0.994022466044159 -0.0034259030855934678]
ttest-one-sample
(ttest-one-sample xs)
(ttest-one-sample xs {:keys [alpha sides mu], :or {alpha 0.05, sides :two-sided, mu 0.0}})
One-sample Student’s t-test
- alpha - significance level (default: 0.05)
- sides - one of: :two-sided, :one-sided-less (short: :one-sided) or :one-sided-greater
- mu - mean (default: 0.0)
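The :t, :df and :estimated-mu entries of the result follow from the standard one-sample formula: the difference between the sample mean and mu, divided by the standard error of the mean. The sketch below covers only that part; the p-value and confidence interval need the t-distribution, which clojure.core does not provide. `t-one-sample-sketch` is a hypothetical name.

```clojure
;; Sketch only: t statistic and degrees of freedom, no p-value or interval.
(defn t-one-sample-sketch
  [xs mu]
  (let [n        (count xs)
        m        (/ (reduce + xs) (double n))
        variance (/ (reduce + (map #(let [d (- % m)] (* d d)) xs))
                    (dec n))
        sem      (Math/sqrt (/ variance n))]   ; standard error of the mean
    {:t            (/ (- m mu) sem)
     :df           (dec n)
     :estimated-mu m}))

(t-one-sample-sketch [1 2 3 4 5 6 7 8 9 10] 0.0)
;; :t is roughly 5.7446, :df is 9, :estimated-mu is 5.5
```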
Examples
Usage
(ttest-one-sample [1 2 3 4 5 6 7 8 9 10])
;;=> {:confidence-intervals [3.3341494103317983 7.665850589668201],
;;=> :df 9,
;;=> :estimated-mu 5.5,
;;=> :p-value 2.781960110481859E-4,
;;=> :t 5.744562646538029,
;;=> :test-type :two-sided}
(ttest-one-sample [1 2 3 4 5 6 7 8 9 10] {:alpha 0.2})
;;=> {:confidence-intervals [4.175850795053416 6.824149204946584],
;;=> :df 9,
;;=> :estimated-mu 5.5,
;;=> :p-value 2.781960110481859E-4,
;;=> :t 5.744562646538029,
;;=> :test-type :two-sided}
(ttest-one-sample [1 2 3 4 5 6 7 8 9 10] {:sides :one-sided})
;;=> {:confidence-intervals [##-Inf 7.255072013309326],
;;=> :df 9,
;;=> :estimated-mu 5.5,
;;=> :p-value 0.9998609019944759,
;;=> :t 5.744562646538029,
;;=> :test-type :one-sided}
(ttest-one-sample [1 2 3 4 5 6 7 8 9 10] {:mu 5.0})
;;=> {:confidence-intervals [3.334149410331798 7.665850589668201],
;;=> :df 9,
;;=> :estimated-mu 5.5,
;;=> :p-value 0.6141172548083933,
;;=> :t 0.5222329678670935,
;;=> :test-type :two-sided}
ttest-two-samples
(ttest-two-samples xs ys)
(ttest-two-samples xs ys {:keys [alpha sides mu paired? equal-variances?], :or {alpha 0.05, sides :two-sided, mu 0.0, paired? false, equal-variances? false}, :as params})
Two-sample Student’s t-test
- alpha - significance level (default: 0.05)
- sides - one of: :two-sided, :one-sided-less (short: :one-sided) or :one-sided-greater
- mu - mean (default: 0.0)
- paired? - unpaired or paired test, boolean (default: false)
- equal-variances? - unequal or equal variances, boolean (default: false)
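With the defaults (unpaired, unequal variances, mu 0.0) the documented :t and :df values match Welch's t-test with the Welch-Satterthwaite degrees of freedom. A sketch of just those two numbers follows; `welch-t-sketch` is a hypothetical name, and the p-value and confidence interval are left out because they need the t-distribution.

```clojure
;; Sketch only: Welch's t statistic and degrees of freedom for two samples.
(defn welch-t-sketch
  [xs ys]
  (let [stats (fn [v]
                (let [n  (count v)
                      m  (/ (reduce + v) (double n))
                      s2 (/ (reduce + (map #(let [d (- % m)] (* d d)) v))
                            (dec n))]
                  [n m s2]))                 ; size, mean, sample variance
        [n1 m1 v1] (stats xs)
        [n2 m2 v2] (stats ys)
        a (/ v1 n1)                          ; squared standard error, sample 1
        b (/ v2 n2)]                         ; squared standard error, sample 2
    {:t  (/ (- m1 m2) (Math/sqrt (+ a b)))
     :df (/ (* (+ a b) (+ a b))
            (+ (/ (* a a) (dec n1))
               (/ (* b b) (dec n2))))}))

(welch-t-sketch [1 2 3 4 5 6 7 8 9 10]
                [7 8 9 10 11 12 13 14 15 16 17 18 19 20])
;; :t is roughly -5.4349, :df is roughly 21.98
```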
Examples
Usage
(ttest-two-samples [1 2 3 4 5 6 7 8 9 10]
[7 8 9 10 11 12 13 14 15 16 17 18 19 20])
;;=> {:confidence-intervals [-11.052801725158163 -4.9471982748418375],
;;=> :df 21.982212340188994,
;;=> :estimated-mu [5.5 13.5],
;;=> :p-value 1.8552818325118146E-5,
;;=> :paired? false,
;;=> :t -5.4349297638940595,
;;=> :test-type :two-sided}
(ttest-two-samples [1 2 3 4 5 6 7 8 9 10]
[7 8 9 10 11 12 13 14 15 16 17 18 19 20 200])
;;=> {:confidence-intervals [-47.242899887102105 6.376233220435439],
;;=> :df 14.164598953012467,
;;=> :estimated-mu [5.5 25.93333333333333],
;;=> :p-value 0.12451349808974498,
;;=> :paired? false,
;;=> :t -1.632902633201205,
;;=> :test-type :two-sided}
(ttest-two-samples [1 2 3 4 5 6 7 8 9 10]
[7 8 9 10 11 12 13 14 15 16 17 18 19 20]
{:equal-variances? true})
;;=> {:confidence-intervals [-11.22324472988163 -4.77675527011837],
;;=> :df 22.0,
;;=> :estimated-mu [5.5 13.5],
;;=> :p-value 3.690577215911943E-5,
;;=> :paired? false,
;;=> :t -5.147292847304685,
;;=> :test-type :two-sided}
(ttest-two-samples [1 2 3 4 5 6 7 8 9 10]
[200 11 200 11 200 11 200 11 200 11]
{:paired? true})
;;=> {:confidence-intervals [-171.66671936335894 -28.333280636641092],
;;=> :df 9,
;;=> :estimated-mu -100.0,
;;=> :p-value 0.011615504295919215,
;;=> :paired? true,
;;=> :t -3.156496045715208,
;;=> :test-type :two-sided}
variance
(variance vs)
(variance vs u)
Calculate variance of vs.
See population-variance.
Examples
Variance.
(variance [1 2 3 -1 -1 2 -1 11 111])
;;=> 1333.861111111111