fastmath.stats.bootstrap

Bootstrap methods and confidence intervals

bootstrap

(bootstrap input)
(bootstrap input statistic)
(bootstrap input statistic {:keys [rng samples size method antithetic? smoothing dimensions include?], :or {samples 500}, :as params})

Creates a set of samples from the given data (nonparametric) or model (parametric).

Input:

  • a sequence of values (any type), or a sequence of sequences for multidimensional data
  • a map containing:
      • :data - sequence
      • :model - model for parametric bootstrap (optional)
  • statistic - function which returns the statistic value (optional)

Parameters:

  • :samples - number of bootstrapped samples (default: 500)
  • :size - forced size of an individual sample (default: same as source)
  • :method
      • nil (default) - random
      • :jackknife - leave-one-out jackknife
      • :jackknife+ - positive jackknife
      • any method accepted by fastmath.random/->seq
  • :rng - random number generator (see: fastmath.random/rng)
  • :smoothing - smoothing bootstrap:
      • :kde - kernel density estimation, additional options are: :kernel (default) and :bandwidth (auto)
      • :gaussian - add a random value from N(0, standard error)
  • :distribution - distribution used to automatically generate a model (distribution) from data:
      • :real-discrete-distribution - default
      • :integer-discrete-distribution - for integer values
      • :categorical-distribution - for any other type
  • :dimensions - if set to :multi, multidimensional data and models are created
  • :antithetic? - antithetic sampling (default: false)
  • :include? - if set to true, the original dataset is included in the samples (default: false)

Model can be:

  • any distribution object
  • any 0-arity function which returns a random sample

When the model is omitted, the function creates a discrete distribution from the data. When multidimensional data are provided, a model should be created for every dimension (as a sequence).
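To make the nonparametric case concrete, here is a minimal sketch of resampling with replacement in plain Clojure. It is not the library's implementation: `resample`, `bootstrap-samples` and the `seed` argument are hypothetical names introduced for illustration, and the sketch ignores :method, :smoothing and the other options listed above.

```clojure
;; Illustrative only: plain clojure.core plus java.util.Random, no fastmath calls.

(defn resample
  "Draws (count data) values from data uniformly at random, with replacement."
  [^java.util.Random rng data]
  (let [v (vec data)
        n (count v)]
    (vec (repeatedly n #(nth v (.nextInt rng (int n)))))))

(defn bootstrap-samples
  "Returns a vector of `samples` resampled copies of data.
  `seed` (hypothetical parameter) makes the draw reproducible."
  [data samples seed]
  (let [rng (java.util.Random. (long seed))]
    (vec (repeatedly samples #(resample rng data)))))

;; 500 resamples of a five-element dataset, matching the default :samples value.
(def example-samples (bootstrap-samples [1 2 3 4 10] 500 42))
```

Each resample has the same size as the source data, which corresponds to the default behaviour described for :size.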

bootstrap-stats

(bootstrap-stats {:keys [data samples], :as input} statistic)

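Calculates bootstrap statistics: applies statistic to the original data and to every bootstrap sample, then summarizes the results.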

Arguments:

  • a map containing :data and :samples
  • statistic - bootstrap statistic

Returns a map containing:

  • :data, :samples, :model and :statistic - from input
  • :t0 - statistic from data (single value)
  • :ts - statistic from bootstrap samples (sequence)
  • :bias - difference between mean of ts and t0
  • :mean, :median, :variance, :stddev, :sem - statistics from ts
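The relationship between :t0, :ts and :bias can be sketched in plain Clojure as follows. This is an illustration under assumptions, not the library's code: `mean`, `variance` and `summarize` are hypothetical helpers, the sample (n - 1) variance is assumed, and only a subset of the returned keys is computed.

```clojure
;; Illustrative only: plain clojure.core, no fastmath calls.

(defn mean [xs]
  (/ (reduce + xs) (double (count xs))))

(defn variance
  "Sample variance with an (n - 1) denominator (an assumption of this sketch)."
  [xs]
  (let [m (mean xs)]
    (/ (reduce + (map #(let [d (- % m)] (* d d)) xs))
       (dec (count xs)))))

(defn summarize
  "Applies statistic to data (t0) and to each sample (ts), then summarizes ts."
  [data samples statistic]
  (let [t0 (statistic data)
        ts (mapv statistic samples)
        m  (mean ts)
        v  (variance ts)]
    {:t0 t0
     :ts ts
     :mean m
     :bias (- m t0)          ; mean of ts minus t0
     :variance v
     :stddev (Math/sqrt v)}))

;; Three hand-written "bootstrap samples" keep the arithmetic easy to follow.
(def summary (summarize [1 2 3] [[1 1 2] [2 3 3] [1 2 3]] mean))
```

With these inputs the sample means are 4/3, 8/3 and 2, so the mean of ts equals t0 and the bias is zero up to rounding.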

ci-basic

(ci-basic boot-data)
(ci-basic boot-data alpha)
(ci-basic {:keys [t0 ts]} alpha estimation-strategy)

Basic percentile confidence interval

:t0 and :ts are obligatory
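As a rough sketch of the textbook basic interval, the quantiles of ts are reflected around t0. The helpers `quantile` and `basic-ci` are hypothetical, the linear-interpolation quantile is only one possible estimation strategy, and the library's return shape may differ from the two-element vector used here.

```clojure
;; Illustrative only: plain clojure.core, no fastmath calls.

(defn quantile
  "Linear-interpolation quantile of xs at probability p in [0, 1]."
  [xs p]
  (let [v  (vec (sort xs))
        h  (* p (dec (count v)))
        lo (long (Math/floor h))
        hi (min (inc lo) (dec (count v)))]
    (+ (nth v lo) (* (- h lo) (- (nth v hi) (nth v lo))))))

(defn basic-ci
  "Basic bootstrap interval: [2*t0 - q(1 - alpha/2), 2*t0 - q(alpha/2)]."
  [t0 ts alpha]
  (let [q-lo (quantile ts (/ alpha 2.0))
        q-hi (quantile ts (- 1.0 (/ alpha 2.0)))]
    [(- (* 2.0 t0) q-hi) (- (* 2.0 t0) q-lo)]))

;; Synthetic ts = 0, 1, ..., 100 with t0 = 52 and alpha = 0.1.
(def basic-example (basic-ci 52 (range 0 101) 0.1))
```

For this synthetic input the 5% and 95% quantiles are approximately 5 and 95, so the interval is approximately [9, 99].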

ci-bc

(ci-bc boot-data)
(ci-bc boot-data alpha)
(ci-bc {:keys [t0 ts]} alpha estimation-strategy)

Bias-corrected confidence interval
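For orientation, the textbook bias-corrected interval shifts the percentile levels by a correction term estimated from the share of bootstrap estimates below t0. The notation below is an assumption of this note (B is the number of entries in ts, \Phi the standard normal CDF, z_p its p-quantile, q(p) the p-quantile of ts); the library's exact computation may differ in detail.

```latex
z_0 = \Phi^{-1}\!\left( \frac{\#\{\, t^*_b < t_0 \,\}}{B} \right), \qquad
\alpha_{\mathrm{lo}} = \Phi\!\left( 2 z_0 + z_{\alpha/2} \right), \qquad
\alpha_{\mathrm{hi}} = \Phi\!\left( 2 z_0 + z_{1-\alpha/2} \right), \qquad
\mathrm{CI} = \left[\, q(\alpha_{\mathrm{lo}}),\; q(\alpha_{\mathrm{hi}}) \,\right]
```

When exactly half of the bootstrap estimates fall below t0, z_0 is zero and the interval reduces to the plain percentile interval.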

ci-bca

(ci-bca boot-data)
(ci-bca boot-data alpha)
(ci-bca {:keys [t0 ts data statistic]} alpha estimation-strategy)

Bias-corrected and accelerated confidence interval.

There are two ways to calculate acceleration:

  • jackknife method (when boot-data contains :data and :statistic)
  • empirical, from the bootstrap estimates ts, otherwise
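The jackknife route can be sketched with the textbook acceleration formula, a = sum(d^3) / (6 * sum(d^2)^1.5), where d is the difference between the mean of the leave-one-out estimates and each individual estimate. The helper names `leave-one-out`, `mean` and `acceleration` are hypothetical, and this is an assumption about the general technique rather than a copy of the library's code.

```clojure
;; Illustrative only: plain clojure.core, no fastmath calls.

(defn mean [xs]
  (/ (reduce + xs) (double (count xs))))

(defn leave-one-out
  "Returns one sample per element of vs, each with that element removed."
  [vs]
  (let [v (vec vs)]
    (mapv (fn [i] (into (subvec v 0 i) (subvec v (inc i))))
          (range (count v)))))

(defn acceleration
  "Textbook jackknife estimate of the BCa acceleration constant."
  [data statistic]
  (let [thetas (map statistic (leave-one-out data))
        m      (mean thetas)
        ds     (map #(- m %) thetas)
        num    (reduce + (map #(* % % %) ds))
        den    (* 6.0 (Math/pow (reduce + (map #(* % %) ds)) 1.5))]
    (/ num den)))

(def a-symmetric (acceleration [1 2 3 4 5] mean))   ; symmetric data
(def a-skewed    (acceleration [1 2 3 4 10] mean))  ; right-skewed data
```

For the mean, symmetric data give an acceleration of zero, while the right-skewed dataset gives a positive value.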

ci-normal

(ci-normal boot-data)
(ci-normal {:keys [t0 ts stddev bias]} alpha)

Normal (gaussian) bias-corrected confidence interval

:t0 and :ts are obligatory
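A minimal sketch of a bias-corrected normal interval, assuming the textbook form (t0 - bias) ± z * stddev, where bias and stddev are derived from ts when not supplied. `normal-ci` is a hypothetical helper, and it takes the normal quantile `z` as an argument (about 1.959964 for alpha = 0.05) instead of computing it from alpha, to stay free of library calls.

```clojure
;; Illustrative only: plain clojure.core, no fastmath calls.

(defn normal-ci
  "Interval (t0 - bias) +/- z * stddev, with bias and stddev estimated from ts.
  z is the standard normal quantile at 1 - alpha/2, passed in by the caller."
  [t0 ts z]
  (let [n      (count ts)
        m      (/ (reduce + ts) (double n))
        sd     (Math/sqrt (/ (reduce + (map #(let [d (- % m)] (* d d)) ts))
                             (dec n)))
        bias   (- m t0)
        center (- t0 bias)
        half   (* z sd)]
    [(- center half) (+ center half)]))

;; ts has mean 2 and standard deviation 1; with t0 = 2.5 the bias is -0.5.
(def normal-example (normal-ci 2.5 [1.0 2.0 3.0] 2.0))
```

Here the center moves from 2.5 to 3.0 after bias correction, and with z = 2 the interval is [1.0, 5.0].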

ci-percentile

(ci-percentile boot-data)
(ci-percentile boot-data alpha)
(ci-percentile {:keys [t0 ts]} alpha estimation-strategy)

Percentile confidence interval

:t0 and :ts are obligatory
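The percentile interval is simply the pair of alpha/2 and 1 - alpha/2 quantiles of ts. The sketch below assumes a linear-interpolation quantile, which is only one possible estimation strategy; `quantile` and `percentile-ci` are hypothetical names, and the library's return shape may differ.

```clojure
;; Illustrative only: plain clojure.core, no fastmath calls.

(defn quantile
  "Linear-interpolation quantile of xs at probability p in [0, 1]."
  [xs p]
  (let [v  (vec (sort xs))
        h  (* p (dec (count v)))
        lo (long (Math/floor h))
        hi (min (inc lo) (dec (count v)))]
    (+ (nth v lo) (* (- h lo) (- (nth v hi) (nth v lo))))))

(defn percentile-ci
  "Percentile interval: [q(alpha/2), q(1 - alpha/2)] of the bootstrap estimates."
  [ts alpha]
  [(quantile ts (/ alpha 2.0))
   (quantile ts (- 1.0 (/ alpha 2.0)))])

;; Synthetic ts = 0, 1, ..., 100 and alpha = 0.1.
(def percentile-example (percentile-ci (range 0 101) 0.1))
```

For this synthetic input the interval is approximately [5, 95].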

ci-studentized

(ci-studentized boot-data)
(ci-studentized boot-data alpha)
(ci-studentized {:keys [t0 ts data samples]} alpha estimation-strategy)

Confidence interval from studentized data.

:t0, :ts, :data and :samples are obligatory in boot-data
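The need for :data and :samples follows from the textbook studentized (bootstrap-t) construction, shown here for orientation. In this notation \hat\theta corresponds to :t0 and \hat\theta^*_b to the entries of :ts; the standard errors \widehat{se}^*_b and \widehat{se} are assumed to be estimated from the b-th entry of :samples and from :data respectively. How the library estimates them is not stated here, so treat this as a sketch.

```latex
t^*_b = \frac{\hat\theta^*_b - \hat\theta}{\widehat{se}^*_b}, \qquad
\mathrm{CI} = \left[\, \hat\theta - t^*_{(1-\alpha/2)}\,\widehat{se},\;\;
                       \hat\theta - t^*_{(\alpha/2)}\,\widehat{se} \,\right]
```

Here t^*_{(p)} denotes the p-quantile of the studentized values t^*_b.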

ci-t

(ci-t boot-data)
(ci-t {:keys [t0 ts stddev]} alpha)

Student’s T confidence interval.

:t0 and :ts are obligatory

jackknife

(jackknife vs)

Generates a set of samples using the leave-one-out jackknife method.
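Leave-one-out means that a dataset of n values yields n samples of size n - 1, each omitting a different element. A plain-Clojure sketch of the idea follows; `leave-one-out` is a hypothetical helper, not the library function, and the library may return a different collection type.

```clojure
;; Illustrative only: plain clojure.core, no fastmath calls.

(defn leave-one-out
  "Returns one sample per element of vs, each with that element removed."
  [vs]
  (let [v (vec vs)]
    (mapv (fn [i] (into (subvec v 0 i) (subvec v (inc i))))
          (range (count v)))))

(leave-one-out [1 2 3])
;; => [[2 3] [1 3] [1 2]]
```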

jackknife+

(jackknife+ vs)

Generates a set of samples using the positive jackknife method.
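The sketch below assumes the usual meaning of "positive jackknife": instead of deleting an observation, each sample repeats one observation once, so n values yield n samples of size n + 1. This reading is an assumption of this note, and `duplicate-one` is a hypothetical helper; the ordering and collection type produced by the library may differ.

```clojure
;; Illustrative only: plain clojure.core, no fastmath calls.

(defn duplicate-one
  "Returns one sample per element of vs, each equal to vs plus one extra copy
  of that element (assumed positive-jackknife behaviour)."
  [vs]
  (let [v (vec vs)]
    (mapv #(conj v %) v)))

(duplicate-one [1 2 3])
;; => [[1 2 3 1] [1 2 3 2] [1 2 3 3]]
```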