fastmath.classification
Classification algorithms.
Input data
- features - sequence of sequences of numbers
- categories - sequence of any values
Workflow
- create a classifier with parameters
- cross validate with cv
- repeat, or predict
- to validate the model against test data, call validate
Classifier parameters are a map of values specific to the given algorithm; check the backend library's documentation for details. A classifier can be retrained using train; a new instance will be created.
Classifier training is delayed until actual use. To force training, call train or predict.
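The steps above can be sketched as follows. This is a hypothetical session: the namespace alias and the train/test split of the Iris data (train-data, train-labels, test-data, test-labels, as used in the examples below) are assumptions, not part of the library.

```clojure
(require '[fastmath.classification :as cl])

;; create a classifier with parameters (training is delayed)
(let [model (cl/knn {:k 3} train-data train-labels)]
  (cl/cv model)                        ; cross validate
  (cl/predict model [6.0 3.0 4.8 1.8]) ; predict a single vector
  (cl/validate model test-data test-labels)) ; validate against test data
```

Constructing the classifier only stores the configuration and data; the first call to cv, predict or validate triggers the actual training.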
Implementation notes
- only doubles as input data
- categories can be any type
Cross validation
Every classifier exposes its own cross validation method with its configuration, via cv.
Additionally, three Clojure-level methods are defined: cv, loocv and bootstrap.
SMILE
What is missing:
- attribute types other than doubles
- maxent
- Online classifiers
- General Naive Bayes
Native cross validation config is a map with keys:
- :k - number of folds (default: 10)
- :type - type of cross validation, one of :cv (default), :loocv and :bootstrap
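As a sketch, these keys are passed as the params map of cv (assuming model is a SMILE-backed classifier such as one created with knn):

```clojure
(cv model {:type :cv :k 5}) ; 5-fold cross validation
(cv model {:type :loocv})   ; leave-one-out
(cv model {:type :bootstrap}) ; bootstrap; accuracy reported as mean/stddev
```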
Liblinear
https://github.com/bwaldvogel/liblinear-java
Native cross validation expects a number as the number of folds (default: 10).
Examples
The Iris dataset is used.
Categories
- Classification: ada-boost decision-tree fld gradient-tree-boost knn lda liblinear logistic-regression naive-bayes neural-net qda random-forest rbf-network rda svm
- Validation: accuracy confusion-map validate
Other vars: activation-functions-list backend bayes-models-list cv data-native error-functions-list labels liblinear-solver-list model-native multiclass-strategies-list predict predict-all split-rules-list train
accuracy
(accuracy t p)
Calculate accuracy for true and predicted sequences.
Examples
Usage
(accuracy [1 2 3 4 1 2 3 4] [1 2 4 4 2 4 4 4])
;;=> 0.5
activation-functions-list
List of activation functions for neural-net.
Examples
Names
activation-functions-list
;;=> (:linear :logistic-sigmoid :soft-max)
ada-boost
(ada-boost x y)
(ada-boost {:keys [number-of-trees max-nodes], :or {number-of-trees 500, max-nodes 2}} x y)
ada-boost classifier. Backend library: smile
Examples
Usage
(let [cl (ada-boost train-data train-labels)]
(select-keys (validate cl test-data test-labels) [:invalid :stats]))
;;=> {:invalid {:count 1,
;;=> :data ([6 3 4.8 1.8]),
;;=> :prediction (:versicolor),
;;=> :truth (:virginica)},
;;=> :stats {:accuracy 0.9777777777777777}}
backend
(backend model)
Return the name of the backend library.
Examples
Usage
(backend (knn test-data test-labels))
;;=> :smile
(backend (liblinear test-data test-labels))
;;=> :liblinear
bayes-models-list
List of naive-bayes models.
Examples
Names
bayes-models-list
;;=> (:multinomial :bernoulli :polyaurn)
confusion-map
(confusion-map t p)
Create a confusion map where keys are pairs of [truth-label prediction-label].
Examples
Usage
(let [cl (liblinear {:solver :l1r-lr, :C 10} train-data train-labels)
pred (predict-all cl test-data)]
(confusion-map test-labels pred))
;;=> {[:setosa :setosa] 13,
;;=> [:versicolor :versicolor] 17,
;;=> [:virginica :virginica] 15}
cv
(cv model)
(cv model params)
Cross validation
Examples
Usage
(cv (knn train-data train-labels))
;;=> {:accuracy 0.9333333333333333}
(cv (knn test-data test-labels) {:type :loocv})
;;=> {:accuracy 1.0}
(cv (knn test-data test-labels) {:type :bootstrap})
;;=> {:accuracy {:mean 0.9933333333333333, :stddev 0.02108185106778919}}
(cv (liblinear test-data test-labels))
;;=> {:accuracy 0.9555555555555556}
data-native
(data-native model)
Return data transformed for the backend library.
Examples
Usage
(data-native (knn [[1 2] [3 2]] [0 1]))
;;=> [[[1.0 2.0] [3.0 2.0]] [0 1]]
decision-tree
(decision-tree x y)
(decision-tree {:keys [max-nodes node-size split-rule], :or {max-nodes 100, node-size 1, split-rule :gini}} x y)
decision-tree classifier. Backend library: smile
Examples
Usage
(let [cl (decision-tree train-data train-labels)]
(select-keys (validate cl test-data test-labels) [:invalid :stats]))
;;=> {:invalid {:count 1,
;;=> :data ([6 3 4.8 1.8]),
;;=> :prediction (:versicolor),
;;=> :truth (:virginica)},
;;=> :stats {:accuracy 0.9777777777777777}}
error-functions-list
List of error functions for neural-net.
Examples
Names
error-functions-list
;;=> (:least-mean-squares :cross-entropy)
fld
(fld x y)
(fld {:keys [dimensionality tolerance], :or {dimensionality -1, tolerance 1.0E-4}} x y)
fld classifier. Backend library: smile
Examples
Usage
(let [cl (fld train-data train-labels)]
(select-keys (validate cl test-data test-labels) [:invalid :stats]))
;;=> {:invalid {:count 2,
;;=> :data ([6 3 4.8 1.8] [6.1 3 4.9 1.8]),
;;=> :prediction (:versicolor :versicolor),
;;=> :truth (:virginica :virginica)},
;;=> :stats {:accuracy 0.9555555555555556}}
gradient-tree-boost
(gradient-tree-boost x y)
(gradient-tree-boost {:keys [number-of-trees shrinkage max-nodes subsample], :or {number-of-trees 500, shrinkage 0.005, max-nodes 6, subsample 0.7}} x y)
gradient-tree-boost classifier. Backend library: smile
Examples
Usage
(let [cl (gradient-tree-boost train-data train-labels)]
(select-keys (validate cl test-data test-labels) [:invalid :stats]))
;;=> {:invalid {:count 1,
;;=> :data ([6 3 4.8 1.8]),
;;=> :prediction (:versicolor),
;;=> :truth (:virginica)},
;;=> :stats {:accuracy 0.9777777777777777}}
knn
(knn x y)
(knn {:keys [distance k], :or {distance (EuclideanDistance.), k 1}} x y)
knn classifier. Backend library: smile
Examples
Usage
(let [cl (knn train-data train-labels)]
(select-keys (validate cl test-data test-labels) [:invalid :stats]))
;;=> {:invalid {:count 1,
;;=> :data ([6 3 4.8 1.8]),
;;=> :prediction (:versicolor),
;;=> :truth (:virginica)},
;;=> :stats {:accuracy 0.9777777777777777}}
Different distance
(let [cl (knn {:distance dist/cosine} train-data train-labels)]
(select-keys (validate cl test-data test-labels) [:invalid :stats]))
;;=> {:invalid {:count 3,
;;=> :data ([6.3 2.9 5.6 1.8] [5.5 2.6 4.4 1.2] [7.2 3.2 6 1.8]),
;;=> :prediction (:versicolor :virginica :versicolor),
;;=> :truth (:virginica :versicolor :virginica)},
;;=> :stats {:accuracy 0.9333333333333333}}
labels
(labels ys)
Return labels
Examples
Usage
(labels train-labels)
;;=> [:setosa :versicolor :virginica]
lda
(lda x y)
(lda {:keys [priori tolerance], :or {priori nil, tolerance 1.0E-4}} x y)
lda classifier. Backend library: smile
Examples
Usage
(let [cl (lda train-data train-labels)]
(select-keys (validate cl test-data test-labels) [:invalid :stats]))
;;=> {:invalid {:count 3,
;;=> :data ([5.7 2.5 5 2] [6.3 2.5 5 1.9] [6.4 3.2 4.5 1.5]),
;;=> :prediction (:versicolor :versicolor :virginica),
;;=> :truth (:virginica :virginica :versicolor)},
;;=> :stats {:accuracy 0.9333333333333333}}
liblinear
(liblinear x y)
(liblinear {:keys [solver bias C eps max-iters p weights], :or {solver :l2r-l2loss-svc-dual, bias -1, C 1.0, eps 0.01, max-iters 1000, p 0.1}} x y)
liblinear classifier. Backend library: liblinear
Examples
Usage
(let [cl (liblinear train-data train-labels)]
(select-keys (validate cl test-data test-labels) [:invalid :stats]))
;;=> {:invalid {:count 1,
;;=> :data ([5.6 3 4.5 1.5]),
;;=> :prediction (:virginica),
;;=> :truth (:versicolor)},
;;=> :stats {:accuracy 0.9777777777777777}}
Different solver
(let [cl (liblinear {:solver :l1r-lr, :C 10} train-data train-labels)]
(select-keys (validate cl test-data test-labels) [:invalid :stats]))
;;=> {:invalid {:count 0, :data (), :prediction (), :truth ()},
;;=> :stats {:accuracy 1.0}}
liblinear-solver-list
List of liblinear solvers.
Examples
Names
liblinear-solver-list
;;=> (:l2r-lr :l2r-l2loss-svc-dual
;;=> :l2r-l2loss-svc :l2r-l1loss-svc-dual
;;=> :mcsvm-cs :l1r-l2loss-svc
;;=> :l1r-lr :l2r-lr-dual)
logistic-regression
(logistic-regression x y)
(logistic-regression {:keys [lambda tolerance max-iterations], :or {lambda 0.0, tolerance 1.0E-5, max-iterations 500}} x y)
logistic-regression classifier. Backend library: smile
Examples
Usage
(let [cl (logistic-regression train-data train-labels)]
(select-keys (validate cl test-data test-labels) [:invalid :stats]))
;;=> {:invalid {:count 1,
;;=> :data ([6 3 4.8 1.8]),
;;=> :prediction (:versicolor),
;;=> :truth (:virginica)},
;;=> :stats {:accuracy 0.9777777777777777}}
model-native
(model-native model)
Return trained model as a backend class.
Examples
Usage
(model-native (knn train-data train-labels))
;;=> smile.classification.KNN@36f46335
(model-native (liblinear train-data train-labels))
;;=> Model bias=-1.0 nr_class=3 nr_feature=4 solverType=L2R_L2LOSS_SVC_DUAL
multiclass-strategies-list
List of multiclass strategies for svm
Examples
Names
multiclass-strategies-list
;;=> (:one-vs-one :one-vs-all)
naive-bayes
(naive-bayes x y)
(naive-bayes {:keys [model priori sigma], :or {model :bernoulli, sigma 1.0}} x y)
naive-bayes classifier. Backend library: smile
Examples
Usage
(let [cl (naive-bayes train-data train-labels)]
(select-keys (validate cl test-data test-labels) [:invalid :stats]))
;;=> {:invalid {:count 32,
;;=> :data ([6.3 2.9 5.6 1.8]
;;=> [6 3 4.8 1.8]
;;=> [6.6 2.9 4.6 1.3]
;;=> [6.9 3.1 5.1 2.3]
;;=> [5.8 2.6 4 1.2]
;;=> [5.8 2.7 3.9 1.2]
;;=> [5.6 2.8 4.9 2]
;;=> [5.9 3 5.1 1.8]
;;=> [5.7 2.5 5 2]
;;=> [6.1 3 4.9 1.8]
;;=> [6.5 3 5.5 1.8]
;;=> [5.6 2.7 4.2 1.3]
;;=> [5.7 2.8 4.1 1.3]
;;=> [6.1 2.9 4.7 1.4]
;;=> [6 2.2 4 1]
;;=> [5.5 2.4 3.8 1.1]
;;=> [5.5 2.6 4.4 1.2]
;;=> [6.1 2.8 4 1.3]
;;=> [6.2 3.4 5.4 2.3]
;;=> [6.3 3.3 6 2.5]
;;=> [5.1 2.5 3 1.1]
;;=> [6.8 3.2 5.9 2.3]
;;=> [5 2.3 3.3 1]
;;=> [7.2 3.2 6 1.8]
;;=> [6.1 2.8 4.7 1.2]
;;=> [5.6 3 4.5 1.5]
;;=> [6.3 2.5 5 1.9]
;;=> [5.8 2.7 5.1 1.9]
;;=> [6.7 3 5.2 2.3]
;;=> [6.4 3.2 4.5 1.5]
;;=> [5.7 2.8 4.5 1.3]
;;=> [6.8 2.8 4.8 1.4]),
;;=> :prediction (:setosa
;;=> :setosa
;;=> :setosa :setosa
;;=> :setosa :setosa
;;=> :setosa :setosa
;;=> :setosa :setosa
;;=> :setosa :setosa
;;=> :setosa :setosa
;;=> :setosa :setosa
;;=> :setosa :setosa
;;=> :setosa :setosa
;;=> :setosa :setosa
;;=> :setosa :setosa
;;=> :setosa :setosa
;;=> :setosa :setosa
;;=> :setosa :setosa
;;=> :setosa :setosa),
;;=> :truth (:virginica
;;=> :virginica
;;=> :versicolor :virginica
;;=> :versicolor :versicolor
;;=> :virginica :virginica
;;=> :virginica :virginica
;;=> :virginica :versicolor
;;=> :versicolor :versicolor
;;=> :versicolor :versicolor
;;=> :versicolor :versicolor
;;=> :virginica :virginica
;;=> :versicolor :virginica
;;=> :versicolor :virginica
;;=> :versicolor :versicolor
;;=> :virginica :virginica
;;=> :virginica :versicolor
;;=> :versicolor :versicolor)},
;;=> :stats {:accuracy 0.28888888888888886}}
neural-net
(neural-net x y)
(neural-net {:keys [error-function activation-function layers learning-rate momentum weight-decay number-of-epochs], :or {error-function :cross-entropy, learning-rate 0.1, momentum 0.0, weight-decay 0.0, number-of-epochs 25}} x y)
neural-net classifier. Backend library: smile
Examples
Usage
(let [cl (neural-net {:layers [100 100]} train-data train-labels)]
(select-keys (validate cl test-data test-labels) [:invalid :stats]))
;;=> {:invalid {:count 0, :data (), :prediction (), :truth ()},
;;=> :stats {:accuracy 1.0}}
Bad model
(let [cl (neural-net {:layers [3 3 3], :number-of-epochs 3}
train-data
train-labels)]
(select-keys (validate cl test-data test-labels) [:invalid :stats]))
;;=> {:invalid {:count 30,
;;=> :data ([5.2 3.4 1.4 0.2]
;;=> [4.8 3.4 1.6 0.2]
;;=> [5.4 3.9 1.7 0.4]
;;=> [6.6 2.9 4.6 1.3]
;;=> [4.7 3.2 1.3 0.2]
;;=> [5.8 2.6 4 1.2]
;;=> [5.8 2.7 3.9 1.2]
;;=> [4.4 3 1.3 0.2]
;;=> [4.9 3.1 1.5 0.1]
;;=> [5.6 2.7 4.2 1.3]
;;=> [5.7 2.8 4.1 1.3]
;;=> [6.1 2.9 4.7 1.4]
;;=> [6 2.2 4 1]
;;=> [5.5 2.4 3.8 1.1]
;;=> [5.5 2.6 4.4 1.2]
;;=> [5.7 3.8 1.7 0.3]
;;=> [5 3.4 1.6 0.4]
;;=> [6.1 2.8 4 1.3]
;;=> [4.8 3.1 1.6 0.2]
;;=> [5.1 2.5 3 1.1]
;;=> [4.6 3.1 1.5 0.2]
;;=> [5 2.3 3.3 1]
;;=> [6.1 2.8 4.7 1.2]
;;=> [5.6 3 4.5 1.5]
;;=> [5 3.4 1.5 0.2]
;;=> [5.5 3.5 1.3 0.2]
;;=> [6.4 3.2 4.5 1.5]
;;=> [5.7 2.8 4.5 1.3]
;;=> [4.4 2.9 1.4 0.2]
;;=> [6.8 2.8 4.8 1.4]),
;;=> :prediction (:virginica
;;=> :virginica
;;=> :virginica :virginica
;;=> :virginica :virginica
;;=> :virginica :virginica
;;=> :virginica :virginica
;;=> :virginica :virginica
;;=> :virginica :virginica
;;=> :virginica :virginica
;;=> :virginica :virginica
;;=> :virginica :virginica
;;=> :virginica :virginica
;;=> :virginica :virginica
;;=> :virginica :virginica
;;=> :virginica :virginica
;;=> :virginica :virginica),
;;=> :truth (:setosa
;;=> :setosa
;;=> :setosa :versicolor
;;=> :setosa :versicolor
;;=> :versicolor :setosa
;;=> :setosa :versicolor
;;=> :versicolor :versicolor
;;=> :versicolor :versicolor
;;=> :versicolor :setosa
;;=> :setosa :versicolor
;;=> :setosa :versicolor
;;=> :setosa :versicolor
;;=> :versicolor :versicolor
;;=> :setosa :setosa
;;=> :versicolor :versicolor
;;=> :setosa :versicolor)},
;;=> :stats {:accuracy 0.33333333333333337}}
predict
(predict model v)
(predict model v posteriori?)
Predict a category for the given vector. If posteriori? is true, also return posterior probabilities (default: false).
Examples
Usage
(predict (knn train-data train-labels) [1 2 3 4])
;;=> :virginica
(predict (ada-boost train-data train-labels) [1 2 3 4] true)
;;=> [:virginica
;;=> {:setosa 0.06483835192928034,
;;=> :versicolor 0.42084587807036844,
;;=> :virginica 0.5143157700003511}]
predict-all
(predict-all model v)
(predict-all model v posteriori?)
Predict categories for the given sequence of vectors. If posteriori? is true, also return posterior probabilities (default: false).
Examples
Usage
(predict-all (knn train-data train-labels) (take 3 test-data))
;;=> (:virginica :setosa :setosa)
(predict-all (ada-boost train-data train-labels)
(take 3 test-data)
true)
;;=> ([:virginica
;;=> {:setosa 0.0,
;;=> :versicolor 0.4029298073416867,
;;=> :virginica 0.5970701926583133}]
;;=> [:setosa
;;=> {:setosa 0.5281866471546287,
;;=> :versicolor 0.3299864055545864,
;;=> :virginica 0.141826947290785}]
;;=> [:setosa
;;=> {:setosa 0.5302410437446485,
;;=> :versicolor 0.28859523953387023,
;;=> :virginica 0.18116371672148113}])
qda
(qda x y)
(qda {:keys [priori tolerance], :or {priori nil, tolerance 1.0E-4}} x y)
qda classifier. Backend library: smile
Examples
Usage
(let [cl (qda train-data train-labels)]
(select-keys (validate cl test-data test-labels) [:invalid :stats]))
;;=> {:invalid {:count 0, :data (), :prediction (), :truth ()},
;;=> :stats {:accuracy 1.0}}
random-forest
(random-forest x y)
(random-forest {:keys [number-of-trees split-rule mtry node-size max-nodes subsample], :or {number-of-trees 500, split-rule :gini, node-size 1, max-nodes 100, subsample 1.0}} x y)
random-forest classifier. Backend library: smile
Examples
Usage
(let [cl (random-forest train-data train-labels)]
(select-keys (validate cl test-data test-labels) [:invalid :stats]))
;;=> {:invalid {:count 1,
;;=> :data ([6 3 4.8 1.8]),
;;=> :prediction (:versicolor),
;;=> :truth (:virginica)},
;;=> :stats {:accuracy 0.9777777777777777}}
rbf-network
(rbf-network x y)
(rbf-network {:keys [distance rbf number-of-basis normalize?], :or {distance dist/euclidean, number-of-basis 10, normalize? false}} x y)
rbf-network classifier. Backend library: smile
Examples
Usage
(let [cl (rbf-network train-data train-labels)]
(select-keys (validate cl test-data test-labels) [:invalid :stats]))
;;=> {:invalid {:count 1,
;;=> :data ([6 3 4.8 1.8]),
;;=> :prediction (:versicolor),
;;=> :truth (:virginica)},
;;=> :stats {:accuracy 0.9777777777777777}}
Custom rbfs
(let [cl (rbf-network
{:rbf (take 5 (cycle [(k/rbf :linear) (k/rbf :wendland-20)]))}
train-data
train-labels)]
(select-keys (validate cl test-data test-labels) [:invalid :stats]))
;;=> {:invalid {:count 7,
;;=> :data ([6 3 4.8 1.8]
;;=> [5.6 2.8 4.9 2]
;;=> [5.9 3 5.1 1.8]
;;=> [5.7 2.5 5 2]
;;=> [6.1 3 4.9 1.8]
;;=> [5.8 2.7 5.1 1.9]
;;=> [6.8 2.8 4.8 1.4]),
;;=> :prediction (:versicolor :versicolor :versicolor
;;=> :versicolor :versicolor
;;=> :versicolor :virginica),
;;=> :truth (:virginica :virginica :virginica
;;=> :virginica :virginica
;;=> :virginica :versicolor)},
;;=> :stats {:accuracy 0.8444444444444444}}
rda
(rda x y)
(rda {:keys [alpha priori tolerance], :or {alpha 0.9, tolerance 1.0E-4}} x y)
rda classifier. Backend library: smile
Examples
Usage
(let [cl (rda train-data train-labels)]
(select-keys (validate cl test-data test-labels) [:invalid :stats]))
;;=> {:invalid {:count 2,
;;=> :data ([5.9 3 5.1 1.8] [6.1 2.8 4.7 1.2]),
;;=> :prediction (:versicolor :virginica),
;;=> :truth (:virginica :versicolor)},
;;=> :stats {:accuracy 0.9555555555555556}}
split-rules-list
List of split rules for decision tree and random-forest
Examples
Names
split-rules-list
;;=> (:gini :entropy :classification-error)
svm
(svm x y)
(svm {:keys [kernel c-or-cp cn strategy-for-multiclass class-weights tolerance epochs], :or {kernel (k/kernel :linear), c-or-cp 1.0, cn 1.0, strategy-for-multiclass :one-vs-one, tolerance 0.001, epochs 2}} x y)
svm classifier. Backend library: smile
Examples
Usage
(let [cl (svm train-data train-labels)]
(select-keys (validate cl test-data test-labels) [:invalid :stats]))
;;=> {:invalid {:count 1,
;;=> :data ([6 3 4.8 1.8]),
;;=> :prediction (:versicolor),
;;=> :truth (:virginica)},
;;=> :stats {:accuracy 0.9777777777777777}}
Different kernel
(let [cl (svm {:kernel (k/kernel :gaussian 0.15), :epochs 10}
train-data
train-labels)]
(select-keys (validate cl test-data test-labels) [:invalid :stats]))
;;=> {:invalid
;;=> {:count 5,
;;=> :data ([6 3 4.8 1.8]
;;=> [6 2.2 4 1]
;;=> [5.5 2.6 4.4 1.2]
;;=> [5.7 3.8 1.7 0.3]
;;=> [5.1 2.5 3 1.1]),
;;=> :prediction (:versicolor :virginica :virginica :virginica :virginica),
;;=> :truth (:virginica :versicolor :versicolor :setosa :versicolor)},
;;=> :stats {:accuracy 0.8888888888888888}}
train
(train model)
(train model xs ys)
Train the given classifier on another data set, or force training on the already provided data.
Examples
Train new data
(train (knn train-data train-labels))
;;=> fastmath.classification$eval16410$fn$reify__16425@38fbf42c
validate
(validate model tx ty)
Validate data against the trained classifier. Same as test.
Examples
Use data provided during training
(validate (qda train-data train-labels) test-data test-labels)
;;=> {:invalid {:count 0, :data (), :prediction (), :truth ()},
;;=> :prediction (:virginica
;;=> :setosa :setosa
;;=> :virginica :setosa
;;=> :versicolor :setosa
;;=> :virginica :versicolor
;;=> :versicolor :setosa
;;=> :virginica :setosa
;;=> :virginica :virginica
;;=> :virginica :virginica
;;=> :versicolor :versicolor
;;=> :versicolor :versicolor
;;=> :versicolor :versicolor
;;=> :setosa :setosa
;;=> :versicolor :virginica
;;=> :setosa :virginica
;;=> :versicolor :setosa
;;=> :virginica :versicolor
;;=> :virginica :versicolor
;;=> :versicolor :virginica
;;=> :setosa :virginica
;;=> :virginica :setosa
;;=> :versicolor :versicolor
;;=> :setosa :versicolor),
;;=> :stats {:accuracy 1.0}}
Test other data
(validate (qda train-data train-labels)
[[1 2 3 4] [6 4 3 2]]
[:virginica :setosa])
;;=> {:invalid {:count 1,
;;=> :data ([6 4 3 2]),
;;=> :prediction (:virginica),
;;=> :truth (:setosa)},
;;=> :prediction (:virginica :virginica),
;;=> :stats {:accuracy 0.5}}