Download: GitHub, MathWorks

Contents

The YAN-PRTools Matlab toolbox now includes 40 common pattern recognition algorithms:

Feature processing

  1. mat2ftvec : Transform sample matrices to a feature matrix
  2. zscore : Feature normalization (z-score)
  3. pca : PCA
  4. kpca : KPCA
  5. lda : LDA

Classification

  1. lr : Logistic regression
  2. softmax : Softmax
  3. svm : Wrapper of libsvm
  4. rf : Random forest
  5. knn : K nearest neighbors
  6. gauss : Wrapper of Matlab’s classify function, including naive Bayes, normal density fitting, Mahalanobis distance, etc.
  7. boost : AdaBoost with stump weak classifier
  8. tree : Wrapper of Matlab’s tree classifier
  9. ann : Wrapper of the artificial neural networks in Matlab
  10. elm : Basic extreme learning machine

Regression

  1. ridge : Ridge regression
  2. kridge : Kernel ridge regression
  3. svr : Wrapper of support vector regression in libsvm
  4. simplefit : Wrapper of Matlab’s basic fitting functions, including least squares, robust fitting, quadratic fitting, etc.
  5. lasso : Wrapper of Matlab’s lasso regression
  6. pls : Wrapper of Matlab’s partial least squares regression
  7. step : Wrapper of Matlab’s stepwisefit
  8. rf : Random forest
  9. ann : Wrapper of the artificial neural networks in Matlab
  10. elm : Basic extreme learning machine

Feature selection

  1. corr : Feature ranking based on correlation coefficients (filter method)
  2. fisher : Feature ranking using Fisher ratio (filter method)
  3. mrmr : Feature ranking using minimum redundancy maximal relevance (mRMR) (filter method)
  4. single : Feature ranking based on each single feature’s prediction accuracy (wrapper method)
  5. sfs : Feature selection using sequential forward selection (wrapper method)
  6. ga : Feature selection using the genetic algorithm in Matlab (wrapper method)
  7. rf : Feature ranking using random forest (embedded method)
  8. stepwisefit : Feature selection based on stepwise fitting (embedded method)
  9. boost : Feature selection using AdaBoost with the stump weak learner (embedded method)
  10. svmrfe_ori : Feature ranking using SVM-recursive feature elimination (SVM-RFE), the original linear version (embedded method)
  11. svmrfe_ker : Feature ranking using the kernel version of SVM-RFE (embedded method)

Representative sample selection (active learning)

  1. cluster : Sample selection based on cluster centers
  2. ted : Transductive experimental design
  3. llr : Locally linear reconstruction
  4. ks : Kennard-Stone algorithm


Interfaces

Feature processing

[Xnew, model] = ftProc_xxx_tr(X,Y,param) % training
Xnew = ftProc_xxx_te(model,X) % test
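
For example, fitting PCA on training data and applying it to test data could look like the following sketch (passing empty Y and param is an assumption: Y should be irrelevant for an unsupervised method like PCA, and an empty param should fall back to the defaults documented at the top of ftProc_pca_tr.m):

X = randn(100,10);                        % 100 training samples, 10 features
[Xnew, model] = ftProc_pca_tr(X, [], []); % fit PCA; empty Y (unsupervised), empty param = defaults
Xtest = randn(20,10);
XtestNew = ftProc_pca_te(model, Xtest);   % project test samples with the trained model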

Classification

model = classf_xxx_tr(X,Y,param) % training
[pred,prob] = classf_xxx_te(model,Xtest) % test, return the predicted labels and probabilities (optional)
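
A minimal sketch with the knn classifier (the param field name for the neighbor count is hypothetical; the actual name and its default are listed at the top of classf_knn_tr.m):

X = randn(100,5);                          % training features
Y = [ones(50,1); 2*ones(50,1)];            % two-class labels
param = struct('K', 5);                    % hypothetical field name for the neighbor count
model = classf_knn_tr(X, Y, param);        % train
[pred, prob] = classf_knn_te(model, randn(10,5)); % predicted labels and probabilities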

Regression

model = regress_xxx_tr(X,Y,param) % training
rv = regress_xxx_te(model,Xtest) % test, return the predicted values
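
For instance, ridge regression under the same pattern (an empty param is assumed to select the default ridge parameter):

X = randn(100,5);
Y = X*[1;2;0;0;-1] + 0.1*randn(100,1);     % synthetic linear target
model = regress_ridge_tr(X, Y, []);        % train with default settings
rv = regress_ridge_te(model, randn(10,5)); % predicted values for new samples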

Feature selection

[ftRank,ftScore] = ftSel_xxx(ft,target,param) % return the feature rank (or subset) and scores (optional)
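
For example, ranking features by their Fisher ratio (again assuming an empty param selects the defaults):

ft = randn(100,20);                               % 100 samples, 20 candidate features
target = [ones(50,1); 2*ones(50,1)];              % class labels
[ftRank, ftScore] = ftSel_fisher(ft, target, []); % ranked feature indices and their scores
top5 = ftRank(1:5);                               % keep the five highest-ranked features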

Representative sample selection (active learning)

smpList = smpSel_xxx(X,nSel,param) % return the indices of the selected samples
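
For example, picking 20 representative samples with the cluster-center method (an empty param is assumed to use the defaults):

X = randn(200,8);
smpList = smpSel_cluster(X, 20, []);  % indices of 20 representative samples
Xsel = X(smpList,:);                  % the selected subset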

Please see test.m for usage examples.

In addition, there are three unified wrappers: ftProc_, classf_, and regress_. They take an algorithm name string as input and combine the training and test phases; a sketch follows below.
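
As a rough sketch only, a wrapper call might look like the line below; the exact argument order is an assumption, so check ftProc_.m, classf_.m, regress_.m, and test.m for the real signatures:

% hypothetical signature: algorithm name string, training data, test data, param
[pred, prob] = classf('knn', Xtrain, Ytrain, Xtest, []);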


Characteristics

  • The training (tr) and test (te) phases are split for feature processing, classification, and regression to allow more flexible use; for example, a trained model can be applied to new data multiple times.
  • The struct “param” is used to pass parameters to algorithms (see the sketch after this list).
  • Default parameters are set clearly at the top of each code file, along with their explanations.
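
As a sketch of the param convention (the field name below is hypothetical; the real ones are listed at the top of classf_svm_tr.m), setting one field overrides that default while all others keep their documented values:

param = [];
param.kernel = 'rbf';                % hypothetical field: override a single default
model = classf_svm_tr(X, Y, param);  % unspecified fields keep their defaults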

In brief, I aimed for three main objectives when developing this toolbox:

  • A unified and simple interface;
  • Convenient observation and adjustment of algorithm parameters, avoiding tedious parameter setting and checking;
  • Extensibility: the simple file structure makes it easy to modify the algorithms.


Dependencies

In the toolbox, 20 algorithms are self-implemented, 11 are wrappers of or mainly based on Matlab functions, and 9 are wrappers of or mainly based on third-party toolboxes, which are listed below. The third-party toolboxes are included in the project; however, you may need to recompile some of them depending on your platform.

Thanks to the authors and MathWorks Inc.! I know that many important algorithms are still missing from the toolbox, so everyone is welcome to contribute new code! Also, if you find any bugs in the code, please don’t hesitate to let me know!

Ke YAN, 2016, Tsinghua Univ. http://yanke23.com, xjed09@gmail.com