
14 Supervised Learning

Resources

Notes

  • Logistic Regression
  • Decision Trees
  • Random Forest
  • Gradient Boosting

Imbalanced Data | link


Example: a classification problem (A vs B) where the source data contains many more A cases than B cases, so the model may end up overfitting to A.

Accuracy Paradox wiki

A simple model may have a high level of accuracy but be too crude to be useful. For example, if the incidence of category A is dominant, being found in 99% of cases, then predicting that every case is category A will have an accuracy of 99%. Precision and recall are better measures in such cases.

The simple model, in this case, could be as simple as always outputting A.
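A minimal sketch of the accuracy paradox, assuming scikit-learn is available and using hypothetical labels (99 A cases coded as 0, one B case coded as 1). The always-A model reaches 99% accuracy, while the alternative metrics listed under "Change performance metric" show that it never finds B.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    cohen_kappa_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Hypothetical labels: 99 cases of class A (coded 0) and 1 case of class B (coded 1)
y_true = np.array([0] * 99 + [1])
# The "simple model": always output A
y_pred = np.zeros_like(y_true)

scores = {
    "accuracy": accuracy_score(y_true, y_pred),
    # the metrics below treat B (label 1) as the positive class
    "precision": precision_score(y_true, y_pred, zero_division=0),  # no B predicted at all
    "recall": recall_score(y_true, y_pred),                          # the single B is missed
    "f1": f1_score(y_true, y_pred, zero_division=0),
    "kappa": cohen_kappa_score(y_true, y_pred),                      # agreement beyond chance
}
print(confusion_matrix(y_true, y_pred))  # rows = true A/B, columns = predicted A/B
for name, value in scores.items():
    print(f"{name:>9}: {value:.2f}")
```

Accuracy comes out at 0.99, while recall, F1 and Cohen's kappa are all 0.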

  1. Collect more data: useful if possible!

  2. Change performance metric: as mentioned above, accuracy isn't ideal. Some alternatives:
  • Confusion Matrix
  • Precision and Recall
  • F1 Score (harmonic mean of precision and recall)
  • Kappa / Cohen's kappa
  • ROC Curves

  3. Resample dataset | wiki: artificially change the dataset so the model sees more balanced classes, either:

  • over-sample: add copies of instances from the under-represented class B
    • better when there is less data
  • under-sample: delete instances of the over-represented class A
    • better when there is more data
  • Advice
    • Test random and non-random (stratified) sampling schemes
    • Test different resampled ratios (don't only try 1:1 for a binary problem)
  4. Generate synthetic samples (SMOTE)
  • can sample empirically
  • use Naive Bayes
  • various systematic algorithms . . .
    • SMOTE (Synthetic Minority Over-sampling Technique)
    • see the scikit-learn-contrib package imbalanced-learn
  5. Change algorithm: always use cross-validation to try various models. The author suggests that decision trees often perform well on imbalanced data; suggestions: C4.5, C5.0, CART, Random Forest.

  6. Penalized classification: use a penalized version of a classification algorithm that imposes an additional cost for mistakes on the minority class. Trial and error with a variety of penalty schemes is often required.

  7. Change perspective / get creative: for example, rather than treating a particular problem as detecting rare events (anomaly detection), consider it as change detection. This could be useful for something like a security camera.

Break the problem down into smaller, more tractable problems, and get inspiration from other problems.
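A minimal sketch of the resampling, synthetic-sample and penalized-classification tactics, assuming NumPy and scikit-learn and a hypothetical toy dataset (950 A points vs 50 B points). `random_oversample` and `smote_like` are hypothetical helpers written for illustration: the second follows the core SMOTE idea of interpolating between a minority point and one of its k nearest minority neighbours, but it is not the imbalanced-learn implementation. Resampling is applied to the training split only, so the test split keeps the real class ratio.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)


def random_oversample(X, y, minority, rng):
    """Hypothetical helper: duplicate random minority rows until both classes are 1:1."""
    idx_min = np.flatnonzero(y == minority)
    n_needed = int(np.sum(y != minority)) - idx_min.size
    extra = rng.choice(idx_min, size=n_needed, replace=True)
    return np.vstack([X, X[extra]]), np.concatenate([y, y[extra]])


def smote_like(X_min, n_new, k, rng):
    """Hypothetical SMOTE-style helper: each synthetic point lies on the segment
    between a random minority point and one of its k nearest minority neighbours."""
    d2 = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)                     # a point is not its own neighbour
    neighbours = np.argsort(d2, axis=1)[:, :k]       # k nearest minority neighbours per row
    base = rng.integers(0, len(X_min), size=n_new)   # random starting points
    other = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                     # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[other] - X_min[base])


# Hypothetical imbalanced data: 950 A points (label 0) vs 50 B points (label 1)
X = np.vstack([rng.normal(0.0, 1.0, (950, 2)), rng.normal(1.5, 1.0, (50, 2))])
y = np.array([0] * 950 + [1] * 50)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, stratify=y, random_state=0)

# Resampling tactic: random over-sampling of the training split
X_os, y_os = random_oversample(X_tr, y_tr, minority=1, rng=rng)

# Synthetic-sample tactic: generate minority points until the training classes are 1:1
X_min = X_tr[y_tr == 1]
n_new = int(np.sum(y_tr == 0)) - len(X_min)
X_syn = smote_like(X_min, n_new, k=5, rng=rng)
X_sm = np.vstack([X_tr, X_syn])
y_sm = np.concatenate([y_tr, np.ones(n_new, dtype=int)])

models = {
    "baseline": LogisticRegression().fit(X_tr, y_tr),
    "random over-sample": LogisticRegression().fit(X_os, y_os),
    "SMOTE-style": LogisticRegression().fit(X_sm, y_sm),
    # Penalized tactic: weight minority-class mistakes more instead of resampling
    "class_weight=balanced": LogisticRegression(class_weight="balanced").fit(X_tr, y_tr),
}
recalls = {name: recall_score(y_te, m.predict(X_te)) for name, m in models.items()}
for name, r in recalls.items():
    print(f"{name:>22}: recall on B = {r:.2f}")
```

Comparing recall on B (rather than accuracy) across the four models follows the advice above about changing the performance metric.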


Basic Idea

Training vs Testing complexity
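The basic nearest-neighbor idea and its training/testing trade-off can be sketched as a brute-force classifier in NumPy. This is a minimal illustration (the class name `NearestNeighbor` is hypothetical, not from the lecture): `fit` only stores the N training examples, while `predict` compares every query against all N of them, roughly O(N * d) work per query for d features. Training is therefore cheap and testing is expensive.

```python
import numpy as np


class NearestNeighbor:
    """Hypothetical brute-force k-nearest-neighbor classifier (teaching sketch)."""

    def __init__(self, k=1):
        self.k = k

    def fit(self, X, y):
        # "Training" just memorizes the data: no computation beyond storing it
        self.X_train = np.asarray(X, dtype=float)
        self.y_train = np.asarray(y)
        return self

    def predict(self, X):
        preds = []
        for x in np.asarray(X, dtype=float):
            # Testing scans all N training points: O(N * d) per query
            dists = np.sqrt(((self.X_train - x) ** 2).sum(axis=1))  # L2 distance
            nearest = np.argsort(dists)[: self.k]
            # Majority vote among the k nearest labels (ties go to the smallest label)
            labels, counts = np.unique(self.y_train[nearest], return_counts=True)
            preds.append(labels[np.argmax(counts)])
        return np.array(preds)


X_train = [[0, 0], [0, 1], [5, 5], [6, 5]]
y_train = [0, 0, 1, 1]
clf = NearestNeighbor(k=1).fit(X_train, y_train)
print(clf.predict([[0.2, 0.1], [5.5, 5.2]]))  # [0 1]
```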

Bias, Variance, selection of k

One nearest neighbor (k = 1) is low bias, high variance: each new training point adds more boundaries, so the decision boundary follows the training data very closely.

As k increases, bias increases and variance decreases: the boundaries become smoother but may no longer fit the training data exactly.
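A quick way to see the bias/variance trade-off is to compare training and validation accuracy for several values of k. This sketch assumes scikit-learn and uses its `make_moons` toy generator as a hypothetical noisy dataset: with k = 1 every training point is its own nearest neighbor, so training accuracy is perfect (low bias), while larger k smooths the boundary and training accuracy drops.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical noisy two-class data
X, y = make_moons(n_samples=400, noise=0.35, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

results = {}
for k in (1, 5, 15, 51):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    # (training accuracy, validation accuracy)
    results[k] = (knn.score(X_tr, y_tr), knn.score(X_val, y_val))
    print(f"k={k:>2}  train acc={results[k][0]:.2f}  val acc={results[k][1]:.2f}")
```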

Optimizing k, distance function, voting parameters via Cross-Validation

Also known as hyperparameter tuning. 5-fold and 10-fold are typical choices for CV; the right number depends on the size of the dataset and the choice of classifier. For example, smaller datasets can't be split into as many folds. Make sure the test data stays untouched until the final evaluation.
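A minimal sketch of tuning k, the distance function and the voting scheme with 5-fold cross-validation, assuming scikit-learn and its bundled iris dataset as a stand-in example. In `KNeighborsClassifier`, `p=1` gives L1 (Manhattan) distance, `p=2` gives L2 (Euclidean) distance, and `weights` switches between equal votes and inverse-distance votes. The test split is held out and only scored once at the end.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
# Hold out a test set that is not touched until the final evaluation
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)

param_grid = {
    "n_neighbors": [1, 3, 5, 7, 9, 15],
    "p": [1, 2],                         # distance function: 1 = L1, 2 = L2
    "weights": ["uniform", "distance"],  # voting: equal votes vs. inverse-distance votes
}
# 5-fold CV runs on the training split only
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(X_tr, y_tr)

test_acc = search.score(X_te, y_te)  # single final evaluation
print("best params:", search.best_params_)
print("mean CV accuracy:", round(search.best_score_, 3))
print("final test accuracy:", round(test_acc, 3))
```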

Distance Calculations, Training Classifier

The choice of features is one of the most important things in classification. Self-driving car example: many different "detectors" provide many different features so the system can make the best decisions.

  • very basic image classifier: pixel-by-pixel distance
    • L1 (Manhattan)
    • L2 (Euclidean)
  • SIFT (Scale-Invariant Feature Transform)
    • rotation- and scale-invariant
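The pixel-by-pixel L1 and L2 distances can be sketched in NumPy, assuming images are same-shaped arrays (the two 2x2 "images" below are hypothetical). Casting to float first matters: subtracting `uint8` arrays directly would wrap around instead of going negative.

```python
import numpy as np


def l1_distance(img_a, img_b):
    # Sum of absolute pixel differences (Manhattan)
    return np.abs(img_a.astype(float) - img_b.astype(float)).sum()


def l2_distance(img_a, img_b):
    # Square root of the sum of squared pixel differences (Euclidean)
    return np.sqrt(((img_a.astype(float) - img_b.astype(float)) ** 2).sum())


# Two hypothetical 2x2 grayscale images
a = np.array([[10, 20], [30, 40]], dtype=np.uint8)
b = np.array([[12, 18], [30, 44]], dtype=np.uint8)
print(l1_distance(a, b))  # 2 + 2 + 0 + 4 = 8.0
print(l2_distance(a, b))  # sqrt(4 + 4 + 0 + 16) = sqrt(24), about 4.899
```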

Adding features may help training accuracy, but then we run into the curse of dimensionality: the space becomes so large that neighbors end up far apart. This motivates dimensionality reduction.
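The "neighbors are far apart" effect can be demonstrated numerically. This sketch assumes points drawn uniformly at random in the unit cube (a hypothetical setup) and measures, for one random query, the ratio between its nearest and farthest distances. As the dimension d grows, the ratio approaches 1, meaning the nearest neighbor is barely closer than the farthest point.

```python
import numpy as np

rng = np.random.default_rng(0)


def nearest_to_farthest_ratio(d, n=500):
    """Ratio of nearest to farthest Euclidean distance from one random query
    to n uniform random points in the d-dimensional unit cube."""
    points = rng.random((n, d))
    query = rng.random(d)
    dists = np.linalg.norm(points - query, axis=1)
    return dists.min() / dists.max()


ratios = {d: nearest_to_farthest_ratio(d) for d in (2, 10, 100, 1000)}
for d, r in ratios.items():
    print(f"d={d:>4}  nearest/farthest = {r:.2f}")
```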

Dimensionality Reduction

Idea: reduce the dimensionality of the vectors generated from pixels while preserving the distances between neighbors. *watch this part again as a review timestamp

Also useful for compression and visualization (PCA music example).

  • techniques
    • linear methods (why are they more commonly used?)
    • non-linear methods

PCA | previous notes

  • Post-Office Handwriting Recognition Example
  • Acoustic patterns in music: project idea to expand on my library and explore particular genre groupings
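A minimal PCA sketch with scikit-learn. As a stand-in for the post-office handwriting example, it uses scikit-learn's bundled `load_digits` dataset (8x8 images of handwritten digits; an assumption, not the dataset from the lecture). It shows both uses mentioned above: projecting to 2 dimensions for visualization, and keeping just enough components to retain 90% of the variance for compression (the 90% threshold is an arbitrary choice).

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, y = load_digits(return_X_y=True)   # 1797 images, 8x8 = 64 pixel features each

# Visualization: project onto the first 2 principal components
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)
print(X.shape, "->", X_2d.shape)      # (1797, 64) -> (1797, 2)
print("variance kept in 2-D:", round(pca.explained_variance_ratio_.sum(), 3))

# Compression: smallest number of components that keeps 90% of the variance
pca90 = PCA(n_components=0.90, svd_solver="full").fit(X)
print("components for 90% variance:", pca90.n_components_)
```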

Multi-Dimensional Scaling (MDS) | wiki | other

... notes ...
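One variant of MDS, classical (Torgerson) MDS, can be sketched in NumPy: double-center the squared distance matrix and keep the top eigenvectors, so that Euclidean distances in the low-dimensional embedding approximate the original pairwise distances. This is a minimal sketch assuming a Euclidean input distance matrix; scikit-learn's `sklearn.manifold.MDS` implements a different, iterative (SMACOF) variant.

```python
import numpy as np


def classical_mds(D, k=2):
    """Classical MDS: embed n points in k dimensions from an n x n distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)     # B is symmetric; eigenvalues ascending
    top = np.argsort(eigvals)[::-1][:k]      # indices of the k largest eigenvalues
    scale = np.sqrt(np.clip(eigvals[top], 0, None))
    return eigvecs[:, top] * scale           # n x k coordinates


# Hypothetical check: distances between 2-D points are recovered exactly in 2-D
rng = np.random.default_rng(0)
P = rng.random((6, 2))
D = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)
Y = classical_mds(D, k=2)
D_hat = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
print(np.allclose(D, D_hat))  # True
```

The embedding is only determined up to rotation, reflection and translation, which is why the check compares distances rather than coordinates.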

Metrics

  • Confusion Matrix for Classification
    • Precision, Recall, F1 Score
    • scikit-learn: classification_report
  • ROC curve for Logistic Regression
    • see/take statistics book notes
  • Hyperparameter tuning
    • many examples elsewhere (k for kNN, alpha for ridge/lasso regression)
    • Grid Search CV (GridSearchCV), guided capstone notes
    • Randomized Search CV (RandomizedSearchCV)
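A sketch tying the metrics list together, assuming scikit-learn (plus SciPy for `loguniform`) and using the bundled breast-cancer dataset as a hypothetical example. A scaled logistic regression is tuned over `C` (its inverse regularization strength) with both `GridSearchCV` and `RandomizedSearchCV`, then evaluated with a confusion matrix, `classification_report`, and the ROC curve/AUC on a held-out test set. The grid values and search range are illustrative choices.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score, roc_curve
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
# Keep a test split aside for the final evaluation only
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, stratify=y, random_state=42)

# Scaling + logistic regression; C is the inverse regularization strength
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Grid search: try every listed value of C with 5-fold CV
grid = GridSearchCV(pipe, {"logisticregression__C": [0.01, 0.1, 1, 10, 100]}, cv=5)
grid.fit(X_tr, y_tr)

# Randomized search: sample 10 values of C from a log-uniform range instead
rand = RandomizedSearchCV(
    pipe, {"logisticregression__C": loguniform(1e-3, 1e3)}, n_iter=10, cv=5, random_state=42
)
rand.fit(X_tr, y_tr)

# Evaluate the grid-search winner on the untouched test split
y_pred = grid.predict(X_te)
y_prob = grid.predict_proba(X_te)[:, 1]          # predicted probability of class 1
print(confusion_matrix(y_te, y_pred))
print(classification_report(y_te, y_pred))       # precision, recall, F1 per class
fpr, tpr, thresholds = roc_curve(y_te, y_prob)   # plot tpr against fpr for the ROC curve
auc = roc_auc_score(y_te, y_prob)
print(f"ROC AUC: {auc:.3f}")
print("grid best:", grid.best_params_, "| randomized best:", rand.best_params_)
```

Grid search is exhaustive over the listed values; randomized search trades completeness for a fixed budget (`n_iter`), which scales better when there are many hyperparameters.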

Preprocessing, Pipelines

Regularized Regression

Ridge

Lasso
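The regularized-regression headings can be illustrated with a short scikit-learn sketch, assuming the bundled diabetes dataset and an arbitrary `alpha=10`. Each model sits in a pipeline after `StandardScaler`, so the penalty treats all coefficients on the same scale. Ridge (L2 penalty) shrinks the coefficients toward zero, while Lasso (L1 penalty) can set some of them exactly to zero.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)   # 442 samples, 10 features

models = {
    "OLS": LinearRegression(),    # no penalty, for comparison
    "Ridge": Ridge(alpha=10.0),   # L2 penalty: shrinks all coefficients
    "Lasso": Lasso(alpha=10.0),   # L1 penalty: can zero out coefficients
}
coefs = {}
for name, model in models.items():
    # Preprocessing + model in one pipeline; scaling makes the penalty act on comparable coefficients
    pipe = make_pipeline(StandardScaler(), model).fit(X, y)
    coefs[name] = pipe[-1].coef_
    print(f"{name:>5}: L2 norm = {np.linalg.norm(coefs[name]):6.1f}, "
          f"zero coefficients = {int(np.sum(coefs[name] == 0))}")
```

In practice `alpha` would be chosen by cross-validation, as with k for kNN.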