14 Supervised Learning

Resources

Videos
- Classification, kNN, Cross-Validation, Dimensionality Reduction
  - part 2 timestamp
  - Harvard CS109 lecture ~72 mins
- Introduction to Machine Learning
  - MIT OCW, Introduction to Computational Thinking and Data Science
- Bias and Regression
  - Harvard CS109 lecture, find different video? parts of presentation / boardwork not shown
- Decision Trees, Random Forests
  - Harvard CS109 lecture
- Using Random Forests
  - lecture from PyData conference, only first 20 mins part of assignment
- Ensemble Methods - Bagging and Boosting
  - Harvard CS109 lecture
- Support Vector Machines 1/3
  - StatQuest video, illustrative
- SVM Performance Evaluation, PR ROC
  - Harvard CS109 lecture
- Short SVM with Polynomial Kernel Visualization
  - Consider this a visual demonstration of the kernel trick in SVM
- Time Series Analysis
  - intro lecture from Jordan Kern

Notes

Logistic Regression
Decision Trees
Random Forest
Gradient Boosting

Imbalanced Data | link

ex: Classification problem (A vs B) where the source data contains many more A cases, so the model may end up biased toward predicting A.

Accuracy Paradox wiki

a simple model may have a high level of accuracy but be too crude to be useful. For example, if the incidence of category A is dominant, being found in 99% of cases, then predicting that every case is category A will have an accuracy of 99%. Precision and recall are better measures in such cases.

simple model, in this case, could be as simple as always outputting A
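The paradox is easy to reproduce with a few lines of plain Python. This is a minimal sketch on made-up labels (990 A cases and 10 B cases are an assumption chosen to match the 99% example), not a real dataset.

```python
# Hypothetical toy labels: 990 cases of class "A", 10 of class "B".
y_true = ["A"] * 990 + ["B"] * 10

# The "simple model": always predict the majority class.
y_pred = ["A"] * len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall for the minority class B: of all true B cases, how many were found?
true_b = sum(t == "B" for t in y_true)
found_b = sum(t == "B" and p == "B" for t, p in zip(y_true, y_pred))
recall_b = found_b / true_b

print(f"accuracy = {accuracy:.2f}")      # accuracy = 0.99
print(f"recall for B = {recall_b:.2f}")  # recall for B = 0.00
```

The accuracy looks excellent while the model never finds a single B case, which is why recall (and precision) tell the more useful story here.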

Collect more data: Useful if possible!
Change performance metric: As mentioned above, accuracy isn't ideal. Some ideas mentioned:
- Confusion Matrix
- Precision and Recall
- F1 Score (harmonic mean of the above)
- Kappa / Cohen's kappa
- ROC Curves
Resample dataset | wiki: Artificially change the dataset to impart more balance on the model, either:

over-sample: add copies of the under-represented class B
- better with less data
under-sample: delete instances of over-represented class A
- better with more data
Advice
- Test random and non-random/stratified sampling schemes
- Test different resampled ratios (don't only try 1:1 for binary problem)
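Both resampling schemes can be sketched with the standard library alone. The helper names `random_oversample` and `random_undersample` and the 90/10 toy data are assumptions made for illustration; this shows plain random resampling only, not a stratified scheme.

```python
import random

def random_oversample(minority, target_size, rng):
    """Add randomly chosen copies of minority rows until target_size is reached."""
    extra = [rng.choice(minority) for _ in range(target_size - len(minority))]
    return minority + extra

def random_undersample(majority, target_size, rng):
    """Keep a random subset of majority rows (sampling without replacement)."""
    return rng.sample(majority, target_size)

rng = random.Random(0)
class_a = [("A", i) for i in range(90)]  # over-represented class
class_b = [("B", i) for i in range(10)]  # under-represented class

# 1:1 ratio by over-sampling B; 2:1 ratio by under-sampling A
over = class_a + random_oversample(class_b, len(class_a), rng)
under = random_undersample(class_a, 2 * len(class_b), rng) + class_b

print(len(over), len(under))  # 180 30
```

Changing the `target_size` argument is one way to try the different resampled ratios suggested in the advice above.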

Generate Synthetic Samples, SMOTE

can sample empirically
utilize Naive Bayes
various systematic algorithms . . .
- SMOTE, synthetic minority over-sampling technique
  - original 2002 publication
  - Generates synthetic samples by interpolating between a minority-class instance and one of its nearest minority-class neighbors, rather than copying instances.
- see scikit-learn-contrib package, imbalanced-learn
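The interpolation idea behind SMOTE can be sketched as follows. This is a simplified illustration, not the imbalanced-learn implementation: the function name `smote_sketch`, the choice of `k=3`, and the five toy points are all assumptions.

```python
import math
import random

def smote_sketch(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points, each placed at a random position on the
    segment between a minority point and one of its k nearest minority neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbors of `base`, excluding `base` itself
        others = [p for p in minority if p is not base]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # fraction of the way from base to neighbor
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic

# Hypothetical minority-class points in two dimensions
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3), (0.9, 0.8)]
new_points = smote_sketch(minority, n_new=4)
print(len(new_points))  # 4
```

Because each synthetic point lies between two real minority points, the new samples stay inside the region the minority class already occupies. For real work, the imbalanced-learn package mentioned above is likely the better starting point.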

Change algorithm: Always use cross-validation to try various models. The author suggests decision trees often perform well with imbalanced data. Suggestions: C4.5, C5.0, CART, Random Forest
Penalized Classification: Use a penalized version of a classification algorithm that imposes an additional cost for mistakes on the minority class. Trial and error with a variety of penalty schemes is often required.
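One possible penalty scheme is to weight each class inversely to its frequency and scale the loss accordingly. The sketch below assumes that scheme (the same "balanced" heuristic scikit-learn uses for `class_weight="balanced"`); the helper names and the 9:1 toy labels are hypothetical.

```python
import math
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class inversely to its frequency:
    n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

def weighted_log_loss(y_true, p_b, weights):
    """Mean log loss with each example's term scaled by its class weight.
    p_b holds the predicted probability of class "B" for each example."""
    total = 0.0
    for y, p in zip(y_true, p_b):
        p_correct = p if y == "B" else 1.0 - p
        total += -weights[y] * math.log(p_correct)
    return total / len(y_true)

y_true = ["A"] * 9 + ["B"]
weights = balanced_class_weights(y_true)  # A gets 10/18, B gets 5.0

# A model that gives every case only a 10% chance of being B
p_b = [0.1] * 10
plain = weighted_log_loss(y_true, p_b, {"A": 1.0, "B": 1.0})
penalized = weighted_log_loss(y_true, p_b, weights)
print(plain < penalized)  # True: the missed B case now costs more
```

Other penalty schemes just swap in a different `weights` dictionary, which is where the trial and error comes in.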
Change Perspective / Get Creative: For a particular problem, instead of framing it as detecting rare events (Anomaly Detection), consider framing it as Change Detection. This could be useful for something like a security camera.

Break down problem into more tractable, smaller problems. Get inspiration from other problems.

kNN | Harvard CS109 lecture 09

Basic Idea

Training vs Testing complexity

Bias, Variance, selection of k

One nearest neighbor is low bias, high variance. With each new training point, more boundaries are added.

As the number of neighbors k increases, bias is introduced and variance decreases. Boundaries become smoother, but may no longer fit every training point exactly.
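A small from-scratch kNN makes the trade-off concrete. The data below is invented for illustration: a cluster of A points, a cluster of B points, and one noisy B label sitting inside the A cluster.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """train: list of (point, label). Majority vote among the k nearest points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train = [((0.0, 0.0), "A"), ((0.2, 0.1), "A"), ((0.1, 0.3), "A"), ((0.3, 0.2), "A"),
         ((2.0, 2.0), "B"), ((2.1, 1.8), "B"), ((1.9, 2.2), "B"),
         ((0.15, 0.15), "B")]  # the noisy point inside the A cluster

query = (0.16, 0.14)
print(knn_predict(train, query, k=1))  # B: follows the single noisy neighbor
print(knn_predict(train, query, k=5))  # A: the vote smooths the noise away
```

With k=1 the noisy point carves out its own small region (high variance); with k=5 that region disappears, at the cost of a boundary that no longer matches every training label.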

Optimizing k, distance function, voting parameters via Cross-Validation

AKA hyperparameter evaluation. 5- and 10-fold are typical choices for CV; the right number depends on the size of the dataset and the choice of classifier. For example, smaller datasets can't be split into as many folds. Make sure the test data is untouched until the final evaluation.
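The procedure can be sketched end to end in plain Python: hold out a test set, score each candidate k with 5-fold CV on the remaining data, and touch the test set only once at the end. The two Gaussian blobs, the candidate values of k, and the helper `cross_val_accuracy` are assumptions for illustration; only k is tuned here, not the distance function or voting scheme.

```python
import math
import random
from collections import Counter

def knn_predict(train, query, k):
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def cross_val_accuracy(data, k, n_folds=5):
    """Mean validation accuracy of kNN across n_folds folds."""
    scores = []
    for fold in range(n_folds):
        val = data[fold::n_folds]  # rows whose index % n_folds == fold
        train = [row for i, row in enumerate(data) if i % n_folds != fold]
        correct = sum(knn_predict(train, x, k) == y for x, y in val)
        scores.append(correct / len(val))
    return sum(scores) / n_folds

# Hypothetical data: two overlapping Gaussian blobs
rng = random.Random(0)
data = [((rng.gauss(0, 1), rng.gauss(0, 1)), "A") for _ in range(60)]
data += [((rng.gauss(2, 1), rng.gauss(2, 1)), "B") for _ in range(60)]
rng.shuffle(data)

# Hold out the test set first; it plays no part in choosing k.
test, dev = data[:20], data[20:]

cv_scores = {k: cross_val_accuracy(dev, k) for k in (1, 3, 5, 9, 15)}
best_k = max(cv_scores, key=cv_scores.get)

# Final evaluation: the only time the test set is used.
test_acc = sum(knn_predict(dev, x, best_k) == y for x, y in test) / len(test)
print(cv_scores, best_k, test_acc)
```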

Distance Calculations, Training Classifier

Choice of features is one of the most important things in classification. Self-driving car example . . . many different "detectors" provide many different features, which together support the best decisions.

very basic for an image classifier: pixel-by-pixel distance
- L1 (Manhattan)
- L2 (Euclidean)
SIFT
- Rotation, Scale Invariant
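The pixel-by-pixel distances listed above can be written directly. A minimal sketch, assuming grayscale images flattened into equal-length lists of pixel values; the two tiny "images" are made up.

```python
import math

def l1_distance(img_a, img_b):
    """Sum of absolute pixel differences (Manhattan distance)."""
    return sum(abs(a - b) for a, b in zip(img_a, img_b))

def l2_distance(img_a, img_b):
    """Square root of the summed squared pixel differences (Euclidean distance)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(img_a, img_b)))

# Hypothetical 2x2 grayscale images, flattened to pixel lists
img_a = [0, 0, 0, 0]
img_b = [3, 0, 4, 0]

print(l1_distance(img_a, img_b))  # 7
print(l2_distance(img_a, img_b))  # 5.0
```

Neither distance knows anything about rotation or scale, which is the gap that features such as SIFT are meant to fill.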

Adding features may help training accuracy, but then runs into the curse of dimensionality: the space becomes too sparse and neighbors are far apart. Therefore . . .
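One way to see the effect is to compare the nearest and farthest distances from a query point as the number of dimensions grows. This sketch assumes uniform random points in the unit hypercube, which is an idealized stand-in for real feature vectors.

```python
import math
import random

def nearest_to_farthest_ratio(n_points, n_dims, seed=0):
    """Ratio of the nearest to the farthest distance from one random query
    point to n_points uniform random points in the unit hypercube."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(n_dims)]
    dists = [math.dist(query, [rng.random() for _ in range(n_dims)])
             for _ in range(n_points)]
    return min(dists) / max(dists)

for d in (2, 10, 100, 1000):
    print(d, round(nearest_to_farthest_ratio(200, d), 3))
```

As the dimension grows the ratio moves toward 1, meaning the "nearest" neighbor is barely closer than the farthest point, so a nearest-neighbor vote carries less and less information.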

Dimensionality Reduction

Idea: Bring down dimensionality of vectors generated from pixels while preserving the distance between neighbors. *watch this part again as a review timestamp

Also useful for compression and visualization (PCA music example).

techniques
- linear models (why are they more commonly used?)
- non-linear methods

PCA | previous notes

Post-Office Handwriting Recognition Example
Acoustic patterns in music: project to expand on my library, explore particular genre grouping
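As a reminder of what PCA does, the first principal component can be found with power iteration on the covariance matrix. This is a teaching sketch in plain Python, not how a library would compute it; the helper names and the toy data near the line y = 2x are assumptions.

```python
import math
import random

def first_principal_component(points, n_iter=100):
    """Return (mean, direction), where direction is the unit vector of greatest
    variance, found by power iteration on the sample covariance matrix."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in points]
    cov = [[sum(row[i] * row[j] for row in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d  # starting guess
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v

def project(points, mean, direction):
    """Reduce each point to one coordinate along the given direction."""
    return [sum((p[j] - mean[j]) * direction[j] for j in range(len(mean)))
            for p in points]

# Hypothetical 2-D data lying close to the line y = 2x
rng = random.Random(0)
points = [(i / 10, 2 * (i / 10) + rng.gauss(0, 0.1)) for i in range(30)]

mean, direction = first_principal_component(points)
reduced = project(points, mean, direction)  # 2-D points reduced to 1-D
print([round(c, 2) for c in direction])
```

The 1-D coordinates in `reduced` keep almost all of the spread in the data, so points that were neighbors in 2-D remain neighbors after the reduction.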

Multi-Dimensional Scaling (MDS) | wiki | other

... notes ...

Metrics

Confusion Matrix for Classification
- Precision, Recall, F1 Score
- scikit-learn --> classification_report
ROC curve for Logistic Regression
- see/take statistics book notes
Hyperparameter tuning
- many examples elsewhere (k for kNN, alpha for ridge/lasso regression)
- Grid Search CV, guided capstone notes
- Randomized Search CV
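The confusion-matrix metrics listed above are simple enough to compute by hand, which helps when reading a report such as the one from scikit-learn's `classification_report`. A minimal sketch with invented labels, treating B as the positive class; the helper names are hypothetical.

```python
from collections import Counter

def confusion_counts(y_true, y_pred, positive):
    """Return (tp, fp, fn, tn), treating `positive` as the class of interest."""
    pairs = Counter((t == positive, p == positive) for t, p in zip(y_true, y_pred))
    return (pairs[(True, True)], pairs[(False, True)],
            pairs[(True, False)], pairs[(False, False)])

def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and their harmonic mean (F1); 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

y_true = ["B", "B", "B", "B", "A", "A", "A", "A", "A", "A"]
y_pred = ["B", "B", "B", "A", "B", "A", "A", "A", "A", "A"]

tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive="B")
print(tp, fp, fn, tn)                   # 3 1 1 5
print(precision_recall_f1(tp, fp, fn))  # (0.75, 0.75, 0.75)
```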
