Hands-On Scikit-Learn
Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition
These notes work through the Japanese translation of Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition (Part I: The Fundamentals of Machine Learning) and its Exercises. The original book is available at this site.
Part I. The Fundamentals of Machine Learning
- The Machine Learning Landscape
- What Is Machine Learning?
- Why Use Machine Learning?
- Examples of Applications
- Types of Machine Learning Systems
- Supervised/Unsupervised Learning
- Batch and Online Learning
- Instance-Based Versus Model-Based Learning
- Main Challenges of Machine Learning
- Insufficient Quantity of Training Data
- Nonrepresentative Training Data
- Poor-Quality Data
- Irrelevant Features
- Overfitting the Training Data
- Underfitting the Training Data
- Stepping Back
- Testing and Validating
- Hyperparameter Tuning and Model Selection
- Data Mismatch
- Exercises
- End-to-End Machine Learning Project
- Working with Real Data
- Look at the Big Picture
- Frame the Problem
- Select a Performance Measure
- Check the Assumptions
- Get the Data
- Create the Workspace
- Download the Data
- Take a Quick Look at the Data Structure
- Create a Test Set
- Discover and Visualize the Data to Gain Insights
- Visualizing Geographical Data
- Looking for Correlations
- Experimenting with Attribute Combinations
- Prepare the Data for Machine Learning Algorithms
- Data Cleaning
- Handling Text and Categorical Attributes
- Custom Transformers
- Feature Scaling
- Transformation Pipelines
- Select and Train a Model
- Training and Evaluating on the Training Set
- Better Evaluation Using Cross-Validation
- Fine-Tune Your Model
- Grid Search
- Randomized Search
- Ensemble Methods
- Analyze the Best Models and Their Errors
- Evaluate Your System on the Test Set
- Launch, Monitor, and Maintain Your System
- Try It Out!
- Exercises
- Classification
- MNIST
- Training a Binary Classifier
- Performance Measures
- Measuring Accuracy Using Cross-Validation
- Confusion Matrix
- Precision and Recall
- Precision/Recall Trade-off
- The ROC Curve
- Multiclass Classification
- Error Analysis
- Multilabel Classification
- Multioutput Classification
- Exercises
- Training Models
- Linear Regression
- The Normal Equation
- Computational Complexity
- Gradient Descent
- Batch Gradient Descent
- Stochastic Gradient Descent
- Mini-batch Gradient Descent
- Polynomial Regression
- Learning Curves
- Regularized Linear Models
- Ridge Regression
- Lasso Regression
- Elastic Net
- Early Stopping
- Logistic Regression
- Estimating Probabilities
- Training and Cost Function
- Decision Boundaries
- Softmax Regression
- Exercises
- Support Vector Machines
- Linear SVM Classification
- Soft Margin Classification
- Nonlinear SVM Classification
- Polynomial Kernel
- Similarity Features
- Gaussian RBF Kernel
- Computational Complexity
- SVM Regression
- Under the Hood
- Decision Function and Predictions
- Training Objective
- Quadratic Programming
- The Dual Problem
- Kernelized SVMs
- Online SVMs
- Exercises
- Decision Trees
- Training and Visualizing a Decision Tree
- Making Predictions
- Estimating Class Probabilities
- The CART Training Algorithm
- Computational Complexity
- Gini Impurity or Entropy?
- Regularization Hyperparameters
- Regression
- Instability
- Exercises
- Ensemble Learning and Random Forests
- Voting Classifiers
- Bagging and Pasting
- Bagging and Pasting in Scikit-Learn
- Out-of-Bag Evaluation
- Random Patches and Random Subspaces
- Random Forests
- Extra-Trees
- Feature Importance
- Boosting
- AdaBoost
- Gradient Boosting
- Stacking
- Exercises
- Dimensionality Reduction
- The Curse of Dimensionality
- Main Approaches for Dimensionality Reduction
- Projection
- Manifold Learning
- PCA
- Preserving the Variance
- Principal Components
- Projecting Down to d Dimensions
- Using Scikit-Learn
- Explained Variance Ratio
- Choosing the Right Number of Dimensions
- PCA for Compression
- Randomized PCA
- Incremental PCA
- Kernel PCA
- Selecting a Kernel and Tuning Hyperparameters
- LLE
- Other Dimensionality Reduction Techniques
- Exercises
- Unsupervised Learning Techniques
- Clustering
- K-Means
- Limits of K-Means
- Using Clustering for Image Segmentation
- Using Clustering for Preprocessing
- Using Clustering for Semi-Supervised Learning
- DBSCAN
- Other Clustering Algorithms
- Gaussian Mixtures
- Anomaly Detection Using Gaussian Mixtures
- Selecting the Number of Clusters
- Bayesian Gaussian Mixture Models
- Other Algorithms for Anomaly and Novelty Detection
- Exercises