Hands-On Scikit-Learn
Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition
These notes work through the Japanese translation of Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition (Part I: The Fundamentals of Machine Learning) and its Exercises. The original book is available at this site.
Part I. The Fundamentals of Machine Learning
- The Machine Learning Landscape
- What Is Machine Learning?
- Why Use Machine Learning?
- Examples of Applications
- Types of Machine Learning Systems
- Supervised/Unsupervised Learning
- Batch and Online Learning
- Instance-Based Versus Model-Based Learning
- Main Challenges of Machine Learning
- Insufficient Quantity of Training Data
- Nonrepresentative Training Data
- Poor-Quality Data
- Irrelevant Features
- Overfitting the Training Data
- Underfitting the Training Data
- Stepping Back
- Testing and Validating
- Hyperparameter Tuning and Model Selection
- Data Mismatch
- Exercises
- End-to-End Machine Learning Project
- Working with Real Data
- Look at the Big Picture
- Frame the Problem
- Select a Performance Measure
- Check the Assumptions
- Get the Data
- Create the Workspace
- Download the Data
- Take a Quick Look at the Data Structure
- Create a Test Set
- Discover and Visualize the Data to Gain Insights
- Visualizing Geographical Data
- Looking for Correlations
- Experimenting with Attribute Combinations
- Prepare the Data for Machine Learning Algorithms
- Data Cleaning
- Handling Text and Categorical Attributes
- Custom Transformers
- Feature Scaling
- Transformation Pipelines
- Select and Train a Model
- Training and Evaluating on the Training Set
- Better Evaluation Using Cross-Validation
- Fine-Tune Your Model
- Grid Search
- Randomized Search
- Ensemble Methods
- Analyze the Best Models and Their Errors
- Evaluate Your System on the Test Set
- Launch, Monitor, and Maintain Your System
- Try It Out!
- Exercises
- Classification
- MNIST
- Training a Binary Classifier
- Performance Measures
- Measuring Accuracy Using Cross-Validation
- Confusion Matrix
- Precision and Recall
- Precision/Recall Trade-off
- The ROC Curve
- Multiclass Classification
- Error Analysis
- Multilabel Classification
- Multioutput Classification
- Exercises
- Training Models
- Linear Regression
- The Normal Equation
- Computational Complexity
- Gradient Descent
- Batch Gradient Descent
- Stochastic Gradient Descent
- Mini-batch Gradient Descent
- Polynomial Regression
- Learning Curves
- Regularized Linear Models
- Ridge Regression
- Lasso Regression
- Elastic Net
- Early Stopping
- Logistic Regression
- Estimating Probabilities
- Training and Cost Function
- Decision Boundaries
- Softmax Regression
- Exercises
- Support Vector Machines
- Linear SVM Classification
- Soft Margin Classification
- Nonlinear SVM Classification
- Polynomial Kernel
- Similarity Features
- Gaussian RBF Kernel
- Computational Complexity
- SVM Regression
- Under the Hood
- Decision Function and Predictions
- Training Objective
- Quadratic Programming
- The Dual Problem
- Kernelized SVMs
- Online SVMs
- Exercises
- Decision Trees
- Training and Visualizing a Decision Tree
- Making Predictions
- Estimating Class Probabilities
- The CART Training Algorithm
- Computational Complexity
- Gini Impurity or Entropy?
- Regularization Hyperparameters
- Regression
- Instability
- Exercises
- Ensemble Learning and Random Forests
- Voting Classifiers
- Bagging and Pasting
- Bagging and Pasting in Scikit-Learn
- Out-of-Bag Evaluation
- Random Patches and Random Subspaces
- Random Forests
- Extra-Trees
- Feature Importance
- Boosting
- AdaBoost
- Gradient Boosting
- Stacking
- Exercises
- Dimensionality Reduction
- The Curse of Dimensionality
- Main Approaches for Dimensionality Reduction
- Projection
- Manifold Learning
- PCA
- Preserving the Variance
- Principal Components
- Projecting Down to d Dimensions
- Using Scikit-Learn
- Explained Variance Ratio
- Choosing the Right Number of Dimensions
- PCA for Compression
- Randomized PCA
- Incremental PCA
- Kernel PCA
- Selecting a Kernel and Tuning Hyperparameters
- LLE
- Other Dimensionality Reduction Techniques
- Exercises
- Unsupervised Learning Techniques
- Clustering
- K-Means
- Limits of K-Means
- Using Clustering for Image Segmentation
- Using Clustering for Preprocessing
- Using Clustering for Semi-Supervised Learning
- DBSCAN
- Other Clustering Algorithms
- Gaussian Mixtures
- Anomaly Detection Using Gaussian Mixtures
- Selecting the Number of Clusters
- Bayesian Gaussian Mixture Models
- Other Algorithms for Anomaly and Novelty Detection
- Exercises