Chapter 5: Instance-Based vs. Model-Based Learning#
5.1 Instance-Based Learning#
Definition: Instance-based learning (also known as memory-based learning) makes predictions directly from specific training instances rather than from a generalized model: it memorizes the training data and compares new inputs to the stored examples using a similarity measure.
Key Characteristics:
Predictions are based on the training instances closest (most similar) to the new input.
Requires storing the entire training dataset.
Example Algorithm: k-Nearest Neighbors (k-NN), sketched below.
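To make the "memorize and compare" behaviour concrete, here is a minimal sketch of a 1-nearest-neighbour regressor written from scratch with NumPy. The class name and the toy data are purely illustrative and not part of any library:

```python
import numpy as np

class NearestNeighborRegressor:
    """Illustrative 1-NN regressor: 'training' only stores the data."""

    def fit(self, X, y):
        # No model is built; the stored instances themselves are the knowledge.
        self.X_train = np.asarray(X, dtype=float)
        self.y_train = np.asarray(y, dtype=float)
        return self

    def predict(self, X):
        predictions = []
        for x in np.asarray(X, dtype=float):
            # Compare the query point against every stored instance.
            distances = np.linalg.norm(self.X_train - x, axis=1)
            predictions.append(self.y_train[np.argmin(distances)])
        return np.array(predictions)

# Memorize four points, then answer a query by similarity.
model = NearestNeighborRegressor().fit([[1.0], [2.0], [3.0], [4.0]], [2.0, 4.0, 6.0, 8.0])
print(model.predict([[2.2]]))  # closest stored instance is x=2.0, so the output is [4.]
```

Note that all of the work happens inside `predict`: every query is compared against every stored instance, which is exactly why this family of methods is memory-hungry and slow at prediction time on large datasets.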
5.2 Model-Based Learning#
Definition: Model-based learning involves creating a generalized model using training data. This model abstracts the data patterns and uses them to predict outcomes for new data.
Key Characteristics:
Creates an abstract model from the data.
Does not require storing all training instances.
Example Algorithm: Linear Regression, sketched below.
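By contrast, the following sketch fits a straight line with ordinary least squares using only NumPy (the toy data are illustrative). After fitting, the "model" is just two numbers, a slope and an intercept, and the training points are no longer needed to make predictions:

```python
import numpy as np

# Toy training data that roughly follows y = 2x
x_train = np.array([1.0, 2.0, 3.0, 4.0])
y_train = np.array([2.1, 3.9, 6.2, 8.0])

# Ordinary least squares fit of a degree-1 polynomial: returns [slope, intercept]
slope, intercept = np.polyfit(x_train, y_train, deg=1)
print(f"learned parameters: slope={slope:.3f}, intercept={intercept:.3f}")

# Predictions use only the two learned parameters, not the stored data
x_new = 2.5
print(f"prediction for x={x_new}: {slope * x_new + intercept:.3f}")
```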
5.3 Comparison Table#
| Criteria | Instance-Based Learning | Model-Based Learning |
|---|---|---|
| Approach | Memorizes data points | Builds a general model |
| Memory Requirement | High (stores data) | Lower (stores parameters) |
| Training Complexity | Simple, minimal training | More complex, requires model training |
| Prediction Complexity | High (computes distances) | Low (predicts using model) |
| Example Algorithm | k-Nearest Neighbors (k-NN) | Linear Regression |
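As a rough illustration of the training-versus-prediction trade-off in the table, the sketch below times both estimators on a larger synthetic dataset. The dataset sizes are arbitrary and the exact timings depend on your machine; the point is simply that k-NN tends to spend its time answering queries, while the linear model predicts with a single matrix product:

```python
import time
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.random((20_000, 5))                      # 20,000 training points in 5 dimensions
y = X @ np.array([1.0, -2.0, 0.5, 3.0, 0.0]) + rng.normal(scale=0.1, size=20_000)
X_query = rng.random((2_000, 5))                 # 2,000 points to predict

for name, model in [("k-NN", KNeighborsRegressor(n_neighbors=5)),
                    ("Linear Regression", LinearRegression())]:
    start = time.perf_counter()
    model.fit(X, y)
    fit_time = time.perf_counter() - start

    start = time.perf_counter()
    model.predict(X_query)
    predict_time = time.perf_counter() - start

    print(f"{name}: fit {fit_time:.4f}s, predict {predict_time:.4f}s")
```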
5.4 Practical Code Example#
The script below trains both a k-NN regressor and a linear regression model on the same noisy linear data and compares their test error using mean squared error.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Generate synthetic data with a linear relationship plus noise
X = np.random.rand(100, 1) * 10
y = 2 * X.squeeze() + np.random.randn(100) * 0.5

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

### Instance-Based Learning: k-Nearest Neighbors (k-NN)

# Initialize and train the k-NN model
knn = KNeighborsRegressor(n_neighbors=3)
knn.fit(X_train, y_train)

# Make predictions and evaluate
y_pred_knn = knn.predict(X_test)
mse_knn = mean_squared_error(y_test, y_pred_knn)
print(f"k-NN Mean Squared Error: {mse_knn}")

### Model-Based Learning: Linear Regression

# Initialize and train the Linear Regression model
linear_reg = LinearRegression()
linear_reg.fit(X_train, y_train)

# Make predictions and evaluate
y_pred_linear = linear_reg.predict(X_test)
mse_linear = mean_squared_error(y_test, y_pred_linear)
print(f"Linear Regression Mean Squared Error: {mse_linear}")
```
Sample output (the exact values vary from run to run because the data generation is not seeded):

```
k-NN Mean Squared Error: 0.19892930792487312
Linear Regression Mean Squared Error: 0.19909184196123422
```
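On this synthetic linear dataset the two error scores come out almost identical, so the more telling difference is what each fitted estimator has to keep around. Continuing from the script above, you can inspect that directly: LinearRegression exposes its learned parameters via `coef_` and `intercept_`, while the fitted k-NN estimator retains the training samples and, in recent scikit-learn versions, reports how many through `n_samples_fit_`:

```python
# The linear model is summarized by two numbers: a slope and an intercept.
print("Linear Regression parameters:", linear_reg.coef_, linear_reg.intercept_)

# The k-NN model keeps every training instance and reports how many it stored.
print("Training samples retained by k-NN:", knn.n_samples_fit_)
```

For the 80 training points in this example the difference hardly matters, but the k-NN memory footprint grows with the dataset while the linear model stays the same size.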
5.5 Summary#
Instance-Based Learning (e.g., k-NN) uses the training data directly at prediction time, which works well for small datasets but becomes memory-intensive and slow to query as the dataset grows.
Model-Based Learning (e.g., Linear Regression) generalizes from training data to create a model, making it suitable for larger datasets and faster predictions.
The choice between them depends on factors such as dataset size, the complexity of the underlying relationship, and the need for interpretability.