Chapter 5: Instance-Based vs. Model-Based Learning#

5.1 Instance-Based Learning#

  • Definition: Instance-based learning, also known as memory-based or lazy learning, makes predictions by comparing new inputs to stored training instances rather than by building an explicit model. Generalization is deferred to prediction time, when a similarity measure selects the relevant instances.

  • Key Characteristics:

    • Predictions are based on the most similar stored training instances.

    • Requires storing the entire training dataset.

    • Example Algorithm: k-Nearest Neighbors (k-NN).
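The idea can be sketched in a few lines of plain NumPy: prediction is just "find the k closest stored points and average their targets". This is a toy 1-D illustration, not a production implementation (the `knn_predict` helper and its toy data are invented here for clarity).

```python
import numpy as np

def knn_predict(X_train, y_train, x_new, k=3):
    """Predict by averaging the targets of the k nearest training points."""
    distances = np.abs(X_train - x_new)   # distance to every stored instance
    nearest = np.argsort(distances)[:k]   # indices of the k closest instances
    return y_train[nearest].mean()

# Toy data following y = 2x; the "model" is just the stored arrays.
X_train = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_train = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
print(knn_predict(X_train, y_train, 3.1))  # averages targets of x = 2, 3, 4 → 6.0
```

Note that nothing is "learned" up front: the entire training set must stay in memory, and every prediction scans it.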

5.2 Model-Based Learning#

  • Definition: Model-based learning involves creating a generalized model using training data. This model abstracts the data patterns and uses them to predict outcomes for new data.

  • Key Characteristics:

    • Creates an abstract model from the data.

    • Does not require storing all training instances.

    • Example Algorithm: Linear Regression.
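By contrast, a model-based learner compresses the training data into a handful of parameters. A minimal sketch, assuming ordinary least squares on toy data that follows y = 2x + 1 exactly:

```python
import numpy as np

# Toy data generated from y = 2x + 1.
X = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])

A = np.column_stack([X, np.ones_like(X)])       # design matrix [x, 1]
(a, b), *_ = np.linalg.lstsq(A, y, rcond=None)  # closed-form least-squares fit

# The training data can now be discarded; two numbers define the model.
print(a, b)        # ≈ 2.0 and 1.0
print(a * 10 + b)  # prediction for x = 10 → ≈ 21.0
```

Once `a` and `b` are fit, predictions are a single multiply-add regardless of how large the training set was.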

5.3 Comparison Table#

| Criteria | Instance-Based Learning | Model-Based Learning |
| --- | --- | --- |
| Approach | Memorizes data points | Builds a general model |
| Memory Requirement | High (stores the full dataset) | Lower (stores only parameters) |
| Training Complexity | Low (little or no training) | Higher (model must be fit) |
| Prediction Complexity | High (computes distances at query time) | Low (evaluates the fitted model) |
| Example Algorithm | k-Nearest Neighbors (k-NN) | Linear Regression |
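The memory row of the table can be made concrete by comparing the serialized size of the two fitted models (a rough sketch; exact byte counts vary by library version):

```python
import pickle
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.linear_model import LinearRegression

X = np.random.rand(10_000, 1)
y = 2 * X.squeeze()

knn = KNeighborsRegressor(n_neighbors=3).fit(X, y)
lin = LinearRegression().fit(X, y)

# k-NN must retain all 10,000 training points; linear regression keeps
# only a slope and an intercept, so its serialized form is far smaller.
print(len(pickle.dumps(knn)), "bytes for k-NN")
print(len(pickle.dumps(lin)), "bytes for linear regression")
```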

5.4 Practical Code Example#

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Generate synthetic data
X = np.random.rand(100, 1) * 10
y = 2 * X.squeeze() + np.random.randn(100) * 0.5

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

### Instance-Based Learning: k-Nearest Neighbors (k-NN)
# Initialize and train the k-NN model
knn = KNeighborsRegressor(n_neighbors=3)
knn.fit(X_train, y_train)

# Make predictions and evaluate
y_pred_knn = knn.predict(X_test)
mse_knn = mean_squared_error(y_test, y_pred_knn)
print(f"k-NN Mean Squared Error: {mse_knn}")

### Model-Based Learning: Linear Regression
# Initialize and train the Linear Regression model
linear_reg = LinearRegression()
linear_reg.fit(X_train, y_train)

# Make predictions and evaluate
y_pred_linear = linear_reg.predict(X_test)
mse_linear = mean_squared_error(y_test, y_pred_linear)
print(f"Linear Regression Mean Squared Error: {mse_linear}")
Output:

k-NN Mean Squared Error: 0.19892930792487312
Linear Regression Mean Squared Error: 0.19909184196123422

5.5 Summary#

  • Instance-Based Learning (e.g., k-NN) directly uses training data for predictions, making it ideal for small datasets but memory-intensive for large ones.

  • Model-Based Learning (e.g., Linear Regression) generalizes from training data to create a model, making it suitable for larger datasets and faster predictions.

  • The choice depends on factors like data size, complexity, and the need for interpretability.