# Chapter 4: Batch vs. Online Learning

## 4.1 Explanation of Batch Learning
Definition: Batch learning trains a model on the entire dataset at once. The model processes all of the data and updates its parameters only after seeing the full dataset.
Pros: Well suited to stable datasets, can converge to the global optimum for convex objectives, and is efficient when training time is not a constraint.
Cons: Requires enough memory to hold the full dataset, adapts poorly to new data, and is not ideal for continuously changing environments.
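The batch update described above can be sketched in plain NumPy: every gradient step is computed over the entire dataset before the parameters change. This is a minimal illustration of batch gradient descent for simple linear regression (the data, learning rate, and step count are illustrative choices, not a reference implementation):

```python
import numpy as np

# Synthetic data: y = 3x + 2 plus Gaussian noise
rng = np.random.default_rng(0)
X = rng.random(200)
y = 3 * X + 2 + rng.normal(scale=0.1, size=200)

# Batch gradient descent: each step uses the ENTIRE dataset
w, b, lr = 0.0, 0.0, 0.5
for _ in range(500):
    y_hat = w * X + b
    grad_w = 2 * np.mean((y_hat - y) * X)  # dMSE/dw over all samples
    grad_b = 2 * np.mean(y_hat - y)        # dMSE/db over all samples
    w -= lr * grad_w
    b -= lr * grad_b

print(f"w = {w:.2f}, b = {b:.2f}")  # should be close to the true w=3, b=2
```

Note that each of the 500 iterations touches all 200 samples; this is exactly the memory and compute profile the Cons above refer to.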
## 4.2 Explanation of Online Learning
Definition: Online learning processes data one sample (or one small batch) at a time, updating the model with each new observation. It is suited to scenarios where data arrives continuously.
Pros: Adapts quickly to new data, requires little memory, and is ideal for real-time applications.
Cons: Risk of overfitting to recent data, may take longer to converge, and requires careful tuning of the learning rate.
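The per-sample update rule behind online learning can also be sketched in a few lines of NumPy. Here stochastic gradient descent takes one small step per incoming sample (the constant learning rate of 0.05 and the stream length are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

# A stream of (x, y) pairs arriving one at a time, with y = 3x + 2 + noise
w, b, lr = 0.0, 0.0, 0.05
for _ in range(20000):
    x = rng.random()
    y = 3 * x + 2 + rng.normal(scale=0.1)
    # Online update: one gradient step per sample, then the sample is discarded
    y_hat = w * x + b
    w -= lr * 2 * (y_hat - y) * x
    b -= lr * 2 * (y_hat - y)

print(f"w = {w:.2f}, b = {b:.2f}")  # should approach the true w=3, b=2
```

Only one sample is held in memory at any moment, which is why online learning scales to streams, but the constant learning rate means the estimates keep jittering around the optimum rather than settling exactly on it.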
## 4.3 Practical Code Example
```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error

# Generate a synthetic dataset: y = 3x + 2 plus Gaussian noise
X = np.random.rand(1000, 1)
y = 3 * X.squeeze() + 2 + np.random.randn(1000) * 0.1

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize features for better performance
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
```
### Batch Learning Example
```python
from sklearn.linear_model import LinearRegression

# Train the model on the entire training set
batch_model = LinearRegression()
batch_model.fit(X_train, y_train)

# Evaluate the model
y_pred_batch = batch_model.predict(X_test)
mse_batch = mean_squared_error(y_test, y_pred_batch)
print(f"Batch Learning MSE: {mse_batch}")
```
### Online Learning Example
```python
# Initialize the online learning model (stochastic gradient descent).
# partial_fit performs one SGD pass over the samples it is given,
# so max_iter and warm_start are not needed here.
online_model = SGDRegressor(learning_rate='constant', eta0=0.01)

# Simulate online learning by iterating over the training data in small batches
for epoch in range(100):  # simulate multiple passes over the stream
    for i in range(0, len(X_train), 10):  # update with batches of 10 samples
        online_model.partial_fit(X_train[i:i + 10], y_train[i:i + 10])

# Evaluate the online learning model
y_pred_online = online_model.predict(X_test)
mse_online = mean_squared_error(y_test, y_pred_online)
print(f"Online Learning MSE: {mse_online}")
```
Output:

```text
Batch Learning MSE: 0.008928240035140297
Online Learning MSE: 0.008894697469204179
```
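On this static dataset the two approaches score almost identically; online learning's real advantage appears when the data-generating process changes over time. The sketch below assumes a hypothetical sudden drift, where the relationship switches from y = 3x + 2 to y = -3x + 5 mid-stream, and shows `partial_fit` tracking the new relationship (a batch model fitted once on the first phase would not):

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(42)
model = SGDRegressor(learning_rate='constant', eta0=0.05)

# Phase 1: data follows y = 3x + 2
X1 = rng.random((2000, 1))
y1 = 3 * X1.ravel() + 2 + rng.normal(scale=0.1, size=2000)
for i in range(0, 2000, 10):
    model.partial_fit(X1[i:i + 10], y1[i:i + 10])

# Phase 2: the relationship drifts to y = -3x + 5
X2 = rng.random((2000, 1))
y2 = -3 * X2.ravel() + 5 + rng.normal(scale=0.1, size=2000)
for i in range(0, 2000, 10):
    model.partial_fit(X2[i:i + 10], y2[i:i + 10])

# Evaluate against the post-drift relationship
X_eval = rng.random((200, 1))
y_eval = -3 * X_eval.ravel() + 5
mse_drift = mean_squared_error(y_eval, model.predict(X_eval))
print(f"MSE after drift: {mse_drift:.3f}")
```

A low post-drift MSE here reflects the adaptability listed under Pros in Section 4.2: the model keeps incorporating new samples and forgets the stale relationship.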
## 4.4 Summary
Batch learning is ideal for stable datasets where training time is available and computational resources are not a constraint.
Online learning suits applications where data is generated continuously and the model must be updated quickly.
The choice between the two depends on how the data arrives, the computational resources available, and how adaptable the model needs to be.