PCA Target Transformation
This notebook is a simple demonstration of the TransformedTargetRegressor found in the sklearn library. It is useful when we have a multidimensional label/output vector and wish to reduce its dimensionality before fitting the regressor. Many different transformations could be used, but for our application we chose PCA.
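To make the mechanics concrete, here is a minimal sketch of what the wrapper does with a PCA transformer: fit PCA on the targets, train the regressor on the compressed targets, then map predictions back with inverse_transform. The toy dataset sizes and the names X_demo, y_demo, manual_pred and wrapped_pred are illustrative assumptions, not part of the notebook's experiment.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.compose import TransformedTargetRegressor

# Small toy problem (sizes chosen only for illustration)
X_demo, y_demo = make_regression(
    n_samples=200, n_features=20, n_informative=5,
    n_targets=8, noise=0.1, random_state=0,
)

# Manual route: compress the targets, fit, predict, map back to target space
pca = PCA(n_components=5).fit(y_demo)
reg = LinearRegression().fit(X_demo, pca.transform(y_demo))
manual_pred = pca.inverse_transform(reg.predict(X_demo))

# Wrapper route: the same steps handled by TransformedTargetRegressor
wrapped = TransformedTargetRegressor(
    regressor=LinearRegression(),
    transformer=PCA(n_components=5),
    check_inverse=False,
).fit(X_demo, y_demo)
wrapped_pred = wrapped.predict(X_demo)

# Both routes should produce the same predictions in the original target space
print(np.allclose(manual_pred, wrapped_pred))
print(wrapped_pred.shape)
```

The wrapper mainly saves bookkeeping: the transformer travels with the model, so predict always returns values in the original 8-dimensional target space.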
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.compose import TransformedTargetRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
import time
# Make Fake Dataset
X, y = make_regression(
    n_samples=10000,
    n_features=1000,   # Total Features
    n_informative=10,  # Informative Features
    n_targets=100,
    bias=100,
    noise=0.8,
    random_state=123
)
# Training and Testing
xtrain, xtest, ytrain, ytest = train_test_split(X, y, train_size=5000, random_state=123)
Test I - Standard MultiOutput
# Initialize Model
linear_model = LinearRegression()
# Fit model to data
t0 = time.time()
linear_model.fit(xtrain, ytrain)
t1 = time.time() - t0
# predict new datapoints
ypred = linear_model.predict(xtest)
# Get Stats
# sklearn metrics expect (y_true, y_pred); the order matters for r2_score
mae = mean_absolute_error(ytest, ypred)
mse = mean_squared_error(ytest, ypred)
rmse = np.sqrt(mse)
r2 = r2_score(ytest, ypred)
print(
f"MAE: {mae:.3f}\nMSE: {mse:.3f}\nRMSE: {rmse:.3f}\nR2: {r2:.3f}"
f" \nTime: {t1:.3} seconds"
)
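With multi-output targets, these metrics return by default a single number averaged uniformly over all target columns. If we want to see how individual targets behave, the multioutput="raw_values" option returns one score per column. A small sketch on hypothetical toy arrays (y_true_demo and y_pred_demo are made up for illustration):

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical toy arrays: 4 samples, 2 targets
y_true_demo = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
y_pred_demo = np.array([[1.0, 12.0], [2.0, 18.0], [3.0, 33.0], [4.0, 37.0]])

# One score per target column instead of a single average
mse_per_target = mean_squared_error(
    y_true_demo, y_pred_demo, multioutput="raw_values"
)
r2_per_target = r2_score(y_true_demo, y_pred_demo, multioutput="raw_values")

# Default behaviour: uniform average of the per-target scores
mse_avg = mean_squared_error(y_true_demo, y_pred_demo)

print("MSE per target:", mse_per_target)
print("R2 per target: ", r2_per_target)
print("MSE averaged:  ", mse_avg)
```

Here the first target is predicted perfectly and the second is not, which the averaged score alone would hide.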
Test II - PCA Transformation
# Define ml model
linear_model = LinearRegression()
# Define target transformer (same number of components as informative features)
pca_model = PCA(n_components=10)
# Define Wrapper for target transformation
full_regressor = TransformedTargetRegressor(
    regressor=linear_model,
    transformer=pca_model,
    check_inverse=False  # PCA with fewer components is lossy, so skip the round-trip check
)
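One way to sanity-check the choice of 10 components is to look at how much of the target variance PCA captures as components are added. The sketch below uses a smaller stand-in dataset generated the same way as above (the sizes and the names y_small, pca_check and cumulative are assumptions for illustration); with 10 informative features, we would expect the first 10 components to capture almost all of the variance.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA

# Smaller stand-in for the notebook's dataset (sizes are illustrative)
_, y_small = make_regression(
    n_samples=500, n_features=50, n_informative=10,
    n_targets=30, noise=0.8, random_state=123,
)

# Fit PCA with all components and accumulate the explained variance ratio
pca_check = PCA().fit(y_small)
cumulative = np.cumsum(pca_check.explained_variance_ratio_)

# Cumulative variance captured by the first 1..12 components
for k, frac in enumerate(cumulative[:12], start=1):
    print(f"{k:2d} components: {frac:.5f}")
```

On real data the cutoff is rarely this clean, so the same curve can be used to pick n_components rather than to confirm it.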
# Fit Regressor to data (timed, so the report below reflects this model)
t0 = time.time()
full_regressor.fit(xtrain, ytrain)
t1 = time.time() - t0
# Predict on new inputs
ypred = full_regressor.predict(xtest)
# Get Stats
# sklearn metrics expect (y_true, y_pred); the order matters for r2_score
mae = mean_absolute_error(ytest, ypred)
mse = mean_squared_error(ytest, ypred)
rmse = np.sqrt(mse)
r2 = r2_score(ytest, ypred)
# Print Results
print(
f"MAE: {mae:.3f}\nMSE: {mse:.3f}\nRMSE: {rmse:.3f}\nR2: {r2:.3f}"
f" \nTime: {t1:.3} seconds"
)
In this run, the PCA target transformation gave significantly lower MAE, MSE and RMSE than the plain multi-output fit. It is worth keeping in the ML toolbox for the future.
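How much the transformation helps depends on the number of components kept. A rough way to explore this is to sweep n_components and compare the test error; the sketch below does so on a smaller stand-in dataset (the sizes, the candidate values of k, and the names X_s, y_s and results are assumptions for illustration, not the notebook's experiment).

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.compose import TransformedTargetRegressor
from sklearn.metrics import mean_squared_error

# Smaller stand-in dataset with 10 informative features and 30 targets
X_s, y_s = make_regression(
    n_samples=600, n_features=100, n_informative=10,
    n_targets=30, noise=0.8, random_state=123,
)
Xtr, Xte, ytr, yte = train_test_split(X_s, y_s, train_size=300, random_state=123)

# Test MSE for several target-space dimensionalities
results = {}
for k in (2, 5, 10, 20, 30):
    model = TransformedTargetRegressor(
        regressor=LinearRegression(),
        transformer=PCA(n_components=k),
        check_inverse=False,
    )
    model.fit(Xtr, ytr)
    results[k] = mean_squared_error(yte, model.predict(Xte))

for k, mse_k in results.items():
    print(f"n_components={k:2d}  test MSE={mse_k:.3f}")
```

Too few components discard real signal in the targets, so the error should rise sharply below the number of informative directions.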