PCA Target Transformation

This notebook is a simple demonstration of the TransformedTargetRegressor found in the sklearn library. It's useful when we have a multidimensional label/output vector and want to reduce its dimensionality. The regressor is fit on the transformed targets, and its predictions are mapped back to the original target space. Many different transformations can be used, but for our application we chose the PCA transformation.
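
As a rough illustration of what the wrapper does, the sketch below fits PCA on the targets by hand, trains the regressor in the reduced space, and maps the predictions back with inverse_transform. The dataset sizes and the `_demo` names are made up for this example and are not the ones used in the rest of the notebook.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

# Small illustrative dataset (sizes are arbitrary assumptions)
X_demo, y_demo = make_regression(
    n_samples=200, n_features=20, n_informative=5, n_targets=8,
    noise=0.5, random_state=0,
)

# Manual version: reduce the targets, fit, then map predictions back
pca_demo = PCA(n_components=5, svd_solver="full").fit(y_demo)
z_demo = pca_demo.transform(y_demo)                    # shape (200, 5)
reg_demo = LinearRegression().fit(X_demo, z_demo)
manual_pred = pca_demo.inverse_transform(reg_demo.predict(X_demo))  # shape (200, 8)

# Wrapper version: the same steps handled by TransformedTargetRegressor
wrapped = TransformedTargetRegressor(
    regressor=LinearRegression(),
    transformer=PCA(n_components=5, svd_solver="full"),
    check_inverse=False,
).fit(X_demo, y_demo)
wrapped_pred = wrapped.predict(X_demo)

# The two routes should agree up to floating point error
print(np.allclose(manual_pred, wrapped_pred))
```

The wrapper mainly saves the bookkeeping: the transformer is fit inside `fit`, and `predict` always returns values in the original target space.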

import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.compose import TransformedTargetRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
import time
# Make Fake Dataset
X, y = make_regression(
    n_samples=10000, 
    n_features=1000,    # Total Features
    n_informative=10,   # Informative Features 
    n_targets=100,
    bias=100,
    noise=0.8,
    random_state=123
)

# Training and Testing
xtrain, xtest, ytrain, ytest = train_test_split(X, y, train_size=5000, random_state=123)

Test I - Standard MultiOutput

# Initialize Model
linear_model = LinearRegression()

# Fit model to data
t0 = time.time()
linear_model.fit(xtrain, ytrain)
t1 = time.time() - t0

# predict new datapoints
ypred = linear_model.predict(xtest)

# Get Stats (sklearn metrics expect y_true first, then y_pred)
mae = mean_absolute_error(ytest, ypred)
mse = mean_squared_error(ytest, ypred)
rmse = np.sqrt(mse)
r2 = r2_score(ytest, ypred)

print(
    f"MAE: {mae:.3f}\nMSE: {mse:.3f}\nRMSE: {rmse:.3f}\nR2: {r2:.3f}" 
    f" \nTime: {t1:.3} seconds"
)
MAE: 0.713
MSE: 0.799
RMSE: 0.894
R2: 1.000 
Time: 0.639 seconds
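
A note on argument order: the functions in sklearn.metrics take the true values first and the predictions second. MAE and MSE give the same result either way, but R2 does not, because it normalizes by the variance of whatever is passed as the true values. The toy arrays below are invented purely to show the difference.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

# Hypothetical toy values, not taken from the dataset above
y_true_toy = np.array([1.0, 2.0, 3.0, 4.0])
y_pred_toy = np.array([2.0, 2.0, 3.0, 3.0])

# MAE is symmetric in its two arguments
mae_ok = mean_absolute_error(y_true_toy, y_pred_toy)
mae_swapped = mean_absolute_error(y_pred_toy, y_true_toy)

# R2 is not: it divides by the spread of the first argument
r2_ok = r2_score(y_true_toy, y_pred_toy)
r2_swapped = r2_score(y_pred_toy, y_true_toy)

print(mae_ok, mae_swapped)   # both 0.5
print(r2_ok, r2_swapped)     # approximately 0.6 and -1.0
```

On data where the fit is nearly perfect, as here, the swapped and correct R2 values are both close to 1, so the difference is easy to miss.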

Test II - PCA Transformation

# Define ml model
linear_model = LinearRegression()

# Define target transformer (same number of components as informative features)
pca_model = PCA(n_components=10)

# Define Wrapper for target transformation
full_regressor = TransformedTargetRegressor(
    regressor=linear_model,
    transformer=pca_model,
    check_inverse=False,  # PCA with fewer components than targets is lossy, so the inverse is not exact
)

# Fit Regressor to data
full_regressor.fit(xtrain, ytrain)

# Predict on new inputs
ypred = full_regressor.predict(xtest)

# Get Stats (sklearn metrics expect y_true first, then y_pred)
mae = mean_absolute_error(ytest, ypred)
mse = mean_squared_error(ytest, ypred)
rmse = np.sqrt(mse)
r2 = r2_score(ytest, ypred)

# Print Results
# Note: t1 still holds the Test I fit time (the fit above is not timed)
print(
    f"MAE: {mae:.3f}\nMSE: {mse:.3f}\nRMSE: {rmse:.3f}\nR2: {r2:.3f}"
    f" \nTime: {t1:.3} seconds"
)
MAE: 0.647
MSE: 0.657
RMSE: 0.811
R2: 1.000 
Time: 0.639 seconds

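With the PCA target transformation, MAE drops from 0.713 to 0.647, MSE from 0.799 to 0.657 and RMSE from 0.894 to 0.811 (roughly 9%, 18% and 9% lower), while R2 is 1.000 in both tests. The time printed for Test II is the Test I fit time, since the second fit was not timed, so the runtimes cannot be compared. Worth keeping in the ML toolbox for the future.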
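
In Test II the number of components was set to match the known number of informative features. When that number is not known, one possible approach is to look at the cumulative explained variance of a PCA fitted on the targets. The sketch below does this on a small made-up dataset. The 99% threshold and all variable names are assumptions chosen for illustration. It also measures the round-trip error of transform followed by inverse_transform, which is why check_inverse is turned off when fewer components than targets are kept.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA

# Small illustrative dataset (hypothetical sizes)
_, y_small = make_regression(
    n_samples=500, n_features=30, n_informative=4, n_targets=12,
    noise=0.5, random_state=0,
)

# Fit PCA with all components and accumulate the explained variance
pca_full = PCA(svd_solver="full").fit(y_small)
cum_var = np.cumsum(pca_full.explained_variance_ratio_)

# Smallest number of components reaching the (assumed) 99% threshold
k = int(np.searchsorted(cum_var, 0.99) + 1)

# Round trip through the reduced space: lossy when k < number of targets
pca_k = PCA(n_components=k, svd_solver="full").fit(y_small)
y_roundtrip = pca_k.inverse_transform(pca_k.transform(y_small))
roundtrip_rmse = np.sqrt(np.mean((y_small - y_roundtrip) ** 2))

print(f"components for 99% variance: {k}")
print(f"round-trip RMSE: {roundtrip_rmse:.3f}")
```

The discarded components here carry mostly noise, which may be part of why the transformed regressor scores slightly better on the test set.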