Multi-Task vs. Multi-Output¶
In this notebook, we look at the difference between multi-task and multi-output regression. In a nutshell, multi-output means we have a single function that returns the full multidimensional vector of labels, while multi-task means we fit a separate function for each component of that vector. There are pros and cons to each approach, so it is up to the expert to give the data scientist some reasons for choosing one over the other.
Let's define some terms: we have a dataset $\mathcal{D}=\{X, Y\}$ of pairs of inputs $X \in \mathbb{R}^{N\times D}$ and outputs $Y \in \mathbb{R}^{N \times O}$. Here $N$ is the number of samples, $D$ is the number of features/dimensions and $O$ is the number of outputs (or tasks).
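To make the distinction concrete, here is a minimal sketch (plain NumPy least squares on random data, an added cell rather than any estimator used below): the multi-output route fits one joint weight matrix, while the multi-task route fits one weight vector per output. For ordinary least squares the two coincide, which foreshadows the Linear Regression comparison below.
import numpy as np

rng = np.random.default_rng(0)
N, D, O = 100, 5, 3  # samples, features, outputs (tasks)
X_toy = rng.normal(size=(N, D))
Y_toy = rng.normal(size=(N, O))

# Multi-output: one function maps the inputs to the full output matrix at once
W_joint, *_ = np.linalg.lstsq(X_toy, Y_toy, rcond=None)  # shape (D, O)

# Multi-task: a separate weight vector is fit for each output column
W_per_task = np.stack(
    [np.linalg.lstsq(X_toy, Y_toy[:, o], rcond=None)[0] for o in range(O)],
    axis=1,
)

# For linear least squares both routes give the same weights
print(np.allclose(W_joint, W_per_task))  # True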
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, BayesianRidge
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import WhiteKernel, ConstantKernel, RBF
from sklearn.multioutput import MultiOutputRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
import time
# Make Fake Dataset
X, y = make_regression(
n_samples=1000,
n_features=10, # Total Features
n_informative=3, # Informative Features
n_targets=10,
bias=10,
noise=0.8,
random_state=123
)
# Training and Testing
xtrain, xtest, ytrain, ytest = train_test_split(X, y, train_size=500, random_state=123)
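As a quick sanity check (an added cell, not from the original notebook), the array shapes should match the $N \times D$ / $N \times O$ convention defined above:
# Confirm shapes: inputs are (samples, features), outputs are (samples, targets)
print(xtrain.shape, ytrain.shape)  # (500, 10) (500, 10)
print(xtest.shape, ytest.shape)    # (500, 10) (500, 10)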
Algorithm I - Linear Regression¶
Test I - MultiOutput Linear Regression¶
linear_model = LinearRegression()
t0 = time.time()
linear_model.fit(xtrain, ytrain)
t1 = time.time() - t0
ypred = linear_model.predict(xtest)
# Get Stats
mae = mean_absolute_error(ytest, ypred)
mse = mean_squared_error(ytest, ypred)
rmse = np.sqrt(mse)
r2 = r2_score(ytest, ypred)
print(
f"MAE: {mae:.3f}\nMSE: {mse:.3f}\nRMSE: {rmse:.3f}\nR2: {r2:.3f}"
f" \nTime: {t1:.3} seconds"
)
Test II - MultiTask Linear Regression¶
linear_model_multi = MultiOutputRegressor(
LinearRegression(),
n_jobs=-1, # Number of cores to use to parallelize the training
)
t0 = time.time()
linear_model_multi.fit(xtrain, ytrain)
t1 = time.time() - t0
ypred = linear_model_multi.predict(xtest)
# Get Stats
mae = mean_absolute_error(ytest, ypred)
mse = mean_squared_error(ytest, ypred)
rmse = np.sqrt(mse)
r2 = r2_score(ytest, ypred)
print(
f"MAE: {mae:.3f}\nMSE: {mse:.3f}\nRMSE: {rmse:.3f}\nR2: {r2:.3f}"
f" \nTime: {t1:.3} seconds"
)
The results are exactly the same in the case of Linear Regression, but for more complex models we should not expect this to hold. One obvious trade-off is the number of tasks (outputs): the multi-task approach fits one estimator per output, so as the number of outputs grows (we only have 10 here, but hundreds are common in practice) the training time grows with it.
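As a rough illustration of this scaling (an added sketch on the same synthetic generator, not a careful benchmark), one can time the per-output wrapper while increasing the number of targets:
# Hypothetical timing sketch: fit one LinearRegression per target and time it
for n_targets in (2, 10, 50):
    Xs, Ys = make_regression(
        n_samples=1000, n_features=10, n_informative=3,
        n_targets=n_targets, noise=0.8, random_state=123,
    )
    model = MultiOutputRegressor(LinearRegression())
    t0 = time.time()
    model.fit(Xs, Ys)
    print(f"{n_targets:3d} targets -> {time.time() - t0:.3f} seconds")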
Algorithm II - Gaussian Process Regression¶
MultiOutput¶
scikit-learn's GaussianProcessRegressor handles multiple outputs natively, so we can simply pass in the full matrix of labels.
# define kernel function
kernel = ConstantKernel() * RBF() + WhiteKernel()
# define GP model
gp_model = GaussianProcessRegressor(
kernel=kernel, # kernel function (very important)
normalize_y=True, # good standard practice
random_state=123, # reproducibility
n_restarts_optimizer=10, # good practice (avoids local minima)
)
# train GP Model
t0 = time.time()
gp_model.fit(xtrain, ytrain)
t1 = time.time() - t0
ypred = gp_model.predict(xtest)
# Get Stats
mae = mean_absolute_error(ytest, ypred)
mse = mean_squared_error(ytest, ypred)
rmse = np.sqrt(mse)
r2 = r2_score(ytest, ypred)
print(
f"GP Model:\n"
f"MAE: {mae:.3f}\nMSE: {mse:.3f}\nRMSE: {rmse:.3f}\nR2: {r2:.3f}"
f" \nTime: {t1:.3} seconds"
)
Multi-Task¶
Here we wrap the same GP model in MultiOutputRegressor so that an independent GP is trained for each output.
# Define kernel function
kernel = ConstantKernel() * RBF() + WhiteKernel()
# Define GP Model
gp_model = GaussianProcessRegressor(
kernel=kernel, # kernel function (very important)
normalize_y=True, # good standard practice
random_state=123, # reproducibility
n_restarts_optimizer=10, # good practice (avoids local minima)
)
# Wrap the GP in MultiOutputRegressor (one estimator per output)
gp_model_multi = MultiOutputRegressor(
gp_model,
n_jobs=-1, # Number of cores to use to parallelize the training
)
# Fit Model
t0 = time.time()
gp_model_multi.fit(xtrain, ytrain)
t1 = time.time() - t0
# Predict with test set
ypred = gp_model_multi.predict(xtest)
# Get Stats
mae = mean_absolute_error(ytest, ypred)
mse = mean_squared_error(ytest, ypred)
rmse = np.sqrt(mse)
r2 = r2_score(ytest, ypred)
print(
f"GP Model:\n"
f"MAE: {mae:.3f}\nMSE: {mse:.3f}\nRMSE: {rmse:.3f}\nR2: {r2:.3f}"
f" \nTime: {t1:.3} seconds"
)
Slightly better accuracy, but notice that in this case the training time was similar. This sometimes happens because some algorithms have a difficult time converging when they have to fit multiple outputs jointly.
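One practical benefit of the per-task route is that each output keeps its own fitted model. Assuming the gp_model_multi object fitted above, the optimized kernel hyperparameters for every task can be inspected through the estimators_ attribute (this cell is an added sketch, not part of the original notebook):
# Each task has its own fitted GP with its own optimized kernel hyperparameters
for i, estimator in enumerate(gp_model_multi.estimators_):
    print(f"Task {i}: {estimator.kernel_}")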
Algorithm III - Bayesian Ridge Regression¶
This algorithm does not support multiple outputs natively, so instead we use the multi-task approach: one model per output.
Multi-Task¶
# Define Bayesian Ridge model
bayes_model = BayesianRidge(
n_iter=1000, # maximum number of iterations
normalize=True, # good standard practice (note: this parameter was removed in newer scikit-learn releases; standardize the features beforehand instead)
verbose=1, # Gives updates
)
# Wrap the Bayesian Ridge model in MultiOutputRegressor (one estimator per output)
bayes_model_multi = MultiOutputRegressor(
bayes_model,
n_jobs=-1, # Number of cores to use to parallelize the training
)
# Fit Model
t0 = time.time()
bayes_model_multi.fit(xtrain, ytrain)
t1 = time.time() - t0
# Predict with test set
ypred = bayes_model_multi.predict(xtest)
# Get Stats
mae = mean_absolute_error(ytest, ypred)
mse = mean_squared_error(ytest, ypred)
rmse = np.sqrt(mse)
r2 = r2_score(ytest, ypred)
print(
f"GP Model:\n"
f"MAE: {mae:.3f}\nMSE: {mse:.3f}\nRMSE: {rmse:.3f}\nR2: {r2:.3f}"
f" \nTime: {t1:.3} seconds"
)
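A side benefit of the multi-task wrapper is that every underlying BayesianRidge model keeps its own predictive uncertainty. Assuming the bayes_model_multi object fitted above, per-task standard deviations can be obtained by looping over estimators_ (this cell is an added sketch, not part of the original notebook):
# Per-task predictive uncertainty from the individually fitted BayesianRidge models
for i, estimator in enumerate(bayes_model_multi.estimators_):
    mean_i, std_i = estimator.predict(xtest, return_std=True)
    print(f"Task {i}: mean predictive std = {std_i.mean():.3f}")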