RBIG4IT: Information Theory Measures via Multidimensional Gaussianization

Information theory is an outstanding framework to measure uncertainty, dependence and relevance in data and systems. It has several desirable properties for real-world applications: it naturally deals with multivariate data, it can handle heterogeneous data types, and the measures can be interpreted in physical units. However, it has not been adopted by a wider audience because obtaining information from multidimensional data is a challenging problem due to the curse of dimensionality. Here we propose an indirect way of computing information based on a multivariate Gaussianization transform. Our proposal mitigates the difficulty of multivariate density estimation by reducing it to a composition of tractable (marginal) operations and simple linear transformations, which can be interpreted as a particular deep neural network. We introduce specific Gaussianization-based methodologies to estimate total correlation, entropy, mutual information and Kullback-Leibler divergence. We compare them with recent estimators, showing their accuracy on synthetic data generated from different multivariate distributions. We make the tools and datasets publicly available to provide a test-bed to analyze future methodologies. Results show that our proposal is superior to previous estimators, particularly in high-dimensional scenarios, and that it leads to interesting insights in neuroscience, geoscience, computer vision, and machine learning.

Software

RBIG Python toolbox

Includes tools to compute Information Theory measures used in the current paper [RBIG4IT2020]

Demo in Google Colab

This demo provides an interactive example of how to use the RBIG Python toolbox to compute information measures like entropy and mutual information. It’s an easy-to-follow resource for those interested in testing the methods on their data.

RBIG Matlab toolbox

Includes tools to compute Information Theory measures and scripts to generate the synthetic data for the experiments in the current paper [RBIG4IT2020]

Paper Summary

The measures defined in this paper that can be computed with RBIG are those shown in the following figure, plus the Kullback-Leibler divergence. The main point is that RBIG provides accurate estimates of these measures even for multidimensional datasets.
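
To make the construction concrete, the sketch below implements one Gaussianization layer from scratch with NumPy/SciPy: each dimension is mapped through its empirical CDF followed by the inverse Gaussian CDF (marginal Gaussianization), and the dimensions are then mixed with a random rotation. This is only a conceptual illustration, not the code of the released toolboxes; the function names and the QR-based random rotation are our own simplifications.

    import numpy as np
    from scipy.stats import norm, rankdata

    def marginal_gaussianization(X):
        # Map each column through its empirical CDF and then the inverse
        # Gaussian CDF, so every marginal becomes approximately N(0, 1).
        n = X.shape[0]
        U = np.apply_along_axis(rankdata, 0, X) / (n + 1)
        return norm.ppf(U)

    def rbig_layer(X, rng):
        # One layer: marginal Gaussianization followed by a random rotation
        # (an orthogonal matrix from the QR decomposition of a Gaussian matrix).
        Z = marginal_gaussianization(X)
        R, _ = np.linalg.qr(rng.standard_normal((Z.shape[1], Z.shape[1])))
        return Z @ R, R

Iterating this layer drives the data towards a joint Gaussian; the measures in the sections below are read out from how much redundancy the layers remove.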

Extended results

This section presents additional results for the paper, mainly figures for the experiments on synthetic data that would have taken too much space in the original paper.

Total Correlation

The total correlation is a measure of multivariate dependence among several variables. In this section, we show how the RBIG methodology allows for precise estimation of total correlation, even for datasets with non-Gaussian distributions or high dimensionality.
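
Total correlation is defined as T(X) = H(x_1) + ... + H(x_d) - H(X), the entropy deficit of the joint distribution with respect to independent marginals. It is invariant to the marginal Gaussianization steps, and each rotation leaves the joint entropy unchanged while re-exposing part of the remaining redundancy as marginal non-Gaussianity, so the multi-information removed by one layer equals the summed gap between the standard Gaussian entropy and the marginal entropies after that layer. The sketch below accumulates this gap over layers, reusing rbig_layer from the summary section; the plug-in histogram entropies and the fixed number of layers are our simplifications, not the estimator used in the paper.

    import numpy as np

    def gaussianity_gap(x, bins=50):
        # Entropy of a standard Gaussian minus a plug-in histogram estimate of
        # the differential entropy of the 1-D sample (nats).
        hist, edges = np.histogram(x, bins=bins, density=True)
        p = hist * np.diff(edges)
        h = -np.sum(p[p > 0] * np.log(hist[p > 0]))
        return 0.5 * np.log(2 * np.pi * np.e) - h

    def rbig_total_correlation(X, n_layers=25, seed=0):
        # Accumulate the redundancy removed by successive Gaussianization layers.
        rng = np.random.default_rng(seed)
        Z = np.asarray(X, dtype=float).copy()
        tc = 0.0
        for _ in range(n_layers):
            Z, _ = rbig_layer(Z, rng)  # from the sketch in the summary above
            tc += sum(gaussianity_gap(Z[:, i]) for i in range(Z.shape[1]))
        return tc  # nats

    # Sanity check: a bivariate Gaussian with correlation rho has
    # T = -0.5 * log(1 - rho**2) nats.
    rho = 0.7
    X = np.random.default_rng(1).multivariate_normal(
        [0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=20000)
    print(rbig_total_correlation(X), -0.5 * np.log(1 - rho ** 2))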

Entropy

Entropy is a fundamental concept in information theory, representing the amount of uncertainty in a dataset. Using RBIG, we show that entropy can be computed more efficiently than with classical estimators, particularly in datasets with complex dependencies.
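
The identity H(X) = H(x_1) + ... + H(x_d) - T(X) reduces the hard multivariate entropy to easy one-dimensional marginal entropies plus the total correlation estimated above. A minimal sketch of that reduction (in nats), reusing rbig_total_correlation from the previous block; the plug-in histogram marginal estimator is again our simplification.

    import numpy as np

    def marginal_entropy(x, bins=50):
        # Plug-in histogram estimate of the differential entropy of a 1-D sample (nats).
        hist, edges = np.histogram(x, bins=bins, density=True)
        p = hist * np.diff(edges)
        return -np.sum(p[p > 0] * np.log(hist[p > 0]))

    def rbig_entropy(X, **tc_kwargs):
        # Joint entropy = sum of 1-D marginal entropies minus the total correlation.
        X = np.asarray(X, dtype=float)
        marginals = sum(marginal_entropy(X[:, i]) for i in range(X.shape[1]))
        return marginals - rbig_total_correlation(X, **tc_kwargs)

For a multivariate Gaussian the result can be checked against the closed form 0.5 * log((2 * pi * e)^d * det(Sigma)).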

KLD

Kullback-Leibler divergence measures how one probability distribution diverges from a second, reference distribution. RBIG offers a more accurate way to compute KLD in high-dimensional settings, improving performance in tasks like anomaly detection and model comparison.
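
KL divergence is invariant under an invertible transform applied to both distributions. If a Gaussianizing transform is learned from samples of the reference distribution q and then applied to samples of p, D_KL(p || q) becomes the divergence between the transformed p samples and a standard Gaussian, which splits into a cross-entropy term minus an entropy term. The sketch below illustrates this reduction; it reuses rbig_entropy from the previous block, and the interpolated empirical-CDF maps, the clipping and the fixed number of layers are our simplifying assumptions rather than the exact estimator of the paper.

    import numpy as np
    from scipy.stats import norm
    from scipy.interpolate import interp1d

    def fit_marginal_map(x):
        # 1-D Gaussianization learned from a reference sample: linearly
        # interpolated empirical CDF followed by the inverse Gaussian CDF.
        xs = np.sort(x)
        u = np.arange(1, xs.size + 1) / (xs.size + 1)
        cdf = interp1d(xs, u, bounds_error=False, fill_value=(u[0], u[-1]))
        return lambda t: norm.ppf(np.clip(cdf(t), 1e-6, 1 - 1e-6))

    def rbig_kld(Xp, Xq, n_layers=25, seed=0):
        # Learn the Gaussianizing layers on the q samples and push the p samples
        # through the same layers; once q is close to N(0, I), D_KL(p || q) is
        # the divergence between the transformed p samples and N(0, I).
        rng = np.random.default_rng(seed)
        Zp = np.asarray(Xp, dtype=float).copy()
        Zq = np.asarray(Xq, dtype=float).copy()
        d = Zq.shape[1]
        for _ in range(n_layers):
            for i in range(d):
                g = fit_marginal_map(Zq[:, i])
                Zp[:, i], Zq[:, i] = g(Zp[:, i]), g(Zq[:, i])
            R, _ = np.linalg.qr(rng.standard_normal((d, d)))
            Zp, Zq = Zp @ R, Zq @ R
        # D_KL(p' || N(0, I)) = cross-entropy of p' w.r.t. N(0, I) minus H(p').
        cross_entropy = 0.5 * np.mean(np.sum(Zp ** 2, axis=1)) + 0.5 * d * np.log(2 * np.pi)
        return cross_entropy - rbig_entropy(Zp)  # rbig_entropy from the sketch above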

Mutual Information

Mutual information quantifies the amount of information obtained about one random variable through another. The RBIG methodology demonstrates superior performance in estimating mutual information, especially in datasets with nonlinear dependencies, making it a valuable tool for feature selection and data analysis.
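
Mutual information between two (possibly multivariate) blocks follows from the decomposition T([X, Y]) = T(X) + T(Y) + I(X; Y): after discounting the redundancy internal to each block, the total correlation that remains in the joint set is the information shared between them. A minimal sketch reusing rbig_total_correlation from the total-correlation block above:

    import numpy as np

    def rbig_mutual_information(X, Y, **tc_kwargs):
        # I(X; Y) = T([X, Y]) - T(X) - T(Y), for paired samples
        # (X and Y must have the same number of rows).
        X = np.asarray(X, dtype=float)
        Y = np.asarray(Y, dtype=float)
        joint = rbig_total_correlation(np.hstack([X, Y]), **tc_kwargs)
        return (joint
                - rbig_total_correlation(X, **tc_kwargs)
                - rbig_total_correlation(Y, **tc_kwargs))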

References

  • Information Theory Measures via Multidimensional Gaussianization [RBIG4IT2020]
    V. Laparra, E. Johnson, G. Camps-Valls, R. Santos-Rodriguez, J. Malo.
  • Iterative Gaussianization: from ICA to Random Rotations [TNN2011]
    V. Laparra, G. Camps-Valls, J. Malo.
    IEEE Transactions on Neural Networks, 2011.
