ClimX
Extreme-aware climate model emulation
Build fast, accurate machine learning emulators for the NorESM2-MM Earth System Model and quantify tail risks in future climate projections.
Supported by ESA Phi-lab. Deterministic and uncertainty-aware submissions are hosted on separate Kaggle competition pages.
What is ClimX?
A benchmark built for the question that matters: how will extremes change?
Motivation
Earth System Models (ESMs) are our best tools to study climate futures, but they are computationally expensive. This limits how densely we can explore uncertainty (scenarios, initial conditions, and model structure) and makes it hard to answer policy-relevant questions about rare but high-impact events.
Climate emulators are lightweight surrogates that approximate ESM outputs, enabling rapid experimentation and risk assessment.
The task
Build a model that predicts daily 2D maps of climate variables at NorESM2-MM resolution, driven by forcing trajectories and, optionally, past climate state.
Training: historical (1850–2014) + SSP1-2.6, SSP3-7.0, SSP5-8.5 (2015–2100). Testing: held-out SSP2-4.5 (2015–2100).
Data and access
Two-tier distribution: full-resolution training on Hugging Face, lightweight prototyping on Kaggle
Use the same benchmark data for both Kaggle competitions: the deterministic main track and the separate UQ track.
Hugging Face (full)
172 GB, full resolution (NetCDF)
- Best for: full-resolution training and final model development
- Format: Zarr (streamable / chunked)
- Includes: historical + training SSP targets and forcings; SSP2-4.5 test forcings (no targets)
- Historical: (lat: 192, lon: 288, time: 60224)
- Projections: (lat: 192, lon: 288, time: 31389)
Kaggle (lite)
800 MB, 16× spatially coarsened lite (debug)
- Best for: fast prototyping and validating end-to-end pipelines
- Format: competition “Data” bundle (lightweight exports)
- Same task: same variables and split logic, at reduced spatial resolution
- Competitions: deterministic submissions go to the main track, probabilistic submissions go to the UQ track
- Historical: (lat: 12, lon: 18, time: 60224)
- Projections: (lat: 12, lon: 18, time: 31389)
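The lite grid is the full grid divided by 16 along each axis (192/16 = 12, 288/16 = 18). The sketch below shows one way to produce such a grid, assuming an unweighted mean over non-overlapping 16×16 blocks. The coarsening method actually used is not stated here and may differ (for example, area-weighted or conservative regridding). The function name `coarsen_block_mean` is illustrative.

```python
import numpy as np

def coarsen_block_mean(field: np.ndarray, factor: int = 16) -> np.ndarray:
    """Average non-overlapping factor x factor blocks of a (lat, lon) field.

    Assumption: plain (unweighted) block mean; not necessarily the method
    used to build the lite dataset.
    """
    n_lat, n_lon = field.shape
    if n_lat % factor or n_lon % factor:
        raise ValueError("grid dimensions must be divisible by the factor")
    # Split each axis into (coarse index, index within block), then average
    # over the two within-block axes.
    blocks = field.reshape(n_lat // factor, factor, n_lon // factor, factor)
    return blocks.mean(axis=(1, 3))

# Synthetic full-resolution field on the 192 x 288 grid.
full = np.arange(192 * 288, dtype=np.float64).reshape(192, 288)
lite = coarsen_block_mean(full)
print(lite.shape)  # (12, 18)
```

Applying the same function to your full-resolution predictions gives a quick way to compare them with models prototyped on the lite grid.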
Inputs
Forcing variables include greenhouse gases (global) and aerosols (spatial). Aerosol inputs are temporally sparse for some scenarios, and are interpolated to monthly values.
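To illustrate what interpolating sparse inputs to monthly values can look like, here is a minimal sketch that assumes linear interpolation in time for a single grid cell. The interpolation scheme used for the benchmark data is not specified here, and `to_monthly` and the sample values are hypothetical.

```python
import numpy as np

def to_monthly(sparse_years, sparse_values, start_year, end_year):
    """Linearly interpolate sparse samples onto mid-month time stamps.

    Assumption: linear interpolation; times are fractional years.
    np.interp holds the end values constant outside the sampled range.
    """
    n_months = (end_year - start_year + 1) * 12
    monthly_time = start_year + (np.arange(n_months) + 0.5) / 12.0
    monthly_values = np.interp(monthly_time, sparse_years, sparse_values)
    return monthly_time, monthly_values

# Hypothetical aerosol values given only at three time points.
years = [2015.0, 2020.0, 2030.0]
values = [1.0, 2.0, 0.0]
t, v = to_monthly(years, values, start_year=2015, end_year=2029)
print(len(v))  # 180
```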
Targets
Your emulator can predict daily 2D maps (192×288) of 7 variables (useful for diagnostics and for computing extremes):
| Variable | Description | Units |
|---|---|---|
| tas | Near-surface air temperature | K |
| tasmax | Daily max near-surface air temperature | K |
| tasmin | Daily min near-surface air temperature | K |
| pr | Precipitation | kg/(m² s) |
| huss | Near-surface specific humidity | kg/kg |
| psl | Sea level pressure | Pa |
| sfcWind | Near-surface wind speed | m/s |
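Threshold-based extreme indices are usually expressed in °C and mm/day, while the targets are stored in K and kg/(m² s). Since 1 kg of water per m² corresponds to a 1 mm layer, the conversions are simple. A small sketch (the helper names are illustrative):

```python
SECONDS_PER_DAY = 86_400
KELVIN_OFFSET = 273.15

def pr_flux_to_mm_per_day(pr_kg_m2_s: float) -> float:
    """Convert a precipitation flux in kg/(m^2 s) to mm/day."""
    return pr_kg_m2_s * SECONDS_PER_DAY

def kelvin_to_celsius(t_kelvin: float) -> float:
    """Convert a temperature from K to degrees Celsius."""
    return t_kelvin - KELVIN_OFFSET

# A flux of 1/86400 kg/(m^2 s) sustained for a day is 1 mm of rain.
one_mm = pr_flux_to_mm_per_day(1.0 / SECONDS_PER_DAY)
freezing = kelvin_to_celsius(273.15)
```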
Submission rule: your model must emulate these daily target variables first. Direct prediction of the leaderboard indices is not allowed.
Benchmark target: the leaderboard score is computed on 15 extreme indices derived from daily temperature and precipitation.
Evaluation focused on extremes
Because averages hide risk: extremes determine impacts
Primary score
Models are scored on 15 derived extreme indices using the region-wise normalized Nash–Sutcliffe efficiency (nNSE). Cell-level \(R^2\) is transformed via \(\mathrm{nNSE}_{ij} = R^2_{ij}/(2-R^2_{ij})\), which maps \(R^2 \in (-\infty, 1]\) to \((-1, 1]\). Regional scores are area-weighted over AR6 land regions and then averaged uniformly.
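The sketch below walks through the scoring steps for one index on a toy grid. It rests on several assumptions that are not confirmed here: \(R^2\) is computed along the time axis at each cell, area weights are proportional to the cosine of latitude, and regions are given as boolean masks. The function names and the two toy regions are illustrative; the official scoring code may differ.

```python
import numpy as np

def cell_r2(truth: np.ndarray, pred: np.ndarray) -> np.ndarray:
    """R^2 along the time axis (axis 0) for each grid cell.

    Note: cells whose truth is constant in time give a zero denominator
    and would need special handling.
    """
    sse = ((truth - pred) ** 2).sum(axis=0)
    sst = ((truth - truth.mean(axis=0)) ** 2).sum(axis=0)
    return 1.0 - sse / sst

def nnse(r2: np.ndarray) -> np.ndarray:
    """Map R^2 in (-inf, 1] to nNSE in (-1, 1]."""
    return r2 / (2.0 - r2)

def regional_score(nnse_map, lat_deg, region_mask) -> float:
    """Area-weighted mean of nNSE over one region (assumed cos-lat weights)."""
    weights = np.cos(np.deg2rad(lat_deg))[:, None] * region_mask
    return float((weights * nnse_map).sum() / weights.sum())

def score(truth, pred, lat_deg, region_masks) -> float:
    """Uniform average of the regional scores."""
    nnse_map = nnse(cell_r2(truth, pred))
    return float(np.mean([regional_score(nnse_map, lat_deg, m)
                          for m in region_masks]))

# Toy data: 30 time steps on a 4 x 6 grid, split into two made-up regions.
rng = np.random.default_rng(0)
truth = rng.normal(size=(30, 4, 6))
lat = np.array([-60.0, -20.0, 20.0, 60.0])
south = np.zeros((4, 6), dtype=bool)
south[:2] = True
north = ~south

perfect = score(truth, truth, lat, [south, north])
climatology = np.broadcast_to(truth.mean(axis=0), truth.shape)
baseline = score(truth, climatology, lat, [south, north])
```

A perfect emulator scores 1, while predicting the time mean at every cell scores 0, which gives a natural reference point when reading the leaderboard.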
The uncertainty-quantification track is hosted separately on Kaggle and evaluates probabilistic submissions with the analogous CRPS-based regional score.
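For readers new to CRPS, here is a minimal sketch of the standard ensemble estimator, \(\mathrm{CRPS} = \mathbb{E}|X - y| - \tfrac{1}{2}\mathbb{E}|X - X'|\), for a single scalar target. How the UQ track normalizes and aggregates CRPS into a regional score is not described here, so this only illustrates the underlying quantity. The function name `ensemble_crps` is illustrative.

```python
import numpy as np

def ensemble_crps(members, obs: float) -> float:
    """Empirical CRPS of an ensemble against one observed value.

    Lower is better; for a single member it reduces to absolute error.
    """
    members = np.asarray(members, dtype=float)
    # Mean distance between ensemble members and the observation.
    accuracy = np.abs(members - obs).mean()
    # Mean distance between all pairs of members (including self-pairs).
    spread = np.abs(members[:, None] - members[None, :]).mean()
    return float(accuracy - 0.5 * spread)

single = ensemble_crps([3.0], obs=1.0)       # absolute error of 2.0
pair = ensemble_crps([0.0, 2.0], obs=1.0)    # 1.0 - 0.5 * 1.0 = 0.5
```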
Why indices?
Indices convert daily fields into impact-relevant summaries: how hot the hottest day gets, how long droughts persist, how much rain falls during the wettest multi-day event, and how the fraction of rainfall from extremes changes.
Why nNSE?
Using nNSE instead of a raw error ensures the score is bounded, physically interpretable, and comparable across indices with different units and scales.
The 15 indices and the questions they answer
Temperature extremes
- TXx, TNn: hottest day / coldest night intensity (heatwaves, cold snaps)
- SU, TR: frequency of hot days and hot nights (human thermal stress)
- FD, ID: frost and ice days (ecosystems and agriculture)
- WSDI, CSDI: warm/cold spell duration (persistence of extremes)
- GSL: growing season length (shifts in crop calendars)
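Several of these indices follow directly from daily series once temperatures are in °C. The sketch below computes six of them for one grid cell and one year, using the standard ETCCDI thresholds (25 °C for SU, 20 °C for TR, 0 °C for FD and ID). It is an illustration rather than the official implementation, and it omits WSDI, CSDI, and GSL, which need percentile baselines or spell logic. The function name and sample values are made up.

```python
import numpy as np

def temperature_indices(tasmax_k, tasmin_k) -> dict:
    """Threshold-based temperature indices from daily series in kelvin."""
    tx = np.asarray(tasmax_k, dtype=float) - 273.15  # daily max in deg C
    tn = np.asarray(tasmin_k, dtype=float) - 273.15  # daily min in deg C
    return {
        "TXx": float(tx.max()),        # hottest daily maximum
        "TNn": float(tn.min()),        # coldest daily minimum
        "SU": int((tx > 25.0).sum()),  # summer days
        "TR": int((tn > 20.0).sum()),  # tropical nights
        "FD": int((tn < 0.0).sum()),   # frost days
        "ID": int((tx < 0.0).sum()),   # ice days
    }

# Four synthetic days.
temp_idx = temperature_indices(
    tasmax_k=[270.0, 280.0, 300.0, 305.0],
    tasmin_k=[265.0, 272.0, 294.0, 295.0],
)
print(temp_idx["SU"], temp_idx["TR"], temp_idx["FD"], temp_idx["ID"])  # 2 2 2 1
```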
Precipitation extremes
- Rx5day: intensity of multi-day rainfall events (flood risk)
- CDD, CWD: dry/wet spell persistence (drought and prolonged wet periods)
- R95pTOT: share of rainfall from very wet days (tail-dominated precipitation regimes)
- R10mm, SDII: frequency and intensity of heavy rain (infrastructure design)
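The precipitation indices can be sketched in the same way for one grid cell and one year. This example assumes input in mm/day (multiply kg/(m² s) by 86400) and the standard ETCCDI wet-day threshold of 1 mm. It is not the official implementation, and it omits R95pTOT, which needs a percentile from a reference period. The function names and sample values are made up.

```python
import numpy as np

WET_DAY_MM = 1.0  # ETCCDI wet-day threshold

def longest_run(flags) -> int:
    """Length of the longest run of consecutive True values."""
    best = current = 0
    for flag in flags:
        current = current + 1 if flag else 0
        best = max(best, current)
    return best

def precipitation_indices(pr_mm_day) -> dict:
    """Precipitation indices from a daily series in mm/day (>= 5 days)."""
    pr = np.asarray(pr_mm_day, dtype=float)
    wet = pr >= WET_DAY_MM
    # Running 5-day totals via convolution with a window of ones.
    five_day = np.convolve(pr, np.ones(5), mode="valid")
    return {
        "Rx5day": float(five_day.max()),     # wettest 5-day total
        "CDD": longest_run(~wet),            # longest dry spell
        "CWD": longest_run(wet),             # longest wet spell
        "R10mm": int((pr >= 10.0).sum()),    # heavy-rain days
        # Mean intensity on wet days; 0.0 if there are none.
        "SDII": float(pr[wet].sum() / wet.sum()) if wet.any() else 0.0,
    }

# Ten synthetic days.
pr_idx = precipitation_indices([0, 0, 12, 30, 5, 0.5, 0, 0, 0, 2])
print(pr_idx["CDD"], pr_idx["CWD"], pr_idx["R10mm"])  # 4 3 2
```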
Rules
- Use only the data provided by the organizers for training (no external CMIP6 data; no models pre-trained on CMIP6).
- Submit predictions via Kaggle: the main track is hosted at ClimX and the UQ track is hosted at ClimX UQ Track.
- Teams are limited to 10 members (team merges allowed up to one month before the deadline).
- Organizers may run validity checks and request training/inference code and weights; suspicious entries may be temporarily removed while being reviewed.
- Your emulator must output the daily target variables first; the submitted indices must be computed from those outputs rather than predicted directly.
- To be eligible for prizes and final ranking, top-ranked participants must open-source code and weights under an MIT or Apache-2.0 license.
Timeline
Proposed schedule for the IJCAI 2026 ClimX challenge
Get started
From zero to a valid submission
Download data
Use ClimX-lite on Kaggle for quick iteration or download the full dataset on Hugging Face for full-resolution training.
Train an emulator
Start from the provided baselines and improve speed, accuracy, and extreme fidelity.
Submit on Kaggle
Compute the required indices from your emulator outputs, then submit to the deterministic or UQ Kaggle competition.
ESA Phi-lab sponsors the challenge prizes and travel support for the winning teams.

