ClimX
Extreme-aware climate model emulation
Build fast, accurate machine learning emulators for the NorESM2-MM Earth System Model and quantify tail risks in future climate projections.
What is ClimX?
A benchmark built for the question that matters: how will extremes change?
Motivation
Earth System Models (ESMs) are our best tools to study climate futures, but they are computationally expensive. This limits how densely we can explore uncertainty (scenarios, initial conditions, and model structure) and makes it hard to answer policy-relevant questions about rare but high-impact events.
Climate emulators are lightweight surrogates that approximate ESM outputs, enabling rapid experimentation and risk assessment.
The task
Build a model that predicts daily 2D maps of climate variables at NorESM2-MM resolution, driven by forcing trajectories and, optionally, past climate state.
Training: historical (1850–2014) + SSP1-2.6, SSP3-7.0, SSP5-8.5 (2015–2100). Testing: held-out SSP2-4.5 (2015–2100).
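The scenario split above can be written down as a small config. This is a sketch for organizing your own pipeline; the dictionary name and the lowercase CMIP6-style scenario identifiers (`ssp126`, etc.) are our choices, not part of the official data layout.

```python
# Hypothetical split config mirroring the challenge setup:
# train on historical + three SSPs, test on held-out SSP2-4.5.
SPLITS = {
    "train": {
        "historical": (1850, 2014),
        "ssp126": (2015, 2100),
        "ssp370": (2015, 2100),
        "ssp585": (2015, 2100),
    },
    "test": {
        "ssp245": (2015, 2100),  # forcings provided, targets withheld
    },
}
```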
Data and access
Two-tier distribution: full-resolution training on Hugging Face, lightweight prototyping on Kaggle
Hugging Face (full)
172 GB, full resolution (Zarr)
- Best for: full-resolution training and final model development
- Format: Zarr (streamable / chunked)
- Includes: historical + training SSP targets and forcings; SSP2-4.5 test forcings (no targets)
- Historical: (lat: 192, lon: 288, time: 60224)
- Projections: (lat: 192, lon: 288, time: 31389)
Kaggle (lite)
800 MB, 16× spatially coarsened lite version (for debugging)
- Best for: fast prototyping and validating end-to-end pipelines
- Format: competition “Data” bundle (lightweight exports)
- Same task: same variables and split logic, at reduced spatial resolution
- Historical: (lat: 12, lon: 18, time: 60224)
- Projections: (lat: 12, lon: 18, time: 31389)
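The lite grid is exactly a 16× reduction of the full grid: (192, 288) → (12, 18). A minimal sketch of that relationship via block-mean coarsening (whether the organizers used block averaging or another downsampling method is an assumption here):

```python
import numpy as np

# Toy full-resolution field on the NorESM2-MM grid (lat: 192, lon: 288).
full = np.random.default_rng(0).random((192, 288))

# 16x block-mean coarsening: group lat into 12 blocks of 16 rows and
# lon into 18 blocks of 16 columns, then average within each block.
coarse = full.reshape(12, 16, 18, 16).mean(axis=(1, 3))
print(coarse.shape)  # (12, 18) — the lite grid
```

Block averaging preserves the area mean of the field, which makes the lite set a faithful stand-in for pipeline debugging.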
Inputs
Forcing variables include greenhouse gases (global) and aerosols (spatially resolved). Aerosol inputs are temporally sparse for some scenarios and are interpolated to monthly values.
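Interpolating a sparse forcing trajectory to monthly resolution can be sketched with plain linear interpolation. The sample years and values below are illustrative placeholders, not values from the dataset, and linear interpolation is an assumption about the method:

```python
import numpy as np

# Hypothetical sparse aerosol forcing samples (years, scale factors).
sparse_years = np.array([2015.0, 2025.0, 2050.0, 2100.0])
sparse_vals = np.array([1.00, 0.85, 0.60, 0.40])

# Monthly time axis covering 2015-2100 inclusive.
monthly_t = 2015 + np.arange((2100 - 2015) * 12 + 1) / 12.0

# Linear interpolation onto the monthly grid.
monthly_vals = np.interp(monthly_t, sparse_years, sparse_vals)
print(monthly_vals.shape)  # (1021,)
```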
Targets
Your emulator should predict daily 2D maps (192×288) of 7 variables (useful for diagnostics and for computing extremes):
| Variable | Description | Units |
|---|---|---|
| tas | Near-surface air temperature | K |
| tasmax | Daily maximum near-surface air temperature | K |
| tasmin | Daily minimum near-surface air temperature | K |
| pr | Precipitation | kg/(m² s) |
| huss | Near-surface specific humidity | kg/kg |
| psl | Sea level pressure | Pa |
| sfcWind | Near-surface wind speed | m/s |
Benchmark target: the leaderboard score is computed on 15 extreme indices derived from daily temperature and precipitation.
Evaluation focused on extremes
Because averages hide risk: extremes determine impacts
Primary score
Models are scored on 15 derived extreme indices using the region-wise normalized Nash–Sutcliffe efficiency (nNSE). Cell-level \(R^2\) is transformed via \(\mathrm{nNSE}_{ij} = R^2_{ij}/(2-R^2_{ij})\), mapping \(R^2\) to \((-1,1]\). Regional scores are area-weighted over AR6 land regions and then averaged uniformly.
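The transform and the area-weighted regional mean can be sketched in a few lines. The function names are ours, and the final uniform averaging over regions is shown only as a comment, since the exact aggregation across indices is described in prose above:

```python
import numpy as np

def nnse(r2):
    """Normalized NSE: maps R^2 in (-inf, 1] monotonically to (-1, 1]."""
    r2 = np.asarray(r2, dtype=float)
    return r2 / (2.0 - r2)

def regional_score(r2_cells, area_weights):
    """Area-weighted mean of cell-level nNSE within one AR6 land region."""
    w = np.asarray(area_weights, dtype=float)
    return float(np.average(nnse(r2_cells), weights=w))

# Final score (assumption): uniform mean of regional_score over all regions,
# e.g. np.mean([regional_score(r2_r, w_r) for each region r]).
```

Note the boundedness: a perfect cell (\(R^2 = 1\)) gives nNSE = 1, a climatology-level cell (\(R^2 = 0\)) gives 0, and arbitrarily bad cells approach −1 rather than diverging.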
Why indices?
Indices convert daily fields into impact-relevant summaries: how hot the hottest day gets, how long droughts persist, how much rain falls during the wettest multi-day event, and how the fraction of rainfall from extremes changes.
Why nNSE?
Using nNSE instead of a raw error ensures the score is bounded, physically interpretable, and comparable across indices with different units and scales.
The 15 indices and the questions they answer
Temperature extremes
- TXx, TNn: hottest day / coldest night intensity (heatwaves, cold snaps)
- SU, TR: frequency of summer days and tropical nights (human thermal stress)
- FD, ID: frost and ice days (ecosystems and agriculture)
- WSDI, CSDI: warm/cold spell duration (persistence of extremes)
- GSL: growing season length (shifts in crop calendars)
Precipitation extremes
- Rx5day: intensity of multi-day rainfall events (flood risk)
- CDD, CWD: dry/wet spell persistence (drought and prolonged wet periods)
- R95pTOT: share of rainfall from very wet days (tail-dominated precipitation regimes)
- R10mm, SDII: frequency and intensity of heavy rain (infrastructure design)
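Two of the indices above can be sketched directly from daily fields. The shapes below match the lite grid and are illustrative; the thresholds and definitions follow the standard ETCCDI conventions, which we assume the benchmark uses:

```python
import numpy as np

# Toy daily fields for one year on the lite grid (values illustrative).
days, lat, lon = 365, 12, 18
rng = np.random.default_rng(0)
tasmax = 280 + 20 * rng.random((days, lat, lon))  # K
pr = rng.random((days, lat, lon)) * 1e-4          # kg m-2 s-1

# TXx: hottest day of the year, per grid cell.
txx = tasmax.max(axis=0)

# Rx5day: maximum 5-day accumulated precipitation (mm).
# Convert the flux to a daily total first (86400 s per day), then take
# the max over all 5-day rolling sums via cumulative sums.
daily_mm = pr * 86400.0
csum = np.cumsum(daily_mm, axis=0)
five_day = csum[4:] - np.concatenate(
    [np.zeros((1, lat, lon)), csum[:-5]], axis=0
)
rx5day = five_day.max(axis=0)
print(txx.shape, rx5day.shape)  # (12, 18) (12, 18)
```

Spell-based indices such as CDD or WSDI follow the same pattern but track run lengths of threshold exceedances instead of rolling sums.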
Rules
- Use only the data provided by the organizers for training (no external CMIP6 data; no models pre-trained on CMIP6).
- Submit predictions via Kaggle.
- Teams are limited to 10 members (team merges allowed up to one month before the deadline).
- Organizers may run validity checks and request training/inference code and weights; suspicious entries may be temporarily removed while being reviewed.
- To be eligible for prizes and final ranking, top-ranked participants must open-source code and weights under an MIT or Apache-2.0 license.
Timeline
Proposed schedule for the IJCAI 2026 ClimX challenge
Get started
From zero to a valid submission
Download data
Use ClimX-lite on Kaggle for quick iteration or download the full dataset on Hugging Face for full-resolution training.
Train an emulator
Start from the provided baselines and improve speed, accuracy, and extreme fidelity.
Submit on Kaggle
Submit your predicted indices on Kaggle.

