ClimX

Extreme-aware climate model emulation

Build fast, accurate machine learning emulators for the NorESM2-MM Earth System Model and quantify tail risks in future climate projections.

Daily outputs
192×288 spatial resolution
7 target variables
15 extreme indices

What is ClimX?

A benchmark built for the question that matters: how will extremes change?

Motivation

Earth System Models (ESMs) are our best tools to study climate futures, but they are computationally expensive. This limits how densely we can explore uncertainty (scenarios, initial conditions, and model structure) and makes it hard to answer policy-relevant questions about rare but high-impact events.

Climate emulators are lightweight surrogates that approximate ESM outputs, enabling rapid experimentation and risk assessment.

The task

Build a model that predicts daily 2D maps of climate variables at NorESM2-MM resolution, driven by forcing trajectories and, optionally, past climate state.

Formulation
$$x_t = g(f_t, f_{t-1}, f_{t-2}, \dots, f_{t-\tau}, x_{t-1}, x_{t-2}, \dots, x_{t-\tau})$$
\(x_t\): climate state at time \(t\); \(f_t\): external forcing at time \(t\); \(\tau\): number of past time steps the emulator may condition on
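The formulation above can be read as an autoregressive rollout: each predicted state is fed back as an input for the following days. The sketch below illustrates that loop with NumPy. The function `rollout`, the callable `g`, and the toy emulator `toy_g` are hypothetical names chosen for illustration, not part of the benchmark code, and the array layouts are assumptions.

```python
import numpy as np

def rollout(g, forcings, init_states, tau):
    """Autoregressively apply x_t = g(f_t..f_{t-tau}, x_{t-1}..x_{t-tau}).

    forcings:    array (T, F); forcings[t] is f_t
    init_states: array (tau, H, W); the states at t = 0 .. tau-1
    Returns an array (T - tau, H, W) with predictions for t = tau .. T-1.
    """
    if tau < 1 or len(init_states) != tau:
        raise ValueError("need tau >= 1 and exactly tau initial states")
    states = list(init_states)            # rolling history of past states
    preds = []
    for t in range(tau, len(forcings)):
        f_window = forcings[t - tau : t + 1][::-1]   # f_t, f_{t-1}, ..., f_{t-tau}
        x_window = np.stack(states[-tau:][::-1])     # x_{t-1}, ..., x_{t-tau}
        x_t = g(f_window, x_window)
        preds.append(x_t)
        states.append(x_t)                # the prediction becomes future input
    return np.stack(preds)

# Hypothetical toy emulator: yesterday's state plus today's first forcing value.
def toy_g(f_window, x_window):
    return x_window[0] + f_window[0, 0]

forcings = np.ones((5, 1))                # 5 days, 1 global forcing
init_states = np.zeros((2, 3, 4))         # tau = 2 past states on a 3x4 grid
preds = rollout(toy_g, forcings, init_states, tau=2)
print(preds.shape)  # (3, 3, 4)
```

Because predictions are recycled as inputs, errors can accumulate over an 86-year rollout; an emulator that uses forcings only (no past state) avoids this at the cost of ignoring climate memory.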

Training: historical (1850–2014) + SSP1-2.6, SSP3-7.0, SSP5-8.5 (2015–2100). Testing: held-out SSP2-4.5 (2015–2100).

Data and access

Two-tier distribution: full-resolution training on Hugging Face, lightweight prototyping on Kaggle

Hugging Face (full)

172 GB, full resolution (NetCDF)

  • Best for: full-resolution training and final model development
  • Format: Zarr (streamable / chunked)
  • Includes: historical + training SSP targets and forcings; SSP2-4.5 test forcings (no targets)
  • Historical: (lat: 192, lon: 288, time: 60224)
  • Projections: (lat: 192, lon: 288, time: 31389)
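As a rough planning aid, the array shapes listed above imply the following uncompressed in-memory sizes per variable. This is a back-of-the-envelope sketch that assumes 4-byte (float32) values; the actual on-disk dtype and compression are not stated here, and `field_gigabytes` is a hypothetical helper.

```python
def field_gigabytes(n_lat, n_lon, n_time, bytes_per_value=4):
    """Uncompressed size of one (time, lat, lon) variable in GB (10^9 bytes)."""
    return n_lat * n_lon * n_time * bytes_per_value / 1e9

hist_gb = field_gigabytes(192, 288, 60224)   # one variable, historical period
proj_gb = field_gigabytes(192, 288, 31389)   # one variable, one SSP scenario
print(f"historical: {hist_gb:.2f} GB, projection: {proj_gb:.2f} GB per variable")
# historical: 13.32 GB, projection: 6.94 GB per variable
```

Under that assumption a single full-resolution variable does not fit comfortably in memory on most machines, so reading the data in chunks along the time axis is likely the practical approach.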

Open dataset page

Kaggle (lite)

800 MB, coarsened 16× along each spatial dimension (for debugging and prototyping)

  • Best for: fast prototyping and validating end-to-end pipelines
  • Format: competition “Data” bundle (lightweight exports)
  • Same task: same variables and split logic, at reduced spatial resolution
  • Historical: (lat: 12, lon: 18, time: 60224)
  • Projections: (lat: 12, lon: 18, time: 31389)
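The lite grid (12×18) is the full grid (192×288) reduced by a factor of 16 along each axis. The exact coarsening procedure used by the organizers is not described here; the sketch below shows one common choice, an unweighted block mean, purely as an assumption. `coarsen_block_mean` is a hypothetical helper, and the input is synthetic data.

```python
import numpy as np

def coarsen_block_mean(field, factor=16):
    """Average non-overlapping factor x factor blocks of a (..., lat, lon) array."""
    *lead, n_lat, n_lon = field.shape
    if n_lat % factor or n_lon % factor:
        raise ValueError("grid dimensions must be divisible by the factor")
    # Split each spatial axis into (block index, position within block).
    blocks = field.reshape(*lead, n_lat // factor, factor, n_lon // factor, factor)
    return blocks.mean(axis=(-3, -1))     # average over positions within each block

full = np.random.default_rng(0).normal(size=(3, 192, 288))  # 3 synthetic days
lite = coarsen_block_mean(full)
print(lite.shape)  # (3, 12, 18)
```

A block mean smooths local maxima, so extremes computed on a coarsened grid are generally milder than those on the full grid; this is worth keeping in mind when comparing lite and full-resolution results.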

Open competition data

Inputs

Forcing variables include greenhouse gases (global) and aerosols (spatial). Aerosol inputs are temporally sparse for some scenarios, and are interpolated to monthly values.
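To illustrate what interpolating temporally sparse inputs to monthly values can look like, the sketch below linearly interpolates samples given at a few time points onto a mid-month time axis. Linear interpolation, the mid-month convention, and the helper name `to_monthly` are all assumptions made for this example; the organizers' actual procedure may differ.

```python
import numpy as np

def to_monthly(sample_years, values, start_year, end_year):
    """Linearly interpolate sparse samples to monthly values.

    sample_years: increasing 1-D array of sample times (fractional years)
    values:       array (n_samples, ...) of the field at those times
    Returns (months, monthly) with monthly shaped (n_months, ...).
    """
    n_months = (end_year - start_year + 1) * 12
    months = start_year + (np.arange(n_months) + 0.5) / 12.0   # mid-month times
    flat = values.reshape(len(sample_years), -1)                # (n_samples, n_cells)
    # np.interp is 1-D, so interpolate each grid cell separately.
    cols = [np.interp(months, sample_years, flat[:, k]) for k in range(flat.shape[1])]
    monthly = np.stack(cols, axis=1).reshape(n_months, *values.shape[1:])
    return months, monthly

years = np.array([2020.0, 2030.0])
values = np.array([0.0, 10.0]).reshape(2, 1, 1)   # one grid cell, two samples
months, monthly = to_monthly(years, values, 2020, 2029)
print(monthly.shape)  # (120, 1, 1)
```

Note that `np.interp` holds the first or last sample constant outside the sampled range rather than extrapolating.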

Figure: Greenhouse gas forcing time series across scenarios.
Figure: Aerosol emission inputs.

Targets

Your emulator can predict daily 2D maps (192×288) of 7 variables (useful for diagnostics and for computing extremes):

Variable   Description                               Units
tas        Near-surface air temperature              K
tasmax     Daily max near-surface air temperature    K
tasmin     Daily min near-surface air temperature    K
pr         Precipitation                             kg/(m² s)
huss       Near-surface specific humidity            kg/kg
psl        Sea level pressure                        Pa
sfcWind    Near-surface wind speed                   m/s


Benchmark target: the leaderboard score is computed on 15 extreme indices derived from daily temperature and precipitation.

Evaluation focused on extremes

Because averages hide risk: extremes determine impacts

Primary score

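Models are scored on 15 derived extreme indices using the region-wise normalized Nash–Sutcliffe efficiency (nNSE). For each index, the cell-level \(R^2_{ij}\in(-\infty,1]\) is transformed via \(\mathrm{nNSE}_{ij} = R^2_{ij}/(2-R^2_{ij})\), which maps it to \((-1,1]\). Cell scores are then averaged within each AR6 land region \(k\), weighted by \(\cos\phi_i\) (the cosine of latitude, a proxy for cell area), and the regional scores are averaged uniformly over regions and indices: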

$$\mathrm{nNSE}_{kv} = \frac{\sum_{(i,j)\in k \cap \mathcal{V}} \cos\phi_i \, \mathrm{nNSE}_{ij}}{\sum_{(i,j)\in k \cap \mathcal{V}} \cos\phi_i}$$
$$S = \frac{1}{|V|}\sum_{v \in V}\frac{1}{|K_v|}\sum_{k \in K_v}\mathrm{nNSE}_{kv}$$
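\(S=1\): perfect agreement, \(S=0\): mean predictor, and \(S\to -1\): pathological performance. \(V\) is the set of indices and \(K_v\) the set of regions scored for index \(v\). Cells with negligible temporal variability are excluded, since \(R^2\) is ill-defined there; \(\mathcal{V}\) denotes the cells that remain.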
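The scoring formulas above can be sketched in a few lines of NumPy. This is an illustrative reading of the equations, not the official scoring code: the function names, the variance threshold `min_var`, and the boolean `region_mask` (standing in for an AR6 land region) are assumptions.

```python
import numpy as np

def nnse_map(y_true, y_pred, min_var=1e-12):
    """Per-cell nNSE = R2 / (2 - R2) for arrays shaped (time, lat, lon).

    Cells whose reference variance is at most min_var are returned as NaN.
    """
    mse = ((y_true - y_pred) ** 2).mean(axis=0)
    var = y_true.var(axis=0)
    valid = var > min_var
    r2 = np.full(var.shape, np.nan)
    r2[valid] = 1.0 - mse[valid] / var[valid]
    return r2 / (2.0 - r2)

def regional_score(nnse, lat_deg, region_mask):
    """cos(latitude)-weighted mean of nNSE over the valid cells of one region."""
    weights = np.cos(np.deg2rad(lat_deg))[:, None] * np.ones_like(nnse)
    use = region_mask & np.isfinite(nnse)
    return (weights[use] * nnse[use]).sum() / weights[use].sum()

def final_score(per_index_regional):
    """S: mean over indices of the mean over each index's regional scores."""
    return float(np.mean([np.mean(scores) for scores in per_index_regional]))

rng = np.random.default_rng(0)
truth = rng.normal(size=(50, 4, 6))                 # synthetic yearly index values
lat = np.linspace(-60.0, 60.0, 4)
mask = np.ones((4, 6), dtype=bool)                  # one region covering the grid
climatology = np.broadcast_to(truth.mean(axis=0), truth.shape)

perfect = regional_score(nnse_map(truth, truth), lat, mask)
baseline = regional_score(nnse_map(truth, climatology), lat, mask)
```

In this toy setup a perfect prediction scores 1 and predicting the temporal mean at every cell scores 0 (up to floating-point rounding), matching the interpretation of \(S\) given above. A region with no valid cells would need separate handling, which this sketch omits.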

Why indices?

Indices convert daily fields into impact-relevant summaries: how hot the hottest day gets, how long droughts persist, how much rain falls during the wettest multi-day event, and how the fraction of rainfall from extremes changes.

Why nNSE?

Using nNSE instead of a raw error ensures the score is bounded, physically interpretable, and comparable across indices with different units and scales.

The 15 indices and the questions they answer

Temperature extremes

  • TXx, TNn: hottest day / coldest night intensity (heatwaves, cold snaps)
  • SU, TR: frequency of hot days and hot nights (human thermal stress)
  • FD, ID: frost and ice days (ecosystems and agriculture)
  • WSDI, CSDI: warm/cold spell duration (persistence of extremes)
  • GSL: growing season length (shifts in crop calendars)

Precipitation extremes

  • Rx5day: intensity of multi-day rainfall events (flood risk)
  • CDD, CWD: dry/wet spell persistence (drought and prolonged wet periods)
  • R95pTOT: share of rainfall from very wet days (tail-dominated precipitation regimes)
  • R10mm, SDII: frequency and intensity of heavy rain (infrastructure design)
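As an illustration of how daily fields turn into indices, the sketch below computes a subset of them for one year of data, following the standard ETCCDI definitions (for example, SU counts days with maximum temperature above 25 °C, and CDD is the longest run of days with less than 1 mm of precipitation). Whether the benchmark uses exactly these thresholds and conventions is an assumption, and the helper names are hypothetical.

```python
import numpy as np

SECONDS_PER_DAY = 86400.0  # 1 kg/(m² s) of water is 1 mm/s, so x86400 gives mm/day

def longest_run(flags):
    """Longest run of consecutive True values along axis 0 of a (time, ...) array."""
    run = np.zeros(flags.shape[1:], dtype=int)
    best = np.zeros_like(run)
    for day in flags:
        run = np.where(day, run + 1, 0)   # extend the run or reset it
        best = np.maximum(best, run)
    return best

def indices_one_year(tasmax_k, tasmin_k, pr_flux):
    """A few ETCCDI-style indices from one year of daily (time, lat, lon) fields."""
    tx = tasmax_k - 273.15                # K -> degrees Celsius
    tn = tasmin_k - 273.15
    pr = pr_flux * SECONDS_PER_DAY        # kg/(m² s) -> mm/day
    csum = np.cumsum(pr, axis=0)
    # Sum over days i..i+4 equals csum[i+4] - csum[i-1], with csum[-1] taken as 0.
    five_day = csum[4:] - np.concatenate([np.zeros_like(csum[:1]), csum[:-5]])
    return {
        "TXx": tx.max(axis=0),            # hottest daily maximum
        "TNn": tn.min(axis=0),            # coldest daily minimum
        "SU": (tx > 25.0).sum(axis=0),    # summer days
        "FD": (tn < 0.0).sum(axis=0),     # frost days
        "Rx5day": five_day.max(axis=0),   # wettest 5-day total, mm
        "CDD": longest_run(pr < 1.0),     # longest dry spell, days
        "R10mm": (pr >= 10.0).sum(axis=0),  # heavy-precipitation days
    }
```

Percentile-based indices such as WSDI, CSDI, and R95pTOT additionally require thresholds estimated from a reference period, so they are not covered by this sketch.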

Submission rules (summary)

  • Use only the data provided by the organizers for training (no external CMIP6 data; no models pre-trained on CMIP6).
  • Submit predictions via Kaggle.
  • Teams are limited to 10 members (team merges allowed up to one month before the deadline).
  • Organizers may run validity checks and request training/inference code and weights; suspicious entries may be temporarily removed while being reviewed.
  • To be eligible for prizes and final ranking, top-ranked participants must open-source code and weights under an MIT or Apache-2.0 license.

Timeline

Proposed schedule for the IJCAI 2026 ClimX challenge

April 15, 2026 — Challenge launch (Kaggle + Hugging Face release)
July 15, 2026 — Submission deadline (Kaggle evaluation closes)
July 30, 2026 — Winner notifications
August 15–21, 2026 — Workshop session at IJCAI 2026 (Bremen)

Baselines and results

Reference implementations and their performance reports

Get started

From zero to a valid submission

1. Download data

Use ClimX-lite on Kaggle for quick iteration or download the full dataset on Hugging Face for full-resolution training.

2. Train an emulator

Start from the provided baselines and improve speed, accuracy, and extreme fidelity.

3. Submit on Kaggle

Submit your predicted indices on Kaggle.