AI Prediction System Now Beats the Best Traditional Weather Forecasts

Move Over, European Model

Jan 07, 2025

GenCast is an AI-powered weather model from Google DeepMind that uses probability-based forecasting to predict weather up to 15 days in advance. This article first appeared in The Conversation on December 4 and is reprinted here with permission.

By VASSILI KITSIOS

Senior Research Scientist, Climate Forecasting, Commonwealth Scientific and Industrial Research Organisation (CSIRO)

A new machine-learning weather prediction model called GenCast can outperform the best traditional forecasting systems in at least some situations, according to a paper by Google DeepMind researchers published recently in Nature.

Using a diffusion model approach similar to artificial intelligence (AI) image generators, the system generates multiple forecasts to capture the complex behaviour of the atmosphere. It does so with a fraction of the time and computing resources required for traditional approaches.

How Forecasts Work

The weather predictions we use in practice are produced by running multiple numerical simulations of the atmosphere.

Each simulation starts from a slightly different estimate of the current weather. This is because we don’t know exactly what the weather is at this instant everywhere in the world. To know that, we would need sensor measurements everywhere.

These numerical simulations use a model of the world’s atmosphere divided into a grid of three-dimensional blocks. By solving equations describing the fundamental physical laws of nature, the simulations predict what will happen in the atmosphere.

Known as general circulation models, these simulations need a lot of computing power. They are usually run at high-performance supercomputing facilities.

Machine-Learning the Weather

The past few years have seen an explosion in efforts to produce weather prediction models using machine learning. Typically, these approaches don’t incorporate our knowledge of the laws of nature the way general circulation models do.

Most of these models use some form of neural network to learn patterns in historical data and produce a single future forecast. However, this approach produces predictions that lose detail as they progress into the future, gradually becoming “smoother”. This smoothness is not what we see in real weather systems.

Researchers at Google’s DeepMind AI research lab have just published a paper in Nature describing their latest machine-learning model, GenCast.

GenCast mitigates this smoothing effect by generating an ensemble of multiple forecasts. Each individual forecast is less smooth, and better resembles the complexity observed in nature.

The best estimate of the actual future then comes from averaging the different forecasts. The size of the differences between the individual forecasts indicates how much uncertainty there is.
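The averaging step described above can be sketched in a few lines of Python. This is a minimal illustration, not GenCast's code: the temperature values are made-up numbers, and `summarize_ensemble` is a hypothetical helper name. The ensemble mean stands in for the best estimate, and the standard deviation across members stands in for the uncertainty.

```python
from statistics import mean, stdev

def summarize_ensemble(members):
    """Return (ensemble mean, ensemble spread) for one variable at one place and time.

    `members` holds one forecast value per ensemble member.
    """
    return mean(members), stdev(members)

# Hypothetical 5-member ensembles of temperature forecasts, in degrees C.
tight = [21.0, 21.5, 20.5, 21.2, 20.8]   # members agree: low uncertainty
loose = [15.0, 27.0, 18.0, 24.0, 21.0]   # members disagree: high uncertainty

tight_mean, tight_spread = summarize_ensemble(tight)
loose_mean, loose_spread = summarize_ensemble(loose)
print(f"tight: mean={tight_mean:.1f} C, spread={tight_spread:.2f} C")
print(f"loose: mean={loose_mean:.1f} C, spread={loose_spread:.2f} C")
```

Both made-up ensembles have the same mean, so the best estimate is identical; only the spread tells you how far to trust it.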

According to the GenCast paper, this probabilistic approach creates more accurate forecasts than the best numerical weather prediction system in the world—the one at the European Centre for Medium-Range Weather Forecasts.

Generative AI for Weather

GenCast is trained on what is called reanalysis data from the years 1979 to 2018. This data is produced by the kind of general circulation models we talked about earlier, which are additionally corrected to resemble actual historical weather observations to produce a more consistent picture of the world’s weather.

The GenCast model makes predictions of several variables such as temperature, pressure, humidity and wind speed at the surface and at 13 different heights, on a grid that divides the world up into 0.25-degree regions of latitude and longitude.
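To get a feel for the size of that grid, here is a rough count in Python. It assumes grid points are placed every 0.25 degrees and include both poles, which is a common convention but is not stated in the article; the variable names are illustrative only.

```python
# Grid spacing stated in the article, in degrees of latitude and longitude.
RESOLUTION_DEG = 0.25

# Assumption: points include both poles, so latitude has one more point
# than 180 / 0.25. Longitude wraps around, so 360E is not counted twice.
n_lat = int(180 / RESOLUTION_DEG) + 1   # 90S to 90N inclusive
n_lon = int(360 / RESOLUTION_DEG)       # 0E up to, but not including, 360E
n_levels = 13                           # heights stated in the article

points_per_level = n_lat * n_lon
print(f"{n_lat} x {n_lon} = {points_per_level:,} points per level")
print(f"{points_per_level * n_levels:,} values per variable across all 13 heights")
```

Under that assumption, each variable has roughly a million values per height, which gives a sense of how much data a single forecast step has to produce.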

GenCast is what is called a “diffusion model”, similar to AI image generators. However, instead of taking text and producing an image, it takes the current state of the atmosphere and produces an estimate of what it will be like in 12 hours.

This works by first setting the values of the atmospheric variables 12 hours into the future as random noise. GenCast then uses a neural network to find structures in the noise that are compatible with the current and previous weather variables. An ensemble of multiple forecasts can be generated by starting with different random noise.
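The start-from-noise idea can be shown with a toy Python loop. This is only a sketch of the general pattern, not a real diffusion sampler and not GenCast: `toy_denoiser` is a hand-written stand-in for the trained neural network, and the step count, noise levels and temperatures are all assumptions chosen for illustration.

```python
import random

def toy_denoiser(noisy, current_state):
    """Stand-in for the trained neural network (an assumption, not GenCast).

    It guesses the clean future value as the current value plus a fixed
    12-hour change and ignores `noisy`; a real model learns this mapping
    from reanalysis data and does use the noisy input.
    """
    return [c + 1.0 for c in current_state]

def sample_forecast(current_state, seed, steps=20):
    """Draw one ensemble member by refining random noise step by step."""
    rng = random.Random(seed)
    # Start with the future state set to pure random noise.
    sample = [rng.gauss(0.0, 10.0) for _ in current_state]
    for step in range(steps):
        noise_level = 1.0 - (step + 1) / steps   # shrinks to 0 at the last step
        guess = toy_denoiser(sample, current_state)
        # Move halfway toward the denoiser's guess, then re-add a smaller
        # amount of noise so each member keeps its own fine detail.
        sample = [
            s + 0.5 * (g - s) + rng.gauss(0.0, noise_level)
            for s, g in zip(sample, guess)
        ]
    return sample

current = [15.0, 16.0, 14.5]   # hypothetical temperatures at three grid points
ensemble = [sample_forecast(current, seed) for seed in range(4)]
for member in ensemble:
    print([round(v, 2) for v in member])
```

Each seed starts from different noise, so each member ends up slightly different while staying consistent with the same current state. That is the sense in which different random noise yields an ensemble.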

Forecasts are run out to 15 days, taking 8 minutes on a single processor called a tensor processing unit (TPU). This is significantly faster than a general circulation model. The training of the model took five days using 32 TPUs.
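If a 15-day forecast is built by chaining 12-hour steps, as the description above suggests, that works out to 30 steps, each fed by the output of the one before. The Python sketch below shows only that chaining pattern; `toy_step` is a hypothetical placeholder rule, not the GenCast model, and the starting values are made up.

```python
FORECAST_DAYS = 15
STEP_HOURS = 12
n_steps = FORECAST_DAYS * 24 // STEP_HOURS   # number of 12-hour steps in 15 days

def toy_step(previous, current):
    """Hypothetical 12-hour step: carry the most recent trend on at half strength."""
    return current + 0.5 * (current - previous)

def rollout(previous, current, steps):
    """Chain 12-hour steps, feeding each output back in as the next input."""
    trajectory = []
    for _ in range(steps):
        previous, current = current, toy_step(previous, current)
        trajectory.append(current)
    return trajectory

trajectory = rollout(previous=14.0, current=15.0, steps=n_steps)
print(n_steps, "steps; final value:", round(trajectory[-1], 3))
```

Because every step builds on the last, small errors early in the chain can grow, which is one reason an ensemble of rollouts is more informative than a single one.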

Machine-learning forecasts could become more widespread in the coming years as they become more efficient and reliable.

However, classical numerical weather prediction and reanalysed data will still be required. Not only are they needed to provide the initial conditions for the machine learning weather forecasts, they also produce the input data to continually fine-tune the machine learning models.

What About Climate?

Current machine learning weather forecasting systems are not appropriate for climate projections, for three reasons.

Firstly, to make weather predictions weeks into the future, you can assume that the ocean, land and sea ice won’t change. This is not the case for climate predictions over multiple decades.

Secondly, weather prediction is highly dependent on the details of the current weather. However, climate projections are concerned with the statistics of the climate decades into the future, for which today’s weather is irrelevant. Future carbon emissions are the greater determinant of the future state of the climate.

Thirdly, weather prediction is a “big data” problem. There are vast amounts of relevant observational data, which is what you need to train a complex machine learning model.

Climate projection is a “small data” problem, with relatively little available data. This is because the relevant physical phenomena (such as sea levels or climate drivers such as the El Niño–Southern Oscillation) evolve much more slowly than the weather.

There are ways to address these problems. One approach is to use our knowledge of physics to simplify our models, meaning they require less data for machine learning.

Another approach is to use physics-informed neural networks to try to fit the data and also satisfy the laws of nature. A third is to use physics to set “ground rules” for a system, then use machine learning to determine the specific model parameters.
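As a toy illustration of the third idea, the Python sketch below lets physics fix the form of the equation and lets the data determine a single parameter. It uses Newton's law of cooling on synthetic data purely as a simple stand-in; it is not the method of the cited research, and every number in it is an assumption.

```python
import math

# "Ground rule" from physics (Newton's law of cooling): dT/dt = -k * (T - T_env).
# The form of the equation is fixed; only the rate k is learned from data.
T_ENV = 10.0
TRUE_K = 0.3          # used only to generate the synthetic observations
DT = 0.1              # time between observations, in hours

times = [i * DT for i in range(50)]
temps = [T_ENV + 20.0 * math.exp(-TRUE_K * t) for t in times]

# Finite-difference estimate of dT/dt, paired with the midpoint excess temperature.
rates = [(temps[i + 1] - temps[i]) / DT for i in range(len(temps) - 1)]
excess = [0.5 * (temps[i] + temps[i + 1]) - T_ENV for i in range(len(temps) - 1)]

# Least-squares fit of rate = -k * excess, which has a closed-form solution.
fitted_k = -sum(r * x for r, x in zip(rates, excess)) / sum(x * x for x in excess)
print(f"fitted k = {fitted_k:.4f} per hour (synthetic truth: {TRUE_K})")
```

With the structure supplied by physics, only one number has to be learned, so far less data is needed than for a model that must discover the whole relationship from scratch.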

Machine learning has a role to play in the future of both weather forecasting and climate projections. However, fundamental physics – fluid mechanics and thermodynamics – will continue to play a crucial role.


Ruv Draba


I read this back in December with my informatician's hat on. Peter, thank you for reproducing it in a sailing context. Some parochial commentpinions follow.

Here in Australia, the Bureau of Meteorology ('BoM') uses the ACCESS model (http://www.bom.gov.au/australia/charts/about/about_access.shtml). It's produced in much the same way as GFS in the US or ECMWF in Europe. It can offer a granularity of three hours over ten days or one hour over three days but can give substantially different results to the other major models. (PSA: Sailors from other continents visiting Australia's sandbar-riddled, shipwreck-dotted coast often complain that they're not getting the same reliability from GFS and ECMWF here over the same periods -- especially on wind.)

I don't do much coastal sailing though -- I live in the mountains, and here the weather is always volatile. A ten-day forecast might be 90% right on total precipitation over an area, but wrong on when, where and how you'll see it. Base wind-speeds are only indicative, gust ranges are putative, and forecast temperatures can easily be undercooked or overcooked by 10C and more depending on where you are.

If you go out hiking in Summer dressed for 26C/79F you could easily encounter over 37C/100F. Locals know this, but visitors can run out of water in as little as two hours. If you camp on a Summer night in a sleeping bag rated for 6C/43F, you could actually see -3C/27F because of rivers of alpine air from higher altitudes, running through what are called frost hollows. Locals know this, but visitors shiver in their icing tents because they're camped in the wrong valley, didn't notice the utter absence of tall trees and that the grass species had changed character. I volunteer as a campground host for state parks, and spent one Easter fixing torn visitor tents, shredded by unforecast winds encountered in an unforecast hailstorm. (Our own shelter held up fine.)

Australian populations are mainly coastal and most of our weather sensors are located on the coast. We don't have the granularity of data samples to support a detailed machine-learning style model for our alpine and remote regions -- aside from a few meteo stations in ski resorts, our data are already interpolated from sensors hundreds of kms away. (I had a great chat with a data officer at BoM once who explained just how dependent the Bureau is on remote volunteers submitting observational data, and how the historical data evaporated during periods of conscription and world wars.)

So in my context, a 15 day forecast is laughable. Here in the Australian Alps you can trust forecasts maybe three days out, but you'll still watch weather like a hawk and when you're planning a lake sail, the wind is whatever you get every 15 minutes and you might as well bring a canoe-paddle.

So based on those experiences and my own local poking, if I were planning a long cruising trip using such a forecast method I'd really *really* want to know where the nearest data are being gathered from, how often they're gathered, how long they've been gathered for and even *how* they're being gathered. 15 days of forecast accuracy for some sixty-year-old airport in a city is not 15 days of accuracy in a volatile weather-zone when the nearest observations are taken 500nm away, record-keeping is manual, and interrupted every other year by hurricanes and cyclones. It may be that for these locations, empirical modelling is more reliable than machine-learned stats, and you'll still need local knowledge to know how reliable, because reliability has never been uniform due to data sources, sampling biases and curation quirks.

I hope that may help.
