I read this back in December with my informatician's hat on. Peter, thank you for reproducing in a sailing context. Some parochial commentpinions follow.
Here in Australia, the Bureau of Meteorology ('BoM') uses the ACCESS model (http://www.bom.gov.au/australia/charts/about/about_access.shtml). It's produced in much the same way as GFS in the US or ECMWF in Europe. It can offer a granularity of three hours over ten days or one hour over three days but can give substantially different results to the other major models. (PSA: Sailors from other continents visiting Australia's sandbar-riddled, shipwreck-dotted coast often complain that they're not getting the same reliability from GFS and ECMWF here over the same periods -- especially on wind.)
I don't do much coastal sailing though -- I live in the mountains, and here the weather is always volatile. A ten-day forecast might be 90% right on total precipitation over an area, but wrong on when, where and how you'll see it. Base wind-speeds are only indicative, gust ranges are putative, and forecast temperatures can easily be undercooked or overcooked by 10C and more depending on where you are.
If you go out hiking in Summer dressed for 26C/79F you could easily encounter over 37C/100F. Locals know this, but visitors can run out of water in as little as two hours. If you camp on a Summer night in a sleeping bag rated for 6C/43F, you could actually see -3C/27F because of rivers of alpine air from higher altitudes, running through what are called frost hollows. Locals know this, but visitors shiver in their icing tents because they're camped in the wrong valley, didn't notice the utter absence of tall trees and that the grass species had changed character. I volunteer as a campground host for state parks, and spent one Easter fixing torn visitor tents, shredded by unforecast winds encountered in an unforecast hailstorm. (Our own shelter held up fine.)
Australian populations are mainly coastal and most of our weather sensors are located on the coast. We don't have the granularity of data samples to support a detailed machine-learning style model for our alpine and remote regions -- aside from a few meteo stations in ski resorts, our data are already interpolated from sensors hundreds of km away. (I had a great chat with a data officer at BoM once who explained just how dependent the Bureau is on remote volunteers submitting observational data, and how the historical data evaporated during periods of conscription and world wars.)
So in my context, a 15 day forecast is laughable. Here in the Australian Alps you can trust forecasts maybe three days out, but you'll still watch weather like a hawk and when you're planning a lake sail, the wind is whatever you get every 15 minutes and you might as well bring a canoe-paddle.
So based on those experiences and my own local poking, if I were planning a long cruising trip using such a forecast method I'd really *really* want to know where the nearest data are being gathered from, how often they're gathered, how long they've been gathered for and even *how* they're being gathered. 15 days of forecast accuracy for some sixty year-old airport in a city is not 15 days of accuracy in a volatile weather-zone when the nearest observations are taken 500nm away, record-keeping is manual, and interrupted every other year by hurricanes and cyclones. It may be that for these locations, empirical modelling is more reliable than machine-learned stats, and you'll still need local knowledge to know how reliable because reliability never has been uniform due to data sources, sampling biases and curation quirks.
I hope that may help.
As I just wrote to someone else, I think the most value to mariners is the better advance warning of a hurricane track this technology promises to provide.
I hope so, Peter. But consider what a small proportion of data points represent hurricane behaviour in comparison to other data points. Hurricanes are big energy systems, sampled rarely. The reliability that you get on 15 day forecasts for 'ordinary' days may not be reflected in hurricane events.
I understand that cruising sailors want an extra 50% notice on the track, but the volatility may be higher than you'd think and experiences with other forecasts may give a false sense of security. This is one of those likelihood/impact risk analyses where the high impact weights even small changes of likelihood in important ways.
So do you use that extra notice so you can plan what *to* do, or what *not* to do? Some sailors will surely jump one way; some the other.
Over time, I'd be watching how the actuaries in insurance companies start changing terms in the policies -- when they all start treating 15-day warnings as authoritative that'll be our first social indication that it's now reliable for maritime use.
In followup, I was just nosing through the source paper (https://www.nature.com/articles/s41586-024-08252-9). It's published in *Nature*, which in the sciences is prestigious, peer-reviewed and hard to publish in. A good sign, but let's also note that their parent organisation, DeepMind Technologies Limited, is a subsidiary of Alphabet, which also owns Google. These folks (12 authors in total) are paid to research topics that make money from data and it's a substantial team: the sort of artist-credits that you'd expect on a Taylor Swift single.
I won't attempt a technical appraisal because I'm not a meteorologist, but it's instructive to note what they were interested in for evaluating their test-cases:
1. A statistical scorecard measure called Continuous Ranked Probability Score (CRPS) -- they reckon it's competitive or better than the comparable component of ECMWF 86% of the time. That's one to impress the statisticians then.
2. Regional wind power forecasting, since that affects load-balancing on wind farms -- there'll be serious $$ in that, and they think it's 20% better across two-day forecasts, 10% better over four days, and has some benefit out to 7 days. That could secure funding from the energy sector; and finally
3. Cyclone tracking per Peter's comment. Again, big buxx if you can do it better. They think they're 12 hours more accurate 1-4 days ahead but haven't made claims about 15 days out. That could interest the financial sector.
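For readers who haven't met CRPS (item 1 above), here is a minimal sketch of the common ensemble form of the score: the average miss between each ensemble member and the observation, less half the average disagreement between members. Lower is better. The function name and the wind-speed numbers are hypothetical, and this is not necessarily the exact variant the paper evaluates.

```python
from itertools import product


def ensemble_crps(members, observed):
    """Empirical CRPS of an ensemble forecast against one observed value."""
    n = len(members)
    if n == 0:
        raise ValueError("need at least one ensemble member")
    # Average distance between each member and what actually happened.
    miss = sum(abs(m - observed) for m in members) / n
    # Average distance between every ordered pair of members (ensemble spread).
    spread = sum(abs(a - b) for a, b in product(members, repeat=2)) / (n * n)
    return miss - 0.5 * spread


# Hypothetical wind-speed ensembles (knots), both centred on the observed 20 kt.
tight = ensemble_crps([19.0, 20.0, 21.0], 20.0)
wide = ensemble_crps([10.0, 20.0, 30.0], 20.0)
print(tight < wide)  # the tighter ensemble scores better (lower)
```

With a single member the score collapses to plain absolute error, which is why it's a natural yardstick for comparing ensemble forecasts with single-run ones.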
To satisfy some professional skepticism I tried to understand what data-sets they're using to test this stuff (data are never as clean, extensive and reliable as you want). They used ERA5, supplied by ECMWF (https://www.ecmwf.int/en/forecasts/dataset/ecmwf-reanalysis-v5). I haven't played with it, but it's a global dataset of historical estimates -- meaning that it's highly processed and interpolated, rather than raw.
That's a convenient data-set to start with for R&D, but it's not local validation and verification against data in some remote tropical island group -- that has yet to come so I'll stand by my earlier concerns.
Regardless though, they claim to offer a calculated 12h advantage over a four day cyclone warning. At 6kts, that's 72nm better if you knew right away, the wind and currents favoured you and the logistics worked out. You'd take it, but I think you'd still be losing tooth-enamel.
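The 72nm figure is just warning time multiplied by boat speed. A tiny sketch, assuming a constant speed made good and ignoring set, drift and sea state (the function name is made up for illustration):

```python
def extra_distance_nm(extra_warning_hours, boat_speed_kts):
    """Nautical miles covered in the extra warning time at a steady speed."""
    return extra_warning_hours * boat_speed_kts


print(extra_distance_nm(12, 6))  # 72
```

At a more pessimistic 4kts made good, the same 12 hours buys only 48nm.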
Having slept on it, I had two afterthoughts. This is the first.
When developing machine learning systems it's common to take a single data set and split it -- say, 80/20, although the proportions vary.
80% of the data set goes into training (the 'learning' part), and 20% of the data set goes into testing. You'll often use the same data set because testing on other data sets can introduce errors from different data production methods, which can harm the results.
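As a rough illustration of that 80/20 split, here is a minimal sketch in plain Python. The function name, the seed and the stand-in records are all hypothetical. For time-ordered data such as weather, a split by date is a common alternative to a random shuffle, and nothing here is taken from the paper.

```python
import random


def split_dataset(records, train_fraction=0.8, seed=0):
    """Shuffle a data set and cut it into (training, testing) portions."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(records)
    # A fixed seed makes the shuffle, and so the split, repeatable.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# 100 stand-in records: 80 go to training, 20 are held back for testing.
train, test = split_dataset(range(100))
print(len(train), len(test))
```

The important property is that the two portions never overlap, so the test score reflects records the model hasn't seen.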
I couldn't tell from a first read whether they did the training on ERA5, or used ERA5 only for testing, but I'll treat the ERA5 as a likely source for training too.
So if they've trained on ERA5 then that raises a concern for me because it's an interpolated data set, constructed by imagining that weather sensors are gridded around the planet, 0.25 degrees of lat and long apart, that they each record more than 80 surface and atmospheric variables, report them reliably every 12 hours and have been doing so reliably over decades.
Which is a meteorological fiction of course -- our weather sensors aren't distributed like that. They don't all report twice per day -- some are reporting every minute, and some might only report when you get it, and most don't know 80 facts about their weather -- they might know only three. And some might be decades old, and others might be years old. And many have known accuracy issues but are too remote to service frequently, so their data get marked up as maybe unreliable on some facts some of the time.
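To get a feel for the scale of that imagined grid, here is a back-of-envelope count of points at 0.25 degree spacing. It assumes a regular latitude/longitude grid that includes both poles and doesn't repeat the 0/360 meridian; the function name is hypothetical.

```python
def grid_points(spacing_deg):
    """Count (latitude rows, longitude columns, total points) on a regular grid."""
    lat_rows = round(180 / spacing_deg) + 1  # -90 to +90, both poles included
    lon_cols = round(360 / spacing_deg)      # 0 up to, but not including, 360
    return lat_rows, lon_cols, lat_rows * lon_cols


print(grid_points(0.25))  # (721, 1440, 1038240)
```

That's roughly a million imagined stations, which makes the contrast with the real, patchy sensor network fairly stark.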
So where's all this smooth, gridded ERA5 data coming from?
I'm pretty sure that it's coming retrospectively from the same empirical modelling that presently forecasts ECMWF weather.
So the slugline 'move over ECMWF' is misleading. For this machine learning system to function, if it's training on ERA5, it's not an alternative to ECMWF, but a potential value-add.
That's not a false claim found in the source paper, but does appear to be invented woo from the reporting media.
You should do this for a living. Feel free to write a debunking response to the article, which I think you already have, and I will happily publish it.
On further digging, it turns out that ECMWF is neck-deep in Machine Learning for forecasting too, and the research is galloping (https://www.ecmwf.int/en/about/media-centre/aifs-blog/2024/year-ml-weather-forecasting). They're not just developing multiple international ML forecast collaborations, but whole frameworks for training and evaluation, so it's a strategic commitment with what looks like the usual meteorological rigour, noting that lives and livelihoods depend daily on both accuracy and transparency about inaccuracy.
The DeepMind paper then is just one result of potentially many, presumably advanced promotionally, and The Convo's 'move over ECMWF' slugline misrepresents ECMWF's research activity as well as the importance of ECMWF's datasets in any DeepMind results.
From a sailing perspective though, I think the main questions are where all this might be heading, how usable it might become, and what to look out for. I need to do some more reading and mull that over.
I'm a community intellectual rather than a public intellectual, Peter, for the same reason that my household toilet can be used by friends and neighbours, but is not a public toilet.
But I strongly believe in science being accountable to community -- the same sort of ethos around the power of information that underpins what I'd call ethical journalism but which perhaps could be more properly called 'journalism'. It was your dedication to that principle that had me shell out for a subscription.
Thank you for the invitation. Let me have more of a read and a think and I'll see if I can produce something focused for your readers.
What does substack use for formatting -- is it markdown?
My second afterthought was that yesterday I was thinking like a publicly-funded scientist and for this report I shouldn't.
Publicly-funded scientists are interested in grant money and typically share everything -- data, methods, even code. Once in a while though, they'll patent something and pick a route to market through a spin-off company, venture capital, industry partnership and so on.
But these guys are on Alphabet's payroll. They're paid to find ways to make money from data. It's not like these researchers will pick their own route to market from public good. When mature (and I think it's not), this tech will pick a route to market that Alphabet already excels in.
So how does Alphabet do it? Generally, they do it by making an asymmetric trade -- with individual consumers, with industries, with governments if they can.
Something you want, for all the data you have, to be used for any Alphabet purpose at all, in perpetuity.
I don't want to moralise here, but that realisation gave me indigestion.
Weather-observation is the biggest, longest-running example of citizen science that I can think of, except maybe astronomy and cartography, and our species' ability to produce global weather forecasts today owes itself to that activity.
There's nothing wrong with industries producing commercial value-add to these species-owned data-sets.
But I think there's everything wrong with potentially making asymmetric trades on how to use them.