The Met Office, where I work, has just finalised an agreement to buy a new supercomputer. This isn’t that rare an event – you can’t do serious weather forecasting without a supercomputer and, just like everyday computers, they need replacing every few years as their technology advances. But this one’s a big-un, and the news reminded me of the importance of high-performance computing, even to observational projects like oldWeather.
To stand tall and proud in the world of supercomputing, you need an entry in the Top500: This is a list, in rank order, of the biggest and fastest computers in the world. These machines are frighteningly powerful and expensive, and a few of them have turned part of their power to using the oldWeather observations:
- Currently at number 34 in the world is Hopper: A Cray XE6 at the US National Energy Research Scientific Computing Centre (NERSC). Hopper is the main computing engine for the current developments of the Twentieth Century Reanalysis (20CR).
- At numbers 60 and 61 in the list are the pair of IBM Power775s (1,2) which used to support the European Centre for Medium-Range Weather Forecasts (ECMWF). Operational centres, like ECMWF, tend to buy supercomputers in pairs so they can keep working even if one system needs repair or maintenance – we have to issue weather forecasts every day, and we can’t just stop for a while to fix the computer. These two machines were used to produce ERA-20C.
Two other machines have not used our observations yet (except for occasional tests), but are gearing up to do so in the near future:
- At number 18 in the world is Edison: NERSC’s latest supercomputer, a Cray XC30.
- At number 64 is Gaea C2 – the US National Oceanic and Atmospheric Administration (NOAA)’s supercomputer at Oak Ridge.
My personal favourite, though, is none of these: Carver is not one of the really big boys. An IBM iDataPlex with only 9,984 processor cores, it ranked at 322 in the list when it was new, in 2010, and has since fallen off the Top500 altogether; overtaken by newer and bigger machines. It still has the processing power of something like 5000 modern PCs though, and shares in NERSC’s excellent technical infrastructure and expert staff. I use Carver to analyse the millions of weather observations and terabytes of weather reconstructions we are generating – almost all of the videos that regularly appear here were created on it.
The collective power of these systems is awe-inspiring. One of the most exciting aspects of working on weather and climate is that we can work (through collaborators) right at the forefront of technical and scientific capability.
But although we need these leading-edge systems to reconstruct past weather, they are helpless without the observations we provide. All these computers together could not read a single logbook page, let alone interpret the contents; the singularity is not that close; we’re still, fundamentally, a people project.
Today is the fourth birthday of oldWeather, and it’s almost two years since we started work on the Arctic voyages. So it’s a good time to illustrate some more of what we’ve achieved:
I’m looking at the moment at the Arctic ships we’ve finished: Bear, Corwin, Jeannette, Manning, Rush, Rodgers, Unalga II, and Yukon have each had all of their logbook pages read by three people; so it’s time to add their records to the global climate databases and start using them in weather reconstructions. From them we have recovered 43 ship-years of hourly observations – more than 125,000 observations concentrating on the marginal sea-ice zones in Baffin Bay and the Bering Strait – an enormous addition to our observational records.
The video above shows the movements of this fleet (compressed into a single year). They may occasionally choose to winter in San Pedro or Honolulu, but every summer they are back up against the ice – making observations exactly where we want them most.
So in our last two years of work, we’ve completed the recovery of 43 ship-years of logbooks, and actually we’ve done much more than that: The eight completed ships shown here make up only about 25% of the 1.5 million transcriptions we’ve done so far. So this group is only a taster – there’s three times as much material again already in the pipeline.
Sometimes there is just no word powerful enough to describe the achievements of oldWeather.
Back in March we reached a million, and since then we’ve powered on from that milestone, now having added an additional five hundred thousand observations to our tally. That’s two new observations every minute, night and day, 7 days a week: Come rain or shine; snow or sleet; ice, fire, or fog.
As I’ve mentioned previously, last Thursday I was warm-up man for Charles Darwin and Robert Fitzroy (finally, a job truly worthy of oldWeather) – I was giving a talk about the project at the Progress Theatre in Reading.
HMS Beagle isn’t (yet) one of our ships, as the observations from her 1831-6 circumnavigation had been rescued before oldWeather started; but I could use what I’ve learned from analysing the oldWeather observations to show the route of the ship, the weather they experienced, and the effect of their observations on our reanalyses for the period.
The answer, as we know, is 42 – but does that mean that it’s exactly 42; or somewhere between 41.5 and 42.5; or is 42 just a ball-park estimate, and the answer could actually be, say, 37?
The value of science is its power to generate new knowledge about the world, but a key part of the scientific approach is that we care almost as much about estimating the accuracy of our new knowledge as about the new knowledge itself. This is certainly my own experience: I must have spent more time calculating how wrong I could be – estimating uncertainty ranges on my results – than on anything else.
One reason I like working with the 20th Century Reanalysis (20CR) is that it comes with uncertainty ranges for all of its results. It achieves this by being an ensemble analysis – everything is calculated 56 times, and the mean of the 56 estimates is the best estimate of the answer, while their standard deviation provides an uncertainty range. This uncertainty range is the basis for our calculation of the ‘fog of ignorance’.
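As a minimal sketch of that ensemble idea: the mean of the members gives the best estimate and their standard deviation gives the uncertainty range. The function name `ensemble_summary`, the synthetic pressure values, and the choice of the sample (n-1) standard deviation are all assumptions made for illustration; this is not the 20CR code itself.

```python
import random
import statistics


def ensemble_summary(members):
    """Return (best estimate, uncertainty) for one variable at one place and time.

    Hypothetical helper: best estimate = ensemble mean,
    uncertainty = sample standard deviation of the members.
    """
    best = statistics.fmean(members)
    spread = statistics.stdev(members)
    return best, spread


# Synthetic 56-member ensemble of sea-level pressure in hPa (made-up values).
rng = random.Random(0)
ensemble = [rng.gauss(1010.0, 2.5) for _ in range(56)]

best, spread = ensemble_summary(ensemble)
print(f"best estimate: {best:.1f} hPa, uncertainty: +/- {spread:.1f} hPa")
```

In a real reanalysis the same calculation would be repeated for every grid point, variable, and time step.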
We are testing the effects of the new oldWeather observations on 20CR – by doing parallel experiments reconstructing the weather with and without the new observations. We have definitely produced a substantial improvement, but to say exactly how much of an improvement, where, and when, requires careful attention to the uncertainty in the reconstructions. In principle it’s not that hard: if the uncertainty in the reanalysis including the oldWeather observations is less than the uncertainty without the new observations, then we’ve produced an improvement (there are other possible improvements too, but let’s keep it simple). So I calculated this, and it looked good. But further checks turned up a catch: we don’t know the uncertainty in either case precisely, we only have an estimate of it, so any improvement might not be real – it might be an artefact of the limitations of our uncertainty estimates.
To resolve this I have entered the murky world of uncertainty uncertainty. If I can calculate the uncertainty in the uncertainty range of each reanalysis, I can find times and places where the decrease in uncertainty between the analysis without and with the oldWeather observations is greater than any likely spurious decrease from the uncertainty in the uncertainty. (Still with me? Excellent). These are the times and places where oldWeather has definitely made things better. In principle this calculation is straightforward – I just have to increase the size of the reanalysis ensemble: so instead of doing 56 global weather simulations we do around 5600; I could then estimate the effect of being restricted to only 56. However, running a global weather simulation uses quite a bit of supercomputer time; running 56 of them requires a LOT of supercomputer time; and running 5600 of them is – well, it’s not going to happen.
So I need to do something cleverer. But as usual I’m not the first person to hit this sort of problem, so I don’t have to be clever myself – I can take advantage of a well-established general method for faking large samples when you only have small ones – a tool with the splendid name of the bootstrap. This means estimating the 5600 simulations I need by repeatedly sub-sampling from the 56 simulations I’ve got. The results are in the video below:
By bootstrapping, we can estimate a decrease in uncertainty that a reanalysis not using the oldWeather observations is unlikely to reach just by chance (less than 2.5% chance). Where a reanalysis using the oldWeather observations has a decrease in uncertainty that’s bigger than this, it’s likely that the new observations caused the improvement. The yellow highlight in this video marks times and places where this happens. We can see that the regions of improvement show a strong tendency to cluster around the new oldWeather observations (shown as yellow dots) – this is what we expect and supports the conclusion that these are mostly real improvements.
It’s also possible, though unlikely, that adding new observations can make the reanalysis worse (increase in estimated uncertainty). The bootstrap also gives an increase in uncertainty that a reanalysis not using the oldWeather observations is unlikely to reach just by chance (less than 2.5% probable) – the red highlight marks times and places where the reanalysis including the observations has an increase in uncertainty that’s bigger than this. There is much less red than yellow, and the red regions are not usually close to new observations, so I think they are spurious results – places where this particular reanalysis is worse by chance, rather than systematically made worse by the new observations.
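The two thresholds described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the exact procedure behind the video: it resamples the 56 members with replacement (the standard bootstrap), uses plain standard deviation as the uncertainty measure, and works on a single made-up grid point. The names `bootstrap_thresholds` and `classify` are hypothetical.

```python
import random
import statistics


def bootstrap_thresholds(members, n_boot=2000, seed=0):
    """Estimate how far the ensemble spread can move by chance alone.

    Resamples the members with replacement n_boot times and returns the
    2.5% and 97.5% quantiles of (resampled spread - original spread).
    """
    rng = random.Random(seed)
    base = statistics.stdev(members)
    changes = sorted(
        statistics.stdev(rng.choices(members, k=len(members))) - base
        for _ in range(n_boot)
    )
    lower = changes[int(0.025 * n_boot)]      # chance decrease threshold
    upper = changes[int(0.975 * n_boot) - 1]  # chance increase threshold
    return lower, upper


def classify(spread_without, spread_with, lower, upper):
    """Label one grid point: yellow = improved, red = worse."""
    change = spread_with - spread_without
    if change < lower:
        return "improved"
    if change > upper:
        return "worse"
    return "no clear change"


# Made-up 56-member ensemble from a run without the new observations.
rng = random.Random(1)
without_obs = [rng.gauss(1010.0, 2.5) for _ in range(56)]
base_spread = statistics.stdev(without_obs)
lower, upper = bootstrap_thresholds(without_obs)

# A run with the new observations whose spread has halved (invented number).
print(classify(base_spread, 0.5 * base_spread, lower, upper))
```

Because the thresholds come from a finite number of resamples of a finite ensemble, they are themselves only estimates.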
This analysis meets its aim of identifying, formally, when and where all our work transcribing new observations has produced improvements in our weather reconstructions. But it is still contaminated with random effects: We’d expect to get spurious red and yellow regions each 2.5% of the time anyway (because that’s the threshold we chose), but there is a second problem: The bootstrapped 2.5% thresholds in uncertainty uncertainty are only estimates – they have uncertainty of their own, and where the thresholds are too low we will get too much highlighting (both yellow and red). To quantify and understand this we need to venture into the even murkier world of uncertainty uncertainty uncer… .
No – that way madness lies. I’m stopping here.
OK, as you’re in the 0.1% of people who’ve read all the way to the bottom of this post, there is one more wrinkle I feel I must share with you: The quality metric I use for assessing the improvement caused by adding the oW observations isn’t simply the reanalysis uncertainty, it’s the Kullback–Leibler divergence of the climatological PDF from the reanalysis PDF. So for ‘uncertainty uncertainty’ above read ‘Kullback–Leibler divergence uncertainty’. I’d have mentioned this earlier, except that it would have made an already complex post utterly impenetrable, and methodologically it makes no difference, as one great virtue of the bootstrap is that it works for any metric.
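For readers who want to see what such a metric might look like, here is a minimal sketch of the Kullback–Leibler divergence between two normal distributions. Treating both the climatological and the reanalysis PDFs as Gaussian is an assumption made purely for illustration, as are the function name `kl_gaussian` and the example numbers; the post does not say how the PDFs are represented, and which distribution plays the role of P or Q here is also a guess.

```python
import math


def kl_gaussian(mu_p, sd_p, mu_q, sd_q):
    """D_KL(P || Q) for P = N(mu_p, sd_p^2) and Q = N(mu_q, sd_q^2).

    Closed form: ln(sd_q / sd_p) + (sd_p^2 + (mu_p - mu_q)^2) / (2 sd_q^2) - 1/2
    """
    return (
        math.log(sd_q / sd_p)
        + (sd_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sd_q ** 2)
        - 0.5
    )


# Made-up example: a sharp reanalysis PDF against a broad climatological one.
reanalysis = (1004.0, 1.5)   # mean and standard deviation in hPa (invented)
climatology = (1010.0, 8.0)  # mean and standard deviation in hPa (invented)
print(kl_gaussian(*reanalysis, *climatology))
```

The divergence is zero when the two PDFs are identical and grows as the reanalysis becomes sharper than, or shifts away from, the climatology. Swapping a function like this in for the standard deviation leaves the bootstrap procedure unchanged, which is the point made above.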
Imagine you have a free hour, one wet weekend, so you settle down to a little light reading:
- You might begin with Marcel Proust’s classic À la recherche du temps perdu – with 3,031 pages it would keep you occupied for a while.
- When you’d finished that you could try Samuel Richardson’s Clarissa – a mere 1,534 pages,
- And how about Luo Guanzhong’s Romance of the Three Kingdoms – 2,340 pages,
- Leo Tolstoy’s War and Peace – 1,440 pages,
- and finally, relax your mind with David Foster Wallace’s Infinite Jest – 1,104 pages.
That’s 8,000 pages. If you read them all again the following weekend (to catch the subtleties that escaped you the first time), and then again, and again, and again, and again, and again; you’d still be 6,000 pages short of matching the work we’ve done reading the logbooks of USS Bear.
So congratulations to lollia paolina, gastcra, Hanibal94, DennisO, jil, pommystuart, LarryW, smith7748, tastiger, and every one of the 402 other crew members – on an achievement of epic proportions: From the 20,930 pages of the Bear’s logs (each read 3 times, remember), we’ve recorded 349,015 weather observations, each with several components (wind speed, barometer, air temperature, etc.) making more than 2.9 million data points.
And of course it’s not just weather, those logs also provided 22,957 dates, 6,427 longitudes, 2,947 people, 189 animals, 19,489 places, and more … 8,872,438 characters in all.
As before, I’ve used the transcriptions to make a movie version. But the sheer size of the achievement causes problems even here: I thought that a maximum movie length of 10,000 seconds was way more than enough, but not for the Bear. So while I sort that out, here’s just the first installment: 1884-1890.
If you look at weatherdetective.net.au you might get a feeling of déjà vu – a sense that you’ve seen something similar before.
oldWeather is not yet four, a bit young to be having children. But that’s four internet years, so maybe it’s time: We’ve contributed DNA to plenty of other projects, but Weather Detective is our first direct descendant.
As with all children it’s a separate person – with its own science team, volunteer community, logbooks, and interface. They are friends, but they will be doing things differently from us (and we’re not too old to learn from their approach).
Noted naval exploring captain; surveyor and hydrographer; Vice-Admiral; pioneering weather forecaster, founder of the Met Office (first Meteorological Statist to the Board of Trade); governor of New Zealand. Robert Fitzroy was a man of parts, who made a great impact on the world.
But he sits in the shadow of Charles Darwin, who accompanied him on his 1831-6 circumnavigation in HMS Beagle. This September (8-13) the Progress Theatre in Reading, jointly with WAM, the Festival of Weather, Arts and Music, is staging Juliet Aykroyd’s play ‘Darwin and Fitzroy’. Each day, the performance is supported by a side event celebrating Fitzroy’s life and influence, and on Thursday September 11th that side event is me, talking about oldWeather. (We have not put the Beagle on oldWeather (yet), but I’ve got her weather observations from the 1831-6 circumnavigation.)
Why not come along? Tickets are £10 (£8 concessions) – it’s an amateur performance, so that’s just to cover the theatre costs. There is a different side event on each day, but if you choose to come on Thursday, I’ll be happy to see you.
Naval-History.net has been a core partner since the very start of oldWeather: That’s where we publish all the historical events we find in the logbooks – these are being carefully edited into ship histories (UK and US) by a team of volunteers. Naval-History.net is also run by a group of dedicated (and expert) amateurs, and adding the vast quantity of new material we are transcribing is a lot of work. Some of our volunteer editors have taken on the additional responsibility for this, and Gordon (who runs Naval-History.net) has added them to his crew:
- Navigator Maikel prepares and publishes completed RN and US ship histories. In addition, he has published a large backlog of RN ship histories that were waiting for the addition of corrected navigation data and Journey Plotter maps (Maikel built the Journey Plotter program in his spare time), and published new US logs ready for editing. Not only that; he has created individual pages for the ships in the overhauled US index, and made many behind the scenes improvements, such as redirection pages and improvements to the RN index page.
- Paymaster Janet sends out transcribed logs for editing and receives the completed edits in return. She checks them for style and obvious errors before making them available for Maikel to publish on the Naval-History website. She also keeps track of the status of all available logs – waiting to be edited, being edited, complete and published, or awaiting publication or updates – and keeps a record of which logs have been reserved by which editors. Janet has also checked the new US ship files for errors and supplied missing service histories.
- Leading Stoker Howard does a lot of background work. He converts oldWeather’s plain text transcription files into the formatted Word files that Janet sends to the editors. He also creates many Journey Plotter maps for those editors who request his help. He corrects or adds latitudes and longitudes as necessary so that Journey Plotter can produce maps – both of the ship’s full voyage and of more detailed sub-sections. Leading Stoker is his choice of rank; he keeps the fires burning.
- Leading Telegraphist/Writer Caro uses Naval-History’s accounts to advertise new and updated ship histories on social media, getting the word out on Facebook, Twitter and Google+. She also rewrites important sections of text for the Naval-History site and supplies artwork on occasion. She created the new logo for Naval-History.net that is starting to appear on the site and in social media.