We highlight improvements to the International Comprehensive Ocean-Atmosphere Data Set (ICOADS) in the latest Release 3.0 (R3.0; covering 1662–2014). ICOADS is the most widely used freely available collection of surface marine observations, providing data for the construction of gridded analyses of sea surface temperature, estimates of air–sea interaction and other meteorological variables. ICOADS observations are assimilated into all major atmospheric, oceanic and coupled reanalyses, further widening its impact. R3.0 therefore includes changes designed to enable effective exchange of information describing data quality between ICOADS, reanalysis centres, data set developers, scientists and the public. These user-driven innovations include the assignment of a unique identifier (UID) to each marine report – to enable tracing of observations, linking with reports and improved data sharing. Other revisions and extensions of the ICOADS’ International Maritime Meteorological Archive common data format incorporate new near-surface oceanographic data elements and cloud parameters. Many new input data sources have been assembled, and updates and improvements to existing data sources, or removal of erroneous data, made. Coupled with enhanced ‘preliminary’ monthly data and product extensions past 2014, R3.0 provides improved support of climate assessment and monitoring, reanalyses and near-real-time applications.
Sounds exciting, doesn’t it? Well, it’s even more exciting than it sounds, because that’s the abstract of Freeman, E., Woodruff, S. D., Worley, S. J., Lubker, S. J., Kent, E. C., Angel, W. E., Berry, D. I., Brohan, P., Eastman, R., Gates, L., Gloeden, W., Ji, Z., Lawrimore, J., Rayner, N. A., Rosenhagen, G. and Smith, S. R. (2016), ICOADS Release 3.0: a major update to the historical marine climate record. Int. J. Climatol., doi:10.1002/joc.4775, which has just hit the scientific literature, and which describes the latest version of the International Comprehensive Ocean-Atmosphere Data Set, the collection of marine weather data.
Everything about oldWeather has been free from the start: our ambition has always been to make the information in our logbooks openly available for anyone to use, and our results have already seen use in datasets, reanalyses, historical and personal projects, … But whenever anyone asks me “What are you doing with the results of this project?”, I’ve always answered “We’re going to put the new observations in ICOADS – to make them available for all future uses, in climate and other fields”. With ICOADS R3.0, we have finally achieved this: ICOADS is internally arranged in ‘decks’ (a reminder that data collection is older than the digital computer) – it now includes decks 249 “World War I (WW1) UK Royal Navy Logbooks” and 710 “US Arctic Logbooks” – clearly illustrated in figure 1.
I’ve been working as a scientist for a while now, but the publication of a new paper is still something of an event. Scientific papers come in many forms: some describe wild new ideas, brave experiments, or dramatic breakthroughs – this one is nothing like that; it just reports the work of many people, over several years, scavenging observations from wherever we can find them, systematising them, quality controlling them, analysing them, and now releasing them. It makes up for its lack of drama by being useful – the surface marine record is one of the most widely used datasets in all of climate: our observations are used directly in the monitoring datasets that measure climate change, they are assimilated in all reanalyses, they provide boundary conditions and validation for the models we use for predicting future change, and they provide calibration to palaeoclimate reconstructions of the deep past.
So from now on, if you’ve contributed to oldWeather, keep an eye out for any new climate results. Whether it’s a new global temperature record, a prediction of climate for the next generation, a study of changes in flood or drought risk, a government report on climate impacts and adaptation, … anything really. When you see it, square your shoulders and stand a little taller – that result owes something to you.
Kelp is, perhaps, more important than you might guess: Not only does it thicken your toothpaste, it supports whole marine ecosystems where it grows. It is important enough to have a Zooniverse project devoted to watching it grow, through Landsat images.
Satellite imagery is a great way to monitor the world – providing frequent, comprehensive pictures of the whole planet. But in-situ observations also have their place: people on the ground, interacting directly with the system being monitored, can often provide a detail and precision that the satellite records lack.
One of the unexpected joys of oldWeather is that it provides in-situ observations of a vast range of different things. Most often kelp is mentioned in the logs simply as a highlight of a day at sea:
Fine weather. Light breeze from South. At 2.30 took in and furled the sails. Passed a piece of kelp. [Yorktown, May 1892.]
4 to 8 a.m. Overcast but pleasant. Airs from NE. Passed some kelp. [Same, a day later.]
Sometimes we see the kelp interacting with the environment:
Saw a large patch of kelp with a dozen seals hauled out on it. [Rush, June 1891]
Sighted a whale and a bunch of kelp. [Yorktown, May 1892]
At daylight passed much drift kelp, to one large batch a boulder about 3 ft in diameter was attached [Patterson February 1885]
Occasionally they do seem to be actively surveying it:
Steaming along at various speeds, locating outer limit of kelp beds off La Jolla, fog gradually increasing, log hauled in. [Pioneer, spring 1923].
Continued sounding passing inside of Aleks Rock. No signs of kelp were seen. [Patterson].
But the most interesting mentions feature it as a hazard to navigation. I suspect most of our log-keepers would see definite benefits in any decline of kelp:
Slowed down a few minutes on account of kelp. [Concord, August 1901]
4:15 Kelp ahead, full speed astern … Ran about 1/2 mile SWxW and ran into kelp again. Wreck bore E 1/2 N. Stopped and backed away from it [Patterson].
found four masted schooner “Watson A. West” in the kelp on the outer edge of the shoal, broadside to the beach, close in and in dangerous position [Unalga, October 1916].
Between six and seven o’clock, patent log registered only 3.9 knots: hauled in rotator and found it fouled with kelp; cleared it, and allowed 2.6 knots for the discrepancy. [Commodore Perry, July 1896].
Found spar buoy #16, two hundred yards NE of true position and in kelp. [Commodore Perry, February 1903].
At 10.36 sighted what appeared to be a pinnacle rock. Stopped ship lowered boat and after inspection the object proved to be a much worn spar, heel up, with kelp attached. [Yorktown, June 1894].
We don’t have that many observations of kelp – we probably won’t be much help to the Floating Forests team mapping the distribution, but we do have our own viewpoints to add – aspects that the satellites will never see.
oldWeather does not produce that many research papers: The process of science is changing fast at the moment – moving away from individual researchers and small groups working independently, and towards much larger consortia working together with big datasets on major problems – fewer small papers, more big science. As a major data provider, closely linked into the international climate research community, we fit nicely into this new model.
But the academic papers industry is still out there, and now and again one appears that involves us directly. One of the major datasets we contribute to is the International Surface Pressure Databank, and Tom Cram and colleagues have a new paper about the data (including our contributions) and how it’s assembled and distributed through the excellent Research Data Archive at NCAR. It’s open access – free for all to read (here’s the link) – why not check out the oldWeather references and see exactly how our results are being used by researchers?
On April 15th (2015) the (UK) National Maritime Museum is hosting a special one-day seminar organised by the (UK) Royal Meteorological Society. The meeting is in honour of the remarkable Matthew Fontaine Maury (1806-1873), who established the value of marine weather observations for scientific research.
The meeting covers everything from using the very earliest records to make circulation indices, to modern satellite observations. The speakers include several members of the oldWeather science team, and one of the talks is about the leading current method of recovering marine observations: I’m talking about oldWeather at 14:40. (Full agenda).
It’s an open meeting – all are welcome.
The Met Office, where I work, has just finalised an agreement to buy a new supercomputer. This isn’t that rare an event – you can’t do serious weather forecasting without a supercomputer and, just like everyday computers, they need replacing every few years as their technology advances. But this one’s a big-un, and the news reminded me of the importance of high-performance computing, even to observational projects like oldWeather.
To stand tall and proud in the world of supercomputing, you need an entry in the Top500: This is a list, in rank order, of the biggest and fastest computers in the world. These machines are frighteningly powerful and expensive, and a few of them have turned part of their power to using the oldWeather observations:
- Currently at number 34 in the world is Hopper: A Cray XE6 at the US National Energy Research Scientific Computing Centre (NERSC). Hopper is the main computing engine for the current developments of the Twentieth Century Reanalysis (20CR).
- At numbers 60 and 61 in the list are the pair of IBM Power775s (1,2) which used to support the European Centre for Medium-Range Weather Forecasts (ECMWF). Operational centres, like ECMWF, tend to buy supercomputers in pairs so they can keep working even if one system needs repair or maintenance – we have to issue weather forecasts every day, we can’t just stop for a while while we fix the computer. These two machines were used to produce ERA-20C.
Two other machines have not used our observations yet (except for occasional tests), but are gearing up to do so in the near future:
- At number 18 in the world is Edison: NERSC’s latest supercomputer, a Cray XC30.
- At number 64 is Gaea C2 – the US National Oceanic and Atmospheric Administration (NOAA)’s supercomputer at Oak Ridge.
My personal favourite, though, is none of these: Carver is not one of the really big boys. An IBM iDataPlex with only 9,984 processor cores, it ranked at 322 in the list when it was new, in 2010, and has since fallen off the Top500 altogether; overtaken by newer and bigger machines. It still has the processing power of something like 5000 modern PCs though, and shares in NERSC’s excellent technical infrastructure and expert staff. I use Carver to analyse the millions of weather observations and terabytes of weather reconstructions we are generating – almost all of the videos that regularly appear here were created on it.
The collective power of these systems is awe-inspiring. One of the most exciting aspects of working on weather and climate is that we can work (through collaborators) right at the forefront of technical and scientific capability.
But although we need these leading-edge systems to reconstruct past weather, they are helpless without the observations we provide. All these computers together could not read a single logbook page, let alone interpret the contents; the singularity is not that close; we’re still, fundamentally, a people project.
Today is the fourth birthday of oldWeather, and it’s almost two years since we started work on the Arctic voyages. So it’s a good time to illustrate some more of what we’ve achieved:
I’m looking at the moment at the Arctic ships we’ve finished: Bear, Corwin, Jeannette, Manning, Rush, Rodgers, Unalga II, and Yukon have each had all of their logbook pages read by three people; so it’s time to add their records to the global climate databases and start using them in weather reconstructions. From them we have recovered 43 ship-years of hourly observations – more than 125,000 observations concentrating on the marginal sea-ice zones in Baffin Bay and the Bering Strait – an enormous addition to our observational records.
The video above shows the movements of this fleet (compressed into a single year). They may occasionally choose to winter in San Pedro or Honolulu, but every summer they are back up against the ice – making observations exactly where we want them most.
So in our last two years of work, we’ve completed the recovery of 43 ship-years of logbooks, and actually we’ve done much more than that: the eight completed ships shown here make up only about 25% of the 1.5 million transcriptions we’ve done so far. So this group is only a taster – there’s three times as much material again already in the pipeline.
The answer, as we know, is 42 – but does that mean that it’s exactly 42; or somewhere between 41.5 and 42.5; or is 42 just a ball-park estimate, and the answer could actually be, say, 37?
The value of science is its power to generate new knowledge about the world, but a key part of the scientific approach is that we care almost as much about estimating the accuracy of our new knowledge as about the new knowledge itself. This is certainly my own experience: I must have spent more time calculating how wrong I could be – estimating uncertainty ranges on my results – than on anything else.
One reason I like working with the 20th Century Reanalysis (20CR) is that it comes with uncertainty ranges for all of its results. It achieves this by being an ensemble analysis – everything is calculated 56 times, and the mean of the 56 estimates is the best estimate of the answer, while their standard deviation provides an uncertainty range. This uncertainty range is the basis for our calculation of the ‘fog of ignorance’.
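To make the ensemble idea concrete, here is a minimal sketch of how a best estimate and an uncertainty range could be derived from 56 ensemble members at one place and time. The pressure values are made up for illustration, and the function name `ensemble_summary` is hypothetical; it is not part of 20CR or its tooling.

```python
import random
import statistics


def ensemble_summary(members):
    """Return (best estimate, uncertainty) for one variable at one place and time.

    The best estimate is the ensemble mean; the uncertainty is the
    sample standard deviation of the members.
    """
    best_estimate = statistics.fmean(members)
    spread = statistics.stdev(members)
    return best_estimate, spread


# Hypothetical data: 56 invented sea-level pressure values (hPa),
# standing in for the 56 members of the reanalysis ensemble.
rng = random.Random(42)
pressures = [rng.gauss(1012.0, 2.5) for _ in range(56)]

mean, spread = ensemble_summary(pressures)
print(f"best estimate {mean:.1f} hPa, uncertainty +/- {spread:.1f} hPa")
```

A real reanalysis would apply the same calculation at every grid point and time step; the sketch handles just one.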
We are testing the effects of the new oldWeather observations on 20CR – by doing parallel experiments reconstructing the weather with and without the new observations. We have definitely produced a substantial improvement, but to say exactly how much of an improvement, where, and when, requires careful attention to the uncertainty in the reconstructions. In principle it’s not that hard: if the uncertainty in the reanalysis including the oldWeather observations is less than the uncertainty without the new observations, then we’ve produced an improvement (there are other possible improvements too, but let’s keep it simple). So I calculated this, and it looked good. But further checks turned up a catch: we don’t know the uncertainty in either case precisely, we only have an estimate of it, so any improvement might not be real – it might be an artefact of the limitations of our uncertainty estimates.
To resolve this I have entered the murky world of uncertainty uncertainty. If I can calculate the uncertainty in the uncertainty range of each reanalysis, I can find times and places where the decrease in uncertainty between the analysis without and with the oldWeather observations is greater than any likely spurious decrease from the uncertainty in the uncertainty. (Still with me? Excellent). These are the times and places where oldWeather has definitely made things better. In principle this calculation is straightforward – I just have to increase the size of the reanalysis ensemble: so instead of doing 56 global weather simulations we do around 5600; I could then estimate the effect of being restricted to only 56. However, running a global weather simulation uses quite a bit of supercomputer time; running 56 of them requires a LOT of supercomputer time; and running 5600 of them is – well, it’s not going to happen.
So I need to do something cleverer. But as usual I’m not the first person to hit this sort of problem, so I don’t have to be clever myself – I can take advantage of a well-established general method for faking large samples when you only have small ones – a tool with the splendid name of the bootstrap. This means estimating the 5600 simulations I need by repeatedly sub-sampling from the 56 simulations I’ve got. The results are in the video below:
By bootstrapping, we can estimate a decrease in uncertainty that a reanalysis not using the oldWeather observations is unlikely to reach just by chance (less than 2.5% chance). Where a reanalysis using the oldWeather observations has a decrease in uncertainty that’s bigger than this, it’s likely that the new observations caused the improvement. The yellow highlight in this video marks times and places where this happens. We can see that the regions of improvement show a strong tendency to cluster around the new oldWeather observations (shown as yellow dots) – this is what we expect and supports the conclusion that these are mostly real improvements.
It’s also possible, though unlikely, that adding new observations can make the reanalysis worse (increase in estimated uncertainty). The bootstrap also gives an increase in uncertainty that a reanalysis not using the oldWeather observations is unlikely to reach just by chance (less than 2.5% probable) – the red highlight marks times and places where the reanalysis including the observations has an increase in uncertainty that’s bigger than this. There is much less red than yellow, and the red regions are not usually close to new observations, so I think they are spurious results – places where this particular reanalysis is worse by chance, rather than systematically made worse by the new observations.
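The yellow and red thresholds described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code behind the video: it uses the ensemble standard deviation as the uncertainty metric, resamples the 56 members with replacement, and works on invented numbers for a single place and time. The names `bootstrap_spread_thresholds` and `classify` are hypothetical.

```python
import random
import statistics


def bootstrap_spread_thresholds(control, n_resamples=2000, tail=0.025, seed=0):
    """Estimate how far the ensemble spread can move by chance alone.

    Repeatedly resamples the control ensemble with replacement,
    recomputes the standard deviation each time, and returns the
    lower and upper `tail` quantiles of those resampled spreads.
    """
    rng = random.Random(seed)
    n = len(control)
    spreads = sorted(
        statistics.stdev(rng.choices(control, k=n)) for _ in range(n_resamples)
    )
    lo_index = round(tail * n_resamples)
    hi_index = n_resamples - 1 - lo_index
    return spreads[lo_index], spreads[hi_index]


def classify(control, with_obs, **kwargs):
    """Compare the spread of the with-observations ensemble to the thresholds."""
    lower, upper = bootstrap_spread_thresholds(control, **kwargs)
    spread = statistics.stdev(with_obs)
    if spread < lower:
        return "improved"          # would be highlighted yellow
    if spread > upper:
        return "worse"             # would be highlighted red
    return "no clear change"       # no highlight


# Invented ensembles: the control is broad, the with-observations one is narrow.
rng = random.Random(1)
control = [rng.gauss(0.0, 2.0) for _ in range(56)]
with_obs = [rng.gauss(0.0, 0.5) for _ in range(56)]
print(classify(control, with_obs))
```

Because the resampling step only ever calls one function on each resample, `statistics.stdev` could be swapped for any other quality metric without changing the rest of the sketch.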
This analysis meets its aim of identifying, formally, when and where all our work transcribing new observations has produced improvements in our weather reconstructions. But it is still contaminated with random effects: We’d expect to get spurious red and yellow regions each 2.5% of the time anyway (because that’s the threshold we chose), but there is a second problem: The bootstrapped 2.5% thresholds in uncertainty uncertainty are only estimates – they have uncertainty of their own, and where the thresholds are too low we will get too much highlighting (both yellow and red). To quantify and understand this we need to venture into the even murkier world of uncertainty uncertainty uncer… .
No – that way madness lies. I’m stopping here.
OK, as you’re in the 0.1% of people who’ve read all the way to the bottom of this post, there is one more wrinkle I feel I must share with you: The quality metric I use for assessing the improvement caused by adding the oW observations isn’t simply the reanalysis uncertainty, it’s the Kullback–Leibler divergence of the climatological PDF from the reanalysis PDF. So for ‘uncertainty uncertainty’ above read ‘Kullback–Leibler divergence uncertainty’. I’d have mentioned this earlier, except that it would have made an already complex post utterly impenetrable, and methodologically it makes no difference, as one great virtue of the bootstrap is that it works for any metric.
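For readers who want to see what such a metric looks like, here is a small sketch of the Kullback–Leibler divergence between two PDFs. It assumes, purely for illustration, that both the reanalysis and the climatology can be approximated by normal distributions, for which the divergence has a closed form. The post does not say how the PDFs are represented, and the direction of the divergence used in the example is only one reading of the sentence above; the numbers are invented.

```python
import math


def kl_gaussian(mean_p, sd_p, mean_q, sd_q):
    """KL(P || Q) in nats for two univariate normal distributions."""
    return (
        math.log(sd_q / sd_p)
        + (sd_p ** 2 + (mean_p - mean_q) ** 2) / (2 * sd_q ** 2)
        - 0.5
    )


# Invented numbers: a broad climatology and two reanalysis estimates (hPa).
climatology = (1012.0, 8.0)      # (mean, standard deviation)
reanalysis_vague = (1010.0, 2.0)
reanalysis_sharp = (1010.0, 1.0)

# The sharper reanalysis differs more from climatology, so it scores higher.
print(kl_gaussian(*reanalysis_vague, *climatology))
print(kl_gaussian(*reanalysis_sharp, *climatology))
```

Identical distributions give a divergence of zero, and the value grows as the reanalysis becomes more confident or moves further from the climatological mean.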
The UK is too small to have its own weather; we participate in the weather of the North Atlantic region, or indeed the whole world. So to understand why the last UK winter was record-breakingly wet, we need to look at atmospheric behaviour on a large scale. I’ve turned to MERRA – an hour-by-hour reconstruction of the whole atmosphere – and made the video of northern hemisphere weather above. (There’s a lot going on, I recommend watching it in full-screen, press the ‘X’ on the control bar).
The key feature is the sequence of storms that spin off North America, and then head out into the North Atlantic in the Prevailing Westerly Circulation (anti-clockwise in the projection of the video). In November these storms mostly follow the standard path northeast to Greenland or Iceland and Scandinavia, but in December the weather changes: North America becomes much colder and the path of the storms moves south, driving the bad weather straight at the UK. This persistent pattern of a cold North America and southerly Atlantic Storm Track is the outstanding feature of the winter, and it shows up even more clearly a bit higher in the atmosphere – the Upper-Level Winds have a simpler structure, as they are not complicated by contact with the Earth’s surface.
The temperature difference between cold polar air and warmer southerly air stirs up an overturning circulation, and the rotation of the Earth turns this into a strong anti-clockwise (westerly) rotating wind – the Polar Vortex. As early as 1939, Carl-Gustaf Rossby realised that this circulation would not be smooth and stable, and the characteristic undulations (Rossby waves) have a major impact on our weather. It’s a series of these waves that push cold polar air much further south than usual over eastern North America, producing a Very Cold Winter in those parts, shifting the storm tracks south and causing the wet, stormy weather in the UK.
But of course I’m not really interested in modern weather – that’s too easy, with ample satellite observations and tremendous tools like MERRA to show us what’s going on. The challenge is in providing the long-term context needed to understand these modern events – is there a consistent pattern and, if not, what’s changed? And it just happens that a previous Markedly Wet UK Winter occurred 99 years earlier, in 1914/5, and we’ve been rescuing logbook observations for that time so we can use them to make improved studies of that winter.
This time we use the Twentieth Century Reanalysis (more precisely a test version of 20CR updated to benefit from oldWeather-rescued observations). In some areas (most obviously the high Arctic) there are no observations so the analysis is too uncertain to be useful, but over the US, UK, and Atlantic storm-track region we can reconstruct the weather of that year.
Again, the picture is clearer if we look at the upper-level circulation:
Do we see the same picture in 1914/5 as in 2013/4? Reality tends to be somewhat messier than the simple explanations that scientists treasure – but I think we do see the same pattern: a persistent tendency for cold, polar air to extend south over North America, and a North Atlantic storm track shifted to the south.
We can say quite precisely what happened last winter, and (thanks, in part, to oldWeather) how last winter compared to previous Exceptional Winters. However, the obvious follow-on question is ‘Why did the polar vortex behave like that, and can we predict when it’s going to do it again?’ We’re still working on that one.
This week, atmospheric scientists are gathering in Queenstown, New Zealand, for the fifth general assembly of the SPARC program (Stratosphere-troposphere Processes And their Role in Climate). We’ve mentioned New Zealand before: both as a country whose isolation means that its historical weather is poorly documented, and as a Battlecruiser in the original oldWeather fleet. In September 1919 the two met: the battlecruiser visited the country, giving us an opportunity to make a major improvement in reconstructing the climate of the region.
As we showed back in October, we’re now re-doing our analysis of global weather, so we can see exactly how much the observations we’ve recovered from HMS New Zealand have improved our knowledge of the climate of New Zealand (the country). The figure above (made for the SPARC meeting) shows our estimates of the weather in each region visited by HMS New Zealand during her circumnavigation in 1919: blue for before oldWeather, and red a new revision using our observations. The width of the band indicates uncertainty – narrower is better – and the improvement we’ve made is very large.
One of the fun parts of working as a scientist is going to conferences, and in the geosciences, conferences don’t come much bigger than AGU. The American Geophysical Union’s 46th annual Fall Meeting ran last week in San Francisco, and it brought together more than 22,000 scientists for a week of presentations, discussions, celebrations, and beer.
Our man at AGU this year was Gil Compo, and he represented oldWeather at an important side event: The prize ceremony for the 2013 International Data Rescue Award in the Geosciences. We didn’t quite win this prize (the winner was the excellent Nimbus Data Rescue Project), but the judges liked us a lot, and we were awarded an honourable mention. So well done to all the oldWeather participants on a further well-deserved honour, and thanks to the award sponsors and organisers.
Every scientist’s must-have accessory, at any large conference or meeting, is a poster: This is a large sheet of paper (typically A0, or about 4′ by 3′) covered with artistically arranged images and results from your project, which you attach to a wall or display board, and use as a visual aid. Kevin made an excellent poster for us, combining images from all aspects of the project. You can see it on display in the background of the photo above, and if you’d like your own copy, it’s on our resources page.