Archive by Author | The Zooniverse

The weather in 1.85 characters.

My desk in the Met Office is some way from a window, but if I peer across the heads of a few colleagues I can see that the weather outside is, well, disappointing: a gloomy day, with the sky filled with mottled grey clouds from horizon to horizon (though at least it’s stopped raining). Here in the UK we’re famously obsessed with talking about the weather, but sailors would have no time for such waffle: following an example set by the famous Admiral Beaufort, they record the current weather in a terse code, and today’s weather in Exeter would be simply ‘o’ (overcast), or perhaps ‘oc’ (overcast cloudy) if they were feeling extravagant.

The weather code system has evolved quite a bit since Beaufort’s day, and it’s a powerful and concise way of recording notable weather events. The basic code records the amount of cloud in the sky, and ranges from ‘b’ (clear sky or mostly so), through ‘bc’ and ‘c’, to ‘o’ (overcast). These are by far the most common codes, but you can add to them to record many of the various nasties the atmosphere can inflict on you – there are codes for rain, snow, hail, gales, squalls, fog, etc.

This means that the longer the code recorded in a logbook, the worse the weather was (or at least the more exciting it was). The longest code I’ve found in the logs completed so far is ‘ocpqrlt’ (overcast, clouds, showers, squalls, rain, thunder and lightning) from HMS Bacchante, at Dakar at midnight on 31st August 1917. (Thanks to captain richbr15, lieutenant dazedandconfused, and the crew for patiently typing all that in). This sort of detail, however, is rarely necessary, and, on average, the logs only need 1.85 characters to record the current weather.
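As a rough illustration of how compact the code is, the sketch below splits a code into a cloud-cover term plus extra weather letters, and averages code lengths. It is a minimal sketch that assumes only the letters mentioned in this post (the full Beaufort notation has more) and assumes a code starts with at most one cloud-cover term; `decode` and `mean_code_length` are hypothetical helpers, not part of the OldWeather site.

```python
# Meanings for the letters explained in this post only; the full Beaufort
# notation defines more letters than this hypothetical subset.
CLOUD_COVER = {"bc": "broken cloud", "b": "clear sky", "c": "cloudy", "o": "overcast"}
EXTRA_WEATHER = {
    "c": "cloudy", "p": "showers", "q": "squalls", "r": "rain",
    "l": "lightning", "t": "thunder", "g": "gales", "s": "snow",
}


def decode(code):
    """Split a weather code into a leading cloud-cover term plus extra weather letters."""
    meanings = []
    rest = code
    # Try the two-letter 'bc' before single letters so it is read as one category.
    for prefix in sorted(CLOUD_COVER, key=len, reverse=True):
        if rest.startswith(prefix):
            meanings.append(CLOUD_COVER[prefix])
            rest = rest[len(prefix):]
            break
    # Every remaining letter is read on its own.
    for letter in rest:
        meanings.append(EXTRA_WEATHER.get(letter, f"unknown ({letter})"))
    return meanings


def mean_code_length(codes):
    """Average number of characters per recorded weather code."""
    return sum(len(code) for code in codes) / len(codes)


print(decode("ocpqrlt"))
# -> ['overcast', 'cloudy', 'showers', 'squalls', 'rain', 'lightning', 'thunder']
```

Applied to every code transcribed so far, `mean_code_length` is the kind of calculation behind a figure like 1.85 characters.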

I’m excited by the weather codes because they offer a new opportunity to test our climate models. In principle, if we know the surface pressure and temperature (also in the logs, of course) our models should tell us where it’s clear, where it’s cloudy, where it’s raining, and even about thunderstorms and squalls. In practice it’s not quite as easy as that, partly because our computers are not yet powerful enough to run atmosphere models at the fine resolution needed to resolve small features like thunderstorms and squalls; but even so, I look forward to learning more about the accuracy of our cloud and rainfall models. So please keep entering the weather codes – we need the ordinary records of cloud cover as well as the unusual events.

Since I started writing this the rain has come back, so I should modify my current weather report to ‘or’; but improvement is in sight – the forecast for this weekend is for ‘bc’ (broken cloud), maybe even ‘b’ (little or no cloud) at times. The designers of the weather codes were uninterested in particularly fine weather, so there’s no way of encoding ‘glorious sunshine’, for example (‘gs’ would be gales and snow). Still, I wish you all as much ‘b’ as you care for, except for a dose of ‘r’ (rain) for anybody praying for it.

Old weather at 50%

Hey guys

We are so excited that Old Weather has hit 50%. Thank you for all your help! To give you an idea of how much you are helping climate scientists, we thought we would make a plot of all your weather transcriptions so far. You might remember the visualization that Philip created a while ago showing the FOG OF IGNORANCE, our lack of knowledge about the history of the climate:

Reconstructed weather for March 8th 1918. Colours mark sea-level pressure (red high, blue low), black arrows give surface wind speed and direction. Foggy areas are where we can't say what's happening because we haven't (yet) got any observations.

Well, compare that to this map of all your weather classifications! Each point represents a new, valuable piece of information about the climate entered by the Old Weather community. We are really helping to fill in those gaps! Keep up the great work!

Better than the Defence.

One question I’m asked again and again by people encountering OldWeather for the first time is ‘How accurate are the transcriptions?’. We’ve known for a while that the answer is ‘very accurate’, but it’s always nice to be precise about such things, so just how accurate are we?

To find out, let’s look at HMS Defence, which we followed through much of 1914 and 1915, on a voyage from the Dardanelles, to Montevideo, to South Africa, and then back to the UK and patrol in the North Sea. The figure shows the air temperature and pressure recorded during this voyage.


Time series of air temperature (top) and pressure (bottom) transcribed from the logs of HMS Defence

We can see clearly in this image the date when they stopped cruising in tropical and sub-tropical oceans, and returned to the colder and stormier seas around Great Britain – around the beginning of 1915 the air temperature fell by around 30F and the pressure became much more variable. But looking closely at the image, we can also see some errors, both ours and those of the mariners writing the logs in the first place.

We can spot our own errors because each log page is transcribed by at least three people, and when those three people disagree, someone has made a mistake. The logs of the Defence yielded 1119 pressure observations (six a day for about 6 months). For 997 of those observations (89%) everyone who transcribed the observation agreed what it was; for 107 of the observations (10%) two or more of the transcribers agreed on a value, but one person disagreed; and for the remaining 15 observations (1.3%) the transcribers did not agree: there was no value with a clear majority of the inputs. (The values entered by individuals that did not agree with the majority are shown in the figure as small red points.)

From the first two categories we can estimate the transcription error rate: in 997 × 3 + 107 × 2 = 3205 cases the value entered is correct, and in 107 cases it is incorrect, so the error rate is 107/3312, about 3%. So transcriptions are about 97% accurate – in other words, about 97% of the time the value entered by an individual transcriber is the value that most people would agree is written in the logs – an excellent individual accuracy rate.
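The same bookkeeping, written out as a small function. This is a hypothetical helper that assumes three transcribers per page and, as above, leaves out the observations with no majority:

```python
def individual_error_rate(unanimous, two_to_one, transcribers=3):
    """Estimate how often a single transcription disagrees with the majority.

    unanimous:  observations on which every transcriber agreed
    two_to_one: observations with a clear majority and one dissenter
    Observations with no majority are left out.
    """
    correct = unanimous * transcribers + two_to_one * (transcribers - 1)
    wrong = two_to_one  # one dissenting entry per majority-with-dissent observation
    return wrong / (correct + wrong)


rate = individual_error_rate(997, 107)  # 107 / 3312 for HMS Defence
print(f"{rate:.1%}")  # -> 3.2%
```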

If you are familiar with statistics, you may have spotted an inconsistency here: if one person makes a mistake 3% of the time, at least two out of three people should make a mistake on the same observation only about 0.3% of the time (3% × 3% × 3), while actually this happens much more often than that (1.3% of the time). The reason for this excess of cases where all the transcribers disagree is that some of the entries are illegible. For example, consider the barometer height at 4am in the log for Thursday 10th September 1914: this was variously transcribed as ‘30.18’, ‘30.10’, and ‘30.12’, all of which are plausible readings. In this case there is no one answer we can agree on, and the disagreement is not a transcription error but a success: we have flagged an entry which cannot be transcribed with confidence. (This is why we encourage you to guess when entering hard-to-read values: when everybody guesses a different answer, we know the entry is illegible.)
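The expected rate can be checked with the binomial formula. This sketch assumes each transcriber errs independently with the same probability, and it counts any observation where at least two people err, even if their wrong values happen to agree; `prob_at_least_k_wrong` is a hypothetical name. The exact figure comes out slightly below the 3% × 3% × 3 shortcut, which drops small correction terms.

```python
from math import comb


def prob_at_least_k_wrong(p, n=3, k=2):
    """Binomial chance that at least k of n independent transcribers err on one value."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


expected = prob_at_least_k_wrong(0.03)  # about 0.26%, close to the shortcut's 0.27%
observed = 15 / 1119                    # no-majority share for HMS Defence
print(f"expected {expected:.2%}, observed {observed:.2%}")
# -> expected 0.26%, observed 1.34%
```

The observed share is roughly five times the expected one, which is the excess that illegible entries account for.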

Even when we have transcribed a value with certainty it may not be correct – sometimes the log-keepers wrote the wrong value in the log. There is no doubt that the barometer height entered for midnight on Wednesday 7th October 1914 is ‘28.80’ inches, but there is also no doubt that the actual pressure was much higher than this (possibly ‘29.80’), and this error can be seen as the first of the three spikes in the figure above. So there are three errors in the log big enough to be obvious in the plot, and probably others with a smaller effect.

This post has turned out much longer and more complicated than I planned – mostly because the definition of ‘transcription error’ from a logbook containing erroneous and illegible entries is not simple – so, in summary:

  1. Individual transcriptions are about 97% accurate
  2. Of 1000 transcribed logbook entries:
    • 3 will be lost because of transcription errors
    • 10 will be illegible
    • At least 3 will be errors in the logs

So for every 16 errors in the transcribed data (which we pass to the science team), only 3 are the responsibility of those of us reading the logs; the other 13 are problems in the logs themselves. We can say with some confidence that we are better at reading the logs than the original log-keepers were at writing them.

Congratulations to captain ebaldwin and the crew for an excellent job on HMS Defence; and to all the oldWeather participants, as the accuracy of transcription is similarly high on all the ships I’ve looked at.

Old Weather sails on….


Excellent news arrived today. The powers-that-be at JISC have seen fit to reward Old Weather’s success with further funding. We’re one of the projects that were successful in the latest round of their rapid digitization grants. The application – which incidentally laid great emphasis on the hard work that you’d all already done – went in six weeks after the original project launched, and will provide for a host of new things in the next six months or so.

Firstly, we’re going to get more logs. The grant provides funding for the imaging of roughly another 3000 ships’ logs – so that’s 3000 more months of history to feed to the site. The idea is to go back and fill in the gaps in the time period the site already covers, so there will be new ships, and some existing ships will gain new images.

Secondly, we’re going to add an interface that allows you to assist us in the task of cleaning up the transcribed data. In the proposal, I used the example of HMS Acacia, where the final temperature record contained some sudden jumps between temperatures in the 70s and in the 40s. Reviewing the logs makes it pretty clear what went wrong – 7s are easy to misread as 4s (at least in the handwriting of the officer who wrote the log of HMS Acacia) – but that’s easily fixed once you review the temperature series. The same goes for sudden jumps in position caused by mixing up East and West. Producing a tool to help make changes like this will not only help us maintain data quality, but will also make it easy to review your ship’s results once it reaches the end of a log. We’re also going to take the chance to build a more flexible interface, potentially allowing us to transcribe logs that aren’t in the same format as the current set.
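A review tool of the kind described above might start by flagging isolated jumps and then testing the 7-for-4 explanation. The sketch below is hypothetical: the 20-degree threshold, the function names and the sample temperatures are illustrative assumptions, not the planned interface or real Acacia data. It only catches single-reading slips; a run of several misread values in a row would need a different test.

```python
def flag_isolated_jumps(values, threshold=20.0):
    """Indices of readings that jump by more than `threshold` from both neighbours
    while the neighbours themselves agree (a single-reading spike)."""
    flagged = []
    for i in range(1, len(values) - 1):
        prev, cur, nxt = values[i - 1], values[i], values[i + 1]
        if (abs(cur - prev) > threshold and abs(cur - nxt) > threshold
                and abs(prev - nxt) <= threshold):
            flagged.append(i)
    return flagged


def suggest_seven_for_four(values, i, threshold=20.0):
    """Hypothetical fix-up: read every 4 in a whole-degree reading as a 7 and keep
    the candidate only if it fits between its neighbours."""
    candidate = int(str(int(values[i])).replace("4", "7"))
    neighbours = (values[i - 1] + values[i + 1]) / 2
    return candidate if abs(candidate - neighbours) <= threshold else None


temps = [72, 73, 43, 74, 71]  # illustrative values, not from a real log
for i in flag_isolated_jumps(temps):
    print(i, temps[i], "->", suggest_seven_for_four(temps, i))  # -> 2 43 -> 73
```

Any suggestion like this would still go to a person who checks it against the log image.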

We’ll report on progress here as we get cracking. Thanks, JISC, for the support, but thank you all for your hard work that led to this vote of confidence. As a thank you, we’ve got a little surprise prepared but I’ll save that for tomorrow.

Stanley to Archangel, and all points in between.

It probably won’t surprise many of you to hear that the Earth is generally warmer at the equator, and colder towards the poles. I base my holiday plans heavily on latitude: going north (from England) for snow, and south for sunbathing. We all know this, but OldWeather has now completed enough log pages that we can prove it just from the logbook observations – the image below shows how air temperature changes with latitude, using the 120,000 temperature observations from pages that have already been examined by the three people we need to provide reliable results.

Air temperature as a function of latitude, with some of the more common port locations marked.


So it’s warmer (on average) in Singapore, and colder in Scandinavia; we didn’t need the logbook records to tell us that, but this way of looking at the data is still interesting – partly because comparing the temperature records with others made at the same latitude is a good way of finding outliers: values that are likely to be errors in either recording or transcription.
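One simple way to compare each record with others at the same latitude is to group readings into latitude bands and flag values far from the band’s typical value. This sketch is an assumption, not the method behind the figure: it uses 5-degree bands and a median-based spread (so a few wild values don’t drag the reference point), and both the band width and the cut-off `k` are arbitrary choices. A flagged value is only a candidate for checking, as the Bayano readings discussed below show.

```python
from collections import defaultdict
from statistics import median


def latitude_outliers(observations, band_width=5.0, k=5.0):
    """Flag (latitude, temperature) pairs far from the typical value in their latitude band.

    'Far' means more than k times the band's median absolute deviation
    away from the band's median temperature.
    """
    bands = defaultdict(list)
    for lat, temp in observations:
        bands[int(lat // band_width)].append(temp)

    # Median and median absolute deviation for each band.
    centre = {}
    for band, temps in bands.items():
        m = median(temps)
        centre[band] = (m, median([abs(t - m) for t in temps]))

    flagged = []
    for lat, temp in observations:
        m, mad = centre[int(lat // band_width)]
        if mad > 0 and abs(temp - m) > k * mad:
            flagged.append((lat, temp))
    return flagged
```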

One thing we can see immediately from the figure is the spikes at locations associated with ports. The spikes go both up and down, meaning there are more of both high and low temperatures at these locations. Partly this will be a physical effect – temperatures over land do vary more than those over the ocean – but it’s also partly an artefact of the way I’ve made the plot: the Navy ships spend a lot of time in port, so we have many more observations from those locations, and so more unusually high or low values. Even in the ports, however, there are very few really way-out values, but some are suspicious: are there really marine temperatures below 0F at about 45N? (Seawater freezes at about 29F.) Those values come from HMS Bayano, off the Canadian coast in December (thanks captain spudman and lieutenant Dinsdale, among others), so very low temperatures can’t instantly be ruled out, but they will need further investigation.

The variation of barometer height (air pressure) with latitude is less well known, but just as interesting: this picture is dominated by the low pressure variability in the tropics (steady weather) and the much more variable pressure in the higher latitudes (anticyclones, depressions and storms). We can see very nicely the transition, in the southern hemisphere, from the steady trade-wind regions to the famous ‘roaring forties’ and ‘furious fifties’.

Barometer height (pressure) as a function of latitude


Captains care about the air pressure because it warns them of changes in the wind. This sort of plot isn’t ideal for showing winds, because the wind measurements are restricted to the Beaufort scale categories, but we can still see where the strong winds are to be found. Cruising in the North Atlantic, the Royal Navy’s main stamping ground, was clearly no picnic: with temperatures down to freezing, variable weather and strong winds.

Wind force as a function of latitude.


The Beaufort scale only goes up to 12; extensions are sometimes used for severe tropical storms, but the value of 15 recorded by HMS Cambrian in Rosyth dockyard in March 1919 is not credible. (Though I congratulate captain MamaLizard and the crew on correctly entering the value from the log – we always want the value written, even when it’s obviously an error.) If we disregard the Cambrian’s exaggerations, there are four reports of wind force 12 so far, but they are all typographical errors – it’s not much of a slip of the pen to turn ‘1-2’ into ‘12’. We’re still waiting for our first real hurricane.
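A post-processing check along these lines could catch both problems. It is a sketch with hypothetical helper names: it reads a hyphenated entry as a range, flags anything outside the standard 0 to 12 scale, and marks force 12 for a second look. Transcribers should still enter exactly what the log says; checks like this belong to later processing.

```python
def parse_wind_force(entry):
    """Turn a logged wind force such as '5' or '1-2' into a (low, high) pair."""
    if "-" in entry:
        low, high = (int(part) for part in entry.split("-", 1))
    else:
        low = high = int(entry)
    return low, high


def check_wind_force(entry, top_of_scale=12):
    """Return a note on whether the entry fits the standard 0-12 Beaufort scale."""
    low, high = parse_wind_force(entry)
    if not 0 <= low <= high <= top_of_scale:
        return "outside 0-12: check the log"
    if high == top_of_scale:
        return "force 12: check it is not a range like 1-2 with the hyphen lost"
    return "ok"
```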

HMS Africa in action against Orthomyxoviridae

Old Zooniverse hands know all about serendipity – Hanny’s Voorwerp was an unplanned and unexpected discovery from Galaxy Zoo, and has generated a great deal of new science. It could be that involving lots of people in the process of science (the citizen science approach) is a particularly good way to generate surprising and unexpected discoveries.

When OldWeather started, the science team were clear about what we wanted – weather observations please, and lots of them – but we were also aware that there was a lot of other material in the logbooks, and that we needed to enable those reading the logs to record whatever they thought was important. I only really do weather, but I’m not immune from the general fascination that the logs exert over those of us who use them, so I’m very curious about what will emerge from our collective studies – what will the citizen scientists decide is worth recording?

I haven’t yet seen anything as dramatic as a Voorwerp, but quite a few people are getting interested in the sickness records in the logs, and their relationship to the well-known ‘Spanish flu’ outbreak in 1918. Many of the logs contain a record of the ‘Number on Sick List’; we didn’t ask for this number to be recorded, but some people have decided to record it anyway, and so far the database has accumulated values from almost 10,000 log pages from 126 different ships. I don’t have any professional expertise in medicine or epidemiology, but I couldn’t resist the temptation to have a peek at the data, and see what had been found.

Number of people on the sick list - all ships.


The figure above shows all 10,000 values, arranged by date. We need to be careful when looking at the data, because I haven’t been able to quality control the sickness counts the way we do with the weather observations, so certainly a few of these points will be errors; but we can see a basic pattern. There are generally only a few people off sick, but occasionally there is a short-lived spike in the number – sometimes to hundreds on the list at once.

A closer look shows that the spikes are infectious outbreaks, each on a single ship. The biggest spike is seen in the logs of HMS Africa.

Sick List for HMS Africa


On September 2nd 1918 the Spanish flu caught up with HMS Africa, and the cases mounted fast – reaching a peak with 476 people ill on the 9th. The Africa was a big ship, a King Edward VII-class battleship, but even so 476 people is nearly 2/3 of the crew; and it must have been a major challenge keeping her operational. The logbook, however, gives few hints of the struggle; and certainly they managed to continue their weather observations through the outbreak – the Navy’s dedication to duty is justly famous. Though virulent, the outbreak was brief: by September 21st the Africa had overcome the virus, but the victory was not without cost – log pages over the infection period regularly record deaths from influenza.
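Pulling the start, peak and end of an outbreak out of a sick-list series takes only a few lines. This sketch assumes a single outbreak per series and an arbitrary threshold chosen by whoever runs it; `outbreak_summary` is a hypothetical helper, and the keys can be `datetime.date` objects or any other sortable day labels.

```python
def outbreak_summary(sick_list, threshold):
    """Summarise one outbreak in a {day: number on sick list} series.

    Returns (first day at/above threshold, peak day, peak count,
    last day at/above threshold), or None if the threshold is never reached.
    """
    days = sorted(sick_list)
    above = [d for d in days if sick_list[d] >= threshold]
    if not above:
        return None
    peak_day = max(days, key=lambda d: sick_list[d])
    return above[0], peak_day, sick_list[peak_day], above[-1]
```

Because the count only has to reach the threshold on the first and last days, quiet spells inside the outbreak window are not treated as separate outbreaks; a ship with several outbreaks would need the series split first.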

So congratulations to all those who’ve been recording sick-list counts – there’s definitely some interesting material there, and it’s great to see the collective intelligence of the project generating new research directions. I can’t say whether the sick-list counts will lead to new published science in the end – I’m only a climatologist – but I look forward to finding out.

A local habitation

To use the weather records in the OldWeather logbooks, we need to know not only what the observed temperature and pressure were and the date and time of the observation, but also the position of the ship at the time the observation was made (its latitude and longitude).

We are collecting quite a bit of position information in the logs: if the ship is in port, we get the port name; if at sea, the latitude and longitude at noon (and sometimes at 8am and 8pm as well). From this information it’s relatively straightforward to estimate the ship position at any time of the day (we just draw a line between the noon positions and use positions along this line). But, as with all the data we collect, this doesn’t give us exact positions – any of various problems might cause the positions to be inaccurate, and in using the observations we have to make allowances for these inaccuracies.
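The straight-line estimate can be written as linear interpolation between two fixes. This is a minimal sketch under stated assumptions: times in hours, longitudes in degrees east with west negative, and interpolation in plain latitude and longitude rather than along a great circle, which matters little over a day’s run. The longitude step takes the short way round, so a ship crossing the 180th meridian (as any circumnavigation must) is not sent round the world. The function names are hypothetical.

```python
def wrap_longitude(lon):
    """Map a longitude onto the range [-180, 180)."""
    return (lon + 180.0) % 360.0 - 180.0


def interpolate_position(t, t0, pos0, t1, pos1):
    """Straight-line estimate of (lat, lon) at time t between fixes at t0 and t1.

    Times are in hours; positions are (lat, lon) in degrees, north and east positive.
    """
    f = (t - t0) / (t1 - t0)  # fraction of the way from the first fix to the second
    lat = pos0[0] + f * (pos1[0] - pos0[0])
    dlon = wrap_longitude(pos1[1] - pos0[1])  # shortest east-west step
    return lat, wrap_longitude(pos0[1] + f * dlon)
```

For a midnight observation between two noon fixes, `t` is halfway between `t0` and `t1`.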

What might go wrong:

  1. The port name is not always enough to uniquely identify the ship’s location. There is a Devonport in Plymouth in the UK – but there is another one in Auckland, New Zealand. HMS New Zealand was in Devonport on Valentine’s Day 1919, but which?
  2. The latitude and longitude might be wrong, either because of transcription error, or because of an error in the log. A common error of this kind is confusion between east and west longitudes (or between north and south latitudes).
  3. The noon observations are not perfect: they rely on the accuracy of the chronometers and observations used to calculate them. Dead-reckoning and observed noon positions are commonly a few minutes (maybe 10 miles) apart.
  4. When in port, we only have the name of the port, not the precise position of the ship in the associated harbour or anchorage. On December 16th 1919 the New Zealand’s position was given as ‘in Panama Canal’: the Panama Canal is 48 miles long, so where were they exactly?
  5. Estimating a position at midnight (say) by assuming it’s half way between the preceding and following noon positions assumes the ship is sailing all the time on the same course at constant speed. This is rarely true, so interpolating positions at times of observation from noon positions will introduce an error. In theory, this could produce a big error – if a ship travelled at full speed (say 30 knots) in a straight line between noon and midnight, and then turned around and returned to its starting point for noon the next day, our estimate of the position at midnight would be out by about 400 miles; but ships almost never actually behave like this: as we’ve seen in the routes of the New Zealand, and the Goliath, Gloucester, and Glowworm, ships generally either hang around in one place, or move fairly directly from one point to another. This estimation does introduce an error into positions, but it’s usually modest, rarely more than 30 to 50 miles.

The first two of these will produce big errors in the positions (choosing the wrong Devonport is close to being the biggest error possible), so they are serious, but also easy to spot. These are a nuisance (because we have to correct them), but hardly ever a problem. The last three points are hard to correct, and do introduce modest errors in the observation positions.

The figure shows the noon positions of HMS Pegasus on her voyage from Plymouth to Arkhangelsk and back to Dundee in 1919 – ably digitised by captain Uldis Ohaks, lieutenants Manock and elizabeth, and their crew.

Noon positions for HMS Pegasus


One position is clearly in error – the visit to Baffin Bay (marked in red) is impossible. Checking the log page for that day shows that, most unusually, both the latitude and longitude of this point are wrong. The longitude has been incorrectly captured by our system as 59 degrees west, while it is actually 59 minutes (i.e. about 1 degree) west. We have correctly captured the latitude given in the log (68 degrees north) but this must be an error in the log, because that page also refers to sighting landmarks in Orkney and Caithness, so the ship must be around 10 degrees south of this. We have to guess the actual position, and, in this case it seems likely that the ship was actually at 58 degrees north, and the entry in the log is a typo.
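An impossible leg like the Baffin Bay visit can be caught automatically by asking what speed each pair of noon positions implies. The sketch assumes fixes 24 hours apart, west longitudes negative, and a 30-knot ceiling borrowed from the full-speed figure used earlier; the names are hypothetical. Distances use the haversine formula with a mean Earth radius, in nautical miles.

```python
import math

EARTH_RADIUS_NM = 3440.065  # mean Earth radius in nautical miles


def distance_nm(p, q):
    """Great-circle (haversine) distance in nautical miles between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_NM * math.asin(min(1.0, math.sqrt(a)))


def impossible_legs(noon_positions, hours_between=24.0, max_knots=30.0):
    """Indices i for which sailing from noon fix i to fix i + 1 would need more than max_knots."""
    return [
        i
        for i in range(len(noon_positions) - 1)
        if distance_nm(noon_positions[i], noon_positions[i + 1]) / hours_between > max_knots
    ]
```

A single bad fix shows up as two consecutive flagged legs that share it, which points to the fix in the middle.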

The other positions clearly don’t contain big errors, but it’s clear that the ship didn’t always travel in a straight line between her noon positions – the lines linking them on the map often cross land, in England, northern Norway, and the Kola Peninsula. In these cases, the actual midnight position of the ship is probably about 100 miles from the straight-line guess.

It’s tempting to fix these problems: we could improve our ship position estimates noticeably by using more sophisticated methods of tracking them. For example, the logs often contain hourly course and speed information, so it would be possible to digitise this and make hourly position estimates by dead reckoning; when within sight of land the logs often contain bearings to points on shore, which could also, in principle, be used to derive a more precise route for the ship. So we could certainly do better (at the cost of a great deal of work), but we’ll never have perfect information on the ship positions, so it’s worth asking first how precisely we need to know them.
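Dead reckoning from hourly course and speed could look something like this. It is a hypothetical sketch using the mid-latitude sailing approximation (one nautical mile north is 1/60 degree of latitude; one nautical mile east is 1/(60 cos latitude) degrees of longitude), and it ignores currents and leeway, which a real navigator would allow for. OldWeather does not currently collect hourly courses, so the input format here is an assumption.

```python
import math


def dead_reckon(start, legs):
    """Advance a (lat, lon) position through hourly (course_deg, speed_knots) legs.

    Course is measured clockwise from north (0 = north, 90 = east); each leg
    lasts one hour. Returns the list of positions after each hour.
    """
    lat, lon = start
    track = []
    for course_deg, speed_knots in legs:
        course = math.radians(course_deg)
        north_nm = speed_knots * math.cos(course)  # distance made good north in one hour
        east_nm = speed_knots * math.sin(course)   # distance made good east in one hour
        new_lat = lat + north_nm / 60.0
        # Scale the east-west step by the cosine of the mean latitude of the leg.
        mid_lat = math.radians((lat + new_lat) / 2)
        lon += east_nm / (60.0 * math.cos(mid_lat))
        lat = new_lat
        track.append((lat, lon))
    return track
```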

The weather shows itself both in very local effects (showers, contrails, frost hollows) and in very big effects (such as the drought, and now flooding, in Australia). A good illustration of this is the excellent Oldweather authors poster – if you wanted to paint this image you’d need a large brush for North Africa and the tropical oceans, and a very small brush to capture the fine detail. We can capture this detail with modern satellites, but to do the same from ship observations we’d need millions of ships, densely packed over all the oceans – an obvious impossibility. So when doing historical weather reconstructions from ship data we have to forget about the small-scale effects, but it’s still really important to get the large-scale effects correct. Throw away the small brush, and concentrate on using the big one to best effect.

It turns out that the combination of the number of weather observations we can collect, the accuracy of their pressure and temperature measurements, and the power of the computers available to us means that we can’t reconstruct weather effects at scales smaller than about 200 miles (for the Oldweather period; we can do better in the present day). And that’s with the new observations we’re producing; without them, in many areas we can’t reconstruct the weather at all.

This means that, in practice, a ship position error of a few tens of miles is not a big problem, and doing all the work necessary to get more precise positions would hardly help us at all. It’s much more useful to spend our effort on entering more data, and that’s what we’re doing.

So if your ship seems to have taken the M1 (road) out of London instead of the more conventional choice of using the Thames and North Sea, don’t worry – the observations are still useful. Unauthorised excursions to Greenland, Baffin Bay, and other far-flung locations are unacceptable, however, and will have to be corrected.

HMS Goliath: In Her Own Words


For Day 18 of the Zooniverse Advent Calendar, we’re following on from Day 12, and I have created this image of a Royal Navy ship built up of the words from the HMS Goliath logs – captained by Zooniverse user Roar. There are some great words hidden in here – it makes a compelling read.

You can download the large version (16 megapixels) or the small version (4 megapixels).

Marking time.

Quite a few people have asked why we don’t have to input the time of each weather observation. It’s a sensible question, and we do need the observation times, particularly for tracking fast-changing weather events like moving fronts. But one of the clever features of the oldweather website is that we don’t have to enter the times – they are automatically collected through the process of entering the weather data.

To do this we take advantage of a symmetry between space and time in the logs (scientists love symmetries). The top of each log page corresponds to the beginning of the day, and the bottom of the page corresponds to the end of the day. So the further down the page an entry is, the later in the day it was taken. We record the position of the push-pin for each entry digitised from the page, and from that push-pin position we can find the time associated with that entry. The image below shows the positions of all the weather observations entered from the logs of HMS Bacchante (thanks captain richbr15, lieutenants dazedandconfused and davemcg, and all the crew).

Positions of all weather observations entered from the logs of HMS Bacchante. Red points are on left-hand pages, blue points on right-hand pages.


As with most books, there are two sorts of pages: left-hand (red dots) and right-hand (blue dots). They have different margins in our images, so they don’t line up precisely horizontally, but their vertical position is the same, and that’s what gives us the time.

The Bacchante recorded the weather at the end of each watch: at 4 a.m., 8 a.m. and noon, and at the same times after noon (4 p.m., 8 p.m. and midnight). They also recorded it at the end of the first dog watch (6 p.m.) – so we should expect to see three equally-spaced groups of points in the top half of the figure (the morning), and four, more irregularly spaced, groups in the bottom half (the afternoon). This is exactly what we see, and it’s clear that for the vast majority of the observations, we can easily say which watch they are associated with, and so when they were taken.
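Turning a push-pin into a time might then look like this: scale the pin’s vertical coordinate into hours, then snap to the nearest scheduled observation. The sketch assumes the page runs linearly from midnight at the top to midnight at the bottom between two calibration coordinates (`page_top` and `page_bottom`, hypothetical parameters that would need measuring for each page layout). Only the vertical coordinate is used, since left- and right-hand pages differ horizontally.

```python
# Observation hours for HMS Bacchante as described above: end of each watch
# (4 a.m., 8 a.m., noon, 4 p.m., 8 p.m., midnight) plus 6 p.m. for the first dog watch.
BACCHANTE_HOURS = [4, 8, 12, 16, 18, 20, 24]


def hour_from_pin(y, page_top, page_bottom):
    """Map a push-pin's vertical coordinate to an hour of the day, assuming the
    page runs linearly from 00:00 at page_top to 24:00 at page_bottom."""
    return 24.0 * (y - page_top) / (page_bottom - page_top)


def assign_hour(y, page_top, page_bottom, hours=BACCHANTE_HOURS):
    """Snap a pin position to the nearest scheduled observation hour."""
    h = hour_from_pin(y, page_top, page_bottom)
    return min(hours, key=lambda scheduled: abs(scheduled - h))
```

Snapping to a fixed schedule is what makes small misplacements of the pin harmless, as long as they stay closer to the right watch than to its neighbours.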

There are a few observations that are not quite so easy – we can see some smaller clusters of observations above and to the left of the main clusters; but again, it’s easy to see which watches these observations correspond to. There are also a few observations in irregular positions – lost in time and space – but these are only a tiny fraction of the total.

So it’s going to be easy to find the time of observation in the usual case where the observations are 2 or 4 hours apart. For the diligent few log-keepers who recorded observations every hour or even more frequently, we will have to be a bit cleverer, and use the differences between the observed weather values, as well as the position on the page, to group the observations into clusters and assign them to times.
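For pages with hourly (or more frequent) observations, one possible grouping rule combines both clues. This is a hypothetical sketch, not the project’s method: entries from one page are sorted by vertical position, and a new cluster starts when the next entry is too far down the page or its value differs too much from the previous one. The two gap thresholds are tuning assumptions, and chaining entries one after another like this can merge clusters that drift gradually.

```python
def cluster_entries(entries, y_gap, value_gap):
    """Group (y, value) entries from one page into observation clusters.

    Entries are sorted by vertical position; a new cluster starts whenever the
    next entry is more than y_gap further down the page or its value differs
    from the previous entry by more than value_gap.
    """
    clusters = []
    for y, value in sorted(entries):
        if clusters:
            last_y, last_value = clusters[-1][-1]
            if y - last_y <= y_gap and abs(value - last_value) <= value_gap:
                clusters[-1].append((y, value))
                continue
        clusters.append([(y, value)])
    return clusters
```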

All this, of course, relies on having accurate positions on the page for each observation, which means lining up the entry box with the observation text carefully each time when entering the data. So far, we’ve done well at this (as the figure shows); I’ve come to expect no less from oldweather, but it’s still a pleasure to see.

East and west and south and north

Oldweather is now really hitting its stride, with a stream of ships reaching completion and so becoming available for analysis. One of those that has completed recently is the battlecruiser HMS New Zealand, which was named for the country that funded her construction, and carried Admiral of the Fleet John Jellicoe on a tour to India, Australia, New Zealand and Canada in 1919. Her logs from this circumnavigation are those we’re looking at in Oldweather.

Long voyages like this are particularly desirable as a source of weather observations, because the same ship, crew and weather instruments experience and record a wide range of different weather conditions; from the hot calms of the doldrums, through the steady trade-wind regions, to the stormy conditions of the roaring forties. On her circumnavigation New Zealand sees all of these conditions, and the change in weather conditions shows up clearly in her barometer records – Captain toucans, Lieutenants keybasher, jdulak, Cyzaki, and all her crew have provided a good picture of the weather in 1919, over a wide section of the world.

So the records digitised from the New Zealand, like those from the other ships completed so far, have been entered with accuracy and skill; but there is always one component of the data that presents us with more trouble than the others, and for the Oldweather ships, it’s the ship positions. Many of you will have noticed that the maps showing the ship positions as they are being entered sometimes show unlikely or impossible positions – HMS New Zealand’s positions are not unusually problematic, but, because the ship travelled so widely, they make a good example to illustrate the issue. The figure below shows the positions digitised for the New Zealand.

Raw position records for HMS New Zealand: Blue points mark the most popular entries for each log-page, red points mark other entries.


It’s clear where the ship was going, and it’s also clear that sometimes we’re getting the positions wrong. In this case, all the wrong positions have the same cause – we know the longitude, but not whether it’s East or West of Greenwich.

Position of HMS New Zealand on May 7th 1919


This image shows a typical at-sea position entry – we can see that the latitude is 2 degrees 17 minutes, and the longitude is 88 degrees 5 minutes, but the letters showing the position as south of the equator (‘S’) and east of Greenwich (‘E’) are a bit detached from the position entries. The position of these letters seems to vary from log to log – sometimes they are put immediately after the position values, sometimes merely nearby, as here. But either way, please always enter them along with the positions, so this example should be entered as latitude “2 17 S” and longitude “88 5 E”.
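The reason the letters matter is easy to see when converting entries to signed decimal degrees. A minimal sketch, assuming entries are written as degrees, minutes and a letter separated by spaces, as in the example above; `parse_position` is a hypothetical helper. Without the letter, the parser can still produce a number, but it cannot know which hemisphere is meant, so it reports that the direction is missing.

```python
def parse_position(text):
    """Convert an entry such as '2 17 S' or '88 5 E' to signed decimal degrees.

    North and east are positive, south and west negative. Returns
    (degrees, has_direction); has_direction is False when the letter is missing,
    i.e. when the hemisphere cannot be known from the entry alone.
    """
    parts = text.split()
    direction = None
    if parts and parts[-1].upper() in ("N", "S", "E", "W"):
        direction = parts.pop().upper()
    degrees = float(parts[0])
    minutes = float(parts[1]) if len(parts) > 1 else 0.0
    value = degrees + minutes / 60.0
    if direction in ("S", "W"):
        value = -value
    return value, direction is not None


print(parse_position("2 17 S"))  # latitude south of the equator: negative
print(parse_position("88 5 E"))  # longitude east of Greenwich: positive
```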

Even if we always enter the direction letters, we won’t get rid of all position errors of course – sometimes the letters are missing in the log, sometimes the log-keeper makes a mistake entering the position. It’s important to faithfully reflect the logs, so we shouldn’t try to fix such problems. But please do enter them however they appear on the page.
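Fixing such positions is a job for later processing, not for transcribers. One heuristic that processing might use for a longitude logged without its letter is to pick whichever of east or west lies closer to the neighbouring fixes. This is a hypothetical sketch that only works when the neighbours are themselves trustworthy; east wins ties, and separations are measured the short way round the globe so positions near the 180th meridian behave sensibly.

```python
def longitude_gap(a, b):
    """Smallest angular separation in degrees between two longitudes."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def resolve_east_west(magnitude, prev_lon, next_lon):
    """Choose east (+) or west (-) for a longitude logged without its letter,
    whichever sits closer to the neighbouring fixes (east wins ties)."""
    east, west = magnitude, -magnitude
    east_cost = longitude_gap(east, prev_lon) + longitude_gap(east, next_lon)
    west_cost = longitude_gap(west, prev_lon) + longitude_gap(west, next_lon)
    return east if east_cost <= west_cost else west
```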

