
Researchers Work to Create New Surface Temperature Datasets


Guest Post by Peter Thorne

While current data assists us with understanding global temperature trends, there is a growing need to address questions about how climate is changing regionally; how the frequency and severity of extremes such as the recent Russian heat wave are changing; and, crucially, our degree of certainty in these changes. It is not as if we are starting from scratch. There exist several regional analysis products (e.g. the European Climate Assessment and Dataset) and also output from reanalyses, but we can and should do better. 

As the study of climate science increases in importance for decision and policy making (decisions that could have multi-billion-dollar ramifications), the expectations and requirements of our data products will continue to grow. Society expects openness and transparency in the process, and a greater understanding of the certainty regarding how climate has changed and how it will continue to change.

This was not the case twenty years ago when the original global analyses were constructed. In today's age of terabyte hard drives and 20-megabit-plus broadband it is easy to email the global surface data archives around the world, whereas 20 years ago this required specialist IT solutions at substantial cost. It is all too easy for us to forget how substantial a technical challenge this was, and how cutting edge and laborious these efforts really were. We should be grateful to these pioneers, rather than critical of them.

For several years it has been recognized that we need to pay renewed attention to attempts to characterize changes in temperature at the earth’s surface. The century timescale global datasets that already exist — NOAA NCDC, HadCRUT3, and NASA GISS — are available only as monthly average temperature anomaly products, and primarily provide for analyses on large spatial scales. They have served us well in answering the question of whether the globe has been warming. That the globe is warming, as indicated by these datasets, is supported by changes in many other indicators from the top of the atmosphere to the depths of the oceans, all of which are expected to change in a warming world.



Key observed indicators of warming global temperatures. Credit: NOAA: State of the Climate in 2009.

In early September, 80 climate scientists, statisticians, measurement scientists, economists, and software engineers gathered at the UK Met Office in Exeter to formalize plans to create a suite of temperature data products to meet the growing need for climate information in the 21st Century. The meeting, held under the auspices of the World Meteorological Organization, the World Climate Research Program, and the Global Climate Observing System, included representatives from every continent.

The meeting kick-started a multi-year process to create a verifiable suite of data products to meet these 21st Century requirements. This suite of products will include renewed efforts to estimate century scale changes to complement the existing analyses, but more importantly, products at daily and sub-daily resolution with greater coverage that can support decision making from the regional to the local level. It will also be open and transparent to the greatest extent possible, and include a common benchmarking and assessment exercise to improve the understanding of scientific uncertainty. 

Some outcomes from the event included:


Currently, there is no single repository of all raw data from surface weather stations. Participants agreed to make strenuous efforts to create such a databank. The databank would be version controlled, and efforts would be made to ascertain the provenance of the underlying data. Several stages were envisaged for this databank, going (wherever possible) from the original data source to a collated digital database in a common format, so that investigators could drill down to the actual data as originally recorded. Associated with this would be metadata (data describing the data) documenting, to the extent feasible, changes in temperature recording station siting, observation practices, etc. Some of this effort would simply entail combining freely available databases, many of which were offered at the meeting; these will constitute the first version of the databank. But the majority is much harder, as discussed below.
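To make the staging and provenance idea concrete, here is a minimal sketch of how one databank entry might be represented. This is purely illustrative: the stage numbers, field names, and the `promote` helper are assumptions for the example, not the actual schema agreed at the meeting.

```python
from dataclasses import dataclass, field

@dataclass
class StationRecord:
    """One station's record at a given processing stage (illustrative only)."""
    station_id: str
    source: str                                      # provenance: originating archive
    stage: int                                       # e.g. 0 = raw source ... 3 = merged common format
    observations: list = field(default_factory=list) # (timestamp, temperature) pairs
    metadata: dict = field(default_factory=dict)     # siting changes, instruments, audit history

def promote(record: StationRecord, new_stage: int, note: str) -> StationRecord:
    """Advance a record to a later stage, appending an audit-trail note.

    Returns a new record rather than mutating the old one, so every
    earlier stage remains available for investigators to drill down to.
    """
    assert new_stage > record.stage, "stages only move forward"
    history = list(record.metadata.get("history", []))
    history.append({"from": record.stage, "to": new_stage, "note": note})
    return StationRecord(
        station_id=record.station_id,
        source=record.source,
        stage=new_stage,
        observations=list(record.observations),
        metadata={**record.metadata, "history": history},
    )

raw = StationRecord(station_id="EXAMPLE001", source="scanned ledger (hypothetical)", stage=0)
merged = promote(raw, 3, "keyed from image and merged into common format")
```

Keeping each stage as an immutable snapshot, with the history carried in the metadata, is one simple way to preserve the traceability chain the paragraph describes.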

For many data rights holders, including most meteorological services outside of North America, the data hold substantial intrinsic economic and/or geopolitical value. Although WMO Resolution 40 was envisaged to enable the free exchange of data, in reality numerous exceptions are applied to historical archives. So, a lot of data that exists in digital form currently cannot be truly freely exchanged. In many cases the rights holders are happy for investigators to produce gridded products, but not to distribute the underlying station data. This has the effect of breaking the traceability chain, and makes things more difficult for researchers. This was and remains the case for an unknown subset of the HadCRUT product.

Participants at the meeting recognized that substantial work, undertaken institution by institution, will likely be required to reverse this situation, and that the economic benefits to national economies of truly free data need to be quantified and communicated very clearly. However, this will take time. In the meantime, participants agreed to investigate "open" and "private" areas of the databank, recognizing that although this split is undesirable, the alternative of not having the data at all is even less desirable.

There also exists a lot of data in hard-copy or digital-image-only format. This amounts to several million images rescued as part of the National Climatic Data Center's (NCDC) Climate Database Modernization Program and available online in NOAA's Central Library of foreign data. There also remain at least 2,000 boxes held at NCDC, and countless other known and unknown repositories and archives around the world, in both digital-image and hard-copy form. Participants recognized that there is a lack of resources to digitize these through traditional means, and were excited by the prospect of crowdsourced digitization through, for example, zooniverse (a beta project to digitize World War One ship logs from the U.K. is to be launched).

A depiction of how the new surface temperatures database might be structured.
Courtesy of Peter Thorne.

Data products

Participants agreed there is a need for multiple redundant efforts at dataset creation, and that no single data product or methodology could suit all envisaged applications in the era of climate services. Substantial support will need to be made available to enable multiple independent groups to look at the problem from many different angles and consider changes across the full range of spatial scales from the individual station to global levels, and timescales from individual observations to century scale changes.

Given the wealth of data, participants recognized that for truly global analyses, fully automated procedures would be needed to ascertain and adjust for issues in the data.

Critically, this creation of data products should not be solely within the purview of climate scientists, as expertise from other fields can bring critical insights. Participants agreed that audit trails and disclosure of code are essential in the effort to create open and transparent data products.

Benchmarking and Assessment

To ascertain the fundamental quality of the data products that are produced, the meeting’s participants agreed that a common performance benchmarking system is necessary. Put simply — such an exercise should ascertain whether the algorithms used to create the data products are doing something sensible, and should indicate their strengths and weaknesses in extracting truth from data that we know to be imperfect.

The real world does not afford us the luxury of being able to undertake a 100 percent accurate assessment, as we do not know the truth, and if we did we would have moved on to other things! But we can create suitably realistic alternative “worlds” where we are afforded such a luxury of knowing the required solution which allows such an assessment. This requires the construction of direct analogs to the databank, as we cannot know in advance what stations and periods research groups will choose to consider, and it is important to benchmark every effort. These analogs would have similar statistical properties to the real world data. When the dataset creators run their approaches on these analogs, it enables an objective assessment of their strengths and weaknesses.
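A toy version of this analog-benchmarking idea can be sketched in a few lines: build a synthetic station series with one known, deliberately inserted inhomogeneity (a station move), run a detection algorithm on it, and check the answer against the known truth. The noise model, the inserted shift, and the naive split-point detector below are all simplified assumptions for illustration, not any group's actual method.

```python
import random

def make_analog(n=240, break_at=120, shift=2.0, noise_sd=0.3, seed=0):
    """Synthetic monthly anomalies with one artificial step change at `break_at`.

    Because we constructed the series, we know the 'truth' the
    detection algorithm is supposed to recover.
    """
    rng = random.Random(seed)
    series = [rng.gauss(0.0, noise_sd) for _ in range(n)]
    return [x + (shift if i >= break_at else 0.0) for i, x in enumerate(series)], break_at

def detect_break(series, edge=12):
    """Naive change-point detector: choose the split that maximizes the
    absolute difference between the means of the two halves."""
    best_i, best_gap = None, 0.0
    for i in range(edge, len(series) - edge):
        left = sum(series[:i]) / i
        right = sum(series[i:]) / (len(series) - i)
        gap = abs(right - left)
        if gap > best_gap:
            best_i, best_gap = i, gap
    return best_i

series, truth = make_analog()
found = detect_break(series)
```

Scoring `found` against `truth`, over many analogs with systematically varied properties, is the essence of the objective assessment the paragraph describes: it tells us where an algorithm succeeds and at what point it breaks down.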


To avoid over-tuning, several systematically differing analogs should be produced, and it was seen as desirable to span a range of complexity, from over-simplistic to harder than we believe the real-world issues to be. The ethos is that we want to ascertain at what point different approaches break down. It was recognized that the benchmarking exercise should be double blind, as is common in medical trials, to build confidence in the process. Finally, it was decided that the exercise should be cyclical, as the value of a given benchmark rapidly decreases with age, with assessments at the end of each cycle that bring together the dataset creators and the benchmark creators and assessors. It was recognized that such an approach would engender scientific benefits as well as greater buy-in.

Analysis and visualization

Unless guidance and tools are created for experts and non-experts to use the data products this effort produces, it will arguably fail. Users will require a decision tree that can guide them to the best products for their applications, and to the associated uncertainties. For many applications it is also important to create spatially complete products from the spatially incomplete station network. At the meeting, participants agreed to develop a similar benchmarking and assessment exercise for the construction of these spatially complete products.
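As one minimal illustration of producing a spatially complete estimate from an incomplete network (not the method any group committed to), the sketch below estimates the anomaly at an unobserved location by inverse-distance weighting of nearby station anomalies. The station coordinates and anomaly values are made up for the example.

```python
import math

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in kilometres."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def idw_estimate(target, stations, power=2.0):
    """Estimate the anomaly at `target` (lat, lon) as a weighted mean of
    station anomalies, with weight 1 / distance**power."""
    num = den = 0.0
    for lat, lon, anomaly in stations:
        d = great_circle_km(target[0], target[1], lat, lon)
        if d < 1e-6:
            return anomaly  # target coincides with a station
        w = 1.0 / d ** power
        num += w * anomaly
        den += w
    return num / den

# Hypothetical stations: (latitude, longitude, monthly anomaly in degrees C)
stations = [(51.5, -0.1, 0.8), (48.9, 2.4, 1.1), (52.4, 4.9, 0.9)]
estimate = idw_estimate((50.8, 1.3), stations)
```

Real infilling methods are far more sophisticated, which is exactly why a benchmarking exercise analogous to the station-data one is needed: the skill of any interpolation scheme can be scored on analog worlds where the spatially complete truth is known.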

The first phase of this effort focuses on characterizing centennial-scale climate change and variability at global and regional scales. This reflects the greater maturity of analyses that consider monthly rather than daily or sub-daily resolution data, while recognizing that we cannot duck the harder challenge. It also provides a focus from which to start the databank effort. As these issues evolve we will provide updates on the blog and through the peer-reviewed literature and technical documentation.

In conclusion, the potential exists for a step change in our ability to understand and characterize historical surface temperature changes. But this will only be successful if people become involved. So, please, consider how you can get involved — be it digitizing, creating datasets, creating funding opportunities or any other aspect of this considerable challenge.

Peter Thorne is a climate scientist at the Cooperative Institute for Climate and Satellites at the National Climatic Data Center in Asheville, North Carolina.

