About GHCN Temperature Data

A post below this one presents a comparison between unadjusted and adjusted GHCN temperature data. This article provides some background information on these data and how they are treated in the Temperature Trend Analysis methodology.

GHCN (the Global Historical Climatology Network) acts as a global repository for surface weather station records submitted to it by National Weather Services (NWS). Each NWS reviews local records according to its own procedures and certifies that the data accurately represent the weather experienced in its jurisdiction. The GHCN version 3 qcu file is composed of these data (qcu signifies "quality controlled unadjusted"). Various bloggers, ranging from E.M. Smith (chiefio) to Nick Stokes (moyhu), are satisfied that this file is close to the data submitted by NWS agencies.

The quality control consists of attaching flags to values appearing in the file. Because my home computer has limited power, I worked with the Taverage monthly datasets. There, a monthly value is flagged with an “a” if 1 daily value was missing when the average was calculated, “b” if 2 dailies were missing, and so on up to 9 omissions. With 10 or more missing dailies, the month is assigned “-9999”, indicating a blank for the month. An additional column beside each month identifies outlier values.
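The flag scheme just described can be written as a small lookup. This is only a sketch of the rule as stated in this post; the function name is made up for illustration, and any flag outside the letters "a" to "i" is simply reported as unrecognized rather than interpreted.

```python
def missing_days_from_flag(flag):
    """Number of missing daily values implied by a monthly flag.

    Follows the rule described in the text: "a" = 1 missing day,
    "b" = 2, ... up to "i" = 9. A blank flag means no missing days.
    Any other flag returns None (not interpreted in this sketch).
    """
    flag = flag.strip()
    if flag == "":
        return 0
    if len(flag) == 1 and "a" <= flag <= "i":
        return ord(flag) - ord("a") + 1
    return None
```

For example, `missing_days_from_flag("c")` gives 3, and a month with a blank flag gives 0.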

My principle is to include all data unless there is good reason to exclude it. The data preparation procedure involves unzipping the downloaded file and opening it as a Word document. The station records of interest are copied into a new Word document, which my notebook can handle without processing delays. The text data are then put into an Excel workbook and spread into cells; the -9999s are converted to blanks, and the flags and additional columns are removed.
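For readers who would rather script this step than use Word and Excel, here is a rough Python equivalent. It is not the procedure I used. It assumes the fixed-width layout I understand the GHCN-M version 3 files to have (11-character station ID, 4-digit year, 4-character element, then twelve 8-character groups made of a 5-character value and three flag characters) and that values are stored in hundredths of a degree Celsius; check the README that comes with the download before relying on these offsets. The function name and the sample record are made up.

```python
MISSING = -9999  # marker for a blank month

def parse_record(line):
    """Split one station-year line into ID, year, element and 12 monthly values.

    Flags are dropped and -9999 becomes None, mirroring the spreadsheet
    clean-up described above. Column offsets are assumptions (see lead-in).
    """
    station_id = line[0:11]
    year = int(line[11:15])
    element = line[15:19]
    values = []
    for month in range(12):
        start = 19 + month * 8          # each month occupies 8 characters
        raw = int(line[start:start + 5])  # 5-character value, flags ignored
        values.append(None if raw == MISSING else raw / 100.0)
    return station_id, year, element, values

# Made-up record for illustration only (not real station data).
sample = "XX000000001" + "1950" + "TAVG" + "".join(
    f"{v:5d}   "
    for v in [-123, 250, -9999, 1010, 1500, 2000, 2210, 2150, 1800, 1200, 500, -50]
)
station_id, year, element, temps = parse_record(sample)
```

In the sample, the third month carries -9999 and comes back as `None`, which plays the role of the blank cell in the workbook.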

My data quality assurance practices include scrutinizing each value more than 2 standard deviations away from the mean. I use CUSUM and first differences to test for step changes in the record, which would suggest a non-climatic change in the data (e.g. a change of equipment, procedure or location). In the US CRN#1 dataset I found no step changes, and the outlier values were few. I tested excluding some high or low values, but found no discernible effect on the slopes.
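The three checks above can be sketched in a few lines of Python. This is a minimal illustration, not the spreadsheet formulas I used: it assumes a plain list of values for one calendar month with the blanks already removed, uses the sample standard deviation, and takes CUSUM to mean the running sum of deviations from the series mean. The function names are made up.

```python
from statistics import mean, stdev

def flag_outliers(values, k=2.0):
    """Indices of values more than k sample standard deviations from the mean."""
    m = mean(values)
    s = stdev(values)
    return [i for i, v in enumerate(values) if abs(v - m) > k * s]

def cusum(values):
    """Running sum of deviations from the mean.

    A sharp, sustained change of direction in this curve hints at a
    step change in the underlying record.
    """
    m = mean(values)
    total = 0.0
    out = []
    for v in values:
        total += v - m
        out.append(total)
    return out

def first_differences(values):
    """Change from each value to the next; an isolated large jump stands out."""
    return [b - a for a, b in zip(values, values[1:])]
```

On an artificial series that sits at 1.0 for ten years and then at 3.0 for ten years, the CUSUM curve falls steadily to its minimum at the tenth value and then climbs back to zero, while the first differences are zero everywhere except for a single jump of 2.0 at the step.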

The same procedure was followed for the qca file (quality controlled adjusted). This meant adding two additional sheets into each station’s data workbook, examples of which are provided through links below.

In the US CRN#1 unadjusted workbook, there is a sheet for each station with the data pasted into a template that calculates several measures. The basic analysis is to compute the slope for each calendar month (Jan, Feb, etc.) over the lifetime of that station. The 12 slopes are then averaged to give the station trend. In addition, trends are calculated for several shorter periods of interest, again by combining 12 monthly slopes for that station period. A summary page brings together results from all the stations and generates averages of trends for the set of stations, by months, and by periods of years.
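The station calculation can be sketched outside the workbook as follows. This is an illustration rather than the template itself: it assumes each slope is an ordinary least-squares fit of monthly value against year (which is what Excel's SLOPE function computes), that blank months are represented as None and skipped, and that a station's data are held as a dictionary from year to a list of 12 monthly values. The function names and that data layout are made up.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

def station_trend(records, start=None, end=None):
    """Average of the 12 calendar-month slopes for one station.

    records: dict mapping year -> list of 12 monthly values (None = blank).
    start, end: optional inclusive year limits for a shorter period.
    Returns (trend, slopes); slopes are in data units per year.
    """
    slopes = []
    for month in range(12):
        points = [
            (year, vals[month])
            for year, vals in sorted(records.items())
            if vals[month] is not None
            and (start is None or year >= start)
            and (end is None or year <= end)
        ]
        if len(points) >= 2:  # a slope needs at least two years
            xs, ys = zip(*points)
            slopes.append(ols_slope(xs, ys))
    if not slopes:
        raise ValueError("no month has enough data to fit a slope")
    return sum(slopes) / len(slopes), slopes
```

Calling `station_trend(records)` gives the lifetime trend, and `station_trend(records, 1950, 1980)` gives the trend for a shorter period in the same way, by refitting the 12 monthly slopes over those years only.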

Data workbooks for two stations are provided here:
350412 Baker City, Oregon 350412
417945 San Antonio, Texas 417945
