the Air Vent

Because the world needs another opinion

CRU Howto

Posted by Jeff Id on December 24, 2009

CRU data is available from this page.

I’ve made some progress finally with Perl. It’s always a pain learning a new language and Perl is no different. I’m going to document here how to get the original version of CRU running so anyone can do it.

First, you need an interpreter for perl. This is the code which understands perl commands and runs them. After a bit of searching, I chose Strawberry Perl, it’s free so I downloaded and installed it. Perl runs from the command line interface which is far too crude for my liking. I found a couple of debuggers and settled on Perl Express. This allows a gui with breakpoints to get things running.

If you load the script for creating the gridded CRU data, into Express and run it, it gives an error. You need to point the interpreter to your perl.exe file. The error gives easy instructions on how to do it.

You need to download the zip file into a folder and unzip.

Perl prints and reads data from a STDIO channel naturally. This took a bit to figure out so I’m not going to describe the endless (and stupid) details I went through to make it work. Download the file station gridder.perl from CRU and run it in Perl express. The output is completely blank for a long time, it’s like nothing at all is happening, then things finally start printing out to the std output window. After the program completes, you go to the std output window and use the save as icon to put it into a text file. I named it gridded data.txt.

The next step to getting a global mean is to open a second script called make_global_average_ts_ascii.perl. This file was a bit of trouble for me. It is supposed to read from stdin so I specified in the standard input window the gridded data.txt file. The program just kept freezing on the read line. No message, no indication of what the problem was. Eventually (after a couple hours of messing around), I modified the file to read from the folder rather than stdin.

The modified code is shown below. I copied the whole beginning of the file, although the open line and the line directly after are the only modifications.

# Global average calculated as (NH+SH)/2

use strict;
use warnings;

# Extract the best-estimate area averaged time-series for the selected area
my @Date;
my @Best_estimate;
my @Best_estimate_field;

open FILE, “C:/cru/gridded data.txt” ;
while ( my $Field = read_month( \*FILE ) ) {

if ( $Field->{month} == 1 ) { push @Date, $Field->{year}; }
push @Best_estimate_field, $Field;

This script outputs a time series for  global temperature average.

32 Responses to “CRU Howto”

  1. Jeff Id said

    If someone knows how to get the stdio section to work properly, it would be nice to know.

  2. Kate said

    Please go to PBS here and post a comment regarding Jim Lehrer’s interview with Obama.

  3. Jeff, what operating system are you using? Or are you using Windows instead?

  4. Jeff Id said

    This laptop is WinXP 32 bit.

  5. Richard Sharpe said

    To enter data interactively you need to use the Interactive I/O field in the Std. Output tab.

    When I did that, this simple script did what I expected:

    while ( )
    print $_, "\n";

  6. Richard Sharpe said

    BTW, I am using XP on this system as well.

    Looks like a limitation of Perl Express in that you cannot paste lots of lines into the Interaction I/O field.

    Better to go back to command-line mode where it should work 🙂

  7. Richard Sharpe said

    Hmmm, got bitten by html problems. That simple program was:

    while ( < > )
    print $_, "\n";

  8. George said

    An easy way to get perl (and many other programming/scripting languages) is to install a cygwin environment ( ) which basically gives you a linux environment on a windows computer. You can install perl, python, tcl/tk, c, c++, all sorts of stuff.

  9. Mark T said

    At least it’s not lolcode:

    VISIBLE “You said ” N VAR N ” !!”


  10. Kenneth Fritsch said

    Jeff ID, I am attempting to determine what you have specifically in mind in extracting the data that you have shown in this thread and what your purposes are for doing it. I extracted from KNMI (linked below) two sets of data: for CRUTEM3 and CRUTEM3+HadSST2 which represent the globe for land only and sea and land, respectively. In order to obtain the range of years that you extracted I had to lower the percent valid points to 10%.

    I show those sets of data graphed with your set in the link below and I get a reasonably close match for CRUTEM3 for the globe.

    Are you attempting to test the validity of what CRU (or the Met Office) has posted?

  11. Jeff Id said

    #10, I want to check the individual temp stations used against as raw a dataset as I can. I also would like to do a different grid average than they provide.

    Did you create your plots using the perl code?

  12. Kenneth Fritsch said

    Did you create your plots using the perl code?

    Jeff ID, if that is a snide question I would say that I deserve it and it reminded me to compliment you on how fast you are able to get up and running with a program new to you. I am continuing to learn R and am at a point where I can do most functions even if sometimes is by brute force and not at all pretty.

    I downloaded the KNMI data into Notebbook and then into Excel and plotted from there. If I had to download a lot of information and do some serious statistical analysis I would use R.

    I hope you picked up on the fact that data going back as far as we went uses only 10% of the valid data points and thus requires a lot of infilling. The KNMI default is 30% valid data points and that will not get you close to 1850. What I like about KNMI is that I can readily obtain the number of stations and locations of stations used in the temperature series along with the % valid data points – and do it all in a matter of minutes for any region of the globe. I don’t need no stinkin perl.

  13. PeterS said

    The chart implies a 1C rise over the last 20 or so years – I just don’t believe it especially when you consider the Antarctic area is at or close to it’s maximum extent over the same period, and that many parts of the Northern hemisphere are experiencing some of the coldest weather for a very long time. It just doesn’t add up.

  14. Jeff Id said

    #12, It’s not even a little snide. Rather it was a recognition of your abilities. I was hoping to learn a bit – which I did.

    I do realize the old data is a mess but hadn’t gotten to the actual percentage of infilling yet.Your plots are almost the same as the CRU stuff. Did you use the same station set or something else?

  15. NikFromNYC said

    You made a Hockey Stick.

    Single records rarely show hockey sticks:

  16. Bryan H. said

    I hate to say it honestly I think this release of code is no more than a PR stunt. Perl? Honestly? Why would you do any sort data crunching in perl? You don’t, you would barely even do it in Python.

    All the CRU has done was had one of their web code monkeys (or interns) write up a quick gridding routine in the language they are most familiar to publish.

    I believe their climate models and gridding routines are still in Fortran (I’m not even convinced the CRU even does math on data anymore). If they have been ported to a new language it’d most likely be C or Matlab. Science and engineering software development is not all that different, except scientists tend to be sloppier coders.

  17. Mark T said

    MATLAB, indeed. I’m biased, myself, because I’ve been using it since the late 80s. 🙂


  18. hpx83 said

    #16 : Agreed, Perl is (at least was at some point) primarily a text-processing language – old linux web nerds tend to know it because you wrote cgi-scripts for “dynamic” content in webpages in Perl once-upon-a-time. But who knows – you could probably write it up in QBasic if you really wanted ….

    IMHO, the main “goal” at this point should probably be to find where those darn hockey-sticks are coming from – because it is pretty obvious that a few rogue thermometers could wreak havoc with a global average that is derived from a gridded dataset that is derived from a station-based dataset….? Antarctica was indeed interesting, the question is – how many more interesting things are going on? I think we are looking mainly at a combination of a few conscious acts of “adjustments”, inter-mixed with large quantities of “duh, didn’t think of that …”, ergo a large number of the “hockey-stickiness” may not have been intentional, but due to flaws in scientific methods etc.

    Since people who are quickly teaching themselves the basics of checking data anomalies are popping up all over this CRU scandal – this seems like a good time to try and give the climate science community a mathematical “peer review”.

  19. mrpkw said

    Love your blog, but this non engineer’s head is spinning !!

    Great work though !!


    Dont need to understand it.Big al Gore has made it nice and simple for us

  21. carl said

    no need to modify the code at all if you are using either linux or using something like cygwin or minGW to run linux under windows

    simply do this:
    put the CRU data in your working directory with the scripts then running this command will output a file with the yearly temp averaged data:

    ./station gridder.perl | ./make_global_average_ts_ascii.perl >> out.txt

    then you can graph or chart it with whatever you like

    that little character in the middle “|” is the pipe command which sends output to the command following it–standard bash fare on linux 🙂

    works in cygwin and linux, I have tried both

  22. Not Sure said

    STDIO redirection should also work on Windows:

    Might also be useful:

    (I have no experience with PERL express)

  23. Emil said

    “cygwin environment” best idea: you get perl and a decent shell too.

    “Why would you do any sort data crunching in perl?”

    … well, perl had decent OO with Moose, and quite a lot of tools for large datasets, matrices, statistics, infinite precision computations (kind of like BigDecimal in Java, only a lot faster and better designed), bindings for Gnuplot, graphics etc. Check out PDL for matrices (if I am not mistaken, has support for those weird NetCDF files published by a some .gov agencies), Math::GMP for fast decimal calculations.

    Question is why would anyone use Fortran for dealing with datasets with a few thousand lines only …

  24. John Blommers said

    A good workman NEVER blames his tools.

    Please let’s have no Unix/Linux bashing. If we’re analyzing temperature data and reveal a basic lack of understanding of how to use our tools, then the results will not be credible either. Even experts have made hockey sticks using inexpert analysis methods.

    Perl/Python/Bash whatever are fine super stable tools that anybody can use on their favorite platform. R seems to be a very good choice as it is cross platform and GUI-based.

    Merry Christmas!

  25. […] CRU Howto I’ve made some progress finally with Perl. It’s always a pain learning a new language and Perl is no […] […]

  26. denise said

    #20 That’s easy for you to say!:>) I think Big Al Gore is a big wasteful ass and has a lot of nerve. OOPS my bad. Sorry.

  27. Carrick said

    Jeff, nice post and hope you had a good Christmas.

    This code is pretty low-level as you probably have noticed. Just 400 line of code total, and nearly half of those in make_global_average_ts_ascii.perl, which just does a cos-weighted average of the gridded data.

    This is my understanding of the function of the code. Let me know if you see anything wrong with it:

    The gridded average is only over those 5°x5° sectors that have any data for a give year. There is no effort to interpolate into empty regions that contain no data, using e.g., a RegEM approach, and contains no area weighting factors within a 5°x5° sector. Thus if you have 5 stations within 10-km (hypothetically), and one station at a larger range, all of the data are combined as if they have equal weightings, where as in fact, the 5-stations are (or should be) measuring the same thing, and should be effectively treated as one station.

    This of course can be automatically handled with a bit more intelligent algorithm.

    The other thing to be addressed is the correlation length as you move “up” or “down” in latitude versus “right” to “left” in longitude. I would expect the correlation length to be much larger east-to-west than north-to-south, as a result you should use a gridding that reflects this, like 5° E/W x 2.5° N/S.

    Any ideas on how that should be properly handled?

  28. Carrick said

    Also, for anybody who hasn’t parsed all of this, this is a subset of the data, not the entire set (many missing stations are from the US, so it’s virtually certain they have no nondisclosure agreements associated with them).

    It’s the land temperature record only, includes the “valued added (now with zesty lemon)” data only.

    There’s nothing special about any of this code, many of us could have rigged that up in an afternoon.

  29. Bryan H. said

    #27: Convert degrees to spherical coordinates and recompute (considering the sources of error throughout the whole system I doubt the eccentricity of the earth is enough to make that much of a difference).

    Of course you could always map to an oblate spheroid that is more representative…but that adds a lot more terms and who can be inconvenienced with that much math?

  30. Kenneth Fritsch said

    Jeff ID @ #14:

    What I get from KNMI is a temperature data set like CRUTEM3 that I specify, and then with further criteria requests I obtain any bounded latitude and longitude and for any period of time. You can set the % valid data points you want to use for extracting a series.

    The historical numbers of stations used in the CRUTEM3 series and obtained from KNMI, are given in the link below:

    From KNMI you can obtain information for the actual stations used for GHCN adjusted with latitude, longitude, population, start and end dates and time in operation. I assume these are the data used for the CRUTEM3 series. This information is not in the more convenient table form but instead listed as text for each station. I have copied some R code that RomanM used recently that I think can be applied to getting the KNMI text information into a table. For the globe I get 4771 stations that were used historically for at least 10 years time for GHCN adjusted and 7279 stations for GHCN (not adjusted).

    I would like to compare data extracted from KNMI to what others have obtained, like you have here, by more direct means to assure myself that KNMI, with its convenient buttons to be pushed for obtaining data, can be reliably used.

  31. […] to CRU Figure 3, which seems to provide initial verification of CRU accuracy. Remember, in a previous post, we recreated the CRU gridded average from the code and data presented. – Not bad I think. […]

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

<span>%d</span> bloggers like this: