the Air Vent

Because the world needs another opinion

Updated Flow Chart for Antarctic Paper

Posted by Jeff Id on April 6, 2009

This is the updated flow chart by Jeff C, which represents the best guess as to how processing occurred for the data in the Antarctic paper. The entire chart represents software code, of which the only part that has been released is the little circle at the end representing RegEM.

Dr. Steig should have hired Jeff C for this work but I doubt he could afford Jeff’s rates.


RegEM Estimated Data Flow Chart 4-6-09

27 Responses to “Updated Flow Chart for Antarctic Paper”

  1. Ryan O said

    This is fantastic.
    At some point, someone needs to make Steig realize that he should have afforded Jeff’s rates. 😉

  2. TCO said

    The flow chart is good. I remember criticizing Steve for not having one for Mann. (And of course Mann should have had one too, but still Steve spent a bazillion years complaining about various nuances of Mann that almost none of his audience could disaggregate into area of recon.)[/Steve complaint]

  3. kim said

    When will they ehhhhhhhaver learn? 8 beats.

  4. TCO said

    Kim: Why did you go for the bailout, you RINO vixen? You knifed conservatism in the back. Have fun with Cantor and the other Keynesians. Just leave my Republican party. We don’t want your kind.

  5. Jeff Id said

    Eight beats.

  6. Jeff Id said

    #4 TCO, I don’t know your beef with Kim, but since I don’t have my soon-to-be-plagiarized policy created yet, please keep the comments on topic for the posts. BTW, I kinda lost my temper with Chairman Chu earlier; you might want to check it out.

  7. TCO said

    Here you go, Kim. You RINO scum traitor. Republicans in Congress are part of the problem. Have fun sucking up to the Harriet Miers lovers…

  8. TCO said


    If you ask me to behave I will (except when I forget).

    Kim is a little bailout lover. She and McCain knifed conservatism in the back. And now that exactly what I predicted happened, she lacks the balls (but she’s a split tail anyhow) to admit it.

    I know her from JOM. She is very sweet, but not smart. But she tries to tell the other posters to tolerate me.

  9. Jeff Id said


    Kim has left a few comments which somehow catch your attention in such a minimalist fashion… It’s probably true that we could sit and drink beer and complain about government for an entire night without much disagreement but please take it to another thread.

    Jeff C has done an absolutely incredible job of laying out the antarctic process and I’m going to clip every single comment which doesn’t directly relate to it from now on.

  10. Kenneth Fritsch said

    Jeff C, thanks for your efforts. The chart looks just like what the doctor ordered and I look forward to studying it.

    It has always been my contention that we need to see more of these timely reviews in technical blog discussions — knowing full well that most bloggers seem to want to move too fast to do good and comprehensive reviews. I have seen more thoughtful reviews and explanations with the Steig analyses than I recall from the past. I hope it is something that will continue into the future and is simply not something unique to this one topic.

  11. Jeff C. said

    Jeff, thanks for posting this and thanks for the comments. My goal in posting this was as Kenneth suggests; review what we have learned before we forget it. Also note that the stuff on the left side of the heavy dashed line contains some guesswork. I think all the boxes are correct, what might need some adjustment is the order. I have quite a bit more detail on the left side than before, mainly things we figured out while working with the U Wisconsin data set. Any comments as to where I might have things wrong are welcome and appreciated.

  12. Fluffy Clouds (Tim L) said

    Jeff & Jeff,
    great work, but it would be nice to have a bigger or clearer one for us 50s-plus to read.
    (yes i did click it lol)

    (what might need some adjustment is the order. Any comments as to where I might have things wrong are welcome and appreciated.)

    well should there be a line from the cloud data to the enhance box?
    and did you publish the trashed data out puts?
    as i recall ID did do some residuals back a bit.
    are my questions any good at all, or am i blowing smoke out my tail feathers?

    OT links

    For Jeff ID —- this is banned.

    we see kim and tco here…lol
    what got me there was a link from rtomes links any way tammy does a good job of obfuscating
    any link between sun earth temp. using frequency/year? for temps?
    kim // April 5, 2008 at 8:16 pm

    TCO // April 10, 2008 at 12:24 am
    am sorry for wandering topically speaking lol
    but can’t help it when it comes to finding the most construed way to hide the most obvious thing.

    good work, we are too late to stop the house/congress/white house/ world….. but the truth is always a good thing to pursue!

  13. Hu McCulloch said

    Jeff C — Thanks for the update. I have a couple of specific questions for you (and Ryan O) before I write Steig and Comiso back thanking them for cloudmaskedAVHRR.txt and asking for more information —

    First, is the raw AVHRR data available from NSIDC through 2006, or just 2004? Their website says it is only available through 2004. Or can one get this data from U. Wisc? Or did Steig and Comiso have data you can’t get?

    Second, do you have any idea how missing months were infilled to arrive at cloudmaskedAVHRR.txt? Was there a preliminary RegEM step here? Wouldn’t it have already reduced the rank of the matrix, contrary to the full rank of cloudmaskedAVHRR.txt?

  14. Jeff C. said


    “well should there be a line from the cloud data to the enhance box?”

    No, I don’t think so. The “enhanced” step doesn’t use cloud data but compares the temps to the “climatogogical mean” (Steig’s exact words). That is why I show a monthly mean calculation in the box above the enhanced step.
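
    A rough sketch of what such an "enhanced" screening step could look like follows. It is only an illustration under stated assumptions: the function name is made up, the reference mean is taken from the month's own valid days as a stand-in for a true multi-year climatological mean, and the 10 degree cutoff is the +/- 10C figure mentioned elsewhere in this thread, not something confirmed by the released code.

```python
def climatological_mask(daily_temps, threshold=10.0):
    """Drop days that sit too far from the reference mean.

    daily_temps: list of floats, with None for days already cloud masked.
    threshold:   assumed cutoff in degrees C (hypothetical default).
    Returns (mean, kept), where kept has None in place of rejected days.
    """
    valid = [t for t in daily_temps if t is not None]
    if not valid:
        # Nothing measured at all: no mean can be formed.
        return None, list(daily_temps)
    # Stand-in for the climatological mean: the mean of the valid days.
    mean = sum(valid) / len(valid)
    kept = [
        t if t is not None and abs(t - mean) <= threshold else None
        for t in daily_temps
    ]
    return mean, kept


# Toy example: one day was already cloud masked, one day is an outlier.
days = [-30.0, -28.0, None, -45.0, -31.0, -29.0]
mean, kept = climatological_mask(days)
print(mean, kept)
```

    In this toy case the -45.0 day is more than 10 degrees from the mean and gets discarded along with the day that was already masked, so the step needs no cloud data as an input.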

    The “combine four cells to one” and “remove ocean cells” may happen further down the line instead of right before the “enhanced” box. I don’t think that will affect the output.

    As far as the residuals being discarded, Jeff Id’s post where he used 20 PCs as input to RegEM shows what happens when you keep those residuals.

  15. Jeff C. said


    Hu, I believe you can get the data up to 2006 from NSIDC by special request. Ryan has been assembling all of the daily data. It is huge, he said it was over a terabyte in size. UWisc only has data online through 2004, but they may be open to a special request. The UWisc data is really a goldmine as it contains many other parameters beyond temperature. Studying them has helped me understand how the processing algorithm works.

    I don’t know how Dr. Comiso infilled the missing months at the end of 1994. I haven’t found this discussed in any of Comiso’s papers. It is a good question for him. If you do write him, please also ask how they deal with cloud masked days when calculating a monthly mean.

    For example, if 20 days out of 30 are discarded as cloud cover, do they calculate the monthly average using only the 10 remaining days? Or do they infill the missing 20 days somehow and calculate the average?

    This is important as some cells have 85% cloud cover regularly. That leaves only 4 or 5 days a month of actual measured data for calculating the monthly mean.
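
    To make the two possibilities concrete, here is a small sketch of both ways of getting a monthly mean from a cloud-masked month. The function names and the fill value are hypothetical; nothing here is known to match what Dr. Comiso actually did.

```python
def mean_of_clear_days(daily):
    """Option A: average only the days that survived cloud masking."""
    clear = [t for t in daily if t is not None]
    return sum(clear) / len(clear) if clear else None


def mean_with_infill(daily, fill_value):
    """Option B: replace masked days with a fill value, then average all days."""
    filled = [fill_value if t is None else t for t in daily]
    return sum(filled) / len(filled)


# A 30-day month with 20 days discarded as cloud cover.
month = [-20.0] * 10 + [None] * 20

option_a = mean_of_clear_days(month)
# -26.0 is an assumed fill value, e.g. a long-term mean for that month.
option_b = mean_with_infill(month, -26.0)
print(option_a, option_b)
```

    With these made-up numbers the two options differ by 4 degrees for the same month, which is why the choice matters for heavily clouded cells. Option A also has no answer at all when every day in the month is masked.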

  16. Layman Lurker said

    #15 Jeff C.

    “For example, if 20 days out of 30 are discarded as cloud cover, do they calculate the monthly average using only the 10 remaining days? Or do they infill the missing 20 days somehow and calculate the average?”

    Since the NSIDC data has already had the standard cloud mask correction, is it realistic that there would be a further 20 days discarded due to +/- 10C?

  17. Fluffy Clouds (Tim L) said

    thank you jeff,
    No, I don’t think so. The “enhanced” step doesn’t use cloud data but compares the temps to the “climatological mean” (Steig’s exact words). That is why I show a monthly mean calculation in the box above the enhanced step.

    more interpolation, good call.

  18. Paul Penrose said

    At a minimum there should have been a flow chart like this in the paper. Ideally all this preprocessing code that feeds RegEM should have been in the SI. How they let them publish without even that minimal information, I don’t know, because the conclusions are impossible to evaluate without it.

  19. Dev said

    I have a technical suggestion for improving chart readability:

    Jeff, please re-save the source chart in GIF format instead of JPG. The JPG format is great for reducing filesize with color and halftone graphics, but its compression artifacts render line art and pure text horribly fuzzy. Saved as GIF, the filesize for pure line art is reduced tremendously, and sharpness and edge definition are retained.

    Do this for both the preview 600×350 and fullsize 1000×800 web versions.

  20. Earle Williams said

    Jeff C,

    Nice work! I second the request to repost the image in a different format. I don’t know if the GIF patent has expired, but an alternative to GIF is the PNG format (portable network graphic). You definitely do not want to use JPEG for line art graphics, photos and scans only.

  21. Hu McCulloch said

    RE JC #15, Layman L # 16,

    Are there some cell/months when every day is cloud covered, so that just averaging the non-missing days won’t work?

  22. Hu McCulloch said

    RE JC #15, Layman L # 16, Ryan O,

    Which months are missing at the end of 1994? Are they missing entirely, or just some cells?

    Are there some cell/months outside this range when every day is cloud covered, so that just averaging the non-missing days won’t work?

    PS: While GIF might be better for future editions of the chart, I find that I can read this one well enough, just by printing it full page in Landscape mode. (With glasses on, to be sure!)

  23. #22. Hu, the last 3 months of 1994 are entirely missing in the UW dataset.

  24. Jeff C. said

    #22 and #23

    This is a good point. Aside from the last 3 months of 1994 where everything is missing, we should have scattered missing points of various month/cell combinations. Coastal Antarctica has cloud cover around 80% of the time. Surely there are some periods where the clouds never lifted for the entire month. How are these handled? Other months may have only 1 or 2 clear days. Do they calculate a monthly average using so little data?

    They might just infill with the mean for that month; that should be easy enough to check for Oct-Dec 1994.

    Re #16 – yes, this will further reduce the number of days thus adding additional complications to calculating an average.

    Up in comment #14 I said Dr. Steig’s words were “climatogogical mean”. Obviously a typo with one too many “g”s, but for the record, Dr. Steig’s phrase was “climatological mean”.

  25. Kenneth Fritsch said

    The flow chart provided by the inputs of Jeff C and Jeff ID has reminded and helped me to put my thoughts away from the trees and back to the forest in the Steig et al. reconstructions.

    Now I feel obligated to attempt to put in layperson’s terms what I think Steig did (from my (mis)understanding of the Jeff C flow chart) in the TIR reconstruction and then ask what was done for the AWS reconstruction.

    For TIR (AVHRR) the 1982-2006 AVHRR data for all the measurement grids are adjusted for cloud and missing data and reduced to 3 principal components (the first 3). We then have all the grid AVHRR data in a convenient reduced form for the 1982-2006 period. This part is easy to comprehend generally, if not the involved mathematics and statistics. Important to note is that the spatial PC coefficients are provided at this point and that their derivation is not really clear to me.

    We then have the 42 Surface Station data to consider (which are different from the 63 AWS stations) for the entire period of some of their existences from 1957-2006. For the overlap period of 1982-2006 of the surface stations and AVHRR PC1, PC2 and PC3 data, a correlated reconstruction can be constructed whereby the surface stations are related to the AVHRR PC grid data. I assume part of this period was used for construction and part for validation of that construction. RegEM is applied for this process to correlate station data to grid PC1, PC2 and PC3 data for the 1982-2006 period.

    The combining of the above information with the PC spatial coefficients is the least clear to me. The final output shows me that the reconstruction covers the entire 1957-2006 period and does not provide a reconstruction whereby, for the 1957-1981 period, the reconstruction is used in conjunction with the AVHRR adjusted PC data for the 1982-2006 period without using the reconstruction process. Also obvious is that the reconstruction generates another PC 1-3 time series after the combining of the surface station data with the initial PC1-3 series.

    I need my assumption to be clear or clarified that the controlling temperatures in this reconstruction are from the surface stations and that the AVHRR data in PC1-3 form are used to spatially relate the surface temperatures to the AVHRR grids during the 1982-2006 period for later application in relating the 1957-1981 surface station data to the AVHRR grids. To me this would mean that the relationship of the surface stations temperature trends to the AVHRR grids must remain essentially constant for the 1982-2006 and 1957-1981 periods for the reconstruction to work. Does the relationship hold reasonably well during the overlap period?

    I know that most of the effort has been focused on the TIR reconstruction over that for the AWS reconstruction, but I would think a flow chart on that simpler reconstruction might shed more light on the TIR reconstruction and the trend differences between the 2 reconstructions for Antarctica overall and its three regions.

  26. Jeff C. said


    I think your explanation is close, but I have difficulty following some parts. Let me try and explain what I think is going on and how the AWS and TIR recons differ. I apologize if some of this is obvious, but it might not be to others.

    The AWS recon is much more straightforward than the TIR recon. The 63 AWS series and the 42 occupied station series are dumped into RegEM. Despite Steig’s description of the occupied stations as the predictor and the AWS as the predictand, RegEM has no idea which is which. It sees 105 series with missing data points and infills accordingly based on the correlation of the series. The “regpar” option sets the number of principal components used in the infilling. Based on Steig’s method summary, it appears regpar should be set at 3. RegEM iterates until it reaches the stagnation limit or hits the maximum number of iterations limit. The final product is 105 completely infilled series. The infilled 42 occupied station series are discarded and the infilled 63 AWS station series form the AWS recon. Both Jeff and I have duplicated this process and the results are a very close match to Steig’s AWS recon.

    In the TIR recon, the cloud-masked satellite data is first run through a PCA. The output of the PCA is 300 principal components (PC-the time series with 300 entries) and 300 eigenvectors (EV-the spatial coefficients with 5509 entries). If we wanted to get back to the original input data time series for the first cell we would do the following:

    PC1*EV1(cell 1) + PC2*EV2(cell 1) + PC3*EV3(cell 1) + … + PC300*EV300(cell 1)

    Likewise, if we wanted to get back to the temps for all cells for the first month:

    PC1(month 1)*EV1 + PC2(month 1)*EV2 + PC3(month 1)*EV3 + … + PC300(month 1)*EV300

    What Steig has done in his recon is extend the PCs back to 1957 using RegEM. Then he plugs the extended time series into the equation above using the spatial coefficients from the PCA. Of course, he limits the number of PCs used in the equation above to three, discarding the information contained in PC4 through PC300. This is a completely separate issue from the “regpar = 3” setting used in RegEM when infilling. Steig’s method summary does not make clear this distinction (I think deliberately) when describing how the reconstruction was put together.
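
    The effect of cutting the sum off at three terms can be sketched with toy numbers. Everything below is illustrative: the values are invented, the toy has 4 modes, 2 months and 3 cells rather than 300, 300 and 5509, and the function name is hypothetical.

```python
def reconstruct(pcs, evs, month, cell, n_keep):
    """Value for one month and one cell using the first n_keep modes.

    pcs[k][month] is the time series value of mode k.
    evs[k][cell]  is the spatial coefficient of mode k.
    """
    return sum(pcs[k][month] * evs[k][cell] for k in range(n_keep))


# Toy data: 4 modes, 2 months, 3 cells (all values invented).
pcs = [[2.0, -1.0], [0.5, 0.5], [0.1, -0.2], [0.05, 0.0]]
evs = [[1.0, 0.5, -1.0], [0.0, 1.0, 1.0], [1.0, 0.0, 0.0], [0.0, 0.0, 2.0]]

full = reconstruct(pcs, evs, month=0, cell=2, n_keep=4)       # all modes
truncated = reconstruct(pcs, evs, month=0, cell=2, n_keep=3)  # first 3 only
residual = full - truncated  # what the truncation throws away
print(full, truncated, residual)
```

    Keeping every mode returns the original value for that month and cell, while keeping only the first three leaves a residual behind. In the real case that residual is whatever PC4 through PC300 carried.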

    Steig extends the PCs back to 1957 by dumping the 3 PCs into RegEM along with the 42 occupied station series. RegEM then infills the 45 series (3 plus 42) with no idea which is which. The 3 PCs are complete from 1982 to 2006, so RegEM does nothing to them in this period. The 3 PCs are blank from 1957 to 1981, so RegEM infills them based on the correlation with the 42 station series during the 1982 to 2006 period. In an identical fashion to the AWS recon, RegEM is set at regpar =3 meaning that 3 principal components are used during the infilling. Also like the AWS recon, the 42 infilled station data series are an end byproduct of the process, but they are discarded.
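
    A deliberately simplified sketch of the idea of infilling from the overlap period is shown below. It is not RegEM, which is a regularized EM algorithm working on all series at once; this stand-in uses plain one-predictor least squares on a single made-up station series and a single made-up PC, and all names are hypothetical.

```python
def ols_fit(x, y):
    """Ordinary least squares slope and intercept for y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def infill(station, pc):
    """Fill the None entries of pc using its relationship to station.

    The relationship is estimated only where both series have data
    (the overlap period); existing pc values are left untouched.
    """
    overlap = [(s, p) for s, p in zip(station, pc) if p is not None]
    slope, intercept = ols_fit([s for s, _ in overlap],
                               [p for _, p in overlap])
    return [slope * s + intercept if p is None else p
            for s, p in zip(station, pc)]


# Toy series: the station is complete, the PC is blank in the early period.
station = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
pc = [None, None, None, 9.0, 11.0, 13.0]
filled = infill(station, pc)
print(filled)
```

    The point of the sketch is that the early values are entirely determined by a relationship estimated in the later period, so the result is only as good as the assumption that the relationship held before the overlap began.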

    The whole idea of using the original spatial coefficients with the extended time series seems flaky to me. I’m not saying it doesn’t work, but it seems like you need to prove it works before you herald the results as groundbreaking. The nebulous description and dancing around what they did in the methods summary makes me think they knew it was an iffy process.

    In addition to that, the reduction to 3 PCs is problematic, as is using regpar=3 for the infilling. Jeff had a post a few days back where he used 20 PCs as the input to RegEM and had regpar = 10 (the highest setting at which RegEM will work). The trend went to nearly zero.

  27. Kenneth Fritsch said

    Jeff C, thanks much for your detailed reply above. My initial understanding of the TIR reconstruction for the 1982-2006 period was as you outlined in your most recent post on this thread. I thought that my understanding of your flow chart had contradicted my initial understanding and so now I feel I am back on track.

    Your explanation of the AWS reconstruction calibration where the 42 surface station and 63 AWS stations results are thrown together for a RegEM treatment provides a new insight into that process for me.

    I think when one can get one’s mind around the entire process, without having to go back to individual analysis, the understanding of new analysis and the importance of the results becomes less difficult. Your flow chart and post above have helped me get closer to that level of comprehension.

    I now need to go back to the original Steig paper in order to check the authors’ verification processes and any sensitivity testing they did before posing any questions here about those issues.
