the Air Vent

Because the world needs another opinion

Steig’s Code

Posted by Jeff Id on February 5, 2009

[Image: justice]

I have sent several requests to Real Climate asking when the code would be released for the paper “Warming of the Antarctic ice-sheet surface since the 1957 International Geophysical Year”, AKA Steig et al.

I wonder if you know when the data and code for this will be released. If it has, where can I find it?

It doesn’t matter to me if the antarctic is warming or not, but I would like to know the details of this study. I’ve read the paper and SI and it isn’t exactly chock full of detail.

It was of course cut, so I followed it with:

If you wouldn’t mind encouraging your colleagues to publish the data and code used, the review process may gain you considerable support.

I for one wouldn’t be surprised to find the Antarctic was warming, but I need to see the calculations used in order to trust the result. If it looks reasonable, there’s nothing wrong with that. That’s exactly what my blog will say.

Cut again, so I asked again…

gavin,

After having so many reasonable comments cut I need to add something.

You may find working with me instead of actively suppressing my questions to be less troublesome; my blog is more popular every day.

All I really want to do is understand. Mann08 deserved every criticism I leveled at it (and more); you couldn’t force me to put my name on it. It’s rather unfortunate that it was the first climate paper whose data I looked at. I understand now that, despite the high profile of Mann, most papers are of better quality, but how am I supposed to react to a high-profile climate paper like that?

This is a different paper and a different problem. As I have attempted to say, it has every potential for being accurate. Let it out in the light and let’s see.

I realize this will also be cut, but consider my words; I do honor them.

OK, how on-my-knees was that? Well, Eric Steig put this comment up in response to someone else’s question shortly afterward.

[Response: ALL of the data that were used in the paper, and EXACTLY the code used in our paper have been available for a long time, indeed, long before we published our paper. This is totally transparent, and attempts to make it appear otherwise are disingenuous. This has always been clear to anyone that asked. If you wanted to do the work yourself, for legitimate reasons, you could do so. If the point is to “audit” our work, it makes no sense whatsoever to provide all the intermediate products used in our analysis. That would defeat the purpose of the supposed “audit”.–eric]

Look at how he words that paragraph, polyscientician style: ALL of the data and EXACTLY the code. Not ALL the code. A few comments later he put this beauty in the thread.

[Response: I released an electronic version of our data and links to all the original data and code almost as soon as our paper was published. Anyone paying attention would know that. Releasing it earlier would have broken the embargo policy that I signed in agreeing to have my work published in Nature. The embargo policy is designed precisely to avoid the rampant speculation that we were seeing on the web, even before the paper was published. To wit: “This [policy] may jar with those (including most researchers and all journalists) who see the freedom of information as a good thing, but it embodies a longer-term view: that publication in a peer-reviewed journal is the appropriate culmination of any piece of original research, and an essential prerequisite for public discussion.”–eric]

Here’s the link Eric pointed to for his code. It looks like this.

[Image: code-for-antarctic-reg-em-page]

It’s the damn manual for RegEm (a nice link to the original RegEm paper for those who are interested). But it isn’t the code. There’s no code for the Antarctic here; these are functions in a manual. Yes, the function code is here, but there is no description of how the functions were used.

OK, I’m madder than hell by now and my good Irish temper is taking over, so this is what I posted.

You know, if you claim you archived the code, you could actually archive the code rather than a link to the functions which may have been used in the aforementioned code.

Over the last couple of days, I’ve seen Eric accuse people of everything short of fraud, including Steve McIntyre, who does everything in the open. In the meantime, my reasonable requests for data and code have been cut from the threads repeatedly. This man is intentionally deceptive about his openness, and he’s doing it pretty effectively: nobody with questions made it through the thread at RC, and I actually had smart people at CA disagree with me because they believed Eric had archived the code at his link.

Dr. Steig, put up or shut up. Enough bull about posting all the code and the data (including the satellite data). I will call you out to no end and people will know the truth. If your paper is good, they will know – I promise. If not, I also promise.

——————–

Update: I got through. I did, I really did. I sent this comment to Real Climate.

4 February 2009 at 11:21 PM

A link to my recent post requesting again that code be released.
[edit]
I believe your reconstruction is robust. Let me see the detail so I can agree in public.

[Response: What is there about the sentence, “The code, all of it, exactly as we used it, is right here,” that you don’t understand? Or are you asking for a step-by-step guide to Matlab? If so, you’re certainly welcome to enroll in one of my classes at the University of Washington.–eric]

I replied again like this.

I’d love to take your class, but I’m busy running a company.

You point to the RegEm manual for use. I was hoping you could show the setup and implementation used. i.e. The actual code you used. I’m sure you would agree, it is pretty important as far as understanding the result. Also the satellite data set.

If you present this as used and it works as advertised, you’ll find yourselves with a big pile of supporters instead of the ridiculous situation we have now. As I have repeatedly stated, I believe your result simply because warming is true everywhere else.

Jeff

———————–

Banned again at Real Climate.

[edit — thanks for your support dhogaza, but I’m not allowing ‘jeff id’s’ rants to show up here, even if passed on to me by someone else–eric]

———————-

This stuff is so silly, you know it saps you to listen to it. Dhogaza is calling me all kinds of things and Real Climate is not allowing questions; they want me to be a denier. It’s like they’re saying please Jeff please deny our work just don’t make us disclose the methods.


89 Responses to “Steig’s Code”

  1. wattsupwiththat said

    Jeff,

    I found the code, all the matlab modules in the TAR file

    “The program package consists of several Matlab modules. To install the programs, copy the package (available as a tar.gz-file) into a directory that is accessible by Matlab.”

    here is the URL for the TAR file:

    http://web.gps.caltech.edu/~tapio/imputation/imputation.tar.gz

    Unless I’m missing something, that’s it.

  2. wattsupwiththat said

    I’m not familiar with matlab, so I don’t know if these are the files you seek, or simply generic modules. – Anthony

  3. Jeff Id said

    Wow, thanks for trying Anthony. These are the generic modules though.

    My beef is that Steig is claiming they are his code with a link to these modules but he didn’t write any of this code. How he uses these functions is my question. Even then the detail will be challenging to grasp, but at least we’ll know how the functions were used.

  4. dhogaza said

    You:

    My beef is that Steig is claiming they are his code with a link to these modules but he didn’t write any of this code.

    Eric:

    “The code, all of it, exactly as we used it, is right here,”

    Steig doesn’t say it’s *his* code or that *he* wrote it.

    Why do you think he should hold your hand and take you step by step through their analysis?

    (I know, it’s so you can “agree in public” with his paper, but honestly, who cares?)

  5. Jeff, over the past couple of years, Jean S and UC have looked at Mannian RegEM code in connection with Rutherford et al 2005 and Mann et al 2008. I’ve made a category Statistics- RegEM which searches a few of these papers. My guess is that it would make more sense to start with the RegEM of Mann et al 2008 (the EIV section) where code is archived and come back to Steig et al later.

  6. Jeff Id said

    dhogaza, welcome to the air vent.

    I won’t be old enough for the Fields prize till next month. I don’t think it’s unreasonable to ask for the code. Steig tricked (the right word) a lot of people into believing he had actually released his code.

    Why would he do that if he was honest?

    I’m tired now and it’s giving me a screw it all attitude. It’s just a damn joke. Thanks for the link on RC.

    Jeff

  7. Chris H said

    This is absolutely disgusting; these are the most blatant lies & deception I’ve yet witnessed from prominent AGW-supporting scientist(s). What does Eric hope to achieve, apart from making it clear that he & people like him are not to be trusted? Why should I believe his analysis now, if he is actively trying to avoid releasing the code? One can only conclude that there are serious flaws in his analysis.

    For me this is the last straw with Real Climate et al. I was always very suspicious of these people, but now I shall have a hard time believing them, even if they say we have an ice-age coming…

  8. Chris H said

    BTW, for people not savvy with computer programming, it appears that Steig is basically saying “Write the code to perform the analysis yourself, using the standard Matlab functions that I linked to”. In other words “Please take a blind guess at how I did my analysis, because I won’t tell you”. This isn’t how (non-fake) science works.

  9. James Mayeau said

    Somewhere along the line Steig’s declaration that Harry isn’t used in the reconstruction morphed into Steig says the error will result in “minor corrections” that will result in differences in the reconstruction that “are too small to be discernible”.

    Are we going to let him get away with it?

  10. John M said

    Dhogaza #4:

    Dhogaza cites Eric Steig on his Antarctic paper:

    Eric:

    “The code, all of it, exactly as we used it, is right here,”

    Then Dhogaza dissembles:

    Steig doesn’t say it’s *his* code or that *he* wrote it.

    You missed the key deception, dhogaza. It’s the phrase “The code, all of it….”

    And they claim to be the scientists over at RC. It would be sad if it weren’t so pathetic–pretending to be objectively serious about their blog is what destroys their credibility.

  11. Chris H said

    @Jeff Id
    “It’s like they’re saying please Jeff please deny our work just don’t make us disclose the methods.”

    If you were actually agreeing with a pro-GW study, you might appear balanced (i.e. not a “denier”), and we couldn’t possibly have that now, or it would lend weight to your criticisms of the sacred texts of Mann et al.

  12. MrPete said

    By the standards of Reproducible Research, if “all the code” and “all the data” have been made available, then properly installing The Code, and properly linking to The Data should, upon running The Code, reproduce the original analysis outcome.

    Clearly that’s not true, even if you’ve archived The Data as it existed when the paper was published.

    Code is missing.

  13. Demesure said

    Of course code is missing! If not, why all the obfuscation, the hand-waving and the censoring of Jeff’s simplest requests for reproducibility?
    This is corruption of science at the stupidest level by unethical people who behave not like scientists but like propagandists caught lying. Steig is shitting in his pants that his budget for the next Antarctica trip might be cancelled. It’s as pathetic as that.

  14. Layman Lurker said

    Dhogaza, Eric stated on RC that people were being “disingenuous” when maintaining that code was not available. It is clear what skeptics meant by releasing code. Don’t you find that statement to be the least bit curious? Do you not think that Eric understood the distinction before he made the statement? He made no attempt to distinguish empty RegEm from a working file until he gave that snarky response to Jeff last night (even then it was implied – not an outright distinction).

  15. Jeff Id said

    James Mayeau

    James has a good point here too. This is exactly how I read it. They did what they could to minimize damage but made several false statements on the way.

    I think the whole gavin pre-empting Steve on the Harry find incident (the GPSOTHFI file) was designed to make sure that Steve got no credit for finding the error in the Nature paper. Just think how much credibility gavin would have had if he said – I found this problem after noticing a teaser comment on Steve McIntyre’s blog, please correct the record.

    Then he could come out on RC and say bad things about McIntyre playing games and real scientists just need to make the correction. They may have been forced to give a half point to SM but gavin would look a hundred times more credible. Instead they end up looking like childish idiots more interested in playing games.

    Now we have Eric Steig claiming to have released all of the code exactly as used yet not one line of the code he points to came from his hands. Not one number.

    In Steig’s defense I did impute the optical transparency of his outer garments in a previous post but that was after most of the comments above were already rejected at RC.

  16. ed said

    Jeff
    My suggestion is for you to submit a FOIA request as soon as possible to receive the information you seek.

  17. joshv said

    Jeff, I asked about you posting the code and data for your arctic work, and you did not answer. Are you planning on releasing the code for the statistical work you display here?

  18. Jeff Id said

    JoshV,

    I forgot about you. Sorry about that, the code needs a bit of cleaning up. I will make it a priority tonight.

    Would you settle for a link to the CA utilities that my file uses?
    http://www.climateaudit.org/scripts/utilities.txt

    Jeff

  19. Kondealer said

    Jeff, as a lurker at CA, RC and other “climate” sites, I have become increasingly appalled at the behaviour of Schmidt, Mann and more lately Steig. They are “spinning” like politicians and expensive lawyers. “All of the code” is technically correct. But like any statement from politicians/lawyers, the Devil is in the detail.

    Given the “Hockey Stick” saga and RC’s latest pronouncement from on High that the Antarctic warming study is “robust”, I now have serious doubts as to whether in fact it is.

    What I would like to know, in words of one syllable, not RC lawyer-speak, is:
    1) Was Harry’s (faulty) data used in the reconstruction?
    2) If so, what weight (like the Bristlecones) was put on Harry?
    3) Is the reconstruction “robust” when Harry (and God knows what other) inappropriate data have been removed?

    I did ask these questions over at RC, in unconfrontational terms, but was cut.
    Doesn’t matter, I’m sure that you or Steve M will find out.

  20. Jeff Id said

    These guys are really stinging. They’re doing everything they can to claim people are disagreeing with their study. I’ve said not one word in disagreement, McIntyre not one word. Nobody claimed dishonesty in the paper. Nobody even hinted at it, yet this is the drivel Eric Steig posts.

    [Response: #2 and #36 are at the base of the Antarctic Peninsula (on the Weddell side). #60 is ‘Theresa’, not too far from “Harry”. If you were still thinking like McIntyre, you’d accuse me of dishonestly including those two stations which are arguably ‘really’ on Antarctic Peninsula in my average for West Antarctica (they are just south of 72 S). Sorry, but doing that makes the comparison with the satellite data even better. Oh wait, about face! Channeling SM.. hang on a sec…. “Better idea: Steig is right about West Antarctica. It’s the Antarctic Peninsula where we made up data. Yeah, that must be it. The Antarctic Peninsula is cooling!!” …sigh…–eric]

  21. AEGeneral said

    Eric Steig states:

    “If the point is to “audit” our work, it makes no sense whatsoever to provide all the intermediate products used in our analysis. That would defeat the purpose of the supposed “audit”.”

    So is this study not “audited” in the peer-review process? If those who reviewed this study weren’t provided any details of the analysis, then there are some piss-poor standards that need to be changed, because the end-users of these studies enact laws that affect the entire world.

    Surely they were provided more than this. Surely I’m not held to far higher standards as an accountant, and my end-users pale in comparison. You can’t possibly conduct an audit without having any details as to how the analysis was conducted.

  22. joshv said

    Jeff, though I might eventually at some point download your code and play with it, I don’t have any immediate personal interest.

    I was just pointing out that, as a matter of consistency, you should endeavor to be just as transparent and open about your code, data and methods as you are asking Steig to be. Not being critical here – I think you do a very good job of explaining your methodology, but I think putting your code out there would be an excellent starting point for others in the field. It would also allow you to share what you’ve learned about the mysterious incantations of R.

  23. Jeff Id said

    The reason I didn’t post this particular R code before you requested it is that I expect people will be unwilling to download the 1.5 Gb of data and unzip it to actually run the code, but it’s no problem.

  24. Inge said

    Isn’t referring you to the RegEm manual for the source code sort of like mailing a thesaurus to a publisher with the note “All the words used in my magnificent new novel are contained in this documentation. Please publish”?

  25. Carl G said

    #12 is spot on. You should be able to receive data and code, and after changing file directories, hit “run” and receive a result. Why is that so hard to understand?

  26. Layman Lurker said

    OT Jeff, check out David Stockwell: http://landshape.org/enm/

    What is this password protection stuff all about?

  27. Matt Y. said

    Ugh. This is why I can’t read RC, even though I’d like to follow what they are up to and think there are legitimate issues that deserve study. They are as much activists as scientists, and the contempt they have for anybody who is at all skeptical is palpable.

    This thing with the code is totally childish. And what is with the hold your hand and walk you through it crap? I’m guessing you have a life — and better things to do — than try and play some half-ass detective game just to understand what they actually did. Steig would probably like nothing better than to have SM and yourself waste countless hours trying to figure out little details he could easily clear up just by posting a few kb of code. It all looks very political to me. They got the headline they wanted. After being repeated in the media enough times, the talking point will be taken as a given. If it turns out later that the analysis was flawed, what difference will it make? They just need to buy enough time to let the meme take hold.

  28. Layman Lurker said

    re: 26

    Maybe David will do an open post later with results and conclusions of his analysis and refer people to data files and software tools needed to verify the research. 🙂

  29. Jeff Id said

    On Steve McIntyre’s advice, I’m going to try cramming the data through the matlab routines in different formats and try to reproduce the results. Since I’ve never run matlab, it might be interesting.

  30. Jeff Id said

    Here is a partial list of the options I’m going to have to figure out in order to replicate the use of the Matlab RegEm:

    % Field name          Parameter                                   Default
    %
    % OPTIONS.regress     Regression procedure to be used:            'mridge'
    %                     'mridge': multiple ridge regression
    %                     'iridge': individual ridge regressions
    %                     'ttls': truncated total least squares
    %                     regression
    %
    % OPTIONS.stagtol     Stagnation tolerance: quit when             5e-3
    %                     consecutive iterates of the missing
    %                     values are so close that
    %                       norm( Xmis(it)-Xmis(it-1) )
    %                          <= stagtol * norm( Xmis(it-1) )
    %
    % OPTIONS.maxit       Maximum number of EM iterations.            30
    %
    % OPTIONS.inflation   Inflation factor for the residual           1
    %                     covariance matrix. Because of the
    %                     regularization, the residual covariance
    %                     matrix underestimates the conditional
    %                     covariance matrix of the imputation
    %                     error. The inflation factor is to correct
    %                     this underestimation. The update of the
    %                     covariance matrix estimate is computed
    %                     with residual covariance matrices
    %                     inflated by the factor OPTIONS.inflation,
    %                     and the estimates of the imputation error
    %                     are inflated by the same factor.
    %
    % OPTIONS.disp        Diagnostic output of algorithm. Set to      1
    %                     zero for no diagnostic output.
    %
    % OPTIONS.regpar      Regularization parameter.                   not set
    %                     For ridge regression, set regpar to
    %                     sqrt(eps) for mild regularization; leave
    %                     regpar unset for GCV selection of
    %                     regularization parameters.
    %                     For TTLS regression, regpar must be set
    %                     and is a fixed truncation parameter.
    %
    % OPTIONS.relvar_res  Minimum relative variance of residuals.     5e-2
    %                     From the parameter OPTIONS.relvar_res, a
    %                     lower bound for the regularization
    %                     parameter is constructed, in order to
    %                     prevent GCV from erroneously choosing
    %                     too small a regularization parameter.
    %
    % OPTIONS.minvarfrac  Minimum fraction of total variation in      0
    %                     standardized variables that must be
    %                     retained in the regularization.
    %                     From the parameter OPTIONS.minvarfrac,
    %                     an approximate upper bound for the
    %                     regularization parameter is constructed.
    %                     The default value OPTIONS.minvarfrac = 0
    %                     essentially corresponds to no upper bound
    %                     for the regularization parameter.
    %
    % OPTIONS.Xmis0       Initial imputed values. Xmis0 is a          not set
    %                     (possibly sparse) matrix of the same
    %                     size as X with initial guesses in place
    %                     of the NaNs in X.
    %
    % OPTIONS.C0          Initial estimate of covariance matrix.      not set
    %                     If no initial covariance matrix C0 is
    %                     given but initial estimates Xmis0 of the
    %                     missing values are given, the sample
    %                     covariance matrix of the dataset
    %                     completed with initial imputed values is
    %                     taken as an initial estimate of the
    %                     covariance matrix.
    %
    % OPTIONS.Xcmp        Display the weighted rms difference         not set
    %                     between the imputed values and the
    %                     values given in Xcmp, a matrix of the
    %                     same size as X but without missing
    %                     values. By default, REGEM displays
    %                     the rms difference between the imputed
    %                     values at consecutive iterations. The
    %                     option of displaying the difference
    %                     between the imputed values and reference
    %                     values exists for testing purposes.
    %
    % OPTIONS.neigs       Number of eigenvalue-eigenvector pairs      not set
    %                     to be computed for TTLS regression.
    %                     By default, all nonzero eigenvalues and
    %                     corresponding eigenvectors are computed.
    %                     By computing fewer (neigs) eigenvectors,
    %                     the computations can be accelerated, but
    %                     the residual covariance matrices become
    %                     inaccurate. Consequently, the residual
    %                     covariance matrices underestimate the
    %                     imputation error conditional covariance
    %                     matrices more and more as neigs is
    %                     decreased.
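
To make the size of that guessing game concrete, here is a rough sketch in Python (not Matlab, and not the RegEM code itself) of the documented defaults and of the stagnation test quoted in the help text. The names `DEFAULT_OPTIONS`, `make_options` and `has_stagnated` are made up for illustration, and taking the norm as a plain Euclidean norm over a flat list of imputed values is an assumption; nothing here says which settings were used in the Antarctic paper.

```python
import math

# Defaults copied from the help text above; None stands for "not set".
DEFAULT_OPTIONS = {
    "regress": "mridge",
    "stagtol": 5e-3,
    "maxit": 30,
    "inflation": 1,
    "disp": 1,
    "regpar": None,
    "relvar_res": 5e-2,
    "minvarfrac": 0,
    "Xmis0": None,
    "C0": None,
    "Xcmp": None,
    "neigs": None,
}


def make_options(**overrides):
    """Return the documented defaults with the given overrides applied."""
    unknown = set(overrides) - set(DEFAULT_OPTIONS)
    if unknown:
        raise KeyError(f"unknown option(s): {sorted(unknown)}")
    options = {**DEFAULT_OPTIONS, **overrides}
    if options["regress"] not in ("mridge", "iridge", "ttls"):
        raise ValueError("regress must be 'mridge', 'iridge' or 'ttls'")
    # The help text says regpar must be set for TTLS regression.
    if options["regress"] == "ttls" and options["regpar"] is None:
        raise ValueError("regpar must be set for TTLS regression")
    return options


def _norm(values):
    """Euclidean norm of a flat sequence of numbers (an assumption here)."""
    return math.sqrt(sum(v * v for v in values))


def has_stagnated(xmis_new, xmis_old, stagtol):
    """Stopping test: norm(new - old) <= stagtol * norm(old)."""
    diff = [a - b for a, b in zip(xmis_new, xmis_old)]
    return _norm(diff) <= stagtol * _norm(xmis_old)
```

Even in this toy form, a call such as `make_options(regress="ttls", regpar=3)` shows that the regression method and the truncation parameter are choices somebody had to make, and those choices are exactly what a link to the generic modules does not reveal.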

  31. Jeff Id said

    Here’s Eric Steig’s admission that he’s playing games.

    [Response: I do routinely make all our data available, as does everyone else that I know. In this particular case, anyone legitimate who has asked for all our data, including the intermediate steps, has received it. To continue with the analogy with financial auditing, let me very clear on what I mean by legitimate: In the business world, auditors 1) don’t publicly accuse a company of withholding data prior to requesting said data; 2) are not self-appointed; 3) have to demonstrate integrity and competence; 4) are regulated. On this point, if you are suggesting that Steve McIntyre be regulated by an oversight committee, and have his auditor’s license revoked when he breaks ethical rules, then we may have something we can agree on.–eric]

  32. BDAABAT said

    #31: Now we get to the REAL answer! When he says:
    [Response: ALL of the data that were used in the paper, and EXACTLY the code used in our paper have been available for a long time, indeed, long before we published our paper. This is totally transparent, and attempts to make it appear otherwise are disingenuous. This has always been clear to anyone that asked. If you wanted to do the work yourself, for legitimate reasons, you could do so. If the point is to “audit” our work, it makes no sense whatsoever to provide all the intermediate products used in our analysis. That would defeat the purpose of the supposed “audit”.–eric]

    What he REALLY means is, “I want to convey the appearance of transparency, but do so in a way that is anything BUT transparent.”

    So, who is being disingenuous???

    Totally amazing!
    Bruce

  33. mikep said

    Although many of your readers will have seen it before, it’s worth having a look at the policies for submissions to the American Economic Review – one of the premier economics journals. The whole document can be found at

    http://www.aeaweb.org/aer/data.php

    but here is a selection of some salient points:

    “It is the policy of the American Economic Review to publish papers only if the data used in the analysis are clearly and precisely documented and are readily available to any researcher for purposes of replication.” and later

    “For econometric and simulation papers, the minimum requirement should include the data set(s) and programs used to run the final models, plus a description of how previous intermediate data sets and programs were employed to create the final data set(s). Authors are invited to submit these intermediate data files and programs as an option; if they are not provided, authors must fully cooperate with investigators seeking to conduct a replication who request them. The data files and programs can be provided in any format using any statistical package or software. Authors must provide a Readme PDF file listing all included files and documenting the purpose and format of each file provided, as well as instructing a user on how replication can be conducted.”

    It’s a pity Nature can’t meet the standards of the best economics journals, and that scientists take umbrage when they are asked to do what leading economists have to do…

  34. BDAABAT said

    One followup: the response listed in #31 is an admission that he lied. There isn’t any other way to state it. HE LIED. This isn’t parsing language. He said all the code was available, it was exactly the code used, and that the process was “totally transparent”. This is a demonstrable lie, one that he now admits. Would bet that his employer has specific ethics policies on what constitutes appropriate behavior, and I’d bet that what he’s demonstrated doesn’t meet the standard. I’d bet if one of his grad students engaged in similar behavior, that he would do something about it.

    Bruce

  35. AEGeneral said

    Eric Steig states:

    “In the business world, auditors 1) don’t publicly accuse a company of withholding data prior to requesting said data; 2) are not self-appointed; 3) have to demonstrate integrity and competence; 4) are regulated. On this point, if you are suggesting that Steve McIntyre be regulated by an oversight committee, and have his auditor’s license revoked when he breaks ethical rules, then we may have something we can agree on.”

    And also in the business world, when management lies to the auditors or does not provide the information requested in its entirety, the risk for fraud is raised and auditors perform more extensive & unpredictable testing to compensate for the additional risk.

    If this guy is so sure this study is correct, he should stand behind his work & provide full disclosure to anyone who requests it. I thought that’s what science was all about.

    And accountants aren’t regulated. I hate when uninformed people say that.

  36. #28: I am still marking up the text, so I am holding it for a few days.

  37. Layman Lurker said

    I figured it was likely something like that. I couldn’t resist having a little fun with it though. Hope you don’t mind. BTW, I enjoy reading your blog – particularly appreciate that skeptics and warmers can have an open discourse with the focus on science.

  38. mack520 said

    I am sure it is incredibly frustrating. On the other hand you must (I hope) take some considerable pride in your efforts. They turned the dHog on you and he was as polite and respectful as I have ever seen him be.

  39. John F. Pittman said

    Well, in our company you lie in a contracted audit, it is your job. Lie to a government audit or government required audit; and it is your job and possible jail time. You will almost certainly be fined and fired. Not providing requested documents, unless it is somehow impossible or nearly so, is acceptable. Not providing documentation that is available in a timely manner can lead to fines and very increased oversight and requirements. Company management make sure that those who are questioned, or asked to provide data, understand the rules. Makes you wonder where the management of Gavin and Eric are? Since Obama has indicated that he considers FOIA the public’s way to do their checking on monies spent in their name, I can’t believe that the political management will be happy if this explodes into the scientific version of the “Daschle” event, with humorous overtones. Lying first then admitting the truth. Claiming to provide all the code, and then not doing it, and eventually admitting such. Perhaps Jeff should consider sending a complaint to OMB. They are the departmental agency that has the legal authority to “grade” how an agency implements FOIA and other requirements such as Data Quality Act, and proper use of contractors and the contractor’s requirements.

  40. John Norris said

    I tossed Gavin a softball on the subject of archiving data on the 27 Jan Antarctic thread and he answered it.

    John Norris Says:
    2 February 2009 at 9:46 PM
    Has Steig archived all code and data used in the Steig et al paper to a publicly available website? Or did he just provide a reference to various sites holding the data (that can get revised)?
    [Response: You raise a good question. Steig’s archiving is at http://faculty.washington.edu/steig/nature09data/ and you can see that the data sources are referenced to the originating organisations (who can and do update data, fix errors etc.). Ideally, for ‘movable’ datasets, one would want a system where snapshots in time were recoverable (and citable), along with pointers to the up-to-date versions, forward citation to publications that had used various versions and the ability to update analyses as time went on. What you don’t want is mostly duplicate data sets that aren’t maintained floating around in the grey zone – that will just lead to confusion. Google were actually working on such a system, but have unfortunately lost interest. Other organisations such as BADC are thinking along those lines, but it is a sad fact that such a system does not yet exist. – gavin]

    My reply to his reply did not pass moderation though, despite three attempts:

    re 160

    “… What you don’t want is mostly duplicate data sets that aren’t maintained floating around in the grey zone – that will just lead to confusion. …”

    I am pretty sure that I don’t want important research that can’t be replicated. But thanks for your recommendation for what you think I don’t want.

  41. Jeff Id said

    John,

    I read your post yesterday. The “moderation” is too much for me.

    #38 Mack,

    I appreciate the support. However, there is no pride for no result. The dHog (good name) has been after me for some time at Tamino as well. He won’t engage in a non-moderated venue.

  42. Molon Labe said

    Jeff,

    If you don’t have access to Matlab, you might see if the RegEM package will work using the freeware knockoff of Matlab called Octave. It is available as part of the Cygwin distribution, which is a port of GNU utilities to Windows.

  43. Molon Labe said

    A quick cut at using RegEM with Octave: The code appears to use ARPACK routines which I don’t currently have installed. Will revisit when I have some time.

  44. Molon Labe said

    Also, here is the official site for Octave.

  45. Jeff Id said

    #44 thanks for the help. I have a copy of matlab.

    I’m working on cramming the temp timeseries through it right now.

  46. mack520 said

    “The dHog (good name) has been after me for some time at Tamino as well. He won’t engage in a non-moderated venue.” Uh, yeah, he won’t. Has a tendency to manipulate the truth, also.

  47. mack520 said

    Hey, I hope you will excuse my profanity. I have a box of Fortran books; can I send them to you? Will that help? Somewhere someone said (paraphrased), “If I send a publisher a copy of Roget’s with my brilliant novel inside, all the words are there.” This is so stupid, so indefensible, so outrageous. See ya in 3 months.

  48. I agree with you. This was shabby treatment indeed, and foolish as well. See my blog for more.

  49. Chris H said

    Steig said: “anyone legitimate who has asked for all our data … has received it. … what I mean by legitimate: … 1) don’t publicly accuse a company of withholding data prior to requesting said data; 2) are not self-appointed; 3) have to demonstrate integrity and competence; 4) are regulated.”
    My translation of those points:

    1. Public requests (such as blog comments) for said code+data count as accusations of withholding, and will be mis-used against you.

    2. Anyone who we don’t like is “self-appointed” and will be ignored.

    3. Anyone who is publicly open (such as on a blog) about their investigations of our papers has too much integrity & competence for us to deal with them; we only like to deal behind closed doors.

    4. Anyone who is not in the official Climate Scientists old boys club (aka Team) will be rejected as being “unregulated”. We “regulate” each other, and therefore have nothing to worry about.

  50. Matt Y. said

    #49:

    My translation of that quote was:

    “what I mean by legitimate” == “anybody who blogs for Real Climate”

  51. davidc said

    I had a fleeting encounter with Matlab a few years ago. What I recall is that it operates in two modes. One is like conventional code in other languages: you write the code, then hit RUN. In the other mode you enter commands at the prompt as you do with a calculator. Maybe Steig is referring to the second mode and his totally transparent revelations about what he did is not so much the dictionary that contains every single word he said as the circuit diagram for his calculator that does “+” etc.

  52. Jeff Id said

    David,

    You’re right about Matlab, but there are a pile of options and possibilities for how to set up RegEM. Most of it is outlined in the paper, but I have several questions about how the data was processed for RegEM. Last night I did some reading and, if I had to guess, there are at least a hundred lines of code used for implementation of this version of RegEM.

  53. As an outsider to this debate, I believe the issue of “codes and data” must be stated more clearly. If I understand the situation correctly:

    (1) All the input data to Steig 2009 is available from the original sources. Some (all?) of these sources have zero or inadequate version control, so in the future only grey data will be available. Thanks to McIntyre’s note to BAS, they at least have archived recent changes.

    (2) Replication of the results in Steig 2009 requires knowledge of the processes used, meaning some combination of code and intermediate work results. Ideally full archiving of both should be made available to the public.

    (3) But Steig has here given his definition of who deserves access to this material. In his words, who is “legitimate.” To Steig only a very few should get access to key information on which public policy will be made – policies that might determine the fate of the world (as Al Gore has told us).

    Bringing the issue of access to the public’s attention might be the most important result of Steig 2009.

  54. Jeff Id said

    #53 apparently we’re all outsiders to the debate.

  55. John F. Pittman said

    Outsiders expected to foot the bill. I wonder where that “ownership” speech politicians always give when wanting us to “buy in” is going to occur? It makes me wonder just what introduction is going to be used. If you perceive Chu’s remarks as one of those political “feelers” to test the waters, it makes you wonder just where Obama and his team’s heads are. Jeff, since you appreciate honest, heartfelt remarks, I hope you don’t mind me saying that it looks like Obama’s team have their collective heads up their collective asses. I get this feeling every time I see this data refusal unless you are “in the club.” I wonder if Obama and crew are “in the club”? I mean they aren’t accepted climate scientists. Maybe Steig should email Obama and tell him to stick it “where the sun don’t shine” if Obama thinks he can get all of HIS code without becoming a “real climate” scientist. Just a daydream on a Saturday morning.

  56. #54 — To some (most?) members of the climate science fraternity (or priesthood), it seems we are all outsiders. Unless we force open the doors. Which should be possible. Their position is popular among the true believers at RealClimate, but IMO is unlikely to gain public support. Esp considering how much of this research is funded by the government.

    For such work, to what extent is Steig’s position consistent with the policies of the various government funding agencies and the Freedom of Information Act?

  57. Garacka said

    I’m just speculating, as I could be out of my league, but the phrase in Fabius Maximus’s comment of February 7, 2009 at 7:58 am, “… some combination of code and intermediate work results,” was what I was thinking. I’ll state it another way….

    I wonder if Steig, in fact, has no other code, because the only connection between the Matlab functions being used were “manual” intermediate steps. In other words, this is not one turn key program. What is needed is Steig’s intermediate steps description.

  58. Jeff Id said

    #57, there is no chance that there aren’t at least a hundred preparation lines for this code and graph generation which have not been disclosed.

    Another thing not disclosed is the version of the AVHRR satellite data used including processing steps.

  59. Tim L said

    WOW! This is indeed the corruption that is ingrained into our bureaucratic mess we have built over time. Whew!

  60. Tim L said

    Jeff, LOOK AT THIS RANT!!!
    # Michael Tobis Says:
    6 February 2009 at 2:00 AM

    Eric, you snark: “What is there about the sentence, ‘The code, all of it, exactly as we used it, is right here,’ that you don’t understand?”

    I don’t understand how you think that could be true. You link to a nicely documented and from all appearances elegant library of matlab functions. Where are the data files? Where is the script that invoked those functions and plotted those graphs?

    There is absolutely no substantive reason this should not be distributed along with the relevant publication. You shouldn’t be sniffing at people who want to replicate your work in toto. You should be posting a complete makefile that converts input data into output data. This is common practice in exploration seismology, thanks to the example of John Claerbout at Stanford University, and that in a field where there are sensible commercial reasons for confidentiality. A related effort, called Madagascar, is being developed at U Texas and is 100% open source.

    The paradoxical backwardness of science in regard to adopting the social lessons of high tech is well analyzed in this blog entry by Michael Nielsen.

    RC again climbs on its high horse, doing none of us any good. You guys are the good guys. Please act like it.

    [Response: Michael, with all due respect, you are holding climate science to a ridiculously high ideal. i.e. that every paper have every single step and collation available instantly upon publication for anyone regardless of their level of expertise or competence. I agree that would be nice. But this isn’t true for even one of the 1000’s of papers published each month in the field. It certainly isn’t a scientific necessity since real replication is clearly best done independently and science seems to have managed ok up til now (though it could clearly do better). Nonetheless, the amount of supplemental information has grown enormously in recent years and will likely grow even more extensive in future. Encouraging that development requires that the steps that are being taken by the vanguard be praised, rather than condemned because there are people who can’t work out how to set up a data matrix in Matlab. And what do we do for people who don’t have access to Matlab, IDL, STATA or a fortran 95 compiler? Are we to recode everything using open source code that we may never have used? What if the script only runs on Linux? Do we need to make a Windows GUI version too?

    Clearly these steps, while theoretically desirable (sure why not?), become increasingly burdensome, and thus some line needs to be drawn between the ideal and the practice. That line has shifted over time and depends enormously on how ‘interesting’ a study is considered to be, but assuming that people trying to replicate work actually have some competency is part of the deal. For example, if there is a calculation of a linear trend, should we put the exact code up and a script as well? Or can we assume that the reader knows what a linear trend is and how to calculate one? What about a global mean? Or an EOF pattern? A purist would say do it all, and that would at least be a consistent position, even if it’s one that will never be satisfied. But if you accept that some assumptions need to be made, you have a responsibility to acknowledge what they are rather than simply insist that perfection is the only acceptable solution. Look, as someone who is pretty heavily involved in trying to open out access to climate model output, I’m making similar points to yours in many different forums, but every time people pile on top of scientists who have gone the extra mile because they didn’t go far enough, you set back the process and discourage people from even doing the minimum. – gavin]

  61. Tim L said

    Jeff, lol I like this…. and they CUT YOU OUT!
    if they would have just given you the answer, it would be a non issue!

    # Jason Says:
    6 February 2009 at 10:07 AM

    Gavin,

    In response to #53 you say that:

    “you are holding climate science to a ridiculously high ideal. i.e. that every paper have every single step and collation available instantly upon publication for anyone regardless of their level of expertise or competence”

    I agree that this is a very high standard, and one that is not realistic.

    But it IS important that sufficient information be released that experts in the field can replicate your results.

    Pointing somebody to the Matlab RegEM package and to the original source data is most certainly not sufficient.

    [Response: For someone who is an expert? For sure it’s enough. I might post on a recent replication exercise I did and how it really works, rather than how people think it should work. – gavin]

    [edit – Eric isn’t here to discuss this, and so lay off that until he is.]

  62. Jeff Id said

    Tim and everyone,

    I have to thank you for the support on this. Eric Steig and Gavin did it to themselves. Gavin has been cutting posts on this for the last two days rather than just reveal the apparently non-existent code.

    I’m apparently not qualified to see which numbers were put where so until I get a climatology degree (whatever that is) I couldn’t possibly understand the complexities of LINEAR ALGEBRA. Well what can I say besides I’m guessing the graduate differential equation classes and software I have studied and written as a lowly aeronautical/optical engineer kept up well with um… climatology math..

    If you look at the broken nature of the RC threads you can kind of see that people are being cut by the dozens.

    If we don’t give up, they might just break. Of course, I don’t expect truly full complete disclosure at this point and I do expect plenty of mocking when the code is finally let go but … who cares 🙂

    Again, thanks for your support in doing what’s right.

  63. Tim L said

    last one i hope. this is a snark on Anthony.

    caerbannog Says:
    6 February 2009 at 10:51 AM

    IIRC, What started out as a skeptic’s (John V’s) critical look at GISSTemp turned out to be a nice confirmation of the quality of the GISSTemp product. And that, as I recall, kinda took the wind out of surfacestations.org’s sails.

  64. Tim L said

    no one more
    http://faculty.washington.edu/steig/nature09data/
    this is everything in list that they used.
    BUT NOT how it was used, and the setup.
    very devious indeed!
    I think you should do your own work… not try to “review” their work..
    get published in popular science !!!!!!
    you are better than they are… that is why the hate is coming out from them.
    uhmmm

  65. Jeff Id said

    #64 Thanks Tim, I’ve been to that link many times. I’m no climatologist but I can understand their papers (when they’re disclosed) and after my plot on sea ice trend, I’m changing my tune. Instead of my early belief that the result of this analysis is actually correct, I no longer believe the temperature trend is as severe as they claim and it may not even be upward.

    I’ve got Matlab but have several problems. I can’t figure out the AVHRR data series they used. I don’t know the values used for truncation of the ttls option and can’t find any reference to individual station weighting. It looks like I need to make it up for RegEM.

  66. Based on the discussion of this thread, perhaps the post should be revised. The data is available; the methods (code, intermediate work product, etc.) are not. Disclosure of the statistics package used is of little help.

  67. Jeff Id said

    The AVHRR regridded data is not available. Station weighting data seems like it’s also not available (not sure if or how this was done).

  68. How Steig et al re-gridded and weighted the data are details of the process, not the input data. I do not understand the insistence on saying “the data is not available”. It is at best an unclear, even weak, assertion, and it allows Steig to point to the source data as an effective rebuttal.

    A focus on disclosure of methods is IMO both more accurate and understandable to the lay public. How can we have confidence in results with so little description of the process by which they were produced? Also, so little detail makes replication difficult.

  69. Jeff Id said

    Please point me to the 50 x 50km AVHRR TIR satellite data and I will stop making the statement immediately.

    How it went from 1.5 km to 50 km may not be a terribly important point, but the 50 km data appears to be what was used to determine that only 3 PC curves were required to represent an entire continent.
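To illustrate why the regridding step is a method in its own right, here is a minimal Python sketch of one possible approach: plain block averaging that skips missing (for example cloud-masked) cells. This is not the procedure used in Steig et al., which has not been described. The grid sizes, the factor of 4, and the function name `block_average` are assumptions made purely for illustration.

```python
import numpy as np

def block_average(fine, factor):
    """Average non-overlapping factor x factor blocks, ignoring NaN cells."""
    rows, cols = fine.shape
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be multiples of factor")
    # Split each axis into (coarse index, position inside the block).
    blocks = fine.reshape(rows // factor, factor, cols // factor, factor)
    return np.nanmean(blocks, axis=(1, 3))

# Hypothetical 8 x 8 fine grid with one missing cell, reduced to 2 x 2.
fine = np.arange(64, dtype=float).reshape(8, 8)
fine[0, 0] = np.nan
coarse = block_average(fine, 4)
print(coarse)
```

Swapping the average for a median, changing how missing cells are treated, or weighting by area would each give a different 50 km data set from the same raw input, which is why the choice needs to be stated.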

  70. I do not understand your question. Do you believe Steig and company started with something other than the posted AVHRR satellite data? Or are you asking how he re-gridded the posted data?

  71. Follow-up to #70: If I correctly understand, Steve McIntyre says that Steig 2009 “processes” the public AVHRR data:

    “It {AVHRR} is twice daily information on different grids than used in Steig. Steig processed the data conditioning it for cloud cover in a different way than their predecessors. For the statistic analysis in Steig, they would have prepared a data set of monthly AVHRR results. Steig says that he provides this information to “legitimate” researchers, but refused to provide it to me.”

  72. Jeff Id said

    First, there is no 50 x 50 data provided at the link. This means the regrid of lower resolution data required a method.

    It is probably a minor point but as I am figuring out this paper, the satellite 50×50 was PCA’d into 3 total trends. I would love to run the PCA’s and accept more degrees of freedom to see just how much the other pc’s actually change the analysis. If they don’t this might still be a good paper.
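The check described here, how much is lost by keeping only 3 PCs, can be sketched in a few lines of Python. This is a minimal illustration on synthetic data, not the paper's calculation: the array sizes, the five made-up spatial patterns, and the function name `explained_variance` are all assumptions.

```python
import numpy as np

def explained_variance(data):
    """Cumulative fraction of variance captured by the leading PCs.

    data: 2-D array, rows are time steps, columns are grid cells.
    """
    anomalies = data - data.mean(axis=0)
    singular_values = np.linalg.svd(anomalies, compute_uv=False)
    variance = singular_values ** 2
    return np.cumsum(variance) / variance.sum()

rng = np.random.default_rng(0)
n_months, n_cells, n_modes = 300, 50, 5
# Synthetic field: five spatial patterns of decreasing strength plus noise.
patterns = rng.standard_normal((n_modes, n_cells))
strengths = np.array([5.0, 4.0, 3.0, 2.0, 1.0])
amplitudes = rng.standard_normal((n_months, n_modes)) * strengths
field = amplitudes @ patterns + 0.5 * rng.standard_normal((n_months, n_cells))

cumulative = explained_variance(field)
for k in (3, 5, 10):
    print(f"first {k} PCs: {cumulative[k - 1]:.1%} of variance")
```

With the actual 50 x 50 km monthly grid in place of `field`, the same curve would show how much variance the fourth and later PCs carry, though variance explained alone does not say whether those PCs change the trend.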

  73. Jeff Id said

    #71 absolutely.

  74. I do not consider this a pedantic point. The data used is, as Steig says, available to all. Your own post (#72) says that you would like to know the “method” used, not the original data.

    The reality of the situation requires high precision when questioning authorities like Steig. Like Caesar’s wife; in practice you have to be better than them. Stating that you need “the data” IMO does nothing but provide an easy way to discredit you. He points to the data, QED.

  75. Jeff Id said

    I don’t see the data, I see a link to a site with twice daily data which allegedly can provide data in a completely different format with literally an infinite number of possibilities for conversion. I don’t see any description of how some data was turned into other data-a pretty important step.

    I want to agree with you but I can’t.

  76. Well, perhaps that accounts for the difficulty you have communicating your conclusions to a larger audience. Your own posts repeatedly speak of methods by which this source data is processed. As in “how some data was turned into other data – a pretty important step”, and how the source data offers “literally an infinite number of possibilities for conversion.”

    I believe Steig has an airtight reply in this specific dispute. I am sympathetic to your overall position about Steig 2009 (not an expert, I have no opinion), and have written extensively about climate skeptics’ work — but even I believe Steig is 100% correct about the data’s availability.

  77. Jeff Id said

    I don’t think you understand what I’m saying. The first thing we should agree on is that the net trend is a small but discernible fraction of the noise level, this means small changes in method make a big difference in outcome.
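A minimal Python sketch of what "a small trend against a large noise level" looks like in practice: an ordinary least squares slope and its standard error on a synthetic series. The trend of 0.01 per year, the noise level of 1.5, and the function name `trend_with_stderr` are assumptions for illustration, not values from the paper. The standard error assumes independent noise; real temperature series are typically autocorrelated, which makes the true uncertainty larger.

```python
import numpy as np

def trend_with_stderr(t, y):
    """Ordinary least squares slope and its standard error (white-noise assumption)."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    t_centered = t - t.mean()
    slope = (t_centered @ (y - y.mean())) / (t_centered @ t_centered)
    intercept = y.mean() - slope * t.mean()
    residuals = y - (intercept + slope * t)
    dof = len(t) - 2
    stderr = np.sqrt((residuals @ residuals) / dof / (t_centered @ t_centered))
    return slope, stderr

rng = np.random.default_rng(1)
years = np.arange(600) / 12.0   # 50 years of monthly values
true_trend = 0.01               # per year, assumed for this example
series = true_trend * years + rng.normal(0.0, 1.5, years.size)

slope, stderr = trend_with_stderr(years, series)
print(f"trend = {slope:.4f} +/- {stderr:.4f} per year")
```

When the standard error is a sizeable fraction of the slope, as it is for numbers like these, modest changes in how the input series is prepared can move the estimated trend by a comparable amount.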

    You can have substantial processing prior to utilizing the “data”. This creates entirely new and completely undisclosed data by which three PC’s are determined.

    Steig has the “data” on his computer which for some ‘very unusual’ reason he refuses to provide. A link to the raw data which went through undetermined and unprescribed processing is pretty important in this paper.

    If I run a PCA analysis on the sat data as it is and I find that 3 pc’s are not sufficient for processing the antarctic, Steig can hide behind the fact that my data is somehow different. People will say publish it and there will be two papers, one from Jeff Id and one from Steig. As you can see climatology is so politicized that there is no possibility my paper would have any weight and I will have wasted a month of my life.

    If I run the data as he has it, and find that 3 pc’s work for his set and 3 don’t work for the full set, we have resolution. If I run the data which I hope to receive from the NSIDC (I have applied for it) and I come to a different result, we’ll all be wondering what’s happening. Did Jeff mess up or did Steig? If I have Steig’s data and it performs as advertised, there is no controversy, the antarctic is warming rapidly and humans are in trouble.

    So, you need to ask yourself, why wouldn’t Steig release his satellite data? He has the bandwidth available to him. Why doesn’t he release his code? The bandwidth is also available. What possible motivation would prevent the disclosure of his methods and data? Even if I’m hell bent on badmouthing his work, what effect would that have on the result?

    BTW, this one was a bit irritating,

    Well, perhaps that accounts for the difficulty you have communicating your conclusions to a larger audience.

    BTW: Possibly a thousand people will read your comment.

    My blog is 6 months old and has grown rapidly. I run at about 1/10th of CA for clicks but about 1/100th for comments already and SM is basically famous in the field. That’s not too bad for an engineer who first started looking at the actual climate data only 6 months ago. My readers get used to my style, they understand that I can be wrong and will even admit it. They also know that I don’t beat around the bush with my opinions.

    I would suggest that rather than me missing the message, there are perhaps nuances in the history of the team and in the art of programming that in this case you may be missing.

    ‘art of programming’ – never thought I’d say that. 🙂

  78. Layman Lurker said

    FL, I don’t think referring to processed “data” as “method” will really change much in the debate between skeptics and warmers or even the world of RC.

    On what is data vs. what is method: temp anomalies are not raw measured data. They are a data set that is derived from raw measurements and calculations. But it is still a data set.

  79. Layman Lurker said

    Sorry, that should be FM, not FL.

  80. An editor at RealClimate has stepped into this debate and declared you the winner.

    Excerpt from Comment #63 to “On Replication” by Nicolas Nierenberg, 9 February 2009 at 10:34 AM (bold emphasis added):

    “… Other than a couple of people the conversation seems to be converging on the fact that providing the code and data is preferable. Dr. Schmidt has said that this will be done in the case of the Steig paper. [editor note: this was done in the case of Steig et al with respect to code though perhaps not with as much hand-holding as you seem to want. some of these data are proprietary (NASA), but will be made available in the near future]”

    Congratulations for some great detective work, especially your perseverance through the repeated assurances by Steig and others about the code and that “All of the data used in the temperature reconstructions are from publically available data sources.” (source)

  81. OK, let’s try again. It appears that the RealClimate folks have no consistent definition of “data.” The latest spin about Steig 2009:

    “[reply: the raw data are public; the processed data (i.e. cloud masking) are not yet, but will be in due course. so relax]”

    OK, you have your orders. Relax. The authors will release this vital information when they are good and ready to do so.

  82. curious said

    This inline to AMac made me smile at CA today:
    ************************************************************
    Steve Mc: The “matlab” exchange is here (link to RC “Antarctic Warming is Robust” thread)

    jeff Id says:
    4 Feb 2009 at 11:21 PM

    A link to my recent post requesting again that code be released.
    [edit]
    I believe your reconstruction is robust. Let me see the detail so I can agree in public.

    [Response: What is there about the sentence, “The code, all of it, exactly as we used it, is right here,” that you don’t understand? Or are you asking for a step-by-step guide to Matlab? If so, you’re certainly welcome to enroll in one of my classes at the University of Washington.–eric]

    The linked code pointed to a subroutine that was used in Steig’s calculation but the subroutine did not constitute ALL the code. Nor as of Feb 4, 2009 was Steig’s data fully available. With the benefit of hindsight, I’m wondering what precisely within the jeff Id comment Steig considered to be “snide and inaccurate”? Did Steig take umbrage at the suggestion his reconstruction was “robust”?
    ************************************
    AMac’s comment is worth reading for the context:

    http://climateaudit.org/2011/09/08/more-on-dessler-2010/#comment-302658

  83. Layman Lurker said

    I caught that too, Curious. A little thin skinned methinks.

    To my knowledge, the S09 cloud masking algorithm is still not archived, which has always gnawed at my craw. Comiso’s piece should have been submitted for peer review separately prior to S09. I started looking into the processed AVHRR data, comparing spatial correlations of surface stations with the corresponding grid points of Comiso’s data. It is unfinished (hopefully I will pick up on it again this winter when I have time), but I have done enough to see that the time lagged cross correlations between stations are not homogeneous between data sources.

  84. curious said

    83 – Layman – hmmm, I think it is more a case of misrepresentation by Steig than thin skin. The context is that Mosher was discussing due diligence and an obvious component of this is to provide data and code so your work can be properly scrutinised. In response Steig chooses to pick up Mosher’s reference to his Matlab jibe and give it a new spin that the history of the incident doesn’t support. The thing that made me smile was SteveMc’s take on the source of the offense 🙂

    btw – it also reminded me of the Corrigendum issue with that paper and I wonder if that would be an appropriate mechanism for Dessler to consider?

  85. Layman Lurker said

    You might be right Curious. I had the same reaction when I first read the comment. But giving him the benefit of a doubt, I thought maybe he thought Jeff was being sarcastic and facetious by saying S09 was “robust”. I thought it was a “thin skinned” interpretation if that was the case.

  86. kim said

    However Jeff meant it, if Eric interpreted it as snide, then he knew his work was not robust.

    The psychology of the Team is an open book, and you know it when you see it.
    ==============================

  87. Mark T said

    Indeed, Kim, that much should be obvious to anyone that read the exchange. His follow-up lashing of Steve Mosher after making what amounted to a glib joke regarding the whole affair was further evidence of his own insecurities regarding his work. He is protesting (and defending) a bit too much to be believed as sincere.

    Mark

  88. Kenneth Fritsch said

    83.Layman Lurker said
    September 9, 2011 at 7:26 pm

    LL, I did some breakpoint analyses of the Antarctic temperature series, and from these results one can see that RLS and EW from O(10) are closely related with regards to breakpoints, and that S09 and AVHRR are closely related to each other but not to RLS and EW. RLS and EW are closely related in these regards to the ground stations. I finally did a similar breakpoint analysis of the UAH coverage of Antarctica and found that it corresponded breakpoint-wise much better with RLS and EW than with S09 or AVHRR. One of the major breakpoints in the S09/AVHRR data corresponded to a change in a satellite used to obtain the AVHRR data.

    I linked some of this work over at Nick Stokes’ blog but did not get any responses. I was attempting to separate out the effect of the AVHRR measurements and then determine whether it was a major factor in the difference between the S09 and O(10) results. At this point I have convinced myself that that is the case and that there is strong suspicion that the AVHRR data contains artifacts. Remember that O(10) used the AVHRR data primarily for spatial correlations and not with time.

    When you dig into the data you also find that the ground station data is very sparse and particularly so for the land area of West Antarctica.
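For readers unfamiliar with breakpoint analysis, the simplest version of the idea can be sketched in Python: find the single point that best splits a series into two segments with different means. This is only an illustration on synthetic data and not necessarily the procedure used for the results above. The function name `best_mean_shift`, the step of 1.0 at index 60, and the noise level are assumptions.

```python
import numpy as np

def best_mean_shift(y, min_segment=5):
    """Index k that best splits y into two constant-mean segments y[:k] and y[k:]."""
    y = np.asarray(y, dtype=float)
    best_k, best_sse = None, np.inf
    for k in range(min_segment, len(y) - min_segment + 1):
        left, right = y[:k], y[k:]
        # Total squared deviation from each segment's own mean.
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_k, best_sse = k, sse
    return best_k

# Synthetic series with a step of +1.0 at index 60, e.g. a change of instrument.
rng = np.random.default_rng(2)
series = rng.normal(0.0, 0.3, 120)
series[60:] += 1.0
print("estimated breakpoint index:", best_mean_shift(series))
```

A breakpoint that shows up in one data source at the date of a satellite change, and not in the ground stations for the same period, is the kind of signature that points to an instrument artifact.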

  89. Layman Lurker said

    Interesting Kenneth. When I get the time maybe I’ll look for your stuff at Nicks. I was seeing some curious (to put it mildly) correlations over both time and space with the Comiso AVHRR data. The corresponding surface stations had what I would have considered more of a normal decay of correlation over time and space – though I’m hardly an expert. My question is whether these are spurious correlations and perhaps an artifact of Comiso’s cloud masking process. Something which if true would likely have impacted the reconstruction.
