the Air Vent

Because the world needs another opinion

NAS Provides Recommendations for Public Data Access

Posted by Jeff Id on August 20, 2009

The National Academy of Sciences has released a pre-publication copy of a government-funded report titled Ensuring the Integrity, Accessibility, and Stewardship of Research Data in the Digital Age. This is a quote from the executive summary.

The report recommends that all researchers receive appropriate training in the management of research data, and calls on researchers to make all research data, methods, and other information underlying results publicly accessible in a timely manner.

Of course it is a government program, so the recommendation to add personnel for training all researchers fails to mention cost and becomes an automatically self-expanding department, much as QC departments have become in manufacturing. Still, this report recognizes the key aspect of what Steve McIntyre has spent years working on (HERE as an example): access to the data and code behind research projects. Others may agree with the NAS recommendations in general, but in my opinion they go overboard, adding layers of time and effort that could be better spent doing research, when all they needed to do was insist that data and code, with turnkey-level instructions, be archived and published with sources listed.

It reminds me of the International Space Station. If you’ve ever seen the NASA feed on that, you may have noticed that the astronauts spend hours every day inventorying and cataloging the tools, food and equipment by reading ten-ish digit hyphenated numbers on package labels back to ground control. It drives me nuts to think how much it just cost to inventory a set of pliers. Why not just barcode the damn things, or at least everything possible, as any warehouse on earth would? But that’s not the point. The point here is layers of work for researchers who are naturally disinclined to do such labor.

There are eleven recommendations in the summary. The free online format for the entire book is PNG, which is a non-vector graphic. If you want infinitely expandable graphics you can buy the PDF form. I had to expand the graphics in processing software to make them legible.

Image1

This is a key item of course. If data integrity is not maintained or software is not appropriately commented and archived, the amount of knowledge gained from certain forms of research is quite limited.

Image2

Training sounds good, but this is a Trojan horse in my opinion. Training implies trainers or professionals who, like QC managers, are in part dedicated to the expansion of their own field and department. The only training required would come automatically from exposure of the code and data to the public eye.

Image3

This is all that needs to happen. The institutions should conform to a basic very simple rule that data and code be archived and accessible for everyone when possible. To that end they should set up a short list of their own procedures to follow the primary guidelines.

Image4

There it is: this Trojan horse has a big window in the side. Research institutions should recognize the contributions of data professionals and should then fund the aforementioned data professionals, who are the ones that make decisions about data release. Sounds perfect to me.

Image5

If verification were allowed in all cases this would be a huge step forward from where we are today. Consider that Steig et al. took thousands of lines of high-level code to verify, and the best we’ve been able to do is come close. Ryan recently received the raw AVHRR data, which took about seven months from publication. The paper has probably been cited before replication and any rebuttal could be published because of this delay. Imagine what that would mean if it were done with intent.

Some discussion under #5 in the summary was related to standardized formats in a certain field which made the data less accessible. Certainly it is important to make data accessible in standard formats outside proprietary programs, which often cost thousands of dollars, but other than that, “work it out yourself” is by far the best option. Standardizing formats for data is a massive and unreasonable requirement for a field. It would be like telling a carpenter what brand and type of tool to use for every job. All researchers need are reasonable open source formats. After that, it’s up to you to work it out. The code needs to be in whatever software the researcher feels is best, which the NAS seems to recognize by staying away from that issue entirely.

Image6

This standard for sharing research data should be based on simplicity: a few bullet points followed by one-paragraph descriptions. While having input from these organizations is important, the outcome from these adventures is often a random kludge. The NAS seems to put little thought into recommending simplicity.

Image7

Image8

Image9

It’s research, they keep the data as neatly as they can and report it at the end. Researchers have been doing this for centuries now to varying degrees of success. Exposure of the output is all that’s required to manage the quality of the input. Scientists need the freedom to think about research, to change direction when a new avenue opens itself and to record data as required based on their current understanding.

Image10

Image11

It’s good to see the NAS is paying attention to the difficulties in data sharing. It’s a useful report which as you can see raises hairs on my neck with the potential for overstepping government organizations. Perhaps researchers in general will resist some of this due to the added layers of work but in the end, the realization that you will be expected to share your data and code should eliminate some of the work we see in climate science. The coral reef paper on CA HERE is likely a good example of something which may not make the cut.

One item missing from these recommendations would have been the simplest and easiest of the bunch. A requirement that scientists provide software, code and turnkey instructions for operation in situations where it is realistic. Obviously some supercomputer code or sensitive info has to be excluded, however this simple requirement accomplishes the most important parts of the above eleven recommendations with no hassle to the scientist other than integrity and proper archiving.

Interestingly the NAS took its own advice and archived the entire report on line in a png graphic form HERE.

Thanks to Sera for providing the link to this article.


19 Responses to “NAS Provides Recommendations for Public Data Access”

  1. cogito said

    Those experienced in Quality Control know that recommendations are just recommendations. They leave the doors wide open to excuses such as “I know I should, but I don’t have the time just now…”

  2. Jeff Id said

    I don’t have a clue how the government will utilize these recommendations. If any readers have experience with the next steps in other areas, it would be interesting to hear what to expect from this.

  3. Charlie said

    Nice guidelines. Whether they have any real-world impact is open to question.

    The various US government agencies already have guidelines for the quality, integrity and utility of the information they disseminate. The Information Quality Guidelines of the various agencies such as NOAA and NASA are based upon “Section 515” and the resultant OMB (Office of Management and Budget) Quality of Information Guidelines. Also part of this same grouping is the OMB Peer Review Guidelines.

    The NAS Guidelines and the various Quality of Information Guidelines are only relevant if there is a way we can get compliance.

    I’m running a test case now with NASA where I’m trying to get them to clean up their act in regards to http://climate.nasa.gov/ and http://climate.nasa.gov/keyIndicators/index.cfm

    Since they dismissed and ignored my informal requests for correction submitted via the web feedback form, earlier this week I submitted a formal Request for Correction per the Section 515 Quality of Information Guidelines. We’ll see how it goes.

    Among other things, that webpage says that Arctic Sea Ice has been declining 38% per decade since 1979. It also says that the March 2009 Sea Ice Extent is 5.85 million sq km. (It was actually a bit over 15 million sq km).

    They also have the HADCRUT3 Global Mean Temperature plot, but the smoothed line is a perfect straight line for the last 3 years.

    Part of my request for correction is an assertion that HADCRUT3 is a highly influential scientific product that should be peer reviewed by NASA and other US government agencies that use it, using the OMB Peer Review Guidelines.

    Guidelines are only worth anything when they are actually followed.

    I file this report in the same category as the OMB (Office of Management and Budget) recommendations on Peer Review of highly influential scientific information, and in the same category as the various Quality of Information Guidelines that government agencies have issued in response to Section 515 of Public Law 106-

  4. Charlie said

    Oops. Ignore the last paragraph of my previous comment — junk left over from editing.

  5. TAG said

    Those experienced in Quality Control know that recommendations are just recommendations. They leave the doors wide open to excuses such as “I know I should, but I don’t have the time just now…”

    The ITU (International Telecommunication Union – now a UN agency) issues standards for the telecom industry. These standards are issued as recommendations and are labeled as such. However, if someone wants to sell a device to a telecom company, then they will be required by the telecom company’s contract/PO to state that their device meets the appropriate recommendations. If not, then they do not sell product. These recommendations have the force of law.

    So if a funding agency requires that all projects that receive grant money demonstrate how they will meet the data recommendations then this could have the same effect. Of course if the granting agencies and journals, as they do now, do not care about their contract provisions then these recommendations will have no effect.

  6. I’ve turned those pictures into script again.

    Recommendation 1: Researchers should design and manage their projects so as to ensure the integrity of research data, adhering to the professional standards that distinguish scientific, engineering, and medical research both as a whole and as their particular fields of specialisation.
    Recommendation 2: Research institutions should ensure that every researcher receives appropriate training in the responsible conduct of research, including the proper management of research data in general and within the researcher’s field of specialization. Some research sponsors provide support for this training and for the development of training programs.
    Recommendation 3: The research enterprise and its stakeholders—research institutions, research sponsors, professional societies, journals, and individual researchers—should develop and disseminate professional standards for ensuring the integrity of research data and for ensuring adherence to these standards. In areas where standards differ between fields, it is important that differences be clearly defined and explained. Specific guidelines for data management may require reexamination and updating as technologies and research practices evolve.
    Recommendation 4: Research institutions, professional societies, and journals should ensure that the contributions of data professionals to research are appropriately recognized. In addition, research sponsors should acknowledge that financial support for data professionals is an appropriate component of research support in an increasing number of fields.
    Recommendation 5: All researchers should make research data, methods, and other information integral to their publicly reported results publicly accessible in a timely manner to allow verification of published findings and to enable other researchers to build on published results, except in unusual cases in which there are compelling reasons for not releasing data. In these cases, researchers should explain in a publicly accessible manner why the data are being withheld from release.
    Recommendation 6: In research fields that currently lack standards for sharing research data, such standards should be developed through a process that involves researchers, research institutions, research sponsors, professional societies, journals, representatives of other research fields, and representatives of public interest organizations, as appropriate for each particular field.
    Recommendation 7: Research institutions, research sponsors, professional societies, and journals should promote the sharing of research data through such means as publication policies, public recognition of outstanding data-sharing efforts, and funding.
    Recommendation 8: Research institutions should establish clear policies regarding the management of and access to research data and ensure that these policies are communicated to researchers. Institutional policies should cover the mutual responsibilities of researchers and the institution in cases in which access to data is requested or demanded by outside organizations or individuals.
    Recommendation 9: Researchers should establish data management plans at the beginning of each research project that include appropriate provisions for the stewardship of research data.
    Recommendation 10: As part of the development of standards for the management of digital data, research fields should develop guidelines for assessing the data being produced in that field and establish criteria for researchers about which data should be retained.
    Recommendation 11: Research institutions and research sponsors should study the needs for data stewardship by the researchers they employ and support. Working with researchers and data professionals, they should develop, support, and implement plans for meeting those needs.

    To me the important item is Recommendation 8, which concerns access by outsiders. When the results of the research determine global policy, then the science should be checkable by ordinary citizens and this rule should take precedence over any of the others above.

  7. well, at least, Climate Science material should be open to scientists from all disciplines (especially statistics and engineering), who understand Scientific Method and the handling of data, who can apply adequate statistics, and who appreciate the responsibility of getting it right (engineering quality).

  8. Sera said

    I have been in contact with a staff member at NAS for over a year concerning this project. I had pretty much given up on them and forgot about it- but in cleaning up some email last night I noticed the correspondence and decided to have another look. I was quite surprised to see that I was a month late and $43.50 short.

  9. Jeff Id said

    Fantastic, thanks Sera. Do you have anything on what you expect this to turn into down the road?

  10. Sera said

    The way I am reading this, the report calls on research fields to develop standards based on recommended principles, and gives some examples for how that can occur. The NAS is not going to impose a ‘uniform code’, and I believe that they are making a mistake. After the Somali hijackings, the IMB (International Maritime Bureau) asked shippers to come up with their own solutions to the problem. This has been a disaster for the masters and their crew. I see the same thing happening here. If the NAS will not take the lead on this issue, then someone else will have to. This was my appeal to their authority over the past year. I guess that I was not very persuasive. Better luck next time…

    Kindest Regards,

    James Glendinning
    aka Sera, par5

  11. Page48 said

    I have a simpler approach.

    The data (and everything else that can be construed as property, intellectual or otherwise) evolving from all US grant funded science belongs to the people who paid for it – the tax payers.

    The above bullshit created by the NAS is just food for lawyers – nothing else.

  12. Page48 said

    Re: #11

    Forgot to add:

    Just think, if the Brits had been paying attention to & demanded the data accumulated by Hadcrud (intentional misspelling on my part) – for which they no doubt paid, the lost data might still exist somewhere!

  13. Sera said

    I think a major point is being missed in this discussion. The authors have been working on this for over 2 years, and the best they can come up with are recommendations? Really? I can come up with recommendations in less than a day. After a lot of time (2 years) and a lot of money (god knows), the authors should have come up with concrete guidelines for everybody. What the heck were they doing besides wasting time and money? The authors should be hammered for incompetence. This report is meaningless, and just shows everyone how toothless and incompetent the NAS has been on this issue. I would like to add that the staff at the NAS have been very gracious and polite, and they (staff) have answered all of my emails in a timely fashion- it is the authors I have a beef with.

    cc CA

  14. Page48 said

    RE #13,

    Didn’t you see my comments in #11 & #12 ????????????

  15. curious said

    FWIW – if the scientific journals had the will to do it, much of the above NAS spiel would be sidestepped. As SteveM and others have been saying for a while, they simply need to require all data and code that goes into a paper to be archived at time of publication. IMO this could be taken up by the journals as of now – responsible authors would do it even without the journal’s requests. I am not convinced that a big “standardisation and management” project will do anything other than possibly get in the way – people who want to understand/check work will do the work if they are given the raw materials.

  16. Sera said

    Page48 said
    August 21, 2009 at 6:35 am
    RE #13,

    Didn’t you see my comments in #11 & #12 ????????????

    Yes I did, but I am trying to emphasize that not everyone at the NAS is incompetent. The authors decided to go political instead of scientific- but the staff have always been courteous to me so I try to avoid blanket statements.

  17. Page48 said

    RE: #16

    Well, thanks, but apparently you didn’t get my point. How data is packaged for perusal isn’t up to the NAS, so all the little rule suggestions above are kind of silly. All the data from US grant funded science belongs to the people. Period.

    They can package it to the standards specified by the people, period, whether or not the people can understand one iota of the information.

    Scientists who wish control of their data ought to seek private funding.

  18. Jeff Id said

    I don’t agree that scientists should provide the data in a pre-specified format. It only needs to be a format others can use without proprietary software. Beyond that, let them choose.

    Scientists shouldn’t need to spend much time complying with government rules. Rather the rules need to be written in such a manner that very little is required to comply. Code, data and turnkey instruction. I saw several comments at CA which disagreed with the turnkey instruction but one point is that referees who cannot replicate a result with minimal time won’t do it.

    Simply requiring people to forward the code and data would clean a lot of the current climate science mess up. It makes scientists aware that they will be going on record to say, this is the data, this is the code, this is the answer and everyone will be able to see it.

    I wonder if when the team publishes a paper the first thing they do is check CA to see if it has been picked up. Even if it’s a good paper by climate science standards they probably hope it isn’t.

    Sera,

    You’re right that it’s not much and the recommendations read like mush. I didn’t realize they were funded for two years to do this. There isn’t much in this document. One thing I was concerned about is that under each recommendation is a discussion, and the discussion at several points seems off on a tangent from the recommendation. Perhaps it was a case of too many people involved in a simple decision.

    Why people want more government is beyond my understanding.

  19. Sera said

    RE #18

    I agree. All I was asking for was a uniform code of conduct. Those who comply would have been ‘NAS Certified’, those who do not would be stigmatized as such. Of course they don’t have the authority to demand, but that is not the point. The NAS does have a good name, and they could throw it around if they wanted. I will be contacting the Royal Society- maybe they can throw their weight around. Maybe one day, the NAS will be certified by the Royal Society?!
