NAS Provides Recommendations for Public Data Access
Posted by Jeff Id on August 20, 2009
The National Academy of Sciences has released a pre-publication copy of a government-funded project titled: Ensuring the Integrity, Accessibility, and Stewardship of Research Data in the Digital Age. This is a quote from the executive summary:
The report recommends that all researchers receive appropriate training in the management of research data, and calls on researchers to make all research data, methods, and other information underlying results publicly accessible in a timely manner.
Of course, it is a government program, so the recommendation to add personnel to train all researchers says nothing about cost and risks becoming an automatically self-expanding department, as QC departments have become in manufacturing. Still, the report recognizes the key issue that Steve McIntyre has spent years working on (HERE is one example): access to the data and code behind research projects. Others may agree with the NAS recommendations in general, but in my opinion they go overboard, adding layers of time and effort that could be better spent doing research, when all they needed to do was insist that data and code, with turnkey-level instructions, be archived and published with sources listed.
It reminds me of the International Space Station. If you've ever watched the NASA feed, you may have noticed that the astronauts spend hours every day inventorying and cataloging tools, food and equipment by reading ten-ish-digit hyphenated numbers from package labels back to ground control. It drives me nuts to think how much it just cost to inventory a set of pliers. Why not just barcode the damn things, or at least everything possible, as any warehouse on earth would? But that's not the point. The point here is layers of work for researchers, who are naturally disinclined to do such labor.
There are eleven recommendations in the summary. The free online version of the entire book is in PNG, a raster (non-vector) image format. If you want infinitely scalable graphics, you can buy the PDF. I had to enlarge the images in image-processing software to make them legible.
This is a key item, of course. If data integrity is not maintained, or software is not properly commented and archived, the knowledge gained from certain forms of research is quite limited.
Training sounds good, but in my opinion this is a Trojan horse. Training implies trainers or professionals who, like QC managers, are partly dedicated to expanding their own field and department. The only training required would come automatically from exposing the code and data to the public eye.
This is all that needs to happen. Institutions should conform to one very simple rule: data and code must be archived and accessible to everyone whenever possible. To that end, they should set up a short list of their own procedures for following the primary guidelines.
There it is: this Trojan horse has a big window in the side. Research institutions should recognize the contributions of data professionals and then fund those same data professionals, who are the ones making decisions about data release. Sounds perfect to me.
If verification were allowed in all cases, it would be a huge step forward from where we are today. Consider that verifying Steig et al. took thousands of lines of high-level code, and the best we've been able to do is come close. Ryan recently received the raw AVHRR data, about seven months after publication. Because of this delay, the paper has probably been cited before it could be replicated and before any rebuttal could be published. Imagine what that would mean if the delay were intentional.
Some of the discussion under #5 in the summary concerned standardized formats in one field that made the data less accessible. Certainly it is important to make data accessible in standard formats usable outside proprietary programs, which often cost thousands of dollars, but beyond that, letting researchers work it out themselves is by far the best option. Standardizing data formats across a field is a massive and unreasonable requirement. It would be like telling a carpenter what brand and type of tool to use for every job. All researchers need are reasonable open-source formats. After that, it's up to you to work it out. The code should be allowed in whatever software the researcher feels is best, which the NAS seems to recognize by staying away from that issue entirely.
A standard for sharing research data should be built on simplicity: a few bullet points, each followed by a one-paragraph description. While having input from these organizations is important, the outcome of these adventures is often a random kludge. The NAS seems to have put little thought into recommending simplicity.
It's research: scientists keep the data as neatly as they can and report it at the end. Researchers have been doing this for centuries now, with varying degrees of success. Exposure of the output is all that's required to manage the quality of the input. Scientists need the freedom to think about research, to change direction when a new avenue opens up, and to record data as required based on their current understanding.

It's good to see the NAS paying attention to the difficulties in data sharing. It's a useful report, although, as you can see, the potential for overstepping government organizations raises the hairs on my neck. Perhaps researchers in general will resist some of this because of the added layers of work, but in the end, the realization that you will be expected to share your data and code should eliminate some of the kind of work we see in climate science. The coral reef paper discussed on CA (HERE) is likely a good example of something that may not make the cut.
One item missing from these recommendations would have been the simplest and easiest of the bunch: a requirement that scientists provide software, code and turnkey operating instructions wherever it is realistic. Obviously some supercomputer code or sensitive information has to be excluded, but this simple requirement accomplishes the most important parts of the eleven recommendations above with no hassle to the scientist beyond integrity and proper archiving.
Interestingly, the NAS took its own advice and archived the entire report online in PNG image form HERE.
Thanks to Sera for providing the link to this article.