The title of this year's DataCite 2012 Summer Meeting, in Copenhagen, is "DIGITAL RESEARCH DATA IN PRACTICE: solutions for improving discovery, access and use".
Here are some key messages from the event (for more, see also the tweets at #datacite, #biosharing, #gigascience).
Chair: Lee-Ann Coleman, British Library
Keynote presentation: Jonathan Grant, President of RAND Europe, The science of science.
- The new science paradigm is based on the 4 As: Advocacy (make the case for research funding); Accountability (to taxpayers and donors); Analysis (what works in research), and Allocation (what to fund: institution, domains, people);
- We must move from advocacy to accountability, and we need practical evidence for science policy;
- We must analyse what works, also to speed things up: e.g., the time lag between spending on research and health gain is 17 years!
- A survey has been devised to capture the impacts that have arisen from research grants.
Session 1: Discovery: It’s all about the metadata? Or is it?
Chair: Jan Brase, TIB
- The global, federated infrastructure for sharing biodiversity datasets now has over 327 million records;
- This community has embraced the concept of 'data papers' and over 70 are in the pipeline and will be published in 6 different (Pensoft) journals;
- But beyond datasets, persistent identifiers are also needed for specimens, sequences, taxon names etc.;
- A data usage index is needed at the publisher, dataset, thematic and country levels;
- More in "Data publishing framework for primary biodiversity data", a thematic review in BMC Bioinformatics (2011).
Andrew Treloar, ANDS, Seeking Serendipity: repurposing DataCite metadata to augment ANDS discovery.
- Data as a first-class object: from unstructured, disconnected, invisible, single-use data to managed, connected, findable and reusable data;
- Research Data Australia supports creation or search of research datasets, collections, projects and organizations - gradually adding functionality.
Eefke Smit, STM Association, Data and Publications; and how they belong together.
- Deposition of datasets in archives continues to grow, surpassing journal articles in biomedical sciences;
- The data publication pyramid: 75% of research data is never made openly available, and too many disciplines still lack a community-endorsed archive!
- STM and DataCite have just launched a new statement: data must be deposited in trustworthy repositories; databases must also have links back to the publication(s); the creation of best practices for citation of datasets should be supported; and others are invited to sign the statement (link to the statement soon; it was just signed live at this meeting by Eefke and Adam Farquhar, President, DataCite!)
Session 2: Access: understanding technical, legal or ethical barriers to access
Chair: Brigitte Hausstein, GESIS
Matthew Woollard, UKDA, Persistent identifiers in practice. The UK Data Archive's approach.
- The importance of recording changes: approx. 15% of the (social science) data collected is altered within the first year.
Michael Wilson, STFC, Meeting a scientific facility provider's duty to maximise the value of data.
- Defending patents on innovations derived from science may require producing datasets from over 20 years earlier!
- Automatically capturing the facility lifetimes (via the ICAT Metadata Catalogue): from submission of the proposals to the publication of the results;
- Currently DOIs are assigned at the higher entity level, not at the data file (individual record) level, although that may be needed soon;
- Even if <1% of the data is commercialized, the issue of what data to publish and how long the embargo should be remains unsolved;
- The FP7 ENSURE project works to extend the state of the art in digital preservation.
Sunje Dallmeier-Tiessen, CERN, DataCite & INSPIRE: facilitating data preservation and reuse in High-Energy Physics.
- In High Energy Physics (HEP) projects, the discussion is about the levels of data description, and where the data should be preserved when associated with data publications;
- INSPIRE has 50k users and 1 million records, in collaboration with the Durham HepData Project in the UK.
Session 3: Different flavours of use
Chair: Herbert Gruttemeier, INIST
Scott Edmonds, GigaScience, BGI Shenzhen, Adventures in Data Citation.
- Tackling the long tail of curation - democratization of big data; challenges with compliance to community standards; lack of standards interoperability across;
- GigaScience, a joint venture between BMC and BGI, with an associated data hosting platform: GigaDB;
- GigaScience issue 1 is due in July 2012 - dataset descriptions formatted in ISA-Tab;
- E. coli #crowdsourcing: the first tweetome!
- Data citation is still failing (e.g., Google Scholar does not take 'data publication' into account) and should be improved;
- Minor quibbles: exports to citation managers; rules for versioning and for setting granularity (e.g., citing papers vs micropublications).
Jean-François Perrin, ILL, DOI usage: a large neutron facility.
- ILL raw data has been available online since 1973, and is released immediately after the experiment ends;
- But it is necessary to add experimental metadata that helps in interpreting the raw data files, collecting it automatically where possible, and to link everything to the publication;
- Pandata works to federate data infrastructure for synchrotron and neutron sources;
- Standardizing the format is the first step.
Susanna Sansone, University of Oxford, The ISA Commons - experiences from the field; link to my presentation.
- Shared data have little or no value if they are not interpretable and, consequently, reusable; see the example provided by the work of Ioannidis et al., Nature Genetics, 2009;
- Say no to 'data blobs', yes to verifiable, complete and structured information!
- Importance of capturing all salient features of the experimental workflow, making the annotation explicit and discoverable: not too much, not too little, just 'right';
- Many community norms and standards: lack of coordination, fragmentation and uneven coverage - see list at BioSharing;
- ISA-Tab is a general-purpose experimental metadata tracking framework used by a growing number of communities in several biosciences domains; more in Sansone et al., Nature Genetics, 2012.
Closing remarks – Adam Farquhar, President, DataCite.
