Mastering Data Citation: Insights from the BioDT Research Infrastructures

BioDT webinar report
12 February 2024

Data citation is a fundamental step in promoting transparency, credibility, and collaboration in biodiversity research. It ensures that the hard work of data collectors and curators is recognised, and it facilitates the sharing and integration of valuable biodiversity datasets. Given this crucial role, the BioDT teams organised two webinars, held on 6 and 20 November 2023. This report presents the main outputs of the events.

The webinars aimed to highlight critical aspects of recognising and crediting biodiversity research datasets, offering best practices for standardised dataset citation. This initiative is designed to enhance data discoverability, promote reusability, and duly acknowledge the contributions of those responsible for collecting and managing the data.

GBIF: Boosting data citation with DOIs

The first session of the webinar series explored the data citation practices employed by the Global Biodiversity Information Facility (GBIF). GBIF ensures that the use of its mediated data is documented for impact assessment. Data citations play a crucial role in acknowledging the effort invested in compiling biodiversity datasets. The session presented GBIF’s adoption of digital object identifiers (DOIs) for citations.

DataCite, founded in 2009, serves as a DOI Registration Agency for datasets, with GBIF as a member. In 2014, FORCE11 developed principles of data citation, emphasising data’s legitimacy and importance in scholarly records. GBIF initiated DOI assignment for all datasets in 2015, aligning with these principles. In 2018, publishers formulated a data citation roadmap.

Citing data not only acknowledges its importance but also credits data holders and publishers and gives them feedback on the data’s utility. Occurrence data accessed through GBIF is assigned a unique DOI upon download. Publishers can track data usage through these DOIs, which improves transparency and supports reproducibility.

For users who obtain data by means other than GBIF’s website, GBIF introduced the derived dataset concept. Users define the metadata for a derived dataset, which makes it citable. GBIF encourages citing data in the Materials and Methods section or the Data Availability Statement, which improves discoverability and allows data use to be logged.
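As an illustration of what defining metadata for a derived dataset could involve, the sketch below counts how many records in a filtered working set come from each source dataset and assembles a small metadata record. The field names (`title`, `description`, `sourceUrl`, `relatedDatasets`), the `datasetKey` record field, and the example values are assumptions made for this sketch; they were not presented in the webinar, and GBIF's own documentation should be consulted for the actual registration procedure.

```python
from collections import Counter


def summarise_sources(records):
    """Count records per source dataset, skipping records without a key."""
    counts = Counter(r["datasetKey"] for r in records if r.get("datasetKey"))
    return dict(counts)


def build_derived_dataset_metadata(title, description, source_url, records):
    """Assemble a metadata record for a derived dataset.

    Field names are illustrative assumptions, not a confirmed schema.
    """
    return {
        "title": title,
        "description": description,
        "sourceUrl": source_url,  # where the derived data can be found
        "relatedDatasets": summarise_sources(records),
    }


# Hypothetical working set: placeholder keys stand in for real dataset identifiers.
records = [
    {"datasetKey": "dataset-a", "species": "Parus major"},
    {"datasetKey": "dataset-a", "species": "Parus major"},
    {"datasetKey": "dataset-b", "species": "Erithacus rubecula"},
]

metadata = build_derived_dataset_metadata(
    title="Filtered bird occurrences (example)",
    description="Subset of occurrence records used in an example analysis.",
    source_url="https://example.org/derived-data",
    records=records,
)
print(metadata["relatedDatasets"])
```

Recording the per-dataset counts keeps the link between the derived dataset and every contributing source, so each data holder can still be credited.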

Although DOIs have been assigned since 2015, the practice of citing GBIF data has been adopted slowly. An automated workflow, initiated in 2017, prompts authors who fail to cite GBIF data. Currently, approximately 60% of papers logged by GBIF use DOIs in citations.
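To make the idea of "using DOIs in citations" concrete, here is a minimal sketch that checks whether a citation string contains a DOI and computes the share of citations that do. The regular expression is a common heuristic rather than a complete DOI grammar, the DOI suffixes in the sample citations are placeholders, and none of this represents GBIF's actual literature-tracking workflow.

```python
import re

# Heuristic: a DOI starts with "10.", a 4-9 digit registrant code, a slash,
# and then a suffix without whitespace. This is an assumption for the sketch.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")


def extract_dois(text):
    """Return DOI-like strings found in text, without trailing punctuation."""
    return [match.rstrip(".,;)") for match in DOI_PATTERN.findall(text)]


def share_citing_by_doi(citations):
    """Return the fraction of citation strings containing at least one DOI."""
    if not citations:
        return 0.0
    with_doi = sum(1 for citation in citations if extract_dois(citation))
    return with_doi / len(citations)


# Sample citations; the DOI suffix is a placeholder, not a real download.
citations = [
    "GBIF.org (2023) GBIF Occurrence Download https://doi.org/10.15468/dl.example.",
    "Data were obtained from GBIF (www.gbif.org).",
]

print(extract_dois(citations[0]))
print(share_citing_by_doi(citations))
```

A citation such as the second sample names the source but cannot be linked back to a specific download, which is why DOI-based citations are easier to track.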

DiSSCo: Biodiversity data citation with digital specimens

DiSSCo is another research infrastructure in the biodiversity realm. It proposes to treat citation issues holistically, with the FAIR Digital Object concept acting as a framework. This approach encourages thinking about data and metadata together in a machine-actionable way, with a particular focus on digital specimens. Within DiSSCo, the digital specimen is a digital surrogate of the physical specimen that provides a comprehensive view of attribution and recognition and seamlessly links metadata and data.

DiSSCo adopts DOIs for digital specimens, aligning with the established identifier ecosystem commonly used for scholarly publications and datasets. DOIs can serve as key entry points, linking to metadata and providing a standardised way of citing digital specimens in publications.

Moreover, DiSSCo’s vision extends beyond datasets to the citation of various elements within the biodiversity research ecosystem. It emphasises the importance of recognising contributors beyond the primary scientist, including field guides, lab workers, and illustrators.

eLTER: An emerging research infrastructure committed to long-term ecosystem research

eLTER is a nascent but vital initiative conducting comprehensive research that spans the atmosphere, biosphere, hydrosphere, and the socio-ecological sphere. This broad scope includes studying the impact of human influence on ecosystems, conducting in situ research, and deploying field facilities to measure, sample, and engage in hands-on activities.

Although eLTER is still in its developmental phase, under construction and not yet fully operational, this Research Infrastructure has profound implications for data citation.

While eLTER does not yet issue its own DOIs, it has formulated initial policies and best practices. These guidelines are disseminated to members and affiliate networks, emphasising the need to act as exemplary models and to align with emerging research infrastructure standards.

Key aspects of eLTER’s data citation framework include the encouragement of persistent identifiers (PIDs) wherever feasible, advocating for the publication of data papers, and issuing PIDs for sites and locations.

Moreover, eLTER’s data citation schema is composed of various components. One such component, DEIMS-SDR, serves as a central registry offering information about sites, measurements, properties, geographic details, and contact information. DEIMS-SDR issues unique, persistent identifiers called “DEIMS.IDs”, which are suitable for referencing in reports, papers, and datasets and remain active even after a site’s closure.
