Data Storage and Data Management
What is data management?
Research data management means managing your research data before, during and after your research project, i.e. handling your data responsibly from research-related, technical, legal and ethical perspectives throughout their life cycle. Specifically, this means that within the scope of your research project, you:
- Store and secure data so that you can find them again reliably and prevent data loss
- Establish a practice for how your data are organized and structured, described and documented
- Formulate and comply with a policy regarding how data are to be accessed and, if appropriate, shared
- Have a plan for how to save (or destroy) your data
- Make your data citable and reusable, if they are to be published.
The Danish Code of Conduct for Research Integrity (2014) states that ‘Data and primary materials should be retained, stored and managed in a clear and accurate form’.
Research data management is essential to good research practice because it ensures that data can easily be retrieved for reproduction and/or verification of the results. As such, data management activities form a natural part of the research process.
Why use data management?
The short answer is: ‘Because good research needs good data’. Until now, publications have been what counts in the academic world, while datasets have generally been treated as ‘second-class citizens’. That order is changing, however: Open Source, Open Access and Open Data are all clearly on the rise, so the value of good data habits can hardly be overstated. Good data management demands planning ‘up front’, i.e. a fair investment of time, but it does mean that:
- You secure your data so that neither they nor the time you spend on them are wasted, and so that your data security is sound
- You increase the chances of making the very most of your data: data that you can (easily) retrieve are data that you can more easily reuse in new research projects
- You give your research efficiency a boost: well-organized and documented data are quite simply easier to navigate and to work with as your project progresses
- You will preempt accusations of ‘research misconduct’: if you ensure transparency regarding the data you have collected/generated, when and where, and provide access to them if necessary, this will make it harder to question the quality of your research
- You will have the opportunity to obtain credit for generating data: datasets made accessible and supplied with a permanent identifier (e.g. DOI) can be cited, and secure a link between you as a researcher, your publication and your data (published via a ‘data paper’ if appropriate)
- You live up to expectations/requirements from institutions, sources of funding, project partners and journals: more and more European universities now operate data management policies, and this may also apply to your future project partners. Grants through the EU’s Horizon 2020 Open Research Data Pilot demand data management/data sharing plans (see below) with regard to other researchers, the business community and the general public, and some journals additionally require access to data as a precondition for peer review (e.g. Nature, PLOS One).
- You will have greater opportunities to share data, so that others can learn from them, avoid duplicating your work and, if appropriate, use them in completely new research contexts.
Taking the broader view, research data are valuable assets for universities, sources of funding and society in general, and data that cannot be found—either because they have been lost, or are currently ‘gathering dust’ on a zip drive somewhere—are a direct loss of investments and opportunities.
The expectation is that the storage, accessibility and further use of data will boost the efficiency of the research, legitimize the science, reinforce interdisciplinarity and open up new research questions (Free access to research data).
What is a data management plan (DMP)?
There are numerous variations of this type of plan, but generally speaking, it is a document that you prepare as early as the planning phase of your research project, describing aspects such as:
- Who is responsible for data management in your project
- Which types of data you collect, generate and/or use—and how
- How much space your data take up, and where you locate them
- How you organize and document data (metadata), and what is required (software, etc.) to interpret them
- How you secure data
- Who owns the data
- Any special conditions concerning confidential/personally traceable/commercializable/purchased data
- How data are shared as the project progresses and, if they are shared, who has access to them
- Which data may be made available on conclusion of the project, and if they are, how
- For how long and where data are to be stored after conclusion of the project/or which data—if any—are to be deleted
The data management plan is a dynamic document that should be updated whenever significant changes are made in the project; otherwise, the document will lose its value.
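To keep such a plan up to date, it can help to treat it as a structured record rather than free text. The following is a minimal sketch in Python, not a prescribed format: the class and field names are hypothetical and simply mirror the aspects listed above, and the example values are made up for illustration.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class DataManagementPlan:
    """One optional text field per aspect of the plan (field names are hypothetical)."""
    responsible_person: Optional[str] = None
    data_types_and_methods: Optional[str] = None
    size_and_location: Optional[str] = None
    organization_and_metadata: Optional[str] = None
    security: Optional[str] = None
    ownership: Optional[str] = None
    special_conditions: Optional[str] = None
    sharing_during_project: Optional[str] = None
    availability_after_project: Optional[str] = None
    retention_and_deletion: Optional[str] = None


def missing_sections(plan: DataManagementPlan) -> list[str]:
    """Return the names of the sections that are still empty or blank."""
    return [
        f.name
        for f in fields(plan)
        if not (getattr(plan, f.name) or "").strip()
    ]


# Illustrative values only
plan = DataManagementPlan(
    responsible_person="Principal investigator",
    ownership="University, per the project agreement",
)
todo = missing_sections(plan)
print(f"{len(todo)} of {len(fields(plan))} sections still to write:")
for name in todo:
    print(" -", name)
```

Running the check again whenever the project changes gives a quick list of the sections that need attention.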
Why prepare a data management plan (DMP)?
You may already have considered many of the factors listed above, but writing them down will help you to formalize the process, document your needs and procedures in relation to effective and responsible data handling, and assist you in identifying any weaknesses in your plan.
How can I prepare a DMP?
There are plenty of options for assistance in setting up a data management plan, including:
- Online tools that allow you to set up an account, prepare your DMP and then update it on an ongoing basis. For example, you could use DMPonline, which features bespoke templates for many funders (primarily in the UK and USA), as well as for the H2020 Open Data Pilot.
- The page Data Management Plans, where you will find a long list of tools, guides, etc. that you can use for inspiration.
To date, most of the resources on this topic are in English, given that Australia, Britain and the United States are among the countries that are furthest advanced with regard to research data management.
Where should I save data?
If your university operates a solution that allows you to manage and perhaps even share your data efficiently during and after your research project, you should use it.
DeiC data is another option: an online data storage and synchronization service for securely sharing and working with active research data, and for storing large datasets, to which you log on using your departmental login details. It offers free and unlimited cloud storage for researchers and students at Danish universities. At European level, Zenodo is an interesting option as well. Developed by CERN under the EU FP7 project OpenAIREplus, it allows you to share, store and showcase your research results (both data and publications) and license them under Creative Commons.
One of the most complete overviews of discipline-specific solutions at global level (what is known as a ‘global registry of research data repositories’) is to be found at re3data.
But please remember that cloud solutions where you, as the data controller, CANNOT obtain a guarantee that the data will be stored on European soil can be problematic in the context of the EU data protection directive: it is not permitted to process personal data (sensitive and non-sensitive alike) in countries outside the EU, or at companies in the United States that have not joined the Safe Harbor scheme. Even if these conditions are fulfilled, the solution may still be problematic in the context of the security regulations laid down in Danish legislation (cf. decisions from the Danish Data Protection Agency). So think carefully before sending research data to international cloud storage services (Dropbox, Google, Amazon, Microsoft, etc.).
You can also spend some of your research project budget on a ‘DIY’/DUDD (Distributed Data Centre Under the Desk) solution. This is a more-or-less advanced set-up, primarily consisting of network hard drives, that appears enticingly inexpensive. Before diving into such a solution, however, it is a good idea to calculate the Total Cost of Ownership (see a case calculation) and to estimate how much a loss of data may cost you (a real-world example).
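As a rough illustration of that comparison, the sketch below adds up purchase, running and staff costs over the lifetime of a set-up and places the result next to a crude estimate of the expected cost of data loss. The function names, the cost model and every figure are assumptions invented for illustration (arbitrary currency units); they are not taken from the case calculation mentioned above.

```python
def total_cost_of_ownership(purchase: float, yearly_running: float,
                            admin_hours_per_year: float, hourly_rate: float,
                            years: int) -> float:
    """Purchase price plus running and staff costs over the whole period."""
    yearly_staff = admin_hours_per_year * hourly_rate
    return purchase + years * (yearly_running + yearly_staff)


def expected_loss(yearly_loss_probability: float, cost_of_loss: float,
                  years: int) -> float:
    """Crude estimate: yearly chance of a loss times its cost, summed over the years."""
    return yearly_loss_probability * cost_of_loss * years


# All figures are hypothetical and only show how the arithmetic works
YEARS = 5
tco = total_cost_of_ownership(
    purchase=10_000,          # network hard drives and accessories
    yearly_running=1_500,     # power, replacement disks
    admin_hours_per_year=20,  # time spent on set-up, monitoring, repairs
    hourly_rate=400,
    years=YEARS,
)
risk = expected_loss(yearly_loss_probability=0.05, cost_of_loss=200_000, years=YEARS)

print(f"Total cost of ownership over {YEARS} years: {tco:,.0f}")
print(f"Expected cost of data loss over {YEARS} years: {risk:,.0f}")
```

In a sketch like this the staff time often dominates the hardware price, which is why the purchase cost alone can be misleading.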
Remember the librarians’ mottos: ‘Lots Of Copies Keep Stuff Safe’ and ‘Two copies on site, one copy in a different geographical location’.
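Copies only keep data safe if they are still identical to the original, so it is worth checking them from time to time. The following is a minimal sketch of such a check using SHA-256 checksums from Python's standard library; the function names, file names and folder layout are hypothetical, and temporary folders stand in for the real on-site and off-site locations.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_copies(original: Path, copies: list[Path]) -> dict[str, str]:
    """Return a status per copy: 'ok', 'missing' or 'differs'."""
    expected = sha256_of(original)
    report = {}
    for copy in copies:
        if not copy.is_file():
            report[str(copy)] = "missing"
        elif sha256_of(copy) == expected:
            report[str(copy)] = "ok"
        else:
            report[str(copy)] = "differs"
    return report


# Demonstration: temporary folders stand in for the real storage locations
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    original = root / "working" / "survey.csv"
    onsite = root / "onsite_backup" / "survey.csv"
    offsite = root / "offsite_backup" / "survey.csv"
    for path in (original, onsite, offsite):
        path.parent.mkdir(parents=True)
    original.write_bytes(b"id,answer\n1,yes\n")
    onsite.write_bytes(b"id,answer\n1,yes\n")
    offsite.write_bytes(b"id,answer\n1,no\n")  # simulated silent corruption
    statuses = list(check_copies(original, [onsite, offsite]).values())
    print(statuses)  # ['ok', 'differs']
```

A copy reported as 'missing' or 'differs' should be replaced from a copy that still matches the original.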
Special considerations concerning personal data
Some areas of research such as the health sciences and biomedical research are regulated separately and stringently, but for all types of research, you must pay particular attention to your obligations when collecting, processing and/or sharing personal data. If you work with research data that contain personal information, you must comply with the principles laid down in the Danish Act on Processing Personal Data (Persondataloven).
- Best practice for naming files and versions
- Best practice for choosing formats
- A simple approach to metadata
- The Danish Act on Processing Personal Data (Persondataloven) in brief: see, for example, The Danish Data Protection Agency and The Researcher Portal
For information on the national strategy for data management and the organization of the national effort, click here.
For additional information, please contact:
Hannah Mihai, Project Manager, DeiC