One aspect of open research is releasing your data openly, and also learning to use the data of others. Many funders now mandate that data from projects they fund is released under an open licence. This benefits the community as a whole: it allows comparison between projects and leads to researchers providing analyses of the data that may not have occurred to the original researchers. It is not without its issues, however. Firstly, as a researcher you will want to ensure that you have conducted the analysis and written any articles from the work before you release it, so that you do not find yourself in the position of someone else writing up your work. Equally, the data itself should be seen in the same light as a research publication; you do not want to wait too long before making it available to others. Knowing when to release the data is therefore important.
Another issue is ensuring that data is sufficiently anonymised. When you have decided from the outset of a research project that the data you collect will be released openly, protecting participants’ identities becomes a priority. In the OER Research Hub we collected over 7,000 survey responses from educators, librarians, and formal and informal learners, from 175 different countries, at K-12, community college and higher education levels. The responses were gathered using the survey bank of questions, which is also openly available. The survey data can be accessed on Figshare in .csv or Excel file formats under a CC-BY licence: you are free to download it, add more data to it, carry out a different analysis, etc., with acknowledgement. How has this file been anonymised? Removing information on respondents’ gender, age or country of origin, for instance, would have also eliminated an important pathway into the analysis, so we retained these variables but deleted IP addresses and contributions to open-ended questions.
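As a rough illustration of that kind of column removal, the following Python sketch copies a .csv file while leaving out the identifying fields. The column names (`ip_address`, `open_response`) and the sample rows are invented for the example; the actual survey file may be labelled quite differently, and this is not the procedure the project itself used.

```python
import csv
import io

# Hypothetical column names; the real survey file may label these differently.
IDENTIFYING_COLUMNS = {"ip_address", "open_response"}


def anonymise_csv(source, target, drop_columns=IDENTIFYING_COLUMNS):
    """Copy a CSV from source to target, omitting the identifying columns."""
    reader = csv.DictReader(source)
    # Keep every column that is not listed as identifying.
    kept = [name for name in reader.fieldnames if name not in drop_columns]
    writer = csv.DictWriter(
        target, fieldnames=kept, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        # extrasaction="ignore" silently skips the dropped columns.
        writer.writerow(row)
    return kept


# Made-up sample rows, for illustration only.
sample = io.StringIO(
    "gender,age,country,ip_address,open_response\n"
    "F,34,Kenya,192.0.2.10,I teach with OER every week\n"
    "M,52,Brazil,192.0.2.77,My college library recommends it\n"
)
released = io.StringIO()
kept_columns = anonymise_csv(sample, released)
print(released.getvalue())
```

Gender, age and country pass through untouched, so they remain available for analysis, while the IP address and free-text columns never reach the released file.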
If you intend others to use your data, you will need to add as much metadata as possible. Tags will help others find your shared files, but consider also including information about how the dataset originated, what analyses have already been carried out, and so on.
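One way to keep such metadata alongside the data is a small descriptive record saved with the files. The Python sketch below is only a suggestion: the field names and placeholder values are assumptions made for illustration, not a standard schema or the format any particular repository requires.

```python
import json

# All field names and values here are illustrative placeholders.
metadata = {
    "title": "Example survey dataset",
    "licence": "CC-BY",
    "tags": ["open education", "survey", "OER"],
    "origin": "Describe how and when the data was collected.",
    "anonymisation": "List which fields were removed before release.",
    "analyses_already_done": ["Describe any analysis already carried out."],
}


def missing_fields(record, required=("title", "licence", "tags", "origin")):
    """Return the required fields that are absent or empty."""
    return [name for name in required if not record.get(name)]


# Serialise the record so it can be saved next to the data files.
readme = json.dumps(metadata, indent=2)
print(readme)
print("Missing:", missing_fields(metadata))
```

A quick check such as `missing_fields` can be run before uploading, so that a file is not shared without its licence or a description of where it came from.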