RDM Weekly - Issue 037
A weekly roundup of Research Data Management resources.
Welcome to Issue 37 of the RDM Weekly Newsletter!
The content of this newsletter is divided into 4 categories:
✅ What’s New in RDM?
These are resources that have come out within the last year or so
✅ Oldies but Goodies
These are resources that came out over a year ago but continue to be excellent ones to refer to as needed
✅ Research Data Management Job Opportunities
Research data management related job opportunities that I have come across in the past week
✅ Just for Fun
A data management meme or other funny data management content
What’s New in RDM?
Resources from the past year
1. Drowning in Data Sets? Here’s How to Cut Them Down to Size
Disciplines such as astronomy and the Earth and biological sciences have long grappled with unwieldy data sets. As the volume, processing speed and variety of data continue to grow, the storage capacity is struggling to keep pace. At the same time, the boom in machine-learning and artificial-intelligence technologies is creating an incentive to hoard information. But unconstrained data retention is not financially viable and uses a great deal of energy. This Nature article reviews ways to give your data maximal long-term value. The PDF version is accessible here.
2. Neuroscientists Challenge NIH’s Proposed Human-Data Access Policy
The neuroimaging community is pushing back against a new U.S. policy proposal that would require federally funded researchers to share data only through controlled-access repositories. Under the policy put forward in December by the U.S. National Institutes of Health, researchers who share human genomic, epigenomic, proteomic or transcriptomic data or imaging data of the face or head must review all data access requests, authenticate the identity of the requesters, use stringent security standards and restrict access for requesters from a list of countries outlined by the Department of Justice in January 2025. The goal is “to promote maximal responsible human participant data sharing through controlled access while simultaneously responding to emergent privacy and security risks,” according to the NIH’s request for public input on the proposal, which closed last week. Neuroscientists are raising multiple concerns about the suggested policy, which are reviewed in this article.
3. Grassroots Efforts to Bring Open Science to the Field of Education Research
At its core, open science is about individuals coming together with a shared commitment to improving research practices and fostering a more transparent and collaborative scientific culture. The goal is to create research that is freely accessible, more trustworthy, and ultimately more useful to the broader community. Across fields, grassroots movements have played a key role in advancing open science, with notable efforts in fields such as psychology, communication sciences, and environmental science. Education and developmental science have seen similar efforts, led by individuals and groups working to promote more open practices. This post highlights some of the past and ongoing efforts to raise awareness and strengthen open science practices in Education Research.
4. Ten Simple Rules for Effective Research Data Management
Advances in information technology, digitalization, database volume, the internet, high-throughput measurement technology, and artificial intelligence (AI) have profoundly transformed research. In the 20th century, it was common for a study or an experiment to yield one single file (e.g., a table). Today, many research projects yield many files, which are often created by multiple collaborators and are often valuable for secondary use. Furthermore, scientific knowledge is currently generated not only through hypothesis-driven statistical inference but also using (un)supervised data mining and AI techniques applied to existing resources. These developments require effective research data management (RDM) at both the project and institutional levels. Several publications of the “Ten Simple Rules” series offer guidance on RDM subdomains. However, they focus on project-related RDM topics. Thus, this Ten Simple Rules for Effective Research Data Management provides a condensed reference of significant RDM topics applicable at higher organizational levels, arranged in a logical sequence corresponding to the research data life cycle. The rules are derived from the authors’ diverse expertise in statistics, genomics, bioinformatics, public health, epidemiology, and research data management consulting. They may serve as a reference for institutions, researchers or professionals regardless of field or career level.
5. Why Open Data Matters
In this guest post for the NHS-R/Open Analytics community, Mattia Ficarelli explains what open data is and why it matters in healthcare, and provides Ten Guiding Principles for Publishing Open Data.
6. Understanding and Improving Data Repurposing
We live in an age of unprecedented opportunities to use existing data for tasks not anticipated when those data were collected, resulting in widespread data repurposing. This commentary defines and maps the scope of data repurposing to highlight its importance for organizations and society and the need to study data repurposing as a frontier of data management. The authors explain how repurposing differs from original data use and data reuse and then develop a framework for data repurposing consisting of concepts and activities for adapting existing data to new tasks. The framework and its implications are illustrated using two examples of repurposing, one in healthcare and one in citizen science. The authors conclude by suggesting opportunities for research to better understand data repurposing and enable more effective data repurposing practices.
7. Merges and Joins: From SQL to Stata
This guide provides an overview of data joins, including foundational information regarding keys. The article emphasizes the need to clearly define merge keys, understand dataset relationships, and validate results with diagnostics. The article then reviews different types of joins and provides example code in various languages. Last, the article discusses issues with a specific Stata command which may create unexpected results that are not reproducible.
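To make the advice about keys and diagnostics concrete, here is a minimal sketch in plain Python. It is not taken from the guide: the function name left_join, the example tables, and the shape of the report are all assumptions chosen for illustration. It checks that the merge key is unique in the lookup table before joining, then counts how many rows matched.

```python
from collections import Counter


def left_join(left, right, key):
    """Left-join two lists of dicts on `key` (hypothetical helper).

    Requires `key` to be unique in `right`, so the join is many-to-one
    and cannot silently duplicate rows from `left`.
    """
    counts = Counter(row[key] for row in right)
    duplicates = sorted(k for k, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError(f"key {key!r} is not unique in right table: {duplicates}")

    lookup = {row[key]: row for row in right}
    merged = []
    report = Counter()
    for row in left:
        match = lookup.get(row[key])
        report["matched" if match is not None else "left_only"] += 1
        merged.append({**row, **(match or {})})

    # Rows in `right` whose key never appears in `left`.
    report["right_only"] = len(set(lookup) - {row[key] for row in left})
    return merged, dict(report)


# Made-up example data.
students = [{"id": 1, "name": "Ana"}, {"id": 2, "name": "Ben"}, {"id": 3, "name": "Cy"}]
scores = [{"id": 1, "score": 90}, {"id": 2, "score": 85}, {"id": 4, "score": 70}]

merged, report = left_join(students, scores, "id")
print(report)
```

Reviewing the match counts after every join is a cheap way to catch keys that do not line up the way you expected.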
Oldies but Goodies
Older resources that are still helpful
1. CSV Schema Validation
In this article, David Ayres walks us through the pros and cons of the CSV format, with one of the major cons being trust. The provider of the file has to be very strict on how they generate their file. A CSV also cannot hold metadata, and lacks any sort of schema to validate data against. This article reviews existing attempts that have been made to standardize a CSV schema, and one idea the author has for a CSV schema to make consuming these files easier.
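As a rough illustration of what validating a CSV against a schema can look like, here is a minimal sketch using only Python's standard library. It is not the schema format proposed in the article: the SCHEMA dictionary, the validate_csv function, and the sample data are assumptions made up for this example.

```python
import csv
import io

# Hypothetical schema: column name -> function that parses (and so validates) a cell.
SCHEMA = {"id": int, "age": int, "site": str}


def validate_csv(text, schema):
    """Return a list of (line_number, message) problems found in CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    problems = []

    # The header must list exactly the schema's columns, in order.
    if reader.fieldnames != list(schema):
        problems.append((1, f"expected header {list(schema)}, got {reader.fieldnames}"))
        return problems

    for row in reader:
        for column, convert in schema.items():
            try:
                convert(row[column])
            except (TypeError, ValueError):
                problems.append(
                    (reader.line_num,
                     f"{column}: cannot parse {row[column]!r} as {convert.__name__}")
                )
    return problems


# Made-up sample with one bad value in the "age" column.
sample = "id,age,site\n1,34,A\n2,unknown,B\n"
print(validate_csv(sample, SCHEMA))
```

Because the schema lives outside the file, the provider and the consumer both need a copy of it, which is part of the trust problem the article describes.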
2. FORRT Open Research Games Portal
Discover 46 educational games that teach open science practices through interactive gameplay. Examples of available games include “Open Science Against Humanity”, “LEGO Metadata for Reproducibility”, “Data Horror Escape Room”, “Research Data Management Adventure”, “The Publish or Perish Game”, and more!
3. Toronto Workshop on Reproducibility - Sharla Gelfand
Getting stuck, looking around for a solution, and eventually asking for help is an inevitable and constant aspect of being a programmer. If you've ever looked up a question only to find some brave soul getting torn apart on Stack Overflow for not providing a minimum working example, you know it's also one of the most intimidating parts! A minimum working example, or a reproducible example as it's more often called in the R world, is one of the best ways to get help with your code. In this 2021 Toronto Workshop on Reproducibility, Sharla Gelfand covers what components are needed to make a good reproducible example to maximize your ability to get help (and to help yourself!), strategies for coming up with an example and testing its reproducibility, and why you should care about making one. Slides from the talk can be found here.
4. Data Sharing and Data Shared
These slides from Jessica Logan are from a presentation given at the Society for the Scientific Study of Reading conference in July 2021. Data sharing is relatively new to the fields of education and developmental science, but it is increasingly in demand. In this talk, Jessica describes some steps to take to prepare your data for sharing, describes a data repository, and explains how you can find data that have already been shared.
Research Data Management Job Opportunities
These are data management job opportunities that I have seen posted in the last week. I have no affiliation with these organizations.
Just for Fun
Thank you for reading! If you enjoy this content, please like, comment, or share this post! You can also support this work through Buy Me A Coffee.


