RDM Weekly - Issue 007

A weekly roundup of Research Data Management resources.

RDM Weekly

Aug 07, 2025

Welcome to Issue 7 of the RDM Weekly Newsletter!

The content of this newsletter is divided into three categories:

☑️ What’s New in RDM?

These are resources that have come out within the last year or so.

☑️ Oldies but Goodies

These are resources that came out over a year ago but remain excellent references to return to as needed.

☑️ Just for Fun

A data management meme or other funny data management content

What’s New in RDM?

Resources from the past year

1. Open With Care! Consent, Context, and Co-production in Open Qualitative Research

In this preprint, the author articulates concerns about the uncritical adoption of open science principles, particularly data sharing mandates, within qualitative research. They contend that, when imposed as default, opening qualitative inquiry risks enacting harm through three key pathways: the commodification of qualitative data (including via AI tools), the decontextualization and objectification of narratives, and the erosion of participant trust and consent.

2. Making {messy} Data

Equipping students in statistics and data science with the necessary data wrangling skills to handle real-world data is a crucial aspect of their education. Real data, unlike the clean, structured examples often used in teaching, can include a variety of challenges such as typographical errors, missing values encoded in unconventional ways, or unexpected spaces in text. In these slides, Nicola Rennie introduces the messy R package designed to introduce controlled levels of messiness into existing, clean datasets. It retains the structure of familiar example datasets while providing students with a realistic, manageable data cleaning experience.
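To make the idea of "controlled messiness" concrete, here is a minimal Python sketch. It is not the messy R package and does not mirror its functions; the function name `make_messy`, the `rate` and `seed` parameters, and the list of missing-value codes are all assumptions made up for illustration. It shows one way to add stray spaces, inconsistent capitalization, and unconventional missing-value codes to a clean dataset while leaving its structure alone.

```python
import random

# Hypothetical stand-ins for "unconventional" missing-value codes.
MISSING_CODES = ["", "N/A", "-99", "missing"]


def make_messy(rows, rate=0.3, seed=None):
    """Return a copy of `rows` (a list of dicts) with some text values made messy.

    Keys, row count, and non-text values are left unchanged, so the dataset
    keeps its familiar structure. `rate` is the chance that a given text
    value is altered; `seed` makes the result repeatable.
    """
    rng = random.Random(seed)
    messy_rows = []
    for row in rows:
        new_row = {}
        for key, value in row.items():
            if isinstance(value, str) and rng.random() < rate:
                kind = rng.choice(["space", "case", "missing"])
                if kind == "space":
                    value = " " + value + "  "  # unexpected leading/trailing spaces
                elif kind == "case":
                    value = value.upper()  # inconsistent capitalization
                else:
                    value = rng.choice(MISSING_CODES)  # odd missing-value code
            new_row[key] = value
        messy_rows.append(new_row)
    return messy_rows


# A tiny made-up "clean" dataset for demonstration.
clean = [
    {"species": "Adelie", "island": "Torgersen", "mass_g": 3750},
    {"species": "Gentoo", "island": "Biscoe", "mass_g": 5000},
]
messy = make_messy(clean, rate=0.5, seed=1)
```

Raising or lowering `rate` is what keeps the cleaning exercise manageable: students see the same rows and columns they already know, with only a chosen share of values disturbed.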

3. De-Identification When Making Data Sets Findable, Accessible, Interoperable, and Reusable (FAIR): Two Worked Examples From the Behavioral and Social Sciences

Navigating the balance between protecting participants’ privacy and making one’s data set as open as possible can be challenging for researchers. In this article, the authors provide two worked examples with real data sets from the behavioral and social sciences on how to be as open as possible and as closed as necessary, with the goal of maximally facilitating science while minimizing the risk of participant identification. As part of the article’s supplemental materials, you can also find a de-identification guide as a standalone document on OSF.

4. Doing Data Analysis with AI

This course guide gives students who are already versed in core data analysis methods hands-on experience using AI technologies to improve their productivity. The course focuses on using large language models (LLMs) to carry out tasks in data analysis and includes data management topics such as data extraction and wrangling, data exploration, and descriptive statistics. The course material includes weekly practice assignments and links to recommended material beyond what is in the course.

5. Opening Open Science to All: Demystifying Reproducibility and Transparency Practices in Linguistic Research

By and large, open science practices have not been adopted in the field of linguistics. Few, if any, researchers have had explicit instruction on the practices of open science as part of their professional training. Nonetheless, today’s speech researcher is expected to be up to date on the current protocols of open science in order to incorporate the methodological practices aimed at improving reproducibility/replicability. This study outlines eight specific open science practices that linguists can adopt to make their research more open, transparent, inclusive, and accessible to a wider audience.

6. Diversifying Professional Roles in Data Science

The interdisciplinary nature of the data science workforce extends beyond the traditional notion of a "data scientist." A successful data science team requires a wide range of technical expertise, domain knowledge and leadership capabilities. To strengthen such a team-based approach, this note recommends that institutions, funders and policymakers invest in developing and professionalizing diverse roles (e.g., data wranglers, research software engineers, and data stewards), fostering a resilient data science ecosystem for the future.

Oldies but Goodies

Older resources that are still helpful

1. Column Names as Contracts

This blog post, written 5 years ago by Emily Riederer, continues to have a profound impact on how I work with data today. In this post, Emily suggests that using “controlled vocabularies” for column names is a low-tech, low-friction approach to building a shared understanding of how each field in a data set is intended to work. She introduces the concept with an example and demonstrates how controlled vocabularies can offer lightweight solutions for rote data validation, discoverability, and wrangling, illustrating the benefits with several R packages as well as tools in other languages. Under “Updates” she also provides a very helpful concept map to further illustrate the concepts.
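As a rough illustration of how a naming convention can drive rote validation, here is a minimal Python sketch. The prefixes (`id`, `ind`, `n`, `dt`), the rules attached to them, the function name `check_contracts`, and the sample rows are all assumptions invented for this example; they are in the spirit of the post rather than a copy of its vocabulary or its R code.

```python
from datetime import date

# Hypothetical controlled vocabulary: each column-name prefix promises
# something about the values stored in that column.
CONTRACTS = {
    "id": lambda v: v is not None,  # identifier is never missing
    "ind": lambda v: v in (0, 1),  # binary indicator
    "n": lambda v: isinstance(v, int) and not isinstance(v, bool) and v >= 0,  # count
    "dt": lambda v: isinstance(v, date),  # a real date, not free text
}


def check_contracts(rows):
    """Return (row_index, column, reason) for every value that breaks its contract."""
    problems = []
    for i, row in enumerate(rows):
        for column, value in row.items():
            prefix = column.split("_", 1)[0]
            check = CONTRACTS.get(prefix)
            if check is None:
                problems.append((i, column, "unknown prefix"))
            elif not check(value):
                problems.append((i, column, "value breaks contract"))
    return problems


# Made-up rows: the second one violates three contracts.
rows = [
    {"id_student": "S01", "ind_consent": 1, "n_visits": 3, "dt_enroll": date(2025, 8, 7)},
    {"id_student": "S02", "ind_consent": 2, "n_visits": -1, "dt_enroll": "8/7/25"},
]
problems = check_contracts(rows)
```

Because the rules hang off the name prefix, a new column such as `n_absences` would be checked automatically without writing any new validation code, which is the low-friction benefit the post describes.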

2. Resources for Randomized Evaluations

This J-PAL resource is a goldmine of data management (and general project management) information for anyone working with RCTs. Not only is this a guide for running RCTs from start to finish, it also provides links to templates, checklists, example Stata code, and more!

3. Avoiding the Oh Crap! Moment: Data Security in Education Research

In this slide deck, Dorothea Salo provides a “threat model” for considering what and whom we are securing research data against, and why we are securing it, especially when working with human subjects data. She then offers guidance on how to get help when developing your data security plan.

4. TIER Protocol 4.0

The TIER Protocol is a template for what a project folder should contain, and how those files should be organized, so that statistical computations are reproducible. Documentation that meets the specifications of the TIER Protocol contains all the data, scripts, and supporting information necessary to enable you, your instructor, or an interested third party to reproduce all the computations necessary to generate the results you present in the report you write about your project.

Just for Fun

A four tile image of Anakin and Padme from Star Wars with the following text: "I'm collecting dates in my survey". "You restricted entries to YYYY-MM-DD, right?" "You didn't leave it an open text field did you?" — Image from https://github.com/Cghlewis/datamgmt_memes

Thank you for checking out the RDM Weekly Newsletter! If you enjoy this content, please like, comment, or share this post! You can also support this work through Buy Me A Coffee.
