Welcome to Issue 6 of the RDM Weekly Newsletter!
The content of this newsletter is divided into 3 categories:
☑️ What’s New in RDM?
These are resources that have come out within the last year or so
☑️ Oldies but Goodies
These are resources that came out over a year ago but continue to be excellent ones to refer to as needed
☑️ Just for Fun
A data management meme or other funny data management content
What’s New in RDM?
Resources from the past year
1. AI for the Skeptical Scholar: Practical Strategies for Using LLMs in Research
While this guide isn’t necessarily data management focused, I think there are definitely helpful data management pieces in here that are worth checking out. For instance, Chapter 8 provides a great overview of best practices and considerations when using AI for coding assistance, including recognizing when AI can be helpful (e.g., writing standard data manipulation and analysis code, translating between programming languages, or explaining what code does) and where you may need to be careful using AI (e.g., code that handles sensitive or proprietary data, domain-specific packages with limited training data).
2. Data Management Tips for Educators
In this blog post, Eric Ekholm summarizes his recent presentation at the 2025 MERC Summit, where he acknowledges that educators are increasingly being asked to use data to make decisions, yet often do not have formal training on how to manage and work with that data. In the talk, Eric provides some quick tips that educators can apply in their data work to see immediate improvements.
3. Unsupervised [Randomly Responding] Survey Bot Detection: In Search of High Classification Accuracy
While online survey data collection has become popular in the social sciences, there is a risk of data contamination by computer-generated random responses (i.e., bots). Bot prevalence poses a significant threat to data quality. If deterrence efforts fail or were not set up in advance, researchers can still attempt to detect bots already present in the data. In this preprint, the authors study a recently developed algorithm to detect survey bots. In addition to the paper, the authors include several helpful supplemental materials, including HTML slide decks (named Part1 and Part2) from their workshop and a Shiny app that allows you to experiment with selecting different cutoff values for an outlier statistic to try to detect random responders.
4. Population Research UK – Skills Development for Managing Longitudinal Data for Sharing
Launched in October 2023 and completed in March 2025, this project sought to address critical skills gaps by providing high-quality training for data managers working with longitudinal studies. Key outputs and resources from this project include an Introductory RDM course, a Train-the-Trainer Package, Data Management Reference Guides, Synthetic Data Workshop Materials, and a tool for harmonizing data. All training materials are openly available via the Zenodo repository under the PRUK community.
5. Data Quality: The Cornerstone of Effective Data Governance and Analytics
Whether you’re implementing a data governance program or pursuing advanced analytics, data quality serves as the foundation for success. In this blog post, Nigel D’Souza explains why data quality is the cornerstone of data initiatives, reviews dimensions of data quality, and then provides a step-by-step guide for building a framework to ensure that data quality is systematically managed and monitored across your projects.
6. Data Ownership & Sharing Decision Tree
This flow chart is a research tool to help researchers decide 1) if they own the data they wish to share and 2) whether there are other factors beyond ownership that impact data sharing. This tool is especially helpful for those considering whether they can share human participant data. The PDF also links out to important background information and guidelines in certain decision pathways.
Oldies but Goodies
Older resources that are still helpful
1. Nine Simple Ways to Make it Easier to (Re)Use Your Data
This article describes nine simple ways to make it easy to reuse the data that you share and also make it easier to work with it yourself. The recommendations focus on making your data understandable, easy to analyze, and readily available to the wider community of scientists. One of my favorite parts of this article is the discussion around using good null values, a topic that I find there is little agreement on. Table 1 is an excellent table to refer to when deciding what null values to use in your data.
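To make the null value point concrete, here is a minimal sketch of how a numeric sentinel such as -999 can slip into an analysis when it is not declared as missing. This is not taken from the article: the data, the `read_scores` and `mean_observed` helper names, and the choice of -999 as the sentinel are all made up for illustration.

```python
import csv
import io
from statistics import mean

# Hypothetical data: row 2 uses a blank for missing, row 4 uses a -999 sentinel.
RAW = "id,score\n1,80\n2,\n3,90\n4,-999\n"


def read_scores(text, null_values=("", "NA")):
    """Read the score column, turning any declared null value into None."""
    scores = []
    for row in csv.DictReader(io.StringIO(text)):
        value = row["score"].strip()
        scores.append(None if value in null_values else float(value))
    return scores


def mean_observed(scores):
    """Average only the values that are not missing."""
    return mean(s for s in scores if s is not None)


# The blank is recognized as missing, but -999 is read as a real score.
naive = read_scores(RAW)
# Declaring the sentinel explicitly keeps it out of the calculation.
declared = read_scores(RAW, null_values=("", "NA", "-999"))

print(mean_observed(naive))
print(mean_observed(declared))
```

The first mean is dragged far below zero by the undeclared sentinel, while the second reflects only the two real scores. A blank cell cannot be mistaken for a number in the same way, which is one reason the choice of null value is worth making deliberately.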
2. Do No Harm Guide: Applying Equity Awareness in Data Privacy Methods
Researchers and organizations can increase privacy in datasets through methods such as aggregating, suppressing, or substituting random values. But these means of protecting individuals’ information do not always equally affect the groups of people represented in the data. In this guide, published by the Urban Institute, the authors completed a literature review of equity-focused work in statistical data privacy (SDP) and conducted interviews with nine experts on privacy-preserving methods and data sharing. This guide summarizes findings that can help advance equity in SDP.
3. Regular Expressions and Working with Text Strings
In this R-Ladies St. Louis recorded meetup, Luis D. Verde Arregoitia teaches us the building blocks of regular expressions and how we can use them in R to work with strings. Slides for the talk are also available online, and it’s worth noting that on slide 17, Luis provides links to several helpful online regex testers.
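For a quick taste of those building blocks, here is a small sketch that pulls a subject and a year out of variable names. The talk itself works in R; this example uses Python's built-in `re` module instead, and the column names and the `parse_score_columns` helper are invented for illustration.

```python
import re

# Hypothetical variable names, made up for this example.
COLUMNS = ["stu_id", "math_score_2023", "math_score_2024", "read_score_2024", "notes"]

# Building blocks used here:
#   ^ and $     anchor the match to the start and end of the string
#   [a-z]+      one or more lowercase letters (captured as the subject)
#   \d{4}       exactly four digits (captured as the year)
SCORE_PATTERN = re.compile(r"^([a-z]+)_score_(\d{4})$")


def parse_score_columns(columns):
    """Return (subject, year) pairs for names shaped like subject_score_YYYY."""
    parsed = []
    for name in columns:
        match = SCORE_PATTERN.match(name)
        if match:
            parsed.append(match.groups())
    return parsed


for subject, year in parse_score_columns(COLUMNS):
    print(subject, year)
```

Names that do not fit the pattern, such as stu_id and notes, are simply skipped, which makes a pattern like this a handy check on whether variable names follow a naming convention.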
4. The Quartz Guide to Bad Data
This guide is an exhaustive reference to problems seen in real-world data along with suggestions on how to resolve them. Originally created to help journalists identify problems with data they report on, this guide is also applicable to other fields. You’ll find potential solutions for problems ranging from missing values, to data stored in PDFs, to inexplicable outliers. Thank you to Jenny Bryan for recently sharing this resource.
Just for Fun
** Quick note: Next week’s newsletter (RDM Weekly Issue 7) will be out on Thursday rather than Tuesday!
Thank you for checking out the RDM Weekly Newsletter! If you enjoy this content, please like, comment, or share this post! You can also support this work through Buy Me A Coffee.