RDM Weekly - Issue 030
A weekly roundup of Research Data Management resources.
Welcome to Issue 30 of the RDM Weekly Newsletter!
If you are new to RDM Weekly, the content of this newsletter is divided into 4 categories:
✅ What’s New in RDM?
These are resources that have come out within the last year or so
✅ Oldies but Goodies
These are resources that came out over a year ago but continue to be excellent ones to refer to as needed
✅ Research Data Management Job Opportunities
Research data management related job opportunities that I have come across in the past week. I have no affiliation with these jobs.
✅ Just for Fun
A data management meme or other funny data management content
What’s New in RDM?
Resources from the past year
1. From Recommendation to Action: A New Tool for Measuring Metadata Completeness
This blog post provides an overview of a new collaboration between GREI (the Generalist Repository Ecosystem Initiative) and Metadata Game Changers, in which they have developed a strategy and an accompanying toolset to help any repository visualize, measure, and enhance metadata completeness.
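To make “measuring metadata completeness” concrete, here is a minimal Python sketch of one possible completeness score: the share of required fields that are actually populated in a record. The field names are illustrative only, not the GREI/Metadata Game Changers checklist.

```python
# Illustrative required-field list; a real repository would use its own schema.
REQUIRED_FIELDS = ["title", "creator", "description", "license", "identifier"]

def completeness(record: dict) -> float:
    """Return the fraction of required fields with a non-empty value."""
    filled = sum(1 for field in REQUIRED_FIELDS if record.get(field))
    return filled / len(REQUIRED_FIELDS)

# Example record: two of five required fields are filled in.
record = {"title": "Survey data", "creator": "Doe, J.", "license": ""}
print(f"Completeness: {completeness(record):.0%}")  # prints "Completeness: 40%"
```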
2. Digital Defense: Library Cybersecurity Webinar Series
Recently shared by Sherpa Intelligence, this Spring 2026 webinar series is being offered by ByWater Solutions in partnership with Novare Library Services. It will be a six-part series featuring a variety of guest presenters speaking on a range of cybersecurity topics. While the sessions are library-focused, much of the content seems useful for anyone concerned about data security and privacy. The sessions are open to anyone, anywhere, and will be recorded. Each webinar session title is linked to its free Zoom registration page.
3. Dealing with Bots: Advice for Managers of Open Access Repositories (COAR)
Shared by the Digital Repository of Ireland, this new COAR resource builds on an earlier COAR report, The impact of AI bots and crawlers on open repositories (2025), which found that most repositories were experiencing a significant increase in traffic from bots. The impact of bot traffic ranged from regular service slowdowns to some instances of major service outages. ‘Bad’ bots (those engaging in data scraping, brute-force login attempts, scalping, DoS, spamming, etc.) gathering data for generative AI training appear to be the main source of the increase in traffic. The new resource covers both technical and social strategies that can be implemented by repositories of any size, from Terms of Service agreements to the deployment of firewalls and rate-limiting strategies, and will be a useful starting point for conversations with IT, data, and policy specialists.
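As a rough illustration of the rate-limiting idea (not code from the COAR resource), here is a minimal Python sketch of a sliding-window limiter that caps requests per client IP. In practice this is usually handled at the web server, proxy, or CDN layer rather than in application code.

```python
import time
from collections import defaultdict

class RateLimiter:
    """Allow each client IP at most `limit` requests per `window` seconds."""

    def __init__(self, limit: int = 60, window: float = 60.0):
        self.limit = limit
        self.window = window
        self.hits: dict[str, list[float]] = defaultdict(list)

    def allow(self, client_ip: str) -> bool:
        now = time.monotonic()
        # Keep only requests from this IP that fall inside the current window.
        recent = [t for t in self.hits[client_ip] if now - t < self.window]
        self.hits[client_ip] = recent
        if len(recent) >= self.limit:
            return False  # over the limit: reject or delay the request
        recent.append(now)
        return True

limiter = RateLimiter(limit=5, window=1.0)
print([limiter.allow("203.0.113.7") for _ in range(7)])  # last two are False
```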
4. A Day in the Life of a Research Data Head: Making your Research Future-Proof
How do you build a career around other people’s data? Not by hoarding it, selling it, or turning it into yet another app, but by quietly making sure it is findable, accessible, interoperable, reusable, and still understandable ten years from now. In a world where “data-driven” has become a reflex more than a choice, there is now an entire ecosystem of people whose job is to help researchers do data well. This post gives readers a glimpse into a day in the life of Dr. Yan Wang, Head of the Research Data & Software team at the TU Delft Library.
5. Please Switch to Python (Or R. Or Anything. Just Not Stata, SAS, SPSS, or MATLAB)
Stata, SAS, SPSS, and MATLAB are proprietary tools for statistical and mathematical analysis. You write code in them, though they have point-and-click interfaces as well. They’ve been around for decades, they’re taught in graduate programs, and a lot of people use them for analytical work. In this post, Abigail Haddad provides a compelling argument for why you should switch from those programs to Python (or other open-source, general-purpose programming languages such as R, JavaScript, etc.).
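For readers weighing the switch, here is a minimal Python sketch of the kind of analysis typically done in those packages: an ordinary least squares regression using pandas and statsmodels. The file and column names are hypothetical, not taken from Haddad’s post.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical CSV with an outcome and two predictors -- roughly the analysis
# a Stata user would run with `regress outcome predictor1 predictor2`.
df = pd.read_csv("study_data.csv")

# Formula-based OLS, similar in spirit to proprietary syntax but in an
# open-source, general-purpose language.
model = smf.ols("outcome ~ predictor1 + predictor2", data=df).fit()
print(model.summary())
```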
6. Mapping Public Open Access K-12 State Education Indicator Data Across 7 States and Washington D.C. Using the FAIR Data Principles
Despite longstanding calls and mandates to leverage public education data for informed decision-making, data reuse remains limited due in part to poor adherence to the FAIR data management principles. To address this challenge, the authors systematically identified and qualitatively coded public education datasets across the eight jurisdictions. The report includes the full metadata catalogs, Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow diagrams visualizing data sources, network visualizations of data-indicator relationships, and data ingestion protocols and coding schemes. This report provides both a replicable methodological framework and a practical resource for researchers, policymakers, and educators seeking to navigate public education data for decision-making focused on opportunities and achievement for all students.
7. What Emotions Bring to Managing, Caring for, and Sharing Qualitative Data
This paper presents the findings of a collaborative autoethnography designed to explore in depth how qualitative researchers relate to and care for their data. The results show that the emotions experienced when working with data are integral to the authors’ (responsible) research practices. These often negative emotions are intertwined with three ways in which the authors care for data: i) as a means of caring for research participants; ii) caring for data maintenance and infrastructure; and iii) caring for data’s quality and usefulness. The emotions and caring relations the authors identify are often in tension with common expectations for data sharing.
Oldies but Goodies
Older resources that are still helpful
1. Larger-Than-Memory Data Workflows with Apache Arrow
In this useR! 2022 tutorial you will learn how to use the arrow R package to create seamless engineering-to-analysis data pipelines: how to use interoperable data file formats like Parquet or Feather for efficient storage and data access, and how to exercise fine control over data types to avoid common data pipeline problems. During the tutorial you’ll process larger-than-memory files and multi-file datasets with familiar dplyr syntax and work with data in cloud storage. The tutorial doesn’t assume any previous experience with Apache Arrow; instead, it provides a foundation for using arrow, giving you access to a powerful suite of tools for analyzing larger-than-memory datasets in R.
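The tutorial itself uses the arrow R package with dplyr, but the same Arrow dataset API is available in Python via pyarrow. As a rough analogue of the workflow described above, here is a minimal sketch that queries a hypothetical directory of Parquet files without loading it all into memory; the path and column names are made up for illustration.

```python
import pyarrow.dataset as ds

# Hypothetical directory of Parquet files too large to load into memory at once.
dataset = ds.dataset("data/trips_parquet/", format="parquet")

# Column selection and the filter are pushed down to the scan, so only the
# needed columns and matching row groups are actually read.
table = dataset.to_table(
    columns=["passenger_count", "fare_amount"],
    filter=ds.field("passenger_count") > 2,
)

print(table.num_rows)
print(table.group_by("passenger_count").aggregate([("fare_amount", "mean")]))
```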
2. Let’s Talk About Joins
In the world of research, we often have multiple datasets, collected from different instruments, participants, or time periods, and our research questions typically require these data to be linked in some way. Yet, before combining data, it’s important to consider what type of join makes the most sense for our specific purposes, as well as how to correctly perform those joins. This blog post reviews the various ways we may consider combining our data.
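As a quick illustration of the distinction the post walks through, here is a minimal pandas sketch (with made-up example tables) contrasting an inner join and a left join.

```python
import pandas as pd

# Hypothetical tables: participant demographics and survey scores.
demographics = pd.DataFrame({"participant_id": [1, 2, 3], "age": [34, 29, 41]})
scores = pd.DataFrame({"participant_id": [2, 3, 4], "score": [88, 72, 95]})

# Inner join: keep only participants present in BOTH tables (ids 2 and 3).
inner = demographics.merge(scores, on="participant_id", how="inner")

# Left join: keep every demographics row; unmatched scores become NaN.
left = demographics.merge(scores, on="participant_id", how="left")

print(inner)
print(left)
```

Which join is “correct” depends entirely on the research question, which is exactly the point the blog post makes.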
3. Ten Simple Rules on How to Write a Standard Operating Procedure
Standards for data acquisition, analysis, and documentation have been fostered over the last decade, driven by grassroots initiatives of researchers and organizations such as the Research Data Alliance (RDA). Nevertheless, agreed procedures for complex routine research workflows are still largely missing in academic life science research. Well-crafted documentation such as standard operating procedures (SOPs) offers clear direction and instructions specifically designed to avoid deviations, an absolute necessity for reproducibility. This 2020 paper provides a standardized workflow that explains step by step how to write an SOP, to be used as a starting point for appropriate research documentation. An accompanying SOP template can be found here.
4. An Introduction to Common Data Elements
Common Data Elements (CDEs) are standardized survey questions, variables, and measures used in research and clinical studies to enhance the interoperability and quality of data coming from multiple sites and sources. The National Institutes of Health have identified required and recommended CDEs for biomedical research. ICPSR's Social, Behavioral, and Economic COVID Coordinating Center (SBE CCC) is working to expand the use of CDEs in social science disciplines. In this session, James McNally, Director of the National Archive of Computerized Data on Aging, gives an introduction to CDEs and their research applications.
Research Data Management Job Opportunities
These are data management job opportunities that I have seen posted in the last week. I have no affiliation with these organizations.
Just for Fun
Thank you for reading! If you enjoy this content, please like, comment, or share this post! You can also support this work through Buy Me A Coffee.



