Welcome to Issue 5 of the RDM Weekly Newsletter!
The RDM Weekly Newsletter is one month old! Thank you to everyone who has subscribed so far! I’ve had a lot of fun pulling these resources together and I hope you’ve found something worthwhile in these first few weeks of content.
If you are new to RDM Weekly, this newsletter is divided into 3 categories:
☑️ What’s New in RDM?
These are resources that have come out within the last year or so
☑️ Oldies but Goodies
These are resources that came out over a year ago but continue to be excellent ones to refer to as needed
☑️ Just for Fun
A data management meme or other funny data management content
What’s New in RDM?
Resources from the past year
1. Podcast — IDEA: Improving Data Engagement and Advocacy
In this podcast, hosts Shannon Sheridan and Briana Ezray Wham bring you interviews with real-world data professionals who are engaging researchers in novel ways. In their most recent episode (#026), their guest is Amber Gallant, a Data Services Librarian at Royal Roads University, who shares details about her innovative project, Byte-Sized Data Encounters, a way of teaching data management skills when time is limited.
2. {datapasta} R Package
The {datapasta} package contains RStudio addins and functions that allow you to copy-paste data to and from your source editor, formatted for immediate use. In this video, Jenny Richmond demonstrates three ways to use the package to copy and paste data into your R environment.
3. The Open Science Cookbook
This book supplies the reader with recipes for improving how we practice, support, and teach open science in libraries. Whether you are new to open science or well versed in this topic, this book, written by academic library professionals and researchers, will provide inspiration, guidance, and practical steps for creating and supporting open library services. While this resource focuses on library services, I think it is a really useful resource for anyone championing open science at their institution.
4. Need to Update Your Data? Follow these Five Tips
When publicly sharing data, it is not uncommon to need to update those data at some point in the future. This Nature article discusses five practical tips for ensuring that users can track data lineage and that associated data processes remain reproducible when datasets need to be updated. The five tips discussed include choosing a suitable data repository, versioning your data (and citing the correct version), using clear file names, keeping a change log, and using common and interoperable data formats.
5. Dealing with Duplicative Data
This repository contains materials from a recent workshop given by Erin Grand where she reviewed common challenges with duplicate data and provided solutions for dealing with duplicates using R code. In addition to slides, this repository also contains an example script and raw data to practice with. Erin will also be giving a much briefer version of this talk at the New York Data Science & AI Conference in August.
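If you want a feel for the kind of checks involved before opening the workshop materials, here is a minimal sketch. To be clear, this is not Erin's code (her examples are in R); it is a small Python illustration using only the standard library. The records, the `id` field, and both helper functions are hypothetical. It separates two situations that often get lumped together: rows that are exact copies of each other, and rows that share an identifier but disagree on other values.

```python
from collections import Counter

# Hypothetical example records; "id" is assumed to be the unique identifier.
records = [
    {"id": 101, "name": "Ana", "score": 88},
    {"id": 102, "name": "Ben", "score": 92},
    {"id": 101, "name": "Ana", "score": 88},  # exact copy of the first row
    {"id": 103, "name": "Cy", "score": 75},
    {"id": 102, "name": "Ben", "score": 95},  # same id, conflicting score
]


def find_duplicate_ids(rows, key="id"):
    """Return the sorted identifiers that appear in more than one row."""
    counts = Counter(row[key] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


def drop_exact_duplicates(rows):
    """Keep the first occurrence of each fully identical row."""
    seen = set()
    unique = []
    for row in rows:
        # A sorted tuple of (field, value) pairs is hashable and ignores key order.
        fingerprint = tuple(sorted(row.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(row)
    return unique


duplicate_ids = find_duplicate_ids(records)
deduplicated = drop_exact_duplicates(records)
# Identifiers still repeated after removing exact copies need a human decision.
remaining_conflicts = find_duplicate_ids(deduplicated)
```

Exact copies can usually be dropped safely, while the identifiers left in `remaining_conflicts` point to rows that need someone to decide which value is correct.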
6. Best Practices: Wrapping Up Research Projects
This two-page infographic, created by the Prevention Research Center at WashU in St. Louis, offers best practices for closing out a research project. It includes considerations for data management and sharing, dissemination of findings, organizing files, and more.
Oldies but Goodies
Older resources that are still helpful
1. Error Tight: Exercises for Lab Groups to Prevent Research Mistakes
Scientists, being human, make mistakes. We transcribe things incorrectly, we make errors in our code, and we intend to do things and then forget. The consequences of errors in research may be as minor as wasted time and annoyance, but may be as severe as losing months of work or having to retract an article. The purpose of this tutorial is to help lab groups identify places in their research workflow where errors may occur and identify ways to avoid them.
2. Why It Takes a Village to Manage and Share Data
Data sharing is not merely an act of releasing a digital product to the world. Rather, data sharing has all the emergent properties of a complex system. Many stakeholders, knowledge infrastructures, motivations, practices, and policies interact in unpredictable ways. This paper reviews the complex nature of data sharing and offers perspectives on acknowledging data sharing as a collective challenge rather than solely an individual responsibility of principal investigators.
3. Data Validation in SPSS
While this video appears to have no sound, it still may be useful for SPSS users. It offers a glimpse into how someone can set up data validation in SPSS. Setting validation rules for things such as variable ranges, allowable values, variable types, or duplicate unique identifiers helps you systematically check that your data meet your quality expectations.
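For readers who do not use SPSS, the same four kinds of rules can be sketched in a few lines of Python. This is only an illustration of the idea, not a reproduction of what the video shows. The field names (`student_id`, `age`, `grade`), the allowed values, and the age range are all assumptions made up for the example.

```python
# Hypothetical rules: ages are whole numbers from 5 to 10, grades come from a fixed set.
ALLOWED_GRADES = {"K", "1", "2", "3"}
AGE_MIN, AGE_MAX = 5, 10

# Hypothetical data with a few deliberate problems.
rows = [
    {"student_id": "S01", "age": 7, "grade": "2"},
    {"student_id": "S02", "age": 12, "grade": "3"},      # age out of range
    {"student_id": "S02", "age": 8, "grade": "4"},       # bad grade, repeated id
    {"student_id": "S03", "age": "nine", "grade": "K"},  # age has the wrong type
]


def validate(records):
    """Return a list of (row index, field, problem) tuples."""
    problems = []
    seen_ids = set()
    for index, row in enumerate(records):
        # Variable type, then variable range (range only makes sense for integers).
        if not isinstance(row["age"], int):
            problems.append((index, "age", "not an integer"))
        elif not AGE_MIN <= row["age"] <= AGE_MAX:
            problems.append((index, "age", "out of range"))
        # Allowable values.
        if row["grade"] not in ALLOWED_GRADES:
            problems.append((index, "grade", "value not allowed"))
        # Duplicate unique identifiers.
        if row["student_id"] in seen_ids:
            problems.append((index, "student_id", "duplicate identifier"))
        seen_ids.add(row["student_id"])
    return problems


problems = validate(rows)
for index, field, problem in problems:
    print(f"row {index}: {field} -> {problem}")
```

Collecting every problem into one list, rather than stopping at the first failure, gives you a full report to review in a single pass.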
4. About Scientific Code Review
This guide from the Fred Hutch Data Science Lab gives readers an overview of the importance of code review. One highlight of this guide is its outline of project development steps, broken out by role. In addition to the high-level overview on the home page, each role has its own tab with further resources.
5. Data Elixir
Last but not least, I want to give a quick shout-out to Lon Riesberg, the founder/editor of the Data Elixir newsletter. For the last 10 years Lon has curated this newsletter, which provides a weekly dose of the top data science picks from around the web. It is a staple for many data enthusiasts. Last week (in Issue 542), Lon wrote that after 10 years of creating the weekly newsletter, he will be taking a break for a while (hopefully not forever). Thank you to Lon for 10 years of sharing resources!
Just for Fun
Thank you for checking out the RDM Weekly Newsletter! If you enjoy this content, please like, comment, or share this post! You can also support this work through Buy Me A Coffee.