Welcome to Issue 14 of the RDM Weekly Newsletter!
Last week I attended the Posit Conference, so for this issue I wanted to highlight some of my favorite data-management-related talks from this year's conference, as well as a few favorites from past conferences. The talks are shared in our usual three-category format:
☑️ What’s New in RDM?
These are talks from the 2025 Posit Conference. There are many more great talks worth checking out. You can view abstracts for all of them at the Posit Conference website.
☑️ Oldies but Goodies
These are talks from previous Posit (or RStudio) Conferences.
☑️ Just for Fun
A data management meme or other funny data management content
What’s New in RDM?
Slides from last week’s conference (talks were recorded and will be available to watch online via Posit’s YouTube Channel in the next few months).
1. AI Coding Assistants: Hype, Help, or Hindrance
AI coding assistants like ChatGPT, GitHub Copilot, and Codeium promise to revolutionize our coding workflows, but how useful are they in practice? Are they our new overlords here to take our jobs? Or just a passing gimmick? Rebecca Barter thinks the reality lies somewhere in between, and that understanding these tools is key to staying relevant in today’s rapidly evolving data science ecosystem.
In this talk, Rebecca shows how she used these AI tools in RStudio, Positron, and VS Code to speed up both her advanced R workflows as well as her learning experience as an intermediate Python programmer, providing examples, pitfalls, and best practices.
2. Bold indicates negative?
Over one billion people worldwide use spreadsheets to manage and analyze data, often styling cells and their contents to highlight or distinguish values. This formatting is often used to encode additional data (for example, indicating groups with colors), but most data science tools are unaware of data expressed as formatting. This talk summarizes Luis D. Verde Arregoitia’s progress addressing the gap between formatted spreadsheets and the modern data stack: first, by championing data organization best practices; then by bringing spreadsheet contents and their formatting together into R for further analysis (the unheadr package); and finally by developing tools (the forgts package) that translate formatting into data and convert formatted spreadsheets to gt objects.
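The workflow described above can be sketched roughly as follows. Treat this as an illustrative sketch rather than a definitive usage example: the file name is hypothetical, and the exact function signatures should be checked against the unheadr and forgts documentation.

```r
library(unheadr)
library(forgts)

# Pull meaningful cell formatting (bold, color, etc.) out of a spreadsheet
# and into the data itself, so it survives the trip into R.
# "study-data.xlsx" is a made-up example file.
annotated <- annotate_mf("study-data.xlsx", orig = value, new = value_annotated)

# Reproduce the formatted spreadsheet as a gt table for reporting.
styled_table <- forgts("study-data.xlsx")
```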
3. Teaching Data Sharing Through R Data Packages
Data science courses tend to teach students reproducible workflows. However, the origin of the data used in these workflows and the definitions of the variables are often not emphasized. This talk addresses this gap by focusing on how to teach students effective data sharing through the creation of R data packages. Kelly McConville explores how to leverage key packages, such as devtools and usethis, and demonstrates how to guide students in generating appropriate documentation through READMEs, help files, and vignettes. Furthermore, she discusses common pitfalls encountered when first learning to create R packages and proposes how to structure a project assignment where an R data package serves as the primary deliverable.
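For readers new to this workflow, a minimal data-package skeleton built with the usethis and devtools packages mentioned above might look like this (the package name and dataset are hypothetical):

```r
library(usethis)
library(devtools)

create_package("mydatapkg")      # scaffold a new package directory
use_r("penguin_counts")          # R script where the dataset gets documented

# Prepare the data and save it into data/ as an .rda file
penguin_counts <- data.frame(
  species = c("Adelie", "Gentoo"),
  n       = c(152, 124)
)
use_data(penguin_counts)

use_readme_md()                  # README describing the data's origin
use_vignette("penguin-counts")   # long-form documentation
document()                       # build help files from roxygen comments
```

Students then fill in the roxygen comments in `R/penguin_counts.R` to define each variable, which is exactly the metadata that tends to go missing.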
4. From R User to R Programmer
In this one-day workshop, Emma Rand and Ian Lyttle helped attendees improve their R programming skills and reduce the amount of duplication in their code through function writing and iteration. The two main goals of the class were to (a) master the art of writing functions that do one thing well and (b) learn how to perform the same action on many objects using code that is succinct and easy to read. While no recording will be available from this workshop, the openly available materials are still a great resource for anyone looking to improve their function-writing and iteration skills.
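The two goals pair naturally: write one small function, then iterate it over many objects. A minimal sketch of that pattern (using purrr for iteration; the data is made up):

```r
library(purrr)

# One function that does one thing well: rescale a numeric vector to [0, 1]
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Perform the same action on many columns without copy-pasting
df <- data.frame(a = c(1, 5, 10), b = c(2, 4, 8))
df[] <- map(df, rescale01)
```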
5. Practical {renv}
The {renv} package aims to *help* users create reproducible environments for R projects. In theory, this is great! In practice, restoring a package environment can be a frustrating process due to overlooked R configuration requirements. In this talk, Shannon Pileggi helped the audience better understand the sources of environment restoration issues and learn strategies for successfully maintaining {renv}-backed projects.
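For context, the core {renv} loop the talk builds on is small; most of the friction Shannon describes happens at the `restore()` step on a new machine:

```r
library(renv)

renv::init()      # create a project-local library and a lockfile (renv.lock)
# ...install and update packages as usual during the project...
renv::snapshot()  # record the exact package versions in renv.lock
renv::status()    # check for drift between the library and the lockfile
renv::restore()   # on another machine, reinstall the recorded versions
```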
6. Introducing surveydown: An Open-Source Platform for Programmable, Markdown-Based Surveys
In Issue 11 of this newsletter I shared a paper introducing the surveydown platform. This talk from John Paul Helveston provided a helpful high-level overview of this tool, which leverages the Quarto publication system and R Shiny web framework to create reproducible and interactive surveys. While most survey platforms rely on graphical interfaces or spreadsheets to define survey content, surveydown uses plain text (markdown and R code chunks), enabling version control and collaboration via tools like GitHub. It supports complex features like conditional skip logic, dynamic questions, and complex randomization, as well as a diverse set of question types and formatting options. The open-source package gives researchers full control over survey implementation and data storage, with reproducible workflows that integrate with R data analysis.
Oldies but Goodies
Presentations from prior year conferences
1. Don’t Repeat Yourself, Talk to Yourself! Reporting with R
If you’re responsible for analyses that need updating or repeating on a semi-regular basis, you might find yourself doing the same work over and over again. The principle of "don’t repeat yourself" from software engineering motivates us to use functions and packages, the core tools for reducing repetition in the R universe. For analyses, it can be difficult to know how to apply this principle and move beyond "copying and pasting scripts and changing the data file and the object names and updating the dates and results in RMarkdown", especially when there’s some element of human intervention required, whether it be for validating assumptions or cleaning artisanal data. In this 2020 talk, Sharla Gelfand focused on those next steps, showcasing opportunities to stop repeating yourself and instead anticipate the needs of and communicate effectively with your future self (or the next person with your job!) using project-oriented workflows, clever interactivity, templated analyses, functions, and yes, your own packages.
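As a taste of the templated-analysis idea, one function can replace a pile of near-identical scripts by pushing everything that changes into arguments. This is a hedged sketch, not Sharla's own code; the template and file names are hypothetical:

```r
# One function replaces many copy-pasted scripts: only the inputs change.
make_monthly_report <- function(data_file, report_date) {
  rmarkdown::render(
    "report-template.Rmd",                # parameterized RMarkdown template
    params = list(data_file = data_file, date = report_date),
    output_file = paste0("report-", report_date, ".html")
  )
}

make_monthly_report("sales-2020-03.csv", "2020-03-01")
```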
2. "Please Let Me Merge Before I Start Crying": And Other Things I've Said at The Git Terminal
This 2024 talk from Meghan Harris was geared towards those who may feel comfortable working independently with Git but need more confidence when working collaboratively. Just like novice drivers can learn to confidently (and safely!) merge onto (seemingly) intimidating highways, those new to collaborating with Git can also conquer Git merges with some exposure and preparation. This talk reviewed different ways R users can interact with Git, what Git merges and Git merge conflicts are, real-life examples of Git merges, advice on resolving Git merges, and suggestions for cleaner workflows to promote better Git merges.
3. Context is King
The quality of data science insights is predicated on the practitioner’s understanding of the data. Data documentation is the key to unlocking this understanding; with minimal effort, this documentation can be natively embedded in R data frames via variable labels. Variable labels seamlessly provide valuable data context that reduces human error, fosters collaboration, and ultimately elevates the overall data analysis experience. In this 2024 talk, as an avid, daily user of variable labels herself, Shannon Pileggi helped the audience discover new workflows to create and leverage variable labels in R.
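In R, a variable label is simply a `"label"` attribute on a column; the labelled package (one common way to manage them, shown here as an illustrative sketch with made-up data) provides convenient helpers:

```r
library(labelled)

df <- data.frame(wt = c(2.1, 3.4), ht = c(48, 52))

# Attach human-readable context directly to the columns
df <- set_variable_labels(
  df,
  wt = "Weight at birth (kg)",
  ht = "Height at birth (cm)"
)

var_label(df$wt)       # retrieve a label
attr(df$ht, "label")   # base R: labels live in the "label" attribute
```

Because the labels travel with the data frame, they show up in downstream tools (viewers, summary tables, plots) without any extra bookkeeping.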
4. We Converted our Documentation to Quarto
In this 2023 talk, Melissa Van Bussel describes converting documentation (i.e., Wiki, User Guide, Handbook) to Quarto. A year prior to the talk, her team's documentation, which had been created using Microsoft Word, was large and lacked version control. Scrolling through the document was slow, and, due to confidentiality reasons, only one person could edit it at a time, which was a significant challenge for their team of multiple developers. After realizing they needed a more flexible solution, they successfully converted their documentation to Quarto. In this talk, Melissa discusses her team’s journey converting to Quarto, the challenges they faced along the way, and tips and tricks for anyone else who might be looking to adopt Quarto too.
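For anyone weighing a similar move, the heart of a Quarto site is a single plain-text configuration file that lives happily under version control. A minimal, purely illustrative `_quarto.yml` (all names invented) might look like:

```yaml
# _quarto.yml: a minimal website-style documentation project
project:
  type: website
website:
  title: "Team Handbook"
  sidebar:
    contents:
      - index.qmd
      - user-guide.qmd
format:
  html:
    theme: cosmo
```

Each page is its own `.qmd` file, so multiple developers can edit different pages simultaneously and merge changes with Git, resolving the single-editor bottleneck described above.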
5. Save an Ocean of Time: Streamline Data Wrangling with R
In this talk from the RStudio Conference in 2022, Danielle Dempsey discussed how her organization had over 250 oceanographic sensors deployed around the coast of Nova Scotia, Canada. Together, they generated around 4 million rows of data every year. She was shocked to discover that her colleagues manually compiled, formatted, and analyzed these data using hundreds of Excel spreadsheets. This was highly time-consuming, error-prone, and lacked traceability. To improve this workflow, she developed an R package that reduced processing time by 95%. The package became integral to their data pipeline, including quality control, analysis, visualization, and report generation in RMarkdown. The resulting datasets proved invaluable to industry leaders looking to invest in Nova Scotia’s coastal resources.
Just for Fun
Thank you for checking out the RDM Weekly Newsletter! If you enjoy this content, please like, comment, or share this post! You can also support this work through Buy Me A Coffee.