useR! 2024: Full Schedule

In Person & Virtual
8 - 11 July, 2024

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for useR! 2024 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: All times in this schedule are listed in Central European Summer Time (CEST, UTC+2). The schedule is subject to change.

14:00 CEST

Tutorial: Web Scraping with rvest - Hadley Wickham, Posit

In this tutorial, you'll learn the basics of web scraping with the rvest package. We'll start with a discussion of the ethics of scraping and the basic structure of an HTML page. You’ll then learn about CSS selectors and how you can use them to identify the “rows” and “columns” of the data that you want to extract. Finally, you’ll write R code that uses the rvest package to turn web pages into tidy data frames. We'll also see how you can scrape paginated sites by combining rvest with httr2, and learn two techniques for scraping dynamic sites that generate HTML with JavaScript.

Speakers

Hadley Wickham

Chief Scientist, Posit

Hadley is Chief Scientist at Posit PBC, winner of the 2019 COPSS award, and a member of the R Foundation. He builds tools (both computational and cognitive) to make data science easier, faster, and more fun. His work includes packages for data science (like the tidyverse, which includes...

Monday July 8, 2024 14:00 - 17:30 CEST
Salzburg I

Interfaces with other programming languages, Tutorial

11:00 CEST

rtables: Modeling and Creating Complex Production-Grade Reporting Tables in R - Gabriel Becker

Tabular summaries are a crucial tool for exploring and describing complex data. The structure of reporting tables often goes well beyond a typical one- or two-way frequency table, as with the tables required in clinical trial reporting. rtables provides a production-ready, foundational framework for declaring and building complex structured tables. We will present three aspects of our work. First, we will connect reporting tables, faceted data visualizations, and the grammar of graphics, which many analysts will already be comfortable using. We will then showcase our table framework, which relies on these connections. Finally, we will illustrate the creation of realistic, non-trivial tables using rtables.

Speakers

Gabriel Becker

Statistical Computing Consultant

Gabe is a frequent collaborator with R-core, having contributed 7 novel features to R including proposing and subsequently working with Luke Tierney on the internal ALTREP framework. He is the author of multiple R packages, including the rtables package for creating reporting tables...

Tuesday July 9, 2024 11:00 - 11:20 CEST
Salzburg I

Data visualisation

11:20 CEST

TeX Typesetting in R Graphics: the {xdvir} Package - Paul Murrell, The University of Auckland

Text labels are essential components of any data visualisation, whether as titles, captions, axis labels, or general annotations. While it is possible to render text in R graphics, there are only limited facilities for typesetting text. As a simple example, there is no way in R graphics to lay out a paragraph of text with full justification. This talk will describe the {xdvir} package for R, which fills this gap by merging the sophisticated typesetting capabilities of the TeX system with R graphics.
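To make the idea of full justification concrete, here is a minimal Python sketch. It breaks a paragraph greedily into lines, then spreads extra spaces between words so that every line except the last fills the width exactly. It works under simplifying assumptions: a fixed-width (monospace) character model and a hypothetical `justify` helper. It is not how {xdvir} works. TeX itself uses the more sophisticated Knuth-Plass line-breaking algorithm with proportional fonts.

```python
def justify(text, width):
    """Greedy full justification in a monospace model: every line except the
    last is padded with extra spaces between words to exactly `width` chars."""
    lines, current = [], []
    for word in text.split():
        # Start a new line if adding this word (plus single spaces) overflows.
        if current and len(" ".join(current + [word])) > width:
            lines.append(current)
            current = [word]
        else:
            current.append(word)
    if current:
        lines.append(current)

    out = []
    for i, line in enumerate(lines):
        if i == len(lines) - 1 or len(line) == 1:
            # The last line (and any single-word line) stays left-aligned.
            out.append(" ".join(line))
            continue
        gaps = len(line) - 1
        spaces = width - sum(len(w) for w in line)
        base, extra = divmod(spaces, gaps)
        # Leftover spaces go to the leftmost gaps.
        pieces = [w + " " * (base + (1 if j < extra else 0))
                  for j, w in enumerate(line[:-1])]
        pieces.append(line[-1])
        out.append("".join(pieces))
    return out

for line in justify("the quick brown fox jumps over the lazy dog", 16):
    print(repr(line))
```

A greedy breaker decides each line in isolation. TeX instead optimises break points over the whole paragraph, which is one reason that delegating typesetting to TeX gives better-looking results.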

Speakers

Paul Murrell

Associate Professor, The University of Auckland

Paul Murrell is an Associate Professor in the Department of Statistics at The University of Auckland. He is a member of the R-core development team, mostly active in the graphics system, and has developed several extension packages for R, also mostly related to graphics and data...

Tuesday July 9, 2024 11:20 - 11:40 CEST
Salzburg I

Data visualisation

11:40 CEST

The Treachery of Images: Exploring the Interdependence Between Graphics, Statistics, and Interaction - Adam Bartonicek, The University of Auckland

With the rise of web technologies, interactive data visualizations have become a staple of data presentation. Yet, despite their growing popularity, researchers still point to the lack of a formalized pipeline for turning raw data into summary statistics. The cause of this lack may be a subtle yet profound issue: while we often treat statistics and graphical objects as independent, they are in fact deeply connected. Consider a typical stacked barplot. Many researchers have noted that stacking some summaries will produce a valid overall statistic (e.g. count, sum), whereas stacking others will not (e.g. mean). But what are the mathematical properties that make this possible? Are there other operators that can be stacked? The goal of this talk is to delve into the relationship between graphical objects, statistics, and interaction. Specifically, by discussing a handful of concepts from category theory, I hope to give you a new appreciation of the rich structure that lies beyond the figures we look at every day. Finally, this talk will also briefly introduce a new R package for interactive data exploration – plotscaper – which is an attempt to implement some of these ideas.
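The stacking question can be illustrated with a few lines of Python. The data are hypothetical and this is not code from plotscaper. Sums and counts of the parts add up to the sum and count of the whole, so their bars can be stacked, but group means cannot. Carrying a (sum, count) pair, which does combine by addition, recovers the overall mean. In algebraic terms, sum and count preserve the structure of concatenation (they are monoid homomorphisms), which is one way of naming the property the abstract asks about. Whether the talk uses exactly this framing is an assumption.

```python
from statistics import mean

# Hypothetical values for two groups drawn as segments of one stacked bar.
group_a = [2, 4, 6]
group_b = [10]
combined = group_a + group_b

# Sums stack: the segment heights add up to the statistic of the whole bar.
stacked_sum = sum(group_a) + sum(group_b)   # 12 + 10 = 22
assert stacked_sum == sum(combined)

# Means do not: 4 + 10 = 14, but the mean of all four values is 22 / 4 = 5.5.
stacked_mean = mean(group_a) + mean(group_b)
overall_mean = mean(combined)
assert stacked_mean != overall_mean

# A (sum, count) pair does combine by addition and recovers the overall mean.
def summarize(xs):
    return (sum(xs), len(xs))

def combine(s, t):
    return (s[0] + t[0], s[1] + t[1])

total = combine(summarize(group_a), summarize(group_b))
assert total[0] / total[1] == overall_mean
```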

Speakers

Adam Bartonicek

PhD Candidate/BSc (Hons), The University of Auckland

Adam is a PhD student at The University of Auckland, New Zealand, with a primary interest in interactive data visualization (under the primary supervision of Associate Professor Simon Urbanek). He is a keen user of R and also a fan of web technologies and Bayesian statistics...

Tuesday July 9, 2024 11:40 - 12:00 CEST
Salzburg I

Data visualisation

12:00 CEST

Educational Outcomes in Higher Education, from Boxplots to Dashboards via Mixed Effects: R Showcase - Jarek Bryk, University of Huddersfield

In my role at the University of Huddersfield, a mid-size institution in the north of England, I contribute to analyses and reports on students' educational outcomes and on the factors that affect the differential attainment of various groups of students across the entire institution. These reports present the composition of the student population and the state of their educational outcomes at various levels of the institutional hierarchy to various internal stakeholders. In this talk, I will highlight the varied applications of R's analytical, visualisation and reporting capabilities in a higher education context: from correlations between attendance and marks on individual modules, through parametrised reports on students' engagement with online materials, to mixed effects models that disentangle the contributions of socioeconomic status and prior qualifications to graduate outcomes, with a few maps thrown in as well. These applications demonstrate the versatility of R, used not only as a business intelligence tool but also as a "pedagogical intelligence" tool that helps us evaluate educational practice, focus our attention on challenges and direct support to improve students' outcomes.

Speakers

Jarek Bryk

Dr, University of Huddersfield

I am a molecular biologist/computational biologist who stumbled upon a data analyst role. I wear three hats at work: I teach data science, genomics and evolution to undergraduates; I study patterns of genetic variation in small mammals to learn about their evolution and demography...

Tuesday July 9, 2024 12:00 - 12:20 CEST
Salzburg I

Data visualisation

13:20 CEST

Manage ggplot Figures Using ggfigdone - Wenjie Sun, Institut Curie

When you prepare a presentation or a report, you often need to manage a large number of ggplot figures: changing the figure size, modifying the title, labels, themes, etc. It is inconvenient to go back to the original code to make these changes. ggfigdone provides a simple way to manage ggplot figures: you can add figures to a database and update them later using a CLI (command line interface) or GUI (graphical user interface). ggfigdone is at an early stage of development, and I'm looking for feedback and suggestions to improve the package. The package is available on GitHub: https://github.com/wenjie1991/ggfigdone

Speakers

Wenjie Sun

PostDoc, Institut Curie

As a Postdoctoral Researcher at the Institut Curie, he now concentrates on employing various statistical and computational methods to conduct DNA sequence-based cellular lineage tracing, integrating this with single-cell omics data.

Tuesday July 9, 2024 13:20 - 13:25 CEST
Salzburg I

Data visualisation, Lightning Talk

13:25 CEST

Pricing Analytics - Meena Saad, Bank of Montreal

Pricing analytics is an emerging field in which optimization models and forecasting go hand in hand. I've developed models that use proprietary data and market intelligence to keep optimizing market share through pricing, addressing the old question of volume versus margin: which to prioritize. Since the model was developed, the team has been able to guide business leaders in increasing profits while minimizing losses. In our talk we will discuss:
- What pricing analytics is
- Pricing challenges
- How to optimize pricing using R

Speakers

Meena Saad

Strategic Pricing Manager, Bank of Montreal

I am a Canadian working in the US, supporting BMO's US operations. I started my career with a Bachelor of Finance from the University of Ottawa, worked in treasury for a few years, and then realized the importance of developing a greater understanding of analytics. I then obtained my...

Tuesday July 9, 2024 13:25 - 13:30 CEST
Salzburg I

Economics + finance + insurance + business, Lightning Talk

13:30 CEST

Table Talk: Designing a Workflow for Reproducible Table Creation in R for Epidemiological Research - Reiko Okamoto, Bruyère Research Institute/Ottawa Hospital Research Institute

Summary tables are ubiquitous in scientific manuscripts reporting epidemiological and clinical studies. Table 1 often contains key demographic information about the study population, including the mean and standard deviation for continuous variables and frequency and proportion for categorical variables. Table 2 may present the association between the explanatory and outcome variables under investigation, and so on. Since there are endless ways (some more robust than others) to create analytical and summary tables in R, our research group wanted to define a workflow that could be adopted by colleagues of varying levels of proficiency in R to reproducibly create these tables from raw data. In this talk, I will share our approach in designing this workflow and what we discovered along the way. This will include a discussion on the current landscape of table-generating packages in R and how we overcame the limitations of existing software alongside other challenges (e.g., incorporating existing metadata). These strategies will not only be useful to researchers in epidemiology but also relevant for those in other health and social science disciplines.
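As a rough illustration of the kind of summary a Table 1 contains, the Python sketch below computes mean (SD) for continuous variables and n (%) for each level of a categorical variable. The `table_one` helper, the record layout and the data are hypothetical. The workflow described in the talk is built in R. It also has to handle issues such as missing values, metadata and formatting, which this sketch ignores.

```python
from collections import Counter
from statistics import mean, stdev

def table_one(rows, continuous, categorical):
    """Summarise records as (label, summary) pairs in a typical Table 1 layout:
    mean (sample SD) for continuous variables, n (%) for each categorical level."""
    out = []
    for var in continuous:
        values = [r[var] for r in rows]
        out.append((var, f"{mean(values):.1f} ({stdev(values):.1f})"))
    n = len(rows)
    for var in categorical:
        counts = Counter(r[var] for r in rows)
        for level in sorted(counts):
            pct = 100 * counts[level] / n
            out.append((f"{var}: {level}", f"{counts[level]} ({pct:.1f}%)"))
    return out

# Hypothetical study population with one continuous and one categorical variable.
rows = [
    {"age": 30, "sex": "F"},
    {"age": 40, "sex": "M"},
    {"age": 50, "sex": "F"},
]
for label, summary in table_one(rows, ["age"], ["sex"]):
    print(f"{label:<10} {summary}")
```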

Speakers

Reiko Okamoto

Methodologist, Bruyère Research Institute/Ottawa Hospital Research Institute

Reiko has a background in the life sciences with experience conducting data analysis in academia and the public sector. She is always eager to make analysis more open, transparent, and reproducible. Originally from the west coast of Canada, she completed a BSc in Microbiology and...

Tuesday July 9, 2024 13:30 - 13:35 CEST
Salzburg I

Biostatistics + epidemiology + bioinformatics, Lightning Talk

13:35 CEST

BayesCVI: A Bayesian Cluster Validity Index - Nathakhun Wiroonsri, King Mongkut's University of Technology Thonburi

Selecting the appropriate number of clusters is a critical step in applying clustering methods. To assist in this process, various cluster validity indices (CVIs) have been developed. These indices are designed to identify the optimal number of clusters within a dataset. However, users may not always seek the absolute optimal number of clusters but rather a secondary option that better aligns with their context. This realization has led us to introduce a Bayesian cluster validity index (BCVI), which builds upon existing indices. The BCVI uses a Dirichlet prior, resulting in a posterior distribution of the same family. We evaluate the BCVI using the Wiroonsri index for hard clustering and the WP index for soft clustering as underlying indices. We compare the performance of the BCVI with that of the original underlying indices and several other existing CVIs, including the DB, STR, XB, and KWON2 indices. Our BCVI offers clear advantages in situations where users can specify their desired range for the final number of clusters. Additionally, we showcase the practical applicability of our approach on MRI images. These tools are also published as a new R package, BayesCVI, available on CRAN.

Speakers

Nathakhun Wiroonsri

Assistant Professor, King Mongkut's University of Technology Thonburi

Nathakhun Wiroonsri earned his B.Sc. in Mathematics with first-class honors from Chulalongkorn University, Master of Financial Mathematics from North Carolina State University, and Ph.D. in Applied Mathematics from the University of Southern California in 2010, 2013, and 2018, respectively...

Tuesday July 9, 2024 13:35 - 13:40 CEST
Salzburg I

Machine learning and AI, Lightning Talk

13:40 CEST

Adding the Missing Audit Trail to R - Magnus Mengelbier, Limelogic AB

R is increasingly used across the Life Science industry for GxP workloads. The basic architecture of R makes it nearly impossible to add a generic audit trail method and mechanism for all use cases. Different strategies have been developed to provide some level of auditing, from logging conventions to file system audit utilities, but each has its drawbacks and lessons learned.

The ultimate goal is to provide an immutable audit trail compliant with ICH Good Clinical Practice, FDA 21 CFR Part 11 and EU Annex 11, regardless of the R environment. We consider different approaches to implementing audit functionality in R, and how an audit trail can be incorporated natively in R or through existing external tools and utilities that fully support Life Science best practices, processes and standard procedures for analysis and reporting.
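A hash chain is one generic building block often used for tamper-evident logs. Each record stores the hash of the previous record, so editing or removing any earlier entry invalidates every later hash. The Python sketch below shows the idea with hypothetical field names. It is not the approach presented in the talk. On its own it does not make a system compliant with 21 CFR Part 11 or Annex 11, which also cover time-stamping, access control and record retention.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record

def record_hash(user, action, prev):
    """SHA-256 over a canonical JSON encoding of the record's fields."""
    payload = json.dumps({"user": user, "action": action, "prev": prev}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def append_entry(log, user, action):
    """Append a record that chains to the hash of the previous record."""
    prev = log[-1]["hash"] if log else GENESIS
    entry = {"user": user, "action": action, "prev": prev}
    entry["hash"] = record_hash(user, action, prev)
    log.append(entry)
    return entry

def verify(log):
    """Return True only if every record and every link in the chain is intact."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev:
            return False
        if record_hash(entry["user"], entry["action"], entry["prev"]) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True

# Hypothetical usage: log two actions, then tamper with the first one.
log = []
append_entry(log, "analyst1", "read adsl.csv")
append_entry(log, "analyst1", "write table_14_1.rtf")
print(verify(log))
log[0]["action"] = "read adsl_v2.csv"
print(verify(log))
```

A hash chain is tamper-evident, not tamper-proof. Anyone able to rewrite and rehash the whole chain can forge a consistent one, so in practice the chain (or its latest hash) has to be stored somewhere analysts cannot modify.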

Speakers

Magnus Mengelbier

Managing Director, Limelogic AB

Magnus is currently the Managing Director of Limelogic, a contributor, collaborator and independent consultant based in southern Sweden with over 25 years of experience in the Life Science industry. A keen advocate of simple programming approaches with a focus on GxP, compliance...

Tuesday July 9, 2024 13:40 - 13:45 CEST
Salzburg I

R workflow + deployment + production, Lightning Talk

13:45 CEST

Using R with SQL Server 2022 and Power BI - Tomaž Kaštrun

This talk explores the use of R with SQL Server, covering the benefits, challenges, and practical applications of harnessing R's statistical computing capabilities directly within the SQL Server environment.

By leveraging R scripts and functions seamlessly within Power BI, data professionals can gain access to a powerful toolkit for advanced analytics, predictive modelling, and machine learning.

Both integrations facilitate the execution of complex statistical analyses directly on large datasets stored in SQL Server databases or in Power BI (VertiPaq), eliminating the need for data movement and enabling additional insights.

Speakers

Tomaž Kaštrun

Mr.

Tomaž Kaštrun is a SQL Server developer and data scientist with more than 15 years of experience in the fields of business warehousing, development, ETL, database administration, and query tuning. He also has over 15 years of experience in data analysis, data mining, statistical research...

Tuesday July 9, 2024 13:45 - 13:50 CEST
Salzburg I

Interfaces with other programming languages, Lightning Talk

14:10 CEST

Analyzing Real-World Geospatial Networks in R for Sustainable Transport Planning - Lucas van der Meer & Lorena Abad, University of Salzburg

Geospatial networks are graphs embedded in geographical space. They can be used to represent, analyze and model a variety of real-world complex systems. A motivating example is urban transport systems with their ongoing transition towards a sustainable design and increased focus on active travel. Streets, their surroundings, and their interconnections form the geospatial network. The analysis often involves an assessment of transport accessibility: how well does the network connect people to the places they want to go to? This talk will cover three main stages of such an analysis, and its implementation in R. First, we show how to import street geometries and amenity datasets from OpenStreetMap, using the packages {osmdata} and {osmextract}. Second, we show how to build a clean and routable street network from these data, using the package {sfnetworks}. Finally, we give an example of how to compute bicycle accessibility to different amenities, taking into account the suitability of the network for cycling. Although we focus on the application domain of transport planning, the content is meant to be useful for anyone interested in analyzing real-world geospatial networks in R.
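The accessibility step can be framed as a shortest-path problem in which less bike-friendly streets count as "longer". The Python sketch below runs Dijkstra's algorithm with a perceived cost of length divided by a suitability score in (0, 1]. It then lists the amenities that fall within a cost budget. The street graph, the suitability scores and this particular cost formula are all hypothetical assumptions for illustration. They are not the {sfnetworks} API, nor necessarily the measure used in the talk.

```python
import heapq

def cycling_costs(graph, source):
    """Dijkstra over edges (neighbour, length_m, suitability in (0, 1]).
    Less suitable streets cost more: cost = length / suitability."""
    dist = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        for nbr, length, suitability in graph.get(node, []):
            nd = d + length / suitability
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(queue, (nd, nbr))
    return dist

def reachable_amenities(graph, source, amenities, budget):
    """Names of amenities whose perceived cycling cost is within the budget."""
    dist = cycling_costs(graph, source)
    return sorted(a for a, node in amenities.items()
                  if dist.get(node, float("inf")) <= budget)

# Hypothetical undirected street graph (each edge listed in both directions).
graph = {
    "home":   [("a", 400, 1.0), ("b", 300, 0.5)],
    "a":      [("home", 400, 1.0), ("school", 500, 1.0)],
    "b":      [("home", 300, 0.5), ("shop", 200, 0.25)],
    "school": [("a", 500, 1.0)],
    "shop":   [("b", 200, 0.25)],
}
dist = cycling_costs(graph, "home")
reachable = reachable_amenities(graph, "home", {"school": "school", "shop": "shop"}, budget=1000)
print(dist, reachable)
```

Here the shop is only 500 m away by distance. Its route uses streets rated as poorly suited to cycling, so it falls outside the budget, while the school (900 m on good streets) does not.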

Speakers

Lorena Abad

MSc., University of Salzburg

Doctoral researcher at the Department of Geoinformatics - Z_GIS of the University of Salzburg. Part of the research groups Risk, Hazard and Climate and EO Analytics. I focus on the analysis of big Earth observation data to map and monitor landscape dynamics and I am researching the...

Lucas van der Meer

MSc, University of Salzburg

Lucas van der Meer is a doctoral researcher in Geoinformatics at the University of Salzburg. He holds a bachelor in Environmental & Infrastructure Planning, and a master in Geospatial Technologies. He is particularly interested in the application of geospatial data science to address...

Tuesday July 9, 2024 14:10 - 14:30 CEST
Salzburg I

Spatial data and maps

14:30 CEST

Interfacing QGIS Spatial Processing Algorithms from R - Floris Vanderhaeghe, Research Institute for Nature and Forest (INBO) (Brussels, Belgium)

R is a powerful language for processing, analyzing and visualizing spatial data, with packages such as sf, terra, and stars. However, dedicated geographic information system (GIS) software tools offer thousands of specific algorithms that are either not available in R, or may be faster than equivalent R functions. This presentation describes how it is now possible to combine the strengths of R and QGIS, the most popular open source GIS platform, through R packages that interface QGIS processing algorithms: qgisprocess and qgis. These packages allow users to create data processing pipelines that combine R and QGIS algorithms seamlessly. We discuss the current state of these R packages and demonstrate the usage of their most important functions by example. We show the usage of qgis_search_algorithms(), qgis_run_algorithm(), qgis_extract_output(), coercion methods and more. We highlight recent updates in QGIS that improve functionality in R. Finally, we seek feedback from the community and invite contributions.

Speakers

Floris Vanderhaeghe

Dr., Open Science Methodologist, Research Institute for Nature and Forest (INBO) (Brussels, Belgium)

Floris Vanderhaeghe is a biologist specialized in scientific methodology, with a focus on spatial survey design. Together with his teammates, he promotes the implementation of open science practices at INBO. He has a special interest in geospatial computation in R and likes to collaborate...

Tuesday July 9, 2024 14:30 - 14:50 CEST
Salzburg I

Spatial data and maps

14:50 CEST

sfislands: An R Package for Accommodating Islands and Disjoint Zones in Areal Spatial Modelling - Kevin Horan, Maynooth University

Fitting areal spatial models can be a cumbersome task, particularly when the geographical units are not well-behaved. The presence of islands, for example, gives rise to particular issues when creating neighbourhood structures based on contiguity. Further complications can arise from the presence of other natural barriers such as rivers and mountains, or man-made connections such as bridges, tunnels and ferry crossings. In order to create what a researcher considers to be an appropriate neighbourhood structure, incorporating all of the domain knowledge they might have about the system, it should be simple and intuitive to add and remove connections between spatial units. Using examples ranging from Indonesian earthquakes to London's River Thames, this session demonstrates a package that streamlines the human workflow involved both in setting up neighbourhood structures for spatial models and in extracting predictions from subsequent models. The package places a heavy emphasis on visualising both neighbourhood structures and model predictions, and this will be reflected in the examples.
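The neighbourhood-editing problem can be sketched independently of any GIS. Below, a contiguity structure is a dictionary of neighbour sets. An island shows up as a unit with no neighbours and as a separate connected component. A manually added link, for example a ferry route, joins it to the rest. The unit names and helper functions are hypothetical. {sfislands} itself works on spatial objects in R, and this is not its interface.

```python
def components(neighbours):
    """Connected components of an undirected neighbour structure (dict of sets)."""
    seen, comps = set(), []
    for start in neighbours:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(neighbours[node] - comp)
        seen |= comp
        comps.append(comp)
    return comps

def add_link(neighbours, a, b):
    """Add a symmetric connection, e.g. a bridge or ferry route."""
    neighbours[a].add(b)
    neighbours[b].add(a)

def remove_link(neighbours, a, b):
    """Remove a symmetric connection, e.g. across an uncrossable barrier."""
    neighbours[a].discard(b)
    neighbours[b].discard(a)

# Hypothetical contiguity structure: three mainland units and one island.
neighbours = {
    "north": {"centre"},
    "centre": {"north", "south"},
    "south": {"centre"},
    "island": set(),
}
isolated = [unit for unit, nb in neighbours.items() if not nb]
n_before = len(components(neighbours))
add_link(neighbours, "island", "south")  # hypothetical ferry route
n_after = len(components(neighbours))
print(isolated, n_before, n_after)
```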

Speakers

Kevin Horan

PhD researcher, Maynooth University

Kevin Horan is a third-year PhD researcher in the Science Foundation Ireland Centre for Research Training in Foundations of Data Science at Maynooth University.

Tuesday July 9, 2024 14:50 - 15:10 CEST
Salzburg I

Spatial data and maps

15:10 CEST

Wavelet Secure Maps: Enhancing Privacy Protected Maps - Edwin de Jonge, Statistics Netherlands

We present a novel privacy protection method for spatial density maps based on wavelet multi-resolution analysis (MRA). sdcSpatial is an R package designed to create spatial density maps while protecting the privacy of the observations involved. It contains several protection methods, which work well but may create a suboptimal density map, because the spatial resolutions of urban and rural areas are often very different. Wavelet Secure Maps is a novel method that uses multi-resolution analysis to derive a spatial density map that adapts to the local spatial resolution. The presentation will introduce the method and its application using the upcoming update of sdcSpatial.

Speakers

Edwin de Jonge

Statistics Netherlands

Edwin de Jonge is a research and statistical consultant who has worked at Statistics Netherlands for more than 25 years. He has a background in theoretical and computational physics and long experience in methodological research, including data cleaning, visualization and network analysis...

Tuesday July 9, 2024 15:10 - 15:30 CEST
Salzburg I

Spatial data and maps

15:30 CEST

Boost Spatial Data Science Workflows with GRASS GIS and R - Veronica Andreo, Center for Geospatial Analytics, North Carolina State University

GRASS GIS is a powerful geoprocessing engine that offers a robust and mature toolset for diverse applications. The core distribution brings together more than 500 tools for spatial and temporal analysis of vector, raster, 3D raster and imagery data. GRASS was developed for speed and efficiency, which allows workflows to scale to massive datasets with relative ease. At the same time, R excels at statistical analysis, modeling and data visualization. The spatial community within R has grown significantly in the last decade, with the rise of packages like sf, stars, gdalcubes, terra, mapview and tmap, among many others. The beauty of open source software is that we do not need to reinvent the wheel each time. Instead, we can join forces to build bridges that connect our individual strengths. In this talk, I’ll stand on the shoulders of giants to demonstrate how combining GRASS GIS and R through the rgrass package can help us integrate and streamline our spatial data engineering and data science workflows for scientific and operational applications.

Speakers

Veronica Andreo

Dr., Center for Geospatial Analytics, North Carolina State University

Veronica Andreo holds a PhD in Biology and an MSc in Remote Sensing and GIS Applications. She is part of the GRASS Dev Team and has served as PSC chair since 2021. She is currently working at the Center for Geospatial Analytics at North Carolina State University (USA) within an NSF...

Tuesday July 9, 2024 15:30 - 15:50 CEST
Salzburg I

Spatial data and maps

11:30 CEST

Forecast Reconciliation Made Easy: The FoReco Package - Daniele Girolimetto, Department of Statistical Sciences, University of Padova

Forecast reconciliation is a post-forecasting approach that ensures the coherence of forecasts across constraints (not just simple aggregation). It harmonizes individual predictions to meet predefined relationships, leading to a consistent and comprehensive picture. This can include ensuring that market share forecasts for different brands sum up to the total, or guaranteeing some property (e.g., non-negativity). By incorporating these constraints, reconciliation can also improve forecast accuracy by leveraging the strengths of the individual forecasts. This technique finds applications in several fields, such as finance, supply chain, macroeconomics, load, renewable energy generation, and weather forecasting. The R package FoReco provides a powerful toolset for implementing classical and regression-based forecast reconciliation. It offers a wide range of approaches to address different types of constraints, including cross-sectional (e.g., market share), temporal (e.g., annual-monthly data), and cross-temporal relationships. This talk presents an overview of the forecast reconciliation process and provides examples of using FoReco in real-world applications. https://github.com/danigiro/FoReco
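The simplest cross-sectional case is a total T that must equal A + B. Here the classical OLS reconciliation projects the base forecasts ŷ onto the coherent subspace: ỹ = S(SᵀS)⁻¹Sᵀŷ, where S is the summing matrix that maps the bottom-level series to all series. The pure-Python sketch below applies that formula to hypothetical base forecasts, using exact fractions. It illustrates one classical method and is not FoReco code. FoReco covers many more approaches, including the temporal and cross-temporal cases mentioned in the abstract.

```python
from fractions import Fraction

def solve(m, v):
    """Solve the square system m x = v exactly by Gauss-Jordan elimination."""
    n = len(m)
    a = [[Fraction(x) for x in row] + [Fraction(v[i])] for i, row in enumerate(m)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if a[r][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [x / p for x in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[n] for row in a]

def ols_reconcile(S, base):
    """OLS reconciliation: y_tilde = S (S'S)^-1 S' y_hat."""
    rows, k = len(S), len(S[0])
    StS = [[sum(S[i][p] * S[i][q] for i in range(rows)) for q in range(k)] for p in range(k)]
    Sty = [sum(S[i][p] * base[i] for i in range(rows)) for p in range(k)]
    bottom = solve(StS, Sty)  # reconciled bottom-level forecasts
    return [sum(S[i][j] * bottom[j] for j in range(k)) for i in range(rows)]

# Summing matrix for T = A + B; rows are (T, A, B), columns are (A, B).
S = [[1, 1],
     [1, 0],
     [0, 1]]
base = [100, 60, 30]  # hypothetical incoherent base forecasts: 60 + 30 != 100
rec = ols_reconcile(S, base)
print([float(x) for x in rec])
```

The base forecasts are incoherent (60 + 30 = 90, not 100). The reconciled values 290/3 ≈ 96.67, 190/3 ≈ 63.33 and 100/3 ≈ 33.33 satisfy the constraint exactly.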

Speakers

Daniele Girolimetto

Postdoctoral researcher in Statistics, Department of Statistical Sciences, University of Padova

Daniele Girolimetto is a postdoctoral researcher in the Department of Statistical Sciences at the University of Padova. His research interests are related to time series including statistical methods (univariate/multivariate forecasting approaches, bootstrap methods, applications...

Wednesday July 10, 2024 11:30 - 11:50 CEST
Salzburg I

Predictive modelling and forecasting

11:50 CEST

Dynamic Prediction with Numerous Longitudinal Covariates - Mirko Signorelli, Leiden University

To make informed decisions, clinicians and patients rely on accurate predictions of the probability of experiencing adverse events such as dementia, cancer or death. Dynamic prediction models can update the probability of experiencing an event as more longitudinal data are collected. However, traditional joint modelling is computationally infeasible with more than a handful of longitudinal covariates, and until recently R lacked a package that could deal with numerous longitudinal covariates. The R package pencal uses a penalized regression calibration approach that overcomes this limitation. It employs mixed-effects models to summarize the evolution of the longitudinal covariates, and a penalized Cox model to predict survival. Besides covering estimation, the package comprises functions to compute predicted survival probabilities for new subjects and to validate model performance. For large datasets, pencal enables easy parallelization: the number of cores can be specified as an argument of its functions. Reference: Signorelli, M. (2023). pencal: an R Package for the Dynamic Prediction of Survival with Many Longitudinal Predictors. Preprint: arXiv:2309.15600

Speakers

Mirko Signorelli

Assistant professor, Leiden University

Mirko Signorelli is assistant professor of Statistics at Leiden University, where he develops new statistical models, creates R packages, and teaches courses on R, computational statistics and longitudinal data analysis. His research focuses on statistical models for longitudinal...

Wednesday July 10, 2024 11:50 - 12:10 CEST
Salzburg I

Predictive modelling and forecasting

12:10 CEST

Tidymodels: Now Also for Time-to-Event Data! - Hannah Frick, Posit

The tidymodels framework is a collection of packages for modeling and machine learning using tidyverse principles. In addition to regression and classification, it now also supports censored regression for time-to-event data. This type of data with potential censoring requires dedicated models and performance metrics from the field of survival analysis. While the censored package has made survival models available for a while, the recent addition of survival metrics to the yardstick package has enabled us to support this type of analysis across the entire framework. The same ease of use and vast functionality, from resampling and feature engineering to tuning, is now available for this additional modeling problem.
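Harrell's concordance index, one common survival metric, shows why censored data need dedicated metrics. Only pairs in which the earlier time is an observed event are comparable, so censored subjects contribute only as the longer-surviving member of a pair. The Python sketch below is a plain implementation with hypothetical data. It is not the yardstick implementation and does not treat tied event times specially.

```python
def harrell_c(times, events, predicted_times):
    """Harrell's concordance index for right-censored data.

    A pair (i, j) is comparable when subject i had the event (events[i] == 1)
    and an earlier observed time than j. It is concordant when the model also
    predicts a shorter survival time for i. Tied predictions count as 0.5.
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if predicted_times[i] < predicted_times[j]:
                    concordant += 1.0
                elif predicted_times[i] == predicted_times[j]:
                    concordant += 0.5
    return concordant / comparable

# Hypothetical data: the third subject (time 6) is censored.
c = harrell_c(times=[2, 4, 6, 8], events=[1, 1, 0, 1], predicted_times=[1, 5, 3, 9])
print(c)
```

With these data, 5 pairs are comparable and 4 of them are concordant, giving 0.8.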

Speakers

Hannah Frick

Senior Software Engineer, Posit

Hannah Frick is a software engineer on the tidymodels team at Posit. She holds a PhD in statistics and has worked in interdisciplinary research and data science consultancy. She is a co-founder of R-Ladies Global.

Wednesday July 10, 2024 12:10 - 12:30 CEST
Salzburg I

Predictive modelling and forecasting

15:00 CEST

Getting the Most Out of Test-Driven Development for Shiny - Jakub Sobolewski, Appsilon

Tests are not only a way of catching bugs but also a way of building software. During the talk, I’ll share how we can use Test-Driven Development to build Shiny apps. We’ll start with tips on gathering requirements in a format that is easy to translate into test cases. Implementing requirements as automated tests helps us gain confidence that we’ve built the correct code. This is crucial when using Shiny in an enterprise setting, where producing incorrect results may cost dearly. I’ll introduce patterns that help us separate test code from implementation details, making tests more durable. We’ll discuss ways to shape test code into specifications that read almost like natural language. Furthermore, we’ll talk about how to use shinytest2 effectively and what the alternatives are for robust testing of Shiny apps. Even if you don’t plan to adopt Test-Driven Development, you’ll be able to reuse the same patterns to produce more durable tests that document the app's behavior.

Speakers

Jakub Sobolewski

Mr, Appsilon

Jakub is a senior engineer at Appsilon with a background in applied physics. Before joining Appsilon, he worked as an analyst in insurance and banking. He maintains the shiny.react and shiny.fluent packages.

Wednesday July 10, 2024 15:00 - 15:20 CEST
Salzburg I

Efficient programming

15:20 CEST

What They Forgot to Teach You About Shiny Development in Production - Mohamed El Fodil Ihaddaden, HDI AG

The amount of documentation about Shiny keeps growing. The official Shiny website offers a nice interface with live examples, and many books and video tutorials have been published on the subject. However, documentation and freely available knowledge about Shiny development in a production context remain scarce. As a result, many Shiny developers, even experienced ones, find themselves confused, at least at first, when they start working at a company that makes money using Shiny. Indeed, when developing Shiny applications in production, there is a set of rules that should be respected to ensure the smoothest experience possible, both for the developers themselves and for the stakeholders. By stakeholders, I don't mean only the users, but everyone who interacts directly or indirectly with the app; for example, someone might need an output that is generated through the app. Through my experience as a Shiny developer at a company that uses Shiny extensively in production, I would like to think that I have gathered a useful body of knowledge that will help anyone improve their Shiny development process.

Speakers

Fodil Ihaddaden

Analytics Engineer, HDI AG

R enthusiast from Algeria and based in Hamburg, Germany. Working as an R/Shiny developer.

Wednesday July 10, 2024 15:20 - 15:40 CEST
Salzburg I

Efficient programming

15:40 CEST

Mastering Plumber Structure: Your API's Solutions. - Adam Forys & Magdalena Krochmal, Roche

This presentation offers a comprehensive analysis of the structural design patterns within the Plumber API framework. We will explore a spectrum of approaches, ranging from fundamental implementations to sophisticated techniques such as 'Plumber as code' and 'Plumber as package.' Each structure will be examined for its advantages and best-use scenarios. Finally, we will provide guidance on selecting the most suitable API structure based on a development team's skills and project requirements.

Speakers

Adam Forys

Principal Data Scientist, Roche

Adam is a Principal Data Scientist at Roche. He is dedicated to building R packages that empower teams working on SDTM. He is committed to collaboration and enjoys guiding others in overcoming technical obstacles and optimizing their data science workflows.

Magdalena Krochmal

Senior Data Scientist, Roche

Magdalena Krochmal is a Senior Data Scientist based in Basel, Switzerland. With a background in biomedical engineering and a Ph.D. in bioinformatics, she has spent three impactful years at Roche. Magdalena is an expert R developer specializing in SDTM automation. Her work centers...

Wednesday July 10, 2024 15:40 - 16:00 CEST
Salzburg I

Efficient programming

11:30 CEST

R Evolution: The Retirement of R Packages with Many Reverse Dependencies - Edzer Pebesma, University of Muenster & Roger Bivand, Norwegian School of Economics

We report on a project in which three older R packages for spatial analysis were taken off CRAN on Oct 16, 2023, because their maintainer retired and more modern approaches (e.g., sf and terra) had superseded them. The three packages were rgdal, for reading and writing vector and raster data and for coordinate transformation; rgeos, for geometric transformations and predicates; and maptools. A very large number of R packages depended on one or more of these packages, directly or indirectly. To keep those packages from also being removed from CRAN, we took a number of steps. In this talk we describe the steps we took to minimize lasting damage to other packages on CRAN, and report on the lessons learnt. Over the course of the project, the number of at-risk packages decreased from more than 550 to fewer than 100. Of the 70 or so packages still on an at-risk watch-list, about half were actively archived by their maintainers as outdated. We will discuss key takeaways and advice for developers considering software retirement, and propose a mechanism for deprecating packages with a deprecation date, which could, for instance, show up as a NOTE when checking packages that use them.

Speakers

Roger Bivand

Norwegian School of Economics

Retired

Edzer Pebesma

University of Muenster

I lead the spatio-temporal modelling laboratory at the institute for geoinformatics, and am deputy head of institute. I hold a PhD in geosciences, and am interested in spatial statistics, environmental modelling, geoinformatics and GI Science, semantic technology for spatial analysis...

Thursday July 11, 2024 11:30 - 11:50 CEST
Salzburg I

Community and outreach

11:50 CEST

Fifteen Years of the R Journal - Mark van der Loo, Statistics Netherlands

The first issue of the R Journal was published in June 2009. Run by volunteers from academia, government and industry, the journal has grown into an increasingly popular outlet for scientific research on anything related to R. At the time of writing, the Journal has an impact factor of 1.673. In this talk I will look back at the origins and history of The R Journal, including the people involved and the formal organisation of the journal: associate editors, editors, and the advisory board. We will take a detailed look at the current editorial process and explain how issues are produced in HTML and PDF format. This will yield extensive tips and tricks to help aspiring authors get their submissions processed quickly. Finally, we will look ahead to future developments of the R Journal.

Speakers

Mark van der Loo

Senior Researcher, Statistics Netherlands

Mark is a Senior Researcher at Statistics Netherlands and a Research Fellow at the Leiden Institute for Advanced Computer Science at the University of Leiden. Mark published his first package in 2009 and has since co-authored about 20 R packages, a book on statistical data cleaning...

Thursday July 11, 2024 11:50 - 12:10 CEST
Salzburg I

Open and reproducible science

12:10 CEST

Translate R for Global Reach - Binod Jung Bogati, Numeric Mind

Do you use R and want to help extend its global reach? Our talk on translating R is just for you! Translation means rendering R's messages, warnings, and errors from English into other languages, making R accessible to a global audience. Support for translation has been part of R since 2005, but it relies heavily on community contributions to provide the necessary translations and to keep them up to date with changes in R. In this talk, we'll begin with an overview of recent efforts to facilitate community contribution, celebrating the achievements of translation teams at community events such as R Project Sprint 2023, in coordination with the R Contribution Working Group and the R Core Team. We'll then dive into the practical aspects of contributing to R's translations via Weblate, including how R's messages are structured and tips for translating technical terms. In addition, we'll offer tips and tricks to streamline the translation process and communication with the translation community. Join us on this exciting journey to make R accessible to all and discover how you can be a part of this global endeavor.

Speakers

Binod Jung Bogati

Associate Manager - Data Science, Numeric Mind

Binod Jung Bogati has been a Statistical Programmer at Numeric Mind since 2020. Outside work, he is an rOpenSci 2023/24 Champion and an organizer of R User Group Nepal, and he hosts R community events. He loves working with data and is currently focusing on Clinical Data Science / Life Science.

Thursday July 11, 2024 12:10 - 12:30 CEST
Salzburg I

Community and outreach

12:30 CEST

Using R to Co-Create an Inclusive Data Analysis Approach with the HBCU Health Equity Data Consortium - Lois Adler-Johnson, North Carolina Institute for Public Health

From February 2023 to February 2024, the Historically Black Colleges and Universities (HBCU) Health Equity Data Consortium (HEDC) in North Carolina (NC) deployed the COVID-19 Impact Survey to address critical data gaps on the pandemic’s impact on households across NC. To provide capacity-building support for wrangling, analyzing, and visualizing survey results, the NC Institute for Public Health (NCIPH) formed a Data Analysis Workgroup composed of faculty and students from all 10 universities within the HBCU HEDC. Workgroup members preferred a variety of programming languages, mostly commercially licensed ones; NCIPH selected R as the primary language because it is free and accessible. NCIPH led R trainings, compiled relevant R resources, and developed shared code for transforming raw results and for producing descriptive statistics and univariate regressions. The group used R Markdown, Quarto, and Shiny to report results, ultimately using the output as a basis for exploratory analyses and for disseminating findings to NC communities. Facilitating a Data Analysis Workgroup and using free, open-source R packages and outputs can serve as an engaging framework to bolster data science education and autonomy.

Speakers

Lois Adler-Johnson

Public Health Data Scientist, North Carolina Institute for Public Health

Lois Adler-Johnson is a Data Scientist at the North Carolina Institute for Public Health who's passionate about sharing and applying her quantitative data analysis and programming skills in ways that address racial and health inequities across North Carolina. Lois has an academic...

Thursday July 11, 2024 12:30 - 12:50 CEST
Salzburg I

Data science education