useR! 2024: Full Schedule

In Person & Virtual
8 - 11 July, 2024

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for useR! 2024 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: All times in this schedule are shown in Central European Summer Time (CEST, UTC+2). The schedule is subject to change.

08:00 CEST

Registration

Monday July 8, 2024 08:00 - 18:00 CEST
Salzburg Foyer

Registration

09:00 CEST

Tutorial: Introduction to Machine Learning for Survival Analysis with mlr3 - John Zobolas, Institute for Cancer Research & Lukas Burk, Leibniz Institute for Prevention Research and Epidemiology - BIPS and LMU Munich

This introductory tutorial is designed to equip participants with practical skills and knowledge for performing survival analysis using machine learning techniques. Survival analysis, a fundamental statistical method in biomedical and clinical research, focuses on analyzing time-to-event data, such as the time to disease progression or patient survival. In this tutorial, attendees will work with clinical and gene expression data to build, train, and test survival models. They will learn how to leverage R's mlr3 ecosystem for efficient model development, incorporating sophisticated machine learning models such as penalized linear models and random forests to enhance the accuracy of the survival predictions. Participants will also explore survival metrics and model validation techniques to assess the quality and reliability of their models in the context of real-world data. Whether you're new to survival analysis or seeking to enhance your skills, this workshop offers valuable insights and hands-on experience for tackling challenging clinical and biomedical questions.

Speakers

John Zobolas

PhD, Institute for Cancer Research, Oslo University Hospital

My background is in computer science, with diverse expertise in computational modeling, software engineering, survival analysis and statistical/machine learning. Being an engineer at heart, my strongest quality is careful, analytical thinking. A productive workday consists of writing...

Lukas Burk

M.Sc., Leibniz Institute for Prevention Research and Epidemiology - BIPS and LMU Munich

Studied Public Health and Biostatistics before starting a PhD in Statistics and Machine Learning

Monday July 8, 2024 09:00 - 12:30 CEST
Pinzgau

Biostatistics + epidemiology + bioinformatics, Tutorial

09:00 CEST

Tutorial: Interactively Exploring High-Dimensional Data and Models in R - Ursula Laa, University of Natural Resources and Life Sciences & Dianne Cook, Monash University

This tutorial is for scientists and data science practitioners who regularly work with high-dimensional data and are interested in learning how to better visualize this data. It is based on a book with the same title, available at https://dicook.github.io/mulgar_book. We begin the tutorial with an introduction to high-dimensional data and why visualization is important. We then introduce tour methods and show how we can recognize structure in high-dimensional data. Then we show how to apply these methods in three settings: for effective dimension reduction, including non-linear methods; for understanding solutions from cluster analysis using visualization; and for building better classification models with visual input. Participants should have a good working knowledge of R, and some background in multivariate statistical methods and/or data mining techniques.

Speakers

Dianne Cook

Professor, Monash University

Dianne Cook is Professor of Statistics at Monash University in Melbourne, Australia. Her research is on visualisation of high-dimensional data, and on bridging the gap between exploratory graphics and statistical inference. She is a Fellow of the American Statistical Association...

Ursula Laa

Ass.Prof., University of Natural Resources and Life Sciences

Ursula is an Assistant Professor at the Institute of Statistics of the University of Natural Resources and Life Sciences in Vienna. She works on new methods for the visualization of multivariate data and models, and on interdisciplinary applications of statistics and data science...

Monday July 8, 2024 09:00 - 12:30 CEST
Flachgau

Data visualisation, Tutorial

09:00 CEST

Tutorial: Debugging in R - Shannon Pileggi, The Prostate Cancer Clinical Trials Consortium

Learn how to unlock your programming superpower with debugging techniques! In this workshop, we will review code troubleshooting tips, discuss debugging functions (traceback(), browser(), debug(), trace(), and recover()), and distinguish between strategies for debugging your own code versus someone else’s code.

Speakers

Shannon Pileggi

Lead Data Scientist, The Prostate Cancer Clinical Trials Consortium

Shannon Pileggi (she/her) is a Lead Data Scientist at The Prostate Cancer Clinical Trials Consortium, a frequent blogger, and a member of the R-Ladies Leadership team. She enjoys automating data wrangling and data outputs, and making both data insights and learning new material d...

Monday July 8, 2024 09:00 - 12:30 CEST
Pongau

Efficient programming, Tutorial

09:00 CEST

Tutorial: Efficient Data Analysis with data.table - Paola Corrales, R-Ladies/rOpenSci/Carpentries & Elio Campitelli, Universidad de Buenos Aires

data.table is one of the most efficient open-source in-memory data manipulation packages available today. It can summarise, compute new variables, re-arrange tables and perform group-wise operations quickly and memory-efficiently thanks to its highly optimised C code. It also provides fast alternatives to base R functions for reading and writing files. This three-hour tutorial will introduce participants to data.table’s basics. Through live coding sessions and hands-on exercises, participants will learn how to use data.table as part of their data analysis pipeline, from reading data into memory to writing the results back, including exploration, data manipulation and joins. The tutorial will also lay the foundations for learning more advanced features, such as special symbols and combined operations. We will finish the tutorial with an invitation to join the data.table community and learn how to contribute to the package.

Speakers

Elio Campitelli

Lic, Universidad de Buenos Aires

I’m a PhD student in atmospheric sciences at the Centre for Ocean and Atmospheric Research, where I study the atmospheric circulation in the Southern Hemisphere and how it affects the weather in South America. I’m also the maintainer for several R packages and give courses.

Paola Corrales

PhD, R-Ladies/rOpenSci/Carpentries

Paola has a PhD in Atmospheric Science and has experience working with Numerical Weather Prediction models using HPC systems and programming languages such as R, bash, and Fortran. She is an active R user and developer and contributes to many communities, such as R-Ladies and rOpenSci...

Monday July 8, 2024 09:00 - 12:30 CEST
Attersee

Efficient programming, Tutorial

09:00 CEST

Tutorial: Data Anonymisation for Open Science - Jiří Novák & Oscar Thees, UZH, FHNW; Marko Miletic, Bern University of Applied Sciences; Alžběta Beranová, Czech Statistical Office

One of the key elements of open science is open data that are available to a wide spectrum of users. Unfortunately, many datasets cannot be made publicly available, mostly for privacy reasons, because data protection laws fundamentally restrict the use of personal data. In this tutorial, we will go through methods of statistical disclosure control with different anonymisation approaches that can be used to protect data confidentiality. These methods either modify or synthesise data so that they can be disclosed without revealing confidential information that may be associated with specific respondents. In particular, we will discuss non-perturbation and perturbation methods and also methods for synthetic data generation. For these purposes, the usage of the packages sdcMicro, simPop, and synthpop will be shown.

Speakers

Jiří Novák

Ph.D. student, UZH, FHNW

Jiří Novák received his first doctorate in Statistics from the Prague University of Economics and Business and is currently pursuing a second doctorate at the University of Zurich. In his previous research, he focused on statistical disclosure control for microdata from population...

Marko Miletic

Scientific project collaborator and software developer, Bern University of Applied Sciences

Marko Miletic is a scientific project collaborator and software developer at the Institute for Optimisation and Data Analysis at the Bern University of Applied Sciences where he focusses on the development and advancement of anonymization methods for event and longitudinal data for...

Oscar Thees

FHNW

Oscar Thees is an economist by training and a research associate at the Empirical Economic and Social Research group at the University of Applied Sciences Northwestern Switzerland. He is also doing his PhD at the Vienna University of Technology on the topic of anonymization of event...

Alžběta Beranová

Monday July 8, 2024 09:00 - 12:30 CEST
Tennegau

Open and reproducible science, Tutorial

09:00 CEST

Tutorial: Streamlining R Package Development with GitHub Actions Workflows - Daphne Grasselly & Pawel Rucki, Roche; Dinakar Kulkarni, Genentech

GitHub Actions provide an automated workflow for continuous integration and deployment, enhancing collaboration and code quality. This tutorial aims to demystify GitHub Actions, offering insights into their fundamentals and guiding participants through the process of crafting reusable actions tailored for R package development. The tutorial begins with an overview of GitHub Actions, elucidating their role in automating software workflows and boosting productivity in the R programming ecosystem. Attendees will gain a comprehensive understanding of the basics, including syntax, triggers, and workflow components, paving the way for seamless integration into their development pipelines. Building on this foundation, the tutorial delves into the creation of reusable actions, emphasizing best practices for designing modular, versatile components. The tutorial also showcases the benefits of running both development and CI/CD workflows in a common Docker container environment to guarantee reproducibility. Participants will learn how to encapsulate common tasks and share them across different projects, fostering a culture of code reuse within the R community.

Speakers

Franciszek Walkowiak

Senior IT Professional at Roche, Roche

DevOps engineer with 4 years of experience in the pharmaceutical industry. I have worked with Amazon Web Services, Google Cloud Platform, and infrastructure as code practices. Currently, I support teams of R software developers by employing DevOps practices and tools such as GitLab...

Daphne Grasselly

Senior Data Scientist - Roche, Roche

I am currently working at Roche as a Senior Data Scientist. My main focus is on enhancing automation workflows for efficient package delivery, particularly in the realm of R development within the pharmaceutical industry. I am passionate about optimizing processes and improving code...

Pawel Rucki

Ms, Roche

Pawel graduated in 2015 from University of Warsaw, Econometrics and Quantitative Economics. Working with R for almost 10 years now, Pawel applied it in the field of geospatial data analysis, credit risk assessment, financial provisions calculation and clinical trial data analysis...

Monday July 8, 2024 09:00 - 12:30 CEST
Wolfgangsee

R workflow + deployment + production, Tutorial

12:30 CEST

Lunch - Attendees On Own

Monday July 8, 2024 12:30 - 14:00 CEST
TBD

Breaks + Special Events

14:00 CEST

Tutorial: Futureverse: Friendly Parallelization in R - Henrik Bengtsson, University of California San Francisco (UCSF)

This 3-hour workshop introduces the Futureverse - A Unifying Parallelization Framework in R for Everyone - for any R developer looking for options to run their R code in parallel. Designed for participants familiar with R, the workshop does not require prior knowledge of R package development or parallel computing. It is structured into four parts: an introduction to futures; managing output, warnings, and errors; map-reduce parallelization; and a concluding open discussion. Futureverse (https://www.futureverse.org) is designed so that existing code can be parallelized with minimal modifications, allowing developers to keep their focus on the main purpose of their code. We will explore parallel alternatives to familiar programming patterns in base R apply (future.apply), Tidyverse purrr (furrr), and foreach with doFuture. Each part will have hands-on learning components, ensuring participants leave with the skills to apply these techniques to their own projects. Instructions to participants will be made available online at https://www.futureverse.org/tutorials.html prior to the event. All material will be made available there after the event.

Speakers

Henrik Bengtsson

Henrik Bengtsson, University of California San Francisco (UCSF)

UCSF, R Foundation, R Consortium. MSc in Computer Science, PhD in Mathematical Statistics. Applied, large-scale research in Bioinformatics and Genomics. R since 2000.

Monday July 8, 2024 14:00 - 17:30 CEST
Salzburg II

Efficient programming, Tutorial

14:00 CEST

Tutorial: Good Software Engineering Practice for R Packages - Friedrich Pahlke, RPACT & Daniel Sabanés Bové, RCONIS

Join us for an engaging 3-hour face-to-face course designed to enhance your R programming skills with a focus on developing reliable R packages used in statistics or data science. This course is a blend of informative presentations and interactive team exercises, aimed at equipping participants with practical tools and techniques for engineering high-quality R packages. Throughout the session, you will collaborate to build a small R package that adheres to clean code rules and incorporates good software engineering practices. This course is tailored for individuals who are comfortable with writing functions in R and are looking to elevate their package development skills. Bring your laptop and be prepared to transform your approach to R package development through hands-on learning and collaboration. Whether you're looking to improve your workflow, meet regulatory standards, or simply enhance the quality of your statistical tools, this course offers valuable insights and skills to achieve your goals.

Speakers

Daniel Sabanés Bové

Ph.D., RCONIS

Daniel Sabanés Bové studied statistics and obtained his PhD in 2013. He started his career with 5 years in Roche as a biostatistician, then worked 2 years at Google as a Data Scientist, before rejoining Roche in 2020, where he founded and led the Statistical Engineering team. Daniel...

Friedrich Pahlke

CEO, RPACT

Friedrich Pahlke, with a PhD from the University of Lübeck (2008), has been an independent consultant in computer science, data science, and biostatistics since 2008. Previously, he was a Research Fellow at Lübeck's Institute of Medical Biometry and Statistics. As RPACT's co-founder...

Monday July 8, 2024 14:00 - 17:30 CEST
Tennegau

Efficient programming, Tutorial

14:00 CEST

Tutorial: Web Scraping with rvest - Hadley Wickham, Posit

In this tutorial, you'll learn the basics of web scraping with the rvest package. We'll start with a discussion of the ethics of scraping and the basic structure of an HTML page. You’ll then learn about CSS selectors and how you can use them to identify the “rows” and “columns” of the data that you want to extract. Finally, you’ll write R code that uses the rvest package to turn web pages into tidy data frames. We'll also see how you can scrape paginated sites by combining rvest with httr2, and learn two techniques for scraping dynamic sites that generate HTML with JavaScript.

Speakers

Hadley Wickham

Chief Scientist, Posit

Hadley is Chief Scientist at Posit PBC, winner of the 2019 COPSS award, and a member of the R Foundation. He builds tools (both computational and cognitive) to make data science easier, faster, and more fun. His work includes packages for data science (like the tidyverse, which includes...

Monday July 8, 2024 14:00 - 17:30 CEST
Salzburg I

Interfaces with other programming languages, Tutorial

14:00 CEST

Tutorial: Survival Analysis with tidymodels - Hannah Frick, Posit & Max Kuhn, Posit PBC

Survival analysis is now supported across the tidymodels framework, a collection of R packages for modeling and machine learning using tidyverse principles. It covers the entire predictive modeling workflow from data splitting, resampling, feature engineering, model fitting, and performance evaluation to tuning. It provides a consistent interface with composable functions that allow beginners a safe start and advanced users access to more specialized techniques such as feature engineering on text data or tuning via racing methods. The recent addition of dedicated performance metrics has enabled us to support tuning of survival models and unlock the entire framework for survival analysis. This workshop focuses on the core components of tidymodels to get you up and running with predictive survival analysis. This workshop is for you if you are familiar with basic survival analysis (such as censoring of time-to-event data, Kaplan-Meier curves, and proportional hazards models), are familiar with the basic predictive modeling workflow (such as splitting into training and test sets, resampling, and tuning via grid search), and want to learn how to leverage the tidymodels framework for survival analysis.

Speakers

Max Kuhn

principal software engineer, Posit PBC

Max Kuhn is a software engineer at Posit PBC where he is working on improving R’s modeling capabilities and maintaining about 30 packages, including caret and tidymodels. He has a Ph.D. in Biostatistics. Max was a Senior Director of Nonclinical Statistics at Pfizer Global R&D and...

Hannah Frick

Senior Software Engineer, Posit

Hannah Frick is a software engineer on the tidymodels team at Posit. She holds a PhD in statistics and has worked in interdisciplinary research and data science consultancy. She is a co-founder of R-Ladies Global.

Monday July 8, 2024 14:00 - 17:30 CEST
Pongau

Predictive modelling and forecasting, Tutorial

14:00 CEST

Tutorial: Tidy Time Series Analysis and Forecasting - Mitchell O'Hara-Wild, Nectric

Organisations of all types collect vast amounts of time series data, and there is a growing need for time series analytics to understand how things change in our fast-moving world. This tutorial provides a practical introduction to time series analytics and forecasting using R, utilising the tidyverse and tidy time series tools to enable analysis across many time series. Attendees will learn about commonly seen time series patterns, and how to find them with specialised time series graphics created with ggplot2. Then we will use fable to capture these patterns with statistical time series models, and produce probabilistic forecasts. Finally, participants will gain insights into evaluating model performance, ensuring the accuracy and reliability of their forecasts. Through a combination of foundational concepts and practical demonstrations, this tutorial equips participants with the skills to extract meaningful insights from time series data for informed decision-making in various domains.

Speakers

Mitchell O'Hara-Wild

Data Scientist, Nectric

Mitchell O’Hara-Wild (he/him) is a PhD candidate at Monash University, creating new techniques and tools for forecasting large collections of time series with Rob Hyndman and George Athanasopoulos. He is the lead developer of the tidy time-series forecasting tools fable and feasts...

Monday July 8, 2024 14:00 - 17:30 CEST
Wolfgangsee

Predictive modelling and forecasting, Tutorial

14:00 CEST

Tutorial: Getting Started with Quarto for Your Scientific Publications - Christophe Dervieux, POSIT PBC

This workshop focuses on enhancing scientific publishing through Quarto and is designed for those familiar with R Markdown and newcomers to Quarto. Quarto extends the foundational capabilities of R Markdown, introducing advanced features that elevate publishing projects. The tutorial navigates through Quarto's core functionalities, from converting R Markdown files to creating new Quarto content from scratch, while exploiting Quarto's full spectrum for sophisticated, publication-ready outputs. Attendees will explore the broad range of Quarto's applications, including single-document formats, as well as more complex Quarto projects. The workshop provides hands-on experiences designed for interactive participation, enabling attendees to integrate Quarto's tools into known publishing workflows and to learn how to initiate new projects. At the end of the session, participants will have the skills needed to upgrade their current R Markdown workflows and start new projects with Quarto. This will enable them to create refined scientific publications that communicate research findings and results effectively.

Speakers

Christophe Dervieux

Open Source Software Engineer, POSIT PBC

Christophe Dervieux is an Open Source Software Engineer at Posit PBC. He specializes in the R Markdown and Quarto ecosystems. With contributions to key R packages and co-authorship in the R Markdown Cookbook, Christophe brings a deep understanding of scientific publishing tools to...

Monday July 8, 2024 14:00 - 17:30 CEST
Pinzgau

Quarto and reporting, Tutorial

14:00 CEST

Tutorial: Building Effective Docker Images: R Edition - Andrew Collier, Fathom Data

Abstract: Docker is a cornerstone of modern software development and deployment, ensuring reproducibility, scalability, and seamless environment management across different platforms. This tutorial will examine the art and science of crafting efficient and optimised Docker images specifically tailored for R applications. Description: Docker has revolutionised how we develop, deploy, and run applications by offering a lightweight, portable solution for application containerisation. For the R community, Docker presents a vital tool for addressing common challenges such as "it works on my machine", dependency management, and consistent environments across development and production systems. However, creating effective Docker images that are optimised for R applications requires a nuanced understanding of both Docker and R ecosystems. This tutorial aims to bridge that gap, providing attendees with the knowledge to build Docker images that are not only functional but also optimised for performance, size, and security.

Speakers

Andrew Collier

Dr, Fathom Data

Andrew is Lead Data Scientist at Fathom Data. He spends his days tinkering with R, Python and Docker.

Monday July 8, 2024 14:00 - 17:30 CEST
Attersee

R workflow + deployment + production, Tutorial

14:00 CEST

Tutorial: Contributing to R - Gabriel Becker, None & Heather Turner, University of Warwick

Did you always want to contribute to (base) R but don't know how? This tutorial showcases where and how users have actively contributed to (base) R by submitting bug reports with minimal reproducible examples, and how testing, reading source code, and providing patches to the R source code have helped make R better. A selection of past bug reports is provided for you to practice debugging. For bugs that have been resolved, you can check what happened after the bug was reported.

Speakers

Gabriel Becker

Statistical Computing Consultant, None

Gabe is a frequent collaborator with R-core, having contributed 7 novel features to R including proposing and subsequently working with Luke Tierney on the internal ALTREP framework. He is the author of multiple R packages, including the rtables package for creating reporting tables...

Heather Turner

Dr, University of Warwick

Heather Turner is a Research Software Engineering Fellow and Associate Professor at the University of Warwick. She is an active member of the R community, in particular, she is on the board of the R Foundation and chairs both the R Contribution Working Group and the Forwards taskforce...

Monday July 8, 2024 14:00 - 17:30 CEST
Flachgau

R workflow + deployment + production, Tutorial

08:00 CEST

Registration

Tuesday July 9, 2024 08:00 - 17:30 CEST
Salzburg Foyer

Registration

09:00 CEST

Keynote: Welcoming & Opening Remarks

Tuesday July 9, 2024 09:00 - 09:20 CEST
Salzburg I + II

Keynote Sessions

Level Any

09:20 CEST

Keynote Sessions to be Announced

Tuesday July 9, 2024 09:20 - 09:40 CEST
Salzburg I + II

Keynote Sessions

Level Any

09:40 CEST

Keynote: Kurt Hornik, Wirtschaftsuniversität Wien

Speakers

Kurt Hornik

Professor of Statistics & Mathematics, Chair, Department of Finance, Accounting and Statistics, Wirtschaftsuniversität Wien

Kurt Hornik was born and raised in Austria, and holds a PhD in applied mathematics from Technische Universität Wien. Since 2003 he has been professor of statistics and mathematics at Wirtschaftsuniversität Wien, where he currently also serves as chair of the Department of Finance, Accounting...

Tuesday July 9, 2024 09:40 - 10:40 CEST
Salzburg I + II

Keynote Sessions

Level Any

10:30 CEST

Sponsor Showcase

Tuesday July 9, 2024 10:30 - 18:00 CEST
Salzburg Foyer

Sponsor Showcase

10:40 CEST

Break

Tuesday July 9, 2024 10:40 - 11:00 CEST
TBD

Breaks + Special Events

11:00 CEST

Create Your Own recipes Steps for Omics Data: The scimo Package - Antoine Bichat, Servier

The rise of advanced high-throughput sequencing technologies has led to a massive increase in the production of omics data, encompassing genomics, transcriptomics, proteomics, metagenomics, and more. To effectively explore and analyze omics data, specialized preprocessing techniques are essential, including feature normalization, selection, and aggregation. However, many of these specific methods were not initially available in the original 'recipes' package. As a response, we have developed an extension package, 'scimo', designed to seamlessly integrate these techniques into the 'tidymodels' ecosystem. 'scimo' offers a comprehensive suite of preprocessing steps tailored for omics data analysis, while remaining adaptable to other data types. During this presentation, we will showcase the capabilities of 'scimo' and provide insights into creating your own 'recipes' extension package. Additionally, we will discuss strategies to navigate the potential pitfalls that we have encountered during the development process. https://github.com/abichat/scimo

Speakers

Antoine Bichat

PhD, Servier

Antoine Bichat is a data scientist at Servier, where he works on pediatric oncology projects within the computational medicine team. He holds a PhD in biostatistics and has also worked in a biotech specialized in metagenomics. Antoine loves dataviz, teaching R and experimenting with...

Tuesday July 9, 2024 11:00 - 11:20 CEST
Attersee

Biostatistics + epidemiology + bioinformatics

11:00 CEST

rtables: Modeling and Creating Complex Production-Grade Reporting Tables in R - Gabriel Becker, None

Tabular summaries are a crucial tool for exploring and describing complex data. The structure of reporting tables often goes well beyond a typical one- or two-way frequency table, e.g., those required in clinical trial reporting. rtables provides a production-ready, foundational framework for declaring and building complex structured tables. We will present three aspects of our work. First, we will connect reporting tables, faceted data visualizations, and the grammar of graphics, which many analysts will already be comfortable using. We will then showcase our table framework, which relies on these connections. Finally, we will illustrate the creation of realistic, non-trivial tables using rtables.

Speakers

Gabriel Becker

Statistical Computing Consultant, None

Tuesday July 9, 2024 11:00 - 11:20 CEST
Salzburg I

Data visualisation

11:00 CEST

Moju-Kapu: How {mirai} and {crew} Are Powering the Next Generation of Parallel Computing in R - Charlie Gao, Hibiki AI Limited & Will Landau, Eli Lilly and Company

The {mirai} package provides the latest in parallel computing technology, and the {crew} package extends {mirai} to high-performance computing environments. {mirai} and {crew} offer distributed computing for demanding workflows such as machine learning, simulation, and Bayesian data analysis. Each package is designed according to the ‘moju-kapu’ (modular encapsulation) concept. Interfaces are simple and easy to use standalone, while also inviting third-party extensions and integrations. The result is a multi-tiered, flexible framework with convenient entry points at every level of complexity. Together, {mirai} and {crew} form an elegant and performant foundation for an emerging landscape of asynchronous and parallel programming tools in R. They already provide new backends for {parallel}, {promises}, {plumber}, {targets}, and Shiny, as well as new high-level interfaces such as {crew.cluster} for traditional clusters and {crew.aws.batch} for the cloud. This talk explores the layered engineering approach of {mirai} and {crew} through the lens of ‘moju-kapu’, in which each package is designed to be at the same time self-sufficient, but also a modular part of an encapsulating system.

Speakers

Will Landau

Senior Research Scientist, Eli Lilly and Company

Will Landau is a statistician and software developer in the life sciences. He earned his PhD in Statistics at Iowa State in 2016, and he specializes in Bayesian methods, high-performance computing, and reproducible workflows. Will is the creator of the {targets} R package, a reproducible...

Charlie Gao

Director, Hibiki AI Limited

Charlie Gao is a director of Hibiki AI Limited, leading financial markets machine learning research in the UK. A graduate of Trinity College, University of Cambridge, Charlie previously held roles in international investment banking at Morgan Stanley, Citigroup and Standard Chartered...

Tuesday July 9, 2024 11:00 - 11:20 CEST
Pongau + Flachgau

Efficient programming

11:00 CEST

Quarto: Elevating R Markdown for Advanced Publishing - Christophe Dervieux, POSIT PBC

In the dynamic landscape of data analysis and scientific publishing, R Markdown has been pivotal for the R community, allowing users to seamlessly blend code, narrative and results into a cohesive document. Now, Quarto emerges as a powerful tool that builds on years of experience but also goes beyond R Markdown, providing more flexibility and power in scientific communication. This talk aims to unveil Quarto's novel features with its latest 1.4 stable release. We will delve into how Quarto enhances the user experience for R enthusiasts, maintaining the syntax familiarity of R Markdown while introducing innovative functionalities like sophisticated figure and table handling, better project management, and versatile publication options across multiple formats, similar to those of R Markdown. Why switch to Quarto from R Markdown? In which cases? How does Quarto integrate with existing workflows? This presentation aims to answer common user questions and to inspire users to try out Quarto.

Speakers

Christophe Dervieux

Open Source Software Engineer, POSIT PBC

Tuesday July 9, 2024 11:00 - 11:20 CEST
Salzburg II

Quarto and reporting

11:00 CEST

Past, Present, and Future of Data.Table - Tyson Barrett, Highmark Health

This talk will walk through the past, present, and future of the data.table package. The timing of this talk is particularly important as changes to the governance of the package aimed at providing a solid foundation for long-term maintenance of the package have recently been approved. As data.table is a leading data wrangling and cleaning package in the R ecosystem, the goal of the new governance is to create a broader community that can more easily engage with the development of the package and find support for its use.

Speakers

Tyson Barrett

Manager Research Analytics and Enablement, Highmark Health

Tyson Barrett, PhD is the current data.table maintainer working with a talented team of developers and a wonderful development community. During his day job, he works with a team of researchers at Highmark Health, a healthcare organization, to improve healthcare outcomes, costs, and...

Tuesday July 9, 2024 11:00 - 11:20 CEST
Pinzgau + Tennegau

R workflow + deployment + production

11:00 CEST

{Mmrm}: a Robust and Comprehensive R Package for Implementing Mixed Models for Repeated Measures - Daniel Sabanés Bové, RCONIS

Mixed models for repeated measures (MMRM) analysis has been extensively used to analyze longitudinal datasets. SAS has been the gold standard for this analysis in the past, and so far R packages fall short for one of the following reasons: model convergence issues, unavailability of covariance structures or adjusted degrees of freedom, or numerical results being far from SAS. To fill in this important gap in the open-source statistical software landscape, a cross-company workstream of openstatsware.org has developed the new {mmrm} R package. A critical advantage of {mmrm} over existing implementations is that it is faster and converges more reliably. It also provides a comprehensive set of features: users can specify a variety of covariance matrices, weight observations, fit models with restricted or standard maximum likelihood inference, perform hypothesis testing with Satterthwaite or Kenward-Roger adjusted degrees of freedom, extract the least squares means estimates using the emmeans package, and use tidymodels for easy model fitting. We introduce the modeling framework and the implementation strategy, and discuss open source collaboration as a critical ingredient to success.

Speakers

Daniel Sabanés Bové

Ph.D., RCONIS

Tuesday July 9, 2024 11:00 - 11:20 CEST
Wolfgangsee

Statistical modelling

11:20 CEST

MIEP: Make-It-Easy-Pipeline - Alberto Corradin, Veneto Institute of Oncology IOV – IRCSS, Padova, Italy

Make-it-easy-pipeline (MIEP) is an integrated, interactive, user-friendly pipeline for RNA-seq data. This new R package helps researchers develop testable hypotheses and select targets for wet lab functional testing. MIEP performs statistical testing, annotates sequences, corrects biases, and summarizes results in HTML tables, volcano plots, and heat maps. Shiny apps allow users to modify default settings, select a data shrinkage method, and set thresholds to identify differentially expressed features. Use of MIEP in a cancer research project facilitated the identification of phenotype-linked signal transduction pathways whose biological relevance was experimentally verified. MIEP’s functions include:
- Dimensionality reduction and visualizations by PCA, UMAP, tSNE, SVM
- Enrichment of Gene Ontology (GO) terms
- Calculation of features’ importance subsequent to classification (conditional random forests)
- Gene set editing based on the ranking of GO terms or features’ importance
- Analyses focused on gene sets and user-friendly graphical representations

These characteristics and careful handling of exceptions make MIEP an easy-to-use tool for biologists with basic programming skills.

Speakers

Alberto Corradin

Dr., Veneto Institute of Oncology IOV – IRCSS, Padova, Italy

Alberto Corradin is a Data Scientist with expertise in statistics and data analysis, development of mathematical models, machine learning and artificial intelligence. He is currently a researcher at Veneto Institute of Oncology IOV – IRCSS, Padova, Italy. He is co-author of 14 peer-reviewed...

Tuesday July 9, 2024 11:20 - 11:40 CEST
Attersee

Biostatistics + epidemiology + bioinformatics

11:20 CEST

TeX Typesetting in R Graphics: the {Xdvir} Package - Paul Murrell, The University of Auckland

Text labels are essential components of any data visualisation, whether as titles, captions, axis labels, or general annotations. While it is possible to render text in R graphics, there are only limited facilities for typesetting text. As a simple example, there is no way in R graphics to lay out a paragraph of text with full justification. This talk will describe the {xdvir} package for R, which fills this gap by merging the sophisticated typesetting capabilities of the TeX system with R graphics.

Speakers

Paul Murrell

Associate Professor, The University of Auckland

Paul Murrell is an Associate Professor in the Department of Statistics at The University of Auckland. He is a member of the R-core development team, mostly active in the graphics system, and has developed several extension packages for R, also mostly related to graphics and data...

Tuesday July 9, 2024 11:20 - 11:40 CEST
Salzburg I

Data visualisation

11:20 CEST

Future.Mirai: Use the Mirai Parallelization Framework in Futureverse - Easy! - Henrik Bengtsson, University of California San Francisco (UCSF)

In this talk, I am proudly presenting the 'future.mirai' package - a parallel backend for the Futureverse that uses the 'mirai' framework for parallelization. The 'mirai' package implements a distributed computing environment for R, where R expressions can be resolved on local and remote machines using novel techniques. It was introduced in 2022 and has since gained additional exciting features, e.g. secure communication with parallel workers. The 'future.mirai' package makes 'mirai' readily available to all users and developers already using Futureverse for parallelization. By design, all existing Futureverse code will work out of the box without having to change any code. I will give a brief overview of 'mirai' and 'future' - the core of Futureverse - before introducing 'future.mirai' and presenting how well the two frameworks work together. In this presentation, you will learn how simple it is to run R in parallel using Futureverse with a focus on 'mirai'.

Speakers

Henrik Bengtsson

Henrik Bengtsson, University of California San Francisco (UCSF)

UCSF, R Foundation, R Consortium, MSc in Computer Science, PhD in Mathematical Statistics, Applied, large-scale research in Bioinformatics and Genomics. R since 2000.

Tuesday July 9, 2024 11:20 - 11:40 CEST
Pongau + Flachgau

Efficient programming

11:20 CEST

Mlr3summary: Concise and Interpretable Summaries for Machine Learning Models - Susanne Dandl, LMU Munich and Munich Center for Machine Learning & Marc Becker, LMU Munich

In machine learning (ML), transparency and interpretability are central to promoting trust and informed decision-making. This contribution introduces a novel R package for ML model summaries, centered on performance measures and interpretation methods. The package draws inspiration from the summary method for (additive/generalized) linear models in R which generates a table that encapsulates model performance, effect sizes and directions for individual variables, and model complexity. In our contribution, we extend this methodology to non-parametric ML models, creating a concise yet informative table that facilitates analogous conclusions. The clarity of the structured output can enhance and expedite the model selection process, making it a helpful tool for practitioners and researchers alike. Our talk presents the core functionality of the R package and addresses some implementation details, as well as potential pitfalls. With this, we hope to contribute some advancements in model transparency and comparability in the field of ML.

Speakers

Marc Becker

-, LMU Munich

Marc Becker is working on the mlr3 project as a research software engineer and is mainly responsible for the optimization packages. He obtained a Bachelor's Degree (B.Sc.) in Geography from the Freie Universität Berlin and a Master's Degree (M.Sc.) in Geoinformatics from the Friedrich-Schiller-Universität...

Susanne Dandl

Dr., LMU Munich and Munich Center for Machine Learning

Susanne is a postdoctoral researcher, focussing on the intersection between machine learning, statistics and causality. She did her Bachelor and Master at the Department of Statistics, LMU Munich, Germany. She obtained her doctorate in December 2023 from the same university.

Tuesday July 9, 2024 11:20 - 11:40 CEST
Salzburg II

Quarto and reporting

11:20 CEST

WebR, and the Future of Building Web Applications with R - Colin Fay, ThinkR

One of the great joys of being a software engineer is that things keep moving. New technologies, new languages, new frameworks: every now and then new things emerge that change the way we build software. In the past couple of years in the R world, we've been building and deploying web apps and APIs in a pretty stable way: building {shiny} apps with frameworks like {golem} or {rhino}, APIs with {plumber}, and sending them to a server that can launch R and make our R code available to the world. In the past months, something new has emerged: webR, a version of R compiled for WebAssembly (WASM), allowing R to run in the browser and in NodeJS, with no need for an R installation. This opened a lot of new doors, JavaScript being the tool of choice when it comes to building web apps and APIs. In this talk, Colin will start by explaining what webR is and how it will change the way we think about building and deploying R code on the web. He will present `webrcli` and `spidyr`, two tools for creating NodeJS apps that can call R code via webR. And finally, Colin will also focus on the challenges that will arise with `webR`, and how we'll build web apps with R in the future.

Speakers

Colin Fay

Lead Developer at ThinkR, ThinkR

Colin FAY is a lead developer at ThinkR, a French agency of R experts. During the day, he helps companies by building tools and deploying infrastructure. His main areas of expertise are data & software engineering, web applications (frontend and backend), and R in production. During...

Tuesday July 9, 2024 11:20 - 11:40 CEST
Pinzgau + Tennegau

R workflow + deployment + production

11:20 CEST

Statistical Computing with Vectorised Distributions - Mitchell O'Hara-Wild, Nectric

The uncertainty of model outputs is often absent or hidden in R, and tools for interacting with distributions are limited. For example, most prediction methods in R only produce point predictions by default. Although it is possible to obtain other parameters and form the complete distribution, additional knowledge about the distribution's shape and properties are needed. The distributional package vastly simplifies creating and interacting with distributions in R. The package provides vectorised distributions and supports the calculation of various statistics without needing to use shape-specific d*/p*/q*/r* functions. Statistics can be easily calculated for distributions in the same vector, regardless of shape. Manipulating distributions is also supported, including applying transformations, inflating values, truncating, and creating mixtures of distributions. When vectors are stored as data frame columns, these operations integrate seamlessly with tidyverse workflows. Distributions can also be visualised with ggplot2 using the ggdist extension package, which offers many graphical representations of uncertainty.
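As a rough, language-neutral illustration of the idea in this abstract, the Python sketch below keeps distributions of different shapes in one list and queries them through a shared interface. It is not the distributional package's API: the `Exponential` class and the `summarise` helper are hypothetical names introduced for this example, while `statistics.NormalDist` comes from the Python standard library.

```python
from dataclasses import dataclass
from math import log
from statistics import NormalDist


@dataclass(frozen=True)
class Exponential:
    """Hypothetical exponential distribution exposing the same interface as NormalDist."""

    rate: float

    @property
    def mean(self) -> float:
        return 1.0 / self.rate

    def inv_cdf(self, p: float) -> float:
        # Quantile function of the exponential distribution.
        return -log(1.0 - p) / self.rate


def summarise(dists):
    """Return (mean, median) for each distribution, regardless of its shape."""
    return [(d.mean, d.inv_cdf(0.5)) for d in dists]


# One "vector" holding distributions of different shapes.
dists = [NormalDist(mu=0.0, sigma=1.0), Exponential(rate=2.0)]
summary = summarise(dists)
```

The caller never needs a shape-specific function: any object offering `mean` and `inv_cdf` can sit in the same list.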

Speakers

Mitchell O'Hara-Wild

Data Scientist, Nectric

Tuesday July 9, 2024 11:20 - 11:40 CEST
Wolfgangsee

Statistical modelling

11:40 CEST

Teal - an Open Source Framework for Data Exploration in Clinical Trials and Beyond - Pawel Rucki, Roche

{teal}, an open source framework for Shiny app development, was designed to accelerate the data exploration process within clinical trials. Throughout the years, it has grown into a robust and versatile solution, gaining recognition from various companies in the industry. In this talk, I will introduce you to the product and its core concepts and features. I will showcase a few practical applications from the clinical trials context and beyond.

Speakers

Pawel Rucki

Ms, Roche

Tuesday July 9, 2024 11:40 - 12:00 CEST
Attersee

Biostatistics + epidemiology + bioinformatics

11:40 CEST

The Treachery of Images: Exploring the Interdependence Between Graphics, Statistics, and Interaction - Adam Bartonicek, The University of Auckland

With the rise of web technologies, interactive data visualizations have become a staple of data presentation. Yet, despite their growing popularity, researchers still point to the lack of a formalized pipeline for turning raw data into summary statistics. The cause of this lack may be a subtle yet profound issue: while we often treat statistics and graphical objects as independent, they are in fact deeply connected. Consider a typical stacked barplot. Many researchers have noted that stacking some summaries will produce a valid overall statistic (e.g. count, sum), whereas stacking others will not (e.g. mean). But what are the mathematical properties that make this possible? Are there other operators that can be stacked? The goal of this talk is to delve into the relationship between graphical objects, statistics, and interaction. Specifically, by discussing a handful of concepts from category theory, I hope to give you a new appreciation of the rich structure that lies beyond the figures we look at every day. Finally, this talk will also briefly introduce a new R package for interactive data exploration – plotscaper – which is an attempt to implement some of these ideas.
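The stacking question raised in this abstract can be made concrete with a small sketch (in Python rather than R, purely for illustration; it does not use plotscaper). Sums and counts of the parts combine into the sum and count of the whole, so stacked segments add up to a meaningful bar, whereas means do not. The segment values are made up for the example.

```python
from statistics import mean

# Made-up values for two stacked segments of a single bar.
segment_a = [1.0, 2.0, 3.0]
segment_b = [10.0, 20.0]
whole = segment_a + segment_b

# Counts and sums of the parts add up to the count and sum of the whole.
count_stacks = len(segment_a) + len(segment_b) == len(whole)
sum_stacks = sum(segment_a) + sum(segment_b) == sum(whole)

# Means do not: adding the segment means does not give the overall mean.
stacked_means = mean(segment_a) + mean(segment_b)  # 2.0 + 15.0
overall_mean = mean(whole)                         # 36.0 / 5
mean_stacks = stacked_means == overall_mean
```

A mean can still be recovered from stackable pieces by carrying the (sum, count) pair for each segment and dividing only at the end.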

Speakers

Adam Bartonicek

PhD Candidate/BscHons, The University of Auckland

Adam is a PhD student at The University of Auckland, New Zealand, with a primary interest in interactive data visualization (under the primary supervision of associate professor Dr. Simon Urbanek). He is a keen user of R and also a fan of web technologies and Bayesian statistics...

Tuesday July 9, 2024 11:40 - 12:00 CEST
Salzburg I

Data visualisation

11:40 CEST

{Shiny.Tictoc} Measuring Shiny Performance, Without the Headaches - Ryszard Szymański, Appsilon

Slow dashboards lead to poor UX and cause users to lose interest, or even become frustrated. However, improving performance can be an equally frustrating process, from issues with installing additional software to results that are hard to interpret or that differ from the performance reported by end users. Enter {shiny.tictoc} - a new tool for measuring the performance of Shiny apps. Simply add the tool to your app, interact with it, and export the results. In fact, user tests conducted at Appsilon showed that developers who had never heard of {shiny.tictoc} were able to benchmark their apps in less than 5 minutes! Best part? It all happens in the browser without the need to install any additional software!

Speakers

Ryszard Szymański

Staff Engineer, Appsilon

Ryszard Szymański is a Staff Engineer at Appsilon. He has a background in computer science and is the author of open source packages like hackeRnews, shiny.emptystate and featureflag. Outside of work he is interested in basketball, cooking and baking.

Tuesday July 9, 2024 11:40 - 12:00 CEST
Pongau + Flachgau

Efficient programming

11:40 CEST

Achieving Corporate Design Consistency in Reports with Indiedown: An R Markdown Solution - Angelica Becerra, Cynkra

Creating organizational reports, letters, and documents that adhere to strict layouts and graphical design standards in an efficient and consistent way is a significant challenge. This session presents the package [indiedown](https://indiedown.cynkra.com) designed to facilitate the creation of PDF R Markdown templates that align with organizations' design layouts and guidelines. A customized R Markdown template represents a significant step towards aligning data science with corporate reporting needs. By providing a structured, reproducible, and visually consistent approach to document creation, this tool empowers organizations to maintain high standards of communication and documentation. This work emphasizes the broader applicability of R Markdown in organizational and business settings by distributing templates via an R package, facilitating the creation of visually appealing, data-rich documents that comply with corporate identity. This presentation will not only offer practical solutions to common reporting challenges but also encourage dialogue on customized R tools that support specific corporate needs.

Speakers

Angelica Becerra

Data Scientist, Cynkra

Angelica Becerra is a statistician and data scientist with experience in consulting governmental offices on large-scale survey studies and statistical analysis. She has developed data analysis projects using R and Python with a focus on visualization, web scraping, and reports. Angelica...

Tuesday July 9, 2024 11:40 - 12:00 CEST
Salzburg II

Quarto and reporting

11:40 CEST

Building Bilingual Bridges with Multilingual Manuals - Elio Campitelli, Universidad de Buenos Aires

The vast majority of packages are documented in English, due to the language's status as de-facto lingua franca. But what if your package is designed with a specific demographic in mind that could be better served by documentation in another language? Non-English documentation would make your package more accessible to them at the expense of isolating it from the wider international community. But... ¿por qué no los dos? The rhelpi18n package adds support for multilingual documentation in R so you can have the best of both worlds. Package authors or community projects can create translation modules that users can install to access documentation in their languages directly from R. The talk will include a high-level view of how this package extends R's help system and will explain how people can create, install and use translation modules for R packages.

Speakers

Elio Campitelli

Lic, Universidad de Buenos Aires

Tuesday July 9, 2024 11:40 - 12:00 CEST
Pinzgau + Tennegau

R workflow + deployment + production

11:40 CEST

Bayesian Modeling of Panel Data with R Package Dynamite - Santtu Tikka, University of Jyväskylä

In this talk, I will present dynamite: an R package for Bayesian inference of intensive panel data comprising multiple measurements per multiple individuals measured in time. The package supports joint modeling of multiple response variables, time-varying and time-invariant effects, a wide range of discrete and continuous distributions, group-specific random effects, and latent factors via the dynamic multivariate panel model (DMPM) framework. Models in the package are defined via a user-friendly formula interface, and estimation of the posterior distribution of the model parameters takes advantage of state-of-the-art Markov chain Monte Carlo methods. The package enables efficient computation of both individual-level and summarized predictions and offers a comprehensive suite of tools for visualization and model diagnostics. I will demonstrate how the package can be used to estimate long-term causal effects of interventions with proper accounting for the uncertainties in these estimates.

Speakers

Santtu Tikka

Senior Lecturer, University of Jyväskylä

Santtu Tikka is a senior lecturer at the Department of Mathematics and Statistics, University of Jyväskylä. His research focuses on causal inference and Bayesian modeling. He is also the author of several R packages such as causaleffect, dosearch and dynamite.

Tuesday July 9, 2024 11:40 - 12:00 CEST
Wolfgangsee

Statistical modelling

12:00 CEST

MS Meets R: Unravelling Cellular Lipid Networks by Integrative Analysis & Untangling Ether Lipids Th - Jakob Koch, Medical University of Innsbruck

Membrane balance relies on specific metabolic lipid interactions. We investigated how fatty acyl side chains from different lipid classes influence each other using artificial neural networks (ANNs). We analyzed profiles of different phospholipids from 15 mouse tissues and found tissue-specific patterns. With this data we trained 362 ANN model architectures in R and were able to predict mitochondrial cardiolipin remodelling based on fatty acyl pools in phospholipids. Our analysis revealed key players in mammalian cardiolipin remodelling: high oleic acid increased lipid diversity, while linoleic acid favoured uniformity. Additionally, to reliably discriminate between plasmanyl/plasmenyl lipids in mouse tissues we conducted lipidomics experiments with ion mobility spectrometry (IMS). All data integration and analysis steps were performed in R. Statistical analysis in R confirmed the validity of this IMS approach for lipid subclass separation, which is especially powerful when combined with accurate retention time characteristics.

Tuesday July 9, 2024 12:00 - 12:20 CEST
Attersee

Biostatistics + epidemiology + bioinformatics

12:00 CEST

Educational Outcomes in Higher Education, from Boxplots to Dashboards via Mixed Effects: R Showcase - Jarek Bryk, University of Huddersfield

In my role at the University of Huddersfield, a mid-size institution in the north of England, I contribute to the analysis and reports on students' educational outcomes and factors that affect differential attainment of various groups of students across the entire institution. These reports present the composition of the student population and the state of their educational outcomes at various levels of institutional hierarchy and to various internal stakeholders. In this talk, I will highlight the varied applications of R's analytical, visualisation and reporting capabilities in a higher education context: from correlations between attendance and marks on individual modules, through parametrised reports on students' engagement with online materials, to mixed effects models that disentangle contributions of socioeconomic status and prior qualifications to graduate outcomes, with a few maps thrown in as well. These applications demonstrate the versatility of R, used not only as a business intelligence tool, but also as a "pedagogical intelligence" tool, to help us evaluate educational practice, focus our attention on challenges and direct support to improve the students' outcomes.

Speakers

Jarek Bryk

Dr, University of Huddersfield

I am a molecular biologist/computational biologist who stumbled upon a data analyst role. I wear three hats at work: I teach data science, genomics and evolution to undergraduates; I study patterns of genetic variation in small mammals to learn about their evolution and demography...

Tuesday July 9, 2024 12:00 - 12:20 CEST
Salzburg I

Data visualisation

12:00 CEST

{Constructive} : a Nicer `dput()` Using Idiomatic Constructors - Antoine Fabri, Cynkra

It can be hard to understand an R object. print() or str() methods don't tell the full story, and dput() has several issues:
* It doesn't handle some objects
* It is sometimes inaccurate, and doesn't test its accuracy
* It uses only low level constructors (list(), c(), not factor(), data.frame() etc)
* Its output's indentation is inconvenient

The constructive package tackles those issues and more and can be used for:
* Object exploration/debugging
* Reproducible examples
* Manually editing data in R
* Snapshot tests

In this talk, we'll demonstrate the main features and use cases for this package.

Speakers

Antoine Fabri

Ir., Cynkra

Antoine Fabri has been an R user for more than a decade and contributed several packages to the open source community, such as constructive, flow, powerjoin, typed, unglue, boomer. He's been working as an R consultant full time at Cynkra since September 2021.

Tuesday July 9, 2024 12:00 - 12:20 CEST
Pongau + Flachgau

Efficient programming

12:00 CEST

Parametrized Nice Reports for PDF with Quarto - Thomas Vroylandt, Kantiles

Being able to produce hundreds, even thousands, of reports at once with parameterized reporting is one of the most powerful tools available to R users (and Python users too, thanks to Quarto). But, when making reports in this way, problems can arise. A chart that works well in one report looks terrible in another. A map that fits nicely into a report for one state doesn't fit in another state's report. Possibilities for errors abound. In this talk, I will share hard-earned lessons on best practices for generating parameterized PDF reports with web technologies (thus avoiding LaTeX). With advice on a range of topics, including formatting data, designing reports, and creating plots that work across multiple reports, this talk will help those who don't just want to do parameterized reporting, but want to do it well. This talk will cover a set of methods for producing parameterized reports for PDF, going from the good old method (Rmd and pagedown) to new ones (Quarto and pagedjs-cli or Quarto and weasyprint, Typst) plus the pagedreport package, which will gain a Quarto extension before the conference.
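The core pattern behind parameterized reporting, one template rendered once per set of parameters, can be sketched in a few lines of Python. This is only an illustration using the standard library, not the interface of Quarto, pagedown or pagedreport; the template text, the `render_reports` helper and the region data are all hypothetical.

```python
from string import Template

# Hypothetical report template with two parameters.
REPORT = Template("Report for $region\nTotal respondents: $n\n")


def render_reports(params):
    """Render one report per parameter set, keyed by region name."""
    # substitute() raises KeyError if a parameter is missing, so an
    # incomplete parameter set fails loudly instead of yielding a broken report.
    return {p["region"]: REPORT.substitute(p) for p in params}


# Made-up parameter sets; in practice there may be hundreds.
params = [
    {"region": "North", "n": 120},
    {"region": "South", "n": 87},
]
reports = render_reports(params)
```

Failing early on a missing parameter is one simple guard against the kind of per-report errors the abstract warns about.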

Speakers

Thomas Vroylandt

Co-Founder & Consultant, Kantiles

Thomas Vroylandt is a consultant at Kantiles. He also works as a consultant for R for the Rest of Us where he supports businesses around the world in improving their reporting. He is the creator of the pagedreport package and is an expert in dealing with PDF reports using pagedjs and similar...

Tuesday July 9, 2024 12:00 - 12:20 CEST
Salzburg II

Quarto and reporting

12:00 CEST

Systems Integration Tests for R Package Cohorts - Franciszek Walkowiak, Roche

One of the challenges for R developers is ensuring that their packages work correctly on an ever-increasing number of operating systems, platforms, and R versions. To aid in this endeavor, we are introducing two tools: Locksmith and Scribe. Their task is to install a cohort of R packages, along with all dependencies, and test the cohort on any kind of system. Locksmith resolves all dependencies of the cohort using provided package repositories and saves the list of all package versions and repositories to a snapshot. Scribe utilizes the snapshot to download, build, install, and check the packages in an efficient and reproducible manner. An older snapshot of packages can be restored by scribe on a new system to check for any compatibility issues. Both tools are written in Go, making their binaries easily buildable and distributable for different systems and platforms. Go also simplifies concurrent package installation and checking, significantly reducing execution time. As a result, package cohort testing can be performed frequently for various systems, allowing developers to quickly assess the overall health of their packages.

Speakers

Franciszek Walkowiak

Senior IT Professional at Roche, Roche

Tuesday July 9, 2024 12:00 - 12:20 CEST
Pinzgau + Tennegau

R workflow + deployment + production

12:00 CEST

Regression for Compositions: Logit Models Versus Log-Ratio Transformation of Data - David Firth, University of Warwick

Compositional data analysis is growing rapidly in modern application areas, e.g. microbiome analysis, time-use studies and archaeometry to name a few. The dominant statistical methods use work of J Aitchison from the 1980s, where log-ratio transformation of data (compositional measurements) was developed as the key to use of standard multivariate methods with composition data. I outline some difficulties with log-ratios, and show that the assumptions underpinning log-ratios actually lead to a simple variance-covariance function that is readily used to build appropriate GLMs with logit link. The theory is in [1], but emphasis here will be on practice, and working in R. A key focus will be examples from in-development R package _compos_, which implements the new approach in the style of standard `lm` or `glm` methods. The new `colm` model class in _compos_ not only mimics R's `lm` and `glm` classes, it also neatly uses `lm` internally in a robust fitting algorithm (similar to the "Poisson trick", used for fitting multinomials with `glm`). The package is under open-source development [2] for CRAN submission by June 2024. [1] arxiv.org/abs/2312.10548 [2] github.com/DavidFirth/compos
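For readers unfamiliar with log-ratio transformation, the Python sketch below shows one common variant, the centred log-ratio (clr), which maps each part of a composition to the log of its ratio to the geometric mean of all parts. The abstract does not say which log-ratio transform is meant, and this is not code from the compos package; the function name `clr` and the example composition are assumptions made for illustration.

```python
from math import log


def clr(composition):
    """Centred log-ratio transform of a composition with strictly positive parts."""
    if any(part <= 0 for part in composition):
        # Logs are undefined for zero or negative parts.
        raise ValueError("all parts must be strictly positive")
    logs = [log(part) for part in composition]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [value - mean_log for value in logs]


# Made-up three-part composition (proportions summing to 1).
composition = [0.2, 0.3, 0.5]
transformed = clr(composition)

# clr depends only on ratios, so rescaling the composition changes nothing.
rescaled = clr([100 * part for part in composition])
```

The transformed values sum to zero by construction, and the explicit check reflects that the transform is undefined when any part is zero.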

Speakers

David Firth

Professor, University of Warwick

An academic statistician with wide-ranging applied interests. Open-source enthusiast.

Tuesday July 9, 2024 12:00 - 12:20 CEST
Wolfgangsee

Statistical modelling

12:20 CEST

Lunch

Tuesday July 9, 2024 12:20 - 13:20 CEST
TBD

Breaks + Special Events

13:20 CEST

Using R to Find the Most Central and Outlying Points in Object Data by Robust Depth Functions - Vida Zamanifarizhandi, University of Turku

The widespread emergence of object data, such as images, text, graphs, matrices, etc., emphasizes the need for new data analysis methods tailored to these unusual data formats. This is a difficult task as even a simple operation such as computing the mean (the most central point) is no longer straightforward in object data which have a very complex structure. Moreover, in order to ensure that our analysis stays reliable, it is imperative to develop robust methods, which are tolerant of outliers regardless of their exact type. In this talk we show how different robust depth functions can be used in R to find the most central objects in any object data set. As a byproduct, we also identify any possible outliers in the data. We offer two alternatives for the estimation: a fast in-sample optimization and a slower and more accurate out-of-sample version based on the evolutionary algorithms in the R package "GA". The method is particularly useful in the descriptive analysis of object data, where it allows detecting outliers and finding what the most typical objects look like.

Speakers

Vida Zamanifarizhandi

Project Researcher, University of Turku

I am a versatile statistician who is highly skilled in data analysis with over 5 years of experience in the statistical field and research. Throughout this period, I acquired specific expertise in programming with R. Currently, I am a project researcher at the University of Turku...

Tuesday July 9, 2024 13:20 - 13:25 CEST
Attersee

Big and high-dimensional data, Lightning Talk

13:20 CEST

RainbowR - a Community That Supports, Connects and Promotes LGBTQ+ People Who Code in R - Ella Kaye, University of Warwick & Hanne Oberman, Utrecht University

rainbowR is a friendly community for LGBTQ+ folks who code in R. We run monthly online meet-ups where participants chat and share their R work in a supportive environment. We also organise the buddy scheme, which randomly pairs members of the community, to encourage people to meet and connect. We have exciting plans for the future! You'll learn about what rainbowR does and how you can get involved, whether as a member of the LGBTQ+ community or as an ally, and hopefully forge new connections at the conference and beyond. We believe the whole R community benefits when that community is diverse and inclusive.

Speakers

Ella Kaye

Ms, University of Warwick

Ella is a Research Software Engineer in the Department of Statistics at the University of Warwick, UK. She works to increase sustainability and EDI (Equality, Diversity and Inclusion) in the R Project. She also runs rainbowR, a community that supports, promotes and connects LGBTQ...

Hanne Oberman

PhD candidate, Utrecht University

Statistician interested in data visualization, interdisciplinarity, and open science. Hanne is a PhD candidate in Methodology and Statistics at Utrecht University, working on computational evaluation and data visualization in the Missing Data research group. Core developer for the...

Tuesday July 9, 2024 13:20 - 13:25 CEST
Salzburg II

Community and outreach, Lightning Talk

13:20 CEST

Manage Ggplot Figures Using Ggfigdone - Wenjie Sun, Institut Curie

When you prepare a presentation or a report, you often need to manage a large number of ggplot figures. You need to change the figure size and modify the title, labels, themes, etc. It is inconvenient to go back to the original code to make these changes. This package provides a simple way to manage ggplot figures. You can easily add figures to the database and update them later using a CLI (command line interface) or GUI (graphical user interface). ggfigdone is in the early stage of development. I'm looking for feedback and suggestions to improve the package. The package is available on GitHub: https://github.com/wenjie1991/ggfigdone

Speakers

Wenjie Sun

PostDoc, Institut Curie

As a Postdoctoral Researcher at the Institut Curie, he now concentrates on employing various statistical and computational methods to conduct DNA sequence-based cellular lineage tracing, integrating this with single-cell omics data.

Tuesday July 9, 2024 13:20 - 13:25 CEST
Salzburg I

Data visualisation, Lightning Talk

13:20 CEST

Predictors Optimization for Sensory Profiles Modelling Based on Electronic Signals - Jean-Vincent Le Bé, Nestlé Research

The sensory profiles of coffee products can be generated by using predictive models with inputs from specific electrodes immersed in the liquid coffee. The electrical signals are generated by molecules diffusing through selective bio-polymers and reaching electrodes. These signals are time-series data, and various features are calculated from them (such as transient peaks or steady-state averages). These features are then used to train models for the prediction of sensory profiles or proximity to a given reference product. Six features are calculated for each of the 15 pairs of electrodes resulting in 90 variables for the classification model (proximity to a reference) or the regression model (sensory profile). These variables can be correlated to a certain extent and lead to over-fitting or unnecessary recording. This work presents a method for reducing the number of variables to a minimum relevant set using clustering and random forest variable importance. It shows that a proper selection results in a more robust model across different experimental batches.

Tuesday July 9, 2024 13:20 - 13:25 CEST
Wolfgangsee

Machine learning and AI, Lightning Talk

13:20 CEST

Datasets and Assignments for Undergraduate Teaching: Enzyme Kinetics as an Example - Jarek Bryk, University of Huddersfield

In this session I will present a set of scripts (a package-to-be) that allows generation of simulated data of enzymatic reactions with a set of inhibitors for use in undergraduate teaching. The datasets can be individualised for each student, allowing them to either prepare for the laboratory class where such data would be acquired, or to substitute a laboratory class where attending would not be possible. The scripts also enable generation of expected analytical answers to experimental questions arising from the data, such as the nature of the inhibitor or enzyme kinetics parameters, individually for each student, to facilitate marking and feedback on the assignment. The point of the presentation is not the enzyme kinetics, however, but rather the approach of leveraging the computational nature of many topics in undergraduate biology education with R to support students' learning.

Speakers

Jarek Bryk

Dr, University of Huddersfield

Tuesday July 9, 2024 13:20 - 13:25 CEST
Pongau + Flachgau

Public sector and NGO, Lightning Talk

13:20 CEST

Shinydraw: Quickly Wireframe Shiny Apps in Excalidraw - Michael Page, cynkra

Wireframing Shiny apps is a time-consuming process, typically involving proprietary tools. This often results in the wireframing stage of development coming with increased costs, or being skipped entirely. One alternative has been to use Excalidraw, an open-source virtual whiteboard for sketching hand-drawn-style diagrams through a simple and intuitive graphical user interface. But, to date, the use of Excalidraw has still been a time-intensive process requiring individual Shiny components to be drawn from scratch. That was, until now. Enter shinydraw: an Excalidraw library that offers pre-drawn Shiny components including inputs, outputs, theming, and more. Drawing wireframes with shinydraw is as simple as loading the library and then dragging and dropping the components you desire. It is a batteries-included approach to wireframing, with a near-zero learning curve, leveraging the powers of open-source technologies and standards. This talk will show you how easy it is to get started using the shinydraw library so you can get building Shiny wireframes within minutes.

Speakers

Michael Page

Mr, cynkra

Mike Page is a data scientist with more than five years of experience working with R in the third sector. Here, his focus has been on developing open-source Shiny apps and tools such as the humaniverse collection of R packages. Mike holds a Masters by Research degree in psychoendocrinology... Read More →

Tuesday July 9, 2024 13:20 - 13:25 CEST
Pinzgau + Tennegau

Shiny + dashboards + web apps, Lightning Talk

13:25 CEST

A Bayesian Approach to Decision Making in Early Development Clinical Trials: An R Solution. - Audrey Yeo, Roche

This talk showcases new statistical software that supports decision making on whether a novel cancer treatment demonstrates sufficient safety and efficacy signals to warrant further investment.

Speakers

Audrey Yeo

Statistical Software Engineer && Biostatistician, Roche

Audrey Yeo has been a Statistical Software Engineer and Clinical Trial Biostatistician at F. Hoffmann-La Roche since 2021. Together with the statistical engineering team, they are creating a state-of-the-art engineering tool to enhance decision making for early development. Audrey has a pharma... Read More →

Tuesday July 9, 2024 13:25 - 13:30 CEST
Pinzgau + Tennegau

Biostatistics + epidemiology + bioinformatics, Lightning Talk

13:25 CEST

Leveraging R-Ladies Paris Reach for Community Impact - Chaima Boughanmi, BVA Xsight

This presentation aims to introduce the R-Ladies Paris community, a local chapter of the global organization R-Ladies Global, which strives to reduce gender inequalities and enhance the visibility, participation, and recognition of contributions from underrepresented genders within the R community. We will discuss our strategies to bring R enthusiasts together, fostering a collaborative environment where individuals can learn from each other. We will share insights on how to maintain an active presence to encourage member engagement. Moreover, we'll provide tips and pieces of advice on how we make our meetups accessible to connect with a broader audience. We will address the resources provided by R-Ladies Global to support the activities of chapters, which could serve as encouragement for you to embark on your R-Ladies journey in the future.

Speakers

Chaima Boughanmi

Data Scientist & Business modeler, BVA Xsight

Chaïma, a junior Data Scientist at BVA in Paris, is an engineer in statistics and information analysis with a master’s degree in Data Science from Université Paris Saclay. Experienced in modelling, data analysis, coding and ML, she has worked across various sectors. With a passion... Read More →

Tuesday July 9, 2024 13:25 - 13:30 CEST
Salzburg II

Community and outreach, Lightning Talk

13:25 CEST

Why Build Silly Things in R? - Fonti Kar, University of New South Wales

Data science is an ever-evolving industry that requires constant upskilling. The pressures to learn the latest tools for project deliverables or to enhance one’s CV can be a hindrance to effective learning. Here, I argue for the need for silliness when developing new R skills. Learning is far more enjoyable and conducive to retention and application when we take away the seriousness of upskilling. I will share my experience in creating {ohwhaley} - a ‘toy’ R package which serves as a tool for learning package development and upskilling new learners. I hope attendees will walk away feeling more light-hearted and empowered to build silly things in R to reinvigorate their curiosity for R knowledge.

Speakers

Fonti Kar

Dr., University of New South Wales

I’m Fonti and I am an evolutionary biologist wearing R developer shoes. I work with researchers and turn their ideas into accessible tools. I like to learn new things in R and share them with others.

Tuesday July 9, 2024 13:25 - 13:30 CEST
Attersee

Data science education, Lightning Talk

13:25 CEST

Pricing Analytics - Meena Saad, Bank of Montreal

Pricing analytics is an emerging sector where optimization models and forecasting go hand in hand. I've developed models that use proprietary data and market intelligence to keep optimizing market share through pricing. They address the old question of volume versus margin: which to prioritize? Since the development of the model, the team has been able to guide business leaders in increasing profits while minimizing loss. In this talk we will discuss:
- What pricing analytics is
- Pricing challenges
- How to optimize pricing using R

Speakers

Meena Saad

Strategic Pricing Manager, Bank of Montreal

I am a Canadian working in the US supporting the US operations for BMO. I started my career with a Bachelor of Finance from the University of Ottawa. I worked in treasury for a few years, then realized the importance of developing a greater understanding of analytics. I then obtained my... Read More →

Tuesday July 9, 2024 13:25 - 13:30 CEST
Salzburg I

Economics + finance + insurance + business, Lightning Talk

13:25 CEST

Automatic Generation and Delivery of Customised Reports on COVID-19 Hospitalisations in Austria - Zuzanna Brzozowska, Gesundheit Österreich GmbH & Martin Zuba, Austrian National Public Health Institute

We show how RMarkdown was used in the national COVID-19 monitoring system in Austria. Within just a few days, our two-person team automated the process of generating and delivering customised weekly reports on COVID-19 hospitalisations. Our RMarkdown template generated two different kinds of reports in two different formats: 1) PDF files with detailed information at the subnational (i.e. federal state) level sent out automatically to a pre-defined list of local authorities (each recipient list received only one file with the analyses for the corresponding federal state), and 2) a publicly available HTML file with interactive features containing summary information at the national level only. Both report formats combined formatted text, links to external documents, tables, and different types of graphs (in the HTML format, the graphs were interactive).

Speakers

Martin Zuba

Mr, Austrian National Public Health Institute

Graduated in Economics at the University of Vienna in 2010. Employed at the Austrian National Public Health Institute, department of health economics and health system analysis, since 2015. Specialised in economic evaluations, financing of the healthcare system, statistics and modelling.

Zuzanna Brzozowska

Mrs., Gesundheit Österreich GmbH

Zuzanna is a research and data scientist experienced in complex quantitative data analysis and in communicating research results. After years spent in Academia as a demographer, she is now in applied research with a public health institute where she does what she likes most: automates... Read More →

Tuesday July 9, 2024 13:25 - 13:30 CEST
Pongau + Flachgau

Public sector and NGO, Lightning Talk

13:25 CEST

Latent Transition Analysis Using R - Christian Ritz, University of Southern Denmark

Latent transition analysis (LTA) is a useful statistical modelling approach for identifying subgroups ("latent classes") and describing transitions between subgroups over time. Subgroups may be characterized in terms of prevalence at each time point and through transition probabilities capturing the likelihood of transition from one subgroup at one time point to another (or the same) subgroup at another time point. Investigating predictors of transition between subgroups is often of key interest. Currently, LTA is mostly carried out using commercial and specialized software rather than open-source statistical software. This talk will show that there exists a flexible and modular approach for LTA using R.

Speakers

Christian Ritz

Dr., University of Southern Denmark

Christian Ritz is a professor of statistics and epidemiology. He has been using R for many years for a wide range of statistical analyses in biology, environmental science, medicine, nutrition epidemiology, physiology, and toxicology. He is the developer of the R package "drc".

Tuesday July 9, 2024 13:25 - 13:30 CEST
Wolfgangsee

Statistical modelling, Lightning Talk

13:30 CEST

Generate Raw Synthetic Dataset for Clinical Trial - Binod Jung Bogati, Numeric Mind

Obtaining synthetic raw datasets, particularly for clinical trials, poses significant challenges. The reliance on manual data entry in Electronic Data Capture (EDC) systems, along with the creation of test data scenarios for generating Study Data Tabulation Model (SDTM) datasets and for other clinical programming tasks, adds complexity. syngenR, an R package, addresses these challenges by generating customized synthetic raw datasets for clinical trials. This presentation introduces an alternative to conventional test data generation and entry methods, addressing specific limitations and challenges of the current approach. By automating the creation of synthetic data that accurately reflects real-world variability, the package aims to improve the reliability and efficiency of SDTM generation and other clinical programming tasks while avoiding the inaccuracies associated with manual data entry. The presentation also covers the package's use in educational settings, its capability to test various clinical trial scenarios, and its potential to significantly reduce the time and effort required for clinical trial preparation and execution.

Speakers

Binod Jung Bogati

Associate Manager - Data Science, Numeric Mind

Binod Jung Bogati has been a Statistical Programmer at Numeric Mind since 2020. Apart from work, he is also an rOpenSci 2023/24 Champion and the organizer of R User Group Nepal, and he hosts R community events. He loves working with data and is currently focusing on Clinical Data Science / Life Science.

Tuesday July 9, 2024 13:30 - 13:35 CEST
Pinzgau + Tennegau

Biostatistics + epidemiology + bioinformatics, Lightning Talk

13:30 CEST

ScMitoMut: Single Cell Lineage Informative Mitochondrial Mutation Calling Tool - Wenjie Sun, Institut Curie

Cells originate from cells, and tracing their lineage from a common ancestor (lineage tracing) is pivotal for the exploration of development, tumors, and stem cell biology. This is particularly important in answering questions about stem cell potency and cancer cell plasticity. In scATACSeq or single-cell multiomics sequencing, mitochondrial DNA is enriched due to its histone-free nature, and somatic mutations in mitochondria act as endogenous markers for following cell lineage at the single-cell level while profiling open chromatin. We introduce scMitoMut (available on Bioconductor), an R package that leverages statistical models to accurately identify mitochondrial mutations at the single-cell level. scMitoMut is designed to enable users to analyze large datasets on personal computers. In the implementation phase, we have addressed the challenge of handling large scATACSeq datasets on personal computers by creating an HDF5-based object to store raw data and intermediate results. To speed up statistical model fitting for both binomial-mixture and beta-binomial distributions, we have utilized Rcpp for efficient computation and implemented parallel processing techniques.

Speakers

Wenjie Sun

PostDoc, Institut Curie

Tuesday July 9, 2024 13:30 - 13:35 CEST
Attersee

Biostatistics + epidemiology + bioinformatics, Lightning Talk

13:30 CEST

Table Talk: Designing a Workflow for Reproducible Table Creation in R for Epidemiological Research - Reiko Okamoto, Bruyère Research Institute/Ottawa Hospital Research Institute

Summary tables are ubiquitous in scientific manuscripts reporting epidemiological and clinical studies. Table 1 often contains key demographic information about the study population, including the mean and standard deviation for continuous variables and frequency and proportion for categorical variables. Table 2 may present the association between the explanatory and outcome variables under investigation, and so on. Since there are endless ways (some more robust than others) to create analytical and summary tables in R, our research group wanted to define a workflow that could be adopted by colleagues of varying levels of proficiency in R to reproducibly create these tables from raw data. In this talk, I will share our approach in designing this workflow and what we discovered along the way. This will include a discussion on the current landscape of table-generating packages in R and how we overcame the limitations of existing software alongside other challenges (e.g., incorporating existing metadata). These strategies will not only be useful to researchers in epidemiology but also relevant for those in other health and social science disciplines.

Speakers

Reiko Okamoto

Methodologist, Bruyère Research Institute/Ottawa Hospital Research Institute

Reiko has a background in the life sciences with experience conducting data analysis in academia and the public sector. She is always eager to make analysis more open, transparent, and reproducible. Originally from the west coast of Canada, she completed a BSc in Microbiology and... Read More →

Tuesday July 9, 2024 13:30 - 13:35 CEST
Salzburg I

Biostatistics + epidemiology + bioinformatics, Lightning Talk

13:30 CEST

R Is for Everybody - Ekaterina Akimova, Laboratory for Immunological and Molecular Cancer Research

I am a female scientist working in cancer research. Working on my PhD project, I realized that I could not make use of the huge amounts of data I generate without proper coding skills. At that point, my journey into R began. Knowing absolutely nothing about R or any other programming language, I was trying to perform an analysis and get the final output that I aimed for. With a lot of help from our bioinformatician, we ended up with a peer-reviewed publication, summarizing our project and analysis. During these times, I often wished for guidance and support from community members who may be struggling with similar issues themselves. Unfortunately, this community did not exist in Salzburg. I continued my work and my learning process, but kept wishing for some R friends. Until finally, in January 2024, I launched a local R-Ladies Chapter in Salzburg. R-Ladies is a worldwide non-profit organization that promotes gender diversity within the R community by organizing educational and networking events. In this lightning talk, I would like to encourage everyone, in particular people of under-represented genders, to put aside their fear of coding and embrace the world of R.

Speakers

Ekaterina Akimova

Dr. rer. nat., Laboratory for Immunological and Molecular Cancer Research

I completed my PhD at LIMCR, investigating DNA damage in cancer. During this time, I worked on various projects, including the development of R-based analysis pipelines, and found my passion for coding. In 2023, I finished the doctorate, but continued my research endeavours as a PostDoc... Read More →

Tuesday July 9, 2024 13:30 - 13:35 CEST
Salzburg II

Community and outreach, Lightning Talk

13:30 CEST

Open-Source Economics: Breaking Silos, R-Evolutionizing the Field - Aarsh Batra, London School of Economics and Political Science

In my journey through data roles in econ research at various universities and non-profits, convincing my teams to hop on the R train has been a bit of a tug-of-war. That’s because econ has a historic preference for proprietary tools like Stata, which has resulted in a slower uptake of open-source collaborative practices. But yes, things are starting to change, slowly but surely. R and Python are shaking things up in the econ world and it's exciting to see research groups jumping in and opening up. In my experience, effectively showcasing R’s versatility, open-source foundation, and vibrant community support has proved pivotal in winning teams over. The insights I've gathered are relevant for anyone dealing with similar challenges in convincing their teams and, if given the opportunity, I'm keen to share these with everyone. Imagine this: an econverse brimming with R packages, as user-friendly and structured as the tidyverse or the pharmaverse. It's a vision worth spreading, and I'm eager to rally others to join the cause.

Speakers

Aarsh Batra

Mr., London School of Economics and Political Science

I'm Aarsh, aka R-sh, your friendly neighborhood R and data-viz enthusiast from India. By day, I'm a Data Scientist working for the London School of Economics and Political Science, navigating through econ research and data roles. Outside of work hours, you'll find me solving puzzles... Read More →

Tuesday July 9, 2024 13:30 - 13:35 CEST
Pongau + Flachgau

Economics + finance + insurance + business, Lightning Talk

13:30 CEST

A Comprehensive List of Normality Tests in R - Fernando Corrêa, Curso-R

Goodness-of-fit remains a cornerstone in statistical analysis, with normality testing standing as a pivotal subset. With numerous solutions available, each boasting distinct characteristics in power, type I error control, and theoretical underpinnings, the landscape can be daunting to navigate. Moreover, the proliferation of R packages, encompassing both classical Fortran implementations and modern alternatives, adds another layer of complexity. In this presentation, we embark on a comprehensive journey through the normal goodness-of-fit tests. We will meticulously list, demonstrate, and critically assess these implementations, juxtaposing old-school methodologies with their contemporary counterparts. From the Shapiro-Wilk to the Anscombe-Glynn test, we'll explore a plethora of tests, shedding light on computational performance, recent advancements and key differences in interface design.

Speakers

Fernando Corrêa

Mr, Curso-R

Bachelor's degree and master's student in Statistics at IME-USP. Former Technical Director at the Brazilian Association of Jurimetrics, partner at the lawtech Terranova Jurimetrica, and currently works as a data science consultant at R6 Consultancy. Uses R for everything, but has... Read More →

Tuesday July 9, 2024 13:30 - 13:35 CEST
Wolfgangsee

Statistical modelling, Lightning Talk

13:35 CEST

The State of R Programming and Its Prospects in Africa - Elisha Chitsenga, Crestly Resorts

Data and statistical modeling techniques are the backbone of any modern society, but this is not true in Africa. Only a few countries have put this idea into practice. Having recognized this issue, it appears that a lack of use, awareness, and comprehension of these modeling methodologies has crippled our countries, as a lack of current and detailed information has left stakeholders reliant on obsolete data and information for decision-making. Given this situation, it is time for African society to devote itself to the implementation and use of R programming, which we see as a game changer due to its open-source nature. In this enlightening talk, the presenter will discuss the challenges Africa faces, particularly in data collection and analysis, and how R programming can help, as well as how the community can collaborate with aspiring African data scientists to help improve lives and future generations across the continent.

Speakers

Elisha Chitsenga

Software Developer, Community, and Business Leader, Crestly Resorts

Elisha is a software developer with over 10 years of expertise, an accounting degree, a leader in the community and company, and a fully accredited information systems auditor. His main goals are to use cloud-based solutions, Linux, RISC-V, Python, and R coding to help the African... Read More →

Tuesday July 9, 2024 13:35 - 13:40 CEST
Salzburg II

Community and outreach, Lightning Talk

13:35 CEST

BayesCVI: A Bayesian Cluster Validity Index - Nathakhun Wiroonsri, King Mongkut's University of Technology Thonburi

Selecting the appropriate number of clusters is a critical step in applying clustering methods. To assist in this process, various cluster validity indices (CVIs) have been developed. These indices are designed to identify the optimal number of clusters within a dataset. However, users may not always seek the absolute optimal number of clusters but rather a secondary option that better aligns with their contexts. This realization has led us to introduce a Bayesian cluster validity index (BCVI), which builds upon existing indices. The BCVI utilizes a Dirichlet prior, resulting in a posterior distribution from the same family. We evaluate BCVI using the Wiroonsri index for hard clustering and the WP index for soft clustering as underlying indices. We compare the performance of BCVI with that of the original underlying indices and several other existing CVIs, including DB, STR, XB, and KWON2 indices. Our BCVI offers clear advantages in situations where users can specify their desired range for the final number of clusters. Additionally, we showcase the practical applicability of our approach through MRI images. These tools are also published as a new R package, "BayesCVI", available on CRAN.

Speakers

Nathakhun Wiroonsri

Assistant Professor, King Mongkut's University of Technology Thonburi

Nathakhun Wiroonsri earned his B.Sc. in Mathematics with first-class honors from Chulalongkorn University, Master of Financial Mathematics from North Carolina State University, and Ph.D. in Applied Mathematics from the University of Southern California in 2010, 2013, and 2018, respectively... Read More →

Tuesday July 9, 2024 13:35 - 13:40 CEST
Salzburg I

Machine learning and AI, Lightning Talk

13:35 CEST

Roam: Remote Objects with Active-Binding Magic - Yangzhuoran Fin Yang, Monash University

The "roam" package simplifies the creation of R objects that resemble regular objects but are sourced from remote locations. It empowers package developers to incorporate these "roaming" objects, which may surpass CRAN's 5 MB package size limit, into their packages. Additionally, it facilitates dataset updates independent of package updates through functions that retrieve data from remote sources. https://github.com/FinYang/roam

Speakers

Yangzhuoran Fin Yang

PhD Candidate, Monash University

Yangzhuoran Fin Yang is a PhD candidate in the Department of Econometrics and Business Statistics at Monash University. His PhD project is on the use of transformations of time series to improve forecasting. Fin is active in research software development, (co)authoring open source... Read More →

Tuesday July 9, 2024 13:35 - 13:40 CEST
Pinzgau + Tennegau

R workflow + deployment + production, Lightning Talk

13:35 CEST

Dandelion Hub - A central repository for de-central actions for ecosocial justice based on R/Shiny - Wilmar Igl, Private

Social and ecological systems around the world are experiencing multiple crises [1]. Decision makers have shown a lack of ability or will to take action to reduce current social and ecological injustices [2]. However, individuals without formal political or economic power can contribute to peacefully guiding the world back to a sustainable pathway. The Dandelion Hub (https://dhub.global) serves as a central repository for de-central, non-violent actions for ecosocial justice.

The Dandelion Hub (DHub) uses a web frontend (R/Shiny) and database backend (MariaDB) to record and report actions across a broad spectrum of non-violent actions according to the classification by Sharp (1973) [3].

As of 2024-01-28, the repository represents 2,389,238 activists across 615 actions in 180 cities, 50 countries, and 6 continents, who took part in non-violent actions between 2018-08-20 and 2024-01-14.

Open-source technology, such as R/Shiny, can contribute to the socio-ecological transformation of society.

References:
[1] WEF (2024). http://tinyurl.com/2995u9jh
[2] IPCC (2023). http://tinyurl.com/5cybutfs
[3] Sharp, Gene (1973). http://tinyurl.com/2jvhvxwt

Speakers

Wilmar Igl

PhD, Private

Wilmar Igl, PhD, has a background in medical statistics and psychology. He has over 20 years of professional experience in the life sciences and has also been active in the climate movement since 2018. His experiences have resulted in broader interests in eco-social justice and projects... Read More →

Tuesday July 9, 2024 13:35 - 13:40 CEST
Pongau + Flachgau

Shiny + dashboards + web apps, Lightning Talk

13:35 CEST

Unlock Your Data Insights Faster: The 'CohortBuilder' Way. - Adam Forys, Roche

Cohort analysis is vital for understanding patterns and trends within datasets, particularly in fields like healthcare, marketing, and user analytics. The 'cohortBuilder' and 'shinyCohortBuilder' packages in R offer a convenient approach to defining and manipulating cohorts. If you're exploring ways to streamline your cohort analysis workflow within Shiny, this talk will introduce you to powerful tools worth exploring. During the presentation, I will demonstrate the core concepts of the 'cohortBuilder' ecosystem, highlighting their strengths in performing cohort analysis and visualization within R Shiny.

Speakers

Adam Forys

Mr., Roche

Adam is a Principal Data Scientist at Roche. He is dedicated to building R packages that empower teams working on SDTM. He is committed to collaboration and enjoys guiding others in overcoming technical obstacles and optimizing their data science workflows.

Tuesday July 9, 2024 13:35 - 13:40 CEST
Attersee

Shiny + dashboards + web apps, Lightning Talk

13:35 CEST

From SPSS to R in Social Sciences - Juan Claramunt, Leiden University

In this session, we will discuss the transition from SPSS to R that we made at the Methodology and Statistics unit of the Institute of Psychology of Leiden University.
With this talk, we want to inspire and help other users to introduce R in their social sciences bachelor's programs, replacing other common software such as SPSS, and to make connections with other R users willing to improve Data Science/Statistics education in the social sciences.
We will introduce the main reasons for the change: alignment of research and education, didactic purposes, incentives for good research practices (open science, reproducibility), and better career prospects.
Afterwards, we will outline how R was implemented in our bachelor program. Here, we aim to provide tips on how to transition from SPSS to R in a social sciences environment, including the experience of other R users working on related education.
We conclude with a summary of the quantitative and qualitative results we have obtained during the first year with R.
We anticipate the transition to R to greatly impact our students' future research practices, helping to solve issues such as the reproducibility crisis.

Speakers

Juan Claramunt

Specialist in scientific information, Leiden University

Bachelor in Mathematics at Universidad de Cantabria, Utrecht University & Brown University. Master in Methodology and Statistics for the Behavioural, Biomedical, and Social Sciences, & European Master in Official Statistics (Utrecht University). Scientific information specialist at... Read More →

Tuesday July 9, 2024 13:35 - 13:40 CEST
Wolfgangsee

Social sciences, Lightning Talk

13:40 CEST

Building Community Through the Power of Your Voice - Joanna Moćko-Łazarewicz, Appsilon

Unlock the potential of your voice! Join me for a quick exploration of why speaking at industry events matters specifically within our R ecosystem. Discover how sharing your insights not only creates a positive ripple effect but also empowers others to benefit from your experiences. From networking to personal growth, we'll delve into the transformative impact of sharing your story. I'll share insightful stats about the R conference landscape and offer actionable insights on becoming a speaker. Learn how to navigate the process and kickstart your speaking journey. Let's illuminate the path to a more connected and empowered R community together, where your voice becomes a valuable resource for collective growth.

Speakers

Joanna Moćko-Łazarewicz

Community and Events Lead, Appsilon

Joanna is a seasoned event manager with a fervor for cultivating connections within the tech community. Currently, she’s serving as the Community and Events Lead at Appsilon. In her role, Joanna takes pride in organizing ShinyConf, an event that has become a staple for hundreds... Read More →

Tuesday July 9, 2024 13:40 - 13:45 CEST
Salzburg II

Community and outreach, Lightning Talk

13:40 CEST

Interactive, Engaging and Playful Teaching of Hypothesis Testing - Andre Beinrucker & Markus Konrad, HTW Berlin

We present a method to teach hypothesis testing by engaging students in an A/B-test facilitated by a Shiny app.

The idea is to teach hypothesis testing in the context of A/B-testing, which is nowadays used extensively to optimize apps and webpages. The proposed method engages students in an A/B-test in class as follows: students access a quiz app via a QR code. They are randomly assigned to either the control or the treatment group and take the quiz without knowing their group or the treatment. We then reveal the treatment, anonymously save the scores, and analyze them in class using a hypothesis test to see whether the treatment had any effect on the scores.

The procedure, based on a pen-and-paper game by Adam Shrager, comes with a number of challenges and pitfalls, as does any randomized trial, that can be openly discussed with students.

In this presentation we walk you through this A/B-test, collecting live quiz data from the audience if time permits. We present and discuss some challenges and key findings. The app is freely available, feedback is appreciated:
documentation and installation: https://github.com/IFAFMultiLA/memory_game
live demo: https://tinyurl.com/MemGameTrial
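
The in-class analysis step described above can be illustrated with a short sketch. The abstract does not say which hypothesis test the app uses; the two-sided permutation test below is one assumed choice, written in Python with only the standard library, and the scores are made-up illustration data rather than results from the quiz app.

```python
import random
from statistics import mean


def permutation_test(control, treatment, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in mean scores."""
    rng = random.Random(seed)
    observed = abs(mean(treatment) - mean(control))
    pooled = list(control) + list(treatment)
    n_control = len(control)
    extreme = 0
    for _ in range(n_permutations):
        # Re-assign students to groups at random, as if the treatment had no effect.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[n_control:]) - mean(pooled[:n_control]))
        if diff >= observed:
            extreme += 1
    # The add-one correction keeps the estimated p-value strictly positive.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical quiz scores for the two groups (illustration only).
control_scores = [12, 15, 11, 14, 13, 12, 16, 10]
treatment_scores = [17, 19, 16, 18, 20, 15, 19, 17]

p_value = permutation_test(control_scores, treatment_scores)
print(f"p-value: {p_value:.4f}")
```

A small p-value suggests that a difference this large would rarely arise from random group assignment alone, which is the question the class is asked to discuss.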

Speakers

Andre Beinrucker

Prof., University of Applied Sciences Berlin (HTW Berlin)

Since 2020: Professor of Applied Statistics at the University of Applied Sciences Berlin (HTW Berlin). 2019-2020: Data Scientist at Babbel. 2015-2019: Biostatistician at Thermo Fisher Scientific. 2015: Ph.D. at the University of Potsdam.

Markus Konrad

M.Sc., HTW Berlin

M.Sc. in Computer Science from HTW Berlin, University of Applied Sciences. Worked as data scientist at the Berlin Social Science Center (WZB) and at Fraunhofer FOKUS. Focuses on software engineering, data analysis and machine learning in R and Python. Currently research assistant and... Read More →

Tuesday July 9, 2024 13:40 - 13:45 CEST
Pongau + Flachgau

Data science education, Lightning Talk

13:40 CEST

R for Exploring Spatial Data: Lightning Overview - Siddharth Gupta, University of Potsdam

I will give an overview and comparison of different map- and spatial-related libraries in R, drawing on personal experience and (hopefully) fun examples!

Speakers

Siddharth Gupta

PhD student in Cognitive Science, University of Potsdam

I am a PhD student at the University of Potsdam, working in the domain of psycholinguistics. Besides, I am interested in NLP, linguistics, behavioral economics and Deep Learning. When I am not consumed with college work, I post YouTube videos, create Discord bots, write Twitter threads... Read More →

Tuesday July 9, 2024 13:40 - 13:45 CEST
Wolfgangsee

Data visualisation, Lightning Talk

13:40 CEST

Automated Generation of R Client Libraries for RESTful APIs Using OpenAPI Specification and the Open - Simon Haller, Raiffeisen

Learn how to streamline the creation of R client libraries for RESTful APIs by leveraging the OpenAPI specification (v2 or v3) and a customizable, template-based code generation tool, OpenAPI Generator (https://openapi-generator.tech/). This talk introduces the code generation workflow, demonstrates the functionality of the resulting R package, and explores customization options for the generated R code and documentation. We explore an efficient approach to leveraging openapi.json (swagger) files as a starting point for creating tailored R client libraries for seamless integration.

Speakers

Simon Haller

Dr., Raiffeisen

Dr. Simon Haller, a theoretical physicist with a PhD in mathematics, brings extensive experience in the finance industry. As a Senior Quant Analyst, he has adeptly developed and maintained applications in programming languages such as R and Python across diverse areas, including risk... Read More →

Tuesday July 9, 2024 13:40 - 13:45 CEST
Attersee

Interfaces with other programming languages, Lightning Talk

13:40 CEST

Adding the Missing Audit Trail to R - Magnus Mengelbier, Limelogic AB

The R language is increasingly used across the Life Science industry for GxP workloads. The basic architecture of R makes it nearly impossible to add a generic audit trail method and mechanism for all use cases. Different strategies have been developed to provide some level of auditing, from logging conventions to file system audit utilities, but each has its drawbacks and lessons learned.

The ultimate goal is to provide an immutable audit trail compliant with ICH Good Clinical Practice, FDA 21 CFR Part 11 and EU Annex 11, regardless of the R environment. We consider different approaches to implementing auditing functionality with R and how we can incorporate audit trail functionality natively in R or with existing external tools and utilities that completely support Life Science best practices, processes and standard procedures for analysis and reporting.

Speakers

Magnus Mengelbier

Managing Director, Limelogic AB

Magnus is currently the Managing Director of Limelogic, a contributor, collaborator and independent consultant based in southern Sweden with over 25 years of experience in the Life Science industry. A keen advocate of simple programming approaches with a focus on GxP, compliance... Read More →

Tuesday July 9, 2024 13:40 - 13:45 CEST
Salzburg I

R workflow + deployment + production, Lightning Talk

13:40 CEST

Checklist Improves Collaboration, Quality and Visibility of Your Code - Thierry Onkelinx, Research Institute for Nature and Forest

The checklist package is a set of rules for R packages and R source code projects. The ruleset covers several topics: folder structure, filename conventions, spelling, code style, citation metadata, licence, contribution guidelines, ... Adherence to a common set of rules within an organisation facilitates collaboration between its members. Enforcing citation metadata and an open source licence improves the visibility of projects. Automated checks via GitHub Actions detect problems as soon as possible. Checklist is based on the rcmdcheck, lintr, pkgdown, codemetar and hunspell packages. Where applicable, we use the same rules for projects and packages. The maintainer can choose which parts of the ruleset apply to a project. In the case of an R package, the entire ruleset is mandatory. Publishing code on Zenodo is easy if you link to your GitHub repository. Each release on GitHub triggers a new version on Zenodo with a specific DOI. A GitHub action creates a new release for each new version of the package. Documentation and source code are available at https://inbo.github.io/checklist

Speakers

Thierry Onkelinx

statistician, Research Institute for Nature and Forest

statistician at the Research Institute for Nature and Forest

Tuesday July 9, 2024 13:40 - 13:45 CEST
Pinzgau + Tennegau

R workflow + deployment + production, Lightning Talk

13:45 CEST

Community Detection for Extremely Large Networks - Aidan Lakshman, University of Pittsburgh

Community detection in graphs has numerous applications from social networks to biology. However, the immense size of modern graphs makes it challenging to accurately detect communities. We set out to benchmark a variety of popular methods available in R to measure their accuracy and time complexity on synthetic and real datasets. Unsurprisingly, we found that less scalable algorithms tend to outperform more computationally efficient ones. To address this issue, we introduce two new variants of the Fast Label Propagation algorithm for clustering extremely large networks, both available in the SynExtend package for R. Our implementations offer accuracy comparable to less scalable approaches while providing linear-time computational scalability. Furthermore, we made it possible to apply our community detection algorithms outside of main memory, which permits community detection on graphs with billions of nodes using less than a gigabyte of RAM. These advances will help democratize scalable analyses by removing the need for expensive supercomputer resources. Together, this work both improves graph community detection and makes these analyses more accessible to researchers.
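
As a rough illustration of the label propagation idea mentioned above, the following Python sketch runs plain, in-memory label propagation on a toy graph. It is not the Fast Label Propagation variants from SynExtend; the function name, the fixed node order, and the deterministic tie-breaking rule (keep the current label if tied, otherwise take the largest label) are assumptions made so that the example is reproducible.

```python
from collections import Counter


def label_propagation(adjacency, max_sweeps=100):
    """Assign each node the most common label among its neighbours until stable."""
    labels = {node: node for node in adjacency}  # every node starts in its own community
    for _ in range(max_sweeps):
        changed = False
        for node in sorted(adjacency):
            neighbours = adjacency[node]
            if not neighbours:
                continue
            counts = Counter(labels[n] for n in neighbours)
            best = max(counts.values())
            candidates = [lab for lab, c in counts.items() if c == best]
            # Assumed tie-break: keep the current label if possible, else the largest.
            new_label = labels[node] if labels[node] in candidates else max(candidates)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    return labels


# Toy graph: two 4-node cliques joined by the single edge (3, 4).
groups = ([0, 1, 2, 3], [4, 5, 6, 7])
edges = [(a, b) for g in groups for i, a in enumerate(g) for b in g[i + 1:]]
edges.append((3, 4))

adjacency = {n: set() for n in range(8)}
for a, b in edges:
    adjacency[a].add(b)
    adjacency[b].add(a)

communities = label_propagation(adjacency)
print(communities)
```

Each sweep touches every edge once, which is why label propagation scales roughly linearly with graph size. Practical implementations usually randomize the update order and the tie-breaking rather than fixing them as done here.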

Speakers

Aidan Lakshman

PhD Candidate, University of Pittsburgh

Aidan Lakshman is a PhD Candidate in Biomedical Informatics at the University of Pittsburgh. His dissertation focuses on developing tools for large-scale comparative genomics. He is expected to graduate in May 2025 and is actively searching for employment opportunities. Aidan is an... Read More →

Tuesday July 9, 2024 13:45 - 13:50 CEST
Attersee

Big and high-dimensional data, Lightning Talk

13:45 CEST

R4CR: R Education for Clinical Researchers via Quarto - Jinhwan Kim, Zarathu Co., Ltd.

Clinical research is one of the fastest growing fields in the world, and R is becoming increasingly important as a way to handle data, especially as more and more studies are conducted with small numbers of patients, or in collaboration with multiple institutions to collect data and conduct research. Rather than using R to analyze data, clinical researchers have typically focused on study design, data collection, and validation, while coding has been done by professional developers, but now more and more clinical researchers are trying to use R themselves, including data management. To this end, we have been providing R training for clinical researchers, but there is a lot of room for improvement compared to professional training services, such as reflecting the latest R-related technology trends and making the training experience better. In this session, I will share how we decided to use Quarto, what we considered in order to provide R training for clinical researchers, how we actually used Quarto, the advantages and disadvantages of using Quarto, our achievements, and our future plans.

Speakers

Jinhwan Kim

R developer, Zarathu Co., Ltd.

Jinhwan is an R / Shiny developer with a background in bioinformatics. He has dedicated his career to crafting data products using the R ecosystem across diverse industries as a Data Scientist. Currently, he is a key contributor at Zarathu, where he specializes in developing R packages and... Read More →

Tuesday July 9, 2024 13:45 - 13:50 CEST
Salzburg II

Data science education, Lightning Talk

13:45 CEST

verdepcheck - A Tool for Dependencies Check - Pawel Rucki & André Veríssimo, Roche

Proper dependency management is critical to ensuring a good experience for your package's users. Package API incompatibilities, breaking changes or incorrect minimal dependency versions might lead to various compatibility issues on the user's end.

In this talk I will introduce you to a newly created product (a package and an associated GitHub Action) designed for package developers that will help you detect and resolve these issues earlier.

Speakers

Pawel Rucki

Ms, Roche

André Veríssimo

PhD, Roche

Tuesday July 9, 2024 13:45 - 13:50 CEST
Pinzgau + Tennegau

Efficient programming, Lightning Talk

13:45 CEST

Using R with SQL Server 2022 and Power BI - Tomaž Kaštrun, /

This talk explores the use of R with SQL Server, covering the benefits, challenges, and practical applications of harnessing R's statistical computing capabilities directly within the SQL Server environment.

By leveraging R scripts and functions seamlessly within Power BI, data professionals can gain access to a powerful toolkit for advanced analytics, predictive modelling, and machine learning.

Both integrations facilitate the execution of complex statistical analyses directly on large datasets stored in SQL Server databases or in Power BI (VertiPaq), eliminating the need for data movement and enabling additional insights.

Speakers

Tomaž Kaštrun

Mr., /

Tomaž Kaštrun is a SQL Server developer and data scientist with more than 15 years of experience in the fields of business warehousing, development, ETL, database administration, and query tuning. He holds over 15 years of experience in data analysis, data mining, statistical research... Read More →

Tuesday July 9, 2024 13:45 - 13:50 CEST
Salzburg I

Interfaces with other programming languages, Lightning Talk

13:45 CEST

Regression Models for [0, 1] Responses Using Betareg and Crch - Achim Zeileis, Universität Innsbruck

In this presentation we show how to model data from the closed unit interval [0, 1] using extended-support beta regression and heteroscedastic two-limit tobit models. In contrast to zero- and/or one-inflated beta regression, both approaches only require estimation of a single latent process that captures both the distribution of the inner observations and the point masses for observations on the boundaries at 0 and/or 1. The heteroscedastic two-limit tobit model does so by fitting a Gaussian distribution censored at 0 and 1 which is conveniently available in the R package "crch". Extended-support beta regression has recently been proposed and implemented in the development version of the "betareg" package. It contains both classic beta regression and heteroscedastic two-limit tobit as special cases, shifting between the two with just one additional parameter. Both approaches are illustrated by modeling reading accuracy scores of children and investments in an economic loss aversion experiment, respectively, discussing the models' relative (dis)advantages.
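
To make the censoring idea concrete, here is a minimal Python sketch of the per-observation log-likelihood of a Gaussian latent variable censored at 0 and 1. It is an illustration under stated assumptions, not the crch or betareg implementation: the function names are hypothetical, and `mu` and `sigma` are passed in directly, whereas a heteroscedastic model would compute both from covariates.

```python
import math


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_logpdf(z):
    """Log-density of the standard normal distribution."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def two_limit_tobit_loglik(y, mu, sigma):
    """Log-likelihood of one response y in [0, 1] under N(mu, sigma^2) censored at 0 and 1."""
    if y <= 0.0:
        # Point mass at 0: probability that the latent value falls below 0.
        return math.log(norm_cdf((0.0 - mu) / sigma))
    if y >= 1.0:
        # Point mass at 1: probability that the latent value exceeds 1.
        return math.log(1.0 - norm_cdf((1.0 - mu) / sigma))
    # Interior observation: ordinary Gaussian density.
    return norm_logpdf((y - mu) / sigma) - math.log(sigma)


# Hypothetical responses, including boundary observations at 0 and 1.
responses = [0.0, 0.2, 0.55, 0.9, 1.0]
total = sum(two_limit_tobit_loglik(y, mu=0.5, sigma=0.4) for y in responses)
print(f"log-likelihood: {total:.4f}")
```

A single latent distribution thus accounts for both the interior observations and the boundary masses, which is the contrast with zero- and/or one-inflated models drawn in the abstract.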

Speakers

Achim Zeileis

Professor of Statistics, Universität Innsbruck

Achim Zeileis is Professor of Statistics at the Faculty of Economics and Statistics at Universität Innsbruck. Being an R user since version 0.64.0, Achim is co-author of a variety of CRAN packages such as zoo, colorspace, party(kit), sandwich, or exams. In the R community he is active... Read More →

Tuesday July 9, 2024 13:45 - 13:50 CEST
Wolfgangsee

Statistical modelling, Lightning Talk

13:50 CEST

Break

Tuesday July 9, 2024 13:50 - 14:10 CEST
TBD

Breaks + Special Events

14:10 CEST

Centralized Database Management: Enhancing User Interaction and Accessibility with R / Shiny and AWS - Ilaria Capelli & Mina Sohrabi, Raiffeisen Bank International - Raiffeisen Research

As developers at a leading economic and financial markets research institution, centralizing our data is crucial for both internal (backtesting, sharing forecasts, model validations) and external (client access) use cases. Our contribution consists of two parts: first, the introduction of our R package, developed using the “OpenAPI Generator,” enabling seamless querying of our database via a RESTful API. Additionally, we show the integration of a robust authorization and authentication layer within AWS cloud infrastructure. Second, we have designed a user-friendly R/Shiny-based frontend to facilitate easy access for non-technical users. This interface displays forecasts and historic data for various variables and countries, allowing for data updates and downloads. To mimic familiar Excel operations such as row-wise pasting and data removal, we have successfully used rhandsontable, the open-source R wrapper for the JavaScript library handsontable. Both contributions enable our analysts and clients to smoothly interact with our database by providing a convenient, secure, and user-friendly environment, enhancing the accessibility of our macroeconomic data for all stakeholders.

Speakers

Ilaria Capelli

M. Sc., Raiffeisen Bank International - Raiffeisen Research

Ilaria Capelli holds a master’s degree in Stochastics and Data Science. With experience as a Quant Analyst in the financial industry, her proficiency includes developing interactive dashboards, stress test modeling, and risk assessment using tools such as R, Python, SQL, and visualization... Read More →

Mina Sohrabi

M.Sc, Raiffeisen Bank International - Raiffeisen Research

Mina Sohrabi has a master's degree in technical mathematics and is currently pursuing her PhD degree in mathematical modelling/operations research. As a Quant Analyst in the banking industry, her expertise mainly lies in the field of developing interactive web-dashboards by means... Read More →

Tuesday July 9, 2024 14:10 - 14:30 CEST
Pongau + Flachgau

Economics + finance + insurance + business

14:10 CEST

Spare Cores: Harnessing Unutilized Cloud Compute Resources - Gergely Daroczi, Spare Cores

Spare Cores, an NGI Search funded open-source ecosystem, inventories the available compute resources of public cloud and server providers and benchmarks them for different scenarios, in order to find optimal instance types across vendors and datacenters for containerized jobs (e.g. training ML models or hosting a Shiny app). Among other open-source SDKs, we provide an R package allowing easy access to this public database, complemented by CLI helpers for launching instances in your existing cloud environment. We also briefly showcase a streamlined SaaS solution built on top of the open-source stack for those seeking simplicity and/or unwilling to manage their cloud infrastructure: the managed Spare Cores environment covers the entire life cycle of batch jobs and microservices, eliminating the need for direct cloud vendor engagement.

Speakers

Gergely Daroczi

Project lead, Spare Cores

Gergely Daróczi is an enthusiast R user and package developer, Ph.D. in Sociology; former assistant professor and founder of an R-based web reporting application at rapporter.net; ex Lead R Developer, then Director of Analytics at CARD.com; later Senior Director of Data Operations... Read More →

Tuesday July 9, 2024 14:10 - 14:30 CEST
Attersee

R workflow + deployment + production

14:10 CEST

Reproducible Data Science with WebAssembly and WebR - George Stagg, Posit, PBC

A fundamental principle of the scientific method is peer review and independent verification of results. Good science depends on transparency and reproducibility. However, in a recent study a substantial 74% of research code failed to run without errors, often caused by diverse computing environments. This talk will discuss the principles of numerical reproducibility in research and show how software can be pinned to specific versions and self-contained as a universal binary package using WebAssembly. This ensures seamless reproducibility on any machine equipped with a modern web browser and, using tools such as Shinylive, could provide a new way for researchers to share results with the community.

Speakers

George Stagg

Software Engineer, Posit, PBC

George is a senior software engineer working on the webR project as part of the Open Source Team at Posit. A former academic, George also has experience with teaching and research in computational mathematics, statistics and physics. When not working with software, George enjoys hacking... Read More →

Tuesday July 9, 2024 14:10 - 14:30 CEST
Salzburg II

Shiny + dashboards + web apps

14:10 CEST

R for Streamlined Research: Spotlight on Data Collection - Agustin Perez Santangelo, Appsilon

In this session, I will explore R's potential for data collection, a less explored aspect of the language. While R is widely recognized for its robust data analysis and reporting capabilities, its utility for data collection, particularly through R Shiny apps, is less commonly discussed. I aim to shed light on this aspect, demonstrating how R can serve as an end-to-end solution for academic workflows, especially in fields studying human behavior such as experimental psychology, social sciences, and behavioral economics. I will walk through two case studies from published papers where R was not only used for data analysis but also for data collection. These examples illustrate how an R Shiny app served as an online experiment platform, enabling efficient and effective data gathering. The goal of this session is to broaden the perspective of R users and to inspire academics to consider R as a comprehensive tool for their research journey. From data collection to data analysis, and all the way to authoring and publishing, R can streamline the process, making research more efficient and reproducible.

Speakers

Agustin Perez Santangelo

Mr, Appsilon

I am a molecular biologist and cognitive scientist from Argentina. Currently, I work as a software engineer (mainly using R and R Shiny) at Appsilon, where I enjoy translating ideas into code.

Tuesday July 9, 2024 14:10 - 14:30 CEST
Pinzgau + Tennegau

Social sciences

14:10 CEST

Analyzing Real-World Geospatial Networks in R for Sustainable Transport Planning - Lucas van der Meer & Lorena Abad, University of Salzburg

Geospatial networks are graphs embedded in geographical space. They can be used to represent, analyze and model a variety of real-world complex systems. A motivating example is urban transport systems with their ongoing transition towards a sustainable design and increased focus on active travel. Streets, their surroundings, and their interconnections form the geospatial network. The analysis often involves an assessment of transport accessibility: how well does the network connect people to the places they want to go to? This talk will cover three main stages of such an analysis, and its implementation in R. First, we show how to import street geometries and amenity datasets from OpenStreetMap, using the packages {osmdata} and {osmextract}. Second, we show how to build a clean and routable street network from these data, using the package {sfnetworks}. Finally, we give an example of how to compute bicycle accessibility to different amenities, taking into account the suitability of the network for cycling. Although we focus on the application domain of transport planning, the content is meant to be useful for anyone interested in analyzing real-world geospatial networks in R.

Speakers

Lorena Abad

MSc., University of Salzburg

Doctoral researcher at the Department of Geoinformatics - Z_GIS of the University of Salzburg. Part of the research groups Risk, Hazard and Climate and EO Analytics. I focus on the analysis of big Earth observation data to map and monitor landscape dynamics and I am researching the... Read More →

Lucas van der Meer

Msc, University of Salzburg

Lucas van der Meer is a doctoral researcher in Geoinformatics at the University of Salzburg. He holds a bachelor in Environmental & Infrastructure Planning, and a master in Geospatial Technologies. He is particularly interested in the application of geospatial data science to address... Read More →

Tuesday July 9, 2024 14:10 - 14:30 CEST
Salzburg I

Spatial data and maps

14:10 CEST

How to Interpret Statistical Models Using the `Marginaleffects` Package for R - Vincent Arel-Bundock, Université de Montréal

The parameters of a statistical model can sometimes be difficult to interpret substantively, especially when that model includes non-linear components, interactions, or transformations. Analysts who fit such complex models often seek to transform raw parameter estimates into quantities that are easier for domain experts and stakeholders to understand. This article presents a simple conceptual framework to describe a vast array of such quantities of interest, which are reported under imprecise and inconsistent terminology across disciplines: predictions, marginal predictions, marginal means, marginal effects, conditional effects, slopes, contrasts, risk ratios, etc. This presentation introduces marginaleffects, a package for R which offers a simple and powerful interface to compute all of those quantities, and to conduct (non-)linear hypothesis and equivalence tests on them. marginaleffects is lightweight; extensible; it works well in combination with other R packages; and it supports over 100 classes of models, including Linear, Generalized Linear, Generalized Additive, Mixed Effects, and Bayesian.

Tuesday July 9, 2024 14:10 - 14:30 CEST
Wolfgangsee

Statistical modelling

14:30 CEST

Accessing and Managing Financial Data with Tidy Finance - Christoph Scheuch, Self-Employed

Tidy Finance is a transparent, open-source approach to research in financial economics, featuring multiple programming languages. The {tidyfinance} package is a new addition to the R toolbox and includes helper functions for empirical research in financial economics, addressing a variety of topics covered in Scheuch, Voigt, and Weiss (2023). The package is designed to provide shortcuts for issues extensively discussed in the book, facilitating easier application of its concepts. In this presentation, we focus on the family of functions that download and process financial data. We highlight the concept of tidy data and how it can be applied to compiling multiple data sources for financial applications. We also demonstrate the inner workings of the {tidyfinance} package and provide inspiration for applications in teaching and research.

Speakers

Christoph Scheuch

Co-Creator, Tidy Finance

Christoph Scheuch is an independent data science and business intelligence expert. He co-created and maintains the Tidy Finance project, a transparent, open-source approach to research in financial economics. Alongside contributing to Tidy Finance, its maintainers have published in leading academic journals, including the Journal of Finance, Journal of Financial Economics, Review of Finance, and Journal of... Read More →

Tuesday July 9, 2024 14:30 - 14:50 CEST
Pongau + Flachgau

Economics + finance + insurance + business

14:30 CEST

Layered Design for R Package Development: Meeting the Needs of Pharmaceutical R&D Stakeholders - Jean Muller & Ligia Adamska, MSD Switzerland

In the pharmaceutical industry, we aim to create standard R packages for statisticians, economic modelers, and statistical programmers. The challenge is to have both flexible functions and a systematic structure to support automatic reporting systems. In this presentation, we introduce a layered design for R package development that addresses the requirements of these diverse users. Then, through a case study in Health Technology Assessment (HTA) analysis, we show how our design provides solutions while adhering to good programming and documentation practices. The design consists of two layers, first, a verb layer, embracing functional programming and built with pipeable functions to provide flexibility for exploratory analysis. Second, a reporting layer, wrapped around the verb layer, with the ability to generate agreed upon standard analysis with one call to a function, making it easy to repeat analysis for different populations, interventions, comparators, and outcomes (PICOs). Throughout the case study, we illustrate how we leveraged R package development tools, to organize, document, and test R code to ensure quality and maintainability.

Speakers

Jean Muller

Senior Scientist, MSD Switzerland

Jean Muller is a Senior Scientist in Statistical Programming at MSD, specializing in data analysis and Health Technology Assessment (HTA) in the pharmaceutical industry. With over 5 years of experience, Jean has a strong background in Biostatistics (MSc) and applied mathematics (BSc... Read More →

Ligia Adamska

Associate Director Statistical Programming, MSD Switzerland

Ligia Adamska is an Associate Director in HTA Statistical Programming at MSD. With a strong academic background, Ligia holds a Ph.D. in Engineering Surveying and Space Geodesy from the University of Nottingham (U.K.) and a B.Sc. in Mathematics from the University of East Anglia (U.K... Read More →

Tuesday July 9, 2024 14:30 - 14:50 CEST
Attersee

R workflow + deployment + production

14:30 CEST

Real-Time Anomaly Detection on Voting Sundays Using R and Shiny - Simon Graf, Statistisches Amt Kanton Zürich

Swiss voters head to the polls around four times a year. One of the responsibilities of the cantons is to collect the results from the municipalities, process them, and transmit them to the federal level. The Canton of Zurich used to check the data manually and gave feedback in case of suspicious-looking results. However, this non-systematic approach at the level of individual data points was error-prone and labour-intensive, and it often lacked the context of historical trends and issue-specific dynamics. To remedy these shortcomings, we developed an R- and Shiny-based application called “PlausiApp”, which conducts outlier detection in real time on a variety of results, such as shares of voting channels used, intracommunal differences in participation levels, or the deviation of the results from our prediction. It has been in use since 2020. The application works as follows: prior to the voting Sunday, we load historical data and store it in the project repository. On the voting Sunday, the application transforms the live data and runs the checks. Finally, the results are presented in a Shiny dashboard.
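
The kind of check run on incoming results can be sketched as follows. The abstract does not specify which outlier detection methods PlausiApp uses; the median-based modified z-score below is an assumed, generic technique, written in Python, and the municipality names, turnout shares, and the 3.5 threshold are hypothetical.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Return the keys whose value has a modified z-score above the threshold."""
    med = median(values.values())
    # Median absolute deviation: a spread measure that is robust to the outliers themselves.
    mad = median([abs(v - med) for v in values.values()])
    if mad == 0:
        return []  # no spread, so nothing can be scored
    flagged = []
    for key, value in values.items():
        modified_z = 0.6745 * abs(value - med) / mad
        if modified_z > threshold:
            flagged.append(key)
    return flagged


# Hypothetical turnout shares per municipality (illustration only).
turnout = {"A": 0.45, "B": 0.47, "C": 0.44, "D": 0.46, "E": 0.48, "F": 0.71}

suspicious = flag_outliers(turnout)
print(suspicious)
```

A flagged municipality is not necessarily wrong; in a workflow like the one described, it would be a prompt for a human to review the result in the context of historical data.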

Speakers

Simon Graf

Political Data Scientist, Statistisches Amt Kanton Zürich

Political Data Scientist

Tuesday July 9, 2024 14:30 - 14:50 CEST
Salzburg II

Shiny + dashboards + web apps

14:30 CEST

Diagnostic Modeling for Educational and Psychological Assessment - Jake Thompson, Accessible Teaching, Learning, & Assessment Systems (ATLAS); University of Kansas

Diagnostic classification models are psychometric models that estimate the presence or absence of discrete fine-grained attributes. Due to the categorical nature of the latent variables, assessments using diagnostic models can provide highly reliable results with fewer items, reducing the burden on respondents. In addition, the fine-grained nature of the constructs facilitates the reporting of results that are more informative and actionable than a single overall score. The attributes can represent, for example, student proficiency on educational skills, or the presence of psychological traits or disorders, making these models useful in a variety of contexts. In this session, we will discuss the general properties of diagnostic models and describe how to analyze psychological and educational assessment data with diagnostic models using the R package measr. We will show how measr, which interfaces with the popular Stan language, can be used to easily estimate diagnostic models and evaluate model performance (e.g., model fit, reliability). Finally, we’ll discuss how to draw inferences from the results to answer substantive research questions.

Speakers

Jake Thompson

Assistant Director of Psychometrics, Accessible Teaching, Learning, & Assessment Systems (ATLAS); University of Kansas

W. Jake Thompson is the Assistant Director of Psychometrics for Accessible Teaching, Learning, and Assessment Systems at the University of Kansas and the lead psychometrician for the Dynamic Learning Maps Alternate Assessment and Pathways for Instructionally Embedded Assessment. His... Read More →

Tuesday July 9, 2024 14:30 - 14:50 CEST
Pinzgau + Tennegau

Social sciences

14:30 CEST

Interfacing QGIS Spatial Processing Algorithms from R - Floris Vanderhaeghe, Research Institute for Nature and Forest (INBO) (Brussels, Belgium)

R is a powerful language for processing, analyzing and visualizing spatial data, with packages such as sf, terra, and stars. However, dedicated geographic information system (GIS) software tools offer thousands of specific algorithms that are either not available in R, or may be faster than equivalent R functions. This presentation describes how it is now possible to combine the strengths of R and QGIS, the most popular open source GIS platform, through R packages that interface QGIS processing algorithms: qgisprocess and qgis. These packages allow users to create data processing pipelines that combine R and QGIS algorithms seamlessly. We discuss the current state of these R packages and demonstrate the usage of their most important functions by example. We show the usage of qgis_search_algorithms(), qgis_run_algorithm(), qgis_extract_output(), coercion methods and more. We highlight recent updates in QGIS that improve functionality in R. Finally, we seek feedback from the community and invite contributions.

Speakers

Floris

Dr. Floris Vanderhaeghe, open science methodologist at INBO, Research Institute for Nature and Forest (INBO) (Brussels, Belgium)

Floris Vanderhaeghe is a biologist specialized in scientific methodology, with a focus on spatial survey design. Together with his teammates, he promotes the implementation of open science practices at INBO. He has a special interest in geospatial computation in R and likes to collaborate...

Tuesday July 9, 2024 14:30 - 14:50 CEST
Salzburg I

Spatial data and maps

14:30 CEST

vital: Tidy Data Analysis for Demography - Rob Hyndman, Monash University

I will introduce the vital package, which allows analysis of demographic data using tidy tools. The package uses a variation of tsibble objects as the main data class, so all of the infrastructure available for tsibble and tibble objects can also be used with vital objects. Data may include births, deaths, mortality, fertility, population and migration data. Functions for plotting, smoothing, modelling and forecasting data are included. Models include the classical Lee-Carter model as well as functional data models. Future plans include replicating all of the models available in the demography and StMoMo packages. The package is currently available at https://pkg.robjhyndman.com/vital/. It will be on CRAN before the useR! 2024 conference.

Speakers

Rob Hyndman

Professor, Monash University

Rob J Hyndman is well-known for his many R packages including forecast, demography and fable. He is a Fellow of both the Australian Academy of Science and the Academy of Social Sciences in Australia, and the author of over 200 research papers and 5 books. He has won many awards, including...

Tuesday July 9, 2024 14:30 - 14:50 CEST
Wolfgangsee

Statistical modelling

14:50 CEST

fPortfolio 4.0 or the Rmetrics Reloaded - Stefan Theußl, Raiffeisen Bank International AG

Several contributed packages on CRAN as well as other package repositories offer tools to carry out empirical and quantitative financial research with R [1]. A very well-recognized collection of such packages is the Rmetrics suite of packages offering functionality for many different aspects of empirical and computational finance. In this talk we focus on the fPortfolio package, in particular on portfolio handling and on solving a wide variety of portfolio optimization problems. We present a prototype of the next major version of the package which will be using ROI [2,3], an extensible framework for modeling and solving linear as well as nonlinear (possibly mixed-integer) optimization problems as a backend. Furthermore, we indicate how this package could be employed in practice to manage model portfolios along various regions, base currencies and/or risk constraints. [1] Dirk Eddelbuettel. CRAN task view: Empirical finance, 2024. https://CRAN.R-project.org/view=Finance [2] Stefan Theußl, Florian Schwendinger, and Kurt Hornik. ROI: An Extensible R Optimization Infrastructure. Journal of Statistical Software, 94(15):1-64, 2020. doi: 10.18637/jss.v094.i15.

Speakers

Stefan Theußl

Head of Quant Research, Data and Processes, Raiffeisen Bank International AG

Stefan is heading the Quant Research, Data and Processes department of Raiffeisen Research at Raiffeisen Bank International AG. He has a proven track record in designing, implementing and maintaining high quality analytics products as well as quantitative models, and holds a doctoral...

Tuesday July 9, 2024 14:50 - 15:10 CEST
Pongau + Flachgau

Economics + finance + insurance + business

14:50 CEST

Building Interoperability in Existing Software Ecosystems with S3 Classes - Hugo Gruson, data.org

It is common for R packages answering the same need to have different input and output formats. This may result in a large amount of time spent reformatting the inputs and outputs whenever a specific part of the data pipeline is swapped out to use a different R package. This time can come at a huge cost whenever results are needed quickly, such as in pandemic response. Using S3 classes providing standard formats that all downstream packages use may be a good solution to this issue, thus improving the interoperability within the global R package ecosystem. However, this approach comes with technical and social challenges. Here, I present the work we are doing to implement and encourage the adoption of standard S3 classes in epidemiology. I highlight key findings and challenges, such as how to preserve backward compatibility in existing packages, and give recommendations for similar future endeavors.

Tuesday July 9, 2024 14:50 - 15:10 CEST
Attersee

R workflow + deployment + production

14:50 CEST

No-Code Data Analysis and Visualization Dashboards with blockr - David Granjon, cynkra GmbH & John Coene, The Y Company

Despite widespread adoption of tools such as Shiny and dplyr, creating dashboards in R remains challenging for non-coding developers. Solutions like PowerBI or Tableau, while popular, are proprietary and expensive, offer limited reproducibility, and have restricted integration with the R ecosystem. To address this gap, we introduce blockr, an open-source dashboard builder (https://github.com/blockr-org). blockr simplifies the construction of parameterized data pipelines as a web application, enabling collaboration and easy sharing of results through self-contained code generation. We emphasize modularity by decomposing pipelines into 'blocks'. These 'blocks' can be assembled into 'stacks', which can be connected so that the output of one stack serves as the input for others. This architecture results in a reactive dashboard that allows upstream changes to cascade through the analysis, providing instant feedback to the user. General-purpose blocks provided by blockr can be combined with user-created blocks to expand functionality to use-case specific needs. This approach makes it easy to leverage the vast and mature ecosystem of R packages.

Speakers

David Granjon

Lead Shiny, cynkra GmbH

David Granjon has been Lead Shiny at cynkra since September 2023. He holds a Ph.D. in applied mathematics from Université Pierre et Marie Curie and Université de Lausanne. He is the founder of the open source RinteRface organization, where he develops Shiny extensions, writes books and...

John Coene

Co-Founder, The Y Company

Tuesday July 9, 2024 14:50 - 15:10 CEST
Salzburg II

Shiny + dashboards + web apps

14:50 CEST

xmap: Unified Tools for Ex-Post Data Harmonisation - Cynthia A Huang, Monash University

Social science research often involves harmonising data from multiple sources. For example, analysts often must resolve differences between country-specific occupation classification standards to compare labour statistics from multiple countries. Harmonised datasets involve both domain expertise and technical data-wrangling skills. Unfortunately, details of the harmonisation logic are often lost in the idiosyncrasies of bespoke data preparation scripts and ad-hoc documentation, making it difficult for others to validate or reuse harmonisation efforts. The {xmap} package addresses these challenges with a new framework and tools for data harmonisation using 'crossmap' tables. The crossmap framework unifies and simplifies the specification, implementation, validation, and documentation of recoding, aggregating and splitting operations. Crossmaps extend existing crosswalk/look-up table approaches to support one-to-many and many-to-many relationships between alternative classification standards, in addition to one-to-one and many-to-one recoding. The package also provides built-in safeguards to avoid data leakage and graph-based methods for standardised documentation.

Speakers

Cynthia A Huang

PhD Candidate, Monash University

Cynthia Huang is a PhD Candidate in the Department of Econometrics and Business Statistics at Monash University. She completed her undergraduate and honours degrees in Economics at the University of Melbourne. Her research focuses on principles and methods for using complex and alternative...

Tuesday July 9, 2024 14:50 - 15:10 CEST
Pinzgau + Tennegau

Social sciences

14:50 CEST

sfislands: An R Package for Accommodating Islands and Disjoint Zones in Areal Spatial Modelling - Kevin Horan, Maynooth University

Fitting areal spatial models can be a cumbersome task, particularly when the geographical units are not well-behaved. The presence of islands, for example, gives rise to particular issues when creating neighbourhood structures based on contiguity. Further complications can arise from the presence of other natural barriers such as rivers and mountains, or man-made connectivities such as bridges, tunnels and ferry crossings. In order to create what a researcher considers to be an appropriate neighbourhood structure, incorporating all of the domain knowledge that they might have about the system, it should be simple and intuitive to add and remove connections between spatial units. Using examples ranging from Indonesian earthquakes to London's River Thames, this session demonstrates a package which streamlines the human workflow involved in both the setting up of neighbourhood structures for spatial models, and the extraction of predictions from subsequent models. The package has a heavy emphasis on visualisation of both neighbourhood structures and model predictions and this will be reflected in the examples.

Speakers

Kevin Horan

PhD researcher, Maynooth University

Kevin Horan is a third-year PhD researcher in the Science Foundation Ireland Centre for Research Training in Foundations of Data Science at Maynooth University.

Tuesday July 9, 2024 14:50 - 15:10 CEST
Salzburg I

Spatial data and maps

15:10 CEST

Data Science in Economics: Elevating Mandatory Undergraduate Education - Alexander Rieber, Ulm University

In my talk, I explore teaching data science and causal inference in a mandatory undergraduate economics course using R, with no prior student knowledge in programming, causal inference, or project management. To teach these concepts, I opted for a flipped classroom approach and group projects: Initially, students learn theoretical concepts (descriptive statistics, data visualization, and causal inference) through short videos, foster these concepts through individual RTutor Problem Sets, and apply them in a case study, which we discuss in class, using RMarkdown to make the results reproducible. Subsequently, they engage in groups of three students to solve three economic projects in RMarkdown, generating HTML reports. These projects require them to gather data (e.g. via API or web scraping), perform statistical analyses, and infer causal relationships, collaborating on private GitHub repositories. Peer reviews follow each project. Tools used include the R-packages RTutor for Problem Set design, ghclass for GitHub management, and GitHub Actions for report verification.

Tuesday July 9, 2024 15:10 - 15:30 CEST
Pongau + Flachgau

Economics + finance + insurance + business

15:10 CEST

Deep Dive Into Industry R Package Quality Assessment - Szymon Maksymiuk & Lorenzo Braschi, Roche

Over the past year, Roche/Genentech has been developing tools and infrastructure that facilitate our quality and validation exercises, a necessary part of using R in the regulated pharmaceutical industry. We’ve designed a process around broadly recognized package development best practices, where automated checks diligently assess packages. The process uses several tools that we have open-sourced over the past years. We want to share our approach, with particular emphasis on the core design philosophies we’ve set out to adhere to when building a cohort of packages applicable for usage when a certain level of trust is required. Beyond the process we have developed, we would like to present in detail the components we contributed to the R ecosystem. These are rd2markdown, a package designed to convert R package documentation into standard markdown files, and covtracer, which leverages our contributions to the well-known covr package to map any unit tests to the particular functions they evaluate. We believe various applications for these packages render them useful outside the strict quality assessment area and can play a significant role in day-to-day work with the R packages.

Speakers

Lorenzo Braschi

Szymon Maksymiuk

Senior R Developer, Roche

I’m a senior R Developer at Roche specializing in R package validation, ensuring robustness and reliability in pharmaceutical data analysis. With a background as a research software engineer at Warsaw University of Technology specializing in Machine Learning, I have experience as...

Tuesday July 9, 2024 15:10 - 15:30 CEST
Attersee

R workflow + deployment + production

15:10 CEST

MaNTrA Labor Room Analytics Platform: Harnessing the Power of R for Public Health - Girdhari Bora & Ajil Joseph, Tattva Foundation

The MaNTrA dashboard is a Shiny-based web application designed for visualizing and reporting Maternal and Child Health Indicators. The application captures data from labour room registries in Public Health Facilities in the state of Uttar Pradesh, India. The application currently supports the state health department in tracking and visualizing data for over 4.5 million deliveries from 5800 facilities and is growing at a rate of approximately 6000 deliveries per day. Moreover, the application is being scaled further to cover all 7000 delivery points in the state with almost 40,000 users at different levels of decision making. The dashboard facilitates user role-based access with 11 pages providing GIS visualizations, advanced drill-down functionalities and custom reports as per the user requirements. The application tracks indicators related to deliveries, births, birth and death registrations, patient feedback and medical practitioner performance. Considering the scale and volume of the dashboard, the application is designed for optimal memory consumption, concurrent user handling and faster processing, thus providing the best experience for the end users.

Speakers

Ajil Joseph

Research Scientist (Data Analytics), Tattva Foundation

A data analyst and an information systems researcher specializing in shiny web app development, data visualization and system design. Has expertise in developing health analytics platforms, statistical modelling and machine learning model development in R. Currently, working as the...

Girdhari Bora

Chief Innovation Officer, Tattva Foundation

Girdhari Bora is an ICT4D professional with nearly 20 years of experience in e-innovation, digital health, and action research. He has led many projects in scalable e-innovations across sectors such as public health, microfinance, agriculture, and education. He is the co-founder of...

Tuesday July 9, 2024 15:10 - 15:30 CEST
Salzburg II

Shiny + dashboards + web apps

15:10 CEST

dropR: Analyze and Visualize Dropout in Research - Annika Tave Overlander, University of Konstanz

In this talk we present dropR, a tool to analyze and visualize dropout, especially from internet-based research. Among other features, dropR turns input from datasets into visual displays of (1) dropout curves, (2) percent remaining, and (3) dropout statistics between different conditions. It calculates parameters relevant to dropout and survival analysis, such as Chi Square values for points of difference, initial drop, confidence bands, and percent remaining in stable states. With automated inferential components, it identifies critical points in dropout and critical differences between dropout curves for various experimental conditions and generates corresponding statistical analysis. Survival tests include Chi Square, Kaplan-Meier Estimation and Rho family tests. The visual displays in the associated Shiny app are interactive so users can easily identify regions within a display for further analysis in demo data as well as custom data provided by the user. It produces accessible (e.g. color-blind friendly) output (e.g. pdf, png) that is publication ready. dropR is made by researchers for researchers and is currently available at https://github.com/mbannert/dropR.

Speakers

Annika Tave Overlander

M.Sc., University of Konstanz

Annika Tave Overlander began her Ph.D. in Psychological Methods in December 2023. Her research focuses on the development of online tools to assist both researchers and students in acquiring the necessary skills for proper statistical analysis. She is committed to Open Science practices...

Tuesday July 9, 2024 15:10 - 15:30 CEST
Pinzgau + Tennegau

Social sciences

15:10 CEST

Wavelet Secure Maps: Enhancing Privacy Protected Maps - Edwin de Jonge, Statistics Netherlands

We present a novel privacy protection method for spatial density maps based on wavelet multi-resolution analysis (MRA). sdcSpatial is an R package designed to create spatial density maps while protecting the privacy of the observations involved. It contains several protection methods, which work well but may create a suboptimal density map: the spatial resolutions of urban and rural areas are often very different. Wavelet Secure Maps are a novel method that uses multi-resolution analysis to derive a spatial density map that adapts to the local spatial resolution. The presentation will introduce the method and its application using the upcoming update for sdcSpatial.

Speakers

Edwin de Jonge

Statistics Netherlands

Edwin de Jonge is a research and statistical consultant working at Statistics Netherlands for more than 25 years. He has a background in theoretical and computational physics. He has a long experience in methodological research, including data cleaning, visualization and network analysis...

Tuesday July 9, 2024 15:10 - 15:30 CEST
Salzburg I

Spatial data and maps

15:10 CEST

Yes, You Can Simulate! Reproducible, Tidy Simulation Workflows with the Reimagined simpr Package - Ethan Brown, Fulbright University Vietnam

The simpr package was designed from the ground up to make simulation -- an invaluable tool for understanding statistical models both for students and professionals -- easier to use in R. Recent updates to simpr make simulation easier than ever, allowing a full workflow to be concisely specified in a single tidy pipeline inspired by the infer package without the need for creating external functions, global values, or using loops. This pipeline includes specifying data-generating processes, defining and varying design parameters, generating many simulated datasets, fitting models, and consolidating model results. New features include reproducibility of individual simulation datasets/results (without needing to run the entire pipeline again), parallel processing support, flexible bulk data-munging options across multiple simulated datasets, advanced error-handling options, and more. The presentation will compare simpr with other approaches for simulation in R and show applications of simpr for assessing study designs (e.g. power analysis), performing simulation studies, and teaching statistics.

Speakers

Ethan Brown

Joint Faculty in Social Studies and Psychology, Fulbright University Vietnam

Ethan C. Brown has a joint appointment in Social Studies and Psychology at Fulbright University Vietnam. An enthusiastic member of the R and open science communities, his goal is to build on cognitive science and educational research to make statistics, and critical awareness of statistics...

Tuesday July 9, 2024 15:10 - 15:30 CEST
Wolfgangsee

Statistical modelling

15:30 CEST

Enabling Analytics for Learning Applications with learnrextra - Markus Konrad, HTW Berlin & Andre Beinrucker, University of Applied Sciences Berlin (HTW Berlin)

We present a way to create web-based learning applications and track the user interactions to facilitate educational research and improve the learners' experience. The main ingredient is a new R package, learnrextra, which extends the well-known learnr package with interaction tracking, layout improvements and a summary panel. It allows authors to create learning applications in the familiar RMarkdown format or as Shiny applications. User interactions, e.g. mouse movements, clicks and exercise submissions, can then be tracked in a highly granular, configurable and anonymous way. Tracking data for all learning applications is collected via a REST interface. An administration interface allows users to manage learning applications, set up experiments (A/B testing), monitor tracking and download collected data. We will present our software and demonstrate how to create applications and experiments for learning analytics in R. We will share results from our experiments with students in class and discuss challenges we faced. All components are open-source: https://github.com/orgs/IFAFMultiLA/repositories. Extensive documentation is available at https://ifafmultila.github.io.

Tuesday July 9, 2024 15:30 - 15:50 CEST
Pongau + Flachgau

Data science education

15:30 CEST

Deploying RShiny-Based Applications in Financial Institutions - Goran Lovric, Raiffeisenlandesbank Oberösterreich AG

This session focuses on a detailed approach to deploy RShiny-based applications in risk management of financial institutions on in-house servers, also providing examples and a case study of those applications such as internal rating models, GRC tools, incident management tools as well as risk assessment tools. The focus is clearly on utilizing in-house servers and architecture to deploy RShiny-based applications quantifying, presenting and reporting risk within the environment of a European Central Bank (ECB) regulated bank.

Speakers

Goran Lovric

Raiffeisenlandesbank Oberösterreich AG

Goran Lovric has over 17 years of professional experience in reputable, national and international financial companies, carrying senior management and senior leadership responsibilities in (financial and non-financial) risk management. Mr. Lovric holds degrees in Law and Quantitative...

Tuesday July 9, 2024 15:30 - 15:50 CEST
Salzburg II

Shiny + dashboards + web apps

15:30 CEST

Handling Data from Social Science Surveys with the 'memisc' Package - Martin Elff, Zeppelin Universität, Friedrichshafen

While R provides an excellent infrastructure for advanced statistical data analysis and graphics, it is by itself not well-suited to help users from the social sciences face the typical challenges involved in the preparation of data from social science surveys. This is one reason why many social scientists stick to commercial software packages such as Stata and SPSS. The open-source package 'memisc' provides a comprehensive infrastructure for the preparation of social science survey data. It allows dealing with variable labels, value labels, and user-defined missing values. It provides easy ways to recode data and to produce data codebooks. It thus allows social scientists to become independent of commercial software packages.

Speakers

Martin Elff

Prof. Dr., Zeppelin Universität, Friedrichshafen

Martin Elff is a professor of political sociology at Zeppelin University (Friedrichshafen, Germany). He is the author of "Data Management with R: A Guide for Social Scientists" (Sage Publications) and of three R packages published on CRAN. He has published research articles on electoral...

Tuesday July 9, 2024 15:30 - 15:50 CEST
Pinzgau + Tennegau

Social sciences

15:30 CEST

Boost Spatial Data Science Workflows with GRASS GIS and R - Veronica Andreo, Center for Geospatial Analytics, North Carolina State University

GRASS GIS is a powerful geoprocessing engine that offers a robust and mature toolset for diverse applications. The core distribution brings together more than 500 tools for spatial and temporal analysis of vector, raster, 3D raster and imagery data. GRASS was developed for speed and efficiency, which allows it to scale workflows with massive datasets rather simply. At the same time, R excels at statistical analysis, modeling and data visualization. The spatial community within R has indeed grown significantly in the last decade, with the rise of packages like sf, stars, gdalcubes, terra, mapview and tmap, among many others. The beauty of open source software is that we do not need to reinvent the wheel each time. Instead, we can join forces to build bridges that connect our individual strengths. In this talk, I’ll stand on the shoulders of giants to demonstrate how the combination of GRASS GIS and R through the rgrass package can help us integrate and streamline our spatial data engineering and data science workflows for scientific and operational applications.

Speakers

Veronica Andreo

Dr., Center for Geospatial Analytics, North Carolina State University

Veronica Andreo holds a PhD in Biology and an MSc in Remote Sensing and GIS Applications. She is part of the GRASS Dev Team, and has served as PSC chair since 2021. She is currently working at the Center for Geospatial Analytics at North Carolina State University (USA) within an NSF...

Tuesday July 9, 2024 15:30 - 15:50 CEST
Salzburg I

Spatial data and maps

15:30 CEST

Flexible Multidimensional Scaling with the R Packages smacofx, cops and stops - Thomas Rusch, WU Vienna University of Economics and Business & Patrick Mair, Harvard University

Multidimensional scaling (MDS) refers to methods that fit distances in a reduced space so that they optimally approximate given proximities between objects. Flexibility in modelling with MDS can be introduced by allowing for various transformations of the input proximities and/or the fitted distances, or by penalization. We will present three new R packages for flexible multidimensional scaling (fMDS) that allow fitting metric and nonmetric versions of fMDS, including Power Stress MDS, Sammon Mapping, Elastic Scaling, Multiscale MDS, Box-Cox MDS, Local MDS and Cluster Optimized Proximity Scaling. Optimal structure-based hyperparameter selection of transformation parameters within the Structure Optimized Proximity Scaling framework can also be carried out. The packages offer a broad array of post-fit infrastructure for plotting MDS results, exploration of local minima, and uncertainty estimation. In the latter they follow the smacof design philosophy and are fully compatible with the smacof package, all packages together comprising the "smacofverse".

Speakers

Patrick Mair

Dr., Harvard University

Senior Lecturer in Statistics

Thomas Rusch

Dr., WU Vienna University of Economics and Business

Thomas Rusch is Assistant Professor at the Competence Center for Empirical Research Methods at WU Vienna University of Economics and Business. His research interests include exploratory data analysis, data mining, unsupervised statistical learning, psychometrics and computational...

Tuesday July 9, 2024 15:30 - 15:50 CEST
Wolfgangsee

Statistical modelling

15:50 CEST

Break

Tuesday July 9, 2024 15:50 - 16:10 CEST
TBD

Breaks + Special Events

16:10 CEST

Keynote Sessions to be Announced

Tuesday July 9, 2024 16:10 - 16:30 CEST
Salzburg I + II

Keynote Sessions

Level Any

16:30 CEST

Keynote: Dr. Kelly Bodwin, Cal Poly

Speakers

Dr. Kelly Bodwin

Associate Professor of Statistics and Data Science, Cal Poly

Kelly Bodwin is an Associate Professor of Statistics and Data Science at Cal Poly in San Luis Obispo, CA. She primarily teaches courses in statistical computing, data science, and predictive modeling. All her courses involve elements of programming – typically in R, with a focus...

Tuesday July 9, 2024 16:30 - 17:30 CEST
Salzburg I + II

Keynote Sessions

Level Any

08:00 CEST

Registration

Wednesday July 10, 2024 08:00 - 17:40 CEST
Salzburg Foyer

Registration

09:00 CEST

Keynote Sessions to be Announced

Wednesday July 10, 2024 09:00 - 09:20 CEST
Salzburg I + II

Keynote Sessions

Level Any

09:20 CEST

Keynote: Maëlle Salmon, rOpenSci & cynkra

Speakers

Maëlle Salmon

R(esearch) Software Engineer & Blogger, rOpenSci, cynkra

Maëlle Salmon, with a PhD in statistics, is a Research Software Engineer and blogger. At rOpenSci, she maintains the guide “rOpenSci Packages: Development, Maintenance, and Peer Review,” and has developed the babeldown and babelquarto packages for multilingual documents. At cynkra...

Wednesday July 10, 2024 09:20 - 10:20 CEST
Salzburg I + II

Keynote Sessions

Level Any

10:20 CEST

Keynote Sessions to be Announced

Wednesday July 10, 2024 10:20 - 11:00 CEST
Salzburg I + II

Keynote Sessions

Level Any

10:30 CEST

Sponsor Showcase

Wednesday July 10, 2024 10:30 - 17:30 CEST
Salzburg Foyer

Sponsor Showcase

11:00 CEST

Break

Wednesday July 10, 2024 11:00 - 11:30 CEST
TBD

Breaks + Special Events

11:30 CEST

{admiral} – the {dplyr} of the Pharmaceutical Industry? - Stefan Pascal Thoma, Roche & Edoardo Mancini, Roche Products Ltd

{admiral} is a package developed across the pharmaceutical industry to derive datasets that comply with industry-specific data standards. In this presentation, we'd like to give a brief exposition of the {admiral} package. The talk commences by introducing our problem statement, how we solve it in {admiral} by compartmentalizing domain-specific functionalities, and how the package and its family expanded into a wide cross-industry collaboration. We conclude by showcasing a case study where pandemic-driven interests led to an industry effort to create a vaccine-specific {admiral} toolset.

Speakers

Edoardo Mancini

Data Scientist, Roche

Edoardo is a Data Scientist at Roche with 3+ years of experience in pharmaceuticals. He specializes in statistical programming, leading studies in ophthalmology and immunology. Edoardo promotes R for clinical reporting and holds degrees in Mathematics and Applied Mathematics, and...

Stefan Pascal Thoma

Data Scientist, Roche

Stefan Thoma is a statistical programmer, statistician, and core {admiral} developer at Roche, joining in November 2022. He has a Master's degree in Statistics from ETH Zurich and a Master's degree in Psychology from the University of Bern.

Wednesday July 10, 2024 11:30 - 11:50 CEST
Attersee

Data handling and management

11:30 CEST

Explanation Groves - Gero Szepannek, Stralsund University of Applied Science

The increasing popularity of machine learning in many application fields has increased the demand for methods of explainable machine learning, as provided, e.g., by the packages DALEX (Biecek, 2018) and iml (Molnar, 2018). In turn, comparatively little research has been dedicated to the limits of explaining complex machine learning models (Rudin, 2019, Szepannek and Lübke, 2022). Explanation groves (Szepannek and v. Holt, 2024) are presented as a tool to extract a set of understandable rules for explanation of arbitrary machine learning models. The degree of complexity of the resulting explanation can be defined by the user. This allows analysis of the trade-off between the complexity of a given explanation and how well it represents the original model. The corresponding R package xgrove (Szepannek, 2023) is demonstrated. Biecek P (2018). https://jmlr.org/papers/v19/18-416.html Molnar C, Bischl B, Casalicchio G (2018). doi:10.21105/joss.00786 Rudin, C (2019). doi:10.1038/s42256-019-0048-x Szepannek G (2023). https://CRAN.R-project.org/package=xgrove Szepannek, G, v. Holt, B (2024). doi:10.1007/s41237-023-00205-2 Szepannek, G, Lübke, K (2022). doi:10.1007/s13218-022-00764-8

Wednesday July 10, 2024 11:30 - 11:50 CEST
Wolfgangsee

Machine learning and AI

11:30 CEST

Forecast Reconciliation Made Easy: The FoReco Package - Daniele Girolimetto, Department of Statistical Sciences, University of Padova

Forecast reconciliation is a post-forecasting approach to ensure the coherence of forecasts across constraints (not just simple aggregation). It harmonizes individual predictions to meet predefined relationships, leading to a consistent and comprehensive picture. This can include ensuring market share forecasts for different brands sum up to the total, or guaranteeing some property (e.g. non-negativity). By incorporating these constraints, reconciliation can also improve forecast accuracy by leveraging the strengths of the individual forecasts. This technique finds applications in several fields such as finance, supply chain, macroeconomics, load, renewable energy generation, and weather forecasting. The R package FoReco provides a powerful toolset for implementing classical and regression-based forecast reconciliation. It offers a wide range of approaches to address different types of constraints, including cross-sectional (e.g., market share), temporal (e.g., annual-monthly data), and cross-temporal relationships. This talk presents an overview of the forecast reconciliation process and provides examples using FoReco in real-world applications. https://github.com/danigiro/FoReco

Speakers

Daniele Girolimetto

Postdoctoral researcher in Statistics, Department of Statistical Sciences, University of Padova

Daniele Girolimetto is a postdoctoral researcher in the Department of Statistical Sciences at the University of Padova. His research interests are related to time series including statistical methods (univariate/multivariate forecasting approaches, bootstrap methods, applications...

Wednesday July 10, 2024 11:30 - 11:50 CEST
Salzburg I

Predictive modelling and forecasting

11:30 CEST

Maintaining the I/O Infrastructure of R: Ten Years of `Rio` and `ReadODS` - Chung-hong Chan, GESIS – Leibniz-Institut für Sozialwissenschaften

In this talk, I will share my experience of maintaining the "boring", but arguably important, part of R: the Input and Output (I/O) infrastructure. The focus will be on two packages that I currently maintain and that recently reached their tenth anniversaries: `rio` and `readODS`. I will briefly describe what the (chaotic) I/O infrastructure of R looked like ten years ago. Then, I will explain how the package `rio` simplifies I/O tasks with only two functions: import() and export(). I will also cover the package `readODS`, which is designed as a silent family member of `rio` for reading and writing OpenDocument Spreadsheets (ODS), a truly open format that has been adopted by various government agencies such as NATO and the EU. Then, I will discuss what has changed in `rio` and `readODS` over the last ten years. For example, `readODS` has achieved a performance gain of over 1000x and is a significantly faster and more usable ODS reading and writing option than the offerings for Python, Julia, and JavaScript. Finally, I will give an outlook on what the future of the I/O infrastructure of R might look like.

Speakers

Chung-hong Chan

Senior researcher, GESIS – Leibniz-Institut für Sozialwissenschaften

Dr. Chung-hong Chan (PhD University of Hong Kong, 2018) is Senior Researcher in the Department of Computational Social Science, GESIS – Leibniz Institute for the Social Sciences, Cologne, Germany, and External Fellow at the Mannheim Center for European Social Research, University...

Wednesday July 10, 2024 11:30 - 11:50 CEST
Pinzgau + Tennegau

Research software engineering

11:50 CEST

The R Contribution Working Group - Heather Turner, University of Warwick

The R Contribution Working Group (RCWG) was established in July 2020, in response to concerns raised at useR! 2020 about the sustainability of the community of contributors to base R, as well as the demographic diversity of this community. The working group is a collaboration between R core developers, groups focused on diversity, equity and inclusion, as well as R enthusiasts from the wider community. This talk will delve into the key initiatives led by the RCWG, explore their impact, and outline future plans. Initially the RCWG focused on communication: engaging in social media, working on novice-friendly documentation and organizing events focused on people completely new to contribution, making particular effort to reach out to people from underrepresented groups. More recently, the focus has shifted to upskilling potential contributors and initiatives to directly support people in making contributions. Examples include R Contributor Office Hours and the R Dev Day being held as a satellite to useR! 2024.

Speakers

Heather Turner

Dr, University of Warwick

Wednesday July 10, 2024 11:50 - 12:10 CEST
Salzburg II

Community and outreach

11:50 CEST

Tackling Formatted Tabular Data from Excel - Jeremy Selva, National Heart Centre Singapore

Reading tabular data with formatted cells in Microsoft Excel can be really tricky. Unexpected things may happen if I read it blindly in R using readxl::read_excel. I have tried to use the col_types argument, but it was not enough for me. Unfortunately, there are limited resources for dealing with tabular data with formatted cells in Excel. In my presentation, I will share some problematic formatted columns that I have encountered during my work with clinical data. Examples are: date columns in General (Text) and Date number formats; numeric columns with different font colours representing different units of the same measurement; and numeric columns with some numbers provided as text. More importantly, I will share how I managed to handle them in R using three R packages: collateral (https://collateral.jamesgoldie.dev/), pointblank (https://rstudio.github.io/pointblank/index.html) and tidyxl (https://nacnudus.github.io/tidyxl/index.html). For more details, I have written a blog post at https://jeremy-selva.netlify.app/blog/2024-02-15-tackling-formatted-cell-data/

Speakers

Jeremy Selva

Jeremy John Selva, National Heart Centre Singapore

Jeremy is a Research Officer at the National Heart Centre Singapore. His job involves cleaning and harmonisation of clinical data from multiple labs related to cardiology such as cardiac medication, coronary artery calcium score and stenosis severity. He is curious to find ways to...

Wednesday July 10, 2024 11:50 - 12:10 CEST
Attersee

Data handling and management

11:50 CEST

Generative Modelling of Mixed Tabular Data with the R Package ‘Arf’ - Jan Kapar, Leibniz Institute for Prevention Research and Epidemiology - BIPS

Generative machine learning has gained worldwide attention and, especially since the rise of ChatGPT and DALL-E, has started to become an integral tool both in business and everyday life. While the hype has mainly focused on text, image, audio and video synthesis so far, generative modelling of mixed tabular data with both continuous and categorical variables has great unexploited potential in many research fields and industry applications. However, recent attempts to adapt the existing, mainly deep learning-based methods to this more general setting have not yet shown the same overwhelming success. We present the CRAN package ‘arf’, an easy-to-use implementation of adversarial random forests based on ‘ranger’, which has shown the ability to match and often outperform current deep learning approaches in terms of performance, tuning effort and runtime, also on small or high-dimensional data. ‘arf’ provides tools for both synthetic data generation and density estimation. Optional conditioning on events further extends the possible area of application, enabling use cases like missing data imputation, data balancing and augmentation.

Speakers

Jan Kapar

M. Sc., Leibniz Institute for Prevention Research and Epidemiology - BIPS

Since 2022: Doctoral Student / Research Fellow in Machine Learning, Faculty for Mathematics and Computer Science, Universität Bremen, and Leibniz Institute for Prevention Research and Epidemiology - BIPS. 2011 - 2016: B.Sc. Mathematics and M.Sc. Business Mathematics, Julius-Maximilians-Universität...

Wednesday July 10, 2024 11:50 - 12:10 CEST
Wolfgangsee

Machine learning and AI

11:50 CEST

Dynamic Prediction with Numerous Longitudinal Covariates - Mirko Signorelli, Leiden University

To make informed decisions, clinicians and patients rely on accurate predictions of the probability of experiencing adverse events such as dementia, cancer or death. Dynamic prediction models can update the probability of experiencing an event as more longitudinal data is collected. However, traditional joint modelling is computationally infeasible with more than a handful of longitudinal covariates, and until recently R lacked a package that could deal with numerous longitudinal covariates. The R package pencal uses a penalized regression calibration approach that overcomes this limitation. It employs mixed-effects models to summarize the evolution of the longitudinal covariates, and a penalized Cox model to predict survival. Besides covering estimation, the package comprises functions to compute predicted survival probabilities for new subjects, and to validate model performance. For large datasets, pencal enables easy parallelization through the specification of the number of cores as an argument within its functions. Reference: Signorelli, M. (2023). pencal: an R Package for the Dynamic Prediction of Survival with Many Longitudinal Predictors. Preprint: arXiv:2309.15600

Speakers

Mirko Signorelli

Assistant professor, Leiden University

Mirko Signorelli is assistant professor of Statistics at Leiden University, where he develops new statistical models, creates R packages, and teaches courses on R, computational statistics and longitudinal data analysis. His research focuses on statistical models for longitudinal...

Wednesday July 10, 2024 11:50 - 12:10 CEST
Salzburg I

Predictive modelling and forecasting

11:50 CEST

LibraryStatistics: A Visual Analytics Tool for Library Assessment Data - Anjali Silva, University of Toronto

The University of Toronto Libraries (UTL) is committed to informed assessment, as we strive to continually improve library performance and align library services to meet university needs. The Association of Research Libraries (ARL) annually collects data on collections, expenditures, services, and enrolment from over 120 ARL member libraries, including UTL. This survey data is one of the most comprehensive data sources on academic libraries in North America and helps libraries benchmark their own performance against others and understand overall trends. The abundance of data accessible through ARL underscores the need for an intuitive tool capable of facilitating the visual comparison of this data. An R package with a Shiny application (app), libraryStatistics, was developed following best practices. The Shiny app allows users to visualize data, track trends, and compare peer institutions for up to five years in user-uploaded data, requiring no data cleaning. Overall, the tool enhances the utilization of ARL data in making evidence-based decisions within UTL and peer libraries, helping them gain insights into the ways in which library resources contribute to their communities and support research endeavours.

Speakers

Anjali Silva

Analyst and Lecturer, University of Toronto

Anjali (she/her) has a PhD in Informatics and currently works and teaches at the University of Toronto, Canada. Her research interests are in the development of statistical classification methods for data analysis with applications in trend analysis. She is also involved in multi-omics...

Wednesday July 10, 2024 11:50 - 12:10 CEST
Pongau + Flachgau

Public sector and NGO

11:50 CEST

Statistical Software Engineering: A Statistician’s Technical Journey in R - Audrey Yeo, Roche

Showcasing my one-year journey as a statistician in statistical software engineering, the learning it required, and the future outlook.

Speakers

Audrey Yeo

Statistical Software Engineer && Biostatistician, Roche

Wednesday July 10, 2024 11:50 - 12:10 CEST
Pinzgau + Tennegau

Research software engineering

12:10 CEST

I'm Only Giving This Talk Because I'm a Woman - Clarissa Barratt, Jumping Rivers

It will come as no surprise to the reader that women are underrepresented in the R community. Anyone who has organised a conference will know the feeling of seeing the abstracts come in with a particular demographic making up most of the entries. What do we do at this point? Do we positively discriminate to put more women out there? Is that patronising? It certainly leads to being told “you’re only giving the talk because you’re a woman”, or “you’re definitely going to get accepted, you’re a woman”. To have your abilities as a scientist and presenter questioned like that is awful, but is it worth it to move towards equal representation? And how do you fight the imposter syndrome that it instils? There is no perfect answer to these questions, but I wondered, has anyone tried asking the women? I have been gathering the thoughts of some women in the R Community, and in this talk I’d like to present the big picture. Some are newer to the community, and some have been around a bit longer. How do their experiences differ? What help is out there for things like imposter syndrome? Interviewees include Dr Nic Crane, Dr Nicola Rennie, Dr Heather Turner and Dr Sarah Heaps.

Speakers

Clarissa Barratt

Data Scientist, Jumping Rivers

Clarissa is the Ambassador for Data Science at Jumping Rivers. While working towards her PhD in Quantum Physics, Clarissa discovered her love of science communication. Her goal is to make data science accessible to as many people as possible.

Wednesday July 10, 2024 12:10 - 12:30 CEST
Salzburg II

Community and outreach

12:10 CEST

Spatio Temporal Kriging for Irregularly Spaced Oceanographic Data: Development and Challenges - GiSeop Lee, Korea Institute of Ocean Science & Technology

In this presentation, we discuss the development of a custom solution for gridding oceanographic data using Spatio Temporal Kriging. The high computational load necessitated the use of lower level languages, and the computation time was significantly reduced through parallel processing. Applying the code to irregularly spaced oceanographic data observed in the field was a challenging process that required a flexible solution. We successfully gridded the oceanographic observational data using the developed code, demonstrating its effectiveness. Despite the high performance of tools like gstat, they did not meet the main project requirements as they do not support 4D input data that changes over space (xyz) and time (t). Moving forward, our goal is to enable the processing of as many datasets as possible at the level of personal computers, thereby promoting accessibility in research. We aim to share our experiences and insights, hoping to inspire further advancements in the field of oceanographic data analysis.

Speakers

GiSeop Lee

Researcher, Korea Institute of Ocean Science & Technology

Name and Position: GiSeop Lee, Researcher at KIOST. Education: B.L. (Law), M.S. (Physical Oceanography), Ph.D. (Oceanographic Data Science). Professional Experience: 8+ years in oceanographic data analysis. Specialization and Interests: Ocean data analysis and data-driven modeling, data-based...

Wednesday July 10, 2024 12:10 - 12:30 CEST
Pongau + Flachgau

Environmental sciences

12:10 CEST

Mlr3torch - Deep Learning in R - Sebastian Fischer & Martin Binder, LMU Munich

mlr3torch is a high-level deep learning framework for the mlr3 ecosystem designed to easily build, train, and evaluate neural networks in a few lines of code. It leverages the torch package, which is an R interface to the LibTorch C++ library. On the one hand, the package comes with predefined and easy-to-use neural network architectures, both for classification and regression. On the other hand, it defines a language that makes it easy to define custom, fully parameterized neural networks. Because the package is integrated into the mlr3 ecosystem, these neural networks can be easily benchmarked, tuned, or combined with other machine learning workflows such as preprocessing or stacking. While mlr3’s focus is tabular data, mlr3torch extends this to other modalities such as images or text, by defining a new data type: the 'lazy_tensor'. This type can be treated similarly to standard vectors and can, e.g., be preprocessed without requiring the data to be stored in-memory. This presentation will give an overview of mlr3torch's features and demonstrate its application in both research and practical machine learning scenarios. https://github.com/mlr-org/mlr3torch

Speakers

Sebastian Fischer

MSc., LMU Munich

Sebastian Fischer has a Bachelor's degree in Philosophy & Economics from the University of Bayreuth and a Master's degree in Statistics from LMU Munich. He is currently doing a PhD at LMU Munich under the supervision of Prof. Dr. Bernd Bischl and is working on the MaRDI project (Mathematical...

Wednesday July 10, 2024 12:10 - 12:30 CEST
Wolfgangsee

Machine learning and AI

12:10 CEST

Tidymodels: Now Also for Time-to-Event Data! - Hannah Frick, Posit

The tidymodels framework is a collection of packages for modeling and machine learning using tidyverse principles. In addition to regression and classification, it now also supports censored regression for time-to-event data. This type of data with potential censoring requires dedicated models and performance metrics from the field of survival analysis. While the censored package has made survival models available for a while, the recent addition of survival metrics to the yardstick package has enabled us to support this type of analysis across the entire framework. The same ease of use and vast functionality, from resampling and feature engineering to tuning, is now available for this additional modeling problem.

Speakers

Hannah Frick

Senior Software Engineer, Posit

Wednesday July 10, 2024 12:10 - 12:30 CEST
Salzburg I

Predictive modelling and forecasting

12:10 CEST

Engineering a Reliable R Package for Regulatory Use Using "Rpact" as an Example - Friedrich Pahlke & Gernot Wassmer, RPACT

In the ever-evolving world of clinical trial design, the R package "rpact" (available on CRAN and GitHub) has emerged as a pivotal tool for confirmatory adaptive clinical trials, crafted specifically to meet the stringent demands of regulatory requirements. This presentation will dive into the core concepts and challenges encountered over the past six years since the project's inception, which began with successful crowdfunding. Our solution was a robust validation framework inspired by GAMP 5 principles, incorporating comprehensive validation documentation, tools, and utility packages from the outset. This approach enabled high automation levels in the validation process, making development feasible with a minimal team. A key concept in our methodology is the use of template-based unit tests. These templates not only generate "testthat" test cases but also enable automation of the creation of test plans and references to function specifications, with the test protocol linking back to individual test cases. This seamless integration of testing and documentation has made "rpact" a trusted and highly accepted package in the pharmaceutical industry.

Speakers

Friedrich Pahlke

CEO, RPACT

Gernot Wassmer

CEO, RPACT

Gernot Wassmer, PhD, is a statistician and co-founder of RPACT. He received his PhD in 1993 at the Institute of Statistics, University of Munich, and was a Research Fellow at the Institute for Epidemiology, GSF Neuherberg, and the Institute of Medical Statistics, University of Cologne...

Wednesday July 10, 2024 12:10 - 12:30 CEST
Pinzgau + Tennegau

Research software engineering

12:30 CEST

Lunch

Wednesday July 10, 2024 12:30 - 13:30 CEST
TBD

Breaks + Special Events

13:30 CEST

Screening and Random Projection Tools for Regression Analysis in R - Laura Vana-Guer, TU Wien

Random projection is a powerful and important tool for dimensionality reduction where a set of high-dimensional points is linearly mapped onto a lower dimension. Random projection matrices can be rapidly generated and are oblivious to the data distribution, maintain interpretability and are equipped with theoretical guarantees on preserving the geometry of the original space with a high probability. When employed in a supervised setting, they can provide a significant reduction in computational cost. However, they tend to overfit so it is desirable to first eliminate the unimportant predictors and then perform the random projection and estimate the model on the space of the reduced (i.e., projected) predictors. Moreover, to reduce the uncertainty from the random projection, ensembles can be built. In this work we propose an R package which implements a variety of random projection and screening tools for regression in high-dimensional settings. The functionality of the package is presented using simulated and real data examples.
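The linear mapping described above can be sketched in a few lines. This is only an illustration of a Gaussian random projection under assumed names (random_projection, X, Z) and made-up data. It does not reproduce the proposed package and omits the screening and ensemble steps.

```python
import math
import random


def random_projection(points, k, seed=0):
    """Map p-dimensional points to k dimensions with a Gaussian random matrix."""
    rng = random.Random(seed)
    p = len(points[0])
    # The matrix is drawn without looking at the data ("oblivious").
    # Entries are N(0, 1/k), so squared lengths are preserved in expectation.
    matrix = [[rng.gauss(0.0, 1.0 / math.sqrt(k)) for _ in range(p)]
              for _ in range(k)]
    return [[sum(row[j] * x[j] for j in range(p)) for row in matrix]
            for x in points]


# Hypothetical data: 5 points in 50 dimensions, reduced to 10 dimensions.
data_rng = random.Random(1)
X = [[data_rng.gauss(0.0, 1.0) for _ in range(50)] for _ in range(5)]
Z = random_projection(X, k=10)
```

A regression would then be fitted on the rows of Z instead of X. Averaging fits over several seeds is one way to build the ensembles mentioned in the abstract.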

Speakers

Laura Vana-Guer

PhD, TU Wien

Laura's work focuses on developing methods and statistical software for the analysis of complex data structures such as high-dimensional and multivariate data. She is the co-author of several R packages including mvord, an R package for the analysis of multivariate ordinal data, and...

Wednesday July 10, 2024 13:30 - 15:00 CEST
TBD

Big and high-dimensional data, Poster Session

13:30 CEST

Dupseqr: Disentangling Genomic Aberrations Made Easy - Ekaterina Akimova & Philine Hoven, Laboratory for Immunological and Molecular Cancer Research

Aberrant repair of DNA double-strand breaks is a prominent feature of various cancers. It can result in deletions, duplications, translocations and insertions. In our previous work, we analyzed amplicon-sequencing data with our custom pipeline to detect templated insertions at the DNA damage sites (Akimova et al. 2021, doi:10.1093/nar/gkab051). Here we present dupseqr, an R package that bundles several functions for the sequential tracing of insertions, duplications and inversions. On the one hand, dupseqr wraps the existing bash commands in a pipe function for the pre-processing of the FASTQ files and the BLAST search, followed by precise trimming and filtering of mapped sequences in order to identify insertions. On the other hand, it includes a novel function to detect and depict duplications and inversions directly from your DNA sequences, where the input and the output can be adjusted depending on your initial data structure and your final goal. All in all, dupseqr provides a quick way to elucidate aberrations, such as short duplications, inversions and insertions from distant genomic sites, using sequencing data.

Speakers

Ekaterina Akimova

Dr. rer. nat., Laboratory for Immunological and Molecular Cancer Research

Philine Hoven

MSc, Laboratory for Immunological and Molecular Cancer Research

I am a PhD student of Natural and Life Sciences, currently working on the characterization of templated sequence insertions in the cancer background. My work involves wet lab techniques as well as data analytics with R.

Wednesday July 10, 2024 13:30 - 15:00 CEST
TBD

Biostatistics + epidemiology + bioinformatics, Poster Session

13:30 CEST

Exploring the Within-Individual Variability of Human Motor Learning Using GAMLSS - Julia Wood, The University of Queensland

The neural correlates of learning are frequently explored in neuroscience research, typically through learning-induced changes in the mean of a response variable. Motor skill learning can enhance neural communication between the brain and the trained muscle. This communication is typically assessed by inducing muscle contractions in the trained pathway and measuring changes in mean size over time, with larger measurements suggesting enhanced communication. Motor learning may also improve the efficiency of this communication, possibly reflected by more consistent muscle contractions and a reduction in the within-individual variability of these measurements over time. This study explored how motor skill learning and a subsequent intervention (active vs. placebo) influenced changes in the mean size and within-individual variability of these measurements. Effects were estimated by fitting a location and scale model using the GAMLSS package in R. GAMLSS fits a distributional model, which can estimate all parameters for the specified distribution. The results and analysis pipeline from this study will be discussed, emphasising the utility of the GAMLSS model in this research.

Speakers

Julia Wood

Miss, The University of Queensland

After working as an R&D chemist for several years, I became intrigued by why we sleep and how we form new memories. This inspired me to pursue a doctoral path in human sleep and memory research. During my PhD, I have discovered deep interests in data analysis, statistical modelling...

Wednesday July 10, 2024 13:30 - 15:00 CEST
TBD

Biostatistics + epidemiology + bioinformatics, Poster Session

13:30 CEST

MINT+: Web App with R Brains for SDTM Automation - Magdalena Krochmal & Adam Forys, Roche

In the realm of clinical research, a web application known as MINT+ is revolutionizing the process of SDTM automation. At its core, MINT+ utilizes a set of R packages to power the entire solution. Its intuitive React UI empowers users to create custom SDTM mapping specifications, accommodating diverse study requirements. Leveraging DocumentDB for data storage, MINT+ enables easy metadata sharing and facilitates reuse across studies, significantly reducing workload and improving accuracy.
During this session, we will explore the R-based components that power MINT+ and are responsible for data processing and backend processes. The "rmint.sdtm" package automates SDTM mappings, "rsaffron.api" serves as the backend API, and "roak" allows customization of mappings. Users can address complex scenarios that often arise in the SDTM mapping creation process, making R packages the preferred choice for overcoming industry challenges.
With advanced algorithms, a user-friendly interface, and seamless integration, MINT+ streamlines SDTM creation workflow, greatly reducing the time and effort required.

Speakers

Magdalena Krochmal

Senior Data Scientist, Roche

Magdalena Krochmal is a Senior Data Scientist based in Basel, Switzerland. With a background in biomedical engineering and a Ph.D. in bioinformatics, she has spent three impactful years at Roche. Magdalena is an expert R developer specializing in SDTM automation. Her work centers...

Adam Forys

Mr., Roche

Wednesday July 10, 2024 13:30 - 15:00 CEST
TBD

Biostatistics + epidemiology + bioinformatics, Poster Session

13:30 CEST

Use of R in Calibration of Infectious Disease Models - Nicole Swartwood, Harvard TH Chan School of Public Health

Calibration approaches are commonly used in infectious disease modeling, but there has been little study describing the use of these techniques within the field. Furthermore, R is increasingly used by epidemiologists to understand disease dynamics. As part of a larger scoping review investigating the distribution of calibration methods for models of HIV, TB, and malaria, we will collect data on programming languages and packages/libraries cited in published manuscripts. We aim to identify the calibration strategies with which R is most commonly used and ultimately identify any gaps in, and potential for development of, the available calibration packages within R. We also aim to identify any association with disease, model goal, and/or reducibility.

Speakers

Nicole Swartwood

Senior Research Analyst, Harvard TH Chan School of Public Health

Nicole Anne Swartwood is an infectious disease modeler at the Harvard TH Chan School of Public Health. Her work focuses on tuberculosis and COVID-19 in the United States. She co-founded the Harvard R User Group and remains a co-organizer. She is passionate about empowering junior...

Wednesday July 10, 2024 13:30 - 15:00 CEST
TBD

Biostatistics + epidemiology + bioinformatics, Poster Session

13:30 CEST

The Rbanism Community: Empowering Urbanists to Use Research Software Effectively and with Confidence - Claudiu Forgaci, Delft University of Technology

The Rbanism community aims to empower urbanism researchers, students, educators and practitioners to use open-source software and related open-science practices effectively and with confidence. It raises awareness, stimulates engagement and builds capacity by demonstrating the benefits of reproducibility, automation and scalability. Rbanism was initiated in 2021 by a group of R users in the Department of Urbanism at TU Delft, and it has scaled up to an international community of 70+ members. Our mission is to cultivate scientific computing, data science, computational thinking and software management skills applied to urbanism. To that end, our activities include workshops, many of which are carried out as part of the Carpentries, challenges with prizes, and meetups. In addition to in-person activities, we organise online events open to our international community members. These various forms of engagement follow our commitment to inclusion and accessibility. The Rbanism community is supported by the Netherlands eScience Center, the Open Science Community Delft, as well as the Department of Urbanism and Central Library at TU Delft. Website: rbanism.org

Speakers

Claudiu Forgaci

Assistant Professor of Urban Design and Analytics, Delft University of Technology

I am an assistant professor of urban design and analytics at TU Delft, passionate about asking spatial and non-spatial questions with R. I co-initiated Rbanism, a community of R users that aims to empower urbanism researchers, students, educators and practitioners to use open-source...

Wednesday July 10, 2024 13:30 - 15:00 CEST
TBD

Community and outreach, Poster Session

13:30 CEST

Confusion Matrices of Any Size with Number-Based Color Intensities Visualized Easily with R! - Lubomír Štěpánek, First Faculty of Medicine, Charles University, Prague & Faculty of Informatics and Statistics, Prague University of Economics and Business

A confusion matrix is a crucial tool in evaluating predictive models and comparing predicted values against actual observations. While R offers several packages such as caret, mlearning, ConfusionTableR, and others for constructing confusion matrices, customization options for color representations are often limited, although such customization is frequently requested for papers and reports, both in publication and in business practice. Common methods like heatmap(), called on top of the table() function, can produce misleading color shades that do not accurately reflect the underlying data. Other solutions, such as those built with ggplot2 or similar packages, may require extensive coding and user-defined, hands-on work, and may be time-consuming. To address this gap, we have developed a versatile graphical function that allows users to easily customize the visualization of confusion matrices with just a single line of code. This function can be seamlessly integrated into R workflows and has the potential to be further developed into a standalone R package for broader use. The source code and examples for this functionality can be found in our GitHub repository, https://github.com/lstepanek/confusionMatrices.
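To make the notion of number-based color intensities concrete, here is a small sketch that counts actual-versus-predicted pairs and scales each count by the largest cell. It is an assumption about one reasonable mapping, not the authors' function. The names confusion_matrix and color_intensities and the labels are hypothetical.

```python
def confusion_matrix(actual, predicted, labels):
    """Count (actual, predicted) pairs; rows are actual classes."""
    index = {label: i for i, label in enumerate(labels)}
    counts = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        counts[index[a]][index[p]] += 1
    return counts


def color_intensities(counts):
    """Scale each count to [0, 1] relative to the largest cell (assumed rule)."""
    largest = max(max(row) for row in counts) or 1
    return [[c / largest for c in row] for row in counts]


# Made-up labels for a three-class problem.
actual = ["a", "a", "b", "b", "b", "c"]
predicted = ["a", "b", "b", "b", "c", "c"]
cm = confusion_matrix(actual, predicted, ["a", "b", "c"])
shades = color_intensities(cm)
```

Each value in shades could then drive the opacity or shade of the corresponding cell, so that the color reflects the count directly.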

Speakers

Lubomír Štěpánek

Dr., First Faculty of Medicine, Charles University, Prague & Faculty of Informatics and Statistics, Prague University of Economics and Business

I hold M.Sc. and Ph.D. degrees in Statistics, an M.D. in General Medicine, and I'm pursuing a Ph.D. in Biomedical Informatics. As an assistant professor at Charles University and Prague University of Economics and Business, I specialize in survival analysis, machine learning, computational...

Wednesday July 10, 2024 13:30 - 15:00 CEST
TBD

Data visualisation, Poster Session

13:30 CEST

Openstatsguide - Minimum Viable Good Practices for High Quality Statistical Software Packages - Daniel Sabanés Bové, RCONIS

The success of the R programming language is largely due to its ease of creating and sharing R packages. We propose an opinionated framework called “openstatsguide”, published on openstatsware.org/guide.html, which can guide R package developers towards a minimum set of good practices. As far as we know from our literature search, this is the first attempt at providing a small and concise set of rules for package developers. This applies not just to R, but can also be used for functionally oriented programming languages used in data science, and we give examples for R, Python, and Julia. Rather than a full and detailed how-to guide, we keep “openstatsguide” short and on a high level, thus lowering the entry point for novice and seasoned developers alike. Our hope is that this guide can increase the adoption of software engineering good practices in the statistics community. In this talk we describe the motivation and scope of “openstatsguide”, relationship with existing work, the set of good practices, the maintenance model and ideas for future complementary guides produced by the openstatsware.org working group.

Speakers

Daniel Sabanés Bové

Ph.D., RCONIS

Wednesday July 10, 2024 13:30 - 15:00 CEST
TBD

Efficient programming, Poster Session

13:30 CEST

Tidy and Reproducible Projects with the Cookiecutter R Package - Felix Henninger, Ludwig Maximilian University of Munich

Best practices for reproducible analyses help to make our work easier and more reliable. However, there is frequently an initial hurdle to overcome to set up an analysis environment well, and this task becomes progressively harder as work takes shape and gains in complexity. To solve this, we present cookiecutter, an R package and RStudio plugin following the popular Python standard (Greenfield et al., 2022) for creating project templates. It helps create structured work environments that adhere to best practices and build on common helpers (e.g. workflow tools), while leaving room for flexibility and customisation through a guided setup wizard. Users with more specialised needs can adapt, create and (optionally) publish their own templates, contributing back to the wider data science community. Our goal is to encourage researchers and analysts to structure their projects from the get-go, by using accessible templates that support them in creating uncluttered projects and organised workflows. Ultimately, we hope that this will increase the adoption of best practices, and more robust research generally.

Speakers

Felix Henninger

Research Software Engineer, Ludwig Maximilian University of Munich

Felix makes better science easier. He builds tools, educates and advocates, to help improve how we collect and analyse data. Felix is currently a graduate student and Research Software Engineer at the Social Data Science and AI Lab (SODA), Ludwig Maximilian University of Munich.

Wednesday July 10, 2024 13:30 - 15:00 CEST
TBD

Efficient programming, Poster Session

13:30 CEST

A New Correlation-Based Fuzzy Cluster Validity Index with UniversalCVI R Package - Onthada Preedasawakul, King Mongkut’s University of Technology Thonburi & Nathakhun Wiroonsri, King Mongkut's University of Technology Thonburi

The optimal number of clusters is one of the main concerns when applying cluster analysis. Several cluster validity indexes (CVI) have been introduced to address this problem. However, in some situations, there is more than one option that can be chosen as the final number of clusters. In this study, we introduce a fuzzy CVI known as the Wiroonsri–Preedasawakul (WP) index. This index is defined based on the correlation between the actual distance between a pair of data points and the distance between adjusted centroids with respect to that pair. Overall, the WP index outperforms most of the traditional indexes in terms of efficiency and detecting secondary options. Moreover, our index remains effective even when the fuzziness parameter m is set to a large value. Our R package called UniversalCVI used in this work is available at https://CRAN.R-project.org/package=UniversalCVI.

Speakers

Nathakhun Wiroonsri

Assistant Professor, King Mongkut's University of Technology Thonburi

Onthada Preedasawakul

Miss Onthada Preedasawakul, King Mongkut’s University of Technology Thonburi

Onthada Preedasawakul is currently pursuing her B.A. in Statistics from the Department of Mathematics at the Faculty of Science in King Mongkut’s University of Technology Thonburi, Bangkok, Thailand. She is a student member of the Mathematics and Statistics with Applications (MaSA...

Wednesday July 10, 2024 13:30 - 15:00 CEST
TBD

Machine learning and AI, Poster Session

13:30 CEST

SpICE: An Interpretable Method for Spatial Data - Natalia da Silva, Universidad de la República, UDELAR

Statistical learning methods are widely used to tackle complex problems because of their flexibility, good predictive performance, and ability to capture complex relationships among variables. One of the main drawbacks of statistical learning is the lack of interpretability of the results. Interpretable statistical learning methods are necessary for obtaining a deeper understanding of these models. Specifically, in problems where spatial information is relevant, combining interpretable methods with spatial data can provide a better understanding of the problem and an improved interpretation of the results. This presentation focuses on the individual conditional expectation plot (ICE plot), a model-agnostic method for interpreting statistical learning models, and on combining it with spatial information. An extension of the ICE plot is proposed in which spatial information is used as a restriction to define spatial ICE (SpICE) curves (https://github.com/natydasilva/SpICE).
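
For readers unfamiliar with ICE curves, the following minimal Python sketch shows the standard (non-spatial) construction: for each observation, one feature is varied over a grid while the others are held fixed, and the model prediction is recorded. The spatial restriction that defines SpICE is not shown; `toy_model` and the data are hypothetical stand-ins for a fitted model.

```python
def ice_curves(predict, rows, feature, grid):
    """For each observation, vary one feature over a grid and record the predictions."""
    curves = []
    for row in rows:
        curve = []
        for value in grid:
            modified = dict(row)       # copy so the original observation is untouched
            modified[feature] = value  # overwrite only the feature of interest
            curve.append(predict(modified))
        curves.append(curve)
    return curves


def toy_model(row):
    # Hypothetical fitted model: the effect of x depends on z
    return row["x"] * row["z"] + 1.0


rows = [{"x": 0.5, "z": 1.0}, {"x": 2.0, "z": -1.0}]
grid = [0.0, 1.0, 2.0]

curves = ice_curves(toy_model, rows, "x", grid)

# Averaging the ICE curves pointwise gives the partial dependence curve
pdp = [sum(curve[i] for curve in curves) / len(curves) for i in range(len(grid))]
```

In this toy case the two curves slope in opposite directions while their average is flat, which is the kind of heterogeneity ICE curves are meant to reveal.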

Speakers

Natalia da Silva

Assistant Professor, Universidad de la República, UDELAR

I am an Assistant Professor in the Department of Statistics at the Universidad de la República. I earned my Ph.D. degree in Statistics from Iowa State University in July 2017, under the supervision of Di Cook and Heike Hofmann. My research interests include supervised learning methods...

Wednesday July 10, 2024 13:30 - 15:00 CEST
TBD

Machine learning and AI, Poster Session

13:30 CEST

Using Statistical Models to Generate Optimization Problems - Florian Schwendinger, Quintik - Technologies

Optimization benchmark sets are commonly used to evaluate the quality and speed of optimization solvers. These problems are typically collected from real-world applications. We suggest using statistical models to generate optimization problems automatically. This has the advantage that the data-generating process of a statistical model is typically well known, so it is easy to generate data for the model and then transform the data into an optimization problem. Furthermore, for statistical models, properties like convexity and unboundedness are typically well known.
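
As a minimal Python sketch of the idea, assume a linear regression model: data are simulated from a known data-generating process and turned into a least-squares problem, which is known to be convex. The function name, the parameter values, and the use of plain gradient descent as a stand-in for the solver under test are all illustrative assumptions, not the author's implementation.

```python
import random


def make_least_squares_problem(n, true_beta, noise_sd, seed):
    """Simulate y = X beta + noise and return the least-squares objective and gradient."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in true_beta] for _ in range(n)]
    y = [
        sum(x * b for x, b in zip(row, true_beta)) + rng.gauss(0.0, noise_sd)
        for row in X
    ]

    def objective(beta):
        # Residual sum of squares
        return sum(
            (yi - sum(x * b for x, b in zip(row, beta))) ** 2
            for row, yi in zip(X, y)
        )

    def gradient(beta):
        grad = [0.0] * len(beta)
        for row, yi in zip(X, y):
            resid = yi - sum(x * b for x, b in zip(row, beta))
            for j, x in enumerate(row):
                grad[j] -= 2.0 * resid * x
        return grad

    return objective, gradient


objective, gradient = make_least_squares_problem(
    n=200, true_beta=[1.5, -2.0], noise_sd=0.1, seed=1
)

# Plain gradient descent as a stand-in for the solver being benchmarked
beta = [0.0, 0.0]
for _ in range(500):
    g = gradient(beta)
    beta = [b - 0.001 * gj for b, gj in zip(beta, g)]
```

Because the true coefficients are known, the solver's answer can be checked against them, and changing `n`, the dimension, or the seed yields as many problem instances as needed.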

Speakers

Florian Schwendinger

Dipl.-Ing. PhD, Quintik - Technologies

Wrote several R packages on different topics.

Wednesday July 10, 2024 13:30 - 15:00 CEST
TBD

Numerical methods, Poster Session

13:30 CEST

CRANhaven - Your backup repository for recently archived CRAN packages - Lluís Revilla, IrsiCaixa & Henrik Bengtsson, University of California San Francisco (UCSF)

The Comprehensive R Archive Network (CRAN) provides the R community with more than 20,000 well-tested community-contributed R packages. One cornerstone of R is trust and correctness, which is why all CRAN packages undergo a rich set of checks - when first submitted but also daily.

R introduces new checks regularly, which means existing packages may start failing. If issues are severe enough, the CRAN Team asks the maintainer to submit a corrected version within, typically, two weeks. If not updated in time, the package is “archived” and is no longer available via traditional installation methods. As there is no public notice ahead of time, archiving of packages is a sudden, disruptive, and sometimes also blocking event for users and developers, resulting in wasted time and resources.

We have studied the archival-unarchival of CRAN packages. We will present the most common reasons for packages being archived, and how often and when they are unarchived. Based on these findings, we propose CRANhaven (https://www.cranhaven.org) - a package repository designed to mitigate the negative impact that suddenly archived packages have on the community.

Speakers

Henrik Bengtsson

Henrik Bengtsson, University of California San Francisco (UCSF)

UCSF, R Foundation, R Consortium, MSC in Computer Science, PhD in Mathematical Statistics, Applied, large-scale research in Bioinformatics and Genomics. R since 2000.

Wednesday July 10, 2024 13:30 - 15:00 CEST
TBD

Open and reproducible science, Poster Session

13:30 CEST

Open Time Series Initiative - Minna Heim, RSEED at KOF Lab at ETH Zurich

The Open Time Series Initiative aims to bridge the gap between academic research and official statistics. Founded and developed at the Research Software Engineering and Economic Data section at KOF Lab at ETH Zurich (RSEED), the initiative plans to start its activities around publicly available macroeconomic data. First, we publish an R-based framework to provide the time series community with reusable code to source publicly available data and process data publications into time series. Second, we plan to publish, maintain, and version over 100'000 machine-readable macroeconomic time series following modern open science standards (FAIR principles). Third, we integrate the processing of publicly available data into academic teaching, helping to make sure the next generation of public servants become Open Science leaders who are familiar with the Swiss data landscape. Our exploratory discussion with the Swiss Federal Statistical Office (FSO) and the pioneering collaboration with the Swiss State Secretariat for Economic Affairs (SECO) make us optimistic that our starting push will lead to spillovers to other fields of research and parts of official statistics. Proposal: https://rseed.ch/opnts-ord-proposal/

Speakers

Minna Heim

Ms., RSEED at KOF Lab at ETH Zurich

Minna Heim is an economics student at the University of St. Gallen and works as a research assistant and for organisational development at the Research Software Engineering and Economic Data (RSEED) Section at KOF Lab at ETH Zurich.

Wednesday July 10, 2024 13:30 - 15:00 CEST
TBD

Open and reproducible science, Poster Session

13:30 CEST

Get Rolling with R in the Public Sector - Thomas Knecht & Philipp Bosch, Statistical Office of the Canton of Zurich

Creating an R process for publishing data and deploying it for other departments sounds like a mundane task and probably not worth mentioning. In the public administration of the Canton of Zurich this is still the exception.

Based on a recently finished collaborative project we show why this is a milestone on our journey towards a more digitized and data driven administration and how this transformation unfolded over the last decade.

Without giving too much away: Building an internal community around R has proven to be at least as important as configuring proxies and Git in coordination with a central IT department.

Today, as a result of our efforts, the majority of the 10.000 employees of the Canton of Zurich are able to install R out of the box from our central IT department.

Speakers

Thomas Knecht

Data Scientist, Statistical Office of the Canton of Zurich

Climbing mountains - crunching data.

Philipp Bosch

Data Scientist, Statistical Office of the Canton of Zurich

Computational Political Scientist by ❤️. Data Scientist by training & job. Data4Good activist @CorrelAid. 21st century public servant @Kanton Zürich.

Wednesday July 10, 2024 13:30 - 15:00 CEST
TBD

Public sector and NGO, Poster Session

13:30 CEST

opentimeseries: An R Package to Transform (Ugly) Data Publications into Machine-Friendly Time Series - Matthias Bannert & Minna Heim, ETH Zurich

Because publications by public data providers focus on a broader audience, their datasets are often not convenient to use for research.
To mitigate this problem, the opentimeseries R package provides the time series and official statistics communities with reusable code to conveniently source data from public sources. By splitting data and metadata into two different files, a long format CSV file for the data and a JSON file for multi-lingual metainformation, the package generates output that is inclusive to humans (and their favorite spreadsheet software) _and_ convenient to ingest for machines.
This data output is the starting point not only for intertemporal comparisons but also for versioning of time series, as needed for real-time analysis or the evaluation of forecasts. The package open-sources, for the first time, a data ingestion framework proven through its long-standing use in monitoring the Swiss economy at the KOF Swiss Economic Institute at ETH Zurich. We explicitly chose the R ecosystem, with its great documentation and boilerplate tools, to encourage dataset maintenance and community contributions across different fields that use public data for research.
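
The split into a long-format CSV file and a JSON metadata file can be sketched in a few lines of Python. The column names (`id`, `date`, `value`), the series key, and the metadata layout below are assumptions made for illustration; the package's actual file layout may differ.

```python
import csv
import io
import json


def split_publication(series_id, observations, metadata):
    """Return (csv_text, json_text): long-format data and multilingual metadata."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id", "date", "value"])  # one row per series and date
    for date, value in observations:
        writer.writerow([series_id, date, value])
    # Keep non-ASCII characters readable for human editors
    json_text = json.dumps({series_id: metadata}, ensure_ascii=False, indent=2)
    return buffer.getvalue(), json_text


csv_text, json_text = split_publication(
    "ch.example.gdp",  # hypothetical series key
    [("2023-01-01", 101.2), ("2023-04-01", 101.9)],
    {"title": {"en": "GDP index", "de": "BIP-Index"}, "unit": "index"},
)
```

The CSV part opens directly in spreadsheet software, while the JSON part carries the per-language descriptions that would otherwise clutter the data file.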

Speakers

Minna Heim

Ms., RSEED at KOF Lab at ETH Zurich

Matthias Bannert

Wednesday July 10, 2024 13:30 - 15:00 CEST
TBD

Public sector and NGO, Poster Session

13:30 CEST

An R-Dominated Workflow to Produce 850.000 Feedback Reports to Schools in (Almost) Real Time - Gabriele von Eichhorn & Elisabeth Rothe & Moritz Friedrich, Federal Institute for Quality Assurance of the Austrian School System (IQS); Roman Freunberger, http

Educational large-scale assessment in Austria underwent a major change in 2022, when the system was changed from one to multiple yearly tests for several subjects and grades (iKMPLUS). The major challenge for the test developers was the immediate feedback of test results to test takers and teachers. Keeping high psychometric standards, we used a mixture of pre-calibrating the test booklets for scaling and cohort-specific reference scores, and automatically sourced R scripts for coding, analysing and reporting the test data. R was used in all these processes, with TAM for IRT scaling, dplyr and tidyr for convenient data wrangling, doParallel for handling the workload, and R Markdown and ggplot2 for reporting and monitoring. In sum, our R-based process of reporting nationwide test results for primary and secondary school pupils produces 850.000 reports every year for teachers, pupils and principals. Here, we want to present our main principles and experiences of reporting test results under an R-based programming approach, with special emphasis on the underlying psychometric analyses, the subsequent automated generation of graphs, and the process in general.

Speakers

Elisabeth Rothe

Dipl.-Psych., Federal Institute for Quality Assurance of the Austrian School System (IQS)

Elisabeth Rothe has worked as a Psychometrician at IQS (Federal Institute for Quality Assurance of the Austrian School System) since 2018. Her focus has been on test design, psychometric evaluation of study designs, standard setting and reporting for various audiences. She was leader...

Gabriele von Eichhorn

Psychometrician, Federal Institute for Quality Assurance of the Austrian School System (IQS)

Gabriele von Eichhorn obtained a Master’s degree in Psychology and a Bachelor’s degree in Educational Science from the University of Salzburg. Having gained experience in different psychometric and diagnostic environments, she is currently a Psychometrician at the Federal Institute...

Moritz Friedrich

MSc, Federal Institute for Quality Assurance of the Austrian School System (IQS)

study of psychology, psychometrician at IQS

Roman Freunberger

PhD, https://www.iqs.gv.at/

study of psychology, psychometrician at IQS

Wednesday July 10, 2024 13:30 - 15:00 CEST
TBD

Quarto and reporting, Poster Session

13:30 CEST

Distributed GxP Workloads for R - Magnus Mengelbier, Limelogic AB

The broad and constantly evolving GxP use of R within Life Sciences is powerful. As the user base grows across the organization and R capabilities are added and evolve, you are no longer managing a single environment for a particular use case. The workloads naturally become distributed across multiple environments with different architectures tailored to their particular role and use in the business.

We consider a set of common environments and their architectures and how a little bit of {plumber} can enable a simple-to-manage R architecture across dissimilar environments, even those that do not currently or simply cannot support the use of R. This new approach is easily extendable to Good Clinical Practice, and any of the other GxP domains, with a few simple processes and controls.

Speakers

Magnus Mengelbier

Managing Director, Limelogic AB

Wednesday July 10, 2024 13:30 - 15:00 CEST
TBD

R workflow + deployment + production, Poster Session

13:30 CEST

Implementing Behavioral Nudges in Shiny - Shel Kariuki, UCD

Nudging is based on the idea that behavior is influenced by a wide range of environmental factors, some of which can be altered by simple actions. Nudges can be used to boost employee motivation and productivity.

In this talk, I will demonstrate how behavioral nudges can be integrated into a shiny app.

We'll explore a hypothetical scenario involving a company named Nyawanja, which aims to boost the sales of a new product that they've recently developed. Nyawanja has developed a shiny app that their sales people can use to keep track of their performance. They have divided their sales team into five different treatment groups as part of an experiment designed on the app to motivate their staff and hopefully improve performance. Once someone logs into the app, they'll get a different version of the home page based on their assigned treatment group. The rest of the app will be the same for everyone. Nyawanja can then analyse the effect of these different nudges on performance.

From this talk, I hope the audience will learn how to design and run experiments in a shiny app. This presentation will benefit those interested in experimental design and data-driven decision making.
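
The group-dependent home page in the scenario above can be sketched as follows. This Python sketch assumes a deterministic, hash-based assignment so that a user sees the same variant on every login; the group names and the functions `assign_group` and `home_page` are hypothetical, and the talk's actual app may assign groups differently (for example, from a stored table).

```python
import hashlib

# Five treatment groups, as in the scenario (names are hypothetical)
GROUPS = ["control", "nudge_a", "nudge_b", "nudge_c", "nudge_d"]


def assign_group(user_id, groups=GROUPS):
    """Map a user id to a treatment group deterministically."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return groups[int(digest, 16) % len(groups)]


def home_page(user_id):
    """Only the home page varies by group; the rest of the app is shared."""
    return f"home_{assign_group(user_id)}"
```

Logging the assigned group alongside each salesperson's performance data is what later allows the effect of each nudge to be estimated.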

Speakers

Shelmith Nyagathiri Kariuki

Shel Kariuki is currently a student pursuing an MSc in Economics and Data Analytics at University College Dublin. She has worked as a data analyst for around 7 years and has in recent years been involved in building shiny apps. Shel is also a retired co-organizer of R-Ladies Nairobi...

Wednesday July 10, 2024 13:30 - 15:00 CEST
TBD

R workflow + deployment + production, Poster Session

13:30 CEST

A reproducible analysis of CRAN Task Views to understand the state of an R package ecosystem - Hugo Gruson, data.org

The research community is increasingly aware of the need to apply software engineering best practices to scientific software. This, however, doesn't mean that we should discard the huge ecosystem of existing tools with large, well-established user bases. Instead, efforts should be dedicated to integrating best practices into existing tools where possible. But this can only be done if we have a clear idea of the current state of the ecosystem, with its gaps and needs.
In this presentation, I will describe the analysis we have conducted on the ecosystem of R packages for Epidemiology, as represented by the CRAN Task View in Epidemiology. It allows us to draw a picture of where efforts to support this ecosystem should focus. This also informs future training needs for this research community, and maps a path for external contributions to packages that welcome them.
Importantly, this analysis is made reproducible and applicable to any CRAN Task View out of the box, which allows research and software communities from other fields to conduct the same assessment on their own domain.

Speakers

Hugo Gruson

Wednesday July 10, 2024 13:30 - 15:00 CEST
TBD

Research software engineering, Poster Session

13:30 CEST

Ambiorix - a web framework for R - John Coene, The Y Company

The {ambiorix} package is a web framework for R inspired by express.js which allows building traditional web applications and RESTful APIs.

Speakers

John Coene

Co-Founder, The Y Company

Wednesday July 10, 2024 13:30 - 15:00 CEST
TBD

Shiny + dashboards + web apps, Poster Session

13:30 CEST

RKernel: Yet Another Jupyter Kernel for R - Martin Elff, Zeppelin Universität, Friedrichshafen

A new Jupyter kernel for R is presented. Jupyter is a web and desktop application for creating notebooks that combine R code with the (textual and graphical) output created by this code. While other kernels for R already exist, the presented one has several advantages: it gives maximal control to the useR over how R objects are presented in the frontend, it fully supports the Jupyter widget infrastructure, and it allows using R's own debugging infrastructure.

The new kernel is available from GitHub as an open-source package.

Speakers

Martin Elff

Prof. Dr., Zeppelin Universität, Friedrichshafen

Wednesday July 10, 2024 13:30 - 15:00 CEST
TBD

Shiny + dashboards + web apps, Poster Session

13:30 CEST

StatLearning: A Shiny App for Practicing Statistical Hypothesis Testing - Juan Claramunt, Leiden University

With this poster, we want to introduce StatLearning, our shiny app for practicing diverse statistical tests. This app is a step forward in digital exercising. While many digital exercises only provide data and questions, StatLearning provides an individualized learning path. To do so, we analyzed the different types of students and their preferred learning styles (reading, watching, listening, etc.). In addition to the data and questions, we provide users with statistical definitions and help windows. These windows include written explanations and videos. Moreover, we take students' diverse abilities into account, allowing students to skip steps if their statistical abilities are advanced while providing extra help for students with lower skills. This way, we aim for students with low abilities to reach advanced skills by practicing. We aim to introduce StatLearning to other R users at the conference who might find it useful for their courses. Furthermore, we would like to receive feedback to improve our app. We have used this app for three years and keep improving it yearly. Therefore, any new suggestions are welcome.

Speakers

Juan Claramunt

Specialist in scientific information, Leiden University

Wednesday July 10, 2024 13:30 - 15:00 CEST
TBD

Shiny + dashboards + web apps, Poster Session

13:30 CEST

Volcano.View: Building Dashboards That Aren't Slow - Michael Galanakis, Hasselt University / Novo Nordisk

We introduce the R package volcano.view, which allows users to create interactive dashboards for proteomics data. Proteomics results are high-dimensional, making interactive dashboards crucial for navigating the thousands of proteins. However, the many data points that need to be rendered also lead to slow load times when using the R package Shiny. We therefore showcase an alternative approach using the JavaScript libraries React and D3. We compared the performance of volcano.view and Shiny using Google's Lighthouse tool. The package was developed to visualize results from the SomaScan proteomics assay. It is organized into modules that add different functionality: it can include gene set enrichment results as well as compare results from different studies. Although the package was developed for proteomics data, it can easily be used more broadly. Developing a dashboard with React is more time-consuming and requires more training for data scientists than Shiny, but it also offers tools for greatly improving performance. Researchers should be aware of this trade-off between performance and development time.

Speakers

Michael Galanakis

Mr, Hasselt University / Novo Nordisk

Michael is an industrial PhD fellow working with Novo Nordisk and Hasselt University. He has experience working as a statistician in both clinical trials and epidemiological studies, where he has co-authored 9 peer-reviewed publications. He completed his bachelor's in mathematics...

Wednesday July 10, 2024 13:30 - 15:00 CEST
TBD

Shiny + dashboards + web apps, Poster Session

13:30 CEST

WebApp Studio for productionizing shiny applications, rmarkdown and R Plumber APIs

WebApp Studio lets R developers build, test and deploy enterprise-level shiny dashboards, rmarkdown reports and plumber APIs at scale (> 100 users) in a very intuitive fashion.

Speakers

Farid Azouaou

CTO, thaink²

A data enthusiast with more than 10 years of experience in analytics & AI for industry. I worked for different companies in different fields, most recently at Mercedes-Benz Mobility, before landing at thaink² as CTO & Co-founder. I humbly consider myself an expert in...

Wednesday July 10, 2024 13:30 - 15:00 CEST
TBD

Shiny + dashboards + web apps, Poster Session

13:30 CEST

SurveymonkeyR: Tools for Communicating with Surveymonkey's API - Yasuto Nakano, Kwansei Gakuin University

The purpose of this presentation is to propose a useful tool for social researchers who use the online survey service SurveyMonkey (surveymonkey.com): SurveymonkeyR. SurveymonkeyR is a library package containing functions to perform tasks such as authenticating a user, creating surveys, and retrieving data from the SurveyMonkey API. It offers an effective option for the data lifecycle of social surveys, from creating questionnaires to obtaining and analyzing data and finally publishing the results, all within the R environment. Many individuals involved in social surveys operate online survey services via a GUI in web browsers. While this method is user-friendly, it becomes inefficient for repetitive tasks. The functions included in SurveymonkeyR provide efficient and reproducible survey environments for social scientists who may not be proficient at API operations.

Speakers

Yasuto Nakano

Prof. Dr., Kwansei Gakuin University

professor of sociology, Ph.D. https://researchmap.jp/yasuto.nakano?lang=en

Wednesday July 10, 2024 13:30 - 15:00 CEST
TBD

Social sciences, Poster Session

13:30 CEST

R for Spatio-Temporal Handling of Moving Polygons - Lorena Abad, University of Salzburg

Data cubes are structures to store and analyse spatio-temporal data in raster and vector format. Typical examples of spatio-temporal vector data are weather stations collecting data over time, or administrative polygons where historical data is aggregated per zone. A less explored use case for data cubes is moving polygons. Examples of moving polygons are spatial representations of glacier retreat, the emergence of volcanic lava flows, or changes of a city boundary over time. In this contribution, I introduce the handling of polygons that evolve and move over time using vector data cubes. The implementation in R makes use of the packages {stars} and {cubble} as ways to represent data in array and tabular formats. The advantage of vector data cubes in both formats is the ability to apply common array operations as well as tidy data wrangling techniques to explore and analyse data. Temporal analyses can be performed using packages like {tsibble}, while spatial analyses can be performed using {sf} methods. Further, more complex spatio-temporal analyses like change detection can be performed using {stampr}. Visualization techniques using {ggplot2} and {tmap} are also explored.

Speakers

Lorena Abad

MSc., University of Salzburg

Wednesday July 10, 2024 13:30 - 15:00 CEST
TBD

Spatial data and maps, Poster Session

13:30 CEST

A guide to R packages for synthetic data generation - Michael Kammer, University of Vienna

Statistical method development is partly driven through applications and the complexities of real world datasets. But we all know that sharing these datasets is often difficult because of legal, ethical or practical concerns, thus making the creation of synthetic data closely reproducing the real world data an attractive option circumventing such issues. Similarly, generating realistic data is important for method comparison studies that are crucial for establishing the evidence base for statistical methods.

Yet there seems to be little consensus on how to actually code data generators. As a first step towards making the coding of simulations more accessible, we provide a systematic scoping review of existing R packages that support data generation (results publicly available on OSF). We will also include our own package, which aims to complement the existing ecosystem by building a library of interesting data generators derived from real-world datasets.

A single tool is not enough to fit all needs, so we will discuss how these tools help you to support open science principles by facilitating sharing of data from your own research, or by generating data for your own methods development.

Speakers

Michael Kammer

Wednesday July 10, 2024 13:30 - 15:00 CEST
TBD

Statistical modelling, Poster Session

13:30 CEST

Combining probabilistic forecasts with the `gamstackr` package - Euan Enticott, University of Bristol

Ensemble models are increasingly popular tools for capturing heterogeneous information and improving predictive performance. We will present the `gamstackr` R package, which provides tools for aggregating or `stacking` the probabilistic forecasts produced by different models or `experts`. In particular, the package implements a versatile, easy-to-use framework for probabilistic stacking that allows controlling the experts’ weights via additive models containing fixed, random or smooth effects. It also provides statistical and computational scalability in the number of experts by exploiting context-specific relationships between them.

We will illustrate the typical workflow of the `gamstackr` package, that is how to: create a heterogeneous set of experts, build and fit several types of stacking models and visualise the ensemble weights and their relationship with the covariates. The package is currently available at https://github.com/eenticott/gamstackr.
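
As a rough illustration of probabilistic stacking with covariate-dependent weights, the Python sketch below combines Gaussian expert forecasts into a mixture density. It assumes a softmax link and a linear predictor in a single covariate, which is a simplification of the additive models described above; none of the function names correspond to the `gamstackr` API.

```python
import math


def softmax(scores):
    """Turn real-valued scores into positive weights that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def expert_weights(x, coefs):
    """coefs holds one (intercept, slope) pair per expert; x is a covariate value."""
    return softmax([a + b * x for a, b in coefs])


def normal_pdf(y, mean, sd):
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def stacked_density(y, x, experts, coefs):
    """Mixture density at y: experts are (mean, sd) Gaussian forecasts."""
    weights = expert_weights(x, coefs)
    return sum(w * normal_pdf(y, m, s) for w, (m, s) in zip(weights, experts))


experts = [(0.0, 1.0), (2.0, 0.5)]  # hypothetical forecasts from two experts
coefs = [(0.0, 1.0), (0.0, -1.0)]   # hypothetical weight coefficients
density = stacked_density(0.5, 2.0, experts, coefs)
```

With these coefficients the first expert dominates for large positive `x` and the second for large negative `x`, which is the sense in which the weights depend on the covariates.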

Speakers

Euan Enticott

Wednesday July 10, 2024 13:30 - 15:00 CEST
TBD

Statistical modelling, Poster Session

13:30 CEST

CompInt: A Package for Interpretable and Comparable Reporting of Effect Sizes - Hannah Schulz-Kümpel, Department of Statistics, LMU Munich

Ever struggled with how to report and explain the results of a statistical model you just fit? Do not worry, the CompInt R package is here to help you with this all-too-common problem! In fact, misinterpretations of statistical significance and classical effect measures like odds ratios are widespread, even among researchers familiar with their definitions. Moreover, when trying to compare or accumulate the results from several different models, as is the goal of multi-analyst studies and meta-analysis, there is currently no uniform gold standard. Based on [Kümpel & Hoffmann](https://arxiv.org/pdf/2211.02621.pdf), the CompInt package implements a general reporting framework, allowing for the consistent derivation of effect size measure definitions and visualization techniques aimed at maximizing the interpretability and comparability of regression results. This session will highlight the importance of transparent reporting, explain the possible specifications of the framework, and generally showcase the applications of the CompInt package.

Speakers

Hannah Schulz-Kümpel

M.Sc., Department of Statistics, LMU Munich

After receiving her Bachelor's in Mathematics from Heidelberg University and Master's in Statistics from LMU Munich, Hannah Schulz-Kümpel is now a PhD student at the ‘Konrad Zuse School of Excellence in Reliable AI’ (relAI) under the supervision of Bernd Bischl.

Wednesday July 10, 2024 13:30 - 15:00 CEST
TBD

Statistical modelling, Poster Session

13:30 CEST

Improving the Modeling of Binary Regression Based on New Proposals for Statistical Diagnostics - Alejandra Andrea Tapia Silva, Pontificia Universidad Católica de Chile

Binary regression models using logit or probit link functions have been widely employed in examining the relationship between binary responses and covariates. However, misspecification of the link function can result in poor model fit and compromise the significance of covariate effects. In this study, we present a local influence diagnostic method associated with a new family of link functions that allows evaluating the sensitivity of symmetric links towards asymmetric ones. This new family offers a comprehensive model that encompasses nested symmetric cases. Furthermore, we present a local influence diagnostic method to evaluate the sensitivity of odds ratios. Monte Carlo simulations are performed to evaluate both the performance of the diagnostic method and the parameter estimation of the overall model, complemented by illustrations using medical data related to menstruation and respiratory problems. The results confirm the effectiveness of our proposal, highlighting the critical role of statistical diagnostics in modeling.

Speakers

Alejandra Andrea Tapia Silva

Dr., Pontificia Universidad Católica de Chile

"I'm Alejandra, an assistant professor in the Statistics Department at PUC, Chile. I'm part of R-Ladies and I love statistical modeling, R, cats, art, and David Bowie."

Wednesday July 10, 2024 13:30 - 15:00 CEST
TBD

Statistical modelling, Poster Session

13:30 CEST

miniSize: An R package to calculate the minimal sample size in balanced ANOVA models - Bernhard Spangl, BOKU University

We consider balanced one-way, two-way, and three-way ANOVA models to test the hypothesis that the fixed factor A has no effect. The other factors are fixed or random. For most of these models (including all balanced 1-way and 2-way ANOVA models) an exact F-test exists.

Given a prespecified power, miniSize allows the user to compute the minimal sample size of the above-mentioned ANOVA models, i.e. the minimal number of experiments needed.

This is achieved by determining the noncentrality parameter for the exact F-test, describing its minimal value by a sharp lower bound, and thus guaranteeing the worst-case power of the F-test. Additionally, we provide a structural result for the minimal sample size that we call the "pivot" effect.

We will present the newly developed R package "miniSize" and give some examples of how to use its functionality to calculate the minimal sample size.

Speakers

Bernhard Spangl

Wednesday July 10, 2024 13:30 - 15:00 CEST
TBD

Statistical modelling, Poster Session

13:30 CEST

Multilevel Regression with Projection Pursuit Tree - Eun-Kyung Lee & Seowoo Jung, Ewha Womans University

Multilevel regression and post-stratification (MRP; Gelman & Hill, 2006; Gelman et al., 2020) was developed to process data from demographically diverse groups in complex survey designs. To obtain a representative estimate for a specific group, a multilevel regression model combines an individual-level model using individual-level data and a population-level model using group-level data. MRP is divided into two stages. The first stage is the multilevel regression step, which estimates a stratified model for the individual response, divided into an individual model and a population model, using priors for the parameters. The multilevel regression model is intended to calculate estimates for each class, which are later used for post-stratification. In the individual model, only variables that enable post-stratification can be used. In this study, this existing limitation of MRP, namely that only categorical variables suitable for post-stratification can be used, is addressed by proposing a method that incorporates a projection pursuit tree and implementing it in R.
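
The post-stratification stage can be sketched as a population-weighted average of the per-class estimates produced by the regression stage. The function name, the cell labels, and the numbers below are hypothetical and chosen only for illustration; the projection pursuit tree part of the proposed method is not shown.

```python
def poststratify(cell_estimates, cell_counts):
    """Weight each cell's model-based estimate by its population count.

    cell_estimates: dict mapping a cell label to the estimated outcome in that cell
    cell_counts:    dict mapping the same labels to population sizes (e.g. from a census)
    """
    total = sum(cell_counts[cell] for cell in cell_estimates)
    if total == 0:
        raise ValueError("population counts sum to zero")
    weighted = sum(cell_estimates[cell] * cell_counts[cell] for cell in cell_estimates)
    return weighted / total


# Hypothetical cells: two age groups with estimates from the regression stage.
estimates = {"age_18_39": 0.2, "age_40_plus": 0.6}
counts = {"age_18_39": 300, "age_40_plus": 100}
print(poststratify(estimates, counts))
```

Because the weights come from known population counts, every variable that defines a cell must be categorical and available for the population, which is the restriction the abstract refers to.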

Speakers

Eun-Kyung Lee

Professor, Ewha Womans University

Eun-Kyung Lee is a Professor in the Statistics Department. She earned a Ph.D., majoring in Statistical Computation and Visualization of Multi-variate Data at Iowa State University in the U.S. Currently, she's engaging in projects in medical statistics and statistical computing ar...

Seowoo Jung

Multilevel regression with projection pursuit tree, Ewha Womans University

- Bachelor’s Degree in Statistics, Ewha Womans University (2019-2023) - Master of Science in Statistics, Ewha Womans University (2023~)

Wednesday July 10, 2024 13:30 - 15:00 CEST
TBD

Statistical modelling, Poster Session

13:30 CEST

Multinomial Logit Models, with or without Random Effects or Overdispersion - Martin Elff, Zeppelin Universität, Friedrichshafen

The open-source package 'mclogit' allows fitting multinomial logit models with or without random effects. It supports two main types of models: conditional logit models, which can be used to analyse discrete choices (from potentially varying choice sets), and baseline category logit models, which can be used to analyse polychotomous responses. Both types of models can be extended by (multilevel) random effects or an overdispersion parameter. Estimation of random effects variants of these models is based either on a Laplace or Solomon-Cox approximation.

In the session, the specification of conditional logit models and baseline category logit models is discussed with practical examples. In addition, results of a simulation study are presented, which examines the performance of the approximation in finite samples.
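
For readers unfamiliar with the baseline category logit model, the following minimal sketch computes the category probabilities for one observation from fixed-effect coefficients. It does not use or reproduce the 'mclogit' interface; the function name, coefficient layout, and example values are assumptions made for illustration, and random effects and overdispersion are left out.

```python
import math


def baseline_logit_probs(x, coefs, baseline="baseline"):
    """Category probabilities under a baseline category logit model.

    x:     list of covariate values (including 1.0 for an intercept, if wanted)
    coefs: dict mapping each non-baseline category to its coefficient list
    The baseline category has all coefficients fixed at zero.
    """
    etas = {cat: sum(b * xi for b, xi in zip(beta, x)) for cat, beta in coefs.items()}
    denom = 1.0 + sum(math.exp(eta) for eta in etas.values())
    probs = {baseline: 1.0 / denom}
    for cat, eta in etas.items():
        probs[cat] = math.exp(eta) / denom
    return probs


# Hypothetical three-category response with an intercept and one covariate.
probs = baseline_logit_probs([1.0, 0.5], {"B": [0.2, 1.0], "C": [-0.4, 0.3]})
print(probs)
```

Each non-baseline coefficient is a log odds ratio relative to the baseline category, so the probabilities always sum to one.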

Speakers

Martin Elff

Prof. Dr., Zeppelin Universität, Friedrichshafen

Wednesday July 10, 2024 13:30 - 15:00 CEST
TBD

Statistical modelling, Poster Session

13:30 CEST

SCM: An R package for Generalized Additive Modelling of Covariance Matrices - Vincenzo Gioia, University of Trieste

Coupling additive mean vector and covariance matrix modelling for multivariate Gaussian models is a complex task, requiring methodological choices on the model structure, scalability of the model fitting procedures, and a set of tailored inferential and model-checking tools. The SCM (Smoothing for Covariance matrix Modelling) R package enables smooth additive modelling of the elements of the mean vector and of an unconstrained parametrisation of the covariance matrix, while ensuring computational scalability by exploiting model sparsity and using the efficient linear algebra routines provided by the RcppArmadillo package. It also leverages the well-developed inferential methods and the visualization tools provided by the mgcv and mgcViz R packages.

In this talk, we will illustrate the modelling capabilities of the SCM package and we will provide useful insights into the data modelling process on several real-world applications. In particular, we will provide an overview of the main aspects of the model building and checking phases, as well as insights on how to interpret the model output. The SCM package is currently available at https://github.com/VinGioia90/SCM/.

Speakers

Vincenzo Gioia

Ph.D., University of Trieste

Vincenzo Gioia is a research assistant at the Department of Economic, Business, Mathematical and Statistical Sciences, University of Trieste, Italy. He received a PhD in Managerial and Actuarial Sciences from the University of Udine in 2023. His research interests range from asymptotic...

Wednesday July 10, 2024 13:30 - 15:00 CEST
TBD

Statistical modelling, Poster Session

15:00 CEST

Exploring Class Overlap in Classification Challenges: Introducing the R Package 'clap' - Priyanga Dilini Talagala, University of Moratuwa, Sri Lanka

The issue of class overlap in classification arises when two or more classes share similar feature representations, making it challenging for a classification model to differentiate between them with accuracy. This problem can lead to misclassification errors, which can significantly impact the performance of classification models and make it difficult to interpret the decision-making process. In this session, I will discuss our novel XAI framework that provides post-hoc, model-agnostic, and local explanations to enhance the trustworthiness of classification models in the presence of overlapping problems. Our framework derives a feature map using different transfer learning approaches, reduces feature space dimensionality, detects overlapping regions using density estimation, and integrates an XAI module to explain the classification model. We assess the framework’s validity using real-world datasets and demonstrate its usefulness in enhancing model transparency and interpretability in the presence of class overlapping problems. Furthermore, we introduce the R package "clap" to facilitate the detection of overlapping regions in multidimensional data, as proposed in our approach.

Speakers

Priyanga Dilini Talagala

Dr., University of Moratuwa, Sri Lanka

I am a senior lecturer in the Department of Computational Mathematics, University of Moratuwa, Sri Lanka. I earned my PhD in statistics from Monash University, Australia. I am a fellow of OWSD, a program unit of UNESCO. I am an associate editor for The R Journal. I am a co-founder...

Wednesday July 10, 2024 15:00 - 15:20 CEST
Pongau + Flachgau

Big and high-dimensional data

15:00 CEST

Using R to Create Quantitative Insights on the Benefit from Open Sourcing Company R Packages - James Black, Roche

R is increasingly used in the pharmaceutical industry as the backbone for the pan-study codebase for the design and analysis of clinical trials. In parallel with this shift to R, many companies are open sourcing, and collaborating on, the post-competitive code used across studies. While numerous benefits come from companies open sourcing their R codebase, from better talent acquisition to transparency with regulators, activity on git repos provides an insight into the return on investment (ROI) from external contributions to the codebase a company depends on. In this talk I share examples of how the GitHub API and git history can be leveraged to assess external contributions to the late-stage codebase Roche open-sourced, shedding light on the tangible benefits derived from collaborative development in an open setting.
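
One simple way to look at git history for this purpose is to classify commits by the author's email domain. The sketch below assumes input lines in the form produced by `git log --pretty=format:"%h|%ae"` (abbreviated hash and author email); the function name, the sample lines, and the domains are hypothetical, and this is not the analysis presented in the talk.

```python
from collections import Counter


def external_share(log_lines, internal_domains):
    """Fraction of commits whose author email is outside the given domains."""
    counts = Counter()
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        _commit_hash, email = line.split("|", 1)
        domain = email.rsplit("@", 1)[-1].lower()
        counts["internal" if domain in internal_domains else "external"] += 1
    total = sum(counts.values())
    return counts["external"] / total if total else 0.0


# Hypothetical log excerpt; example.com stands in for the company domain.
sample_log = [
    "a1b2c3d|alice@example.com",
    "d4e5f6a|bob@example.org",
    "0a1b2c3|carol@example.com",
    "4d5e6f7|dave@university.example",
]
print(external_share(sample_log, {"example.com"}))
```

Email domains are only a proxy for affiliation, since contributors may commit from personal addresses, so a real analysis would need a more careful mapping.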

Speakers

James Black

Dr, Roche

Director for Roche's late-stage Insights codebase and business lead for our clinical and RWD scientific computing environments.

Wednesday July 10, 2024 15:00 - 15:20 CEST
Attersee

Cross-industry collaboration

15:00 CEST

Getting the Most Out of Test-Driven Development for Shiny - Jakub Sobolewski, Appsilon

Tests are not only a way of catching bugs but also a way of building software. During the talk, I’ll share how we can use Test-Driven Development to build Shiny apps. We’ll start with tips on gathering requirements in a format that is easy to translate to test cases. Implementing requirements as automated tests helps us gain confidence that we’ve built the correct code. This is crucial when using Shiny in enterprise settings, where producing incorrect results may be costly. I’ll introduce patterns that help us separate test code from implementation details, making tests more durable. We’ll discuss ways we can shape test code to build specifications that read almost like natural language. Furthermore, we’ll talk about how to use shinytest2 effectively and what the alternatives are for robust testing of Shiny apps. Even if you don’t plan to employ Test-Driven Development, you’ll be able to reuse the same patterns to produce more durable tests that document the app's behavior.

Speakers

Jakub Sobolewski

Mr, Appsilon

Jakub is a senior engineer at Appsilon. He has a background in applied physics. Before Appsilon he worked in insurance and banking as an analyst. Maintainer of the shiny.react and shiny.fluent packages.

Wednesday July 10, 2024 15:00 - 15:20 CEST
Salzburg I

Efficient programming

15:00 CEST

Web APIs with httr2 - Hadley Wickham, Posit

httr2 is an R package that helps you generate and perform HTTP requests then process the responses. In this talk, I'll introduce you to httr2 and the basics of HTTP, so you can learn what powers so much of the modern internet, and how you can work with it from R. httr2 is the successor to httr and has been under development for around two years. Compared to httr, httr2 has an explicit request object which leads to a more familiar interface where you can iteratively build up complex requests with the pipe. There's no need to stop using httr, as we'll continue to maintain it for many years to come, but for new projects, I'd highly recommend giving httr2 a shot.
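
The idea of an explicit request object that is built up step by step and only performed later can be sketched outside R as well. The example below uses Python's standard library rather than httr2, so none of the names are httr2 functions; `build_request` and the URL are hypothetical, and no network call is made.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_request(base_url, query=None, headers=None, method="GET"):
    """Assemble a request object without sending it."""
    url = base_url
    if query:
        url = base_url + "?" + urlencode(query)
    request = Request(url, method=method)
    for name, value in (headers or {}).items():
        request.add_header(name, value)
    return request


# The request can be inspected (or modified further) before it is ever performed.
req = build_request(
    "https://example.com/api/items",
    query={"page": 2},
    headers={"Accept": "application/json"},
)
print(req.get_method(), req.full_url, req.header_items())
```

Keeping construction separate from execution makes it easy to check exactly what would be sent, which is the benefit the abstract attributes to having a request object.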

Speakers

Hadley Wickham

Chief Scientist, Posit

Wednesday July 10, 2024 15:00 - 15:20 CEST
Salzburg II

Interfaces with other programming languages

15:00 CEST

autovi: Automated Assessment of Residual Plots Using Computer Vision - Weihao Li, Monash University

Visual assessment of residual plots is crucial for evaluating linear regression model assumptions and fit, but accurately interpreting these plots can be challenging. The 'autovi' package provides an automated solution in R by leveraging computer vision models. Taking a residual plot as input, 'autovi' approximates a distance metric that quantifies the divergence of the actual residual distribution from the reference distribution expected under correct model specification. This approximated distance enables formal statistical tests and provides a holistic approach to collectively assess different model assumptions. This talk will introduce the functionality of 'autovi', demonstrate its performance across diverse regression scenarios, and discuss opportunities to extend the package.

Speakers

Weihao Li

Mr, Monash University

Weihao (Patrick) Li, a third-year PhD student in the Department of Econometrics and Business Statistics at Monash University, is actively engaged in research focused on automating visual inference for residual diagnostics. Patrick completed a Bachelor of Commerce majoring in Business...

Wednesday July 10, 2024 15:00 - 15:20 CEST
Pinzgau + Tennegau

Machine learning and AI

15:00 CEST

Quantile Additive Modelling on Large Data Sets Using the qgam R Package - Benjamin Griffiths, University of Bristol

The qgam R package is an extension of the mgcv package, offering methods for building and fitting quantile additive models (QGAMs), which do not make any parametric assumption on the distribution of the response variable. While QGAMs make fewer assumptions than standard GAMs, they are slower to fit due to the cost of selecting the so-called “learning-rate”. The longer fitting time is particularly problematic when handling large data sets and complex models. This talk focuses on the development of new Big Data methods for QGAMs (and on their implementation in the qgam package) which greatly alleviate this issue. In particular, we will show that the new methods lead to a significant decrease in computational time and to much lower memory requirements, but do not affect the accuracy of the fitted quantiles. While we will demonstrate the methods on regional solar production modelling, they are useful in a wide range of industrial and scientific applications.
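
Quantile models avoid distributional assumptions because they are fitted by minimising a loss rather than a likelihood. The sketch below uses the classical pinball (check) loss to show that its minimiser is the requested quantile; whether and how qgam modifies this loss is not covered here, and the function name and data are made up for illustration.

```python
def pinball_loss(observations, prediction, tau):
    """Average pinball (check) loss of a constant prediction at quantile level tau."""
    total = 0.0
    for y in observations:
        if y >= prediction:
            total += tau * (y - prediction)
        else:
            total += (1.0 - tau) * (prediction - y)
    return total / len(observations)


# Hypothetical data with one large outlier.
data = [1, 2, 3, 4, 5, 6, 7, 8, 100]

# Search over the observed values for the best constant prediction at tau = 0.5.
best = min(data, key=lambda q: pinball_loss(data, q, 0.5))
print("loss-minimising value:", best)
print("sample mean:", sum(data) / len(data))
```

At tau = 0.5 the minimiser is the sample median, which is unaffected by the outlier that pulls the mean upwards.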

Speakers

Benjamin Griffiths

Mr, University of Bristol

3rd year PhD student in COMPASS CDT of the University of Bristol, sponsored by Électricité de France. Research interests lie in developing scalable fitting methods for quantile and loss-based GAMs, and on their implementation in open-source software.

Wednesday July 10, 2024 15:00 - 15:20 CEST
Wolfgangsee

Predictive modelling and forecasting

15:20 CEST

Visualize Your Fitted Nonlinear Dimension Reduction Model in High-Dimensional Space - Jayani Piyadi Gamage, Monash University, Australia

Nonlinear dimension reduction (NLDR) techniques such as t-SNE and UMAP provide a low-dimensional representation of high-dimensional data using non-linear transformation. The methods and parameter choices can create wildly different representations, making it difficult to decide which is best, or whether any or all are accurate or misleading. NLDR often exaggerates random patterns, sometimes due to the samples observed. But NLDR views have an important role in data analysis because, if done well, they provide a concise visual (and conceptual) summary of high-dimensional distributions. To help evaluate the NLDR, we have developed an algorithm to show the 2D NLDR model in the high-dimensional space, viewed with a tour. One can see if the model fits everywhere or better in some subspaces, or completely mismatches the data. It is used to help with evaluating which 2D layout is the best representation of the high-dimensional distribution. Also, we can see how different methods may have similar summaries or quirks. This methodology is available in the R package `quollr`. We'll demonstrate this using single-cell data, focusing on understanding cluster structure.

Speakers

Jayani Piyadi Gamage

Miss., Monash University, Australia

I’m a second year PhD student in the Department of Econometrics and Business Statistics at Monash University, Australia. Under the guidance of Professor Dianne Cook, Dr. Paul Harrison, Dr. Michael Lydeamore, and Dr. Thiyanga Talagala, my research focuses on developing a novel tool...

Wednesday July 10, 2024 15:20 - 15:40 CEST
Pongau + Flachgau

Big and high-dimensional data

15:20 CEST

CAMIS: An Open-Source, Community Endeavour for Comparing Analysis Method Implementations - Chi Zhang, University of Oslo

Statisticians using multiple software packages (SAS, R, Python) will have found differences in analysis results that warrant further justification. Whilst some industries may accept results not being the same as long as they are "close", the highly regulated pharmaceutical industry would require an identical match in results. Yet discrepancies might still occur, and knowing the reasons (different methods, options, algorithms, etc.) is critical to the modern statistician and to subsequent regulatory submissions. In this talk I will introduce CAMIS: Comparing Analysis Method Implementations in Software (https://psiaims.github.io/CAMIS/). It is a joint project between PHUSE, the R Validation Hub, PSI AIMS, the R Consortium and openstatsware. The aim of CAMIS is to investigate and document differences and similarities between different statistical software such as SAS and R. We use Quarto and GitHub to document methods, algorithms and comparisons between software through small case studies, and all articles are contributed by the community. In the transition from proprietary to open source technology in the industry, CAMIS can serve as a guidebook to navigate this process.
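
A small example of how "different options" can change a result is the handling of ties when rounding. The sketch below applies two common tie-breaking rules to the same inputs using Python's decimal module; it is an illustration chosen for this listing, not an excerpt from CAMIS, and which rule a particular statistical package applies by default should be checked in that package's documentation.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP


def round_ties(value, rule):
    """Round a decimal string to an integer using the given tie-breaking rule."""
    return int(Decimal(value).quantize(Decimal("1"), rounding=rule))


for value in ["0.5", "1.5", "2.5", "-2.5"]:
    half_even = round_ties(value, ROUND_HALF_EVEN)  # ties go to the nearest even integer
    half_away = round_ties(value, ROUND_HALF_UP)    # ties go away from zero
    print(value, half_even, half_away)
```

Both rules are defensible, yet they disagree on exactly the boundary cases that tend to show up when results from two systems are compared digit by digit.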

Speakers

Zhang Chi

Statistician and lecturer, Co-lead of CAMIS, University of Oslo

Chi is a statistician and part-time lecturer in biostatistics, at University of Oslo in Norway, and she is enthusiastic about using open-source technology in teaching and collaborations with clinicians and public health researchers. Chi is an active member of the R community and is...

Wednesday July 10, 2024 15:20 - 15:40 CEST
Attersee

Cross-industry collaboration

15:20 CEST

What They Forgot to Teach You About Shiny Development in Production - Mohamed El Fodil Ihaddaden, HDI AG

The amount of documentation about Shiny keeps growing. The official Shiny website offers a nice interface with live examples. Many books and video tutorials have been published on the subject. However, documentation and freely available knowledge about Shiny development in a production context remain scarce. As a result, many Shiny developers, even experienced ones, find themselves confused, at least at first, when starting their journey within a company that makes money using Shiny. Indeed, when developing Shiny applications in production, there is a set of rules that should be respected in order to have the smoothest experience possible, for the developers themselves and for the stakeholders too. By stakeholders, I don't mean only the users, but rather everyone who interacts directly or indirectly with the app; for example, someone might need an output that is generated through the app. Through my experience as a Shiny developer for a company that leverages Shiny extensively in production, I would like to think that I have gathered an interesting amount of knowledge that will help anyone improve their Shiny development process.

Speakers

Fodil Ihaddaden

Analytics Engineer, HDI AG

R enthusiast from Algeria and based in Hamburg, Germany. Working as an R/Shiny developer.

Wednesday July 10, 2024 15:20 - 15:40 CEST
Salzburg I

Efficient programming

15:20 CEST

C for R Users - Ella Kaye, University of Warwick

Much of base R is written in C. As R users, we may encounter this code when debugging our own code. As R contributors, an understanding of C can enable us to find the root cause of a bug and/or propose a patch to the C code to fix a bug. In any case, learning a new programming language can be fun and rewarding! In this talk, I'll discuss why, as R users/programmers, we may want to learn C, and resources for doing so. I'll show examples of how C is used in the codebase of base R. I'll give an example of how, with only a little C knowledge, it was possible to add a new feature into the R language (specifying colours with three-digit hex codes). Finally, I'll promote the C Study Group for R contributors as a friendly community learning C together.
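
To make the three-digit hex code feature concrete, the sketch below expands a short colour code into its six-digit form. It assumes the common convention that each digit is doubled (so `#1AF` stands for `#11AAFF`); the function name is hypothetical, and this is neither the C code in base R nor necessarily how R implements the feature.

```python
import string


def expand_hex_colour(code):
    """Expand '#RGB' to '#RRGGBB'; six-digit codes are returned unchanged (upper-cased)."""
    if not code.startswith("#"):
        raise ValueError("expected a leading '#'")
    digits = code[1:]
    if not digits or any(ch not in string.hexdigits for ch in digits):
        raise ValueError("expected hexadecimal digits")
    if len(digits) == 3:
        # Double every digit: 'abc' becomes 'aabbcc'.
        digits = "".join(ch * 2 for ch in digits)
    if len(digits) != 6:
        raise ValueError("expected 3 or 6 hexadecimal digits")
    return "#" + digits.upper()


print(expand_hex_colour("#1af"))
```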

Speakers

Ella Kaye

Ms, University of Warwick

Wednesday July 10, 2024 15:20 - 15:40 CEST
Salzburg II

Interfaces with other programming languages

15:20 CEST

mlr3mbo: Modern and Flexible Bayesian Optimization - Lennart Schneider, LMU Munich & Munich Center for Machine Learning (MCML)

Bayesian Optimization has emerged as the de facto standard for optimizing computationally intensive black-box functions. Such functions, characterized by the lack of availability of any information beyond the output value for a given input, present significant challenges in domains ranging from hyperparameter optimization in machine learning to applied sciences such as chemical engineering, material sciences, and drug discovery. mlr3mbo offers a modern and versatile approach to Bayesian Optimization in R as part of the mlr3 ecosystem. It not only provides ready-to-use optimization algorithms but also provides the essential building blocks necessary for the easy development of custom Bayesian Optimization algorithms. This flexibility extends to supporting both single and multi-objective optimization problems, along with handling mixed search spaces that include continuous, categorical, and conditional variables. In this talk, we will showcase mlr3mbo and its key features with practical demonstrations on hyperparameter optimization in machine learning, illustrating its potential to boost efficiency and effectiveness.

Speakers

Lennart Schneider

PhD Student, LMU Munich & Munich Center for Machine Learning (MCML)

Lennart Schneider is pursuing his PhD at LMU Munich's Chair of Statistical Learning and Data Science and the Munich Center for Machine Learning (MCML), under the guidance of Prof. Dr. Bernd Bischl. His research primarily focuses on Hyperparameter Optimization, Neural Architecture...

Wednesday July 10, 2024 15:20 - 15:40 CEST
Pinzgau + Tennegau

Machine learning and AI

15:20 CEST

missForestPredict - Missing Data Imputation in Prediction Settings - Elena Albu, KU Leuven, Belgium

Prediction models are used to predict an outcome based on input variables. Missing data in input variables often occurs at model development and at prediction time. The newly released missForestPredict R package proposes an adaptation of the missForest imputation algorithm that is fast, user-friendly and tailored for prediction settings. The algorithm iteratively imputes variables using random forests until a convergence criterion (unified for continuous and categorical variables and based on the out-of-bag error) is met. The imputation models are saved for each variable and iteration and can be applied later to new observations. The missForestPredict package offers extended error monitoring, control over variables used in the imputation and custom initialization. This allows users to tailor the imputation to their specific needs. The missForestPredict algorithm is further compared to mean/mode imputation, k-nearest neighbours, bagging and two iterative algorithms (miceRanger and IterativeImputer) on 8 simulated datasets with simulated missingness and 8 public datasets using different prediction models. missForestPredict provides satisfactory results within short computation times.

Speakers

Elena Albu

Ms., KU Leuven, Belgium

During her career in healthcare IT and data science, she worked with Electronic Health Record (EHR) data and gained knowledge on medical workflows. In 2019, she earned her Master of Science in Statistical Data Analysis at the University of Ghent, Belgium. During this program, her...

Wednesday July 10, 2024 15:20 - 15:40 CEST
Wolfgangsee

Predictive modelling and forecasting

15:40 CEST

The DiscreteFDR Package for Multiple Testing with Discrete Data - Florian Junge, Darmstadt University of Applied Sciences - Darmstadt Institute of Statistics and Operations Research & Sebastian Doehler, Darmstadt University of Applied Sciences

The simultaneous analysis of a large number of statistical tests is ubiquitous in applications. The false discovery rate (FDR) is a popular error rate for avoiding Type-I error inflation, controlled by the famous BH procedure, which is implemented in R’s stats package. As this procedure was designed for continuous test statistics, it is known to be conservative for discrete data. More efficient methods that still guarantee FDR control in the latter setting have been proposed by Döhler, Durand & Roquain (2018). In our talk, we present the R package DiscreteFDR which provides efficient implementations of these procedures. It can be applied as an off-the-shelf tool for commonly used discrete tests such as Fisher’s exact test, or for any arbitrary discrete test by using information on its p-value distribution. After a brief introduction to the statistical background we focus on the implementation, which relies heavily on Rcpp. With large numbers of tests, some tweaks are required to lower computing and RAM requirements. The results are output in an S3 class, for which print, summary and plot methods are available. Finally, we will demonstrate the usage of the package with real data.

Wednesday July 10, 2024 15:40 - 16:00 CEST
Pongau + Flachgau

Big and high-dimensional data

15:40 CEST

Balancing Global Infrastructure and Local Autonomy: Lessons from R-Ladies Global - Shannon Pileggi, The Prostate Cancer Clinical Trials Consortium

As a global non-profit established in 2016, R-Ladies has more than 100k members from 233 chapters in 63 countries to support the mission of increasing gender diversity in the R community. Empowering local chapters is challenging as accessibility and awareness of communication methods, software choices, social platforms, and support avenues vary internationally. Join us for insights into our journey of developing a global technical and social infrastructure while fostering collaboration and growth and granting chapters the freedom to tailor their activities to local contexts. Walk away with practical technical and social strategies to empower and diversify your own data science communities based on learning from continuous feedback.

Speakers

Shannon Pileggi

Lead Data Scientist, The Prostate Cancer Clinical Trials Consortium

Wednesday July 10, 2024 15:40 - 16:00 CEST
Attersee

Community and outreach

15:40 CEST

Mastering Plumber Structure: Your API's Solutions. - Adam Forys & Magdalena Krochmal, Roche

This presentation offers a comprehensive analysis of the structural design patterns within the Plumber API framework. We will explore a spectrum of approaches, ranging from fundamental implementations to sophisticated techniques such as 'Plumber as code' and 'Plumber as package.' Each structure will be examined for its advantages and best-use scenarios. Finally, we will provide guidance on selecting the most suitable API structure based on a development team's skills and project requirements.

Speakers

Adam Forys

Mr., Roche

Magdalena Krochmal

Senior Data Scientist, Roche

Wednesday July 10, 2024 15:40 - 16:00 CEST
Salzburg I

Efficient programming

15:40 CEST

Unveiling the R Package Universe: Exploring CRAN Ecosystem with Knowledge Graphs - Dennis Irorere, Tripadvisor, Inc

Dive into the heart of the Comprehensive R Archive Network (CRAN) ecosystem in this session, where we introduce the concept of knowledge graphs to navigate the vast landscape of R packages. By constructing a knowledge graph that captures package dependencies, author relationships, and semantic associations, we enable users to explore, analyse, and visualise the intricate web of interactions within CRAN. Discover how leveraging knowledge graphs enhances package discovery, facilitates collaboration, and fosters innovation within the R community. Join us as we embark on a journey to unlock the hidden insights and synergies within the R package universe, empowering users to harness the full potential of CRAN for their data science endeavours.
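
At its simplest, the dependency part of such a knowledge graph is a directed graph that can be traversed to answer questions like "which packages would be affected if this one changed?". The sketch below uses plain dictionaries and made-up package names; the function names are hypothetical, and author relationships and semantic associations are omitted.

```python
from collections import defaultdict, deque


def reverse_dependencies(depends_on):
    """Invert a 'package -> packages it depends on' mapping."""
    reverse = defaultdict(set)
    for package, dependencies in depends_on.items():
        for dependency in dependencies:
            reverse[dependency].add(package)
    return reverse


def all_dependents(depends_on, target):
    """All packages that depend on target, directly or transitively (breadth-first)."""
    reverse = reverse_dependencies(depends_on)
    seen = set()
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for dependent in reverse.get(current, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


# Hypothetical package names, used only to show the traversal.
graph = {
    "pkgPlots": ["pkgCore"],
    "pkgModels": ["pkgCore", "pkgMath"],
    "pkgReports": ["pkgPlots"],
    "pkgCore": [],
    "pkgMath": [],
}
print(sorted(all_dependents(graph, "pkgCore")))
```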

Speakers

Dennis Irorere

Mr., Tripadvisor, Inc

Dennis is a data engineer with over five years of R experience who began as a community builder, mentoring aspiring data scientists. Now, he leverages R and other tools to develop intelligent systems, employing graph and relational databases for diverse data challenges.

Wednesday July 10, 2024 15:40 - 16:00 CEST
Salzburg II

Interfaces with other programming languages

15:40 CEST

ML-Based Imputation Methods in R Package VIM: Performance and Considerations - Johannes Gussenbauer & Alexander Kowarik, Statistics Austria; Nina Niederhametner, Statistik Austria

Missing data poses a pervasive issue in statistical analysis across various domains. Ignoring missing values or using incongruous imputation methods can introduce bias and decrease the validity of statistical results. To overcome the challenge of missing data imputation, we propose the use of novel machine learning algorithms: the R package VIM (Visualization and Imputation of Missing Values) has incorporated machine learning (ML)-based imputation methods, including xgboost and transformer models. This presentation will elucidate the recent advancements in VIM, with a special emphasis on the performance of these ML models in handling missing data, comparing them to more conventional imputation methods and highlighting their advantages and disadvantages. Through real-world examples, we aim to demonstrate the effectiveness of our models in improving accuracy and reliability.

Speakers

Alexander Kowarik

Head of Statistical methods and survey methodology, Statistics Austria

Dr. Alexander Kowarik is head of the methods unit at Statistics Austria with more than 10 years of experience working at an NSI. He is an active contributor to the R open source community with a focus on official statistics applications.

Johannes Gussenbauer

Methodologist, Statistics Austria

I studied Mathematics at the University of Technology in Vienna and have been working as a methodologist at Statistics Austria since 2017. My main topics at work cover imputation, calibration and error estimation for surveys as well as text classification using R. I contribute to various...

Nina Niederhametner

Methodologist, Statistics Austria

Nina Niederhametner started working as a methodologist at Statistik Austria in November 2023, where her main work centers around imputation and classification using large language models. She also specializes in data privacy and anonymization with special focus on synthetic data...

Wednesday July 10, 2024 15:40 - 16:00 CEST
Pinzgau + Tennegau

Machine learning and AI

15:40 CEST

tsdataleaks: Tool to Detect Data Leaks in Large Time Series Collections in Forecasting Competitions - Dr. Thiyanga Talagala, University of Sri Jayewardenepura

Large-scale time series forecasting competitions are excellent platforms for fostering innovation and advancing the field of time series analysis. One of the most frequent problems that arises in forecasting competitions is data leakage. Data leaks can happen when the training period values contain information about the test period values. There are a variety of different ways that data leaks can occur with time series data, for example: i) randomly chosen blocks of time series are concatenated to form a new time series; ii) scale-shifts; iii) repeating patterns; iv) addition of white noise; v) modified scales; vi) temporal aggregations, for example, creating monthly series from daily series. The tsdataleaks package provides a simple and computationally efficient algorithm to exploit data leaks in time series data. The tsdataleaks package is available on CRAN.
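
One way to look for such leaks is to slide a window over every series in the collection and flag windows that correlate almost perfectly with the last few observations of a target series, which also catches shifted and rescaled copies. The sketch below is a simplified illustration under that assumption; the function names, the cutoff, and the data are hypothetical, and it does not reproduce the tsdataleaks implementation.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def find_leaks(series, h, cutoff=0.999):
    """Return (target, source, start) for windows matching a target's last h values."""
    hits = []
    for target_name, target in series.items():
        tail = target[-h:]
        for source_name, source in series.items():
            last_start = len(source) - h
            for start in range(last_start + 1):
                if source_name == target_name and start == last_start:
                    continue  # skip comparing the tail with itself
                if pearson(tail, source[start:start + h]) >= cutoff:
                    hits.append((target_name, source_name, start))
    return hits


# Hypothetical collection: series "b" contains 2 * (last four values of "a") + 1.
collection = {
    "a": [3, 1, 4, 1, 5, 9, 2, 6],
    "b": [8, 8, 11, 19, 5, 13, 0, 3],
}
print(find_leaks(collection, h=4))
```

A hit means that the values a competitor is asked to forecast may already appear, possibly shifted or rescaled, somewhere in the training data.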

Speakers

Thiyanga S. Talagala

Dr, University of Sri Jayewardenepura

I am a senior lecturer in the Department of Statistics, Faculty of Applied Sciences, at the University of Sri Jayewardenepura, Sri Lanka. I received my PhD in statistics from Monash University. I am a co-founder and co-organizer of R Ladies-Colombo, Sri Lanka. I am also serving as...

Wednesday July 10, 2024 15:40 - 16:00 CEST
Wolfgangsee

Predictive modelling and forecasting

16:00 CEST

Break

Wednesday July 10, 2024 16:00 - 16:25 CEST
TBD

Breaks + Special Events

16:25 CEST

Keynote Sessions to be Announced

Wednesday July 10, 2024 16:25 - 16:40 CEST
Salzburg I + II

Keynote Sessions

Level Any

16:40 CEST

Keynote: Torsten Hothorn, University of Zürich

Speakers

Torsten Hothorn

Professor of Biostatistics, Epidemiology, Biostatistics and Prevention Institute, University of Zürich

Torsten Hothorn is Professor of Biostatistics ad personam at the Epidemiology, Biostatistics and Prevention Institute of the University of Zurich. He received a Diploma in Statistics from the University of Dortmund in 2000 and a Dr. rerum naturalium in 2003 from the same universit... Read More →

Wednesday July 10, 2024 16:40 - 17:40 CEST
Salzburg I + II

Keynote Sessions

Level Any

08:00 CEST

Registration

Thursday July 11, 2024 08:00 - 14:00 CEST
Salzburg Foyer

Registration

09:00 CEST

Keynote Sessions to be Announced

Thursday July 11, 2024 09:00 - 09:20 CEST
Salzburg I + II

Keynote Sessions

Level Any

09:20 CEST

Keynote: Abhishek Ulayil, Institute of Actuaries of India

Speakers

Abhishek Ulayil

Software Developer and Aspiring Actuary, Institute of Actuaries of India

Abhishek Ulayil is a software developer and aspiring actuary, currently pursuing actuarial science studies at the Institute of Actuaries of India. With a strong commitment to open source contributions, he has released numerous packages in both Python and R. Over the past two years... Read More →

Thursday July 11, 2024 09:20 - 10:20 CEST
Salzburg I + II

Keynote Sessions

Level Any

10:00 CEST

Sponsor Showcase

Thursday July 11, 2024 10:00 - 14:00 CEST
Salzburg Foyer

Sponsor Showcase

10:30 CEST

PACTA: Empowering the Climate Finance Transition with R - Alex Axthelm, RMI

In the urgent pursuit of climate action, the need for effective financial tools to drive sustainable investment has become paramount.
The Paris Agreement Capital Transition Assessment (PACTA) is a forward-looking, science-based analysis helping shift capital flows in greener directions and enabling the financial sector to contribute to the goals of the Paris Agreement.
PACTA offers free tools supporting investors in determining the alignment of their portfolios and loan books with widely accepted climate scenarios.
To date, more than 1500 institutions have assessed their portfolios with PACTA, analyzing assets totalling over US$100T.

PACTA also equips governments and regulators to assess the climate alignment of their regulated entities, both individually and at the level of an entire sector.
Our team has supported more than a dozen government entities and regulatory bodies in assessing the climate alignment of their financial sectors.

PACTA, written in R and freely available under the MIT license, stands as a powerful tool in the fight against climate change.
Join us to explore its transformative potential and contribute to the advancement of sustainable finance.

Speakers

Alex Axthelm

Thursday July 11, 2024 10:30 - 10:35 CEST
Pinzgau + Tennegau

Economics + finance + insurance + business, Lightning Talk

10:30 CEST

Benchmarking (R)Cpp Code with rcpptimer - Jonathan Berrisch, University of Duisburg-Essen

This talk presents 'rcpptimer' [1]. 'rcpptimer' is a novel R package that provides Rcpp bindings for 'cpptimer' [2], a simple tic-toc timer class for benchmarking C++ code. This sleek tic-toc timer supports overlapping timers and OpenMP parallelism. It boasts a microsecond-level time resolution. We did not find any overhead of the timer itself at this resolution. Results (with summary statistics) are automatically passed back to R as a data frame.

I will demonstrate the versatile application of 'rcpptimer' in diverse Rcpp projects. Furthermore, we'll delve into the intricacies of its implementation, shedding light on its evolution from 'rcppclock' [3]. I will highlight key features such as OpenMP parallelism and automatically returning results to R. Additionally, I will discuss the rationale behind the decision to separate the core components of 'rcpptimer' into the standalone project 'cpptimer'.

This talk addresses 'Rcpp' beginners. Basic knowledge of C++ classes is needed to follow every implementation aspect.

[1] https://cran.r-project.org/web/packages/rcpptimer/index.html
[2] https://github.com/BerriJ/cpptimer
[3] https://github.com/zdebruine/RcppClock

Speakers

Jonathan Berrisch

Thursday July 11, 2024 10:30 - 10:35 CEST
Attersee

Interfaces with other programming languages, Lightning Talk

10:30 CEST

Performance Testing and Comparative Benchmarking for data.table - Doris Afriyie Amoakohene, Northern Arizona University

The data.table package in R is a powerful tool for data analysis, combining efficient C code with user-friendly R syntax. To ensure its long-term sustainability, the NSF POSE program has funded a project from 2023 to 2025 to build a self-sustaining ecosystem around data.table.

In this presentation, we will discuss the importance of performance testing in the development of data.table and present a general approach that can be applied to other R packages. By creating performance tests based on historical regressions, we can measure the package's efficiency over time and memory usage, ensuring that code and version releases do not impact its performance. We will demonstrate the use of the atime package to benchmark execution time and memory usage, providing developers with confidence in maintaining efficient performance and reliability. This approach not only benefits data.table but also serves as a model for other R package developers to enhance the performance and popularity of their own projects.

Speakers

Doris Afriyie Amoakohene

Thursday July 11, 2024 10:30 - 10:35 CEST
Pongau + Flachgau

Open and reproducible science, Lightning Talk

10:35 CEST

tRialblazing – advantages of using R in large clinical trials - Piotr Starnawski, Novo Nordisk A/S

Pharmaceutical industry programming has for many years been characterized by "one programming language - take it or leave it". This is reflected in persistent use of established standard programs and closed source languages, due to their prevalence within the field.
However, the transition to open source is well underway and the advantages of using modern languages, such as R, are becoming more common and accepted. Programming of datasets for large clinical trials in R greatly benefits from using i) modern, scalable infrastructure; ii) large speed gains from parallelization paired with new file formats; iii) integrated version control, and iv) DevOps solutions, just to name a few advantages. The nature of open source itself enables tapping into community solutions, e.g. the pharmaverse packages, and, in return, contributing to them with internally developed code.
This presentation will outline the challenges we have faced while transitioning to R at Novo Nordisk, the expected and often unexpected gains resulting from that change, and the direction in which, in our opinion, clinical trial programming is headed.

Speakers

Piotr Starnawski

Thursday July 11, 2024 10:35 - 10:40 CEST
Pinzgau + Tennegau

Biostatistics + epidemiology + bioinformatics, Lightning Talk

10:35 CEST

Managing REDCap Data: The R package REDCapDM - João Carmezim, Germans Trias i Pujol Research Institute and Hospital (IGTP)

REDCap is a secure web application for creating and managing online surveys and databases. The aim of the R package “REDCapDM” is to process REDCap data and provide useful tools to perform all tasks involved in the data cleansing process prior to statistical analysis. The “REDCapDM” package is structured into four dimensions, each serving a specific purpose. First, it reads and processes raw data exported from REDCap or retrieved through a REDCap API connection in R. Second, it performs data transformation and data organization. Third, it identifies queries, specifically missing values, values outside the lower and upper limits of a variable, and other types of inconsistencies in REDCap data. Fourth, it automatically tracks which queries have already been resolved and which are pending resolution. This package fills a gap in the available tools to manage REDCap data, making it an invaluable asset to researchers. The “REDCapDM” package is available on CRAN (https://cran.r-project.org/web/packages/REDCapDM/index.html) and is regularly updated.

Speakers

João Carmezim

Thursday July 11, 2024 10:35 - 10:40 CEST
Attersee

R workflow + deployment + production, Lightning Talk

10:40 CEST

caRdoon – a task queue API for R - Jakob Gepp, statworx GmbH

In this talk, I will introduce caRdoon, a plumber API that provides local task management by enabling the asynchronous execution of arbitrary functions and offering a real-time view of job queues, inspired by celery. By utilizing the asynchronous setup, one can avoid waiting for long tasks to finish, but still be able to get information on when a task is scheduled in the current queue. The result of each function is stored in a database for later retrieval. This enables the user to run tasks and review the results on demand.

Speakers

Jakob Gepp

Senior Consultant Data Science, statworx GmbH

After my M.Sc Statistics in 2016, I began working at statworx. Here I started providing statistical support in R for companies and private customers. Over the years I got more into the data science aspect, but kept R close to my heart. I developed some internal R packages and last... Read More →

Thursday July 11, 2024 10:40 - 10:45 CEST
Attersee

R workflow + deployment + production, Lightning Talk

11:00 CEST

Break

Thursday July 11, 2024 11:00 - 11:30 CEST
TBA

Breaks + Special Events

11:30 CEST

R Evolution: The Retirement of R Packages with Many Reverse Dependencies - Edzer Pebesma, University of Muenster & Roger Bivand, Norwegian School of Economics

We report on a project in which three older R packages for spatial analysis (rgdal, for reading and writing vector and raster data and for coordinate transformation; rgeos, for geometric transformations and predicates; and maptools) were taken off CRAN on Oct 16, 2023 because their maintainer retired and more modern approaches (e.g., sf and terra) had superseded them. To avoid a very large number of R packages that depended on one or more of these packages, directly or indirectly, being removed from CRAN, we took a number of steps. In this talk we describe the steps we took to minimize lasting damage to other packages on CRAN, and report on the lessons learnt. Over the course of the project the number of at-risk packages decreased from more than 550 to fewer than 100. Of the 70 or so packages still on an at-risk watch-list, about half were actively archived by their maintainers as outdated. We will discuss some key takeaways and pieces of advice for developers considering software retirement, and propose a mechanism for deprecating packages with a deprecation date, which could for instance show up as a NOTE when checking packages that use them.

Speakers

Roger Bivand

Norwegian School of Economics

Retired

Edzer Pebesma

University of Muenster

I lead the spatio-temporal modelling laboratory at the institute for geoinformatics, and am deputy head of institute. I hold a PhD in geosciences, and am interested in spatial statistics, environmental modelling, geoinformatics and GI Science, semantic technology for spatial analysis... Read More →

Thursday July 11, 2024 11:30 - 11:50 CEST
Salzburg I

Community and outreach

11:30 CEST

Designing a Drop-in Replacement for Dplyr - Kirill Müller, cynkra GmbH

The dplyr package is a powerful tool for data manipulation in R. It provides a consistent grammar for manipulating data frames and is widely used by data scientists and analysts. However, dplyr requires that the entire dataset fit into memory, and can be slow for large datasets. The duckdb package is a new in-memory database that is designed to be blazing fast and efficient for analytical workloads. A relational frontend, modeled after Codd's relational algebra, is provided alongside an SQL interface. The new duckplyr package uses this relational frontend: unlike dbplyr, which translates dplyr commands into SQL, duckplyr translates dplyr commands into relational algebra. The package has been designed to be a fully compatible drop-in replacement for dplyr from day one. Operations are run in duckdb when possible, and fall back to dplyr when not. The project's goal is to speed up more and more dplyr verbs, R functions, and data types, towards becoming the primary implementation of the dplyr grammar of data manipulation. In this talk, I will present duckdb and duckplyr, and discuss the design of duckplyr and the supporting tools.

Speakers

Kirill Müller

Founding partner, cynkra GmbH

Kirill Müller has been working on the boundary between data and computer science for more than 25 years. He has been awarded five R consortium projects to improve database connectivity and performance in R. Kirill is a core contributor to several tidyverse packages, including dplyr... Read More →

Thursday July 11, 2024 11:30 - 11:50 CEST
Attersee

Efficient programming

11:30 CEST

Desert Island Docker: R Edition - Andrew Collier, Fathom Data

What 3 Docker images would you choose if you were shipwrecked on a desert island? Choosing the right images will determine whether you are rescued or end up in a cannibals' cooking pot (R images will make you unpalatable). Docker is an essential tool for survival as an R developer, regardless of whether you are stranded or not. In this talk I'll describe three R Docker images that I consider essential for survival on a desert island. I'll demonstrate how to set up and build a custom image. And finally I'll demonstrate how using a Docker image can simplify CI/CD and deployment.
- What is Docker?
- Food & Shelter: Base Image
- SOS Signal: Shiny Image
- Building a Raft: Custom Image (which includes RJava and uses renv)
- Applications
- CI/CD
- Deployment

Speakers

Andrew Collier

Dr, Fathom Data

Andrew is Lead Data Scientist at Fathom Data. He spends his days tinkering with R, Python and Docker.

Thursday July 11, 2024 11:30 - 11:50 CEST
Salzburg II

R workflow + deployment + production

11:30 CEST

Crafting Intuitive Spatial Select Fields with ReactJS, R, and Nivo Library - Anastasiia Kostiv, esqLABS GmbH

"Unleash the Power of Spatial Visualization: Exploring ReactR, Nivo, and NivoR" Join us for an electrifying session at UseR2024, where we'll dive into the dynamic world of spatial data visualization like never before! Delve into the cutting-edge capabilities of the ReactR package as we showcase its ability to create breathtaking UI experiences using React libraries. Get ready to be spellbound as we unveil the secrets of intuitive spatial data filtering and selection, leveraging the unparalleled features of Nivo widgets. But that's not all! Brace yourself for the unveiling of NivoR, a groundbreaking collaboration of Shiny, ReactR, React.js, and the Nivo framework. Witness the fusion of art and technology as we demonstrate how NivoR pushes the boundaries of possibility in data visualization, offering an immersive journey into the heart of interactive spatial exploration. Join us at UseR2024 for a session that promises to ignite your imagination, elevate your understanding, and leave you inspired by the endless possibilities of spatial visualization! Be ready to test the nivoR package during the talk!

Speakers

Anastasiia Kostiv

Senior Software Developer, esqLABS GmbH

Experienced Senior Software Engineer and Data Analyst with a diverse background in engineering, web development, and data science. Successfully implements advanced techniques in healthcare and data management. Creator of shinycalendar and nivoR packages. Passionate about addressing... Read More →

Thursday July 11, 2024 11:30 - 11:50 CEST
Wolfgangsee

Shiny + dashboards + web apps

11:30 CEST

Navigating the R Ecosystem Using R-Universe - Jeroen Ooms, rOpenSci

One of the hardest parts of effectively using R is finding the best packages for the problem you are trying to solve. This might even be as important as being fluent in the language itself. Building your code on reliable foundations is essential for good results, and difficult to change later on in a project. There are over 20,000 packages on CRAN and many more on other networks such as Bioconductor and GitHub. New packages are released every day. The quality and scope of packages vary, which can make it difficult to judge which tools are the best choice and to get a sense of the software landscape in general. R-universe [https://r-universe.dev] is an ambitious platform supported by rOpenSci and the R consortium to help you publish, discover, and start using R packages. In this talk we show different ways in which you can use the search engine, dashboards, and APIs to browse the R ecosystem. R-universe shows you everything there is to know about packages and their maintainers, prepares binaries for all platforms, and provides beautifully rendered documentation to help you get started immediately.

Speakers

Jeroen Ooms

Research software engineer, rOpenSci

Jeroen is staff member of rOpenSci, and maintains too many R packages.

Thursday July 11, 2024 11:30 - 11:50 CEST
Pongau + Flachgau

Shiny + dashboards + web apps

11:30 CEST

Squat: Statistics for Quaternions Over Time - Aymeric Stamm, CNRS

The study of rotational movement is of paramount importance in robotics as well as in health science. The statistical unit behind measurements of rotational motion is a sequence of 3-dimensional rotations that evolve over time. The goal of {squat} is to provide accessibility to extensions of common statistical methods for the analysis of rotation-valued time series and functional data. The package relies on the Quaternion class from the Eigen library accessed through the {RcppEigen} package. It provides dedicated classes for a single curve as well as a set of curves. Currently, it supports centring, standardisation, visualisation (powered by {ggplot2} and, optionally, {gganimate}), mean and median computation, random sampling, exponential and logarithmic maps to go back and forth from and to the tangent space respectively, smoothing, resampling, distance matrix computation, clustering methods (hierarchical, k-means and dbscan) and principal component analysis. Clustering and PCA also have their dedicated visualisation tools. The package has a dedicated website (https://lmjl-alea.github.io/squat/index.html) as well as a public GitHub repository (https://github.com/LMJL-Alea/squat/).
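The exponential and logarithmic maps mentioned above can be illustrated with a minimal sketch for unit quaternions: the logarithm sends a unit quaternion to a 3-vector in the tangent space at the identity, and the exponential sends it back. This is plain Python written for illustration; it is not the {squat} code (which relies on Eigen), and the function names and the (w, x, y, z) ordering are assumptions.

```python
import math

def quat_exp(u):
    """Map a tangent vector u = (x, y, z) to a unit quaternion (w, x, y, z)."""
    theta = math.sqrt(sum(c * c for c in u))
    if theta < 1e-12:
        return (1.0, 0.0, 0.0, 0.0)  # identity rotation
    s = math.sin(theta) / theta
    return (math.cos(theta), s * u[0], s * u[1], s * u[2])

def quat_log(q):
    """Map a unit quaternion (w, x, y, z) to its tangent vector at the identity."""
    w, x, y, z = q
    norm_v = math.sqrt(x * x + y * y + z * z)
    if norm_v < 1e-12:
        return (0.0, 0.0, 0.0)
    theta = math.atan2(norm_v, w)  # half of the rotation angle
    k = theta / norm_v
    return (k * x, k * y, k * z)

u = (0.1, -0.2, 0.3)
q = quat_exp(u)          # unit quaternion for a rotation of angle 2 * |u| about u / |u|
u_back = quat_log(q)     # recovers u as long as |u| < pi
print(q, u_back)
```

Working in the tangent space is what makes operations such as averaging or principal component analysis expressible with ordinary vector arithmetic before mapping the result back to a rotation.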

Speakers

Aymeric Stamm

Research Engineer, CNRS

I’m Aymeric (pronounced M-Rick). I am a research engineer specialised in statistical information. My theoretical research revolves around developing novel statistical methods for analysing complex data, such as manifold-valued data, network-valued data, topological data, connectome... Read More →

Thursday July 11, 2024 11:30 - 11:50 CEST
Pinzgau + Tennegau

Statistical modelling

11:50 CEST

Security and Scalability in Shiny with Httr2: Strategies for Efficient API Use - Alexandros Kouretsis, Appsilon

Join us for a deep dive into the role of Application Programming Interfaces (APIs) in modern web development, focusing on how they enable smooth communication between systems. We'll explore the latest stable release of httr2 and its impact on enhancing Shiny applications through improved data retrieval, error handling, and working with asynchronous operations. This session will cover practical aspects of HTTP communication in Shiny, response handling, and header manipulation, utilizing httr2's features for more efficient HTTP requests and advanced error management. We'll also discuss the importance of implementing asynchronous API communication in Shiny to optimize performance and user experience with data-intensive applications. Aimed at both new and experienced Shiny developers, this presentation offers insights and techniques to improve your projects' security, stability, and performance.

Speakers

Alexandros Kouretsis

Dr., Appsilon

Alexandros Kouretsis comes from the field of Astrophysics with a solid understanding of applied statistics, probability, and machine learning. Currently serving as an R/Shiny developer at Appsilon, Alexandros leverages his expertise to create interactive and visually compelling applications... Read More →

Thursday July 11, 2024 11:50 - 12:10 CEST
Attersee

Efficient programming

11:50 CEST

Fifteen Years of the R Journal - Mark van der Loo, Statistics Netherlands

The first issue of the R Journal was published in June 2009. Run by volunteers from academia, government and industry, the journal has grown into an increasingly popular outlet for scientific research on anything related to R. At the time of writing the Journal has an impact factor of 1.673. In this talk I will look back at the origins and history of The R Journal. I will look back on the people involved and the formal organisation of the journal, including associate editors, editors, and the advisory board. We will take a detailed look at the current editorial process, and the production of issues in HTML and PDF format will be explained. This will yield extensive tips and tricks that help aspiring authors to get their submissions processed quickly. Finally, we will look into the future developments of the R Journal.

Speakers

Mark van der Loo

Senior Researcher, Statistics Netherlands

Mark is a Senior Researcher at Statistics Netherlands and a Research Fellow at the Leiden Institute for Advanced Computer Science at the University of Leiden. Mark published his first package in 2009 and has since co-authored about 20 R packages, a book on statistical data cleaning... Read More →

Thursday July 11, 2024 11:50 - 12:10 CEST
Salzburg I

Open and reproducible science

11:50 CEST

Flowchart: An R Package for Creating Participant Flow Diagrams Integrated with Tidyverse - Pau Satorra, Germans Trias i Pujol Research Institute and Hospital (IGTP)

The presentation will be a brief tutorial about a newly released R package on CRAN called {flowchart} to create participant flow diagrams directly from a dataframe (https://cran.r-project.org/web/packages/flowchart/index.html). In health research, a patient flowchart is the best way to show the flow of participants in a study when reporting results as stated by the CONSORT guideline (https://www.bmj.com/content/340/bmj.c332.long). There are several packages in R for drawing flowcharts using different approaches but generally the programming is quite complex and the numbers need to be manually entered or parameterized beforehand. This new package uses a different approach integrated into the tidyverse framework. It allows you to create many different types of flowcharts in an easy and much more reproducible way because it automatically adapts to the data. This means we don’t have to manually set the flowchart parameters, such as the box coordinates or the numbers to display. The main idea behind the package is to create flowcharts from an initial dataset by combining different basic functions with the pipe operator (|> or %>%).

Speakers

Pau Satorra

Mr, Germans Trias i Pujol Research Institute and Hospital (IGTP)

I'm a biostatistician with a background in mathematics and 4 years of experience in clinical research analysis. In 2019 I graduated in Mathematics from the University of Barcelona (UB). In 2023, I graduated from the Master in Fundamental Principles of Data Science at UB. From 2019... Read More →

Thursday July 11, 2024 11:50 - 12:10 CEST
Salzburg II

R workflow + deployment + production

11:50 CEST

How I Built an API for My Life (and How You Can Too) - Deepansh Khurana, Appsilon

Last year, I built two apps for myself since all the existing apps were either too bloated or privacy nightmares: Ebenezer (finances) and Livingston (travel). A pattern emerged: bespoke apps named after literary characters, which were now the mainstays of my browser window. I wondered: what if there was a dashboard for my life? My net worth from Ebenezer. My upcoming trips from Livingston. And possibly more literary-character-themed apps? With a stack of R, AWS DynamoDB, S3, EC2, Plumber, ShinyProxy, Google Sheets and more, I built an infinitely extensible framework for my life: Hrafnagud (the all-seeing Odin). In this talk, I aim to share my journey, hacks and learnings and inspire and empower you to build things for yourself, too!

Speakers

Deepansh Khurana

R/Shiny Developer, Appsilon

Deepansh is a data enthusiast proficient in both R and Python, with a penchant for exploratory analysis. He enjoys the open-ended nature of data analysis and building applications on all sorts of topics in general. He lives for the coincidental insights and "aha!" moments. When not... Read More →

Thursday July 11, 2024 11:50 - 12:10 CEST
Pongau + Flachgau

Shiny + dashboards + web apps

11:50 CEST

Template for Engaging Quiz with GenAI Response - Lynna Jirpongopas, Advanced Micro Devices Inc.

Let R Shiny and shinysurveys do the heavy lifting of creating a web application to test generative AI use cases or concepts. The use case we are exploring here is a Career Archetype Quiz. The quiz aims to provide career insights and advice to the app user. This session provides a template for creating a quiz while leveraging the capabilities of AI-generated responses. Ultimately this quiz template can be used for other types of topics and personalized responses.

Speakers

Lynna Jirpongopas

Data Scientist, Advanced Micro Devices Inc.

Lynna has a comprehensive background in data science within high-tech environments. She's a math graduate from the University of California, San Diego. Recently, she has expanded her expertise by completing a Professional Certificate in Artificial Intelligence from Stanford University... Read More →

Thursday July 11, 2024 11:50 - 12:10 CEST
Wolfgangsee

Shiny + dashboards + web apps

11:50 CEST

Introducing the 'Gasmodel' Package for Generalized Autoregressive Score Models - Vladimír Holý, Prague University of Economics and Business

I present the 'gasmodel' package, designed to facilitate the estimation, forecasting, and simulation of a broad range of generalized autoregressive score (GAS) models. GAS models are a class of observation-driven time series models that employ the score to dynamically update time-varying parameters of the underlying probability distribution. The package supports diverse data types, offers a rich selection of distributions, provides flexible options for specifying dynamics, and allows for the incorporation of exogenous variables.
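The score-driven updating described above can be sketched for the simplest case: Gaussian observations with a time-varying mean and a fixed variance, where the score is scaled by the inverse Fisher information so that the scaled score reduces to the prediction error y_t - f_t. The sketch is in Python and purely illustrative; the function name and the parameters omega, alpha, phi and f0 are assumptions and do not reflect the gasmodel API.

```python
def gas_gaussian_mean(y, omega, alpha, phi, f0):
    """Filter a time-varying mean with a first-order score-driven recursion.

    f[t+1] = omega + alpha * s[t] + phi * f[t], where s[t] = y[t] - f[t]
    is the score of the Gaussian log-density with respect to the mean,
    scaled by the inverse Fisher information.
    """
    f = [f0]
    for obs in y:
        scaled_score = obs - f[-1]
        f.append(omega + alpha * scaled_score + phi * f[-1])
    return f

# Made-up observations and parameter values, for illustration only.
path = gas_gaussian_mean([1.0, 2.0], omega=0.0, alpha=0.5, phi=1.0, f0=0.0)
print(path)
```

With other distributions or other time-varying parameters the recursion keeps the same shape and only the score term changes, which is what makes the model class so broad.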

Speakers

Vladimír Holý

Dr., Prague University of Economics and Business

Vladimír Holý is an assistant professor at the Prague University of Economics and Business. His area of expertise is time series analysis.

Thursday July 11, 2024 11:50 - 12:10 CEST
Pinzgau + Tennegau

Statistical modelling

12:10 CEST

Translate R for Global Reach - Binod Jung Bogati, Numeric Mind

Do you use R and want to help extend its global reach? Our talk on translating R is just for you! Translation involves translating R's messages, warnings, and errors from English into other languages, making it accessible to a global audience. Support for translation has been part of R since 2005, but it relies heavily on community contributions to provide the necessary translations and help keep up-to-date with changes in R. In this talk, we'll begin by providing an overview of recent efforts to facilitate community contribution, celebrating the achievements of translation teams at community events like R Project Sprint 2023 in coordination with the R Contribution Working Group and R Core Team. We'll then dive into the practical aspects of contributing to R's translations (via Weblate), including explaining how R's messages are structured and tips for translating technical terms. In addition, we'll offer valuable tips and tricks to streamline the translation process and our communication with the translation community. Join us on this exciting journey to make R accessible to all and discover how you can be a part of this global endeavor.

Speakers

Binod Jung Bogati

Associate Manager - Data Science, Numeric Mind

Thursday July 11, 2024 12:10 - 12:30 CEST
Salzburg I

Community and outreach

12:10 CEST

Seven Deadly Sins Holding You Back as a Software Developer - Pedro Silva,

In the dynamic world of R development, traditional software engineering practices offer invaluable insights and strategies for growth. Our presentation invites attendees to explore this symbiosis, offering tailored insights and practical tips for the R ecosystem. We begin by emphasizing the significance of coding standards, demonstrating how they bolster code readability and maintainability, laying a robust foundation for sustainable R projects. Drawing parallels to programming design principles and patterns, we showcase their adaptability to R's unique challenges, empowering developers to craft scalable, efficient solutions. Additionally, we underscore the importance of clear naming and rigorous testing in R programming, pivotal for fostering code clarity, reliability, and effective teamwork. Lastly, we delve into the art of project estimation, essential for efficient planning and stakeholder satisfaction in R development. By embracing these traditional software practices, R developers can unlock pathways for growth and excellence, elevating their coding skills to new heights.

Speakers

Pedro Silva

Pedro has worked as a software developer for over 15 years, focusing on R and Shiny at an enterprise level for the last 5. He has worked in many technologies over the years, and has an extensive background in backend, frontend, and developing both websites and applications. In his... Read More →

Thursday July 11, 2024 12:10 - 12:30 CEST
Attersee

Efficient programming

12:10 CEST

Improving Development Tooling with an R Grammar for Tree-Sitter - Davis Vaughan, Posit

tree-sitter is an efficient incremental parsing library that builds concrete syntax trees from source files, and is fast enough to update those trees on every keystroke. A syntax tree is a powerful tool that can serve as the basis for many IDE features, such as goto definition, syntax highlighting, code diagnostics, and code formatting. One of tree-sitter's biggest selling points is that it is general enough to parse any programming language through language specific "grammars". In this talk, we'll discuss the R grammar that we've built for tree-sitter, along with a companion R package that exposes bindings for tree-sitter itself. We'll look at how the grammar can be utilized in IDEs and R packages to empower developers with tooling that aids in writing, reading, and debugging their code, and how the companion package allows you to parse code for any language directly from the comfort of R.

Speakers

Davis Vaughan

Senior Software Engineer, Posit

Davis Vaughan is a software engineer at Posit focused on improving tooling in the tidyverse. He's one of the maintainers of core tidyverse packages such as dplyr and tidyr, along with lower level infrastructure packages like vctrs, slider, and clock.

Thursday July 11, 2024 12:10 - 12:30 CEST
Pongau + Flachgau

Interfaces with other programming languages

12:10 CEST

Optimising Your Git WorkFlow - Colin Gillespie, https://jumpingrivers.com/

Everyone(?!) uses git in their day-to-day R workflow. Very soon, pushing, pulling, cloning and forking become second nature. But once you’ve mastered the basics, what next? This talk discusses the next steps in using Git. Wouldn’t it be nice if our code was automatically formatted? Errors in our packages flagged? Packages deployed to a remote CRAN-like repository? Well, have you considered GitHub Actions? Have you started working with other data scientists? How should you set up your repo to ensure a smooth workflow? How should merge requests be handled? How do you best utilise Git issues? Is your code sensitive? Should you set up GPG keys for commits? How should you ensure your API keys remain hidden? This talk aims to point useRs in the right direction for a friction-free R workflow.

Speakers

Colin Gillespie

CTO, Jumping Rivers

Colin is a Senior Statistics lecturer at Newcastle University and is a co-founder & CTO of Jumping Rivers. He has used R for over twenty years and has been teaching R for the past fifteen years. He co-authored the O’Reilly book on Efficient R Programming.

Thursday July 11, 2024 12:10 - 12:30 CEST
Salzburg II

R workflow + deployment + production

12:10 CEST

CRAS: Cybersecurity Risk Analysis and Simulation Shiny App - Emilio L. Cano, Rey Juan Carlos University

Risk analysis and management rely on sound statistical methods and Monte Carlo simulation. Performing and reporting quantitative risk analysis has become crucial in cybersecurity, which affects all sectors, including finance. In fact, cybersecurity risks are included within the “operational risks” in the finance sector. The FAIR methodology has become a standard for cybersecurity risk analysis, using PERT and triangular distributions to simulate loss based on expert input. However, other distributions and methods can be used for analysing risks, e.g., lognormal. In this work, we present a Shiny app for simulating cybersecurity losses, allowing the user to choose whether to use the FAIR methodology or modifications to it, such as different probability distributions or modifications to the FAIR ontology. Statistical analyses of the simulation results are shown by means of interactive tables and plots. A Quarto report can be generated automatically. The app is also useful for teaching risk analysis in degree courses on cybersecurity. Future work includes adding more probability distributions, such as extreme value distributions, and publishing the app on CRAN as a contributed package.

Speakers

Emilio L. Cano

Associate Professor, Rey Juan Carlos University

I’m a passionate Data Scientist, Statistician, enthusiast of the R statistical software and programming language. I am the President of the “Comunidad R Hispano” Association (Spanish R Users) and the author of the SixSigma package at CRAN. I serve as Associate Professor at Rey...

Thursday July 11, 2024 12:10 - 12:30 CEST
Wolfgangsee

Shiny + dashboards + web apps

12:10 CEST

Recalibration of Gaussian Neural Network Regression Models: The RecalibratiNN Package - Carolina Musso, Instituto de Pesquisa e Estatística do Distrito Federal

Machine learning has significantly enhanced prediction performance; however, the estimation of uncertainty in these predictions is still a challenge. This issue is particularly pronounced in Artificial Neural Networks (ANNs), where predictions often suffer from poor calibration. Although some methods are available for recalibration, choosing and implementing the appropriate one can be challenging. To address this issue, we introduce the R package recalibratiNN, which provides a computational implementation of a quantile-based post-processing technique for recalibration. The current version of the package includes functions specifically designed for recalibrating Gaussian models (i.e., where the ANN was trained with the Mean Squared Error (MSE) loss function). The method can be applied at any representation layer of the network. The package is based on the technique presented in the recent study "Model-Free Recalibration of Neural Networks" (https://arxiv.org/abs/2403.05756) by the co-authors Ricardo Torres, Gabriel Reis and Guilherme Rodrigues, among other authors. It leverages information from cumulative probabilities, enabling the generation of Monte Carlo samples from the recalibrated predictive distribution and facilitating both local and global recalibration efforts. The recalibratiNN package also features diagnostic functions to help visualize miscalibration issues. It is readily available on both GitHub (https://github.com/cmusso86/recalibratiNN) and CRAN (https://cran.r-project.org/web/packages/recalibratiNN/).

Speakers

Carolina Musso

PhD, Instituto de Pesquisa e Estatística do Distrito Federal

Biologist and Statistician, with a PhD in Ecology and a specialization in Data Science. Working as a data analyst in public service for the past seven years.

Thursday July 11, 2024 12:10 - 12:30 CEST
Pinzgau + Tennegau

Statistical modelling

12:30 CEST

Using R to Co-Create an Inclusive Data Analysis Approach with the HBCU Health Equity Data Consortium - Lois Adler-Johnson, North Carolina Institute for Public Health

From February 2023 to February 2024, the Historically Black Colleges and Universities (HBCU) Health Equity Data Consortium (HEDC) in North Carolina (NC) deployed the COVID-19 Impact Survey to address critical data gaps on the pandemic’s impact on households across NC. To provide capacity building support for wrangling, analyzing, and visualizing survey results, the NC Institute for Public Health (NCIPH) formed a Data Analysis Workgroup composed of faculty and students from all 10 universities within the HBCU HEDC. Workgroup members had a variety of preferred, mostly licensed programming languages; NCIPH selected R as the primary language as it was free and accessible. NCIPH led R trainings, compiled relevant R resources, and developed shared code for transforming raw results, descriptive statistics, and univariate regression. The group used R Markdown, Quarto, and Shiny to report results, ultimately using the output as a basis for exploratory analyses and dissemination of findings to NC communities. The facilitation of a Data Analysis Workgroup and use of free, open-source R packages and outputs can serve as an engaging framework to bolster data science education and autonomy.

Speakers

Lois Adler-Johnson

Public Health Data Scientist, North Carolina Institute for Public Health

Lois Adler-Johnson is a Data Scientist at the North Carolina Institute for Public Health who's passionate about sharing and applying her quantitative data analysis and programming skills in ways that address racial and health inequities across North Carolina. Lois has an academic...

Thursday July 11, 2024 12:30 - 12:50 CEST
Salzburg I

Data science education

12:30 CEST

Serving R with Ark: A New Jupyter Kernel for R - Lionel Henry, Posit

Over the last decade, the landscape of integrated development environments (IDEs) has changed drastically, with multi-lingual support propelled by language servers implementing protocols such as the Language Server Protocol (LSP), the Debug Adapter Protocol (DAP), and the Jupyter protocol. This allows IDEs and notebooks to cheaply implement support for an arbitrary number of languages just by conforming to the protocol. At Posit we are fully embracing this approach of sharing functionality, and we have developed the Ark Jupyter kernel with the aim of providing first-class interactive development for R in a portable way. The kernel includes language and debugging servers for modern completions, code navigation, linting, refactoring, and more. This talk will provide an overview of the features we have implemented in Ark and how they can be used portably across IDEs.

Speakers

Lionel Henry

Software engineer, Posit

I'm a software developer at Posit, initially focused on low-level packages for the tidyverse, and now working on development tools for R.

Thursday July 11, 2024 12:30 - 12:50 CEST
Pongau + Flachgau

Interfaces with other programming languages

12:30 CEST

Split-Apply-Combine with Dynamic Grouping - Mark van der Loo, Statistics Netherlands

Group-wise aggregation is one of the most common operations in data analyses. There are use cases where the grouping is determined dynamically by collapsing smaller subsets into larger ones, to ensure sufficient support for the target aggregate. Examples include cases where some of the target groups suffer from missing data, or cases where the quality of target group data is judged to be too low. Often, hierarchical classifications serve as a basis for forming larger groups, but custom 'collapsing schemes' are in use as well. In this presentation we demonstrate the R package 'accumulate' [1] that offers interfaces for defining grouped aggregation, where the grouping may be dynamically determined, based on user-defined aggregations, user-defined decision rules, and user-defined collapsing schemes. The package offers several ways to define collapsing schemes, including tabular definitions that can be maintained separately from the aggregation code. It also includes facilities for using hierarchical classifications and for testing the (possibly complex) decision rules that users can create. [1] https://cran.r-project.org/package=accumulate

Speakers

Mark van der Loo

Senior Researcher, Statistics Netherlands

Thursday July 11, 2024 12:30 - 12:50 CEST
Attersee

Numerical methods

12:30 CEST

Building Large-Scale Simulation Pipelines Using Targets, Git and GitHub Actions - Sergio Olmos, Sanofi

Innovative clinical trial designs typically involve advanced statistical methods and extensive simulations. Building these complex simulation pipelines introduces challenges in reproducibility and transparency not easily addressed by traditional development workflows. In this session we will present how to use the targets R package, a Make-like pipeline tool, to develop efficient and reproducible simulation pipelines for innovative clinical trial designs. We will then show how Git and GitHub Actions can be used to deploy these large-scale simulation pipelines to cloud computing instances/clusters. The combination of these tools results in a robust and efficient workflow, enhancing the reproducibility of complex simulation pipelines. We will provide a detailed walkthrough of our approach, complete with practical examples and best practices, making it a valuable resource for statisticians and research software engineers working on innovative clinical trial designs and beyond.

Speakers

Sergio Olmos

Statistician, Sanofi

Sergio Olmos is a statistician at Sanofi working on the implementation of innovative clinical trial designs within the Statistical Innovation Hub. He is an experienced R developer with experience building reproducible analytical pipelines and creating R packages using software engineering...

Thursday July 11, 2024 12:30 - 12:50 CEST
Salzburg II

R workflow + deployment + production

12:30 CEST

Parts Beyond Code: Crafting Sensible Statistician-Led Automation with Shiny in Pharma - Gregory Chen, MSD

The R Shiny framework empowers statisticians to create interactive apps without straying far from a typical R curriculum. This technique can significantly enhance internal processes across the pharmaceutical industry, and potentially in other sectors, by automating tasks to boost efficiency and productivity. However, the journey from concept to integration of Shiny-based tools into business workflows involves much more than coding. Critical aspects such as user experience and seamless integration with existing processes are often underestimated but essential for the successful deployment of these tools. In this presentation, we delve into the nuances of creating value-added Shiny apps that transcend basic functionality to become integral components of business operations. By examining a recent project of ours, an automation spanning from statistical analysis planning to reporting, we highlight the pivotal elements outside of coding. These lessons and reflections are organized around four key topics (design thinking on the product and user experience, collaboration mode in a product team, verification/validation, change management) and across the different phases of the product lifecycle.

Speakers

Gregory Chen

Principal Statistician, MSD

Gregory Chen, with a PhD in Statistics and 13+ years in the pharmaceutical industry, currently works as a Principal Statistician in the space of health technology assessment (HTA) at MSD, based in Switzerland. His past work spans manufacturing, quality control, and clinical...

Thursday July 11, 2024 12:30 - 12:50 CEST
Wolfgangsee

Shiny + dashboards + web apps

12:30 CEST

Neural Network-Based Text Classification for International Standardized Codes Using R - Nina Niederhametner, Statistics Austria & Johannes Gussenbauer, Statistics Austria

International standard classifications such as ISCO (for Occupation), ISCED (for Education) and COICOP (for Consumption) serve as pivotal statistical frameworks for the organization and classification of information. In official statistical practices, adherence to these codes is essential for thorough analysis and comparison of findings. Survey respondents typically provide information in an unstructured free-text format, requiring subsequent assignment to standardized codes. This process is often done manually, which is time-consuming and laborious. In our talk, we propose an approach that automates the classification of textual data into various standardized codes using simple mathematical techniques combined with neural network-based language models, utilizing the R interfaces to TensorFlow and Keras. Additionally, we illustrate the development of application programming interfaces (APIs) using plumber, and the deployment of our models through Posit Connect, establishing accessibility to a broad user base.

Speakers

Johannes Gussenbauer

Methodologist, Statistics Austria

Nina Niederhametner

Methodologist, Statistics Austria

Thursday July 11, 2024 12:30 - 12:50 CEST
Pinzgau + Tennegau

Text data and NLP

12:50 CEST

Lunch

Thursday July 11, 2024 12:50 - 14:00 CEST
TBD

Breaks + Special Events

14:00 CEST

Keynote: Awards Ceremony

Thursday July 11, 2024 14:00 - 14:15 CEST
Salzburg I + II

Keynote Sessions

Level Any

14:15 CEST

Keynote: Hilary Parker, Stoneware Data

Speakers

Dr. Hilary Parker

Independent Consultant, Stoneware Data

Dr. Hilary Parker is an independent consultant and coach based in San Francisco, often referred to as the heart of “Cerebral Valley” for its rich concentration of tech and AI research. With a career that includes pivotal roles at Stitch Fix, Etsy, and the 2020 Biden for President...

Thursday July 11, 2024 14:15 - 15:15 CEST
Salzburg I + II

Keynote Sessions

Level Any

15:15 CEST

Keynote: Closing Remarks

Thursday July 11, 2024 15:15 - 15:30 CEST
Salzburg I + II

Keynote Sessions

Level Any