useR! 2024
In Person
8 - 11 July, 2024

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for useR! 2024 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: This schedule is automatically displayed in Central European Summer Time (UTC+02:00). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date." The schedule is subject to change.

IMPORTANT NOTE: Timing of sessions and room locations are subject to change.

The virtual program will take place on 2 July. Please see the virtual schedule page for more information.
Wolfgangsee
Monday, July 8
 

09:00 CEST

Tutorial: Streamlining R Package Development with GitHub Actions Workflows - Daphne Grasselly & Pawel Rucki, Roche; Dinakar Kulkarni, Genentech [Pre-Registration Required]
GitHub Actions provide an automated workflow for continuous integration and deployment, enhancing collaboration and code quality. This tutorial aims to demystify GitHub Actions, offering insights into their fundamentals and guiding participants through the process of crafting reusable actions tailored for R package development. The tutorial begins with an overview of GitHub Actions, elucidating their role in automating software workflows and boosting productivity in the R programming ecosystem. Attendees will gain a comprehensive understanding of the basics, including syntax, triggers, and workflow components, paving the way for seamless integration into their development pipelines. Building on this foundation, the tutorial delves into the creation of reusable actions, emphasizing best practices for designing modular, versatile components. The tutorial also showcases the benefits of running both development and CI/CD workflows in a common Docker container environment to guarantee reproducibility. Participants will learn how to encapsulate common tasks and share them across different projects, fostering a culture of code reuse within the R community.
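For orientation (not part of the tutorial materials), R package repositories are commonly wired up with GitHub Actions using the workflow templates from r-lib/actions, e.g. via the usethis helpers:

  # Illustrative only: add standard r-lib/actions workflow templates with {usethis};
  # the tutorial goes beyond these starting points to craft reusable custom actions.
  install.packages("usethis")
  usethis::use_github_action("check-standard")  # R CMD check on Linux, macOS and Windows
  usethis::use_github_action("test-coverage")   # run covr and upload coverage
  usethis::use_github_action("pkgdown")         # build and deploy the pkgdown site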

Pre-requisites:
1. Create a GitHub account.
2. (Optional) Have Git and SSH installed on your computer, and have an SSH key ready.

Registration:
To add this tutorial to your registration, log in to your existing registration, click the Modify Registration button, and navigate to the Reg Options page (page 4). Select the tutorial you want to attend.




Speakers
Franciszek Walkowiak

Senior IT Professional, Roche
DevOps engineer with 4 years of experience in the pharmaceutical industry. I have worked with Amazon Web Services, Google Cloud Platform, and infrastructure-as-code practices. Currently, I support teams of R software developers by employing DevOps practices and tools such as GitLab...
Daphne Grasselly

Senior Data Scientist, Roche
I am currently working at Roche as a Senior Data Scientist. My main focus is on enhancing automation workflows for efficient package delivery, particularly in the realm of R development within the pharmaceutical industry. I am passionate about optimizing processes and improving code...
Pawel Rucki

Ms, Roche
Pawel graduated in 2015 from the University of Warsaw in Econometrics and Quantitative Economics. Working with R for almost 10 years now, Pawel has applied it in the fields of geospatial data analysis, credit risk assessment, financial provisions calculation and clinical trial data analysis...


Monday July 8, 2024 09:00 - 12:30 CEST
Wolfgangsee

14:00 CEST

Tutorial: Tidy Time Series Analysis and Forecasting - Mitchell O'Hara-Wild, Nectric [Pre-Registration Required]
Organisations of all types collect vast amounts of time series data, and there is a growing need for time series analytics to understand how things change in our fast-moving world. This tutorial provides a practical introduction to time series analytics and forecasting using R, utilising the tidyverse and tidy time series tools to enable analysis across many time series. Attendees will learn about commonly seen time series patterns, and how to find them with specialised time series graphics created with ggplot2. Then we will use fable to capture these patterns with statistical time series models, and produce probabilistic forecasts. Finally, participants will gain insights into evaluating model performance, ensuring the accuracy and reliability of their forecasts. Through a combination of foundational concepts and practical demonstrations, this tutorial equips participants with the skills to extract meaningful insights from time series data for informed decision-making in various domains.
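As a flavour of the workflow described above (an editorial illustration using the quarterly aus_production data from tsibbledata, not taken from the tutorial materials):

  # Minimal tidy time-series sketch: fit models and produce probabilistic forecasts
  library(fable)      # models and forecasting (attaches fabletools)
  library(tsibble)    # tidy temporal data frames
  library(feasts)     # time series graphics and decompositions

  tsibbledata::aus_production |>
    model(ets = ETS(Beer), arima = ARIMA(Beer)) |>  # capture patterns with models
    forecast(h = "2 years") |>                      # probabilistic forecasts
    autoplot(tsibbledata::aus_production)           # visualise forecasts with ggplot2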

Please carefully read through the information at this link to prepare for the tutorial: https://workshop.nectric.com.au/user2024

Registration:
To add this tutorial to your registration, log in to your existing registration, click the Modify Registration button, and navigate to the Reg Options page (page 4). Select the tutorial you want to attend.

Speakers
Mitchell O'Hara-Wild

Data Scientist, Nectric
Mitchell O’Hara-Wild (he/him) is a PhD candidate at Monash University, creating new techniques and tools for forecasting large collections of time series with Rob Hyndman and George Athanasopoulos. He is the lead developer of the tidy time-series forecasting tools fable and feasts...


Monday July 8, 2024 14:00 - 17:30 CEST
Wolfgangsee
 
Tuesday, July 9
 

11:00 CEST

{mmrm}: A Robust and Comprehensive R Package for Implementing Mixed Models for Repeated Measures - Daniel Sabanés Bové, RCONIS
Mixed models for repeated measures (MMRM) analysis has been extensively used to analyze longitudinal datasets. SAS has been the gold standard for this analysis in the past, and so far R packages fall short for one of the following reasons: model convergence issues, unavailability of covariance structures or adjusted degrees of freedom, or numerical results being far from SAS. To fill in this important gap in the open-source statistical software landscape, a cross-company workstream of openstatsware.org has developed the new {mmrm} R package. A critical advantage of {mmrm} over existing implementations is that it is faster and converges more reliably. It also provides a comprehensive set of features: users can specify a variety of covariance matrices, weight observations, fit models with restricted or standard maximum likelihood inference, perform hypothesis testing with Satterthwaite or Kenward-Roger adjusted degrees of freedom, extract the least square means estimates using the emmeans package, and use tidymodels for easy model fitting. We introduce the modeling framework, the implementation strategy and discuss open source collaboration as a critical ingredient to success.
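A basic fit, following the interface documented in the package (illustrative; fev_data is the example dataset shipped with {mmrm}):

  # Sketch: unstructured covariance, Satterthwaite degrees of freedom
  library(mmrm)
  fit <- mmrm(
    formula = FEV1 ~ RACE + ARMCD * AVISIT + us(AVISIT | USUBJID),
    data = fev_data,
    method = "Satterthwaite"
  )
  summary(fit)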

Speakers
Daniel Sabanés Bové

Ph.D., RCONIS
Daniel Sabanés Bové studied statistics and obtained his PhD in 2013. He started his career with 5 years in Roche as a biostatistician, then worked 2 years at Google as a Data Scientist, before rejoining Roche in 2020, where he founded and led the Statistical Engineering team. Daniel...


Tuesday July 9, 2024 11:00 - 11:20 CEST
Wolfgangsee

11:20 CEST

Statistical Computing with Vectorised Distributions - Mitchell O'Hara-Wild, Nectric
The uncertainty of model outputs is often absent or hidden in R, and tools for interacting with distributions are limited. For example, most prediction methods in R only produce point predictions by default. Although it is possible to obtain other parameters and form the complete distribution, additional knowledge about the distribution's shape and properties is needed. The distributional package vastly simplifies creating and interacting with distributions in R. The package provides vectorised distributions and supports the calculation of various statistics without needing to use shape-specific d*/p*/q*/r* functions. Statistics can be easily calculated for distributions in the same vector, regardless of shape. Manipulating distributions is also supported, including applying transformations, inflating values, truncating, and creating mixtures of distributions. When vectors are stored as data frame columns, these operations integrate seamlessly with tidyverse workflows. Distributions can also be visualised with ggplot2 using the ggdist extension package, which offers many graphical representations of uncertainty.
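A short sketch of the vectorised interface (illustrative only):

  # One distribution vector, mixed shapes, shape-agnostic statistics
  library(distributional)
  d <- c(dist_normal(mu = 0, sigma = 1), dist_poisson(lambda = 3))
  mean(d)                                           # means of both distributions
  variance(d)
  quantile(d, 0.975)                                # same call regardless of shape
  cdf(d, q = 2)
  dist_truncated(d, lower = 0)                      # manipulation: truncation
  dist_mixture(d[1], d[2], weights = c(0.5, 0.5))   # mixtures of distributions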

Speakers
Mitchell O'Hara-Wild

Data Scientist, Nectric
Mitchell O’Hara-Wild (he/him) is a PhD candidate at Monash University, creating new techniques and tools for forecasting large collections of time series with Rob Hyndman and George Athanasopoulos. He is the lead developer of the tidy time-series forecasting tools fable and feasts...


Tuesday July 9, 2024 11:20 - 11:40 CEST
Wolfgangsee

11:40 CEST

Bayesian Modeling of Panel Data with the R Package dynamite - Santtu Tikka, University of Jyväskylä
In this talk, I will present dynamite: an R package for Bayesian inference on intensive panel data comprising multiple measurements on multiple individuals over time. The package supports joint modeling of multiple response variables, time-varying and time-invariant effects, a wide range of discrete and continuous distributions, group-specific random effects, and latent factors via the dynamic multivariate panel model (DMPM) framework. Models in the package are defined via a user-friendly formula interface, and estimation of the posterior distribution of the model parameters takes advantage of state-of-the-art Markov chain Monte Carlo methods. The package enables efficient computation of both individual-level and summarized predictions and offers a comprehensive suite of tools for visualization and model diagnostics. I will demonstrate how the package can be used to estimate long-term causal effects of interventions with proper accounting for the uncertainties in these estimates.
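Schematically, the formula interface looks roughly like this (an editorial sketch; the data, variable names and sampler options are placeholders, so check the package documentation for the exact arguments):

  # Rough sketch of a dynamite model definition (placeholder data and names)
  library(dynamite)
  fit <- dynamite(
    dformula = obs(y ~ x, family = "gaussian"),  # one response channel
    data = d, time = "time", group = "id",       # long-format panel data
    chains = 4                                   # passed on to the MCMC backend
  )
  predict(fit)                                   # individual-level predictions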

Speakers
Santtu Tikka

Senior Lecturer, University of Jyväskylä
Santtu Tikka is a senior lecturer at the Department of Mathematics and Statistics, University of Jyväskylä. His research focuses on causal inference and Bayesian modeling. He is also the author of several R packages such as causaleffect, dosearch and dynamite.


Tuesday July 9, 2024 11:40 - 12:00 CEST
Wolfgangsee

12:00 CEST

Regression for Compositions: Logit Models Versus Log-Ratio Transformation of Data - David Firth, University of Warwick
Compositional data analysis is growing rapidly in modern application areas, e.g. microbiome analysis, time-use studies and archaeometry to name a few. The dominant statistical methods use the work of J. Aitchison from the 1980s, where log-ratio transformation of data (compositional measurements) was developed as the key to using standard multivariate methods with compositional data. I outline some difficulties with log-ratios, and show that the assumptions underpinning log-ratios actually lead to a simple variance-covariance function that is readily used to build appropriate GLMs with logit link. The theory is in [1], but emphasis here will be on practice, and working in R. A key focus will be examples from the in-development R package _compos_, which implements the new approach in the style of standard `lm` or `glm` methods. The new `colm` model class in _compos_ not only mimics R's `lm` and `glm` classes, it also neatly uses `lm` internally in a robust fitting algorithm (similar to the "Poisson trick", used for fitting multinomials with `glm`). The package is under open-source development [2] for CRAN submission by June 2024. [1] arxiv.org/abs/2312.10548 [2] github.com/DavidFirth/compos
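Since _compos_ is still in development, its final interface may differ; purely as a hypothetical illustration of the advertised lm/glm-style usage (function and argument names below are not confirmed):

  # Hypothetical sketch only -- mirrors the glm-style interface described above;
  # actual function and argument names may differ in the released package.
  # remotes::install_github("DavidFirth/compos")
  library(compos)
  fit <- colm(cbind(comp1, comp2, comp3) ~ covariate, data = mydata)  # names illustrative
  summary(fit)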

Speakers
David Firth

Professor, University of Warwick
An academic statistician with wide-ranging applied interests. Open-source enthusiast.


Tuesday July 9, 2024 12:00 - 12:20 CEST
Wolfgangsee

13:20 CEST

Predictors Optimization for Sensory Profiles Modelling Based on Electronic Signals - Jean-Vincent Le Bé, Nestlé Research
The sensory profiles of coffee products can be generated by using predictive models with inputs from specific electrodes immersed in the liquid coffee. The electrical signals are generated by molecules diffusing through selective bio-polymers and reaching electrodes. These signals are time-series data, and various features are calculated from them (such as transient peaks or steady-state averages). These features are then used to train models for the prediction of sensory profiles or proximity to a given reference product. Six features are calculated for each of the 15 pairs of electrodes resulting in 90 variables for the classification model (proximity to a reference) or the regression model (sensory profile). These variables can be correlated to a certain extent and lead to over-fitting or unnecessary recording. This work presents a method for reducing the number of variables to a minimum relevant set using clustering and random forest variable importance. It shows that a proper selection results in a more robust model across different experimental batches.
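A generic sketch of this kind of screening (clustering correlated features, then ranking by random-forest importance); this is an editorial illustration, not the authors' pipeline, with X standing in for the 90-column feature matrix and y for the response:

  # Generic variable-screening idea: cluster correlated features, rank by importance
  library(randomForest)
  cl     <- hclust(as.dist(1 - abs(cor(X))))       # X: matrix of 90 signal features
  groups <- cutree(cl, h = 0.3)                    # correlation-based clusters
  rf     <- randomForest(x = X, y = y, importance = TRUE)
  imp    <- importance(rf, type = 1)               # permutation importance
  keep   <- tapply(seq_along(groups), groups,
                   function(i) i[which.max(imp[i])])  # keep best feature per cluster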

Tuesday July 9, 2024 13:20 - 13:25 CEST
Wolfgangsee

13:25 CEST

Latent Transition Analysis Using R - Christian Ritz, University of Southern Denmark
Latent transition analysis (LTA) is a useful statistical modelling approach for identifying subgroups ("latent classes") and describing transitions between subgroups over time. Subgroups may be characterized in terms of prevalence at each time point and through transition probabilities capturing the likelihood of transition from one subgroup at one time point to another (or the same) subgroup at another time point. Investigating predictors of transition between subgroups is often of key interest. Currently, LTA is mostly carried out using commercial and specialized software, but not by means of open source statistical software. This talk will show that there exists a flexible and modular approach to LTA using R.

Speakers
Christian Ritz

Dr., University of Southern Denmark
Christian Ritz is a professor of statistics and epidemiology. He has been using R for many years for a wide range of statistical analyses in biology, environmental science, medicine, nutrition epidemiology, physiology, and toxicology. He is the developer of the R package "drc".


Tuesday July 9, 2024 13:25 - 13:30 CEST
Wolfgangsee

13:30 CEST

A Comprehensive List of Normality Tests in R - Fernando Corrêa, Curso-R
Goodness-of-fit remains a cornerstone in statistical analysis, with normality testing standing as a pivotal subset. With numerous solutions available, each boasting distinct characteristics in power, type I error control, and theoretical underpinnings, the landscape can be daunting to navigate. Moreover, the proliferation of R packages, encompassing both classical Fortran implementations and modern alternatives, adds another layer of complexity. In this presentation, we embark on a comprehensive journey through the normal goodness-of-fit tests. We will meticulously list, demonstrate, and critically assess these implementations, juxtaposing old-school methodologies with their contemporary counterparts. From the Shapiro-Wilk to the Anscombe-Glynn test, we'll explore a plethora of tests, shedding light on computational performance, recent advancements, and key differences in interface design.
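For example, a few of the many implementations the talk surveys (base R plus the nortest and moments packages):

  # A small, illustrative subset of normality tests available in R
  x <- rnorm(200)
  shapiro.test(x)            # Shapiro-Wilk (base R)
  nortest::ad.test(x)        # Anderson-Darling
  nortest::lillie.test(x)    # Lilliefors (Kolmogorov-Smirnov)
  moments::anscombe.test(x)  # Anscombe-Glynn test of kurtosis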

Speakers
Fernando Corrêa

Mr, Curso-R
Bachelor's degree and master's student in Statistics at IME-USP. Former Technical Director at the Brazilian Association of Jurimetrics, partner at the lawtech Terranova Jurimetrica, and currently works as a data science consultant at R6 Consultancy. Uses R for everything, but has...


Tuesday July 9, 2024 13:30 - 13:35 CEST
Wolfgangsee

13:35 CEST

From SPSS to R in Social Sciences - Juan Claramunt, Leiden University
In this session, we will discuss the transition from SPSS to R that we made at the Methodology and Statistics unit of the Institute of Psychology of Leiden University.
With this talk, we want to inspire and help other users to introduce R in their Social Sciences bachelors replacing other common software such as SPSS, as well as making connections with other R users willing to improve Data Science/Statistics education in Social Sciences.
We will introduce the main reasons for the change: aligning research and education, didactical purposes, incentivizing good research practices (open science, reproducibility), and better career perspectives.
Afterwards, we will describe the implementation of R in our bachelor program. Here, we aim to provide tips on how to transition from SPSS to R in a social sciences environment, including the experience of other R users working in related education.
We conclude with a summary of the quantitative and qualitative results we have obtained during the first year of teaching with R.
We anticipate the transition to R to greatly impact our students' future research practices, helping to solve issues such as the reproducibility crisis.

Speakers
Juan Claramunt

Specialist in scientific information, Leiden University
Bachelor in Mathematics at Universidad de Cantabria, Utrecht University & Brown University. Master in Methodology and Statistics for the Behavioural, Biomedical, and Social Sciences, & European Master in Official Statistics (Utrecht University). Scientific information specialist at...


Tuesday July 9, 2024 13:35 - 13:40 CEST
Wolfgangsee

13:40 CEST

R for Exploring Spatial Data: Lightning Overview - Siddharth Gupta, University of Potsdam
I will give an overview and comparison of the different map- and spatial-data-related libraries in R, drawing on personal experience and (hopefully) fun examples!
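For instance, a minimal sf example of the kind of tooling the overview covers (illustrative, not taken from the talk):

  # Read a bundled shapefile and plot its geometries with sf
  library(sf)
  nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
  plot(st_geometry(nc))
  # interactive alternatives: leaflet::leaflet(), tmap::tm_shape(), mapview::mapview()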

Speakers
Siddharth Gupta

PhD student in Cognitive Science, University of Potsdam
I am a PhD student at the University of Potsdam, working in the domain of psycholinguistics. Besides, I am interested in NLP, linguistics, behavioral economics and Deep Learning. When I am not consumed with college work, I post YouTube videos, create Discord bots, write Twitter threads...


Tuesday July 9, 2024 13:40 - 13:45 CEST
Wolfgangsee

13:45 CEST

Regression Models for [0, 1] Responses Using betareg and crch - Achim Zeileis, Universität Innsbruck
In this presentation we show how to model data from the closed unit interval [0, 1] using extended-support beta regression and heteroscedastic two-limit tobit models. In contrast to zero- and/or one-inflated beta regression, both approaches only require estimation of a single latent process that captures both the distribution of the inner observations and the point masses for observations on the boundaries at 0 and/or 1. The heteroscedastic two-limit tobit model does so by fitting a Gaussian distribution censored at 0 and 1 which is conveniently available in the R package "crch". Extended-support beta regression has recently been proposed and implemented in the development version of the "betareg" package. It contains both classic beta regression and heteroscedastic two-limit tobit as special cases, shifting between the two with just one additional parameter. Both approaches are illustrated by modeling reading accuracy scores of children and investments in an economic loss aversion experiment, respectively, discussing the models' relative (dis)advantages.
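Schematically, with a data frame d containing a response y in [0, 1] and a covariate x (illustrative; the extended-support distribution name in the betareg line refers to the development version described above and may change):

  # Heteroscedastic two-limit tobit: Gaussian latent process censored at 0 and 1
  library(crch)
  m1 <- crch(y ~ x | x, data = d, left = 0, right = 1, dist = "gaussian")
  # Extended-support beta regression (development version of betareg; assumed option name)
  # library(betareg)
  # m2 <- betareg(y ~ x | x, data = d, dist = "xbetax")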

Speakers
Achim Zeileis

Professor of Statistics, Universität Innsbruck
Achim Zeileis is Professor of Statistics at the Faculty of Economics and Statistics at Universität Innsbruck. Being an R user since version 0.64.0, Achim is co-author of a variety of CRAN packages such as zoo, colorspace, party(kit), sandwich, or exams. In the R community he is active...


Tuesday July 9, 2024 13:45 - 13:50 CEST
Wolfgangsee

14:10 CEST

How to Interpret Statistical Models Using the `marginaleffects` Package for R - Vincent Arel-Bundock, Université de Montréal
The parameters of a statistical model can sometimes be difficult to interpret substantively, especially when that model includes non-linear components, interactions, or transformations. Analysts who fit such complex models often seek to transform raw parameter estimates into quantities that are easier for domain experts and stakeholders to understand. This talk presents a simple conceptual framework to describe a vast array of such quantities of interest, which are reported under imprecise and inconsistent terminology across disciplines: predictions, marginal predictions, marginal means, marginal effects, conditional effects, slopes, contrasts, risk ratios, etc. The presentation introduces marginaleffects, a package for R which offers a simple and powerful interface to compute all of those quantities, and to conduct (non-)linear hypothesis and equivalence tests on them. marginaleffects is lightweight; extensible; it works well in combination with other R packages; and it supports over 100 classes of models, including linear, generalized linear, generalized additive, mixed-effects, and Bayesian models.
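For example (an illustrative logit model on a built-in dataset):

  # Turn raw coefficients into interpretable quantities with marginaleffects
  library(marginaleffects)
  m <- glm(am ~ hp + wt, data = mtcars, family = binomial)
  avg_slopes(m)                                     # average marginal effects
  predictions(m, newdata = datagrid(wt = c(2, 4)))  # predictions on a user grid
  hypotheses(m, "hp = wt")                          # linear hypothesis test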

Tuesday July 9, 2024 14:10 - 14:30 CEST
Wolfgangsee

14:30 CEST

vital: Tidy Data Analysis for Demography - Rob Hyndman, Monash University
I will introduce the vital package, which allows analysis of demographic data using tidy tools. The package uses a variation of tsibble objects as the main data class, so all of the infrastructure available for tsibble and tibble objects can also be used with vital objects. Data may include births, deaths, mortality, fertility, population and migration data. Functions for plotting, smoothing, modelling and forecasting data are included. Models include the classical Lee-Carter model as well as functional data models. Future plans include replicating all of the models available in the demography and StMoMo packages. The package is currently available at https://pkg.robjhyndman.com/vital/. It will be on CRAN before the useR! 2024 conference.
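Sketched in the fable style described above (the function and object names below are assumptions based on this description and may differ from the released package):

  # Assumed interface (names may differ): fit and forecast a Lee-Carter model
  library(vital)
  # mort: a vital object holding age- and sex-specific mortality rates
  mort |>
    model(lee_carter = LC(log(Mortality))) |>  # Lee-Carter (assumed model name)
    forecast(h = 20)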

Speakers
Rob Hyndman

Professor, Monash University
Rob J Hyndman is well-known for his many R packages including forecast, demography and fable. He is a Fellow of both the Australian Academy of Science and the Academy of Social Sciences in Australia, and the author of over 200 research papers and 5 books. He has won many awards, including...


Tuesday July 9, 2024 14:30 - 14:50 CEST
Wolfgangsee

15:10 CEST

Yes, You Can Simulate! Reproducible, Tidy Simulation Workflows with the Reimagined simpr Package - Ethan Brown, Fulbright University Vietnam
The simpr package was designed from the ground up to make simulation -- an invaluable tool for understanding statistical models both for students and professionals -- easier to use in R. Recent updates to simpr make simulation easier than ever, allowing a full workflow to be concisely specified in a single tidy pipeline inspired by the infer package, without the need to create external functions or global values, or to use loops. This pipeline includes specifying data-generating processes, defining and varying design parameters, generating many simulated datasets, fitting models, and consolidating model results. New features include reproducibility of individual simulation datasets/results (without needing to run the entire pipeline again), parallel processing support, flexible bulk data-munging options across multiple simulated datasets, advanced error-handling options, and more. The presentation will compare simpr with other approaches for simulation in R and show applications of simpr for assessing study designs (e.g. power analysis), performing simulation studies, and teaching statistics.
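A compact sketch of such a pipeline, adapted from the package's documented workflow (exact arguments may vary):

  # specify -> define -> generate -> fit -> tidy, all in one pipeline
  library(simpr)
  specify(x = ~ rnorm(n),
          y = ~ 0.5 * x + rnorm(n, sd = s)) |>  # data-generating process
    define(n = c(50, 200), s = c(1, 2)) |>      # vary design parameters
    generate(100) |>                            # 100 simulated datasets per cell
    fit(lm = ~ lm(y ~ x)) |>                    # fit a model to each dataset
    tidy_fits()                                 # consolidated model results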

Speakers
Ethan Brown

Joint Faculty in Social Studies and Psychology, Fulbright University Vietnam
Ethan C. Brown has a joint appointment in Social Studies and Psychology at Fulbright University Vietnam. An enthusiastic member of the R and open science communities, his goal is to build on cognitive science and educational research to make statistics, and critical awareness of statistics...


Tuesday July 9, 2024 15:10 - 15:30 CEST
Wolfgangsee

15:30 CEST

Flexible Multidimensional Scaling with the R Packages smacofx, cops and stops - Thomas Rusch, WU Vienna University of Economics and Business & Patrick Mair, Harvard University
Multidimensional scaling (MDS) refers to methods that fit distances in a reduced space so that they optimally approximate given proximities between objects. Flexibility in modelling with MDS can be introduced by allowing for various transformations of the input proximities and/or the fitted distances, or by penalization. We will present three new R packages for flexible multidimensional scaling (fMDS) that allow fitting metric and nonmetric versions of fMDS, including Power Stress MDS, Sammon Mapping, Elastic Scaling, Multiscale MDS, Box-Cox MDS, Local MDS or Cluster Optimized Proximity Scaling. Optimal structure-based hyperparameter selection of transformation parameters within the Structure Optimized Proximity Scaling framework can also be carried out. The packages offer a broad array of post-fit infrastructure for plotting MDS results, exploration of local minima, and uncertainty estimation. In doing so, they follow the smacof design philosophy and are fully compatible with the smacof package, all packages together comprising the "smacofverse".
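For reference, the smacof interface that these packages are designed to be compatible with looks like this (an editorial illustration; the new packages' own function names are not shown here):

  # Classical smacof fit, for orientation; smacofx/cops/stops extend this design
  library(smacof)
  delta <- dist(swiss)                             # proximities between objects
  fit   <- mds(delta, ndim = 2, type = "ordinal")  # nonmetric MDS
  plot(fit)                                        # configuration plot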

Speakers
Patrick Mair

Dr., Harvard University
Senior Lecturer in Statistics

Thomas Rusch

Dr, WU Vienna University of Economics and Business
Thomas Rusch is Assistant Professor at the Competence Center for Empirical Research Methods at WU Vienna University of Economics and Business. His research interests include exploratory data analysis, data mining, unsupervised statistical learning, psychometrics and computational...


Tuesday July 9, 2024 15:30 - 15:50 CEST
Wolfgangsee
 
Wednesday, July 10
 

11:30 CEST

Explanation Groves - Gero Szepannek, Stralsund University of Applied Sciences
The increasing popularity of machine learning in many application fields has increased the demand for methods of explainable machine learning, as provided e.g. by the packages DALEX (Biecek, 2018) and iml (Molnar, 2018). In turn, comparatively little research has been dedicated to the limits of explaining complex machine learning models (Rudin, 2019, Szepannek and Lübke, 2022). Explanation groves (Szepannek and v. Holt, 2024) are presented as a tool to extract a set of understandable rules for the explanation of arbitrary machine learning models. The degree of complexity of the resulting explanation can be defined by the user. This allows analyzing the trade-off between the complexity of a given explanation and how well it represents the original model. The corresponding R package xgrove (Szepannek, 2023) is demonstrated. Biecek P (2018). https://jmlr.org/papers/v19/18-416.html Molnar C, Bischl B, Casalicchio G (2018). doi:10.21105/joss.00786 Rudin, C (2019). doi:10.1038/s42256-019-0048-x Szepannek G (2023). https://CRAN.R-project.org/package=xgrove Szepannek, G, v. Holt, B (2024). doi:10.1007/s41237-023-00205-2 Szepannek, G, Lübke, K (2022). doi:10.1007/s13218-022-00764-8

Wednesday July 10, 2024 11:30 - 11:50 CEST
Wolfgangsee

11:50 CEST

Generative Modelling of Mixed Tabular Data with the R Package ‘arf’ - Jan Kapar, Leibniz Institute for Prevention Research and Epidemiology - BIPS
Generative machine learning has gained world-wide attention and, especially since the rise of ChatGPT and DALL-E, has started to become an integral tool both in business and everyday life. While the hype has mainly focused on text, image, audio and video synthesis so far, generative modelling of mixed tabular data with both continuous and categorical variables has great unexploited potential in many research fields and industry applications. However, recent attempts to adapt the existing, mainly deep learning-based methods to this more general setting have not shown the same overwhelming successes yet. We present the CRAN package ‘arf’, an easy-to-use implementation of adversarial random forests based on ‘ranger’, which has shown the ability to match and often outperform current deep learning approaches in terms of performance, tuning efforts and runtime, also on small or high dimensional data. ‘arf’ provides tools for both synthetic data generation and density estimation. Optional conditioning on events further extends the possible area of application, enabling for use cases like missing data imputation, data balancing and augmentation.
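In outline, the three-step workflow looks like this (a minimal sketch based on the package's documented functions):

  # adversarial RF -> FORDE density estimate -> FORGE data synthesis
  library(arf)
  a     <- adversarial_rf(iris)         # 1) fit the adversarial random forest
  psi   <- forde(a, iris)               # 2) estimate the density (leaf parameters)
  synth <- forge(psi, n_synth = 150)    # 3) generate synthetic mixed tabular data
  head(synth)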

Speakers
Jan Kapar

M. Sc., Leibniz Institute for Prevention Research and Epidemiology - BIPS
Since 2022: doctoral student / research fellow in machine learning, Faculty for Mathematics and Computer Science, Universität Bremen, and Leibniz Institute for Prevention Research and Epidemiology - BIPS. 2011 - 2016: B.Sc. Mathematics and M.Sc. Business Mathematics, Julius-Maximilians-Universität...


Wednesday July 10, 2024 11:50 - 12:10 CEST
Wolfgangsee

12:10 CEST

mlr3torch - Deep Learning in R - Sebastian Fischer & Martin Binder, LMU Munich
mlr3torch is a high-level deep learning framework for the mlr3 ecosystem designed to easily build, train, and evaluate neural networks in a few lines of code. It leverages the torch package, which is an R interface to the LibTorch C++ library. On the one hand, the package comes with predefined and easy-to-use neural network architectures, both for classification and regression. On the other hand, it defines a language that allows users to easily define custom, fully parameterized neural networks. Because the package is integrated into the mlr3 ecosystem, these neural networks can be easily benchmarked, tuned, or combined with other machine learning workflows such as preprocessing or stacking. While mlr3’s focus is tabular data, mlr3torch extends this to other modalities such as images or text by defining a new data type: the 'lazy_tensor'. This type can be treated similarly to standard vectors and can, e.g., be preprocessed without requiring the data to be stored in memory. This presentation will give an overview of mlr3torch's features and demonstrate its application in both research and practical machine learning scenarios. https://github.com/mlr-org/mlr3torch
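A minimal sketch (the learner id and parameter names below are given from memory and may differ slightly; see the package documentation):

  # Train a predefined MLP learner inside the mlr3 ecosystem (names from memory)
  library(mlr3)
  library(mlr3torch)
  learner <- lrn("classif.mlp", epochs = 10, batch_size = 32, neurons = c(32, 32))
  task <- tsk("sonar")
  learner$train(task)
  learner$predict(task)$score(msr("classif.acc"))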

Speakers
Sebastian Fischer

MSc., LMU Munich
Sebastian Fischer has a Bachelor's degree in Philosophy & Economics from the University of Bayreuth and a Master's degree in Statistics from LMU Munich. He is currently doing a PhD at LMU Munich under the supervision of Prof. Dr. Bernd Bischl and is working on the MaRDI project (Mathematical...


Wednesday July 10, 2024 12:10 - 12:30 CEST
Wolfgangsee

15:00 CEST

Quantile Additive Modelling on Large Data Sets Using the qgam R Package - Benjamin Griffiths, University of Bristol
The qgam R package is an extension of the mgcv package, offering methods for building and fitting quantile additive models (QGAMs), which do not make any parametric assumption on the distribution of the response variable. While QGAMs make fewer assumptions than standard GAMs, they are slower to fit due to the cost of selecting the so-called “learning rate”. The longer fitting time is particularly problematic when handling large data sets and complex models. This talk focuses on the development of new big-data methods for QGAMs (and on their implementation in the qgam package), which greatly alleviate this issue. In particular, we will show that the new methods lead to a significant decrease in computational time and to much lower memory requirements, but do not affect the accuracy of the fitted quantiles. While we will demonstrate the methods on regional solar production modelling, they are useful in a wide range of industrial and scientific applications.
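For context, a standard qgam fit looks like this (the new big-data methods extend this interface):

  # Additive quantile regression at the 90th percentile with qgam
  library(qgam)
  dat <- gamSim(1, n = 1000, dist = "normal")          # simulated data from mgcv
  fit <- qgam(y ~ s(x0) + s(x1) + s(x2), data = dat, qu = 0.9)
  summary(fit)
  fitm <- mqgam(y ~ s(x0) + s(x1), data = dat, qu = c(0.1, 0.5, 0.9))  # several quantiles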

Speakers
Benjamin Griffiths

Mr, University of Bristol
3rd year PhD student in COMPASS CDT of the University of Bristol, sponsored by Électricité de France. Research interests lie in developing scalable fitting methods for quantile and loss-based GAMs, and on their implementation in open-source software.


Wednesday July 10, 2024 15:00 - 15:20 CEST
Wolfgangsee

15:20 CEST

missForestPredict - Missing Data Imputation in Prediction Settings - Elena Albu, KU Leuven, Belgium
Prediction models are used to predict an outcome based on input variables. Missing data in input variables often occurs at model development and at prediction time. The newly released missForestPredict R package proposes an adaptation of the missForest imputation algorithm that is fast, user-friendly and tailored for prediction settings. The algorithm iteratively imputes variables using random forests until a convergence criterion (unified for continuous and categorical variables and based on the out-of-bag error) is met. The imputation models are saved for each variable and iteration and can be applied later to new observations. The missForestPredict package offers extended error monitoring, control over variables used in the imputation and custom initialization. This allows users to tailor the imputation to their specific needs. The missForestPredict algorithm is further compared to mean/mode imputation, k-nearest neighbours, bagging and two iterative algorithms (miceRanger and IterativeImputer) on 8 simulated datasets with simulated missingness and 8 public datasets using different prediction models. missForestPredict provides satisfactory results within short computation times.
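Schematically (the argument names below are my best recollection of the package interface and should be checked against its documentation):

  # Impute training data, keep the fitted models, reuse them on new observations
  # (argument names may differ from the released package)
  library(missForestPredict)
  imp           <- missForest(train_df, save_models = TRUE)  # iterative RF imputation
  train_imputed <- imp$ximp
  new_imputed   <- missForestPredict(imp, newdata = new_df)  # apply saved models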

Speakers
Elena Albu

Ms., KU Leuven, Belgium
During her career in healthcare IT and data science, she worked with Electronic Health Record (EHR) data and gained knowledge on medical workflows. In 2019, she earned her Master of Science in Statistical Data Analysis at the University of Ghent, Belgium. During this program, her...


Wednesday July 10, 2024 15:20 - 15:40 CEST
Wolfgangsee

15:40 CEST

tsdataleaks: Tool to Detect Data Leaks in Large Time Series Collections in Forecasting Competitions - Dr. Thiyanga Talagala, University of Sri Jayewardenepura
Large-scale time series forecasting competitions are excellent platforms for fostering innovation and advancing the field of time series analysis. One of the most frequent problems that arises in forecasting competitions is data leakage. Data leaks can happen when the training period values contain information about the test period values. There are a variety of different ways that data leaks can occur with time series data. For example: i) randomly chosen blocks of time series are concatenated to form a new time series; ii) scale-shifts; iii) repeating patterns; iv) addition of white noise; v) modified scales; vi) temporal aggregations: for example, create monthly series using daily series, etc. The tsdataleaks package provides a simple and computationally efficient algorithm to exploit data leaks in time series data. The tsdataleaks package is available on CRAN.
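In outline (the function names below are given from memory and should be checked against the package documentation):

  # Scan a collection of series for overlapping segments that hint at leakage
  # (function names from memory; see the package documentation)
  library(tsdataleaks)
  leaks <- find_dataleaks(lstx = series_list, h = 6)  # series_list: a list of time series
  viz_dataleaks(leaks)                                # visualise detected matches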

Speakers
Thiyanga S. Talagala

Dr, University of Sri Jayewardenepura
I am a senior lecturer in the Department of Statistics, Faculty of Applied Sciences, at the University of Sri Jayewardenepura, Sri Lanka. I received my PhD in statistics from Monash University. I am a co-founder and co-organizer of R Ladies-Colombo, Sri Lanka. I am also serving as...


Wednesday July 10, 2024 15:40 - 16:00 CEST
Wolfgangsee
 
Thursday, July 11
 

11:30 CEST

Crafting Intuitive Spatial Select Fields with ReactJS, R, and Nivo Library - Anastasiia Kostiv, esqLABS GmbH
"Unleash the Power of Spatial Visualization: Exploring ReactR, Nivo, and NivoR" Join us for an electrifying session at UseR2024, where we'll dive into the dynamic world of spatial data visualization like never before! Delve into the cutting-edge capabilities of the ReactR package as we showcase its ability to create breathtaking UI experiences using React libraries. Get ready to be spellbound as we unveil the secrets of intuitive spatial data filtering and selection, leveraging the unparalleled features of Nivo widgets. But that's not all! Brace yourself for the unveiling of NivoR, a groundbreaking collaboration of Shiny, ReactR, React.js, and the Nivo framework. Witness the fusion of art and technology as we demonstrate how NivoR pushes the boundaries of possibility in data visualization, offering an immersive journey into the heart of interactive spatial exploration. Join us at UseR2024 for a session that promises to ignite your imagination, elevate your understanding, and leave you inspired by the endless possibilities of spatial visualization!" Be ready to test the nivoR package during the talk!

Speakers
Anastasiia Kostiv

Senior Software Developer, esqLABS GmbH
Experienced Senior Software Engineer and Data Analyst with a diverse background in engineering, web development, and data science. Successfully implements advanced techniques in healthcare and data management. Creator of shinycalendar and nivoR packages. Passionate about addressing...


Thursday July 11, 2024 11:30 - 11:50 CEST
Wolfgangsee

11:50 CEST

Template for Engaging Quiz with GenAI Response - Lynna Jirpongopas, Advanced Micro Devices Inc.
Let R Shiny and shinysurveys do the heavy lifting of creating a web application to test generative AI use cases or concepts. The use case we are exploring here is a Career Archetype Quiz. The quiz aims to provide career insights and advice to the app user. This session provides a template for creating a quiz while leveraging the capabilities of AI-generated responses. Ultimately, this quiz template can be adapted to other topics and personalized responses.
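A minimal sketch of the shinysurveys scaffolding such a template builds on; the GenAI step is represented by a hypothetical placeholder function:

  # Minimal quiz scaffold with shinysurveys; ask_genai() is a hypothetical placeholder
  library(shiny)
  library(shinysurveys)

  questions <- data.frame(
    question   = "Which task energises you most?",
    option     = c("Analysing data", "Designing things", "Leading people"),
    input_type = "mc",
    input_id   = "energiser",
    dependence = NA, dependence_value = NA, required = TRUE
  )

  ui <- fluidPage(surveyOutput(questions, survey_title = "Career Archetype Quiz"))
  server <- function(input, output, session) {
    renderSurvey()
    observeEvent(input$submit, {
      answers <- getSurveyData()
      # advice <- ask_genai(answers)   # hypothetical call to a GenAI backend
    })
  }
  shinyApp(ui, server)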

Speakers
Lynna Jirpongopas

Data Scientist, Advanced Micro Devices Inc.
Lynna has a comprehensive background in data science within high-tech environments. She's a math graduate from the University of California, San Diego. Recently, she has expanded her expertise by completing a Professional Certificate in Artificial Intelligence from Stanford University...


Thursday July 11, 2024 11:50 - 12:10 CEST
Wolfgangsee

12:10 CEST

CRAS: Cybersecurity Risk Analysis and Simulation Shiny App - Emilio L. Cano, Rey Juan Carlos University
Risk analysis and management rely on sound statistical methods and Monte Carlo simulation. Performing and reporting quantitative risk analysis has become crucial in Cybersecurity, which affects all sectors, including finance. Actually, Cybersecurity risks are included within the “operational risks” in the finance sector. The FAIR methodology has become a standard for cybersecurity risk analysis, using PERT and triangular distributions to simulate loss based on expert input. However, other distributions and methods can be used for analysing risks, e.g., the lognormal. In this work, we present a shiny app for simulating cybersecurity losses, allowing the user to choose whether to use the FAIR methodology, or modifications to it, such as different probability distributions or modifications of the FAIR ontology. Statistical analysis of the simulation results is shown by means of interactive tables and plots. A Quarto report can be generated automatically. The app is also useful for teaching Risk Analysis in degree courses on cybersecurity. Future work includes adding more probability distributions, such as extreme value distributions, and publishing the app on CRAN as a contributed package.
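The core simulation idea can be sketched in a few lines (a generic FAIR-style loss simulation for illustration, not the app's code; mc2d provides the PERT distribution):

  # Generic FAIR-style Monte Carlo: annual loss = sum of per-event losses
  library(mc2d)                                    # for rpert()
  set.seed(1)
  n_sims <- 10000
  annual_loss <- replicate(n_sims, {
    n_events <- rpois(1, lambda = 3)               # loss event frequency
    sum(rpert(n_events, min = 1e3, mode = 2e4, max = 5e5))  # loss magnitudes
  })
  quantile(annual_loss, c(0.5, 0.95, 0.99))        # loss exceedance summary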

Speakers
Emilio L. Cano

Associate Professor, Rey Juan Carlos University
I’m a passionate Data Scientist, Statistician, enthusiast of the R statistical software and programming language. I am the President of the “Comunidad R Hispano” Association (Spanish R Users) and the author of the SixSigma package at CRAN. I serve as Associate Professor at Rey...


Thursday July 11, 2024 12:10 - 12:30 CEST
Wolfgangsee

12:30 CEST

Parts Beyond Code: Crafting Sensible Statistician-Led Automation with Shiny in Pharma - Gregory Chen, MSD
The R Shiny framework empowers statisticians to create interactive apps without straying far from a typical R curriculum. This technique can significantly enhance internal processes across the pharmaceutical industry, and potentially in other sectors, by automating tasks to boost efficiency and productivity. However, the journey from concept to integration of Shiny-based tools into business workflows involves much more than coding. Critical aspects such as user experience and seamless integration with existing processes are often underestimated but essential for the successful deployment of these tools. In this presentation, we delve into the nuances of creating value-added Shiny apps that transcend basic functionality to become integral components of business operations. By examining a recent project of ours, an automation spanning from statistical analysis planning to reporting, we highlight the pivotal elements outside of coding. These learnings are summarized in four key topics (design thinking on the product and user experience, collaboration mode in a product team, verification/validation, change management) and across the different phases of the product lifecycle.

Speakers
Gregory Chen

Principal Statistician, MSD
Gregory Chen, with a PhD in Statistics and 13+ years in the pharmaceutical industry, currently works as a Principal Statistician in the space of health technology assessment (HTA) at MSD, based in Switzerland. His past work spans manufacturing, quality control, and clinical...


Thursday July 11, 2024 12:30 - 12:50 CEST
Wolfgangsee
 