useR! 2024: Full Schedule

In Person & Virtual
8 - 11 July, 2024

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for useR! 2024 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: All times are shown in Central European Summer Time (CEST, UTC+2). The schedule is subject to change.

11:00 CEST

Past, Present, and Future of data.table - Tyson Barrett, Highmark Health

This talk will walk through the past, present, and future of the data.table package. The timing is particularly relevant: changes to the package's governance, aimed at providing a solid foundation for its long-term maintenance, have recently been approved. Because data.table is a leading data wrangling and cleaning package in the R ecosystem, the goal of the new governance is to create a broader community that can more easily engage with the package's development and find support for its use.

Speakers

Tyson Barrett

Manager Research Analytics and Enablement, Highmark Health

Tyson Barrett, PhD is the current data.table maintainer working with a talented team of developers and a wonderful development community. During his day job, he works with a team of researchers at Highmark Health, a healthcare organization, to improve healthcare outcomes, costs, and...

Tuesday July 9, 2024 11:00 - 11:20 CEST
Pinzgau + Tennegau

R workflow + deployment + production

11:20 CEST

WebR, and the Future of Building Web Applications with R - Colin Fay, ThinkR

One of the great joys of being a software engineer is that things keep moving: every now and then, new technologies, new languages, and new frameworks emerge and change the way we build software. For the past couple of years, the R world has built and deployed web apps and APIs in a fairly stable way: {shiny} apps built with frameworks like {golem} or {rhino}, APIs built with {plumber}, all sent to a server that can launch R and make our R code available to the world. In recent months, something new has emerged: webR, a version of R compiled for WebAssembly (WASM) that can run R in the browser and in NodeJS with no need for an R installation. This has opened many new doors, as JavaScript is the tool of choice for building web apps and APIs. In this talk, Colin will start by explaining what webR is and how it will change the way we think about building and deploying R code on the web. He will present `webrcli` and `spidyr`, two tools for creating NodeJS apps that can call R code via webR. Finally, Colin will focus on the challenges that will arise with `webR`, and on how we'll build web apps with R in the future.

Speakers

Colin Fay

Lead Developer at ThinkR, ThinkR

Colin Fay is a lead developer at ThinkR, a French agency of R experts. During the day, he helps companies by building tools and deploying infrastructure. His main areas of expertise are data & software engineering, web applications (frontend and backend), and R in production. During...

Tuesday July 9, 2024 11:20 - 11:40 CEST
Pinzgau + Tennegau

R workflow + deployment + production

11:40 CEST

Building Bilingual Bridges with Multilingual Manuals - Elio Campitelli, Universidad de Buenos Aires

The vast majority of packages are documented in English, due to the language's status as the de facto lingua franca. But what if your package is designed with a specific demographic in mind that could be better served by documentation in another language? Non-English documentation would make your package more accessible to that audience at the expense of isolating it from the wider international community. But... ¿por qué no los dos? (Why not both?) The rhelpi18n package adds support for multilingual documentation in R, so you can have the best of both worlds. Package authors or community projects can create translation modules that users can install to access documentation in their own languages directly from R. The talk will include a high-level view of how this package extends R's help system and will explain how people can create, install, and use translation modules for R packages.

Speakers

Elio Campitelli

Lic, Universidad de Buenos Aires

I’m a PhD student in atmospheric sciences at the Centre for Ocean and Atmospheric Research, where I study the atmospheric circulation in the Southern Hemisphere and how it affects the weather in South America. I’m also the maintainer of several R packages and teach courses.

Tuesday July 9, 2024 11:40 - 12:00 CEST
Pinzgau + Tennegau

R workflow + deployment + production

12:00 CEST

Systems Integration Tests for R Package Cohorts - Franciszek Walkowiak, Roche

One of the challenges for R developers is ensuring that their packages work correctly on an ever-increasing number of operating systems, platforms, and R versions. To aid in this endeavor, we are introducing two tools: Locksmith and Scribe. Their task is to install a cohort of R packages, along with all their dependencies, and test the cohort on any kind of system. Locksmith resolves all dependencies of the cohort using the provided package repositories and saves the list of all package versions and repositories to a snapshot. Scribe uses the snapshot to download, build, install, and check the packages in an efficient and reproducible manner. Scribe can also restore an older snapshot of packages on a new system to check for compatibility issues. Both tools are written in Go, making their binaries easy to build and distribute for different systems and platforms. Go also simplifies concurrent package installation and checking, significantly reducing execution time. As a result, package cohort testing can be performed frequently on various systems, allowing developers to quickly assess the overall health of their packages.
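The snapshot-then-install idea can be illustrated independently of the Go implementation. The Python sketch below assumes a hypothetical snapshot format (package name mapped to a pinned version and its direct dependencies) and groups packages into "waves": everything in one wave depends only on earlier waves, so a wave's packages can be installed and checked concurrently. It is not Locksmith's or Scribe's actual snapshot format or scheduling algorithm, and `install()` is a placeholder.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical snapshot: package -> (pinned version, direct dependencies).
# Assumes every dependency of a package is itself listed in the snapshot.
SNAPSHOT = {
    "pkgA": ("1.2.0", ["pkgB", "pkgC"]),
    "pkgB": ("0.9.1", ["pkgC"]),
    "pkgC": ("2.0.0", []),
    "pkgD": ("1.0.0", []),
}


def install_waves(snapshot):
    """Group packages into waves so that every dependency sits in an earlier wave."""
    pending = {name: set(deps) for name, (_, deps) in snapshot.items()}
    done, waves = set(), []
    while pending:
        ready = sorted(name for name, deps in pending.items() if deps <= done)
        if not ready:
            raise ValueError(f"unresolvable or cyclic dependencies: {sorted(pending)}")
        waves.append(ready)
        done.update(ready)
        for name in ready:
            del pending[name]
    return waves


def install(name, version):
    # Placeholder for download + build + install + check of a single package.
    return f"{name}=={version}"


def install_cohort(snapshot, max_workers=4):
    """Install wave by wave; packages inside one wave run concurrently."""
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for wave in install_waves(snapshot):
            results.extend(pool.map(lambda name: install(name, snapshot[name][0]), wave))
    return results
```

For the example snapshot, `install_waves(SNAPSHOT)` yields `[["pkgC", "pkgD"], ["pkgB"], ["pkgA"]]`. Because the snapshot pins exact versions, replaying it on another machine reproduces the same cohort.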

Speakers

Franciszek Walkowiak

Senior IT Professional at Roche, Roche

DevOps engineer with 4 years of experience in the pharmaceutical industry. I have worked with Amazon Web Services, Google Cloud Platform, and infrastructure as code practices. Currently, I support teams of R software developers by employing DevOps practices and tools such as GitLab...

Tuesday July 9, 2024 12:00 - 12:20 CEST
Pinzgau + Tennegau

R workflow + deployment + production

13:20 CEST

Shinydraw: Quickly Wireframe Shiny Apps in Excalidraw - Michael Page, cynkra

Wireframing Shiny apps is a time-consuming process, typically involving proprietary tools. As a result, the wireframing stage of development often comes with increased costs or is skipped entirely. One alternative has been Excalidraw, an open-source virtual whiteboard for sketching hand-drawn-style diagrams through a simple and intuitive graphical user interface. But, until now, using Excalidraw has still been time-intensive, requiring individual Shiny components to be drawn from scratch. Enter shinydraw: an Excalidraw library that offers pre-drawn Shiny components, including inputs, outputs, theming, and more. Drawing wireframes with shinydraw is as simple as loading the library and then dragging and dropping the components you need. It is a batteries-included approach to wireframing with a near-zero learning curve, leveraging the power of open-source technologies and standards. This talk will show you how easy it is to get started with the shinydraw library so you can build Shiny wireframes within minutes.

Speakers

Michael Page

Mr, cynkra

Mike Page is a data scientist with more than five years of experience working with R in the third sector. Here, his focus has been on developing open-source Shiny apps and tools such as the humaniverse collection of R packages. Mike holds a Masters by Research degree in psychoendocrinology...

Tuesday July 9, 2024 13:20 - 13:25 CEST
Pinzgau + Tennegau

Shiny + dashboards + web apps, Lightning Talk

13:25 CEST

A Bayesian Approach to Decision Making in Early Development Clinical Trials: An R Solution - Audrey Yeo, Roche

This talk showcases new statistical software that supports deciding whether a novel cancer treatment demonstrates sufficient safety and efficacy signals to warrant further investment.

Speakers

Audrey Yeo

Statistical Software Engineer && Biostatistician, Roche

Audrey Yeo has been a Statistical Software Engineer and Clinical Trial Biostatistician at F. Hoffmann-La Roche since 2021. Together with the statistical engineering team, they are creating a state-of-the-art engineering tool to enhance decision making in early development. Audrey has a pharma...

Tuesday July 9, 2024 13:25 - 13:30 CEST
Pinzgau + Tennegau

Biostatistics + epidemiology + bioinformatics, Lightning Talk

13:30 CEST

Generate Raw Synthetic Dataset for Clinical Trial - Binod Jung Bogati, Numeric Mind

Obtaining synthetic raw datasets, particularly for clinical trials, poses significant challenges. Reliance on manual data entry in Electronic Data Capture (EDC) systems, along with the need to create test data scenarios for generating Study Data Tabulation Model (SDTM) datasets and for other clinical programming tasks, adds complexity. syngenR, an R package, addresses these challenges by generating customized synthetic raw datasets for clinical trials. This presentation introduces an alternative to conventional test data generation and entry methods and discusses the specific limitations of the current approach that it addresses. By automating the creation of synthetic data that accurately reflects real-world variability, syngenR improves the reliability and efficiency of SDTM generation and other clinical programming tasks while avoiding the inaccuracies associated with manual data entry. The talk will also cover the package's use in educational settings, its capability to test various clinical trial scenarios, and its potential to significantly reduce the time and effort required for clinical trial preparation and execution.

Speakers

Binod Jung Bogati

Associate Manager - Data Science, Numeric Mind

Binod Jung Bogati has been a Statistical Programmer at Numeric Mind since 2020. Outside work, he is an rOpenSci Champion (2023/24) and an organizer of R User Group Nepal, where he hosts R community events. He loves working with data and is currently focusing on clinical data science and life sciences.

Tuesday July 9, 2024 13:30 - 13:35 CEST
Pinzgau + Tennegau

Biostatistics + epidemiology + bioinformatics, Lightning Talk

13:35 CEST

Roam: Remote Objects with Active-Binding Magic - Yangzhuoran Fin Yang, Monash University

The "roam" package simplifies the creation of R objects that behave like regular objects but are sourced from remote locations. It enables package developers to include these "roaming" objects, which may exceed CRAN's 5 MB package size limit, in their packages. Additionally, it allows datasets to be updated independently of package updates, through functions that retrieve data from remote sources. https://github.com/FinYang/roam
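In R, an active binding (created with base R's `makeActiveBinding()`) is a variable whose value is produced by a function each time it is read, which is the "magic" the title refers to. The Python sketch below is only an analogue of that idea, not roam's API: a hypothetical `RoamNamespace` whose attributes read like plain values but are fetched on first access, cached, and refreshable. The fetch function is supplied by the caller, so no real network access is assumed.

```python
from typing import Any, Callable


class RoamNamespace:
    """Hypothetical container whose attributes read like ordinary values but are
    produced by a fetch function on first access and cached afterwards."""

    def __init__(self) -> None:
        self._fetchers = {}  # name -> zero-argument fetch function
        self._cache = {}     # name -> last fetched value

    def register(self, name: str, fetch: Callable[[], Any]) -> None:
        self._fetchers[name] = fetch

    def refresh(self, name: str) -> None:
        # Drop the cached copy so the next access fetches fresh data.
        self._cache.pop(name, None)

    def __getattr__(self, name: str) -> Any:
        # Called only when normal lookup fails, i.e. for registered "roaming" names.
        fetchers = self.__dict__.get("_fetchers", {})
        if name not in fetchers:
            raise AttributeError(name)
        cache = self.__dict__["_cache"]
        if name not in cache:
            cache[name] = fetchers[name]()
        return cache[name]
```

A user simply writes `ns.big_data`, with no explicit download call. The data lives outside the package and can be updated without shipping a new package version, which mirrors the motivation described in the abstract.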

Speakers

Yangzhuoran Fin Yang

PhD Candidate, Monash University

Yangzhuoran Fin Yang is a PhD candidate in the Department of Econometrics and Business Statistics at Monash University. His PhD project is on the use of transformations of time series to improve forecasting. Fin is active in research software development, (co)authoring open source...

Tuesday July 9, 2024 13:35 - 13:40 CEST
Pinzgau + Tennegau

R workflow + deployment + production, Lightning Talk

13:40 CEST

Checklist Improves Collaboration, Quality and Visibility of Your Code - Thierry Onkelinx, Research Institute for Nature and Forest

The checklist package is a set of rules for R packages and R source code projects. The ruleset covers several topics: folder structure, filename conventions, spelling, code style, citation metadata, licence, contribution guidelines, ... Adherence to a common set of rules within an organisation facilitates collaboration between its members. Enforcing citation metadata and an open source licence improves the visibility of projects. Automated checks via GitHub Actions detect problems as early as possible. Checklist is based on the rcmdcheck, lintr, pkgdown, codemetar and hunspell packages. Where applicable, we use the same rules for projects and packages. The maintainer can choose which parts of the ruleset apply to a project. In the case of an R package, the entire ruleset is mandatory. Publishing code on Zenodo is easy if you link to your GitHub repository. Each release on GitHub triggers a new version on Zenodo with a specific DOI. A GitHub action creates a new release for each new version of the package. Documentation and source code are available at https://inbo.github.io/checklist

Speakers

Thierry Onkelinx

statistician, Research Institute for Nature and Forest

Statistician at the Research Institute for Nature and Forest.

Tuesday July 9, 2024 13:40 - 13:45 CEST
Pinzgau + Tennegau

R workflow + deployment + production, Lightning Talk

13:45 CEST

verdepcheck - A Tool for Dependencies Check - Pawel Rucki & André Veríssimo, Roche

Proper dependency management is critical to ensuring a good experience for your package's users. Package API incompatibilities, breaking changes, or incorrect minimum dependency versions can lead to various compatibility issues on the user's end.

In this talk, I will introduce a newly created tool (a package and an associated GitHub Action) designed to help package developers detect and resolve these issues earlier.
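One way to catch an incorrect minimum version is to test against the *oldest* versions your declared constraints allow, not only against the latest releases. The Python sketch below illustrates that "minimal versions" selection step under stated assumptions: DESCRIPTION-style constraints with only `>=` bounds, a hypothetical table of available versions, and no transitive resolution. It is not verdepcheck's API or its exact strategy; in practice the chosen versions would then be installed and the package checked (e.g. with R CMD check).

```python
import re

# Hypothetical inputs: DESCRIPTION-style dependency declarations and the
# versions of each dependency available in some package archive.
DECLARED = ["dplyr (>= 1.0.0)", "rlang", "glue (>= 1.6.2)"]
AVAILABLE = {
    "dplyr": ["1.1.4", "0.8.5", "1.0.0"],
    "rlang": ["1.1.3", "0.4.0"],
    "glue": ["1.7.0", "1.6.1", "1.6.2"],
}

CONSTRAINT = re.compile(r"\s*([A-Za-z][A-Za-z0-9.]*)\s*(?:\(\s*>=\s*([0-9][0-9.\-]*)\s*\))?\s*")


def parse_version(version):
    """'1.10.0' -> (1, 10, 0); R versions use '.' or '-' as separators."""
    return tuple(int(part) for part in re.split(r"[.-]", version))


def minimal_versions(declared, available):
    """For each dependency, pick the oldest available version that still
    satisfies the declared lower bound (or the oldest overall if unbounded)."""
    chosen = {}
    for spec in declared:
        match = CONSTRAINT.fullmatch(spec)
        if match is None:
            raise ValueError(f"unsupported constraint: {spec!r}")
        name, minimum = match.group(1), match.group(2)
        candidates = sorted(available[name], key=parse_version)
        if minimum is not None:
            floor = parse_version(minimum)
            candidates = [v for v in candidates if parse_version(v) >= floor]
        if not candidates:
            raise LookupError(f"no available version of {name} satisfies {spec!r}")
        chosen[name] = candidates[0]
    return chosen
```

If the package's checks pass against `minimal_versions(DECLARED, AVAILABLE)` as well as against the latest releases, the declared lower bounds are at least plausible. If they fail only at the minimum, the `>=` bound is too low.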

Speakers

Pawel Rucki

Ms, Roche

Pawel graduated in 2015 from the University of Warsaw in Econometrics and Quantitative Economics. Having worked with R for almost 10 years, Pawel has applied it in the fields of geospatial data analysis, credit risk assessment, financial provisions calculation, and clinical trial data analysis...

André Veríssimo

PhD, Roche

Tuesday July 9, 2024 13:45 - 13:50 CEST
Pinzgau + Tennegau

Efficient programming, Lightning Talk

14:10 CEST

R for Streamlined Research: Spotlight on Data Collection - Agustin Perez Santangelo, Appsilon

In this session, I will explore R's potential for data collection, a lesser-known use of the language. While R is widely recognized for its robust data analysis and reporting capabilities, its utility for data collection, particularly through R Shiny apps, is rarely discussed. I aim to shed light on this aspect, demonstrating how R can serve as an end-to-end solution for academic workflows, especially in fields studying human behavior such as experimental psychology, social sciences, and behavioral economics. I will walk through two case studies from published papers where R was used not only for data analysis but also for data collection. These examples illustrate how an R Shiny app served as an online experiment platform, enabling efficient and effective data gathering. The goal of this session is to broaden the perspective of R users and to inspire academics to consider R as a comprehensive tool for their research journey. From data collection to data analysis, and all the way to authoring and publishing, R can streamline the process, making research more efficient and reproducible.

Speakers

Agustin Perez Santangelo

Mr, Appsilon

I am a molecular biologist and cognitive scientist from Argentina. I currently work as a software engineer (mainly using R and R Shiny) at Appsilon, where I enjoy translating ideas into code.

Tuesday July 9, 2024 14:10 - 14:30 CEST
Pinzgau + Tennegau

Social sciences

14:30 CEST

Diagnostic Modeling for Educational and Psychological Assessment - Jake Thompson, Accessible Teaching, Learning, & Assessment Systems (ATLAS); University of Kansas

Diagnostic classification models are psychometric models that estimate the presence or absence of discrete, fine-grained attributes. Due to the categorical nature of the latent variables, assessments using diagnostic models can provide highly reliable results with fewer items, reducing the burden on respondents. In addition, the fine-grained nature of the constructs facilitates the reporting of results that are more informative and actionable than a single overall score. The attributes can represent, for example, student proficiency in educational skills or the presence of psychological traits or disorders, making these models useful in a variety of contexts. In this session, we will discuss the general properties of diagnostic models and describe how to analyze psychological and educational assessment data with them using the R package measr. We will show how measr, which interfaces with the popular Stan language, can be used to easily estimate diagnostic models and evaluate model performance (e.g., model fit, reliability). Finally, we’ll discuss how to draw inferences from the results to answer substantive research questions.

Speakers

Jake Thompson

Assistant Director of Psychometrics, Accessible Teaching, Learning, & Assessment Systems (ATLAS); University of Kansas

W. Jake Thompson is the Assistant Director of Psychometrics for Accessible Teaching, Learning, and Assessment Systems at the University of Kansas and the lead psychometrician for the Dynamic Learning Maps Alternate Assessment and Pathways for Instructionally Embedded Assessment. His...

Tuesday July 9, 2024 14:30 - 14:50 CEST
Pinzgau + Tennegau

Social sciences

14:50 CEST

Xmap: Unified Tools for Ex-Post Data Harmonisation - Cynthia A Huang, Monash University

Social science research often involves harmonising data from multiple sources. For example, analysts often must resolve differences between country-specific occupation classification standards to compare labour statistics from multiple countries. Harmonising datasets requires both domain expertise and technical data-wrangling skills. Unfortunately, details of the harmonisation logic are often lost in the idiosyncrasies of bespoke data preparation scripts and ad-hoc documentation, making it difficult for others to validate or reuse harmonisation efforts. The {xmap} package addresses these challenges with a new framework and tools for data harmonisation using 'crossmap' tables. The crossmap framework unifies and simplifies the specification, implementation, validation, and documentation of recoding, aggregating and splitting operations. Crossmaps extend existing crosswalk/look-up table approaches to support one-to-many and many-to-many relationships between alternative classification standards, in addition to one-to-one and many-to-one recoding. The package also provides built-in safeguards to avoid data leakage and graph-based methods for standardised documentation.
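The crossmap idea can be sketched as a table of weighted links from source codes to target codes. The Python sketch below assumes, as is common for such weighted crosswalks, that the weights leaving each source code sum to 1. Under that assumption, recoding, aggregating (many-to-one), and splitting (one-to-many) become one operation, and a simple check guards against values being lost or double-counted. The codes, weights, and function names are hypothetical and do not reproduce {xmap}'s R API.

```python
from collections import defaultdict
from math import isclose

# Hypothetical crossmap: (source code, target code, weight).
CROSSMAP = [
    ("A1", "X", 1.0),  # one-to-one recode
    ("A2", "Y", 1.0),  # A2 and A3 collapse into Y (many-to-one)
    ("A3", "Y", 1.0),
    ("A4", "X", 0.6),  # A4 is split across X and Z (one-to-many)
    ("A4", "Z", 0.4),
]


def validate(crossmap, tol=1e-9):
    """Outgoing weights of each source must sum to 1, so no value is lost or duplicated."""
    totals = defaultdict(float)
    for src, _, weight in crossmap:
        totals[src] += weight
    bad = {src: total for src, total in totals.items() if not isclose(total, 1.0, abs_tol=tol)}
    if bad:
        raise ValueError(f"weights do not sum to 1 for: {bad}")


def apply_crossmap(values, crossmap):
    """Redistribute source values to target codes and aggregate by target."""
    validate(crossmap)
    sources = {src for src, _, _ in crossmap}
    missing = set(values) - sources
    if missing:
        raise KeyError(f"no mapping for source codes: {sorted(missing)}")
    out = defaultdict(float)
    for src, tgt, weight in crossmap:
        if src in values:
            out[tgt] += values[src] * weight
    return dict(out)
```

Applying this crossmap to `{"A1": 10, "A2": 5, "A3": 5, "A4": 100}` yields X = 70, Y = 10, Z = 40. The total of 120 is preserved. Because the mapping lives in a table rather than in ad-hoc script logic, it can also be inspected, validated, and documented on its own.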

Speakers

Cynthia A Huang

PhD Candidate, Monash University

Cynthia Huang is a PhD Candidate in the Department of Econometrics and Business Statistics at Monash University. She completed her undergraduate and honours degrees in Economics at the University of Melbourne. Her research focuses on principles and methods for using complex and alternative...

Tuesday July 9, 2024 14:50 - 15:10 CEST
Pinzgau + Tennegau

Social sciences

15:10 CEST

DropR: Analyze and Visualize Dropout in Research - Annika Tave Overlander, University of Konstanz

In this talk, we present dropR, a tool to analyze and visualize dropout, especially in internet-based research. Among other features, dropR turns datasets into visual displays of (1) dropout curves, (2) percent remaining, and (3) dropout statistics across different conditions. It calculates parameters relevant to dropout and survival analysis, such as chi-square values for points of difference, initial drop, confidence bands, and percent remaining in stable states. With automated inferential components, it identifies critical points in dropout and critical differences between dropout curves for various experimental conditions, and it generates the corresponding statistical analyses. Survival tests include chi-square, Kaplan-Meier estimation, and rho family tests. The visual displays in the associated Shiny app are interactive, so users can easily identify regions within a display for further analysis, in both the demo data and custom data provided by the user. It produces accessible (e.g., color-blind friendly), publication-ready output (e.g., PDF, PNG). dropR is made by researchers for researchers and is currently available at https://github.com/mbannert/dropR.
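The "percent remaining" curve at the heart of such displays is simple to compute. The Python sketch below assumes a hypothetical data layout in which each participant is recorded by the last item (or page) they answered, with `n_items` meaning the study was completed. It computes the share still present at each item, overall and per experimental condition. This is only the descriptive curve; dropR's own functions, its survival tests, and its handling of real survey data are not reproduced here.

```python
def percent_remaining(last_item_answered, n_items):
    """Share of participants (in %) still present at each item 1..n_items.

    last_item_answered[i] is the last item participant i answered;
    a value of n_items means the participant completed the study."""
    n = len(last_item_answered)
    if n == 0:
        raise ValueError("no participants")
    curve = []
    for item in range(1, n_items + 1):
        still_there = sum(1 for last in last_item_answered if last >= item)
        curve.append(100.0 * still_there / n)
    return curve


def by_condition(records, n_items):
    """records: iterable of (condition, last_item_answered) pairs."""
    groups = {}
    for condition, last in records:
        groups.setdefault(condition, []).append(last)
    return {cond: percent_remaining(lasts, n_items) for cond, lasts in groups.items()}
```

For four participants who stopped after items 5, 5, 3, and 1 of a five-item study, `percent_remaining([5, 5, 3, 1], 5)` gives `[100.0, 75.0, 75.0, 50.0, 50.0]`. The drop between items 1 and 2 is the kind of "initial drop" a comparison across conditions would then test.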

Speakers

Annika Tave Overlander

M.Sc., University of Konstanz

Annika Tave Overlander began her Ph.D. in Psychological Methods in December 2023. Her research focuses on the development of online tools to assist both researchers and students in acquiring the necessary skills for proper statistical analysis. She is committed to Open Science practices...

Tuesday July 9, 2024 15:10 - 15:30 CEST
Pinzgau + Tennegau

Social sciences

15:30 CEST

Handling Data from Social Science Surveys with the 'Memisc' Package - Martin Elff, Zeppelin Universität, Friedrichshafen

While R provides an excellent infrastructure for advanced statistical data analysis and graphics, it is not by itself well suited to helping social scientists face the typical challenges involved in preparing data from social science surveys. This is one reason why many social scientists stick to commercial software packages such as Stata and SPSS. The open-source package 'memisc' aims to provide a comprehensive infrastructure for preparing social science survey data. It handles variable labels, value labels, and user-defined missing values, and it provides easy ways to recode data and to produce data codebooks. It thus allows social scientists to become independent of commercial software packages.

Speakers

Martin Elff

Prof. Dr., Zeppelin Universität, Friedrichshafen

Martin Elff is a professor of political sociology at Zeppelin University (Friedrichshafen, Germany). He is the author of "Data Management with R: A Guide for Social Scientists" (Sage Publications) and of three R packages published on CRAN. He has published research articles on electoral...

Tuesday July 9, 2024 15:30 - 15:50 CEST
Pinzgau + Tennegau

Social sciences

11:30 CEST

Maintaining the I/O Infrastructure of R: Ten Years of `rio` and `readODS` - Chung-hong Chan, GESIS – Leibniz-Institut für Sozialwissenschaften

In this talk, I will share my experience maintaining the "boring" but arguably important part of R: the input and output (I/O) infrastructure. The focus will be two packages I currently maintain, both of which recently celebrated their tenth anniversaries: `rio` and `readODS`. I will briefly describe what R's (chaotic) I/O infrastructure looked like ten years ago. Then, I will explain how `rio` simplifies I/O tasks with only two functions: import() and export(). I will also talk about `readODS`, which is designed as a silent family member of `rio` for reading and writing OpenDocument Spreadsheets (ODS), a truly open format adopted by various public bodies such as NATO and the EU. Next, I will discuss how `rio` and `readODS` have changed over the last ten years. For example, `readODS` has seen a performance gain of over 1000x and is now a significantly faster and more usable option for reading and writing ODS than the offerings for Python, Julia, and JavaScript. Finally, I will give an outlook on the future of R's I/O infrastructure.
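rio's two-function interface works by inferring the file format from the file extension. In R, `rio::import()` also accepts a `format` argument to override that inference. The Python sketch below illustrates only the extension-dispatch idea, for two standard-library formats. The function names `import_table` and `export_table` are hypothetical, and the sketch does not reproduce rio's behavior or its much larger set of supported formats.

```python
import csv
import json
from pathlib import Path


def _read_csv(path):
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def _write_csv(rows, path):
    if not rows:
        raise ValueError("nothing to export")
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)


def _read_json(path):
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def _write_json(rows, path):
    with open(path, "w", encoding="utf-8") as f:
        json.dump(rows, f)


# Format registry keyed by lower-cased file extension.
READERS = {".csv": _read_csv, ".json": _read_json}
WRITERS = {".csv": _write_csv, ".json": _write_json}


def _handler(table, path):
    ext = Path(path).suffix.lower()
    if ext not in table:
        raise ValueError(f"unsupported file format: {ext!r}")
    return table[ext]


def import_table(path):
    """Read a file, choosing the reader from its extension."""
    return _handler(READERS, path)(path)


def export_table(rows, path):
    """Write rows (a list of dicts), choosing the writer from the extension."""
    _handler(WRITERS, path)(rows, path)
```

Adding a format means registering one reader and one writer. Users keep calling the same two functions, which is the simplification the abstract describes.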

Speakers

Chung-hong Chan

Senior researcher, GESIS – Leibniz-Institut für Sozialwissenschaften

Dr. Chung-hong Chan (PhD University of Hong Kong, 2018) is Senior Researcher in the Department of Computational Social Science, GESIS – Leibniz Institute for the Social Sciences, Cologne, Germany, and External Fellow at the Mannheim Center for European Social Research, University...

Wednesday July 10, 2024 11:30 - 11:50 CEST
Pinzgau + Tennegau

Research software engineering

11:50 CEST

Statistical Software Engineering: A Statistician’s Technical Journey in R - Audrey Yeo, Roche

This talk showcases my one-year journey as a statistician in statistical software engineering, the learning it required, and my outlook for the future.

Speakers

Audrey Yeo

Statistical Software Engineer && Biostatistician, Roche

Wednesday July 10, 2024 11:50 - 12:10 CEST
Pinzgau + Tennegau

Research software engineering

12:10 CEST

Engineering a Reliable R Package for Regulatory Use Using "rpact" as an Example - Friedrich Pahlke & Gernot Wassmer, RPACT

In the ever-evolving world of clinical trial design, the R package "rpact" (available on CRAN and GitHub) has emerged as a pivotal tool for confirmatory adaptive clinical trials, crafted specifically to meet stringent regulatory requirements. This presentation will dive into the core concepts and challenges encountered over the six years since the project's inception, which began with a successful crowdfunding campaign. Our solution to these challenges was a robust validation framework inspired by GAMP 5 principles, incorporating comprehensive validation documentation, tools, and utility packages from the outset. This approach enabled a high level of automation in the validation process, making development feasible with a small team. A key concept in our methodology is the use of template-based unit tests. These templates not only generate "testthat" test cases but also automate the creation of test plans and references to function specifications, with the test protocol linking back to individual test cases. This seamless integration of testing and documentation has made "rpact" a trusted and widely accepted package in the pharmaceutical industry.

Speakers

Friedrich Pahlke

CEO, RPACT

Friedrich Pahlke, with a PhD from the University of Lübeck (2008), has been an independent consultant in computer science, data science, and biostatistics since 2008. Previously, he was a Research Fellow at Lübeck's Institute of Medical Biometry and Statistics. As RPACT's co-founder...

Gernot Wassmer

CEO, RPACT

Gernot Wassmer, PhD, is a statistician and co-founder of RPACT. He received his PhD in 1993 at the Institute of Statistics, University of Munich, and was a Research Fellow at the Institute for Epidemiology, GSF Neuherberg, and the Institute of Medical Statistics, University of Cologne...

Wednesday July 10, 2024 12:10 - 12:30 CEST
Pinzgau + Tennegau

Research software engineering

15:00 CEST

Autovi: Automated Assessment of Residual Plots Using Computer Vision - Weihao Li, Monash University

Visual assessment of residual plots is crucial for evaluating linear regression model assumptions and fit, but accurately interpreting these plots can be challenging. The 'autovi' package provides an automated solution in R by leveraging computer vision models. Taking a residual plot as input, 'autovi' approximates a distance metric that quantifies the divergence of the actual residual distribution from the reference distribution expected under correct model specification. This approximated distance enables formal statistical tests and provides a holistic approach to collectively assess different model assumptions. This talk will introduce the functionality of 'autovi', demonstrate its performance across diverse regression scenarios, and discuss opportunities to extend the package.

Speakers

Weihao Li

Mr, Monash University

Weihao (Patrick) Li, a third-year PhD student in the Department of Econometrics and Business Statistics at Monash University, is actively engaged in research focused on automating visual inference for residual diagnostics. Patrick completed a Bachelor of Commerce majoring in Business...

Wednesday July 10, 2024 15:00 - 15:20 CEST
Pinzgau + Tennegau

Machine learning and AI

15:20 CEST

Mlr3mbo: Modern and Flexible Bayesian Optimization - Lennart Schneider, LMU Munich & Munich Center for Machine Learning (MCML)

Bayesian Optimization has emerged as the de facto standard for optimizing computationally intensive black-box functions. Such functions, for which no information is available beyond the output value for a given input, present significant challenges in domains ranging from hyperparameter optimization in machine learning to applied sciences such as chemical engineering, material sciences, and drug discovery. mlr3mbo offers a modern and versatile approach to Bayesian Optimization in R as part of the mlr3 ecosystem. It provides not only ready-to-use optimization algorithms but also the essential building blocks for easily developing custom Bayesian Optimization algorithms. This flexibility extends to supporting both single- and multi-objective optimization problems, along with handling mixed search spaces that include continuous, categorical, and conditional variables. In this talk, we will showcase mlr3mbo and its key features with practical demonstrations on hyperparameter optimization in machine learning, illustrating its potential to boost efficiency and effectiveness.

Speakers

Lennart Schneider

PhD Student, LMU Munich & Munich Center for Machine Learning (MCML)

Lennart Schneider is pursuing his PhD at LMU Munich's Chair of Statistical Learning and Data Science and the Munich Center for Machine Learning (MCML), under the guidance of Prof. Dr. Bernd Bischl. His research primarily focuses on Hyperparameter Optimization, Neural Architecture...

Wednesday July 10, 2024 15:20 - 15:40 CEST
Pinzgau + Tennegau

Machine learning and AI

15:40 CEST

ML-Based Imputation Methods in R Package VIM: Performance and Considerations - Johannes Gussenbauer & Alexander Kowarik, Statistics Austria; Nina Niederhametner, Statistik Austria

Missing data is a pervasive issue in statistical analysis across various domains. Ignoring missing values or using inappropriate imputation methods can introduce bias and reduce the validity of statistical results. To address the challenge of missing data imputation, we propose the use of novel machine learning algorithms: the R package VIM (Visualization and Imputation of Missing Values) has incorporated machine learning (ML)-based imputation methods, including xgboost and transformer models. This presentation will describe the recent advancements in VIM, with a special emphasis on the performance of these ML models in handling missing data, compare them with more conventional imputation methods, and highlight their advantages and disadvantages. Through real-world examples, we aim to demonstrate the effectiveness of our models in improving accuracy and reliability.

Speakers

Alexander Kowarik

Head of Statistical methods and survey methodology, Statistics Austria

Dr. Alexander Kowarik is head of the methods unit at Statistics Austria and has more than 10 years of experience working at a national statistical institute (NSI). He is an active contributor to the R open source community with a focus on official statistics applications.

Johannes Gussenbauer

Methodologist, Statistics Austria

I studied Mathematics at the University of Technology in Vienna and have been working as a methodologist at Statistics Austria since 2017. My main topics at work cover imputation, calibration, and error estimation for surveys, as well as text classification using R. I contribute to various...

Nina Niederhametner

Methodologist, Statistics Austria

Nina Niederhametner started working as a methodologist at Statistik Austria in November 2023, where her main work centers around imputation and classification using large language models. She also specializes in data privacy and anonymization with special focus on synthetic data...

Wednesday July 10, 2024 15:40 - 16:00 CEST
Pinzgau + Tennegau

Machine learning and AI

10:30 CEST

PACTA: Empowering the Climate Finance Transition with R - Alex Axthelm, RMI

In the urgent pursuit of climate action, the need for effective financial tools to drive sustainable investment has become paramount.
The Paris Agreement Capital Transition Assessment (PACTA) is a forward-looking, science-based analysis helping shift capital flows in greener directions and enabling the financial sector to contribute to the goals of the Paris Agreement.
PACTA offers free tools supporting investors in determining the alignment of their portfolios and loan books with widely accepted climate scenarios.
To date more than 1500 institutions have assessed their portfolios with PACTA, analyzing assets totalling over US$100T.

PACTA also equips governments and regulators to assess the climate alignment of their regulated entities, both individually and at the level of an entire sector.
Our team has supported more than a dozen government entities and regulatory bodies in assessing the climate alignment of their financial sectors.

PACTA, written in R and freely available under the MIT license, stands as a powerful tool in the fight against climate change.
Join us to explore its transformative potential and contribute to the advancement of sustainable finance.

Speakers

Alex Axthelm

Thursday July 11, 2024 10:30 - 10:35 CEST
Pinzgau + Tennegau

Economics + finance + insurance + business, Lightning Talk

10:35 CEST

tRialblazing – advantages of using R in large clinical trials - Piotr Starnawski, Novo Nordisk A/S

Pharmaceutical industry programming has for many years been characterized by "one programming language - take it or leave it". This is reflected in the persistent use of established standard programs and closed-source languages, due to their prevalence within the field.
However, the transition to open source is well underway, and modern languages such as R are becoming more common and accepted. Programming of datasets for large clinical trials in R greatly benefits from using i) modern, scalable infrastructure; ii) large speed gains from parallelization paired with new file formats; iii) integrated version control, and iv) DevOps solutions, just to name a few advantages. The nature of open source itself enables tapping into community solutions, e.g. the pharmaverse packages, and, in return, contributing to them with internally developed code.
This presentation will outline the challenges we have been facing while transitioning to R at Novo Nordisk, the expected and often unexpected gains resulting from that change, and the direction in which, in our opinion, clinical trial programming is headed.

Speakers

Piotr Starnawski

Thursday July 11, 2024 10:35 - 10:40 CEST
Pinzgau + Tennegau

Biostatistics + epidemiology + bioinformatics, Lightning Talk

11:30 CEST

Squat: Statistics for Quaternions Over Time - Aymeric Stamm, CNRS

The study of rotational movement is of paramount importance in robotics as well as in health science. The statistical unit behind measurements of rotational motion is a sequence of 3-dimensional rotations that evolve over time. The goal of {squat} is to make extensions of common statistical methods accessible for the analysis of rotation-valued time series and functional data. The package relies on the Quaternion class from the Eigen library accessed through the {RcppEigen} package. It provides dedicated classes for a single curve as well as a set of curves. Currently, it supports centring, standardisation, visualisation (powered by {ggplot2} and, optionally, {gganimate}), mean and median computation, random sampling, exponential and logarithmic maps to move from and to the tangent space, respectively, smoothing, resampling, distance matrix computation, clustering methods (hierarchical, k-means and dbscan) and principal component analysis. Clustering and PCA also have their dedicated visualisation tools. The package has a dedicated website (https://lmjl-alea.github.io/squat/index.html) as well as a public GitHub repository (https://github.com/LMJL-Alea/squat/).
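The exponential and logarithmic maps are what let ordinary vector-space statistics (such as averaging) be applied to rotations. The sketch below is a hypothetical, dependency-free Python illustration, not the {squat} API (which is R on top of Eigen). It assumes unit quaternions stored as (w, x, y, z) tuples and the convention log(q) = t*u for q = (cos t, sin t * u), so the rotation angle is 2t. The helper names (quat_log, quat_exp, karcher_mean) are our own. The mean shown is a standard iterative Karcher (Fréchet) mean, and {squat} may compute its mean differently.

```python
import math
from typing import Sequence, Tuple

Quat = Tuple[float, float, float, float]  # unit quaternion (w, x, y, z)
Vec3 = Tuple[float, float, float]          # tangent vector at the identity


def quat_mul(a: Quat, b: Quat) -> Quat:
    """Hamilton product a * b."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return (
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    )


def quat_conj(q: Quat) -> Quat:
    """Conjugate, which equals the inverse for a unit quaternion."""
    w, x, y, z = q
    return (w, -x, -y, -z)


def quat_normalize(q: Quat) -> Quat:
    """Rescale to unit norm to counter floating-point drift."""
    n = math.sqrt(sum(c * c for c in q))
    return (q[0] / n, q[1] / n, q[2] / n, q[3] / n)


def quat_log(q: Quat) -> Vec3:
    """Logarithmic map at the identity: (cos t, sin t * u) -> t * u."""
    w, x, y, z = q
    vnorm = math.sqrt(x * x + y * y + z * z)
    if vnorm < 1e-15:
        return (0.0, 0.0, 0.0)
    k = math.atan2(vnorm, w) / vnorm
    return (k * x, k * y, k * z)


def quat_exp(v: Vec3) -> Quat:
    """Exponential map at the identity: t * u -> (cos t, sin t * u)."""
    t = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    if t < 1e-15:
        return (1.0, 0.0, 0.0, 0.0)
    k = math.sin(t) / t
    return (math.cos(t), k * v[0], k * v[1], k * v[2])


def karcher_mean(qs: Sequence[Quat], tol: float = 1e-12, max_iter: int = 100) -> Quat:
    """Iterative mean: average in the tangent space at m, then map back."""
    m = qs[0]
    for _ in range(max_iter):
        acc = [0.0, 0.0, 0.0]
        for q in qs:
            r = quat_mul(quat_conj(m), q)  # q expressed relative to the current mean
            if r[0] < 0.0:
                r = (-r[0], -r[1], -r[2], -r[3])  # q and -q encode the same rotation
            acc = [a + c for a, c in zip(acc, quat_log(r))]
        step = (acc[0] / len(qs), acc[1] / len(qs), acc[2] / len(qs))
        m = quat_normalize(quat_mul(m, quat_exp(step)))
        if math.sqrt(sum(c * c for c in step)) < tol:
            break
    return m
```

For two rotations about the x-axis with half-angles +0.2 and -0.2, karcher_mean converges to the identity quaternion (1, 0, 0, 0), which is what averaging their tangent vectors suggests.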

Speakers

Aymeric Stamm

Research Engineer, CNRS

I’m Aymeric (pronounced M-Rick). I am a research engineer specialised in statistical information. My theoretical research revolves around developing novel statistical methods for analysing complex data, such as manifold-valued data, network-valued data, topological data, connectome...

Thursday July 11, 2024 11:30 - 11:50 CEST
Pinzgau + Tennegau

Statistical modelling

11:50 CEST

Introducing the 'Gasmodel' Package for Generalized Autoregressive Score Models - Vladimír Holý, Prague University of Economics and Business

I present the 'gasmodel' package, designed to facilitate the estimation, forecasting, and simulation of a broad range of generalized autoregressive score (GAS) models. GAS models are a class of observation-driven time series models that employ the score to dynamically update time-varying parameters of the underlying probability distribution. The package supports diverse data types, offers a rich selection of distributions, provides flexible options for specifying dynamics, and allows for the incorporation of exogenous variables.
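A GAS model is driven by a single recursion: the next value of the time-varying parameter is a constant plus a scaled score of the current observation plus a persistence term. The Python below is not the gasmodel API (an R package). It is a minimal sketch assuming a Poisson distribution with a log link, a GAS(1,1) recursion, and score scaling by the inverse Fisher information. The function names gas_poisson_filter and gas_poisson_loglik and the parameter names omega, alpha and phi are our own.

```python
import math
from typing import List, Optional, Sequence


def gas_poisson_filter(
    y: Sequence[int],
    omega: float,
    alpha: float,
    phi: float,
    f0: Optional[float] = None,
) -> List[float]:
    """Return n + 1 filtered log-intensities of a Poisson GAS(1,1) model, starting at f0.

    Recursion: f_{t+1} = omega + alpha * s_t + phi * f_t, where s_t is the
    score of the Poisson log-likelihood w.r.t. f_t scaled by the inverse
    Fisher information.
    """
    if f0 is None:
        # Start at the unconditional mean of f_t (assumes |phi| < 1).
        f0 = omega / (1.0 - phi)
    f = [f0]
    for t, y_t in enumerate(y):
        lam = math.exp(f[t])  # intensity implied by the log link
        score = y_t - lam     # d/df log p(y_t | f_t)
        info = lam            # Fisher information w.r.t. f_t
        s_t = score / info    # scaled score
        f.append(omega + alpha * s_t + phi * f[t])
    return f


def gas_poisson_loglik(y: Sequence[int], omega: float, alpha: float, phi: float) -> float:
    """Log-likelihood of the observations given the filtered parameters."""
    f = gas_poisson_filter(y, omega, alpha, phi)
    # Poisson log-pmf with lambda = exp(f): y*f - exp(f) - log(y!)
    return sum(y_t * f_t - math.exp(f_t) - math.lgamma(y_t + 1) for y_t, f_t in zip(y, f))
```

When every observation equals the current intensity the score is zero, so gas_poisson_filter([1, 1, 1], omega=0.0, alpha=0.3, phi=0.5) stays at [0.0, 0.0, 0.0, 0.0]. Estimation would then amount to maximising gas_poisson_loglik over omega, alpha and phi with a general-purpose numerical optimiser.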

Speakers

Vladimír Holý

Dr., Prague University of Economics and Business

Vladimír Holý is an assistant professor at the Prague University of Economics and Business. His area of expertise is time series analysis.

Thursday July 11, 2024 11:50 - 12:10 CEST
Pinzgau + Tennegau

Statistical modelling

12:10 CEST

Recalibration of Gaussian Neural Network Regression Models: The RecalibratiNN Package - Carolina Musso, Instituto de Pesquisa e Estatística do Distrito Federal

Machine learning has significantly enhanced prediction performance; however, the estimation of uncertainty in these predictions is still a challenge. This issue is particularly pronounced in Artificial Neural Networks (ANNs), where predictions often suffer from poor calibration. Although some methods are available for recalibration, choosing and implementing the appropriate one can be challenging. To address this issue, we introduce the R package recalibratiNN, which provides a computational implementation of a quantile-based post-processing technique for recalibration. The current version of the package includes functions specifically designed for recalibrating Gaussian models (i.e., where the ANN was trained with the Mean Squared Error (MSE) loss function). The method can be applied at any representation layer of the network. The package is based on the technique presented in the recent study "Model-Free Recalibration of Neural Networks" (https://arxiv.org/abs/2403.05756) by co-authors Ricardo Torres, Gabriel Reis, and Guilherme Rodrigues, among others. It leverages information from cumulative probabilities, enabling the generation of Monte Carlo samples from the recalibrated predictive distribution and facilitating both local and global recalibration. The recalibratiNN package also features diagnostic functions to help visualize miscalibration issues. It is readily available on both GitHub (https://github.com/cmusso86/recalibratiNN) and CRAN (https://cran.r-project.org/web/packages/recalibratiNN/).
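The idea of generating Monte Carlo samples from cumulative probabilities can be illustrated in a few lines. The Python below is not the recalibratiNN API (an R package) and covers only the global case. It assumes a Gaussian predictive distribution whose mean is the network output and whose single standard deviation comes from the calibration-set MSE. The PIT values F_i(y_i) of a held-out calibration set are pushed through the inverse CDF of a new prediction, which yields draws from the recalibrated predictive distribution. All function names are hypothetical. A local variant could use only the PIT values of calibration points close to the new input, for example in a representation layer; that step is omitted here.

```python
import math
from statistics import NormalDist
from typing import List, Sequence


def gaussian_sigma_from_mse(y_cal: Sequence[float], mu_cal: Sequence[float]) -> float:
    """Constant predictive standard deviation implied by an MSE-trained model."""
    mse = sum((y - m) ** 2 for y, m in zip(y_cal, mu_cal)) / len(y_cal)
    return math.sqrt(mse)


def pit_values(y_cal: Sequence[float], mu_cal: Sequence[float], sigma: float) -> List[float]:
    """Probability integral transform F_i(y_i) of each calibration observation."""
    return [NormalDist(m, sigma).cdf(y) for y, m in zip(y_cal, mu_cal)]


def recalibrated_samples(
    mu_new: float, sigma: float, pit: Sequence[float], eps: float = 1e-12
) -> List[float]:
    """Map calibration PIT values through the new prediction's inverse CDF.

    The outputs are equally weighted Monte Carlo draws from the globally
    recalibrated predictive distribution. PIT values are clipped to the open
    interval (0, 1) because NormalDist.inv_cdf rejects 0 and 1.
    """
    dist = NormalDist(mu_new, sigma)
    return [dist.inv_cdf(min(max(p, eps), 1.0 - eps)) for p in pit]
```

If a model reports sigma = 1 but its calibration residuals are -2 and +2, recalibrated_samples(0.0, 1.0, pit_values([-2.0, 2.0], [0.0, 0.0], 1.0)) returns approximately [-2.0, 2.0], widening the overconfident predictive distribution to match the observed errors.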

Speakers

Carolina Musso

PhD, Instituto de Pesquisa e Estatística do Distrito Federal

Biologist and Statistician, with a PhD in Ecology and a specialization in Data Science. Working as a data analyst in public service for the past seven years.

Thursday July 11, 2024 12:10 - 12:30 CEST
Pinzgau + Tennegau

Statistical modelling

12:30 CEST

Neural Network-Based Text Classification for International Standardized Codes Using R - Nina Niederhametner, Statistik Austria & Johannes Gussenbauer, Statistics Austria

International standard classifications such as ISCO (for Occupation), ISCED (for Education) and COICOP (for Consumption) serve as pivotal statistical frameworks for the organization and classification of information. In official statistical practice, adherence to these codes is essential for thorough analysis and comparison of findings. Survey respondents typically provide information as unstructured free text, which must subsequently be assigned to a standardized code. This process is often done manually, which is time-consuming and laborious. In our talk, we propose an approach that automates the classification of textual data into various standardized codes using simple mathematical techniques combined with neural network-based language models, utilizing the R packages TensorFlow and Keras. Additionally, we illustrate the development of application programming interfaces (APIs) using plumber, and the deployment of our models through Posit Connect, making them accessible to a broad user base.

Speakers

Johannes Gussenbauer

Methodologist, Statistics Austria

Nina Niederhametner

Methodologist, Statistics Austria

Thursday July 11, 2024 12:30 - 12:50 CEST
Pinzgau + Tennegau

Text data and NLP