useR! 2024: Full Schedule

In Person
8 - 11 July, 2024

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for useR! 2024 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: All times in this schedule are given in Central European Summer Time (CEST, UTC+02:00).

IMPORTANT NOTE: Timing of sessions and room locations are subject to change.

The virtual program will take place on 2 July. Please see the virtual schedule page for more information.

13:30 CEST

Screening and Random Projection Tools for Regression Analysis in R - Laura Vana-Guer, TU Wien

Wednesday July 10, 2024 13:30 - 15:00 CEST

TBD

Random projection is a powerful tool for dimensionality reduction in which a set of high-dimensional points is linearly mapped onto a lower-dimensional space. Random projection matrices can be generated rapidly, are oblivious to the data distribution, maintain interpretability and come with theoretical guarantees that the geometry of the original space is preserved with high probability. When employed in a supervised setting, they can provide a significant reduction in computational cost. However, they tend to overfit, so it is desirable to first eliminate the unimportant predictors, then perform the random projection and estimate the model on the space of the reduced (i.e., projected) predictors. Moreover, ensembles can be built to reduce the uncertainty introduced by the random projection. In this work we propose an R package that implements a variety of random projection and screening tools for regression in high-dimensional settings. The functionality of the package is presented using simulated and real data examples.
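
A minimal sketch of the screen-then-project workflow described above, written in Python with NumPy rather than R and not reflecting the package's actual interface: predictors are screened by absolute marginal correlation with the response, the survivors are multiplied by a data-oblivious Gaussian random matrix, ordinary least squares is fitted on the projected predictors, and predictions are averaged over an ensemble of independent projections. The helper name `screen_project_fit` and its parameters are hypothetical.

```python
import numpy as np


def screen_project_fit(X, y, n_keep, proj_dim, n_models=10, seed=0):
    """Hypothetical helper: correlation screening + Gaussian random projection
    + OLS, averaged over an ensemble of independent projections."""
    rng = np.random.default_rng(seed)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean

    # Screening: keep the n_keep predictors most correlated (in absolute value) with y.
    corr = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12)
    keep = np.argsort(corr)[::-1][:n_keep]

    models = []
    for _ in range(n_models):
        # Data-oblivious projection matrix with i.i.d. N(0, 1/proj_dim) entries.
        R = rng.standard_normal((n_keep, proj_dim)) / np.sqrt(proj_dim)
        Z = Xc[:, keep] @ R  # projected predictors, shape (n, proj_dim)
        beta, *_ = np.linalg.lstsq(Z, yc, rcond=None)
        models.append((R, beta))

    def predict(X_new):
        Xn = (X_new - x_mean)[:, keep]
        # Ensemble: average the predictions of all projected models.
        return y_mean + np.mean([Xn @ R @ beta for R, beta in models], axis=0)

    return predict, keep


# Simulated high-dimensional data: p = 1000 predictors, only the first 3 matter.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 1000))
y = 5 * X[:, 0] + 5 * X[:, 1] + 5 * X[:, 2] + rng.standard_normal(200)
predict, keep = screen_project_fit(X, y, n_keep=20, proj_dim=10)
preds = predict(X[:5])
```

Screening before projecting keeps the projection from mixing hundreds of pure-noise columns into every projected predictor, which is the overfitting concern the abstract raises.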

Speakers

Laura Vana-Guer

PhD, TU Wien

Laura's work focuses on developing methods and statistical software for the analysis of complex data structures such as high-dimensional and multivariate data. She is the co-author of several R packages including mvord, an R package for the analysis of multivariate ordinal data, and...

Big and high-dimensional data, Poster Session

13:30 CEST

Dupseqr: Disentangling Genomic Aberrations Made Easy - Ekaterina Akimova & Philine Hoven, Laboratory for Immunological and Molecular Cancer Research

Wednesday July 10, 2024 13:30 - 15:00 CEST

TBD

Aberrant repair of DNA double-strand breaks is a prominent feature of various cancers. It can result in deletions, duplications, translocations and insertions. In our previous work, we analyzed amplicon-sequencing data with our custom pipeline to detect templated insertions at DNA damage sites (Akimova et al. 2021, doi:10.1093/nar/gkab051). Here we present dupseqr, an R package that bundles several functions for the sequential tracing of insertions, duplications and inversions. On the one hand, dupseqr wraps the existing bash commands for pre-processing FASTQ files and running the BLAST search in a pipe function, followed by precise trimming and filtering of the mapped sequences to identify insertions. On the other hand, it includes a novel function to detect and depict duplications and inversions directly from your DNA sequences; both the input and the output can be adjusted to your initial data structure and your final goal. All in all, dupseqr provides a quick way to elucidate aberrations such as short duplications, inversions and insertions from distant genomic sites using sequencing data.

Speakers

Ekaterina Akimova

Dr. rer. nat., Laboratory for Immunological and Molecular Cancer Research

I completed my PhD at LIMCR, investigating DNA damage in cancer. During this time, I worked on various projects, including the development of R-based analysis pipelines, and found my passion for coding. I finished my doctorate in 2023 but continued my research endeavours as a postdoc...

Philine Hoven

MSc, Laboratory for Immunological and Molecular Cancer Research

I am a PhD student of Natural and Life Sciences, currently working on the characterization of templated sequence insertions in the context of cancer. My work involves wet lab techniques as well as data analytics with R.

Biostatistics + epidemiology + bioinformatics, Poster Session

13:30 CEST

Exploring the Within-Individual Variability of Human Motor Learning Using GAMLSS - Julia Wood, The University of Queensland

Wednesday July 10, 2024 13:30 - 15:00 CEST

TBD

The neural correlates of learning are frequently explored in neuroscience research, typically through learning-induced changes in the mean of a response variable. Motor skill learning can enhance neural communication between the brain and the trained muscle. This communication is typically assessed by inducing muscle contractions in the trained pathway and measuring changes in their mean size over time, with larger measurements suggesting enhanced communication. Motor learning may also improve the efficiency of this communication, possibly reflected by more consistent muscle contractions and a reduction in the within-individual variability of these measurements over time. This study explored how motor skill learning and a subsequent intervention (active vs. placebo) influenced changes in both the mean size and the within-individual variability of these measurements. Effects were estimated by fitting a location-scale model using the GAMLSS package in R. GAMLSS fits a distributional regression model, in which every parameter of the specified distribution (here, location and scale) can be modelled as a function of covariates. The results and analysis pipeline from this study will be discussed, with emphasis on the utility of the GAMLSS model in this research.
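
The location-and-scale idea can be illustrated without GAMLSS. The following is a simplified Python sketch under stated assumptions: a normal response, a single binary covariate (e.g. before vs. after training) in both the mean and the log standard deviation, and no repeated-measures structure, unlike the actual study. With that one binary covariate the model is saturated, so the maximum-likelihood estimates reduce to per-group means and per-group ML standard deviations. The function name is hypothetical and this is not the gamlss package's interface.

```python
import numpy as np


def binary_location_scale_mle(y, group):
    """Normal model with mu = a0 + a1 * group and log(sigma) = b0 + b1 * group.

    With a single binary covariate the MLE is the per-group mean and the
    per-group ML standard deviation (divisor n, not n - 1)."""
    y, group = np.asarray(y, dtype=float), np.asarray(group)
    mu = [y[group == g].mean() for g in (0, 1)]
    sigma = [np.sqrt(np.mean((y[group == g] - mu[g]) ** 2)) for g in (0, 1)]
    return {
        "a0": mu[0], "a1": mu[1] - mu[0],                           # location effects
        "b0": np.log(sigma[0]), "b1": np.log(sigma[1] / sigma[0]),  # log-scale effects
    }


# Simulated example: training raises the mean and halves the variability.
rng = np.random.default_rng(0)
group = np.repeat([0, 1], 5000)
y = np.where(group == 0, rng.normal(10.0, 1.0, 10000), rng.normal(12.0, 0.5, 10000))
est = binary_location_scale_mle(y, group)
```

A negative `b1` signals reduced variability after training, which is the kind of effect a mean-only model cannot capture.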

Speakers

Julia Wood

Miss, The University of Queensland

After working as an R&D chemist for several years, I became intrigued by why we sleep and how we form new memories. This inspired me to pursue a doctoral path in human sleep and memory research. During my PhD, I have discovered deep interests in data analysis, statistical modelling...

Biostatistics + epidemiology + bioinformatics, Poster Session

13:30 CEST

MINT+: Web App with R Brains for SDTM Automation - Magdalena Krochmal & Adam Forys, Roche

Wednesday July 10, 2024 13:30 - 15:00 CEST

TBD

MINT+ is a web application that automates SDTM (Study Data Tabulation Model) creation in clinical research. At its core, a set of R packages powers the entire solution. A React user interface lets users create custom SDTM mapping specifications that accommodate diverse study requirements. With DocumentDB for data storage, MINT+ enables easy metadata sharing and reuse across studies, significantly reducing workload and improving accuracy.
During this session, we will explore the R-based components of MINT+ that are responsible for data processing and backend processes: the "rmint.sdtm" package automates SDTM mappings, "rsaffron.api" serves as the backend API, and "roak" allows customization of mappings. With these packages, users can address the complex scenarios that often arise when creating SDTM mappings, which makes R packages the preferred choice for overcoming such industry challenges.
With advanced algorithms, a user-friendly interface and seamless integration, MINT+ streamlines the SDTM creation workflow, greatly reducing the time and effort required.

Speakers

Magdalena Krochmal

Senior Data Scientist, Roche

Magdalena Krochmal is a Senior Data Scientist based in Basel, Switzerland. With a background in biomedical engineering and a Ph.D. in bioinformatics, she has spent three impactful years at Roche. Magdalena is an expert R developer specializing in SDTM automation. Her work centers...

Adam Forys

Mr., Roche

Adam is a Principal Data Scientist at Roche. He is dedicated to building R packages that empower teams working on SDTM. He is committed to collaboration and enjoys guiding others in overcoming technical obstacles and optimizing their data science workflows.

Biostatistics + epidemiology + bioinformatics, Poster Session

13:30 CEST

Use of R in Calibration of Infectious Disease Models - Nicole Swartwood, Harvard TH Chan School of Public Health

Wednesday July 10, 2024 13:30 - 15:00 CEST

TBD

Calibration approaches are commonly used in infectious disease modeling, but there has been little study of how these techniques are used within the field. At the same time, R is increasingly used by epidemiologists to understand disease dynamics. As part of a larger scoping review investigating the distribution of calibration methods in models of HIV, TB and malaria, we will collect data on the programming languages and packages/libraries cited in published manuscripts. We aim to identify the calibration strategies with which R is most commonly used and, ultimately, any gaps in, and potential for development of, the calibration packages available in R. We also aim to identify any association with disease, model goal and/or reproducibility.

Speakers

Nicole Swartwood

Senior Research Analyst, Harvard TH Chan School of Public Health

Nicole Anne Swartwood is an infectious disease modeler at the Harvard TH Chan School of Public Health. Her work focuses on tuberculosis and COVID-19 in the United States. She co-founded the Harvard R User Group and remains a co-organizer. She is passionate about empowering junior...

Biostatistics + epidemiology + bioinformatics, Poster Session

13:30 CEST

The Rbanism Community: Empowering Urbanists to Use Research Software Effectively and with Confidence - Claudiu Forgaci, Delft University of Technology

Wednesday July 10, 2024 13:30 - 15:00 CEST

TBD

The Rbanism community aims to empower urbanism researchers, students, educators and practitioners to use open-source software and related open-science practices effectively and with confidence. It raises awareness, stimulates engagement and builds capacity by demonstrating the benefits of reproducibility, automation and scalability. Rbanism was initiated in 2021 by a group of R users in the Department of Urbanism at TU Delft and has since grown into an international community of 70+ members. Our mission is to cultivate scientific computing, data science, computational thinking and software management skills applied to urbanism. To that end, our activities include workshops (many of them run as part of the Carpentries), challenges with prizes, and meetups. In addition to in-person activities, we organise online events open to our international community members. These various forms of engagement reflect our commitment to inclusion and accessibility. The Rbanism community is supported by the Netherlands eScience Center, the Open Science Community Delft, and the Department of Urbanism and the Central Library at TU Delft. Website: rbanism.org

Speakers

Claudiu Forgaci

Assistant Professor of Urban Design and Analytics, Delft University of Technology

I am an assistant professor of urban design and analytics at TU Delft, passionate about asking spatial and non-spatial questions with R. I co-initiated Rbanism, a community of R users that aims to empower urbanism researchers, students, educators and practitioners to use open-source...

Community and outreach, Poster Session

13:30 CEST

Confusion Matrices of Any Size with Number-Based Color Intensities Visualized Easily with R! - Lubomír Štěpánek, First Faculty of Medicine, Charles University, Prague & Faculty of Informatics and Statistics, Prague University of Economics and Business

Wednesday July 10, 2024 13:30 - 15:00 CEST

TBD

A confusion matrix is a crucial tool for evaluating predictive models, as it compares predicted values against actual observations. While R offers several packages for constructing confusion matrices, such as caret, mlearning and ConfusionTableR, their options for customizing the color representation are often limited, even though such customization is frequently requested in papers and reports, in both academic publishing and business practice. Common methods like heatmap(), called on top of the table() function, can produce misleading color shades that do not accurately reflect the underlying data. Other solutions, such as building the plot by hand with ggplot2 or similar packages, require extensive custom coding and can be time-consuming. To address this gap, we have developed a versatile graphical function that lets users customize the visualization of confusion matrices with a single line of code. This function can be seamlessly integrated into R workflows and has the potential to be developed further into a standalone R package for broader use. The source code and examples for this functionality can be found in our GitHub repository, https://github.com/lstepanek/confusionMatrices.
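
The abstract does not spell out the function's interface, so the following is only a minimal Python sketch of the underlying idea, not the code from the linked repository. Base R's heatmap() rescales rows by default and reorders rows and columns by a dendrogram, which is one way shades can stop matching counts. Here, instead, each cell's color intensity is made directly proportional to its count, blending linearly from white (count 0) to a chosen base color (the maximum count), so shades are comparable across the whole matrix. The function names and the default base color are illustrative assumptions.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def cell_colors(matrix, base_rgb=(0, 90, 181)):
    """Hex color per cell; intensity is proportional to count / max count."""
    peak = max(max(row) for row in matrix) or 1  # avoid division by zero
    colors = []
    for row in matrix:
        hex_row = []
        for n in row:
            t = n / peak  # 0 -> white, 1 -> base color
            rgb = [round(255 + t * (c - 255)) for c in base_rgb]
            hex_row.append("#{:02x}{:02x}{:02x}".format(*rgb))
        colors.append(hex_row)
    return colors


labels = ["a", "b", "c"]
cm = confusion_matrix(["a", "a", "b", "b", "b", "c"],
                      ["a", "b", "b", "b", "c", "c"], labels)
colors = cell_colors(cm)
```

The resulting hex codes can be handed to any plotting library as cell fills, with the counts printed on top.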

Speakers

Lubomír Štěpánek

Dr., First Faculty of Medicine, Charles University, Prague & Faculty of Informatics and Statistics, Prague University of Economics and Business

I hold M.Sc. and Ph.D. degrees in Statistics, an M.D. in General Medicine, and I'm pursuing a Ph.D. in Biomedical Informatics. As an assistant professor at Charles University and Prague University of Economics and Business, I specialize in survival analysis, machine learning, computational...

Data visualisation, Poster Session

13:30 CEST

Openstatsguide - Minimum Viable Good Practices for High Quality Statistical Software Packages - Daniel Sabanés Bové, RCONIS

Wednesday July 10, 2024 13:30 - 15:00 CEST

TBD

The success of the R programming language is largely due to the ease of creating and sharing R packages. We propose an opinionated framework called “openstatsguide”, published at openstatsware.org/guide.html, which can guide R package developers towards a minimum set of good practices. As far as we know from our literature search, this is the first attempt at providing a small and concise set of rules for package developers. The guide is not limited to R: it can also be applied to other functionally oriented programming languages used in data science, and we give examples for R, Python and Julia. Rather than writing a full and detailed how-to guide, we keep “openstatsguide” short and high-level, thus lowering the barrier to entry for novice and seasoned developers alike. We hope that this guide can increase the adoption of good software engineering practices in the statistics community. In this talk we describe the motivation and scope of “openstatsguide”, its relationship with existing work, the set of good practices, the maintenance model, and ideas for future complementary guides produced by the openstatsware.org working group.

Speakers

Daniel Sabanés Bové

Ph.D., RCONIS

Daniel Sabanés Bové studied statistics and obtained his PhD in 2013. He started his career with 5 years at Roche as a biostatistician, then worked for 2 years at Google as a Data Scientist, before rejoining Roche in 2020, where he founded and led the Statistical Engineering team. Daniel...

Efficient programming, Poster Session

13:30 CEST

Tidy and Reproducible Projects with the Cookiecutter R Package - Felix Henninger, Ludwig Maximilian University of Munich

Wednesday July 10, 2024 13:30 - 15:00 CEST

TBD

Best practices for reproducible analyses help to make our work easier and more reliable. However, setting up a good analysis environment is often an initial hurdle, and this task becomes progressively harder as work takes shape and grows in complexity. To address this, we present cookiecutter, an R package and RStudio plugin that follows the popular Python standard (Greenfeld et al., 2022) for creating project templates. It helps create structured work environments that adhere to best practices and build on common helpers (e.g. workflow tools), while leaving room for flexibility and customisation through a guided setup wizard. Users with more specialised needs can adapt, create and (optionally) publish their own templates, contributing back to the wider data science community. Our goal is to encourage researchers and analysts to structure their projects from the outset, using accessible templates that support them in creating uncluttered projects and organised workflows. Ultimately, we hope that this will increase the adoption of best practices and, more generally, lead to more robust research.

Speakers

Felix Henninger

Research Software Engineer, Ludwig Maximilian University of Munich

Felix makes better science easier. He builds tools, educates and advocates, to help improve how we collect and analyse data. Felix is currently a graduate student and Research Software Engineer at the Social Data Science and AI Lab (SODA), Ludwig Maximilian University of Munich.

Efficient programming, Poster Session

13:30 CEST

A New Correlation-Based Fuzzy Cluster Validity Index with UniversalCVI R Package - Onthada Preedasawakul, King Mongkut’s University of Technology Thonburi

Wednesday July 10, 2024 13:30 - 15:00 CEST

TBD

The optimal number of clusters is one of the main concerns when applying cluster analysis. Several cluster validity indexes (CVIs) have been introduced to address this problem. However, in some situations, more than one number of clusters is a plausible final choice. In this study, we introduce a fuzzy CVI known as the Wiroonsri–Preedasawakul (WP) index. This index is based on the correlation between the actual distance between a pair of data points and the distance between the adjusted centroids with respect to that pair. Overall, the WP index outperforms most traditional indexes in terms of efficiency and in detecting secondary options. Moreover, our index remains effective even when the fuzziness parameter m is set to a large value. The R package used in this work, UniversalCVI, is available at https://CRAN.R-project.org/package=UniversalCVI.
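
The abstract describes the WP index only at a high level; the exact definition of the "adjusted centroids" is given in the authors' work and the UniversalCVI package, not here. The following Python sketch only illustrates the general correlation-based idea under a loudly stated assumption: each point's adjusted centroid is taken to be the membership-weighted average of the cluster centroids, and the score is the Pearson correlation between pairwise data distances and pairwise adjusted-centroid distances. It is not the WP index, and both function names are hypothetical.

```python
import numpy as np


def fcm_memberships(X, V, m=2.0):
    """Fuzzy c-means memberships of points X (n, d) to centroids V (c, d)."""
    D = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=-1) + 1e-12
    # u_ik = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1))
    ratio = (D[:, :, None] / D[:, None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=2)


def correlation_validity(X, U, V):
    """Correlation between pairwise data distances and pairwise distances of
    'adjusted centroids' (ASSUMED here to be U @ V; NOT the WP definition)."""
    A = U @ V
    iu = np.triu_indices(len(X), k=1)  # each unordered pair once
    d_data = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)[iu]
    d_cent = np.linalg.norm(A[:, None, :] - A[None, :, :], axis=-1)[iu]
    return np.corrcoef(d_data, d_cent)[0, 1]


# Two well-separated groups and their true centers.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (20, 2)), rng.normal(5.0, 0.5, (20, 2))])
V = np.array([[0.0, 0.0], [5.0, 5.0]])
U = fcm_memberships(X, V)
score = correlation_validity(X, U, V)
```

When the partition matches the data's structure, pairs that are far apart in the data are also far apart in centroid space, so the correlation is high; comparing such scores across candidate numbers of clusters is the general use of a CVI.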

Speakers

Onthada Preedasawakul

Undergraduate student in Statistics, King Mongkut’s University of Technology Thonburi

Onthada Preedasawakul is currently pursuing her B.A. in Statistics at the Department of Mathematics, Faculty of Science, King Mongkut’s University of Technology Thonburi, Bangkok, Thailand. She is a student member of the Mathematics and Statistics with Applications (MaSA...

Machine learning and AI, Poster Session

13:30 CEST

SpICE: An Interpretable Method for Spatial Data - Natalia da Silva, Universidad de la República, UDELAR

Wednesday July 10, 2024 13:30 - 15:00 CEST

TBD

Statistical learning methods are widely used to tackle complex problems due to their flexibility, good predictive performance and ability to capture complex relationships among variables. One of the main drawbacks of statistical learning is the lack of interpretability of the results. Interpretable statistical learning methods are necessary for obtaining a deeper understanding of these models. Specifically, in problems where spatial information is relevant, combining interpretable methods with spatial data can provide a better understanding of the problem and an improved interpretation of the results. This presentation focuses on the individual conditional expectation plot (ICE plot), a model-agnostic method for interpreting statistical learning models, and on combining it with spatial information. An extension of the ICE plot is proposed in which spatial information is used as a restriction to define spatial ICE (SpICE) curves (https://github.com/natydasilva/SpICE).
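
ICE curves themselves are simple to compute. A minimal Python sketch, assuming the fitted model is exposed as a `predict` function on a NumPy array: for every observation, the feature of interest is swept over a grid while the other features are held at their observed values. How SpICE uses spatial information as a restriction is not detailed in the abstract; averaging curves within a spatial region label (the hypothetical `mean_curve_by_region`) is shown only as one illustrative way to bring location into the picture.

```python
import numpy as np


def ice_curves(predict, X, feature, grid):
    """Row i is observation i's ICE curve: predictions as column `feature`
    is set to each grid value while the other columns stay as observed."""
    curves = np.empty((X.shape[0], len(grid)))
    for j, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature] = value
        curves[:, j] = predict(X_mod)
    return curves


def mean_curve_by_region(curves, regions):
    """Hypothetical spatial summary: average ICE curve per region label."""
    return {r: curves[regions == r].mean(axis=0) for r in np.unique(regions)}


# Toy model whose effect of feature 0 depends on a region indicator in column 1.
def predict(X):
    return X[:, 0] * (1.0 + 2.0 * X[:, 1])


X = np.array([[0.5, 0.0], [1.0, 0.0], [0.2, 1.0], [0.8, 1.0]])
regions = np.array(["north", "north", "south", "south"])
grid = np.linspace(0.0, 1.0, 5)
curves = ice_curves(predict, X, feature=0, grid=grid)
by_region = mean_curve_by_region(curves, regions)
```

In this toy model the slope of the curves is 1 in one region and 3 in the other, a heterogeneity that a single averaged partial dependence curve would hide.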

Speakers

Natalia da Silva

Assistant Professor, Universidad de la República, UDELAR

I am an Assistant Professor in the Department of Statistics at the Universidad de la República. I earned my Ph.D. degree in Statistics from Iowa State University in July 2017, under the supervision of Di Cook and Heike Hofmann. My research interests include supervised learning methods...

Machine learning and AI, Poster Session

13:30 CEST

Using Statistical Models to Generate Optimization Problems - Florian Schwendinger, Quintik - Technologies

Wednesday July 10, 2024 13:30 - 15:00 CEST

TBD

Optimization benchmark sets are commonly used to evaluate the quality and speed of optimization solvers. These problems are typically collected from real-world applications. We suggest using statistical models to generate optimization problems automatically. This has two advantages. First, the data-generating process of a statistical model is typically well known, so it is easy to generate data for the model and then transform the data into an optimization problem. Second, properties such as convexity and unboundedness are typically well known for statistical models.
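
As one concrete instance of the idea, a linear regression model can be turned into an unconstrained quadratic program. A minimal Python sketch, assuming a Gaussian linear model: simulate data from the known data-generating process, then form the least-squares objective 0.5 * b'Qb + c'b with Q = X'X and c = -X'y. Because Q is positive semidefinite the problem is convex, and with more observations than linearly independent predictors it is strictly convex and bounded below, so the simulated coefficients give a sanity check on any solver's answer. The function name is hypothetical; the authors' generators may differ.

```python
import numpy as np


def least_squares_qp(n, p, noise_sd=1.0, seed=0):
    """Simulate y = X beta + eps and return the QP data (Q, c) of
    min_b 0.5 * b'Qb + c'b, which is the least-squares problem up to a constant."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    beta_true = rng.standard_normal(p)
    y = X @ beta_true + noise_sd * rng.standard_normal(n)
    Q = X.T @ X   # positive semidefinite, so the problem is convex
    c = -X.T @ y
    return Q, c, beta_true


Q, c, beta_true = least_squares_qp(n=500, p=5)
# For an unconstrained convex QP the minimizer solves Q b = -c.
b_hat = np.linalg.solve(Q, -c)
```

Varying `n`, `p` and `noise_sd`, or adding constraints such as non-negativity, yields families of problems whose properties are known in advance.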

Speakers

Florian Schwendinger

Dipl.-Ing. PhD, Quintik - Technologies

Has written several R packages on different topics.

Numerical methods, Poster Session

13:30 CEST

CRANhaven - Your backup repository for recently archived CRAN packages - Lluís Revilla, IrsiCaixa & Henrik Bengtsson, University of California San Francisco (UCSF)

Wednesday July 10, 2024 13:30 - 15:00 CEST

TBD

The Comprehensive R Archive Network (CRAN) provides the R community with more than 20,000 well-tested, community-contributed R packages. Trust and correctness are cornerstones of R, which is why all CRAN packages undergo a rich set of checks, both when first submitted and daily thereafter.

New checks are regularly added to R, which means existing packages may start failing. If the issues are severe enough, the CRAN Team asks the maintainer to submit a corrected version, typically within two weeks. If the package is not updated in time, it is “archived” and is no longer available via traditional installation methods. As there is no public notice ahead of time, archiving of packages is a sudden, disruptive and sometimes blocking event for users and developers, resulting in wasted time and resources.

We have studied the archiving and unarchiving of CRAN packages. We will present the most common reasons for packages being archived, and how often and when they are unarchived. Based on these findings, we propose CRANhaven (https://www.cranhaven.org), a package repository designed to mitigate the negative impact that suddenly archived packages have on the community.

Speakers

Lluís Revilla

Dr, IrsiCaixa

Bioinformatician at IrsiCaixa. Interested in R packages quality and R repositories.

Henrik Bengtsson

Henrik Bengtsson, University of California San Francisco (UCSF)

UCSF, R Foundation, R Consortium, MSc in Computer Science, PhD in Mathematical Statistics, Applied, large-scale research in Bioinformatics and Genomics. R since 2000.

Open and reproducible science, Poster Session

13:30 CEST

Get Rolling with R in the Public Sector - Thomas Knecht & Philipp Bosch, Statistical Office of the Canton of Zurich

Wednesday July 10, 2024 13:30 - 15:00 CEST

TBD

Creating an R process for publishing data and deploying it for other departments sounds like a mundane task, hardly worth mentioning. In the public administration of the Canton of Zurich, however, it is still the exception.

Based on a recently completed collaborative project, we show why this is a milestone on our journey towards a more digitized and data-driven administration, and how this transformation has unfolded over the last decade.

Without giving too much away: building an internal community around R has proven to be at least as important as configuring proxies and Git settings in coordination with a central IT department.

Today, as a result of our efforts, the majority of the 10,000 employees of the Canton of Zurich can install R out of the box through the central IT department.

Speakers

Thomas Knecht

Data Scientist, Statistical Office of the Canton of Zurich

Climbing mountains - crunching data.

Philipp Bosch

Data Scientist, Statistical Office of the Canton of Zurich

Computational Political Scientist by ❤️. Data Scientist by training & job. Data4Good activist @CorrelAid. 21st century public servant @Kanton Zürich.

Public sector and NGO, Poster Session

13:30 CEST

Open Time Series Initiative – Human-Friendly, Machine-Readable Time Series - Matthias Bannert & Minna Heim, ETH Zurich

Wednesday July 10, 2024 13:30 - 15:00 CEST

TBD

Because publications by public data providers target a broad audience, their datasets are often inconvenient to use for research.
To mitigate this problem, the opentimeseries R package provides the time series and official statistics communities with reusable code to conveniently source data from public sources. By splitting data and metadata into two different files, a long-format CSV file for the data and a JSON file for multilingual metadata, the package generates output that is accessible to humans (and their favorite spreadsheet software) _and_ convenient for machines to ingest.
This output is the starting point not only for intertemporal comparisons but also for versioning time series, as needed for real-time analysis or the evaluation of forecasts. For the first time, the package open-sources a data ingestion framework that has proven itself through long-term use in monitoring the Swiss economy at the KOF Swiss Economic Institute at ETH Zurich. We explicitly chose the R ecosystem, with its great documentation and boilerplate tools, to encourage dataset maintenance and community contributions across the different fields that use public data for research.
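
A minimal Python sketch of the data/metadata split described above, using only the standard library: observations go into a long-format CSV (one row per series, date and value) and labels in several languages go into a separate JSON document. The column names (`id`, `date`, `value`), the series identifier and the metadata layout are illustrative assumptions, not the opentimeseries format.

```python
import csv
import io
import json


def to_long_csv_and_json(series, meta):
    """series: {series_id: {date: value}}; meta: {series_id: {lang: label}}.
    Returns (csv_text, json_text)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["id", "date", "value"])  # long format: one observation per row
    for sid in sorted(series):
        for date in sorted(series[sid]):
            writer.writerow([sid, date, series[sid][date]])
    # ensure_ascii=False keeps umlauts and accents readable for humans.
    return buf.getvalue(), json.dumps(meta, ensure_ascii=False, indent=2)


csv_text, json_text = to_long_csv_and_json(
    {"gdp_ch": {"2023-01-01": 1.2, "2023-04-01": 0.3}},  # hypothetical series
    {"gdp_ch": {"en": "GDP growth", "de": "BIP-Wachstum"}},
)
```

Because each row is self-describing, new series or new vintages can be appended without changing the file's shape, which is what makes the long format convenient for versioning.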

Speakers

Minna Heim

Ms., RSEED at KOF Lab at ETH Zurich

Minna Heim is an economics student at the University of St. Gallen and works as a research assistant and for organisational development at the Research Software Engineering and Economic Data (RSEED) Section at KOF Lab at ETH Zurich.

Matthias Bannert

Dr., Research Software Engineering and Economic Data (RSEED) at KOF Lab, ETH Zurich

Matthias Bannert gained his data science and data engineering experience at ETH Zurich in more than a decade of working for the KOF Swiss Economic Institute. Today, he works as a data engineering expert advisor at cynkra and supports ETH as a section lead in the innovation-minded KOF Lab. In...

Public sector and NGO, Poster Session

13:30 CEST

An R-Dominated Workflow to Produce 850.000 Feedback Reports to Schools in (Almost) Real Time - Gabriele von Eichhorn & Elisabeth Rothe & Moritz Friedrich, Federal Institute for Quality Assurance of the Austrian School System (IQS); Roman Freunberger, http

Wednesday July 10, 2024 13:30 - 15:00 CEST

TBD

Educational large-scale assessment in Austria underwent a major change in 2022, when the system moved from one to multiple tests per year for several subjects and grades (iKMPLUS). The major challenge for the test developers was providing immediate feedback of test results to test takers and teachers. To maintain high psychometric standards, we combined pre-calibration of the test booklets (for scaling and for cohort-specific reference scores) with automatically sourced R scripts for coding, analysing and reporting the test data. R was used in all of these processes: TAM for IRT scaling, dplyr and tidyr for convenient data wrangling, doParallel for handling the workload, and R Markdown and ggplot2 for reporting and monitoring. In sum, our R-based process for reporting nationwide test results for primary and secondary school pupils produces 850,000 reports every year for teachers, pupils and principals. Here, we present our main principles and experiences in reporting test results with an R-based programming approach, with special emphasis on the underlying psychometric analyses, the subsequent automated generation of graphs, and the process in general.

Speakers

Elisabeth Rothe

Dipl.-Psych., Federal Institute for Quality Assurance of the Austrian School System (IQS)

Elisabeth Rothe has worked as a Psychometrician at IQS (Federal Institute for Quality Assurance of the Austrian School System) since 2018. Her focus has been on test design, psychometric evaluation of study designs, standard setting and reporting for various audiences. She was leader...

Gabriele von Eichhorn

Psychometrician, Federal Institute for Quality Assurance of the Austrian School System (IQS)

Gabriele von Eichhorn obtained a Master’s degree in Psychology and a Bachelor’s degree in Educational Science from the University of Salzburg. Having gained experience in different psychometric and diagnostic environments, she is currently a Psychometrician at the Federal Institute...

Moritz Friedrich

MSc, Federal Institute for Quality Assurance of the Austrian School System (IQS)

Studied psychology; psychometrician at IQS.

Roman Freunberger

PhD, https://www.iqs.gv.at/

Studied psychology; psychometrician at IQS.

Quarto and reporting, Poster Session

13:30 CEST

Distributed GxP Workloads for R - Magnus Mengelbier, Limelogic AB

Wednesday July 10, 2024 13:30 - 15:00 CEST

TBD

The use of R for GxP work within the life sciences is broad, constantly evolving and powerful. As the user base grows across the organization and R capabilities are added and evolved, you are no longer managing a single environment for a particular use case. The workloads naturally become distributed across multiple environments, with different architectures tailored to their particular role and use in the business.

We consider a set of common environments and their architectures, and show how a little bit of {plumber} can enable a simple-to-manage R architecture across dissimilar environments, even those that do not currently, or simply cannot, support the use of R. This new approach is easily extendable to Good Clinical Practice, and any of the other GxP domains, with a few simple processes and controls.

Speakers

Magnus Mengelbier

Managing Director, Limelogic AB

Magnus is currently the Managing Director of Limelogic, a contributor, collaborator and independent consultant based in southern Sweden with over 25 years of experience in the Life Science industry. A keen advocate of simple programming approaches with a focus on GxP, compliance...

R workflow + deployment + production, Poster Session

13:30 CEST

A reproducible analysis of CRAN Task Views to understand the state of an R package ecosystem - Hugo Gruson, data.org

Wednesday July 10, 2024 13:30 - 15:00 CEST

TBD

The research community is increasingly aware of the need to apply software engineering best practices to scientific software. This does not mean, however, that we should discard the huge ecosystem of existing tools with large, well-established user bases. Instead, efforts should be dedicated to integrating best practices into existing tools where possible. But this can only be done if we have a clear idea of the current state of the ecosystem, with its gaps and needs.
In this presentation, I will describe the analysis we have conducted on the ecosystem of R packages for epidemiology, as represented by the CRAN Task View in Epidemiology. It allows us to draw a picture of where efforts to support this ecosystem should focus. It also informs future training needs for this research community and maps a path for external contributions to packages that welcome them.
Importantly, this analysis is made reproducible and applicable to any CRAN Task View out of the box, which allows research and software communities from other fields to conduct the same assessment on their own domain.
The live analysis is available at https://epiverse-connect.github.io/ctv-analysis/

Speakers

Hugo Gruson

Lead Software Architect, data.org

Hugo is a professional developer and happy R community member. He has developed and maintains packages across many fields, such as evolutionary biology, epidemiology, statistics or reproducible science and contributes to the community via blog posts or pull requests to existing packages. Over...


Wednesday July 10, 2024 13:30 - 15:00 CEST
TBD

Research software engineering, Poster Session

13:30 CEST

Ambiorix - a web framework for R - John Coene, The Y Company

Wednesday July 10, 2024 13:30 - 15:00 CEST

TBD

The {ambiorix} package is a web framework for R, inspired by express.js, which allows building traditional web applications and RESTful APIs.

Speakers

John Coene

Co-Founder, The Y Company

Wednesday July 10, 2024 13:30 - 15:00 CEST
TBD

Shiny + dashboards + web apps, Poster Session

13:30 CEST

StatLearning: A Shiny App for Practicing Statistical Hypothesis Testing - Juan Claramunt, Leiden University

Wednesday July 10, 2024 13:30 - 15:00 CEST

TBD

With this poster, we introduce StatLearning, our Shiny app for practicing diverse statistical tests. The app is a step forward in digital exercising: while many digital exercises only provide data and questions, StatLearning provides an individualized learning path. To do so, we analyzed the different types of students and their preferred learning styles (reading, watching, listening, etc.). In addition to the data and questions, we provide users with statistical definitions and help windows that include written explanations and videos. Moreover, we take students' diverse abilities into account, allowing students with advanced statistical skills to skip steps while providing extra help for students with weaker skills. In this way, we aim for students with low abilities to reach advanced skills through practice. We want to introduce StatLearning to other R users at the conference who might find it useful for their courses, and we would like to receive feedback to improve the app. We have used it for three years and keep improving it every year, so any new suggestions are welcome.

Speakers

Juan Claramunt

Specialist in scientific information, Leiden University

Bachelor in Mathematics at Universidad de Cantabria, Utrecht University & Brown University. Master in Methodology and Statistics for the Behavioural, Biomedical, and Social Sciences, & European Master in Official Statistics (Utrecht University). Scientific information specialist at...

Wednesday July 10, 2024 13:30 - 15:00 CEST
TBD

Shiny + dashboards + web apps, Poster Session

13:30 CEST

Volcano.View: Building Dashboards That Aren't Slow - Michael Galanakis, Hasselt University / Novo Nordisk

Wednesday July 10, 2024 13:30 - 15:00 CEST

TBD

We introduce the R package volcano.view, which allows users to create interactive dashboards for proteomics data. Proteomics results are high-dimensional, making interactive dashboards crucial for navigating thousands of proteins. However, the many data points that must be rendered led to slow load times when using the R package Shiny. We therefore showcase an alternative approach using the JavaScript libraries React and D3, and we compared the performance of volcano.view and Shiny using Google's Lighthouse tool. The package was developed to visualize results from the SomaScan proteomics assay and is organized into modules that add different functionality: it can include gene set enrichment results as well as compare results from different studies. Although the package was developed for proteomics data, it can easily be used more broadly. Using React to develop a dashboard is more time-consuming and requires more training for data scientists than Shiny, but it offers tools for greatly improving performance. Researchers should be aware of this trade-off between performance and development time.

Speakers

Michael Galanakis

Mr, Hasselt University / Novo Nordisk

Michael is an industrial PhD fellow working with Novo Nordisk and Hasselt University. He has experience working as a statistician in both clinical trials and epidemiological studies, where he has co-authored 9 peer-reviewed publications. He completed his bachelor's in mathematics...

Wednesday July 10, 2024 13:30 - 15:00 CEST
TBD

Shiny + dashboards + web apps, Poster Session

13:30 CEST

WebApp Studio for productionizing Shiny applications, R Markdown reports and R Plumber APIs

Wednesday July 10, 2024 13:30 - 15:00 CEST

TBD

A web app studio that lets R developers build, test and deploy enterprise-level Shiny dashboards, R Markdown reports and Plumber APIs at large scale (more than 100 users) in a very intuitive fashion.

Speakers

Farid Azouaou

CTO, thaink²

A data enthusiast with more than 10 years of experience in the fields of analytics & AI for industry. I have worked for different companies in different fields, most recently at Mercedes-Benz Mobility before joining thaink² as CTO & Co-founder. I humbly consider myself an expert in...

Wednesday July 10, 2024 13:30 - 15:00 CEST
TBD

Shiny + dashboards + web apps, Poster Session

13:30 CEST

SurveymonkeyR: Tools for Communicating with SurveyMonkey's API - Yasuto Nakano, Kwansei Gakuin University

Wednesday July 10, 2024 13:30 - 15:00 CEST

TBD

The purpose of this presentation is to propose a useful tool, SurveymonkeyR, for social researchers using the online survey service SurveyMonkey (surveymonkey.com). SurveymonkeyR is an R package containing functions to perform tasks such as authenticating a user, creating surveys and retrieving data from the SurveyMonkey API. It offers an effective option for handling the data lifecycle of social surveys within the R environment, from creating questionnaires to obtaining and analyzing data, and finally publishing the results. Many people involved in social surveys operate online survey services through a GUI in a web browser. While this method is user-friendly, it becomes inefficient for repetitive tasks. The functions in SurveymonkeyR provide efficient and reproducible survey environments for social scientists who may not be proficient in API operations.
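SurveymonkeyR's own function names are not given in the abstract, so the following Python sketch only illustrates the kind of authenticated call such a package wraps. It assumes SurveyMonkey's public REST API v3 base URL (`https://api.surveymonkey.com/v3/`), bearer-token authentication and a JSON response carrying a `data` array; the helpers `build_surveys_request` and `parse_survey_list` are hypothetical, and the request is built but never sent.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL of SurveyMonkey's public REST API, version 3.
API_BASE = "https://api.surveymonkey.com/v3/"


def build_surveys_request(access_token: str, page: int = 1,
                          per_page: int = 50) -> urllib.request.Request:
    """Build, but do not send, a GET request that lists the user's surveys.

    Hypothetical helper: `access_token` is assumed to come from SurveyMonkey's
    OAuth flow and is sent as a bearer token.
    """
    query = urllib.parse.urlencode({"page": page, "per_page": per_page})
    url = urllib.parse.urljoin(API_BASE, "surveys") + "?" + query
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {access_token}"},
        method="GET",
    )


def parse_survey_list(body: str) -> list:
    """Extract (id, title) pairs from a JSON response body with a `data` array."""
    payload = json.loads(body)
    return [(s["id"], s["title"]) for s in payload.get("data", [])]
```

Sending the request (for example with `urllib.request.urlopen`) needs a valid token. Keeping request construction separate from sending makes each call easy to test and to script for repetitive tasks, which is where the abstract locates the gain over working through the web GUI.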

Speakers

Yasuto Nakano

Prof. Dr., Kwansei Gakuin University

Professor of sociology, Ph.D. https://researchmap.jp/yasuto.nakano?lang=en

Wednesday July 10, 2024 13:30 - 15:00 CEST
TBD

Social sciences, Poster Session

13:30 CEST

R for Spatio-Temporal Handling of Moving Polygons - Lorena Abad, University of Salzburg

Wednesday July 10, 2024 13:30 - 15:00 CEST

TBD

Data cubes are structures for storing and analysing spatio-temporal data in raster and vector format. Typical examples of spatio-temporal vector data are weather stations collecting data over time, or administrative polygons for which historical data are aggregated per zone. A less explored use case for data cubes is moving polygons. Examples of moving polygons include spatial representations of glacier retreat, the emergence of volcanic lava flows, or changes in a city boundary over time. In this contribution, I introduce the handling of polygons that evolve and move over time using vector data cubes. The implementation in R uses the packages {stars} and {cubble} to represent data in array and tabular formats. The advantage of vector data cubes in both formats is the ability to apply common array operations as well as tidy data wrangling techniques to explore and analyse data. Temporal analyses can be performed with packages like {tsibble}, while spatial analyses can be performed with {sf} methods. Further, more complex spatio-temporal analyses such as change detection can be performed with {stampr}. Visualization techniques using {ggplot2} and {tmap} are also explored.
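As a language-neutral illustration of one simple temporal summary of a moving polygon, the Python sketch below stores a single polygon as a time-indexed series of vertex rings (a hypothetical minimal layout, not the data model of {stars} or {cubble}) and reports its area at each time step together with the change since the previous step, using the shoelace formula. It assumes projected, planar coordinates and simple polygons without holes; the function names are illustrative only.

```python
from typing import Dict, List, Tuple

Ring = List[Tuple[float, float]]  # exterior ring, vertices in order


def shoelace_area(ring: Ring) -> float:
    """Planar polygon area via the shoelace formula (ring may be open or closed)."""
    n = len(ring)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def area_change(cube: Dict[int, Ring]) -> List[Tuple[int, float, float]]:
    """Return (time, area, change since the previous time step), ordered by time."""
    rows = []
    previous = None
    for t in sorted(cube):
        area = shoelace_area(cube[t])
        change = 0.0 if previous is None else area - previous
        rows.append((t, area, change))
        previous = area
    return rows
```

A shrinking glacier outline, for instance, would show up as a run of negative changes; real analyses would of course rely on {sf} geometry operations rather than hand-rolled area code.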

Speakers

Lorena Abad

MSc., University of Salzburg

Doctoral researcher at the Department of Geoinformatics - Z_GIS of the University of Salzburg. Part of the research groups Risk, Hazard and Climate and EO Analytics. I focus on the analysis of big Earth observation data to map and monitor landscape dynamics and I am researching the...

Wednesday July 10, 2024 13:30 - 15:00 CEST
TBD

Spatial data and maps, Poster Session

13:30 CEST

A guide to R packages for synthetic data generation - Michael Kammer, University of Vienna

Wednesday July 10, 2024 13:30 - 15:00 CEST

TBD

Statistical method development is partly driven by applications and the complexities of real-world datasets. However, sharing these datasets is often difficult because of legal, ethical or practical concerns, which makes synthetic data that closely reproduce the real-world data an attractive option for circumventing such issues. Similarly, generating realistic data is important for method comparison studies, which are crucial for establishing the evidence base for statistical methods.

Yet there seems to be little consensus on how to actually code data generators. As a first step towards making the coding of simulations more accessible, we provide a systematic scoping review of existing R packages that support data generation (results publicly available on OSF). We will also present our own package, which aims to complement the existing ecosystem by building a library of interesting data generators derived from real-world datasets.

No single tool fits all needs, so we will discuss how these tools help you support open science principles, either by facilitating the sharing of data from your own research or by generating data for your own methods development.

Speakers

Michael Kammer

Wednesday July 10, 2024 13:30 - 15:00 CEST
TBD

Statistical modelling, Poster Session

13:30 CEST

Combining probabilistic forecasts with the `gamstackr` package - Euan Enticott, University of Bristol

Wednesday July 10, 2024 13:30 - 15:00 CEST

TBD

Ensemble models are increasingly popular tools for capturing heterogeneous information and improving predictive performance. We will present the `gamstackr` R package, which provides tools for aggregating, or `stacking`, the probabilistic forecasts produced by different models or `experts`. In particular, the package implements a versatile, easy-to-use framework for probabilistic stacking that allows users to control the experts’ weights via additive models containing fixed, random or smooth effects. It also provides statistical and computational scalability in the number of experts by exploiting context-specific relationships between them.

We will illustrate the typical workflow of the `gamstackr` package, namely how to create a heterogeneous set of experts, build and fit several types of stacking models, and visualise the ensemble weights and their relationship with the covariates. The package is currently available at https://github.com/eenticott/gamstackr.
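In `gamstackr` the experts' weights can depend on covariates through additive models; the Python sketch below shows only the simplest special case, constant weights, and should not be read as the package's fitting method. Given each expert's predictive density evaluated at the observed outcomes, it chooses weights on the probability simplex that maximise the mean log score of the mixture Σ_k w_k p_k(y_i), using the standard EM fixed-point update for mixture weights with fixed components. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def stack_weights(dens: Sequence[Sequence[float]], n_iter: int = 1000,
                  tol: float = 1e-10) -> List[float]:
    """Constant stacking weights that maximise the mean log score.

    dens[i][k] is expert k's predictive density at observed outcome i
    (assumed strictly positive). Uses the EM fixed-point update for the
    weights of a mixture whose component densities are held fixed.
    """
    n, k = len(dens), len(dens[0])
    w = [1.0 / k] * k  # start from equal weights
    for _ in range(n_iter):
        new_w = [0.0] * k
        for row in dens:
            mix = sum(wj * pj for wj, pj in zip(w, row))
            for j in range(k):
                new_w[j] += w[j] * row[j] / mix  # responsibility of expert j
        new_w = [v / n for v in new_w]  # average responsibilities sum to 1
        converged = max(abs(a - b) for a, b in zip(new_w, w)) < tol
        w = new_w
        if converged:
            break
    return w


def log_score(dens: Sequence[Sequence[float]], w: Sequence[float]) -> float:
    """Mean log predictive density of the stacked (mixture) forecast."""
    return sum(math.log(sum(wj * pj for wj, pj in zip(w, row)))
               for row in dens) / len(dens)
```

Each EM step keeps the weights non-negative and summing to one and never decreases the log score, so no explicit constraint handling is needed. Letting the weights vary with covariates, as the package does, would require replacing the constant weights with functions of those covariates.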

Speakers

Euan Enticott

Wednesday July 10, 2024 13:30 - 15:00 CEST
TBD

Statistical modelling, Poster Session

13:30 CEST

CompInt: A Package for Interpretable and Comparable Reporting of Effect Sizes - Hannah Schulz-Kümpel, Department of Statistics, LMU Munich

Wednesday July 10, 2024 13:30 - 15:00 CEST

TBD

Ever struggled with how to report and explain the results of a statistical model you just fit? Don't worry: the CompInt R package is here to help with this all-too-common problem! In fact, misinterpretations of statistical significance and of classical effect measures such as odds ratios are widespread, even among researchers familiar with their definitions. Moreover, when comparing or accumulating the results of several different models, as is the goal of multi-analyst studies and meta-analyses, there is currently no uniform gold standard. Based on [Kümpel & Hoffmann](https://arxiv.org/pdf/2211.02621.pdf), the CompInt package implements a general reporting framework that allows the consistent derivation of effect size measure definitions and visualization techniques aimed at maximizing the interpretability and comparability of regression results. This session will highlight the importance of transparent reporting, explain the possible specifications of the framework, and showcase applications of the CompInt package.

Speakers

Hannah Schulz-Kümpel

M.Sc., Department of Statistics, LMU Munich

After receiving her Bachelor's in Mathematics from Heidelberg University and Master's in Statistics from LMU Munich, Hannah Schulz-Kümpel is now a PhD student at the ‘Konrad Zuse School of Excellence in Reliable AI’ (relAI) under the supervision of Bernd Bischl.

Wednesday July 10, 2024 13:30 - 15:00 CEST
TBD

Statistical modelling, Poster Session

13:30 CEST

Improving the Modeling of Binary Regression Based on New Proposals for Statistical Diagnostics - Alejandra Andrea Tapia Silva, Pontificia Universidad Católica de Chile

Wednesday July 10, 2024 13:30 - 15:00 CEST

TBD

Binary regression models using logit or probit link functions have been widely employed in examining the relationship between binary responses and covariates. However, misspecification of the link function can result in poor model fit and compromise the significance of covariate effects. In this study, we present a local influence diagnostic method associated with a new family of link functions that allows evaluating the sensitivity of symmetric links towards asymmetric ones. This new family offers a comprehensive model that encompasses nested symmetric cases. Furthermore, we present a local influence diagnostic method to evaluate the sensitivity of odds ratios. Monte Carlo simulations are performed to evaluate both the performance of the diagnostic method and the parameter estimation of the overall model, complemented by illustrations using medical data related to menstruation and respiratory problems. The results confirm the effectiveness of our proposal, highlighting the critical role of statistical diagnostics in modeling.

Speakers

Alejandra Andrea Tapia Silva

Dr., Pontificia Universidad Católica de Chile

"I'm Alejandra, an assistant professor in the Statistics Department at PUC, Chile. I'm part of R-Ladies and I love statistical modeling, R, cats, art, and David Bowie."

Wednesday July 10, 2024 13:30 - 15:00 CEST
TBD

Statistical modelling, Poster Session

13:30 CEST

miniSize: An R package to calculate the minimal sample size in balanced ANOVA models - Bernhard Spangl, BOKU University

Wednesday July 10, 2024 13:30 - 15:00 CEST

TBD

We consider balanced one-way, two-way, and three-way ANOVA models to test the hypothesis that the fixed factor A has no effect. The other factors are fixed or random. For most of these models (including all balanced 1-way and 2-way ANOVA models) an exact F-test exists.

Given a prespecified power, miniSize allows the user to compute the minimal sample size for the above-mentioned ANOVA models, i.e. the minimal number of experiments needed.

This is achieved by determining the noncentrality parameter of the exact F-test, describing its minimal value by a sharp lower bound, and thus guaranteeing the worst-case power of the F-test. Additionally, we provide a structural result for the minimal sample size that we call the "pivot" effect.

We will present the newly developed R package "miniSize" and give some examples of how to use its functionality to calculate the minimal sample size.
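miniSize's own interface is not shown in the abstract, so the Python sketch below (standard library only) illustrates the general idea for the simplest case: a balanced one-way fixed-effects ANOVA with `a` groups and `n` observations per group. It assumes the relevant effect is specified as a minimal difference `delta` between the largest and smallest group means, with error standard deviation `sigma`. The least favourable configuration then places two means at ±delta/2 around a common value shared by all other means, which gives the smallest noncentrality λ = n·delta²/(2·sigma²) and hence a worst-case power. Power is computed from the noncentral F distribution written as a Poisson mixture of regularised incomplete beta functions. All function names are hypothetical, and this is not the package's algorithm; it ignores random factors and the two- and three-way designs.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            step = d * c
            h *= step
        if abs(step - 1.0) < eps:
            break
    return h


def betainc(a, b, x):
    """Regularised incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def ncf_cdf(f, d1, d2, lam):
    """CDF of the (non)central F distribution as a Poisson mixture of beta CDFs."""
    if f <= 0.0:
        return 0.0
    x = d1 * f / (d1 * f + d2)
    if lam == 0.0:
        return betainc(d1 / 2.0, d2 / 2.0, x)
    half = lam / 2.0
    upper = int(half + 12.0 * math.sqrt(half) + 50)  # covers the Poisson mass
    total = 0.0
    for j in range(upper + 1):
        weight = math.exp(-half + j * math.log(half) - math.lgamma(j + 1))
        total += weight * betainc(d1 / 2.0 + j, d2 / 2.0, x)
    return total


def f_quantile(p, d1, d2):
    """Quantile of the central F distribution by bisection."""
    lo, hi = 0.0, 1.0
    while ncf_cdf(hi, d1, d2, 0.0) < p:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if ncf_cdf(mid, d1, d2, 0.0) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def anova_power(a, n, delta, sigma, alpha=0.05):
    """Worst-case power of the one-way ANOVA F-test for a max-min mean gap delta."""
    d1, d2 = a - 1, a * (n - 1)
    lam = n * delta ** 2 / (2.0 * sigma ** 2)  # least favourable configuration
    crit = f_quantile(1.0 - alpha, d1, d2)
    return 1.0 - ncf_cdf(crit, d1, d2, lam)


def minimal_n(a, delta, sigma, power=0.8, alpha=0.05, n_max=10_000):
    """Smallest per-group n whose worst-case power reaches the target."""
    for n in range(2, n_max + 1):
        if anova_power(a, n, delta, sigma, alpha) >= power:
            return n
    raise ValueError("target power not reached within n_max")
```

For `a = 2` the F-test coincides with the two-sided two-sample t-test, so `minimal_n(2, 1.0, 1.0)` returns 17 per group, the familiar answer for a standardised effect of 1 at 80% power and a 5% significance level.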

Speakers

Bernhard Spangl

Wednesday July 10, 2024 13:30 - 15:00 CEST
TBD

Statistical modelling, Poster Session

13:30 CEST

Multilevel Regression with Projection Pursuit Tree - Eun-Kyung Lee & Seowoo Jung, Ewha Womans University

Wednesday July 10, 2024 13:30 - 15:00 CEST

TBD

Multilevel regression and post-stratification (MRP; Gelman & Hill, 2006; Gelman et al., 2020) was developed to handle data from demographically diverse groups in complex survey designs. To obtain a representative estimate for a specific group, a multilevel regression model combines an individual-level model based on individual-level data with a population-level model based on group-level data. MRP proceeds in two stages. The first stage is multilevel regression, which uses priors on the parameters to estimate a stratified model composed of an individual-response model and a population-level model. This model produces estimates for each cell (class) that is later used for post-stratification, the second stage, in which these cell estimates are weighted to the population. In the individual model, only variables that allow post-stratification can be used. In this study, we overcome this limitation of MRP, namely that it can use only categorical variables suitable for post-stratification, by proposing a method that incorporates a projection pursuit tree, and we implement it in R.
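The post-stratification stage of MRP combines the cell-level estimates θ_j from the regression stage using known population counts N_j: the estimate for a group is Σ_j N_j θ_j / Σ_j N_j over the cells j belonging to that group. The Python sketch below shows only this weighting step; the function name and any cell labels or counts used with it are hypothetical placeholders, and the regression stage, including the projection pursuit tree extension, is not shown.

```python
from typing import Dict, Hashable, Iterable, Optional


def poststratify(cell_estimates: Dict[Hashable, float],
                 population_counts: Dict[Hashable, float],
                 cells: Optional[Iterable[Hashable]] = None) -> float:
    """Population-weighted average of cell-level estimates.

    Passing a subset of cells gives the estimate for a subgroup
    (for example, all cells belonging to one region).
    """
    keys = list(cells) if cells is not None else list(population_counts)
    total = sum(population_counts[c] for c in keys)
    if total <= 0:
        raise ValueError("selected cells have no population")
    return sum(population_counts[c] * cell_estimates[c] for c in keys) / total
```

Because subgroup estimates reuse the same cell estimates with different cell sets, the quality of the cell-level model drives every group-level result, which is why extending the set of usable predictors in the first stage matters.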

Speakers

Eun-Kyung Lee

Professor, Ewha Womans University

Eun-Kyung Lee is a Professor in the Statistics Department. She earned a Ph.D., majoring in Statistical Computation and Visualization of Multivariate Data, at Iowa State University in the U.S. Currently, she is engaged in projects in medical statistics and statistical computing ar...

Seowoo Jung

Multilevel regression with projection pursuit tree, Ewha Womans University

- Bachelor’s Degree in Statistics, Ewha Womans University (2019-2023)
- Master of Science in Statistics, Ewha Womans University (2023-present)

Wednesday July 10, 2024 13:30 - 15:00 CEST
TBD

Statistical modelling, Poster Session

13:30 CEST

SCM: An R package for Generalized Additive Modelling of Covariance Matrices - Vincenzo Gioia, University of Trieste

Wednesday July 10, 2024 13:30 - 15:00 CEST

TBD

Coupling additive mean vector and covariance matrix modelling for multivariate Gaussian models is a complex task, requiring methodological choices on the model structure, scalability of the model fitting procedures, and a set of tailored inferential and model-checking tools. The SCM (Smoothing for Covariance matrix Modelling) R package enables smooth additive modelling of the elements of the mean vector and of an unconstrained parametrisation of the covariance matrix, while ensuring computational scalability by exploiting model sparsity and using the efficient linear algebra routines provided by the RcppArmadillo package. It also leverages the well-developed inferential methods and the visualization tools provided by the mgcv and mgcViz R packages.

In this talk, we will illustrate the modelling capabilities of the SCM package and provide useful insights into the data modelling process through several real-world applications. In particular, we will give an overview of the main aspects of the model building and checking phases, as well as insights on how to interpret the model output. The SCM package is currently available at https://github.com/VinGioia90/SCM/.
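The abstract does not specify which unconstrained parametrisation of the covariance matrix SCM uses. As an illustration only, the Python sketch below uses an L D L^T factorisation closely related to the modified Cholesky decomposition, a common choice in covariance modelling: d unrestricted reals enter the diagonal of D through exp(), and d(d-1)/2 unrestricted reals fill the strictly lower triangle of a unit lower-triangular L. Any real vector therefore maps to a symmetric positive-definite Σ, which is what allows each element to be given its own additive predictor without constraints. The function name and parameter ordering are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def theta_to_cov(theta: Sequence[float], d: int) -> Matrix:
    """Map d(d+1)/2 unconstrained reals to a covariance matrix Sigma = L D L^T.

    theta[:d] are the logs of the diagonal entries of D (so D > 0);
    theta[d:] fill the strictly lower triangle of the unit lower-triangular L,
    row by row. L is invertible and D is positive, so Sigma is positive definite.
    """
    if len(theta) != d * (d + 1) // 2:
        raise ValueError("theta must have d(d+1)/2 elements")
    diag = [math.exp(t) for t in theta[:d]]
    L = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    k = d
    for i in range(1, d):
        for j in range(i):
            L[i][j] = theta[k]
            k += 1
    # Sigma[i][j] = sum over m of L[i][m] * D[m] * L[j][m]
    return [[sum(L[i][m] * diag[m] * L[j][m] for m in range(d))
             for j in range(d)] for i in range(d)]
```

For d = 2 and theta = (0, log 4, 0.5), this gives Σ with entries 1, 0.5, 0.5 and 4.25, whose determinant equals the product of the diagonal of D (here 4), as expected since L has unit determinant.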

Speakers

Vincenzo Gioia

Ph.D., University of Trieste

Vincenzo Gioia is a research assistant at the Department of Economic, Business, Mathematical and Statistical Sciences, University of Trieste, Italy. He received a PhD in Managerial and Actuarial Sciences from the University of Udine in 2023. His research interests range from asymptotic...

Wednesday July 10, 2024 13:30 - 15:00 CEST
TBD

Statistical modelling, Poster Session