High Performance Data Analysis with R

Course/Event Essentials

Event/Course Start
Event/Course End
Event/Course Format
Mixed
Live (synchronous)

Venue Information

Country: Czech Republic

Training Content and Scope

Scientific Domain
Level of Instruction
Intermediate
Sector of the Target Audience
Research and Academia
Industry
Public Sector
HPC Profile of Target Audience
Data Scientists
Language of Instruction

Other Information

Supporting Project(s)
EuroCC/CASTIEL
Event/Course Description

Annotation

This course focuses on data analysis and modeling in the R statistical programming language. The first day introduces how to approach a new dataset in order to better understand the data and its features. It then covers modeling with tidymodels, a modern collection of packages that strives to make modeling in R as simple and as reproducible as possible.

The second day focuses on making computation more efficient by introducing Rcpp, which allows seamless integration of C++ code into R code. A simple example of using CUDA with Rcpp will be shown. The afternoon section covers parallelizing code with the future package and/or MPI.

Benefits for attendees and what they will learn:

  • How to take the first steps toward understanding a new dataset
  • How to prepare data for modeling
  • How to build a standard modeling workflow using modern R packages
  • How to speed up code by using C++
  • How to parallelize code and run it on a cluster

Level

Intermediate

Language

English

Prerequisites

Some experience with programming in R is required; knowledge of dplyr is an advantage.

Tutor

Tomáš Martinovič obtained his Ph.D. in computational sciences at IT4Innovations, VSB - Technical University of Ostrava in 2018. From 2015 to 2018, he worked in a team focused on the analysis of complex dynamical systems, developing scalable implementations of algorithms from the field of nonlinear time series analysis. Since the start of 2022, he has led a team focused on machine learning/AI and operations research, with the objective of conducting research and transferring knowledge in cooperation with industry.

Acknowledgments

This event was supported by the EuroCC project. This project has received funding from the European High-Performance Computing Joint Undertaking (JU) under grant agreement No 951732. The JU receives support from the European Union’s Horizon 2020 research and innovation programme and from Germany, Bulgaria, Austria, Croatia, Cyprus, the Czech Republic, Denmark, Estonia, Finland, Greece, Hungary, Ireland, Italy, Lithuania, Latvia, Poland, Portugal, Romania, Slovenia, Spain, Sweden, the United Kingdom, France, the Netherlands, Belgium, Luxembourg, Slovakia, Norway, Switzerland, Turkey, the Republic of North Macedonia, Iceland, and Montenegro. This project has also received funding from the Ministry of Education, Youth and Sports of the Czech Republic (ID: MC2101).