MODULE 1: BIG DATA INTRODUCTION
Big Data Basics: Definitions, Trends
Data, information, knowledge: data types, data origin, dark data
Data-driven business models, use cases, success stories
Legal aspects: Data Ownership, Data Protection, Copyright, Contract design, Trade secrets
Takeaway: Basic knowledge of the topic and current business model innovations. Participants are subsequently able to initiate data-driven innovation projects in their own company.
MODULE 2: DATA SCIENCE
Basics of statistics: terms, definitions, basic concepts
Data acquisition: Batch vs. Stream, Micro-batching, CAP theorem
Data pre-processing and integration: ETL, Message queues, Outliers, Missing values
Data analysis: Machine Learning: Supervised & Unsupervised, Regression, Classification, Clustering, Bias
Data visualization: possibilities and variants
Takeaway: Overview of the field of data science and knowledge of relevant methods. Participants are able to decide, case by case, which methods are relevant to a given use case and suitable for solving the problem or meeting the requirement.
MODULE 3: BIG DATA TECHNOLOGIES
Basic technologies: Data Management Platform Lifecycle
Apache Hadoop Ecosystem: Hadoop & Ecosystem, HDFS, MapReduce, YARN
Apache Spark: Framework, Architecture, Libraries
NoSQL: Concepts, Column, Key-Value, Document, Graph
Tools and Suites: Open Source vs. Commercial, Enterprise-Ready Tools, Cloud vs. On-Premises
Takeaway: Knowledge of the current technology ecosystem. Ability to select suitable technologies and tools to solve the problem or to best meet the requirements.
Type of methodology: Combination of lecture and hands-on
Participants receive a certificate of attendance: Yes
Paid training activity for participants: Yes, for all
Participants' prerequisite knowledge: None required