CentraleSupélec/SDI/Metz


Computational models of big data

This course teaches students to develop high-performance data analysis applications using the Spark environment on distributed platforms (clusters and clouds). It covers distributed file systems such as HDFS, the programming model and algorithmics of Spark's extended map-reduce, and the criteria and metrics used to assess scalability. Students carry out many experiments on clusters and clouds during the labs, and the solutions they design and implement are evaluated on the performance reached on use cases and on their ability to scale.

Emergence of Big Data technologies: motivations, industrial needs, main players
Hadoop software stack, architecture and operation of its distributed file system (HDFS)
Spark distributed computing architecture and deployment mechanism
Spark programming model, Spark’s extended map-reduce algorithmics
Optimization of algorithms and codes on distributed architectures
Data analysis architecture and environment on the Cloud
Experiments and performance measures
Performance criteria and metrics
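
As an illustration of the kind of scaling metrics mentioned above, the sketch below computes speedup and parallel efficiency from measured run times. It assumes the two textbook definitions (speedup as reference time divided by parallel time, efficiency as speedup divided by node count); the course may use other criteria. The function names and the timings are hypothetical, not measurements from the labs.

```python
def speedup(t_ref: float, t_parallel: float) -> float:
    """Speedup: reference run time divided by the run time on more nodes."""
    if t_ref <= 0 or t_parallel <= 0:
        raise ValueError("run times must be positive")
    return t_ref / t_parallel


def efficiency(t_ref: float, t_parallel: float, nodes: int) -> float:
    """Parallel efficiency: speedup divided by the number of nodes used."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return speedup(t_ref, t_parallel) / nodes


# Hypothetical run times in seconds, keyed by number of nodes (made-up values).
timings = {1: 120.0, 2: 64.0, 4: 36.0, 8: 24.0}

# For each node count, store (speedup, efficiency) relative to the 1-node run.
report = {
    n: (speedup(timings[1], t), efficiency(timings[1], t, n))
    for n, t in timings.items()
}

for n, (s, e) in report.items():
    print(f"{n} nodes: speedup {s:.2f}, efficiency {e:.2f}")
```

An efficiency close to 1 means the application makes good use of the added nodes; a value that drops as nodes are added suggests that communication or sequential parts are limiting scalability.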

Course sequencing