Big Data for IT

Course Description

The main objective of this course is to familiarize students with recent technological advancements in manipulating, storing, and analyzing big data. The course emphasizes practice with the different components of Apache Spark, a widely used big data framework. Students will gain hands-on experience through multiple exercises on Spark SQL, the Spark ML (Machine Learning) API, and Spark Streaming. The course also covers analyzing large volumes of textual content using Spark NLP and Elasticsearch.
 

Objectives

  • Definition of big data, applications, and motivating examples
  • The position of Spark in the Hadoop ecosystem, the main components of Apache Spark, and the limitations of using Hadoop to process big data
  • Scala programming, with practice on Spark applications
  • The main differences between RDDs, DataFrames, and Datasets in Spark
  • Use cases showing the role of Spark SQL in data exploration
  • Indexing and searching large volumes of textual content using Elasticsearch, with practical examples
  • The Spark ML library, with use case scenarios on regression and recommendation systems
  • Spark NLP for text preprocessing and analysis
  • Introduction to TensorFlow and Keras for classifying large volumes of textual content, applied to sentiment analysis
  • Data streams and Apache Kafka

Course Topics

  • Session-1: Introduction to Big Data and Apache Spark
  • Session-2: Big Data and Information Retrieval
  • Session-3: Machine Learning Using Apache Spark
  • Session-4: Big Data Frameworks and NLP
  • Session-5: Streaming in Apache Spark

Course Mode

Blended (Online and Face-to-Face)

Course Level

Intermediate

Course Language

English
Arabic

Course Category