Spark Online Training

About the Course

The MKR Infotech Apache Spark & Scala course helps learners understand how Spark's in-memory data processing lets it run much faster than Hadoop MapReduce for many workloads. Learners also study RDDs and the APIs that Spark offers, such as Spark Streaming, MLlib, Spark SQL, and GraphX. This MKR Infotech course is an integral part of a developer’s learning path.

Course Objectives

After completing the Apache Spark & Scala course, you will be able to:
1) Understand Scala and its implementation
2) Apply control structures, loops, collections, and more
3) Master the concepts of traits and OOP in Scala
4) Understand functional programming in Scala
5) Get an insight into the big data challenges
6) Learn how Spark acts as a solution to these challenges
7) Install Spark and implement Spark operations on Spark Shell
8) Understand the role of RDDs in Spark
9) Implement Spark applications on YARN (Hadoop)
10) Stream data using Spark Streaming API
11) Implement machine learning algorithms in Spark using MLlib API
12) Analyze Hive and Spark SQL architecture
13) Implement SparkSQL queries to perform several computations
14) Understand GraphX API and implement graph algorithms
15) Implement broadcast variables and accumulators for performance tuning

Who should go for this Course?

This course is a foundation for anyone who aspires to enter the field of big data and keep abreast of the latest developments in fast and efficient processing of ever-growing data using Spark and related projects. The course is ideal for:
1. Big Data enthusiasts
2. Software architects, engineers and developers
3. Data Scientists and analytics professionals

What are the pre-requisites for this Course?

A basic understanding of functional programming and object-oriented programming will help. Knowledge of Scala is a plus, but is not mandatory.

Why learn Apache Spark?

In this era of ever-growing data, the need to analyze it for meaningful business insights is paramount. There are different big data processing alternatives such as Hadoop, Spark, Storm, and many more. Spark, however, stands out by providing both batch and streaming capabilities, which makes it a preferred choice for lightning-fast big data analysis platforms.
  • Week 1: Scala (Object-Oriented and Functional Programming)
  • What is Scala?
  • Scala Background, Scala Vs Java and Basics
  • Interactive Scala – REPL, data types, variables, expressions, simple functions
  • Running the program with Scala Compiler
  • Explore the type lattice and use type inference
  • Define methods and use pattern matching
  • Scala Environment Set up
  • Scala set up on Windows.
  • Scala set up on UNIX
  • Functional Programming
  • What is Functional Programming.
  • Differences between OOP and FP
  • Collections (Very Important for Spark)
  • Iterating, mapping, filtering and counting
  • Regular expressions and matching with them.
  • Maps, Sets, groupBy, Options, flatten, flatMap
  • Word count, IO operations, file access, flatMap
  • Object Oriented Programming
  • Classes and Properties
  • Objects, Packaging and Imports.
  • Traits
  • Objects, classes, inheritance, Lists with multiple related types, apply
  • Integrations
  • What is SBT?
  • Integration of Scala in Eclipse IDE
  • Integration of SBT with Eclipse
  • Week 2: Spark Core
  • Batch versus real-time data processing
  • Introduction to Spark, Spark versus Hadoop
  • Architecture of Spark.
  • High-level Architecture: Workers, Cluster Managers, Driver Programs, Executors, Tasks
  • Coding Spark jobs in Scala
  • Data Sources
  • Exploring the Spark shell and creating a SparkContext
  • RDD Programming
  • Operations on RDDs: Transformations, Actions, Loading Data and Saving Data, Key-Value Pair RDDs, Persistence
  • Lazy Operations: Action Triggers Computation
  • Caching
  • RDD Caching Methods, RDD Caching Is Fault Tolerant, Cache Memory Management
  • Spark Jobs
  • Shared Variables, Broadcast Variables, Accumulators
  • Configuring and running the Spark cluster.
  • Exploring a multi-node Spark cluster
  • Cluster management
  • Submitting Spark jobs and running in the cluster mode
  • Developing Spark applications in Eclipse
  • Tuning and Debugging Spark
  • Two Projects using Core Spark
  • Introduction to Spark Streaming
  • Architecture of Spark Streaming
  • Processing Distributed Log Files in Real Time
  • Introducing Spark Streaming
  • Spark Streaming Is a Spark Add-on
  • High-Level Architecture
  • Data Stream Sources, Receivers, Destinations
  • Application Programming Interface (API)
  • StreamingContext
  • Basic Structure of a Spark Streaming Application
  • Discretized Stream (DStream)
  • Creating a DStream
  • Processing a Data Stream
  • Output Operations
  • Window Operation
  • Discretized Streams and RDDs
  • Applying Transformations and Actions on Streaming Data
  • Integration with Flume and Kafka.
  • Integration with Cassandra
  • Monitoring streaming jobs
  • Use case with Spark Core and Spark Streaming
  • Introduction to Apache Spark SQL
  • Understanding the Catalyst optimizer
  • How it works: Analysis, Logical plan optimization, Physical planning, Code generation
  • Creating HiveContext
  • Inferring schema using case classes
  • Programmatically specifying the schema
  • The SQL context
  • Importing and saving data
  • Processing text, JSON, and Parquet files
  • Data Frames
  • Using Hive
  • Application Programming Interface (API)
  • Key Abstractions, Creating DataFrames, Processing Data Programmatically with SQL/HiveQL
  • Processing Data with the DataFrame API
  • Saving a DataFrame
  • Built-in Functions
  • Aggregate, Collection, Date/Time, Math, String, Window
  • UDFs and UDAFs
  • Interactive Analysis Example
  • Interactive Analysis with Spark SQL JDBC Server
  • Local Hive Metastore server
  • Loading and saving data using the Parquet format
  • Loading and saving data using the JSON format
  • Loading and saving data from relational databases
  • Loading and saving data from an arbitrary source
  • Integrating With Hive
  • Integrating With MySQL
  • Introduction to Machine Learning
  • Types of Machine Learning.
  • Introduction to Apache Spark MLlib Algorithms
  • Machine Learning Data Types and working with MLlib
  • Regression and Classification Algorithms
  • Decision Trees in depth
  • Classification with SVM, Naïve Bayes
  • Clustering with K-Means
  • Getting Started with Machine Learning Using MLlib
  • Creating vectors
  • Creating a labeled point
  • Calculating summary statistics
  • Calculating correlation
  • Doing hypothesis testing
  • Creating machine learning pipelines using ML
  • Supervised Learning with MLlib – Regression
  • Using linear regression
  • Supervised Learning with MLlib – Classification
  • Doing classification using logistic regression
  • Doing classification using decision trees
  • Doing classification using Random Forests
  • Doing classification using Gradient Boosted Trees
  • Doing classification with Naïve Bayes
  • Unsupervised Learning with MLlib
  • Clustering using k-means
  • Dimensionality reduction with principal component analysis
  • Building the Spark server
  • Introducing Graphs
  • Introducing GraphX
  • Graph Processing with Spark
  • Undirected Graphs, Directed Graphs, Directed Multigraphs, Property Graphs
  • GraphX API
  • Data Abstractions
  • Creating a Graph, Graph Properties, Graph Operators
  • Cluster Managers
  • Standalone Cluster Manager
  • Architecture
  • Setting Up a Standalone Cluster
  • Running a Spark Application on a Standalone Cluster
  • Apache Mesos
  • Architecture
  • Setting Up a Mesos Cluster
  • Running a Spark Application on a Mesos Cluster
  • YARN
  • Architecture
  • Running a Spark Application on a YARN Cluster
  • Learning Cassandra
  • Getting started with architecture
  • Installing Cassandra.
  • Communicating with Cassandra.
  • Creating a database.
  • Create a table
  • Inserting Data
  • Modelling Data.
  • Creating a web application
  • Updating and Deleting Data.
  • Introduction to Spark and Cassandra Connectors.
  • Spark with Cassandra: setup
  • Creating a SparkContext to connect to Cassandra
  • Creating a Spark RDD on the Cassandra database
  • Performing transformations and actions on the Cassandra RDD
  • Running a Spark application in Eclipse to access data in Cassandra
  • Introduction to Amazon Web Services.
  • Building a 4-node Spark cluster in Amazon Web Services
  • Deploying in Production with Mesos and YARN.
  • Two real-time projects covering all the above concepts

About instructors?

All our instructors are working professionals from the industry with at least 5-6 years of relevant experience with Spark. They are subject matter experts and are trained by MKR Infotech to deliver online training so that participants get a great learning experience.

LIVE video streaming?

Yes, the classes are conducted via LIVE Video Streaming, where you can interact with the instructor. You can go through our sample class recording on this page and understand the quality of instruction and the way the class is conducted.

Backup Classes?

You can attend a missed session in any other live batch. Please note that access to the course material is available for a lifetime once you have enrolled in the course.

Course Certification?

Yes, we provide our own certification. At the end of your course, you will work on a Spark project. You will receive project specifications that will help you create a Spark program. Once you have successfully completed the project (reviewed by an expert), you will be awarded a certificate with a performance-based grade. If your project is not approved on the first attempt, you can get extra assistance with any doubts to understand the concepts better and reattempt the project free of cost.

Practical Sessions?

For your practical work, we will help you set up the Java environment on your system along with the Spark software. This will be a local installation for you. Detailed step-by-step installation guides are available in your LMS to help you install and set up the environment. The support team will help you through the process.

Recorded sessions?

All your class recordings and other content, such as PPTs and PDFs, are uploaded to the LMS, to which you have lifetime access.

Course Duration?

The Spark online training course at MKR Infotech is a 45-hour course.

Spark Setup Environment?

Your system should have 4 GB of RAM and a processor better than a Core 2 Duo; the operating system can be 32-bit or 64-bit.


You can call us at +91 9948382584 / 040 42036333 or email us at

