This course is offered in support of the Python programming language but can also be offered for R or Java with advance notice and planning.

  • Course Start Date: 2021-06-07
  • Time: 10:00:00 - 18:00:00
  • Duration: 3 Days
  • Location: Virtual
  • Delivery Method(s): Virtual Instructor Led

Course Outline

Pre-Requisites

This foundation-level course is geared for intermediate-skilled, experienced developers and architects (with basic Python experience) who seek to become proficient in advanced, modern development skills working with Apache Spark in an enterprise data environment.

Take Before: Students should have attended the course(s) below, or should have basic skills in these areas:

  • TTPS4800 Introduction to Python Programming
  • TTSQLB3 Introduction to SQL (basic familiarity is needed, not in-depth SQL skills)

Lessons

Course Overview

Apache Spark, a significant component in the Hadoop ecosystem, is a cluster computing engine used in Big Data. Building on top of the Hadoop YARN and HDFS ecosystem, it offers order-of-magnitude faster processing for many in-memory computing tasks compared to MapReduce. It can be programmed in Java, Scala, Python, and R (the favorite languages of data scientists), along with SQL-based front ends. With advanced libraries like Mahout and MLlib for machine learning, GraphX or Neo4j for rich data graph processing, as well as access to other NoSQL data stores, rule engines and other enterprise components, Spark is a linchpin in modern Big Data and data science computing.

Geared for experienced developers, Spark Developer | Introduction to Spark for Big Data, Hadoop & Machine Learning provides students with a comprehensive, hands-on exploration of enterprise-grade Spark programming, interacting with the significant components mentioned above to craft complete data science solutions. Students will leave this course armed with the skills they require to begin working with Spark in a practical, real-world environment.

This course is offered in support of the Python programming language but can also be offered for R or Java with advance notice and planning. Our team will work with you to coordinate the languages, tools and environment that will work best for your organization and needs. Please inquire for details.

Course Objectives

This “skills-centric” course is about 50% hands-on lab and 50% lecture, designed to train attendees in core big data / Spark development and usage skills, coupling current, effective techniques with sound industry practices. Throughout the course, students will be led through a series of progressively advanced topics, where each topic consists of lecture, group discussion, comprehensive hands-on lab exercises, and lab review.

This course provides a practical grounding in the umbrella of technologies on the leading edge of data science development, focused on Spark and related tools. Working in a hands-on learning environment, students will explore:

  • Spark Ecosystem
  • Spark Shell
  • Spark Data structures (RDD, DataFrame, Dataset)
  • Spark SQL
  • Modern data formats and Spark
  • Spark API
  • Spark & Hadoop & Hive
  • Spark ML overview
  • GraphX
  • Time-permitting: Spark Streaming
  • Time-permitting: Optional Capstone Workshop

Course Agenda

Please note that this list of topics is based on our standard course offering, evolved from typical industry uses and trends. We’ll work with you to tune this course and level of coverage to target the skills you need most.

Spark Introduction

  • Big data, Hadoop, Spark
  • Spark concepts and architecture
  • Spark components overview
  • Labs: installing and running Spark

A First Look at Spark

  • Spark shell
  • Spark web UIs
  • Analyzing a dataset – part 1
  • Labs: Spark shell exploration

Spark Data structures

  • Partitions
  • Distributed execution
  • Operations: transformations and actions
  • Labs: Unstructured data analytics using RDDs

Caching

  • Caching overview
  • Various caching mechanisms available in Spark
  • In memory file systems
  • Caching use cases and best practices
  • Labs: Benchmark of caching performance

DataFrames and Datasets

  • DataFrames Intro
  • Loading structured data (JSON, CSV) using DataFrames
  • Using schemas
  • Specifying a schema for DataFrames
  • Labs: DataFrames, Datasets, Schema

Spark SQL

  • Spark SQL concepts and overview
  • Defining tables and importing datasets
  • Querying data using SQL
  • Handling various storage formats: JSON, Parquet, ORC
  • Labs: querying structured data using SQL; evaluating data formats

Spark and Hadoop

  • Hadoop Primer: HDFS, YARN
  • Hadoop + Spark architecture
  • Running Spark on Hadoop YARN
  • Processing HDFS files using Spark
  • Spark & Hive

Spark API

  • Overview of Spark APIs in Scala / Python
  • The lifecycle of a Spark application
  • Spark APIs
  • Deploying Spark applications on YARN
  • Labs: Developing and deploying a Spark application

Spark ML Overview

  • Machine Learning primer
  • Machine Learning in Spark: MLlib / ML
  • Spark ML overview (newer Spark 2 version)
  • Algorithms overview: Clustering, Classifications, Recommendations
  • Labs: Writing ML applications in Spark

GraphX

  • GraphX library overview
  • GraphX APIs
  • Creating a graph and navigating it
  • Shortest distance
  • Pregel API
  • Labs: Processing graph data using Spark

Time Permitting Topics

Spark Streaming

  • Streaming concepts
  • Evaluating Streaming platforms
  • Spark Streaming library overview
  • Streaming operations
  • Sliding window operations
  • Structured Streaming
  • Continuous streaming
  • Spark & Kafka streaming
  • Labs: Writing Spark Streaming applications

Workshop

  • Attendees will work on solving real-world data analysis problems using Spark

Course Materials

Student Materials: Each participant will receive a Student Guide with course notes, code samples, software tutorials, step-by-step written lab instructions, diagrams and related reference materials and resource links. Students will also receive the project files (or code, if applicable) and solutions required for the hands-on work.

Hands-On Setup Made Simple! Our dedicated tech team will work with you to ensure our ‘easy-access’ cloud-based course environment is accessible, fully-tested and verified as ready to go well in advance of the course start date, ensuring a smooth start to class and effective learning experience for all participants. Please inquire for details and options.

Related Courses

TAKE BEFORE

Students should have incoming skills equivalent to the course(s) below, or should have attended them as a prerequisite:

OTHER RELATED COURSES

Below are a few of the popular Related Courses we offer in this space. Please see the complete Course Catalog for additional options and titles.

Cancellation Policy

TBD

Training Location

Virtual Instructor Led Online Training
your home or office

your city, your province
your country   

About Trivera Technologies LLC

Trivera Technologies is a woman-owned IT training and education firm that provides engaging, comprehensive technical training, consulting, mentoring, and courseware development and licensing services to hundreds of organizations globally on an annual basis. Our collaborative, skills-focused, consultative approach to developing and delivering learning helps organizations bring technical teams of all skill levels up to speed with the latest technologies, tools, skills and best practices surrounding all aspects of application development, from concept through completion, all targeted to their specific needs and goals.

We offer skills-focused training events onsite, online, or in blended solutions for distributed teams, from small groups to large-scale, worldwide enterprise organizations. Services include assessment, development and delivery of targeted learning solutions for new-hire cohort programs; skills immersion boot camps and code camps; skills assessment and skills-gap training; enterprise-wide reskilling, upskilling and new-skilling programs; extensive public schedule offerings; mentoring and coaching; and much more.

Areas of specialty include: application development & programming; modern web development and design; cybersecurity & secure coding; Data Science / AI / Machine Learning / Deep Learning; Python; DevOps; Cloud; software architecture, design, testing and development; Agile development & Scrum; networking & sys admin; O/S and tools; project management; business information and data; IT professional skills; ITIL; CompTIA; and much more.
