
Apache Spark 2 and 3 using Python 3 (Formerly CCA 175)



MP4 | Video: h264, 1280x720 | Audio: AAC, 44.1 KHz
Language: English | Size: 9.34 GB | Duration: 28h 36m

Data Engineering using Apache Spark 2 or 3, with Python as the programming language

What you'll learn
All the HDFS commands that are relevant for validating files and folders in HDFS
A quick recap of Python relevant to learning Spark
Ability to use Spark SQL to solve problems using SQL-style syntax
PySpark Data Frame APIs to solve problems using Data Frame-style APIs
Relevance of the Spark Metastore for converting Data Frames into temporary views so that data in Data Frames can be processed using Spark SQL
Apache Spark Application Development Life Cycle
Apache Spark Application Execution Life Cycle and Spark UI
Setup SSH Proxy to access Spark Application logs
Deployment Modes of Spark Applications (Cluster and Client)
Passing Application Properties Files and External Dependencies while running Spark Applications

Requirements
Basic programming skills using any programming language
A self-supported lab (instructions provided) or the ITVersity lab (at additional cost) for an appropriate environment
Minimum memory depends on the environment you are using, with a 64-bit operating system:
4 GB RAM with access to proper clusters, or 16 GB RAM with virtual machines such as the Cloudera QuickStart VM
Description
As part of this course, you will learn all the key skills required to build Data Engineering pipelines using Spark SQL and the Data Frame APIs, with Python as the programming language. This course was formerly the CCA 175 Spark and Hadoop Developer course for certification exam preparation. As of 10/31/2021 the exam has been retired, and we have renamed the course to Apache Spark 2 and 3 using Python 3, as it covers industry-relevant topics beyond the scope of the certification.

About Data Engineering

Data Engineering is nothing but processing data according to downstream needs. As part of Data Engineering, we need to build different pipelines, such as batch pipelines and streaming pipelines. All roles related to data processing are consolidated under Data Engineering; conventionally, they are known as ETL Development, Data Warehouse Development, and so on.

Course Details

Here is a high-level outline of the topics covered in this course.

Quick recap of Python

Data Engineering using Spark SQL

Let us deep-dive into Spark SQL to understand how it can be used to build Data Engineering pipelines. Spark SQL gives us the ability to leverage the distributed computing capabilities of Spark, coupled with easy-to-use, developer-friendly SQL-style syntax.
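
As a flavour of what this looks like, here is a minimal sketch of a Spark SQL aggregation; the input path and names (orders, order_status) are hypothetical and only illustrate the SQL-style syntax running on Spark's distributed engine.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("spark_sql_sketch").getOrCreate()

    # Register a file-based data set as a temporary view so it can be queried with SQL
    spark.read.csv("/data/retail_db/orders", header=True, inferSchema=True) \
        .createOrReplaceTempView("orders")

    # A familiar SQL-style aggregation, executed by Spark's distributed engine
    spark.sql("""
        SELECT order_status, count(*) AS order_count
        FROM orders
        GROUP BY order_status
        ORDER BY order_count DESC
    """).show()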

Getting Started with Spark SQL

Basic Transformations

Managing Tables - Basic DDL and DML

Managing Tables - DML and Partitioning

Overview of Spark SQL Functions

Windowing Functions

Data Engineering using Spark Data Frame APIs

Spark Data Frame APIs are an alternative way of building Data Engineering applications at scale, leveraging the distributed computing capabilities of Spark. Data Engineers from application development backgrounds might prefer the Data Frame APIs over Spark SQL to build Data Engineering applications.
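
For comparison, here is the same kind of processing sketched with the Data Frame APIs; the path and column names are again hypothetical and only illustrate the programmatic style.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, count

    spark = SparkSession.builder.appName("data_frame_api_sketch").getOrCreate()

    # Hypothetical input; header and schema inference keep the example short
    orders = spark.read.csv("/data/retail_db/orders", header=True, inferSchema=True)

    # Filtering, aggregation, and sorting expressed as chained Data Frame API calls
    orders.filter(col("order_status") == "COMPLETE") \
        .groupBy("order_date") \
        .agg(count("*").alias("order_count")) \
        .orderBy(col("order_count").desc()) \
        .show()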

Data Processing Overview

Processing Column Data

Basic Transformations - Filtering, Aggregations, and Sorting

Joining Data Sets

Windowing Functions - Aggregations, Ranking, and Analytic Functions

Spark Metastore Databases and Tables

Please note that the syllabus has recently changed and the exam is now primarily focused on Spark Data Frames and/or Spark SQL.
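
Below is a small sketch of how Data Frames, Metastore tables, and temporary views fit together; the database, table, and path names are hypothetical, and enableHiveSupport() assumes an environment with a configured Hive Metastore.

    from pyspark.sql import SparkSession

    # Assumes a cluster or lab environment with a configured Hive Metastore
    spark = SparkSession.builder \
        .appName("metastore_sketch") \
        .enableHiveSupport() \
        .getOrCreate()

    spark.sql("CREATE DATABASE IF NOT EXISTS retail_demo")
    spark.catalog.setCurrentDatabase("retail_demo")

    orders = spark.read.json("/data/retail_db_json/orders")

    # Persist the Data Frame as a managed Metastore table ...
    orders.write.mode("overwrite").saveAsTable("orders")

    # ... and also expose it as a session-scoped temporary view for Spark SQL
    orders.createOrReplaceTempView("orders_v")

    # Both appear in the catalog; the temporary view disappears when the session ends
    for table in spark.catalog.listTables():
        print(table.name, table.tableType)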

Apache Spark Application Development and Deployment Life Cycle

As Apache Spark based Data Engineers, we should be familiar with the application development and deployment life cycle. As part of this section, you will learn the complete development and deployment life cycle. It includes, but is not limited to, productionizing the code, externalizing properties, reviewing the details of Spark jobs, and more (a sketch of a typical spark-submit command follows the outline below).

Apache Spark Application Development Lifecycle

Spark Application Execution Life Cycle and Spark UI

Setup SSH Proxy to access Spark Application logs

Deployment Modes of Spark Applications

Passing Application Properties Files and External Dependencies
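
As a rough illustration of the deployment topics above, here is a sketch of a spark-submit command that runs a PySpark application in YARN cluster mode while shipping a properties file and a zip of external dependencies; all file names are hypothetical, and client mode would simply use --deploy-mode client instead.

    # Hypothetical file names; --files ships the properties file with the application
    # and --py-files distributes zipped Python dependencies to the executors
    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --py-files external_dependencies.zip \
      --files application.properties \
      spark_app.py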

All the demos are given on our state-of-the-art Big Data cluster. You can avail yourself of one month of complimentary lab access by reaching out to [email protected] with your Udemy receipt.

Who this course is for
Any IT aspirant/professional willing to learn Data Engineering using Apache Spark
Python developers who want to learn Spark to add the key skills needed to become a Data Engineer

