Apache Spark Programming with Databricks
• Describe the architecture and core components of Apache Spark.
• Implement data transformations using the DataFrame API.
• Optimise Spark queries for performance improvements.
• Apply partitioning strategies to manage large datasets efficiently.
• Use Structured Streaming to process real-time data.
• Implement Delta Lake to enhance data reliability and performance.
Overview
Off the shelf (OTS)
This course provides an in-depth exploration of Apache Spark and Delta Lake on Databricks, focusing on the core architectural components of Spark, the DataFrame API, and Structured Streaming. Participants will learn how to read, transform, and aggregate data efficiently using Spark SQL and the DataFrame API. The course also covers user-defined functions (UDFs), query optimisation, partitioning strategies, and the advantages of Delta Lake for improving data pipelines. By the end of the course, learners will be able to execute streaming queries and understand how Delta Lake enhances real-time data processing.
Participants should have:
• Familiarity with Python and fundamental programming concepts, including data types, lists, dictionaries, variables, functions, loops, conditional statements, exception handling, accessing classes, and using third-party libraries.
• Basic knowledge of SQL, including writing queries using SELECT, WHERE, GROUP BY, ORDER BY, LIMIT, and JOIN.
This course is designed for:
• Data engineers and data scientists looking to enhance their Spark programming skills.
• Developers who want to leverage Apache Spark and Delta Lake on Databricks.
• Professionals working with large-scale data processing and real-time analytics.
This course includes:
• Practical exercises using Apache Spark on Databricks.
• Hands-on labs to implement and optimise Spark queries.
• Guided projects focusing on real-time data processing with Structured Streaming and Delta Lake.
This course is not specifically aligned with an exam.
Delivery method
Face to face
Virtual
Course duration
14 hours
Competency level
Working

