Learning Apache Spark

Use Apache Spark to build fast data pipelines. Interact with your Spark cluster using PySpark, and get started with Databricks' notebook interface.

1: Learning Apache Spark with PySpark & Databricks

Get started with Apache Spark in part 1 of our series, where we leverage Databricks and PySpark.

2: Transforming PySpark DataFrames

Apply transformations to PySpark DataFrames, such as creating new columns, filtering rows, or modifying string and numeric values.

3: Cleaning PySpark DataFrames

Simple DataFrame cleaning techniques, from dropping unwanted rows to selecting the data that matters.

4: Structured Streaming in PySpark

Become familiar with building a structured stream in PySpark using the Databricks interface.

5: Working with PySpark RDDs

Work with Spark's original data structure API: Resilient Distributed Datasets (RDDs).

6: Join and Aggregate PySpark DataFrames

Perform SQL-like joins and aggregations on your PySpark DataFrames.