Spark

Use Apache Spark to build data pipelines
Spark
24 Jun 2019

Performing Macro Operations on PySpark DataFrames

Perform SQL-like joins and aggregations on your PySpark DataFrames.

We've had quite a journey exploring the magical world of PySpark together. After covering DataFrame transformations, structured streams, and RDDs, there are only so many things left to cross off the list before we've gone too deep.

To round things out for this series, we're going to take a look back at some powerful DataFrame operations we missed. In particular, we'll be focusing on operations which modify DataFrames as a whole, such as joins and aggregations.

Joining DataFrames in PySpark

I'm going to assume you're already familiar with the concept of SQL-like joins. To demonstrate these in PySpark, I'll create two simple DataFrames:
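The snippet itself falls outside this excerpt, but a rough stand-in would look something like the following: two toy DataFrames, an inner join on a shared key, and a follow-up aggregation. The column names and sample values here are mine, not the post's.

```python
# A minimal sketch of joining two PySpark DataFrames; data and names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("joins-example").getOrCreate()

customers = spark.createDataFrame(
    [(1, "Alice"), (2, "Bob"), (3, "Carol")],
    ["customer_id", "name"],
)
orders = spark.createDataFrame(
    [(101, 1, 20.0), (102, 1, 35.5), (103, 3, 12.25)],
    ["order_id", "customer_id", "total"],
)

# Inner join on the shared key, then a simple per-customer aggregation.
joined = customers.join(orders, on="customer_id", how="inner")
joined.groupBy("name").sum("total").show()
```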

Continue Reading
Spark
06 Jun 2019

Working with PySpark RDDs

Working with Spark's original data structure API: Resilient Distributed Datasets.

For being the lifeblood of Spark, there’s surprisingly little documentation on how to actually work with RDDs. If I had to guess, most of the world has been too spoiled by DataFrames to be bothered with non-tabular data. Strange world we live in when using the core data API of Spark is considered a “pro move.”

We've already spent an awful lot of time in this series speaking about DataFrames, which are only one of the 3 data structure APIs we can work with in Spark (or one of two data structure APIs in PySpark, if you're keeping score).
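As a minimal sketch of what the RDD API feels like in practice (the numbers below are made up, and a plain local SparkSession is assumed, not the post's Databricks setup):

```python
# A quick RDD sketch: parallelize a local collection, then map/filter/reduce.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdd-example").getOrCreate()
sc = spark.sparkContext  # the SparkContext is the entry point for RDDs

rdd = sc.parallelize([1, 2, 3, 4, 5])
squared = rdd.map(lambda x: x * x).filter(lambda x: x > 4)

print(squared.collect())                    # [9, 16, 25]
print(squared.reduce(lambda a, b: a + b))   # 50
```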

Continue Reading
Spark
13 May 2019

Structured Streaming in PySpark

Become familiar with building a structured stream in PySpark using the Databricks interface.

Now that we're comfortable with Spark DataFrames, we're going to apply this newfound knowledge to help us implement a streaming data pipeline in PySpark. As it turns out, real-time data streaming is one of Spark's greatest strengths.

For this go-around, we'll touch on the basics of how to build a structured stream in Spark. Databricks has a few sweet features which help us visualize streaming data: we'll be using these features to validate whether or not our stream worked. If you're looking to hook Spark into a message broker or create a production-ready pipeline, we'll be covering this in a later post.
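The skeleton of a structured stream looks roughly like this. This is a hedged sketch rather than the post's notebook code: the schema, the watched directory, and the in-memory sink are all assumptions for illustration.

```python
# A minimal structured-streaming sketch; schema, path, and sink are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

spark = SparkSession.builder.appName("streaming-example").getOrCreate()

schema = StructType([
    StructField("user", StringType()),
    StructField("action", StringType()),
    StructField("count", IntegerType()),
])

# readStream watches a directory for new JSON files and treats them as an unbounded table.
events = spark.readStream.schema(schema).json("/tmp/incoming-events/")

# The in-memory sink makes results easy to inspect from a notebook cell.
query = (events.groupBy("action").count()
         .writeStream.outputMode("complete")
         .format("memory").queryName("action_counts")
         .start())

# Once the stream has picked up some files, the running counts are queryable as a table.
spark.sql("SELECT * FROM action_counts").show()
```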

Continue Reading
Spark
07 May 2019

DataFrame Transformations in PySpark (Continued)

Continuing to apply transformations to Spark DataFrames using PySpark.

We've covered a fair amount of ground when it comes to Spark DataFrame transformations in this series. In part 1, we touched on filter(), select(), dropna(), fillna(), and isNull(). Then, we moved on to dropDuplicates() and user-defined functions (udf) in part 2. This time around, we'll be building on these concepts and introduce some new ways to transform data so you can officially be awarded your PySpark Guru Certification, awarded by us here at Hackers & Slackers.*

*Hackers & Slackers is not an accredited institution and is respected by virtually nobody in general.
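As a quick refresher on those earlier operations before the new material, here's a hedged sketch (the DataFrame contents and the udf are mine, purely for illustration):

```python
# Recap sketch: dropDuplicates(), fillna(), and a simple udf; data is made up.
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("transforms-recap").getOrCreate()

df = spark.createDataFrame(
    [("alice", "NY"), ("alice", "NY"), ("bob", None)],
    ["name", "state"],
)

# dropDuplicates() removes exact duplicate rows; fillna() patches the missing state.
deduped = df.dropDuplicates().fillna({"state": "unknown"})

# A user-defined function applied column-wise (use sparingly -- built-ins are faster).
shout = udf(lambda s: s.upper(), StringType())
deduped.withColumn("name_upper", shout(deduped.name)).show()
```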

Of course, we need to get things started

Continue Reading
Spark
28 Apr 2019

Executing Basic DataFrame Transformations in PySpark

Using PySpark to apply transformations to real datasets.

If you joined us last time, you should have some working knowledge of how to get started with PySpark by using a Databricks notebook. Armed with that knowledge, we can now start playing with real data.

For most of the time we spend in PySpark, we'll likely be working with Spark DataFrames: this is our bread and butter for data manipulation in Spark. For this exercise, we'll attempt to execute an elementary string of transformations to get a feel for what the middle portion of an ETL pipeline looks like (also known as the "transform" part 😁).

Loading Up Some Data
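The post's dataset isn't shown in this excerpt, so here's a hedged stand-in: load a CSV and chain a few elementary transformations onto it. The file path and column names below are hypothetical.

```python
# A small "transform" step: load a CSV, then select/filter/rename/sort.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("basic-transforms").getOrCreate()

# In Databricks this file would typically live under /FileStore; the path here is made up.
df = spark.read.csv("/tmp/sample-data.csv", header=True, inferSchema=True)

# An elementary chain of transformations.
transformed = (df.select("city", "population")
                 .filter(col("population") > 100000)
                 .withColumnRenamed("population", "pop")
                 .orderBy(col("pop").desc()))
transformed.show(5)
```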

Continue Reading
Spark
27 Apr 2019

Cleaning PySpark DataFrames

Easy DataFrame cleaning techniques, ranging from dropping problematic rows to selecting important columns.

There's something about being a data engineer that makes it impossible to clearly convey thoughts in an articulate manner. It seems inevitable that every well-meaning Spark tutorial is destined to devolve into walls of incomprehensible code with minimal explanation. This is even apparent on StackOverflow, where simple questions are regularly met with absurdly unnecessary solutions (stop making UDFs for everything!). Anyway, what I'm trying to say is that it takes a lot of guts to click into these things, and here you are. I appreciate you.

In our last episode, we covered some Spark basics, played with Databricks, and started loading data.
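In the spirit of the post's subtitle (dropping problematic rows, selecting important columns), a minimal cleaning sketch might look like this; the data below is made up, and no UDFs are required:

```python
# Basic cleaning: drop rows missing a key field, fill remaining gaps, keep useful columns.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cleaning-example").getOrCreate()

df = spark.createDataFrame(
    [("alice", 34, "NY"), (None, 29, "CA"), ("carol", None, None)],
    ["name", "age", "state"],
)

cleaned = (df.dropna(subset=["name"])                 # drop rows without a name
             .fillna({"age": 0, "state": "unknown"})  # fill the remaining nulls
             .select("name", "age"))                  # keep only the columns we care about
cleaned.show()
```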

Continue Reading
Spark
26 Apr 2019

Learning Apache Spark with PySpark & Databricks

Get started with Apache Spark in part 1 of our series, where we leverage Databricks and PySpark.

Something we've only begun to touch on so far is the benefit of utilizing Apache Spark in larger-scale data pipelines. Spark is a quintessential part of the Apache data stack: built atop Hadoop, Spark is intended to handle resource-intensive jobs such as data streaming and graph processing.

Much of Spark's allure comes from the fact that it is written in Scala & Java. Java and its offshoot languages are notorious for running extremely memory-heavy at run time, which can be used to our advantage. Because everything is stored in memory, our jobs become predictably resource-intensive, which allows us
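That in-memory model is easy to see in code. As a rough illustration (not from the post; the file path is hypothetical), caching a DataFrame keeps it in executor memory after the first action, so repeated actions don't re-read the source:

```python
# Rough illustration of Spark's in-memory model via cache(); the path is made up.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("getting-started").getOrCreate()

df = spark.read.csv("/tmp/some-large-file.csv", header=True, inferSchema=True)

# cache() keeps the DataFrame in executor memory after the first action,
# so the second action reuses it instead of re-reading the file.
df.cache()
print(df.count())
print(df.filter(df[df.columns[0]].isNotNull()).count())
```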

Continue Reading