# Learning Apache Spark

Use Apache Spark to build fast data pipelines. Interact with your Spark cluster using PySpark, and get started with Databricks' notebook interface.

## 1: Learning Apache Spark with PySpark & Databricks

Get started with Apache Spark in part 1 of our series, where we leverage Databricks and PySpark.

## 2: Transforming PySpark DataFrames

Apply transformations to PySpark DataFrames such as creating new columns, filtering rows, or modifying string & number values.

## 3: Cleaning PySpark DataFrames

Simple DataFrame cleaning techniques, ranging from dropping rows to selecting only the data you need.

## 4: Structured Streaming in PySpark

Become familiar with building a structured stream in PySpark using the Databricks interface.

## 5: Working with PySpark RDDs

Work with Spark's original data-structure API: the Resilient Distributed Dataset (RDD).

## 6: Join and Aggregate PySpark DataFrames

Perform SQL-like joins and aggregations on your PySpark DataFrames.