# Learning Apache Spark

Use Apache Spark to build fast data pipelines. Interact with your Spark cluster using PySpark, and get started with Databricks' notebook interface.
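
Before diving in, here's a minimal sketch of spinning up a PySpark session. On Databricks, a `spark` session is already provided in every notebook; the builder below is only needed when running locally, and the app name is an arbitrary placeholder.

```python
from pyspark.sql import SparkSession

# Databricks notebooks provide a ready-made `spark` session; this builder
# is only needed when running PySpark locally. The app name is arbitrary.
spark = (
    SparkSession.builder
    .appName("learning-apache-spark")  # placeholder name
    .master("local[*]")                # use all local cores
    .getOrCreate()
)

print(spark.version)
```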

## 1: Learning Apache Spark with PySpark & Databricks

Get started with Apache Spark in part 1 of our series, where we leverage Databricks and PySpark.
13 min read
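
As a taste of part 1, here's a hedged sketch of loading a file into a DataFrame. The path and dataset are hypothetical; on Databricks you'd point `spark.read` at a file you've uploaded.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # reuses the Databricks session if present

# Hypothetical CSV path standing in for an uploaded dataset.
df = spark.read.csv("/tmp/bookings.csv", header=True, inferSchema=True)

df.printSchema()  # inspect the inferred column types
df.show(5)        # preview the first five rows
```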

## 2: Transforming PySpark DataFrames

Apply transformations to PySpark DataFrames, such as creating new columns, filtering rows, or modifying string & number values.
15 min read
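
To illustrate the kinds of transformations part 2 covers, here's a minimal sketch using hypothetical `city`, `price`, and `quantity` columns: a derived numeric column, a string modification, and a row filter.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical sample data standing in for a real dataset.
df = spark.createDataFrame(
    [("NYC", 10.0, 3), ("LA", 55.0, 1), ("NYC", 80.0, 2)],
    ["city", "price", "quantity"],
)

transformed = (
    df.withColumn("total", F.col("price") * F.col("quantity"))  # new numeric column
      .withColumn("city", F.upper(F.col("city")))               # modify string values
      .filter(F.col("total") > 50)                              # filter rows
)

transformed.show()
```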

## 3: Cleaning PySpark DataFrames

Easy DataFrame cleaning techniques ranging from dropping rows to selecting important data.
18 min read
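
As a preview of part 3's cleaning techniques, a small sketch with made-up data: dropping duplicates and null rows, filling remaining nulls, and selecting only the columns of interest.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Made-up rows with duplicates and missing values.
df = spark.createDataFrame(
    [(1, "NYC", 20.0), (1, "NYC", 20.0), (2, None, 35.0), (3, "LA", None)],
    ["order_id", "city", "amount"],
)

cleaned = (
    df.dropDuplicates(["order_id"])          # drop repeated orders
      .dropna(subset=["amount"])             # drop rows missing an amount
      .fillna({"city": "unknown"})           # fill remaining null cities
      .select("order_id", "city", "amount")  # keep only what we need
)

cleaned.show()
```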

## 4: Structured Streaming in PySpark

Become familiar with building a structured stream in PySpark using the Databricks interface.
8 min read
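
For a flavor of part 4, here's a minimal structured stream built on Spark's built-in `rate` test source and the console sink, so no external data source is assumed.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# The built-in "rate" source emits timestamped rows at a fixed pace,
# which makes it handy for experimenting without a real stream.
stream = (
    spark.readStream
         .format("rate")
         .option("rowsPerSecond", 5)
         .load()
)

query = (
    stream.writeStream
          .format("console")     # print each micro-batch to stdout
          .outputMode("append")
          .start()
)

query.awaitTermination(10)  # let it run for ~10 seconds
query.stop()
```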

## 5: Working with PySpark RDDs

Work with Spark's original data structure, the Resilient Distributed Dataset (RDD).
8 min read
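
As a glimpse of part 5, here's the classic word count expressed against the RDD API, using a tiny in-memory dataset:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext  # RDD operations go through the SparkContext

rdd = sc.parallelize(["to be or not to be", "that is the question"])

word_counts = (
    rdd.flatMap(lambda line: line.split())  # split each line into words
       .map(lambda word: (word, 1))         # pair every word with a count of 1
       .reduceByKey(lambda a, b: a + b)     # sum the counts per word
)

print(word_counts.collect())
```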

## 6: Join and Aggregate PySpark DataFrames

Perform SQL-like joins and aggregations on your PySpark DataFrames.
7 min read
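
Finally, a small sketch of the SQL-like joins and aggregations part 6 walks through, again with hypothetical data:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

orders = spark.createDataFrame(
    [(1, "NYC", 20.0), (2, "LA", 35.0), (3, "NYC", 15.0)],
    ["order_id", "city", "amount"],
)
cities = spark.createDataFrame(
    [("NYC", "New York"), ("LA", "Los Angeles")],
    ["city", "city_name"],
)

report = (
    orders.join(cities, on="city", how="inner")  # SQL-style inner join
          .groupBy("city_name")
          .agg(
              F.sum("amount").alias("total_amount"),    # aggregate per city
              F.count("order_id").alias("num_orders"),
          )
)

report.show()
```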