# Learning Apache Spark

Use Apache Spark to build fast data pipelines. Interact with your Spark cluster using PySpark, and get started with Databricks' notebook interface.

## Post 1: Learning Apache Spark with PySpark & Databricks

Get started with Apache Spark in part 1 of our series, where we leverage Databricks and PySpark.
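
To give a feel for where part 1 starts, here's a minimal sketch of creating a first DataFrame. The app name and sample rows are made up for illustration; in a Databricks notebook, a `SparkSession` is already provided as `spark`, so the builder call only matters when running locally:

```python
from pyspark.sql import SparkSession

# In Databricks a SparkSession already exists as `spark`;
# building one explicitly is only needed when running locally.
spark = SparkSession.builder.appName("learning-spark").getOrCreate()

# A tiny DataFrame built from in-memory rows (hypothetical sample data).
df = spark.createDataFrame(
    [("Alice", 34), ("Bob", 45), ("Carol", 29)],
    ["name", "age"],
)

df.printSchema()
df.show()
```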

## Post 2: Transforming PySpark DataFrames

Apply transformations to PySpark DataFrames, such as creating new columns, filtering rows, or modifying string & number values.
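
As a quick taste, here's a sketch of those three kinds of transformations. The columns (`name`, `city`, `spend`) and the tax multiplier are invented purely for illustration:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("transformations").getOrCreate()

df = spark.createDataFrame(
    [("alice", "NYC", 120.5), ("bob", "LA", 99.0), ("carol", "NYC", 150.25)],
    ["name", "city", "spend"],
)

transformed = (
    df
    # Create a new column derived from an existing number column.
    .withColumn("spend_with_tax", F.round(F.col("spend") * 1.08, 2))
    # Modify a string value: capitalize each name.
    .withColumn("name", F.initcap(F.col("name")))
    # Filter rows on a condition.
    .filter(F.col("city") == "NYC")
)

transformed.show()
```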

## Post 3: Cleaning PySpark DataFrames

Easy DataFrame cleaning techniques, ranging from dropping rows to selecting important data.
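
A minimal sketch of those cleaning moves, run against a deliberately messy, made-up DataFrame containing duplicates and nulls:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cleaning").getOrCreate()

# Hypothetical sample data with a duplicate row and a missing value.
df = spark.createDataFrame(
    [
        ("Alice", 34, "NYC"),
        ("Alice", 34, "NYC"),  # exact duplicate
        ("Bob", None, "LA"),   # missing age
    ],
    ["name", "age", "city"],
)

cleaned = (
    df
    .dropDuplicates()        # drop exact duplicate rows
    .dropna()                # drop rows containing any null
    .select("name", "city")  # keep only the columns that matter
)

cleaned.show()
```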

## Post 4: Structured Streaming in PySpark

Become familiar with building a structured stream in PySpark using the Databricks interface.
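
As a preview, here's a self-contained streaming sketch using Spark's built-in `rate` source, which emits synthetic rows so the example runs without any external data (the row rate and window size here are arbitrary choices, not from the post):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("streaming").getOrCreate()

# The built-in `rate` source continuously emits (timestamp, value) rows,
# which keeps this sketch self-contained.
stream = spark.readStream.format("rate").option("rowsPerSecond", 5).load()

# Count rows per 10-second event-time window.
counts = stream.groupBy(F.window("timestamp", "10 seconds")).count()

# Streaming aggregations printed to the console need "complete" output mode.
query = (
    counts.writeStream
    .outputMode("complete")
    .format("console")
    .start()
)

query.awaitTermination(30)  # let the stream run for ~30 seconds
query.stop()
```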

## Post 5: Working with PySpark RDDs

Work with Spark's original data structure API: Resilient Distributed Datasets.
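
For flavor, a minimal RDD sketch showing the lazy-transformation, eager-action pattern that RDDs are built around (the numbers are arbitrary):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdds").getOrCreate()
sc = spark.sparkContext  # available as `sc` in a Databricks notebook

# Build an RDD from a local Python collection.
rdd = sc.parallelize(range(1, 11))

# Transformations (map, filter) are lazy; the reduce action
# is what actually triggers computation.
squares = rdd.map(lambda x: x * x)
evens = squares.filter(lambda x: x % 2 == 0)
total = evens.reduce(lambda a, b: a + b)

print(total)  # 4 + 16 + 36 + 64 + 100 = 220
```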

## Post 6: Join and Aggregate PySpark DataFrames

Perform SQL-like joins and aggregations on your PySpark DataFrames.
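
Here's a short sketch of that join-then-aggregate pattern, with invented `orders` and `customers` tables standing in for real data:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("joins-aggs").getOrCreate()

# Hypothetical sample tables.
orders = spark.createDataFrame(
    [(1, "Alice", 120.5), (2, "Bob", 99.0), (3, "Alice", 150.25)],
    ["order_id", "customer", "amount"],
)
customers = spark.createDataFrame(
    [("Alice", "NYC"), ("Bob", "LA")],
    ["customer", "city"],
)

# SQL-style inner join on the shared `customer` column.
joined = orders.join(customers, on="customer", how="inner")

# Group and aggregate, much like SQL's GROUP BY.
summary = joined.groupBy("city").agg(
    F.count("order_id").alias("orders"),
    F.sum("amount").alias("total_spend"),
)

summary.show()
```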