Apache

Tutorials for Apache’s suite of big data products. Includes Apache Hadoop, Apache Spark, Apache Kafka, and other critical technologies for any data engineer.

Performing Macro Operations on PySpark DataFrames

Performing Macro Operations on PySpark DataFrames

Perform SQL-like joins and aggregations on your PySpark DataFrames.
Working with PySpark RDDs

Working with PySpark RDDs

Working with Spark's original data structure API: Resilient Distributed Datasets.
Manage Data Pipelines with Apache Airflow

Manage Data Pipelines with Apache Airflow

Use Apache Airflow to build and monitor better data pipelines.
Structured Streaming in PySpark

Structured Streaming in PySpark

Become familiar with building a structured stream in PySpark using the Databricks interface.
DataFrame Transformations in PySpark (Continued)

DataFrame Transformations in PySpark (Continued)

Continuing to apply transformations to Spark DataFrames using PySpark.
Becoming Familiar with Apache Kafka and Message Queues

Becoming Familiar with Apache Kafka and Message Queues

An overview of how Kafka works, as well as equivalent message brokers.
Executing Basic DataFrame Transformations in PySpark

Executing Basic DataFrame Transformations in PySpark

Using PySpark to apply transformations to real datasets.
Cleaning PySpark DataFrames

Cleaning PySpark DataFrames

Easy DataFrame cleaning techniques, ranging from dropping problematic rows to selecting important columns.
Learning Apache Spark with PySpark & Databricks

Learning Apache Spark with PySpark & Databricks

Get started with Apache Spark in part 1 of our series, where we leverage Databricks and PySpark.
From CSVs to Tables: Infer Data Types From Raw Spreadsheets

From CSVs to Tables: Infer Data Types From Raw Spreadsheets

The quest to never explicitly set a table schema ever again.