Data Engineering

The systematic collection and transformation of data via the creation of tools and pipelines.

Using Amazon Redshift as your Data Warehouse

Get the most out of Redshift by performance-tuning your cluster and learning how to query your data optimally.
Data Warehouses
12 min read
July 30
Performing Macro Operations on PySpark DataFrames

Perform SQL-like joins and aggregations on your PySpark DataFrames.
Spark
7 min read
June 24
Working with PySpark RDDs

Working with Spark's original data structure API: Resilient Distributed Datasets.
Spark
8 min read
June 07
Manage Data Pipelines with Apache Airflow

Use Apache Airflow to build and monitor better data pipelines.
Apache
13 min read
June 03
Structured Streaming in PySpark

Become familiar with building a structured stream in PySpark using the Databricks interface.
Spark
10 min read
May 14
DataFrame Transformations in PySpark (Continued)

Continuing to apply transformations to Spark DataFrames using PySpark.
Spark
8 min read
May 07
Becoming Familiar with Apache Kafka and Message Queues

An overview of how Kafka works, as well as equivalent message brokers.
Apache
6 min read
May 04
Executing Basic DataFrame Transformations in PySpark

Using PySpark to apply transformations to real datasets.
Spark
9 min read
April 29