Tutorials for Apache Big Data technologies including Apache Spark, Apache Kafka, Apache Airflow, and other critical tools for data engineers.
Perform SQL-like joins and aggregations on your PySpark DataFrames.
Working with Spark's original data structure API: Resilient Distributed Datasets.
Use Apache Airflow to build and monitor better data pipelines.
Become familiar with building a Structured Streaming job in PySpark using the Databricks interface.
An overview of how Kafka works, along with comparable message brokers.
Easy DataFrame cleaning techniques ranging from dropping rows to selecting important data.
Apply transformations to PySpark DataFrames, such as creating new columns, filtering rows, or modifying string and numeric values.
Get started with Apache Spark in part 1 of our series, where we leverage Databricks and PySpark.