Use Google Cloud's Python SDK to insert large datasets into Google BigQuery, enjoy the benefits of schema detection, and manipulate data programmatically.
Supercharge your scraper to extract quality page metadata by parsing JSON-LD data via Python's extruct library.
Extract and move data between BigQuery and relational databases using PyBigQuery: a connector for SQLAlchemy.
Get the most out of Redshift by performance-tuning your cluster and learning how to query your data optimally.
Perform SQL-like joins and aggregations on your PySpark DataFrames.
Work with Spark's original data structure API: Resilient Distributed Datasets.
Use Apache Airflow to build and monitor better data pipelines.
Become familiar with building a structured stream in PySpark using the Databricks interface.
Get to know Apache Kafka: a horizontally scalable event streaming platform. Learn what makes Kafka critical to high-volume, low-latency data pipelines.
Easy DataFrame cleaning techniques ranging from dropping rows to selecting important data.