Collect and transform data at scale. Build data pipelines, work with horizontally scalable architectures, or simply scrape and store data.
Use Google Cloud's Python SDK to insert large datasets into Google BigQuery, enjoy the benefits of schema autodetection, and manipulate data programmatically.
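As a quick taste, here's a minimal sketch of such a load with the google-cloud-bigquery client and autodetection turned on; the project, dataset, table, and file names are hypothetical:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Let BigQuery infer column names and types from the source data.
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    autodetect=True,
)

with open("records.json", "rb") as source_file:  # hypothetical file
    job = client.load_table_from_file(
        source_file,
        "my-project.my_dataset.my_table",  # hypothetical table ID
        job_config=job_config,
    )

job.result()  # Block until the load job finishes.
print(f"Loaded {job.output_rows} rows.")
```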
Supercharge your scraper to extract quality page metadata by parsing JSON-LD data via Python's extruct library.
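For a sense of what that looks like, here's a minimal sketch using requests and extruct; the URL and the fields printed at the end are placeholders:

```python
import extruct
import requests

url = "https://example.com/article"  # placeholder URL
html = requests.get(url, timeout=10).text

# Restrict extraction to JSON-LD; extruct also handles microdata, OpenGraph, etc.
metadata = extruct.extract(html, base_url=url, syntaxes=["json-ld"])

for item in metadata["json-ld"]:
    print(item.get("@type"), item.get("headline"))
```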
Extract and move data between BigQuery and relational databases using PyBigQuery, a SQLAlchemy dialect for BigQuery.
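An illustrative sketch of that movement, with pandas as the shuttle (this assumes the SQLAlchemy 1.x era that PyBigQuery targets; connection strings and table names are made up):

```python
import pandas as pd
from sqlalchemy import create_engine

# The bigquery:// dialect is registered by the PyBigQuery package.
bq_engine = create_engine("bigquery://my-project/my_dataset")
pg_engine = create_engine("postgresql://user:pass@localhost:5432/mydb")

# Pull rows out of BigQuery and land them in Postgres.
df = pd.read_sql("SELECT * FROM my_table", bq_engine)
df.to_sql("my_table", pg_engine, if_exists="replace", index=False)
```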
Get the most out of Redshift by performance-tuning your cluster and learning how to query your data optimally.
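One tuning lever worth sketching is choosing distribution and sort keys at table-creation time. The DDL below is issued from Python with psycopg2; the cluster endpoint, credentials, and table are hypothetical:

```python
import psycopg2

ddl = """
CREATE TABLE events (
    user_id    BIGINT,
    event_type VARCHAR(64),
    created_at TIMESTAMP
)
DISTKEY (user_id)       -- co-locate each user's rows on one slice for joins
SORTKEY (created_at);   -- range-restricted scans can skip unneeded blocks
"""

conn = psycopg2.connect(
    host="my-cluster.redshift.amazonaws.com",  # hypothetical endpoint
    port=5439, dbname="analytics", user="admin", password="...",
)
with conn, conn.cursor() as cur:  # commits on successful exit
    cur.execute(ddl)
```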
Perform SQL-like joins and aggregations on your PySpark DataFrames.
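A minimal sketch of a join followed by a per-group aggregation, with made-up data:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("joins-demo").getOrCreate()

orders = spark.createDataFrame(
    [(1, "alice", 20.0), (2, "bob", 35.5), (3, "alice", 10.0)],
    ["order_id", "user", "total"],
)
users = spark.createDataFrame(
    [("alice", "US"), ("bob", "DE")], ["user", "country"]
)

# Inner join on the shared column, then aggregate per country.
(orders.join(users, on="user", how="inner")
       .groupBy("country")
       .agg(F.sum("total").alias("revenue"),
            F.count("order_id").alias("orders"))
       .show())
```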
Work with Spark's original data structure API: Resilient Distributed Datasets.
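Here's a minimal word-count sketch against the low-level RDD API, built from map-side transformations and a key-wise reduce:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdd-demo").getOrCreate()
sc = spark.sparkContext

lines = sc.parallelize(["to be or not to be", "that is the question"])

counts = (lines.flatMap(lambda line: line.split())  # one record per word
               .map(lambda word: (word, 1))         # pair each word with 1
               .reduceByKey(lambda a, b: a + b))    # sum counts per word

print(counts.collect())
```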
Use Apache Airflow to build and monitor better data pipelines.
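A minimal sketch of a two-task DAG using Airflow 2's PythonOperator; the schedule and task bodies are placeholders:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    print("pulling source data")


def load():
    print("writing to the warehouse")


with DAG(
    dag_id="example_pipeline",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)

    extract_task >> load_task  # load runs only after extract succeeds
```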
Become familiar with building a structured stream in PySpark using the Databricks interface.
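To close, a minimal sketch of a file-source structured stream that aggregates as new data lands; the mount path and schema are hypothetical:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import DoubleType, StringType, StructType

spark = SparkSession.builder.appName("stream-demo").getOrCreate()

schema = (StructType()
          .add("user", StringType())
          .add("total", DoubleType()))

# Treat new JSON files in the directory as an unbounded input table.
stream = (spark.readStream
               .schema(schema)
               .json("/mnt/landing/orders/"))  # hypothetical mount path

query = (stream.groupBy("user")
               .agg(F.sum("total").alias("revenue"))
               .writeStream
               .outputMode("complete")   # re-emit full aggregates each batch
               .format("memory")         # in-memory sink, handy for notebooks
               .queryName("revenue_by_user")
               .start())
```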