Data Engineering

The systematic collection and transformation of data via the creation of tools and pipelines.
Welcome to SQL: Modifying Databases and Tables

Brush up on SQL fundamentals such as creating tables, schemas, and views.

SQL: we all pretend to be experts at it, and mostly get away with it thanks to StackOverflow. Paired with our vast experience learning to code in the 90s, our fieldwork with phpMyAdmin and LAMP stacks basically makes us experts. Go ahead and chalk up a win for your resume.

SQL has been around longer than our careers have, so why start a series on it now? Surely there’s sufficient documentation that we can Google the specifics whenever the time comes to write a query? That, my friends, is precisely the problem. Regardless
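As a taste of the fundamentals the series covers, here's a minimal sketch of creating a table and a view, run through Python's built-in sqlite3 module so it's self-contained (the table and column names are invented for illustration):

```python
import sqlite3

# In-memory database, so the sketch runs anywhere with no setup.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create a table with explicit column types and constraints.
cur.execute("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        signup_date TEXT
    )
""")

# Create a view: a saved query we can select from like a table.
cur.execute("""
    CREATE VIEW recent_users AS
    SELECT name FROM users WHERE signup_date >= '2019-01-01'
""")

cur.execute("INSERT INTO users (name, signup_date) VALUES ('todd', '2019-03-01')")
rows = cur.execute("SELECT name FROM recent_users").fetchall()
print(rows)  # [('todd',)]
```

The same CREATE TABLE and CREATE VIEW statements translate almost verbatim to Postgres or MySQL; only the type names and date handling differ.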

Google BigQuery's Python SDK: Creating Tables Programmatically

Explore the benefits of Google BigQuery and use the Python SDK to programmatically create tables.

GCP is on the rise, and it's getting harder and harder to have conversations around data warehousing without addressing the new 500-pound gorilla on the block: Google BigQuery. By this point, most enterprises have comfortably settled into their choice of "big data" storage, whether that be Amazon Redshift, Hadoop, or what-have-you. BigQuery is quickly disrupting the way we think about big data stacks by redefining how we use and ultimately pay for such services.

The benefits of BigQuery likely aren't enough to force enterprises to throw the baby out with the bathwater. That said, companies building their infrastructure from the

Downcast Numerical Data Types with Pandas

Reduce a DataFrame's memory footprint by downcasting numerical columns.

Recently, I had to find a way to reduce the memory footprint of a Pandas DataFrame in order to actually do operations on it.  Here's a trick that came in handy!

By default, if you read a DataFrame from a file, Pandas casts numerical columns to the widest types: integers become int64, and any column containing fractions or missing values becomes float64.  This is in keeping with the philosophy behind Pandas and NumPy - by using strict types (instead of normal Python "duck typing"), you can do things a lot faster.  The float64 is the most flexible numerical type - it can handle fractions, as well as turning missing values into NaN.
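For illustration, here's roughly what the downcasting trick looks like with `pd.to_numeric` (the column names and values are invented for the example):

```python
import pandas as pd

# Both columns land as float64 by default.
df = pd.DataFrame({"price": [1.5, 2.25, 3.0], "qty": [10.0, 20.0, 30.0]})

# Downcast each column to the smallest type that holds its values losslessly.
df["price"] = pd.to_numeric(df["price"], downcast="float")    # float64 -> float32
df["qty"] = pd.to_numeric(df["qty"], downcast="integer")      # whole floats -> int8

print(df.dtypes)
```

On a toy frame the savings are trivial, but across millions of rows, halving (or better) the bytes per cell is often the difference between fitting in memory and not.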

From CSVs to Tables: Infer Data Types From Raw Spreadsheets

The quest to never explicitly set a table schema ever again.

Back in August of last year (roughly 8 months ago), I hunched over my desk at 4 am desperate to fire off a post before boarding a flight the next morning. The article was titled Creating Database Schemas: a Job for Robots, or Perhaps Pandas. It was my intent at the time to solve a common annoyance: creating database tables out of raw data, without the obnoxious process of explicitly setting each column's datatype. I had a few leads that led me to believe I had the answer... boy was I wrong.

The task seems somewhat reasonable on the surface.

Psycopg2: PostgreSQL & Python the Old Fashioned Way

Manage PostgreSQL database interactions in Python with the Psycopg2 library.

Last time we met, we joyfully shared a little tirade about missing out on functionality provided to us by libraries such as SQLAlchemy, and the advantages of interacting with databases where ORMs are involved. I stand by that sentiment, but I’ll now directly contradict myself by sharing some tips on using vanilla Psycopg2 to interact with databases.

We never know when we’ll be stranded on a desert island without access to SQLAlchemy, with only a lonesome Psycopg2 washing up onshore. Either that or perhaps you’re part of a development team stuck in a certain way of doing things

Pythonic Database Management with SQLAlchemy

The iconic Python library for handling any conceivable database interaction.

Something we've taken for granted thus far on Hackers and Slackers is a library most data professionals have accepted as standard: SQLAlchemy.

In the past, we've covered database connection management and querying using libraries such as PyMySQL and Psycopg2, both of which do an excellent job of interacting with databases just as we'd expect them to. The nature of opening/closing DB connections and working with cursors hasn't changed much in the past few decades. While boilerplate is boring, at least it has remained consistent, one might figure. That may have been the case, but the philosophical boom of MVC
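As a taste of what SQLAlchemy handles for us, here's a minimal sketch against an in-memory SQLite engine; the table and values are invented for illustration, and the engine URL would point at Postgres or MySQL in practice:

```python
from sqlalchemy import create_engine, text

# One engine manages the connection pool; no manual cursor juggling.
engine = create_engine("sqlite:///:memory:")

with engine.connect() as conn:
    conn.execute(text("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)"))
    # Bound parameters instead of string interpolation.
    conn.execute(text("INSERT INTO posts (title) VALUES (:t)"),
                 {"t": "Hello SQLAlchemy"})
    titles = [row[0] for row in conn.execute(text("SELECT title FROM posts"))]

print(titles)
```

Swapping the backend means changing the engine URL, not the surrounding code - which is most of the argument for reaching past PyMySQL or Psycopg2 in the first place.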

Using Redis to Store Information in Python Applications

A temporary data store for everything from session variables to chat queues.

We’re hacking into the new year here at Hackers and Slackers, and in the process, we’ve received plenty of new gifts to play with. Nevermind how Santa manages to fit physically non-existent SaaS products under the Christmas tree. We ask for abstract enterprise software every year, and this time we happened to get a little red box.

If you've never personally used Redis, the name probably sounds familiar as you've been bombarded with obscure technology brand names in places like the Heroku marketplace, or your unacceptably nerdy Twitter account (I assure you, mine is worse). So what is

MongoDB Stitch Serverless Functions

A crash course in MongoDB Stitch serverless functions: the bread and butter of MongoDB Cloud.

At times, I've found my opinion of MongoDB Atlas and MongoDB Stitch to waver between two extremes. Sometimes I'm struck by the allure of a cloud which fundamentally disregards schemas (wooo no schema party!). Other times, such as when Mongo decides to upgrade to a new version and you find all your production instances broken, I like the ecosystem a bit less.

My biggest qualm with MongoDB is its poor documentation. The "tutorials" and sample code seem hacked-together, unmaintained, and worst of all, inconsistent with one another. Reading through the docs always seems to end up with Mongo forcing Twilio down my

Scraping Data on the Web with BeautifulSoup

The honest act of systematically stealing data without permission.

There are plenty of reliable and open sources of data on the web. Datasets are freely released to the public domain by the likes of Kaggle, Google Cloud, and of course local & federal government. Like most things free and open, however, following the rules to obtain public data can be a bit... boring. I'm not suggesting we go and blatantly break some grey-area laws by stealing data, but this blog isn't exactly called People Who Play It Safe And Slackers, either.

My personal Python roots can actually be traced back to an ambitious side-project: to aggregate all new music
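For a flavor of what BeautifulSoup does, here's a minimal sketch that parses an inline HTML snippet rather than a live page (the markup and class names are invented for illustration; a real scraper would fetch the HTML over HTTP first):

```python
from bs4 import BeautifulSoup

html = """
<html><body>
  <div class="album"><h2>Album One</h2><span class="artist">Artist A</span></div>
  <div class="album"><h2>Album Two</h2><span class="artist">Artist B</span></div>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")
# Pull (title, artist) pairs out of every matching div.
albums = [
    (div.h2.get_text(), div.find("span", class_="artist").get_text())
    for div in soup.find_all("div", class_="album")
]
print(albums)  # [('Album One', 'Artist A'), ('Album Two', 'Artist B')]
```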

Create a REST API Endpoint Using AWS Lambda

Create an AWS Lambda function to pull records from a database.

Now that you know your way around API Gateway, you have the power to create vast collections of endpoints. If only we could get those endpoints to actually receive and return some stuff.

We'll create a GET function which will solve the common task of retrieving data from a database. The sequence will look something like:

  • Connect to the database
  • Execute the relevant SQL query
  • Map values returned by the query to a key/value dictionary
  • Return a response body containing the prepared response
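The steps above can be sketched as a single handler function. To keep the sketch runnable, sqlite3 stands in for a production database here, and the table and records are invented; in a real Lambda you'd connect to something like RDS with the appropriate driver:

```python
import json
import sqlite3

# Stand-in database so the example is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO records (name) VALUES ('first'), ('second')")

def handler(event, context):
    # 1. Connect to the database (here: reuse the in-memory connection).
    cur = conn.cursor()
    # 2. Execute the relevant SQL query.
    cur.execute("SELECT id, name FROM records")
    # 3. Map values returned by the query to key/value dictionaries.
    rows = [{"id": rid, "name": name} for rid, name in cur.fetchall()]
    # 4. Return a response body containing the prepared response.
    return {"statusCode": 200, "body": json.dumps(rows)}

response = handler({}, None)
print(response["statusCode"])  # 200
```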

To get started, create a project on your local machine (this is necessary as we'll need

MySQL, Google Cloud, and a REST API that Generates Itself

Deploy a MySQL database that auto-creates endpoints for itself.

It wasn’t too long ago that I haphazardly forced us down a journey of exploring Google Cloud’s Cloud SQL service. The focus of this exploration was Google’s accompanying REST API for all of its Cloud SQL instances. It turned out to be a relatively disappointing administrative API which did little to extend the features you’d expect from the CLI or console.

You see, I’ve had a dream stuck in my head for a while now. Like most of my utopian dreams, this dream is related to data, or more specifically simplifying the manner in

Working With Google Cloud Functions

GCP scores a victory by trivializing serverless functions.

The more I explore Google Cloud's endless catalog of cloud services, the more I like Google Cloud. This is why before moving forward, I'd like to be transparent that this blog has become little more than thinly veiled Google propaganda, where I will henceforth bombard you with persuasive and subtle messaging to sell your soul to Google. Let's be honest; they've probably simulated it anyway.

It should be safe to assume that you're familiar with AWS Lambda Functions by now, which have served as the backbone of what we refer to as "serverless." These cloud code snippets have restructured entire

Extract Nested Data From Complex JSON

Never manually walk through complex JSON objects again by using this function.

We're all data people here, so you already know the scenario: it happens perhaps once a day, perhaps 5, or even more. There's an API you're working with, and it's great. It contains all the information you're looking for, but there's just one problem: the complexity of nested JSON objects is endless, and suddenly the job you love needs to be put on hold to painstakingly retrieve the data you actually want, and it's 5 levels deep in a nested JSON hell. Nobody feels like much of a "scientist" or an "engineer" when half their day becomes dealing with key
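One possible implementation of such a helper (a sketch, not necessarily the exact function from the article) recursively walks dicts and lists, collecting every value stored under a given key, however deep it's buried:

```python
def json_extract(obj, key):
    """Recursively pull every value for `key` out of nested dicts/lists."""
    results = []

    def _extract(node):
        if isinstance(node, dict):
            for k, v in node.items():
                if k == key:
                    results.append(v)
                _extract(v)  # the value may itself hide more matches
        elif isinstance(node, list):
            for item in node:
                _extract(item)

    _extract(obj)
    return results

# Invented payload for illustration: "id" appears at several depths.
payload = {
    "data": {"items": [{"id": 1, "meta": {"id": 2}}, {"id": 3}]},
    "id": 0,
}
print(json_extract(payload, "id"))  # [1, 2, 3, 0]
```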

Reading and Writing to CSVs in Python

Playing with tabular data the native Python way.

Tables. Cells. Two-dimensional data. We here at Hackers & Slackers know how to talk dirty, but there's one word we'll be missing from our vocabulary today: Pandas. Before the remaining audience closes their browser windows in fury, hear me out. We love Pandas; so much so that we tend to recklessly gunsling this 30MB library to perform simple tasks. This isn't always a wise choice. I get it: you're here for data, not software engineering best practices. We all are, but in a landscape where engineers and scientists already produce polarizing code quality, we're all just a single bloated lambda function
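For a taste of the native approach, writing and reading tabular data with nothing but the standard library's csv module might look like this (the data is invented for illustration; `io.StringIO` stands in for a file on disk):

```python
import csv
import io

# Write a header row plus data rows.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["name", "score"])
writer.writerows([["ada", 10], ["grace", 12]])

# Read it back; DictReader keys each row by the header.
buffer.seek(0)
rows = list(csv.DictReader(buffer))
print(rows)  # [{'name': 'ada', 'score': '10'}, {'name': 'grace', 'score': '12'}]
```

Note that csv hands every value back as a string - type inference is exactly the convenience you give up by leaving Pandas at home.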
