Hackers and Slackers

Spark

Process data at scale with Apache Spark and PySpark. Build pipelines to batch process or stream data in real-time.
Join and Aggregate PySpark DataFrames

Perform SQL-like joins and aggregations on your PySpark DataFrames.
Todd Birchard
Jun 24, 2019 • 7 mins
Spark
Working with PySpark RDDs

Working with Spark's original data structure API: Resilient Distributed Datasets.
Todd Birchard
Jun 6, 2019 • 8 mins
Spark
Structured Streaming in PySpark

Become familiar with building a structured stream in PySpark using the Databricks interface.
Todd Birchard
May 13, 2019 • 8 mins
Spark
Cleaning PySpark DataFrames

Easy DataFrame cleaning techniques ranging from dropping rows to selecting important data.
Todd Birchard
Apr 27, 2019 • 18 mins
Spark
Transforming PySpark DataFrames

Apply transformations to PySpark DataFrames such as creating new columns, filtering rows, or modifying string & number values.
Todd Birchard
Apr 26, 2019 • 15 mins
Spark
Learning Apache Spark with PySpark & Databricks

Get started with Apache Spark in part 1 of our series, where we leverage Databricks and PySpark.
Todd Birchard
Apr 25, 2019 • 13 mins
Spark

Series

Data Analysis with Pandas 11
Build Flask Apps 11
Learning Apache Spark 6
Google Cloud Architecture 6
Mastering SQLAlchemy 4
GraphQL Tutorials 4
Welcome to SQL 4
Working with MySQL 4
Mapping Data with Mapbox 3
Web Scraping With Python 2
Python Concurrency with Asyncio 2
Getting Started with Django 2
Hackers and Slackers

Community of hackers obsessed with data science, data engineering, and analysis. Openly pushing a pro-robot agenda.

Authors

  • Todd Birchard
  • Matthew Alhonte
  • Max Mileaf
  • Ryan Rosado
  • Graham Beckley
  • David Aquino
  • Paul Armstrong
  • Dylan Castillo
©2023 Hackers and Slackers, All Rights Reserved.