Working with PySpark RDDs

Working with Spark's original data structure API: Resilient Distributed Datasets.
PowerPivot 3: Managing the Data Model

Analyzing ginormous files with Microsoft PowerPivot.
Manage Data Pipelines with Apache Airflow

Use Apache Airflow to build and monitor better data pipelines.
Recasting Low-Cardinality Columns as Categoricals

Downcast strings in Pandas to their proper data types using HDF5.
PowerPivot 2: What's the Deal with Delimiters?

Working with large flat files in PowerPivot.
Removing Duplicate Columns in Pandas

Dealing with duplicate column names in your Pandas DataFrame.
Using Hierarchical Indexes With Pandas

Use Pandas' MultiIndex to make your data work harder for you.
Managing Flask Session Variables

Using Flask-Session and Flask-Redis to store user session variables.