Hackers and Slackers: Data Science for Badasses


12 Posts
Python's king of data analysis libraries.

Importing Excel Datetimes Into Pandas

Pandas & Excel, Part 1

Different file formats are different!  For all kinds of reasons!

A few months back, I had to import some Excel files into a database. In this process I learned so much about the delightfully unique way Excel stores dates & times!  

The basic datetime is stored as a decimal number, like 43324.909907407404.  The part before the decimal point is the date (a count of days), and the part after it is the time (a fraction of a day).  So far, so good - this is pretty common for computers.  The date is
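A minimal sketch of decoding that serial with pandas alone, assuming the Excel 1900 date system (where day zero is 1899-12-30); the variable names are illustrative:

```python
import pandas as pd

# The serial value from the excerpt: whole days since Excel's
# 1900-system epoch, with the fraction encoding the time of day.
serial = 43324.909907407404

# pandas converts directly by treating the number as days from that origin
ts = pd.to_datetime(serial, unit="D", origin="1899-12-30")
print(ts)  # a Timestamp in August 2018
```

Note that `origin="1899-12-30"` (not 1900-01-01) compensates for Excel's historical off-by-one quirks in the 1900 system.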

Pandas · Matthew Alhonte · August 13

Lazy Pandas and Dask

Picking Low-Hanging Fruit With Dask

Ah, laziness.  You love it, I love it, everyone agrees it's just better.

Flesh-and-blood pandas are famously lazy.  Pandas the package, however, uses Eager Evaluation.  What's Eager Evaluation, you ask?  Is Pandas really judgey, hanging out on the street corner and being fierce to the style choices of people walking by?  Well, yes, but that's not the most relevant sense in which I mean it here.  

Eager evaluation means that once you call pd.read_csv(), Pandas immediately jumps to read
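The eager-vs-lazy contrast can be sketched in plain Python with a generator, which, a bit like Dask building a task graph instead of reading data up front, defers all work until something actually asks for results (an analogy, not Dask's API):

```python
# Eager: the list comprehension computes every element immediately,
# the way pd.read_csv() reads the whole file the moment you call it.
eager = [x ** 2 for x in range(5)]

# Lazy: the generator does no work yet -- it just records what WOULD
# be computed, much like a Dask task graph.
lazy = (x ** 2 for x in range(5))

# The work happens only when results are actually pulled out:
result = list(lazy)
```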

Pandas · Matthew Alhonte · August 06

All That Is Solid Melts Into Graphs

Reshaping Pandas dataframes with a real-life example, and graphing it with Altair

The last few Code Snippet Corners were about using Pandas as an easy way to handle input and output between files & databases.  Let's shift gears a little bit!  Among other reasons, because earlier today I discovered a package that exclusively does that, which means I can stop importing the massive Pandas package when all I really wanted to do with it was take advantage of its I/O modules.  Check it out!

So, rather than the entrances & exits, let's

Python · Matthew Alhonte · July 30

Automagically Turn JSON into Pandas DataFrames

Let pandas do the heavy lifting for you when turning JSON into a DataFrame.

In his post about extracting data from APIs, Todd demonstrated a nice way to massage JSON into a pandas DataFrame. This method works great when our JSON response is flat, because dict.keys() only gets the keys on the first "level" of a dictionary. It gets a little trickier when our JSON starts to become nested though, as I experienced when working with Spotify's API via the Spotipy library. For example, take a look at a response from their https:
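For nested responses like that, `pandas.json_normalize` flattens inner dictionaries into dotted column names. A minimal sketch with a toy payload (the field names are made up for illustration, not the real Spotify schema):

```python
import pandas as pd

# A toy nested response, loosely shaped like an API payload
response = {
    "tracks": [
        {"name": "Song A", "album": {"name": "Album X", "year": 2018}},
        {"name": "Song B", "album": {"name": "Album Y", "year": 2017}},
    ]
}

# Nested keys become dotted columns like "album.name"
df = pd.json_normalize(response["tracks"])
print(df.columns.tolist())
```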

Python · Graham Beckley · July 28

Trash Pandas: Messy, Convenient DB Operations via Pandas

(And a way to clean it up with SQLAlchemy)

Let's say you were continuing our task from last week: Taking a bunch of inconsistent Excel files and CSVs, and putting them into a database.

Let's say you've been given a new CSV that conflicts with some rows you've already entered, and you're told that these rows are the correct values.

Why won't Pandas' built-in method be good enough?

Pandas' built-in to_sql DataFrame method won't be useful here.  Remember, it writes as a block - if you set the
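The block-write behavior can be demonstrated with an in-memory SQLite table (a minimal sketch; the table and column names are made up): `if_exists="replace"` doesn't update conflicting rows, it drops the whole table and rewrites it from the new frame.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")

# Three rows already in the database
original = pd.DataFrame({"id": [1, 2, 3], "value": ["a", "b", "c"]})
original.to_sql("stuff", conn, index=False)

# A "corrected" row arrives; replace drops the ENTIRE table and
# rewrites it from this one-row frame, losing the other rows.
fix = pd.DataFrame({"id": [2], "value": ["B"]})
fix.to_sql("stuff", conn, if_exists="replace", index=False)

rows = pd.read_sql("SELECT * FROM stuff", conn)
print(len(rows))  # only the single corrected row survives
```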

Pandas · Matthew Alhonte · July 23

A Dirty Way of Cleaning Data (ft. Pandas & SQL)

Code Snippet Corner ft. Pandas & SQL

Warning: The following is FANTASTICALLY not-secure.  Do not put this in a script that's going to be running unsupervised.  This is for interactive sessions where you're prototyping the data cleaning methods that you're going to use, and/or just manually entering stuff.  Especially if there's any chance there could be something malicious hiding in the data to be uploaded.  We're going to be executing formatted strings of unsanitized SQL.  Also, this will lead to LOTS of silent failures, which
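To make the warning concrete, here is a minimal sketch (table and values are made up) of the risky pattern being described, next to the parameterized form you'd want for anything unattended:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")

# The risky pattern: interpolating values straight into SQL text.
# Workable in a supervised prototyping session, dangerous anywhere else.
name = "alice"
conn.execute(f"INSERT INTO users (name) VALUES ('{name}')")

# The safer equivalent: let the driver bind parameters for you.
conn.execute("INSERT INTO users (name) VALUES (?)", ("bob",))

count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```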

Pandas · Matthew Alhonte · July 16

Extracting Massive Datasets in Python

Abusing APIs for all they’re worth

Taxation without representation. Colonialism. Not letting people eat cake. Human beings rightfully meet atrocities with action in an effort to change the world for the better. Cruelty by mankind justifies revolution, and it is this writer's opinion that API limitations are one such cruelty.

The data we need and crave is stashed in readily available APIs all around us. It's as though we have the keys to the world, but that power often comes with a few caveats:

  • Your "key"
Python · Todd Birchard · July 04

Using Pandas to Make Dealing With DBs Less Of a Hassle

Code Snippet Corner

Manually opening and closing cursors? Iterating through DB output by hand? Remembering which function is the actual one that matches the Python data structure you're gonna be using?

There has to be a better way!

There totally is.

One of Pandas' most useful abilities is easy I/O. Whether it's a CSV, JSON, an Excel file, or a database - Pandas gets you what you want painlessly. In fact, I'd say that even if you don't have the spare bandwidth
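A minimal sketch of that I/O convenience against an in-memory SQLite database (table and data are made up): one `read_sql` call replaces opening a cursor, fetching, and iterating by hand.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pets (name TEXT, legs INTEGER)")
conn.executemany(
    "INSERT INTO pets VALUES (?, ?)",
    [("cat", 4), ("parrot", 2)],
)

# No cursor management, no manual row iteration: straight to a DataFrame
df = pd.read_sql("SELECT * FROM pets", conn)
print(df)
```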

Python · Matthew Alhonte · July 03

Using Pandas with AWS Lambda

Pandas n Lambdas mixtape dropping soon

In one corner we have Pandas: Python's beloved data analysis library. In the other, AWS: the unstoppable cloud provider we're obligated to use for all eternity. We should have known this day would come.

While not the prettiest workflow, uploading Python package dependencies for use in AWS Lambda is typically straightforward. We install the packages locally to a virtual env, package them with our app logic, and upload a neat ZIP to Lambda. In some cases, this doesn't work:

AWS · Todd Birchard · June 21

Dropping Rows Using Pandas

Clean your datasets the fun Pythonic way.

When cleaning datasets, one of the (many) things you'll want to do is rid yourself of filthy, filthy data. Regardless of the reason, every dataset has its fair share of empty, poorly formatted, or simply irrelevant data entries. In some cases it's best to simply do away with these rows and ensure that only the fittest survive.

Dropping Empty Rows or Columns

If you're simply looking to drop rows or columns containing empty data, you're in luck: Pandas' dropna() method
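A minimal sketch of `dropna()` on a made-up frame: by default it drops any row containing a missing value, and `axis=1` drops offending columns instead.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["ada", "grace", "katherine"],
    "score": [95.0, np.nan, 88.0],
})

# Default: drop every ROW that contains at least one missing value
clean = df.dropna()

# axis=1: drop every COLUMN that contains a missing value instead
no_gappy_cols = df.dropna(axis=1)
```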

Python · Todd Birchard · April 18