Hackers and Slackers: Data Science for Badasses

Data Science

25 Posts

Automagically Turn JSON into Pandas DataFrames

Let pandas do the heavy lifting for you when turning JSON into a DataFrame.

In his post about extracting data from APIs, Todd demonstrated a nice way to massage JSON into a pandas DataFrame. This method works great when our JSON response is flat, because dict.keys() only gets the keys on the first "level" of a dictionary. It gets a little trickier when our JSON starts to become nested though, as I experienced when working with Spotify's API via the Spotipy library. For example, take a look at a response from their https:
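To make the flat-versus-nested point concrete, here is a minimal sketch using only the standard library. The `response` payload and the `flatten` helper are hypothetical, made up for illustration; they are not Spotify's actual response shape or the method the post goes on to use.

```python
# Hypothetical nested payload, loosely shaped like a music-API response.
response = {
    "name": "Example Track",
    "album": {"name": "Example Album", "release_date": "2018-01-01"},
    "artists": [{"name": "Example Artist"}],
}

# dict.keys() only sees the first "level" of the dictionary.
top_level = list(response.keys())


def flatten(record, parent_key="", sep="."):
    """Recursively flatten nested dicts into one level with dotted keys."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Descend into nested dicts and merge their flattened keys.
            flat.update(flatten(value, full_key, sep))
        else:
            # Scalars and lists are kept as-is in this sketch.
            flat[full_key] = value
    return flat


flat_row = flatten(response)
```

Each flattened dict can then serve as one row of a DataFrame. pandas also ships a `json_normalize` function that does comparable flattening and returns a DataFrame directly, which is likely closer to letting pandas do the heavy lifting.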

Python · Graham Beckley · July 28

Trash Pandas: Messy, Convenient DB Operations via Pandas

(And a way to clean it up with SQLAlchemy)

Let's say you were continuing our task from last week: Taking a bunch of inconsistent Excel files and CSVs, and putting them into a database.

Let's say you've been given a new CSV that conflicts with some rows you've already entered, and you're told that these rows are the correct values.

Explain why Pandas' built-in method wouldn't be good

Pandas' built-in to_sql DataFrame method won't be useful here.  Remember, it writes as a block - if you set the
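The underlying problem is that `to_sql`'s `if_exists` option acts on the whole table (`'fail'`, `'replace'`, or `'append'`), not on individual rows. As a rough sketch of the row-level behavior wanted here, the following uses the standard library's `sqlite3` module rather than the SQLAlchemy approach the post's subtitle mentions. The `readings` table, its columns, and the sample rows are all hypothetical.

```python
import sqlite3

# In-memory database with a hypothetical table keyed on `id`.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, value TEXT)")
conn.executemany(
    "INSERT INTO readings (id, value) VALUES (?, ?)",
    [(1, "old"), (2, "keep")],
)

# Rows from the "new CSV": id 1 conflicts with an existing row, id 3 is new.
corrected_rows = [(1, "new"), (3, "added")]

# INSERT OR REPLACE overwrites rows whose primary key already exists
# and inserts the rest, leaving untouched rows (id 2) alone.
conn.executemany(
    "INSERT OR REPLACE INTO readings (id, value) VALUES (?, ?)",
    corrected_rows,
)
conn.commit()

rows = conn.execute("SELECT id, value FROM readings ORDER BY id").fetchall()
```

Note that `INSERT OR REPLACE` is SQLite syntax; other databases spell the same idea differently, so treat this as an illustration of the goal rather than portable SQL.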

Pandas · Matthew Alhonte · July 23

Data Could Save Humanity if it Weren't for Humanity

A compelling case for robot overlords.

A decade has passed since I stumbled into technical product development. Looking back, I've spent that time almost exclusively in the niche of data-driven products and engineering. While it seems obvious now, I realized in the 2000s that you could generally create two types of product: you could either build a (likely uninspired) UI for existing data, or you could build products which produced new data or interpreted existing data in a new useful way. Betting on the latter seemed

Data · Todd Birchard · July 20

Lynx Roundup, July 17th

Scaling a graph DB, presenting survey data, and presenting data badly

https://neo4j.com/blog/scale-out-neo4j-using-apache-mesos-and-dc-os/

https://www.r-bloggers.com/presenting-survey-data/

Quantifying stuff has a reputation for creating absurdities.  Some would say that other methods create an equal number of absurdities, except they're just way harder to see.  http://andrewgelman.com/2018/07/03/flaws-stupid-horrible-algorithm-revealed-made-numerical-predictions/

https://flowingdata.com/2018/06/28/why-people-make-bad-charts-and-what-to-do-when-it-happens/

https://github.com/solid/solid

Statistics · Matthew Alhonte · July 17