Tables. Cells. Two-dimensional data. We here at Hackers & Slackers know how to talk dirty, but there's one word we'll be missing from our vocabulary today: Pandas.Before the remaining audience closes their browser windows in fury, hear me out. We love Pandas; so much so that tend to recklessly gunsling this 30mb library to perform simple tasks. This isn't always a wise choice. I get it: you're here for data, not software engineering best practices. We all are, but in a landscape where engineers and scientists already produce polarizing code quality, we're all just a single bloated lambda function away from looking like idiots and taking a hit to our credibility. This is a silly predicament when there are plenty of built-in Python libraries at our disposable which work perfectly fine. Python’s built in CSV library can cover quite a bit of data manipulation use cases to achieve the same results of large scientific libraries just as easily.
Basic CSV Interaction
Regardless of whether you're reading or writing to CSVs, there are a couple lines of code which will stay mostly the same between the two.
Before accomplishing anything, we've stated some critical things in these two lines of code:
- All interactions with our CSV will only be valid as long as they live within the
with.open
block (comparable to managing database connections). - We'll be interacting with a file in our directory called
hackers.csv
, for which we only need read (orr
) permissions - We create a
reader
object, which is again comparable to managing databasecursors
if you're familiar. - We have the ability to set the delimiter of our CSV (a curious feature, considering the meaning of C in the acronym CSV.
Iterating Rows
An obvious use case you probably have in mind would be to loop through each row to see what sort of values we're dealing with. Your first inclination might be to do something like this:
That's fine and all, but row
This case returns a simple list - this is obviously a problem if you want to access the values of certain columns by column name, as opposed to numeric index (I bet you do). Well, we've got you covered:
Changing reader
to DictReader
outputs a dictionary per CSV row, as opposed to a simple list. Are things starting to feel a little Panda-like yet?
Bonus: Printing all Keys and Their Values
Let's get a little weird just for fun. Since our rows are dict objects now, we can print our entire CSV as a series of dicts like so:
Skipping Headers
As we read information from CSVs to be repurposed for, say, API calls, we probably don't want to iterate over the first row of our CSV: this will output our key values alone, which would be useless in this context. Consider this:
Whoa! A different approach.... but somehow just as simple? In this case, we leave out reader
altogether (which still works!) but more importantly, we introduce next()
. next(myCsvFile)
immediately skips to the next line in a CSV, so in our case, we simply skip line one before going into our For loop. Amazing.
Writing to CSVs
Writing to CSVs isn't much different than reading from them. In fact, almost all the same principles apply, where instances of the word "read" are more or less replaced with" write. Huh.
We're writing a brand new CSV here: 'hackers.csv' doesn't technically exist yet, but that doesn't stop Python from not giving a shit. Python knows what you mean. Python has your back.
Here, we set our headers as a fixed list set by the column
variable. This is a static way of creating headers, but the same can be done dynamically by passing the keys
of a dict, or whatever it is you like to do.
writer.writeheader()
knows what we're saying thanks to the aforementioned fieldnames
we passed to our writer earlier. Good for you, writer.
But how do we write rows, you might ask? Why, with writer.writerow()
, of course! Because we use DictWriter
similar to how we used DictReader
earlier, we can map values to our CSV with simple column references. Easy.