Tables. Cells. Two-dimensional data. We here at Hackers & Slackers know how to talk dirty, but there's one word we'll be missing from our vocabulary today: Pandas.Before the remaining audience closes their browser windows in fury, hear me out. We love Pandas; so much so that tend to recklessly gunsling this 30mb library to perform simple tasks. This isn't always a wise choice. I get it: you're here for data, not software engineering best practices. We all are, but in a landscape where engineers and scientists already produce polarizing code quality, we're all just a single bloated lambda function away from looking like  idiots and taking a hit to our credibility. This is a silly predicament when there are plenty of built-in Python libraries at our disposable which work perfectly fine. Python’s built in CSV library can cover quite a bit of data manipulation use cases to achieve the same results of large scientific libraries just as easily.

Basic CSV Interaction

Regardless of whether you're reading or writing to CSVs, there are a couple lines of code which will stay mostly the same between the two.

import csv

with open('hackers.csv', 'r', newline='') as myCsvFile:
     reader = csv.reader(myCsvFile, delimiter=',', quotechar='|')
read_csv.py

Before accomplishing anything, we've stated some critical things in these two lines of code:

  • All interactions with our CSV will only be valid as long as they live within the with.open block (comparable to managing database connections).
  • We'll be interacting with a file in our directory called hackers.csv, for which we only need read (or r) permissions
  • We create a reader object, which is again comparable to managing database cursors if you're familiar.
  • We have the ability to set the delimiter of our CSV (a curious feature, considering the meaning of C in the acronym CSV.

Iterating Rows

An obvious use case you probably have in mind would be to loop through each row to see what sort of values we're dealing with. Your first inclination might be to do something like this:

import csv


with open('hackers.csv', 'r', newline='') as myCsvFile:
    reader = csv.reader(myCsvFile, delimiter=',', quotechar='|')
	for row in reader.readlines():
		print('row = ', row)
read_csv.py

That's fine and all, but row This case returns a simple list - this is obviously a problem if you want to access the values of certain columns by column name, as opposed to numeric index (I bet you do). Well, we've got you covered:

import csv


with open('hackers.csv', 'r', newline='') as myCsvFile:
    reader = csv.DictReader(myCsvFile)
	for row in reader.readlines():
		print(row['column_name_1'], row['column_name_2'])
read_csv.py

Changing reader to DictReader outputs a dictionary per CSV row, as opposed to a simple list. Are things starting to feel a little Panda-like yet?

Bonus: Printing all Keys and Their Values

Let's get a little weird just for fun. Since our rows are dict objects now, we can print our entire CSV as a series of dicts like so:

import csv


with open('hackers.csv', 'r', newline='') as myCsvFile:
    reader = csv.DictReader(myCsvFile)
	for row in loc_reader:
            for (k, v) in row.items():
				print(k, ':', v)
read_csv.py

Skipping Headers

As we read information from CSVs to be repurposed for, say, API calls, we probably don't want to iterate over the first row of our CSV: this will output our key values alone, which would be useless in this context. Consider this:

import csv


with open('hackers.csv', 'r') as myCsvFile:
	next(myCsvFile)
	for row in myCsvFile.readlines():
		print(row)
read_csv.py

Whoa! A different approach.... but somehow just as simple? In this case, we leave out reader altogether (which still works!) but more importantly, we introduce next(). next(myCsvFile) immediately skips to the next line in a CSV, so in our case, we simply skip line one before going into our For loop. Amazing.

Writing to CSVs

Writing to CSVs isn't much different than reading from them. In fact, almost all the same principles apply, where instances of the word "read" are more or less replaced with" write. Huh.

import csv


with open('hackers.csv', 'w') as myCsvFile:
    columns = ['column_name_1', 'column_name_2']
    writer = csv.DictWriter(myCsvFile, fieldnames=columns)
    writer.writeheader()
    writer.writerow(
        {'column_name_1': 'Mark', 'column_name_2': 'Twain'}
    )
    writer.writerow(
        {'column_name_1': 'Foo', 'column_name_2: 'Bar'}
    )
write_csv.py

We're writing a brand new CSV here: 'hackers.csv' doesn't technically exist yet, but that doesn't stop Python from not giving a shit. Python knows what you mean. Python has your back.

Here, we set our headers as a fixed list set by the column variable. This is a static way of creating headers, but the same can be done dynamically by passing the keys of a dict, or whatever it is you like to do.

writer.writeheader() knows what we're saying thanks to the aforementioned fieldnames we passed to our writer earlier. Good for you, writer.

But how do we write rows, you might ask? Why, with writer.writerow(), of course! Because we use DictWriter similar to how we used DictReader earlier, we can map values to our CSV with simple column references. Easy.