Reading and Writing to CSVs in Python

Playing with tabular data the native Python way.

Reading and Writing to CSVs in Python
Python Todd Birchard Sep 27th 3 min read

    Tables. Cells. Two-dimensional data. We here at Hackers & Slackers know how to talk dirty, but there's one word we'll be missing from our vocabulary today: Pandas.Before the remaining audience closes their browser windows in fury, hear me out. We love Pandas; so much so that tend to recklessly gunsling this 30mb library to perform simple tasks. This isn't always a wise choice. I get it: you're here for data, not software engineering best practices. We all are, but in a landscape where engineers and scientists already produce polarizing code quality, we're all just a single bloated lambda function away from looking like  idiots and taking a hit to our credibility. This is a silly predicament when there are plenty of built-in Python libraries at our disposable which work perfectly fine. Python’s built in CSV library can cover quite a bit of data manipulation use cases to achieve the same results of large scientific libraries just as easily.

    Basic CSV Interaction

    Regardless of whether you're reading or writing to CSVs, there are a couple lines of code which will stay mostly the same between the two.

    # read_csv.py
    import csv
    
    with open('hackers.csv', 'r', newline='') as myCsvFile:
         reader = csv.reader(myCsvFile, delimiter=',', quotechar='|')

    Before accomplishing anything, we've stated some critical things in these two lines of code:

    Iterating Rows

    An obvious use case you probably have in mind would be to loop through each row to see what sort of values we're dealing with. Your first inclination might be to do something like this:

    # read_csv.py
    import csv
    
    with open('hackers.csv', 'r', newline='') as myCsvFile:
        reader = csv.reader(myCsvFile, delimiter=',', quotechar='|')
    	for row in reader.readlines():
    		print('row = ', row)

    That's fine and all, but row in this case returns a simple list - this is obviously problem if you want to access the values of certain columns by column name, as opposed to numeric index (I bet you do). Well, we've got you covered:

    # read_csv.py
    import csv
    
    with open('hackers.csv', 'r', newline='') as myCsvFile:
        reader = csv.DictReader(myCsvFile)
    	for row in reader.readlines():
    		print(row['column_name_1'], row['column_name_2'])

    Changing reader to DictReader outputs a dictionary per CSV row, as opposed to a simple list. Are things starting to feel a little Panda-like yet?

    Bonus: Printing all Keys and Their Values

    Let's get a little weird just for fun. Since our rows are dict objects now, we can print our entire CSV as a series of dicts like so:

    # read_csv.py
    import csv
    
    with open('hackers.csv', 'r', newline='') as myCsvFile:
        reader = csv.DictReader(myCsvFile)
    	for row in loc_reader:
                for (k, v) in row.items():
    				print(k, ':', v)

    Skipping Headers

    As we read information from CSVs to be repurposed for, say, API calls, we probably don't want to iterate over the first row of our CSV: this will output our key values alone, which would be useless in this context. Consider this:

    # read_csv.py
    import csv
    
    with open('hackers.csv', 'r') as myCsvFile:
    	next(myCsvFile)
    	for row in myCsvFile.readlines():
    		print(row)

    Whoa! A different approach.... but somehow just as simple? In this case, we leave out reader altogether (which still works!) but more importantly, we introduce next(). next(myCsvFile) immediately skips to the next line in a CSV, so in our case, we simply skip line one before going into our For loop. Amazing.

    Writing to CSVs

    Writing to CSVs isn't much different than reading from them. In fact, almost all the same principles apply, where instances of the word "read" are more or less replaced with" write. Huh.

    # write_csv.py
    import csv
    
    with open('hackers.csv', 'w') as myCsvFile:
        columns = ['column_name_1', 'column_name_2']
        writer = csv.DictWriter(myCsvFile, fieldnames=columns)
    
        writer.writeheader()
        writer.writerow({'column_name_1': 'Mark', 'column_name_2': 'Twain'})
        writer.writerow({'column_name_1': 'Foo', 'column_name_2: 'Bar'})

    We're writing a brand new CSV here: 'hackers.csv' doesn't technically exist yet, but that doesn't stop Python from not giving a shit. Python knows what you mean. Python has your back.

    Here, we set our headers as a fixed list set by the column variable. This is a static way of creating headers, but the same can be done dynamically by passing the keys of a dict, or whatever it is you like to do.

    writer.writeheader() knows what we're saying thanks to the aforementioned fieldnames we passed to our writer earlier. Good for you, writer.

    But how do we write rows, you might ask? Why, with writer.writerow(), of course! Because we use DictWriter similarly to how we used DictReader earlier, we can map values to our CSV with simple column references. Easy.

    Todd Birchard's' avatar
    New York City Website
    Product manager turned engineer with an ongoing identity crisis. Breaks everything before learning best practices. Completely normal and emotionally stable.

    Product manager turned engineer with an ongoing identity crisis. Breaks everything before learning best practices. Completely normal and emotionally stable.