Working with XML tree data in Python

Make use of Python's native XML library to walk through and extract data

Life is filled with things we don't want to do; you're a developer so you probably understand this to a higher degree than most people. Sometimes we waste weeks of our lives thanks to an unreasonable and unknowledgeable stakeholder. Other times, we need to deal with XML trees.

At some point or another you're going to need to work with an API that returns information in XML format. "Sure," we might think, "I'll just import the standard Python XML package, pick up some syntax nuances, and be on my way." That's what I thought too. Today we're going to look at said library, the XML ElementTree library, and see firsthand why this might not be the case.

As always, the purpose of this is to hopefully save somebody pain. Feel free to stash this in your back pocket until XML becomes a problem for you; I'm doing the same.

It Can't be That Bad

Let's tackle a few things upfront to save a couple hours of confusion.

First off, you know how you've been using dot notation to traverse object trees? Yeah, we can't do that with XML. If we're looking for the child of a parent, parent.child simply does not work (no, parent['child'] doesn't work either). I hope you like looping through trees.

Printing an item in an XML tree doesn't show you that item's value, nor does it show the children of that item. Instead it prints <Element '{http://www.example.com/servicemodel/resources}ItemName' at 0x7fadcf3f83b8>, which is the Python equivalent of JavaScript's [object Object] in terms of usefulness. We can use .text to see the text value instead of an XML element. Good luck on the other thing, though.
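For what it's worth, the "other thing" is less hopeless than it sounds: ET.tostring appears to be the standard way to dump an element along with its children. Here's a minimal sketch; the tiny <item> document is made up purely for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-element document, made up for this illustration
item = ET.fromstring('<item><name>Widget</name></item>')

# Printing the element gives the unhelpful <Element 'item' at 0x...> form
print(item)

# .text gives the text between the opening and closing tags
name_text = item.find('name').text
print(name_text)  # Widget

# tostring() serializes the element, children included, back to a string
serialized = ET.tostring(item, encoding='unicode')
print(serialized)  # <item><name>Widget</name></item>
```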

Going Green

Let's get this over with and plant some XML trees. I'm going to assume we're working from an API response here.

import xml.etree.ElementTree as ET

e = ET.fromstring(response.content)

If we were reading an XML file, we'd have to read the file and explicitly search for the root. Even though this doesn’t pertain to us, we should still be aware of this inconsistency to avoid future confusion:

e = ET.parse('data.xml')
root = e.getroot()
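To see the inconsistency side by side, here's a small sketch. The XML string is made-up sample data, and io.StringIO stands in for a real file like data.xml so the snippet runs on its own.

```python
import io
import xml.etree.ElementTree as ET

xml_text = '<beers><beer name="PBR" /></beers>'  # made-up sample data

# fromstring() hands back the root Element directly
root_from_string = ET.fromstring(xml_text)

# parse() hands back an ElementTree wrapper, so the root takes one more call.
# io.StringIO stands in for a real file such as 'data.xml'.
tree = ET.parse(io.StringIO(xml_text))
root_from_file = tree.getroot()

print(type(root_from_string).__name__)  # Element
print(type(tree).__name__)              # ElementTree
print(root_from_file.tag)               # beers
```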

When we loop through this tree, we'll need to be mindful of the three ways we can interact with XML data. Let's use this tree as an example:

<beer name="Bud Light">
    <flavor>Water</flavor>
    <type>Frat</type>
    <rank>0</rank>
</beer>
<beer name="PBR">
    <flavor>Urine</flavor>
    <type>Ironic</type>
    <rank>1</rank>
</beer>
<beer name="IPA">
    <flavor>Pretentious</flavor>
    <type>Hipster</type>
    <rank>2</rank>
</beer>

  • item.tag returns the name of the tag. Running this on the first item would return beer, prefixed with a namespace URI in curly braces if the document declares one.
  • item.attrib returns the attributes of the selected item as a dictionary ({'name': 'Bud Light'}). Note that it's a property, not a method.
  • item.text returns the text between the opening and closing tags, if there is any.
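Here are those three in action, as a rough sketch. The first element is one beer from the sample above; the second is a copy with a default namespace bolted on (the URI is only an example) to show where that "associated URI" comes from.

```python
import xml.etree.ElementTree as ET

# One beer from the sample above
plain = ET.fromstring('<beer name="Bud Light"><flavor>Water</flavor></beer>')

# The same beer with an example default namespace declared
namespaced = ET.fromstring(
    '<beer xmlns="http://www.example.com/beermodel/resources" name="Bud Light" />'
)

print(plain.tag)                  # beer
print(plain.attrib)               # {'name': 'Bud Light'}
print(plain.find('flavor').text)  # Water

# With a namespace, the URI is glued onto the front of the tag name
print(namespaced.tag)  # {http://www.example.com/beermodel/resources}beer
```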

Finding Stuff

There are a few ways to find the data we need in an XML tree, the most obvious of which is searching by index. item[0][1] works, although I have a feeling index-based searching isn't going to be that useful for you.
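For completeness, this is roughly what index-based access looks like. The sketch assumes the sample beers are wrapped in a <beers> root element (my addition, since a document needs a single root to parse) and trims the list to two beers.

```python
import xml.etree.ElementTree as ET

# Two beers from the sample, wrapped in an assumed <beers> root so it parses
e = ET.fromstring("""
<beers>
    <beer name="Bud Light"><flavor>Water</flavor><type>Frat</type><rank>0</rank></beer>
    <beer name="PBR"><flavor>Urine</flavor><type>Ironic</type><rank>1</rank></beer>
</beers>
""")

# Elements behave like lists of their children
first_type = e[0][1]  # first beer, second child
print(first_type.tag, first_type.text)  # type Frat

print(len(e))              # 2
print(e[-1].get('name'))   # PBR
```

It works, but it breaks the moment the API reorders or adds a field, which is why searching by name tends to hold up better.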

The .find and .findall Methods

Our library has built-in .find and .findall methods for working through a tree (they return either the first match or all matches, as you might have guessed). Both only look at direct children of the element they're called on. We search by element name as part of a loop:

for beer in e.findall('beer'):
    name = beer.get('name')  # same as beer.attrib['name'] here
    flavor = beer.find('flavor').text
    print(name, "is", flavor)

Bud Light is Water
PBR is Urine
IPA is Pretentious
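One gotcha with this pattern: .find returns None when the tag is missing, so chaining .text onto it blows up with an AttributeError. A possible way around it is .findtext with a default. This sketch assumes a <beers> root and invents a "Mystery Keg" beer with no flavor to trigger the case.

```python
import xml.etree.ElementTree as ET

# Assumed <beers> root; "Mystery Keg" is invented and has no <flavor>
e = ET.fromstring("""
<beers>
    <beer name="PBR"><flavor>Urine</flavor></beer>
    <beer name="Mystery Keg"></beer>
</beers>
""")

flavors = {}
for beer in e.findall('beer'):
    # findtext() returns the default instead of failing when the tag is absent
    flavors[beer.get('name')] = beer.findtext('flavor', default='unknown')

print(flavors)  # {'PBR': 'Urine', 'Mystery Keg': 'unknown'}

# find() on the flavorless beer returns None rather than an element
missing = e.findall('beer')[1].find('flavor')
print(missing)  # None
```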

The .iter() Method

We can loop through all occurrences of an element name, at any depth in the tree, by using .iter().

for beertype in e.iter('type'):
    print(beertype.text)

Frat
Ironic
Hipster
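The difference from .findall is depth: .findall with a bare tag name only checks direct children, while .iter walks the whole subtree. A quick sketch, again assuming a <beers> root around a trimmed copy of the sample:

```python
import xml.etree.ElementTree as ET

# Trimmed sample in an assumed <beers> root
e = ET.fromstring("""
<beers>
    <beer name="Bud Light"><type>Frat</type></beer>
    <beer name="PBR"><type>Ironic</type></beer>
</beers>
""")

# <type> is a grandchild of the root, so findall() on the root misses it
direct = e.findall('type')
print(direct)  # []

# iter() descends through every level and finds both
nested = [t.text for t in e.iter('type')]
print(nested)  # ['Frat', 'Ironic']
```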

Using Some Sort of God-Awful Loop

If you're like me you may just skip reading all the documentation altogether, get obscenely frustrated, and create some garbage like this:

for beer in e:
    for prop in beer:
        # With a namespaced document the tag would look like
        # "{http://www.example.com/beermodel/resources}type" instead
        if prop.tag == "type":
            print(prop.text)

Frat
Ironic
Hipster

Now that's pretty awful, but I can't tell you how to live your life. You do you.
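If the curly-brace URI is the part that hurts, the search methods accept a prefix-to-URI mapping as a second argument, which might spare you the loop. This is a sketch under assumptions: the response below is made up, and the prefix b is an arbitrary name I picked.

```python
import xml.etree.ElementTree as ET

# Made-up namespaced response using the URI from the loop above
e = ET.fromstring("""
<beers xmlns="http://www.example.com/beermodel/resources">
    <beer name="Bud Light"><type>Frat</type></beer>
    <beer name="PBR"><type>Ironic</type></beer>
</beers>
""")

# The prefix 'b' is arbitrary; only the URI has to match the document
ns = {'b': 'http://www.example.com/beermodel/resources'}

# Path syntax lets us reach through <beer> to <type> in one call
types = [t.text for t in e.findall('b:beer/b:type', ns)]
print(types)  # ['Frat', 'Ironic']
```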

In Conclusion

Look, XML just sucks: don't use it if you don't have to. If you do, save yourself some time by coming back to this page.

New York City Website
Product manager turned engineer with an ongoing identity crisis. Breaks everything before learning best practices. Completely normal and emotionally stable.
