The first step in understanding any piece of software is taking a glance at its file structure. That may seem like an outlandish and redundant statement to make to a generation who grew up on GUIs. GitHub is essentially no more than a GUI for Git, so it’s unsurprising that one of the largest companies to follow a similar business model recently bought GitHub for billions.
All that said, a question remains: how do we begin to understand closed-source applications? If we can’t see the structure behind an app, I suppose we’ll have to build this model ourselves.
Treelib is a Python library that allows you to create a visual tree hierarchy: a simple plaintext representation of parent-child relationships.
Aside from scraping and mapping the intellectual property of others, Treelib comes in handy in situations where we have access to flat information (like a database table) whose rows actually relate to one another (such as a monolithic content-heavy site).
Treelib prints results like this:
Harry
├── Bill
│   └── n1
│       ├── n2
│       └── n3
└── Jane
    ├── Diane
    │   └── Mary
    └── Mark
It’s a simple library, and only requires knowledge of a few lines of code in order to be used effectively. What’s more, we’re not simply spitting out flat, useless data, but rather storing these node relationships in memory. If needed, the trees we build can be modified or used for other purposes in the future.
Where da Treez At?
Install the Treelib package:
pip install treelib
In your project, import Treelib:
# tree.py
from treelib import Node, Tree
Create a Tree with Parent Node
The first step in utilizing Treelib is to create a tree object. We need to give our tree a name - this is essentially creating the top-level node that all other nodes will stem from.
In create_node(x, y), x is the value which will be displayed in the node (its tag), while y is the unique identifier for that node. Children will be added to this parent node by referencing the unique identifier.
Note that in trees created with Treelib, each identifier may only occur once. Therefore it is good to follow a sort of GUID system for identifying nodes.
# tree.py
# Create tree object
tree = Tree()
# Create the base node
tree.create_node("Confluence", "confluence")
Create Child Nodes
The last necessary step in creating a tree is, of course, adding the children.
We will once again use create_node to add additional nodes, but these nodes will be associated with parents via parent="x". This will locate an existing node in the tree by ID, and will attach the new node to that parent. This is why IDs must be unique per node in the tree.
# tree.py
# spaceName is the label to display; id must be unique across the tree
tree.create_node(spaceName, id, parent="confluence")
View the Tree
Finally, you'll want to view the fruits of your labor:
Way to go, Johnny Appleseed, that’s pretty much the gist of it. There are additional features in the way trees can be parsed, and the way that nodes store additional data.
Check the official documentation for a full list of features.
If all you care about is printing the file structure of the current directory, with zero interest in working with the actual data, you're in luck (at least on Mac, hell if I know anything about Windows).
Homebrew on macOS has a package named tree which does just what we want:
~/$ brew install tree
Go ahead and explore the various features of tree, such as writing to files or even doing so on a schedule. For now, here's some basic usage:
tree -v -L 1 --charset utf-8