So far in our SQLAlchemy journey, we've covered managing database connections and model creation. So... how do we actually extract the data we want from our database?

SQLAlchemy's ORM query API simplifies the way we write database queries. Instead of writing raw SQL queries, we can construct queries on our SQLAlchemy session by chaining together methods to retrieve data. We're going to dive into SQLAlchemy's extensive query API to get an idea of all the ways we can query our data.

This tutorial will assume you know how to create an SQLAlchemy session. We'll also assume you have some data models you'd like to work with to perform your queries. If you'd like to catch up, check out the previous posts in this series.

Basic Query Syntax

Let's quickly become familiar with the basic structure of SQLAlchemy's query API. We always query on an SQLAlchemy session and pass the name of the model we'd like to query:

records = session.query([MODEL_NAME]).\
			[RESULTS_TYPE]()

The last part of our query determines how many rows to return, and the nature of how those rows are determined:

  • all() will return all records which match our query.
  • first() returns the first record in order of appearance.
  • one() returns a single value (not necessarily the first).
  • scalar() returns a single value if one exists, None if no values exist, or raises an exception if multiple records are returned.
  • get([VALUE(S)]) searches against a model's primary key to return rows where the primary key is equal to the value provided. get() also accepts tuples in the event that multiple foreign keys should be searched. Lastly, get() can also accept a dictionary and will return rows where the columns (dictionary keys) match the values provided.

To create more complex queries, we'd add to our query by chaining methods on our original query:

records = session.query([MODEL_NAME]).\
	    [METHOD_1].\
            [METHOD_2].\
            [RESULTS_TYPE]()

Viewing Returned Rows

If multiple records are returned by a query, you'll have to loop through them to see the results. This isn't necessary when a single record is returned with first():

records = session.query(Customer).all()
for record in records:
    pp.pprint(record)

The SQLAlchemy ORM will return an instance of a class by default, which means the above will result in the following output:

<Customer model 1>
<Customer model 2>
<Customer model 3>
<Customer model 4>
<Customer model 5>
<Customer model 6>
<Customer model 7>
<Customer model 8>
<Customer model 9>
<Customer model 10>

If you're looking to get dictionaries instead, use the built-in __dict__ method:

records = session.query(Customer).all()
for record in records:
    pp.pprint(record.__dict__)

This instead returns dictionary objects for each row:

{   '_sa_instance_state': <sqlalchemy.orm.state.InstanceState object at 0x7f98c8221748>,
    'email': 'kpaladini5i@senate.gov',
    'first_name': 'Kenna',
    'id': 199,
    'join_date': datetime.datetime(2019, 4, 19, 0, 0),
    'last_name': 'Paladini',
    'preferred_language': 'Bulgarian'}
{   '_sa_instance_state': <sqlalchemy.orm.state.InstanceState object at 0x7f9918192d68>,
    'email': 'rlebrun5j@narod.ru',
    'first_name': 'Rriocard',
    'id': 200,
    'join_date': datetime.datetime(2015, 6, 8, 0, 0),
    'last_name': 'Le Brun',
    'preferred_language': 'Khmer'},
...

Of course, you could also create your own object instead to receive only the columns you want/need:

records = session.query(Customer).all()
for record in records:
    recordObject = {'name': record.name,
                    'position': record.position,
                    'team_name': record.team.name,
                    'team_city': record.team.city}
    print(recordObject)

This outputs something a bit cleaner:

{   'email': 'kpaladini5i@senate.gov',
    'first_name': 'Kenna',
    'join_date': datetime.datetime(2019, 4, 19, 0, 0),
    'last_name': 'Paladini',
    'preferred_language': 'Bulgarian'}
{   'email': 'rlebrun5j@narod.ru',
    'first_name': 'Rriocard',
    'join_date': datetime.datetime(2015, 6, 8, 0, 0),
    'last_name': 'Le Brun',
    'preferred_language': 'Khmer'},
...

Filtering Results

Probably the most common method you'll use on a query is the filter() method. filter() is the equivalent of a SQL WHERE clause to return only rows which match the criteria we want:

records = session.query(Customer).filter(Customer.first_name == 'Carl').all()

filter_by()

We could actually write the above query using the filter_by() method instead like so:

records = session.query(Customer).filter_by(first_name='Carl').all()

Unlike filter(), filter_by() accepts keyword arguments (note the difference in syntax here: filter() checks a conditional against a column object whereas filter_by() finds columns which match the arguments we pass). filter_by() can only search for exact values and serves as a kind of shorthand for simple filtering queries.

like()

We can do more than filter on simple conditionals. SQLAlchemy has a like() method which works in an equivalent manner to SQL's LIKE:

records = session.query(Customer).filter(Customer.first_name.like('J%')).all()

As expected, this will give us all rows where the customer's first name starts with a J:

{   'email': 'jpugsley9@netvibes.com',
    'first_name': 'Jarid',
    'join_date': datetime.datetime(2017, 10, 11, 0, 0),
    'last_name': 'Pugsley',
    'preferred_language': 'Burmese'}
{   'email': 'jdymockek@is.gd',
    'first_name': 'Jeanna',
    'join_date': datetime.datetime(2017, 11, 13, 0, 0),
    'last_name': 'Dymocke',
    'preferred_language': 'Malayalam'}
...

High-level Query Methods

In addition to filter(), there are a few basic methods we should absolutely be familiar with. Each of these corresponds to SQL keywords you're probably familiar with:

  • limit([INTEGER]): Limits the number of rows to a maximum of the number provided.
  • order_by([COLUMN]): Sorts results by the provided column.  
  • offset([INTEGER]): Begins the query at row n.

Performing Joins & Unions

We've touched on JOINs a bit previously, but we're about to kick it up a notch. We have two data models we're working with: one for customers, and one for orders. Each customer

records = session.query(Customer).join(Order, Order.customer_id == Customer.id).all()
for record in records:
    recordObject = {'first_name': record.first_name,
                    'last_name': record.last_name,
                    'email': record.email,
                    'preferred_language': record.preferred_language,
                    'join_date': record.join_date,
                    'orders': []}
    for order in record.order:
        order = {'order_price': order.price,
                 'currency':  order.currency,
                 'purchase_date':  order.purchase_date,
                 'product':  order.product}
        recordObject['orders'].append(order)
        pp.pprint(recordObject)

We perform our JOIN using the join() method. The first parameter we pass is the data model we'll be joining with on the "right." We then specify what we'll be joining "on": the customer_id column of our order model, and the id column of our customer model.

Our outer loop gives us each customer, and our inner loop adds each individual order to the appropriate customer. Check out an example record:

{   'email': 'jtinline16@arizona.edu',
    'first_name': 'Jerry',
    'join_date': datetime.datetime(2016, 10, 27, 0, 0),
    'last_name': 'Tinline',
    'preferred_language': 'Icelandic',
    'orders': [{'currency': 'IDR',
                'order_price': 34.24,
                'product': 'Beer - Corona',
                'purchase_date': datetime.datetime(2019, 5, 5, 0, 0)},
               {'currency': 'GEL',
                'order_price': 25.75,
                'product': 'Creamers - 10%',
                'purchase_date': datetime.datetime(2019, 1, 27, 0, 0)}]}

Our friend Jerry here has two orders: one for some Coronas, and another for creamers. Get at it, Jerry.

Outer JOINs

In addition to simple JOINs, we can perform outer JOINs using the same syntax:

records = session.query([MODEL_1]).outerjoin([MODEL_2]).all()

Unions

We can perform UNIONs and UNION ALLs as well:

unionRecords = [MODEL_1].union([MODEL_2])

To perform a union all, simply replace union() with union_all()!

Aggregate Functions and Stats

As with all SQL-like query languages, we can perform some aggregate stats as well. The following are available to us:

  • count([COLUMN]): Counts the number of records in a column.
  • count(distinct([COLUMN])): Counts the distinct number of records in a column.
  • sum([COLUMN]): Adds the numerical values in a column.

Here's how we'd perform a query which counts the values in a column:

from sqlalchemy import func


records = session.query(func.count(Customer.first_name)).all()
for record in records:
    print(record)

Which outputs:

(200,)

This query can easily be modified to only count distinct values:

from sqlalchemy import func
from sqlalchemy import distinct


records = session.query(func.count(distinct(Customer.first_name))).all()
for record in records:
    print(record)

Using Group_by()

Of course, we can use the group_by() method on queries based around aggregates as well. group_by() works similarly to what we'd expect from SQL and Pandas:

records = session.query(func.count(Customer.first_name)).group_by(Customer.first_name).all()

Mutations

We've spent an awful lot of time going over how to extract data from our database, but haven't talked about modifying our data yet! The last item on our agenda today is taking a look at how to add, remove, and change records using the SQLAlchemy ORM.

Inserting Rows

The first way we can add data is by using the add() method. add() expects an instance of a class (data model specifically) to be passed, and will create a new database row as a result:

newModel = Customer(first_name='Todd',
                        last_name='Birchard',
                        email='fake@example.com',
                        preferred_language='English',
                        join_date=datetime.now())
session.add(newModel)
session.commit()

An alternative way to add data is by using the insert() method. Unlike add(), insert() is actually called on an SQLAlchemy Table object and doesn't rely on receiving a data model. insert() is not part of the ORM:

ins = [TABLE].insert().values(first_name='Todd',
                              last_name='Jack Jones',
                              email='fake@example.com',
                              preferred_language='English',
                              join_date=datetime.now())

Updating

Building on the syntax of insert(), we can drop in the update() method to change the values of an existing record. We chain in the where() method to specify which rows should be updated:

update = [TABLE].update().\
                where([TABLE].c.name == 'Todd').\
                values(email='newemail@example.com')

Deleting

On any query we execute, we can append the delete() method to delete all rows which are contained in that query (be careful!). The below deletes all records where the first_name column contains a value of "Carl":

session.query(Customer).filter(Customer.first_name == 'Carl').delete()

delete() accepts the synchronize_session parameter which determines how deletions should be handled:

  • False won't perform the delete until the session is committed.
  • 'fetch' selects all rows to be deleted and removes matched rows.
  • 'evaluate' will evaluate the objects in the current session to determine which rows should be removed.

Never Stop Exploring

There's a lot we've left out for the sake of simplicity. There are plenty of cool methods left to explore, like the correlate() method for instance. You're armed with enough to be dangerous in SQLAlchemy now, but I encourage anybody to look over the query documentation and find the cool things we didn't speak to in detail here.