Connecting Pandas to a Database with SQLAlchemy

Easily drop data into Pandas from a SQL database, or upload your DataFrames to a SQL table.

    Databases. You love them, you need them, but let's face it... you've already mastered working with them. There's only so much fun to be had in the business of opening database connections, pulling rows, and putting them back where they came from. Wouldn't it be great if we could skip the boring stuff and work with data?

    Pandas and SQLAlchemy are a match made in Python heaven. They're individually amongst Python's most frequently used libraries. Together they're greater than the sum of their parts, thanks to Pandas' built-in SQLAlchemy integration.

    Create a SQLAlchemy Connection

    As you might imagine, the first two libraries we need to install are Pandas and SQLAlchemy. We need to install a database connector as our third and final library, but the library you need depends on the type of database you'll be connecting to. If you're connecting to MySQL I recommend installing PyMySQL ( pip install pymysql ). If you're connecting to Postgres, go with Psycopg2 ( pip install psycopg2 ). The only time we'll ever interact with this connector directly is when we establish a database connection.

    SQLAlchemy URIs

    A URI (or connection string), is simply a string containing the information needed to connect to something like a database. Here's an example:

    postgresql+psycopg2://myuser:mypassword@myhost:5432/mydatabase

    The first part of our string is postgresql+psycopg2, which is a combination of our target database type and our connector. If you're connecting to MySQL, replace this with mysql+pymysql. The rest of the URI should be self-explanatory.
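
    For comparison, a complete MySQL URI follows the same pattern (the credentials, host, and port below are placeholders you'd swap for your own):

    mysql+pymysql://myuser:mypassword@myhost:3306/mydatabase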

    SQLAlchemy Engines

    An "engine" is an object used to connect to databases using the information in our URI. Once we create an engine, downloading and uploading data is as simple as passing this object to Pandas:

    from os import environ
    from sqlalchemy import create_engine

    db_URI = environ.get('SQLALCHEMY_DATABASE_URI')
    engine = create_engine(db_URI)

    Besides accepting a URI, create_engine() can accept a few optional kwargs as well. If you'd like to see the output of all database transactions taking place, pass echo=True. If your database requires SSL, pass connect_args={'sslmode': 'require'}.
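
    As a quick illustration, here's roughly what that looks like with both options in play (a sketch, assuming a Postgres database that requires SSL):

    # Echo all SQL statements and require SSL on the connection
    engine = create_engine(db_URI,
                           echo=True,
                           connect_args={'sslmode': 'require'})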

    Believe it or not, we're already done dealing with databases! From here forward we're able to pull or upload data into Pandas via easy one-liners.

    Create a SQL Table From a DataFrame

    For our first trick, let's upload data from a CSV to our SQL database. I'm going to pull a CSV containing NYC job data and clean the column names up a bit to be database-friendly:

    import pandas as pd

    # Load the raw CSV and make the column names database-friendly
    jobs_DF = pd.read_csv('data/nyc-jobs.csv')
    new_columns = [column.replace(' ', '_').lower() for column in jobs_DF.columns]
    jobs_DF.columns = new_columns

    We now have a DataFrame ready to be saved as a table! We'll accomplish this with a single method called to_sql which we can call on any DataFrame:

    from sqlalchemy.types import Integer, Text, String, DateTime
    
    ...
    
    jobs_DF.to_sql("nyc_jobs",
                   engine,
                   if_exists='replace',
                   schema='public',
                   index=False,
                   chunksize=500,
                   dtype={"job_id": Integer,
                          "agency": Text,
                          "business_title": Text,
                          "job_category":  Text,
                          "salary_range_from": Integer,
                          "salary_range_to": Integer,
                          "salary_frequency": String(50),
                          "work_location": Text,
                          "division/work_unit": Text,
                          "job_description": Text,
                          "posting_date": DateTime,
                          "posting_updated": DateTime})

    There's quite a bit happening here! to_sql() attempts to create a table with the name nyc_jobs in the database associated with engine. These two positional arguments are technically the only required parameters we need to pass, but it's a very good idea to take advantage of Pandas' ability to be more specific in table creation:

    • if_exists: This argument specifies what to do when a table named nyc_jobs already exists in the database. By default, Pandas will throw an error, which isn't very useful unless we only care about creating this table the first time. Passing replace to this argument will drop the existing table and replace it with the data & data types associated with the current DataFrame. append will keep the existing table the same, but append all rows in the DataFrame to the existing table (see the short sketch after this list).
    • schema: Accepts the name of the Postgres schema to save your table in.
    • index: When True, the resulting table will honor your DataFrame's index to create a column with the appropriate key in your database.
    • chunksize: Passing a number to this parameter will stream data to your database in "chunks" of n rows at a time, as opposed to all at once. Passing a chunksize is useful for particularly large datasets which may be at risk of interruption during upload.
    • dtype: Passing a dictionary to dtype gives us a chance to explicitly set the datatypes of each column in our database, which I highly recommend doing. You'll notice we import various data types from sqlalchemy.types, which we then associate to columns in a key/value pair with each column's name. This ensures that data lands in our database as the appropriate type. If data in your DataFrame doesn't match the constraint of a column's data type, you'll receive a SQL error during the upload.
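
    To illustrate the append behavior mentioned above, here's a minimal sketch; new_jobs_DF is a hypothetical DataFrame with the same columns as our original upload:

    # Add new rows to the existing nyc_jobs table without dropping it
    new_jobs_DF.to_sql("nyc_jobs",
                       engine,
                       if_exists='append',
                       schema='public',
                       index=False)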

    Create DataFrame from SQL Table

    Loading data from a database into a Pandas DataFrame is surprisingly easy. To load an entire table, use the read_sql_table() method:

    sql_DF = pd.read_sql_table("nyc_jobs",
                               con=engine)

    The first two parameters we pass are the same as last time: first is our table name, and then our SQLAlchemy engine. The above snippet is perhaps the quickest and simplest way to translate a SQL table into a Pandas DataFrame, with essentially no configuration needed!

    When working with data, we should try to be explicit about what we expect wherever possible. Let's expand on the above:

    sql_DF = pd.read_sql_table("nyc_jobs",
                               con=engine,
                               schema='public',
                               index_col='job_id',
                               coerce_float=True,
                               columns=['job_id', 'business_title', 'job_category', 'posting_date', 'posting_updated'],
                               parse_dates=['posting_date', 'posting_updated'],
                               chunksize=500)

    Some arguments should look familiar from when we ran to_sql() earlier. schema has the same meaning as it did previously, and chunksize again controls how many rows are handled at a time; note that when chunksize is passed to read_sql_table(), the call returns an iterator of DataFrames (one per chunk) rather than a single DataFrame (see the short sketch after the list below). We have a few new arguments as well:

    • index_col: We can select any column of our SQL table to become an index in our Pandas DataFrame, regardless of whether or not the column is an index in SQL. We can pass the name of a single column as a string, or a list of strings representing the names of multiple columns.
    • coerce_float: When set to True, Pandas will attempt to convert non-string, non-numeric values (such as decimal.Decimal) in numeric columns to floating point numbers. This attribute is set to True by default.
    • columns: Passing a list of column names to this attribute will create a DataFrame from only the columns we provide (similar to a SQL select on x columns).
    • parse_dates: When moving data into Pandas, we explicitly state which columns should be parsed as DateTime columns by passing their names to this argument.
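
    Since we passed chunksize above, here's a minimal sketch of how you'd consume the resulting iterator; the concat() call at the end simply stitches the chunks back together into one DataFrame:

    chunks = pd.read_sql_table("nyc_jobs",
                               con=engine,
                               parse_dates=['posting_date', 'posting_updated'],
                               chunksize=500)

    # Each chunk is a DataFrame of up to 500 rows
    sql_DF = pd.concat(chunks)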

    Create DataFrames From Query Results

    There will probably be times when you're just looking for a subset of data in a table as opposed to the entire table. In this scenario we can use read_sql(), which creates a DataFrame from the results of a SQL query you run on a table:

    sql_DF = pd.read_sql("SELECT * FROM nyc_jobs",
                         con=engine,
                         parse_dates=['posting_date', 'posting_updated'])
    

    This time around our first parameter is a SQL query instead of the name of a table. We can modify this query to select only specific columns, rows which match criteria, or anything else you can do with SQL.
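
    For example, here's a rough sketch of narrowing the result set down to a few columns and a filtered, ordered set of rows (the salary_frequency value of 'Annual' is an assumed value from the NYC jobs dataset):

    # Select a handful of columns and only annual-salary postings
    sql_DF = pd.read_sql(
        "SELECT job_id, business_title, salary_range_to "
        "FROM nyc_jobs "
        "WHERE salary_frequency = 'Annual' "
        "ORDER BY salary_range_to DESC",
        con=engine
    )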

    That's all folks! If you're interested, the source is up on Github here.
