Importing Excel Datetimes Into Pandas, Part I

Parse dates and times from Excel .xlsx correctly when using Pandas.

Matthew Alhonte

Aug 13, 2018 • 1 min read

Different file formats are different! For all kinds of reasons!

A few months back, I had to import some Excel files into a database. In this process I learned so much about the delightfully unique way Excel stores dates & times!

The basic datetime will be a decimal number, like 43324.909907407404. The integer part is the date, and the fractional part is the time. So far, so good - this is pretty common for computers. The date is the number of days past a certain base date, and the time is the fraction of a 24-hour day that has elapsed, so .5 is noon and .909907407404 works out to 21:50:16.
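To make that split concrete, here's a small sketch that pulls the example serial apart by hand. It only decodes the time of day. Turning the day count into a calendar date needs the workbook's base date, which this snippet deliberately doesn't assume. It also rounds to the nearest second, on the assumption that the tiny leftover is floating-point noise.

```python
serial = 43324.909907407404

days = int(serial)                        # integer part: whole days since the workbook's base date
seconds = round((serial - days) * 86400)  # fractional part is a share of a day; 86,400 seconds per day
hours, rem = divmod(seconds, 3600)
minutes, secs = divmod(rem, 60)

print(days, f"{hours:02d}:{minutes:02d}:{secs:02d}")  # 43324 21:50:16
```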

So, let's load our Excel sheet! Pandas, of course, has a painless way of doing this.

import pandas as pd

dfRaw = pd.read_excel("hasDates.xlsx", sheet_name="Sheet1")

dfRaw["dateTimes"]

0    43324.909907
1    43324.909919
2    43324.909931
3    43324.909942
4    43324.909954

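Sadly, we can't convert these yet. Excel workbooks count days from one of two base dates: the 1900 date system (the Windows default) or the 1904 date system (used by older Mac versions of Excel). Pick the wrong one and every date shifts by 1,462 days, about four years. Luckily, there are tools that'll go into the file and extract what we need! Enter xlrd (heads-up: xlrd 2.0 and later only open legacy .xls files, so reading an .xlsx like this one needs xlrd 1.2 or earlier):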

import xlrd

# The workbook records which date system it uses
book = xlrd.open_workbook("hasDates.xlsx")
datemode = book.datemode  # 0 = 1900 date system, 1 = 1904 date system
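Under the hood, datemode is just a flag choosing the base date. Here's a minimal pure-Python sketch of what the conversion amounts to. It assumes the usual base dates of 1899-12-30 for the 1900 system and 1904-01-01 for the 1904 system. `EPOCHS` and `serial_to_datetime` are my own hypothetical names, not part of xlrd. Running the same serial through both modes shows how far off the wrong one lands.

```python
from datetime import datetime, timedelta

# Assumed base dates, keyed like xlrd's datemode: 0 = 1900 system, 1 = 1904 system.
# The 1899-12-30 base matches Excel's serials from 61 (1900-03-01) onward; it is a
# day off for the very earliest 1900 dates because of Excel's phantom 1900-02-29.
EPOCHS = {0: datetime(1899, 12, 30), 1: datetime(1904, 1, 1)}

def serial_to_datetime(serial: float, datemode: int) -> datetime:
    """Convert a non-negative Excel serial to a datetime, rounded to the nearest second."""
    days = int(serial)                        # whole days after the base date
    seconds = round((serial - days) * 86400)  # fraction of a day -> seconds
    return EPOCHS[datemode] + timedelta(days=days, seconds=seconds)

serial = 43324.909907407404
print(serial_to_datetime(serial, 0))  # 2018-08-12 21:50:16
print(serial_to_datetime(serial, 1))  # 2022-08-13 21:50:16
```

The two results sit exactly 1,462 days apart, which is the gap between the two base dates.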

xlrd also has a handy function, xldate_as_tuple, for turning those serial numbers into a (year, month, day, hour, minute, second) tuple that'll play nicely with Python.

dfRaw["dateTimes"].map(lambda x: 
          xlrd.xldate_as_tuple(x, datemode))

0    (2018, 8, 12, 21, 50, 16)
1    (2018, 8, 12, 21, 50, 17)
2    (2018, 8, 12, 21, 50, 18)
3    (2018, 8, 12, 21, 50, 19)
4    (2018, 8, 12, 21, 50, 20)

And once we've got that, it's simple enough to convert to proper datetimes!

import datetime

# Unpack each tuple into datetime's positional arguments
dfRaw["dateTimes"].map(lambda x: 
          datetime.datetime(*xlrd.xldate_as_tuple(x, 
                                                  datemode)))

0    2018-08-12 21:50:16
1    2018-08-12 21:50:17
2    2018-08-12 21:50:18
3    2018-08-12 21:50:19
4    2018-08-12 21:50:20
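If the row-by-row map gets slow on a big sheet, one vectorized alternative is to hand the whole column to pandas' own to_datetime, pointing origin at the base date. This is a swapped-in technique, not the xlrd approach above. The sketch assumes you already have datemode from the workbook. The Series here is a hypothetical stand-in for dfRaw["dateTimes"], built from the first two serials.

```python
import pandas as pd

# Hypothetical stand-in for dfRaw["dateTimes"]
serials = pd.Series([43324.909907407404, 43324.909919])

# Assumption: datemode was read from the workbook (0 = 1900 system, 1 = 1904 system)
datemode = 0
origin = "1899-12-30" if datemode == 0 else "1904-01-01"

# unit="D" treats each float as days after `origin`; rounding to whole seconds
# cleans up floating-point noise such as 21:50:15.999999
converted = pd.to_datetime(serials, unit="D", origin=origin).dt.round("s")
print(converted)
```

Like the pure-Python sketch, the 1899-12-30 origin is only right for 1900-system serials from 61 onward.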

Stick around for Part 2, where we look at some messier situations.
