Different file formats are different! For all kinds of reasons!
A few months back, I had to import some Excel files into a database. In this process I learned so much about the delightfully unique way Excel stores dates & times!
The basic datetime will be a decimal number, like 43324.909907407404
. The number before the decimal is the day, the number afterwards is the time. So far, so good - this is pretty common for computers. The date is often the number of days past a certain date, and the time is the number of seconds.
So, let's load our excel sheet! Pandas of course has a painless way of doing this.
import pandas as pd
dfRaw = pd.read_excel("hasDates.xlsx", sheet_name="Sheet1")
dfRaw["dateTimes"]
0 | |
---|---|
0 | 43324.909907 |
1 | 43324.909919 |
2 | 43324.909931 |
3 | 43324.909942 |
4 | 43324.909954 |
Sadly, we can't yet convert these. Different Excel files start at different dates, and you'll get a very wrong result if you use the wrong one. Luckily, there are tools that'll go into the file and extract what we need! Enter xlrd
:
import xlrd
book = xlrd.open_workbook("hasDates.xlsx")
datemode = book.datemode
xlrd
also has a handy function for turning those dates into a datetime
tuple that'll play nicely with Python.
dfRaw["dateTimes"].map(lambda x:
xlrd.xldate_as_tuple(x, datemode))
0 | |
---|---|
0 | (2018, 8, 12, 21, 50, 16) |
1 | (2018, 8, 12, 21, 50, 17) |
2 | (2018, 8, 12, 21, 50, 18) |
3 | (2018, 8, 12, 21, 50, 19) |
4 | (2018, 8, 12, 21, 50, 20) |
And once we've got that, simple enough to convert to proper datetimes!
import datetime
dfRaw["dateTimes"].map(lambda x:
datetime.datetime(*xlrd.xldate_as_tuple(x,
datemode)))
0 | |
---|---|
0 | 2018-08-12 21:50:16 |
1 | 2018-08-12 21:50:17 |
2 | 2018-08-12 21:50:18 |
3 | 2018-08-12 21:50:19 |
4 | 2018-08-12 21:50:20 |
Stick around for Part 2, where we look at some messier situations.