Importing Excel Datetimes Into Pandas II

Pandas and Excel Pt. 2

What if, like during my data import task a few months back, the dates & times are in separate columns?  This gives us a few new issues.  Let's import that Excel file!

import pandas as pd
import xlrd
import datetime

df = pd.read_excel("hasDatesAndTimes.xlsx", sheet_name="Sheet1")

book = xlrd.open_workbook("hasDatesAndTimes.xlsx")
datemode = book.datemode

And let's see that time variable!

df["Time"]
Time
0 0.909907
1 0.909919
2 0.909931
3 0.909942
4 0.909954
df["Time"].map(lambda x: xlrd.xldate_as_tuple(x, datemode))
Time
0 (0, 0, 0, 21, 50, 16)
1 (0, 0, 0, 21, 50, 17)
2 (0, 0, 0, 21, 50, 18)
3 (0, 0, 0, 21, 50, 19)
4 (0, 0, 0, 21, 50, 20)

So far, so good....

df["Time"].map(lambda x: datetime.datetime(*xlrd.xldate_as_tuple(x, 
                                              datemode)))
ValueError: year 0 is out of range

Agh!  Plain datetime won't let us have 0 as our year.

We'll want two packages to fix this.  One is an awesome package for handling dates called arrow.  In order for arrow to recognize what we want it to, though, we'll need some more manipulations - I'll be using the pipe function from toolz in order to make that more readable.

Pipe lets us have a nice workflow where we just list the transformations we want, and our value will be "piped" sequentially through each one.

fns = [lambda x: xlrd.xldate_as_tuple(x, datemode),
     lambda x: x[3:6],
      lambda x: map(str, x),
      lambda x: "-".join(x),
       lambda x: arrow.get(x, "H-m-s"),
       lambda x: x.format('HH:mm:ss')
      ]

Let's see a blow-by-blow of how one of our values gets transformed by that.

fnRanges = [fns[:i+1] for i in range(len(fns))]
[pipe(0.909907, *x) for x in fnRanges]

[(0, 0, 0, 21, 50, 16),
 (21, 50, 16),
 <map at 0x7f105151af98>,
 '21-50-16',
 <Arrow [0001-01-01T21:50:16+00:00]>,
 '21:50:16']

The first function takes us from an Excel datetime to a datetime tuple.

The next extracts just the time variables.

We then map that all to a string (which shows up as nothing because map is evaluated lazily).

Then we put a dash between all those elements so it'll be easier to parse as a time.

Then arrow consumes the value, with the format we specified.

And finally gives us a neatly-formatted time, ready to be consumed by a database!

Helper Functions

def mapPipe(ser, fns):
    return ser.map(lambda a: pipe(a, *fns),
        na_action="ignore" )

mapPipe(df['Time'],
   fns)
Time
0 21:50:16
1 21:50:17
2 21:50:18
3 21:50:19
4 21:50:20

Dates are a bit easier - though the pipe syntax is still helpful!

dateFns = [lambda x: xlrd.xldate_as_tuple(x, datemode),
      lambda x: arrow.get(*x),
      lambda x: x.format('YYYY-MM-DD')
      ]
mapPipe(df['Date'],
       dateFns)
Date
0 2018-08-12
1 2018-08-12
2 2018-08-12
3 2018-08-12
4 2018-08-12

Put it all together....

(df.assign(Date = mapPipe(df['Date'],
       dateFns))
   .assign(Time = mapPipe(df['Time'],
       fns)))
Date Time
0 2018-08-12 21:50:16
1 2018-08-12 21:50:17
2 2018-08-12 21:50:18
3 2018-08-12 21:50:19
4 2018-08-12 21:50:20
Author image
Center of the Universe Website
Super villain in somebody's action hero movie. Experienced a radioactive freak accident at a young age, which rendered him part-snake and strangely adept at Python.
Author image
Center of the Universe

Super villain in somebody's action hero movie. Experienced a radioactive freak accident at a young age, which rendered him part-snake and strangely adept at Python.