What if, like during my data import task a few months back, the dates & times are in separate columns? This gives us a few new issues. Let's import that Excel file!
import pandas as pd
import xlrd
import datetime
df = pd.read_excel("hasDatesAndTimes.xlsx", sheet_name="Sheet1")
book = xlrd.open_workbook("hasDatesAndTimes.xlsx")
datemode = book.datemode
And let's see that time variable!
df["Time"]
Index | Time |
---|---|
0 | 0.909907 |
1 | 0.909919 |
2 | 0.909931 |
3 | 0.909942 |
4 | 0.909954 |
df["Time"].map(lambda x: xlrd.xldate_as_tuple(x, datemode))
So far, so good....
df["Time"].map(lambda x: datetime.datetime(*xlrd.xldate_as_tuple(x,
datemode)))
ValueError: year 0 is out of range
Agh! Plain datetime won't let us have 0 as our year.
We'll want two packages to fix this. One is an awesome package for handling dates called arrow
. In order for arrow
to recognize what we want it to, though, we'll need some more manipulations - I'll be using the pipe
function from toolz
in order to make that more readable.
Index | Time |
---|---|
0 | (0, 0, 0, 21, 50, 16) |
1 | (0, 0, 0, 21, 50, 17) |
2 | (0, 0, 0, 21, 50, 18) |
3 | (0, 0, 0, 21, 50, 19) |
4 | (0, 0, 0, 21, 50, 20) |
Pipe lets us have a nice workflow where we just list the transformations we want, and our value will be "piped" sequentially through each one.
fns = [lambda x: xlrd.xldate_as_tuple(x, datemode),
lambda x: x[3:6],
lambda x: map(str, x),
lambda x: "-".join(x),
lambda x: arrow.get(x, "H-m-s"),
lambda x: x.format('HH:mm:ss')
]
Let's see a blow-by-blow of how one of our values gets transformed by that.
fnRanges = [fns[:i+1] for i in range(len(fns))]
[pipe(0.909907, *x) for x in fnRanges]
[(0, 0, 0, 21, 50, 16),
(21, 50, 16),
<map at 0x7f105151af98>,
'21-50-16',
<Arrow [0001-01-01T21:50:16+00:00]>,
'21:50:16']
The first function takes us from an Excel datetime to a datetime tuple.
The next extracts just the time variables.
We then map that all to a string (which shows up as nothing because map
is evaluated lazily).
Then we put a dash between all those elements so it'll be easier to parse as a time.
Then arrow
consumes the value, with the format we specified.
And finally gives us a neatly-formatted time, ready to be consumed by a database!
Helper Functions
def mapPipe(ser, fns):
return ser.map(lambda a: pipe(a, *fns),
na_action="ignore" )
mapPipe(df['Time'],
fns)
Index | Time |
---|---|
0 | 21:50:16 |
1 | 21:50:17 |
2 | 21:50:18 |
3 | 21:50:19 |
4 | 21:50:20 |
Dates are a bit easier - though the pipe
syntax is still helpful!
dateFns = [lambda x: xlrd.xldate_as_tuple(x, datemode),
lambda x: arrow.get(*x),
lambda x: x.format('YYYY-MM-DD')
]
mapPipe(df['Date'],
dateFns)
Index | Date |
---|---|
0 | 2018-08-12 |
1 | 2018-08-12 |
2 | 2018-08-12 |
3 | 2018-08-12 |
4 | 2018-08-12 |
Put it all together....
(df.assign(Date = mapPipe(df['Date'],
dateFns))
.assign(Time = mapPipe(df['Time'],
fns)))
Index | Date | Time |
---|---|---|
0 | 2018-08-12 | 21:50:16 |
1 | 2018-08-12 | 21:50:17 |
2 | 2018-08-12 | 21:50:18 |
3 | 2018-08-12 | 21:50:19 |
4 | 2018-08-12 | 21:50:20 |