What if, like during my data import task a few months back, the dates & times are in separate columns? This gives us a few new issues. Let's import that Excel file!
import pandas as pd
import xlrd
import datetime
df = pd.read_excel("hasDatesAndTimes.xlsx", sheet_name="Sheet1")
book = xlrd.open_workbook("hasDatesAndTimes.xlsx")
datemode = book.datemodeAnd let's see that time variable!
df["Time"]
| Index | Time |
|---|---|
| 0 | 0.909907 |
| 1 | 0.909919 |
| 2 | 0.909931 |
| 3 | 0.909942 |
| 4 | 0.909954 |
df["Time"].map(lambda x: xlrd.xldate_as_tuple(x, datemode))So far, so good....
df["Time"].map(lambda x: datetime.datetime(*xlrd.xldate_as_tuple(x,
datemode)))
ValueError: year 0 is out of rangeAgh! Plain datetime won't let us have 0 as our year.
We'll want two packages to fix this. One is an awesome package for handling dates called arrow. In order for arrow to recognize what we want it to, though, we'll need some more manipulations - I'll be using the pipe function from toolz in order to make that more readable.
| Index | Time |
|---|---|
| 0 | (0, 0, 0, 21, 50, 16) |
| 1 | (0, 0, 0, 21, 50, 17) |
| 2 | (0, 0, 0, 21, 50, 18) |
| 3 | (0, 0, 0, 21, 50, 19) |
| 4 | (0, 0, 0, 21, 50, 20) |
Pipe lets us have a nice workflow where we just list the transformations we want, and our value will be "piped" sequentially through each one.
fns = [lambda x: xlrd.xldate_as_tuple(x, datemode),
lambda x: x[3:6],
lambda x: map(str, x),
lambda x: "-".join(x),
lambda x: arrow.get(x, "H-m-s"),
lambda x: x.format('HH:mm:ss')
]Let's see a blow-by-blow of how one of our values gets transformed by that.
fnRanges = [fns[:i+1] for i in range(len(fns))]
[pipe(0.909907, *x) for x in fnRanges]
[(0, 0, 0, 21, 50, 16),
(21, 50, 16),
<map at 0x7f105151af98>,
'21-50-16',
<Arrow [0001-01-01T21:50:16+00:00]>,
'21:50:16']The first function takes us from an Excel datetime to a datetime tuple.
The next extracts just the time variables.
We then map that all to a string (which shows up as nothing because map is evaluated lazily).
Then we put a dash between all those elements so it'll be easier to parse as a time.
Then arrow consumes the value, with the format we specified.
And finally gives us a neatly-formatted time, ready to be consumed by a database!
Helper Functions
def mapPipe(ser, fns):
return ser.map(lambda a: pipe(a, *fns),
na_action="ignore" )
mapPipe(df['Time'],
fns)| Index | Time |
|---|---|
| 0 | 21:50:16 |
| 1 | 21:50:17 |
| 2 | 21:50:18 |
| 3 | 21:50:19 |
| 4 | 21:50:20 |
Dates are a bit easier - though the pipe syntax is still helpful!
dateFns = [lambda x: xlrd.xldate_as_tuple(x, datemode),
lambda x: arrow.get(*x),
lambda x: x.format('YYYY-MM-DD')
]
mapPipe(df['Date'],
dateFns)| Index | Date |
|---|---|
| 0 | 2018-08-12 |
| 1 | 2018-08-12 |
| 2 | 2018-08-12 |
| 3 | 2018-08-12 |
| 4 | 2018-08-12 |
Put it all together....
(df.assign(Date = mapPipe(df['Date'],
dateFns))
.assign(Time = mapPipe(df['Time'],
fns)))| Index | Date | Time |
|---|---|---|
| 0 | 2018-08-12 | 21:50:16 |
| 1 | 2018-08-12 | 21:50:17 |
| 2 | 2018-08-12 | 21:50:18 |
| 3 | 2018-08-12 | 21:50:19 |
| 4 | 2018-08-12 | 21:50:20 |