Importing Excel Datetimes Into Pandas, Part II

Import dates & times from Excel .xlsx files into Pandas!

    What if, as in my data import task a few months back, the dates & times are in separate columns?  This gives us a few new issues.  Let's import that Excel file!

    import pandas as pd
    import xlrd
    import datetime

    # Read the sheet into a DataFrame
    df = pd.read_excel("hasDatesAndTimes.xlsx", sheet_name="Sheet1")

    # Open the workbook with xlrd as well, just to grab its datemode
    # (0 = 1900-based serial dates, 1 = 1904-based), which we'll need
    # to convert the raw serial numbers.
    book = xlrd.open_workbook("hasDatesAndTimes.xlsx")
    datemode = book.datemode

    And let's see that time variable!

    df["Time"]
    
    Index Time
    0 0.909907
    1 0.909919
    2 0.909931
    3 0.909942
    4 0.909954
    df["Time"].map(lambda x: xlrd.xldate_as_tuple(x, datemode))
    

    So far, so good....

    df["Time"].map(lambda x: datetime.datetime(*xlrd.xldate_as_tuple(x, 
                                                  datemode)))
    ValueError: year 0 is out of range
    

    Agh!  Plain datetime won't let us have 0 as our year.  Since these cells hold only a time, xldate_as_tuple fills in 0 for the year, month, and day, and datetime.datetime requires a year of at least 1.
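
    If you want to see that constraint for yourself, a quick check in the REPL shows it (the time values here are just the ones from above):

    import datetime

    datetime.MINYEAR                          # 1 -- datetime years start at 1
    datetime.datetime(0, 0, 0, 21, 50, 16)    # ValueError: year 0 is out of range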

    We'll want two packages to fix this.  One is arrow, an awesome package for handling dates and times.  For arrow to recognize what we want, though, we'll need a few more manipulations first; I'll be using the pipe function from toolz to keep those readable.

    Pipe lets us have a nice workflow where we just list the transformations we want, and our value will be "piped" sequentially through each one.
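
    If you haven't met pipe before, here's a tiny sketch of how it behaves (the values are just for illustration):

    from toolz import pipe

    # pipe(value, f, g, h) is the same as h(g(f(value)))
    pipe(3, lambda x: x + 1, lambda x: x * 2, str)
    # '8'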

    import arrow
    from toolz import pipe

    fns = [lambda x: xlrd.xldate_as_tuple(x, datemode),  # Excel serial -> (y, mo, d, h, mi, s)
           lambda x: x[3:6],                             # keep just (hour, minute, second)
           lambda x: map(str, x),                        # turn each piece into a string
           lambda x: "-".join(x),                        # e.g. "21-50-16"
           lambda x: arrow.get(x, "H-m-s"),              # parse it with arrow
           lambda x: x.format('HH:mm:ss')                # format it nicely
           ]
    

    Let's see a blow-by-blow of how one of our values gets transformed by that.

    # Apply successively longer prefixes of fns to a single raw value
    fnRanges = [fns[:i+1] for i in range(len(fns))]
    [pipe(0.909907, *x) for x in fnRanges]
    
    [(0, 0, 0, 21, 50, 16),
     (21, 50, 16),
     <map at 0x7f105151af98>,
     '21-50-16',
     <Arrow [0001-01-01T21:50:16+00:00]>,
     '21:50:16']
    

    The first function takes us from an Excel datetime to a datetime tuple.

    The next extracts just the time variables.

    We then map each element to a string (which shows up as a bare map object, since map is evaluated lazily in Python 3).

    Then we put a dash between all those elements so it'll be easier to parse as a time.

    Then arrow consumes the value, with the format we specified.

    And the final step gives us a neatly-formatted time, ready to be consumed by a database!
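
    Just to isolate the arrow step, here's the parse-and-reformat round trip on its own (using the same value as above):

    import arrow

    t = arrow.get("21-50-16", "H-m-s")   # parse with arrow's format tokens
    t.format("HH:mm:ss")                 # '21:50:16'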

    Helper Functions

    def mapPipe(ser, fns):
        # Pipe each value of a Series through fns, skipping missing values
        return ser.map(lambda a: pipe(a, *fns), na_action="ignore")

    mapPipe(df['Time'], fns)
    
    Index Time
    0 21:50:16
    1 21:50:17
    2 21:50:18
    3 21:50:19
    4 21:50:20
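
    That na_action="ignore" means missing values just pass through instead of blowing up the pipeline.  A quick sketch with a made-up two-row Series:

    import numpy as np

    mapPipe(pd.Series([0.909907, np.nan]), fns)
    # 0    21:50:16
    # 1         NaN
    # dtype: object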

    Dates are a bit easier - though the pipe syntax is still helpful!

    dateFns = [lambda x: xlrd.xldate_as_tuple(x, datemode),  # Excel serial -> (y, mo, d, h, mi, s)
               lambda x: arrow.get(*x),                      # unpack the tuple into an Arrow object
               lambda x: x.format('YYYY-MM-DD')              # format just the date
               ]

    mapPipe(df['Date'], dateFns)
    
    Index Date
    0 2018-08-12
    1 2018-08-12
    2 2018-08-12
    3 2018-08-12
    4 2018-08-12
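
    The arrow.get(*x) step just unpacks the date tuple into positional arguments.  Assuming the Date cells hold whole days, so the tuples look like (2018, 8, 12, 0, 0, 0), that step on its own is:

    import arrow

    arrow.get(2018, 8, 12, 0, 0, 0).format("YYYY-MM-DD")   # '2018-08-12'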

    Put it all together....

    (df.assign(Date=mapPipe(df['Date'], dateFns))
       .assign(Time=mapPipe(df['Time'], fns)))
    
    Index Date Time
    0 2018-08-12 21:50:16
    1 2018-08-12 21:50:17
    2 2018-08-12 21:50:18
    3 2018-08-12 21:50:19
    4 2018-08-12 21:50:20
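
    If your database would rather have a single timestamp column than separate date and time strings, one optional extra step is to concatenate the two cleaned columns and let pandas parse the result; here clean is just a name for the DataFrame built above:

    clean = (df.assign(Date=mapPipe(df['Date'], dateFns))
               .assign(Time=mapPipe(df['Time'], fns)))

    # "YYYY-MM-DD" + " " + "HH:mm:ss" parses cleanly into a datetime64 column
    clean["Timestamp"] = pd.to_datetime(clean["Date"] + " " + clean["Time"])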