Visualization is one of the crucial foundations of practicing data science, and analyzing information on a timeline can be vital to understanding its behavior. However, our understanding and structure of time is difficult to translate into raw computing — we can recognize that the phrase "8/1/2022" is the day that follows "7/31/2022", but without help, a computer would just see those as two different strings.
To help, pandas introduced the to_datetime function to convert strings (like the ones above) to Timestamp objects, which have many attributes that help translate our human perception of dates and times for the computer’s sake.
import pandas as pd

# Sample data: dates stored as plain strings, an event type, and attendance counts
list_1 = ['7/1/2021', '7/4/2021', '8/20/2021', '9/2/2021', '4/1/2050']
list_2 = ['Presentation', 'Class', 'Presentation', 'Event', 'Class']
list_3 = [25, 0, 50, 48, 1000000]
myDF = pd.DataFrame(zip(list_1, list_2, list_3), columns=['date', 'type', 'attendance'])
# Check the data type of each column
print(myDF.dtypes)
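The conversion step that the next paragraph refers to might look like the sketch below. The format string '%m/%d/%Y' is an assumption based on the description that follows (month, then day, then four-digit year, separated by forward slashes), and the DataFrame here is rebuilt with only the date column so the example runs on its own.

```python
import pandas as pd

# Same date strings as in the sample data above.
dates = ['7/1/2021', '7/4/2021', '8/20/2021', '9/2/2021', '4/1/2050']
myDF = pd.DataFrame({'date': dates})

# Assumed format: month/day/four-digit year, separated by forward slashes.
myDF['date'] = pd.to_datetime(myDF['date'], format='%m/%d/%Y')

# The date column should now report a datetime64 dtype instead of object.
print(myDF.dtypes)
```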
The date column is now saved with the data type that we want. If you recall, we had an odd-looking format argument. The idea is that we are telling pandas how to parse the date: month first, then day, then year, separated by forward slashes. If the dates had dashes or some other separator, we could indicate this in format to ensure pandas is reading it correctly.
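As a small illustration of the separator point, the sketch below parses dash-separated strings. The sample values are hypothetical and were chosen so that swapping the month and day directives still parses, but to different dates.

```python
import pandas as pd

# Hypothetical dash-separated date strings.
dashed = pd.Series(['07-01-2021', '07-04-2021'])

# Month first, then day: July 1 and July 4.
month_first = pd.to_datetime(dashed, format='%m-%d-%Y')

# Day first, then month: the same strings become January 7 and April 7.
day_first = pd.to_datetime(dashed, format='%d-%m-%Y')

print(month_first)
print(day_first)
```

Both calls succeed, which is why spelling out the format matters: pandas cannot tell from the strings alone which reading was intended.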
The format argument does pay attention to capitalization. For example, %y means the date excludes century, while %Y includes century (think 13 compared to 2013). There are helpful tables to explain the different formats, and we’ll include the main ones here:
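To see the capitalization rule in action, here is a minimal sketch with hypothetical date strings. It parses a two-digit year with %y and a four-digit year with %Y, then shows that mixing them up is rejected.

```python
import pandas as pd

# %y expects a two-digit year, %Y a four-digit year.
short_year = pd.to_datetime('8/20/13', format='%m/%d/%y')
long_year = pd.to_datetime('8/20/2013', format='%m/%d/%Y')
print(short_year == long_year)

# Using %y on a four-digit year leaves unparsed characters, so pandas raises.
try:
    pd.to_datetime('8/20/2013', format='%m/%d/%y')
    mismatch_failed = False
except ValueError:
    mismatch_failed = True
print(mismatch_failed)
```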
The Timedelta and DateOffset objects are simple: they are time/date objects, different from datetimes, that represent a difference between dates (4 days from now, 4 months ago, etc.) and can be added to or subtracted from a date to find the desired future/past date. The key difference is that Timedelta works with absolute time (1 day = 24 hours), while DateOffset works with calendar time, accounting for things such as daylight saving time. DateOffset also lacks an array class corresponding to timedelta64[ns].
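The absolute-versus-calendar distinction shows up most clearly across a daylight saving change. The sketch below assumes a time zone aware Timestamp in Europe/Helsinki on the night the clocks go back one hour; the date and zone are illustrative choices, not part of the sample data.

```python
import pandas as pd

# Midnight on the day Helsinki's clocks are set back one hour.
ts = pd.Timestamp('2016-10-30 00:00:00', tz='Europe/Helsinki')

# Timedelta adds exactly 24 elapsed hours, so the wall clock lands on 23:00.
absolute = ts + pd.Timedelta(days=1)

# DateOffset moves to the same wall-clock time on the next calendar day.
calendar = ts + pd.DateOffset(days=1)

print(absolute)
print(calendar)
```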
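# Timedelta stores an absolute duration; weeks are converted to days
day_example = pd.Timedelta(days=6)
week_example = pd.Timedelta(weeks=14)
print(day_example)   # 6 days 00:00:00
print(week_example)  # 98 days 00:00:00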
DateOffset is useful due to its broader acceptance of time parameters and use of calendar logic. We often want to know the dates of things weeks, months, and years in advance, and it’s inconvenient to translate those to a number of days for use with Timedelta. Additionally, 4 months from January 15th is informally understood to be May 15th, and DateOffset understands this where Timedelta does not.
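A short sketch of the January 15th example follows. The year 2024 is an assumption picked because it is a leap year, and approximating a month as 30 days is only one of several possible conversions.

```python
import pandas as pd

start = pd.Timestamp('2024-01-15')

# Calendar logic: same day of the month, four months later.
calendar_result = start + pd.DateOffset(months=4)

# Absolute logic: "4 months" approximated as 4 * 30 days.
approx_result = start + pd.Timedelta(days=4 * 30)

print(calendar_result)
print(approx_result)
```

The DateOffset result lands on May 15th, while the 120-day approximation falls one day short because the months in between do not all have 30 days.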
Timedelta can be interpreted by DataFrames and Series, while DateOffset cannot and is cast as a simple object. Additionally, any DateOffset time measurements equal to or shorter than an hour function like Timedelta.
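Both points can be checked with a small experiment. This is a sketch with made-up values: it compares the dtype of a Series built from each kind of object, then compares a 30-minute DateOffset with a 30-minute Timedelta.

```python
import pandas as pd

# A Series of Timedelta values gets a dedicated timedelta64 dtype.
td_series = pd.Series([pd.Timedelta(days=1), pd.Timedelta(days=2)])

# A Series of DateOffset values falls back to the generic object dtype.
offset_series = pd.Series([pd.DateOffset(months=1), pd.DateOffset(months=2)])

print(td_series.dtype)
print(offset_series.dtype)

# Sub-hour offsets behave the same way with either class.
ts = pd.Timestamp('2021-07-01 12:00')
same_result = (ts + pd.DateOffset(minutes=30)) == (ts + pd.Timedelta(minutes=30))
print(same_result)
```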
        date          type  attendance  month
0 2021-07-01  Presentation          25      7
1 2021-07-04         Class           0      7
2 2021-08-20  Presentation          50      8
3 2021-09-02         Event          48      9
4 2050-04-01         Class     1000000      4
        date          type  attendance  month  year
0 2021-07-01  Presentation          25      7  2021
1 2021-07-04         Class           0      7  2021
2 2021-08-20  Presentation          50      8  2021
3 2021-09-02         Event          48      9  2021
4 2050-04-01         Class     1000000      4  2050
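Columns like month and year in the output above can be derived from a datetime column. The sketch below is one way to do it, assuming the .dt accessor on a converted date column; the code that produced the original output may have differed.

```python
import pandas as pd

# Two of the sample dates, already converted to datetimes.
myDF = pd.DataFrame({
    'date': pd.to_datetime(['7/1/2021', '4/1/2050'], format='%m/%d/%Y')
})

# The .dt accessor exposes Timestamp attributes for a whole column at once.
myDF['month'] = myDF['date'].dt.month
myDF['year'] = myDF['date'].dt.year

print(myDF)
```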
        date          type  attendance   weekday
0 2021-07-01  Presentation          25  Thursday
1 2021-07-04         Class           0    Sunday
2 2021-08-20  Presentation          50    Friday
3 2021-09-02         Event          48  Thursday
4 2050-04-01         Class     1000000    Friday
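A weekday column like the one above could be produced with the day_name method of the .dt accessor. This is a sketch under that assumption, using two of the sample dates.

```python
import pandas as pd

# Two of the sample dates, already converted to datetimes.
myDF = pd.DataFrame({
    'date': pd.to_datetime(['7/1/2021', '7/4/2021'], format='%m/%d/%Y')
})

# day_name() returns the weekday as a string for each row.
myDF['weekday'] = myDF['date'].dt.day_name()

print(myDF)
```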
        date          type  attendance
0 2021-07-08  Presentation          25
1 2021-07-11         Class           0
2 2021-08-27  Presentation          50
3 2021-09-09         Event          48
4 2050-04-08         Class     1000000
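In the output above every date sits seven days after its original value. One way to get that shift, assuming a Timedelta of seven days was added to the whole column, is sketched here with two of the sample dates.

```python
import pandas as pd

# Two of the sample dates, already converted to datetimes.
myDF = pd.DataFrame({
    'date': pd.to_datetime(['7/1/2021', '8/20/2021'], format='%m/%d/%Y')
})

# Adding a Timedelta to a datetime column shifts every row by the same amount.
myDF['date'] = myDF['date'] + pd.Timedelta(days=7)

print(myDF)
```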
Solution:
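# Move each date forward one calendar year to find the year in which school ends
one_year_later = myDF['date'] + pd.offsets.DateOffset(years=1)
# Build May 31 of that year as the end-of-school date
myDF['end_of_school'] = pd.to_datetime({'month': 5, 'day': 31, 'year': one_year_later.dt.year})
# Subtracting two datetime columns yields a column of Timedelta values
myDF['days_until_school_is_over'] = myDF['end_of_school'] - myDF['date']
print(myDF)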
        date          type  attendance end_of_school days_until_school_is_over
0 2021-07-01  Presentation          25    2022-05-31                  334 days
1 2021-07-04         Class           0    2022-05-31                  331 days
2 2021-08-20  Presentation          50    2022-05-31                  284 days
3 2021-09-02         Event          48    2022-05-31                  271 days
4 2050-11-01         Class     1000000    2051-05-31                  211 days
Purdue University, The Data Mine, Hillenbrand Hall, 1301 Third Street, West Lafayette, IN 47906-4206, (765) 494-0325