Visualization is one of the crucial foundations of practicing data science, and analyzing information on a timeline can be vital to understanding its behavior. However, the way we understand and structure time is difficult to translate into raw computing: we can recognize that the string "8/1/2022" is the day that follows "7/31/2022", but without help, a computer would just see those as two different strings.

To help, pandas introduced the to_datetime function to convert strings (like the ones above) to Timestamp objects, which have many attributes that help translate our human perception of dates and times for the computer’s sake.

import pandas as pd

# Build a small example DataFrame; the dates start out as plain strings
list_1 = ['7/1/2021', '7/4/2021', '8/20/2021', '9/2/2021', '4/1/2050']
list_2 = ['Presentation', 'Class', 'Presentation', 'Event', 'Class']
list_3 = [25, 0, 50, 48, 1000000]
myDF = pd.DataFrame(zip(list_1, list_2, list_3), columns=['date', 'type', 'attendance'])

# Inspect the data type of each column
print(myDF.dtypes)
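
The conversion call that the next section refers back to does not appear above. A minimal sketch of what it could look like, assuming the format string '%m/%d/%Y' (inferred from the month/day/year, slash-separated description below) and a stand-in DataFrame with the hypothetical name example_df:

```python
import pandas as pd

# Stand-in for the 'date' column built above (example_df is a hypothetical name).
example_df = pd.DataFrame({'date': ['7/1/2021', '7/4/2021', '8/20/2021']})

# Parse month/day/four-digit-year strings separated by forward slashes.
# The format string is an assumption based on the description in the text.
example_df['date'] = pd.to_datetime(example_df['date'], format='%m/%d/%Y')

# The date column should now report a datetime64 dtype instead of strings.
print(example_df.dtypes)
```

After the conversion, each entry in the column is a Timestamp rather than a string.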

format Argument

Once the date column has been converted with to_datetime, it is saved with the data type that we want. The conversion relies on an odd-looking format argument. The idea is that we are telling pandas how to parse the date: month first, then day, then year, separated by forward slashes. If the dates had dashes or some other separator, we could indicate this in format to ensure pandas reads them correctly.

format is case-sensitive. For example, %y matches a two-digit year without the century, while %Y matches a four-digit year with the century (think 13 compared to 2013). Tables of the available format codes can be found in the Python strftime documentation.
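
To illustrate the capitalization and separator points, here is a small sketch. The date strings and variable names are made up for the example and are not taken from the DataFrame above:

```python
import pandas as pd

# %Y expects a four-digit year; %y expects a two-digit year.
four_digit = pd.to_datetime('8/1/2013', format='%m/%d/%Y')
two_digit = pd.to_datetime('8/1/13', format='%m/%d/%y')

# A different separator only needs a matching format string.
dashed = pd.to_datetime('08-01-2013', format='%m-%d-%Y')

# All three strings describe the same day.
print(four_digit, two_digit, dashed)
```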

Time Differences: Timedelta and DateOffset

The Timedelta and DateOffset objects are simple: rather than representing a point in time, each represents a difference between dates (4 days from now, 4 months ago, etc.) that can be added to or subtracted from a date to find the desired future or past date. The key difference is that Timedelta works with absolute time (1 day = 24 hours), while DateOffset works with calendar time, accounting for things such as daylight saving time. DateOffset also lacks an array class corresponding to timedelta64[ns].

day_example = pd.Timedelta(days=6)
week_example = pd.Timedelta(weeks=14)
print(day_example)
print(week_example)
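
The absolute-versus-calendar distinction is easiest to see across a daylight saving transition. The sketch below uses a timezone-aware timestamp chosen for illustration (Helsinki on the night clocks move back one hour); the variable names are hypothetical:

```python
import pandas as pd

# Midnight on the day Helsinki switches from daylight saving to standard time.
ts = pd.Timestamp('2016-10-30 00:00:00', tz='Europe/Helsinki')

# Timedelta adds exactly 24 elapsed hours, so the wall clock ends up an hour "early".
absolute = ts + pd.Timedelta(days=1)

# DateOffset moves to the same wall-clock time on the next calendar day.
calendar = ts + pd.DateOffset(days=1)

print(absolute)
print(calendar)
```

Because that day lasts 25 hours locally, the two results differ by one hour.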

DateOffset is useful due to its broader acceptance of time parameters and its use of calendar logic. We often want to know the dates of things weeks, months, and years in advance, and it’s inconvenient to translate those into a number of days for use with Timedelta. Additionally, 4 months from January 15th is informally understood to be May 15th, and DateOffset understands this where Timedelta does not.
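
The January 15th example can be sketched as follows. The year 2024 is an arbitrary choice for illustration (a leap year, so a fixed day count drifts), and the variable names are hypothetical:

```python
import pandas as pd

start = pd.Timestamp('2024-01-15')

# Calendar logic: same day of the month, four months later.
calendar_result = start + pd.DateOffset(months=4)

# Timedelta has no month unit, so a fixed number of days has to be guessed.
# 120 days works in some years but lands a day short in this leap year.
fixed_result = start + pd.Timedelta(days=120)

print(calendar_result)  # 2024-05-15 00:00:00
print(fixed_result)     # 2024-05-14 00:00:00
```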

Timedelta can be interpreted by DataFrames and Series, while DateOffset cannot and is cast as a simple object. Additionally, any DateOffset time measurement equal to or shorter than an hour behaves like a Timedelta.
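
A quick way to see the dtype difference is to put each kind of object into a Series. This is a small sketch with made-up values and hypothetical variable names:

```python
import pandas as pd

# Timedelta values get a dedicated timedelta64 dtype.
timedelta_series = pd.Series([pd.Timedelta(days=1), pd.Timedelta(days=2)])

# DateOffset values have no dedicated dtype and fall back to object.
offset_series = pd.Series([pd.DateOffset(days=1), pd.DateOffset(months=2)])

print(timedelta_series.dtype)
print(offset_series.dtype)
```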

        date          type  attendance  month
0 2021-07-01  Presentation          25      7
1 2021-07-04         Class           0      7
2 2021-08-20  Presentation          50      8
3 2021-09-02         Event          48      9
4 2050-04-01         Class     1000000      4

        date          type  attendance  month  year
0 2021-07-01  Presentation          25      7  2021
1 2021-07-04         Class           0      7  2021
2 2021-08-20  Presentation          50      8  2021
3 2021-09-02         Event          48      9  2021
4 2050-04-01         Class     1000000      4  2050

        date          type  attendance   weekday
0 2021-07-01  Presentation          25  Thursday
1 2021-07-04         Class           0    Sunday
2 2021-08-20  Presentation          50    Friday
3 2021-09-02         Event          48  Thursday
4 2050-04-01         Class     1000000    Friday

        date          type  attendance
0 2021-07-08  Presentation          25
1 2021-07-11         Class           0
2 2021-08-27  Presentation          50
3 2021-09-09         Event          48
4 2050-04-08         Class     1000000
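
The code that produced the outputs above is not shown. One plausible way to get similar columns, assuming the .dt accessor and a one-week Timedelta were used, is sketched below on a hypothetical stand-in DataFrame named example_df. The shifted dates go into a new column, one_week_later, where the last table above overwrote date:

```python
import pandas as pd

# Stand-in DataFrame with an already-converted datetime column (hypothetical name).
example_df = pd.DataFrame({
    'date': pd.to_datetime(['7/1/2021', '7/4/2021', '8/20/2021'], format='%m/%d/%Y')
})

# Pull calendar attributes out of each Timestamp through the .dt accessor.
example_df['month'] = example_df['date'].dt.month
example_df['year'] = example_df['date'].dt.year
example_df['weekday'] = example_df['date'].dt.day_name()

# Shift every date forward by one week.
example_df['one_week_later'] = example_df['date'] + pd.Timedelta(days=7)

print(example_df)
```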

Suppose myDF.date contains exclusively days from the first semester of an academic year, and each academic year ends on May 31st. Using the date column and DateOffset, create the end_of_school column, which contains the last day of school for that academic year. Then create days_until_school_is_over, a column that contains the number of days between date and end_of_school.

Solution:
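# Move each date forward one calendar year to find the year in which school ends
one_year_later = myDF['date'] + pd.offsets.DateOffset(years=1)
# Assemble May 31st of that year for every row
myDF['end_of_school'] = pd.to_datetime({'month': 5, 'day': 31, 'year': one_year_later.dt.year})
# Subtracting two datetime columns yields a Timedelta column
myDF['days_until_school_is_over'] = myDF['end_of_school'] - myDF['date']
print(myDF)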
        date          type  attendance end_of_school days_until_school_is_over
0 2021-07-01  Presentation          25    2022-05-31                  334 days
1 2021-07-04         Class           0    2022-05-31                  331 days
2 2021-08-20  Presentation          50    2022-05-31                  284 days
3 2021-09-02         Event          48    2022-05-31                  271 days
4 2050-04-01         Class     1000000    2051-05-31                  425 days

Purdue University, The Data Mine, Hillenbrand Hall, 1301 Third Street, West Lafayette, IN 47906-4206, (765) 494-0325
