The Complex yet Powerful World of DateTime in Data Science
I still remember coming across my first DateTime variable when I was learning Python. It was an e-commerce project where I had to figure out the supply chain pipeline – the time it takes for an order to be shipped, the number of days it takes for an order to be delivered, etc. It was quite a fascinating problem from a data science perspective.
The issue – I wasn’t familiar with how to extract and play around with the date and time components in Python.
There is an added complexity to the DateTime features, an extra layer that isn’t present in numerical variables. Being able to master these DateTime features will help you go a long way towards becoming a better (and more efficient) data scientist. It’s definitely helped me a lot!
And the date and time features are ubiquitous in data science projects. Think about it – they are a rich source of valuable information, and hence, can give some deep insights about any dataset at hand. Plus the amount of flexibility they offer when we’re performing feature engineering – priceless!
In this article, we will first look at how to handle date and time features with Python’s datetime module, and then we will explore the Pandas functions for the same!
Note: I assume you’re familiar with Python and the Pandas library. If not, I highly recommend picking up the basics of both before continuing.
Table of Contents
- The Importance of the Date-Time Component
- Working with Dates in Python
- Working with Time in Python
- DateTime in Python
- Updating old dates
- Extracting Weekday from DateTime
- What week is it?
- Leap year or not? Use the calendar!
- The Different Datetime formats
- Advanced DateTime formatting with Strptime & Strftime
- DateTime with Pandas
- DateTime and Timedelta objects in Pandas
- Date range in Pandas
- Making DateTime features in Pandas
The Importance of the Date-Time Component
It’s worth reiterating: dates and times are a treasure trove of information, and that is why data scientists love them so much.
Before we dive into the crux of the article, I want you to experience this yourself. Take a look at the date and time right now. Try and imagine all kinds of information that you can extract from it to understand your reading habit. The year, month, day, hour, and minute are the usual suspects.
But if you dig a little further, you can determine whether you prefer reading on weekdays or weekends, whether you are a morning person or a night owl (we are in the same boat here!), or whether you accumulate all the interesting articles to read at the end of the month!
Clearly, the list will go on and you will gradually learn a lot about your reading habits if you repeat this exercise after collecting the data over a period of time, say a month. Now imagine how useful this feature would be in a real-world scenario where information is collected over a long period of time.
Date and time features find importance in data science problems spanning industries from sales, marketing, and finance to HR, e-commerce, retail, and many more. Predicting how the stock markets will behave tomorrow, how many products will be sold in the upcoming week, when is the best time to launch a new product, how long before a position at the company gets filled, etc. are some of the problems that we can find answers to using date and time data.
This incredible amount of insight that you can unravel from the data is what makes date and time components so fun to work with! So let’s get down to the business of mastering date-time manipulation in Python.
Working with Dates in Python
The date class in Python’s datetime module deals with dates in the Gregorian calendar. It accepts three integer arguments: year, month, and day. Let’s have a look at how it’s done:
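Here is a minimal sketch of creating a date object. The specific date is just an illustrative value, not one taken from a real dataset.

```python
from datetime import date

# Arguments are year, month, day (arbitrary example values)
d1 = date(2020, 4, 23)

print(d1)        # 2020-04-23
print(type(d1))  # <class 'datetime.date'>
```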
You can see how easy it was to create a date object with the datetime module. It’s even easier to extract features like the day, month, and year from the date, using the day, month, and year attributes. Let’s see how to do that on a date object for the current local date, which we will create using the today() method:
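A short sketch of reading those attributes from today’s date. The printed values depend on when you run it, so no fixed output is shown.

```python
from datetime import date

# date.today() returns the current local date
today = date.today()

print("Day:", today.day)
print("Month:", today.month)
print("Year:", today.year)
```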
Working with Time in Python
time is another class of the datetime module. It accepts integer arguments for the time, down to microseconds, and returns a time object:
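A minimal sketch, with arbitrary example values for the time:

```python
from datetime import time

# Arguments are hour, minute, second, microsecond (each defaults to 0)
t1 = time(13, 20, 13, 40)

print(t1)        # 13:20:13.000040
print(type(t1))  # <class 'datetime.time'>
```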
You can extract features like hour, minute, second, and microsecond from the time object using the respective attributes. Here is an example:
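The sketch below reads each component from an example time object (the values are arbitrary):

```python
from datetime import time

t1 = time(13, 20, 13, 40)

print("Hour:", t1.hour)                # Hour: 13
print("Minute:", t1.minute)            # Minute: 20
print("Second:", t1.second)            # Second: 13
print("Microsecond:", t1.microsecond)  # Microsecond: 40
```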
This is just the tip of the iceberg. There is so much more we can do with DateTime features in Python and that’s what we’ll look at in the next section.
DateTime in Python
So far, we have seen how to create a date and a time object using the DateTime module. But the beauty of the DateTime module is that it lets you dovetail both the properties into a single object, DateTime!
datetime is both the name of the module and a class within it, just like date and time are classes of the module. Its arguments combine the date and time attributes, starting from the year and ending in microseconds.
So, let’s see how you can create a DateTime object:
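A minimal sketch, again with arbitrary example values:

```python
from datetime import datetime

# year, month, day, hour, minute, second, microsecond
d1 = datetime(2020, 4, 23, 11, 20, 30, 40)

print(d1)        # 2020-04-23 11:20:30.000040
print(type(d1))  # <class 'datetime.datetime'>
```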
Or you could even create an object on the local date and time using the now() method:
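A quick sketch of now(). The result depends on when and where the code runs, so no output is shown.

```python
from datetime import datetime

# Current local date and time (no timezone attached by default)
current = datetime.now()

print(current)
print("Hour:", current.hour, "Minute:", current.minute)
```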
You can go on and extract whichever value you want to from the DateTime object using the same attributes we used with the date and time objects individually.
Next, let’s look at some of the methods in the DateTime class.
Updating old Dates
First, we’ll see how to separate date and time from the DateTime object using the date() and time() methods. But you could also replace a value in the DateTime objects without having to change the entire date using the replace() method:
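The sketch below shows all three methods on an example datetime (the values are arbitrary). Note that replace() returns a new object and leaves the original untouched.

```python
from datetime import datetime

d1 = datetime(2020, 4, 23, 11, 20, 30, 40)

# Split the datetime into its date and time parts
print(d1.date())  # 2020-04-23
print(d1.time())  # 11:20:30.000040

# Change only the day; every other field is carried over
d2 = d1.replace(day=25)
print(d2)         # 2020-04-25 11:20:30.000040
```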
Weekday from DateTime
One really cool thing that you can do with a datetime object is extract the day of the week! This is especially helpful in feature engineering because the value of the target variable can depend on the day of the week. For example, sales of a product are generally higher on a weekend, while traffic on StackOverflow could be higher on a weekday when people are working.
The weekday() method returns an integer value for the day of the week, where Monday is 0 and Sunday is 6. But if you wanted it to return the weekday value between 1 and 7, like in a real-world scenario, you should use isoweekday():
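A small sketch comparing the two methods on an example date, 23 April 2020, which was a Thursday:

```python
from datetime import date

d1 = date(2020, 4, 23)  # a Thursday

monday_is_zero = d1.weekday()    # Monday=0 ... Sunday=6
monday_is_one = d1.isoweekday()  # Monday=1 ... Sunday=7

print(monday_is_zero)  # 3
print(monday_is_one)   # 4
```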
What Week is it?
Alright, you know the day of the week, but do you know what week of the year it is? This is another very important feature that you can generate from the given date in a dataset.
Sometimes the value of the target variable might be higher during certain times of the year. For example, the sales of products on e-commerce websites are generally higher during vacations.
You can get the week of the year by indexing the value returned by the isocalendar() method, which holds the ISO year, the ISO week number, and the ISO weekday:
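A minimal sketch on an example date. Index 1 of the result is the ISO week number.

```python
from datetime import date

d1 = date(2020, 4, 23)

iso_year, iso_week, iso_weekday = d1.isocalendar()
week_number = d1.isocalendar()[1]  # same value as iso_week

print(week_number)  # 17
```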
Leap Year or Not? Use Calendar!
Want to check whether it is a leap year or not? You will need to use the isleap() function from the calendar module and pass the year as an argument:
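A short sketch with a few example years, including the century cases that trip people up:

```python
import calendar

print(calendar.isleap(2020))  # True
print(calendar.isleap(2021))  # False
print(calendar.isleap(1900))  # False (divisible by 100 but not by 400)
print(calendar.isleap(2000))  # True  (divisible by 400)

leap_2020 = calendar.isleap(2020)
```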
Congratulations – you are living in a leap year! What did you do with the extra day? Oh, you missed it? Don’t worry! Just take a day this month and do the stuff that you love! But where are you going? You got your calendar right here!
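One way to print a month’s calendar is calendar.month(), sketched here for April 2020 as an example:

```python
import calendar

# Returns the month as a multi-line string, ready to print
month_text = calendar.month(2020, 4)
print(month_text)
```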
Not free this month? You can have a look at the entire calendar for the year:
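A sketch using calendar.calendar(), which lays out all twelve months of the given year as one string (2020 is just an example):

```python
import calendar

# Returns the whole year as a multi-line string
year_text = calendar.calendar(2020)
print(year_text)
```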
Pretty cool, right? Plan your year wisely and take out some time to do the things you love!
The Different Datetime Formats
The datetime module lets you convert a datetime object between a few formats.
First up is the ISO format. If you wanted to create a DateTime object from the string form of the date in ISO format, use the fromisoformat() method. And if you intended to do the reverse, use the isoformat() method:
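A minimal round-trip sketch, assuming Python 3.7 or later (where fromisoformat() is available). The date string is an arbitrary example.

```python
from datetime import datetime

# String in ISO 8601 format -> datetime object
d1 = datetime.fromisoformat("2020-04-23T11:20:30")
print(d1)          # 2020-04-23 11:20:30

# datetime object -> string in ISO 8601 format
iso_string = d1.isoformat()
print(iso_string)  # 2020-04-23T11:20:30
```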
If you wanted a human-readable string instead, you could use the ctime() method, which returns the date and time as a string. And if you wanted to extract just the date from that, well, you would have to use slicing:
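A short sketch on an example datetime. The slice boundary of 10 characters covers the weekday, month, and day at the start of the ctime() string.

```python
from datetime import datetime

d1 = datetime(2020, 4, 23, 11, 20, 30)

text = d1.ctime()
print(text)       # Thu Apr 23 11:20:30 2020

# Keep only the leading "weekday month day" part
date_part = text[:10]
print(date_part)  # Thu Apr 23
```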
And if none of these functions strike your fancy, you could pass the datetime object to Python’s string format() method, which lets you define your own format:
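A minimal sketch of custom formatting with str.format(). The layout chosen here (day/month/year, then hours and minutes) is only an example.

```python
from datetime import datetime

d1 = datetime(2020, 4, 23, 11, 20, 30)

# The part after the colon is passed to the datetime's own formatter
formatted = "{:%d/%m/%Y %H:%M}".format(d1)
print(formatted)  # 23/04/2020 11:20
```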
Wait, what are these arguments I passed to the function? They are called format codes, and we will look at them in detail in the next section.
Advanced DateTime Formatting with Strptime & Strftime
These functions are very important as they let you define the format of the DateTime object explicitly. This can give you a lot of flexibility with handling DateTime features.
strptime() creates a datetime object from a string representing a date and time. It takes two arguments: the date string and the format in which that string is written. Have a look below:
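A minimal sketch of parsing, using an arbitrary example string and a format that matches it piece by piece:

```python
from datetime import datetime

date_string = "23/04/2020 11:20:30"

# %d day, %m month, %Y 4-digit year, %H hour (24h), %M minute, %S second
parsed = datetime.strptime(date_string, "%d/%m/%Y %H:%M:%S")
print(parsed)  # 2020-04-23 11:20:30
```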
You define the format using the formatting codes as I did above. There are a number of formatting codes and you can have a look at them in the documentation.
The strftime() method, on the other hand, can be used to convert the DateTime object into a string representing date and time:
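A sketch of the reverse direction, formatting an example datetime with the same kind of format codes:

```python
from datetime import datetime

d1 = datetime(2020, 4, 23, 11, 20, 30)

as_text = d1.strftime("%d/%m/%Y %H:%M:%S")
print(as_text)  # 23/04/2020 11:20:30
```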
But you can also extract some important information from the DateTime object like weekday name, month name, week number, etc. which can turn out to be very useful in terms of features as we saw in previous sections.
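As a sketch, strftime() can produce those pieces with the %A, %B, and %W codes. The names shown assume the default English locale, and %W counts weeks starting on Monday, so it can differ from the ISO week number.

```python
from datetime import datetime

d1 = datetime(2020, 4, 23, 11, 20, 30)

weekday_name = d1.strftime("%A")  # full weekday name
month_name = d1.strftime("%B")    # full month name
week_number = d1.strftime("%W")   # week of year, Monday as first day

print(weekday_name)  # Thursday
print(month_name)    # April
print(week_number)   # 16
```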
So far, we have seen how to create a DateTime object and how to format it. But sometimes, you might have to find the duration between two dates, which can be another very useful feature that you can derive from a dataset. This duration is, however, returned as a timedelta object.
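Here is a minimal sketch of that subtraction. The two dates are arbitrary examples.

```python
from datetime import datetime

start = datetime(2020, 4, 23, 11, 20, 30)
end = datetime(2020, 5, 1, 13, 30, 0)

# Subtracting two datetime objects gives a timedelta
duration = end - start

print(duration)        # 8 days, 2:09:30
print(type(duration))  # <class 'datetime.timedelta'>
```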
As you can see, the duration is returned as the number of days for the date and seconds for the time between the dates. So you can actually retrieve these values for your features:
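A short sketch of reading both parts from an example duration:

```python
from datetime import datetime

duration = datetime(2020, 5, 1, 13, 30, 0) - datetime(2020, 4, 23, 11, 20, 30)

# days holds the whole days; seconds holds the remainder within the last day
print(duration.days)     # 8
print(duration.seconds)  # 7770
```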
But what if you actually wanted the duration in hours or minutes? Well, there is a simple solution for that.
timedelta is also a class in the DateTime module. So, you could use it to convert your duration into hours and minutes as I’ve done below:
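One way to do this, sketched below, is to divide the duration by a timedelta of one hour or one minute. The dates are arbitrary examples.

```python
from datetime import datetime, timedelta

duration = datetime(2020, 5, 1, 13, 30, 0) - datetime(2020, 4, 23, 11, 20, 30)

# Dividing one timedelta by another gives a plain float
hours = duration / timedelta(hours=1)
minutes = duration / timedelta(minutes=1)

print(round(hours, 2))  # 194.16
print(minutes)          # 11649.5
```

Calling duration.total_seconds() and dividing by 3600 or 60 gives the same numbers.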
Now, what if you wanted to get the date 5 days from today? Do you simply add 5 to the present date?
Not quite. So how do you go about it then? You use timedelta of course!
timedelta makes it possible to add a duration to, or subtract one from, a DateTime object.
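A minimal sketch of shifting a date by five days in either direction. A fixed example date is used instead of today so the result is reproducible.

```python
from datetime import datetime, timedelta

d1 = datetime(2020, 4, 23)

five_days_later = d1 + timedelta(days=5)
five_days_earlier = d1 - timedelta(days=5)

print(five_days_later)    # 2020-04-28 00:00:00
print(five_days_earlier)  # 2020-04-18 00:00:00
```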
DateTime in Pandas
We already know that Pandas is a great library for doing data analysis tasks. And so it goes without saying that Pandas also supports Python DateTime objects. It has some great methods for handling dates and times, such as to_datetime() and to_timedelta().
DateTime and Timedelta objects in Pandas
The to_datetime() function converts the date and time in string format to a DateTime object:
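A minimal sketch with an arbitrary example string:

```python
import pandas as pd

ts = pd.to_datetime("2020-04-23 11:20:30")

print(ts)        # 2020-04-23 11:20:30
print(type(ts))  # a pandas Timestamp
```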
You might have noticed something strange here. The type of the object returned by to_datetime() is not DateTime but Timestamp. Well, don’t worry, it is just the Pandas equivalent of Python’s DateTime.
We already know that timedelta gives differences in times. The Pandas to_timedelta() method does just this:
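Here, the unit determines the unit of the argument, whether that’s weeks, days, hours, minutes, seconds, and so on. Calendar units such as months and years are not supported because their length varies.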
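A short sketch of to_timedelta() with an explicit unit, added to an example start date. The numbers are arbitrary.

```python
import pandas as pd

start = pd.to_datetime("2020-04-23")

# 49 days expressed as a pandas Timedelta
offset = pd.to_timedelta(49, unit="D")
later = start + offset

print(offset)  # 49 days 00:00:00
print(later)   # 2020-06-11 00:00:00
```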
Date Range in Pandas
To make the creation of date sequences a convenient task, Pandas provides the date_range() function. It accepts a start date, an end date, and an optional frequency code:
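A minimal sketch generating one date per day between two example dates (both endpoints are included):

```python
import pandas as pd

# freq="D" means calendar-day frequency
dates = pd.date_range(start="2020-04-24", end="2020-05-24", freq="D")

print(len(dates))  # 31
print(dates[0], dates[-1])
```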
Instead of defining the end date, you could use the periods argument to define the number of dates you want to generate:
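A sketch of the same idea with periods in place of an end date. Ten daily dates is an arbitrary choice.

```python
import pandas as pd

start_dates = pd.date_range(start="2020-04-24", periods=10, freq="D")

print(len(start_dates))  # 10
print(start_dates[-1])   # 2020-05-03 00:00:00
```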
Making DateTime Features in Pandas
Let’s also create a series of end dates and make a dummy dataset from which we can derive some new features and bring our learning about DateTime to fruition.
Perfect! So we have a dataset containing start date, end date, and a target variable:
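The original dataset is not reproduced here, so the sketch below builds a comparable dummy one. The column names (start_date, end_date, target) and the random target values are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Ten daily start dates and ten end dates one month later (example values)
start_dates = pd.date_range(start="2020-04-24", periods=10, freq="D")
end_dates = pd.date_range(start="2020-05-24", periods=10, freq="D")

# Hypothetical target variable: random integers with a fixed seed
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "start_date": start_dates,
    "end_date": end_dates,
    "target": rng.integers(low=30, high=100, size=10),
})

print(df.head())
```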
We can create multiple new features from the date column, like the day, month, year, hour, minute, etc. using the dt accessor as shown below:
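A sketch of deriving such features on a small dummy DataFrame. The column names are assumptions for illustration, and the duration column is simply the difference between the two date columns.

```python
import pandas as pd

df = pd.DataFrame({
    "start_date": pd.date_range(start="2020-04-24", periods=10, freq="D"),
    "end_date": pd.date_range(start="2020-05-24", periods=10, freq="D"),
})

# Calendar components via the .dt accessor
df["day"] = df["start_date"].dt.day
df["month"] = df["start_date"].dt.month
df["year"] = df["start_date"].dt.year
df["weekday"] = df["start_date"].dt.dayofweek  # Monday=0 ... Sunday=6

# Subtracting two datetime columns gives a timedelta column
df["duration"] = df["end_date"] - df["start_date"]

print(df.head())
```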
Our duration feature is great, but what if we would like to have the duration in minutes or seconds? Remember how in the timedelta section we converted the duration to hours and minutes? We could do the same here!
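One way to do this, sketched on a small dummy DataFrame with assumed column names, is to divide the timedelta column by a Timedelta of one minute or one second:

```python
import pandas as pd

df = pd.DataFrame({
    "start_date": pd.date_range(start="2020-04-24", periods=10, freq="D"),
    "end_date": pd.date_range(start="2020-05-24", periods=10, freq="D"),
})
df["duration"] = df["end_date"] - df["start_date"]

# Dividing by a Timedelta turns the durations into plain floats
df["duration_minutes"] = df["duration"] / pd.Timedelta(minutes=1)
df["duration_seconds"] = df["duration"] / pd.Timedelta(seconds=1)

print(df[["duration", "duration_minutes", "duration_seconds"]].head())
```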
Great! Can you see how many new features we created from just the dates?
Now, let’s make the start date the index of the DataFrame. This will help us easily analyze our dataset because we can use slicing to find data representing our desired dates:
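A sketch of date-based slicing on a small dummy DataFrame (the column names and target values are assumptions). With a sorted datetime index, .loc accepts date strings, and both ends of a slice are included.

```python
import pandas as pd

df = pd.DataFrame({
    "start_date": pd.date_range(start="2020-04-24", periods=10, freq="D"),
    "target": range(10),
})

# Use the start date as the index
df = df.set_index("start_date")

# All rows from 26 April to 28 April, inclusive
subset = df.loc["2020-04-26":"2020-04-28"]
print(subset)

# Partial string indexing: every row in April 2020
april = df.loc["2020-04"]
print(len(april))  # 7
```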
Awesome! This is super useful when you want to do visualizations or any data analysis.
I hope you found this article on how to manipulate date and time features with Python and Pandas useful. But nothing is complete without practice. Working with time series datasets is a wonderful way to practice what we have learned in this article.