Anyone who has worked with data knows that time series are arguably the most abundant type of data we deal with on a routine basis. Data indexed by date, time or both is classified as a time series dataset. It is often helpful to render a time series as a month vs. hour heatmap. Such visualizations make it far easier to digest data that would otherwise be hard to take in as raw numbers, given how visual we are as readers. These renderings usually depict the hour horizontally and the month vertically, and use color to communicate the intensity of the value that each cell represents. Here, we are going to transform a randomly generated time series dataset into an interactive heatmap using some of Python's most powerful libraries.
Beyond core Python, we will be availing ourselves of Plotly, Pandas and Streamlit, some of the most formidable workhorses of the data science community. I gather that many readers will be more than acquainted with Pandas; Plotly and Streamlit, on the other hand, may not ring as many bells. The following offers a quick recap:
Pandas is without a doubt one of the most effective Python libraries for processing data. It lets you perform a whole slew of transformations on a dataset with just a few short commands. In our application we will use Pandas to read and write our data from and to CSV files, and to regroup our timestamps into months and hours of the day.
Streamlit is the unsung hero of Python libraries. It is a pure-Python web framework that lets you develop and deploy your applications as web apps without writing a single line of HTML or CSS, kid you not. I personally started using Streamlit in the summer of 2020, and I do not recall writing a single script without it since. Streamlit lets you instantly render your applications with an elegant and highly interactive user interface. In this application we will use Streamlit to display our heatmap and data frame in a local browser.
First things first: go ahead and install Plotly, Pandas and Streamlit in an Anaconda environment of your choosing.
Each package can be installed by typing the corresponding command into the Anaconda prompt:
pip install plotly
pip install pandas
pip install streamlit
We will be using this randomly generated dataset, which has columns for the date, hour and value, as shown below.
The date is formatted as follows:
The hour, on the other hand, is formatted as:
You can format your date and/or hour in any other way that suits your needs, but you will have to declare that format in your script, as explained in the following section.
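If you would like something comparable to experiment with, the following is a minimal sketch that generates a random hourly dataset. It is not the author's file: the column names `date`, `hour` and `value`, the `%Y-%m-%d` date format and the HHMM-style hours (0.0, 100.0, ..., 2300.0) are all assumptions, and `make_random_dataset` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def make_random_dataset(start="2021-01-01", end="2021-12-31", seed=0):
    """Build a hypothetical hourly dataset with 'date', 'hour' and 'value' columns."""
    rng = np.random.default_rng(seed)
    days = pd.date_range(start, end, freq="D")
    # One row per day per hour: each date string repeated 24 times
    dates = np.repeat(days.strftime("%Y-%m-%d").to_numpy(), 24)
    # Hours stored as HHMM floats (0.0, 100.0, ..., 2300.0), cycling for every day
    hours = np.tile(np.arange(24) * 100.0, len(days))
    # Uniform random values between 0 and 100
    values = rng.uniform(0, 100, size=len(dates))
    return pd.DataFrame({"date": dates, "hour": hours, "value": values})
```

Calling `make_random_dataset().to_csv("random_timeseries.csv", index=False)` writes it to disk (the file name is hypothetical).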
Before we go any further, we need to preprocess our dataset so that the dates and hours are in a format that can be parsed.
First, we need to remove any trailing decimal places from the values in our hour column and pad values that have too few digits with leading zeroes, e.g. 12:00AM, which is recorded as 0. Next, we append each date to its hour and parse the combined string with Python's datetime.strptime function. Finally, we convert the dates into months and the hours into the 12-hour format using the strftime method:
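The preprocessing steps above can be sketched roughly as follows. This is not the author's code: `DATE_FORMAT`, `HOUR_FORMAT` and the HHMM-style hours are assumptions, so adjust them to match your own data.

```python
from datetime import datetime

import pandas as pd

DATE_FORMAT = "%Y-%m-%d"  # hypothetical: change to match your date column
HOUR_FORMAT = "%H%M"      # hypothetical: hours stored as HHMM, e.g. 0 or 1300.0


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    # Drop trailing decimals (1300.0 -> "1300") and left-pad with zeros (0 -> "0000")
    hours = df["hour"].astype(float).astype(int).astype(str).str.zfill(4)
    # Append each hour to its date and parse the combined string
    stamps = [
        datetime.strptime(f"{d} {h}", f"{DATE_FORMAT} {HOUR_FORMAT}")
        for d, h in zip(df["date"], hours)
    ]
    # Full month name (e.g. "January") and 12-hour clock label (e.g. "01 PM")
    df["month"] = [s.strftime("%B") for s in stamps]
    df["hour"] = [s.strftime("%I %p") for s in stamps]
    return df
```

`%B` and `%p` follow the process locale; under Python's default C locale they produce English month names and AM/PM.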
To use other datetime formats, please refer to this article.
Once our data has been preprocessed, we can use Pandas' powerful groupby method to regroup our dataset into time-averaged values for each month and hour, as shown below:
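A minimal sketch of that regrouping, assuming the preprocessed frame has `month`, `hour` and `value` columns (hypothetical names carried over from the sketches above):

```python
import pandas as pd


def month_hour_means(df: pd.DataFrame) -> pd.DataFrame:
    """Average 'value' over every (month, hour) pair."""
    # groupby collapses all rows sharing a month and hour; reset_index turns the
    # resulting MultiIndex back into ordinary 'month' and 'hour' columns
    return df.groupby(["month", "hour"])["value"].mean().reset_index()
```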
Note that other aggregations, e.g. sums, maximums or minimums, can be used instead of averaging by changing:
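One way to make the aggregation switchable, sketched here as an assumption rather than the author's exact change, is to pass the reduction's name to `agg`:

```python
import pandas as pd


def month_hour_summary(df: pd.DataFrame, how: str = "mean") -> pd.DataFrame:
    """Aggregate 'value' per (month, hour) with a named reduction such as
    "mean", "sum", "max" or "min"."""
    return df.groupby(["month", "hour"])["value"].agg(how).reset_index()
```

Note that groupby sorts the hour labels as plain strings ("01 PM" before "12 AM"), which is one reason the next step reorders the results explicitly.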
Now that we have regrouped our data into months and hours, we need to transform our Pandas data frame first into a dictionary and then into an array that Plotly can use to create our heatmap.
Declare a dictionary of months; note that the snippet below is truncated for brevity, so make sure to include all twelve months of the year:
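Rather than typing the months out by hand, one option (an assumption, not necessarily the author's approach) is to build the dictionary from the standard library's `calendar.month_name`, alongside a matching list of 12-hour labels. Both `MONTHS` and `HOURS` are hypothetical names.

```python
import calendar
from datetime import time

# All twelve months in calendar order, each mapped to an empty list that will
# later hold that month's 24 hourly values (hypothetical structure)
MONTHS = [calendar.month_name[m] for m in range(1, 13)]
month_values = {month: [] for month in MONTHS}

# 12-hour labels in chronological order: "12 AM", "01 AM", ..., "11 PM"
HOURS = [time(h).strftime("%I %p") for h in range(24)]
```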
Next, we insert the values from our Pandas data frame into the dictionary and use it to build an array of arrays, in which each row holds one month's averaged values for each hour of the day:
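A sketch of that step, assuming the regrouped frame has `month`, `hour` and `value` columns and uses the 12-hour labels from the preprocessing sketch. `to_matrix` is a hypothetical helper; cells with no data are left as `None`, which Plotly draws as gaps.

```python
import calendar
from datetime import time

import pandas as pd

MONTHS = [calendar.month_name[m] for m in range(1, 13)]
HOURS = [time(h).strftime("%I %p") for h in range(24)]


def to_matrix(grouped: pd.DataFrame) -> list:
    """Turn a (month, hour, value) frame into a 12 x 24 list of lists."""
    # One row of 24 empty slots per month
    month_values = {month: [None] * len(HOURS) for month in MONTHS}
    for month, hour, value in grouped[["month", "hour", "value"]].itertuples(index=False):
        month_values[month][HOURS.index(hour)] = value
    # Rows follow calendar order, columns follow chronological hour order
    return [month_values[month] for month in MONTHS]
```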
Finally, we will render our Plotly heatmap using the array previously created:
You may find it convenient to download your regrouped month vs. hour data frame as a CSV file. If so, use the following function to create a downloadable file in your Streamlit app.
The function's arguments, name and df, correspond to the name of the downloadable file and the data frame to be converted to CSV, respectively.
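A common way to implement such a helper, sketched here under that assumption, is to base64-encode the CSV text into a data URI inside an HTML link. `csv_download_link` is a hypothetical name that takes the same two arguments.

```python
import base64

import pandas as pd


def csv_download_link(name: str, df: pd.DataFrame) -> str:
    """Return an HTML anchor that downloads df as '<name>.csv'."""
    csv = df.to_csv(index=False)
    # Embed the CSV text directly in the link as a base64 data URI
    b64 = base64.b64encode(csv.encode()).decode()
    return f'<a href="data:file/csv;base64,{b64}" download="{name}.csv">Download {name}.csv</a>'
```

In Streamlit the link is rendered with `st.markdown(csv_download_link("month_hour_averages", grouped), unsafe_allow_html=True)`. More recent Streamlit releases also offer `st.download_button`, which achieves the same without raw HTML.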
Finally, we can combine everything together in the form of a Streamlit application that will render the heatmap, data frame and a link to download our regrouped data as a CSV file.
You can run your final app by typing the following commands into the Anaconda prompt. First, change your working directory to the folder where your source code is saved:
Then type the following to run your app:
streamlit run file_name.py
And there you go: an interactive rendering that lets you visualize your time series dataset as a month vs. hour heatmap.