Introduction

Anyone who has ever worked with data knows that timeseries data is arguably the most abundant type of data that we deal with on a routine basis. Any dataset that is indexed by date, time, or both is classified as a timeseries dataset. Often, it is helpful to render a timeseries as a month vs. hour heatmap. Such visualizations make it far easier to digest data that is otherwise presented in a form that we, as highly visual beings, struggle to take in. These renderings usually depict the hour horizontally and the month vertically, and use color to communicate the intensity of the value each cell represents. Here, we are going to transform a randomly generated timeseries dataset into an interactive heatmap using some of Python’s most powerful libraries.

Programming Stack

Python aside, we will be availing ourselves of Plotly, Pandas and Streamlit, some of the most formidable workhorses of the data science community. I gather that many will be more than acquainted with Pandas; Plotly and Streamlit, on the other hand, may not ring as many bells. The following offers a quick recap of each:

1. Pandas

Pandas is without a shred of doubt one of the most effective libraries in Python when it comes to processing data. Pandas enables you to perform a whole slew of transformations on your dataset, all by invoking a couple of short commands. In our application we will use Pandas to read our data from and write it to CSV files, and to regroup our timestamps into months and hours of the day.

2. Plotly

Plotly is a robust and agile data visualization library that is specifically tailored towards machine learning and data science. While it is based on plotly.js, which is itself a native JavaScript library, it has been expanded to support Python, R and other popular languages. With a few lines of JSON-like declarations in your Python script, you can easily create interactive visualizations including but not limited to line charts, histograms, radar plots, heatmaps and more. In this instance, we will be using Plotly to render our month vs. hour heatmap.

3. Streamlit

Streamlit is the unsung hero of Python libraries. It is a pure Python web framework that allows you to develop and deploy your applications as web apps without writing a single line of HTML or CSS, I kid you not. I started using Streamlit in the summer of 2020, and I do not recall a script I have written since that did NOT use it. Streamlit allows you to instantaneously render your applications with an elegant and highly interactive user interface. For this application we will be using Streamlit to display our heatmap and data frame in a local browser.

Installing Packages

First things first, go ahead and install the following packages in an Anaconda environment of your choosing.

https://towardsdatascience.com/media/3605c3a091738389bac4cd1e1763c89c

Each package can be installed by typing its corresponding command into Anaconda Prompt, for example:

pip install plotly

Dataset

We will be using this randomly generated dataset, which has columns for the date, hour and value as shown below.

Image by author.

The date is formatted as follows:

YYYYMMDD

While the hour is formatted as:

HHMM

You can format your date and/or hour using any other formatting that suits your needs, but you will have to make sure that you declare it in your script as explained in the following section.
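If you do not have the dataset at hand, a similar one can be generated. The following is only a minimal sketch: the function name make_random_dataset, the column names Date, Hour and Value, and the choice of one reading per hour are all assumptions made for illustration, not the way the original dataset was produced.

```python
import numpy as np
import pandas as pd


def make_random_dataset(start="2021-01-01", end="2021-12-31", seed=0):
    """Return hourly random values with Date (YYYYMMDD) and Hour (HHMM) columns."""
    # One timestamp per hour, from midnight of `start` to 23:00 of `end`
    stamps = pd.date_range(start=start, end=f"{end} 23:00", freq=pd.Timedelta(hours=1))
    rng = np.random.default_rng(seed)
    return pd.DataFrame(
        {
            "Date": stamps.strftime("%Y%m%d").astype(int),  # e.g. 20210101
            "Hour": stamps.hour * 100,  # e.g. 0, 100, ..., 2300
            "Value": rng.random(len(stamps)).round(3),  # random values in [0, 1)
        }
    )


df = make_random_dataset()
print(df.head())
```

Note that storing the hour as a number is what turns midnight into 0 rather than 0000, which is why the hours need padding during preprocessing.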

Data Preprocessing

Before we can go any further, we need to preprocess our dataset to ensure that the dates and hours are in a format that may be processed further.

First, we need to remove any trailing decimal places from the values in our hour column and pad them with leading zeroes where the time has fewer than four digits, e.g. 12:00 AM recorded as 0. Next, we need to append our hours to our dates and parse the result with Python’s datetime.strptime function. Finally, we can transform the dates into months and the hours into the 12-hour format by using the strftime function:

https://towardsdatascience.com/media/b7db15e752ef95e2440b786ecb4e3239

To use other datetime formats, please refer to this article.
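The three preprocessing steps can be sketched roughly as follows. This assumes columns named Date, Hour and Value, dates stored as whole numbers or strings, and the output formats %b (Jan, Feb, ...) and %I %p (12 AM, 01 AM, ...); the function name preprocess and its parameters are hypothetical, and your own formats may differ.

```python
from datetime import datetime

import pandas as pd


def preprocess(df, date_format="%Y%m%d", hour_format="%H%M"):
    """Add a Month column and convert Hour to 12-hour labels."""
    out = df.copy()
    # Drop trailing decimals (100.0 -> "100") and pad to four digits ("0100")
    hours = out["Hour"].astype(float).astype(int).astype(str).str.zfill(4)
    dates = out["Date"].astype(str)
    # Append each hour to its date and parse the combined string
    stamps = [
        datetime.strptime(d + h, date_format + hour_format)
        for d, h in zip(dates, hours)
    ]
    out["Month"] = [s.strftime("%b") for s in stamps]  # abbreviated month name
    out["Hour"] = [s.strftime("%I %p") for s in stamps]  # 12-hour clock label
    return out


raw = pd.DataFrame({"Date": [20210101, 20210615], "Hour": [0.0, 1330.0], "Value": [1.0, 2.0]})
print(preprocess(raw))
```

If your dates or hours use another layout, only date_format and hour_format should need to change.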

Once our data has been preprocessed, we can use the powerful groupby method in Pandas to regroup our dataset into time-averaged values for each month and hour as shown below:

https://towardsdatascience.com/media/44b7c8fe18cc6dac353b17fd5355335e

Please note that other aggregations may be used instead of averaging, e.g. summation, maximum or minimum, by changing:

df.groupby(['Month','Hour'],sort=False,as_index=False).mean()

to

df.groupby(['Month','Hour'],sort=False,as_index=False).sum()
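To make the effect of swapping the aggregation concrete, here is a small sketch on a toy data frame. The helper name regroup and its how parameter are hypothetical, and the Value column is aggregated explicitly, which is an assumption made here so that any non-numeric columns left in the frame do not get in the way.

```python
import pandas as pd


def regroup(df, how="mean"):
    """Aggregate Value for every (Month, Hour) pair, keeping first-seen order."""
    return df.groupby(["Month", "Hour"], sort=False, as_index=False).agg({"Value": how})


toy = pd.DataFrame(
    {
        "Month": ["Jan", "Jan", "Feb"],
        "Hour": ["12 AM", "12 AM", "01 AM"],
        "Value": [1.0, 3.0, 5.0],
    }
)
print(regroup(toy))  # averages the two January readings at 12 AM
print(regroup(toy, "sum"))  # sums them instead
```

Passing sort=False keeps the months and hours in the order they first appear in the data, rather than sorting them alphabetically.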

Heatmap Function

Now that we have regrouped our data into months and hours, we first need to transform our Pandas data frame into a dictionary and then an array that can be input into Plotly to create our heatmap.

Declare a dictionary, and please make sure to add all of the months of the year rather than the truncated list shown below:

https://towardsdatascience.com/media/743fe96ba03aa02422a7efbbbde70d16

Subsequently, we will insert the values from our Pandas data frame into the dictionary, and will use it to create an array of arrays corresponding to the averaged values of each month and hour respectively, as shown below:

https://towardsdatascience.com/media/6bc4356485a3738f1692a9076c55eace
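One way to picture the dictionary and the resulting array of arrays is sketched below. It assumes the regrouped data frame has Month, Hour and Value columns with labels such as Jan and 12 AM; the names MONTHS, HOURS and to_matrix are hypothetical, and the original function may well be organized differently.

```python
from datetime import datetime

import pandas as pd

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
# "12 AM", "01 AM", ..., "11 PM" in chronological order
HOURS = [datetime(2000, 1, 1, h).strftime("%I %p") for h in range(24)]


def to_matrix(grouped):
    """Return 12 rows (months) of 24 values (hours); missing cells stay None."""
    # One dictionary entry per month, each holding one slot per hour of the day
    table = {month: [None] * len(HOURS) for month in MONTHS}
    for row in grouped.itertuples(index=False):
        table[row.Month][HOURS.index(row.Hour)] = row.Value
    # Array of arrays, ordered January to December
    return [table[month] for month in MONTHS]


grouped = pd.DataFrame({"Month": ["Jan", "Dec"], "Hour": ["12 AM", "11 PM"], "Value": [1.5, 2.5]})
z = to_matrix(grouped)
print(len(z), len(z[0]))
```

Each inner list is one row of the heatmap, so months end up on the vertical axis and hours on the horizontal axis.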

Finally, we will render our Plotly heatmap using the array previously created:

https://towardsdatascience.com/media/ccc5f8a7ba8e1460dcc84446eb53e8e1

Download CSV

You may find it convenient to download your regrouped month vs. hour data frame as a CSV file. If so, please use the following function to create a downloadable file in your Streamlit app.

https://towardsdatascience.com/media/5f4503a73aeaef5e28cdeb0be919b7b7

This function’s arguments, name and df, correspond to the name of the downloadable file and the data frame that needs to be converted to a CSV file, respectively.
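For readers who want a feel for how such a function can work, one common pattern is to encode the CSV text as a base64 data URI inside an HTML link. The sketch below follows that pattern and is not necessarily the function embedded above; the name download_link and the link text are assumptions. In a Streamlit app the returned string can be rendered with st.markdown(html, unsafe_allow_html=True), and recent Streamlit versions also offer a built-in st.download_button.

```python
import base64

import pandas as pd


def download_link(name, df):
    """Return an HTML link that downloads df as <name>.csv."""
    csv_text = df.to_csv(index=False)
    # Data URIs need the payload as base64 text
    b64 = base64.b64encode(csv_text.encode()).decode()
    return (
        f'<a href="data:text/csv;base64,{b64}" download="{name}.csv">'
        f"Download {name}.csv</a>"
    )


table = pd.DataFrame({"Month": ["Jan"], "Hour": ["12 AM"], "Value": [1.5]})
print(download_link("heatmap", table))
```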

Streamlit App

Finally, we can combine everything together in the form of a Streamlit application that will render the heatmap, data frame and a link to download our regrouped data as a CSV file.

https://towardsdatascience.com/media/6f1fe9b9017baae2849d5dc2f6d208d3

You can run your final app by typing the following commands in Anaconda Prompt. First, change your working directory to where your source code is saved:

cd C:/Users/...

Then type the following to run your app:

streamlit run file_name.py

Results

And there you go, an interactive rendering that enables you to visualize your timeseries dataset as a month vs. hour heatmap.

Author

Hybrid of a data scientist and an engineer. Logistician. Candid. Realpolitik. Unlearning dogma one belief at a time. www.linkedin.com/in/mkhorasani
