There was a time, not so long ago, when creating web applications was the work of child prodigies the likes of Mark Zuckerberg and Elon Musk.

Alternatively, you could enroll in a fancy college, spend the best four years of your life (and your parents’ retirement savings) learning programming, and then end up making subpar ’90s-style web apps.

Well, we’ve come a long way since then. With the proliferation of open source tools and cloud infrastructure, developing and deploying phenomenal applications has been largely democratized. Honestly, being a developer has probably never been so straightforward: all you need is the right stack of tools and you’re good to go for most purposes.

I am going to introduce you to the three main tools that I have used extensively myself to develop frontend user interfaces, provision server-side infrastructure, and deploy it all to a web server for the world at large. In this tutorial, we will be creating a simple job recommender app.

The user will select the country they wish to work in and will then upload their resume. Subsequently, the app will analyze the uploaded file for keywords and search a database of companies to find the most similar matches. Before we proceed, I am going to assume that you are already well versed in Python and some of its packages, such as Pandas and NLTK, and that you have a GitHub account.

1. Streamlit

Streamlit is a brand new web framework that has all but closed the web development gap for Python developers. Previously, one would have to use Flask or Django to deploy an app to the web, which required a sound understanding of HTML and CSS. Thankfully, Streamlit is a pure Python package with an exceptionally shallow learning curve that has reduced development time from weeks to hours, I kid you not.

While it is branded as a framework for machine learning and data science apps, I find that label rather limiting; indeed, many (myself included) have used Streamlit to build dazzling general-purpose apps.

In this section, I am going to show you how to install, build and run a Streamlit app in real time. First things first, fire up Anaconda and install Streamlit in your environment. Alternatively, you may run the following command in the Anaconda prompt.

pip install streamlit

Once you’ve got that out of the way, open the Python IDE of your choice in the environment you just installed Streamlit into. Now let’s create a simple interface where the user can select the country they want to work in and upload their resume. First, we need to import all the packages as follows:

https://towardsdatascience.com/media/afe7eef710d8905dceac30cf8530028e

Other than Streamlit and Pandas, we will need pdfplumber and PyPDF2 to process the uploaded resume. rake_nltk and nltk, along with their corpora of relevant words, are required to parse the key phrases from the resume text, and io is needed to convert the binary resume file into a decoded form that is readable by Python.
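The embedded gist is not reproduced here, but judging from the packages named above, a plausible sketch of the imports is the following:

import io

import nltk
import pandas as pd
import pdfplumber
import PyPDF2
import streamlit as st
from rake_nltk import Rake

# One-time downloads of the nltk data that rake_nltk relies on
nltk.download("stopwords")
nltk.download("punkt")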

Subsequently, we create our user interface by adding a text input for the country, a file uploader for the resume and a multiselect word box to select the acquired key phrases as follows:

https://towardsdatascience.com/media/c2eb10e11095ea97f2a4e00c5038d09c
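As a rough sketch of what that gist contains (the widget labels are my own, and the keyphrases function is defined in the next step):

st.title("Job Recommender")

# Text input for the country the user wants to work in
country = st.text_input("Which country would you like to work in?")

# File uploader for the resume, restricted to PDF files
uploaded_file = st.file_uploader("Upload your resume", type=["pdf"])

if uploaded_file is not None:
    # Decode the binary upload with io and extract its text with pdfplumber
    with pdfplumber.open(io.BytesIO(uploaded_file.read())) as pdf:
        resume_text = " ".join(page.extract_text() or "" for page in pdf.pages)

    # Multiselect box populated with the extracted key phrases
    selected_phrases = st.multiselect(
        "Select key phrases", options=keyphrases(resume_text)
    )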

The function ‘keyphrases’ invoked above analyzes the resume text to discover relevant keywords using the nltk library. The function is shown below:
https://towardsdatascience.com/media/385feb7a0caa5720691362f1ccf9f550
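A minimal version of such a function using rake_nltk (the author’s actual implementation may filter candidates further against the nltk corpora):

def keyphrases(text):
    # RAKE splits the text into candidate phrases using nltk's English
    # stopwords and punctuation, then scores each phrase
    r = Rake()
    r.extract_keywords_from_text(text)
    # Phrases are returned in descending order of relevance score
    return r.get_ranked_phrases()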

Finally, we save the Python script in our environment’s directory and execute the following command in Anaconda prompt to run our Streamlit app locally:

streamlit run file_name.py

You will notice that the web app is served on your localhost port at:

http://localhost:8501

Now you should see the following:

Streamlit page. Image by author.

Go ahead and upload your resume to generate the key phrases as shown below:

Key phrase generation. Image by author.

The beauty of Streamlit is that you can update your code in real time and observe your changes instantaneously. So feel free to make as many modifications as you possibly need on the fly.

2. MongoDB

No web app is complete without a server-side component, and what better way to learn to interact with databases than MongoDB? Admittedly, our dataset here is structured, so a non-relational database architecture like MongoDB’s is overkill; but as with all scalable apps, at some point you will probably end up dealing with tons of unstructured data, so it’s best to get into the right ecosystem at the onset. Not to mention the slew of features MongoDB has to offer, such as full-text indexing and fuzzy matching.

For this application, I have downloaded a public dataset of global companies from kaggle.com. I’ve removed some of its attributes to reduce the total size of the dataset to 100 megabytes; bear in mind that MongoDB offers a free tier cluster with up to 512 megabytes of storage. Next, open a MongoDB Atlas account, then register an organization and a project. Subsequently, set up a free tier cluster called ‘M0 Sandbox’ hosted in AWS’s us-east-1 region. This will ensure minimal latency, given that our Heroku web server will be hosted in the same region.

Building a MongoDB cluster. Image by author.
Selecting MongoDB free tier cluster. Image by author.
Selecting MongoDB cluster region. Image by author.

After you create the cluster, it will take MongoDB a few minutes to provision the servers. Once the provisioning is complete, you need to whitelist the IP address you will use to connect to your cluster. While I do not recommend it for obvious security reasons, you may whitelist all IP addresses for simplicity’s sake. Select ‘Network Access’ from the ‘Security’ menu as shown below:

Configuring MongoDB cluster’s network access. Image by author.

Subsequently, select ‘Allow access from anywhere’ and set the required time frame as shown below:

Whitelisting IP addresses for MongoDB cluster. Image by author.

Next, you need to set up a connection string to connect to your cluster remotely. Select ‘Clusters’ from the ‘Data Storage’ menu and then click on ‘Connect’ in your cluster’s window as shown below:

Setting up MongoDB connection. Image by author.

Enter a username and password that will be used to connect to this cluster as shown below:

Selecting username and password for MongoDB cluster access. Image by author.

Once the user has been created, click on ‘Choose a connection method’ to proceed to the next step, where you should select ‘Connect your application’ as shown below:

Creating connection string for MongoDB cluster. Image by author.

When prompted, select ‘Python’ for the driver and ‘3.6 or later’ for the version. A connection string will be created for you as follows:

MongoDB connection string. Image by author.

Your connection string and full driver code should be:

import pymongo

client = pymongo.MongoClient("mongodb+srv://test:<password>@cluster0.nq0me.mongodb.net/<dbname>?retryWrites=true&w=majority")
db = client.test

Copy this string and replace the ‘<password>’ part with the actual password you created in the previous steps, and ‘<dbname>’ with the name of your database.

Now that we have all the logistics of MongoDB out of the way, let’s load up our dataset! Select the ‘Collections’ tab in your cluster, then click on ‘Add My Own Data’.

Loading data into MongoDB cluster. Image by author.

When prompted, enter a name for the database and collection to continue with loading your data. Please note that a ‘collection’ in MongoDB is analogous to a table in a relational database; likewise, a ‘document’ corresponds to a row. There are multiple ways to load your dataset into your newly created database; the easiest is to use the MongoDB Compass desktop application, which you can download here. Once you’ve downloaded and installed it, connect to your cluster using the connection string you generated earlier, as shown here:

MongoDB Compass desktop application. Image by author.
MongoDB Compass database. Image by author.

After connecting, you should be able to find the database and collection you created in the previous steps. Go ahead and click on ‘Import Data’ to upload your dataset into your collection. Once the data is loaded, you need to create a full-text index so that you can later query your data. A full-text index quite literally indexes every single word in your database; it is an extremely powerful form of indexing, very similar to what search engines utilize. For this, navigate back to your MongoDB cluster in the browser and select the ‘Search Indexes’ tab within your collection as shown below:

Creating a full-text index in MongoDB. Image by author.

Click on ‘Create Search Index’. This will prompt you with the following window, where you can modify the index; however, you can proceed with the default configuration by clicking on ‘Create Index’.

Creating a full-text index in MongoDB. Image by author.

Now that your database has been configured, your dataset loaded and your index created, you can write the query function in Python that extracts records from the database based on the user’s input, as shown below:

https://towardsdatascience.com/media/0c7f5c35cf274940302deb33c7f08e47
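The gist boils down to something like the following sketch; the database and collection names (jobs_db, companies) and the returned ‘name’ field are assumptions, while the pipeline stages follow the description below:

# Establish the connection to the collection (enter your own password)
collection = pymongo.MongoClient(
    "mongodb+srv://test:<password>@cluster0.nq0me.mongodb.net/?retryWrites=true&w=majority"
).jobs_db.companies

def search_db(country, key_phrases):
    pipeline = [
        # Stage 1 ($search): fuzzy full-text match of the resume key
        # phrases against the 'industry' attribute
        {"$search": {"text": {
            "query": key_phrases,
            "path": "industry",
            "fuzzy": {"maxEdits": 2},
        }}},
        # Stage 2 ($project): return the needed fields along with the
        # relevance score, which orders matches in descending order
        {"$project": {
            "_id": 0,
            "name": 1,
            "industry": 1,
            "country": 1,
            "score": {"$meta": "searchScore"},
        }},
        # Stage 3 ($match): keep only companies in the chosen country
        {"$match": {"country": country}},
        # Stage 4 ($limit): cap the result set at 10 documents
        {"$limit": 10},
    ]
    return list(collection.aggregate(pipeline))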

The first line in the above gist establishes the connection to the collection within your database; make sure to enter your password. Alternatively, you may store your password in a configuration file and invoke it as a parameter in your code if you wish to handle passwords more securely.

In order to query your MongoDB collection, you need to use the ‘Aggregation’ feature which is simply a filtering pipeline in MongoDB. The first stage of the pipeline ($search) is the part that utilizes our full-text index; here we use fuzzy matching to match our previously generated resume key phrases with the ‘industry’ attribute in the dataset. The next stage ($project) uses document ranking to rank matched queries in descending order of score. The third stage ($match) filters out documents that do not contain the country specified by the user. The last stage ($limit) simply restricts the number of returned results to 10.

Now that we have sorted out the query function, we need to add a few more lines to our Streamlit script to create a search button that executes the query and displays the results, as shown below:

https://towardsdatascience.com/media/4f3d4f76f1cd81e9bd48cefe0aec6cc5
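Sketching that addition (assuming the search_db function above and the country and selected_phrases variables from the interface code):

# Execute the query on demand and display the matches
if st.button("Search") and country and selected_phrases:
    results = search_db(country, " ".join(selected_phrases))
    if results:
        st.table(pd.DataFrame(results))
    else:
        st.write("No matching companies found.")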

Job recommender web app. Image by author.

3. Heroku

If you’ve gotten this far, then you’ll be glad to know that you’re just a couple of clicks away from finishing; thanks to Heroku, deploying a web app to the cloud really is that simple. It is worth mentioning that Streamlit has developed its own ‘one-click deployment’ option, where you can integrate your GitHub repository with Streamlit and deploy your app free of charge. However, the catch is that your repository needs to be public, and until Streamlit releases its enterprise deployment edition, which caters to private repositories, this will be a deal breaker for many.

Before we proceed, you need to create four files that will be deployed with your source code.

1. requirements.txt

You need a requirements.txt file that includes all the packages Heroku needs to install for your app. This can be generated using the ‘pipreqs’ package. Type the following in Anaconda prompt to install pipreqs:

pip install pipreqs

Then change your directory to a folder that only contains your source code and nothing else.

cd C:/Users/..../folder/

Then type the following:

pipreqs

A requirements.txt file will be generated with the following packages:

pymongo[tls,srv]==3.6.1
lxml==4.5.2
nltk
pdfplumber==0.5.24
pymongo==3.11.0
pandas==1.0.5
rake_nltk==1.0.4
streamlit==0.69.2
PyPDF2==1.26.0

2. setup.sh

The setup.sh file tells Heroku how to configure the Streamlit app. It can be created using a text editor such as Atom and must be saved with a .sh extension.

https://towardsdatascience.com/media/eff20f91e72227cb094c3e02d40fe88c
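A typical setup.sh for Streamlit on Heroku looks like the following (the original gist may differ slightly); it writes a Streamlit config file that binds the server to the port Heroku assigns:

mkdir -p ~/.streamlit/

echo "[server]
headless = true
port = $PORT
enableCORS = false
" > ~/.streamlit/config.toml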

3. Procfile

Similarly, the Procfile is a configuration file that tells Heroku to run our source code on startup. It can also be created with the Atom text editor; please note that it has no file extension.

https://towardsdatascience.com/media/1cc5c0f36a9f7385753daf9ac4ce32d2
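A typical Procfile for this setup is a single line that runs the configuration script and then launches the app (assuming your script is named file_name.py, as in the run command earlier):

web: sh setup.sh && streamlit run file_name.py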

4. nltk.txt

The nltk.txt file tells Heroku which nltk corpora to download. Simply add the following to the file and make sure to save it using UNIX-style ‘LF’ line endings.

wordnet
pros_cons
reuters

Once you have your source code and these four files ready, upload them to a private GitHub repository (so as not to make any passwords public). Then head over to Heroku and open an account if you haven’t already done so. We will be using the free tier dyno, which is more than enough to deploy an experimental app. Proceed to create a new app by selecting the ‘New’ button below:

Creating Heroku app. Image by author.

When prompted, enter the name of your app and select the ‘United States’ region.

Creating Heroku app. Image by author.

Subsequently, select the ‘GitHub’ deployment method and your repository as shown below:

GitHub integration with Heroku. Image by author.

Scroll down to the bottom, select the correct branch of your repository and then click the ‘Deploy Branch’ button to deploy your app. Sit back and relax while Heroku effortlessly builds your app. Once it is done, and assuming there were no errors, you will be given the link to your web app. Given that you are using the free tier, the app will take a while to start up the first time you run it.

Since your Heroku account is integrated with GitHub, you may update/modify your code at any time and redeploy the app as before. With Heroku, you can even set up automated actions so that whenever your repository is updated the app will be automatically redeployed. You can also add data to your MongoDB database at any time and it will be instantaneously reflected in the web app. Hence the name ‘scalable web apps’. Congrats on deploying your web app using Streamlit, MongoDB and Heroku!


Author

Hybrid of a data scientist and an engineer. Logistician. Candid. Realpolitik. Unlearning dogma one belief at a time. www.linkedin.com/in/mkhorasani