Data warehouses have been around for several decades, serving as a central repository of integrated data from disparate sources that meets an enterprise’s reporting and data analysis needs. With the recent advent of newer analytical capabilities driven by machine learning algorithms and a wide array of computational paradigms and data formats, the data warehouse is rapidly modernizing into an analytics platform and ecosystem.
While several aspects make up what can be called a modern analytical ecosystem, an important ingredient is a forward-looking application platform. Such a platform enables rapid deployment of custom applications that are flexible, powerful and polyglot. It is through custom, use-case-specific and often complex applications that data scientists and business analysts ultimately extract meaningful and non-trivial insights in creative and well-governed ways. Among other things, a modern application platform offers easy external integrations, bring-your-own-tools flexibility, rapid prototyping and deployment, application discovery, and robust and inclusive developer support.
The following tips will help you modernize the application platform of your analytical ecosystem or data warehouse and take it to the next level.
Multiple Language Support
Using Open Datasets
A data scientist’s typical day includes working with various datasets. Often, experimentation starts with known public datasets, and the resulting models are then extended and customized for the real datasets residing in the data warehouse. Easy or one-click access to open and public datasets that can be housed close to, and be accessible in the context of, the analytical ecosystem saves time. It is an important step toward a seamless application development experience. Kaggle, Google Public Data, Archive.org and “awesome public datasets” on GitHub are some of my favorite sources of open datasets online.
Using Containerization To Your Benefit
Containerization has quickly reached the analytics and data warehouse world. Building and packaging complex analytics outcomes as a set of interrelated but isolated and portable containers has huge appeal in a world where an analytics solution is built from an amalgamation of open-source and closed-source functions, libraries and services. A modern application platform can use Kubernetes as its container orchestration platform, in tandem with an associated container registry, to make it easy to deploy and operationalize container-based analytics solutions. Furthermore, integration with a public container registry, such as Docker Hub, makes it easy to pull and work with thousands of containerized analytics libraries and functions, a number that is growing by the day.
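As a rough illustration of what deploying a containerized analytics function to Kubernetes involves, the sketch below builds a Deployment manifest as a plain Python dictionary. The application name, image path and registry host are hypothetical placeholders, and a real deployment would typically add resource limits, probes and credentials for the registry.

```python
import json


def build_deployment(name: str, image: str, replicas: int = 2) -> dict:
    """Return a Kubernetes apps/v1 Deployment manifest as a plain dict."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            # The selector must match the labels on the pod template.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


# Hypothetical names: a churn-scoring container stored in a private registry.
manifest = build_deployment(
    name="churn-scoring",
    image="registry.example.com/analytics/churn-scoring:1.0",
)

if __name__ == "__main__":
    # kubectl accepts JSON as well as YAML manifests.
    print(json.dumps(manifest, indent=2))
```

Saved as a script, the printed JSON could be piped to `kubectl apply -f -` against a cluster you have access to.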
Going Serverless With Analytics
With the rapid adoption of serverless computing services such as AWS Lambda, a similar pattern of serverless analytics is gaining traction. Business analysts and analytics developers don’t want to worry about deployment; they want to focus on their analytics functions and models, whether the modern analytical ecosystem is in a public cloud, on-prem or a hybrid. Offering an embedded serverless framework, such as OpenFaaS, Kubeless, Apache OpenWhisk or Fission, gives a modern application platform a platform-agnostic way to support serverless analytics.
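To make the idea concrete, here is a minimal sketch of what an analytics function might look like once deployment concerns are stripped away. The string-in, string-out `handle` signature and the `{"values": [...]}` payload shape are assumptions chosen for illustration, not the API of any particular framework named above.

```python
import json
import statistics


def handle(req: str) -> str:
    """Summarize a JSON payload of the form {"values": [numbers...]}."""
    payload = json.loads(req)
    values = payload.get("values", [])
    if not values:
        # Return a structured error instead of raising, so callers get JSON back.
        return json.dumps({"error": "no values supplied"})
    return json.dumps(
        {
            "count": len(values),
            "mean": statistics.fmean(values),
            "min": min(values),
            "max": max(values),
        }
    )
```

The analyst writes only the function body; packaging, scaling and routing requests to it would be left to the serverless framework.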
One can see how a function-as-a-service (FaaS) registry can serve as the foundation for an analytics function store, and eventually pave the way for stitchable functions that let you create a pipeline or workflow from multiple functions.
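The stitching idea can be sketched in a few lines. In the example below, an in-memory dictionary stands in for the function registry, and `register`, `pipeline` and the three sample functions are hypothetical names invented for illustration; a real function store would resolve names to deployed functions rather than local callables.

```python
from functools import reduce
from typing import Callable, Dict

# Hypothetical in-memory stand-in for a FaaS function registry.
REGISTRY: Dict[str, Callable] = {}


def register(name: str):
    """Decorator that publishes a function to the registry under a name."""
    def decorator(fn: Callable) -> Callable:
        REGISTRY[name] = fn
        return fn
    return decorator


@register("drop_missing")
def drop_missing(values):
    return [v for v in values if v is not None]


@register("normalize")
def normalize(values):
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all values are equal
    return [(v - lo) / span for v in values]


@register("mean")
def mean(values):
    return sum(values) / len(values)


def pipeline(*names: str) -> Callable:
    """Stitch registered functions into one callable, applied left to right."""
    steps = [REGISTRY[n] for n in names]
    return lambda data: reduce(lambda acc, step: step(acc), steps, data)


score = pipeline("drop_missing", "normalize", "mean")
result = score([10, None, 20, 30])  # 0.5
```

Because each step is looked up by name, a workflow becomes a list of names that can be stored, shared and rearranged without touching the functions themselves.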
Support For Varied Applications
Interoperability and extensibility for bring-your-own analytics tools are key aspects of a modern application platform, given the hundreds of powerful open-source and third-party libraries and tools, such as JupyterHub, KNIME and TensorFlow, that are a data scientist’s best friends. For a large enterprise running a large analytical ecosystem, there are often hundreds or even thousands of varied users in the analytics community. A modern application platform offers an analytics app marketplace, where a large community can easily share and reuse analytics ideas built and deployed as bite-sized, reusable and redeployable apps. In addition, an analytics app discovery and governance layer provides the security and control needed to scale that sharing safely.
Life Cycle Management
Finally, and probably needless to say by now, having robust model deployment and life cycle management capabilities built right into the analytical ecosystem’s application platform is central to building, deploying and managing production-grade machine learning models that can run close to where the data resides.
The Future Of Analytical Ecosystems
Though the industry is broadly coming to terms with what a modern data warehouse looks like (advanced analytics, data lakes, open data formats, new engines, hybrid cloud, etc.), the elements that make up a modern application platform within an analytical ecosystem are rapidly evolving. There is an interesting pattern in the industry’s growing realization and appreciation of data gravity. Data warehouses are evolving into comprehensive and extensible analytics platforms. This shrinking gap between a data warehouse and an advanced analytics platform brings a new dimension to what a modern application platform should cater to.