The pandemic-driven shift to a new virtual world of business has accelerated the pace of digital transformation. Employees, business processes, and products need to be supported as they shift to a virtual environment if enterprises are to stay competitive and survive. It’s equally important that data and analytics become as agile as other business elements, because the variety, volume, and distribution of data continue to grow, unencumbered by external world events.
Data analysts and business analysts rely heavily on a fit-for-purpose data environment that enables them to do their jobs well. These environments allow them to answer questions from management and different parts of the business. These same professionals have expertise in working and communicating with data but often do not have deep technical knowledge of databases and the underlying infrastructure.
For instance, they may be familiar with SQL and bringing together data sources in a simple data model that allows them to dig deeper in their analysis, but when the database performance degrades during more complex analysis, the depth of infrastructure reliance becomes clear. The dreaded spinner wheel or delays in analysis make it difficult to meet business needs and demands. This can impact critical decision making and reveal underlying weaknesses that get in the way of other data applications, such as artificial intelligence (AI). These indicators of poor performance also show the need for scaling the data environment to accommodate the growth of data and data sources.
To relieve frustration and deliver a better analytics solution and experience for the organization, data and business analysts must focus on strengthening the three pillars of data analytics: agility, performance, and speed. The overall data architecture, supported by a sound data strategy, is the foundation for enabling these pillars. A reliable and fit-for-purpose infrastructure is essential for handling increasing data volumes and can also accommodate more advanced and more performant analytics models.
Data architects and data engineers provide the expertise to proactively identify weaknesses in the underlying infrastructure and existing data models that might impact performance and subsequently the end-user experience. Reviewing and addressing these weaknesses makes the overall environment more flexible and provides room for a more agile approach to data analytics. With the right infrastructure choices and architecture, an organization can achieve better performance, which is reflected in the user experience of analysts as well as that of stakeholders across the business as they consume and interact with information.
Strengthen Agility by Solving the Model Problem
When we think back to how we used search engines 20 years ago, it seems almost prehistoric. We had to use specific syntax and formulas to get relevant results. Today, these engines account for poor spelling and grammar, often offering the answer before the query is fully typed. Using natural language to find information is as much a consumer demand as it is a business demand. The underlying data drives the suggestions for autocompleting a search query, but this process relies heavily on the data model.
A data model is only as good as the information used to create it and the requirements communicated by the business to the data engineers. Ideally, a data model brings together all the relevant data and relates tables from different data sources to one another so they can be queried by analysts in their entirety and in relation to one another. Otherwise the information and the value of the insights that analysts produce are limited.
For instance, when working with financial data, the business wants to understand transactions within a certain time frame to identify customer purchasing behavior. On the surface, this seems like a simple request, but only after detailing what makes up a “customer” and their “behavior” will it become clear what data is required. If only transaction data is taken into account, the insights that can be gained are limited. When the business includes information about the customer segment, past purchases, the customer’s occupation and income, their location, and preferences, the data model enables the analyst to create a more complete picture of the customer.
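The difference between a transactions-only view and a richer data model can be sketched in a few lines. The example below is a minimal illustration using Python's built-in sqlite3 module. The table names, column names, and sample rows are hypothetical and chosen only to mirror the scenario above; a real model would include far more customer attributes.

```python
import sqlite3

# Hypothetical tables and sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        segment     TEXT,
        location    TEXT
    );
    CREATE TABLE transactions (
        transaction_id INTEGER PRIMARY KEY,
        customer_id    INTEGER REFERENCES customers(customer_id),
        purchased_on   TEXT,  -- ISO date string
        amount         REAL
    );
    INSERT INTO customers VALUES
        (1, 'premium', 'Berlin'),
        (2, 'standard', 'Madrid');
    INSERT INTO transactions VALUES
        (10, 1, '2021-01-05', 120.0),
        (11, 1, '2021-01-20', 80.0),
        (12, 2, '2021-01-11', 35.0),
        (13, 2, '2021-03-02', 50.0);
""")


def spend_by_segment(conn, start, end):
    """Return total spend per customer segment between two ISO dates (inclusive)."""
    rows = conn.execute(
        """
        SELECT c.segment, SUM(t.amount)
        FROM transactions AS t
        JOIN customers AS c ON c.customer_id = t.customer_id
        WHERE t.purchased_on BETWEEN ? AND ?
        GROUP BY c.segment
        ORDER BY c.segment
        """,
        (start, end),
    ).fetchall()
    return dict(rows)


# Spend per segment for January 2021; the March transaction is excluded.
january = spend_by_segment(conn, "2021-01-01", "2021-01-31")
```

Without the customers table, the same query could only report totals per customer ID. The join is what lets the analyst describe behavior in terms the business cares about, such as segment or location.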
To account for all the necessary data that analysts will require, data engineers, data analysts, and business stakeholders must communicate with one another to outline business requirements, the intended use of the data, and potential (current) limitations of data access. Such communication requires ongoing conversations, asking the right questions, and agreeing on the business needs and timelines everyone is working toward.
One challenge for the organization’s data model is the constantly changing business environment, especially right now during a global pandemic. A rigid data model may still be suitable for certain data processes and to achieve specific reporting outputs. However, it can get in the way of running an organization flexibly. At the heart of the data strategy should be the consideration of the business’s current operating environment.
One method of creating an agile data model is to use data vault modeling. With a data vault, the organization can grow its data volumes unhindered and respond to rapid business changes, keeping its data model flexible while maintaining a detailed data catalog. This proves useful for compliance and auditing requirements because a full history of the data is available.
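To make the idea of a full history more concrete, here is a heavily simplified sketch of two data vault building blocks: a hub that holds a stable business key and a satellite that holds descriptive attributes as insert-only, timestamped rows. It assumes a single customer entity and omits links, hash keys, and record-source columns that a real data vault would carry. All class and field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class HubCustomer:
    """Hub: one row per business key, nothing else."""
    customer_key: str


@dataclass(frozen=True)
class SatCustomerDetails:
    """Satellite: descriptive attributes, versioned by load timestamp."""
    customer_key: str
    load_ts: datetime
    segment: str
    location: str


class CustomerVault:
    """Toy in-memory store; satellite rows are appended, never updated."""

    def __init__(self):
        self.hubs = {}
        self.satellites = []

    def load(self, customer_key, load_ts, segment, location):
        # Register the business key once, then append a new satellite version.
        self.hubs.setdefault(customer_key, HubCustomer(customer_key))
        self.satellites.append(
            SatCustomerDetails(customer_key, load_ts, segment, location)
        )

    def history(self, customer_key):
        # Every version ever loaded for this key, oldest first.
        rows = [s for s in self.satellites if s.customer_key == customer_key]
        return sorted(rows, key=lambda s: s.load_ts)

    def current(self, customer_key):
        # The most recently loaded version, or None if the key is unknown.
        rows = self.history(customer_key)
        return rows[-1] if rows else None


vault = CustomerVault()
vault.load("C-001", datetime(2020, 1, 1), "standard", "Berlin")
vault.load("C-001", datetime(2021, 6, 1), "premium", "Berlin")
```

Because changes arrive as new satellite rows rather than updates, the earlier "standard" record remains available for audit while analysts query the current "premium" one. Adding new attributes later would typically mean adding another satellite rather than restructuring the hub.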
Strengthen Performance by Including Semistructured and Unstructured Data
Although trickier to analyze than structured data, semistructured and unstructured data are far more widespread in the enterprise. IDC projects that 80 percent of worldwide data will be unstructured by 2025, with the vast majority of this influenced by the rise of IoT. Determining what data is structured and what is semistructured or unstructured is not always straightforward. Sometimes unstructured data is defined and processed in a structured way (e.g., log data in CSV format), and other times, data that is schema-defined isn’t necessarily structured. These discrepancies throw a proverbial wrench into data analytics.
To stay competitive, businesses need to bring a broader array of data together in a 360-degree view to support deeper, more accurate, and more precise analytics insights. Supporting semistructured data formats such as JSON is now a business imperative because they offer potential for business advantage to any company that handles and analyzes them well. AI algorithms can help extract meaning from large volumes of unstructured data, driven by data scientists and data analysts with deep expertise in developing the right models and approaches to work with this data.
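As a small illustration of what supporting JSON can involve, the sketch below flattens a nested JSON record into a single row with dotted column names, a shape that is easier to load into a tabular model. The event payload and its field names are invented for this example, and list values are left untouched for simplicity; production pipelines usually need explicit rules for arrays and missing fields.

```python
import json


def flatten(record, prefix=""):
    """Flatten nested dicts into one dict with dotted keys; other values pass through."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested objects, extending the column name.
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


# Hypothetical IoT-style event, for illustration only.
raw_event = (
    '{"device": {"id": "sensor-7", "site": "plant-a"}, '
    '"reading": {"temp_c": 21.5}, "tags": ["iot", "hvac"]}'
)

row = flatten(json.loads(raw_event))
```

After flattening, row contains the keys device.id, device.site, reading.temp_c, and tags, which can be mapped to columns alongside existing structured data.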
Strengthen Speed (and Performance) with GPUs
Speed and performance for your analytics environment go hand in hand. Enabling your data professionals to stay in the flow of analysis requires a performant architecture that delivers data to applications without delay. Speed is how quickly your technology handles analytical queries. Performance goes deeper: it covers the complexity of your data model, whether the system handles massive data volumes effortlessly, and whether everyone can use it when they need it without an unsatisfactory user experience.
As your data architecture evolves, so do the technology and, more specifically, the hardware that make it all happen. Leading companies are going beyond the traditional use of CPU power and are embracing graphics processing units (GPUs) for their AI and machine learning (ML) applications. This allows them to develop, train, and run their analytics models faster, which in turn can lead to better products and services for their customers.
Whether it’s a healthcare provider delivering more accurate, faster diagnoses or a retailer creating more personalized customer experiences, more enterprises are adopting AI, with 53 percent of enterprises using it, according to a 2018 Forrester survey of data and analytics decision makers.
With GPUs, organizations can massively parallelize operations to support the training of analytics models and/or inferencing, providing the scale and performance required to efficiently complete multiple epochs in a shorter time frame and to fine-tune the model. Additionally, using GPUs in the cloud offers the ability to run different AI/ML workloads with the flexibility needed for a cost-effective, scalable, and secure AI solution.
The long-term outcomes of this acceleration are yet to be seen, but one thing is certain: being agile in the face of change is critical, now more than ever. By evaluating how an organization is truly meeting the three pillars of data analytics, business leaders gain insights into where change may be welcome, where it is needed, and where it is critically necessary if the business is to survive as the pace of transformation and volumes of data continue to rise.