Every business today is data-driven, or at least pretends to be. Business decisions are no longer made on the basis of hunches or anecdotal evidence as they were in the past. Actionable data and analytics now power the most critical business decisions.
As more companies harness the power of machine learning (ML) and artificial intelligence (AI) to make critical choices, there needs to be a conversation around the quality – completeness, consistency, validity, timeliness and uniqueness – of the data used by these tools. The insights businesses expect to gain from ML- or AI-based technologies are only as good as the data used to power them. The old adage "garbage in, garbage out" applies to every data-driven decision.
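As a rough sketch of what measuring those five dimensions looks like in practice, each can be expressed as a simple ratio over a dataset. The records, field names and format rule below are hypothetical, and real pipelines would typically use a dedicated data quality library rather than hand-rolled checks:

```python
import re

# Hypothetical customer records; note the duplicate id and the bad email.
records = [
    {"id": 1, "email": "a@example.com", "updated": "2024-05-01"},
    {"id": 2, "email": "bad-email",     "updated": "2023-01-15"},
    {"id": 2, "email": None,            "updated": "2024-04-20"},
]

def completeness(rows, field):
    """Fraction of rows where the field is present and non-null."""
    return sum(r.get(field) is not None for r in rows) / len(rows)

def uniqueness(rows, field):
    """Fraction of rows carrying a distinct value for the field."""
    values = [r[field] for r in rows]
    return len(set(values)) / len(values)

def validity(rows, field, pattern):
    """Fraction of non-null values matching a format rule."""
    values = [r[field] for r in rows if r[field] is not None]
    return sum(bool(re.fullmatch(pattern, v)) for v in values) / len(values)

report = {
    "completeness(email)": completeness(records, "email"),  # 2 of 3 rows
    "uniqueness(id)":      uniqueness(records, "id"),       # 2 distinct of 3
    "validity(email)":     validity(records, "email", r"[^@]+@[^@]+\.[^@]+"),
}
print(report)
```

Timeliness and consistency follow the same pattern: a ratio of rows passing a check (e.g. `updated` within the last N days, or values agreeing across two systems).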
Poor data quality leads to increased complexity of data ecosystems and poor long-term decision-making. In fact, poor data quality costs organizations an average of approximately $12.9 million each year. As data volumes continue to grow, so will the challenges businesses face in validating their data. To overcome data quality and accuracy issues, it is essential to first understand the context in which data elements will be used, as well as the best practices that should guide such initiatives.
1. Data quality is not a one-size-fits-all business
Data initiatives are not tied to a single business function. In other words, determining data quality will always depend on what a business is trying to accomplish with that data. The same data can impact more than one business unit, function or project in very different ways. Additionally, the list of data elements requiring strict governance may vary among different data users. For example, marketing teams need a highly accurate and validated mailing list, while R&D is invested in quality user feedback data.
The team best placed to discern the quality of a data element is the one closest to the data. Only they can recognize the data as it supports business processes and ultimately assess its accuracy based on how it is used.
2. What you don’t know can hurt you
Data is a business asset. However, actions speak louder than words. Not everyone in a company does everything possible to ensure that data is accurate. If users don’t recognize the importance of data quality and governance, or simply don’t prioritize it the way they should, they won’t make the effort to anticipate data issues caused by poor data entry, or to raise their hand when they find a data problem that needs to be remedied.
In practice, this can be addressed by tracking data quality metrics as a performance target, fostering greater accountability among those directly involved with the data. Additionally, business leaders need to champion the importance of their data quality program and align with key team members on the practical impact of poor data quality: for example, misleading information shared in inaccurate reports to stakeholders, which can potentially lead to fines or penalties. Investing in better data literacy can help organizations create a culture of data quality and avoid the careless or ill-informed mistakes that hurt the bottom line.
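Treating quality metrics as performance targets can be as simple as comparing observed scores against agreed thresholds and flagging breaches to the responsible owners. A minimal sketch, with hypothetical metrics and thresholds:

```python
# Hypothetical quality targets agreed with the team as performance objectives.
targets  = {"completeness": 0.98, "validity": 0.95}
observed = {"completeness": 0.96, "validity": 0.97}

# Collect every metric that fell short of its target.
breaches = {m: (observed[m], t) for m, t in targets.items() if observed[m] < t}

for metric, (value, target) in breaches.items():
    print(f"{metric}: {value:.2f} below target {target:.2f} - notify owner")
```

In a real program these numbers would come from automated profiling runs, and breaches would feed a dashboard or alerting channel rather than `print`.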
3. Don’t try to boil the ocean
It is neither practical nor an efficient use of resources to try to fix every data quality issue at once. The number of active data elements within any given organization is huge and growing exponentially. It is best to begin by defining an organization’s critical data elements (CDEs): the data elements integral to the core function of a specific business. CDEs are specific to each company. Net income is a common CDE for most businesses, as it is important for reporting to investors and other stakeholders.
Since every company has different business objectives, operating models, and organizational structures, every company’s CDEs will be different. In retail, for example, CDEs might be design- or sales-related, while healthcare companies will be more interested in ensuring the quality of regulatory compliance data. While not an exhaustive list, business leaders may consider asking themselves the following questions to help define their unique CDEs: What are your critical business processes? What data is used in these processes? Are these data elements involved in regulatory reporting? Will these reports be audited? Will these data elements guide initiatives in other departments of the organization?
Validating and correcting only the most critical data elements helps organizations scale their data quality efforts in a sustainable, resource-efficient way. Eventually, an organization’s data quality program will reach a level of maturity where frameworks (often with some level of automation) categorize data assets based on predefined elements to eliminate disparities within the company.
4. More visibility = more accountability = better data quality
Businesses drive value by knowing where their CDEs are, who is accessing them, and how they are being used. Essentially, there is no way for a company to identify its CDEs if it has not implemented proper data governance in the first place. However, many companies struggle with unclear or nonexistent ownership of their data stores. Defining ownership before integrating more stores or data sources drives commitment to quality and utility. It is also wise for organizations to have a data governance program in place where data ownership is clearly defined and people can be held accountable. This can be as simple as a shared spreadsheet dictating ownership of a set of data elements, or as sophisticated as a dedicated data governance platform.
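To make the spreadsheet option concrete, an ownership registry can be a small CSV that maps each data element to an accountable owner. The columns, element names and teams below are illustrative, not a prescribed schema:

```python
import csv
import io

# A minimal ownership registry, modeled on the shared spreadsheet described
# above; data elements, owners and systems are hypothetical examples.
registry_csv = """data_element,owner,steward,source_system
net_income,finance-team,j.doe,ERP
customer_email,marketing-team,a.smith,CRM
"""

registry = {
    row["data_element"]: row
    for row in csv.DictReader(io.StringIO(registry_csv))
}

def owner_of(element):
    """Look up who is accountable for a given data element."""
    entry = registry.get(element)
    return entry["owner"] if entry else None

print(owner_of("net_income"))  # finance-team
```

Even this toy version makes the accountability question answerable: for any CDE, there is exactly one row that names who must fix it when its quality degrades.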
Just as organizations must model their business processes to improve accountability, they must also model their data: its structure, its pipelines, and how it is transformed. Data architecture attempts to model the structure of an organization’s logical and physical data assets and data management resources. Creating this kind of visibility is at the heart of the data quality problem; without visibility into the *lifecycle* of data – when it is created, how it is used and transformed, and how outputs are produced from it – it is impossible to ensure true data quality.
5. Data overload
Even when data and analytics teams have established frameworks to categorize and prioritize CDEs, they are still left with thousands of data elements that need to be validated or corrected. Each of these data elements may require one or more business rules specific to the context in which it will be used. However, those rules can only be assigned by the business users working with these unique datasets. Data quality teams will therefore need to work closely with subject matter experts to identify rules for each unique data element, which can be extremely tedious even when prioritized. This often leads to burnout and overload within data quality teams, as they are tasked with manually writing a large number of rules for a variety of data elements. Organizations should set realistic expectations for the workload of their data quality team members, and may consider expanding the team and/or investing in tools that leverage ML to reduce the amount of manual work in data quality tasks.
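A toy illustration of why rule authoring scales poorly: each data element carries its own context-specific checks, which only subject matter experts can supply, so the rule catalog grows with every CDE. The elements and rules below are hypothetical:

```python
# Per-element business rules, each a (name, predicate) pair that a subject
# matter expert would define; elements and checks here are illustrative.
rules = {
    "net_income": [
        ("is_numeric", lambda v: isinstance(v, (int, float))),
    ],
    "customer_email": [
        ("is_string",   lambda v: isinstance(v, str)),
        ("has_at_sign", lambda v: isinstance(v, str) and "@" in v),
    ],
}

def validate(element, value):
    """Return the names of the rules that the value fails for this element."""
    return [name for name, check in rules.get(element, []) if not check(value)]

print(validate("customer_email", "not-an-email"))  # ['has_at_sign']
```

With thousands of elements and several rules each, the catalog quickly reaches tens of thousands of entries, which is the manual burden ML-assisted rule suggestion aims to reduce.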
Data isn’t just the world’s new oil: it’s the world’s new water. Organizations may have the most sophisticated infrastructure, but if the water (or data) flowing through those pipes isn’t potable, it’s useless. People who need this water must have easy access to it, must know that it is usable and uncontaminated, and must know when the supply is low; finally, the providers and custodians need to know who is accessing it. Just as access to clean water helps communities in a variety of ways, better access to data, mature data quality frameworks, and a deep culture of data quality can protect data-driven programs and insights, helping to drive innovation and efficiency in organizations around the world.
JP Romero is Technical Manager at Kalypso.