Data Standardization: Why is it so difficult?


In recent months, several announcements have changed the fast-evolving landscape of data standardization. Standardization is not a new concept at all, yet growing market demand for a tangible solution, together with these recent changes, has sharpened the focus and attention on the topic and, consequently, the results.

Allow us to start with a very simple and possibly provocative statement: “Data standardisation has been a business need since the very first day that scientific data moved into computerized systems. The value of data lies in their aggregation and analysis, which give access to meaningful information for business and research projects.”

Data Standardisation: The Root Cause

Ever since data moved to computerized systems, scientists have faced the challenges of data integration and easy access to information in their day-to-day activities. Until now, the data generated and the information collected by laboratory instruments and software systems have been created in a specific format developed by the vendor. The suppliers’ interest was, and will remain, to protect their intellectual property in line with their business interests. A great deal of science, software engineering, effort and cost goes into designing how data are collected from a technical platform. This is the core of each specific technology, and it gives vendors a competitive advantage. Yet this business interest might soon have a negative impact: like a boomerang, it could hit revenues if companies do not react quickly enough to their customers’ demands.

Data Standardisation: The Main Reason for It

Clearly, customers’ interests have clashed with those of the providers for some time. Users want the freedom to select and purchase the technical platforms that fit their requirements, without constraints on aggregating and integrating their data, so that they can take maximum advantage of the information generated.

The end goal is fast access to the data and easy analysis, in order to extract the information that allows scientists to move forward with their experiments, or the quality team to release batches more effectively and quickly.

This is only possible when the data generate meaningful information.

Very often, our customers’ requests are about improving information searches and finding ways to enhance their daily operations. That is the moment to discuss strategies for data standardisation.

“The real advantage of data standardisation resides in better decision making with easy access to meaningful information”

Data Standardisation: The Available Solutions

The fact is that several initiatives have now brought together key players from the industry, both users and providers, and have reached the key milestone of agreeing on consolidated standards that finally allow data to be collected in a common way.

We’ll introduce you to three of them, each of which is currently releasing the latest version of its solution.

The Allotrope Foundation is an open industry consortium of leading pharma and life science companies and suppliers. Its objective is to build a common language and an open framework for managing analytical data throughout its complete lifecycle, using a common set of standard tools that can be used to collect data in a harmonized format and to easily interpret the data produced by the various technical platforms. The foundation aims to make the intelligent analytical laboratory a reality: an automated laboratory where data, methods and hardware components are seamlessly shared between disparate platforms, and where one-click reports can be produced using data generated on any analytical instrument.

The SiLA (Standardization in Lab Automation) consortium is a not-for-profit membership organization formed by software suppliers, system integrators and pharma/biotech companies. The consortium develops and introduces new device and data interface standards that allow rapid integration of lab automation hardware and data management systems. Highly skilled experts from member companies contribute to SiLA's technical work groups. Membership is open to institutions, corporations and individuals active in the life science lab automation industry. The SiLA consortium provides professional training, support and certification services to suppliers and system integrators implementing SiLA-compliant interfaces.

The Pistoia Alliance was conceived to address precompetitive collaboration opportunities in life sciences research and development, with a mission to lower barriers through its established framework for open innovation. It does so by identifying root causes, developing standards and best practices, sharing precompetitive data and knowledge, and implementing technology pilots. Project areas have included a broad range of challenges, e.g. user experience in scientific software, chemical representation of macromolecules (HELM, the Hierarchical Editing Language for Macromolecules), controlled substance compliance, ontology mapping, and artificial intelligence/machine learning. The convergence of the lab and the bedside offers many new opportunities for alliance members to collaborate around instrument interoperability and integration, and around the role of the cloud and the Internet of Things (IoT).

All in all, the eCollect step of the eData lifecycle requires this standardisation as the initial step when taking the data from the hardware. Without it, data will remain tightly bound to the proprietary formats generated by the different suppliers.

This is a critical step that allows, for example, the data to be archived in a single format and re-used at a later stage without relying on the availability of the technology originally used to generate the raw data.
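To make the idea of a single archive format more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the two vendor export layouts, the field names and the `Measurement` record are all hypothetical, and the sketch does not reproduce the Allotrope, SiLA or Pistoia Alliance specifications. It simply shows two instrument exports being converted into one common record, archived as JSON, and read back without any vendor software.

```python
from __future__ import annotations

import csv
import io
import json
from dataclasses import asdict, dataclass


@dataclass
class Measurement:
    """Vendor-neutral record (hypothetical schema, for illustration only)."""
    sample_id: str
    analyte: str
    value: float
    unit: str
    instrument: str


def from_vendor_a(text: str) -> list[Measurement]:
    """Convert a hypothetical vendor A export (CSV, unit implied by the column name)."""
    rows = csv.DictReader(io.StringIO(text))
    return [
        Measurement(r["SampleID"], r["Analyte"], float(r["Result_mg_L"]), "mg/L", "vendor_a")
        for r in rows
    ]


def from_vendor_b(text: str) -> list[Measurement]:
    """Convert a hypothetical vendor B export (JSON list with its own field names)."""
    return [
        Measurement(r["id"], r["compound"], float(r["conc"]), r["units"], "vendor_b")
        for r in json.loads(text)
    ]


def archive(records: list[Measurement]) -> str:
    """Serialise the common records to a single archive format (plain JSON here)."""
    return json.dumps([asdict(r) for r in records], indent=2)


def restore(archived: str) -> list[Measurement]:
    """Read the archive back; no vendor-specific parser is needed at this stage."""
    return [Measurement(**r) for r in json.loads(archived)]


# Made-up sample exports standing in for two proprietary formats.
VENDOR_A_EXPORT = "SampleID,Analyte,Result_mg_L\nS-001,caffeine,12.5\n"
VENDOR_B_EXPORT = '[{"id": "S-002", "compound": "caffeine", "conc": "11.9", "units": "mg/L"}]'

records = from_vendor_a(VENDOR_A_EXPORT) + from_vendor_b(VENDOR_B_EXPORT)
archived = archive(records)
restored = restore(archived)
```

In this sketch the vendor-specific knowledge lives only in the two small conversion functions. Everything downstream (archiving, searching, aggregating) works on the common record, which is the point of standardising at collection time.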

Data Standardisation: Why is it so difficult?

Turning data into actionable insights isn’t straightforward, and there are multiple reasons why it is so difficult. Besides the limitations of provider-specific formats, organisations need to change their paradigm when selecting systems. Instead of thinking about how to cover their sample management processes or, more generally, their R&D or quality processes, organisations should seek to understand how to convert their data into information, focus on the entire lifecycle of their data and invest effort in ensuring that it is adequately optimized.

The very first step of the data lifecycle is the generation and collection of the raw data. Since instruments generate data in file formats that their providers have designed to be specific and closed, according to their own standards, this is where the standardization challenge starts.

The initiatives currently developing in the market are great opportunities for both vendors and customers. Suppliers can demonstrate a real interest in helping customers take advantage of the data generated by their technologies.

Customers can finally rely on simplified access to their original data, with greater flexibility.

 

In a forthcoming article, we will expand on the concepts of data aggregation and data catalogues, indispensable tools for ensuring that the data generated are not just stored as unusable bits but mark the beginning of a completely different future for companies and their end customers, and for all of us as potential patients or consumers.