Data Hubs: Master Data Repository or Master Data Service?

Data hubs are popular architectural constructs in the enterprise data management solutions of this decade. Although the term “data hub” has been used extensively over the last five or six years, many IT professionals treat it as a new name for the more traditional Operational Data Store (ODS) of the 1980s and 90s.
There are two common misconceptions about today’s data hubs and their capabilities. These misconceptions make data hubs look like their predecessors, Operational Data Stores. As a result, you may underestimate the role of data hubs in enterprise architecture, enterprise data management (EDM) and master data management (MDM) solutions, all of which are enabled by data hubs.
Misconception 1: “Data must be cleansed and standardized before it is loaded into the data hub”
For many professionals brought up on the concepts of operational data stores, data warehouses and ETL, it is an indisputable truth that data must be cleansed before it is loaded into the hub. Under this principle, a data hub is just another repository or database for storing cleansed data, often used to build data warehousing dimensions.
In reality, a data hub takes a much more active approach to data than simply storing a “golden record”. The data hub resolves entities and relationships by arbitrating the content of the source systems where the master data is created, so source data does not have to be cleansed before it reaches the hub. In other words, a data hub operates as a service responsible for creating and maintaining master entities and relationships. As the enterprise master data service, the data hub applies both advanced algorithms and human input to resolve entities and relationships in real time.
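To make the idea of a hub that resolves entities from uncleansed source data more concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `SourceRecord` type, the `normalize` and `resolve` functions, the thresholds, and the use of the standard library's `difflib.SequenceMatcher` as a stand-in for the "advanced algorithms" a real hub would apply. The point is only that standardization happens at match time inside the service, and that uncertain pairs are routed to a person rather than decided automatically.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class SourceRecord:
    system: str  # hypothetical source system name, e.g. "crm"
    key: str     # record key inside that source system
    name: str    # raw, uncleansed value exactly as the source holds it


def normalize(value: str) -> str:
    # Standardize at match time inside the hub; the source record
    # itself is loaded and kept as-is.
    cleaned = value.lower().replace(".", " ").replace(",", " ")
    return " ".join(cleaned.split())


def similarity(a: SourceRecord, b: SourceRecord) -> float:
    # Simple string similarity, used here only as a placeholder for
    # a real matching algorithm.
    return SequenceMatcher(None, normalize(a.name), normalize(b.name)).ratio()


def resolve(a: SourceRecord, b: SourceRecord,
            auto_match: float = 0.9, review: float = 0.7) -> str:
    # Illustrative thresholds: confident pairs are linked automatically,
    # borderline pairs go to a human data steward, the rest stay separate.
    score = similarity(a, b)
    if score >= auto_match:
        return "match"
    if score >= review:
        return "review"
    return "no_match"


crm = SourceRecord("crm", "C-1", "ACME Corp.")
billing = SourceRecord("billing", "B-7", "acme corp")
legal = SourceRecord("legal", "L-3", "Acme Corporation")
other = SourceRecord("crm", "C-2", "Zenith Bank")

decisions = {
    "crm/billing": resolve(crm, billing),
    "billing/legal": resolve(billing, legal),
    "crm/other": resolve(crm, other),
}
```

The "review" outcome is where human input enters: the service does not force a decision it cannot make with confidence, and the thresholds could differ by consumer depending on its tolerance to false positives and negatives.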
Misconception 2: “The golden record must be persisted in the data hub”
The notion of a data hub as a data repository, discussed above, presumes that the golden record must be persisted in the data hub. The notion of the data hub as a service makes no such presumption. As long as the master data service can deliver the “golden record” to the enterprise, the data hub may or may not persist it. The service view leaves the decision open: a data hub can persist the golden record or assemble it dynamically on request.
One argument for a persistently stored golden record is that retrieval times will suffer if the record is assembled dynamically each time it is requested. In practice, however, existing data hub solutions have demonstrated that a golden record can be assembled dynamically without a noticeable impact on retrieval times. One advantage of dynamically assembled records is that the data hub can maintain multiple views of a golden record, each aligned with different requirements, such as specific lines of business, functionality, data visibility, tolerance to false positives and negatives, and latency.
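Dynamic assembly with multiple views can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular data hub product works: the `SourceContribution` and `View` types, the `assemble_golden_record` function, and the "first trusted non-empty value wins" survivorship rule are all hypothetical. The hub keeps the linked source records and builds the golden record at request time, according to the requesting view.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SourceContribution:
    system: str                 # hypothetical source system name
    attributes: Dict[str, str]  # attribute values as held by that source


@dataclass
class View:
    name: str
    # Source systems this view may see, in descending order of trust.
    source_priority: List[str]


def assemble_golden_record(records: List[SourceContribution],
                           view: View) -> Dict[str, str]:
    # Assumed survivorship rule: for each attribute, take the first
    # non-empty value from the most trusted visible source.
    by_system = {record.system: record for record in records}
    golden: Dict[str, str] = {}
    for system in view.source_priority:
        record = by_system.get(system)
        if record is None:
            continue
        for attribute, value in record.attributes.items():
            if value and attribute not in golden:
                golden[attribute] = value
    return golden


# Source records already linked to the same master entity.
linked = [
    SourceContribution("crm", {"name": "Acme Corporation", "phone": ""}),
    SourceContribution("billing", {"name": "ACME Corp",
                                   "phone": "555-0100",
                                   "address": "1 Main St"}),
]

sales_view = View("sales", ["crm", "billing"])
finance_view = View("finance", ["billing"])

sales_record = assemble_golden_record(linked, sales_view)
finance_record = assemble_golden_record(linked, finance_view)
```

Nothing is persisted here: the same linked source records yield a different golden record for each view. A hub that chooses to persist could cache the output of `assemble_golden_record` per view without changing the service interface.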
Mature enterprises increasingly require multiple views of the golden record as they work to align their needs and initiatives with their architectures. Data hubs are at the core of any master data management solution, and seeing them as master data services rather than mere repositories dispels the two common misconceptions outlined here.