What ARE “Master Data?”

Do you concur with the definition of master data?

My last blog post discussed the relationship between Data Governance and MDM - addressing the question of whether DG "owns" MDM or if they're separate initiatives, and which influences the other. In that blog I mentioned that we should address the definition of "master data."

This is a very interesting question to nerdy folks like me (and a good number of other Information Quality folks).

At the Enterprise Data World conference a few months back, David Loshin asked this question to a vendor panel on which I sat. It took us three iterations to get to an answer he was satisfied with, but unfortunately, I don't remember what the right answer was. That's so me...

So, here's an attempt to define "master data" (deep, cleansing breath...).

First, let me distinguish between four "types" of data (yeah, I can feel it going off the rails already and getting people's hackles up!): Reference data, Master data, Transactional data and Audit data.

Reference data are typically:

  • Fairly atomic in nature, in that they are not composed of several different kinds of elements to create a new, composite object
  • Used as "building blocks" or "atoms" for more coarse-grained business objects (used to create Master data and Transactional data)
  • Often created by an external authoritative source (the US Postal Service defines postal codes, international standards bodies define standard units of measure for shipping containers, countries define lists of states and territories, etc.)
  • Fixed lists of values such as states within a country, months in a period, or stock symbols in a trading list

Reference data are fairly easy to distinguish if you have some domain knowledge. They are similar in some ways to enumerated lists of domain values in data models (another topic).

So, these lists are relatively static and short, and they are usually made up of a couple of identifiers, a few textual depictions of the values and a definition of the meaning of each value. But a state code is not the same thing as master data.

Master data are those which are "essential ingredients" of transactions. Reference data are also essential to transactions, but are most often used as qualifiers for master data on the transaction (see lists below).

Master data also have to be "unique" within their larger set, but as I often say, "uniqueness is in the eye of the beholder," and the determination of "unique" with master data is more difficult than with reference data. Even though master data have some similar characteristics to reference data, they have a few key differences:

  • They're more complex objects, formed of several kinds of reference data bound together to create a complex record of something that's used to create myriad business transactions
  • They're usually created, updated, managed, used and propagated from multiple systems of record, each according to that system's own private rules. (Reference data share this characteristic a little bit, but the authority for defining reference data is not from within the application using it; it's usually owned by an external standards board, agency, country etc.)
  • There is enough agreement about the business value of the master data that entire data quality initiatives are funded to improve the quality of master data, complete with headcount. Reference data are usually "purchased" from an external source and are much simpler to manage (with exceptions, of course)
  • They matter so much to the organization that if the data were compromised there could be serious impacts to the organization

Trying to define these without using actual master data as examples is difficult and I invite your additions! Here are some examples. You'll see how master data are composed of reference data:

Reference Data

Countries
States/Provinces/Territories
Cities within States/Provinces/Territories
Street names within cities
Street types (Dr, Lane, Boulevard...)
Street direction (N, NE, E, S...)
Blocks & Block Groups
Postal codes within states
Months of the year
Name prefixes, suffixes, titles etc.

Master Data

Customer (made up of 1:m names (with standard prefixes, suffixes, titles etc.), 1:m addresses (made up of country, state/province, city, street number, street name, street direction, postal code, etc), 1:m phone numbers (made up of country code, area code, exchange, number, extension, phone type/role), 1:m email addresses (made up of the address, domain, domain type/extension)...)

Product (made up of 1:m names, description, measurements of size, weight, minimum order quantity, bill of material, version...)

Transactional Data

Sales order
Product registration
Product shipment
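
One way to picture the layering in the lists above is a minimal Python sketch. Every class and field name here is hypothetical, chosen only for illustration, and the sample values are made up. It is not a recommended data model, just a way to see reference data being used as building blocks for master data, and master data being referenced by a transaction.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import List, Optional


# Reference data: short, fairly static lists of atomic values,
# usually defined by an outside authority (hypothetical subsets here).
class StreetType(Enum):
    DRIVE = "Dr"
    LANE = "Lane"
    BOULEVARD = "Boulevard"


class Direction(Enum):
    N = "N"
    NE = "NE"
    E = "E"
    S = "S"


@dataclass(frozen=True)
class Country:
    code: str  # assumed identifier, e.g. a two-letter country code
    name: str


# Master data: composite objects built out of reference data.
@dataclass
class Address:
    country: Country
    state: str
    city: str
    street_number: str
    street_name: str
    street_type: StreetType
    postal_code: str
    street_direction: Optional[Direction] = None


@dataclass
class Customer:
    customer_id: str
    names: List[str]  # 1:m names
    addresses: List[Address] = field(default_factory=list)  # 1:m addresses


@dataclass
class Product:
    product_id: str
    names: List[str]
    description: str
    minimum_order_quantity: int = 1


# Transactional data: an event whose "essential ingredients" are master data.
@dataclass
class SalesOrder:
    order_id: str
    customer: Customer
    product: Product
    quantity: int
    order_date: date


# Made-up sample values, purely for illustration.
usa = Country(code="US", name="United States")
home = Address(usa, "MA", "Boston", "4", "Example", StreetType.LANE, "02101")
customer = Customer("C-1", ["Pat Example"], [home])
bike = Product("P-1", ["Bike"], "A road bike")
order = SalesOrder("O-1", customer, bike, 1, date(2010, 1, 15))
```

Following the order back to its customer, then to an address, then to a country shows each layer leaning on the one beneath it: `order.customer.addresses[0].country.name` ends at a reference value that neither the order nor the customer record defines for itself.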

Ok, so I know I've missed a lot, given the brevity of a blog, and I know I've irked folks in the financial sectors when characterizing reference data as fairly simple objects (trading symbols and other reference data for managing trades change all the time). But, what do you think?



10 Responses »

  1. Nice post, Marty.

    When it comes to this common question, I usually quote Peter Benson of the ECCMA, who explains that “data is intrinsically simple and can be divided into data that identifies and describes things, master data, and data that describes events, transaction data.”

    Therefore, master data is an abstract description of real-world entities. Transaction data is an abstract description of real-world interactions involving two or more of these entities.

    Using an admittedly simple (and fictional) example:

    Michelle Davis-Donovan purchases a life insurance policy for her husband Michael Donovan from the good folks at Vitality Insurance.

    In this example, Michelle Davis-Donovan and Michael Donovan (customers), the life insurance policy (product), and Vitality Insurance (vendor) are all master data objects, and the premium payments that Michelle sends to Vitality Insurance exemplify the transaction data involved.

    We could also further refine the roles of Michelle and Michael to be Beneficiary and Insured, respectively.

    Here is a real-world example from professional baseball: last night, Clay Buchholz of the Boston Red Sox pitched 7 innings, allowing three hits, three earned runs, and four walks in an 11-0 loss to the Cleveland Indians.

    In this example, Boston Red Sox and Cleveland Indians (teams) and Clay Buchholz (player) are all master data objects, and 7 innings, three hits, three earned runs, and four walks (player statistics) and 0-11 loss for Boston and 11-0 win for Cleveland (team statistics) exemplify the transaction data involved.

    We could also further refine the role of Clay Buchholz to be Starting Pitcher.

    Additionally, an example of Reference Data would be the description of the Boston Red Sox provided by Major League Baseball, which would include information such as Boston, Massachusetts (Location), Fenway Park (Stadium), 36,984 (Seating Capacity), John Henry (Owner), and 7 (Number of World Series Championships, most recently 2004 and 2007).

    Best Regards,

    Jim

    P.S. GO RED SOX !!!

    :-)

  2. Thanks so much again, Marty, for the nice breakdown, and thanks to Jim for his real-world examples... I always enjoy reading posts from both of you.

    Marty, I try not to make it complicated... when we go to the business and the analysts who demand the best data, how would you explain Master Data to them?

    What I do is say that Master Data is the data that drives the business; it is the information that describes the "thing" you want to better understand. Now I know you are going to say, "but Garnie, you just lumped Reference, Master & Transactional Data together," and you would be absolutely right. The reason I do this is to keep it simple, helping the user community buy into the concept and better understand what is involved in "mastering" the data.

    From an architectural point of view, you are correct. There are data types that define, reference, and explain (transactionally) what the object is or is associated with.

    Most people can break it down: who are they, where are they (in a point of reference), and what did they do... WHO WHERE WHAT, which both you and Jim did an excellent job of sharing.

    So the next question: what about HOW and WHY? These elements of Master Data are just as important, which most of us know all too well... HOW do you find the data, what are the policies that govern the data, etcetera, etcetera, etcetera... WHY? Is there a business problem you are solving, elements you need, clarity of the customer base, suppliers to a product, agility?

    OK... I will stop there... my point: keep it simple to sell to the organization. I have been winning with the WHO, WHAT, WHERE, HOW and WHY (sometimes WHEN) explanation.

    That is how I explain Master Data to folks who are new and want to learn more about "What ARE Master Data"

  3. Great posts and comments. Someone smart once said that knowledge begins with the clear definition of terms.

  4. I like the article, it gives a good explanation. Put a CRM, an MDM and a DWH expert in one room and they will all have different opinions about what master data is and what it consists of.

    Especially DWH experts tend to compare it to dimensional data and fact data. Dimensional data can be compared to reference data. A pure dimension table is a collection of only primary keys and a descriptive meaning for those keys. In the fact table, the combined foreign keys which relate to the dimensions are stored. Therefore it is very comparable to transactional data.

    In my opinion master data is something in the middle between dimensions and facts. It is a collection of both items. A customer can be identified, and especially located or targeted, not only by his or her "collection" of reference data but also by a set of transactional data. E.g. the items I have bought at Amazon will probably identify me uniquely, especially if prices and dates are included in these transactions. Yes, indeed it will be much harder to find me using such a set of data than my email address, phone number or SSN.

    Best regards,

    Ramon

  5. Marty,
    Great essay. To the list of terms needing definitions related to Master Data I think we will soon need to add Supply Chain Master Data and Instance Data. Supply Chain Master Data is the same as Master Data in the sense that they are "essential ingredients" of transactions, but the "Supply Chain" qualifier identifies that the source of this master data is one or more of your trading partners in a supply chain. For example, the product master data that a wholesaler or retailer relies on may be defined and maintained automatically by the original manufacturers of the products and arrive in their systems through a global data synchronization mechanism like GS1's GDSN. Dealing with Supply Chain Master Data will add an extra layer of complexity over traditional Master Data.

    Instance data exists once products or assets are serialized, and it refers to the data associated with a given serialized unit. An example of instance data would be the lot number and expiration date on a unit of sale of food or a pharmaceutical within the supply chain.

    See my essay on these types of data at Master Data, Supply Chain Master Data and Instance Data.

    Dirk Rodgers.

  6. Having moved from being a techie, into supporting Sales, Finance and now into Marketing it is always interesting to see the various views of data from these perspectives and agree that dependent on your expertise, data opinions will vary.

    I do have one query though, what is the definition for Audit data ? How is this different to Transactional data ? (Is it data about data that can help drive data quality ?) it's mentioned as one of the four possible data types, but no mention of what it means - any insight please ?

  7. Hey Adam -

    I thought no one would ever ask! Thanks!!!

    You can look at Malcolm Chisholm's work on this; it's enlightening (http://www.information-management.com/issues/20060401/1051002-1.html).

    My quick take is that audit data are data that are "metadata about transactions/events." These are data about when the transaction occurred, from where, what spawned/triggered it and other contextual info. Audit data can also include data about the results of the transaction and actions taken as a result of a transaction/event.
    It's similar to data you'd expect to see in a log file, as well as in a metrics DB.

    Hope this helps!

  8. I am currently involved with my company in a data management effort and we have a similar discussion around 'master data'. I basically know most of the opinions/reasoning frameworks from veteran authors in the field (Chisholm, Loshin, etc.) but I still feel that something more subtle is missing in their examples. I would like to give a real example from my industry:

    In the telecommunication industry the classical 'sales order' somehow also exists, but it is typically executed within, or even creates, a customer agreement. The difference here (from buying a bike) is that when purchasing a wireless post paid price plan for example ("the product"), one ("the customer") is usually bound to a period of time for using it (in my country here in Europe, 2 years typically! as you don't really get the 1 USD mobile phone for 1 USD :-) ).

    The customer agreement is an E2E business process (a constrained sequence of events) in the problem domain and not an Entity (like a business party); it gets created with an initial order transaction, then different booking transactions of supplementary products or cancellations can take place within this process; there are agreement prolongation transactions (a MUST for telco companies), there are agreement switch transactions (post paid to pre paid), etc. Simply put, it is the agreement that is the heart of the business!

    Now it is obvious that in the agreement-based transactions, 'no brainer' master data participates: product, Customer...BUT WHAT about the business object that stores the state of the agreement business process in the solution space, as we will have an "Agreement" Type in the database for recording the state of the contract. IS THIS ALSO MASTER DATA ? I would tend to say yes, but I don't understand why it isn't mentioned at all ....

    In a more general sense, I would tend to say that 'Entities' of the real world (Parties) or of a constructed business world (Price Plan Product, Insurance Policy Product) are the obvious master data; I feel, however, that there are also other Business Objects modelling E2E Processes around Business relationships (like Agreements) and these also have to be referenced in Business transactions. BTW, are business transactions from this point of view 'atomic events'? An E2E process definitely has its own lifecycle.

    Coming back to the 'Sales order', which was classified as Transactional data: what if a Bike Business would permit the following sequence of interactions:

    If I purchase a bike today (transaction = new order), cancel the order tomorrow (transaction = cancel bike order, given it was not shipped yet = business rule) and then cancel the cancellation the day after tomorrow (transaction = cancel cancellation)... would Sales Order still be Transactional data? It obviously at least participates in all the named transactions...

    Any similar concerns, thoughts, comments ?

    Thanks in advance!
    Adrian
