Uniqueness is in the Eye of the Beholder

Who do you see? Which identity would you assign?
I just read an interesting blog post from William Sharp at the DQ Chronicle, and it brought to mind an "ah ha" moment I had several years ago about uniqueness and identity matching. I realized then that what one group considers unique, another group considers non-unique or a duplicate.
Here's the scenario in a large nutshell (a coconut shell perhaps?):
I've used Intuit products for managing various aspects of my finances and taxes for decades (disclosure: I worked for Intuit from 2001-2007). Over that time, I've had 5+ different mailing addresses, 4+ email addresses (not counting work email addresses), 8+ phone numbers, and I've used several name combinations to establish accounts.
I've licensed one product for managing our own household finances, another product for several small businesses, and I've licensed and used some of those same products for other organizations' needs.
During that period, I've used several different credit cards (I don't even remember how many) – maybe a different one for every license, so let's say I've potentially used eight different cards for purchasing licenses.
Some of those credit cards were issued to me, with similar Personally Identifiable Information (PII) as the info I used to create the account with Intuit. Some of those were issued to my company, or another organization, or to my wife, for example.
When I signed up for accounts for those products, I've used my own name, some variant of my name (nickname, middle name, etc.), a company name, or another person's name.
Finally, over the years I've opted out of being contacted for some products (the really simple ones) but have opted in on being contacted for some more difficult scenarios, so I have had several sets of privacy preferences. I won't mention that our oldest son shares my name - that's a whole additional level of ambiguity!
So, now the big question: How many customers am I to Intuit? Or, "Who is Marty?"
If I am in marketing, I may be one single customer. After all, there's a guy named Marty Moseley living at this address - how many of them can there be at this address, after all? Or, if I am marketing by line of business, I might be three customers: one for each family of products I've licensed over the years.
If I am in finance, I am potentially eight different identities: the billing addresses and names on the cards differ, so finance has little confidence that they belong to the same individual, and it sets a fairly conservative threshold for false positives. You definitely don't want to mix up credit cards and account holders!
If I am in the Chief Privacy Officer's world and I look at privacy preferences and PII, I may align the customers along preferences, since that is what matters most. So, I may be two or three different customers.
After all, you don't want to mix up people and their privacy preferences! That's the mother-of-all privacy nightmares (we just had a prospect call us in a panic because that happened and they now realize the importance of good matching and linking!).
So, am I one customer? Three customers? Eight customers? Why or why not? You see (pun intended), the determination of uniqueness depends on the algorithm and the user asking the question.
There are a lot of therefores, so let me try a few out. Therefore:
1) You need to keep all underlying records of "me" separate so you can create virtual views of "me" according to changing rules,
2) You need to be able to run several different (even competing) definitions of "uniqueness" simultaneously, and
3) You'd better have some pretty advanced algorithms to help you manage your false positive and false negative thresholds!
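Here is a minimal sketch of the first two "therefores": the raw records stay separate, and each group applies its own rule to build a virtual view. Everything in it is an assumption made for illustration: the sample records, the field names, and the exact-key grouping, which stands in for the far more tolerant probabilistic matching a real system would use.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AccountRecord:
    """One raw account record; field names are hypothetical."""
    record_id: int
    name: str
    address: str
    product_family: str
    card_last4: str
    contact_pref: str  # "opt_in" or "opt_out"


# Invented sample data: four records that may or may not be one person.
RECORDS = [
    AccountRecord(1, "Marty Moseley", "1 Main St", "personal finance", "1111", "opt_out"),
    AccountRecord(2, "Martin Moseley", "1 Main St", "small business", "2222", "opt_in"),
    AccountRecord(3, "Marty Moseley", "1 Main St", "small business", "3333", "opt_in"),
    AccountRecord(4, "M. Moseley", "1 Main St", "tax", "1111", "opt_out"),
]


def surname(name):
    """Crude normalization: lower-cased last token of the name."""
    return name.split()[-1].lower()


# Each view is just a different definition of "the same customer".
VIEWS = {
    "marketing": lambda r: (surname(r.name), r.address.lower()),
    "marketing_by_lob": lambda r: (surname(r.name), r.address.lower(), r.product_family),
    "finance": lambda r: (r.name.lower(), r.card_last4),
    "privacy": lambda r: (surname(r.name), r.address.lower(), r.contact_pref),
}


def resolve(records, key_fn):
    """Group record ids by a view's key without altering the records."""
    groups = defaultdict(list)
    for record in records:
        groups[key_fn(record)].append(record.record_id)
    return dict(groups)


def customer_counts(records):
    """How many 'customers' each view sees in the same set of records."""
    return {view: len(resolve(records, key_fn)) for view, key_fn in VIEWS.items()}


if __name__ == "__main__":
    for view, count in customer_counts(RECORDS).items():
        print(f"{view}: {count} customer(s)")
```

With this invented data, marketing sees one customer, marketing by line of business sees three, finance sees four, and privacy sees two, all from the same four untouched records. Note that the marketing rule would also fold in a son who shares the name and address, which is exactly the kind of false positive the third "therefore" is about.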
What other "therefore's" would you propose?
Marty/Martin/wmmarty/wlmarty
12 Responses »
Nice post, Marty.
Due to the choice of picture, I can't help but provide a link to my recent blog post:
The Point of View Paradox
Best Regards,
Jim/Jimmy/James/ocdqblog
Unfortunately the world is crowded with Martys.
Therefore I usually tend to approach data quality (and not least data matching) in this hierarchy:
• First gain real world alignment by having data correctly represent the real-world construct to which they refer
• Next make sure that data are fit for their multiple intended uses
All the best
Henrik/liliendahl/hlsdk
Great post, Marty,
I find "worldview" a very powerful concept to use when talking to enterprises on the subject of uniqueness.
Helping enterprise management define where their particular "world" starts and ends greatly assists them in defining what makes one occurrence of an entity distinctly different from all other occurrences of that entity (i.e. unique) in that world.
Asking questions such as, "From the point of view of you as a training organisation, what makes one customer distinctly different from all other customers?" helps to focus their thinking.
Too often, of course, the initial response will be, "customer number"! Having then explained that, because a code never "identifies" an entity**, it can never uniquely identify it, we can move on to define true uniqueness as it applies to the defined worldview.
There may be several worldviews within one enterprise. In the training organisation, asking the question, "From the point of view of you as an employer, what makes one employee distinctly different from all other employees?" may give a different definition.
Getting organisations to define uniqueness in these terms helps them begin to realise that some of the mechanisms they have previously used to try to implement it, such as Customer No, are actually the mechanisms that allow duplication.
It also moves them closer to being able to develop mechanisms that prevent them from creating duplicates in their data in the first place.
** See my post on QUACKS and UIDs
Regards
John/johnimm
Hey Jim -
I think I missed that post!
I was really hoping for some images from the "Where's Waldo" series, but they're all copyrighted and licensed, so no go...
It's a better visual, imho.
...or the Sesame St. song "One of these things is not like the other..." (whatever you do, don't sing that song... it'll stick in your head all day)
The best viz image would be a series of pics of me over time, w/ one of my son stuck in the middle - who are the same person and who are different kind of thing...
Cheers!
One of these Martys is not like the others,
One of these Martys just doesn't belong,
Can you tell which Marty is not like the others
By the time I finish my song?
Some of these Martys belong together
Some of these Martys are kind of the same
But one of these Martys is doing his own thing
Did you guess which Marty was not like the others?
Did you guess which Marty just doesn't belong?
If you think it's really hard to pick the one that's not like the others,
Well, then you're absolutely...right! You're soooo smart
. . .
As for the original Sesame Street version, I'll let the Cookie Monster sing:
One of these things is not like the other things
Make
it
STOP!!!
ARGH!!!!!
A GREAT discussion on this post has also picked up over on LinkedIn - check it out there for more good dialog: http://bit.ly/cdCiCv
Jeff Huth, another Initiate blogger, has extended this concept in his newest post, Uniqueness in the Eye of the NSTIC, about how government should consider uniqueness of identity. Read it here: http://blog.initiate.com/index.php/2010/07/12/uniqueness-in-the-eye-of-the-nstic/.
What a great thread! I spend my days examining and setting standards for Master Data morning, noon and night! Yes, Master Data is just that exciting! OK, really it can be fun and exciting. Over the past two and a half decades, I have noticed a few things. No one agrees on a clear definition of master data and no one can say definitively for any one company exactly what is master data. Let's try to tackle this with a different approach. I like the fact that a previous author in the thread mentioned data governance.
Data governance is a starting point for those organizations that are comfortable working from the big picture (high altitude) perspective down to the grass roots (table-field) level. It anticipates issues of uniqueness and how to handle the process to find distinctions where not previously identified. Data governance provides the template for the conversation across the organization. Topics should include rules, tools, and standards. Discussions of rules permit a perspective of relationship between business process and the tools of technology (i.e. which ERP system). The objective of a single version of the truth is highly complex when dealing with heterogeneous environments. When corporations have multiple systems communicating in real time, real world conflicts can arise. When we founded Black Watch Data we were looking for tools that provided on the fly, if you will, conversions between systems. We could not find any solid middle-ware permitting this level of communication. Therefore to work we went.
Selecting tools and standards is not easy. It is very easy to miss key elements particularly as so much occurs. Uniqueness is often an issue that arises when the standards weren't fully and completely aligned. Guiding the conversations and negotiations is as important as anything else during governance discovery.
Taking these standards, rules and elements of discovery to the system/technology world is a task in and of itself. Making sure that nuances of business processes are properly permitted and that configuration changes get noted and followed up on requires a highly dedicated staff. My team uses numerous tools that appear to be duplicative in nature, but ensure risk management in that we do not leave any stone unturned. We have a responsibility in master data to make sure that cleansings, enrichments, and migrations occur in a flawless manner such that at the time of go live the project is a raging success. Yet go live is only the first day of the rest of production.
Production today deserves the opportunity to have a continuous cleanse, a constant monitoring of the data, the persistent inspection of the data to ensure that it continues to improve. Quarterly data quality reports should demonstrate improvement rather than the standard degradation. Uniqueness and the many other issues of unplanned data governance can creep into the workday if not considered welcome friends of discovery in data governance.
In summary, when uniqueness is lacking, the answer is to go back to governance and the guidance it has provided. Unfortunately, this is the hole that exists in many, if not most, companies. As a result, the companies are sent into a fox hole and they anguish over competing values. Business units battle each other to figure out who will win. It is a sad day for master data when those battles occur. It is a great day for data governance and master data when a good plan is put together and a process is in place to resolve or negotiate these issues, which will appear in every system sooner or later.
@Andy - thanks for your insightful comments on my post!
I'd love your thoughts on another blog where I discussed the meaning of Master data vs Reference data vs Transactional data (didn't cover audit data in that one).
I think people need to hear your voice on this!!!
Cheers!
Marty did indeed write a post following up Andy Mathewson's great comment. Check it out!
- Crysta Anderson, Editor