Transforming Data Warehouses With SOA

There are lots of bright ideas about leveraging SOA to improve your data warehouse.

In my previous two blogs I ranted a little bit about what was broken with current data warehouse ecosystems.

The first post reviewed the extreme pressures we’re under today to make better business decisions with more risk, less time, greater transparency and more obstacles than ever before. Then, in the last blog I discussed architectural problems.

So in this blog, I’d like to start talking about remedies – enough throwing stones!

Let’s focus on SOA in this blog. Remember, SOA is about architecture – it’s not just using web services. I wish I could devote 10 blogs to SOA because it’s that important [ed note – you can!]. But there’s a lot of great material out there on SOA so I’ll stick to my knitting. Here are some examples of leveraging SOA in your DW ecosystem:

Decoupling
One of the main principles of SOA is decoupling computing assets from one another, which reduces the ripple effects and change-management overhead when the inevitable changes happen.

Instead of extracting data from a source system’s database and inserting it directly into another physical schema, a decoupled design extracts the data and converts it into a form that is independent of any specific system’s schema or unique constraints.
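
To make that concrete, here is a minimal Python sketch of the conversion step. The column names (`CUST_NO`, `CUST_NM`, `OPEN_DT`, `STAT_CD`), the status codes, and the neutral field names are all invented for illustration; a real source system will have its own quirks.

```python
from datetime import date

# Hypothetical row as it might come out of a source system's table.
# Column names and encodings are assumptions made up for this example.
source_row = {
    "CUST_NO": "0001234",
    "CUST_NM": "ACME CORP  ",
    "OPEN_DT": "20090115",
    "STAT_CD": "A",
}

# Source-specific status codes translated into plain words.
STATUS_CODES = {"A": "active", "C": "closed"}


def to_neutral(row):
    """Convert a source-specific row into a schema-independent record."""
    raw = row["OPEN_DT"]
    opened = date(int(raw[:4]), int(raw[4:6]), int(raw[6:8]))
    return {
        "customer_id": row["CUST_NO"].lstrip("0"),  # drop zero padding
        "name": row["CUST_NM"].strip().title(),     # drop fixed-width padding
        "opened_on": opened.isoformat(),            # ISO 8601 date string
        "status": STATUS_CODES[row["STAT_CD"]],
    }


neutral = to_neutral(source_row)
print(neutral)
```

The point is that nothing downstream of `to_neutral` needs to know about zero-padded keys, fixed-width names, or single-letter status codes. If the source system changes those, only this one function changes.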

Autonomy
Autonomy follows decoupling, and means that each service can evolve independently without being impacted by any assumptions about the other services with which it interacts.

If endpoint systems’ services refrain from imposing their own physical implementation nuances and constraints on all other participating systems, it becomes easier to build systems that are self-reliant and impervious to change, instead of being interdependent.

Integration via Document Exchange
When decoupling services, you move data via documents. These documents, most commonly codified as XML files, are created by one service to be consumed by other services. Because the services are autonomous, the documents contain little or no private data, few constraints on the sending or receiving services, and no assumptions about how the document will be processed by participating systems.
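
As a rough sketch of what such an exchange might look like, the Python below builds a small XML document in one function and reads it back in another, using only the standard library's `xml.etree.ElementTree`. The `Customer` element and its fields are hypothetical, and the two functions stand in for two separate services.

```python
import xml.etree.ElementTree as ET


def build_customer_document(record):
    """Producer side: serialize a record as a self-describing XML document."""
    root = ET.Element("Customer")  # element name is an assumption
    for field, value in record.items():
        ET.SubElement(root, field).text = str(value)
    return ET.tostring(root, encoding="unicode")


def read_customer_document(xml_text):
    """Consumer side: parse the document without knowing who produced it."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


doc = build_customer_document({"customer_id": "1234", "name": "Acme Corp"})
received = read_customer_document(doc)
print(doc)
print(received)
```

Notice that the consumer never touches the producer's tables or code. The document is the only thing the two sides share.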

Canonical Schemas
Although not a strict SOA tenet, the best way to create shared documents is to make them “generic” to facilitate sharing and reuse.

By forcing endpoint systems to map their own internal, private structures to a reusable generic schema, systems can communicate more easily, because they do not have to know anything about any other system in their universe. Now each system only needs to know how to translate its private representation to/from a generic schema. Much simpler!
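
Here is a small Python sketch of the idea, assuming two made-up systems (a CRM and a billing system) whose private field names are invented for the example. Each system owns exactly one mapping to the generic schema and knows nothing about the other.

```python
# Hypothetical mappings: private field name -> canonical field name.
CRM_TO_CANONICAL = {"cust_name": "name", "cust_ref": "customer_id"}
BILLING_TO_CANONICAL = {"ACCT_HOLDER": "name", "ACCT_NO": "customer_id"}


def to_canonical(record, mapping):
    """Translate a system's private record into the canonical form."""
    return {mapping[k]: v for k, v in record.items() if k in mapping}


def from_canonical(record, mapping):
    """Translate a canonical record back into a system's private form."""
    reverse = {canonical: private for private, canonical in mapping.items()}
    return {reverse[k]: v for k, v in record.items() if k in reverse}


crm_record = {"cust_name": "Acme Corp", "cust_ref": "1234"}
canonical = to_canonical(crm_record, CRM_TO_CANONICAL)
billing_record = from_canonical(canonical, BILLING_TO_CANONICAL)
print(canonical)
print(billing_record)
```

The arithmetic is what makes this attractive: with n systems talking point to point, you may need up to n(n-1) one-way translations, whereas with a canonical schema each system needs only its own "to" and "from", or 2n in total.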

Integration via Messages
What if instead of receiving terabyte-sized batch files, the staging portion of a data warehouse ecosystem could ingest transactions as they occur from a message bus? This is a huge mental shift for many data warehouse architects, who have focused on moving ever larger files around.

But imagine the simplicity of subscribing to those business objects, and not having to care about how many internal tables, rules, constraints, processes, intermediate forms, and systems were involved in creating them. This could eliminate an order of magnitude of work from the data warehouse architecture.
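
To show the shape of the subscribing side, here is a toy Python sketch. `InMemoryBus` is a made-up, in-process stand-in for a real message bus (it is not any broker's actual API), and the topic names and message fields are assumptions.

```python
from collections import defaultdict


class InMemoryBus:
    """Toy stand-in for a message bus; not a real broker client."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        # Deliver the message to every handler subscribed to this topic.
        for handler in self._subscribers[topic]:
            handler(message)


staging_area = []  # stands in for the warehouse's staging tables


def stage_order(message):
    """Warehouse-side handler: land each business object as it arrives."""
    staging_area.append(message)


bus = InMemoryBus()
bus.subscribe("orders", stage_order)

# Source systems publish finished business objects as transactions occur.
bus.publish("orders", {"order_id": "A-1", "amount": 250.0})
bus.publish("orders", {"order_id": "A-2", "amount": 75.5})
bus.publish("customers", {"customer_id": "1234"})  # no subscriber, ignored

print(staging_area)
```

The staging code only sees finished order objects on the topic it subscribed to. Everything that went into producing them stays inside the publishing system.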

OK, next Thursday, it’s MDM! Your thoughts on the connection?

