Copy Management as the Answer?

The modern-day solution to the ‘data into intelligence’ problem has been to implement Data Marts and Data Warehouses. This ‘science’ is rather mature these days, but there is still incredible confusion as to what they are, how they should be structured and who should use them. Incredibly, many Data Warehouses still under-perform, so in this chapter we will introduce the different architectural concepts and define clear roles for the Data Mart and the Data Warehouse.

Basically, all contemporary solutions to the Business Intelligence challenge involve copy management – copying data from its source system to another ‘place’ where it can be used to provide reports and answer queries. There are several alternative architectures, all of which are in use today. As we review them we should bear in mind that copying data is an architectural sin. It always costs money and brings opportunity for error. It should be avoided at all costs.


The Data Mart

Basically, Data Marts are the fundamental elements of Business Intelligence today. We may not like it, it may be a flawed strategy, but the fact is that there are hundreds of thousands of Data Marts out there because they make people happy in many ways:

  • They provide value.
  • They are easy to build and maintain.
  • They give the users a sense of power.
  • They provide the vendors with lots of income.

In short, everyone wins except the purists … or do they?


What Is a Data Mart?

Everyone knows at least two generic things about data. They know that important data is spread all over the organisation, making it impossible to correlate in any meaningful way, and they know that nearly all reporting needs data from multiple systems.

We have already discussed these two statements in great detail. Given that both are true and not likely to change any time soon, there has been one overwhelmingly attractive solution: the Data Mart.

The Data Mart is a set of data copied from another system or systems that is stored in a single database and is structured in a way that supports certain pre-defined business reporting activities.

Probably the key characteristic of the Data Mart is that it supports a pre-defined type of business information requirement, and its data can thus be structured in a way to enhance performance and ease of use for these specific requirements. Because Data Marts have limited scope it is common to find many of them in the same enterprise, and because they are usually small in ‘size’, cost is not high and so Data Marts can be ‘purchased’ without complex ROI cases and CFO approval. On the same theme, and once again because the scope of the Data Mart is limited, it is often possible to avoid high-cost items common in the Data Warehouse world, such as complex ETL and backup/restore capabilities.
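
To make the definition concrete, here is a minimal sketch of a Data Mart in Python using the standard sqlite3 module. Everything in it is an assumption made for illustration: the two source systems (an order-entry system and a CRM) are represented by plain lists, and the table name sales_by_region, the function build_sales_mart and the sample figures are hypothetical. It shows only the basic idea: data is copied from more than one source, stored in a single database, and shaped for one pre-defined report.

```python
import sqlite3

def build_sales_mart(orders, customers):
    """Copy rows from two hypothetical source systems into one mart table."""
    mart = sqlite3.connect(":memory:")
    # The mart is structured for one known report: revenue by region.
    mart.execute(
        "CREATE TABLE sales_by_region ("
        "region TEXT PRIMARY KEY, revenue REAL, order_count INTEGER)"
    )
    # Correlate the two sources on customer id.
    region_of = {cust_id: region for cust_id, region in customers}
    totals = {}
    for cust_id, amount in orders:
        region = region_of.get(cust_id, "UNKNOWN")
        revenue, count = totals.get(region, (0.0, 0))
        totals[region] = (revenue + amount, count + 1)
    mart.executemany(
        "INSERT INTO sales_by_region VALUES (?, ?, ?)",
        [(region, rev, n) for region, (rev, n) in totals.items()],
    )
    mart.commit()
    return mart

# Assumed sample data standing in for the source systems.
orders = [(1, 100.0), (2, 250.0), (1, 50.0)]   # (customer_id, amount)
customers = [(1, "North"), (2, "South")]        # (customer_id, region)

mart = build_sales_mart(orders, customers)
rows = mart.execute(
    "SELECT region, revenue, order_count FROM sales_by_region ORDER BY region"
).fetchall()
print(rows)
```

Note that the report query is trivial because the mart was shaped for it in advance. A question the mart was not designed for, such as revenue per customer, cannot be answered from this table at all, which is the limited scope described above.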


What Are the Pros and the Cons of the Data Mart?

Let’s start with the good:

  • Generally the total cost of ownership of a Data Mart can be low.
  • Performance for a known set of queries should be good.
  • Change requests should be easy to execute.
  • Privacy and security issues are generally simple.

Let’s move on to the bad:

  • When an organisation adds up the cost of all the Data Marts it has deployed, the total can be surprisingly high.
  • The same is true of total effort and total cost of ownership.
  • By splitting the overall problem into many parts (marts), computers and software are not used in an efficient way.

Let’s now finish with the ugly:

  • Data Marts are rarely treated with the rigor of a Warehouse or operational system. Too often the data is corrupt, incomplete or too old, and the really ugly part is that too many times, reports from Data Marts are badly misleading or simply incorrect.
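
As an illustration of the kind of rigor that is usually missing, here is a minimal sketch of a check that could run before a mart is used for reporting. It is a set of assumptions, not a prescribed method: the function audit_mart, the one-day freshness window, the field names and the sample rows are all hypothetical. It flags the two problems named above that are easiest to detect mechanically, data that is too old and data that is incomplete.

```python
from datetime import datetime, timedelta

def audit_mart(rows, required_fields, loaded_at, now, max_age=timedelta(days=1)):
    """Return a list of problems found in a mart extract (empty list means it passed)."""
    problems = []
    # "Too old": the last load is older than the assumed freshness window.
    if now - loaded_at > max_age:
        problems.append("stale")
    # "Incomplete": count rows missing any field the reports depend on.
    incomplete = sum(
        1 for row in rows if any(row.get(field) is None for field in required_fields)
    )
    if incomplete:
        problems.append(f"incomplete rows: {incomplete}")
    return problems

# Assumed sample extract: one row has no revenue figure.
rows = [
    {"region": "North", "revenue": 150.0},
    {"region": "South", "revenue": None},
]
problems = audit_mart(
    rows,
    required_fields=("region", "revenue"),
    loaded_at=datetime(2024, 1, 1),
    now=datetime(2024, 1, 3, 12),
)
print(problems)
```

A check like this does not catch corrupt values that merely look plausible, so it reduces the risk of misleading reports rather than removing it.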

About bibongo

I'm a consultant in the field of Business Intelligence and have been since the mid-'80s, which gives you some idea of my age! I'm privileged to have held senior positions with Teradata, Oracle, HP and EMC. I have an English son and a Swedish daughter separated by some 18 years, which is another type of welcome challenge!

3 Responses to Copy Management as the Answer?

  1. Sometimes the data must be migrated from one platform (hardware) to another platform (hardware), for the better performance, high availability or scalability (processing and/or storage) of the new platform.
    About your considerations on datamarts… Your approach is a top-down architecture (coming from a corporate/enterprise data warehouse down to a small, departmental point-of-view datamart). There is another approach: bottom-up. From datamarts to corporate data warehouse.
    Which one to use? It depends on how fast the end user wants to get results.
    Another use of datamarts: as a datalab. It is used in analytical development environments, based on letting the end user “play” with the data… When the user arrives at a stable model… it is then applied at data warehouse scale.
    And nowadays… you have the virtualization approach: take a physical data warehouse… and slice it into several virtual datamarts.
    Best regards.

  2. marcothesane says:

    The Atomic Data Warehouse (cf. Inmon) is a moderately normalised data model that should be the famous Single Point of Truth.
    It is the source of queries that no-one has foreseen, of queries that bring insight for strategic decisions. It is not essential, for a strategic decision, whether such a query takes a minute or 12 hours.
    It usually also takes a day or two to formulate such a query in the first place.
    Performance becomes important when we have many queries whose nature is known in advance. For these queries/reports, we optimise the underlying physical data model – and keep it up to date if requirements change. This physical data model can be a Kimball-ian Data Warehouse with Conformed Dimensions (they can perform very nicely) or a set – or a plethora – of multi-dimensional data marts, built as independent star models or multi-dimensional cubes stored in a proprietary fashion.

  3. marcothesane says:

    I forgot to add something that’s possibly obvious, but maybe the point should be made:
    From the OLTP sources to the Data Warehouse and to the dimensional models – data is not copied. It’s cast into a different shape to serve a new purpose – while the old shape is still needed.
