The start of Big Data

Let’s just examine again why these dimensions are important. There are two major aspects to consider: the business value that can be obtained from data, and the cost of storing and using it for meaningful query analysis. If we look at the latter, we can make a rough assumption that cost scales more or less in line with the volume of data we store – so it costs ten times as much to store ten terabytes as it does one. We could argue this point for hours, and I admit it is a gross over-simplification, but it will serve its purpose in this discussion. The key to understanding data volume, and therefore cost, is to understand the three dimensions above. For example, if we feel there is value for a mobile communications company in storing information about the calls people make, then the first thing to do is to decide at what level the call information should be held. Each individual call is manifest in a Call Detail Record (CDR), a single record generated for every individual telephone call made (or accepted).

Well, suppose we are an average-size mobile provider with 1,000,000 subscribers, each making ten calls per day, and we decide to hold individual CDRs – then we must be able to hold 10,000,000 new CDRs every day.

We must next decide which attributes of the CDR are important to us for decision-making. In fact, the CDR contains many, many attributes of interest (see a later chapter for details), and each has a physical size when it comes to storage in our computer systems. Let’s say for now, however, that there are at least 200 characters of useful information in each CDR, so we can multiply the daily figure above by 200 to find out how much data is created each day by people making calls:

10,000,000 multiplied by 200 equals 2,000,000,000 characters (or bytes) of data daily.

This is TWO GIGABYTES of data generated every day.

We now have to tackle the issue of history. Basically, we must decide how many months’ worth of CDR history we need to store to allow us to make meaningful (and predictive) business decisions. Let’s say for now that we opt for thirteen months, to allow us to do yearly month-on-month analysis. Well, there are approximately thirty days per month, so thirteen months is about 390 days, and we must multiply the daily figure by 390:

2 gigabytes multiplied by 390 equals 780 gigabytes of data over our thirteen-month window.
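The whole back-of-envelope calculation can be sketched in a few lines of Python. All the figures here are the illustrative assumptions from the text above, not real operator data:

```python
# Back-of-envelope CDR storage estimate, using the assumptions from the text.
subscribers = 1_000_000      # average-size mobile provider
calls_per_day = 10           # calls per subscriber per day
bytes_per_cdr = 200          # useful characters (bytes) kept per CDR
days_of_history = 13 * 30    # thirteen months at ~30 days each = 390 days

cdrs_per_day = subscribers * calls_per_day        # CDRs generated daily
bytes_per_day = cdrs_per_day * bytes_per_cdr     # raw bytes generated daily
total_bytes = bytes_per_day * days_of_history    # total over the history window

print(f"CDRs per day:  {cdrs_per_day:,}")
print(f"Bytes per day: {bytes_per_day:,} (~{bytes_per_day / 1e9:.0f} GB)")
print(f"Total history: {total_bytes:,} (~{total_bytes / 1e9:.0f} GB)")
```

Running this reproduces the figures above: 10,000,000 CDRs and 2 GB per day, and 780 GB over thirteen months.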

This is hardly ‘big data’, but some of the mobile companies in Europe have huge subscriber bases – 20 million is not uncommon – generating, on the same assumptions, roughly 15 terabytes of CDR data over the same period. Now THAT is BIG.
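To see how the estimate scales with subscriber count, the same arithmetic can be wrapped in a small helper function. This is a hypothetical sketch (the function and its defaults are my own, built from the assumptions in the text), not anything from a real charging system:

```python
def cdr_storage_bytes(subscribers, calls_per_day=10, bytes_per_cdr=200, days=390):
    """Estimated raw CDR storage over `days` days, using the text's assumptions."""
    return subscribers * calls_per_day * bytes_per_cdr * days

# A 20-million-subscriber operator, thirteen months of history:
total = cdr_storage_bytes(20_000_000)
print(f"{total / 1e12:.1f} TB")  # → 15.6 TB
```

So a 20-million-subscriber operator accumulates around 15.6 terabytes over the same thirteen-month window – twenty times the 780 GB figure, exactly as the linear cost assumption predicts.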


About bibongo

I'm a consultant in the field of Business Intelligence and have been since the mid 80's, which gives you some idea of my age! I'm privileged to have held senior positions with Teradata, Oracle, HP and EMC. I have an English son and a Swedish daughter separated by some 18 years, which is another type of welcome challenge!
