Sunday, November 11, 2007

Aggregating transactions

I suppose it is tempting, if the only tool you have is a hammer, to treat everything as if it were a nail --Abraham Maslow

We looked at the significant performance gains achieved by aggregating the balances fact. The case for aggregations is not as clear-cut with the transactions fact.

Sparse transactions

The key difference between balances and transactions for aggregations is that the transactions are sparse.

With balances, you need to maintain a balance for every customer, every product, every day irrespective of whether the balance changes or not. Thus, aggregation on balances always results in significant and known reduction in data volume.

You will NOT have a transaction for every customer, every product, every day. The transaction volume is small relative to balances:
30 million Accounts
90 million Balances per day
2 million transactions per day

Aggregating transactions does not help much

Let us continue with the in the transactions fact example:

Account (customer)
30 million records





Product

\

/

Account customer
demographics
mini dimension
-
Summarized Transactions
2 million records per day
-
Organization
(branch,
financial adviser)

/

\

Account status
mini dimension





Calendar

When the transactions are aggregated by eliminating the customer dimension, the chances are much higher that there will be a transaction for every day, every product, every organization unit, and every account status, etc. The volume of aggregated transactions fact table, therefore, ends up being very similar to the aggregated balances table that was derived after eliminating the customer dimension:

Account customer
demographics
mini dimension





Product

\

/



Aggregated Transactions
1 million records per day
-
Organization
(branch,
financial adviser)

/

\

Account status
mini dimension





Calendar

You do not achieve the kind of performance improvement here that was achieved for the aggregated balances fact. The dimensionality of the balances and transactions being same at this point, they can be stored together in the same table. All the types of aggregations including the reduction in time periods stored can now be applied to this fact.