The aggregate tables enhance reporting performance1 by reducing the amount of data that needs to be queried for widely used reports. Aggregate awareness at the database or the query tool makes the use of aggregate tables transparent to the user of the data warehouse.
Consider the example star schema for a financial services company as a simple case for aggregations:
| Account (customer) 30 million records | Product | |||
| \ | / | |||
| Account customer demographics mini dimension | - | Balances 90 million records per day! | - | Organization (branch, financial adviser) |
| / | \ | |||
| Account status mini dimension | Calendar |
The balance fact is huge: assuming that there are an average of 3 financial instruments per account, you have 90 million balance records inserted everyday for the 30 million accounts!
The balances may be checking account balances, investment instrument positions, or the loan principal amounts depending on the business you are involved in.
Limiting the balance fact volume by reducing the number of time periods stored
Before you even consider aggregations, consider changing the granularity of the balance fact from daily to weekly / monthly / quarterly / yearly for historical data. For trend reporting done at the data warehouse, weekly or monthly snapshots for historical trends should be sufficient in most situations.
For example, only month-end balance amounts may be stored for data older than 2 months. This query provides the required month end fact balances for the new fact:
This query shows the fact records that are preserved for historical data. Effectively, you delete the records not returned by this query. The actual implementation will probably insert the records returned by the query into a separate partition and drop the original partitions with daily data, in the range partitioned fact table.SQL> SELECT demo_id, status_id, product_id, org_id, balances.calendar_id, balance_amount,
SQL> FROM balances, calendar
SQL> WHERE balances.calendar_id = calendar.calendar_id
SQL> AND calendar.month_end = 'Y';
Aggregations on the balance fact by eliminating the customer dimension
Keeping the balances at the customer level adds significant volume to the fact. A large number of queries that do not contain any customer dimension attributes will benefit if a new fact is created without the customer dimension. This new aggregated fact is derived from the original detailed fact shown above by running the aggregation query:
SQL> SELECT demo_id, status_id, product_id, org_id, calendar_id, sum(amount)
SQL> FROM balances
SQL> GROUP BYdemo_id, status_id, product_id, org_id, calendar_id;
| Account customer demographics mini dimension | Product | |||
| \ | / | |||
| Aggregated Balances 1 million records per day | - | Organization (branch, financial adviser) | ||
| / | \ | |||
| Account status mini dimension | Calendar |
Note that the account-customer dimension has been eliminated. All the other dimensions of the detailed star are used in this aggregated star schema.
Other aggregations on the balance fact
The change in time period granularity and elimination of the customer fact provides the biggest performance benefit, and will be sufficient in most situations. If required, several other aggregated facts can be created by eliminating other dimensions, or more than one dimensions simultaneously. Aggregation through elimination of relevant dimensions is recommended for it is easy to implement and understand. It will suffice when larger dimensions were appropriately broken into mini dimensions in the first place.
Do NOT create aggregations by eliminating the dimensional attributes (creating new dimensions)
It is also possible to create aggregated facts by eliminating some attributes from the dimensions rather than the complete dimensions. For example, detailed product attributes may be eliminated from the product dimension leaving only the higher level product categories. Similarly, detailed branch and financial adviser attributes may be eliminated from organization dimension leaving only the higher level regional attributes. This will require creation of new dimensions with only the new limited set of attributes. The new aggregated fact will now be linked to these new dimensions. All the queries that do not use the detailed product or organization attributes can now be executed against this new aggregated fact for better performance.
| Account customer demographics mini dimension | Product category | |||
| \ | / | |||
| More Aggregated Balances 0.2 million records per day | - | Organization regions | ||
| / | \ | |||
| Account status mini dimension | Calendar |
Note that this aggregated star uses a new balance fact as well as new dimensions created specifically to support this level of aggregation. While this will certainly improve performance for some queries, the cost of aggregate maintenance has to be balanced against the marginal benefit of performance improvement. This is not recommended as a general approach to aggregations. There might be a genuine need for extreme query performance that can be achieved by this type of aggressive aggregation, and appropriate products such as the multidimensional databases should be considered in that situation instead of implementing this requirement as aggregations on the relational star schemas.
Do create aggregations by eliminating the dimensional attributes
There is one situation, already discussed above where this approach is recommended. The reduction in fact volume by reducing the number of time periods stored is actually a special case of this aggregation approach.
When attributes from a dimension such as the product are removed, you need to create a new dimension. For example consider the original product dimension:
| Product Key | Product Name | Class | Capitalization | Domestic | Style | ... |
| 1 | Great Index | Equity | Large | Y | Core | ... |
| 2 | Strong Value | Equity | Large | Y | Value | ... |
| 3 | Tremendous Value | Equity | Large | Y | Value | ... |
| 4 | Grand Value | Equity | Large | Y | Value | ... |
| 5 | Mungo Small | Equity | Small | Y | Value | ... |
| 6 | Mini small | Equity | Small | Y | Value | ... |
| 7 | Super Value | Equity | Large | Y | Value | ... |
| 8 | Global Hero | Equity | Large | N | Value | ... |
| 9 | International Intrigue | Equity | Small | N | Growth | ... |
| ... | ... | ... | ... | ... | ... | ... |
A new dimension has to be created when some attributes, for example the individual product level details, are eliminated:
| Product Cat Key | Class | Capitalization | Domestic | Style | ... |
| 1 | Equity | Large | Y | Core | ... |
| 2 | Equity | Large | Y | Value | ... |
| 2 | Equity | Small | Y | Value | ... |
| 4 | Equity | Large | N | Value | ... |
| 5 | Equity | Small | N | Growth | ... |
| ... | ... | ... | ... | ... | ... |
Compare this to the calendar dimension where the (shaded) month-end record is used to represent the month:
| Calendar Key | Date | Day | Month | Month End | Year | ... |
| 20071126 | 26-NOV-2007 | Monday | 11 | N | 2007 | ... |
| 20071127 | 27-NOV-2007 | Tuesday | 11 | N | 2007 | ... |
| 20071128 | 28-NOV-2007 | Wednesday | 11 | N | 2007 | ... |
| 20071129 | 29-NOV-2007 | Thursday | 11 | N | 2007 | ... |
| 20071130 | 30-NOV-2007 | Friday | 11 | Y | 2007 | ... |
| 20071201 | 01-DEC-2007 | Saturday | 12 | N | 2007 | ... |
| ... | ... | ... | ... | ... | ... | ... |
The new month-level calendar dimension is not created, the usage of month-end records to represent the month is equivalent to creating a new calendar dimension:
| This dimension is not required and not created | Calendar Month Key | Month | Year | ... |
| 1 | 9 | 2007 | ... | |
| 2 | 10 | 2007 | ... | |
| 3 | 11 | 2007 | ... | |
| 4 | 12 | 2007 | ... | |
| ... | ... | ... | ... |
Since the original calendar dimension is used, the same fact can contain both daily balances for recent history and only weekly or monthly balances for older history. Mixing of different grains in this fashion is not possible with other dimensions. (Addition of a new attribute, 'Aggregate Level', to the dimensions do make it possible to mix the different dimension and fact aggregations in the same tables of the star. This is not recommended, and not discussed here).
Aggregate awareness
This query will run much faster if the aggregated fact, shown above, created by eliminating the customer dimension is used instead of the detailed fact used here:
Aggregation awareness is the ability of the database or the reporting tool to do this substitution automatically:SQL> SELECT organization.branch, sum(balance)
SQL> FROM balances, organization, calendar
SQL> WHERE balances.calendar_id = calendar.calendar_id
SQL> AND balances.org_id = organization.org_id
SQL> AND calendar.month = 11
SQL> AND calendar.year =2007;
The query user does not know anything about all the aggregated tables that have been created. The query user writes the first query and the second query is executed in the background with an order of magnitude improvement in the performance.SQL> SELECT organization.branch, sum(balance)
SQL> FROM aggregated_balances, organization, calendar
SQL> WHERE aggregated_balances.calendar_id = calendar.calendar_id
SQL> AND aggregated_balances.org_id = organization.org_id
SQL> AND calendar.month = 11
SQL> AND calendar.year =2007;
1 While aggregations are very effective for balance facts, that may not be the case for transactions.