Dimensional modeling, explained by Ralph Kimball, provides industry specific techniques to ensure that:
- The number of rows in the dimension tables is small. The row size can be large, row size is the sum of bytes stored for all the attributes in a row;
- The row size of the fact table is small. The number of rows can be large, use range partitioning for fact tables with very large row counts.
| Transaction detail (junk dimension) Transaction detail key Transaction type Payment type Coupon code Placement code Update date 50K rows, type 1 | Product Product key SKU Category Sub-category Color Style Effective date Effective end date 100K rows, type 2 | |||||
| one | one | |||||
| many | many | |||||
| Sales Transaction Calendar key Transaction detail key Product key Sales Org key Transaction number* Transaction line* Dollar retail amount Dollar discount amount Dollar tax amount Cost of goods sold Insert date 10,000K rows per partition 1 partition = 1 month * degenerate dimension | ||||||
| many | many | |||||
| one | one | |||||
| Sales Organization Sales Org key Clerk Manager Store Region Effective date Effective end date 50K rows, type 2 | Calendar Calendar key Calendar Date Day Week Weekend Holiday Month Month_end Quarter Year Update date 10K rows, type 1 |
The foreign keys on the fact table - Calendar key, Transaction detail key, Product key, and Sales Org key - are the dimension surrogate keys. Since the fact table is 100 times as big as the largest dimension, the dimension keys have a relatively small number of distinct values compared to the volume of rows in the fact table. In other words the dimension keys have a low cardinality on the fact table; there are a relatively small number of values for the dimension keys that repeat again and again on the fact table. The queries to extract data from the star schema use star joins for optimal performance. The star join performs well only when dimension row count can be minimized. While dimensional techniques, such as splitting a dimension into mini dimensions, are very potent in taming the large dimensions, there may be genuine requirements for having large dimensions. The customer-account dimension, especially when made type 2 to deal with legal and compliance needs, puts a damper on any star schema. As the dimension row count increases, the star schema starts to become unmanageable. The table below shows the optimizations used to make the star schema perform to its full potential:
| Star join decision table | Query selectivity; number of dimension rows returned: | |||
| Small | Large | |||
| Dimension size: | Small, < 1% of fact row count | Bitmap indexes (Commercial DBs only) | Dimension cartesian product: not recommended. Nested fact joins | Aggregates |
| Medium, ≈ 1% of fact row count | Nested fact joins | |||
| Large, > 1% of fact row count Are your sure this is needed? Read "the book", again, before answering yes | Consider non-dimensional model for the specific dimension and the requirement | |||
Note that snowflakes are not a recommended optimization technique.