Monday, September 10, 2007

Performance and throughput optimizations

If everything seems under control, you're just not going fast enough --Mario Andretti

The performance optimization options available are too numerous and often confusing. You need to keep the design simple by minimizing the number of optimizations required. Choosing the optimizations with biggest impacts is easy when you align them to the appropriate process in the data warehouse:

ETL performance
  1. Database bulk load
  2. Assuming that sufficient hardware capacity in terms of CPU processing power and disk throughput is available to support the parallelism:
  3. Database table range partitioning
Reporting performance
  1. Database table range partitioning
  2. Bitmap indexes (star schema)
  3. Aggregations
There will be genuine situations where you will need to go beyond these simple optimizations to meet the requirement. But those are few and far between. Think hard about what you are doing before making your design unnecessarily complicated.