Thursday, February 15, 2007

Data warehousing tools

In the beginner's mind there are many possibilities, in the expert's mind there are few --Shunryu Suzuki

ETL
  1. Informatica - Popular ETL. Will it ever catch up to Ab Initio's parallelizing capabilities?
  2. Ab Initio - Highly scalable because every major form of parallelism is readily available. The pipeline parallelism is especially powerful. Popular in the financial services industry. The steep learning curve needed to fully use its features, together with the secretive nature of the company, makes skilled developers hard to find.
  3. IBM/Ascential DataStage - Mixed bag of acquisitions
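To make the pipeline parallelism mentioned under Ab Initio concrete, here is a minimal sketch in Python. It is not Ab Initio code and says nothing about how that product is implemented; the names `stage`, `run_pipeline`, `parse` and `to_cents` are made up for illustration. Each stage runs in its own thread and hands records to the next stage through a bounded queue, so a downstream step can start on the first record while the upstream step is still working on later ones.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the record stream


def stage(func, inbox, outbox):
    """Apply func to each record from inbox and pass the result to outbox."""
    while True:
        record = inbox.get()
        if record is _DONE:
            outbox.put(_DONE)  # propagate shutdown to the next stage
            return
        outbox.put(func(record))


def run_pipeline(records, funcs, buffer_size=4):
    """Run funcs as concurrent stages connected by bounded queues."""
    queues = [queue.Queue(maxsize=buffer_size) for _ in range(len(funcs) + 1)]
    workers = [
        threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(funcs)
    ]

    def feed():
        # Feed from a separate thread so a full queue never blocks the caller.
        for r in records:
            queues[0].put(r)
        queues[0].put(_DONE)

    workers.append(threading.Thread(target=feed))
    for w in workers:
        w.start()

    results = []
    while True:
        item = queues[-1].get()
        if item is _DONE:
            break
        results.append(item)
    for w in workers:
        w.join()
    return results


# Hypothetical two-step transform: parse a CSV-like line, then derive a column.
def parse(line):
    name, amount = line.split(",")
    return {"name": name.strip(), "amount": float(amount)}


def to_cents(row):
    return {**row, "cents": round(row["amount"] * 100)}


rows = run_pipeline(["a, 1.50", "b, 2.25"], [parse, to_cents])
```

The sketch has no error handling (a stage that raises would stall the pipeline), and CPython threads mostly help when stages wait on I/O. The point is only the shape: stages overlap in time instead of each one finishing the whole data set before the next begins.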
Database
  1. Oracle - Popular DB. SMP king, more than enough for most situations. Its MPP story is a shared-disk confusion of Grid, RAC and OPS; the less said the better.
  2. DB2 - Traditionally emphasized the MPP approach. With bitmap index and range partitioning support, it now supports both SMP and MPP. Something for everyone.
  3. Teradata - MPP pioneer; now everyone in the Linux space wants to be MPP (the appliances). Also, Walmart runs Teradata! Teradata could handle the data volumes generated by the retail industry many years ago, when Oracle and DB2 had not yet learned the data warehousing language. Unfortunately, Teradata has contempt for dimensional modeling and the star schema, and does not support star schema joins very well. Maybe the causality runs the other way around.
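For readers who have not seen one, a star schema join looks roughly like the sketch below. It uses SQLite from the Python standard library purely for illustration; the table and column names (`fact_sales`, `dim_store`, `dim_product`) are invented, and nothing here reflects how Oracle, DB2 or Teradata actually plan such a query. A central fact table holds the measures and foreign keys, and each dimension table is joined to it to supply the grouping attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: one row per store / product, with descriptive attributes.
    CREATE TABLE dim_store   (store_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);

    -- Fact table: one row per sale, keyed by the dimensions.
    CREATE TABLE fact_sales (
        store_id   INTEGER REFERENCES dim_store(store_id),
        product_id INTEGER REFERENCES dim_product(product_id),
        amount     REAL
    );

    INSERT INTO dim_store   VALUES (1, 'East'), (2, 'West');
    INSERT INTO dim_product VALUES (10, 'Grocery'), (20, 'Apparel');
    INSERT INTO fact_sales  VALUES (1, 10, 100.0), (1, 20, 40.0),
                                   (2, 10, 70.0),  (2, 10, 30.0);
""")

# The star join: the fact table in the middle, every dimension joined to it.
star_join = """
    SELECT s.region, p.category, SUM(f.amount) AS total
    FROM fact_sales f
    JOIN dim_store   s ON s.store_id   = f.store_id
    JOIN dim_product p ON p.product_id = f.product_id
    GROUP BY s.region, p.category
    ORDER BY s.region, p.category
"""
sales_by_region_category = conn.execute(star_join).fetchall()
```

With real data the fact table is huge and the dimensions are small, which is why how well an engine handles this particular join shape matters so much for a warehouse.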
BI/Reporting
  1. Business Objects - Popular ROLAP. Cognos and Oracle/Brio/Hyperion provide similar functionality; do we need so many tools in this space?
  2. Hyperion-Essbase - MOLAP King, tight integration with Excel (the way to go)
  3. MicroStrategy - ROLAP on steroids. High scalability through server-side caching that provides on-the-fly aggregate optimization. Relatively small pool of available developers.
The Swiss Army knife
  1. SAS - Perl of the DW world, dream come true for the DW hacker
OLAP is explained in the post on trend reporting. ROLAP is OLAP implemented on top of a popular relational database such as Oracle or DB2. MOLAP is OLAP implemented on a database engine designed specifically for it, for example Hyperion Essbase.
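The ROLAP/MOLAP distinction can be sketched in a few lines of Python. This is a toy under stated assumptions, not a model of Essbase or of any ROLAP tool: the `sales` data, the `"*"` marker for "all members", and the helper names `rolap_total` and `molap_total` are all invented for illustration. The ROLAP side answers a question by running SQL against relational rows at query time; the MOLAP side precomputes every roll-up into a cube and answers by lookup.

```python
import sqlite3
from itertools import product

# Toy data: (region, quarter, amount)
sales = [
    ("East", "Q1", 100.0), ("East", "Q2", 120.0),
    ("West", "Q1", 80.0),  ("West", "Q2", 90.0),
]

# ROLAP style: keep the rows in a relational table, aggregate with SQL on demand.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, quarter TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", sales)


def rolap_total(region):
    row = conn.execute(
        "SELECT SUM(amount) FROM sales WHERE region = ?", (region,)
    ).fetchone()
    return row[0]


# MOLAP style: precompute every cell, including roll-ups, into a cube.
ALL = "*"  # stands for "all members of this dimension"
cube = {}
for region, quarter, amount in sales:
    # Each row contributes to 4 cells: the detail cell and its three roll-ups.
    for key in product((region, ALL), (quarter, ALL)):
        cube[key] = cube.get(key, 0.0) + amount


def molap_total(region, quarter=ALL):
    return cube[(region, quarter)]  # a lookup, no aggregation at query time


east_rolap = rolap_total("East")
east_molap = molap_total("East")
grand_total = cube[(ALL, ALL)]
```

Both paths give the same answer. The trade-off is where the work happens: ROLAP pays at query time, while MOLAP pays at load time and in storage for cells nobody may ever ask about.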