See also: SmartArrays for Data Integration pressentation (pdf)
|
Case Study: Data Transformation and Loading
How SmartArrays Reduced Execution Time From 4 Hours to Less Than a Minute...
This task was performed for a large automotive insurance company. The company receives monthly government data
on US insurance sales and uses it to analyze its competitive position. Loading and cleansing the data is a
multi-step process, beginning with the import of raw data into a relational database, and culminating in analytical
results that measure competitiveness in various market scenarios.
One stage in this process was identified as a serious bottleneck. In this stage, the raw data is combined with a
dozen lookup tables, using business rules that are specified by an analyst and that vary based from one month to
the next, since lookup tables and data formats frequently change. Typically the analyst makes multiple iterations,
refining the data mapping and transformation rules each time to produce the desired result. Unfortunately, each
iteration was taking 4 hours to complete.
Using SmartArrays technology, we developed a plug-replacement tool to speed up this stage of the process. The tool reads
each month’s data from SQL Server into fast memory-resident SmartArrays tables. Lookup tables are also stored as
SmartArrays tables, and combined using high-speed algorithms for joining tables, matching and filtering values, and
performing calculations. The calculated results are then posted back to SQL Server in the form expected by the next
stage in the analysis process. Each month’s SmartArrays tables are saved to disk so they can be instantly accessed and
re-used when the next month’s data becomes available.
Using the tool, the execution time per iteration was reduced from 4 hours to a few seconds, which permits interactive,
“what if” trials by the analyst. The tool required approximately 100 hours to develop, with a single SmartArrays developer
advised by a subject matter expert.
|