Database Elaborations
 
spacer
Populating the Heap: Getting There from Here
 
schraml
spacer
Todd Schraml
 

In architected business intelligence environments the “staging area” or “the heap” comprises the first resting point for the majority of incoming data. While described in many differing ways, these data heaps are fairly consistent across organizations. The staging area may consist of a taxonomy of folders and files, or a set of relational DBMS tables, or a combination of both of these approaches. When implemented across a set of database structures, a limit set of table-kinds supports this endeavor. As a generalization, staging area table-kinds include the following: look-alike, general-message-queue, specific-transaction, cherry-picked, shadow, and metadata/control. Look-alike tables offer almost exact copies of incoming content structures, although the addition of load dates/IDs and generated keys help manage the data. The load dates/IDs add a level of auditing, and the generated keys enable successful data loading prior to testing the validity of any expected natural keys within the incoming data. General-message queues accept incoming transaction-oriented notifications from operational applications, as do specific-transaction tables. The difference between those things characterized as general versus those items viewed as specific lies in the eye of the beholder. In cases of only a solitary event or object, then one will likely label such an item as “specific.” Alternatively, a single structure receiving many kinds of event notifications, even if those notifications emanate from a single application, results in that single structure getting viewed more often as “general.”

Structures containing the most desired and useful information constitute the cherry-picked tables that effectively get hand-selected for passing necessary data on to the data warehouse. Cherry-picked structures, being more cleansed and target-focused, mimic their intended targets more than they reflect the source structures. Although the possibility may exist for defining the incoming table data in such a manner that makes the contents indistinguishable from a cherry-picked table’s data, those circumstances rarely occur. Additional kinds of data structures defined within the heap include shadow tables for auditing changes and history. These structures provide the means to track any historical activity potentially required by an individual organization. And lastly, metadata/control tables serve by monitoring and managing the data flows. A complete set of such control tables should allow for immediate determination of the state of all content within the staging area. Likewise, the metadata/control tables will guide any dynamic aspects of system ETL logic from simple listings of valid values through establishing target tables on-the-fly.

Bundled batch files or event-driven message/transactions encompass the source content that flows into the heap or staging area, and bundled batch files comprise the traditional source of any staging area. These batch files may contain only new/change data, or full copies of production objects for replacement processing or comparison use. Incoming data flows through either a two-step or a one-step process. The two-step approach involves loading the incoming data first into a look-alike table, and then processing against the look-alike to populate a cherry-picked table. The one-step approach takes data directly into the cherry-picked table. The two-step approach offers assistance for any level of variation or expected changes coming into the system. Also, if incoming data supports multiple intended targets, then a look-alike provides a common shopping list for multiple ETL processing to peruse. This mimic table may also aid in a full comparison process deriving change activity. Event-driven messages most likely get directed into the general or specific transaction structures. At times, these transaction structures can appear as a variant of a look-alike table, although circumstances prioritize the interception and cleansing of this content directly into a cherry-picked structure.

Data warehousing systems worldwide tend to operate the same way. Incoming data, whether transactional or batch-driven, flows into the heap/staging area in order to prepare the content for the data warehouse. Loading source material “as-is” into look-alike or transactional queue structures allows selection and transformation to move the “as-is” content on through to subsequent steps where varying cherry-picked tables sort out useful items from those not needed currently. Shadow tables track audit requirements and metadata tables drive or observe data status. Business intelligence initiatives thrive through the continued use of these simple structures.

About the Author:

Todd Schraml is senior data architect and manager of ETL at Innovative Health Strategies, Inc. He can be reached at tschraml@ihsiq.com.

|<<TOC  <<Back  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  Next>>  Masthead >>|

DBTA Home Page | About Us | Contact Us | Partners

To receive a monthly notice about new material and a quarterly
complimentary print edition, click
here.

 
spacer
 

DBTA Home Page

To receive a monthly notice about new material and a quarterly complimentary print edition, click here.

Table of Contents

TRENDS AND APPLICATIONS
End of the One-Size-Fits-All DBMS?
Automate Your SQL Server Migration Concerns
Security Safeguards are Critical Requirement for Basic System Functionality
IT Process Automation Enables IT to Provide Better Service
Open Source Business Intelligence Offers Benefits

MV COMMUNITY
Revelation Plans to Unveil OpenInsight 9.0; New Release to be Featured in Fall 2008 Training
MITS and Asynchron Systems Announce Reseller Relationship
Entrinsik and Campana Systems Partner to Offer Real-Time Reporting to AAA/CAA Auto Club Customers
Registration Now Open for U2 University 2008
Ashwood Offers Disk to Disk Backup Solution for IBM Users
Sierra Bravo Opens Chicago Office

COLUMNS
CA -Sponsored IDUG Survey Shows Data Professionals’ Growing Role in Business Success by Joe McKendrick
Consider Data Access Auditing to Classify Database Data by Craig S. Mullins
The Involuntary DBA by Kevin Kline
High Availability with Oracle Data Guard by Arun Kumar R.
Populating the Heap: Getting There from Here by Todd Schraml
J2EE Application Servers 10 Years On by Guy Harrison

News
Download Central
Places to Go
Did Ya Hear?
New Products

Online Masthead

DBTA Home Page

DBTA E-Editions
June 2008

May 2008
April 2008
March 2008
February 2008
January 2008

 
spacer
spacer
spacer
spacer
spacer