“Day 3: Learn 3 Things A Day – Learning 1”

What do I want from DWH architecture?

Reading about Data Vault modeling led me to think about what we actually require from DWH modeling and architecture. These requirements may or may not be practical, but I would nevertheless like to understand for myself what exactly is needed from a data warehouse architecture. These are non-functional requirements that are generic and can be applied to all verticals.

Before detailing the requirements, let me define the problem space:

  • The DWH should be the place where enterprise data is consolidated.
  • Data is generated by line-of-business (LOB) applications.
  • Existing applications are modified to meet new business changes and requirements.
  • Some applications are retired while, at the same time, new applications are introduced.
  • Business partners will have different application landscapes that organizations will need to integrate.
  • With new technologies, the rate of change has increased, and the model should be able to handle it.
  • All of the above can be summarized as follows: new data warehouse models should handle
    • Volume (V): high volume of data
    • Variety (V): data from disparate data sources (both internal and external applications)
    • Velocity (V): the rate of data flow is high
  • In addition, a data warehouse architecture should be able to absorb frequent changes to the structure (format) of source data.

Requirements can be split into two categories.

  1. Modeling requirements
  2. Data movement requirements

Modeling Requirements:

  • Encompassing & integrated:
    • All possible sources of data should be incorporated
    • Include attributes for every table from every LOB application
    • Data from every source application should be merged (either as-is or de-duplicated)
    • Integrate data from third-party applications, both internal and external to the organization
  • Flexible:
    • The DWH model should withstand application changes
      • Applications may have features enabled or disabled (with flags), resulting in different data being generated
      • Existing features may be modified or new features added, again resulting in changes to the data format
      • Applications may be retired and stop generating data, but the data collected up to that point still needs to be retained by the DWH
  • Extendable: the DWH model should be extendable so that new entities can be added
    • New applications may be introduced that bring
      • New masters (contexts to slice transactions in new ways)
      • New transactions
    • All of these should be usable in conjunction with existing data
  • Simple & performant:
    • It should be simple for end users to perform ad hoc analysis and what-if (hypothetical) testing
    • It should be highly performant (which encourages ad hoc analysis)
  • Traceability:
    • Data flow from each source should be traceable and auditable
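Since these requirements came out of reading about Data Vault, a minimal in-memory sketch of its hub/satellite idea may help show how flexibility, extensibility and traceability can coexist. Everything here is an illustrative assumption rather than a reference implementation: the `Hub` and `SatelliteRow` classes, the `load_ts` and `record_source` fields, and the choice of SHA-256 for the hash key. A hub holds only business keys; each source or feature gets its own satellite of descriptive attributes, so a new application adds a satellite instead of altering existing ones, and rows are only ever appended, which keeps history even after an application is retired.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


def hash_key(business_key: str) -> str:
    """Deterministic surrogate key derived from a business key (SHA-256 is an arbitrary choice)."""
    return hashlib.sha256(business_key.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class SatelliteRow:
    hub_key: str
    attributes: dict
    load_ts: datetime       # when the row was loaded (history)
    record_source: str      # which application supplied it (traceability)


@dataclass
class Hub:
    name: str
    keys: dict = field(default_factory=dict)        # hub key -> business key
    satellites: dict = field(default_factory=dict)  # satellite name -> list of SatelliteRow

    def add_satellite(self, sat_name: str) -> None:
        # New sources or features get their own satellite; existing ones stay untouched.
        self.satellites.setdefault(sat_name, [])

    def load(self, sat_name: str, business_key: str, attributes: dict, record_source: str) -> str:
        hk = hash_key(business_key)
        self.keys.setdefault(hk, business_key)
        rows = self.satellites[sat_name]
        latest = next((r for r in reversed(rows) if r.hub_key == hk), None)
        # Insert-only: append a new row only when attributes change; nothing is updated in place.
        if latest is None or latest.attributes != attributes:
            rows.append(SatelliteRow(hk, dict(attributes), datetime.now(timezone.utc), record_source))
        return hk

    def history(self, sat_name: str, business_key: str) -> list:
        hk = hash_key(business_key)
        return [r for r in self.satellites[sat_name] if r.hub_key == hk]


customer = Hub("customer")
customer.add_satellite("crm_details")
customer.load("crm_details", "C-001", {"name": "Acme", "city": "Pune"}, "CRM")
customer.load("crm_details", "C-001", {"name": "Acme", "city": "Mumbai"}, "CRM")
customer.load("crm_details", "C-001", {"name": "Acme", "city": "Mumbai"}, "CRM")  # unchanged: no new row
# A new application arrives: add a satellite rather than altering crm_details.
customer.add_satellite("billing_details")
customer.load("billing_details", "C-001", {"credit_limit": 5000}, "Billing")
print(len(customer.history("crm_details", "C-001")))  # 2
```

The trade-off is that end users rarely query hubs and satellites directly; the "simple & performant" requirement usually still calls for a reporting layer built on top.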

Data Movement Requirements:

Data from source systems needs to be moved to the DWH and loaded into a set of tables in the DWH model.

  • Considering the volume, velocity and variety of data generated by source systems, the data flow architecture should be scalable, preferably by scaling out horizontally rather than scaling up.
  • Additionally, for some data sets (such as comments) there could be a need to aggregate data during movement, before persisting it in the DWH model.
  • Development of data movement packages should go in tandem with modeling changes. Similar to the modeling requirements, package development should follow an append-only mode, with new packages added instead of existing packages being modified.
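The first two points can be combined in a small sketch: records are hash-partitioned by key so each partition can be processed independently (the property that makes scaling out possible), and comments are aggregated per key inside each partition before loading. This is an illustration under assumptions: the record fields `ticket_id` and `ts` are hypothetical, and a thread pool merely stands in for separate worker machines.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition(records, num_partitions):
    """Route each record to a partition using a stable hash of its key."""
    parts = [[] for _ in range(num_partitions)]
    for rec in records:
        idx = zlib.crc32(rec["ticket_id"].encode("utf-8")) % num_partitions
        parts[idx].append(rec)
    return parts


def aggregate_comments(records):
    """Aggregate comments per ticket before they are persisted."""
    summary = defaultdict(lambda: {"comment_count": 0, "last_comment_ts": None})
    for rec in records:
        s = summary[rec["ticket_id"]]
        s["comment_count"] += 1
        if s["last_comment_ts"] is None or rec["ts"] > s["last_comment_ts"]:
            s["last_comment_ts"] = rec["ts"]
    return dict(summary)


def move(records, num_workers=4):
    parts = partition(records, num_workers)
    merged = {}
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        for result in pool.map(aggregate_comments, parts):
            # A key never spans two partitions, so partial results merge without conflicts.
            merged.update(result)
    return merged


comments = [
    {"ticket_id": "T1", "ts": 1, "text": "opened"},
    {"ticket_id": "T1", "ts": 3, "text": "escalated"},
    {"ticket_id": "T2", "ts": 2, "text": "opened"},
]
result = move(comments)
print(result["T1"])  # {'comment_count': 2, 'last_comment_ts': 3}
```

Because aggregation only needs records with the same key, adding workers means adding partitions; no worker has to see the whole data set.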

DU = Deployable Unit

A deployable unit should include:

  • Model changes (DB Scripts)
  • Packages for movement of data

In summary, the architecture should be high-performance and should enable deployment of independent DUs (deployable units of a BI component) that can be turned on or off as required, with built-in features such as traceability, auditing and history.
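To make the DU idea a little more concrete, here is one possible sketch, assuming SQLite as a stand-in for the warehouse database, DDL strings as the "model changes" and Python functions as the "packages". The names `DeployableUnit`, `DURegistry`, `du_audit` and `load_sales` are hypothetical. The registry is append-only (a changed DU must be deployed as a new version), each DU can be switched on or off, and every package run is written to an audit table.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class DeployableUnit:
    name: str
    version: int
    model_scripts: List[str]                             # model changes (DB scripts)
    packages: List[Callable[[sqlite3.Connection], int]]  # data movement; each returns rows loaded
    enabled: bool = True


class DURegistry:
    """Append-only registry: a changed DU is deployed as a new version, never edited."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.units = {}
        conn.execute(
            "CREATE TABLE IF NOT EXISTS du_audit "
            "(du TEXT, version INTEGER, package TEXT, rows_loaded INTEGER)"
        )

    def deploy(self, du: DeployableUnit) -> None:
        key = (du.name, du.version)
        if key in self.units:
            raise ValueError(f"{du.name} v{du.version} already deployed; add a new version instead")
        for script in du.model_scripts:
            self.conn.executescript(script)
        self.units[key] = du

    def set_enabled(self, name: str, version: int, enabled: bool) -> None:
        self.units[(name, version)].enabled = enabled

    def run(self) -> None:
        for du in self.units.values():
            if not du.enabled:
                continue  # a switched-off DU keeps its tables and history but loads nothing
            for package in du.packages:
                rows = package(self.conn)
                # Built-in audit trail: every package run is recorded.
                self.conn.execute(
                    "INSERT INTO du_audit VALUES (?, ?, ?, ?)",
                    (du.name, du.version, package.__name__, rows),
                )
        self.conn.commit()


def load_sales(conn: sqlite3.Connection) -> int:
    rows = [("2024-01-01", 100.0), ("2024-01-02", 250.0)]
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?)", rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
registry = DURegistry(conn)
registry.deploy(DeployableUnit(
    "sales", 1,
    ["CREATE TABLE sales_fact (sale_date TEXT, amount REAL);"],
    [load_sales],
))
registry.run()                           # loads 2 rows and writes 1 audit entry
registry.set_enabled("sales", 1, False)
registry.run()                           # DU is off: nothing is loaded
print(conn.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0])  # 2
```

A real DU would also need dependency ordering between units and a way to retire versions; the sketch leaves those open, much as the post does.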

Admittedly, some of these requirements may be idealistic rather than practical, but I would like to think about them further, especially the idea of a “Deployable Unit” in a DWH.

