“DAY 3: LEARN 3 THINGS A DAY – LEARNING 2”

Data Vault Series 1– Data Vault Overview

http://tdan.com/data-vault-series-2-data-vault-components/5155

In DV model task of data transformation, cleansing and de-duplication is moved from Source to DWH (Data Warehouse) to DWH to DM (Data Mart layer). Thus effort needed to move data from disparate source systems to centralized DWH model becomes simpler . As I understood till now

 

  • Using DV modeling simplify, expedite data modeling and loading data from source system to DW model.
  • Simply modeling to dimension models for data marts preferably by using views (for faster loading into dimension model) unless performance becomes an issue.

As detailed in previous learning post, main entities of DV modeling are

  • Hubs
  • Links
  • Satellites

For this learning I am using AdventureWorks database. Picked few tables with scripts below

CREATE TABLE [ProductCategory](
    [ProductCategoryID] [int] IDENTITY(1,1) NOT NULL,
    [Name] [dbo].[Name] NOT NULL,
     CONSTRAINT [PK_ProductCategory_ProductCategoryID] PRIMARY KEY CLUSTERED
    (
        [ProductCategoryID] ASC
    )
)

CREATE TABLE [ProductSubcategory](
    [ProductSubcategoryID] [int] IDENTITY(1,1) NOT NULL,
    [ProductCategoryID] [int] NOT NULL,
    [Name] varchar(100) NOT NULL,
    CONSTRAINT [PK_ProductSubcategory_ProductSubcategoryID] PRIMARY KEY CLUSTERED
    (
        [ProductSubcategoryID] ASC
    )
)

CREATE TABLE [Product](
    [ProductID] [int] IDENTITY(1,1) NOT NULL,
    [Name] nvarchar(100) NOT NULL,
    [ProductNumber] [nvarchar](25) NOT NULL,
    [Color] [nvarchar](15) NULL,
    [SafetyStockLevel] [smallint] NOT NULL,
    [ReorderPoint] [smallint] NOT NULL,
    [StandardCost] [money] NOT NULL,
    [ListPrice] [money] NOT NULL,
    [Size] [nvarchar](5) NULL,
    [SizeUnitMeasureCode] [nchar](3) NULL,
    [WeightUnitMeasureCode] [nchar](3) NULL,
    [Weight] [decimal](8, 2) NULL,
    [DaysToManufacture] [int] NOT NULL,
    [ProductLine] [nchar](2) NULL,
    [Class] [nchar](2) NULL,
    [Style] [nchar](2) NULL,
    [ProductSubcategoryID] [int] NULL,
    [ProductModelID] [int] NULL,
    [SellStartDate] [datetime] NOT NULL,
    [SellEndDate] [datetime] NULL,
    [DiscontinuedDate] [datetime] NULL,
    CONSTRAINT [PK_Product_ProductID] PRIMARY KEY CLUSTERED
    (
        [ProductID] ASC
    )
)

CREATE TABLE [SalesOrderHeader](
    [SalesOrderID] [int] IDENTITY(1,1) NOT NULL,
    [OrderDate] [datetime] NOT NULL ,
    [SalesOrderNumber]  NOT NULL,
    [CustomerID] [int] NOT NULL,
    [SalesPersonID] [int] NULL,
    [TerritoryID] [int] NULL,
    [BillToAddressID] [int] NOT NULL,
    [ShipToAddressID] [int] NOT NULL,
    [ShipMethodID] [int] NOT NULL,
    [CreditCardID] [int] NULL,
    [CreditCardApprovalCode] [varchar](15) NULL,
    [CurrencyRateID] [int] NULL,
    [SubTotal] [money] NOT NULL,
    [TaxAmt] [money] NOT NULL ,
    [Freight] [money] NOT NULL ,
    [TotalDue]  AS (isnull(([SubTotal]+[TaxAmt])+[Freight],(0))),
    [Comment] [nvarchar](128) NULL,
    CONSTRAINT [PK_SalesOrderHeader_SalesOrderID] PRIMARY KEY CLUSTERED
    (
        [SalesOrderID] ASC
    )
)

CREATE TABLE [SalesOrderDetail](
    [SalesOrderID] [int] NOT NULL,
    [SalesOrderDetailID] [int] IDENTITY(1,1) NOT NULL,
    [CarrierTrackingNumber] [nvarchar](25) NULL,
    [OrderQty] [smallint] NOT NULL,
    [ProductID] [int] NOT NULL,
    [SpecialOfferID] [int] NOT NULL,
    [UnitPrice] [money] NOT NULL,
    [UnitPriceDiscount] [money],
    [LineTotal]  AS (isnull(([UnitPrice]*((1.0)-[UnitPriceDiscount]))*[OrderQty],(0.0))),
     CONSTRAINT [PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID] PRIMARY KEY CLUSTERED
    (
        [SalesOrderID] ASC,
        [SalesOrderDetailID] ASC
    )
)

From above tables, hubs, links and satellites are as below

  • Hubs (HProduct, HProductSubCategory, HProductCategory)
  • Links (LProducts, LOrderHeader, LOrderDetails)
  • Satellites (SHProudct, SHProductSubCategory, SHProductCategory, SLOrderHeader, SLOrderDetails)

Create Table [HProductCategory]
(
    HProductCategoryKey int identity(1,1) NOT NULL constraint  pk_hProductCateogry primary key clustered,
    ProductCategoryID int not null,
    LoadDate Timestamp,
    RecordSource varchar(100)
)

Create Table [SHProductCategory]
(
    HProductCategoryKey int not null,
    LoadDate datetime not null,
    [Name] nvarchar(100) not null
    constraint PK_SHproductCategory Primary Key Clustered
    (
        HProductCategoryKey,
        LoadDate
    )
)

Create Table HProductSubCategory(
HProductSubCategoryKey int identity(1,1) NOT NULL constraint PK_HProductSubCategory primary key clustered,
ProductSubCategoryID int NOT NULL,
LoadDate Timestamp,
RecordSource varchar(100)
)

Create Table SHProductSubCategory(
HProductSubCategoryKey int not null,
LoadDate TimeStamp not null,
Name varchar(100),
constraint [PK_ProductSubcategory] primary key clustered
    (
        HProductSubCategoryKey,
        LoadDate
    )
)

Create table HProduct
(
    HProductKey int identity(1,1) not null constraint PK_HProductKey primary key clustered,
    [ProductID] [int] IDENTITY(1,1) NOT NULL,
    LoadDate Timestamp,
    RecordSource varchar(100)

)

Create table SHProduct
(
    HProductKey int identity(1,1) not null,
    LoadDate datetime not null,
    [Name] nvarchar(100) NOT NULL,
    [ProductNumber] [nvarchar](25) NOT NULL,
    [Color] [nvarchar](15) NULL,
    [SafetyStockLevel] [smallint] NOT NULL,
    [ReorderPoint] [smallint] NOT NULL,
    [StandardCost] [money] NOT NULL,
    [ListPrice] [money] NOT NULL,
    [Size] [nvarchar](5) NULL,
    [SizeUnitMeasureCode] [nchar](3) NULL,
    [WeightUnitMeasureCode] [nchar](3) NULL,
    [Weight] [decimal](8, 2) NULL,
    [DaysToManufacture] [int] NOT NULL,
    [ProductLine] [nchar](2) NULL,
    [Class] [nchar](2) NULL,
    [Style] [nchar](2) NULL,
     [SellStartDate] [datetime] NOT NULL,
    [SellEndDate] [datetime] NULL,
    [DiscontinuedDate] [datetime] NULL,
    Constraint [PK_SHProductKey] Primary Key Clustered
    (
        HProductKey,
        LoadDate
    )
)

 

Create Table LProducts
(
    LProductsKey int identity(1,1) not null constraint PK_Products Primary key Clustered,
    HProductKey int not null,
    HProductSubCategoryKey int not null,
    HProductCategorykey int not null,
    LoadDate datetime not null,
    RecordSource datetime not null
)

Create table LOrderHeader
(
    LOrderHeaderKey int identity(1,1) not null constraint pk_LOrderHeaderKey Primary key Clustered,
    [SalesOrderID] int not null,
    [CustomerID] [int] NOT NULL,
    [SalesPersonID] [int] NULL,
    [TerritoryID] [int] NULL,
    [BillToAddressID] [int] NOT NULL,
    [ShipToAddressID] [int] NOT NULL,
    [ShipMethodID] [int] NOT NULL,
    [CreditCardID] [int] NULL,
    [CurrencyRateID] [int] NULL
)

Create table SLOrderHeader
(
    LOrderHeaderKey int not null,
    [OrderDate] [datetime] NOT NULL ,
    [SalesOrderNumber]  NOT NULL,
    [CreditCardApprovalCode] [varchar](15) NULL,
    [SubTotal] [money] NOT NULL,
    [TaxAmt] [money] NOT NULL ,
    [Freight] [money] NOT NULL ,
    [TotalDue]  AS (isnull(([SubTotal]+[TaxAmt])+[Freight],(0))),
    [Comment] [nvarchar](128) NULL
)

Create table LOrderDetail
(
    LOrderDetailKey int identity(1,1) constraint PK_OrderDetailKey primary key clustered,
    LOrderHeaderKey int not null,
    [SalesOrderDetailID] [int]  NOT NULL,
    LProductsKey int not null,
    [SpecialOfferID] [int] NOT NULL
)

Create table SLOrderDetail
(
    LOrderDetailKey int not null,
    [CarrierTrackingNumber] [nvarchar](25) NULL,
    [OrderQty] [smallint] NOT NULL,
    [UnitPrice] [money] NOT NULL,
    [UnitPriceDiscount] [money],
    [LineTotal]  AS (isnull(([UnitPrice]*((1.0)-[UnitPriceDiscount]))*[OrderQty],(0.0))),
)

This completes my first attempt to try to design a DV DWH model from 3NF of OLTP systems. But during design there are few questions that cropped up which for now are ignored and purpose is to address  / understand them in next couple of posts. Few of questions are below

  • If there is a parent and child relationship in tables,
    • In DV these need to be flattened in two ways
      • including parent attributes in Hubs or
      • create a link table that capture relationships between parent and child.
        • When creating do we need to create an additional link table or link table that is created for transactions will suffice?
          • Pros and cons of new linked vs leveraging existing link table.
  • An attribute is present in transaction table
    • What decides when to place attribute link or satellite tables.

Few of such questions are pondering that need to be learnt also I need to present above sql scripts in more presentable visual format.

Until my next learning

Abhyast

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s