Pages

Friday, November 21, 2014

Dynamic Cube Partitions Using SSIS


Dynamic Cube Partitions Using SSIS

Implementing Creation & Merging of Dynamic Cube Partitions using SSIS


In this section I will explain step by step operations to create Dynamic Cube Partitions.
For this you need AdventureWorkDW2012 database and the Analysis Services Tutorial samples.

You can get the complete details for set up the database and samples files from here.

We are using Analysis Services Tutorial “Lesson 2 Complete” for this purpose.

This cube will look like this.


Deploy SSAS Cube on your local server

Step 1: - Set Target Properties 

Make sure that Target Server is set to localhost and database is Analysis Services Tutorial.

Step 2: Create Template Partition

For this we need to create a default partition, which will act like template partition
Double click on the cube and navigate to Partition tab. You will see a default partition as shown below:


Delete this default partition and Click on new partition... to click new partition. In Partition Wizard, Select “InternetSales” as available table in Specify Source Information page and click next. Select Specify a query to restrict row in Restrict Rows page and write WHERE condition (WHERE 1 = 2) to restrict partition rows. This is just to create a partition template with no data.


Click Next twice and enter Partition name as Internet_Sales_Partition_Template in the Completing the Wizard page.


Select Design aggregation later and check Deploy and Process now.
Finally click Finish.

Once cube is processed successfully, you can see a new database in Analysis Services:



Step 3: - Generate Scripts

We are done with one manual partition. Now it’s time to create the partitions dynamically and apply logic to process the new data every time.

First off all we need to create the create, process, alter and merge scripts.
 

Create Script 

For this right click on the Partition Internet_Sales_Partition_Template and then select Script Partition as -> Create To -> New Query Editor Window.


It will create XMLA scripts for this partition. Save this file in the project with Create_InternetSales_Partition_Template.xmla name.

Process Script

Right click on the Partition Internet_Sales__Partition_Template and then select Process. This will open Process Partition wizard, click Script to generate XMLA script and save this file in the project with Process_InternetSales_Partition_Template.xmla name.


Alter Script
For this right click on the Partition Internet_Sales_Partition_Template and then select Script Partition as ==> Alter To ==> New Query Editor Window.


It will create XMLA scripts for this partition. Save this file in the project with Alter_Internet_Sales_Partition_Template.xmla name.

Create SQL Tables and Procs


We need some database objects need to create in AdventureWorksDW2012 for this.
Below are details of these objects

S.No
Object Name
Type
Purpose
1
Check_Partition_Merge_Required
Proc
Use to check if partitions are required to merge or not.
2
Delete_PartitionLog
Proc
Use to delete the data in dbo.PartitionLog table.
3
Get_Cube_Partition_Details
Proc
Use to get the details for new partition need to create
4
Get_Cube_Processing_Details
Proc
Use to get the details of type of processing, i.e. normal run, delete and process again or just process all again.
5
Get_Cube_XMLA_Scripts
Proc
Use to store the xmla scripts in variables.
6
Get_Current_Partition_Details
Proc
Get all the partitions details from dbo.PartitionLog table
7
Get_PartitionDetails_ToMerge
Proc
Use to get the details of Partition which need to merge
8
Insert_PartitionLog
Proc
Use to Insert the data in dbo.PartitionLog table.
9
Reset_Variables
Proc
Use to reset the variable values
10
Update_PartitionLog
Proc
Use to Update the data in dbo.PartitionLog table.
11
Insert Data in PartitionLog
Script
Insert Template partition detail in table.
12
Insert Data in PartitionQuery
Script
Insert xmla queries in table.
13
dbo.PartitionLog
Table
Use to store the Partitions details.
14
dbo.PartitionQuery
Table
Use to store the xmla queries (create, alter, merge and process).
15
dbo.SSISLog
Table
Use to store the event logging of ssis package.

You can get the scripts of these objects here.

Create SSIS Package

We are done with SSAS and SQL development.
Now we have to create an SSIS package to implement dynamic creation of cube partition and processing of that partition.

Step 1: Create SSIS Package

Create a New Project of BIDS and Select Integration Services Project and enter the name of the SSIS Project as Tutorial_DynamicPartiton.


Rename the SSIS package1.dtsx by DynamicPartition.dtsx.

Step 2: Add connection managers

Add two connection Managers - one OLE DB connection manager for database and another Analysis Services connection manager for SSAS database.




Step 3: Add Package Variables

Add following Package variables:



Here is the brief description of each variable:

Name
DataType
Description
Default Value
AlterPartitionXMLA
String
To store XMLA script to alter partition

CreatePartitionXMLA
String
To store XMLA script to create monthly partition

Cube_Name
String
Cube name
Analysis Services Tutorial
Database_Name
String
SSAS Database name
Analysis Services Tutorial
FromDateKey
String
To Store monthly partition start date key e.g. 2014-01-01

History_Flag
Int16
To check what kind of processing is required.
1
IsPartitionExists
Int16
To check whether monthly partition already exists or not
0
LastDate_Data
String
To store the last day of data in fact table for a particular partition.

LoopIteration
Int16
Loop iteration variable set to 1
1
Measure_Group_Name
String
Measure group name of the cube
Internet Sales
Merge_Flag
Int16
To check whether monthly partition need to merge in yearly or not.
0
MergePartitionXMLA
String
To store XMLA script to merge partitions

PartitionList
Object
To store the existing partition list

PartitionName
String
Monthly partition name e.g. InternetSales_20120101_20120131

PartitionToMerge
String
Name of Partition which need to merge in yearly partition.

Process_Flag
Int16


ProcessPartitionXMLA
String
To store XMLA script to process monthly partition

Success_Flag
Int16
To store the success flag, i.e. whether process need to run or not, in case of monthly partition

Template_Name
String
Template name prefix for the partitions
Internet_Sales_
ToDateKey
String
To Store monthly partition end date key e.g. 2014-01-31

TotalMonthstoProcess
String
Numbers of months in the fact data to run the loop for all in once.


Step 4 –Package Development

Generally there are 4 different scenarios which can happens while processing a cube.

1)     Fresh data is available for processing
2)     All data need to be processed again (due to some data changes in tables)
3)     Cube is altered and all partitions are need to be created again.
4)     No fresh data is available for processing.

We are handling all these scenarios in this package.

Variable History_Flag is very important, as this will tell us what kind of processing we required. This will be passed as Input Parameter to the package.

There could be 3 values for this i.e.
1 means normal / incremental processing 
2 means there is some data changes and we need to process whole cube.
3 means there is schema changes / cube is altered which cause all partitions to be override and we need
    to create all partitions again.

We are making 12 monthly partitions, of last 12 months and rest will be merged into a yearly partition of that year.

Now you have to develop SSIS package as shown below:






Task 0 - Exec_SQL_Tk - Check Processing type based on History_Flag

First Execute SQL Task is to check what path need to use for processing based on the @History_Flag value.

In Execute SQL Task Editor, select ResultSet Single row, ConnectionType OLE DB, Connection AdventureWorksDW2012, SQLSourceType Direct input and SQLStatement as:

Exec [Get_Cube_Processing_Details] ?,?,?


Click on Parameter Mapping and map the parameters as shown below:


Click on Result Set and map the result as shown below:


This will give data as below: 


Note: in case of @History_Flag = 2, TotalMonthstoProcess is null as in this case we just need to process the whole cube and not partition wise.

If there is no fresh data to process it will return TotalMonthstoProcess as null and History_Flag as 0.


Task 1.1 – Seq_Container - Re-Process Whole Cube

 If History_Flag = 2 then this container will execute and process the whole cube.

Task 1.1.a – AS_Processing_Tk - Re-Process Whole Cube     


Task 1.1.b – Exec_SQL_Tk - Update info of Monthly Partition

Use this to update the dbo.Partition table.
In Execute SQL Task Editor, select ResultSet Single row, ConnectionType OLE DB, Connection AdventureWorksDW2012, SQLSourceType Direct input and SQLStatement as:

exec Update_PartitionLog ?,?,?,? ,2


Click on Parameter Mapping and map the parameters as shown below:


Task 1.2 – Seq_Container - Create fresh partitions after Alter

If there is any change in the schema of cube, an alter / drop create statement is executed which will cause drop of all the partitions, except the template one. So in this case we need to delete it from the dbo.PartitionLog table as well, however we are still checking in SSAS cube and drop, if found and then delete from the dbo.PartitionLog table as well.
Note: this will run only if History_Flag value is 3

Task 1.2.a – Exec_SQL_Tk - Get List of All Partitions to delete

First task in this is to get a list of all the Partitions using proc get_current_partition_details

In Execute SQL Task Editor, select ResultSet Full result set, ConnectionType OLE DB, Connection AdventureWorksDW2012, SQLSourceType Direct input and SQLStatement as:

exec Get_Current_Partition_Details ?,?


Click on Parameter Mapping and map the parameters as shown below:



Click on Result Set and map the result as shown below:



This will return the list of Partitions (Active as well as already merged)


Task 1.2.b – Foreach_Loop_Container - Delete Partitions from Cube

In Foreach Loop editor select Enumerator as Foreach ADO Enumerator and User:PartitionList in the ADO object source variable


and map the variable PartitionName to Index 0


In this Foreach-Loop we have a script task

Task 1.2.b – Script_TK - Check and Delete Partitions from Cube

This task is to check if partition exists, on the SSAS cube, and drop them if exists.


ReadOnlyVariables User::Cube_Name,User::Database_Name,User::Measure_Group_Name,User::PartitionList,User::PartitionName
ReadWriteVariable - User::IsPartitionExists

Task 1.2.c – Exec_SQL_Tk - Delete Partitions details from Log table  

Finally it will delete these partitions details from the table as well.

In Execute SQL Task Editor, select ResultSet None, ConnectionType OLE DB, Connection AdventureWorksDW2012, SQLSourceType Direct input and SQLStatement as:

exec [Delete_PartitionLog] @Database_Name, @Cube_Name


Click on Parameter Mapping and map the parameters as shown below:


Note: it will not delete the Template partition. 

Task 1.0 – For Loop Container - Create and Process Partitions

Now in this loop container we create the monthly partition process it and if number of active monthly partitions goes beyond to 12 then it will create and process a yearly partition of the first month year and merge that to yearly partition.
This will run only when History_Flag value is 1 or 3.
We need to run the loop for all data available in the fact table. TotalMonthstoProcess will store the total no of month’s data in the fact table.
Set the For loop properties as below:
 

Task 1.0.1 – Exec_SQL_Tk - Check Partition Details

In this we’ll initialize FromDateKey, ToDateKey, PartitionName and LastDate_Data for the next partition.

Note: As we are making monthly partitions so in this case ToDateKey will return the last date of the FromDateKey month and LastDate_Data will return the maximum date of that month, for which data is available.
For example if it 1st of Jan 1014 and we have data till 1st of Jan, then FromDateKey and LastDate_Data
will return 2014-01-01 and ToDateKey will return the last date of jan i.e. 2014-01-31.

In Execute SQL Task Editor, select ResultSet Single row, ConnectionType OLE DB, Connection AdventureWorksDW2012, SQLSourceType Direct input and SQLStatement as:

 Exec [Get_Cube_Partition_Details] ?,?,?


Click on Parameter Mapping and map the parameters as shown below:


lick on Result Set and map the result as shown below:

this will return below data

If Success_Flag = 1 then it means we need to process further.

Task 1.0.2 – Exec_SQL_Tk - Store XMLA to variables


In Execute SQL Task Editor, select ResultSet Single row, ConnectionType OLE DB, Connection AdventureWorksDW2012, SQLSourceType Direct input and SQLStatement as:

Exec [Get_Cube_XMLA_Scripts] ?,?


Click on Parameter Mapping and map the parameters as shown below:


Click on Result Set and map the result as shown below:


this will return below data


Task 1.0.3 – Script_Tk - Generate XMLA scripts to create and process Monthly Partition

Script Task is used for two reasons - 

1. To check whether current partition exists or not. 
2. To update XMLA content for current Partition.


In script User::Cube_Name,User::Database_Name,User::FromDateKey,User::Measure_Group_Name,User::PartitionName,User::ToDateKey will be the ReadOnly Variables and User::CreatePartitionXMLA,User::IsPartitionExists,User::ProcessPartitionXMLA will be ReadWriteVariable

This is the most tricky and interesting part here. We need to replace Partition ID, Partition Name, and Source QueryDefinition for current Partition:

 

Task 1.0.4 – Seq_Container - Create Monthly Partition 

Task 1.0.4a – AS_Exec_DDL_Tk - Create new monthly partition

Second task in the sequence Container is Analysis Services Execute DDL Task. This task is used to create new Partition. In Analysis Services Execute DDL Task Editor, select Analysis Services Tutorial as Connection, Variable as SourceType, and User::CreatePartitionXMLA as Source.


Also don't forget to set precedence constraint. You should execute this task only when partition does not exists. Open Precedence Constraint Editor and Select Expression and Constraint in Evolution operation, Success in Value, and @IsPartitionExists == 0 in Expression

Task 1.0.4.b – Exec_SQL_Tk - Insert monthly partition info in log table


This is used to store newly created Partition information in the PartitionLog table.
In Execute SQL Task, select ResultSet None, ConnectionType OLE DB, Connection AdventureWorksDW2012, SQLSourceType Direct input and SQLStatement as:
 
Exec [Insert_PartitionLog] ?,?,?,?,?,?, ‘Monthly’


 Click on Parameter Mapping and map the parameters as shown below:

Task 1.0.5 – Sequence Container - Process Monthly Partition 

Now we need to process the current partition. This is independent of previous sequence container where we are creating new partition if it is not exists.
We need to process the current partition every time we execute the package. New partition will be created on the first day of month but partition should be processed every day to include delta data.

In case you are in current month partition, if you execute the package every day in current month - it will not create any new partition till beginning of next month. It will only process Current Partition to include the data in the cube as shown below:

This sequence container contains three tasks.

Task 1.0.5.a – AS_Processing_Task - Process Dimensions (Monthly Partition)

First task is Analysis Services Processing Task - to process the dimensions data. Select Analysis Services Tutorial as connection manager and all the dimensions in Object Name - which are not static in data. Select Process Update as Process Option.


Task 1.0.5.b – AS_Processing_Task - Process Monthly Partition


Second Task is again Analysis Services Processing Task - to process Current Partition. Select Analysis Services Tutorial as connection manager and Internet_Sales_Partition_Template in Object Name - which is default partition (or Template partition). Select Process Full as Process Option.
Now click on Expression and select @[User::ProcessPartitionXMLA] in ProcessingCommands.


Task 1.0.5.c – Exec_SQL_Tk - Update info of Monthly Partition

Store the Last Processed Date and Processed Count (number of time this partition has processed) in the log table using Execute SQL task.
Select ResultSet None, ConnectionType OLE DB, Connection ADVENTUREWORKSDW2012, SQLSourceType Direct input and SQLStatement as:

Exec [Update_PartitionLog] ?,?,?,?, 1


Click on Parameter Mapping and map the parameters as
 
Now we are done with package development for creation of partitions.
Now we’ll see how to do the Dynamic Merging of Partitions.

Task 1.0.6 – Exec_SQL_Tk - Check Merge need to be done or not 

As per requirement we are making last 12 months partitions and then create and merge in yearly for rest of the partitions. So in this task we check if we have 12 active partitions or not. It will goes further only if we have more than 12 active monthly partitions.

In Execute SQL Task, select ResultSet None, ConnectionType OLE DB, Connection AdventureWorksDW2012, SQLSourceType Direct input and SQLStatement as:

Exec [Check_Partition_Merge_Required] ?,?


 Click on Parameter Mapping and map the parameters as shown below:


Click on Result Set and map the result as shown below:

Task 1.0.7– Seq_Container - Check details for Merge Partitions

Task 1.0.7.a – Exec_SQL_Tk - Set Current Partition name and Date Range for Merge Operation

First Execute SQL Task is used to initialize FromDateKey, ToDateKey, PartitonName, Success_Flag and LastDate_Data and PartitionToMerge for the partition which need to be merged. Use below query in this Task:

Exec [Get_PartitionDetails_ToMerge] ?,?,?


Click on Parameter Mapping and map the parameters as shown below:



Click on Result Set and map the result as shown below:


Output of this will be like this
  

Task 1.0.7.b – Exec_SQL_Tk - Store XMLA to variables

In Execute SQL Task Editor, select ResultSet Single row, ConnectionType OLE DB, Connection AdventureWorksDW2012, SQLSourceType Direct input and SQLStatement as:

Exec [Get_Cube_XMLA_Scripts] ?,?


Click on Parameter Mapping and map the parameters as shown below:


Click on Result Set and map the result as shown below:


this will return data like this


Task 1.0.7.c – Script_Tk - Generate XMLA script to Create and Process Yearly Partition

Script Task is used for below reasons - 

1. To check whether current yearly partition exists or not. 
2. To update XMLA content for yearly partition creation, merging, processing and altering of scripts.


ReadOnlyVariables
User::Cube_Name,User::Database_Name,User::FromDateKey,User::Measure_Group_Name,User::PartitionName,User::ToDateKey

ReadWriteVariables
User::AlterPartitionXMLA,User::CreatePartitionXMLA,User::IsPartitionExists,User::MergePartitionXMLA,User::PartitionToMerge,User::ProcessPartitionXMLA

We need to replace Partition ID, Partition Name, and Source QueryDefinition for current Partition which are highlighted below:



Task 1.0.8– Seq_Container - Create Yearly Partitions 

Task 1.0.8.a – AS_Exec_DDL_Tk - Create New Yearly Partition

First task in the sequence container is Analysis Services Execute DDL Task. This task is used to create new yearly partition.
In Analysis Services Execute DDL Task Editor, select Analysis Services Tutorial as Connection, Variable as SourceType, and User::CreatePartitionXMLA as Source.




Also don't forget to set precedence constraint. You should execute this task only when partition does not exists. Open Precedence Constraint Editor and Select Expression and Constraint in Evolution operation, Success in Value, and @IsPartitionExists == 0 in Expression 

Task 1.0.8.b – Exec_SQL_Tk - Insert Yearly Partition Info in Log Ttable

This is used to store newly created partition information in the PartitionLog table.

In Execute SQL Task, select ResultSet None, ConnectionType OLE DB, Connection AdventureWorksDW2012, SQLSourceType Direct input and SQLStatement as:
 
Exec [Insert_PartitionLog] ?,?,?,?,?,?, ‘Yearly’
  


Click on Parameter Mapping and map the parameters as shown below:

Task 1.0.8.c – AS_Processing_Task - Process Dimension (Yearly Partition)

Now we need to process the newly created year partition. 

In Analysis Services Processing Task select Analysis Services Tutorial as connection manager and all the dimensions in Object Name - which are not static in data. Select Process Update as Process Option.


Task 1.0.8.d – AS_Processing_Task - Process Yearly Partiton

Next Task is again Analysis Services Processing Task - to process Current Partition.
Select Analysis Services Tutorial as connection manager and Internet_Sales_Partition_Template in Object Name - which is default partition (or Template partition). Select Process Full as Process Option.

Now click on Expression and select @[User::ProcessPartitionXMLA] in ProcessingCommands.


Task 1.0.8.e – Exec_SQL_Tk - Update info of Yearly Partition in Log Table

Now we need to store the Last Processed Date and Processed Count (number of time this partition has processed) in the log table using Execute SQL task.
Select ResultSet None, ConnectionType OLE DB, Connection ADVENTUREWORKSDW2012, SQLSourceType Direct input and SQLStatement as:

Exec [Update_PartitionLog] ?,?,?,?, 1


Click on Parameter Mapping and map the parameters as 


Task 1.0.9– Seq_Container - Merge Monthly Partition to Yearly Partition

Now we need to merge the first monthly partition to Yearly Partition.
This is independent of previous sequence container where we are creating new partition if it is not exists.
However, we need to merge the last partition every time we execute the package. In most of the cases, new partition will be created only once but partition should be merged on first processing of every month.

This sequence container contains 4 tasks.
 

Task 1.0.9.a – AS_Exec_DDL_Tk - Merge Monthly Partition to Yearly Partiton

Second task in the sequence Container is Analysis Services Execute DDL Task. This task is used to create new Partition. In Analysis Services Execute DDL Task Editor, select Analysis Services Tutorial as Connection, Variable as SourceType, and User::MergePartitionXMLA as Source.


Task 1.0.9.b – Exec_SQL_Tk - Update info of Merging Partition to Log Table

Second Task in the Sequence Container is Execute SQL Task - this is used to update newly merged Partition information in the PartitionLog table.

In Execute SQL Task, select ResultSet None, ConnectionType OLE DB, Connection ADVENTUREWORKSDW2012, SQLSourceType Direct input and SQLStatement as:

Exec [Update_PartitionLog] ?,?,?,?, 3

Click on Parameter Mapping and map the parameters as shown below:


Task 1.0.9.c– AS_Exec_DDL_Tk - Alter Definition of Yearly Partition

Third task in the sequence Container is Analysis Services Execute DDL Task. This task is used to alter the “QueryDefinition” of yearly partition. In Analysis Services Execute DDL Task Editor, select Analysis Services Tutorial as Connection, Variable as SourceType, and User::AlterPartitionXMLA as Source.


Task 1.0.9.d – Execute SQL Task - Update LastDate_Data for Yearly_Partition

Last Task in the Sequence Container is Execute SQL Task - this is used to update yearly Partition information in the PartitionLog table.
In Execute SQL Task, select ResultSet None, ConnectionType OLE DB, Connection ADVENTUREWORKSDW2012, SQLSourceType Direct input and SQLStatement as:

Exec [Update_PartitionLog] ?,?,?,?, 4


Click on Parameter Mapping and map the parameters as shown below:

Task 1.0.10– Exec_SQL_Tk - Reset Variables

As we are processing data in loop, we need to reset some variables.
In Execute SQL Task Editor, select ResultSet Single row, ConnectionType OLE DB, Connection AdventureWorksDW2012, SQLSourceType Direct input and SQLStatement as:

Exec [Reset_Variables] ?,?


Click on Parameter Mapping and map the parameters as shown below:


Click on Result Set and map the result as shown below:


We are done with final package development. Now it’s time to execute the package and check the dynamic behavior of Partition creation and processing. Before execution the package, there is only one partition in the SSAS Analysis Services Tutorial database.
Now Right Click on the package and execute it. Once package is execute successfully, you will see one more partition in the database and one more entry in PartitionLog table.

If you query dbo.PartitionLog then it will return data like this.


 
If you browse cube it will show you all the partitions, you can see 12 monthly partitions -one partition for each month and one yearly partition as shown below:



You can get the all scripts used in project here.

This is my first attempt to write a post, so there may be some formatting or small mistakes :).
If you require further details or want SSIS package, just drop me a mail.






No comments:

Post a Comment