Wednesday, October 15, 2014

SSIS Variables

 

Variables have been available in SSIS since first introduced in. They can be used for a number of different purposes:
1. Define the path of a package
2. Hold system information, such as error information
3. Provide a means to create custom logging
4. Hold result set(s) to iterate through
Let’s begin with a simple example such as using a variable to define the path of a package. Consider that you are having to execute a data flow task and based on the number of rows loaded will dictate what task should occur next. This is simple enough to using a variable and precedence constraint. The first step is to create a variable, which is done in the variable pane in SQL Server Data Tools:
variablepane

Scope

The first thing to note from the variable pane is the scope.  The scope defines where the variable will be available.  For example the scope for my variable “RowCount” is set to my package which I have named RowCount.  This means that any task within the package, including child packages, will have access to the variable.  It is best practice to insure that the variable scope is limited to only where it is needed.  This package will have a total of three tasks:
1. Data flow task that will take the data from the Adventureworks2012.Person.Person table through the Row Count transformation and to a flat file destination
2. A script task that executes if the row count is greater than 1,000
3. A script task that executes if the row count is less than 1,000
In this case the scope is set to the package since all 3 tasks will need access to the variable.  If the package contained other control flow tasks that did not require access to the variable then scope could be limited by placing the three affected tasks in a sequence container and setting the scope to the container.

Data Type

The data type defines the type of object and the acceptable values that the variable can be held.  Another important factor is to assign the most efficient data type for a variable.  For example, if the variable will hold a numeric value between -128and 128 the most efficient data type is SByte rather than UInt, which can hold a numeric value between o and 65,535.  MSDN documents the data types here.

Value

This holds the value of the variable.  This value can be changed within the package using script task, expression task, execute SQL task, as well as several other.

Expression

The expressions provides a means of assigning a value to the variable using an SSIS expression.  Again this can be changed later within the package, but once package execution begins and the variable scope is  invoked the value or value from the expression provides the variable value until and if it is changed.
With the variable configured now I will include my data flow task that will record the value of all rows that go from the source, Adventureworks2012.Person.Person, to the destination, to the flat file destination using a row count transformation.  This is easy enough to do as once connecting the data source to the row count you will be prompted for the variable that will be used to hold this numeric value.
DataFlowCount

Precedence Constraint

Keeping in mind the requirements we now must define the path of execution based on the success of the data flow and the number of rows processed, which is now held within our variable.  By connecting both script tasks using our On Success precedence constraint we can then change both constraints to use Expression and Constraint we can define one constraint to use the expression @[User::RowCount] > 1000 and the other @[User::RowCount] < 1000.  This method will leave one path uncovered, that is if the RowCount == 1000.  In this case we are not concerned about that so we will let it ride.
precedence
Each script task will be used to access the variable and display it in a message box and then reset the variable to 0.  This requires that both tasks will need to have read and write access to the variable.
ScriptVariable
I will use VB.NET in each script task using the following code:
< style=”font-size: 12px;”>
Public Sub Main()

MessageBox.Show(“The row count was: ” + Dts.Variables(“RowCount”).Value.ToString)
Dts.Variables(“RowCount”).Value = 0
MessageBox.Show(“The row count was: ” + Dts.Variables(“RowCount”).Value.ToString)
Dts.TaskResult = ScriptResults.Success
The final package in the control flow looks like this:
FinalPackage
You will notice that the precedence constraints are green, meaning that the path will only be taken if the preceding task completes successfully.  You will also note that there is an “fx” next to each precedence constraint which means that the constraint is also based on a precedence.
This is obviously a very simplistic example, but the first in the series so we progress with each new post.
The sample package, completed in SSDT 2012, can be downloaded here.

One of the more interesting values that can be held within an objects is a full result set, which can then be treated similar to a collection.  One thing that I want to point out is just because you can doesn’t mean you should.  Quite often I will be asked how a result set can be contained in a variable in SSIS and when I ask why I am most often told that it would be used to do complex transformations on the results in the control flow.  HOLD ON!!  That is exactly what the data flow task is for!!  In response to this I am most often told that the transformations are most easily done in a script task.  Once again there is a script component in the data flow that can act as a source, transformation, and/or destination and such complex transformations should be kept in the data flow as a best practice as well as for performance considerations.
If I haven’t scared you away already let me point out several other performance considerations.  First keep the scope of the variable limited to only what is needed, task, container, and only if necessary the package.  The memory required to hold an object will be dictated by the value(s) that are stored in the variable so extra attention is required to limit the exposure to only what is required.  Boxing and Unboxing can also carry performance costs, MSDN documents boxing and unboxing here.
Enough of the gloom and doom, let’s take a look at an object variable and how to work with it.  To properly present the demonstration let’s present the requirements of the sample package.  A result set from an execute SQL task needs to be captured within the object variable that will consist of two columns, FirstName and LastName.  The result set will then be iterated through to execute complex logic, in this case just pass the values to a script task that will present the name in a message box(I know!!  This is just a simple example though).  This package will require three variables,
1. Names                        Object
2. FirstName                    String
3. LastName                    String
objvariable
With the variables created we can now begin to populate the Names variable with our execute SQL task.  The task will need to set the Result Set property to full result set and the Connection Type will use an OLEDB connection to my localhost default instance connecting to the Adventureworks2012 database.  The query is simplistic enough:
SELECT FirstName,
LastName
FROM Person.Person
We now need to define the Result Set settings in the task to specify that the results will be captured within our Names object variable.  From within the Result Set tab you must specify the zero based index value of the results to be stored in the Name property, since our query only returns a single result set this will be set to 0, and then the Variable Name of the SSIS variable that will be used to hold the result set, in this case Names.
resultset
Our package will now populate the Names variable with the results of our query and we now have several methods that we can work with this variable.

ForEach Container

The first way to work with an object is to use a For Each container and set the Enumerator to a Foreach ADO Enumerator and define the ADO object source variable to our Names variable:
foreach
Now keep in mind that the Names is an object, very much like an array or collection, so we need to take both columns, FirstName and LastName, and place them in their own variables.  This is done on the Variable Mapping page by again mapping the zero based index value of the ordinal position of the columns to the appropriate variables:
varmap
**I intentionally placed the LastName and FirstName out of order to demonstrate how the variable mapping is done based on the zero based index and not by the order in which the variables are mapped to the columns
Within the ForEach container place a script task that has ReadWriteVariables or Read Only Variables set to both the FirstName and LastName:
script
You may ask why access is not given to the Names variable.  The answer is that the ForEach container accesses the Names object variable and iterates though each row placing the FirstName column in the FirstName variable and LastName column in the LastName variable and passes those variable, one at a time, to the script task so access is not needed to the object.
The C# script task simply calls the MessageBox Show method to display the names one at a time:
MessageBox.Show(“The name is: ” + Dts.Variables[0].Value.ToString() +” ” + Dts.Variables[1].Value.ToString());

Script Task

Another way to work with the variable is directly within a script task.  Since the script task will now be iterating through the Names object variable directly the variable needs to be assigned to the script tasks ReadWriteVariables or ReadOnlyVariables:
scriptnammes
The C# script task first creates an OleDbAdapter and a data table and calls the OleDbAdapters Fill method to populate a data table with the Names object variable.  Once the data table is populated a foreach loop is used to iterate through the results and display the results using the MessageBox Show method.
//Populate a data table with the Names Variable
OleDbDataAdapter A = new OleDbDataAdapter();
System.Data.DataTable dt = new System.Data.DataTable();
A.Fill(dt, Dts.Variables["User::Names"].Value);
// Iterate through the data table
foreach (DataRow row in dt.Rows)
{
string FirstName;
string LastName;
object[] array = row.ItemArray;
FirstName = array[0].ToString();
LastName = array[1].ToString();
MessageBox.Show(“FirstName=” + FirstName + ” AND LastName=” + LastName);
The sample package outlined can be downloaded here.

Default Value Assignment

Once a variable is declared, scoped, and a data type defined a default value can be assigned within the Value field.  This is the value that the variable will take once it falls in scope.  This will remain the variable value until a task changes that value:
variablevalue

Expression

An SSIS expression can be used to assign a value to a variable which, like the Value property, is the value that will be assigned once the variable falls into scope.  In SSIS 2005-2008R2 the variable expression value had a limitation to 4,000 characters.  This could be overcome by using an expression to concatenate several together.  I outlined this technique here.
varexpress

Expression Task

An expression task, first introduced in SSIS 2012, provides the ability to use an SSIS expression to assign a value to a variable once the task is execute:
Expressiontask

Script Task

A script task provides two separate means for variable value assignment.

ReadOnly/ReadWrite

The first, and probably the easiest, is to list the variables for ReadOnly or ReadWrite access within the components configuration pane:
Scriptassign
With the variable(s) being entered within the component code is written that we don’t see that handles the locking and unlocking of the variable(s) for both read and write.  The system generated code allows us to interact directly with the variables without having to first lock the variables.  For example the below VB.NET code will use the MessageBox Show method to display the value of the variable “Today” that was enabled for ReadWrite access, then change the value, and once again display the new value in a message box:
MessageBox.Show(Dts.Variables(0).Value.ToString)
Dts.Variables(0).Value = Now.AddDays(-1)
MessageBox.Show(“The new date is ” + Dts.Variables(0).Value.ToString)
This provide quick and easy access to the variable for read and write, but does limit when the variable is locked and unlocked based on the system generated code.  For more granular control over variable locking you can utilize the VariableDispenser within your code.

VariableDispenser

Utilizing the VariableDispenser method does require more code, but again provides complete control over the locking and unlocking of variables.  DO NOT include the variable within the script configuration ReadOnly or ReadWrite property when using this method or an error will arise when you attempt to programmatically lock or unlock the variables since the components system generated code handles this.  The below VB.NET code demonstrates using the VariableDispenser to work with variable directly within a script task:
Public Sub Main()
Dim vars As Variables = Nothing
Dim myVar As Variable
Dts.VariableDispenser.LockForRead(“User::Today”)
Dts.VariableDispenser.GetVariables(vars)
For Each myVar In vars
MessageBox.Show(“Variable value ” + myVar.Value.ToString + ” and name is ” + myVar.Name)
Next
Dts.VariableDispenser.LockForWrite(“User::Today”)
       For Each myVar In vars
myVar.Value = Now
Next
For Each myVar In vars
MessageBox.Show(“Variable value ” + myVar.Value.ToString + ” and name is ” + myVar.Name)
Next
vars.Unlock()
Dts.TaskResult = ScriptResults.Success
End Sub
The above code uses a for each loop to iterate through all locked variables, which in this case is only the User::Today variable.  To access the variable properties and methods directly the vars(i) can be called, where “i” is the zero based indexed value of the variable.  For example, to re-assign the today variables value the following code could be used to assign the current
vars(0).Value = Now

Execute SQL Task

The execute SQL task can also be used to assign a variable a value in two different ways.

Output Parameter

Variable assignment can be done by using a query that assigns the result(s) to a placeholder of a “?“.  For example the below query would take the result of SELECT GETDATE() and assign it to the output parameter that is being held with the “?
SELECT ? = GETDATE()
sqloutput
Using the query with the placeholder alone does not complete the assignment and still requires the mapping of the variable to the output parameter which is done on the Parameter Mapping page.  On the parameter mapping you select the variable, in this case User::Today, the direction, which is output, and the ParameterName, which will be the zero based index of the value, in this case 0.  The question often comes up as to whether a named output parameter can be used rather than the indexed value and the truth is that it depends upon the connection manager.  This post outlines the different configuration methods available for output parameters.
outputparam

Result Set

Another way to use an execute SQL task to assign a value to a variable is by using a result set.  This is very similar to an output parameter and only differs in how the query is written, there is no “?” place holder or parameter name, but rather just the result set type and result set mapping.  On the execute SQL configuration page result set type must be defined as None, Single row, Full result set, or XML, the type is completely dependant upon the type of result(s) that will be provided from the query:
resultsetpane
In the Result Set pane you again map the zero based index results to the return value(s) of the query.  In this case there is only one query that will return a result so the indexed value is zero.
resultsets
One benefit that you may immediately notice is that if you have multiple variables requiring assignment, such as the current date and the last date a process ran, you can do this within one single execute SLQ task using output parameters, while it would require 2 execute SQL tasks each with a single row result set. There is the possibility to utilize one execute SQL task using a result set of a full result set or possibly XML this would be overkill in such a situation and more easily accomplished with output parameters.

Data Flow Task

The data flow task can be used for variable assignment, for example using the Row Count transformation to assign the number of rows that pass between two data flow components, or a complete result set using a Recordset destination.  To go back to my previous post that covered the object variable data type I used an execute SQL task to populate the variable.  Rather than an execute SQL task I can use a data flow task that gets the FirstName and LastName columns from the Adventureworks2012.Person.Person table and sends the result set to a Recordset destination mapping my object variable, in this case the User::Names variable:
recordset

0 comments:

Post a Comment