Data Integrator Transforms Overview:

Data Integrator Transforms List:


SAP Data Services is one of the leading ETL (Extract, Transform, Load) tools. It delivers a single enterprise-class solution for data integration, data quality, data profiling, and text data processing that allows you to integrate, transform, improve, and deliver trusted data to critical business processes.

SAP Data Services transforms are built-in system objects stored in the repository; they are used whenever we want to transform data from source(s) to target(s).

The transforms can be found under the Transforms tab of the Local Object Library, which provides access to all repository objects (built-in or user-built).

The transforms are classified into four categories, listed below. Expanding each category shows the transforms it contains.

 

·          Data Integrator 

·          Data Quality

·          Platform

·          Text Data Processing

   

     Data Integrator Transforms:


The Data Integrator transforms are:


 

Data_Transfer

Date_Generation

Effective_Date

Hierarchy_Flattening

History_Preserving

Key_Generation

Map_CDC_Operation

Pivot (Columns to Rows)

Reverse Pivot (Rows to Columns)

Table_Comparison

XML_Pipeline

 

Let’s look into the detailed description of each of the transforms under the Data Integrator category.

Data Transfer:

This transform writes the data from a source or the output from another transform into a transfer object and subsequently reads data from the transfer object. The transfer type can be a relational database table or a file.

We can use the Data_Transfer transform to push down operations to the database server when the transfer type is a database table, including resource-consuming operations such as joins, GROUP BY, and sorts.
Please go through the article ‘Data Transfer transform in SAP Data Services’ to know more details about this transform.
Date Generation:

This transform produces a series of dates incremented as we specify in the transform settings. We use this transform to produce the key values for a time dimension target. From this generated sequence we can also populate other fields in the time dimension (such as day_of_week) using functions in a Query transform.

Please go through the article ‘Date Generation transform in SAP Data Services’ to know more details about this transform.
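To make the behavior concrete, here is a small Python sketch (not Data Services code) of generating an incremented date series to seed a time dimension. The function name and the column names `date_key` and `day_of_week` are illustrative only.

```python
from datetime import date, timedelta

def generate_dates(start, end, step_days=1):
    """Yield dates from start to end inclusive, incremented by step_days."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=step_days)

# Build hypothetical time-dimension rows; derived fields such as day_of_week
# would normally be added afterwards (in Data Services, via a Query transform).
time_dim = [
    {"date_key": d, "day_of_week": d.strftime("%A")}
    for d in generate_dates(date(2024, 1, 1), date(2024, 1, 7))
]
```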
  Effective Date:
o   This transform is used to calculate an “effective-to” value for data that contains an effective date.
o   The calculated effective-to date and an existing effective date produce a date range that allows queries based on effective dates to produce meaningful results.

    Please go through the article ‘Effective Date transform in SAP Data Services’ to know more details about this transform.
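The effective-to calculation can be sketched in plain Python as follows: for each key, every effective date is paired with the next effective date for that key, and the most recent record receives a sentinel high date. The function and column names here are assumptions for illustration.

```python
from datetime import date

HIGH_DATE = date(9999, 12, 31)  # sentinel for the open-ended current record

def add_effective_to(rows, key_col, eff_col):
    """For each key, set eff_to on every row to the next row's effective date;
    the most recent row gets the sentinel high date."""
    out = []
    grouped = {}
    for row in sorted(rows, key=lambda r: (r[key_col], r[eff_col])):
        grouped.setdefault(row[key_col], []).append(dict(row))
    for recs in grouped.values():
        for i, rec in enumerate(recs):
            rec["eff_to"] = recs[i + 1][eff_col] if i + 1 < len(recs) else HIGH_DATE
            out.append(rec)
    return out
```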

v  Hierarchy Flattening:

This transform constructs a complete hierarchy from parent/child relationships, then produces a description of the hierarchy in vertically or horizontally flattened format, which can be used to build star schemas in a data warehouse environment.

Please go through the article ‘Hierarchy Flattening transform in SAP Data Services’ to know more details about this transform.
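Vertical flattening can be illustrated with a short Python sketch: parent/child edges are expanded into (ancestor, descendant, depth) rows covering every ancestor-descendant pair. This is a minimal illustration of the idea, not the transform's actual algorithm.

```python
def flatten_hierarchy(edges):
    """Vertically flatten parent/child edges into (ancestor, descendant, depth) rows."""
    children = {}
    nodes = set()
    for parent, child in edges:
        children.setdefault(parent, []).append(child)
        nodes.update((parent, child))

    rows = []
    def walk(ancestor, node, depth):
        # Record every descendant of the ancestor, with its distance.
        for child in children.get(node, []):
            rows.append((ancestor, child, depth))
            walk(ancestor, child, depth + 1)

    for node in sorted(nodes):
        walk(node, node, 1)
    return rows
```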

v  History Preserving:
     The History_Preserving transform allows us to produce a new row in our target rather than updating an existing row.
     We can indicate in which columns the transform identifies changes to be preserved.
If the value of certain columns changes, this transform creates a new row for each row flagged as UPDATE in the input data set.
Please go through the article ‘History Preserving transform in SAP Data Services’ to know more details about this transform.
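The core decision can be sketched in Python: when a preserved column differs from the target's current row, the incoming UPDATE is converted into an INSERT so the old version survives. Function and column names are hypothetical, and real history preservation (valid-from/valid-to stamping, current flags) is omitted for brevity.

```python
def preserve_history(target_rows, update_rows, key_col, preserved_cols):
    """Convert an UPDATE into an INSERT when any preserved column changed,
    so the existing target row is kept as history."""
    current = {r[key_col]: r for r in target_rows}
    ops = []
    for row in update_rows:
        old = current.get(row[key_col])
        if old is not None and any(old[c] != row[c] for c in preserved_cols):
            ops.append(("INSERT", row))   # new version row; old row is preserved
        else:
            ops.append(("UPDATE", row))   # no preserved change; update in place
    return ops
```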
  Key Generation:
o   This transform is used to generate new keys for new rows/records in a data set.
o   When it is necessary to generate artificial keys in a table, the Key_Generation transform looks up the maximum existing key value from a table and uses it as the starting value to generate new keys.
o   The transform expects the generated key column to be part of the input schema.
Please go through the article ‘Key Generation transform in SAP Data Services’ to know more details about this transform.
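The key-generation logic reduces to a simple sketch: look up the maximum existing key (in Data Services this comes from the target table) and hand out sequential values above it. Names below are illustrative.

```python
def generate_keys(rows, max_existing_key, key_col):
    """Assign surrogate keys starting from the maximum existing key + 1."""
    next_key = max_existing_key
    for row in rows:
        next_key += 1
        row[key_col] = next_key
    return rows
```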
 Map CDC Operation:

o   Using its input requirements (values for the Sequencing column and a Row operation column), the Map CDC Operation transform performs three functions:
o   Sorts input data based on values in Sequencing column drop-down list and (optional) the Additional grouping columns box.
o   Maps output data based on values in Row operation column drop-down list. Source table rows are mapped to INSERT, UPDATE, or DELETE operations before passing them on to the target.
o   Resolves missing, separated, or multiple before- and after-images for UPDATE rows.
o   Allows you to filter columns and view UPDATE rows prior to running the job.
o   While commonly used to support relational or mainframe changed-data capture (CDC), this transform supports any data stream as long as its input requirements are met. Relational CDC sources include Oracle and SQL Server.
o   This transform is typically the last object before the target in a data flow because it produces INSERT, UPDATE, and DELETE operation codes. Data Services produces a warning if other objects are used.
Please go through the article ‘Map CDC Operation transform in SAP Data Services’ to know more details about this transform.
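The sort-and-map part of this behavior can be sketched in a few lines of Python: order the rows by the sequencing column, then translate each row-operation code into the target operation. The code letters and column names are assumptions; resolving before/after images is omitted.

```python
OP_MAP = {"I": "INSERT", "U": "UPDATE", "D": "DELETE"}  # hypothetical op codes

def map_cdc(rows, seq_col, op_col):
    """Sort by the sequencing column, then map each row-operation code
    to the operation applied at the target."""
    ordered = sorted(rows, key=lambda r: r[seq_col])
    return [(OP_MAP[r[op_col]], r) for r in ordered]
```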

 Pivot (Columns to Rows):

o    This transform creates a new row for each value in a column that we identify as a pivot column.
o    The Pivot transform allows us to change how the relationship between rows is displayed.
o    For each value in each pivot column, Data Services produces a row in the output data set. We can also create pivot sets to specify more than one pivot column.
Please go through the article ‘Pivot (Columns to Rows) transform in SAP Data Services’ to know more details about this transform.
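A minimal Python sketch of the columns-to-rows idea: each pivot column in each input row becomes its own output row carrying the column name and its value. The output column names `pivot_hdr` and `pivot_data` are illustrative.

```python
def pivot(rows, non_pivot_cols, pivot_cols, header_col="pivot_hdr", data_col="pivot_data"):
    """Produce one output row per pivot column per input row."""
    out = []
    for row in rows:
        for col in pivot_cols:
            rec = {c: row[c] for c in non_pivot_cols}  # carry non-pivot columns through
            rec[header_col] = col
            rec[data_col] = row[col]
            out.append(rec)
    return out
```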

 Reverse Pivot (Rows to Columns):
o   This transform creates one row of data from several existing rows.
o   The Reverse Pivot transform allows us to combine data from several rows into one row by creating new columns.
o   For each unique value in a pivot axis column and each selected pivot column, Data Services produces a column in the output data set.

Please go through the article ‘Reverse Pivot (Rows to Columns) transform in SAP Data Services’ to know more details about this transform.
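The rows-to-columns direction can be sketched as the inverse operation: rows sharing the same key are merged into one record, with each value of the pivot axis column becoming a new column. Names below are illustrative.

```python
def reverse_pivot(rows, key_cols, axis_col, data_col):
    """Combine several rows into one by turning each axis value into a column."""
    combined = {}
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        rec = combined.setdefault(key, {c: row[c] for c in key_cols})
        rec[row[axis_col]] = row[data_col]  # axis value becomes a column name
    return list(combined.values())
```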

Table Comparison:

o   This transform compares two data sets and produces the difference between them as a data set with rows flagged as INSERT, UPDATE, or DELETE.
o   The Table_Comparison transform allows us to detect and forward changes that have occurred since the last time a target was updated.

Note that in order to use the Table_Comparison transform with Teradata 13 and later tables as the comparison table and target table, we must do the following things:

  On the Teradata server, set the General parameter DBSControl to TRUE to allow uncommitted data to be read.

   In the Data Services Teradata datastore, add the following statement in the “Additional session parameters” field:

   SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

Please go through the article ‘Table Comparison transform in SAP Data Services’ to know more details about this transform.
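The comparison logic can be sketched in Python: rows absent from the comparison table become INSERTs, rows whose compared columns differ become UPDATEs, and identical rows produce no output. Detecting deleted rows is an optional mode of the real transform and is not shown; names below are illustrative.

```python
def table_comparison(input_rows, comparison_rows, key_col, compare_cols):
    """Flag each input row INSERT or UPDATE against a comparison table."""
    existing = {r[key_col]: r for r in comparison_rows}
    out = []
    for row in input_rows:
        old = existing.get(row[key_col])
        if old is None:
            out.append(("INSERT", row))
        elif any(old[c] != row[c] for c in compare_cols):
            out.append(("UPDATE", row))
        # identical rows generate no output
    return out
```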

  XML Pipeline:

o   This transform is used to process large XML files one instance of a repeatable structure at a time.


o   With this transform, Data Services does not need to read the entire XML input into memory then build an internal data structure before performing the transformation.

o   An NRDM structure is not required to represent the entire XML data input. Instead, the XML_Pipeline transform uses a portion of memory to process each instance of a repeatable structure, then continually releases and reuses memory to steadily flow XML data through the transform.

o   During execution, Data Services pushes operations of the XML_Pipeline transform to the XML source.


Please go through the article ‘XML Pipeline transform in SAP Data Services’ to know more details about this transform.
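The streaming idea can be illustrated with Python's standard-library `xml.etree.ElementTree.iterparse`: each repeatable instance is handled and then cleared from memory, so the full document is never materialized. The `order` element and `id` attribute are made-up sample data.

```python
import xml.etree.ElementTree as ET
from io import StringIO

def stream_instances(xml_source, tag):
    """Yield one repeatable instance at a time, clearing each element after
    processing so the whole document never sits in memory."""
    for _, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag == tag:
            yield elem.get("id")
            elem.clear()  # release the processed instance

doc = StringIO("<orders><order id='1'/><order id='2'/><order id='3'/></orders>")
order_ids = list(stream_instances(doc, "order"))
```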

The below image represents all the transforms available as part of Data Integrator category.

 


Data Quality Transforms Overview:



The Data Quality transforms are:
1.        Associate
2.        Country_ID
3.        Data_Cleanse
4.        DSF2_Walk_Sequencer
5.        Geocoder
6.        Global_Address_Cleanse
7.        Global_Suggestion_List
8.        Match
9.        USA_Regulatory_Address_Cleanse
10.     User_Defined
Let’s look into the detailed description of each of the transforms under the Data Quality category.
1. Associate:

§  The Associate transform works downstream from the Match transform to provide a way to combine, or associate, their match results by using the Group Number fields generated by the Match transform.
§  We may need to add a Group Statistics operation to the Associate transform to gather match statistics.
§  You can combine the results of two or more Match transforms, two or more Associate transforms, or any combination of the two.
§  For example, we may use one Match transform to match on name and address, use a second Match transform to match on SSN, and then use an Associate transform to combine the match groups produced by the two Match transforms.
Please go through the article ‘Associate transform in SAP Data Services’ to know more details about this transform.
2. Country ID:
§  The Country ID transform parses our input data and then identifies the country of destination for each record.
§  After identifying the country, the transform can output the country name, any of three different ISO country codes, an ISO script code, and a percentage of confidence in the assignment.
§  Though we can use the Country ID transform before any transform in a data flow, we will probably find it most useful during a transactional address cleanse job.
§  Place the Country ID transform before the Global Suggestion List transform. The Global Suggestion List transform needs the ISO_Country_Code_2Char field that the Country ID transform can output.
§  It is not necessary to use the Country ID transform before the Global Address Cleanse transform in a data flow because the Global Address Cleanse transform contains its own Country ID processing.
§  It is also not necessary to use the Country ID transform before the USA Regulatory Address Cleanse transform because the input data should contain U.S. addresses only.
Please go through the article ‘Country ID transform in SAP Data Services’ to know more details about this transform.
3. Data Cleanse:
§   Use the Data Cleanse transform to parse and format custom or person and firm data as well as phone numbers, dates, e-mail addresses, and Social Security numbers.
§  Custom data includes operational or product data specific to the business.
§  The cleansing package we specify defines how our data should be parsed and standardized.
§  Within a data flow, the Data Cleanse transform is typically placed after the address cleansing process and before the matching process.
Please go through the article ‘Data Cleanse transform in SAP Data Services’ to know more details about this transform.
4. DSF2 Walk Sequencer:
§  To add walk sequence information to our data, include the DSF2 Walk Sequencer transform in the data flow. We can then send our data through presorting software to qualify for the following walk-sequence discounts:
§  Carrier Route
§  Walk Sequence
§  90% Residential Saturation
§  75% Total Active Saturation
§  DSF2 walk sequencing is often called “pseudo” sequencing because it mimics USPS walk sequencing.
§  Where USPS walk-sequence numbers cover every address, DSF2 walk sequence processing provides “pseudo” sequence numbers for the addresses only in that particular file.
Please go through the article ‘DSF2 Walk Sequencer transform in SAP Data Services’ to know more details about this transform.
5. Geocoder:
§  The Geocoder transform uses geographic coordinates expressed as latitude and longitude, addresses, and point-of-interest (POI) data. Using the transform, we can append addresses, latitude and longitude, census data (US only), and other information to the data.
§  Based on mapped input fields, the Geocoder transform has three modes of geocode processing:
§  Address Geocoding
§  Reverse Geocoding
§  POI textual search

Please go through the article ‘Geocoder transform in SAP Data Services’ to know more details about this transform.

6. Global Address Cleanse:
§  The Global Address Cleanse transform identifies, parses, validates, and corrects global address data, such as primary number, primary name, primary type, directional, secondary identifier, secondary number, locality, region and postcode.
§  Note: The Global Address Cleanse transform does not support CASS certification or produce a USPS Form 3553. If you want to certify your U.S. address data, you must use the USA Regulatory Address Cleanse transform, which supports CASS.
§  If we perform both address cleansing and data cleansing, the Global Address Cleanse transform typically comes before the Data Cleanse transform in the data flow.
Please go through the article ‘Global Address Cleanse transform in SAP Data Services’ to know more details about this transform.
7. Global Suggestion List:
§  The Global Suggestion List transform queries addresses with minimal data and can offer suggestions for possible matches. It is a beneficial research tool for managing unassigned addresses from a batch process.
§  Global Suggestion List functionality is designed to be integrated into our own custom applications via the Web Service.
§  The Global Suggestion List transform requires the two character ISO country code on input. Therefore, we may want to place a transform, such as the Country ID transform, that will output the ISO_Country_Code_2Char field before the Global Suggestion List transform.
§  The Global Suggestion List transform is available for use with the Canada, Global Address, and USA engines.
Please go through the article ‘Global Suggestion List transform in SAP Data Services’ to know more details about this transform.
8. Match:
§  The Match transform is responsible for performing matching based on the business rules we define. The transform then sends matching and unique records on to the next transform in the data flow.
§  For best results, the data in which we are attempting to find matches should be cleansed. Therefore, we may need to include other Data Quality transforms before the Match transform.
Please go through the article ‘Match transform in SAP Data Services’ to know more details about this transform.


9. USA Regulatory Address Cleanse:
§  The USA Regulatory Address Cleanse transform identifies, parses, validates, and corrects U.S. address data according to the U.S. Coding Accuracy Support System (CASS).
§  This transform can create the USPS Form 3553 and output many useful codes to our records. We can also run in a non-certification mode as well as produce suggestion lists.
§  If we perform both data cleansing and matching, the USA Regulatory Address Cleanse transform typically comes before the Data Cleanse transform and any of the Match transforms in the data flow.
§  SAP recommends using a sample job or data flow that is set up according to best practices for a specific use case.
Please go through the article ‘USA Regulatory Address Cleanse transform in SAP Data Services’ to know more details about this transform.
10. User Defined:
§  The User-Defined transform provides custom processing in a data flow using the full Python scripting language.
§  The applications for the User-Defined transform are nearly limitless. It can do just about anything that we can write Python code to do.
§  We can use the User-Defined transform to generate new records, populate a field with a specific value, create a file, connect to a website, or send an email, just to name a few possibilities.
§  We can place this transform anywhere in our data flow. If we have created our own transform, then the only restrictions about where it can be located in the data flow are those which we place on it.
§  Although the User-Defined transform is quite flexible and powerful, we will find that many of the tasks we want to perform can be accomplished with the Query transform.
§  The Query transform is generally more scalable and faster, and uses less memory than User-Defined transforms.

Please go through the article ‘User Defined transform in SAP Data Services’ to know more details about this transform.
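As a flavor of the kind of per-record logic such a transform might run, here is a plain-Python sketch. It is not the transform's actual scripting interface (the real transform exposes records through Data Services' own embedded Python API); the function, field names, and validation rule are all hypothetical.

```python
def enrich_record(record):
    """Hypothetical per-record routine: derive a field and flag suspect emails,
    in the spirit of custom processing a User-Defined transform could perform."""
    record["FULL_NAME"] = f"{record['FIRST']} {record['LAST']}".strip()
    record["VALID_EMAIL"] = "@" in record.get("EMAIL", "")  # crude illustrative check
    return record

rows = [enrich_record(r) for r in [{"FIRST": "Ada", "LAST": "Lovelace", "EMAIL": "ada@example.com"}]]
```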
