Data Integrator Transforms Overview:
SAP Data Services is one of the finest ETL (Extract, Transform, Load) tools. It delivers a single enterprise-class solution for data integration, data quality, data profiling, and text data processing that allows you to integrate, transform, improve, and deliver trusted data to critical business processes.
SAP Data Services transforms are built-in system objects stored in the repository, which are used whenever we want to transform data from source(s) to target(s).
The transforms can be found under the Transforms tab of the Local Object Library, which provides access to all repository objects (built-in or user-built).
The transforms are classified into four major categories, listed below. Expanding each category shows the transforms it contains.
· Data Integrator
· Data Quality
· Platform
· Text Data Processing
Data Integrator Transforms:
The Data Integrator transforms are:
Data_Transfer
Date_Generation
Effective_Date
Hierarchy_Flattening
History_Preserving
Key_Generation
Map_CDC_Operation
Pivot (Columns to Rows)
Reverse Pivot (Rows to Columns)
Table_Comparison
XML_Pipeline
Let’s look into the detailed description of each of the transforms under the Data Integrator category.
Data Transfer:
This transform writes the data from a source, or the output of another transform, into a transfer object and subsequently reads data from the transfer object. The transfer type can be a relational database table or a file.
We can use the Data_Transfer transform to push down operations to the database server when the transfer type is a database table. This lets us push down resource-consuming operations such as joins, GROUP BY, and sorts.
Please go through the article ‘Data Transfer transform in SAP Data Services’ to know more details about this transform.
Date Generation:
This transform produces a series of dates incremented as we specify in the transform settings.
We use this transform to produce the key values for a time-dimension target.
From this generated sequence we can also populate other fields in the time dimension (such as day_of_week) using functions in a Query transform.
Please go through the article ‘Date Generation transform in SAP Data Services’ to know more details about this transform.
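Conceptually, the transform behaves like this minimal Python sketch (the function and field names here are illustrative, not SAP's actual API):

```python
from datetime import date, timedelta

def generate_dates(start, end, increment_days=1):
    """Yield a series of dates from start to end, stepping by the given
    increment -- the kind of series Date_Generation produces."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=increment_days)

# Derive extra time-dimension attributes, as a downstream Query transform might:
rows = [{"date_key": d, "day_of_week": d.strftime("%A")}
        for d in generate_dates(date(2024, 1, 1), date(2024, 1, 7))]
```

Each generated date becomes a candidate key row for the time dimension, with derived columns added afterwards.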
Effective Date:
· This transform is used to calculate an “effective-to” value for data that contains an effective date.
· The calculated effective-to date and an existing effective date produce a date range that allows queries based on effective dates to produce meaningful results.
Please go through the article ‘Effective Date transform in SAP Data Services’ to know more details about this transform.
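The underlying idea can be sketched in Python as follows (a simplified illustration; the field names and the far-future default are assumptions, not SAP's interface):

```python
from datetime import date

def add_effective_to(rows, default_to=date(9999, 12, 31)):
    """For rows sorted by key and effective date, set each row's
    effective-to date to the next row's effective date, so the two
    dates form a queryable validity range."""
    out = []
    for i, row in enumerate(rows):
        nxt = rows[i + 1] if i + 1 < len(rows) else None
        same_key = nxt is not None and nxt["id"] == row["id"]
        out.append(dict(row, effective_to=nxt["effective_date"] if same_key else default_to))
    return out

history = add_effective_to([
    {"id": 1, "rate": 5.0, "effective_date": date(2023, 1, 1)},
    {"id": 1, "rate": 6.5, "effective_date": date(2024, 1, 1)},
])
```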
Hierarchy Flattening:
This transform constructs a complete hierarchy from parent/child relationships, then produces a description of the hierarchy in a vertically or horizontally flattened format, which can be used to build star schemas in a data warehouse environment.
Please go through the article ‘Hierarchy Flattening transform in SAP Data Services’ to know more details about this transform.
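A rough sketch of vertical flattening, which expands parent/child pairs into (ancestor, descendant, depth) rows (illustrative only; the real transform offers more options):

```python
def flatten_vertical(edges):
    """Expand parent/child pairs into (ancestor, descendant, depth)
    rows -- the core idea of vertical hierarchy flattening."""
    children, nodes = {}, set()
    for parent, child in edges:
        children.setdefault(parent, []).append(child)
        nodes.update((parent, child))

    rows = []
    for ancestor in nodes:
        stack = [(ancestor, 0)]          # each node is its own ancestor at depth 0
        while stack:
            node, depth = stack.pop()
            rows.append((ancestor, node, depth))
            for c in children.get(node, []):
                stack.append((c, depth + 1))
    return rows

pairs = flatten_vertical([("World", "EU"), ("EU", "DE"), ("EU", "FR")])
```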
History Preserving:
The History_Preserving transform allows us to produce a new row in our target rather than updating an existing row. We can indicate in which columns the transform identifies changes to be preserved.
If the value of certain columns changes, this transform creates a new row for each row flagged as UPDATE in the input data set.
Please go through the article ‘History Preserving transform in SAP Data Services’ to know more details about this transform.
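The decision the transform makes per row can be sketched like this (a simplified illustration; real History_Preserving also manages valid-from/valid-to dates and flags):

```python
def preserve_history(target_row, update_row, compare_columns):
    """If any compare column changed, emit an INSERT of the new row
    (preserving the old one in the target); otherwise pass the
    UPDATE through unchanged."""
    changed = any(target_row[c] != update_row[c] for c in compare_columns)
    return ("INSERT", update_row) if changed else ("UPDATE", update_row)

op, row = preserve_history(
    {"id": 1, "city": "Berlin"},          # current target row
    {"id": 1, "city": "Munich"},          # incoming UPDATE row
    compare_columns=["city"],
)
```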
Key Generation:
· This transform is used to generate new keys for new rows/records in a data set.
· When it is necessary to generate artificial keys in a table, the Key_Generation transform looks up the maximum existing key value from a table and uses it as the starting value to generate new keys.
· The transform expects the generated key column to be part of the input schema.
Please go through the article ‘Key Generation transform in SAP Data Services’ to know more details about this transform.
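The surrogate-key logic is simple to sketch (the column name `cust_key` is a made-up example):

```python
def generate_keys(rows, existing_max_key, key_column="cust_key"):
    """Assign surrogate keys starting after the current maximum key,
    mirroring how Key_Generation seeds new keys from a key source table."""
    next_key = existing_max_key
    out = []
    for row in rows:
        next_key += 1
        out.append(dict(row, **{key_column: next_key}))
    return out

# If the target's maximum key is 100, new rows get 101, 102, ...
new_rows = generate_keys([{"name": "A"}, {"name": "B"}], existing_max_key=100)
```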
Map CDC Operation:
Using its input requirements (values for the Sequencing column and a Row operation column), the Map_CDC_Operation transform performs these functions:
· Sorts input data based on values in the Sequencing column drop-down list and (optionally) the Additional grouping columns box.
· Maps output data based on values in the Row operation column drop-down list. Source table rows are mapped to INSERT, UPDATE, or DELETE operations before passing them on to the target.
· Resolves missing, separated, or multiple before- and after-images for UPDATE rows.
· Allows you to filter columns and view UPDATE rows prior to running the job.
While commonly used to support relational or mainframe changed-data capture (CDC), this transform supports any data stream as long as its input requirements are met. Relational CDC sources include Oracle and SQL Server.
This transform is typically the last object before the target in a data flow because it produces INSERT, UPDATE, and DELETE operation codes. Data Services produces a warning if other objects are used.
Please go through the article ‘Map CDC Operation transform in SAP Data Services’ to know more details about this transform.
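The sort-then-map behavior can be sketched as follows (the `I`/`U`/`D` flag values and column names are illustrative, not a fixed SAP convention):

```python
OPCODES = {"I": "INSERT", "U": "UPDATE", "D": "DELETE"}  # assumed flag mapping

def map_cdc(rows, seq_col="seq", op_col="op"):
    """Sort rows by the sequencing column, then map each row's
    operation value to the opcode applied at the target."""
    ordered = sorted(rows, key=lambda r: r[seq_col])
    return [(OPCODES[r[op_col]], r) for r in ordered]

ops = map_cdc([
    {"seq": 2, "op": "U", "id": 7},
    {"seq": 1, "op": "I", "id": 7},
    {"seq": 3, "op": "D", "id": 7},
])
```

Sorting by sequence first matters: applying the same changes out of order (e.g. DELETE before INSERT) would corrupt the target.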
Pivot (Columns to Rows):
· This transform creates a new row for each value in a column that we identify as a pivot column.
· The Pivot transform allows us to change how the relationship between rows is displayed.
· For each value in each pivot column, Data Services produces a row in the output data set. We can also create pivot sets to specify more than one pivot column.
Please go through the article ‘Pivot (Columns to Rows) transform in SAP Data Services’ to know more details about this transform.
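A minimal sketch of the columns-to-rows idea (the header/data field names are assumptions for illustration):

```python
def pivot(rows, non_pivot, pivot_columns, header="pivot_hdr", data="pivot_data"):
    """Emit one output row per pivot column per input row: the column
    name goes into a header field, its value into a data field."""
    out = []
    for row in rows:
        for col in pivot_columns:
            rec = {k: row[k] for k in non_pivot}   # carry non-pivot columns through
            rec[header] = col
            rec[data] = row[col]
            out.append(rec)
    return out

# One wide row with q1/q2 columns becomes two narrow rows:
long_rows = pivot([{"id": 1, "q1": 10, "q2": 20}],
                  non_pivot=["id"], pivot_columns=["q1", "q2"])
```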
Reverse Pivot (Rows to Columns):
· This transform creates one row of data from several existing rows.
· The Reverse Pivot transform allows us to combine data from several rows into one row by creating new columns.
· For each unique value in a pivot axis column and each selected pivot column, Data Services produces a column in the output data set.
Please go through the article ‘Reverse Pivot (Rows to Columns) transform in SAP Data Services’ to know more details about this transform.
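The inverse operation of the Pivot sketch above, again as a hedged illustration rather than the SAP interface:

```python
def reverse_pivot(rows, axis_col, data_col, key_cols):
    """Combine several rows into one: each unique value of the pivot
    axis column becomes a new output column holding the data value."""
    out = {}
    for row in rows:
        key = tuple(row[k] for k in key_cols)
        rec = out.setdefault(key, {k: row[k] for k in key_cols})
        rec[row[axis_col]] = row[data_col]
    return list(out.values())

# Two narrow rows per id collapse back into one wide row:
wide = reverse_pivot(
    [{"id": 1, "quarter": "q1", "sales": 10},
     {"id": 1, "quarter": "q2", "sales": 20}],
    axis_col="quarter", data_col="sales", key_cols=["id"],
)
```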
Table Comparison:
· This transform compares two data sets and produces the difference between them as a data set with rows flagged as INSERT, UPDATE, or DELETE.
· The Table_Comparison transform allows us to detect and forward changes that have occurred since the last time a target was updated.
Note that in order to use the Table_Comparison transform with Teradata 13 and later tables as the comparison table and target table, we must do the following:
On the Teradata server, set the General parameter DBSControl to TRUE to allow uncommitted data to be read.
In the Data Services Teradata datastore, add the following statement in the “Additional session parameters” field:
SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
Please go through the article ‘Table Comparison transform in SAP Data Services’ to know more details about this transform.
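The comparison logic can be sketched like this (a simplified illustration keyed on one column; detecting DELETEs for keys missing from the input is an optional behavior not shown here):

```python
def table_comparison(input_rows, comparison_table, key):
    """Flag each input row INSERT (key absent from the comparison
    table) or UPDATE (key present but values changed); unchanged
    rows are dropped so only differences flow to the target."""
    target = {row[key]: row for row in comparison_table}
    out = []
    for row in input_rows:
        existing = target.get(row[key])
        if existing is None:
            out.append(("INSERT", row))
        elif existing != row:
            out.append(("UPDATE", row))
    return out

diff = table_comparison(
    [{"id": 1, "city": "Berlin"}, {"id": 2, "city": "Paris"}],
    [{"id": 1, "city": "Bonn"}],
    key="id",
)
```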
XML Pipeline:
· This transform is used to process large XML files one instance of a repeatable structure at a time.
· With this transform, Data Services does not need to read the entire XML input into memory and build an internal data structure before performing the transformation.
· An NRDM structure is not required to represent the entire XML data input. Instead, the XML_Pipeline transform uses a portion of memory to process each instance of a repeatable structure, then continually releases and reuses memory to steadily flow XML data through the transform.
· During execution, Data Services pushes operations of the XML_Pipeline transform to the XML source.
Please go through the article ‘XML Pipeline transform in SAP Data Services’ to know more details about this transform.
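The same memory-saving idea, streaming one repeatable instance at a time, can be demonstrated with Python's standard-library `xml.etree.ElementTree.iterparse` (this is a general streaming-XML illustration, not SAP's implementation):

```python
import xml.etree.ElementTree as ET
from io import StringIO

# Process one <order> at a time instead of loading the whole document.
xml = StringIO("<orders><order id='1'/><order id='2'/><order id='3'/></orders>")

ids = []
for event, elem in ET.iterparse(xml, events=("end",)):
    if elem.tag == "order":
        ids.append(elem.get("id"))
        elem.clear()  # release memory for the instance just processed
```

Clearing each element after use keeps memory flat regardless of how many repeatable instances the file contains.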
The image below represents all the transforms available as part of the Data Integrator category.
Data Quality Transforms Overview:
The list of Data Quality transforms is:
1. Associate
2. Country_ID
3. Data_Cleanse
4. DSF2_Walk_Sequencer
5. Geocoder
6. Global_Address_Cleanse
7. Global_Suggestion_List
8. Match
9. USA_Regulatory_Address_Cleanse
10. User_Defined
Let’s look into the detailed description of each of the transforms under the Data Quality category.
1. Associate:
· The Associate transform works downstream from the Match transform to provide a way to combine, or associate, match results by using the Match transform-generated Group Number fields.
· We may need to add a Group Statistics operation to the Associate transform to gather match statistics.
· You can combine the results of two or more Match transforms, two or more Associate transforms, or any combination of the two.
· For example, we may use one Match transform to match on name and address, use a second Match transform to match on SSN, and then use an Associate transform to combine the match groups produced by the two Match transforms.
Please go through the article ‘Associate transform in SAP Data Services’ to know more details about this transform.
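The group-combining idea can be sketched as a union-find over the group-number fields (an illustrative model with made-up field names, not SAP's Associate internals): records that share a group number in any match pass land in one associated group.

```python
def associate(records, group_fields):
    """Merge match groups: records sharing a group number in ANY of
    the given fields end up in one associated group (union-find)."""
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    def union(a, b):
        parent[find(a)] = find(b)

    for rec in records:
        for field in group_fields:
            if rec.get(field) is not None:
                union(("rec", rec["id"]), (field, rec[field]))

    groups = {}
    for rec in records:
        groups.setdefault(find(("rec", rec["id"])), []).append(rec["id"])
    return sorted(sorted(g) for g in groups.values())

# Records 1 and 2 matched on name; 2 and 3 matched on SSN: all three associate.
groups = associate(
    [{"id": 1, "name_grp": "A", "ssn_grp": None},
     {"id": 2, "name_grp": "A", "ssn_grp": "X"},
     {"id": 3, "name_grp": None, "ssn_grp": "X"}],
    group_fields=["name_grp", "ssn_grp"],
)
```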
2. Country ID:
· The Country ID transform parses our input data and then identifies the country of destination for each record.
· After identifying the country, the transform can output the country name, any of three different ISO country codes, an ISO script code, and a percentage of confidence in the assignment.
· Though we can use the Country ID transform before any transform in a data flow, we will probably find it most useful during a transactional address cleanse job.
· Place the Country ID transform before the Global Suggestion List transform. The Global Suggestion List transform needs the ISO_Country_Code_2Char field that the Country ID transform can output.
· It is not necessary to use the Country ID transform before the Global Address Cleanse transform in a data flow because the Global Address Cleanse transform contains its own Country ID processing.
· It is also not necessary to use the Country ID transform before the USA Regulatory Address Cleanse transform because the input data should contain U.S. addresses only.
Please go through the article ‘Country ID transform in SAP Data Services’ to know more details about this transform.
3. Data Cleanse:
· Use the Data Cleanse transform to parse and format custom or person and firm data as well as phone numbers, dates, e-mail addresses, and Social Security numbers.
· Custom data includes operational or product data specific to the business.
· The cleansing package we specify defines how our data should be parsed and standardized.
· Within a data flow, the Data Cleanse transform is typically placed after the address cleansing process and before the matching process.
Please go through the article ‘Data Cleanse transform in SAP Data Services’ to know more details about this transform.
4. DSF2 Walk Sequencer:
· To add walk-sequence information to our data, include the DSF2 Walk Sequencer transform in the data flow. We can then send our data through presorting software to qualify for the following walk-sequence discounts:
· Carrier Route
· Walk Sequence
· 90% Residential Saturation
· 75% Total Active Saturation
· DSF2 walk sequencing is often called “pseudo” sequencing because it mimics USPS walk sequencing.
· Where USPS walk-sequence numbers cover every address, DSF2 walk-sequence processing provides “pseudo” sequence numbers only for the addresses in that particular file.
Please go through the article ‘DSF2 Walk Sequencer transform in SAP Data Services’ to know more details about this transform.
5. Geocoder:
· The Geocoder transform uses geographic coordinates expressed as latitude and longitude, addresses, and point-of-interest (POI) data. Using the transform, we can append addresses, latitude and longitude, census data (US only), and other information to the data.
· Based on mapped input fields, the Geocoder transform has three modes of geocode processing:
· Address Geocoding
· Reverse Geocoding
· POI textual search
Please go through the article ‘Geocoder transform in SAP Data Services’ to know more details about this transform.
6. Global Address Cleanse:
· The Global Address Cleanse transform identifies, parses, validates, and corrects global address data, such as primary number, primary name, primary type, directional, secondary identifier, secondary number, locality, region, and postcode.
· Note: The Global Address Cleanse transform does not support CASS certification or produce a USPS Form 3553. If you want to certify your U.S. address data, you must use the USA Regulatory Address Cleanse transform, which supports CASS.
· If we perform both address cleansing and data cleansing, the Global Address Cleanse transform typically comes before the Data Cleanse transform in the data flow.
Please go through the article ‘Global Address Cleanse transform in SAP Data Services’ to know more details about this transform.
7. Global Suggestion List:
· The Global Suggestion List transform queries addresses with minimal data, and it can offer suggestions for possible matches. It is a beneficial research tool for managing unassigned addresses from a batch process.
· Global Suggestion List functionality is designed to be integrated into our own custom applications via the Web Service.
· The Global Suggestion List transform requires the two-character ISO country code on input. Therefore, we may want to place a transform that outputs the ISO_Country_Code_2Char field, such as the Country ID transform, before the Global Suggestion List transform.
· The Global Suggestion List transform is available for use with the Canada, Global Address, and USA engines.
Please go through the article ‘Global Suggestion List transform in SAP Data Services’ to know more details about this transform.
8. Match:
· The Match transform is responsible for performing matching based on the business rules we define. The transform then sends matching and unique records on to the next transform in the data flow.
· For best results, the data in which we are attempting to find matches should be cleansed. Therefore, we may need to include other Data Quality transforms before the Match transform.
Please go through the article ‘Match transform in SAP Data Services’ to know more details about this transform.
9. USA Regulatory Address Cleanse:
· The USA Regulatory Address Cleanse transform identifies, parses, validates, and corrects U.S. address data according to the U.S. Coding Accuracy Support System (CASS).
· This transform can create the USPS Form 3553 and output many useful codes to our records. We can also run in a non-certification mode as well as produce suggestion lists.
· If we perform both data cleansing and matching, the USA Regulatory Address Cleanse transform typically comes before the Data Cleanse transform and any of the Match transforms in the data flow.
· SAP recommends using a sample job or data flow that is set up according to best practices for a specific use case.
Please go through the article ‘USA Regulatory Address Cleanse transform in SAP Data Services’ to know more details about this transform.
10. User Defined:
· The User-Defined transform provides us with custom processing in a data flow using the full Python scripting language.
· The applications for the User-Defined transform are nearly limitless. It can do just about anything that we can write Python code to do.
· We can use the User-Defined transform to generate new records, populate a field with a specific value, create a file, connect to a website, or send an email, just to name a few possibilities.
· We can place this transform anywhere in our data flow. If we have created our own transform, then the only restrictions on where it can be located in the data flow are those which we place on it.
· Although the User-Defined transform is quite flexible and powerful, we will find that many of the tasks we want to perform can be accomplished with the Query transform.
· The Query transform is generally more scalable and faster, and uses less memory than User-Defined transforms.
Please go through the article ‘User Defined transform in SAP Data Services’ to know more details about this transform.
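Since the transform runs Python, per-record logic looks roughly like the sketch below (the plain-dict record and field names are illustrative; SAP's actual record API differs):

```python
# Illustrative per-record logic of the kind a User-Defined transform might run.
def process_record(record):
    """Populate a derived field from existing input fields."""
    record["FULL_NAME"] = f'{record["FIRST"]} {record["LAST"]}'.title()
    return record

out = process_record({"FIRST": "jane", "LAST": "doe"})
```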