Let us introduce John: he is new to the team and has been paired with Sandra, an SAP Data Services expert, on a new integration project.
Data Services provides various objects that are used when building data integration and data quality applications.
The main objects
Before you can design your data transfer process, you will need to create a project structure.
A project consists of one or more jobs that can be split into different work flows.
In the job, or the work flow, you'll then be able to design the actual ETL process in a data flow.
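Let me summarize it this way: a project contains jobs, a job can be split into work flows, and the ETL logic itself lives in data flows.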
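The containment described above can be pictured with a small Python sketch. This is only an analogy, not Data Services code: the class names and the object names (`PRJ_Demo`, `JOB_Daily`, `WF_Customers`, `DF_Load_Customers`) are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataFlow:
    name: str


@dataclass
class WorkFlow:
    name: str
    data_flows: List[DataFlow] = field(default_factory=list)


@dataclass
class Job:
    name: str
    # A job can hold data flows directly or group them into work flows.
    work_flows: List[WorkFlow] = field(default_factory=list)
    data_flows: List[DataFlow] = field(default_factory=list)


@dataclass
class Project:
    name: str
    jobs: List[Job] = field(default_factory=list)


def outline(project: Project) -> List[str]:
    """Return an indented outline of the project structure."""
    lines = [f"Project {project.name}"]
    for job in project.jobs:
        lines.append(f"  Job {job.name}")
        for data_flow in job.data_flows:
            lines.append(f"    Data flow {data_flow.name}")
        for work_flow in job.work_flows:
            lines.append(f"    Work flow {work_flow.name}")
            for data_flow in work_flow.data_flows:
                lines.append(f"      Data flow {data_flow.name}")
    return lines


# Hypothetical names, for illustration only.
load_customers = DataFlow("DF_Load_Customers")
customers = WorkFlow("WF_Customers", [load_customers])
project = Project("PRJ_Demo", jobs=[Job("JOB_Daily", work_flows=[customers])])

print("\n".join(outline(project)))
```

The outline makes the nesting visible: the data flow sits inside a work flow, which sits inside a job, which belongs to the project.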
The data flow is primarily made of a source (where the data comes from) and a target (where the data goes to).
These sources and targets can be of two types:
- Data stores (databases, applications, web services...)
- File formats (flat files, HDFS files, Excel workbooks...)
You can design a large variety of transformations on the source data by applying transforms.
These transforms are grouped into four categories:
- Platform: the main transforms, which fetch data and perform basic calculations or validations.
- Data Integrator: mostly used for Data Warehouse/Data Mart design. These transforms generate new data or change the structure of the source data.
- Data Quality: apply modifications to complete, cleanse or augment the source data.
- Text Data Processing: for text analysis.
A typical data flow queries data from a flat file and stores the result in a table. The Query transform is the most commonly used of the Platform transforms.
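To make the source, query and target roles concrete, here is a rough Python analogy of such a data flow. It is not how Data Services is programmed (data flows are designed graphically); the file content, the `customers_fr` table and the filter on `country` are all assumptions made up for this sketch.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file content standing in for the source file.
FLAT_FILE = """id,name,country
1,Alice,FR
2,Bob,US
3,Chloe,FR
"""


def run_data_flow(source_text, connection):
    """Read a flat file, filter and map its rows, and load them into a table."""
    # Source: read the flat file row by row.
    rows = csv.DictReader(io.StringIO(source_text))
    # Query step: keep some rows and map columns, roughly the role of a Query transform.
    selected = [
        (int(row["id"]), row["name"].upper())
        for row in rows
        if row["country"] == "FR"
    ]
    # Target: store the result in a table.
    connection.execute(
        "CREATE TABLE IF NOT EXISTS customers_fr (id INTEGER, name TEXT)"
    )
    connection.executemany("INSERT INTO customers_fr VALUES (?, ?)", selected)
    connection.commit()
    return len(selected)


connection = sqlite3.connect(":memory:")
loaded = run_data_flow(FLAT_FILE, connection)
result = connection.execute(
    "SELECT id, name FROM customers_fr ORDER BY id"
).fetchall()
print(loaded, result)
```

The three commented steps map onto the three boxes you would place in a data flow: a file source, a Query transform, and a table target.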
Besides these essential objects, you can also import functions from data stores or create your own. These functions can be used in any expression you write, for example to filter data or calculate new values.
The Local Object Library
All the main objects I've presented can be found in the Local Object Library.
Most objects created in Data Services are available for reuse.
After you define and save a reusable object, Data Services stores the definition in the repository (the Local Object Library). You can reuse the definition as necessary by creating calls to it.
For example, a data flow within a project is a reusable object. Multiple jobs, such as a weekly load job and a daily load job, can call the same data flow. If this data flow is changed, both jobs call the new version of the data flow.
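The idea of jobs calling a shared definition, rather than owning a copy, can be sketched in Python. Again, this is an analogy under stated assumptions: the `DataFlow` and `Job` classes and the names `DF_Load_Sales`, `JOB_Daily` and `JOB_Weekly` are hypothetical.

```python
class DataFlow:
    """A reusable definition: a name plus an ordered list of steps."""

    def __init__(self, name, steps):
        self.name = name
        self.steps = steps


class Job:
    """A job holds a call to a data flow definition, not a copy of it."""

    def __init__(self, name, data_flow):
        self.name = name
        self.data_flow = data_flow

    def run(self):
        return [f"{self.name}: {step}" for step in self.data_flow.steps]


# One definition, called by two jobs.
load_sales = DataFlow("DF_Load_Sales", ["read file", "load table"])
daily = Job("JOB_Daily", load_sales)
weekly = Job("JOB_Weekly", load_sales)

# Changing the shared definition once affects every job that calls it.
load_sales.steps.insert(1, "filter rows")

daily_run = daily.run()
weekly_run = weekly.run()
print(daily_run)
print(weekly_run)
```

Both jobs pick up the new "filter rows" step because they reference the same object, which mirrors how a change to a reusable data flow reaches every job that calls it.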
You can edit reusable objects at any time, independently of the currently open project. For example, if you open a new project, you can still open a data flow from another project and edit it. Changes made to the data flow are not stored until you save them.
The orchestration objects
Projects, jobs and data flows are the three main components used to design a data transfer process. Of course, you might need to build more complex flows.
Other useful objects
- Scripts: to execute some logic before or after a data flow (setting a variable value, for example).
- Conditionals: to execute work flows or data flows depending on conditions (a variable value or the existence of a file, for example).
- While Loops: to execute a work flow or a data flow several times.
- Try Catch: to manage exceptions in your data flows.
These objects are not stored in the repository and thus are not reusable. Such single-use objects appear only as components of other objects and operate only in the context in which they were created. Single-use objects cannot be copied.
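For readers who think in code, the four objects listed above have close counterparts in any programming language. The following Python sketch is only an analogy, not Data Services script syntax; the `load_file` function stands in for a data flow, and the file names are hypothetical.

```python
import os
import tempfile


def load_file(path):
    """Stand-in for a data flow: count the lines of a file."""
    with open(path) as handle:
        return sum(1 for _ in handle)


def run_job(paths):
    log = []
    # Script: set variable values before the data flows run.
    total_rows = 0
    index = 0
    # While loop: run the same data flow once per file.
    while index < len(paths):
        path = paths[index]
        file_name = os.path.basename(path)
        # Try/catch: handle an error without stopping the whole job.
        try:
            # Conditional: only run the data flow if the file exists.
            if os.path.exists(path):
                total_rows += load_file(path)
                log.append(f"loaded {file_name}")
            else:
                log.append(f"skipped {file_name}")
        except OSError as error:
            log.append(f"failed {file_name}: {error}")
        index += 1
    return total_rows, log


with tempfile.TemporaryDirectory() as folder:
    present = os.path.join(folder, "orders.csv")
    with open(present, "w") as handle:
        handle.write("id\n1\n2\n")
    missing = os.path.join(folder, "returns.csv")
    total_rows, log = run_job([present, missing])

print(total_rows, log)
```

Just as these control structures only make sense inside the function that contains them, the corresponding Data Services objects live only inside the job or work flow where they were created.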