Operating Flowgraphs

Objectives

After completing this lesson, you will be able to:

  • Debug a flowgraph
  • Schedule a flowgraph

Debugging a Flowgraph

Suppose you've created a flowgraph, and you want to check if the intermediate nodes are working as intended.

Debugging is useful during development to confirm that each node of the flowgraph produces the expected results. Try different sets of data, including invalid data, and check how each node reacts.

Debugging is also useful once the flowgraph has moved to production as a tool to identify operational issues.

Scheduling a Flowgraph

You've created and tested a flowgraph. It works as intended, and now you want it to run on a regular schedule.

Make sure that the flowgraph has been deployed as a procedure.

In your project, create a source file with the extension .hdbschedulerjob, for example UPDATE_JOB.hdbschedulerjob.

The file should include the SQL command CREATE SCHEDULER JOB, but write it without the leading CREATE. You may already know this concept from writing the CREATE TABLE SQL command without the leading CREATE in a table definition (.hdbtable) file.
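To illustrate the convention of dropping the leading CREATE, here is a minimal sketch using a table definition. The table name PEOPLE, its columns, and the file name PEOPLE.hdbtable are hypothetical and only serve the comparison; the `--` lines are annotations for the reader.

```sql
-- Plain SQL, as you would run it in a SQL console
CREATE COLUMN TABLE "PEOPLE" ("ID" INTEGER, "COUNTRY" NVARCHAR(3));

-- Content of a hypothetical PEOPLE.hdbtable file:
-- the same definition, written without the leading CREATE
COLUMN TABLE "PEOPLE" ("ID" INTEGER, "COUNTRY" NVARCHAR(3))
```

A .hdbschedulerjob file follows the same pattern: it starts with SCHEDULER JOB instead of CREATE SCHEDULER JOB.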

Depending on the flowgraph design, you need to provide various parameters in the .hdbschedulerjob file. For example, suppose you have created and deployed a flowgraph whose corresponding procedure is named People_Fullname2 and takes a parameter P_COUNTRY. The parameter value should be 'USA'. The job should run Monday through Friday at 1:00 a.m. throughout 2024 and 2025.

You would define the statement as follows:

SCHEDULER JOB UPDATE_JOB 
CRON '2024,2025 * * mon,tue,wed,thu,fri 1 00 00' 
ENABLE PROCEDURE "People_Fullname2"
PARAMETERS P_COUNTRY = 'USA'

After CRON, a cron expression (a string of format '<years> <months> <dates> <weekdays> <hours> <minutes> <seconds>') is expected. This expression defines the recurrence.
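To make the field order concrete, here are a few illustrative CRON clauses that use only the list and wildcard forms shown in the example above. The second and third schedules are hypothetical and not part of the lesson scenario, and the `--` lines are annotations for the reader rather than required file content.

```sql
-- Field order: <years> <months> <dates> <weekdays> <hours> <minutes> <seconds>

-- The lesson's schedule: Monday through Friday at 01:00:00 in 2024 and 2025
CRON '2024,2025 * * mon,tue,wed,thu,fri 1 00 00'

-- Hypothetical: every day of every year at 06:30:00
CRON '* * * * 6 30 00'

-- Hypothetical: the 1st of January, April, July, and October 2025 at 00:00:00
CRON '2025 1,4,7,10 1 * 0 00 00'
```

Because * matches any value, a * in the minutes or seconds field would presumably trigger the job far more often than intended, so fixed values in those fields are the safer choice for a job that should run once per matching hour.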

To delete the job, delete the file and redeploy the src folder.
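If you only want to pause the job rather than remove it, one option may be to keep the file and change the state keyword. The following sketch assumes that the DISABLE keyword of the SQL command CREATE SCHEDULER JOB is accepted in a .hdbschedulerjob file in the same position as ENABLE; check the documentation for your SAP HANA version before relying on it.

```sql
-- Assumption: DISABLE takes the place of ENABLE to keep the job defined but inactive
SCHEDULER JOB UPDATE_JOB
CRON '2024,2025 * * mon,tue,wed,thu,fri 1 00 00'
DISABLE PROCEDURE "People_Fullname2"
PARAMETERS P_COUNTRY = 'USA'
```

As with any change to the file, the new state would only take effect after redeployment.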

Real-Time Processing

There are two ways of processing: batch and real-time.

Real-time processing means that records are processed immediately, row by row. With batch processing, the data is selected in packages, which makes partitioning possible. Nodes that can process data row by row without changing the result can be used for real-time processing. Nodes that need to process the full data set at once cannot. If in doubt, check the following overview.

Valid for real-time processing:

  • Aggregation
  • Case
  • Cleanse
  • Data Mask
  • Geocode
  • History Preserving
  • Lookup
  • Map Operation
  • Table Comparison
  • Union

Not valid for real-time processing:

  • Date Generation
  • Join
  • Match
  • Pivot
  • Procedure
  • Projection
  • Row Generation
  • Unpivot
