The Talend Open Studio components tHashOutput and tHashInput let you hold data in RAM, offering potential performance gains. The basic usage defines a single tHashOutput, which gathers input, and a tHashInput, which feeds that data into a data flow. This post describes two expanded configurations.
tHashOutput and tHashInput work with data stored in internal memory and do so in a way consistent with other Talend components. The Hash components let you define flows that retrieve data stored in memory by some other part of the job. In a simple scenario, this is done with a single input/output pair.
Multiple Sources
This screenshot shows a job that will merge two data sources -- a tRowGenerator and a tFileInputDelimited -- into a single Hash data structure using two tHashOutputs. The first tHashOutput will be referenced by subsequent tHashOutputs in the "Link with a tHashOutput" control.
The second tHashOutput refers to the first component.
The combined data sets are available through the tHashInput. It doesn't matter which of the two components is selected in the Component List drop-down, since they are linked.
None of the Data Write Model, Keys Management, or Append settings has any effect in this job. Data Write Model offers only one value in its drop-down. I think Keys Management has a bug in version 5 (see TDI-21180). Append only takes effect in an iteration.
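To make the linking behavior concrete, here is a small conceptual model in Python. It is only a sketch of the idea, not Talend's generated code: the classes HashStore, HashOutput, and HashInput and the link_with parameter are hypothetical stand-ins for the components and the "Link with a tHashOutput" control.

```python
# Conceptual model only; Talend generates its own Java code for these components.

class HashStore:
    """Stands in for the in-memory structure behind a tHashOutput."""
    def __init__(self):
        self.rows = []


class HashOutput:
    """Hypothetical stand-in for tHashOutput."""
    def __init__(self, link_with=None):
        # A linked output reuses the store of the output it points to;
        # an unlinked output gets a store of its own.
        self.store = link_with.store if link_with else HashStore()

    def write(self, rows):
        self.store.rows.extend(rows)


class HashInput:
    """Hypothetical stand-in for tHashInput."""
    def __init__(self, component):
        # Corresponds to picking a tHashOutput in the Component List.
        self.store = component.store

    def read(self):
        return list(self.store.rows)


# Two sources, as in the job described above (row contents are made up).
generated_rows = [{"id": 1, "source": "tRowGenerator"}]
file_rows = [{"id": 2, "source": "tFileInputDelimited"}]

first_output = HashOutput()
second_output = HashOutput(link_with=first_output)
first_output.write(generated_rows)
second_output.write(file_rows)

# Either output can be selected; both lead to the same combined data.
via_first = HashInput(first_output).read()
via_second = HashInput(second_output).read()
```

In this model both reads return the same two rows, which mirrors why the Component List choice makes no difference once the outputs are linked.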
Clearing When Iterating
This job iterates over a data set, clearing the backing RAM structure defined in the tHashOutput with each iteration. This is done by unchecking Append. If Append were left checked, each iteration would produce more and more output, because the tHashOutput would still hold the values gathered in the preceding iterations.
The results of clearing the tHashOutput follow.
If Append is checked, the output is repeated as it accrues through the iterations.
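The difference between the two Append settings can be sketched in a few lines of Python. This is an illustrative model under assumptions, not Talend's implementation: run_iterations is a hypothetical helper, the list named store stands in for the RAM structure behind the tHashOutput, and the batch values are made up.

```python
def run_iterations(batches, append):
    """Return what a tHashInput-style reader would see after each iteration."""
    store = []  # stands in for the RAM structure behind the tHashOutput
    seen = []
    for batch in batches:
        if not append:
            # Append unchecked: the structure is cleared on every iteration.
            store.clear()
        store.extend(batch)
        seen.append(list(store))
    return seen


batches = [["a"], ["b"], ["c"]]

cleared = run_iterations(batches, append=False)
# [['a'], ['b'], ['c']]

accrued = run_iterations(batches, append=True)
# [['a'], ['a', 'b'], ['a', 'b', 'c']]
```

With Append unchecked each iteration sees only its own rows; with Append checked the earlier rows show up again in every later iteration.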
The tHashOutput and tHashInput components can give your Talend Open Studio job a performance gain by holding data in RAM. tHashOutput can gather input from different sources using the "Link with a tHashOutput" setting. Append takes effect only in iterations and provides control over when a tHashOutput is cleared.