Parameters and Variables - Atrium Integrator (Spoon)

Version 1
    Share This:

    Parameters and Variables - Atrium Integrator (Spoon)

    Definitions upfront

    Named Parameter: “Named parameters are a system that allows you to parameterize your transformations and jobs.  On top of the variables system that was already in place prior to the introduction in version 3.2, named parameters offer the setting of a description and a default value. That allows you in turn to list the required parameters for a job or transformation.

    Variable: “Variables can be used throughout Pentaho Data Integration, including in transformation steps and job entries. You define variables by setting them with the Set Variable step in a transformation or by setting them in the file. [...] The first usage (and only usage in previous Kettle versions) was to set an environment variable. Traditionally, this was accomplished by passing options to the Java Virtual Machine (JVM) with the -D option. The only problem with using environment variables is that the usage is not dynamic and problems arise if you try to use them in a dynamic way.  Changes to the environment variables are visible to all software running on the virtual machine.  [...] Because the scope of an environment variable is too broad, Kettle variables were introduced to provide a way to define variables that are local to the job in which the variable is set. The "Set Variable" step in a transformation allows you to specify in which job you want to set the @variable's scope (i.e. parent job, grand-parent job or the root job).@


    Let’s walk through this very simple example of using parameters and variables. I try to explain all the jobs and transformations involved.







    In this extremely simple job we call a sub-job call jb_slave.kjb. In this case, we defined hard coded parameter values in the job entry settings. Alternatively, to make this more dynamic, we could have just defined parameters in the job settings.





    This sub-job executes the transformations tr_set_variables.ktr and tr_show_param_values.ktr. In this case, in order to access the parameter values from the parent job, we defined the parameters without values in the job settings:


    Job Properties.jpg


    Note: This is just one of the ways you can pass parameters down to the sub-process.





    This transformation sets a variable called var1 with scope Valid in parent job so that successive processes can make use if it. In this case the values originate from a Generate Rows step for demonstration purposes; in real world examples you might read in some values from a file or a database table.





    The main transformation has the sole purpose of writing all the parameter and variable values to the log. We retrieve the parameters and variable by using a Get Variables step. We also check if a value is present by using a Filter Rows step. In case one value is missing, we abort the transformation, otherwise the values are written to the log.

    There is no need to set the parameter names in this transformation; there is an advantage though if you do it:

    Missing parameter values will be properly displayed as NULL, which makes it a bit easier to check for them.

    If you don't define them in the transformation settings, missing parameter values will be displayed as ${PARAMETERNAME}.

    Important: Variables coming from tr_set_variables.ktr MUST NOT be listed in the Parameter tab in the Transformation Settings as this overrides the variable.

    Making Parameters available for all sub-processes in an easy fashion

    As you saw above, defining the parameters for each sub-process just to be able to pass them down can be a bit labour intensive. Luckily, there is a faster way of doing just this:

    1. In the main job specify the parameters that you want to pass in in the Job Settings:

    Job Settings.jpg

    These way parameters and their values can be passed in from the command line in example.

    2. Right after the Start job entry use the Set Variables job entry. Specify the variable names, reference the parameters you set up in step 1 and set the scope to Valid in the current job.


    set Variable.jpg


    3. There is no need to specify any parameters/variables in any of the sub processes.

    To see how this is working, run jb_main.kjb in the passing_down_parameters_in_an_easy_fashion folder (part of the provided examples).

    What if I still want to be able to run my subprocess independently sometimes?

    You might have some situations, when you have to run the subprocess independently (so in other words: You do not execute it from the parent/main job, but run it on its own). When we pass down parameters or variables, this can be a bit tricky and usually it just doesn’t work out of the box. Luckily, there is a way to achieve this though:

    1. In the subprocess, specify the parameter that you want to be able to pass in. In our example (which is based on the previous example), we modified the transformation tr_show_param_values.ktr and added following parameters to the Transformation Settings:

    Transformation Settings.jpg

    We also amended the Get Variables step to make use of these parameters:

    Get Variable.jpg

    This way, we can already run this transformation on its own. Now we only have to adjust the parent job so that we can run it from there as well.

    2. In the parent job, in the Job or Transformation job entry settings, go to the Parameters tab and tick Pass all parameter values down to the sub-transformation/sub-job. Next, as the Parameter set the name of the parameter you defined in the sub process. As the Value defines the variable that you want to pass down: ${variable}. This assumes that this variable was set beforehand by some Set Variables job entry/step.
    In our case, we modified transformation job entry in the job jb_slave.kjb and added following mapping to the job entry settings in the Parameters tab: