Data Management Tool (DMT) / Universal Data Management (UDM), Atrium Integrator, Spoon tips & tricks


    The takeaways (good stuff) up top, discussion/details further down:

     

    Change the Spoon options to:

        not export unused connections ("Only save used connections to XML?" should be checked);

        ask before overwriting connections ("Ask about replacing existing connections on open/import?" should be checked);

     

    Use "File, Save as (VSF)" (to avoid inserting an unwanted Data_Path variable) rather than "File, Export, Linked XML".

     

    Delete unneeded DB connections from the PDI Repository (using the Spoon client, Explore Repository, DB Connections)

     

    Use Parameters, Kettle/Spoon Environment Variables where possible/practical.

    Use generic references everywhere else it's possible/practical.

     

    Set the "kettle.properties" file values for the Transformation/Job Logging DB name(s), Tables with values appropriate to each environment. (leaving the SCHEMA entries empty works for my MS SQL environment at this time).

    Leave the corresponding Logging entries EMPTY in the Transformation Properties/Logging and Job Properties/Logging screens.

    (That didn't entirely work; I had to put the table name in the Transformation logging table name field under Transformation Properties/Logging.)

     

    A step that creates (and maybe one that updates) a CMDB entry (and maybe a regular Remedy form entry) returns the Instance ID, which can be used later on...

     

    Consolidate "Set Constants" steps into one per Job or Transformation.

     

    Jobs can be created with the AI Job Wizard or directly in Spoon, but they still have to be "registered" using the DMT Console (Other Functions, Atrium Integrator Jobs) to be visible for a DMT Load Step. (The registered entry must have the same name as the Job in the AI Console or the Spoon Repository; this is also where a Variable specific to this Job can be set up.)

     

    The DMT can't call an AI Job/Spoon Job with (multiple) spaces in the name. (So use names without spaces.)

     

    For Excel Input steps, on the Fields tab, set the Format to "#" (number, NO decimal places) for any text columns which might have entries that are all digits.

    Also consider setting the Trim Type to "Both" for any text fields that might cause a problem if one or more non-printing characters (space, TAB, etc.) were accidentally entered instead of leaving the cell empty.

     

    The UDM:Import form can be used to IMPORT jobs and transformations if you don't want to log on to the server or otherwise use Spoon.

    (I haven't tried this yet, YMMV.)

    ------------------------------------------------------------------------------------------------------------------------------------

     

    Discussion/rationale:

     

    There's a good deal of Spoon/Pentaho Data Integration information on the internet, but not much for using it within BMC Remedy/ITSM. Maybe this will help.

     

    Annoyance #1:

    When importing a Spoon Transformation or Job, existing Database Connections are overwritten silently (no warning, no prompts).

     

    Annoyance #2:

    (Even worse), ALL the DB connections from the original Transformation or Job environment are imported, whether used by that Transformation or Job or not!

     

    These two things are probably the reason that even on a new BMC install, the PDI Repository shows a lot of unfamiliar DB connections. In addition to those expected (your server, PDIREPO_YourServer_YourPortNumber), there is one for the Transformations and Jobs in the Samples sub-folder, and many others, presumably from other BMC development efforts.

    Just having all these DB connections makes it harder to find the correct ones for your environment... and they aren't needed!

    All those can be deleted... from within Spoon, Explore Repository, DB Connections. Any that are used by Jobs or Transformations will give a warning dialog, and should be left alone (Sample_CMDB_Connection is used by the jobs/transformations in the Samples folder). Pretty much everything else can go!

     

    For each instance of Spoon in your various environments, change the Spoon options to prevent these behaviors:

        not export unused connections ("Only save used connections to XML?" should be checked);

        ask before overwriting connections ("Ask about replacing existing connections on open/import?" should be checked);

    [Screenshot: Kettle options to prevent overwriting DB connections]

      

    Some (All?) of the Out-Of-the-Box (OOB) Jobs and Transformations show an explicit Remedy Server DB Connection reference for each BMC-provided step.

    (I suspect these got updated with the current server name when installed.)

    Working from these as examples, I did the same and slowly learned this was a bad idea, because when I exported the Job or Transformation from my development environment to my Test or Prod environments, I had to change all the DB Connection references.

    On my next iteration I used a generic reference such as "AR Server", with the idea that I wouldn't have to update the reference when I imported the Job into my other environments; I could simply configure the DB connection named "AR Server" to point to the appropriate server in each environment.

    (Boy was I surprised!) The Job or Transformation could not connect to AR Server because... it was looking for the Development server! On investigating, the AR Server reference was still in place on each step, but the DB connection settings for AR Server in the new environment had been silently overwritten by the import (see Annoyance #1).

     

    After changing the Spoon options to "Ask before overwriting DB connections", this stopped happening.

     

    (On my next iteration of development, I'll probably use the ${AR Server} parameter to avoid this. See below.)

     

    Annoyance #3:

    Some of the OOB Jobs/transformations use parameters (aka Variables, Environment Variables) at various points.

    The run-time values for these are expected to be populated by the DMT/UDM Remedy stuff calling them.

    When using one of these as a base to develop your own Jobs/Transformations and running them manually within Spoon (while developing or for troubleshooting), these parameters have NULL values and the job/transformation fails;

    even an Excel Input step can't find the file to populate field names, view data, etc.

     

    A clue is that on clicking the Run icon, a Launch Window opens with some references (such as "Internal.Job.Filename.Directory") in the lower right-hand section ("Variables"). You can populate these with your values to use when testing... but each time you reopen the Job or Transformation, this section will be empty again.

    (You can ALSO set these from Edit, Set Environment Variables within Spoon, so you can populate them without running the Job/Transformation; this allows using them while creating/setting up new Steps.)

     

    TIP: entering these one line at a time is tedious and gets old real fast.

    It's possible (I haven't tried) that these could be set in the "kettle.properties" file so they would be there for later sessions.

    If not, the values can be captured in a text file for reuse. The format is "variablename space value". Leave an empty line above and below the set of values, then copy the whole range, including the blank lines. You can use "Paste clipboard to table" in the Variables section to paste the whole set at once rather than one line at a time.

     

    Text file in Notepad:

    [Screenshot: Notepad file with environment variables for a Transformation]
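    For illustration, such a text file might look like this (Internal.Job.Filename.Directory is a real Kettle internal variable mentioned above; the other variable name and both values are made-up examples). Note the blank lines above and below the set:

        Internal.Job.Filename.Directory C:\DMT\Data
        SOURCE_FILE_NAME Assets.xlsx
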

    Pasted into Launch dialog:

    [Screenshot: Transformation Launch window with environment variables, ready to run]

     

    The "kettle.properties" file can be used to set (persistent) variables to be used for all later sessions for both Spoon (developing jobs) and (maybe?) when running from the DMT.

    This is useful for setting the Job and Transformation Logging values, which go to the PDI Repository (PDIREPO_YourServer_YourPortNumber).

    By changing the Transformation and Job Settings (Properties) screens to use variables such as ${Transformation_Job_Logging}, etc., and setting these in the "kettle.properties" file, the logging targets can be set to the appropriate value for each environment (the Dev server PDI Repository on Dev, the Prod server PDI Repository on Prod, etc.).

    (To set the "kettle.properties" file within Spoon, use the "Edit, set Kettle properties" menu).

     

    [Screenshot: Kettle properties (Job & Transformation log settings)]
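    For illustration, the relevant "kettle.properties" entries might look something like this (a sketch only; the variable names are examples and must match whatever you referenced in the Properties/Logging screens, and the connection/table names come from your own environment):

        # DB connection name for logging: the PDI Repository connection in THIS environment
        Transformation_Job_Logging=PDIREPO_YourServer_YourPortNumber
        # log table names (schema entries left empty; that worked for my MS SQL environment)
        TRANS_LOG_TABLE=TRANS_LOG
        JOB_LOG_TABLE=JOB_LOG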

     

    BMC recommends using "Export Linked Resources to XML", which creates a Zip file to copy to your other environments and import.

    I found an issue with this:

     

    Data file references are parameterized by the Export step; near the top of the XML files for each Job/Transformation you will see a reference to "Data_Path_1" (and possibly others) and the value that will be used for it (them). (I don't see any option to disable this.)

     

    As is, the imported job will fail when it uses these references.

    One way to resolve this is to look for each reference in Spoon after you import it... and remove those references. (You might overlook some.)

    For me, it's easier to use a text editor to find and replace each place this is used:

    In other words, replace

         ${DATA_PATH_1}/

    with

         (nothing, i.e. an empty string)

    (The / is an HTML-encoded slash character in the XML, used to separate path/folder names; its addition is why setting the DATA_PATH_1 default value to NULL wouldn't be enough.)
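    If you have several exported files to clean up, a small script can do the replacement. A minimal sketch in Python (the file name is hypothetical, and the pattern assumes the variable is DATA_PATH_1 and handles both a literal and an HTML-encoded slash; check your XML for other Data_Path variables):

        # remove "${DATA_PATH_1}" plus the slash that follows it from an
        # exported Kettle Job/Transformation XML file
        import re
        from pathlib import Path

        path = Path("MyTransformation.ktr")   # hypothetical file name
        xml = path.read_text(encoding="utf-8")
        cleaned = re.sub(r"\$\{DATA_PATH_1\}(?:/|&#47;|&#x2f;)", "", xml)
        path.with_name(path.stem + "_cleaned.ktr").write_text(cleaned, encoding="utf-8")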

     

    After some experimentation, I found that the BEST way to export these is to use File, Save as (VFS)... even though it's a pain to select the output location and file name.

    Even the obvious and easier "File, Export, To XML" mangles the type, length, and precision values in each Field section.

     

     

    TIP:

    A step that creates (and maybe one that updates) a CMDB entry (and maybe a regular Remedy form entry) returns the Instance ID, which can be used later on...

     

    This is really useful when you are creating a new CI as an Asset, and need to populate LifeCycleAttributes (such as Status) that aren't available/accessible in the CMDB entry directly. Some of the OOB transformations (CI, CI-Express) show this.

     

     

    TIP:

    Consolidate "Set Constants" steps into one per Job or Transformation. You may want to use different text in different error log locations. You can't directly enter these on the Text Output steps (you have to use a field from the Stream). using a "Add Constants" step before each Text Output step clutters the layout; having these all in one place cleans up the layout and makes them easier to find/update and reuse in other Transformations.

     

    -----------------------------------------------------------

    Annoyance:

    no AR Lookup step

    (TIP: use CMDB Lookup, or an AR Input & Stream Lookup combination)

    If you can't use CMDB Lookup for this, use an AR Input and Stream Lookup combination.

    The AR Input step doesn't allow values from Stream fields to be used in qualifiers. (On reflection, maybe this makes sense: Input steps should be for getting a big bulk of data in one gulp, typically at the start of the Transformation.)

    If you have to use an AR Input step feeding a Stream Lookup step, set a (static) qualifier to reduce the number of rows read (a qualifier for the Production dataset, for instance; see the example below) and only read in the columns that you are interested in.

    (To go even further, use a Sort step and a Remove Duplicates step to reduce the data stored in memory.)
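    For illustration, a static qualification that limits an AR Input step to production-dataset rows might look like this (assuming the form you are reading exposes the DatasetId field; BMC.ASSET is the usual production dataset ID):

         'DatasetId' = "BMC.ASSET"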

     

    Annoyance:

    The AROutput step doesn't work properly to Modify (UPDATE) an existing entry on Join forms (it creates a new record instead, requiring ALL the required fields to be populated).

    (TIP: update the regular form(s) comprising the Join form, OR use CMDBOutput, which does allow you to modify an existing record, even though all the CMDB stuff is Join forms.)

    (This seems to be a limitation/design of the Merge activity; the good old Data Import tool has the same issue. BMC suggested a workaround: have workflow update the component forms. There are existing filters doing this on the AST: forms for Submit/Modify, but not for Merge actions.)

     

    Annoyance:

    Jobs created with the AI Job Wizard may be overkill. You can't select Excel as a data source, only CSV (but a Job created using the wizard can be edited with Spoon and the CSV Input step replaced with an Excel Input step).

    Wizard-created Jobs also come as TWO Jobs, one calling the other, which calls the Transformation. (Not sure why.) You can simplify this using Spoon: delete one Job and have the Transformation called directly from the other.

     

    Annoyance:

    The DMT can't call an AI Job/Spoon Job with (multiple) spaces in the name. (So use names without spaces.) (Later I forgot this, created a Job with ONE space between several words, and the DMT launches it just fine. Hmmm. YMMV.)

     

    Annoyance:

    Jobs created with the AI Job Wizard or directly in Spoon both show up in the Atrium Integrator Console,

    but still have to be "registered" to show up in the DMT Job Console for Add Step/Load - Parameters (Atrium Integrator Name).

    Registering a new Atrium Integrator Job for a Load Step:

        From the DMT Console, select Other Functions, Atrium Integrator Jobs

        (look at the existing entries for Transaction Asset or CMDB Express to get an idea of what's needed)

        select Create:

        Name MUST match that seen in the Atrium Integrator console/Spoon Repository and should NOT have spaces.

        PATH must match where the Wizard put it - /servername_BMC.ITSM.CI.DATA/Update Asset Locations_BMC.CORE:BMC_BaseElement (unless you relocated it with Spoon)

        Attachment flag needs to be checked on THIS entry.

        The job LOGGING LEVEL can be set here...

        use the same Data Tag as Transaction Asset job (config-dmt)

        Generally, create a Local DataSetID variable with the value BMC.ITSM.CI.DATA (for the BMC Asset sandbox).

     

     

    Annoyance:

    The visuals for the process flow in a Transformation (or Job) look amateurish. (Every time I see a complicated one, I think of Rube Goldberg.)

    I tend to use long names (multiple words for Steps) and these tend to overlap, making them hard to read.

    I try to stagger the steps vertically so the step names don't overlap and hide each other.

    Better would be to use short names and populate the description field in the step (right-click, select "Edit step description"); this field MIGHT be used to generate documentation automatically.

     

    Annoyance:

    By default, the Excel Input step treats a Text field that looks like a number (all digits) as a floating point with one decimal position. This means values such as "1000" get interpreted as 1000.0 and will not match a target field value of "1000".

    TIP: on the Fields tab of the Excel Input step, set the Format to "#" (number, NO decimal places) for any text fields where this might be an issue.

    Also consider setting the Trim Type to "Both" for any text fields that might cause a problem if one or more non-printing characters (space, TAB, etc.) were accidentally entered instead of leaving the cell empty.

    (probably worth doing for CSV Input and possibly other Input steps also).

     

    Debugging with parallel execution can be difficult. I recommend setting all steps to x1 and working with one row of data until debugging is complete. If performance warrants, you can then set the "Number of Copies" for each step to 3 or more.

     

    I'd appreciate corrections, additional info, etc.