child_info_fork error detection script

Version 3

    For the interested, I have written a Perl script that lets me detect child_fork_errors that sometimes happen on application servers, and to isolate which application is having the issue and the job that triggered the error.


    It's meant for environments using Oracle Databases, and you will need to download and unzip the Oracle Instant Client binaries for it to work. It would probably be easily adapted to do the same on MSSQL databases as well.


    Simply replace the variable values at the beginning of the script to match your environment:


    #! perl
    use strict;
    use warnings;
    use DBI;
    use Data::Dumper;
    my $db = 'DB_SERVER_HOSTNAME';
    my $servicename = 'SERVICE_NAME';
    my $username = 'USERNAME';
    my $password = 'PASSWORD';
    my $schema = 'SCHEMA_NAME';
    my $dbh = DBI->connect( "dbi:Oracle://$db:1521/$servicename", $username, $password ) or die($DBI::errstr . "\n");
    my @header = (
        "Event Date",
        "Job Run ID",
        "Job Group",
        "Job Name",
        "App Server"
    my %job_runs = ();
    my $sql = "select jre.job_run_id, g.group_path,, jre.server_name, to_char(jre.event_date,'YYYY-MM-DD HH24:MI:SS') from $schema.job_run_event jre
    inner join $schema.job j on j.job_id = jre.job_id and j.job_version_id = jre.job_version_id
    inner join $schema.v_blgroup_paths g on g.group_id = j.group_id
    where message like '%child_info_fork%' and jre.server_name is not null";
    my $sth = $dbh->prepare($sql);
    while (my @row = $sth->fetchrow_array()) {
        for my $i (0 .. $#row) {
            $row[$i] = '' if (!defined($row[$i]));
        my ($job_run_id,$group_path, $job_name, $target, $event_date) = @row;
        $job_runs{$job_run_id}{'group_path'} = $group_path;
        $job_runs{$job_run_id}{'job_name'} = $job_name;
        $job_runs{$job_run_id}{'targets'}{$target}{'event_date'} = $event_date;
    foreach my $job_run_id (keys %job_runs) {
        $sql = "select message from $schema.job_run_event where job_run_id = $job_run_id and server_name is null and message like 'Executing %'";
        $sth = $dbh->prepare($sql);
        while (my @row = $sth->fetchrow_array()) {
            if ($row[0] =~ /^.+Server:(.+);.+on application server: (.+)$/) {
                $job_runs{$job_run_id}{'targets'}{$1}{'app_server'} = $2 if defined($job_runs{$job_run_id}{'targets'}{$1});
    #Event Date,Target,Job Run ID,Job Group,Job Name,App Server
    print join(",",@header)."\n" if (%job_runs);
    foreach my $job_run_id (keys %job_runs) {
        foreach my $target (keys %{$job_runs{$job_run_id}{'targets'}}) {
            my $event_date = $job_runs{$job_run_id}{'targets'}{$target}{'event_date'};
            my $job_name = $job_runs{$job_run_id}{'job_name'};
            my $job_group = $job_runs{$job_run_id}{'group_path'};
            my $app_server = $job_runs{$job_run_id}{'targets'}{$target}{'app_server'};
            print "$event_date,$target,$job_run_id,$job_group,$job_name,$app_server\n";


    The output generated is a CSV list of all instances of child info fork errors, along with the date, target name, job and application server it happened on. This is especially useful to check which application server is affected and needs a restart (or rebase) and which don't, as it's often only one of the app server in a pool that has the issue and not the whole environment.


    Note, the script is dependent on a custom view named v_blgroup_paths that I have created in our environment to list group paths. The SQL for the view is this:


    WITH group_paths (group_id, group_path)
    (SELECT g.group_id, cast('/' || as varchar2(1000)) AS group_path
        FROM blgroup g
        WHERE parent_group_id=0
        UNION ALL
        SELECT g.group_id, cast(group_path || '/' || as varchar2(1000))
        FROM blgroup g
        INNER JOIN group_paths gp
            ON g.parent_group_id = gp.group_id
    SELECT group_id, group_path
    FROM group_paths