
There are several different problem symptoms that may be visible when the BCO file system fills up on the BCO Application Server (AS) or a BCO ETL Engine (EE) server.

If e-mail alerting is enabled, the most obvious symptom will be e-mail error messages from BCO reporting a failure of the Local Monitoring task:

  • Caplan Scheduler *** ALERT mail *** Task Local monitoring for Default [50] completed with 1 errors
    Filesystem full clean up

 

If e-mail alerting isn't enabled, other symptoms include:

  • BCO Analysis and Predict reports failing to generate output with unexpected errors (such as the time filter not covering a period that contains data when a quick analysis shows that it does)
  • Analysis failing with the error, "java.io.IOException - message: No space left on device to write file: /path/file"
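
Before starting the cleanup, it is worth confirming from the shell that the file system really is (or was) 100% full. A minimal check, assuming the default installation directory /opt/cpit (adjust the path to your own BCO Installation Directory):

     df -h /opt/cpit        # free space on the file system that holds BCO
     df -i /opt/cpit        # inode exhaustion can cause the same symptoms even when space is free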

 

I broke this post down into four sections with the intent of making it easier to read and follow if you need it. This is the cleanup procedure to run after a file system full problem, and it requires cleanup on both the AS and EE servers. The point of removing these files is not to reduce space usage - it is to fix the BCO Scheduler and/or Datahub when a file system full condition has corrupted their working-set files and they are now failing. So removing these files really won't reduce disk space consumption all that much, but it can fix the problems the file system full condition caused.
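
If you are curious how little space these runtime files actually occupy, an optional check from the BCO installation directory looks like the following; the paths are the same ones used in the cleanup steps below, and errors for directories that do not exist on a given server can be ignored:

     cd /[BCO Installation Directory]
     du -sh datahub/jboss/server/*/data/kahadb datahub/jboss/server/*/tmp \
            datahub/jboss/dlq_messages scheduler/task scheduler/localdb 2>/dev/null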

 

Section 1: On the AS, stop all the components and clean up the corrupted runtime files

 

1)     Access the AS via ssh as the BCO OS user.

2)     Change directory to BCO home folder (usually it's /opt/cpit)

                cd /[BCO Installation Directory]
3)     Try to stop BCO DWH in a clean way
               ./cpit stop datahub
4)     Wait five minutes or until the countdown ends
5)     Check that there are no other DWH jboss processes stuck
             ps -ef | grep jboss
6)     Issue a kill -9 $PIDNUMBER for every remaining jboss process (a one-liner version of this check-and-kill pattern is sketched at the end of this section)
7)     Check that there are no run.sh processes stuck:
               ps -ef | grep run.sh
8)     Issue a kill -9 $PIDNUMBER for every remaining run.sh
9)    Execute these commands (paths relative to the base BCO Installation Directory; a version-aware sketch covering both layouts follows the lists):
  • BCO 9.5
rm -rf datahub/jboss/server/default/data/kahadb/*
rm -rf datahub/jboss/dlq_messages/*
rm -rf datahub/jboss/server/default/tmp/*
rm -rf datahub/jboss/server/default/data/tx-object-store/*
  • BCO 9.0
rm -rf datahub/jboss/server/all/data/kahadb/*
rm -rf datahub/jboss/dlq_messages/*
rm -rf datahub/jboss/server/all/tmp/*
rm -rf datahub/jboss/server/all/data/tx-object-store/*
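
For reference, the two lists above differ only in the JBoss server profile directory (default for BCO 9.5, all for BCO 9.0). A minimal sketch that handles either layout, assuming it is run from the BCO Installation Directory after the Datahub has been stopped as in steps 3-8:

     # clean the Datahub runtime files for whichever profile directory exists
     for profile in default all; do
         d="datahub/jboss/server/$profile"
         [ -d "$d" ] && rm -rf "$d"/data/kahadb/* "$d"/tmp/* "$d"/data/tx-object-store/*
     done
     rm -rf datahub/jboss/dlq_messages/*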

 

10)      Stop AS scheduler

                    cd /[BCO Installation Directory]
                    ./cpit stop scheduler

11)     Check that there are no other scheduler processes stuck

                   ps -ef | grep scheduler

12)     Issue a kill -9 $PIDNUMBER for every remaining scheduler

13)     Clean up the Scheduler task folder on the AS

  • BCO 9.5, 9.0

rm -rf scheduler/task/*

rm -rf scheduler/mif/notdelivered/

rm -rf scheduler/localdb/*

14)      Stop AS datacuum

cd /[BCO Installation Directory]

./cpit stop datacuum

15)     Check that there are no other datacuum processes stuck:

     ps -ef | grep datacuum

16)      Issue a kill -9 $PIDNUMBER for every remaining datacuum
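
As mentioned in step 6, here is a one-liner version of the check-and-kill pattern used in steps 5-8, 11-12 and 15-16. It is only a convenience sketch using the standard pgrep/pkill utilities; always review the pgrep output first and make sure every matched process really belongs to BCO before killing anything:

     pgrep -fl 'jboss|run.sh|scheduler|datacuum'      # list candidate stuck processes first
     pkill -9 -f 'jboss|run.sh|scheduler|datacuum'    # only after confirming they are the stuck BCO processes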

 

Section 2: Access the EE via ssh as the BCO OS user

 

1)     Stop EE Scheduler

cd /[BCO Installation Directory]

./cpit stop scheduler

2)     Check that there are no other schedulers stuck on the EE:

ps -ef | grep scheduler

3)     Issue a kill -9 $PIDNUMBER for every remaining scheduler on the EE

4)     Clean up the Scheduler task configuration folders on the EE

  • BCO 9.5, 9.0

rm -rf scheduler/task/*

rm -rf scheduler/mif/notdelivered/*

rm -rf scheduler/localdb/*

5)      Stop EE datacuum

                cd /[BCO Installation Directory]

                 ./cpit stop datacuum

6)     Check to see if any other datacuums are stuck

ps -ef | grep datacuum

7)     Issue a kill -9 $PIDNUMBER for every remaining datacuum
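
Before moving on to the restart, an optional sanity check on both machines is to confirm that the cleaned folders are now empty (the datahub path exists only on the AS, so errors for missing directories can be ignored):

     cd /[BCO Installation Directory]
     ls -la scheduler/task scheduler/localdb datahub/jboss/dlq_messages 2>/dev/null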

 

Section 3: Restart Components

 

1)     Restart the components you stopped, on BOTH machines (AS and EE), to restore functionality (a sketch of the restart commands follows this list)

2)     Run the "Component status checker" task
3)     Wait a minute and then access Administration > System > Status to check the component status
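
The restart commands themselves are not spelled out above, so treat the following only as a sketch: it assumes the cpit script accepts a start action symmetrical to the stop actions used in Sections 1 and 2, which you should verify against your own installation. The ps check at the end is the same pattern used earlier, just to confirm the processes are back:

     cd /[BCO Installation Directory]
     ./cpit start datahub        # AS only - assumes cpit supports a start action
     ./cpit start scheduler      # AS and EE
     ./cpit start datacuum       # AS and EE
     ps -ef | grep -E 'jboss|scheduler|datacuum' | grep -v grep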

 

Section 4: Check ETL and Chain Status

 

1)     Check the Administration > SCHEDULER > ETL tasks page and the Administration > SCHEDULER > System tasks page for RUNNING tasks that might have been stuck

2)     Take note of their ids and then force them to be marked as ENDED in the BCO database:
          update task_status set status = 'ENDED' where taskid in (XX,XX2,XX3);

 

This article can be found in its entirety, including steps for BCO 4.5 and 4.0, at the BMC Support site knowledge base as KA350370, Steps to recover BCO functionality after the AS or EE file system has become 100% full.

 

We hope you find this article informative - and we also hope you never have to use it.

 

timo