Seeing the same issues here too. I noticed this thread doesn't have any resolution information from any of the cases raised.
Is this because nobody was able to successfully resolve it?
Our issue has the same symptoms but is isolated to a single scanning appliance (we have many); the scan files fail to transfer to BOTH of our consolidators.
These CORBA errors in our reasoning logs appear to coincide with the times when the scan files stop being pushed from the scanner to both consolidators...
[tideway@ log]$ cat tw_svc_reasoning.log.2017-07-31 | grep ERROR
140503583876864: 2017-07-31 18:50:16,072: reasoning.ecacontroller: ERROR: preScan failure: DiscoveryCORBA.ExecutionFailure(status_code=-1, message='Command timed out')
140503646807808: 2017-07-31 18:50:16,073: reasoning.ecacontroller: ERROR: preScan failure: DiscoveryCORBA.ExecutionFailure(status_code=-1, message='Command timed out')
140503573387008: 2017-07-31 18:50:16,163: reasoning.ecacontroller: ERROR: preScan failure: DiscoveryCORBA.ExecutionFailure(status_code=-1, message='Command timed out')
140503667787520: 2017-07-31 18:55:16,892: reasoning.ecacontroller: ERROR: preScan failure: DiscoveryCORBA.ExecutionFailure(status_code=-1, message='Command timed out')
140503678277376: 2017-07-31 21:43:52,400: reasoning.ecacontroller: ERROR: preScan failure: DiscoveryCORBA.ExecutionFailure(status_code=-1, message='Command timed out')
140503646807808: 2017-07-31 21:45:52,318: reasoning.ecacontroller: ERROR: preScan failure: DiscoveryCORBA.ExecutionFailure(status_code=-1, message='Command timed out')
Whilst these may well be a red herring, as they appear to show connection issues for pre-scans, I wondered if anyone has seen these errors before and can translate what may actually be failing.
Nothing in this thread indicated that it was stuck.
Those messages can only be to do with scanning, not consolidation. I am rather surprised you would get timeouts for that - by default it has a 15 minute timeout. Have you changed the scan timeout value on the Discovery configuration page?
Since you are set on finding a root cause for your problem, here are my three pennies ... check the Performance monitor and verify the Hardware section.
I am finding that such "stuck" runs (which really proceed at a VERY SLOW pace) cause excessive paging on our system. That translates into excessive IO to the storage systems, which in turn slows the Consolidator appliance to the point that end users cannot log in ...
If only I could find a way to cleanly and effectively discard such runs, I would be very happy ...
Wishing you all the best
How much memory does your system have, and how many ECA engines? If the system has insufficient memory then you end up with lots of paging, which can very significantly impact performance. If you discover storage, you need to provide more memory, as storage systems are normally large and hence take more memory to scan.
There are no issues with scanning; it completes in normal time. We can scan a single IP with nothing else running and yet nothing is pushed to the consolidator... and we've tested the network connectivity and the Test Connection option in the UI during this time, which is fine.
The issue is that the files just suddenly stop being pushed to the Consolidator. And we are seeing CORBA errors for pre-scans which, whilst not directly related, may be a symptom of the same root cause?
Can you recommend some local tests that we can do on the scanning appliance?
Have you looked at the hardware performance graphs or /usr/tideway/log/performance.log.* to check the load and paging on the machine? For pre-scanning to time out, I imagine the machine is running out of memory and doing so much memory paging that it spends all its time waiting rather than doing useful work.
Hey Yan ..
We are finding ourselves in the identical situation every so often .. and for the same reason. Compacting our db takes time .. and then we have a bunch of catching up .. sometimes it will catch up .. but sometimes it does to us what it does to you ..
You have mentioned a "CANCELING" process you implemented to get rid of it. Does this involve addressing the content of the /var/persist/ .. subdirectories? ;-)..
Could you please tell me what you do to cancel them? .. as we all know, the GUI Cancel Run option is worthless in such a case .. right?
... thanks in advance for sharing ..
Hey Ravi ..
It is from former BMC communication ... you may need to search for it in the archives ...
".....A scan may appear to be hung if one or more IPs are processing a long-running pattern. Research of the reasoning logs may reveal the cause.
Otherwise, a possible workaround is to stop discovery ("stop all scans") and then restart scans.
If this does not work, a "forceful" workaround is to stop the appliance, delete the *.pq files in /usr/tideway/var/persist/reasoning/engine, and restart the appliance. We recommend contacting support prior to doing this. Note: deleting these files can leave the data store in an inconsistent state (this should clear up after a few scans)...."
Use it at your own risk ..
I learned about that method a long time ago but treat it as a last resort..
Deleting pq files should only ever be a last resort. You can get inconsistent models which do not clear up after further scans.