17 Replies Latest reply on Jan 26, 2016 4:32 AM by Daniel Sailer

    JRMP Read timed out errors

    Santosh Kothuru

      We have been noticing "JRMP Read timed out" errors very frequently on the job servers in our environment, and we need to restart the job server instance to fix the issue, which is sometimes followed by further errors. Has anyone seen the same errors, and does anyone know why they occur so frequently?

       

      I assume these errors are caused by the number of Type 1 NSH jobs, and I would like to hear whether there could be any other reason for this issue.

       

      [28 Nov 2015 03:07:36,730] [Event-Transfer-Thread-4] [WARN] [::] [] error during JRMP connection establishment; nested exception is:

              java.net.SocketTimeoutException: Read timed out

      java.rmi.ConnectIOException: error during JRMP connection establishment; nested exception is:

              java.net.SocketTimeoutException: Read timed out

              at sun.rmi.transport.tcp.TCPChannel.createConnection(Unknown Source)

              at sun.rmi.transport.tcp.TCPChannel.newConnection(Unknown Source)

              at sun.rmi.server.UnicastRef.newCall(Unknown Source)

              at sun.rmi.registry.RegistryImpl_Stub.lookup(Unknown Source)

              at com.bladelogic.om.infra.app.service.directory.DirectoryServiceImpl.getRemoteObject(DirectoryServiceImpl.java:213)

              at com.bladelogic.om.infra.app.service.event.EventTransferThread.execute(EventTransferThread.java:86)

              at com.bladelogic.om.infra.app.service.event.EventTransferThread.execute(EventTransferThread.java:23)

              at com.bladelogic.om.infra.app.service.thread.BlBlockingThread.run(BlBlockingThread.java:95)

      Caused by: java.net.SocketTimeoutException: Read timed out

              at java.net.SocketInputStream.socketRead0(Native Method)

              at java.net.SocketInputStream.read(Unknown Source)

              at java.io.BufferedInputStream.fill(Unknown Source)

              at java.io.BufferedInputStream.read(Unknown Source)

              at java.io.DataInputStream.readByte(Unknown Source)

              ... 8 more

        • 1. Re: JRMP Read timed out errors
          Bill Robinson

          are your appservers having communication issues during this time ?

           

          does the load on the appservers go very high when these scripts run ?

           

          are there blcli calls in the type 1 scripts you are running ?

          • 2. Re: JRMP Read timed out errors
            Daniel Sailer

            Hi Bill

             

            I'm facing the same issue. I run a Type 1 NSH script (parallel against 100 servers) which gathers some information on the server (AntiVirus product and version) and then updates the builtin server property ANTIVIRUS accordingly.

            On BL 8.2 SP4 this used to work without an issue. Since the update to BL 8.6 two weeks ago, this job produces the above-mentioned errors in the appserver log:

             

            [05 Jan 2016 10:04:23,359] [Event-Transfer-Thread-3] [WARN] [::] [] error during JRMP connection establishment; nested exception is:

                    java.net.SocketTimeoutException: Read timed out

            java.rmi.ConnectIOException: error during JRMP connection establishment; nested exception is:

                    java.net.SocketTimeoutException: Read timed out

                    at sun.rmi.transport.tcp.TCPChannel.createConnection(Unknown Source)

                    at sun.rmi.transport.tcp.TCPChannel.newConnection(Unknown Source)

                    at sun.rmi.server.UnicastRef.newCall(Unknown Source)

                    at sun.rmi.registry.RegistryImpl_Stub.lookup(Unknown Source)

                    at com.bladelogic.om.infra.app.service.directory.DirectoryServiceImpl.getRemoteObject(DirectoryServiceImpl.java:213)

                    at com.bladelogic.om.infra.app.service.event.EventTransferThread.execute(EventTransferThread.java:86)

                    at com.bladelogic.om.infra.app.service.event.EventTransferThread.execute(EventTransferThread.java:23)

                    at com.bladelogic.om.infra.app.service.thread.BlBlockingThread.run(BlBlockingThread.java:95)

            Caused by: java.net.SocketTimeoutException: Read timed out

                    at java.net.SocketInputStream.socketRead0(Native Method)

                    at java.net.SocketInputStream.read(Unknown Source)

                    at java.net.SocketInputStream.read(Unknown Source)

                    at java.io.BufferedInputStream.fill(Unknown Source)

                    at java.io.BufferedInputStream.read(Unknown Source)

                    at java.io.DataInputStream.readByte(Unknown Source)

                    ... 8 more

             

            This actually results in the job cancelling:

            [05 Jan 2016 10:03:33,662] [Job-Execution-3] [ERROR] [XYZ:BLAdmins:] [NSHScript] Work item '' has been aborted. Reason: Application server: lXYZ went down during work item execution.

            It also results in application servers crashing (nothing is logged in console.log, spawner.log, or appserver.log, and the servers are not accessible in Infrastructure, even though the processes are still running).

             

            I could easily rewrite this script to loop over the servers itself (which will take longer), but I'd like to know why up to half of our app servers can crash because one single script job is run.

            • 3. Re: JRMP Read timed out errors
              Bill Robinson

              Because spinning up lots of concurrent blcli processes (at least one per target) affects the load on the appserver(s) running the WIT for the job.

              • 4. Re: JRMP Read timed out errors
                Yanick Girouard

                Just set the concurrency of the job to something your app servers can handle, such as no more than 10 up front. That should be fine for most cases.

                 

                We have a few jobs that must use blcli to set server property values (that have spaces in them so can't use the bulk method), and run against all servers daily with 10 in parallel, and our 4 app servers manage it just fine.

                 

                If I'd run the same in a type 1 NSH script loop, it would take forever...

                • 5. Re: JRMP Read timed out errors
                  Daniel Sailer

                  Well, I understand that this is some load, but I'm confused because:
                  a) it used to work in 8.2
                  b) we have 6 app servers, so there would be room to spread the load
                  c) it should not result in all 6 app servers being down and not recovering (having to restart the appserver) because of a single job.

                   

                  I did some tests now, and I can run the same NSH Type 1 script job against 200 servers in parallel with no problems, as long as I have the last line commented out, which does the server property update via blcli_execute.

                   

                  blcli_execute Server setPropertyValueByName "${SERVER}" "ANTIVIRUS" "${appname}${appversion}" >/dev/null

                   

                  This line makes the NSH process on the Linux app server use 100% CPU and crashes our environment.

                   

                  Surely that should not happen, don't you think?

                  • 6. Re: JRMP Read timed out errors
                    Santhosh Kurimilla

                    Daniel,

                     

                    Can you please try adjusting the JVM options used to execute that specific blcli call, so that the JVM heap size stays under control?

                     

                    blcli_setjvmoption -Xmx128m

                    • 7. Re: JRMP Read timed out errors
                      Santhosh Kurimilla

                      Also, please check the MaxHeapSize setting on your Linux Application Server.

                       

                      nsh

                      blasadmin -a show app MaxHeapSize

                      • 8. Re: JRMP Read timed out errors
                        Daniel Sailer

                        Hi Santhosh

                        Thanks for your reply. I will try the setjvmoption as soon as I can afford to have the environment crash. My actual goal with this question is that the whole environment should not crash, no matter what any Linux, Windows, or BL admin comes up with in their script. So the setjvmoption is not the final solution, but it can show where the issue is.

                         

                        I have set the maxheapsize of the appservers to 6144 as described here:
                        https://docs.bmc.com/docs/display/public/bsadoc/Recommendations+for+Application+Servers+of+type+Job

                         

                        But I would have thought - since the spawner is enabled - that the _spawner would reduce the resource requirements?

                        "Spawning processes externally to the Application Server can be beneficial for memory management. Process spawning is primarily used for Network Shell Script Jobs and some types of extended objects."

                        But there are no recommendations for spawner? At least the log entries in spawner.log would suggest that the processes themselves are started via spawner.

                         

                        blasadmin now running against the following deployments: default, _template

                        default:

                        MaxHeapSize:6144

                        _template:

                        MaxHeapSize:6144

                        • 9. Re: JRMP Read timed out errors
                          Santhosh Kurimilla

                          Yes, spawner will be helpful for memory management. But, when it comes to blcli commands, I believe it consumes AppServer's memory.

                           

                          Can you please get the actual OS memory settings on your Linux Application Server?

                           

                          free -m

                          • 10. Re: JRMP Read timed out errors
                            Daniel Sailer

                            sure thing. but obviously that's now without big blcli load...

                             

                                         total       used       free     shared    buffers     cached
                            Mem:          7982       7134        848          0        378       2239
                            -/+ buffers/cache:       4516       3466
                            Swap:         4031          0       4031
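
As a reading aid for the figures above: on this older `free -m` layout, the memory that can effectively be handed to new processes is roughly free + buffers + cached from the `Mem:` row. A minimal sketch, assuming exactly the column order quoted in this post (the function name `reclaimable_mb` is made up for illustration):

```shell
# Sum free + buffers + cached from the "Mem:" row of old-style `free -m` output.
# Assumes the columns are: total used free shared buffers cached.
reclaimable_mb() {
    awk '$1 == "Mem:" { print $4 + $6 + $7 }'
}

# Values copied from the post above.
printf 'Mem: 7982 7134 848 0 378 2239\n' | reclaimable_mb
```

This prints 3465, which matches the "free" figure on the `-/+ buffers/cache` line (3466) up to rounding. As noted in the post, the sample was taken without a heavy blcli load.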
                            • 11. Re: JRMP Read timed out errors
                              Santosh Kothuru

                              I think you have to use a Type 2 job for updating the property value, or else it uses more memory. Please refer to this link for more details: Resource usage of NSH Script Jobs - BMC Server Automation 8.6 - BMC Documentation.

                               

                              Also, you can use "bulkSetServerPropertyValues" instead of setPropertyValueByName.

                               

                              Server - bulkSetServerPropertyValues - BMC Server Automation Command Line Interface 8.6 - BMC Documentation

                              • 12. Re: JRMP Read timed out errors
                                Santhosh Kurimilla

                                So, you are using a max. of 6GB of 8GB memory on the server just for Application Server? When you are setting MaxHeapSize, it is only for the JVM Heap. As per my understanding, there would be additional memory usage by other processes of the Application server.

                                 

                                Also, please check with the SA and try to get the total memory used by non-BladeLogic processes on the server. That should show how the server runs out of memory when your maximum heap size is consumed by the execution of blcli commands.

                                 

                                Overall, in my opinion, your memory settings may need to be reviewed and updated accordingly.

                                 

                                Bill Robinson, Sean Berry - may help you further on it.

                                • 13. Re: JRMP Read timed out errors
                                  Bill Robinson

                                  the blcli has its own jvm and therefore heap, it doesn't use the appserver's jvm.

                                   

                                  so - you have the parallelism setting which says how many possible instances of the job could startup (which is how many possible blcli jvms) and then how many actually spin up depends on how many available WIT there are in the env.  default heap is 256m for the blcli jvm.
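
The relationship described here can be sketched as simple arithmetic: the number of blcli JVMs that run at once is bounded by the smaller of the job's parallelism and the available WITs, and each JVM may claim up to its maximum heap. The function name and the WIT count below are hypothetical, since the thread does not state how many WITs are configured.

```shell
# Ceiling on heap that concurrently running blcli JVMs may claim, in MB.
# $1 = job parallelism, $2 = available work item threads (WITs),
# $3 = maximum heap per blcli JVM in MB
blcli_heap_mb() {
    parallelism=$1; wit=$2; heap_mb=$3
    # Only as many instances start as there are free WITs.
    if [ "$parallelism" -lt "$wit" ]; then jvms=$parallelism; else jvms=$wit; fi
    echo $(( jvms * heap_mb ))
}

# Hypothetical numbers: parallelism 100, 50 free WITs, default 256 MB heap.
blcli_heap_mb 100 50 256
```

With these assumed inputs the ceiling is 12800 MB across the environment. This counts heap only; each JVM also needs some memory outside its heap.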

                                   

                                  there's a hit to spinning up the jvm and that gets worse the more that are concurrently spun up.

                                   

                                  were you on the same infrastructure in 8.2 ?  same appserver memory settings and such ? same number of targets/parallelism/wit ?

                                  • 14. Re: JRMP Read timed out errors
                                    Sean Berry

                                    On an 8 GB system, 6 GB for the app server JVM leaves only 2 GB for other processes, including the whole of the OS and any other external processes (like the process spawner, your backup and monitoring agents, etc.).  Even if blcli were configured to use 128 MB per process, each application server could only execute a small handful of these processes before exhausting the available memory on these systems.  At 200 parallelism, there may easily be 30 of them launching per app server, consuming several GB of memory.
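
The budget in this paragraph can be worked through with the numbers from the thread (8 GB per server, 6144 MB app server heap, 6 app servers, 200 parallelism, 128 MB per blcli). This is a rough sketch that assumes the work spreads evenly across app servers, which the thread does not confirm; the helper names are made up.

```shell
# ceil(parallelism / app_servers): blcli processes per app server if spread evenly
per_server() { echo $(( ($1 + $2 - 1) / $2 )); }

# MB left beside the app server heap, before the OS and agents take their share
headroom_mb() { echo $(( $1 - $2 )); }

n=$(per_server 200 6)            # processes landing on one app server
need=$(( n * 128 ))              # MB needed at 128 MB per blcli process
have=$(headroom_mb 8192 6144)    # MB not reserved for the app server heap
echo "processes=$n need=${need}MB have=${have}MB"
```

Under these assumptions the script prints `processes=34 need=4352MB have=2048MB`, so the demand is roughly twice the memory left over even before the OS itself is counted.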

                                     

                                    The best practice for most customers is to have at least 12 GB, if not 16 GB, of OS memory per app server, and to limit the number of blcli processes executed in parallel.  If this is absolutely not possible, you should plan to allocate as much OS memory as would be needed for these processes.  However, I have found that some simple optimizations around blcli execution can save significant resources.

                                     

                                    Since we often are doing processes like this to set a property, why not roll up all of the property updates into a single blcli call, perhaps by writing out the property values to a common file, and setting them all at once?
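
One way to picture the "common file" idea: each parallel script instance only records its value, and a single later step merges everything for one blcli call. The sketch below is plain shell with hypothetical function names, file layout, and sample values. It assumes a staging directory that every instance can write to, and it does not show the blcli call itself, because the input format expected by a bulk command such as bulkSetServerPropertyValues is not given in this thread and should be taken from the CLI documentation.

```shell
# Each script instance writes its value to its own small file, so parallel
# instances never write to the same file at the same time.
record_value() {   # $1 = staging dir, $2 = server name, $3 = property value
    printf '%s,%s\n' "$2" "$3" > "$1/$2.val"
}

# Run once after all instances have finished: merge the per-server files.
merge_values() {   # $1 = staging dir, $2 = output file
    cat "$1"/*.val > "$2"
}

# Hypothetical sample data.
stage=$(mktemp -d)
record_value "$stage" "serverA" "ExampleAV1.0"
record_value "$stage" "serverB" "ExampleAV2.1"
merge_values "$stage" "$stage/all_values.csv"
cat "$stage/all_values.csv"
```

The sample values deliberately contain no spaces, since an earlier reply reports that values with spaces cannot use the bulk method. The point of the design is that only one blcli JVM is started for the whole job instead of one per target.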

                                     

