13 Replies Latest reply on Jul 14, 2017 9:24 AM by Jon Trotter

    Error after upgrading from ADDM 10.1 to 11.0 (Discovery)

    Mark Scherf

      Hello,

       

      I performed an upgrade from ADDM 10.1 to 11.0. When I try to connect to ADDM, I receive a message in the browser saying "The appliance has been shut down, Appliance still unavailable".

       

      I connected to the appliance, stopped tideway, omniNames and appliance.

      Appliance and omniNames started OK, but tideway gave the following error:

       

      [tideway@wlgvaddmprodopen ~]$ sudo /sbin/service tideway start

      Traceback (most recent call last):

        File "./control.py", line 219, in <module>

        File "./control.py", line 98, in main

        File "./naming.py", line 150, in resolveName

        File "/usr/tideway/lib/python2.7/site-packages/CosNaming_idl.py", line 415, in resolve_str

          return self._obj.invoke("resolve_str", _0_CosNaming.NamingContextExt._d_resolve_str, args)

      CosNaming.NotFound: CosNaming.NamingContext.NotFound(why=missing_node, rest_of_name=[CosNaming.NameComponent(id='Tideway', kind=''), CosNaming.NameComponent(id='ClusterManager', kind=''), CosNaming.NameComponent(id='ClusterManager', kind='')])

      [tideway@wlgvaddmprodopen ~]$

       

      I am not running ADDM as a cluster.

       

      Can anyone help with this?

       

      Thanks

       

      Mark

        • 1. Re: Error after upgrading from ADDM 10.1 to 11.0 (Discovery)
          Yan De Wulf

          The "cluster" mention doesn't mean addm is running as a cluster. This looks like the "cluster services" are not running. Please run "sudo /sbin/service cluster start" before "sudo /sbin/service tideway start"  When I've run into it before, I found it was related to the "cluster manager service" not running and/or needing to be restarted to check run "ps -ef | grep cluster"  it should show:

           

          [tideway@XXXXXXXXX01 log]$ ps -ef | grep cluster

          tideway   6664     1  0 Feb24 ?        00:00:00 python /usr/tideway/python/cluster_manager/main.pyc --daemon start

          tideway   6665  6664  0 Feb24 ?        12:07:37 python /usr/tideway/python/cluster_manager/main.pyc --daemon start

          tideway  22497 21998  0 15:38 pts/1    00:00:00 grep cluster

          [tideway@XXXXXXXX01 log]$

           

          A couple more things:

          1) Did the server reboot? "uptime" on the command line shows how long the server has been up since the last reboot.

          2) What's listed in the postupgrade log? It can be found with "ls -l /usr/tideway/log/* | grep postupgrade".

          • 2. Re: Error after upgrading from ADDM 10.1 to 11.0 (Discovery)
            Mark Scherf

            Hi Yan,

             

            When I try to start the cluster, I get this:

             

            $ sudo /sbin/service cluster stop

            Stopping local Discovery cluster services

                Stopping Cluster Manager service:                     

                Stopping Privileged service:                          

            $ sudo /sbin/service cluster start

            Starting local Discovery cluster services

                Starting Privileged service:                          

                Starting Cluster Manager service:                     

            Wed Jun 29 10:53:47 2016 : tw_svc_cluster_manager started.

            omniORB: (0) 2016-06-29 10:53:47.880340: Failed to bind to address :: port 25170. Address in use?

            omniORB: (0) 2016-06-29 10:53:47.880387: Error: Unable to create an endpoint of this description: giop:tcp::25170

            Traceback (most recent call last):

              File "./main.py", line 114, in

              File "./main.py", line 96, in main

              File "./clustermanager.py", line 12576, in init

              File "./clustermanager.py", line 775, in __init__

              File "/usr/tideway/lib/python2.7/site-packages/omniORB/CORBA.py", line 389, in resolve_initial_references

                return self._obj.resolve_initial_references(identifier)

            omniORB.CORBA.INITIALIZE: CORBA.INITIALIZE(omniORB.INITIALIZE_TransportError, CORBA.COMPLETED_NO)

            Wed Jun 29 10:53:47 2016 : Watchdog : Service failed in 0.1 seconds (signal 0; exit 1). Exiting.

            tideway: The Discovery service 'tw_svc_cluster_manager' failed to start (returned 1). Check logs.

             

             

            ps -eaf | grep cluster shows this:

             

            tideway   1879     1  0 09:17 ?        00:00:00 python /usr/tideway/python/cluster_manager/main.pyc --daemon start

            tideway   1880  1879  0 09:17 ?        00:00:28 python /usr/tideway/python/cluster_manager/main.pyc --daemon start

            tideway  13338  7697  0 10:54 pts/1    00:00:00 grep cluster

            $

             

            Uptime shows 1 hour 38 minutes.

             

            The log file doesn't show anything

             

            $ ls -l /usr/tideway/log/* | grep postupgrade

            -rw-r--r-- 1 root    root         130 Jun 29 08:35 /usr/tideway/log/postupgrade_11.0.0.1_TODO.log

            $ cat /usr/tideway/log/postupgrade_11.0.0.1_TODO.log

            Export Mapping Sets

            • 3. Re: Error after upgrading from ADDM 10.1 to 11.0 (Discovery)
              Yan De Wulf

              The last message of the cluster start above shows "Discovery service 'tw_svc_cluster_manager' failed to start", so the Cluster Manager service is not happy. What is the output of "cat /usr/tideway/log/tw_svc_cluster_manager.out"?

              • 4. Re: Error after upgrading from ADDM 10.1 to 11.0 (Discovery)
                Mark Scherf

                Hi Yan,

                 

                The output is

                 

                $ cat /usr/tideway/log/tw_svc_cluster_manager.out

                Wed Jun 29 10:53:47 2016 : tw_svc_cluster_manager started.

                omniORB: (0) 2016-06-29 10:53:47.880340: Failed to bind to address :: port 25170. Address in use?

                omniORB: (0) 2016-06-29 10:53:47.880387: Error: Unable to create an endpoint of this description: giop:tcp::25170

                Traceback (most recent call last):

                  File "./main.py", line 114, in

                  File "./main.py", line 96, in main

                  File "./clustermanager.py", line 12576, in init

                  File "./clustermanager.py", line 775, in __init__

                  File "/usr/tideway/lib/python2.7/site-packages/omniORB/CORBA.py", line 389, in resolve_initial_references

                    return self._obj.resolve_initial_references(identifier)

                omniORB.CORBA.INITIALIZE: CORBA.INITIALIZE(omniORB.INITIALIZE_TransportError, CORBA.COMPLETED_NO)

                Wed Jun 29 10:53:47 2016 : Watchdog : Service failed in 0.1 seconds (signal 0; exit 1). Exiting.

                $

                 

                Regards

                • 5. Re: Error after upgrading from ADDM 10.1 to 11.0 (Discovery)
                  Yan De Wulf

                  Please try the following, in this order:

                   

                  1. sudo /sbin/service tideway stop
                  2. sudo /sbin/service cluster stop
                  3. sudo /sbin/service omniNames stop
                  4. sudo /sbin/service appliance stop
                  5. ps -ef | grep python
                  6. find all python process PIDs
                  7. kill -9 <python process PIDs> (see the sketch below)
                  8. sudo /sbin/service appliance start
                  9. sudo /sbin/service omniNames start
                  10. sudo /sbin/service cluster start
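
                  A rough sketch of steps 5-7, assuming the leftover python processes belong to the tideway services and are safe to kill (double-check the ps output before using kill -9):

                  # List any python processes still running (ignore the "grep python" line itself):
                  ps -ef | grep python
                  # Kill each remaining tideway-owned PID from that list, e.g.:
                  kill -9 <PID> [<PID> ...]

                  Those stale python processes are usually what is still holding port 25170 and causing the "Address in use?" error shown above.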
                  4 of 4 people found this helpful
                  • 6. Re: Error after upgrading from ADDM 10.1 to 11.0 (Discovery)
                    Mark Scherf

                    Hi Yan,

                     

                    That worked. We are now back online. Thanks for your help.

                     

                    Regards

                    • 7. Re: Error after upgrading from ADDM 10.1 to 11.0 (Discovery)
                      Yan De Wulf

                      Cool beans, like any application you just need to slap it around a little so it knows who's boss

                      • 8. Re: Error after upgrading from ADDM 10.1 to 11.0 (Discovery)
                        Chris Hughes

                        Right on!  LOL!

                        1 of 1 people found this helpful
                        • 9. Re: Error after upgrading from ADDM 10.1 to 11.0 (Discovery)
                          Yan De Wulf

                          Chris, how does one mark a question as "Answered"?

                          1 of 1 people found this helpful
                          • 10. Re: Error after upgrading from ADDM 10.1 to 11.0 (Discovery)
                            Chris Hughes

                            Whoever asked the question (Mark Scherf in this case) can mark it as answered. An admin can also mark the correct answer. I'll give Mark a day or two to take action!

                            • 11. Re: Error after upgrading from ADDM 10.1 to 11.0 (Discovery)
                              Kapil Nemade

                              Hi Yan,

                              I followed the steps you shared, but I am still getting the appliance shut down error.

                              The status of all services is listed below. I am not sure what exactly could have gone wrong. Any suggestions or guidance would be appreciated.

                               

                              ADDM application services are running

                                  Security service:                        7521          [  OK  ]

                                  Model service:                           7575          [  OK  ]

                                  Vault service:                           7694          [  OK  ]

                                  Discovery service:                       7745          [  OK  ]

                                  Mainframe Provider service:              7820          [  OK  ]

                                  SQL Provider service:                    7905          [  OK  ]

                                  CMDB Sync (Exporter) service:            7992          [  OK  ]

                                  CMDB Sync (Transformer) service:         8083          [  OK  ]

                                  Reasoning service:                       8261          [  OK  ]

                                  Tomcat service:                          8996          [  OK  ]

                                  Reports service:                        10789          [  OK  ]

                                  Application Server service:             11366          [  OK  ]

                              • 12. Re: Error after upgrading from ADDM 10.1 to 11.0 (Discovery)
                                Marlon Balie

                                Hi Yan

                                 

                                How does one do this if you are running within a cluster environment?

                                 

                                 I am trying to run the 11.1.0.5 upgrade and get these Python errors.

                                • 13. Re: Error after upgrading from ADDM 10.1 to 11.0 (Discovery)
                                  Jon Trotter

                                  There are commands in ~/bin for cluster control to start and stop services for the cluster nodes.

                                   

                                  tw_cluster_control

                                   

                                    --become-coordinator     Make this machine the coordinator
                                    --cluster-start-services Start the services across the cluster
                                    --cluster-stop-message=MSG
                                                             Message giving the reason for stopping the services
                                                             across the cluster (implies --cluster-stop-services)
                                    --cluster-stop-services  Stop the services across the cluster
                                    --fix-interrupted        Unlock the system after cluster manager failure
                                    --force                  Do not ask for confirmation
                                    -h, --help               Display help on standard options
                                    --loglevel=LEVEL         Logging level: debug, info, warn, error, crit
                                    -p, --password=PASSWD    Password
                                    --passwordfile=PWDFILE   Pathname for Password File
                                    --remove-broken          Remove all broken members from the cluster
                                    --replace-vm-uuid        Update expected VM UUID with current value
                                    --revert-to-standalone   Remove cluster configuration from this machine
                                    --show-members           Show all members
                                    --show-pending           Show pending changes
                                    -u, --username=NAME      Username
                                    -v, --version            Display version information

                                   

                                  There are also options in the UI under Administration > Control to restart services when clustering.
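
                                  For example, restarting services across the whole cluster from the coordinator might look roughly like this (a sketch only; it assumes you run it as the tideway user and authenticate as a UI administrator such as the default "system" account):

                                    # Stop services on every cluster member, recording the reason:
                                    tw_cluster_control --username=system --cluster-stop-message="Applying 11.1.0.5 upgrade"

                                    # ... apply the upgrade ...

                                    # Start services back up across the cluster:
                                    tw_cluster_control --username=system --cluster-start-services

                                    # Check that every member is back and healthy:
                                    tw_cluster_control --username=system --show-members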