15 Replies Latest reply on Dec 11, 2018 3:08 PM by Vinnie Lima

    Rollback callouts on deployment failure

    William Thomas

      I have a problem with service rollbacks when an error occurs while deploying a service (or even a single server) in CLM 4.6.05. During deployment, several AO callouts are run. What should happen is that, if an error occurs at any stage during deployment, a set of decommissioning callouts triggers and rolls back the infrastructure to the state it was in before the deployment attempt. For example, if a service request completed the allocation of storage but the VM was not moved to the correct designated container, then the rollback callouts initiated should include unregistering the service in DNS and releasing the reserved IP address(es).

       

      The problem is that no matter how, when, or at what stage a service deployment fails, the only callout that runs is the first in the sequence: the one that deletes the DNS entry. Consequently, other decommissioning activities, such as reallocating storage, updating the IP address pool, and updating the server naming convention for the next server, must then be done manually. Note that this happens at whatever stage a service deployment fails; if it fails, the only rollback is to delete the DNS entry.

       

      On a perhaps related note, what is the difference between ComputeContainer_DECOMMISSION and ComputeContainer_DELETE?

       

      I have tried both operations in my callouts, and both resulted in the same actions (and inactions). What are the differences between the two, and when should I use one as opposed to the other (in this particular use case as well as generally)?

       

      It should be emphasized that most service deployments are successful. Moreover, I can manually run the decommission process and it works without issue. The issue I am dealing with is that when a deployment is unsuccessful, the rollback callouts do not run successfully.

        • 1. Re: Rollback callouts on deployment failure
          Devendra Dehadaraya

          For the rollback callout, what is the operation to which you have attached the workflows?

           

          In the decommission/rollback case, why is there a need to hook different workflows? Can the first workflow in the sequence take care of cleaning up?

           

          This is just to understand the problem. Given that the behavior won't change, I am trying to think whether things can be integrated a bit differently.

          • 2. Re: Rollback callouts on deployment failure
            William Thomas

            Thanks, Devendra. I cannot answer many of the "why" questions because I am just taking over this CLM project: the details and design decisions were made months and even years before, many by individuals who are no longer available. I am not saying I disagree with the decisions made, and indeed generally agree with the decisions made, but only that for some decisions I am not privy to the reasoning behind them. Having said that:

             

            There are actually three rollback callouts which are run in order: 1) Delete DNS, 2) Remove Citrix access, and 3) Mark hostname deleted. Each workflow initially had a post operation of ComputeContainer_DELETE, which I changed to ComputeContainer_DECOMMISSION in my first attempt to resolve the issue. Nothing changed as a result; what worked still worked and what did not work still did not work. Afterwards, in troubleshooting/panic mode, I tried substituting various class operations such as VirtualGuest_DESTRUCTOR and VirtualGuest_OFFBOARD in mix-and-match combinations. Again, the results did not change.

             

            It should be noted that our service deployment consists of seven different callouts. Ideally, I would like to change that, except that service deployments are working and, in fairness, there may have been (and are?) valid reasons to use multiple callouts for every service deployment. Again, it works. If I completely rewrote and combined the service deployment callouts into one, then I would think that if anything in that one callout failed, the entire deployment would roll back. That is what I want but, again, that is not within the criteria I have been given. Therefore, the need for different workflows.

             

            Hope this helps explain my problem and use case. Despite everything, I do think using the separate operations should work, but I admit that I am flummoxed.

            • 3. Re: Rollback callouts on deployment failure
              Vinnie Lima

              Here you go:

               

              Executing AO Callout if provisioning fails

               

              You're welcome

              • 4. Re: Rollback callouts on deployment failure
                William Thomas

                Thanks, Vinnie!

                 

                That seems to be exactly what I need, with a possible caveat or two. I hate to be picayune, but where or what is the ComputeContainer_Destructor callout? Is it ComputeContainer_DECOMMISSION, ComputeContainer_DELETE, or something else?

                 

                I ask because I have tried both the ComputeContainer_DECOMMISSION and ComputeContainer_DELETE callouts and neither seemed to work for me. Of course, I could very well be misusing both and/or there is a problem with my callout but I just want to confirm that we are talking about the same one, ComputeContainer_Destructor (which I don't see).

                 

                And a related question: is there anything "special" or additional that you had to do to initiate the rollback callouts?

                 

                But your use case and answer seem to be exactly what I need. Thanks again!

                • 5. Re: Rollback callouts on deployment failure
                  Vinnie Lima

                  Good question. I actually did not get to implement this, and depending on the version of CLM (I assume you are using 4.6.X), it may have changed.

                   

                  I would look at the registered providers and see which ones are available for "ComputeContainer". Then it may be that you have to test each one to see if you get the desired result.

                  • 6. Re: Rollback callouts on deployment failure
                    Devendra Dehadaraya

                    At a high level, ComputeContainer_Destructor is called when there is a failure in a flow.

                     

                    ComputeContainer_Decommission is called in a decommissioning flow.

                     

                    ComputeContainer_DELETE may not work. I would not suggest using this operation.

                    • 7. Re: Rollback callouts on deployment failure
                      William Thomas

                      I may be missing (or misunderstanding) something very simple. I don't see ComputeContainer_Destructor. How do you call it?

                      • 8. Re: Rollback callouts on deployment failure
                        Devendra Dehadaraya

                        ComputeContainer_Destructor doesn't seem to be in the list. You can try ResourceSet_Destructor.

                        • 10. Re: Rollback callouts on deployment failure
                          William Thomas

                          ResourceSet_DESTRUCTOR did not work. So far, ResourceSet_BULKDESTRUCTOR seems to work, although I haven't been able to test it against all possibly relevant failure scenarios yet. This was actually suggested by BMC Support, and I am thinking this callout may be relatively new as I don't remember seeing it before. Or, more likely, I overlooked it. But again, so far this looks like the answer.

                           

                          Please let me know how it fares in your use case scenario.

                          • 11. Re: Rollback callouts on deployment failure
                            Devendra Dehadaraya

                            ResourceSet_BULKDESTRUCTOR calls the ResourceSet_DESTRUCTOR flow. I am not sure where exactly your failure is. It seems that the failure in your case occurs around resource set creation.

                             

                            In ResourceSet_DESTRUCTOR you get one resource set at a time, while the bulk operation picks up the whole list. That's the only difference.
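The distinction drawn above (one resource set per invocation versus the whole list in a single invocation) can be pictured with a small Python sketch. It is a generic illustration only: `run_destructor`, `run_bulk_destructor`, and the `rs-*` names are hypothetical and do not reflect how CLM dispatches callouts internally.

```python
from typing import Callable, List

# Hypothetical model: a "resource set" is represented by its name only.
ResourceSet = str


def run_destructor(resource_sets: List[ResourceSet],
                   callout: Callable[[ResourceSet], None]) -> int:
    """Per-item style: the callout fires once per resource set."""
    calls = 0
    for rs in resource_sets:
        callout(rs)
        calls += 1
    return calls


def run_bulk_destructor(resource_sets: List[ResourceSet],
                        callout: Callable[[List[ResourceSet]], None]) -> int:
    """Bulk style: the callout fires once and receives the whole list."""
    callout(list(resource_sets))
    return 1


if __name__ == "__main__":
    sets = ["rs-1", "rs-2", "rs-3"]
    print(run_destructor(sets, lambda rs: print("single:", rs)))
    print(run_bulk_destructor(sets, lambda all_rs: print("bulk:", all_rs)))
```

A workflow attached to the bulk variant would therefore need to loop over the list itself, whereas one attached to the per-item variant handles a single resource set per call.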

                            • 12. Re: Rollback callouts on deployment failure
                              Anuvind Tiwari

                              A relatively late response but we have a similar use case where we need to track the provisioning failures and perform some action in case of failures. We are on CLM version 4.6.06.

                              I can see ResourceSet_BULKDESTRUCTOR getting called for a few failures (I am not sure whether this covers all the failure scenarios), but I am unsure which value in the input CSM XML traces back to the hostname that failed.

                               

                              I have tried AO post callouts on ServiceOfferingInstance_Delete. This gets called in case of failure (again, I am not sure whether it covers all the failure scenarios). It seems to send the SOI ID in the input CSM XML, and we can use this to fetch the hostnames and other details.
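As a rough illustration of pulling an identifier such as the SOI ID out of an XML callout input, here is a minimal Python sketch using the standard library. The payload below is entirely made up: the element names, the `soiId` parameter name, and the `extract_parameter` helper are assumptions for illustration, and the real CSM XML would need to be inspected to find the actual structure.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Made-up payload for illustration only; the real CSM XML sent by CLM
# has a different structure that you would need to inspect yourself.
SAMPLE_XML = """
<callout>
  <operation>ServiceOfferingInstance_Delete</operation>
  <parameter name="soiId">example-soi-id</parameter>
</callout>
"""


def extract_parameter(xml_text: str, name: str) -> Optional[str]:
    """Return the text of the first <parameter> whose name attribute matches."""
    root = ET.fromstring(xml_text.strip())
    for param in root.iter("parameter"):
        if param.get("name") == name:
            return (param.text or "").strip()
    return None


if __name__ == "__main__":
    print(extract_parameter(SAMPLE_XML, "soiId"))
```

Once the identifier is extracted, it could be passed to whatever lookup is used to fetch the hostnames and other details mentioned above.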

                              • 13. Re: Rollback callouts on deployment failure
                                William Thomas

                                Using both ComputeContainer_DECOMMISSION and ComputeContainer_DELETE together as "Post-" action callouts works. Note that using them separately did not work. It's on me that I did not try them together before; I got the solution from BMC Support. Interesting that both were required.

                                 

                                By the way, ResourceSet_BULKDESTRUCTOR would not work (or at least, did not work for me) in all use cases because it does not include all the required subclasses.

                                • 14. Re: Rollback callouts on deployment failure
                                  Devendra Dehadaraya

                                  The Destructor call depends on the point at which the in-flight request failed.
