CleanXML: Perl script to cleanly indent XML

    ## History ##

     

    [Version 1.2]

     

    - Certain versions of Perl's XML::Parser module (XML/Parser.pm) return the error "xml declaration not at start of external entity...." on XML documents that start with "<?xml version="1.0" encoding="UTF-8"?>". Added a regexp that strips this line before the document is sent to the parser.

     

    [Version 1.1]

     

    - First released version
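
    The Version 1.2 workaround can be pictured with a minimal sketch like the one below. It is not the code from CleanXML.pl: the helper name strip_xml_declaration and the exact regexp are assumptions made for illustration. The idea is simply to remove a leading XML declaration before the text reaches the parser.

```perl
use strict;
use warnings;

# Hypothetical helper (not taken from CleanXML.pl): remove a leading
# XML declaration such as <?xml version="1.0" encoding="UTF-8"?>.
sub strip_xml_declaration {
    my ($xml) = @_;
    # \A anchors at the very start of the string; leading whitespace is
    # tolerated. Requiring \s after "xml" leaves processing instructions
    # such as <?xml-stylesheet ...?> untouched.
    $xml =~ s/\A\s*<\?xml\s[^>]*\?>\s*//;
    return $xml;
}

my $doc = qq{<?xml version="1.0" encoding="UTF-8"?>\n<root><a>1</a></root>};
print strip_xml_declaration($doc), "\n";    # prints <root><a>1</a></root>
```

    One caveat with this approach: once the declaration is gone, so is its encoding attribute, and the parser falls back on its default. That should be harmless for UTF-8 or plain ASCII input, but is worth keeping in mind for other encodings.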

     

     

     

    Ever caught yourself adding indents to an XML structure you captured in a process transaction somewhere, just so you can more easily build your XPath transformations? This morning I did just that ... again.

     

    While trying to troubleshoot an issue with a rule on a workflow listening for VMware events, I had configured the workflow to send me a copy of the incoming input-event so I could have a better look at its structure. And oh boy, was it ugly. Here is just a snippet:

     

    <adapter-event>

    <source-adapter>Adapter_VMWare_Monitor</source-adapter>

    <event>VMware Events Monitor adapter:</event> <data> <vmware-monitor-event> <returnval> <version></version> <changeSet> <name>latestPage[37425704]</name> <op>remove</op> </changeSet> <changeSet> <name>latestPage[37425705]</name> <op>remove</op> </changeSet> <changeSet> <name>latestPage[37425706]</name> <op>remove</op> </changeSet> <changeSet> <name>latestPage[37425707]</name> <op>remove</op> </changeSet> <changeSet> <name>latestPage[37425708]</name> <op>remove</op> </changeSet> <changeSet> <name>latestPage[37425709]</name> <op>remove</op> </changeSet> <changeSet> <name>latestPage[37425710]</name> <op>remove</op> </changeSet> <changeSet> <name>latestPage[37425711]</name> <op>remove</op> </changeSet> <changeSet> <name>latestPage[37425712]</name> <op>remove</op> </changeSet> <changeSet> <name>latestPage[37425713]</name> <op>remove</op> </changeSet> <changeSet> <name>latestPage[37425714]</name> <op>add</op> <UserLogoutSessionEvent> <key>37425714</key> <chainId>37425714</chainId> <createdTime>2012-03-05T16:26:44.481262Z</createdTime>

    <userName>userid</userName>

    ......

     

    I started off copying and pasting the structure into my favorite text editor and was a couple of minutes into manually indenting it when I realised this was just a tremendous waste of time. You see, this wasn't the first time I had done this, and it most likely wouldn't be the last.

     

    So instead I spent about 45 minutes writing a Perl script to do just that: nicely indent the XML for easier reading.

     

    Isn't this much easier on the eyes?

     

    <adapter-event>

       <source-adapter>Adapter_VMWare_Monitor</source-adapter>

       <event>VMware Events Monitor adapter:</event>

       <data>

          <vmware-monitor-event>

             <returnval>

                <version></version>

                <changeSet><name>latestPage[37425704]</name><op>remove</op> </changeSet>

                <changeSet><name>latestPage[37425705]</name><op>remove</op> </changeSet>

                <changeSet><name>latestPage[37425706]</name><op>remove</op> </changeSet>

                <changeSet><name>latestPage[37425707]</name><op>remove</op> </changeSet>

                <changeSet><name>latestPage[37425708]</name><op>remove</op> </changeSet>

                <changeSet><name>latestPage[37425709]</name><op>remove</op> </changeSet>

                <changeSet><name>latestPage[37425710]</name><op>remove</op> </changeSet>

                <changeSet><name>latestPage[37425711]</name><op>remove</op> </changeSet>

                <changeSet><name>latestPage[37425712]</name><op>remove</op> </changeSet>

                <changeSet><name>latestPage[37425713]</name><op>remove</op> </changeSet>

                <changeSet>

                   <name>latestPage[37425714]</name>

                   <op>add</op>

                   <UserLogoutSessionEvent>

                      <key>37425714</key>

                      <chainId>37425714</chainId>

                      <createdTime>2012-03-05T16:26:44.481262Z</createdTime>

                      <userName>userid</userName>

     

     

    Attached to this document you will find the Perl script I wrote. It is called CleanXML.pl and requires the XML::Twig module.

     

    cleanxml1.png

     

    -input is required and should point to your input XML document.

     

    -output is optional. When provided, the indented XML will be written to this output document. When omitted, it will be printed to the console instead.
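
    For readers who cannot open the attachment, here is a rough sketch of how a script with the same -input / -output behaviour could be put together with XML::Twig and Getopt::Long. This is not the attached CleanXML.pl: the function names, the usage message, and the choice of the 'indented' pretty-print style are all assumptions, and the attached script may format its output differently.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long;
use XML::Twig;

# Sketch only, not the attached CleanXML.pl.
# Takes an XML string and returns an indented version of it.
sub indent_xml {
    my ($xml) = @_;
    # Drop a leading XML declaration before parsing (see Version 1.2 in
    # the history). Assumes the input is UTF-8 or plain ASCII.
    $xml =~ s/\A\s*<\?xml\s[^>]*\?>\s*//;
    # 'indented' is one of XML::Twig's pretty_print styles; which style
    # the real script uses is an assumption here.
    my $twig = XML::Twig->new( pretty_print => 'indented' );
    $twig->parse($xml);
    return $twig->sprint;
}

sub main {
    my ( $input, $output );
    my $usage = "Usage: $0 -input <file> [-output <file>]\n";
    GetOptions( 'input=s' => \$input, 'output=s' => \$output ) or die $usage;
    unless ( defined $input ) { warn $usage; return; }

    # Slurp the whole input document.
    open my $in, '<', $input or die "Cannot read $input: $!\n";
    my $xml = do { local $/; <$in> };
    close $in;

    # Write to the -output file when given, otherwise to the console.
    my $out;
    if ( defined $output ) {
        open $out, '>:encoding(UTF-8)', $output
            or die "Cannot write $output: $!\n";
    }
    else {
        $out = \*STDOUT;
        binmode $out, ':encoding(UTF-8)';
    }
    print {$out} indent_xml($xml);
    close $out if defined $output;
}

main();
```

    Saved under a hypothetical name such as indent_sketch.pl, this could be run as "perl indent_sketch.pl -input event.xml" to print to the console, or with "-output event_indented.xml" added to write to a file. Both file names are placeholders.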

     

    For example:

     

    cleanxml2.png

    or

     

    cleanxml3.png

     

    Enjoy!

     

       Richard