CleanXML: Perl script to cleanly indent XML

Version 4

    ## History ##


    [Version 1.2]


    - Some versions of Perl's XML parser appear to return the error "xml declaration not at start of external entity...." for XML documents that start with "<?xml version="1.0" encoding="UTF-8"?>". Added a regexp to strip this line before the document is sent to the parser.
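    The kind of regexp described above might look like the following. This is only a minimal sketch, not the code from the attached script: the helper name strip_xml_declaration is made up for illustration, and it assumes the declaration is the first thing in the document apart from optional whitespace.

```perl
use strict;
use warnings;

# Hypothetical helper (not taken from the attached script):
# remove a leading XML declaration, if there is one.
sub strip_xml_declaration {
    my ($xml) = @_;

    # \A anchors at the very start of the string; /s lets "." match
    # newlines in case the declaration is wrapped over several lines.
    $xml =~ s/\A\s*<\?xml\b.*?\?>\s*//s;

    return $xml;
}

my $doc = qq{<?xml version="1.0" encoding="UTF-8"?>\n<event>test</event>};
print strip_xml_declaration($doc), "\n";
```

    A document without a declaration passes through unchanged, so the helper can be applied unconditionally.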


    [Version 1.1]


    - First released version




    Ever caught yourself adding indents to an XML structure you captured in a process transaction somewhere, just so you could more easily build your XPath transformations? This morning I did just that ... again.


    While trying to troubleshoot an issue with a rule on a workflow listening for VMware events, I had configured the workflow to send me a copy of the incoming input event so I could have a better look at its structure. And oh boy, was it ugly. Here is just a snippet:




    <event>VMware Events Monitor adapter:</event> <data> <vmware-monitor-event> <returnval> <version></version> <changeSet> <name>latestPage[37425704]</name> <op>remove</op> </changeSet> <changeSet> <name>latestPage[37425705]</name> <op>remove</op> </changeSet> <changeSet> <name>latestPage[37425706]</name> <op>remove</op> </changeSet> <changeSet> <name>latestPage[37425707]</name> <op>remove</op> </changeSet> <changeSet> <name>latestPage[37425708]</name> <op>remove</op> </changeSet> <changeSet> <name>latestPage[37425709]</name> <op>remove</op> </changeSet> <changeSet> <name>latestPage[37425710]</name> <op>remove</op> </changeSet> <changeSet> <name>latestPage[37425711]</name> <op>remove</op> </changeSet> <changeSet> <name>latestPage[37425712]</name> <op>remove</op> </changeSet> <changeSet> <name>latestPage[37425713]</name> <op>remove</op> </changeSet> <changeSet> <name>latestPage[37425714]</name> <op>add</op> <UserLogoutSessionEvent> <key>37425714</key> <chainId>37425714</chainId> <createdTime>2012-03-05T16:26:44.481262Z</createdTime>




    I started off copying and pasting the structure into my favorite text editor and was a couple of minutes into manually indenting it when I realised this was just a tremendous waste of time. You see, this wasn't the first time I had done this, and it most likely wouldn't be the last.


    So instead I spent about 45 minutes writing a Perl script to do just that: nicely indent the XML for easier reading.


    Isn't this much easier on the eyes?




       <event>VMware Events Monitor adapter:</event>





                <changeSet><name>latestPage[37425704]</name><op>remove</op> </changeSet>

                <changeSet><name>latestPage[37425705]</name><op>remove</op> </changeSet>

                <changeSet><name>latestPage[37425706]</name><op>remove</op> </changeSet>

                <changeSet><name>latestPage[37425707]</name><op>remove</op> </changeSet>

                <changeSet><name>latestPage[37425708]</name><op>remove</op> </changeSet>

                <changeSet><name>latestPage[37425709]</name><op>remove</op> </changeSet>

                <changeSet><name>latestPage[37425710]</name><op>remove</op> </changeSet>

                <changeSet><name>latestPage[37425711]</name><op>remove</op> </changeSet>

                <changeSet><name>latestPage[37425712]</name><op>remove</op> </changeSet>

                <changeSet><name>latestPage[37425713]</name><op>remove</op> </changeSet>











    Attached to this document you will find the Perl script I wrote. It requires the XML::Twig module.




    -input is required and should point to your input XML document.


    -output is optional. When provided, the indented XML will be written to this output document. When omitted, it will be printed to the console instead.
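    To give an idea of how a script with these two options can be put together, here is a minimal sketch. It is not the attached script: the function names indent_file and main are made up for illustration, and it assumes that XML::Twig's pretty_print => 'indented' setting gives the layout you want. Option parsing uses the core Getopt::Long module, which accepts both -input and --input.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);
use XML::Twig;

# Hypothetical helper: parse an XML file and return it as an indented string.
sub indent_file {
    my ($path) = @_;
    my $twig = XML::Twig->new( pretty_print => 'indented' );
    $twig->parsefile($path);    # dies if the file is not well-formed XML
    return $twig->sprint;
}

# Hypothetical entry point: handles -input (required) and -output (optional).
# Returns 0 on success and 1 when the arguments are unusable.
sub main {
    my @args = @_;
    my ( $input, $output );

    my $ok = GetOptionsFromArray(
        \@args,
        'input=s'  => \$input,
        'output=s' => \$output,
    );
    unless ( $ok && defined $input ) {
        warn "Usage: $0 -input FILE [-output FILE]\n";
        return 1;
    }

    my $pretty = indent_file($input);

    if ( defined $output ) {
        # Write the indented XML to the requested file.
        open my $out, '>:encoding(UTF-8)', $output
            or die "Cannot write $output: $!\n";
        print {$out} $pretty;
        close $out;
    }
    else {
        # No -output given: print to the console instead.
        binmode STDOUT, ':encoding(UTF-8)';
        print $pretty;
    }
    return 0;
}

main(@ARGV);
```

    The sketch lets XML::Twig read the file itself, so it does not include the declaration-stripping workaround mentioned in the history notes.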


    For example: