CleanXML: Perl script to cleanly indent XML

## History ##

 

[Version 1.2]

 

- Some versions of Perl's XML::Parser module (Parser.pm) return the error "xml declaration not at start of external entity...." on XML documents that start with "<?xml version="1.0" encoding="UTF-8"?>". Added a regexp that strips this line before the document is sent to the parser.

 

[Version 1.1]

 

- First released version

 

 

 

Ever caught yourself adding indents to an XML structure you captured in a process transaction somewhere, just so you could build your XPath transformations more easily? This morning I did just that ... again.

 

While troubleshooting an issue with a rule on a workflow listening for VMware events, I had configured the workflow to send me a copy of the incoming input-event so I could take a better look at its structure. And oh boy, was it ugly. Here is just a snippet:

 

<adapter-event>

<source-adapter>Adapter_VMWare_Monitor</source-adapter>

<event>VMware Events Monitor adapter:</event> <data> <vmware-monitor-event> <returnval> <version></version> <changeSet> <name>latestPage[37425704]</name> <op>remove</op> </changeSet> <changeSet> <name>latestPage[37425705]</name> <op>remove</op> </changeSet> <changeSet> <name>latestPage[37425706]</name> <op>remove</op> </changeSet> <changeSet> <name>latestPage[37425707]</name> <op>remove</op> </changeSet> <changeSet> <name>latestPage[37425708]</name> <op>remove</op> </changeSet> <changeSet> <name>latestPage[37425709]</name> <op>remove</op> </changeSet> <changeSet> <name>latestPage[37425710]</name> <op>remove</op> </changeSet> <changeSet> <name>latestPage[37425711]</name> <op>remove</op> </changeSet> <changeSet> <name>latestPage[37425712]</name> <op>remove</op> </changeSet> <changeSet> <name>latestPage[37425713]</name> <op>remove</op> </changeSet> <changeSet> <name>latestPage[37425714]</name> <op>add</op> <UserLogoutSessionEvent> <key>37425714</key> <chainId>37425714</chainId> <createdTime>2012-03-05T16:26:44.481262Z</createdTime>

<userName>userid</userName>

......

 

I started off copying and pasting the structure into my favorite text editor and was a couple of minutes into indenting it by hand when I realized this was a tremendous waste of time. You see, this wasn't the first time I had done this, and it most likely wouldn't be the last.

 

So instead I spent about 45 minutes writing a Perl script to do just that: indent the XML nicely for easier reading.

 

Isn't this much easier on the eyes?

 

<adapter-event>

   <source-adapter>Adapter_VMWare_Monitor</source-adapter>

   <event>VMware Events Monitor adapter:</event>

   <data>

      <vmware-monitor-event>

         <returnval>

            <version></version>

            <changeSet><name>latestPage[37425704]</name><op>remove</op> </changeSet>

            <changeSet><name>latestPage[37425705]</name><op>remove</op> </changeSet>

            <changeSet><name>latestPage[37425706]</name><op>remove</op> </changeSet>

            <changeSet><name>latestPage[37425707]</name><op>remove</op> </changeSet>

            <changeSet><name>latestPage[37425708]</name><op>remove</op> </changeSet>

            <changeSet><name>latestPage[37425709]</name><op>remove</op> </changeSet>

            <changeSet><name>latestPage[37425710]</name><op>remove</op> </changeSet>

            <changeSet><name>latestPage[37425711]</name><op>remove</op> </changeSet>

            <changeSet><name>latestPage[37425712]</name><op>remove</op> </changeSet>

            <changeSet><name>latestPage[37425713]</name><op>remove</op> </changeSet>

            <changeSet>

               <name>latestPage[37425714]</name>

               <op>add</op>

               <UserLogoutSessionEvent>

                  <key>37425714</key>

                  <chainId>37425714</chainId>

                  <createdTime>2012-03-05T16:26:44.481262Z</createdTime>

                  <userName>userid</userName>

 

 

Attached to this document you will find the Perl script I wrote. It is called CleanXML.pl and requires the XML::Twig module.
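For readers who want to see the idea in code without opening the attachment, here is a rough sketch of the same approach in Python. It is not the attached CleanXML.pl and does not use XML::Twig. It assumes Python 3.9 or later (for `xml.etree.ElementTree.indent`), and the names `clean_xml` and `XML_DECL` are made up for illustration. It mirrors the two steps described in this document: strip a leading XML declaration with a regexp, then re-indent the parsed tree with three spaces per level.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical helper pattern: a leading XML declaration, optionally
# preceded by whitespace or a byte-order mark. Requiring whitespace after
# "<?xml" keeps other processing instructions (e.g. xml-stylesheet) intact.
XML_DECL = re.compile(r'\A[\ufeff\s]*<\?xml\s.*?\?>\s*', re.DOTALL)


def clean_xml(text: str, indent: str = "   ") -> str:
    """Return the XML in `text` re-indented, one `indent` per nesting level."""
    # Drop the declaration first, similar in spirit to the Version 1.2 fix.
    body = XML_DECL.sub("", text, count=1)
    root = ET.fromstring(body)
    # Rewrites whitespace-only text between elements (requires Python 3.9+).
    ET.indent(root, space=indent)
    # Keep empty elements as <version></version> rather than <version />.
    return ET.tostring(root, encoding="unicode", short_empty_elements=False)


if __name__ == "__main__":
    ugly = ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<data> <returnval> <version></version> '
            '<changeSet> <op>remove</op> </changeSet> </returnval> </data>')
    print(clean_xml(ugly))
```

Unlike the output shown above, this sketch puts every child element on its own line, so short `<changeSet>` entries are expanded rather than kept on a single line.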

 

cleanxml1.png

 

-input is required and should point to your input XML document.

 

-output is optional. When provided, the indented XML will be written to this output document. When omitted, it will be printed to the console instead.

 

For example:

 

cleanxml2.png

or

 

cleanxml3.png
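The -input and -output behaviour described above (a required input file, and an optional output file that falls back to the console) can be sketched as follows. This is again a Python illustration under stated assumptions, not the option handling of CleanXML.pl itself: the function names `build_parser` and `main` are hypothetical, and `ElementTree.indent` needs Python 3.9 or later.

```python
import argparse
import sys
import xml.etree.ElementTree as ET


def build_parser() -> argparse.ArgumentParser:
    """Declare the two options, using the same names as in this document."""
    parser = argparse.ArgumentParser(description="Indent an XML document.")
    parser.add_argument("-input", required=True,
                        help="path to the input XML document")
    parser.add_argument("-output",
                        help="file to write the indented XML to; "
                             "when omitted, print to the console")
    return parser


def main(argv=None) -> int:
    args = build_parser().parse_args(argv)
    tree = ET.parse(args.input)
    ET.indent(tree, space="   ")  # three spaces per level; Python 3.9+
    text = ET.tostring(tree.getroot(), encoding="unicode") + "\n"
    if args.output:
        # -output given: write the indented XML to that file.
        with open(args.output, "w", encoding="utf-8") as handle:
            handle.write(text)
    else:
        # -output omitted: print to the console instead.
        sys.stdout.write(text)
    return 0
```

Calling `main(["-input", "event.xml"])` prints the indented document, and adding `"-output", "event_clean.xml"` writes it to a file instead. Both file names are placeholders. To run this as a standalone script, add a `sys.exit(main())` call at the bottom of the file.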

 

Enjoy!

 

   Richard