This week a follow-up on what we discussed last time: backchannel validation errors and network analysis. Get your Network Sniffer out because we’re going to look at some nasty TCP traffic.
Say you’re in ITSM going through your work orders. You’re assigning a few, closing a few others, and then suddenly that annoying error comes up: ARERR9350. It informs you it’s a network protocol or data error and popped up when you were apparently performing a data operation. Data operation, what does that mean? As far you were aware you were just sorting out some work orders. Good thing it also comes with a suggestion what to do next: it cheerfully informs you to contact your administrator, he will take care of the rest. But what if you are the administrator, then what?
I’m glad you asked. It’s a pretty tricky error, but if you understand why it happens, there are certainly a few things you can do about it. Last time we had a look at ARERR9351 – remember I explained they’re thrown when backchannel requests fail? I won’t go over this again, but in a nutshell, the error is thrown when HTTP requests for data or instructions sent in the background (known as backchannel requests) result in HTTP status codes with which the browser cannot work. We expect HTTP200, if it’s something different we throw ARERR9351. Anyway, I said enough on the subject last time, just read my last article if you want to know more.
ARERR9350 also relates to backchannel requests but this time we’re looking beyond the browser – to web server communication. Similar to ARERR9351, it’s an error check, but now the error check happens on the Mid-Tier server, not the browser. We’re sending a lot of backchannel requests between the browser and the Mid-Tier server, and of course need to check if the requests are okay from an HTTP perspective, but we also need to check if Mid-Tier can actually use it. It’s all very well that the web server likes it, but is it something that Mid-Tier can work with? What if the instructions in the request don’t make any sense? We need to let the end user know, so we throw ARERR9350.
I’m probably confusing you now, but it’s important you understand the difference between the two errors. They’re similar in nature, but have very different root causes. ARERR9351 is a check on the browser side, to validate the outcome of a HTTP request sent by the browser to the web server. We’re talking about HTTP requests here, we’re checking if the outcome of the request is correct and if the web server accepted it or not. Once a request is accepted by the web server, it is forwarded to the JSP engine which resides behind the web server. That’s where Mid-Tier comes in and that’s where our check happens.
See? We’re still dealing with backchannel requests, but now we’re on the Mid-Tier side. Mid-Tier will now pick up the backchannel request, extract it, process it, create the necessary API calls, send them to AR server, wait for the reply, format it into something the browser would understand and send it back via the web server. Quite a lot to do there, isn’t it? Better make sure the request is correct before we start.
Let’s have a look at an example. Say I have my simple test case again: a form with a table field. I click on the refresh button and a backchannel request is fired from the browser towards the web server. The web server accepts it and passes it on to the JSP engine. Mid-Tier’s server code comes into action and does the check. What does the browser want? Does it make sense? Is the request correct? This is my request on HTTP level:
This is Mid-Tier reporting it (I’m using Performance/Servlet logging):
10-Feb-2013 15:36:39 - WARNING (com.remedy.log.SERVLET) : (Thread 24) com.remedy.arsys.log.Log log Caught GoatException ARERR  Network protocol/data error when performing data operation. Please contact administrator.
at com.remedy.arsys.stubs.BackchannelServlet.doRequest(Unknown Source)
at com.remedy.arsys.stubs.GoatServlet.postInternal(Unknown Source)
It tries to validate it but doesn’t like it, and it reports the error. It needs to let the browser know the backchannel request failed, so it sends this HTTP response back:
HTTP200, so we won’t see that dreadful ARERR9351 and we know something else is going wrong here. The browser dutifully reports back the error.
Here’s the good news, since this all happens on the Mid-Tier side, it’s a lot easier to find the error as the error not only appears on the browser but is also logged by Mid-Tier. That’s good for us, we don’t need to start by getting HTTP logs, we can just review the Mid-Tier logs to start with (providing you enabled to the right categories of course).
It’s a good starting point, so let’s have a look in more details what’s going on. The first thing I’d do is isolate the full request in the Mid-Tier logs. I do this by using the thread ID. Every line in the Mid-Tier logs displays this ID (if you have the right log level of course) and all lines belonging to a specific backchannel request use the same thread ID. The IDs are recycled quite often, but I can use this to identify all the lines belonging to the request. In our example, it’s thread 24, so I look for everything in my log file that contains this thread ID.
My editor of choice is Notepad++, which allows me to do the kind of editing I need, but you use whatever works for you. I go back to my error line and look at the first line of the request. This is what I find:
To me, this doesn’t look right, and I will tell you why. Every backchannel request received by Mid-Tier is logged the same way: we log the URL, the session data (I left this out) and then all the parameters Mid-Tier finds, next we report when we’re done. So this is what I’d expect to find:
It doesn’t look anything like that, so that sets off some alarm bells right away. The only things Mid-Tier displays are the URL and the session information, so what happened to the parameters? If Mid-Tier doesn’t display the parameters, it’s probably because it didn’t receive them. That’s what’s going on here: Mid-Tier received a request but couldn’t find any parameters, so returned ARERR 9350.
But there’s a spanner in the works, I’ve been running tests with Fiddler to see what’s going on between the browser and the web server. You see, based on what I’m seeing in the Mid-Tier logs, I’d expect to see backchannel requests where the instructions are somehow missing.. Even after days of testing, however, I don’t see it.
Fiddler is great, but it has a weakness: it operates via a proxy, it redirects traffic from the browser to itself and then to the server. It tampers with the route, and that sometimes can have an influence on the end result. So what to do?
Enter my next favourite analyser: the network sniffer. Network sniffers like Wireshark capture traffic on the OS level – they use the network card, so there’s no little interference. The downside of this approach is the overheap of data you don’t really need – Wireshark will capture everything, and you’ll be surprised at the amount of network data that’s sent and received. Another problem is that, unlike Fiddler, there’s no easy way in Wireshark to deal with SSL data: by the time Wireshark logs it, it’s already encoded. There are ways around this, but it’s pretty tricky.
So let’s get Wireshark installed on one of the clients and let it run for a while until the error happens. Of course, we have Mid-Tier logging on the server enabled so I can cross reference if necessary. As soon as you start logging, you’ll notice the enormous amount of data that’s coming in. To make sure I only capture the right data, I add a server filter to make sure I capture only TCP traffic.
Still a lot of traffic coming in, but at least it’s making a bit more sense. I like to look at the data as TCP packages, rather than HTTP data. HTTP is an application layer protocol and runs on top of TCP, which is in the transport layer. I won’t go into too much detail, but let’s just say that HTTP is made up of TCP packages so when things fail and you don’t see any obvious reason, you might as well go directly to the building blocks.
The big question of course is, what are we going to do with these log files, how do you make sense of them? The first thing you need to do is identify a package containing the error. Easy enough: just use the Find function:
Notice I’m searching in the packet bytes. This is what I find:
Those are the actual TCP packages, but it’s a lot easier to read if you use the Follow TCP Stream function (right click on the line), which strings everything together so it’s lot more readable. Here’s our error:
Again, you have to keep in mind how these requests are supposed to look. What you’re looking at is the request sent by the browser (in red) and the answer sent by the web server (in blue). This is a POST request, so the request details (330/GETQBETableEntryList/1/325/…) are placed in the body of the request rather than the header. This is what I’d expect:
My method (POST), my destination (/arsys/…) and protocol (HTTP 1.1) are in the header, followed by the various parameters (accept, referrer, content-type). Next the body with my request details (we separate this by an empty line). Now check again what it looks like during this session and you’ll notice that the body of the request is sent too late. It’s delayed, by the time it finally arrives, the web server already started responding.
So what happens here is that Mid-Tier receives a request for a backchannel, but without any information – it doesn’t know what to do with it, so can only respond with ARERR9350. Is this Mid-Tier’s fault? Not really, it can’t really help if HTTP communication is broken up and one part is sent far too late.
Who’s to blame is difficult to say. You see, I happen to know that the body of a POST request is always sent in a separate TCP package, so it makes perfect sense that the web server first receives the request header and that the body follows. As we’re looking at TCP packages anyway, you can actually trace this in Wireshark using the frame information. I won’t go into this now as it’s quite a deep dive into TCP troubleshooting. There’s an easier solution anyway. Since it appears that individual TCP packages are getting lost in this particular network, I’d suggest to switch to using GET requests so we avoid the package separation.
I hope you‘ve borne with me here. I know I used a rather unusual example, but at least I had the chance to introduce Wireshark to you. If you ever need it, at least you get the basics. Here are a few common ones:
- proxy, load balancing or generic network problems (usually in combination with ARERR9351 errors)
- authentication problems resulting in expired sessions
- format problems in qualifications
- reporting errors (again usually in qualifications)
- JSP engine defects (usually in handing large packages)
- timeout errors when large amounts of data are requested
There are many reason why might get ARERR9350 errors, but keep in mind that understanding the backchannel request is key to understanding why the errors are thrown. Start with the Mid-Tier log and check what the backchannel request as captured by Mid-Tier looks like. If this doesn’t provide a clue, get some form of HTTP logging: Fiddler, Wireshark, web server logs, whatever works for you. Compare what’s sent to what’s logged by Mid-Tier and hopefully this will explain where the error comes from. And if it doesn’t? Well, you can always log a ticket with BMC Support and we’ll be happy to look into it.
Right, that’s all for now. Until next time,
- You can find previous Pulse blogs on Mid-Tier and related products in this document:
- Wireshark can be downloaded here: http://www.wireshark.org/