
These steps will shorten your analysis time and help you and BMC support reach a resolution sooner. In my case, I had both BMC and Microsoft involved in troubleshooting the same issue. There are also mid-tier tuning documents. (try to find that doc)

 

These steps are for a mid-tier problem that is not obvious from the log files. If you run into such an issue, I would recommend following these steps before opening a ticket with BMC. Some customers use IIS instead of Tomcat. Either way, these tips will help you get to the bottom of the problem.

 

Tomcat Stuff:

  1. Edit the Java options on Tomcat with the following settings. (The last four lines enable remote JMX, which is what lets you take a thread dump or application profile of Tomcat.)
    • -XX:PermSize=256m
    • -XX:+UseConcMarkSweepGC
    • -XX:+UseParNewGC
    • -XX:NewRatio=3
    • -XX:+HeapDumpOnOutOfMemoryError
    • -Dcom.sun.management.jmxremote
    • -Dcom.sun.management.jmxremote.port=8086
    • -Dcom.sun.management.jmxremote.authenticate=false
    • -Dcom.sun.management.jmxremote.ssl=false

 

  2. Connect to your JMX remote. You must have a JDK installed to do this part.

  3. Schedule a restart of the mid-tier. Wait for the problem to recur.

    • Go to C:\Program Files\Java\jdk%version%\bin
    • Execute jvisualvm.exe
    • If you're local, you can connect to the port you set (8086). If you're remote, you have to name the host.
      • Once you are connected to the remote host, create a JMX connection using the port number.
      • Here you can validate the performance of the JVM, its settings, etc.
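Before launching jvisualvm against a remote host, it can save time to confirm that the JMX port is reachable at all. Here is a minimal Python sketch, assuming the port 8086 from the Java options above. The function name jmx_port_open is hypothetical, and an open port only shows that something is listening, not that JMX itself is healthy.

```python
import socket


def jmx_port_open(host, port=8086, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    The default port 8086 is the value used in the Java options in this post;
    change it if your jmxremote.port differs.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, jmx_port_open("localhost") checks the local mid-tier. If it returns False for a remote host, a firewall between you and the server is a likely suspect.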

 


 

4.  Create an application profile and thread dump for support.

5.  Create a BMC support ticket.

 

If you are running IIS, you would do a thread dump of IIS from Task Manager instead.

 

IIS stuff:

 

Microsoft has a very good section on its symbol and debugging tools. The following KB article has the details.

https://support.microsoft.com/en-us/kb/919790

 

WebSphere stuff:

 

Here is a website that shows how to do an API dump for WebSphere. Here is a tuning guide for WebSphere.

 

TCP/OS Tuning:

 

I found a few things that help tune the mid-tier at the network layer. Here are some Windows registry settings that improve TCP/IP performance. These settings are well documented on Microsoft TechNet. In addition, you will want to adjust the RX buffer on your NIC.

 

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\inetinfo]

 

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\inetinfo\Parameters]

"ObjectCacheTTL"=dword:00000030

"MemCacheSize"=dword:00002048

"MaxCacheFileSize"=dword:00000256

"PoolThreadLimit"=dword:00000010

"ListenBackLog"=dword:00000250

"SynAttackProtect"=dword:00000000

"TcpTimedWaitDelay"=dword:0000001e

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Terminal Server]

 

      

 

Here is TCP/OS tuning for different flavors of UNIX and Windows:  Operating System Tuning

 

Here is a PowerShell script that looks at port activity using netstat -ano. This is for Windows.  http://www.core-admin.com/portal/kb-24032014-001-dealing-with-time-wait-exhaustion-no-more-tcp-connections


Here is a way to do the same thing on UNIX for port/socket status.

netstat -n
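Raw netstat output gets long on a busy server, so a tally per connection state is usually easier to read. The following Python sketch counts TCP states in text captured from netstat -n (UNIX) or netstat -ano (Windows). The function name count_tcp_states and the list of state names are my own choices for illustration, not part of any BMC or Microsoft tool.

```python
from collections import Counter

# State names as printed by common netstat implementations (Windows and UNIX
# spell a few of them differently, so both spellings are listed).
TCP_STATES = {
    "ESTABLISHED", "TIME_WAIT", "CLOSE_WAIT", "LISTEN", "LISTENING",
    "SYN_SENT", "SYN_RECV", "SYN_RECEIVED", "FIN_WAIT1", "FIN_WAIT2",
    "FIN_WAIT_1", "FIN_WAIT_2", "LAST_ACK", "CLOSING", "CLOSED",
}


def count_tcp_states(netstat_output):
    """Count TCP connections per state in captured netstat text."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, blank lines, and non-TCP rows such as UDP.
        if not fields or not fields[0].lower().startswith("tcp"):
            continue
        # The state column sits in a different position on Windows and UNIX,
        # so look for the first field that is a known state name.
        for field in fields:
            if field in TCP_STATES:
                counts[field] += 1
                break
    return counts
```

One way to feed it is count_tcp_states(subprocess.run(["netstat", "-n"], capture_output=True, text=True).stdout). A TIME_WAIT count that keeps climbing toward the size of the ephemeral port range is the pattern to watch for.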

Believe it or not, port exhaustion does occur, and there is plenty of documentation about it on the internet. Most network people will argue that it cannot be happening on our network and will point to the application or the web server as the cause, citing resource limitations. Let's move on to how to gather information so vendor support can resolve your issue quickly.

 

Adjusting TCP Settings for Heavy Load on Windows

 

The underlying Search architecture that directs searches across multiple physical partitions uses TCP/IP ports and non-blocking NIO SocketChannels to connect to the Search engines. These connections remain open in the TIME_WAIT state until the operating system times them out. Consequently, under heavy load conditions, the available ports on the machine running the Routing module can be exhausted.

 

On Windows platforms, the default timeout is 120 seconds, and the maximum number of ports is approximately 4,000, resulting in a maximum rate of 33 connections per second. If your index has four partitions, each search requires four ports, which provides a maximum query rate of 8.3 queries per second.

 

(maximum ports/timeout period)/number of partitions = maximum query rate
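The formula is easy to check with a few lines of Python. This sketch plugs in the default figures quoted above (about 4,000 ports, a 120 second timeout, four partitions) and then the tuned values suggested later in this post (60 seconds, 32768). Treating 32768 as the number of usable ports is an approximation on my part, since the low port numbers are not part of the ephemeral range.

```python
def max_query_rate(max_ports, timeout_seconds, partitions):
    """(maximum ports / timeout period) / number of partitions."""
    connections_per_second = max_ports / timeout_seconds
    return connections_per_second / partitions


# Defaults quoted in the text: ~4,000 ports, 120 s timeout, 4 partitions.
default_rate = max_query_rate(4000, 120, 4)

# Tuned values from this post: 60 s timeout, MaxUserPort of 32768
# (used here as an approximate port count, which is an assumption).
tuned_rate = max_query_rate(32768, 60, 4)

print(round(default_rate, 1))  # 8.3
print(round(tuned_rate, 1))    # 136.5
```

Halving the timeout and widening the port range both raise the ceiling, which is why the two settings are usually changed together.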

 

If this rate is exceeded, you may see failures as the supply of TCP/IP ports is exhausted. Symptoms include drops in throughput and errors indicating failed network connections. You can diagnose this problem by observing the system while it is under load, using the netstat utility provided on most operating systems.

 

To avoid port exhaustion and support high connection rates, reduce the TIME_WAIT value and increase the port range.  Note: This problem does not usually appear on UNIX systems due to the higher default connection rate in those operating systems.

 

To set TcpTimedWaitDelay (TIME_WAIT):

 

  1. Use the regedit command to access the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\TCPIP\Parameters registry subkey.
  2. Create a new REG_DWORD value named TcpTimedWaitDelay.
  3. Set the value to 60.
  4. Stop and restart the system.

 

To set MaxUserPort (ephemeral port range):

 

  1. Use the regedit command to access the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\TCPIP\Parameters registry subkey.
  2. Create a new REG_DWORD value named MaxUserPort.
  3. Set this value to 32768.
  4. Stop and restart the system.
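One detail that is easy to trip over: .reg files write REG_DWORD values in hexadecimal, so a decimal 60 appears as dword:0000003c. The Python sketch below formats the two values from the steps above as .reg text so you can review them before importing. The helper names reg_dword and tcpip_reg_file are hypothetical, and the sketch only builds text; it does not touch the registry.

```python
def reg_dword(value):
    """Format a non-negative integer the way .reg files write a REG_DWORD."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("REG_DWORD must fit in 32 bits")
    return f"dword:{value:08x}"


def tcpip_reg_file(settings):
    """Build .reg file text for values under the TCPIP\\Parameters subkey."""
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        r"[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\TCPIP\Parameters]",
    ]
    for name, value in settings.items():
        lines.append(f'"{name}"={reg_dword(value)}')
    # .reg files conventionally use Windows line endings.
    return "\r\n".join(lines) + "\r\n"


# Decimal values from the steps in this post.
print(tcpip_reg_file({"TcpTimedWaitDelay": 60, "MaxUserPort": 32768}))
```

The same conversion explains the earlier registry listing, where "TcpTimedWaitDelay"=dword:0000001e is 30 in decimal.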

 

Need to write a section about mid-tier server.xml options.

 

Let's move the discussion on to creating API dumps for Windows and the JVM. I am assuming that you know how to read a dump.

 

Windows API Dump:

 

00000000`2faee630 00007ff8`0d690f15 mswsock!WSPSelect+0x7e9

00000000`2faee7d0 00000000`50f74d93 ws2_32!select+0x1f9

00000000`2faee8c0 00000000`50f73b33 net!NET_Timeout+0x73

00000000`2faeeb30 00000000`01598b50 net!Java_java_net_SocketInputStream_socketRead0+0xdb

 

Java API Dump:

 

"http-bio-80-exec-381" - Thread t@668

java.lang.Thread.State: RUNNABLE

at java.net.SocketInputStream.socketRead0(Native Method)

at java.net.SocketInputStream.read(Unknown Source)

        at java.net.SocketInputStream.read(Unknown Source)

 

   

 

Notice that "socketRead", highlighted in red in the Windows dump, is repeated in the Java dump as "socketRead0." If both API dumps confirm a pattern, you can begin to ask yourself or vendor support why you are getting this "socketRead" at the API level. That way you get to the bottom of the issue quicker, or escalation becomes seamless. A recurring problem can be frustrating when support isn't available to help you or your team understand it. Having some of the knowledge above helps you escalate the issue more quickly to an expert who deals with Tomcat or IIS (3rd-level support or a developer).
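When a thread dump contains hundreds of threads, spotting a pattern like this by eye is tedious. Here is a rough Python sketch that tallies the top stack frame of each thread in a Java thread dump saved as text. It assumes the layout shown in the Java dump above (a quoted thread name followed by "at ..." frames); the function names top_frames and count_top_frames are made up for this example.

```python
from collections import Counter
import re

# A thread section starts with the thread name in double quotes.
THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)"')


def top_frames(thread_dump):
    """Map each thread name to its first 'at ...' frame."""
    result = {}
    current = None
    for raw in thread_dump.splitlines():
        line = raw.strip()
        header = THREAD_HEADER.match(line)
        if header:
            current = header.group("name")
            continue
        # Keep only the first frame seen for the current thread.
        if current is not None and line.startswith("at ") and current not in result:
            result[current] = line[3:]
    return result


def count_top_frames(thread_dump):
    """Count how many threads share each top frame."""
    return Counter(top_frames(thread_dump).values())
```

Calling count_top_frames(text).most_common(5) on a saved dump lists the frames most threads are sitting in. A single thread in socketRead0 can be perfectly normal; a large share of the http-bio pool parked there is the kind of pattern worth raising with support.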

 

Now, this part is above and beyond for most of you. There are sniffer tools called Wireshark and Windows Network Monitor. These tools can be helpful to you and your network team. Being able to capture the issue at this level while it is occurring also helps everyone understand the problem better, along with the API dump.

 

 

More edits to come...