Since I started supporting the AR System Server component of the Remedy product suite, one thing I've come across time and again is reports of the arserver.exe or arserverd process leaking memory.
In nearly all cases, investigation shows that the problem is caused by Admin Changes triggering a CopyCache action, which can significantly increase the memory footprint of the process.
Some background information on why Cache is important to a Remedy application:
The AR Server end-user application code and data are stored in the database, and a significant part of that code is loaded into an in-memory cache when the server starts.
The size of this cache varies with the amount of code in use (just AR Server, or AR Server + CMDB & ITSM etc.) and the localisations installed. It can range from several hundred MB to over 1 GB.
During normal operation, this cache sometimes has to be modified to install code updates. To do this, the server creates a copy of the cache in memory and modifies the copy. Once this is done, the original cache is released.
It is during this recaching operation when most Malloc (ARERR 300) problems are observed.
Typically, the first time this recaching takes place after an application restart, the memory allocation during the copy cache is very quick, almost instantaneous. However, on each subsequent copy, the time taken to allocate the memory increases, sometimes to the order of several minutes.
The AR Server cache consists of many discrete elements, and the copy process involves a large number of small allocations using malloc() to copy them. The allocation sizes vary, but the majority are in the 1-100 KB range.
The AR Server has no specific memory management functions beyond those provided by the OS (malloc, free, calloc etc.).
Why is this important?
Caching is a fundamental part of AR Server operations.
To improve performance, the AR System server caches all forms and their related objects (all users and groups, active links, etc.) in memory at startup so that it can access them faster. Each request for a form is served from the cache rather than read from the database, which significantly improves performance for users accessing the AR Server.
Additionally, each OS has different limits on how much memory a single process can use.
The AR Server processes concerned are:
- arserver.exe (Windows)
- arserverd (UNIX & Linux)
AR System 7.1.00 and earlier versions are 32-bit processes (7.5 on Windows was 32-bit also), so the maximum amount of memory that they can address is 4 GB. The actual limit, however, depends on variables such as the Operating System on which the server runs, the number of applications running on the same system, the amount of free memory available to the server, and the amount of memory that the Operating System uses for 'housekeeping'. Therefore, the actual limit might sometimes be much lower than 4 GB.
Once this limit is hit, memory related problems can be seen with the AR Server process.
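To make the 32-bit ceiling concrete, here is a back-of-the-envelope check in Python. As a simplification, it assumes that a CopyCache needs one extra full-size copy of the cache on top of the process's current footprint. It also lets you plug in a "usable" limit lower than 4 GB, since (as noted above) the real limit depends on the OS and the machine. The function names and example figures are hypothetical, not BMC values.

```python
MB_PER_GB = 1024

def copycache_headroom_mb(current_mb, cache_mb, usable_limit_mb=4 * MB_PER_GB):
    """Headroom (MB) left at the peak of a CopyCache; negative means no fit.

    Simplifying assumptions: current_mb already includes the live cache,
    and the copy is the same size as that cache, so the peak is
    current_mb + cache_mb until the old cache is released.
    """
    peak_mb = current_mb + cache_mb
    return usable_limit_mb - peak_mb

def copy_would_fit(current_mb, cache_mb, usable_limit_mb=4 * MB_PER_GB):
    """True if the estimated CopyCache peak stays within usable_limit_mb."""
    return copycache_headroom_mb(current_mb, cache_mb, usable_limit_mb) >= 0
```

For example, a process at 2,500 MB with a 1,000 MB cache has about 596 MB of headroom against the full 4 GB. However, the copy would not fit if only 3 GB is really usable.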
Understanding these concepts is vital to understanding why memory-related problems can occur with AR Server.
One very important point to note is that the AR Server does NOT manage memory.
What is an Admin Change?
An Admin Change is a change made to a Server Object (Form, Filter, Active Link etc) - a code update on a live environment.
The key point is that an Admin Change triggers a CopyCache, which causes growth in the memory footprint of the arserver process.
How can you tell if an Admin Change is taking place?
Easy - you just need to add the following parameter to the ar.cfg/conf file:
Then enable Thread logging. Once a CopyCache is triggered, you'll see it in the Thread log; this is known as CopyCache logging.
(You will need to restart AR Server for the parameter to take effect)
Example entries in the Thread Log:
<THRD> /* Fri Jul 19 2013 16:15:03.0630 */ Defn change received for copycache: tid=3392 user=Demo
<THRD> /* Fri Jul 19 2013 16:15:03.0630 */ CopyCache: CreateNewCache tid=3392
<THRD> /* Fri Jul 19 2013 16:15:03.0630 */ CopyCache Begin: cacheId=1 rpcCallProc=63 user="Demo" tid=3392 rpcId=955 numCaches=2 maxNumCaches=2147483647, alloc=0 bytes
<THRD> /* Fri Jul 19 2013 16:15:03.2200 */ CopyCache End: new cacheId=2, old cacheId=1 threadCount=1 tids=3392
<THRD> /* Fri Jul 19 2013 16:15:03.2200 */ Cache Change Completed: tid=3392
<THRD> /* Fri Jul 19 2013 16:15:03.2200 */ Defn change received for copycache: tid=3392 user=Demo
<THRD> /* Fri Jul 19 2013 16:15:03.2200 */ CopyCache: existingAdminCache tid=3392
<THRD> /* Fri Jul 19 2013 16:15:03.2510 */ Cache Change Completed: tid=3392
<THRD> /* Fri Jul 19 2013 16:15:03.2510 */ Defn change received for copycache: tid=3392 user=Demo
<THRD> /* Fri Jul 19 2013 16:15:03.2510 */ CopyCache: existingAdminCache tid=3392
<THRD> /* Fri Jul 19 2013 16:15:03.2510 */ Cache Change Completed: tid=3392
<THRD> /* Fri Jul 19 2013 16:15:30.6100 */ Admin Cache Promoted: cacheId=2 rpcCallProc=0 user="NULL" tid=3124 rpcId=0 cacheIdToFree=1
<THRD> /* Fri Jul 19 2013 16:15:30.6100 */ FreeServerCache Begin: cacheId=1 rpcCallProc=0 user="NULL" tid=3124 rpcId=0, alloc=0 bytes
<THRD> /* Fri Jul 19 2013 16:15:30.6420 */ FreeServerCache End: cacheId=1 rpcCallProc=0 user="NULL" tid=3124 rpcId=0, alloc=0 bytes, diff=0 bytes
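If you have a long Thread log, it helps to pull out just the cache events. The sketch below is a minimal Python parser written against the sample lines above. The regular expression and the list of event prefixes are my own assumptions based on this sample, not a documented log format, so check them against your own logs.

```python
import re
from datetime import datetime

# Matches the "<THRD> /* Fri Jul 19 2013 16:15:03.0630 */ message" layout
# seen in the sample above (an assumption, not a published format).
THRD_LINE = re.compile(r"<THRD>\s+/\*\s+(?P<ts>.+?)\s+\*/\s+(?P<msg>.*)")

# Cache-related message prefixes taken from the sample; others are ignored.
EVENT_PREFIXES = (
    "CopyCache Begin",
    "CopyCache End",
    "Admin Cache Promoted",
    "FreeServerCache Begin",
    "FreeServerCache End",
)

def parse_copycache_events(lines):
    """Yield (timestamp, event, message) for cache-related Thread log lines."""
    for line in lines:
        m = THRD_LINE.match(line.strip())
        if not m:
            continue
        msg = m.group("msg")
        for prefix in EVENT_PREFIXES:
            if msg.startswith(prefix):
                # "%f" accepts the 4-digit fractional part (e.g. .0630 -> 63 ms).
                ts = datetime.strptime(m.group("ts"), "%a %b %d %Y %H:%M:%S.%f")
                yield ts, prefix, msg
                break
```

Subtracting the "CopyCache Begin" timestamp from the matching "CopyCache End" one gives the copy duration (about 0.16 seconds in the example above). This is a handy number to track if copies get slower after each restart, as described earlier.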
Observing the memory consumption of the arserver.exe/arserverd process
You can get a general picture of the memory consumption of the arserver.exe/arserverd process from native Operating System tools. However, a better way to track it is with monitoring tools or with the scripts provided by Mark Walters, both of which are attached to this blog post.
The output of these scripts is intended for tracking memory over time and correlating changes with Thread and SQL activity.
Here are the results for the same time period as the Thread log above:
Fri 07/19/2013 16:14:31.40,3164,229,arserver
Fri 07/19/2013 16:14:41.65,3164,229,arserver
Fri 07/19/2013 16:14:51.82,3164,229,arserver
Fri 07/19/2013 16:15:01.98,3164,229,arserver
Fri 07/19/2013 16:15:12.14,3164,230,arserver
Fri 07/19/2013 16:15:22.29,3164,230,arserver
Fri 07/19/2013 16:15:32.45,3164,229,arserver
Fri 07/19/2013 16:15:42.59,3164,229,arserver
Fri 07/19/2013 16:15:52.75,3164,229,arserver
Fri 07/19/2013 16:16:02.95,3164,230,arserver
Fri 07/19/2013 16:16:13.11,3164,230,arserver
Fri 07/19/2013 16:16:23.25,3164,230,arserver
Fri 07/19/2013 16:16:33.39,3164,230,arserver
Fri 07/19/2013 16:16:43.53,3164,230,arserver
Fri 07/19/2013 16:16:53.70,3164,230,arserver
Fri 07/19/2013 16:17:03.87,3164,230,arserver
Fri 07/19/2013 16:17:14.01,3164,230,arserver
Fri 07/19/2013 16:17:24.17,3164,230,arserver
Fri 07/19/2013 16:17:34.39,3164,230,arserver
Fri 07/19/2013 16:17:44.59,3164,230,arserver
Fri 07/19/2013 16:17:54.75,3164,230,arserver
Fri 07/19/2013 16:18:04.87,3164,230,arserver
Fri 07/19/2013 16:18:15.01,3164,230,arserver
Fri 07/19/2013 16:18:25.15,3164,230,arserver
This was from a 7.5 AR Server with no CMDB or ITSM installed.
The snippet below is from a full-blown Remedy 8.1 environment, where the memory footprint of the arserver.exe process is far higher:
Mon 07/29/2013 10:52:34.73,2356,2133,arserver
Mon 07/29/2013 10:52:45.09,2356,2133,arserver
Mon 07/29/2013 10:52:55.34,2356,2133,arserver
Mon 07/29/2013 10:53:05.85,2356,2940,arserver
Mon 07/29/2013 10:53:16.48,2356,3511,arserver
Mon 07/29/2013 10:53:26.95,2356,3717,arserver
Mon 07/29/2013 10:53:37.58,2356,3780,arserver
Mon 07/29/2013 10:53:47.90,2356,3843,arserver
Mon 07/29/2013 10:53:58.15,2356,3841,arserver
Mon 07/29/2013 10:54:08.43,2356,3841,arserver
Mon 07/29/2013 10:54:18.70,2356,3841,arserver
Mon 07/29/2013 10:54:28.98,2356,3841,arserver
Mon 07/29/2013 10:54:39.27,2356,3841,arserver
Mon 07/29/2013 10:54:49.52,2356,3841,arserver
Mon 07/29/2013 10:54:59.77,2356,3841,arserver
Mon 07/29/2013 10:55:10.04,2356,3841,arserver
Mon 07/29/2013 10:55:20.28,2356,3841,arserver
Mon 07/29/2013 10:55:30.55,2356,3841,arserver
Mon 07/29/2013 10:55:40.83,2356,3841,arserver
Mon 07/29/2013 10:55:51.10,2356,3841,arserver
Mon 07/29/2013 10:56:01.39,2356,3841,arserver
Mon 07/29/2013 10:56:11.67,2356,3841,arserver
Mon 07/29/2013 10:58:17.43,2356,3841,arserver
Mon 07/29/2013 10:58:27.70,2356,3841,arserver
Mon 07/29/2013 10:58:37.96,2356,3841,arserver
Using this information together with the CopyCache logging in the Thread log will help you identify when the memory footprint of the arserver.exe/arserverd process was increasing and whether there is a direct correlation with Admin Changes.
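As a sketch of that correlation step, the Python below parses the script output, flags sudden jumps between consecutive samples, and measures growth in a window after given event times (for example, the "CopyCache Begin" timestamps from the Thread log). The column meanings (timestamp, PID, memory in MB, process name) are my reading of the sample output. The 100 MB threshold and two-minute window are arbitrary defaults; adjust both for your environment.

```python
from datetime import datetime, timedelta

def parse_memory_samples(lines):
    """Parse lines like 'Mon 07/29/2013 10:52:34.73,2356,2133,arserver'.

    Assumed columns: timestamp, PID, memory in MB, process name.
    Returns a list of (timestamp, memory_mb) tuples in file order.
    """
    samples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        ts_text, _pid, mem, _name = line.split(",")
        # Drop the weekday; "%f" accepts the 2-digit fractional seconds.
        ts = datetime.strptime(ts_text.split(" ", 1)[1], "%m/%d/%Y %H:%M:%S.%f")
        samples.append((ts, int(mem)))
    return samples

def growth_spikes(samples, threshold_mb=100):
    """Return (start, end, delta_mb) where consecutive samples grew by more than threshold_mb."""
    return [(t0, t1, m1 - m0)
            for (t0, m0), (t1, m1) in zip(samples, samples[1:])
            if m1 - m0 > threshold_mb]

def growth_after_events(event_times, samples, window=timedelta(minutes=2)):
    """For each event time, return (event, growth_mb); samples must be time-sorted.

    Growth is the peak sample inside `window` after the event minus the
    last sample taken at or before it. Events without samples on both
    sides are skipped.
    """
    results = []
    for ev in event_times:
        before = [m for t, m in samples if t <= ev]
        after = [m for t, m in samples if ev < t <= ev + window]
        if before and after:
            results.append((ev, max(after) - before[-1]))
    return results
```

Run against the 8.1 sample above, growth_spikes() flags three jumps of 807, 571 and 206 MB between 10:52:55 and 10:53:26. On the 7.5 sample, the CopyCache at 16:15:03 corresponds to growth of only 1 MB.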
What is Cache-Mode?
There are two cache modes that you can set for your server operation:
- Production Cache Mode (Cache-Mode: 0)
- Development Cache Mode (Cache-Mode: 1)
Production mode is the default and is appropriate when operations by application users should not be delayed by administrative operations or when there is a large number of active application users. In this mode, administrative operations cause the server to create an administrative copy of its cache (CopyCache), so that other users can continue using the shared cache while administrative operations are carried out.
In development mode, administrative operations do not cause a new copy of the cache to be made; instead they lock other users out of the shared cache and wait for users currently accessing that cache to complete their operations before performing changes.
Escalation and Archive threads must also complete their operations before Admin thread changes can be completed. Therefore, potentially long running tasks like escalations are not compatible with Admin thread changes in this mode and can lead to long delays.
Refer to KA366897 and the new functionality involving the Long-Running-Escalation-Logging-Threshold parameter.
Access to the shared cache is restored when the administrative operation is complete. No time is spent copying the cache, so operations have a smaller memory footprint and are performed faster than in production mode.
Development mode is intended for servers whose main purpose is application development. It is unsuitable for servers with a large number of application users in a production environment because the operations of those users are blocked when forms and workflow are changed, especially when many structures are changed, such as when importing an application.
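To illustrate the difference between the two modes, here is a toy Python model (not AR Server code). In production mode, an admin change is applied to a deep copy that is then promoted, so a reader still holding the old cache keeps seeing the old definitions. In development mode, the shared cache is changed in place and no copy is made. The class and method names are hypothetical. Promotion happens immediately rather than after Delay-Recache-Time, and the locking that blocks users in development mode is not modelled.

```python
import copy

class CacheModel:
    """Toy model of the two AR Server cache modes (not BMC code).

    cache_mode 0 (production): admin changes go to a deep copy, which is
    then promoted; readers holding the old cache keep the old definitions.
    cache_mode 1 (development): admin changes modify the shared cache in
    place. The real server also blocks other users meanwhile; that
    locking is not modelled here.
    """

    def __init__(self, definitions, cache_mode=0):
        self.shared = definitions
        self.cache_mode = cache_mode
        self.copies_made = 0

    def get_cache(self):
        """What a user thread would read from."""
        return self.shared

    def admin_change(self, name, definition):
        if self.cache_mode == 0:
            admin_copy = copy.deepcopy(self.shared)  # the CopyCache step
            self.copies_made += 1
            admin_copy[name] = definition
            self.shared = admin_copy  # promote; old cache is freed once unreferenced
        else:
            self.shared[name] = definition  # in place, no second cache
```

The extra deepcopy in production mode is exactly the memory cost discussed in this post; development mode avoids it by making everyone wait instead.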
To set the cache mode:
- In a browser or BMC Remedy User Tool, open the AR System Administration Console, and click System -> General -> Server Information. The AR System Administration: Server Information form appears.
- Click the Configuration tab.
- Select or clear the Development Cache Mode check box.
Alternatively, in the ar.cfg/conf file, add or change the Cache-Mode: parameter. Valid values are 0 (production) and 1 (development).
(You will need to restart AR Server for the parameter to take effect)
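If you manage ar.cfg/conf with scripts, a minimal Python helper to set the parameter might look like the following. It assumes the simple one "Name: value" entry per line layout used in this post. The function name is hypothetical; back up the real file before changing it, and restart AR Server afterwards.

```python
def set_ar_cfg_param(cfg_text, name, value):
    """Return cfg_text with 'name: value' set.

    Replaces the first existing entry for `name` (exact, case-sensitive
    match on the part before the colon), drops any later duplicates, and
    appends the entry if it is missing.
    """
    out, found = [], False
    for line in cfg_text.splitlines():
        key = line.split(":", 1)[0].strip()
        if key == name:
            if not found:
                out.append(f"{name}: {value}")  # replace the first entry
                found = True
            # later duplicate entries are dropped
        else:
            out.append(line)
    if not found:
        out.append(f"{name}: {value}")
    return "\n".join(out) + "\n"
```

For example, set_ar_cfg_param(text, "Cache-Mode", "1") switches a server to development cache mode. The same helper works for the other parameters listed below.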
The relevant parts of the documentation that have this information are:
|7.5||Configuration Guide page 162|
|7.6.04||Configuration Guide page 168|
Configure the AR System Server to Control Memory Use
Large searches can also consume memory. Typically this is not as much as a CopyCache, but it is still avoidable.
Implement the following AR System Server configuration best practices:
- Limit large searches
- Do not allow users to perform unqualified searches
To remove all unqualified searches, review existing workflow and modify it as necessary.
Set the following AR System Server configuration options appropriately:
(All can be found on the Server information form -> Configuration tab)
|Parameter in ar.cfg/conf||Field in Configuration tab||Comments|
|Delay-Recache-Time||Recache Delay (seconds)||Specifies the number of seconds before the server makes the latest cache available to all threads. The minimum is 0, which means every API call will get the latest cache (that is, the cache will be copied for every administrative call or for operations such as arsignal). To permit only one admin copy cache to occur every hour for multiple administrative changes, set this option to the maximum (3600 seconds).|
|Cache-Mode||Development Cache Mode||When set to development cache mode (Cache-Mode: 1), this prevents the AR System Server from creating a second cache during administrative changes and thus prevents huge memory expansion due to a single action. Development cache mode has a drawback, however: as described in the Cache-Mode section above, administrative operations lock other users out of the shared cache until they complete. Hence, this mode is not recommended for a production environment.|
|Max-Entries-Per-Query||Max Entries Returned by GetList||Specifies the maximum number of requests returned by a search. Use this option to stop large queries that would otherwise greatly increase the server process's memory use.|
|Cache-Display-Properties||Cache All Display Properties & Cache Only Server Display Properties||Can be set to restrict the number of form display properties the server loads into memory at startup. This results in less memory use at startup, so more memory is available for the server process to grow. If an unloaded display property is needed, the server loads it on demand instead of caching it up front. This does not decrease startup time, because the server must still read all the properties to select those to load. The reduced memory use comes at the expense of optimum performance later, when data must be read from the database. It might also adversely affect server performance, database performance, or both when used with mid-tier caching.|
Disables automatic signals triggered by changes to the following data on a server group’s administrative server:
These signals can cause recaches on target servers that significantly increase memory use temporarily. To change this data on the administrative server without impairing the performance of target servers, set this option to T to disable automatic signaling (the default is F). Later, when memory use is low, you can manually send the signals to the target servers by using the arsignal program.
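To see why Delay-Recache-Time matters, the Python sketch below counts how many cache copies a series of admin changes would trigger. It is a simplified, hypothetical model inferred from the Thread log earlier in this post. The first change creates a new admin cache (CreateNewCache), changes that arrive before that cache is promoted reuse it (existingAdminCache), and promotion is assumed to happen Delay-Recache-Time seconds after the copy was created. The real server's timing rules may differ.

```python
def count_cache_copies(change_times_s, delay_recache_time_s):
    """Estimate how many CopyCache operations a series of admin changes causes.

    Hypothetical model: the first change after a promotion creates a new
    admin cache; changes arriving before that cache is promoted
    (delay_recache_time_s later) reuse it instead of copying again.
    """
    copies = 0
    promote_at = None
    for t in sorted(change_times_s):
        if promote_at is None or t >= promote_at:
            copies += 1                          # "CopyCache: CreateNewCache"
            promote_at = t + delay_recache_time_s
        # otherwise: "CopyCache: existingAdminCache", no new copy
    return copies
```

With one admin change per minute for an hour, the model gives 60 copies at a delay of 0, 12 at 300 seconds, and a single copy at 3600 seconds, which matches the behaviour the table describes for the minimum and maximum settings.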
Further important reading
The information provided above is just a short summary of the important facts needed to understand the impact of updating an AR Server environment.
For a more comprehensive understanding, review the following White Paper - Caching in BMC Remedy Action Request System
One question that does get asked a lot is:
"OK, the Operating System manages the memory, but it manages assigning the free memory to be used. The release of consumed memory relies on the process/application".
Page 20 of the Whitepaper available in the link above states:
Releasing memory for reuse
On a UNIX Operating System, after a process is allocated memory, the memory is not released by the Operating System. The AR System Server process releases memory back to the Operating System when it completes an operation, but the UNIX Operating System does not release the memory for reuse. Therefore, the amount of memory consumed by the process can increase, but it does not decrease after the process is finished using the memory. This is a limitation of the UNIX operating system’s memory managers.
Not all Operating System memory managers behave this way. For example, Windows eventually releases unused memory back to users. For more information, contact the operating system vendor.
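The practical effect is easiest to see with a toy model. The Python below tracks the footprint after each CopyCache under two assumptions: either freed memory goes straight back to the OS, or the allocator keeps it for reuse, so the reported footprint stays at its high-water mark. It ignores fragmentation and other allocator details, so treat it as an illustration only; the function name and figures are hypothetical.

```python
def footprint_after_copies(baseline_mb, cache_mb, copies, returns_to_os):
    """Return the process footprint (MB) observed after each CopyCache.

    Toy model: baseline_mb is non-cache memory; during a copy the old and
    new caches coexist, then the old one is freed. If freed memory is
    returned to the OS the footprint drops back; otherwise the allocator
    keeps it for reuse and the footprint stays at the high-water mark.
    Fragmentation is ignored.
    """
    resident = baseline_mb + cache_mb      # one live cache
    high_water = resident
    samples = []
    for _ in range(copies):
        peak = resident + cache_mb         # old + new cache during the copy
        high_water = max(high_water, peak)
        samples.append(resident if returns_to_os else high_water)
    return samples
```

With a 500 MB baseline and a 1,000 MB cache, the first model stays at 1,500 MB after each copy while the second sits at 2,500 MB. This resembles the plateau in the 8.1 sample earlier, where memory jumped and then stayed at around 3,841 MB.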
These are the latest patches/Service Packs:
7.1 - patch 11
7.5 - patch 8
7.6.03 - patch 2
7.6.04 - SP4 (SP5 out soon)
8.0 - patch 3
8.1 - no patch available