1 of 1 people found this helpful
I have spent three weeks intensively analysing CAM and SAAM for a training course. There is no substitute for the Visualizer, the query language, and a good brain. My attempt to create a formula or spreadsheet for the task failed; it can't be done. The best advice I can offer is some rules of thumb and examples:
- Start with the database: search DatabaseDetail.
- For the application server layer, search for SIs (Software Instances); if you find none, look at processes and services.
- For the web layer, search Software Components.
- Beware of relying on Observed Communication. The application may not have been developed with permanent connection pools, so on the next scan the communications may be gone, causing the map to flip-flop.
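As a starting point, the rules of thumb above might translate into searches like the following. These are illustrative only: the node kinds are standard ADDM taxonomy, but the attribute names and conditions are assumptions you would adapt to your own environment.

```
search DatabaseDetail
search SoftwareInstance where type has subword 'Oracle'
search DiscoveredProcess where cmd matches 'java'
search SoftwareComponent
```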
There are also some rules of thumb around performance. For example, "ends with" can be very slow, so you need to make sure your regexes are optimised. I am still trying to figure out what is indexed in ADDM and what is not. The documentation is good, but scant in certain areas, such as how ADDM uses hash keys, b-trees, and regexes. For example, "contains word" is just a regex in TPL with look-ahead and look-behind assertions. Bad regexes can be slow. I am currently testing an Advanced Query regex for the string "fred$", restricted to DiscoveredProcess nodes only. We have 5 million of them, and my search is still running after 43 minutes! I will do a post about this.
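To make the "contains word" point concrete, here is a sketch in Python of how such a test can be expressed as a single regex with look-behind and look-ahead assertions. The exact pattern ADDM builds internally is not documented, so this is an illustrative guess, not the real implementation:

```python
import re

def contains_word(text: str, word: str) -> bool:
    """Approximate a 'contains word' test as one regex. The look-behind
    and look-ahead assertions ensure the match is not embedded inside a
    larger alphanumeric token. Illustrative only; the real TPL pattern
    may differ."""
    pattern = re.compile(
        r"(?<![A-Za-z0-9])" + re.escape(word) + r"(?![A-Za-z0-9])",
        re.IGNORECASE,
    )
    return pattern.search(text) is not None

print(contains_word("Oracle Database 19c", "oracle"))   # True
print(contains_word("PseudoOracleService", "oracle"))   # False: embedded in a token
```

Note that both assertions are zero-width, so the engine still has to try the match at many positions in the string; on millions of long command lines that per-candidate work is exactly what makes a careless pattern slow.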
3 of 3 people found this helpful
Exactly. There cannot be a single "right formula" for finding the parts of an application. Different applications can be very different from each other, so the best way to find their constituent parts and model them within Discovery varies too. Clearly, if you have a large number of in-house developed applications that all have the same basic structure, then it makes sense to have a common approach to modelling them, but for the general case of diverse applications, it is counter-productive to try to limit the approach.
On the subject of performance, it is important to distinguish between finding nodes in the first place, and filtering nodes when you already have some. When finding an initial set of nodes, the data store uses its indexes; when filtering existing nodes it does not. Absolutely all the data is indexed in a full-text word/phrase index. String values are also indexed by hash. That means that if you are searching for nodes, it is fast to do exact equality tests and to do subword tests. Substring tests are worse, but can often use the word indexes too. A very few special cases of regular expressions are handled as substring or subword tests, but in general regular expressions involve retrieving all the data for the attribute(s) in question, and testing against the regex.
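The difference between an indexed word search and a general regex search can be sketched as a toy model. The data structures and function names here are assumptions for illustration; the real data store internals are not public:

```python
import re
from collections import defaultdict

# Toy node attribute store: node id -> string value.
values = {
    1: "oracle listener process",
    2: "apache httpd worker",
    3: "batch runner fred",
}

# Full-text word index (word -> node ids), built once up front.
word_index = defaultdict(set)
for node_id, text in values.items():
    for word in text.split():
        word_index[word].add(node_id)

def find_by_word(word):
    """Word search: a single index lookup, no scan over the data."""
    return word_index.get(word, set())

def find_by_regex(pattern):
    """General regex search: no index help, so every stored value
    must be retrieved and tested against the pattern."""
    rx = re.compile(pattern)
    return {node_id for node_id, text in values.items() if rx.search(text)}

print(find_by_word("fred"))        # {3} -- one dictionary lookup
print(find_by_regex(r"fred$"))     # {3} -- but only after scanning every value
```

This is why the "fred$" search over 5 million DiscoveredProcess nodes ran for so long: the trailing anchor cannot use the word or hash indexes, so every value is fetched and tested.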
When you already have a set of nodes, either because a pattern is acting on some data, or because you have a search that filters the results of a traversal, the indexes are not used. The data store just directly retrieves the data it needs to evaluate the filtering expressions.