How RKM search works includes video

Version 8

    This document contains official content from the BMC Software Knowledge Base. It is automatically updated when the knowledge article is modified.


    PRODUCT:

    Remedy Knowledge Management Application


    COMPONENT:

    Remedy Knowledge Management Application



    DETAILS:

     

    How Search Works in RKM

    This document explains how RKM search works and how to ensure that searches return relevant results.

    RKM Search Stack

    Let's first take a look at the RKM Search Stack.
    RKM provides Global Search and Knowledge Search functionality for searching different indexed entities in RKM/ITSM/SRM, such as Tickets, Tasks, and Articles. The following steps occur, in order, when you perform a search. The responsible system component is shown in brackets.
    1. [RKM] The user enters the search text and performs the search. RKM:SearchDialog workflows are triggered, which build a search qualification and query AR’s FTS engine via the “AR System Multi-Form Search” interface form.
    2. [AR FTS Engine] The AR Full Text Search Engine [plugin] is invoked and uses the Apache Lucene search engine to perform the actual search.
    3. [Apache Lucene] Apache Lucene (https://lucene.apache.org/core/) performs the search using an index that was already generated by indexing the various entities [Tickets, Tasks, Articles…]. Lucene uses its own relevance algorithm to order the search results. By default, multiple terms in the search query are “OR”ed together [i.e. a record is returned as part of the results even if only one term matches].
    4. [AR Post processing] AR performs post-processing and enforces Row Level Security by eliminating records that are not meant to be visible to the current user.
    5. [RKM] Search results are displayed in order of the Relevance Score/Weight returned by FTS.
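    The flow above can be illustrated with a small, self-contained sketch. It is not AR System or Lucene code: the Record class, the allowed_groups attribute, and the hit-count "score" are hypothetical stand-ins, used only to show terms being OR'ed together, results being ordered by score, and a row-level security filter being applied afterwards.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    entry_id: str
    text: str
    allowed_groups: set = field(default_factory=set)  # hypothetical permission model


def or_search(records, query):
    """Return (record, hits) pairs for records matching ANY query term."""
    terms = query.lower().split()
    results = []
    for record in records:
        words = record.text.lower().split()
        # Crude stand-in for a relevance score: total occurrences of query terms.
        hits = sum(words.count(term) for term in terms)
        if hits > 0:
            results.append((record, hits))
    # Highest score first, mirroring the final display step.
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results


def apply_row_level_security(results, user_groups):
    """Post-processing: drop records the current user may not see."""
    return [(r, s) for r, s in results if r.allowed_groups & user_groups]


records = [
    Record("INC001", "outlook crash on startup", {"Support"}),
    Record("KBA002", "outlook profile reset steps outlook", {"Support", "Public"}),
    Record("KBA003", "printer driver install", {"Public"}),
]
visible = apply_row_level_security(or_search(records, "outlook printer"), {"Public"})
print([(r.entry_id, s) for r, s in visible])
```

    Note that KBA003 is returned even though it matches only one of the two terms, and INC001 is removed only in the post-processing step because the user is not in a permitted group.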
    So that search can find relevant entities in ITSM/SRM, BMC indexes Tickets, Articles, and similar entities whenever that information is created or modified.
    In the sections below, the terms AR, FTS, and Lucene are used interchangeably when describing indexing and relevance concepts. Similarly, the terms Document and Record [or Entry in AR] are used interchangeably.

    Lucene Relevance Scoring

      

    Indexing

    When information such as a Ticket or an Article is modified, Lucene reindexes it, which involves analyzing the content. For Lucene, everything is a document and a field within a document. From the AR perspective, a record (entry) is a document for Lucene, and a column of the record that is marked for indexing is a field within that document. So an entry in the HPD:Help Desk form becomes a document, and its Summary or Notes fields, if marked for “MFS Only” or “FTS and MFS” indexing, become the fields that Lucene indexes within that document. Note that Lucene builds its index at the field level.
      
    While indexing a field, Lucene:

    • Extracts keywords and calculates the number of occurrences per field and per document [Term Frequency]
    • Uses “root words” [Stemming]
    • Can be supplied a dictionary of similar words [Synonyms]
    • Can be supplied an “ignore words” list [Stop Words]

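    The indexing steps listed above can be sketched roughly as follows. This is a simplified illustration, not Lucene's analyzer: the stop-word list, the synonym dictionary, and the naive_stem function are assumptions made up for the example, and real stemmers are considerably more careful.

```python
from collections import Counter

STOP_WORDS = {"the", "a", "is", "on", "to"}       # illustrative ignore-words list
SYNONYMS = {"e-mail": "email", "mail": "email"}   # illustrative synonym dictionary


def naive_stem(word):
    """Very rough suffix stripping; real analyzers use proper stemmers."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def analyze_field(text):
    """Return term frequencies for one field of one document."""
    terms = []
    for raw in text.lower().split():
        word = raw.strip(".,;:!?")
        if not word or word in STOP_WORDS:
            continue                       # stop words never reach the index
        word = SYNONYMS.get(word, word)    # map synonyms to one canonical term
        terms.append(naive_stem(word))     # index the root form
    return Counter(terms)


frequencies = analyze_field("Printing to the printers is failing, mail is failing.")
print(frequencies)
```

    Because the index is built per field, a function like analyze_field would be applied separately to each indexed field (for example Summary and Notes) of each record.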
    Searching 
    Based on the search terms supplied, Lucene uses the already generated index to find similar or matching documents. It assesses the “relevancy” of each document against the search terms and gives each document a score. The score is determined by the three key factors described below:

    1. How often do the search terms appear in the document?
       The more often a term is found, the higher the score. A field containing five mentions of the same term is more likely to be relevant than a field containing just one mention.
       Technical: Term Frequency
       Tf = √frequency of the term in the field

    2. How often does the search term appear across the documents in the collection?
       The more often the term is found, the lower the score. So common terms like go or find contribute little to relevance, unlike uncommon terms like MongoDB or Outlook.
       Technical: Inverse Document Frequency
       Idf = 1 + log (numDocs / (docFreq + 1))

    3. How long is the field in which the search terms appear?
       The shorter the field, the higher the score. If a term appears in a shorter field like Title or Keyword, it is more likely to describe the whole document than if it appears in, say, a body field.
       Technical: Field-length Norm
       Norm = 1 / √numTerms
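    As a quick numeric illustration, the three factors can be computed directly from the formulas above. The function names and the sample collection size of 1,000 documents are assumptions for the example, and the logarithm is taken to be the natural logarithm.

```python
import math


def term_frequency(freq):
    """Tf = square root of the term's frequency in the field."""
    return math.sqrt(freq)


def inverse_document_frequency(num_docs, doc_freq):
    """Idf = 1 + log(numDocs / (docFreq + 1)), using the natural log."""
    return 1 + math.log(num_docs / (doc_freq + 1))


def field_length_norm(num_terms):
    """Norm = 1 / square root of the number of terms in the field."""
    return 1 / math.sqrt(num_terms)


# Hypothetical collection of 1,000 documents.
rare_idf = inverse_document_frequency(1000, 9)      # term found in 9 documents
common_idf = inverse_document_frequency(1000, 499)  # term found in 499 documents

print(term_frequency(1), term_frequency(4))
print(rare_idf, common_idf)
print(field_length_norm(4), field_length_norm(100))
```

    In this sample, quadrupling the number of mentions only doubles Tf, the rare term receives a much larger Idf than the common one, and a 4-term field (such as a title) gets a norm five times larger than a 100-term body field.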
    If multiple fields on the same document are set up for FTS indexing, the field-level scores are aggregated into a score for each document. The boost factors explained in the sections below also contribute to the overall score.
    This can be further combined with other factors, such as Term Proximity in Phrase queries and Term Similarity in Fuzzy queries. RKM doesn’t use Phrase queries by default and doesn’t support Fuzzy queries.

    Summarizing Lucene Relevance

    In layman’s terms, all of the following considerations come into play when deciding the relevancy of results [and hence the order of documents in the results]:

    • Documents containing all the search terms stand the best chance of appearing at the top
    • Matches on rare words count for more than matches on common words [i.e. words found in most documents]
    • A long document, or a long field, scores lower than a short one for the same match
    • Documents which mention the search terms many times score higher
      

    Improving Relevance of RKM Searches

    Let's look at some ways to improve the relevance of RKM searches, so that the most appropriate articles appear at the top of the results.

    Tuning Guidelines

    RKM searches can be tuned through a two-phase process: first verify the current status, then tune the search relevance. Each step below is listed with what to verify or tune.

    Phase: Verification

    • Step: Indexing Status of Knowledge Templates
      What to verify/tune: Midtier->Knowledge Management Console->Manage Knowledge Sources->Knowledge Template date
    • Step: Relevancy Fields correctly mapped
      What to verify/tune: Developer Studio->Various *_Manageable_Join Form->Definitions View->FullTextSearch->Title/Environment/Keyword Relevancy Field Mapping
    • Step: Relevancy Field Weights
      What to verify/tune: Midtier->Server Information->FTS Tab->Title/Env/Keyword Field Weights
    • Step: Ignore words list
      What to verify/tune: Midtier->Server Information->FTS Tab->Ignore Words List
    • Step: For articles which appear on top but should not, check the Use and View Count
      What to verify/tune: Do the counts look reasonable, or do they have excessively high values?
    • Step: Use and View Count Boost
      What to verify/tune: Midtier->Application Admin->Custom Configuration->Knowledge Management->Application Settings
      Do you need a Defensive or an Aggressive boost?
    • Step: Role of Article level Attachments
      What to verify/tune: Check how the search text is matching.
      Do you want the Attachments to be indexed?

    Phase: Search Tuning

    • Step: Identify keywords to search and identify articles which should be on top
      What to verify/tune: Is the Title correctly worded?
      Does the Keyword field contain the required search terms?
      What words can be ignored during search?
    • Step: Edit Article - Add needed Keywords
      What to verify/tune: Which sections/fields should FTS search? Which sections/fields should it not search?
    • Step: Set fields to index for KCS Template [decide whether all OOTB fields need to be indexed for FTS and remove unwanted fields]
      What to verify/tune: Midtier->Knowledge Management Console->Manage Knowledge Sources->KCS Template->Content Fields
    • Step: Set Relevancy Field Weight
      What to verify/tune: Midtier->Server Information->FTS Tab->Title/Env/Keyword Field Weights
      Typically observed settings are Title:5, Keywords:2
      Recommended range is Title:{4-6}, Keywords:{2-4}
    • Step: Set words to Ignore
      What to verify/tune: Midtier->Server Information->FTS Tab->Ignore Words List
    • Step: Decide whether the default Boost is too aggressive for Viewing and Usage/Linking of articles
      What to verify/tune: Midtier->Application Admin->Custom Configuration->Knowledge Management->Application Settings
      This could have the biggest impact on relevancy. If articles are used very frequently and accumulate a large Use count or View count (such as 500 or 1000), make sure that the Use/View Boost is more defensive in nature [e.g. View Boost ~= 0.000001 and Use Boost ~= 0.00001].
      Overall, the multiplying factor should not exceed a value of 3 or 4.
    • Step: Finally, perform a complete re-indexing
      What to verify/tune:
      1. Clean up the FTS collection folder
      2. Remove records from the FTPending table
      3. Reindex from Midtier->Server Information->FTS Tab
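    To get a feel for why a defensive boost matters at high counts, consider the rough sketch below. This article does not state how RKM combines the Use/View counts with the boost values, so the linear form used here (1 + views × view boost + uses × use boost) is purely an assumption for illustration, as are the function names and the "aggressive" sample values. Only the defensive boost values and the upper limit of about 3 or 4 come from the guidance above.

```python
def boost_multiplier(view_count, use_count, view_boost, use_boost):
    # ASSUMPTION: a simple linear combination, chosen only for illustration.
    # The real RKM formula is not given in this article.
    return 1.0 + view_count * view_boost + use_count * use_boost


def within_recommended_limit(multiplier, limit=4.0):
    """Check the guideline that the multiplying factor stays at or below 3 or 4."""
    return multiplier <= limit


# Defensive values quoted in the tuning step above.
defensive = boost_multiplier(1000, 500, view_boost=0.000001, use_boost=0.00001)
# Made-up aggressive values, for contrast only.
aggressive = boost_multiplier(1000, 500, view_boost=0.01, use_boost=0.05)

print(round(defensive, 6), within_recommended_limit(defensive))
print(round(aggressive, 6), within_recommended_limit(aggressive))
```

    Under this assumed form, a heavily viewed article with aggressive settings would swamp the text-based relevance score, which is the situation the Use and View Count verification step is meant to catch.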
       

    Article Writing Guidelines

    After performing the Search Tuning described earlier, there are a few practices to follow when writing Articles:

    • The Title should contain representative words describing the problem the article is trying to solve.
    • The section of the article which describes the problem statement should be brief and written in the language and terminology that the consumers of the article would use.
    • Use the Keywords field to enter only singular keywords, as well as synonyms, which represent the article.
    • At the FTS/Lucene level, use the Dictionary or Synonym facility to define similar words.
    • Identify stop-words which could be creating a lot of clutter in search results.
    • Study the "No Search Results" Report and identify either missing articles or missing keywords in the existing articles.
    • Visibility Groups should be in place to reduce the clutter in search results.
      Note: You may want to check the out-of-the-box BMC Knowledge Management reports.
       

    Technical Reference on Algorithm

    When a search query is provided to Lucene, it finds the documents matching the query. As soon as a matching document is found, Lucene calculates the score for the document against the supplied query by combining the scores of each matching term. The formula used for calculating the relevancy score is:
          score(q,d) = queryNorm(q) * coord(q,d) * ∑(t in q) ( tf(t in d) * idf(t)² * t.getBoost() * norm(t,d) )
    Here,
    score(q,d) is the score of document d for query q.
    queryNorm(q) is the query normalization factor for query q, which brings all terms to the same normalization level.
    coord(q,d) is the query coordination factor, which gives more weight to documents that contain a higher percentage of the query terms. A document containing more of the query terms is expected to be a better match for the query.
    ∑(t in q) is the sum, over each search term t in the query q, of that term's weight for document d.
    tf(t in d) is the Term Frequency of the term t in document d. The more often a term appears in the document, the higher the weight (i.e. the more relevant the document is to the query).
    idf(t)² is the square of the Inverse Document Frequency. The more commonly a term appears across all documents in a collection/database, the lower the weight (i.e. the less relevant the document is to the query).
    t.getBoost() is the Query Time Boosting of a field in the document over other fields.
    norm(t,d) is the Field Length Norm, which is influenced by how long the field contents are. The shorter the field content, the higher the weight (i.e. the more relevant the document is to the query). This is combined with Index Time Boosting of a field, where the boost (or multiplication) is applied to every term in the field, rather than to the field itself.
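
    The formula can be traced end to end with a minimal sketch. It is an illustration under simplifying assumptions, not Lucene's implementation: it scores a single field, treats queryNorm as a constant passed in by the caller (1.0 by default), computes coord as the fraction of query terms found in the field, omits index-time boosts, and uses made-up document frequencies.

```python
import math


def score(query_terms, field_terms, doc_freq, num_docs, boosts=None, query_norm=1.0):
    """Score one field against a query with the formula above.

    query_terms: list of query terms
    field_terms: list of analyzed terms in the field being scored
    doc_freq:    dict of term -> number of documents containing the term
    boosts:      optional dict of term -> query-time boost (defaults to 1.0)
    """
    boosts = boosts or {}
    norm = 1 / math.sqrt(len(field_terms)) if field_terms else 0.0
    matched = 0
    total = 0.0
    for term in query_terms:
        freq = field_terms.count(term)
        if freq == 0:
            continue  # terms are OR'ed: a missing term adds nothing to the sum
        matched += 1
        tf = math.sqrt(freq)
        idf = 1 + math.log(num_docs / (doc_freq.get(term, 0) + 1))
        total += tf * idf ** 2 * boosts.get(term, 1.0) * norm
    coord = matched / len(query_terms) if query_terms else 0.0
    return query_norm * coord * total


# Hypothetical collection statistics.
NUM_DOCS = 1000
DOC_FREQ = {"outlook": 9, "crash": 99}
query = ["outlook", "crash"]

short_field = ["outlook", "crash"]
long_field = ["outlook", "crash"] + ["filler"] * 98
partial_field = ["outlook", "startup"]

print(score(query, short_field, DOC_FREQ, NUM_DOCS))
print(score(query, long_field, DOC_FREQ, NUM_DOCS))
print(score(query, partial_field, DOC_FREQ, NUM_DOCS))
```

    With these sample inputs, the short field that contains both terms scores highest, the long field containing the same terms scores lower because of the field-length norm, and the field matching only one term is penalized by the coordination factor.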


      

     


    Article Number:

    000129846


    Article Type:

    Product/Service Description


