Machine learning metrics to evaluate BMC Helix Innovation Suite Cognitive Service

Version 2

    BMC Helix Innovation Suite Cognitive Service provides auto-categorization, auto-assignment, and chatbot capabilities. To help you improve the cognitive service data sets for these capabilities, BMC Helix Innovation Suite version 18.11 provides a tool that tests the data sets and reports standard machine learning metrics.

    Benefits

    • Requires no prior knowledge of data science.
    • Evaluates the cognitive service by using standard machine learning metrics.
    • Identifies the exact problem areas so that you can correct the data sets and improve the performance of the cognitive service.
    • Provides a history of test results.

    Metrics derived after testing the cognitive data sets

    • Accuracy: Accuracy is the ratio of the number of correct predictions to the total number of input samples.
      For example, if the test results indicate that 9 out of 10 variations of an increase RAM request are predicted correctly, the accuracy is 9/10 = 0.9.
    • Recall: Recall is the number of correct positive results divided by the total number of relevant samples, that is, all samples that the cognitive service should have identified.
      For example, for a search query that contains increase RAM, the system returns 10 results that contain both increase and RAM, and 8 of those results include the phrase increase RAM. If the data set contains 30 samples that are actually related to increase RAM (the 8 returned plus 22 that the system missed), the recall is 8/30 ≈ 0.27.
      A higher recall indicates more effective data sets.
    • Precision: Precision is the number of correct positive results divided by the number of positive results predicted by the cognitive service.
      For example, for the same search query, the system returns 10 results that contain both increase and RAM, and 8 of those results include the phrase increase RAM. In this case, the precision is 8/10 = 0.8.
      A higher precision indicates more effective data sets.
    • F-score: F-score is the harmonic mean of precision and recall. F-score reaches its best value at 1 (perfect precision and recall) and its worst value at 0.
      Traditionally, F-score is calculated as F = 2 × (Precision × Recall) / (Precision + Recall).
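    The following minimal Python sketch makes the arithmetic concrete. It computes each metric from raw counts, using the illustrative numbers from the examples: 9 of 10 correct predictions, 8 relevant results out of 10 returned, and 30 relevant samples in total. The function names are hypothetical. The sketch illustrates only the formulas and does not show how BMC Helix Innovation Suite computes them internally.

```python
# Minimal sketch of the four metrics, computed from raw counts.
# Function names are hypothetical; the counts are the illustrative
# numbers from the examples above, not real test results.


def accuracy(correct: int, total: int) -> float:
    """Ratio of correct predictions to all input samples."""
    return correct / total


def precision(true_positives: int, predicted_positives: int) -> float:
    """Correct positive results divided by all positive results the service predicted."""
    return true_positives / predicted_positives


def recall(true_positives: int, relevant: int) -> float:
    """Correct positive results divided by all relevant samples in the data set."""
    return true_positives / relevant


def f_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall; returns 0 when both are 0."""
    if p + r == 0:
        return 0.0
    return 2 * p * r / (p + r)


if __name__ == "__main__":
    acc = accuracy(correct=9, total=10)                      # 9 of 10 variations predicted correctly
    p = precision(true_positives=8, predicted_positives=10)  # 8 of 10 returned results are relevant
    r = recall(true_positives=8, relevant=30)                # 30 relevant samples exist in total
    f = f_score(p, r)
    print(f"accuracy={acc:.2f} precision={p:.2f} recall={r:.2f} f_score={f:.2f}")
```

    With these numbers, F = 2 × (0.8 × 8/30) / (0.8 + 8/30) = 0.4. A low recall pulls the harmonic mean down even when precision is high.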

    The following image is an example of the test results file:

    [Image: example test results file (CSV)]

    The following image is an example of a test results file that shows the exact problem areas where the cognitive service did not predict correctly:

    [Image: example test results file showing incorrect predictions]
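    Suppose you have test results in a CSV file and want to inspect the problem areas yourself. A minimal Python sketch like the following could compute per-category precision and recall and list the samples that were predicted incorrectly. It assumes hypothetical `text`, `expected`, and `predicted` columns. The actual columns of the BMC test results file may differ, and the sample rows and category names are made up for illustration.

```python
import csv
from collections import Counter
from typing import Iterable, Mapping


def summarize(rows: Iterable[Mapping[str, str]]) -> dict:
    """Per-category precision and recall, plus the rows that were predicted incorrectly.

    Assumes each row has hypothetical 'text', 'expected', and 'predicted' keys;
    the real test results file may use different column names.
    """
    tp = Counter()         # correct predictions per category
    predicted = Counter()  # how often each category was predicted
    expected = Counter()   # how often each category was the correct answer
    errors = []            # rows where the prediction did not match
    for row in rows:
        exp, pred = row["expected"], row["predicted"]
        expected[exp] += 1
        predicted[pred] += 1
        if exp == pred:
            tp[exp] += 1
        else:
            errors.append(row)
    per_category = {}
    for cat in sorted(set(expected) | set(predicted)):
        p = tp[cat] / predicted[cat] if predicted[cat] else 0.0
        r = tp[cat] / expected[cat] if expected[cat] else 0.0
        per_category[cat] = {"precision": p, "recall": r}
    return {"per_category": per_category, "errors": errors}


def load_results(path: str) -> list[dict]:
    """Read a CSV file into a list of dicts keyed by the header row."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


if __name__ == "__main__":
    # Hypothetical rows and category names, for illustration only.
    sample = [
        {"text": "please increase RAM on my laptop", "expected": "Hardware", "predicted": "Hardware"},
        {"text": "need more memory", "expected": "Hardware", "predicted": "Software"},
        {"text": "install an office suite", "expected": "Software", "predicted": "Software"},
    ]
    report = summarize(sample)
    for cat, m in report["per_category"].items():
        print(f"{cat}: precision={m['precision']:.2f} recall={m['recall']:.2f}")
    for row in report["errors"]:
        print("mispredicted:", row["text"], "| expected", row["expected"], "| got", row["predicted"])
```

    Categories with low recall are the ones whose samples the service tends to miss. Categories with low precision are the ones it assigns too often. Both point to training samples that are worth reviewing.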

    For more information, see Leveraging machine learning metrics to improve cognitive service data sets.