9 Replies Latest reply on Oct 16, 2018 3:29 AM by Rémi Chaffard

    json decode utf8 strings

    Rémi Chaffard

      Hi,

       

      I just got an error in the OpenShift patterns when they try to decode the JSON result of a command:

       

      RuleError on rule tpl_defn__KubernetesFunctions__fn__getResources_body_0 due to: Error in action tpl.json.decode -- UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 219: ordinal not in range(128)
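The message is Python's standard error for decoding non-ASCII bytes with the ascii codec; 0xe2 is the lead byte of many three-byte UTF-8 sequences, such as typographic quotes. A minimal reproduction in plain Python, outside the appliance and with a made-up sample string (the real command output is not shown in this thread), might look like this:

```python
# Hypothetical sample: a UTF-8 encoded right single quotation mark (U+2019)
# inside otherwise plain ASCII text. Not the actual OpenShift output.
raw = "it\u2019s ready".encode("utf-8")


def try_ascii_decode(data):
    """Return the decoded text, or the error message if ASCII decoding fails."""
    try:
        return data.decode("ascii")
    except UnicodeDecodeError as exc:
        return str(exc)


message = try_ascii_decode(raw)   # the same kind of error as in the rule failure
decoded = raw.decode("utf-8")     # decoding as UTF-8 succeeds
print(message)
print(decoded)
```

This only illustrates where such an error comes from; it says nothing about how json.decode is implemented inside the product.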

       

      I don't think we can avoid having UTF-8 characters in the output. Is it possible for the json.decode function to handle UTF-8?

       

      Thanks a lot

      Rémi

        • 1. Re: json decode utf8 strings
          Andrew Waters

          There is a defect associated with this (DRUD1-23580). It is the JSON decoder that gets confused when there are Unicode escape sequences in a UTF-8 string.

          • 2. Re: json decode utf8 strings
            Rémi Chaffard

            OK good to know.

            Is there any way to modify the string with TPL code before passing it to the decoder, so that we can at least parse something and avoid stopping the whole pattern?

            Thanks

            • 3. Re: json decode utf8 strings
              Andrew Waters

              Do you have some Unicode escape sequences in the JSON? That is, a backslash and u followed by 4 hex characters, i.e. matching the regex \\u[a-fA-F0-9]{4}

               

              Replacing those works for the cases I have seen.
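As a rough illustration of the pattern described above, here is a Python sketch (not TPL) that finds and removes such sequences in one pass. The sample string and the helper name `strip_unicode_escapes` are made up for the example; note that the escaped characters are simply dropped, not converted.

```python
import json
import re

# Literal backslash, "u", then exactly four hex digits.
UNICODE_ESCAPE = re.compile(r"\\u[a-fA-F0-9]{4}")


def strip_unicode_escapes(text):
    """Remove every backslash-u escape sequence from text in a single pass."""
    return UNICODE_ESCAPE.sub("", text)


# Hypothetical JSON text containing two escape sequences.
sample = r'{"name": "caf\u00e9", "op": "a\u0026b"}'

found = UNICODE_ESCAPE.findall(sample)     # the two sequences, in order
cleaned = strip_unicode_escapes(sample)    # escapes removed, still valid JSON
parsed = json.loads(cleaned)
print(found)
print(parsed)
```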

              • 4. Re: json decode utf8 strings
                Rémi Chaffard

                I tried this (not very efficient, since it replaces the same sequence multiple times):

                 

                unicodes := regex.extractAll(output,regex'(\\u[a-fA-F0-9]{4})');
                if unicodes then
                  for u in unicodes do
                    output := text.replace(output,u,'');
                  end for;
                end if;
                
                // Convert JSON data
                result := json.decode(output);
                

                 

                The problem is that after this the result variable is None and the rest of the pattern fails.

                 

                What did I do wrong here?

                Thanks a lot

                Rémi

                • 5. Re: json decode utf8 strings
                  Andrew Waters

                  From just looking at it, nothing obvious. I would try logging the output before and after and comparing the two with a diff tool.

                   

                  It could be something horrible like the output containing \\u, so that you end up with an invalid escape character.

                  • 6. Re: json decode utf8 strings
                    Rémi Chaffard

                    OK, I wanted to avoid this since the file is pretty big, but I will try. How should I then proceed to understand what the decode error is? Is there any way to simulate it through Python code or something?

                    Without an error from the json.decode function, there is no way to find the problem just by reading the file.
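One possible way to locate such a failure outside the appliance is to feed the saved output to Python's standard json module, which reports where parsing stopped. This is only a sketch: the helper name `explain_json_error` and the sample input are invented for illustration, and Python's parser is not guaranteed to behave exactly like json.decode in TPL.

```python
import json


def explain_json_error(text, context=30):
    """Return None if text parses, else a short report around the failure."""
    try:
        json.loads(text)
        return None
    except json.JSONDecodeError as exc:
        # exc.pos is the character offset where the parser gave up.
        start = max(exc.pos - context, 0)
        snippet = text[start:exc.pos + context]
        return "%s at line %d column %d: %r" % (
            exc.msg, exc.lineno, exc.colno, snippet)


# Hypothetical broken input: a lone backslash before "c" is not a valid escape.
broken = r'{"url": "a\c"}'
report = explain_json_error(broken)
print(report)
```

For a large file, the snippet around the reported offset is usually enough to see what is wrong without reading the whole output.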

                    • 7. Re: json decode utf8 strings
                      Rémi Chaffard

                      OK, I found out why.

                      The output contains nested JSON, and the nested part is escaped, which means that Unicode escape sequences like \u0026 are in fact \\u0026. After the replacement, a stray backslash remains, which makes the parser fail.
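The effect can be reproduced with a small Python sketch. The data here is made up: an inner JSON document that uses \u0026 for "&" is embedded as a string value, which doubles its backslashes, and the naive replacement then leaves one of them behind.

```python
import json
import re

# Hypothetical data: the inner document is JSON text that uses \u0026 for "&";
# embedding it as a string value doubles its backslashes.
inner = r'{"url":"a\u0026c"}'
outer = json.dumps({"annotation": inner})

# Removing only backslash-u-XXXX leaves the first backslash of the doubled pair.
naive = re.sub(r"\\u[a-fA-F0-9]{4}", "", outer)


def parses(text):
    """Return True if text is valid JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(outer)
print(naive)
print(parses(outer), parses(naive))
```

The leftover backslash followed by "c" is not a valid JSON escape, so the cleaned text no longer parses even though the original did.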

                       

                      I used https://jsonlint.com/ to check.

                       

                      Rémi

                      • 8. Re: json decode utf8 strings
                        Rémi Chaffard

                        OK, one question then: is there any easy way to sort a list in TPL?

                        The piece of code I'm using replaces the Unicode sequences in the order they occur in the input string. This means that we may replace \u0026 before \\u0026, which is incorrect because occurrences of the second have by then already been reduced to a lone \.

                         

                        I need a way to sort the list of sequences to replace, starting with those that have the most \ characters, so that the replacements happen in the correct order.

                         

                        Or is there any way to replace directly with a regex?

                         

                        Thanks

                        Rémi

                        • 9. Re: json decode utf8 strings
                          Rémi Chaffard

                          Hi,

                           

                          I wrote some dirty code to handle that and finish my tests. It looks like this, if anyone is interested:

                           

                          // Collect every escaped Unicode sequence, including any run of leading backslashes
                          unicodes := regex.extractAll(output,regex'(\\+u[a-fA-F0-9]{4})');
                          replacements := [];
                          if unicodes then
                            // Build a sorted list of the distinct sequences
                            for u in unicodes do
                              if not u in replacements then
                                idx := findIndexToInsert(replacements,u);
                                if idx = 0 then
                                  replacements := [u] + replacements;
                                elif idx = size(replacements) then
                                  replacements := replacements + [u];
                                else
                                  replacements := replacements[:idx] + [u] + replacements[idx:];
                                end if;
                              end if;
                            end for;

                            // Remove each sequence from the output, in sorted order
                            for r in replacements do
                              output := text.replace(output,r,'');
                            end for;
                          end if;
                          

                           

                          And the findIndexToInsert function:

                           

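                          define findIndexToInsert(tab, val) -> idx
                            '''
                            Return the index of the first item in tab that is
                            greater than or equal to val, or the size of tab
                            if there is no such item.
                            '''
                            idx := 0;
                            for item in tab do
                              if item >= val then
                                break;
                              end if;
                              idx := idx + 1;
                            end for;
                            return idx;
                          end define;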
                          

                           

                          All of this code is about sorting the list of Unicode sequences to replace, so that those with the most \ characters are replaced first.
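For comparison, the same idea can be sketched in Python (not TPL), where removing duplicates and sorting are built in. The function name and sample string are made up for the example. The sketch also shows that a single regex substitution with the `\\+u` pattern gives the same result, although whether TPL offers an equivalent regex replace is not established in this thread.

```python
import json
import re

# One or more backslashes, "u", then exactly four hex digits.
ESCAPE_RUN = re.compile(r"\\+u[a-fA-F0-9]{4}")


def remove_escapes_longest_first(text):
    """Remove escaped Unicode sequences, those with most backslashes first."""
    unique = set(ESCAPE_RUN.findall(text))             # drop duplicates
    ordered = sorted(unique, key=len, reverse=True)    # longest sequences first
    for sequence in ordered:
        text = text.replace(sequence, "")
    return text


# Hypothetical input with a plain escape and a doubly escaped (nested) one.
sample = r'{"a": "x\u0026y", "nested": "{\"b\":\"p\\u0026q\"}"}'

cleaned = remove_escapes_longest_first(sample)
single_pass = ESCAPE_RUN.sub("", sample)   # same result with one substitution
parsed = json.loads(cleaned)
print(cleaned)
print(parsed)
```

Sorting by length works here because every sequence ends in the same six characters, so a longer match always means more backslashes.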

                          I have not tested it in depth, but it does what I need for now. I will wait for the defect to be resolved.

                           

                          It would be good anyhow to have some more list management functions in TPL, such as sorting and removing duplicates; they can help sometimes. What do you think?

                           

                          Thanks

                          Rémi