Technical Notes
When you use Verastream to integrate different platforms and databases, differences in character encoding can cause problems. These differences are most often seen when you work with XML or integrate platforms and databases from different countries. This technical note discusses XML character encoding and makes suggestions for working in this environment.
XML messages can use various character encodings, such as UTF-8, Unicode (UTF-16), or ISO-Latin1. Because all of these encodings assign the same values to the first 128 characters (the ASCII characters), and UTF-8 and ISO-Latin1 even store them as identical single bytes, you may not notice any problems when your software works with more than one encoding until you start to use special characters (such as ë or Ä) that are outside the ASCII set.
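The following Python sketch illustrates this with the standard library's codecs only; it is not Verastream code. It shows that ASCII text looks the same in UTF-8 and ISO-Latin1, while a character such as ë does not. The helper name `encoded_bytes` is hypothetical.

```python
# encoded_bytes is an illustrative helper, not a Verastream API.
def encoded_bytes(text):
    """Return the byte form of text in UTF-8, ISO-Latin1 and UTF-16 (big-endian, no BOM)."""
    return {enc: text.encode(enc) for enc in ("utf-8", "latin-1", "utf-16-be")}

ascii_only = encoded_bytes("abc")
accented = encoded_bytes("ë")

# ASCII text: UTF-8 and ISO-Latin1 produce identical bytes.
print(ascii_only["utf-8"] == ascii_only["latin-1"])  # True

# Outside ASCII the encodings diverge.
print(accented["utf-8"], accented["latin-1"])  # b'\xc3\xab' b'\xeb'

# Decoding UTF-8 bytes as ISO-Latin1 silently produces the wrong characters.
print("ë".encode("utf-8").decode("latin-1"))  # Ã«
```

The last line is the typical symptom of a mismatch: no error is raised, but the text is corrupted.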
Verastream assumes that all its strings are in the native character set encoding (the character set used by the operating system where Verastream Integration Broker is running), which is either ISO-Latin1 or EBCDIC. Therefore, if you use a script to manipulate an XML message, rather than simply passing the XML along, you need to verify that you are using the correct character encoding.
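One way to make that check explicit in a script is to decode the message with the encoding it actually uses and then re-encode it strictly into the native character set, so that unrepresentable characters raise an error instead of being corrupted. This Python sketch assumes the `latin-1` codec for ISO-Latin1 and `cp037` as a stand-in EBCDIC code page; `to_native` is a hypothetical helper, not part of Verastream.

```python
# to_native is an illustrative helper, not a Verastream API.
def to_native(message: bytes, message_encoding: str, native_encoding: str) -> bytes:
    """Re-encode a message from its own encoding into the native encoding.

    Raises UnicodeEncodeError if a character has no equivalent in the
    native character set, instead of silently corrupting the data.
    """
    text = message.decode(message_encoding)
    return text.encode(native_encoding)  # "strict" error handling is the default


utf8_message = "<name>Zoë</name>".encode("utf-8")
print(to_native(utf8_message, "utf-8", "latin-1"))  # b'<name>Zo\xeb</name>'

# The euro sign does not exist in ISO-Latin1, so the conversion fails loudly.
try:
    to_native("<price>10 €</price>".encode("utf-8"), "utf-8", "latin-1")
except UnicodeEncodeError as err:
    print("cannot convert to native:", err.reason)
```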
When working with character sets, consider the following information:
It is generally best to use the Binary form (and therefore a field of type byte) to handle XML messages in your application. However, if you need to manipulate the XML message manually with string functions in a script, you may need to choose the Text representation and use the corresponding Text methods to hand the XML message to an XML component.
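To illustrate the difference outside Verastream, the sketch below uses Python's standard `xml.etree.ElementTree` parser as a stand-in for an XML component (an assumption for illustration only). Handing the parser raw bytes, as in the Binary approach, lets it read the encoding from the XML declaration itself; with the Text approach, the script must decode correctly first and hand over a string without a declaration.

```python
import xml.etree.ElementTree as ET

# Raw bytes of a message encoded in ISO-Latin1, including its declaration.
raw = b'<?xml version="1.0" encoding="ISO-8859-1"?><name>Zo\xeb</name>'

# Binary approach: the parser reads the declaration and decodes the bytes itself.
print(ET.fromstring(raw).text)  # Zoë

# Text approach: decode first, drop the declaration, then manipulate as a string.
text = raw.decode("latin-1")
body = text[text.index("?>") + 2:]        # everything after the declaration
body = body.replace("Zoë", "Zoë Smith")    # ordinary string manipulation
print(ET.fromstring(body).text)  # Zoë Smith
```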
For example, on Microsoft Windows, a text file whose lines end in CR-LF results in a field that contains a single LF character at each line ending.
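Python's text-mode I/O performs a comparable line-ending translation, which makes the effect easy to observe. This sketch demonstrates the general behavior only; it is not how Verastream itself converts the field.

```python
import io

raw = b"line one\r\nline two\r\n"  # bytes as stored in a Windows text file

# Binary read: the bytes are kept exactly, including both CR and LF.
binary_field = io.BytesIO(raw).read()

# Text read: universal-newline translation turns each CR-LF into a single LF.
text_field = io.TextIOWrapper(io.BytesIO(raw), encoding="latin-1").read()

print(len(binary_field), len(text_field))  # 20 18
```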
The byte values in a text field also depend on the native character set. In a normal ASCII text file read on Microsoft Windows, a byte with decimal value 98 represents 'b', whereas on OS/390, where the native character set is EBCDIC, a byte with decimal value 130 represents 'b'.
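These values can be checked with Python's built-in codecs. The sketch assumes code page 037 (`cp037`) as a representative EBCDIC code page; check which code page your host actually uses.

```python
# The same character is a different byte value in each native character set.
ascii_b = "b".encode("latin-1")[0]   # ISO-Latin1 / ASCII
ebcdic_b = "b".encode("cp037")[0]    # EBCDIC, code page 037 (assumed)
print(ascii_b, ebcdic_b)  # 98 130

# The EBCDIC byte is only 'b' when decoded with an EBCDIC code page.
print(bytes([130]).decode("cp037"))          # b
print(repr(bytes([130]).decode("latin-1")))  # '\x82' (a control character)
```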
If an XML message in a field begins with a byte order mark (BOM) and you use a "<name>Text" method (a method whose name ends in "Text") to read it into Verastream's XML parser, the BOM is converted incorrectly. To avoid this problem, use the "<name>Binary" method to hand the field over to the XML parser. With this method, the BOM is not translated and is properly recognized by the XML parser.
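The sketch below reproduces the problem with Python's standard `xml.etree.ElementTree` parser standing in for Verastream's XML parser (an assumption made for illustration). When the UTF-8 BOM bytes are read as native ISO-Latin1 text, they become three ordinary characters that the parser rejects; when the raw bytes are handed over unchanged, the parser recognizes the BOM.

```python
import codecs
import xml.etree.ElementTree as ET

raw = codecs.BOM_UTF8 + "<name>Zoë</name>".encode("utf-8")

# Binary hand-over: the parser sees the BOM bytes and decodes the message as UTF-8.
print(ET.fromstring(raw).text)  # Zoë

# Text hand-over: the bytes were converted as if they were ISO-Latin1,
# so the BOM becomes the characters "ï»¿" in front of the root element.
as_native_text = raw.decode("latin-1")
print(as_native_text[:3])  # ï»¿
try:
    ET.fromstring(as_native_text)
except ET.ParseError as err:
    print("parser rejected the message:", err)
```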
If you want to construct or manipulate an XML message using string manipulation and hand it over to an XML component using a "<name>Text" method, do not include an XML declaration in that field. That is, the first line should not read <?xml ...?>.
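A minimal sketch of building such a message with string operations, again using Python's standard library as a stand-in. `build_customer_message` is a hypothetical helper; the string starts directly with the root element, and values are escaped with `xml.sax.saxutils.escape` so that characters such as & do not break the markup.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# build_customer_message is an illustrative helper, not a Verastream API.
def build_customer_message(name: str) -> str:
    """Assemble an XML message as a string, with no <?xml ...?> declaration."""
    # escape() replaces &, < and > with entity references.
    return "<customer><name>" + escape(name) + "</name></customer>"

message = build_customer_message("Zoë & Co")
print(message)                                  # <customer><name>Zoë &amp; Co</name></customer>
print(ET.fromstring(message).findtext("name"))  # Zoë & Co
```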
If you read the document from a file, or receive it in a non-binary form (so that the field of type text or string contains native characters), and you want to perform string manipulation, remove the <?xml ... ?> XML declaration line, because it may declare a character encoding that does not match the native characters in the field.
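Removing the declaration can be done with a regular expression before any other string manipulation. This is a sketch: `strip_xml_declaration` is a hypothetical helper, and it assumes the declaration, if present, is the first thing in the text (optionally preceded by whitespace). Requiring whitespace after `xml` keeps it from removing processing instructions such as `<?xml-stylesheet ...?>`.

```python
import re

# Matches an XML declaration such as <?xml version="1.0" encoding="UTF-8"?>
# at the start of the text, plus any whitespace that follows it.
_XML_DECLARATION = re.compile(r"^\s*<\?xml\s[^>]*\?>\s*")

# strip_xml_declaration is an illustrative helper, not a Verastream API.
def strip_xml_declaration(text: str) -> str:
    """Return text without a leading <?xml ...?> declaration."""
    return _XML_DECLARATION.sub("", text, count=1)

doc = '<?xml version="1.0" encoding="UTF-8"?>\n<name>Zoë</name>'
print(strip_xml_declaration(doc))  # <name>Zoë</name>
```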