Re: How to write UTF-8 encoded XML data to XML column?

Hi.

Nigel Charman wrote:
I am having trouble writing XML data to an XML column using a PreparedStatement, when the XML data contains the XML declaration
<?xml version="1.0" encoding="UTF-8"?>.


As an example, the following Java code succeeds:

Statement stmt = con.createStatement();
stmt.execute("Insert into xmltab VALUES('<?xml version=\"1.0\" encoding=\"UTF-8\"?><root><a/></root>')");


while this version fails:

PreparedStatement ps = con.prepareStatement("Insert into xmltab VALUES(?)");
ps.setString(1, "<?xml version=\"1.0\" encoding=\"UTF-8\"?><root><a/></root>");
ps.execute();


with the exception:
com.microsoft.sqlserver.jdbc.SQLServerException: XML parsing: line 1, character 38, unable to switch the encoding
[...]

I can't give you any help with your specific JDBC driver problem.
But maybe I can give you a hint regarding XML encoding.

IMHO you are mixing two layers - character encoding and strings.

If you have bytes (byte[] or InputStream) you will need to decode
the bytes to get text, e.g. a String.
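A minimal sketch of that decoding step, in case it helps. The class and method names are made up for illustration; only `new String(byte[], Charset)` and `StandardCharsets` are standard Java. It shows that the same two bytes give different text depending on which charset you decode them with, which is why the encoding matters at the byte level only.

```java
import java.nio.charset.StandardCharsets;

class DecodeDemo {
    // Decode raw bytes as UTF-8; the charset is needed only at this byte-to-text boundary.
    static String decodeUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Decode the same bytes with the wrong charset, for comparison.
    static String decodeLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // 0xC3 0xA4 is the UTF-8 byte sequence for U+00E4 (a with diaeresis).
        byte[] raw = {(byte) 0xC3, (byte) 0xA4};
        System.out.println(decodeUtf8(raw).length());   // one character
        System.out.println(decodeLatin1(raw).length()); // two unrelated characters
    }
}
```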

If you have a String then there is no such thing as an "encoding".
OK, under the hood the JVM holds every String in memory as UTF-16,
but this is an implementation detail.

So something like:

String foo = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";

is a contradiction. You already have a String (= a sequence of Unicode characters) and do not need an encoding any more.

The XML Specification defines the encoding declaration as a help for XML parsers
that parse _binary_ data. There are a lot of XML parser APIs that allow you to parse Strings, but in most cases their handling of the encoding declaration is undefined or simply broken.


So to be sure I would always recommend not parsing XML from Strings.
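As a rough illustration of parsing bytes instead of a String, here is a sketch using the JDK's built-in DOM parser. `ParseBytesDemo` and `parse` are hypothetical names; the parser calls (`DocumentBuilderFactory`, `DocumentBuilder.parse(InputStream)`) are standard JAXP. The parser is handed the raw bytes and works out the encoding from the XML declaration itself.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

class ParseBytesDemo {
    // Parse XML from bytes so the parser can apply the declared encoding itself.
    static Document parse(byte[] xmlBytes) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            return factory.newDocumentBuilder().parse(new ByteArrayInputStream(xmlBytes));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Wrapped only to keep the sketch short.
            throw new IllegalStateException("XML could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><root><a>\u00e4</a></root>";
        // Encode with the same charset the declaration names, then parse the bytes.
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        Document doc = parse(bytes);
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```

The important part is that the bytes and the declaration agree: the String is encoded as UTF-8 exactly once, at the point where it becomes binary data.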

Back to the JDBC problem. It seems that the driver is parsing the string
value and stumbles over exactly this problem: a mismatch between the XML declaration and the String (or the internal String encoding).


Have you tried to use a byte[] with the text UTF-8 encoded instead of the String?
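What I mean is roughly the following. This is an untested sketch: `XmlInsertSketch`, `toUtf8` and `insertXml` are made-up names, the table `xmltab` is taken from your example, and whether your driver accepts bytes for an XML column is an assumption you would have to verify. `PreparedStatement.setBytes` and `String.getBytes(Charset)` are standard API.

```java
import java.nio.charset.StandardCharsets;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

class XmlInsertSketch {
    // Encode the XML text so that the bytes really are UTF-8, as the declaration claims.
    static byte[] toUtf8(String xml) {
        return xml.getBytes(StandardCharsets.UTF_8);
    }

    // Hypothetical helper: binds the XML as bytes instead of as a String.
    // Assumes the driver and server convert the binary value to XML.
    static void insertXml(Connection con, String xml) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement("Insert into xmltab VALUES(?)")) {
            ps.setBytes(1, toUtf8(xml));
            ps.execute();
        }
    }
}
```

You would call it as `XmlInsertSketch.insertXml(con, xml)` with the same `con` as in your code.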

Ciao, Olli