Subject: UTF-8 parsing issue: No mapping for the Unicode character exists in the target multi-byte code page.
The following error occurs intermittently:
<!-- ++ 18:28:52 06.08.2013 ++ ++ XMPP/SendXml ++ ++ -->
<iq id="MX_9" type="get" xmlns="jabber:client">
<query xmlns="jabber:iq:roster" />
</iq>
<!-- !! 18:28:52 06.08.2013 !! !! XMPP/Error !! !! -->
System.ArgumentOutOfRangeException: No mapping for the Unicode character exists in the target multi-byte code page.
at Windows.Storage.Streams.DataReader.ReadString(UInt32 codeUnitCount)
at Matrix.Net.ClientSocket.#=qgJNXI0mSiTn0jJtXnbbtaXhf_aH_s$hK_aH0R2ryLbk=(UInt32 #=qNgllZZgtaERkuEdpmbMd2A==, DataReader #=q7gKjvNH6BmgzOEL4UnltqQ==)
at Matrix.Net.ClientSocket.#=qz3YXVyG7Ca6euUSq9ErtzA==(IAsyncOperationWithProgress`2 #=qN2QS$OxFpI_4_ipPawmcag==, AsyncStatus #=qVHcWN9EoMF59r6A_9tYgQQ==)
After that, MatriX closes the stream.
The cause is that the server's roster response contains non-ASCII characters, which UTF-8 encodes as multi-byte sequences. As far as I can tell from the stack trace, MatriX calls await dataReader.LoadAsync(...) and then dataReader.ReadString(...). When LoadAsync returns a buffer whose last byte is the first byte of a multi-byte sequence, ReadString is asked to decode a chunk that ends (and the following chunk begins) in the middle of a character, and it throws. The following code reproduces the problem with a Unicode string and a DataReader over it:
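// Needs: using System.Diagnostics; using System.Runtime.InteropServices.WindowsRuntime;
//        using System.Text; using Windows.Storage.Streams;
// The "?????" below was presumably non-ASCII text that this post did not preserve;
// put any non-ASCII characters there to reproduce.
const string problemString = "UTF8. -- ?????.";
IBuffer buffer = Encoding.UTF8.GetBytes(problemString).AsBuffer();
using (DataReader dr = DataReader.FromBuffer(buffer))
{
    uint bytesToRead = 4; // use buffer.Length instead and everything is fine
    while (dr.UnconsumedBufferLength > 0)
    {
        string sres = dr.ReadString(bytesToRead); // the second call fails
        Debug.WriteLine(sres);
    }
}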
Add the code above to the OnLaunched event handler of a new Windows Store application and it fails with the same exception. A fix would probably avoid dataReader.ReadString altogether: read the packet's raw bytes first and then decode them as UTF-8.
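To illustrate the kind of fix I mean, here is a minimal sketch. I do not know how MatriX is structured internally, so the class name ChunkedUtf8Reader and its Append method are made up for this example. It relies on System.Text.Decoder, which remembers the bytes of an incomplete sequence between calls, so a character split across two reads is emitted once its remaining bytes arrive.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of MatriX: decodes UTF-8 that arrives
// in arbitrary chunks without failing on split multi-byte characters.
public sealed class ChunkedUtf8Reader
{
    // A Decoder is stateful: trailing bytes of an incomplete sequence
    // are buffered and combined with the start of the next chunk.
    private readonly Decoder _decoder = Encoding.UTF8.GetDecoder();

    // Decodes the first `count` bytes of `chunk` and returns the characters
    // that are complete so far (possibly an empty string).
    public string Append(byte[] chunk, int count)
    {
        // GetMaxCharCount gives a worst-case size for the output buffer.
        var chars = new char[Encoding.UTF8.GetMaxCharCount(count)];
        int written = _decoder.GetChars(chunk, 0, count, chars, 0);
        return new string(chars, 0, written);
    }
}
```

In the socket loop this would presumably mean copying the loaded bytes out with dataReader.ReadBytes(byte[]) after each LoadAsync, and passing them to Append instead of calling ReadString.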
Until this issue is resolved, MatriX for WinRT may fail and close the stream at random whenever incoming data contains non-ASCII characters. (By the way: it might be preferable not to close the entire stream when parsing a single chunk fails.)