Forum: MatriX and XmppDotNet
Glassmaker #1
Member since Aug 2013 · 8 posts
Subject: UTF-8 parsing issue: No mapping for the Unicode character exists in the target multi-byte code page.
The following problem is happening from time to time:

<!-- ++ 18:28:52 06.08.2013 ++ ++ XMPP/SendXml ++ ++ -->
<iq id="MX_9" type="get" xmlns="jabber:client">
  <query xmlns="jabber:iq:roster" />
</iq>

<!-- !! 18:28:52 06.08.2013 !! !! XMPP/Error !! !! -->
System.ArgumentOutOfRangeException: No mapping for the Unicode character exists in the target multi-byte code page.

   at Windows.Storage.Streams.DataReader.ReadString(UInt32 codeUnitCount)
   at Matrix.Net.ClientSocket.#=qgJNXI0mSiTn0jJtXnbbtaXhf_aH_s$hK_aH0R2ryLbk=(UInt32 #=qNgllZZgtaERkuEdpmbMd2A==, DataReader #=q7gKjvNH6BmgzOEL4UnltqQ==)
   at Matrix.Net.ClientSocket.#=qz3YXVyG7Ca6euUSq9ErtzA==(IAsyncOperationWithProgress`2 #=qN2QS$OxFpI_4_ipPawmcag==, AsyncStatus #=qVHcWN9EoMF59r6A_9tYgQQ==)

After that, MatriX closes the stream.

The cause is that the server's roster response contains non-ASCII characters, which UTF-8 encodes as multi-byte sequences. As far as I can tell from the stack trace, MatriX calls await dataReader.LoadAsync(...) and then dataReader.ReadString(...). Therefore, when LoadAsync loads a buffer whose last byte is the first byte of a multi-byte sequence, a ReadString call on that buffer, or on the next one that starts with the leftover continuation byte, will likely fail, because neither chunk is valid UTF-8 on its own. The following code reproduces the problem with a Unicode string and a DataReader over it:

// Needs: using System.Diagnostics; using System.Text;
//        using System.Runtime.InteropServices.WindowsRuntime; using Windows.Storage.Streams;
const string problemString = "UTF8. -- ?????.";
IBuffer buffer = Encoding.UTF8.GetBytes(problemString).AsBuffer();
using (DataReader dr = DataReader.FromBuffer(buffer))
{
  uint bytesToRead = 4; // use buffer.Length instead and everything is fine
  while (dr.UnconsumedBufferLength > 0)
  {
    // Fails once a 4-byte chunk cuts through a multi-byte sequence.
    string sres = dr.ReadString(bytesToRead);
    Debug.WriteLine(sres);
  }
}

Paste the code above into the OnLaunched event handler of a new Windows Store application and it will throw an exception. The fix would probably involve avoiding dataReader.ReadString altogether: read all of the packet's bytes first, and only then decode them as UTF-8.
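As a rough illustration of the "read bytes, then decode" idea, here is a minimal sketch in plain .NET. It is not MatriX code: ChunkedUtf8Reader and DecodeInChunks are hypothetical names made up for this example. It relies on System.Text.Decoder, which keeps the bytes of an unfinished sequence between calls, so a chunk boundary in the middle of a character does no harm. The test string uses \u escapes for the katakana from the Wikipedia link in the next post.

```csharp
using System;
using System.Text;

// Hypothetical helper (not part of MatriX): decodes UTF-8 that arrives in arbitrary chunks.
public sealed class ChunkedUtf8Reader
{
    // A Decoder is stateful: trailing bytes of an incomplete sequence are
    // buffered internally and completed by the next call.
    private readonly Decoder _decoder = Encoding.UTF8.GetDecoder();

    public string Append(byte[] chunk, int count)
    {
        // GetCharCount takes the buffered state into account without changing it.
        char[] chars = new char[_decoder.GetCharCount(chunk, 0, count)];
        int written = _decoder.GetChars(chunk, 0, count, chars, 0);
        return new string(chars, 0, written);
    }
}

public static class Program
{
    // Feeds the data to the reader in fixed-size chunks, as a socket might deliver it.
    public static string DecodeInChunks(byte[] data, int chunkSize)
    {
        var reader = new ChunkedUtf8Reader();
        var result = new StringBuilder();
        for (int offset = 0; offset < data.Length; offset += chunkSize)
        {
            int count = Math.Min(chunkSize, data.Length - offset);
            byte[] chunk = new byte[count];
            Array.Copy(data, offset, chunk, 0, count);
            result.Append(reader.Append(chunk, count));
        }
        return result.ToString();
    }

    public static void Main()
    {
        // 9 ASCII bytes, three 3-byte katakana, then a period: 4-byte chunks split the katakana.
        const string text = "UTF8. -- \u30b3\u30fc\u30c9.";
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        string decoded = DecodeInChunks(bytes, 4);
        // Reports whether the chunked decode reproduced the original string.
        Console.WriteLine(decoded == text);
    }
}
```

In a WinRT client the chunks would presumably come from DataReader.ReadBytes after each LoadAsync rather than from an in-memory array; that wiring is left out here because it cannot run outside a Windows Store app.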

Until this issue is resolved, MatriX for WinRT may fail intermittently and close the stream whenever the incoming data contains non-ASCII characters. (By the way: it might be preferable not to close the entire stream if something fails while parsing it.)
Glassmaker #2
This forum apparently does not support Unicode characters. To reproduce the problem, visit a web page written in a non-Latin script, such as this one: http://ja.wikipedia.org/wiki/%E3%82%B3%E3%83%BC%E3%83%89%E… , and paste some text from it in place of the question marks:
const string problemString = "UTF8. -- ????????.";
Alex #3
Member since Feb 2003 · 4447 posts · Location: Germany
Yes, there is a ReadString call in MatriX.

I have replaced the ReadString call with ReadBytes and have attached a new MatriX build. Can you please let me know if this build fixes your problems?

Thanks,
Alex
The author has attached one file to this post:
Matrix.zip 350.3 kBytes
This post was edited on 2013-08-06, 19:39 by Alex.
Glassmaker #4
Yes, this build fixed the problem. It no longer fails, and all the Unicode characters seem to be in place. Thank you!
Alex #5
Yes, MatriX is fully Unicode compatible and has its own XML tokenizer, which has no problems handling partial packets or partial Unicode sequences. The only culprit was the ReadString call, which was in the code for debugging purposes.

Thanks for finding this problem.

Alex