
Re: bug about nonsense character and ?; Adjust-click suggestion

From: Bernard Boase  Date: 11 Jan 2016 20:37
In Reply To: bug about nonsense character and £; Adjust-click suggestion (Jim Nagel)
Replies: Re: bug about nonsense character and ?; Adjust-click suggestion (Brian Howlett)
Re: bug about nonsense character and ?; Adjust-click suggestion (Jim Nagel)

On 9 Jan, mpro@... typed:

> Problem:  The body of the message contains a couple of pound signs
> (£), and the Resend routine each time prefixed a nonsense character
> (Â) to the £.  By the time I had reiterated the resend routine 10
> times, the message had a long string of nonsense  characters.

This is (another) interesting bug or perhaps just lack of consistency 
in the MPro/RISC OS combo, and presumably relates to their handling of 
UTF-8 encoded text. First, the Subject line:

As received here (ARMX6, MPro 7.08) Jim's post has the Subject line 
shown in the Group index like this:

     bug about nonsense character and ; Adjust-click suggestion

But in the displayed email like this:

     bug about nonsense character and ?; Adjust-click suggestion

in both cases with MPro --Choices --Display settings --User interface 
font --Homerton. However, it displays as:

     bug about nonsense character and {?}; Adjust-click 
     suggestion

when MPro --Choices --Display settings --User interface font --DejaVu. 
I've used {?} here to represent a black lozenge containing a white 
question mark. I'd be interested to know what character(s) were sent 
by Jim between "and" and "Adjust-click".

DejaVu is a Unicode font which may not be complete. !XChars shows 
the normal &00-&FF character set for Encoding = Latin1, but those {?} 
characters for &80-&FF when Encoding = UTF8. The email headers 
included:

     Content-Type: text/plain; charset="iso-8859-1"
     Content-Transfer-Encoding: quoted-printable

as opposed to (for example) any of:

     charset="us-ascii", "utf-8"
     Encoding: 8bit, 7bit, base64
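
To illustrate what those two headers imply, here is a minimal Python sketch. The sample body text is made up for illustration (it is not taken from Jim's post), and Python's codecs only approximate whatever MPro does internally. It decodes a quoted-printable =A3 into the single byte &A3, then reads that byte first as iso-8859-1 and then as UTF-8. A lone &A3 is not valid UTF-8, so a lenient decoder substitutes U+FFFD, the replacement character, which may well be the {?} glyph described above.

```python
import quopri

# Hypothetical message body, as it might travel with
# Content-Transfer-Encoding: quoted-printable.
raw_body = b"Price: =A35 each"

# Undo the transfer encoding: "=A3" becomes the single byte &A3.
decoded_bytes = quopri.decodestring(raw_body)

# Read with the declared charset: &A3 is the pound sign.
as_latin1 = decoded_bytes.decode("iso-8859-1")

# Read (wrongly) as UTF-8: a lone &A3 is invalid, so it is
# replaced by U+FFFD, the Unicode replacement character.
as_utf8 = decoded_bytes.decode("utf-8", errors="replace")

print(as_latin1)
print(as_utf8)
```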

The fact is that encoding into UTF-8 a character such as £ (&A3) 
which is within the range &80-&FF requires two bytes. As explained in  
https://en.wikipedia.org/wiki/UTF-8, &A3 becomes &C2A3 which, lo and 
behold, looks like capital A circumflex + pound sign when interpreted 
as if in Latin1 (ISO-8859/1).
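
The two-byte encoding can be checked with a short Python sketch. This uses Python's own utf-8 and latin-1 codecs as stand-ins; it says nothing about how MPro or RISC OS implement the conversion.

```python
pound = "\u00a3"  # the pound sign, code point &A3

# One byte in Latin1, two bytes in UTF-8.
latin1_bytes = pound.encode("latin-1")
utf8_bytes = pound.encode("utf-8")

# Reading the UTF-8 bytes as if they were Latin1 yields
# capital A circumflex followed by the pound sign.
misread = utf8_bytes.decode("latin-1")

print(latin1_bytes.hex(), utf8_bytes.hex())
print(misread)
```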

If such an incorrect interpretation is then encoded into UTF-8, &C2 
becomes &C382, and &A3 again becomes &C2A3, giving a string of four 
bytes that will look like capital A tilde + small w circumflex + 
capital A circumflex + pound sign when interpreted as if in Latin1.

Another capital A circumflex will appear to be prefixed to the pound 
sign every time it is encoded as UTF-8 but subsequently read as if it 
were not. Perhaps this is at the root of Jim's observation.
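
As a rough model of the repeated Resend, the following Python sketch applies "encode as UTF-8, read back as Latin1" ten times. The function names are made up for illustration, and this is only an assumption about the mechanism, not a description of MPro's code. In this strict model every character above &7F doubles on each pass, so the string grows faster than one character per round, but it always ends in capital A circumflex + pound sign. Note that Python's latin-1 codec treats &82 as a control character, so printed output will not show the w circumflex that RISC OS Latin1 displays there.

```python
def misread_round(text: str) -> str:
    """Encode as UTF-8, then wrongly read the bytes as Latin1."""
    return text.encode("utf-8").decode("latin-1")


def undo_round(text: str) -> str:
    """Reverse one misread: back to bytes, then read as UTF-8."""
    return text.encode("latin-1").decode("utf-8")


text = "\u00a3"  # start with a single pound sign
lengths = []
for _ in range(10):
    text = misread_round(text)
    lengths.append(len(text))

print(lengths)
print(repr(text[-2:]))  # the tail is always A circumflex + pound

# The damage is reversible as long as no bytes were lost.
repaired = text
for _ in range(10):
    repaired = undo_round(repaired)
print(repaired)
```

Because each pass is lossless in this model, applying the reverse step the same number of times recovers the original pound sign.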

With MPro --Choices --Message editing --UTF-8 editing unticked, that 
shouldn't (I would have thought) happen, but I've run out of time to 
experiment further just now. Over to others to report what their fonts 
and settings are and whether MPro is (as I suspect) merely trying but 
failing to acknowledge UTF-8 on a system that still has no Unicode 
input editor.

-- 
Bernard

______________________________________________________________________
This message was sent via the messenger-l mailing list
To unsubscribe, mail messenger-l+unsubscribe@...


