Re: bug about nonsense character and ?; Adjust-click suggestion
On 9 Jan, mpro@... typed:
> Problem: The body of the message contains a couple of pound signs
> (£), and the Resend routine each time prefixed a nonsense character
> (Â) to the £. By the time I had reiterated the resend routine 10
> times, the message had a long string of nonsense characters.
This is (another) interesting bug, or perhaps just a lack of consistency
in the MPro/RISC OS combo, and presumably relates to their handling of
UTF-8 encoded text. First, the Subject line:
As received here (ARMX6, MPro 7.08), Jim's post has the Subject line
shown in the Group index like this:
bug about nonsense character and ; Adjust-click suggestion
But in the displayed email like this:
bug about nonsense character and ?; Adjust-click suggestion
in both cases when MPro --Choices --Display settings --User interface
font is set to Homerton. However, it displays as:
bug about nonsense character and {?}; Adjust-click suggestion
when MPro --Choices --Display settings --User interface font is set to
DejaVu.
I've used {?} here to represent a black lozenge containing a white
question mark. I'd be interested to know what character(s) were sent
by Jim between "and" and "Adjust-click".
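For reference, a black lozenge containing a white question mark is the
usual glyph for U+FFFD REPLACEMENT CHARACTER, which a UTF-8 decoder
substitutes for bytes it cannot decode. A lone byte in the range &80-&FF
is never valid UTF-8 by itself. The Python sketch below (Python chosen
purely for illustration; it says nothing about how MPro or RISC OS
actually decode text) shows that decoding any such single byte as UTF-8,
with errors replaced, yields U+FFFD, whereas Latin1 maps every byte to a
character:

```python
# U+FFFD is what Python's UTF-8 decoder inserts for undecodable input
# when errors="replace" is used.
REPLACEMENT = "\ufffd"

def decode_lone_byte_as_utf8(value: int) -> str:
    """Return what a UTF-8 decoder yields for one isolated byte."""
    return bytes([value]).decode("utf-8", errors="replace")

# Every single byte &80-&FF decodes to the replacement character.
lone_high_bytes = {decode_lone_byte_as_utf8(b) for b in range(0x80, 0x100)}
print(lone_high_bytes == {REPLACEMENT})  # True

# By contrast, Latin1 (ISO-8859-1) maps every byte to a character.
print(bytes([0xA3]).decode("latin-1"))   # £
```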
DejaVu is a Unicode font, though it may not be complete. !XChars shows
the normal &00-&FF character set when Encoding = Latin1, but shows those
{?} characters for &80-&FF when Encoding = UTF8.
The email headers included:
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
as opposed to (for example) any of:
charset="us-ascii", "utf-8"
Content-Transfer-Encoding: 8bit, 7bit, base64
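Given those headers, a £ in the body would normally travel as the
quoted-printable escape =A3, i.e. the single Latin1 byte &A3. A minimal
sketch using Python's standard quopri module; the sample line is invented
for illustration and is not taken from Jim's message:

```python
import quopri

# Hypothetical body line as it might appear on the wire with
# Content-Transfer-Encoding: quoted-printable and charset="iso-8859-1".
wire = b"Price: =A35 each"

raw = quopri.decodestring(wire)   # undo quoted-printable: "=A3" -> byte &A3
text = raw.decode("iso-8859-1")   # then apply the declared charset

print(raw)   # b'Price: \xa35 each'
print(text)  # Price: £5 each
```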
The fact is that encoding a character such as £ (&A3), which lies within
the range &80-&FF, into UTF-8 requires two bytes. As explained in
https://en.wikipedia.org/wiki/UTF-8, &A3 becomes &C2A3 which, lo and
behold, looks like capital A circumflex + pound sign when interpreted
as if in Latin1 (ISO-8859-1).
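The &A3 to &C2A3 step, and the Latin1 misreading of it, can be checked in
a few lines of Python (an illustration only, not MPro's code):

```python
pound = "\u00a3"                    # £, code point &A3

utf8 = pound.encode("utf-8")        # two bytes in UTF-8
print(utf8.hex(" ").upper())        # C2 A3

misread = utf8.decode("latin-1")    # the same bytes read as if Latin1
print(misread)                      # Â£
print(misread == "\u00c2\u00a3")    # True: A circumflex + pound sign
```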
If such an incorrect interpretation is then itself encoded into UTF-8,
&C2 becomes &C382 and &A3 again becomes &C2A3, giving a string of four
bytes that will look like capital A tilde + small w circumflex +
capital A circumflex + pound sign when interpreted as if in Latin1.
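The second round can be checked the same way. One caveat, assuming
Python's "latin-1" codec as the stand-in: it implements strict
ISO-8859-1, where &82 is the invisible C1 control U+0082, so it will not
display the small w circumflex described above; the bytes, however, are
the same.

```python
# First misreading: £ encoded as UTF-8, then read as Latin1 -> "Â£".
once = "\u00a3".encode("utf-8").decode("latin-1")

# Encode that misread text as UTF-8 again.
twice = once.encode("utf-8")
print(twice.hex(" ").upper())   # C3 82 C2 A3

# Read back as strict ISO-8859-1: one character per byte.
print([hex(ord(c)) for c in twice.decode("latin-1")])
# ['0xc3', '0x82', '0xc2', '0xa3']
```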
Another capital A circumflex will appear before the pound sign every
time the text is encoded as UTF-8 but subsequently read as if it were
not. Perhaps this is at the root of Jim's observation.
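As a rough model, treating each resend as one hypothetical faulty pass
(encode as UTF-8, then read back as Latin1), the following Python sketch
prints the pass number, the byte length and the last two bytes after
each pass. The resend function here is an assumption for illustration,
not MPro's actual routine:

```python
def resend(text: str) -> str:
    """Hypothetical faulty pass: encode as UTF-8, then read back as Latin1."""
    return text.encode("utf-8").decode("latin-1")

body = "\u00a3"  # start with a single £
for n in range(1, 5):
    body = resend(body)
    raw = body.encode("latin-1")   # recover the underlying bytes
    print(n, len(raw), raw[-2:].hex(" ").upper())
# 1 2 C2 A3
# 2 4 C2 A3
# 3 8 C2 A3
# 4 16 C2 A3
```

In this model the byte just before the pound sign's &A3 is always &C2
(Â), but everything already in front of it is re-encoded on each pass
too, so the total doubles every time; ten passes would give 1024 bytes.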
With MPro --Choices --Message editing --UTF-8 editing unticked, that
shouldn't (I would have thought) happen, but I've run out of time to
experiment further just now. Over to others to report what their fonts
and settings are and whether MPro is (as I suspect) merely trying but
failing to acknowledge UTF-8 on a system that still has no Unicode
input editor.
--
Bernard
______________________________________________________________________
This message was sent via the messenger-l mailing list
To unsubscribe, mail messenger-l+unsubscribe@...