> I consider "" a string literal just like "foo", and it is documented
> that string literals are UTF-8. So I'd think that "" should be
> UTF-8 like all other string literals.

Yes, but consider the following case. Suppose you're reading
MacRoman strings from a file, concatenating them, and then writing
them back to another file which *must* contain MacRoman text.

    s = "" // just there for illustration, s would start out ""
    tin.Encoding = Encodings.MacRoman
    while not tin.EOF
      temp = tin.ReadLine
      s = s + temp
    wend
    tout.Write s

If s were defined as a UTF-8 string, then the MacRoman strings being
added to it would be converted to UTF-8, resulting in a UTF-8 string
at the end.
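
To make the difference concrete at the byte level, here is a rough sketch. It is in Python rather than REALbasic, only because it is easy to run anywhere. The two function names are made up for illustration, and the code merely models the two behaviors described above; it is not a claim about how REALbasic implements string concatenation internally.

```python
def concat_as_utf8(chunks):
    """Model of a UTF-8 accumulator: each MacRoman chunk is
    transcoded to UTF-8 as it is appended."""
    s = b""
    for raw in chunks:
        s += raw.decode("mac_roman").encode("utf-8")
    return s


def concat_keep_encoding(chunks):
    """Model of an empty accumulator that forces no conversion:
    the MacRoman bytes are appended untouched."""
    s = b""
    for raw in chunks:
        s += raw
    return s


# Two "lines" as they would sit in a MacRoman file.
# In MacRoman, e-acute is the single byte 0x8E.
chunks = ["caf\u00e9".encode("mac_roman"), " au lait".encode("mac_roman")]

kept = concat_keep_encoding(chunks)      # still valid MacRoman
converted = concat_as_utf8(chunks)       # e-acute is now 0xC3 0xA9

print(kept)
print(converted)
# What a MacRoman-only reader would see if the UTF-8 bytes were written out:
print(converted.decode("mac_roman"))
```

In the second case the output file no longer contains MacRoman text: anything that reads it as MacRoman sees two junk characters where the accented letter should be.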
It could be argued that you should not write out a string unless
you've explicitly converted it to the desired encoding, but why
should you have to in a case like this? You've only added MacRoman
strings, so the result should be MacRoman!
I think the special case of "" being considered ASCII is much better
than other special cases you'd have to come up with if "" were
considered UTF-8.
--
-Thomas
Personal web page: http://homepage.mac.com/thomasareed/
My shareware: http://www.bitjuggler.com/
REALbasic page: http://www.bitjuggler.com/extra/
There are 10 kinds of people in the world -- those who understand binary
numbers and those who don't.