RB Text Encoding FAQ

To: REALbasic Network Users Group <realbasic-nug at lists dot realsoftware dot com>
Subject: RB Text Encoding FAQ
From: "Joseph J. Strout" <joe at realsoftware dot com>
Date: Thu, 20 Mar 2003 08:50:15 -0800
As was recently suggested, I've decided to start a FAQ on text encoding in REALbasic 5. I'm going to let this be driven by actual questions. Here's what I have so far, based only on posts I saw today. If you have more questions, please feel free to ask and I'll add the answers to the FAQ.

Best,
- Joe

Frequently Asked Questions
about Text Encoding in REALbasic 5
----------------------------------

1. What encoding are my string literals, constants, etc. in?

All strings in your REALbasic project are compiled as UTF-8. This is a Unicode encoding that uses one byte for ASCII characters, and two to four bytes for non-ASCII characters. It has a number of other handy properties too; for example, a byte in the ASCII range will never appear as part of a multi-byte character.
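
As a rough illustration (a minimal sketch, not tested against every RB 5 release; the variable name is just for the example), comparing Len and LenB on a literal shows the difference between characters and bytes:

   Dim s As String
   s = "café"              // four characters; the last one is non-ASCII
   MsgBox Str(Len(s))      // counts characters
   MsgBox Str(LenB(s))     // counts bytes

If the literal is stored as UTF-8 as described above, I'd expect Len to report 4 and LenB to report 5, since the accented letter takes two bytes.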


2. Which is faster, ConvertEncoding or TextConverter.Convert?

In most cases, ConvertEncoding is much faster than using TextConverter.Convert. ConvertEncoding has a number of optimizations for common cases, such as converting the same string multiple times, or converting from one superset of ASCII to another. (All WorldScript encodings, most Windows encodings, and UTF-8 are supersets of ASCII.)

So, you should usually use ConvertEncoding. If conversion speed is critical to your application, though, measure it both ways and see which performs better in your particular situation.
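
Here's one way such a measurement might look. Treat it as a sketch: the sample string, the loop count, and the choice of MacRoman as the target encoding are all arbitrary assumptions, and you should substitute your own data. Note that this loop converts the same string over and over, which is one of the cases ConvertEncoding is optimized for, so it may flatter ConvertEncoding compared to real-world input.

   Dim i As Integer
   Dim t0, t1, t2 As Double
   Dim src, result As String
   Dim conv As TextConverter

   src = "Some sample text to convert"
   conv = GetTextConverter(Encodings.UTF8, Encodings.MacRoman)

   t0 = Microseconds
   For i = 1 To 10000
     result = ConvertEncoding(src, Encodings.MacRoman)
   Next
   t1 = Microseconds
   For i = 1 To 10000
     result = conv.Convert(src)
   Next
   t2 = Microseconds

   MsgBox "ConvertEncoding: " + Str(t1 - t0) + "  TextConverter: " + Str(t2 - t1)

The two figures are elapsed microseconds for each loop; run it a few times before drawing any conclusions.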


3. How do I get a specific byte into a string?

Use ChrB. ChrB takes a byte value (0-255) and returns a string with undefined encoding, containing exactly that byte. You can build a string containing multiple bytes by concatenating these with the + operator.

Of course, don't expect such a string to display as text in any sensible way. If you want to make text, see the next question.
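
For instance, a four-byte string could be put together like this (the byte values are arbitrary, chosen only for illustration):

   Dim s As String
   s = ChrB(&hCA) + ChrB(&hFE) + ChrB(0) + ChrB(255)
   MsgBox Str(LenB(s))                 // number of bytes in s
   MsgBox Str(AscB(MidB(s, 1, 1)))     // value of the first byte

LenB should report 4 here, and the first byte should come back as 202 (that is, &hCA).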


4. How do I get a specific character by its code point (or "ASCII value")?

Use TextEncoding.Chr. This returns a one-character string with the character you specified by its code point within that encoding. For example, a capital A in the ASCII character set would be:

   s = Encodings.ASCII.Chr(65)

A copyright symbol represented in UTF-8 would be:

   s = Encodings.UTF8.Chr(169)
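
To see how this differs from ChrB in the previous question, you could compare the byte lengths of the two results. This is only a sketch, and the variable names are made up for the example:

   Dim raw, text As String
   raw = ChrB(169)                    // a single byte, &hA9, no encoding
   text = Encodings.UTF8.Chr(169)     // a single character, U+00A9, in UTF-8
   MsgBox Str(LenB(raw)) + " vs " + Str(LenB(text))

I'd expect this to show 1 vs 2, because UTF-8 represents the copyright symbol as two bytes (&hC2 &hA9).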


--
,------------------------------------------------------------------.
|    Joseph J. Strout           REAL Software, Inc.                |
|    joe at realsoftware dot com       http://www.realsoftware.com        |
`------------------------------------------------------------------'
