Rozszerzone zbiory znaków


Windows supports single-byte and multibyte character sets as well as Unicode. With a single-byte character set (SBCS), each byte in a string represents one character. The ANSI character set used by most Western versions of Windows is a single-byte character set.

In a multibyte character set (MBCS), some characters are represented by one byte and others by more than one byte. The first byte of a multibyte character is called the lead byte. In general, the lower 128 characters of a multibyte character set map to the 7-bit ASCII characters, and any byte whose ordinal value is greater than 127 is the lead byte of a multibyte character. Only single-byte characters can contain the null value (#0). Multibyte character sets—especially double-byte character sets (DBCS)—are widely used for Asian languages.

In the Unicode character set, each character is represented by two bytes. Thus a Unicode string is a sequence not of individual bytes but of two-byte words. Unicode characters and strings are also called wide characters and wide character strings. The first 256 Unicode characters map to the ANSI character set.

Object Pascal supports single-byte and multibyte characters and strings through the Char, PChar, AnsiChar, PAnsiChar, and AnsiString types. Delphi’s standard string-handling functions have multibyte-enabled counterparts that also implement locale-specific ordering for characters. (Names of multibyte functions usually start with Ansi-. For example, the multibyte version of StrPos is AnsiStrPos.) Multibyte character support is operating-system dependent and based on the current Windows locale.

Object Pascal supports Unicode characters and strings through the WideChar, PWideChar, and WideString types.