Chinese  Search Engine
China internet resources--Chinese character sets GB and Big5


Chinese character sets

The two most commonly used character sets for Chinese are:

GB (used in mainland China and associated with simplified characters)

Big5 (used in Taiwan and Hong Kong and associated with traditional characters)

A new universal character set called Unicode is gradually coming into use, but almost all Chinese web pages use either GB or Big5.

What's a character set?
A character set is just a standard way of representing and storing written symbols in a computer. These symbols can be letters, punctuation, Chinese characters, and so on.

For writing English, the most common character set is called ASCII. This is a 7-bit character set, which means it can represent at most 128 separate symbols. It includes capital letters, small letters, digits, and common punctuation symbols. For instance, the capital letter A is stored as the number 65, which is 01000001 in binary (written with a leading zero to fill an 8-bit byte).
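As a small illustration of the ASCII example above, the following Python sketch prints the numeric code and bit pattern of a few characters. The helper name `ascii_bits` is made up for this example and is not part of any library.

```python
def ascii_bits(ch: str) -> str:
    """Return the ASCII code of a single character as an 8-digit binary string."""
    code = ord(ch)
    if code > 127:
        # Anything above 127 does not fit in 7 bits, so it is not ASCII.
        raise ValueError(f"{ch!r} is not an ASCII character")
    return format(code, "08b")


for ch in "Aa0":
    # Show the character, its decimal code, its hexadecimal code, and its bits.
    print(ch, ord(ch), format(ord(ch), "02x"), ascii_bits(ch))
```

For the capital letter A this gives the bit pattern 01000001 quoted above.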

On the Internet, most web pages can be written using the ISO-8859-1 character set. This is an 8-bit character set, which means it can store a maximum of 256 separate symbols. It includes all of ASCII plus accented letters used by Western European languages.

Because Chinese contains thousands of different characters, it requires a double-byte (or 16-bit) character set, such as GB, Big5, or Unicode in its original 16-bit form. Two bytes allow a maximum of 65536 separate symbols, although GB and Big5 use only part of that range.
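To make the difference between single-byte and double-byte character sets concrete, here is a minimal Python sketch that encodes one character in each of the character sets discussed so far and reports how many bytes it takes. It assumes the `ascii`, `iso-8859-1`, `gb2312`, and `big5` codecs from Python's standard library; the helper name `encoded_hex` and the sample characters are chosen only for this example.

```python
# Each sample pairs a character with the name of a standard-library codec.
samples = [
    ("A", "ascii"),
    ("é", "iso-8859-1"),
    ("中", "gb2312"),
    ("中", "big5"),
]


def encoded_hex(text: str, codec: str) -> str:
    """Encode text with the given codec and return the bytes as hex digits."""
    return text.encode(codec).hex()


for text, codec in samples:
    hex_bytes = encoded_hex(text, codec)
    # Two hex digits correspond to one byte.
    print(f"{text} in {codec}: {hex_bytes} ({len(hex_bytes) // 2} byte(s))")
```

The ASCII and ISO-8859-1 samples each take one byte, while the same Chinese character takes two bytes in GB and two different bytes in Big5.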

Technical details
GB contains roughly 7400 separate symbols (about 6800 of them Chinese characters) and Big5 contains roughly 13000 separate symbols. In addition to Chinese characters, these include Roman letters, Russian Cyrillic letters, Greek letters, Japanese katakana and hiragana, math symbols, and so on. However, apart from the tone-marked vowels that GB provides for pinyin, they do not include the accented letters used by European languages.

Each byte of a double-byte GB character is in the hexadecimal range a1 to fe. For Big5, the first byte is in the range a1 to fe, while the second byte is in the range 40 to 7e or a1 to fe.
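The byte ranges above can be written down as two small checks. This is only a sketch: the function names `is_gb_pair` and `is_big5_pair` are invented here, and the lower Big5 range for the second byte is assumed to end at 7e, because 7f is the DEL control code and is not used as a character byte.

```python
def is_gb_pair(b1: int, b2: int) -> bool:
    """True if both bytes fall in the GB double-byte range a1 to fe."""
    return 0xA1 <= b1 <= 0xFE and 0xA1 <= b2 <= 0xFE


def is_big5_pair(b1: int, b2: int) -> bool:
    """True if b1 is a Big5 first byte and b2 is a Big5 second byte."""
    lead_ok = 0xA1 <= b1 <= 0xFE
    # Assumption: the lower range for the second byte stops at 7e, not 7f.
    trail_ok = 0x40 <= b2 <= 0x7E or 0xA1 <= b2 <= 0xFE
    return lead_ok and trail_ok


# Check a real character using Python's standard-library codecs.
gb = "中".encode("gb2312")
big5 = "中".encode("big5")
print(is_gb_pair(gb[0], gb[1]))
print(is_big5_pair(big5[0], big5[1]))
```

Checks like these only say whether a byte pair is in the permitted range, not whether a character is actually assigned to it.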

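Because the first byte of every double-byte character has its high bit set, while ASCII bytes fall in the range 00 to 7f, a program can always tell where a Chinese character starts. Thus, 7-bit ASCII text can be intermixed freely within a GB or Big5 text. However, 8-bit accented European letters use the same byte values as the double-byte characters, so they cannot be mixed in this way, and most current browsers unfortunately cannot display European languages and Chinese on the same web page.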
