PGN Encoding

PGN Encoding and UTF-8

Scid vs. PC can export PGN in either the UTF-8 or the Latin-1 (ISO 8859-1) character set. English speakers will generally prefer Latin-1 (the PGN standard), but other locales may find UTF-8 a better choice.

Enforcing a choice between the two is necessary because the si4 database format has weaknesses in the internationalization of game data: player, site, and event names, as well as PGN comments, can be stored in any character-set encoding.
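
To illustrate why an unspecified encoding is a problem, the following Python sketch encodes one player name in both character sets and then reads the UTF-8 bytes back as Latin-1. The name "Müller" is only an example value and is not taken from Scid vs. PC.

```python
# The same player name produces different bytes in each character set.
name = "Müller"

latin1_bytes = name.encode("latin-1")  # one byte for the umlaut
utf8_bytes = name.encode("utf-8")      # two bytes for the umlaut

# Reading UTF-8 bytes as if they were Latin-1 never fails,
# but it silently yields the wrong characters.
misread = utf8_bytes.decode("latin-1")

# Reading Latin-1 bytes as UTF-8 usually fails outright,
# because a lone 0xFC byte is not valid UTF-8.
try:
    latin1_bytes.decode("utf-8")
    latin1_is_valid_utf8 = True
except UnicodeDecodeError:
    latin1_is_valid_utf8 = False

print(latin1_bytes, utf8_bytes, misread, latin1_is_valid_utf8)
```

Without knowing which encoding was used when a name was stored, a reader of the bytes cannot tell which of these interpretations is the intended one.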

Technical Details

These factors affect the encoding of Scid databases.

PGN export uses a character-set detector. The detector examines the content of the text and converts it to either Latin-1 or UTF-8, depending on the user's choice. In many cases it can even convert defective encodings into a proper character set.
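
The general idea of detect-then-convert can be sketched in Python as follows. This is not the detector used by Scid vs. PC, which is more elaborate. It is a minimal sketch that assumes the input is either valid UTF-8 or Latin-1, and the function name detect_and_convert is hypothetical.

```python
def detect_and_convert(raw: bytes, target: str = "utf-8") -> bytes:
    """Guess the encoding of raw and re-encode it in the target character set.

    Assumption for this sketch: anything that is not valid UTF-8 is Latin-1.
    """
    try:
        # Strict decoding: non-UTF-8 input with high bytes usually fails here.
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this never fails.
        text = raw.decode("latin-1")
    # Characters the target cannot represent are replaced with "?".
    return text.encode(target, errors="replace")


# Latin-1 input exported as UTF-8.
print(detect_and_convert(b"M\xfcller", "utf-8"))
# UTF-8 input exported as Latin-1.
print(detect_and_convert(b"M\xc3\xbcller", "latin-1"))
```

The heuristic works because multi-byte UTF-8 sequences follow a strict pattern that ordinary Latin-1 text rarely matches by accident. Exporting to Latin-1 is lossy for characters outside that set, which is one reason other locales may prefer UTF-8.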

Implementing this feature in Scid vs. PC is also an important step towards supporting the C/CIF archive format, which allows only valid UTF-8; the character-set detector will be used to perform that conversion.