Diffstat (limited to 'src/chrtrans/README.format')
-rw-r--r-- | src/chrtrans/README.format | 138 |
1 files changed, 0 insertions, 138 deletions
diff --git a/src/chrtrans/README.format b/src/chrtrans/README.format
deleted file mode 100644
index 7437b503..00000000
--- a/src/chrtrans/README.format
+++ /dev/null
@@ -1,138 +0,0 @@
-Some notes on the format of table files used here.
-(See README.tables for what to do with them.)
-
-The format is derived from stuff in the console driver of the
-Linux kernel (as are the guts of the chartrans machinery).
-THAT DOES NOT MEAN that anything here is Linux specific - it isn't.
-
-[Note that the format may change, this is still somewhat experimental.]
-
-There are four kinds of lines:
-
-Summary example:
-
-   # This line is a comment, the next line is a directive
-   O Brand new Charset!
-   0x41 U+0041 U+0391
-   U+00cd:I'
-
-Description:
-
-a) comment lines start with a '#' character.
-   (trailing comments are allowed on some of the other lines, if in doubt
-   check the examples..)
-
-b) directives:
-   start with a keyword which may be abbreviated to one letter (first
-   letter must be capitalized), followed by space and a value.
-   Currently recognized:
-
-   OptionName
-      The name under which this should appear on the O)ptions screen
-      in the list for Display Character Set
-   MIMEName
-      The name for this charset in MIME syntax (one word with digits
-      and some other non-letters allowed, should be IANA registered)
-   Default
-      If "Y[es]" or "1", this is the default (fallback) translation table,
-      it will be used for Unicode -> 8bit (or 7bit) translation if no
-      translation is found in the specific table.
-   FallBack
-      Whether to use the default table if no translation is found in
-      this table. Normally fallback is used, "FallBack NO" or "FallBack 0"
-      disables it (actually, other values than "FallBack Y[es]" or
-      "FallBack 1" disable it).
-
-   RawOrEnc
-      a number which flags some special property (encoding) for this
-      charset [see utf8_uni.tbl for example, see UCDefs.h for details].
-
-   Codepage number (IBM specific)
-      used by OS/2 font-switching code.
-
-c) character translation definitions:
-   they look like
-
-   0x41 U+0041 U+0391 ...
-
-   and are used for "forward" translation (mapping this charset to Unicode)
-   AS WELL AS "back" translation (mapping Unicodes to an 8-bit
-   [incl. 7-bit ASCII] code).
-
-   For the "forward" direction, only the first Unicode is used; for
-   "back" translation, all listed Unicodes are mapped to the byte (i.e.
-   code point) on the left.
-
-   The above example line would tell the chartrans mechanism:
-   "For this charset, code position 65 [hex 0x41] contains Unicode
-   U+0041 (LATIN CAPITAL LETTER A). For translation of Unicodes to
-   this charset, use byte value 65 [hex 0x41] for U+0041 (LATIN CAPITAL
-   LETTER A) as well as for U+0391 (GREEK CAPITAL LETTER ALPHA)."
-
-   [Note that for bytes in the ASCII range 0x00-0x7F, the forward translations
-   will (probably) not be used by Lynx. It doesn't hurt to list those,
-   too, for completeness.]
-
-   Some other forms are also accepted:
-
-   * Syntax accepted:
-   * <fontpos> <unicode> <unicode> ...
-   * <fontpos> <unicode range> <unicode range> ...
-   * <fontpos> idem
-   * <range> idem
-   * <range> <unicode range>
-   *
-   * where <unicode range> ::= <unicode>-<unicode>
-   * and <unicode> ::= U+<h><h><h><h>
-   * and <h> ::= <hexadecimal digit>
-   *
-   [Note that <fontpos> _without_ targets assumed notdefined,
-   so tables from ftp.unicode.org need no patching.]
-
-
-d) string replacement definitions:
-
-   They look like
-
-   U+00cd:I'
-
-   which would mean "Replace Unicode U+00cd (LATIN CAPITAL LETTER I WITH
-   ACUTE" with the string (consisting of two character) I' (if no other
-   translation is available)." Please note that replacement definitions
-   in certain charset table will override ones from the Default table.
-
-   Note that everything after the ':' is currently taken VERBATIM, so
-   careful with trailing blanks etc. Please use <C replace> syntax below
-   when you need trailing spaces.
-
-   * Syntax accepted:
-   * <unicode> :<replace>
-   * <unicode range> :<replace>
-   * <unicode> "<C replace>"
-   * <unicode range> "<C replace>"
-   *
-   * where <unicode range> ::= <unicode>-<unicode>
-   * and <unicode> ::= U+<h><h><h><h>
-   * and <h> ::= <hexadecimal digit>
-   * and <replace> any string not containing '\n' or '\0', taken verbatim
-   * and <C replace> any string, with backslash having the usual C meaning.
-
-Motivation:
-
-- It is an extension of the format already in use for Linux (kernel,
-  kbd package), those files can be used with some minimal editing.
-
-- It is easy to convert Unicode tables for other charsets, as they
-  are commonly found on ftp sites etc., to this format - the right
-  sed command should do 99% of the work.
-
-- The format is independent of details of other parts of the Lynx code,
-  unlike the "old" LYCharsets.c mechanism. The tables don't have to
-  be changed in synch when e.g., new entities are added to the entities.h.
-
-
-Note: the Default "7bit approximation" table can be used for
-case-insensitive search for non-ascii letters if no upper/lower case
-information provided by other means, e.g., locale. It is assumed that
-upper/lower case letters have their "7bit approximation" images
-in def7_uni.tbl matched case-insensitively.
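
Illustration (not part of the diff): the four line kinds described in the removed README can be told apart with a few prefix checks. The Python sketch below is only a reading aid under stated assumptions. The names classify and build_tables are hypothetical and this is not the parser Lynx ships. It handles only the simple forms from the summary example (no ranges, no "idem", no quoted <C replace> strings), and it assumes a '#' on a translation line starts a trailing comment.

```python
import re

# Hypothetical helpers for illustration; not the parser Lynx uses.
REPLACEMENT = re.compile(r"U\+([0-9A-Fa-f]{4})\s*:(.*)$")
UNICODE = re.compile(r"U\+([0-9A-Fa-f]{4})$")


def classify(line):
    """Return (kind, payload) for one table line."""
    # Only the newline and leading blanks are dropped: the README says the
    # text after ':' is taken verbatim, trailing blanks included.
    text = line.rstrip("\n").lstrip()
    if not text:
        return ("blank", None)
    if text.startswith("#"):
        return ("comment", text[1:].strip())
    if text.startswith("U+"):
        # String replacement definition, e.g. U+00cd:I'
        m = REPLACEMENT.match(text)
        if m is None:
            raise ValueError("unsupported replacement line: %r" % text)
        return ("replacement", (int(m.group(1), 16), m.group(2)))
    if text[0].isdigit():
        # Translation definition, e.g. 0x41 U+0041 U+0391
        # Assumption: '#' starts a trailing comment on these lines.
        fields = text.split("#", 1)[0].split()
        byte = int(fields[0], 0)  # accepts 0x41 as well as 65
        codes = []
        for field in fields[1:]:
            m = UNICODE.match(field)
            if m is None:
                raise ValueError("unsupported target: %r" % field)
            codes.append(int(m.group(1), 16))
        return ("translation", (byte, codes))
    if text[0].isupper():
        # Directive: capitalized keyword (possibly one letter), space, value.
        keyword, _, value = text.partition(" ")
        return ("directive", (keyword, value.strip()))
    raise ValueError("unrecognized line: %r" % text)


def build_tables(lines):
    """Build forward (byte -> Unicode), back (Unicode -> byte) and
    replacement (Unicode -> string) maps from table lines."""
    forward, back, replace = {}, {}, {}
    for line in lines:
        kind, payload = classify(line)
        if kind == "translation":
            byte, codes = payload
            if codes:
                forward[byte] = codes[0]  # forward uses the first Unicode only
            for code in codes:
                back[code] = byte         # back maps every listed Unicode
        elif kind == "replacement":
            code, string = payload
            replace[code] = string
    return forward, back, replace
```

Feeding the four lines of the summary example to build_tables gives a forward map with one entry (byte 0x41 to U+0041), a back map in which both U+0041 and U+0391 point to byte 0x41, and a replacement map from U+00cd to the two-character string I'.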