Diffstat (limited to 'src/chrtrans/README.format')
-rw-r--r-- src/chrtrans/README.format 138
1 files changed, 0 insertions, 138 deletions
diff --git a/src/chrtrans/README.format b/src/chrtrans/README.format
deleted file mode 100644
index 7437b503..00000000
--- a/src/chrtrans/README.format
+++ /dev/null
@@ -1,138 +0,0 @@
-Some notes on the format of table files used here.
-(See README.tables for what to do with them.)
-
-The format is derived from stuff in the console driver of the
-Linux kernel (as are the guts of the chartrans machinery).
-THAT DOES NOT MEAN that anything here is Linux specific - it isn't.
-
-[Note that the format may change; this is still somewhat experimental.]
-
-There are four kinds of lines:
-
-Summary example:
-
-  # This line is a comment, the next line is a directive
-  O Brand new Charset!
-  0x41    U+0041 U+0391
-  U+00cd:I'
-
-Description:
-
-a) comment lines start with a '#' character.
-   (Trailing comments are allowed on some of the other lines; if in doubt,
-   check the examples.)
-
-b) directives:
-   start with a keyword which may be abbreviated to one letter (first
-   letter must be capitalized), followed by space and a value.
-   Currently recognized:
-
-    OptionName
-	The name under which this should appear on the O)ptions screen
-	in the list for Display Character Set
-    MIMEName
-	The name for this charset in MIME syntax (one word with digits
-	and some other non-letters allowed, should be IANA registered)
-    Default
-	If "Y[es]" or "1", this is the default (fallback) translation table;
-	it will be used for Unicode -> 8bit (or 7bit) translation if no
-	translation is found in the specific table.
-    FallBack
-	Whether to use the default table if no translation is found in
-	this table.  Normally fallback is used, "FallBack NO" or "FallBack 0"
-	disables it (actually, other values than "FallBack Y[es]" or
-	"FallBack 1" disable it).
-
-    RawOrEnc
-	a number which flags some special property (encoding) for this
-	charset [see utf8_uni.tbl for example, see UCDefs.h for details].
-
-    Codepage number (IBM specific)
-	used by OS/2 font-switching code.
-
-c) character translation definitions:
-   they look like
-
-   0x41    U+0041 U+0391 ...
-
-   and are used for "forward" translation (mapping this charset to Unicode)
-   AS WELL AS "back" translation (mapping Unicodes to an 8-bit
-   [incl. 7-bit ASCII] code).
-
-   For the "forward" direction, only the first Unicode is used; for
-   "back" translation, all listed Unicodes are mapped to the byte (i.e.
-   code point) on the left.
-
-   The above example line would tell the chartrans mechanism:
-   "For this charset, code position 65 [hex 0x41] contains Unicode
-    U+0041 (LATIN CAPITAL LETTER A).  For translation of Unicodes to
-    this charset, use byte value 65 [hex 0x41] for U+0041 (LATIN CAPITAL
-    LETTER A) as well as for U+0391 (GREEK CAPITAL LETTER ALPHA)."
-
-  [Note that for bytes in the ASCII range 0x00-0x7F, the forward translations
-   will (probably) not be used by Lynx.  It doesn't hurt to list those,
-   too, for completeness.]
-
-   Some other forms are also accepted:
-
- * Syntax accepted:
- *	<fontpos>	<unicode> <unicode> ...
- *	<fontpos>	<unicode range> <unicode range> ...
- *	<fontpos>	idem
- *	<range>		idem
- *	<range>		<unicode range>
- *
- * where <unicode range> ::= <unicode>-<unicode>
- * and <unicode> ::= U+<h><h><h><h>
- * and <h> ::= <hexadecimal digit>
- *
-  [Note that a <fontpos> _without_ targets is assumed to be undefined,
-  so tables from ftp.unicode.org need no patching.]
-
-
-d) string replacement definitions:
-
-  They look like
-
-  U+00cd:I'
-
-  which would mean "Replace Unicode U+00cd (LATIN CAPITAL LETTER I WITH
-  ACUTE) with the string (consisting of two characters) I' (if no other
-  translation is available)."  Please note that replacement definitions
-  in a particular charset table will override ones from the Default table.
-
-  Note that everything after the ':' is currently taken VERBATIM, so
-  be careful with trailing blanks etc.  Please use the <C replace> syntax below
-  when you need trailing spaces.
-
- * Syntax accepted:
- *      <unicode>	:<replace>
- *      <unicode range>	:<replace>
- *      <unicode>	"<C replace>"
- *      <unicode range>	"<C replace>"
- *
- * where <unicode range> ::= <unicode>-<unicode>
- * and <unicode> ::= U+<h><h><h><h>
- * and <h> ::= <hexadecimal digit>
- * and <replace> any string not containing '\n' or '\0', taken verbatim
- * and <C replace> any string, with backslash having the usual C meaning.
-
-Motivation:
-
-- It is an extension of the format already in use for Linux (kernel,
-  kbd package); those files can be used with some minimal editing.
-
-- It is easy to convert Unicode tables for other charsets, as they
-  are commonly found on ftp sites etc., to this format - the right
-  sed command should do 99% of the work.
-
-- The format is independent of details of other parts of the Lynx code,
-  unlike the "old" LYCharsets.c mechanism.  The tables don't have to
-  be changed in sync when, e.g., new entities are added to entities.h.
-
-
-Note: the Default "7bit approximation" table can be used for
-case-insensitive search for non-ASCII letters if no upper/lower case
-information is provided by other means, e.g., locale.  It is assumed that
-upper/lower case letters have their "7bit approximation" images
-in def7_uni.tbl matched case-insensitively.
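
Illustration (not part of the deleted file): the four kinds of lines
described in the removed README.format can be sketched as a small
parser.  This is only a minimal sketch in Python, not the table
compiler that Lynx actually uses.  The names parse_line and
build_tables are hypothetical, the handling of trailing '#' comments
on translation lines is an assumption, and the <range>, "idem" and
quoted <C replace> forms are deliberately left out.

```python
import re

UNI = r"U\+([0-9A-Fa-f]{4})"
UNI_OR_RANGE = re.compile(rf"{UNI}(?:-{UNI})?$")
REPLACE = re.compile(rf"{UNI}(?:-{UNI})?\s*:(.*)$")


def parse_line(line):
    """Classify one table line and return a (kind, ...) tuple."""
    line = line.rstrip("\n")
    if not line.strip():
        return ("blank",)
    if line.startswith("#"):
        return ("comment",)
    m = REPLACE.match(line)
    if m:
        first = int(m.group(1), 16)
        last = int(m.group(2), 16) if m.group(2) else first
        # Everything after ':' is kept verbatim, trailing blanks included.
        return ("replace", list(range(first, last + 1)), m.group(3))
    if line.startswith("U+"):
        raise ValueError("quoted <C replace> form is not handled in this sketch")
    if line[0].isdigit():
        # Assumption: a '#' starts a trailing comment on these lines.
        fields = line.split("#", 1)[0].split()
        fontpos = int(fields[0], 0)  # accepts 0x41 as well as 65
        unicodes = []
        for tok in fields[1:]:
            m = UNI_OR_RANGE.match(tok)
            if m is None:
                raise ValueError(f"unsupported target: {tok!r}")
            first = int(m.group(1), 16)
            last = int(m.group(2), 16) if m.group(2) else first
            unicodes.extend(range(first, last + 1))
        return ("translation", fontpos, unicodes)
    if line[0].isupper():
        keyword, _, value = line.partition(" ")
        return ("directive", keyword, value.strip())
    raise ValueError(f"unrecognized line: {line!r}")


def build_tables(lines):
    """Collect forward, back and replacement mappings from table lines."""
    forward, back, replacements = {}, {}, {}
    for line in lines:
        rec = parse_line(line)
        if rec[0] == "translation":
            _, fontpos, unicodes = rec
            if unicodes:
                # Forward direction: only the first Unicode is used.
                forward[fontpos] = unicodes[0]
            for u in unicodes:
                # Back direction: every listed Unicode maps to the byte.
                back[u] = fontpos
        elif rec[0] == "replace":
            for u in rec[1]:
                replacements[u] = rec[2]
    return forward, back, replacements
```

Fed the four lines of the summary example (without their leading
indentation), build_tables maps byte 0x41 forward to U+0041, maps both
U+0041 and U+0391 back to byte 0x41, and records the string I' as the
replacement for U+00cd.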