Lynx CHARTRANS
Features (in addition to those which Lynx already has):
- Can (attempt to) translate from any document charset to any display
character set, *IF* the document charset is known by a translation
table (compiled in at installation).
- Old method for specifying translations of Latin1 characters and
SGML entities still supported. (IBMPC-charsets.announce is still
relevant.)
- New method to define character sets: used for input charset as well
as display character set, translation tables compiled in from
separate files (one per charset).
- Unicode (UTF8) support: can (attempt to) decode and translate UTF8 to
display character set, or pass UTF8 through to the display (if terminal
or console understands UTF8). [raw display of UTF8 only tested with Slang
so far, does not always position everything correctly on screen]
- Support for CHARSET attribute on A tag [but not yet on LINK], as in
HTML i18n RFC 2070. A link can suggest the target's charset in this way.
- EXPERIMENTAL, currently enabled only for Linux console:
can (attempt to) automatically switch terminal mode and load new
code pages on change of display character set.
- some minor changes: sometimes invalid characters are displayed in a hex
notation Uxxxx (helps debugging, but I also regard it as at least not
worse than showing the wrong char without warning).
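
To illustrate the UTF8 decoding and the Uxxxx fallback described in the list
above, here is a minimal sketch. It is NOT the actual chartrans code: the
names utf8_decode and show_char are hypothetical, the table lookup that a real
translation would do is left out (only ASCII passes through), and lowercase
hex digits in the Uxxxx form are an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: decode one UTF8 sequence from s (n bytes available).
 * Stores the code point in *cp and returns the number of bytes consumed,
 * or 0 if the sequence is malformed, truncated, or overlong.
 */
size_t utf8_decode(const unsigned char *s, size_t n, unsigned long *cp)
{
    /* smallest code point that legitimately needs a sequence of len bytes */
    static const unsigned long min_cp[] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t len, i;
    unsigned long c;

    assert(s != NULL && cp != NULL);
    if (n == 0)
        return 0;
    if (s[0] < 0x80)                { c = s[0];        len = 1; }
    else if ((s[0] & 0xE0) == 0xC0) { c = s[0] & 0x1F; len = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { c = s[0] & 0x0F; len = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { c = s[0] & 0x07; len = 4; }
    else
        return 0;                   /* stray continuation or invalid lead byte */
    if (len > n)
        return 0;                   /* truncated sequence */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;               /* not a continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }
    if (c < min_cp[len])
        return 0;                   /* overlong encoding */
    *cp = c;
    return len;
}

/*
 * Hypothetical display step: ASCII is copied as-is, anything else is
 * written in the Uxxxx hex notation instead of a wrong character.
 * Returns what snprintf returns (length of the text written).
 */
int show_char(unsigned long cp, char *buf, size_t buflen)
{
    assert(buf != NULL && buflen > 0);
    if (cp < 0x80)
        return snprintf(buf, buflen, "%c", (int) cp);
    return snprintf(buf, buflen, "U%04lx", cp);
}
```

For example, the two bytes 0xC5 0xBD decode to code point 0x17D, which this
sketch would show as U017d since it has no table entry for it.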
Additions/changes to user interface:
- many new Display Character Sets are available on O)ptions screen.
(also can now use arrow keys, HOME, END etc. for cycling through the list).
- new command line flags:
-assume_charset=... assume this as charset for documents that don't
specify a charset parameter in HTTP headers
-assume_unknown_charset=... in case a charset parameter is not recognized
-assume_local_charset=... assume this as charset of local file: docs
also available as ASSUME_CHARSET etc. in lynx.cfg
- The "Raw" toggle (from -raw flag, '@' key, or Options screen)
o should work as before for CJK charsets,
o otherwise toggles the assumption "Default remote charset is same
as Display Character Set" on or off.
Toggling of the assumed charset is between Display Character Set and
the specified ASSUME_CHARSET or, if they are the same, between the
specified ASSUME_CHARSET and ISO-8859-1.
o The default for raw mode now depends on the Display Character Set as
well as on the specified ASSUME_CHARSET value.
(Try the "Transparent" Display Character Set for more "rawness".)
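
The "Raw" toggle rule for non-CJK charsets can be sketched as follows. This
is only one reading of the description above, not the real Lynx code: the
function name is hypothetical, "raw on" is taken to mean "assume the Display
Character Set", and charset names are compared case-sensitively to keep the
example short.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: which charset is assumed for documents that do
 * not specify one, given the Display Character Set, the configured
 * ASSUME_CHARSET, and the state of the "Raw" toggle (non-CJK case only).
 */
const char *effective_assumed_charset(const char *display_cs,
                                      const char *assume_cs,
                                      int raw_on)
{
    assert(display_cs != NULL && assume_cs != NULL);
    if (raw_on)
        return display_cs;          /* remote charset assumed same as display */
    if (strcmp(display_cs, assume_cs) == 0)
        return "iso-8859-1";        /* both are the same: toggle against Latin1 */
    return assume_cs;               /* otherwise fall back to ASSUME_CHARSET */
}
```

With Display Character Set koi8-r and ASSUME_CHARSET iso-8859-2, the toggle
would switch between koi8-r and iso-8859-2; with both set to koi8-r, it would
switch between koi8-r and iso-8859-1.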
Requirements: same as for Lynx in general :)
The chartrans code is now merged with Wayne Buttle's changes for
32-bit MS Windows and DOS/DJGPP, with Thomas Dickey's and Jim Spath's
emerging auto-configure mechanism, and with BUGFIXES from Foteos
Macrides. See the accompanying file CHANGES.few for the current
status.
A WARNING BEFORE YOU PROCEED:
This is experimental. It already works nicely for me, but I have only
tested it on Linux, compiled with Slang. In some cases undisplayable bytes
may get sent to the terminal which are then interpreted as control chars.
Other usual warnings about alpha software apply...
HOW TO GET SOURCES:
The location of files mentioned below is currently at
but it is probably easier to get a full package of the development code
from
(different formats there, check it out.)
Check both locations to find the newest version.
There are three alternatives:
( Replace the * in the following filenames with the appropriate subversion )
(1.)
Provided as a full Lynx distribution (for now..) - in zip format.
This contains the complete Lynx source package.
Just get lynx-2.7.1ac-*.zip.
(2.)
Provided as add-on over the Lynx 2.7.1 distribution, i.e. all files that
are new or have changed - in zip format.
(a) Get the official Lynx 2.7.1 distribution - see list of download sites at
.
(b) Get add-to-offi-2.7.1ac-*.zip.
(c) Unpack the official 2.7.1 package, then unzip add-to-offi-2.7.1ac-*.zip
over that directory tree.
(3.)
Provided in two files (for minimal download time) to install over the
Lynx 2.7.1 distribution - needs gunzip, tar, and patch.
(a) Get the official Lynx 2.7.1 distribution - see list of download sites at
.
(b) Get lynx-newfiles-2.7.1ac-*.tar.gz.
(c) Get lynx-patch-2.7.1ac-*.pch.gz.
(d) Unpack the official 2.7.1 package, then unpack
lynx-newfiles-2.7.1ac-*.tar.gz over that directory tree.
(e) Apply patches from lynx-patch-2.7.1ac-*.pch.
HOW TO INSTALL:
(4) before compiling:
Check top level makefile or Makefile and userdefs.h as usual.
NOTE that there is a new "#define" in userdefs.h for MAX_CHARSETS
near the end (in "Section 3.").
NOTE that in the top-level Makefile, the -DEXP_CHARTRANS must be
in *both* SITE_DEFS *and* SITE_LYDEFS.
(5) Building Lynx:
If you are compiling for VMS you have to figure out for yourself
how to modify the procedure, sorry.
What's supposed to happen (in addition to the usual things when
building Lynx): in the new subdirectory src/chrtrans, make should
first compile the auxiliary program `makeuctb', then invoke that
program to create xxxxx_yyy.h files from the provided xxxxx.yyy
translation table files. (See README.* files in src/chrtrans for
more info.)
If all goes well, just invoking make from the top-level Lynx dir
as usual should do everything automatically. If not, the makefiles
may need some tweaking... or:
(6) Some things to look at if compilation fails:
In src/chrtrans/UCkd.h there is a typedef for an unsigned 16bit
numeric type which may need to be changed for your system.
See comment near top there.
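
As a rough guide to what such a change might look like, here is a sketch
that picks an unsigned 16bit type from <limits.h>. The type name u16_sketch
and the check function are hypothetical; the actual typedef in
src/chrtrans/UCkd.h may be named and written differently, so follow the
comment there.

```c
#include <assert.h>
#include <limits.h>

/* Pick whichever standard unsigned type is exactly 16 bits wide here. */
#if USHRT_MAX == 0xFFFF
typedef unsigned short u16_sketch;
#elif UINT_MAX == 0xFFFF
typedef unsigned int u16_sketch;
#else
#error "no unsigned 16bit type found, choose one by hand"
#endif

/* Sanity check: returns non-zero if u16_sketch holds exactly 16 bits. */
int u16_sketch_is_16_bits(void)
{
    u16_sketch max = (u16_sketch) ~0u;  /* all bits set, truncated to the type */

    assert(sizeof(u16_sketch) * CHAR_BIT == 16);
    return max == 0xFFFFu;
}
```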
For recompiling Lynx, `make clean' should not be necessary if only
files in src/chrtrans have been changed. On the other hand, a
top-level `make clean' may not propagate to the src/chrtrans directory
(depending on how things are going with auto-config); you may have to
cd to that directory and `make clean' there to really clean up.
(7) To customize (add/change translation tables etc.):
See README.* files in src/chrtrans.
Make the necessary changes there, then recompile.
(A general `make clean' should not be necessary, but make sure
the ...uni.h file in src/chrtrans gets regenerated.)
Note that definitions of new character entities (if e.g. you want
Lynx to recognize &Zcaron;) are not covered by these table files;
they have to be listed in HTMLDTD.c.
_If you are on a Linux system_ and using Lynx on the console (i.e.
not xterm, not a dialup *into* the Linux box), you can compile
with -DEXP_CHARTRANS_AUTOSWITCH. This is very useful for testing
the various Display Character Sets: Lynx will try to automatically
change the console state. You need to have the Linux kbd package
installed, with a working `setfont' command executable by the user,
and the right font files - check the source in src/UCAuto.c for
the files used and/or to change them!
NOTE that with this enabled,
- Lynx currently will not clean up the console state at exit;
the console will probably be left as set up for the last Display
Character Set you used.
- Loading a font is global across _all_ virtual text consoles, so
using Lynx (compiled with this flag) may change the appearance of
text on other consoles (if that text contains characters
beyond ASCII).
(8) Some suggested Web pages for testing:
,
especially
.
(9) Please report bugs, unexpected behavior, etc.
to or .
Suggestions for improvement would be welcome, as well as
contributed translation tables (for stuff that is not available
at ftp://dkuug.dk or ftp://unicode.org).
KW 1996-05-03