| Field | Value | Date |
|---|---|---|
| author | Thomas E. Dickey <dickey@invisible-island.net> | 1999-02-08 10:50:02 -0500 |
| committer | Thomas E. Dickey <dickey@invisible-island.net> | 1999-02-08 10:50:02 -0500 |
| commit | 8ce6b560f4fb325be3d34266c54c70eb8668e8e1 | |
| tree | d227c501d100ee0c5f1c72601d9ea5a487c1e2ca /samples | |
| parent | 87434eaa074d789f65bac589b03df341e76e7a4e | |
snapshot of project "lynx", label v2-8-2dev_16
Diffstat (limited to 'samples')
-rw-r--r-- | samples/cernrules.txt | 226 |
1 files changed, 226 insertions, 0 deletions
diff --git a/samples/cernrules.txt b/samples/cernrules.txt
new file mode 100644
index 00000000..97212904
--- /dev/null
+++ b/samples/cernrules.txt
@@ -0,0 +1,226 @@
+# This file contains examples and an explanation for the RULESFILE / RULE
+# feature.
+#
+# Rules for Lynx are experimental. They provide a rudimentary capability
+# for URL rejection and substitution based on string matching.
+# Most users and most installations will not need this feature; it is here
+# in case you find it useful. Note that this may change or go away in
+# future releases of Lynx; if you find it useful, consider describing your
+# use of it in a message to <lynx-dev@sig.net>.
+#
+# Syntax:
+# =======
+# As you may have guessed, comments are introduced by a '#' character.
+# Rules have the general form
+#     Operator Operand1 [Operand2]
+# with words separated by whitespace.
+#
+# Recognized operators are
+#
+#   Fail URL1
+#       Reject access to this URL and stop processing further rules.
+#
+#   Map URL1 URL2
+#       Change the URL to URL2, then continue processing.
+#
+#   Pass URL1 [URL2]
+#       Accept this URL and stop processing further rules; if URL2
+#       is given, apply this as the last mapping.
+#
+# Rules are processed sequentially from first to last; a rule applies
+# if the current URL (for the resource the user is trying to access)
+# matches URL1. Case-sensitive (!) string comparison is used; in addition,
+# URL1 can contain one '*', which is interpreted as a wildcard matching
+# 0 or more characters. So if, for example,
+# "http://example.com/dir/doc.html" is requested, it would match any of
+# the following:
+#     Pass http:*
+#     Pass http://example.com/*.html
+#     Pass http://example.com/*
+#     Pass http://example*
+#     Pass http://*/doc.html
+# but not:
+#     Pass http://example/*
+#     Pass http://Example.COM/dir/doc.html
+#     Pass http://Example.COM/*
+#
+# If a URL2 is given and also contains a '*', that character will be
+# replaced by whatever matched in URL1. Processing stops with the
+# first matching "Fail" or "Pass" or when the end of the rules is reached.
+# If the end is reached without a "Fail" or "Pass", the URL is allowed
+# (equivalent to a final "Pass *").
+#
+# The requested URL will have been transformed to Lynx's normal
+# representation. This means that local file resources should be
+# expected in the form "file://localhost/<path using slash separators>",
+# not in the machine's native representation for filenames.
+#
+# Anyone with experience configuring the venerable CERN httpd server will
+# recognize the syntax - in fact, the code implementing rules goes back
+# to a common ancestor. But note the differences: all URLs and URL-
+# patterns here have to be given as absolute URLs, even for local files.
+# (Absolute URLs don't imply proxying - you cannot control that from here.)
+#
+# CAVEAT
+# ======
+# First, to squash any false expectations, an example of what NOT TO DO.
+# It might be expected that a rule like
+#     Fail file://localhost/etc/passwd   # <- DON'T RELY ON THIS
+# could be used to prevent access to the file "/etc/passwd". This might
+# fool a naive user, but the more sophisticated user could still gain
+# access by experimenting with other forms like (@@@ untested)
+# "file://<machine's domain name>/etc/passwd" or "/etc//passwd"
+# or "/etc/p%61sswd" or "/etc/passwd?" or "/etc/passwd#X" and so on.
+# There are many URL forms for accessing the same resource, and Lynx
+# just doesn't guarantee that URLs for the same resource will look the
+# same way.
+#
+# The same reservation applies to any attempts to block access to unwanted
+# sites and so on. This isn't the right place for implementing it.
+# (Lynx has a number of mechanisms documented elsewhere to restrict access;
+# see the INSTALLATION file, lynx.cfg, lynx -help, lynx -restrictions.)
+#
+# Some more useful applications:
+#
+# 1. Disabling URLs by access scheme
+# ----------------------------------
+#     Fail gopher:*
+#     Fail finger:*
+#     Fail lynxcgi:*
+#     Fail LYNXIMGMAP:*
+# This should work (but no guarantees) because Lynx canonicalizes
+# the case of recognized access schemes and does not interpret
+# %-escaping in the scheme part (@@@ always?)
+#
+# Note that for many access schemes Lynx already has mechanisms to
+# restrict access (see lynx.cfg, -help, -restrictions, etc.); others
+# have to be specifically enabled. Those mechanisms should be used
+# in preference.
+# Note especially Limitation 1 below.
+# This can be used for the remaining cases, or in addition by the
+# more paranoid. Note that disabling "file:*" will also make many
+# of the special pages generated by lynx as temporary files (INFO,
+# history, ...) inaccessible; on the other hand, it doesn't prevent
+# _writing_ of various temp files - probably not what you want.
+#
+# You could also direct access for a scheme to a brief text explaining
+# why it's not available:
+#     Map news:* http://localhost/texts/newsserver-is-broken.html
+# (That text shouldn't contain any relative links; they would be
+# broken.)
+#
+# 2. Preventing accidental access
+# -------------------------------
+# If there is a page or site you don't want to access for whatever
+# reason (say there's a link to it that crashes Lynx [don't forget to
+# report a bug], or it starts sending you a 5 MB file you don't
+# want, or you just don't like the people...), you can prevent yourself
+# from accidentally accessing it:
+#     Fail http://bad.site.com/*
+#
+# 3. Compressed files
+# -------------------
+# You have downloaded a bunch of HTML documents, and compressed them
+# to save space. Then you discover that links between the files don't
+# work, because they all use the names of the uncompressed files. The
+# following kind of rule will allow you to navigate, invisibly accessing
+# the compressed files:
+#     Map file://localhost/somedir/*.html file://localhost/somedir/*.html.gz
+#
+# 4. Use local copies
+# -------------------
+# You have downloaded a tree of HTML documents, but there are many links
+# between them that still point to the remote location. You want to access
+# the local copies instead; after all, that's why you downloaded them. You
+# could start editing the HTML, but the following might be simpler:
+#     Map http://remote.com/docs/*.html file://localhost/home/me/docs/*.html
+# Or even combine this with compressing the files:
+#     Map http://remote.com/docs/*.html file://localhost/home/me/docs/*.html.gz
+#
+# 5. Broken links etc.
+# --------------------
+# A user has moved from http://www.siteA.com/~jdoe to http://siteB.org/john,
+# or http://www.provider.com/company/ has moved to their own server
+# http://www.company.com, but there are still links to the old location
+# all over the place; they are now broken or lead to a stupid "this page
+# has moved, please update your bookmarks. Refresh in 5 seconds" page
+# which you're tired of seeing. This will not fix your bookmarks, and
+# it will let you see the outdated URLs for longer (Limitation 3 below),
+# but for a quick fix:
+#     Map http://www.siteA.com/~jdoe/* http://siteB.org/john/*
+#     Map http://www.provider.com/company/* http://www.company.com/*
+# But note that you are likely to create invalid links if not all documents
+# from a site are mapped (Limitation 3).
+#
+# 6. DNS troubles
+# ---------------
+# A special case of broken links. If a site is inaccessible because the
+# name cannot be resolved (your or their name server is broken, or the
+# name registry once again made a mistake, or they really didn't pay in
+# time...) but you still somehow know the address; or if name lookups are
+# just too slow:
+#     Map http://www.somesite.com/* http://10.1.2.3/*
+# (You could do the equivalent more cleanly by adding an entry to the hosts
+# file, if you have access to it.)
+#
+# Or, if a name resolves to several addresses of which one is down, and the
+# DNS hasn't caught up:
+#     Map http://www.w3.org/* http://www12.w3.org/*
+#
+# Note that this can break access to some name-based virtually hosted sites.
+
+
+# Limitations
+# ===========
+# First, see CAVEAT above. There are other limitations:
+#
+# 1. Applicable URL schemes
+# -------------------------
+# Rules processing does not apply to all URL schemes. Some are
+# handled differently from the generic access code; therefore rules
+# for such URLs will never be "seen". This limitation applies at
+# least to lynxexec:, lynxprog:, mailto:, and LYNXHIST: URLs.
+#
+# Also, a scheme has to be known to Lynx in order to get as far as
+# applying rules - you cannot just define your own new foobar: scheme
+# and then map it to something here.
+#
+# 2. No re-checking
+# -----------------
+# When a URL is mapped to a different one, the new URL is not checked
+# again for compliance with most restrictions established by -anonymous,
+# -restrictions, lynx.cfg and so on. This can be regarded as a feature:
+# it allows specific exceptions. Of course it means that users for
+# whom any restrictions must be enforced cannot have write access to a
+# personal rules file, but that should be obvious anyway!
+#
+# 3. Mappings are invisible
+# -------------------------
+# Changing the URL with "Map" or "Pass" rules will in general not be
+# visible to the user, because it happens at a late stage of processing
+# a request (similar to directing a request through a proxy). One
+# can think of two kinds of URL for every resource: a "Document URL" as
+# the user sees it (on INFO page, history list, status line, etc.), and
+# a "physical URL" used for the actual access. Rules change only the
+# physical URL. This is different from the effect of HTTP redirection.
+# Often this is bad; sometimes it may be desirable.
+#
+# Changing the URL can create broken links if a document has relative URLs,
+# since they are taken to be relative to the "Document URL" (if no BASE tag
+# is present) when the HTML is parsed.
+#
+# 4. Interaction with proxying
+# ----------------------------
+# Rules processing is done after most other access checks, but before
+# proxy (and gateway) settings are examined. A "Fail" rule works
+# as expected, but when the URL has been mapped to a different one,
+# the subsequent proxy checking can get confused. If it decides that
+# access is through a proxy or gateway, it will generally use the
+# original URL to construct the "physical" URL, effectively overriding
+# the mapping rules. If the mapping is to a different access scheme
+# or hostname, proxy checking could also be fooled into using a proxy when
+# it shouldn't, into not using one when it should, or (if different proxies
+# are used for different schemes) into using the wrong proxy. So "just
+# don't do that"; in some cases setting the no_proxy variable will help.
+# Example 3 happens to work nicely if there is an http_proxy but no
+# ftp_proxy.
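The Syntax section of the sample file describes each rule as an operator followed by one or two whitespace-separated URL operands, with '#' introducing comments. The Python sketch below parses that format into tuples so the other rule semantics can be experimented with. It is not Lynx's own C parser; the helper name `parse_rules` is hypothetical, operators are accepted only with the exact spelling the sample uses, and treating any word that begins with '#' as the start of a comment is an assumption of this sketch.

```python
from typing import List, Optional, Tuple

# (operator, URL1, URL2 or None)
Rule = Tuple[str, str, Optional[str]]

# Operand counts per operator, as listed under "Recognized operators":
#   Fail URL1, Map URL1 URL2, Pass URL1 [URL2]
ARITY = {"Fail": (1, 1), "Map": (2, 2), "Pass": (1, 2)}


def parse_rules(text: str) -> List[Rule]:
    """Parse rules-file text into (operator, url1, url2) tuples (hypothetical helper)."""
    rules: List[Rule] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        words = line.split()  # words are separated by whitespace
        # Assumption: a word that begins with '#' starts a comment to end of line.
        cut = next((i for i, w in enumerate(words) if w.startswith("#")), len(words))
        words = words[:cut]
        if not words:
            continue  # blank line or comment-only line
        op, operands = words[0], words[1:]
        if op not in ARITY:
            raise ValueError(f"line {lineno}: unknown operator {op!r}")
        low, high = ARITY[op]
        if not low <= len(operands) <= high:
            raise ValueError(
                f"line {lineno}: {op} expects {low}..{high} operand(s), got {len(operands)}"
            )
        url2 = operands[1] if len(operands) == 2 else None
        rules.append((op, operands[0], url2))
    return rules


if __name__ == "__main__":
    sample = """
# comment lines are skipped
Fail gopher:*
Map  http://remote.com/docs/*.html file://localhost/home/me/docs/*.html
Pass http://example.com/*   # trailing comment
"""
    for rule in parse_rules(sample):
        print(rule)
```

Rejecting a `Map` with one operand, rather than silently ignoring it, makes typos in a rules file visible early; whether Lynx itself reports such lines is not stated in the sample.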
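The matching rule in the sample file (case-sensitive comparison, at most one '*' matching zero or more characters) can be modelled in a few lines of Python. The function name `match_pattern` is hypothetical, and this is an illustration of the stated semantics rather than a port of Lynx's C code. It returns the text the '*' consumed, which is what a '*' in URL2 would receive, or `None` when the rule does not apply; the demo reuses the sample's own match and non-match lists for "http://example.com/dir/doc.html".

```python
from typing import Optional


def match_pattern(pattern: str, url: str) -> Optional[str]:
    """Return the text matched by '*' ("" if the pattern has no '*'),
    or None if url does not match pattern. Comparison is case-sensitive."""
    if "*" not in pattern:
        return "" if url == pattern else None  # no wildcard: exact match only
    prefix, suffix = pattern.split("*", 1)
    if "*" in suffix:
        raise ValueError("a pattern may contain only one '*'")
    # The fixed parts must not overlap, e.g. "ab*ba" must not match "aba".
    if len(url) < len(prefix) + len(suffix):
        return None
    if url.startswith(prefix) and url.endswith(suffix):
        return url[len(prefix):len(url) - len(suffix)]
    return None


if __name__ == "__main__":
    url = "http://example.com/dir/doc.html"
    should_match = [
        "http:*",
        "http://example.com/*.html",
        "http://example.com/*",
        "http://example*",
        "http://*/doc.html",
    ]
    should_not_match = [
        "http://example/*",
        "http://Example.COM/dir/doc.html",
        "http://Example.COM/*",
    ]
    for p in should_match + should_not_match:
        print(f"{p!r:40} -> {match_pattern(p, url)!r}")
```

With a single '*', matching reduces to a prefix test plus a suffix test, so there is no greedy-versus-lazy ambiguity about what the wildcard captured.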
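The processing order described in the sample file (rules tried first to last, `Map` rewriting the current URL and continuing, the first matching `Fail` or `Pass` ending the scan, and an implicit final `Pass *`) can be sketched as a small loop. This is a hypothetical Python model for trying out a rules list, not Lynx's implementation; `apply_rules`, `substitute` and the tuple layout are invented here. A '*' in URL2 is replaced by whatever the '*' in URL1 matched, and a URL2 without '*' is used literally.

```python
from typing import List, Optional, Tuple

Rule = Tuple[str, str, Optional[str]]  # (operator, URL1, URL2 or None)


def match_pattern(pattern: str, url: str) -> Optional[str]:
    # Case-sensitive match with at most one '*'; returns the wildcard text or None.
    if "*" not in pattern:
        return "" if url == pattern else None
    prefix, suffix = pattern.split("*", 1)
    if len(url) < len(prefix) + len(suffix):
        return None
    if url.startswith(prefix) and url.endswith(suffix):
        return url[len(prefix):len(url) - len(suffix)]
    return None


def substitute(template: str, captured: str) -> str:
    # A '*' in URL2 is replaced by whatever the '*' in URL1 matched.
    return template.replace("*", captured, 1)


def apply_rules(rules: List[Rule], url: str) -> Optional[str]:
    """Return the URL to access (possibly mapped), or None if a Fail rule rejects it."""
    for op, url1, url2 in rules:
        captured = match_pattern(url1, url)
        if captured is None:
            continue  # rule does not apply to the current URL
        if op == "Fail":
            return None  # reject and stop
        if op == "Map":
            url = substitute(url2, captured)  # Map always has URL2; keep going
        elif op == "Pass":
            return substitute(url2, captured) if url2 else url  # accept and stop
        else:
            raise ValueError(f"unknown operator {op!r}")
    return url  # end of rules reached: same as a final "Pass *"


if __name__ == "__main__":
    rules: List[Rule] = [
        ("Fail", "gopher:*", None),
        ("Map", "http://www.siteA.com/~jdoe/*", "http://siteB.org/john/*"),
        ("Map", "file://localhost/somedir/*.html", "file://localhost/somedir/*.html.gz"),
        ("Pass", "http://siteB.org/*", None),
    ]
    for requested in [
        "gopher://gopher.example.org/",
        "http://www.siteA.com/~jdoe/index.html",
        "file://localhost/somedir/a.html",
        "http://unrelated.example/",
    ]:
        print(requested, "->", apply_rules(rules, requested))
```

Because later rules see the already-mapped URL, rule order matters: a broad `Fail http:*` placed after a narrower `Pass` only catches what the `Pass` let through unmatched.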
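The CAVEAT in the sample file can be checked without Lynx: a `Fail` pattern with no '*' matches exactly one string, while several spellings name the same file. The Python sketch below tests the literal rule against the caveat's spellings, written out as full file://localhost URLs (the domain-name form is omitted since the sample marks these as untested). For contrast, it reduces each URL to a path with a rough normalization (drop query and fragment, decode %XX escapes, collapse duplicate slashes); that normalization is chosen for this illustration and is not what Lynx does.

```python
import posixpath
from urllib.parse import unquote, urlsplit

FAIL_PATTERN = "file://localhost/etc/passwd"  # the rule the CAVEAT warns about

VARIANTS = [
    "file://localhost/etc/passwd",
    "file://localhost/etc//passwd",
    "file://localhost/etc/p%61sswd",  # %61 is 'a'
    "file://localhost/etc/passwd?",
    "file://localhost/etc/passwd#X",
]


def rule_blocks(url: str) -> bool:
    # The pattern has no '*', so the rule applies only on exact string equality.
    return url == FAIL_PATTERN


def rough_local_path(url: str) -> str:
    # Illustrative normalization only: ignore query/fragment, decode %XX,
    # and collapse '//' (and '.'/'..' segments) with posixpath.normpath.
    return posixpath.normpath(unquote(urlsplit(url).path))


if __name__ == "__main__":
    for url in VARIANTS:
        print(url, "blocked" if rule_blocks(url) else "NOT blocked", rough_local_path(url))
```

Only the first spelling is blocked, yet all five reduce to "/etc/passwd", which is why the sample points to Lynx's restriction options instead of rules for access control.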
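Application 1 in the sample file relies on Lynx canonicalizing the case of recognized access schemes before rules run; since matching is case-sensitive, `Fail gopher:*` would otherwise miss a URL typed as `GOPHER://...`. The sketch below models that step in Python. The table of canonical spellings (lowercase for most schemes, uppercase for LYNXIMGMAP, following how the sample writes its rules) and the helper names are assumptions made for illustration, not taken from Lynx's source.

```python
# Hypothetical canonical spellings; the sample rules write LYNXIMGMAP in uppercase.
CANONICAL_SCHEMES = {
    "http": "http",
    "gopher": "gopher",
    "finger": "finger",
    "lynxcgi": "lynxcgi",
    "lynximgmap": "LYNXIMGMAP",
}

# Each "Fail <scheme>:*" rule reduces to a case-sensitive prefix test.
FAIL_PREFIXES = ["gopher:", "finger:", "lynxcgi:", "LYNXIMGMAP:"]


def canonicalize_scheme(url: str) -> str:
    # Rewrite only the scheme of a recognized access scheme; leave the rest untouched.
    scheme, sep, rest = url.partition(":")
    key = scheme.lower()
    if sep and key in CANONICAL_SCHEMES:
        return CANONICAL_SCHEMES[key] + ":" + rest
    return url


def blocked(url: str) -> bool:
    return any(url.startswith(p) for p in FAIL_PREFIXES)


if __name__ == "__main__":
    for url in ["GOPHER://gopher.example.org/", "Finger://host.example/user",
                "HTTP://www.example.com/"]:
        print(url, "raw:", blocked(url), "canonical:", blocked(canonicalize_scheme(url)))
```

Unrecognized schemes pass through unchanged in this model, which fits the sample's note that a scheme unknown to Lynx never reaches rules processing at all.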
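Limitation 3 in the sample file says relative links are resolved against the Document URL, not the physical URL produced by a `Map` rule. Python's `urllib.parse.urljoin` is enough to see how that plays out with application 5's rule `Map http://www.provider.com/company/* http://www.company.com/*`: a plain relative link resolves back into the mapped area and is rewritten correctly, while a root-relative link written for the new server escapes the pattern and breaks. The page and link names here are hypothetical, and the sketch assumes the page carries no BASE tag.

```python
from urllib.parse import urljoin

OLD_PREFIX = "http://www.provider.com/company/"
NEW_PREFIX = "http://www.company.com/"

DOCUMENT_URL = OLD_PREFIX + "about.html"  # what the user sees (hypothetical page)


def map_rule(url: str) -> str:
    # Same effect as: Map http://www.provider.com/company/* http://www.company.com/*
    if url.startswith(OLD_PREFIX):
        return NEW_PREFIX + url[len(OLD_PREFIX):]
    return url


PHYSICAL_URL = map_rule(DOCUMENT_URL)  # what is actually fetched


def follow(link: str) -> str:
    # Links are resolved against the Document URL, then the rule applies to the result.
    return map_rule(urljoin(DOCUMENT_URL, link))


def intended(link: str) -> str:
    # Where the page author meant the link to go, relative to the real location.
    return urljoin(PHYSICAL_URL, link)


if __name__ == "__main__":
    for link in ["team.html", "/products.html"]:
        print(link, follow(link), intended(link), follow(link) == intended(link))
```

The "/products.html" case lands on http://www.provider.com/products.html, outside the mapped prefix, which is the kind of invalid link the sample warns about when not all documents from a site are covered by the mapping.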
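Limitation 4 is easiest to see with application 4's rule, which maps an http: URL to a file: URL. If proxies are chosen per scheme, in the spirit of the http_proxy and ftp_proxy variables, the original and mapped URLs call for different decisions, and the sample notes that a proxied request is built from the original URL, bypassing the local copy. The Python sketch below is a simplified, hypothetical model: the proxy address is made up, `proxy_for` is an invented helper, and `no_proxy` is an exact host list rather than the fuller matching real clients use. It is not Lynx's proxy code.

```python
from typing import Dict, FrozenSet, Optional
from urllib.parse import urlsplit

# Hypothetical per-scheme proxy settings, in the spirit of http_proxy / ftp_proxy.
PROXIES: Dict[str, str] = {"http": "http://proxy.example:8080/"}

ORIGINAL = "http://remote.com/docs/intro.html"
MAPPED = "file://localhost/home/me/docs/intro.html"  # via application 4's Map rule


def proxy_for(url: str, proxies: Dict[str, str],
              no_proxy: FrozenSet[str] = frozenset()) -> Optional[str]:
    # Simplified: exact host match against no_proxy, then look up by scheme.
    parts = urlsplit(url)
    if parts.hostname in no_proxy:
        return None
    return proxies.get(parts.scheme)


if __name__ == "__main__":
    print("original:", proxy_for(ORIGINAL, PROXIES))  # a proxy is selected
    print("mapped:  ", proxy_for(MAPPED, PROXIES))    # no proxy for file:
    # Listing the original host in no_proxy means no proxy is selected in this
    # model, matching the sample's hint that no_proxy can sometimes help.
    print("no_proxy:", proxy_for(ORIGINAL, PROXIES, frozenset({"remote.com"})))
```

The disagreement between the first two lines is the confusion the sample describes: the decision is made for one URL while the rules have already substituted another.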