snapshot of project "lynx", label v2-8-2dev_16

author: Thomas E. Dickey <dickey@invisible-island.net> 1999-02-08 10:50:02 -0500
committer: Thomas E. Dickey <dickey@invisible-island.net> 1999-02-08 10:50:02 -0500
commit: 8ce6b560f4fb325be3d34266c54c70eb8668e8e1 (patch)
tree: d227c501d100ee0c5f1c72601d9ea5a487c1e2ca /samples
parent: 87434eaa074d789f65bac589b03df341e76e7a4e (diff)
download: lynx-snapshots-8ce6b560f4fb325be3d34266c54c70eb8668e8e1.tar.gz
1 files changed, 226 insertions, 0 deletions
diff --git a/samples/cernrules.txt b/samples/cernrules.txt
new file mode 100644
index 00000000..97212904
--- /dev/null
+++ b/samples/cernrules.txt
@@ -0,0 +1,226 @@
+# This file contains examples and an explanation of the RULESFILE / RULE
+# feature.
+#
+# Rules for Lynx are experimental.  They provide a rudimentary capability
+# for URL rejection and substitution based on string matching.
+# Most users and most installations will not need this feature; it is here
+# in case you find it useful.  Note that this may change or go away in
+# future releases of Lynx; if you find it useful, consider describing your
+# use of it in a message to <lynx-dev@sig.net>.
+#
+# Syntax:
+# =======
+# As you may have guessed, comments are introduced by a '#' character.
+# Rules have the general form
+#   Operator  Operand1  [Operand2]
+# with words separated by whitespace.
+#
+# Recognized operators are
+#
+#   Fail  URL1
+# Reject access to this URL, stop processing further rules.
+#
+#   Map   URL1  URL2
+# Change the URL to URL2, then continue processing.
+#
+#   Pass  URL1  [URL2]
+# Accept this URL and stop processing further rules; if URL2
+# is given, apply this as the last mapping.
+#
+# Rules are processed sequentially, first to last; a rule applies
+# if the current URL (for the resource the user is trying to access)
+# matches URL1.  Case-sensitive (!) string comparison is used; in addition,
+# URL1 can contain one '*', which is interpreted as a wildcard matching
+# 0 or more characters.  So if, for example,
+# "http://example.com/dir/doc.html" is requested, it would match any of
+# the following:
+#   Pass  http:*
+#   Pass  http://example.com/*.html
+#   Pass  http://example.com/*
+#   Pass  http://example*
+#   Pass  http://*/doc.html
+# but not:
+#   Pass  http://example/*
+#   Pass  http://Example.COM/dir/doc.html
+#   Pass  http://Example.COM/*
+#
+# If a URL2 is given and also contains a '*', that character will be
+# replaced by whatever matched in URL1.  Processing stops with the
+# first matching "Fail" or "Pass" or when the end of the rules is reached.
+# If the end is reached without a "Fail" or "Pass", the URL is allowed
+# (equivalent to a final "Pass *").
+#
+# The requested URL will have been transformed to Lynx's normal
+# representation.  This means that local file resources should be
+# expected in the form "file://localhost/<path using slash separators>",
+# not in the machine's native representation for filenames.
+#
+# Anyone with experience configuring the venerable CERN httpd server will
+# recognize the syntax - in fact, the code implementing rules goes back
+# to a common ancestor.  But note the differences: all URLs and URL-
+# patterns here have to be given as absolute URLs, even for local files.
+# (Absolute URLs don't imply proxying - you cannot control that from here.)
+#
+# CAVEAT
+# ======
+# First, to squash any false expectations, an example of what NOT TO DO.
+# It might be expected that a rule like
+#   Fail  file://localhost/etc/passwd		# <- DON'T RELY ON THIS
+# could be used to prevent access to the file "/etc/passwd".  This might
+# fool a naive user, but the more sophisticated user could still gain
+# access by experimenting with other forms like (@@@ untested)
+# "file://<machine's domain name>/etc/passwd" or "/etc//passwd"
+# or "/etc/p%61sswd" or "/etc/passwd?" or "/etc/passwd#X" and so on.
+# There are many URL forms for accessing the same resource, and Lynx
+# just doesn't guarantee that URLs for the same resource will look the
+# same way.
+#
+# The same reservation applies to any attempts to block access to unwanted
+# sites and so on.  This isn't the right place for implementing it.
+# (Lynx has a number of mechanisms documented elsewhere to restrict access;
+# see the INSTALLATION file, lynx.cfg, lynx -help, lynx -restrictions.)
+#
+# Some more useful applications:
+#
+# 1. Disabling URLs by access scheme
+# ----------------------------------
+#   Fail  gopher:*
+#   Fail  finger:*
+#   Fail  lynxcgi:*
+#   Fail  LYNXIMGMAP:*
+# This should work (but no guarantees) because Lynx canonicalizes
+# the case of recognized access schemes and does not interpret
+# %-escaping in the scheme part (@@@ always?)
+#
+# Note that for many access schemes Lynx already has mechanisms to
+# restrict access (see lynx.cfg, -help, -restrictions, etc.); others
+# have to be specifically enabled.  Those mechanisms should be used
+# in preference to rules.
+# Note especially Limitation 1 below.
+# This can be used for the remaining cases, or in addition by the
+# more paranoid.  Note that disabling "file:*" will also make many
+# of the special pages generated by lynx as temporary files (INFO,
+# history, ...) inaccessible; on the other hand, it doesn't prevent
+# _writing_ of various temp files - probably not what you want.
+#
+# You could also direct access for a scheme to a brief text explaining
+# why it's not available:
+#   Map news:*   http://localhost/texts/newsserver-is-broken.html
+# (That text shouldn't contain any relative links; they would be
+# broken.)
+#
+# 2. Preventing accidental access
+# -------------------------------
+# If there is a page or site you don't want to access for whatever
+# reason (say there's a link to it that crashes Lynx [don't forget to
+# report a bug], or it starts sending you a 5 MB file you don't
+# want, or you just don't like the people...), you can prevent yourself
+# from accidentally accessing it:
+#    Fail  http://bad.site.com/*
+#
+# 3. Compressed files
+# -------------------
+# You have downloaded a bunch of HTML documents, and compressed them
+# to save space.  Then you discover that links between the files don't
+# work, because they all use the names of the uncompressed files.  The
+# following kind of rule will allow you to navigate, invisibly accessing
+# the compressed files:
+#   Map file://localhost/somedir/*.html file://localhost/somedir/*.html.gz
+#
+# 4. Use local copies
+# -------------------
+# You have downloaded a tree of HTML documents, but there are many links
+# between them that still point to the remote location.  You want to access
+# the local copies instead; after all, that's why you downloaded them.  You
+# could start editing the HTML, but the following might be simpler:
+#  Map http://remote.com/docs/*.html file://localhost/home/me/docs/*.html
+# Or even combine this with compressing the files:
+#  Map http://remote.com/docs/*.html file://localhost/home/me/docs/*.html.gz
+#
+# 5. Broken links etc.
+# --------------------
+# A user has moved from http://www.siteA.com/~jdoe to http://siteB.org/john,
+# or http://www.provider.com/company/ has moved to their own server
+# http://www.company.com, but there are still links to the old location
+# all over the place; they now are broken or lead to a stupid "this page
+# has moved, please update your bookmarks. Refresh in 5 seconds" page
+# which you're tired of seeing.  This will not fix your bookmarks, and
+# it will let you see the outdated URLs for longer (Limitation 3 below),
+# but for a quick fix:
+#   Map   http://www.siteA.com/~jdoe/*      http://siteB.org/john/*
+#   Map   http://www.provider.com/company/* http://www.company.com/*
+# But note that you are likely to create invalid links if not all documents
+# from a site are mapped (Limitation 3).
+#
+# 6. DNS troubles
+# ---------------
+# A special case of broken links.  If a site is inaccessible because the
+# name cannot be resolved (your or their name server is broken, or the
+# name registry once again made a mistake, or they really didn't pay in
+# time...) but you still somehow know the address; or if name lookups are
+# just too slow:
+#   Map   http://www.somesite.com/*  http://10.1.2.3/*
+# (You could do the equivalent more cleanly by adding an entry to the hosts
+# file, if you have access to it.)
+#
+# Or, if a name resolves to several addresses of which one is down, and the
+# DNS hasn't caught up:
+#   Map   http://www.w3.org/*    http://www12.w3.org/*
+#
+# Note that this can break access to some name-based virtually hosted sites.
+
+
+# Limitations
+# ===========
+# First, see CAVEAT above.  There are other limitations:
+#
+# 1. Applicable URL schemes
+# -------------------------
+# Rules processing does not apply to all URL schemes.  Some are
+# handled differently from the generic access code, therefore rules
+# for such URLs will never be "seen".  This limitation applies at
+# least to lynxexec:, lynxprog:, mailto:, and LYNXHIST: URLs.
+#
+# Also, a scheme has to be known to Lynx in order to get as far as
+# applying rules - you cannot just define your own new foobar: scheme
+# and then map it to something here.
+#
+# 2. No re-checking
+# -----------------
+# When a URL is mapped to a different one, the new URL is not checked
+# again for compliance with most restrictions established by -anonymous,
+# -restrictions, lynx.cfg and so on.  This can be regarded as a feature:
+# it allows specific exceptions.  Of course it means that users for
+# whom any restrictions must be enforced cannot have write access to a
+# personal rules file, but that should be obvious anyway!
+#
+# 3. Mappings are invisible
+# -------------------------
+# Changing the URL with "Map" or "Pass" rules will in general not be
+# visible to the user, because it happens at a late stage of processing
+# a request (similar to directing a request through a proxy).  One
+# can think of two kinds of URL for every resource: a "Document URL" as
+# the user sees it (on INFO page, history list, status line, etc.), and
+# a "physical URL" used for the actual access.  Rules change only the
+# physical URL.  This is different from the effect of HTTP redirection.
+# Often this is bad; sometimes it may be desirable.
+#
+# Changing the URL can create broken links if a document has relative URLs,
+# since they are taken to be relative to the "Document URL" (if no BASE tag
+# is present) when the HTML is parsed.
+#
+# 4. Interaction with proxying
+# ----------------------------
+# Rules processing is done after most other access checks, but before
+# proxy (and gateway) settings are examined.  A "Fail" rule works
+# as expected, but when the URL has been mapped to a different one,
+# the subsequent proxy checking can get confused.  If it decides that
+# access is through a proxy or gateway, it will generally use the
+# original URL to construct the "physical" URL, effectively overriding
+# the mapping rules.  If the mapping is to a different access scheme
+# or hostname, proxy checking could also be fooled into using a proxy when
+# it shouldn't, into not using one when it should, or (if different proxies
+# are used for different schemes) into using the wrong proxy.  So "just
+# don't do that"; in some cases setting the no_proxy variable will help.
+# Example 3 happens to work nicely if there is a http_proxy but no
+# ftp_proxy.
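The rule semantics documented in samples/cernrules.txt above (sequential processing, at most one case-sensitive '*' wildcard in URL1, substitution of the matched text into URL2, and the stop behaviour of Fail and Pass) can be modelled in a short Python sketch. This is a hypothetical illustration, not Lynx's C implementation: the tuple rule format and the functions match, substitute and translate are assumptions of this sketch, as is the choice to substitute an empty string when URL2 contains a '*' but URL1 does not.

```python
from typing import List, Optional, Tuple

# Hypothetical rule format for this sketch: (operator, URL1, URL2 or None).
Rule = Tuple[str, str, Optional[str]]


def match(url1: str, url: str) -> Optional[str]:
    """Case-sensitive match of url against URL1, which may hold one '*'.

    Returns the text matched by '*' ('' if URL1 has no '*'), or None if
    the rule does not apply.  Any further '*' is treated literally here.
    """
    prefix, star, suffix = url1.partition("*")
    if not star:
        return "" if url == url1 else None
    if len(url) < len(prefix) + len(suffix):
        return None  # prefix and suffix may not overlap
    if url.startswith(prefix) and url.endswith(suffix):
        return url[len(prefix):len(url) - len(suffix)]
    return None


def substitute(url2: str, captured: str) -> str:
    """Replace the first '*' in URL2 (if any) with the captured text."""
    return url2.replace("*", captured, 1)


def translate(rules: List[Rule], url: str) -> Optional[str]:
    """Apply rules first to last; return the final URL, or None on Fail."""
    for op, url1, url2 in rules:
        captured = match(url1, url)
        if captured is None:
            continue                      # rule does not apply
        if op == "Fail":
            return None                   # reject and stop
        if url2 is not None:
            url = substitute(url2, captured)
        if op == "Pass":
            return url                    # accept and stop
        # "Map": continue with the rewritten URL
    return url                            # end of rules: implicit "Pass *"


if __name__ == "__main__":
    # Rules taken from the examples in cernrules.txt.
    rules: List[Rule] = [
        ("Fail", "gopher:*", None),
        ("Map", "file://localhost/somedir/*.html",
         "file://localhost/somedir/*.html.gz"),
        ("Map", "http://www.siteA.com/~jdoe/*", "http://siteB.org/john/*"),
        ("Fail", "file://localhost/etc/passwd", None),
    ]
    for u in ["gopher://gopher.example.com/",
              "file://localhost/somedir/index.html",
              "http://www.siteA.com/~jdoe/cv.html",
              "file://localhost/etc/passwd",
              "file://localhost/etc//passwd"]:   # CAVEAT: not caught
        print(u, "->", translate(rules, u) or "(Fail)")
```

Run as a script, the last URL shows the CAVEAT in action: "file://localhost/etc//passwd" is a different string from the Fail pattern, so it passes through unchanged.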
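Limitation 3 and the warning in Example 5 of samples/cernrules.txt (invalid links when not all documents of a site are mapped) can be illustrated with Python's urllib.parse.urljoin, used here as a stand-in for Lynx's own relative-URL resolution; that substitution is an assumption, and the two may differ in edge cases. The helper mapped() is hypothetical and applies only Example 5's second Map rule; the link names are made up for illustration.

```python
from urllib.parse import urljoin

# Rule from Example 5 (a prefix mapping):
#   Map   http://www.provider.com/company/* http://www.company.com/*
OLD_PREFIX = "http://www.provider.com/company/"
NEW_PREFIX = "http://www.company.com/"


def mapped(url: str) -> str:
    """Apply the single Map rule above (trailing '*' = prefix match)."""
    if url.startswith(OLD_PREFIX):
        return NEW_PREFIX + url[len(OLD_PREFIX):]
    return url  # rule does not match: URL passes through unchanged


document_url = "http://www.provider.com/company/index.html"  # shown to user
physical_url = mapped(document_url)                          # actually fetched

# Hypothetical links found in the fetched page; with no BASE tag they are
# resolved against the Document URL, not against the physical one.
for link in ["products.html", "/products.html"]:
    resolved = urljoin(document_url, link)
    intended = urljoin(physical_url, link)
    print(f"{link!r}: fetches {mapped(resolved)}, page meant {intended}")
```

A document-relative link such as "products.html" still falls under the mapped prefix and is rewritten correctly, while the root-relative "/products.html" resolves to a www.provider.com URL that no rule covers.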