samples/cernrules.txt


# This file contains examples and an explanation for the RULESFILE / RULE
# feature.
#
# Rules for Lynx are experimental.  They provide a rudimentary capability
# for URL rejection and substitution based on string matching.
# Most users and most installations will not need this feature; it is here
# in case you find it useful.  Note that this may change or go away in
# future releases of Lynx; if you find it useful, consider describing your
# use of it in a message to <lynx-dev@sig.net>.
#
# Syntax:
# =======
# As you may have guessed, comments are introduced by a '#' character.
# Rules have the general form
#   Operator  Operand1  [Operand2]
# with words separated by whitespace.
#
# Recognized operators are
#
#   Fail  URL1
# Reject access to this URL, stop processing further rules.
#
#   Map   URL1  URL2
# Change the URL to URL2, then continue processing.
#
#   Pass  URL1  [URL2]
# Accept this URL and stop processing further rules; if URL2
# is given, apply this as the last mapping.
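#
# For instance (the host names here are hypothetical, for illustration
# only; '*' substitution is described below), the rule
#   Pass  http://old.example.org/*  http://new.example.org/*
# accepts any URL beginning with "http://old.example.org/", so that the
# corresponding location under "http://new.example.org/" should be
# accessed instead (subject to Limitation 4 below), and no rule after it
# is consulted.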
#
# Rules are processed sequentially, first to last; a rule applies
# if the current URL (for the resource the user is trying to access)
# matches URL1.  Case-sensitive (!) string comparison is used; in addition,
# URL1 can contain one '*', which is interpreted as a wildcard matching
# 0 or more characters.  So if, for example,
# "http://example.com/dir/doc.html" is requested, it would match any of
# the following:
#   Pass  http:*
#   Pass  http://example.com/*.html
#   Pass  http://example.com/*
#   Pass  http://example*
#   Pass  http://*/doc.html
# but not:
#   Pass  http://example/*
#   Pass  http://Example.COM/dir/doc.html
#   Pass  http://Example.COM/*
#
# If URL2 is given and also contains a '*', that character will be
# replaced by whatever the '*' in URL1 matched.  Processing stops with the
# first matching "Fail" or "Pass" or when the end of the rules is reached.
# If the end is reached without a "Fail" or "Pass", the URL is allowed
# (equivalent to a final "Pass *").
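#
# A worked example, using hypothetical host names: given the rules
#   Map   http://www.example.org/old/*  http://www.example.org/new/*
#   Fail  http://www.example.org/new/draft/*
#   Pass  http://www.example.org/*
# a request for "http://www.example.org/old/a.html" matches the Map rule
# with '*' standing for "a.html", so it is changed to
# "http://www.example.org/new/a.html".  Processing continues with that
# URL, which does not match the Fail rule but does match the Pass rule,
# so access is allowed.  A request for
# "http://www.example.org/old/draft/b.html" is likewise mapped, to
# "http://www.example.org/new/draft/b.html", which matches the Fail rule
# and is rejected.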
#
# The requested URL will have been transformed to Lynx's normal
# representation.  This means that local file resources should be
# expected in the form "file://localhost/<path using slash separators>",
# not in the machine's native representation for filenames.
#
# Anyone with experience configuring the venerable CERN httpd server will
# recognize the syntax - in fact, the code implementing rules goes back
# to a common ancestor.  But note the differences: all URLs and URL-
# patterns here have to be given as absolute URLs, even for local files.
# (Absolute URLs don't imply proxying - you cannot control that from here.)
#
# CAVEAT
# ======
# First, to squash any false expectations, an example of what NOT TO DO.
# It might be expected that a rule like
#   Fail  file://localhost/etc/passwd		# <- DON'T RELY ON THIS
# could be used to prevent access to the file "/etc/passwd".  This might
# fool a naive user, but the more sophisticated user could still gain
# access, by experimenting with other forms like (@@@ untested)
# "file://<machine's domain name>/etc/passwd" or "/etc//passwd"
# or "/etc/p%61sswd" or "/etc/passwd?" or "/etc/passwd#X" and so on.
# There are many URL forms for accessing the same resource, and Lynx
# just doesn't guarantee that URLs for the same resource will look the
# same way.
#
# The same reservation applies to any attempts to block access to unwanted
# sites and so on.  This isn't the right place for implementing it.
# (Lynx has a number of mechanisms documented elsewhere to restrict access;
# see the INSTALLATION file, lynx.cfg, lynx -help, lynx -restrictions.)
#
# Some more useful applications:
#
# 1. Disabling URLs by access scheme
# ----------------------------------
#   Fail  gopher:*
#   Fail  finger:*
#   Fail  lynxcgi:*
#   Fail  LYNXIMGMAP:*
# This should work (but no guarantees) because Lynx canonicalizes
# the case of recognized access schemes and does not interpret
# %-escaping in the scheme part (@@@ always?).
#
# Note that for many access schemes Lynx already has mechanisms to
# restrict access (see lynx.cfg, -help, -restrictions, etc.); others
# have to be specifically enabled.  Those mechanisms should be used
# in preference.
# Note especially Limitation 1 below.
# Rules can be used for the remaining cases, or in addition by the
# more paranoid.  Note that disabling "file:*" will also make many
# of the special pages generated by lynx as temporary files (INFO,
# history, ...) inaccessible; on the other hand, it doesn't prevent
# _writing_ of various temp files - probably not what you want.
#
# You could also direct access for a scheme to a brief text explaining
# why it's not available:
#   Map news:*   http://localhost/texts/newsserver-is-broken.html
# (That text shouldn't contain any relative links; they would be
# broken.)
#
# 2. Preventing accidental access
# -------------------------------
# If there is a page or site you don't want to access for whatever
# reason (say there's a link to it that crashes Lynx [don't forget to
# report a bug], or it starts sending you a 5 MB file you don't
# want, or you just don't like the people...), you can prevent yourself
# from accidentally accessing it:
#    Fail  http://bad.site.com/*
#
# 3. Compressed files
# -------------------
# You have downloaded a bunch of HTML documents, and compressed them
# to save space.  Then you discover that links between the files don't
# work, because they all use the names of the uncompressed files.  The
# following kind of rule will allow you to navigate, invisibly accessing
# the compressed files:
#   Map file://localhost/somedir/*.html file://localhost/somedir/*.html.gz
#
# 4. Use local copies
# -------------------
# You have downloaded a tree of HTML documents, but there are many links
# between them that still point to the remote location.  You want to access
# the local copies instead; after all, that's why you downloaded them.  You
# could start editing the HTML, but the following might be simpler:
#  Map http://remote.com/docs/*.html file://localhost/home/me/docs/*.html
# Or even combine this with compressing the files:
#  Map http://remote.com/docs/*.html file://localhost/home/me/docs/*.html.gz
#
# 5. Broken links etc.
# --------------------
# A user has moved from http://www.siteA.com/~jdoe to http://siteB.org/john,
# or http://www.provider.com/company/ has moved to their own server
# http://www.company.com, but there are still links to the old location
# all over the place; they are now broken or lead to a stupid "this page
# has moved, please update your bookmarks. Refresh in 5 seconds" page
# which you're tired of seeing.  This will not fix your bookmarks, and
# it will let you see the outdated URLs for longer (Limitation 3 below),
# but for a quick fix:
#   Map   http://www.siteA.com/~jdoe/*      http://siteB.org/john/*
#   Map   http://www.provider.com/company/* http://www.company.com/*
# But note that you are likely to create invalid links if not all documents
# from a site are mapped (Limitation 3).
#
# 6. DNS troubles
# ---------------
# A special case of broken links.  If a site is inaccessible because the
# name cannot be resolved (your or their name server is broken, or the
# name registry once again made a mistake, or they really didn't pay in
# time...) but you still somehow know the address; or if name lookups are
# just too slow:
#   Map   http://www.somesite.com/*  http://10.1.2.3/*
# (You could do the equivalent more cleanly by adding an entry to the hosts
# file, if you have access to it.)
#
# Or, if a name resolves to several addresses of which one is down, and the
# DNS hasn't caught up:
#   Map   http://www.w3.org/*    http://www12.w3.org/*
#
# Note that this can break access to some name-based virtually hosted sites.


# Limitations
# ===========
# First, see CAVEAT above.  There are other limitations:
#
# 1. Applicable URL schemes
# -------------------------
# Rules processing does not apply to all URL schemes.  Some are
# handled differently from the generic access code, therefore rules
# for such URLs will never be "seen".  This limitation applies at
# least to lynxexec:, lynxprog:, mailto:, and LYNXHIST: URLs.
#
# Also, a scheme has to be known to Lynx in order to get as far as
# applying rules - you cannot just define your own new foobar: scheme
# and then map it to something here.
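#
# So, for example, neither of these rules (hypothetical, shown only to
# illustrate the point) would have any effect:
#   Fail  mailto:*
#   Map   foobar:*  http://www.example.org/foobar/*
# the first because mailto: URLs are handled outside rules processing,
# the second because foobar: is not a scheme known to Lynx.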
#
# 2. No re-checking
# -----------------
# When a URL is mapped to a different one, the new URL is not checked
# again for compliance with most restrictions established by -anonymous,
# -restrictions, lynx.cfg and so on.  This can be regarded as a feature:
# it allows specific exceptions.  Of course it means that users for
# whom any restrictions must be enforced cannot have write access to a
# personal rules file, but that should be obvious anyway!
#
# 3. Mappings are invisible
# -------------------------
# Changing the URL with "Map" or "Pass" rules will in general not be
# visible to the user, because it happens at a late stage of processing
# a request (similar to directing a request through a proxy).  One
# can think of two kinds of URL for every resource: a "Document URL" as
# the user sees it (on INFO page, history list, status line, etc.), and
# a "physical URL" used for the actual access.  Rules change only the
# physical URL.  This is different from the effect of HTTP redirection.
# Often this is bad; sometimes it may be desirable.
#
# Changing the URL can create broken links if a document has relative URLs,
# since they are taken to be relative to the "Document URL" (if no BASE tag
# is present) when the HTML is parsed.
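#
# As an illustration (host names are hypothetical), suppose only a single
# document is redirected with
#   Map   http://www.example.org/a/index.html  http://mirror.example.net/a/index.html
# If that document has no BASE tag and contains a relative link
# "next.html", the link is resolved against the Document URL, giving
# "http://www.example.org/a/next.html" rather than a URL on the mirror.
# Following it therefore goes back to the original site unless another
# rule maps that URL as well.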
#
# 4. Interaction with proxying
# ----------------------------
# Rules processing is done after most other access checks, but before
# proxy (and gateway) settings are examined.  A "Fail" rule works
# as expected, but when the URL has been mapped to a different one,
# the subsequent proxy checking can get confused.  If it decides that
# access is through a proxy or gateway, it will generally use the
# original URL to construct the "physical" URL, effectively overriding
# the mapping rules.  If the mapping is to a different access scheme
# or hostname, proxy checking could also be fooled to use a proxy when
# it shouldn't, to not use one when it should, or (if different proxies
# are used for different schemes) to use the wrong proxy.  So "just
# don't do that"; in some cases setting the no_proxy variable will help.
# Example 3 happens to work nicely if there is a http_proxy but no
# ftp_proxy.