diff options
author | Araq <rumpf_a@web.de> | 2011-03-23 01:09:52 +0100 |
---|---|---|
committer | Araq <rumpf_a@web.de> | 2011-03-23 01:09:52 +0100 |
commit | 5b789f2da8e57ea2adf0c088f5e41fd7a71fe89b (patch) | |
tree | 81f2f23d48d56ef7b106d24982231d99809a6def | |
parent | 8d734244b14e09c97267432468ac20fcf8ff82eb (diff) | |
download | Nim-5b789f2da8e57ea2adf0c088f5e41fd7a71fe89b.tar.gz |
bugfixes; field discriminant checks; linearScanEnd, unroll, shallow pragmas
45 files changed, 2341 insertions, 242 deletions
diff --git a/config/nimrod.cfg b/config/nimrod.cfg index f45be780a..56ee74dde 100755 --- a/config/nimrod.cfg +++ b/config/nimrod.cfg @@ -81,7 +81,7 @@ gcc.options.debug = "-g" @end @else: @if not release: - gcc.options.always = "-w -O1" + gcc.options.always = "-w" @else: gcc.options.always = "-w" @end diff --git a/doc/manual.txt b/doc/manual.txt index 975094a56..67c3c43c7 100755 --- a/doc/manual.txt +++ b/doc/manual.txt @@ -2614,6 +2614,26 @@ The `final`:idx: pragma can be used for an object type to specify that it cannot be inherited from. +shallow pragma +-------------- +The `shallow`:idx: pragma affects the semantics of a type: The compiler is +allowed to make a shallow copy. This can cause serious semantic issues and +break memory safety! However, it can speed up assignments considerably, +because the semantics of Nimrod require deep copying of sequences and strings. +This can be expensive, especially if sequences are used to build a tree +structure: + +.. code-block:: nimrod + type + TNodeKind = enum nkLeaf, nkInner + TNode {.final, shallow.} = object + case kind: TNodeKind + of nkLeaf: + strVal: string + of nkInner: + children: seq[TNode] + + Pure pragma ----------- The `pure`:idx: pragma serves two completely different purposes: @@ -2646,53 +2666,53 @@ hint pragma ----------- The `hint`:idx: pragma is used to make the compiler output a hint message with the given content. Compilation continues after the hint. - - -linearScanEnd pragma --------------------- -The `linearScanEnd`:idx: pragma can be used to tell the compiler how to -compile a Nimrod `case`:idx: statement. Syntactially it has to be used as a -statement: - -.. code-block:: nimrod - case myInt - of 0: - echo "most common case" - of 1: - {.linearScanEnd.} - echo "second most common case" - of 2: echo "unlikely: use branch table" - else: echo "unlikely too: use branch table for ", myInt - -In the example, the case branches ``0`` and ``1`` are much more common than -the other cases. Therefore the generated assembler code should test for these -values first, so that the CPU's branch predictor has a good chance to succeed -(avoiding an expensive CPU pipeline stall). The other cases might be put into a -jump table for O(1) overhead, but at the cost of a (very likely) pipeline -stall. - -The ``linearScanEnd`` pragma should be put into the last branch that should be -tested against via linear scanning. If put into the last branch of the -whole ``case`` statement, the whole ``case`` statement uses linear scanning. - - -unroll pragma -------------- -The `unroll`:idx: pragma can be used to tell the compiler that it should unroll -a `for`:idx: or `while`:idx: loop for runtime efficiency: - -.. code-block:: nimrod - proc searchChar(s: string, c: char): int = - for i in 0 .. s.high: - {.unroll: 4.} - if s[i] == c: return i - result = -1 - -In the above example, the search loop is unrolled by a factor 4. The unroll -factor can be left out too; the compiler then chooses an appropriate unroll -factor. - -**Note**: Currently the compiler recognizes but ignores this pragma. + + +linearScanEnd pragma +-------------------- +The `linearScanEnd`:idx: pragma can be used to tell the compiler how to +compile a Nimrod `case`:idx: statement. Syntactially it has to be used as a +statement: + +.. code-block:: nimrod + case myInt + of 0: + echo "most common case" + of 1: + {.linearScanEnd.} + echo "second most common case" + of 2: echo "unlikely: use branch table" + else: echo "unlikely too: use branch table for ", myInt + +In the example, the case branches ``0`` and ``1`` are much more common than +the other cases. Therefore the generated assembler code should test for these +values first, so that the CPU's branch predictor has a good chance to succeed +(avoiding an expensive CPU pipeline stall). The other cases might be put into a +jump table for O(1) overhead, but at the cost of a (very likely) pipeline +stall. + +The ``linearScanEnd`` pragma should be put into the last branch that should be +tested against via linear scanning. If put into the last branch of the +whole ``case`` statement, the whole ``case`` statement uses linear scanning. + + +unroll pragma +------------- +The `unroll`:idx: pragma can be used to tell the compiler that it should unroll +a `for`:idx: or `while`:idx: loop for runtime efficiency: + +.. code-block:: nimrod + proc searchChar(s: string, c: char): int = + for i in 0 .. s.high: + {.unroll: 4.} + if s[i] == c: return i + result = -1 + +In the above example, the search loop is unrolled by a factor 4. The unroll +factor can be left out too; the compiler then chooses an appropriate unroll +factor. + +**Note**: Currently the compiler recognizes but ignores this pragma. compilation option pragmas diff --git a/koch.nim b/koch.nim index 45cd9dae1..4eddb06e4 100755 --- a/koch.nim +++ b/koch.nim @@ -131,28 +131,29 @@ proc findStartNimrod: string = proc safeRemove(filename: string) = if existsFile(filename): removeFile(filename) -proc bootIteration(args: string): bool = - var nimrod1 = "rod" / "nimrod1".exe - safeRemove(nimrod1) - moveFile(dest=nimrod1, source="rod" / "nimrod".exe) - exec "rod" / "nimrod1 cc $# $# rod/nimrod.nim" % [bootOptions, args] - # Nimrod does not produce an executable again if nothing changed. That's ok: - result = sameFileContent("rod" / "nimrod".exe, nimrod1) - var dest = "bin" / "nimrod".exe +proc thVersion(i: int): string = + result = ("rod" / "nimrod" & $i).exe + +proc copyExe(source, dest: string) = safeRemove(dest) - copyFile(dest=dest, source="rod" / "nimrod".exe) + copyFile(dest=dest, source=source) inclFilePermissions(dest, {fpUserExec}) - safeRemove(nimrod1) - if result: echo "executables are equal: SUCCESS!" - + proc boot(args: string) = - echo "iteration: 1" - exec findStartNimrod() & " cc $# $# rod" / "nimrod.nim" % [bootOptions, args] - echo "iteration: 2" - if not bootIteration(args): - echo "executables are not equal: compile once again..." - if not bootIteration(args): - echo "[Warning] executables are still not equal" + var output = "rod" / "nimrod".exe + var finalDest = "bin" / "nimrod".exe + + copyExe(findStartNimrod(), 0.thVersion) + for i in 0..2: + echo "iteration: ", i+1 + exec i.thVersion & " cc $# $# rod" / "nimrod.nim" % [bootOptions, args] + if sameFileContent(output, i.thVersion): + copyExe(output, finalDest) + echo "executables are equal: SUCCESS!" + return + copyExe(output, (i+1).thVersion) + copyExe(output, finalDest) + echo "[Warning] executables are still not equal" # -------------- clean -------------------------------------------------------- diff --git a/lib/impure/rdstdin.nim b/lib/impure/rdstdin.nim index 003cfa3d1..003cfa3d1 100644..100755 --- a/lib/impure/rdstdin.nim +++ b/lib/impure/rdstdin.nim diff --git a/lib/pure/algorithm.nim b/lib/pure/algorithm.nim index 517819e1c..517819e1c 100644..100755 --- a/lib/pure/algorithm.nim +++ b/lib/pure/algorithm.nim diff --git a/lib/pure/gentabs.nim b/lib/pure/gentabs.nim new file mode 100755 index 000000000..c57a77aed --- /dev/null +++ b/lib/pure/gentabs.nim @@ -0,0 +1,192 @@ +# +# +# Nimrod's Runtime Library +# (c) Copyright 2010 Andreas Rumpf +# +# See the file "copying.txt", included in this +# distribution, for details about the copyright. +# + +## The ``gentabs`` module implements an efficient hash table that is a +## key-value mapping. The keys are required to be strings, but the values +## may be any Nimrod or user defined type. This module supports matching +## of keys in case-sensitive, case-insensitive and style-insensitive modes. + +import + os, hashes, strutils + +type + TGenTableMode* = enum ## describes the table's key matching mode + modeCaseSensitive, ## case sensitive matching of keys + modeCaseInsensitive, ## case insensitive matching of keys + modeStyleInsensitive ## style sensitive matching of keys + + TGenKeyValuePair[T] = tuple[key: string, val: T] + TGenKeyValuePairSeq[T] = seq[TGenKeyValuePair[T]] + TGenTable*[T] = object of TObject + counter: int + data: TGenKeyValuePairSeq[T] + mode: TGenTableMode + + PGenTable*[T] = ref TGenTable[T] ## use this type to declare hash tables + + +const + growthFactor = 2 + startSize = 64 + + +proc len*[T](tbl: PGenTable[T]): int {.inline.} = + ## returns the number of keys in `tbl`. + result = tbl.counter + +iterator pairs*[T](tbl: PGenTable[T]): tuple[key: string, value: T] = + ## iterates over any (key, value) pair in the table `tbl`. + for h in 0..high(tbl.data): + if not isNil(tbl.data[h].key): + yield (tbl.data[h].key, tbl.data[h].val) + +proc myhash[T](tbl: PGenTable[T], key: string): THash = + case tbl.mode + of modeCaseSensitive: result = hashes.hash(key) + of modeCaseInsensitive: result = hashes.hashIgnoreCase(key) + of modeStyleInsensitive: result = hashes.hashIgnoreStyle(key) + +proc myCmp[T](tbl: PGenTable[T], a, b: string): bool = + case tbl.mode + of modeCaseSensitive: result = cmp(a, b) == 0 + of modeCaseInsensitive: result = cmpIgnoreCase(a, b) == 0 + of modeStyleInsensitive: result = cmpIgnoreStyle(a, b) == 0 + +proc mustRehash(length, counter: int): bool = + assert(length > counter) + result = (length * 2 < counter * 3) or (length - counter < 4) + +proc newGenTable*[T](mode: TGenTableMode): PGenTable[T] = + ## creates a new generic hash table that is empty. + new(result) + result.mode = mode + result.counter = 0 + newSeq(result.data, startSize) + +proc nextTry(h, maxHash: THash): THash {.inline.} = + result = ((5 * h) + 1) and maxHash + +proc RawGet[T](tbl: PGenTable[T], key: string): int = + var h: THash + h = myhash(tbl, key) and high(tbl.data) # start with real hash value + while not isNil(tbl.data[h].key): + if mycmp(tbl, tbl.data[h].key, key): + return h + h = nextTry(h, high(tbl.data)) + result = - 1 + +proc RawInsert[T](tbl: PGenTable[T], data: var TGenKeyValuePairSeq[T], + key: string, val: T) = + var h: THash + h = myhash(tbl, key) and high(data) + while not isNil(data[h].key): + h = nextTry(h, high(data)) + data[h].key = key + data[h].val = val + +proc Enlarge[T](tbl: PGenTable[T]) = + var n: TGenKeyValuePairSeq[T] + newSeq(n, len(tbl.data) * growthFactor) + for i in countup(0, high(tbl.data)): + if not isNil(tbl.data[i].key): + RawInsert[T](tbl, n, tbl.data[i].key, tbl.data[i].val) + swap(tbl.data, n) + +proc hasKey*[T](tbl: PGenTable[T], key: string): bool = + ## returns true iff `key` is in the table `tbl`. + result = rawGet(tbl, key) >= 0 + +proc `[]`*[T](tbl: PGenTable[T], key: string): T = + ## retrieves the value at ``tbl[key]``. If `key` is not in `tbl`, + ## "" is returned and no exception is raised. One can check + ## with ``hasKey`` whether the key exists. + var index: int + index = RawGet(tbl, key) + if index >= 0: result = tbl.data[index].val + #else: result = "" ### Not sure what to do here + +proc `[]=`*[T](tbl: PGenTable[T], key: string, val: T) = + ## puts a (key, value)-pair into `tbl`. + var index = RawGet(tbl, key) + if index >= 0: + tbl.data[index].val = val + else: + if mustRehash(len(tbl.data), tbl.counter): Enlarge(tbl) + RawInsert(tbl, tbl.data, key, val) + inc(tbl.counter) + + +when isMainModule: + # + # Verify tables of integer values (string keys) + # + var x = newGenTable[int](modeCaseInsensitive) + x["one"] = 1 + x["two"] = 2 + x["three"] = 3 + x["four"] = 4 + x["five"] = 5 + assert(len(x) == 5) # length procedure works + assert(x["one"] == 1) # case-sensitive lookup works + assert(x["ONE"] == 1) # case-insensitive should work for this table + assert(x["one"]+x["two"] == 3) # make sure we're getting back ints + assert(x.hasKey("one")) # hasKey should return 'true' for a key + # of "one"... + assert(not x.hasKey("NOPE")) # ...but key "NOPE" is not in the table. + for k,v in pairs(x): # make sure the 'pairs' iterator works + assert(x[k]==v) + echo() + + + # + # Verify a table of user-defined types + # + type + TMyType = tuple[first, second: string] # a pair of strings + + var y = newGenTable[TMyType](modeCaseInsensitive) # hash table where each + # value is TMyType tuple + + #var junk: TMyType = ("OK", "Here") + y["Hello"] = ("Hello", "World") + y["Goodbye"] = ("Goodbye", "Everyone") + #y["Hello"] = TMyType( ("Hello", "World") ) + #y["Goodbye"] = TMyType( ("Goodbye", "Everyone") ) + + assert( y["Hello"].first == "Hello" ) + assert( y["Hello"].second == "World" ) + + + # + # Verify table of tables + # + var z: PGenTable[ PGenTable[int] ] # hash table where each value is + # a hash table of ints + + z = newGenTable[PGenTable[int]](modeCaseInsensitive) + z["first"] = newGenTable[int](modeCaseInsensitive) + z["first"]["one"] = 1 + z["first"]["two"] = 2 + z["first"]["three"] = 3 + + z["second"] = newGenTable[int](modeCaseInsensitive) + z["second"]["red"] = 10 + z["second"]["blue"] = 20 + + assert(len(z) == 2) # length of outer table + assert(len(z["first"]) == 3) # length of "first" table + assert(len(z["second"]) == 2) # length of "second" table + assert( z["first"]["one"] == 1) # retrieve from first inner table + assert( z["second"]["red"] == 10) # retrieve from second inner table + + for k,v in pairs(z): + echo( "$# ($#) ->" % [k,$len(v)] ) + #for k2,v2 in pairs(v): + # echo( " $# <-> $#" % [k2,$v2] ) + echo() diff --git a/lib/pure/os.nim b/lib/pure/os.nim index 2d234473b..992b34c49 100755 --- a/lib/pure/os.nim +++ b/lib/pure/os.nim @@ -708,7 +708,7 @@ else: # environ is needed, the _NSGetEnviron() routine, defined in # <crt_externs.h>, can be used to retrieve the address of environ # at runtime. - proc NSGetEnviron(): cstringArray {. + proc NSGetEnviron(): ptr cstringArray {. importc: "_NSGetEnviron", header: "<crt_externs.h>".} else: var gEnv {.importc: "environ".}: cstringArray @@ -717,7 +717,7 @@ else: # retrieves the variables of char** env of C's main proc if not envComputed: when defined(macosx): - var gEnv = NSGetEnviron() + var gEnv = NSGetEnviron()^ var i = 0 while True: if gEnv[i] == nil: break diff --git a/lib/pure/pegs.nim b/lib/pure/pegs.nim index 334f5dcd3..6ad52c9a1 100755 --- a/lib/pure/pegs.nim +++ b/lib/pure/pegs.nim @@ -75,7 +75,7 @@ type col: int ## column the symbol has been declared/used in flags: set[TNonTerminalFlag] ## the nonterminal's flags rule: TNode ## the rule that the symbol refers to - TNode {.final.} = object + TNode {.final, shallow.} = object case kind: TPegKind of pkEmpty..pkWhitespace: nil of pkTerminal, pkTerminalIgnoreCase, pkTerminalIgnoreStyle: term: string @@ -128,10 +128,12 @@ proc add(d: var TPeg, s: TPeg) {.inline.} = add(d.sons, s) proc addChoice(dest: var TPeg, elem: TPeg) = var L = dest.len-1 if L >= 0 and dest.sons[L].kind == pkCharChoice: + # caution! Do not introduce false aliasing here! case elem.kind of pkCharChoice: - dest.sons[L].charChoice^ = dest.sons[L].charChoice^ + elem.charChoice^ - of pkChar: incl(dest.sons[L].charChoice^, elem.ch) + dest.sons[L] = charSet(dest.sons[L].charChoice^ + elem.charChoice^) + of pkChar: + dest.sons[L] = charSet(dest.sons[L].charChoice^ + {elem.ch}) else: add(dest, elem) else: add(dest, elem) @@ -155,9 +157,12 @@ proc `/`*(a: openArray[TPeg]): TPeg {. proc addSequence(dest: var TPeg, elem: TPeg) = var L = dest.len-1 if L >= 0 and dest.sons[L].kind == pkTerminal: + # caution! Do not introduce false aliasing here! case elem.kind - of pkTerminal: add(dest.sons[L].term, elem.term) - of pkChar: add(dest.sons[L].term, elem.ch) + of pkTerminal: + dest.sons[L] = term(dest.sons[L].term & elem.term) + of pkChar: + dest.sons[L] = term(dest.sons[L].term & elem.ch) else: add(dest, elem) else: add(dest, elem) diff --git a/lib/system/assign.nim b/lib/system/assign.nim index 24d688ca9..9ac00434e 100755 --- a/lib/system/assign.nim +++ b/lib/system/assign.nim @@ -1,14 +1,14 @@ # # # Nimrod's Runtime Library -# (c) Copyright 2009 Andreas Rumpf +# (c) Copyright 2011 Andreas Rumpf # # See the file "copying.txt", included in this # distribution, for details about the copyright. # -proc genericAssignAux(dest, src: Pointer, mt: PNimType) -proc genericAssignAux(dest, src: Pointer, n: ptr TNimNode) = +proc genericAssignAux(dest, src: Pointer, mt: PNimType, shallow: bool) +proc genericAssignAux(dest, src: Pointer, n: ptr TNimNode, shallow: bool) = var d = cast[TAddress](dest) s = cast[TAddress](src) @@ -16,66 +16,69 @@ proc genericAssignAux(dest, src: Pointer, n: ptr TNimNode) = of nkNone: assert(false) of nkSlot: genericAssignAux(cast[pointer](d +% n.offset), cast[pointer](s +% n.offset), - n.typ) + n.typ, shallow) of nkList: for i in 0..n.len-1: - genericAssignAux(dest, src, n.sons[i]) + genericAssignAux(dest, src, n.sons[i], shallow) of nkCase: copyMem(cast[pointer](d +% n.offset), cast[pointer](s +% n.offset), n.typ.size) var m = selectBranch(src, n) - if m != nil: genericAssignAux(dest, src, m) + if m != nil: genericAssignAux(dest, src, m, shallow) -proc genericAssignAux(dest, src: Pointer, mt: PNimType) = +proc genericAssignAux(dest, src: Pointer, mt: PNimType, shallow: bool) = var d = cast[TAddress](dest) s = cast[TAddress](src) - assert(mt != nil) case mt.Kind + of tyString: + var x = cast[ppointer](dest) + var s2 = cast[ppointer](s)^ + if s2 == nil or shallow: + unsureAsgnRef(x, s2) + else: + unsureAsgnRef(x, copyString(cast[NimString](s2))) of tySequence: var s2 = cast[ppointer](src)^ - var seq = cast[PGenericSeq](s2) - if s2 == nil: # this can happen! nil sequences are allowed - var x = cast[ppointer](dest) - x^ = nil + var seq = cast[PGenericSeq](s2) + var x = cast[ppointer](dest) + if s2 == nil or shallow: + # this can happen! nil sequences are allowed + unsureAsgnRef(x, s2) return assert(dest != nil) - unsureAsgnRef(cast[ppointer](dest), - newObj(mt, seq.len * mt.base.size + GenericSeqSize)) + unsureAsgnRef(x, newObj(mt, seq.len * mt.base.size + GenericSeqSize)) var dst = cast[taddress](cast[ppointer](dest)^) for i in 0..seq.len-1: genericAssignAux( cast[pointer](dst +% i*% mt.base.size +% GenericSeqSize), cast[pointer](cast[taddress](s2) +% i *% mt.base.size +% GenericSeqSize), - mt.Base) + mt.Base, shallow) var dstseq = cast[PGenericSeq](dst) dstseq.len = seq.len dstseq.space = seq.len of tyObject, tyTuple, tyPureObject: # we don't need to copy m_type field for tyObject, as they are equal anyway - genericAssignAux(dest, src, mt.node) + genericAssignAux(dest, src, mt.node, shallow) of tyArray, tyArrayConstr: for i in 0..(mt.size div mt.base.size)-1: genericAssignAux(cast[pointer](d +% i*% mt.base.size), - cast[pointer](s +% i*% mt.base.size), mt.base) - of tyString: # a leaf - var s2 = cast[ppointer](s)^ - if s2 != nil: # nil strings are possible! - unsureAsgnRef(cast[ppointer](dest), copyString(cast[NimString](s2))) - else: - var x = cast[ppointer](dest) - x^ = nil - return - of tyRef: # BUGFIX: a long time this has been forgotten! + cast[pointer](s +% i*% mt.base.size), mt.base, shallow) + of tyRef: unsureAsgnRef(cast[ppointer](dest), cast[ppointer](s)^) else: copyMem(dest, src, mt.size) # copy raw bits proc genericAssign(dest, src: Pointer, mt: PNimType) {.compilerProc.} = GC_disable() - genericAssignAux(dest, src, mt) + genericAssignAux(dest, src, mt, false) + GC_enable() + +proc genericShallowAssign(dest, src: Pointer, mt: PNimType) {.compilerProc.} = + GC_disable() + genericAssignAux(dest, src, mt, true) GC_enable() proc genericSeqAssign(dest, src: Pointer, mt: PNimType) {.compilerProc.} = @@ -152,3 +155,19 @@ proc genericReset(dest: Pointer, mt: PNimType) = else: zeroMem(dest, mt.size) # set raw bits to zero +proc selectBranch(discVal, L: int, + a: ptr array [0..0x7fff, ptr TNimNode]): ptr TNimNode = + result = a[L] # a[L] contains the ``else`` part (but may be nil) + if discVal <% L: + var x = a[discVal] + if x != nil: result = x + +proc FieldDiscriminantCheck(oldDiscVal, newDiscVal: int, + a: ptr array [0..0x7fff, ptr TNimNode], + L: int) {.compilerProc.} = + var oldBranch = selectBranch(oldDiscVal, L, a) + var newBranch = selectBranch(newDiscVal, L, a) + if newBranch != oldBranch and oldDiscVal != 0: + raise newException(EInvalidField, + "assignment to discriminant changes object branch") + diff --git a/lib/wrappers/readline/history.nim b/lib/wrappers/readline/history.nim index 12dfa2707..12dfa2707 100644..100755 --- a/lib/wrappers/readline/history.nim +++ b/lib/wrappers/readline/history.nim diff --git a/lib/wrappers/readline/readline.nim b/lib/wrappers/readline/readline.nim index d14171c46..d14171c46 100644..100755 --- a/lib/wrappers/readline/readline.nim +++ b/lib/wrappers/readline/readline.nim diff --git a/lib/wrappers/readline/rltypedefs.nim b/lib/wrappers/readline/rltypedefs.nim index 202cf925d..202cf925d 100644..100755 --- a/lib/wrappers/readline/rltypedefs.nim +++ b/lib/wrappers/readline/rltypedefs.nim diff --git a/lib/wrappers/readline/tweaked/history.h b/lib/wrappers/readline/tweaked/history.h index 53bd642b1..53bd642b1 100644..100755 --- a/lib/wrappers/readline/tweaked/history.h +++ b/lib/wrappers/readline/tweaked/history.h diff --git a/lib/wrappers/readline/tweaked/readline.h b/lib/wrappers/readline/tweaked/readline.h index b13fbdbbe..b13fbdbbe 100644..100755 --- a/lib/wrappers/readline/tweaked/readline.h +++ b/lib/wrappers/readline/tweaked/readline.h diff --git a/lib/wrappers/readline/tweaked/rltypedefs.h b/lib/wrappers/readline/tweaked/rltypedefs.h index 46bb42567..46bb42567 100644..100755 --- a/lib/wrappers/readline/tweaked/rltypedefs.h +++ b/lib/wrappers/readline/tweaked/rltypedefs.h diff --git a/lib/wrappers/readline/tweaked/tilde.h b/lib/wrappers/readline/tweaked/tilde.h index d91d0418d..d91d0418d 100644..100755 --- a/lib/wrappers/readline/tweaked/tilde.h +++ b/lib/wrappers/readline/tweaked/tilde.h diff --git a/rod/ccgexprs.nim b/rod/ccgexprs.nim index 65cc33fd4..8da08347c 100755 --- a/rod/ccgexprs.nim +++ b/rod/ccgexprs.nim @@ -184,6 +184,26 @@ proc genRefAssign(p: BProc, dest, src: TLoc, flags: TAssignmentFlags) = appcg(p.module, p.s[cpsStmts], "#unsureAsgnRef((void**) $1, $2);$n", [addrLoc(dest), rdLoc(src)]) +proc genGenericAsgn(p: BProc, dest, src: TLoc, flags: TAssignmentFlags) = + # Consider: + # type TMyFastString {.shallow.} = string + # Due to the implementation of pragmas this would end up to set the + # tfShallow flag for the built-in string type too! So we check only + # here for this flag, where it is reasonably safe to do so + # (for objects, etc.): + if needToCopy notin flags or + tfShallow in skipTypes(dest.t, abstractVarRange).flags: + if (dest.s == OnStack) or not (optRefcGC in gGlobalOptions): + appcg(p, cpsStmts, + "memcpy((void*)$1, (NIM_CONST void*)$2, sizeof($3));$n", + [addrLoc(dest), addrLoc(src), rdLoc(dest)]) + else: + appcg(p, cpsStmts, "#genericShallowAssign((void*)$1, (void*)$2, $3);$n", + [addrLoc(dest), addrLoc(src), genTypeInfo(p.module, dest.t)]) + else: + appcg(p, cpsStmts, "#genericAssign((void*)$1, (void*)$2, $3);$n", + [addrLoc(dest), addrLoc(src), genTypeInfo(p.module, dest.t)]) + proc genAssignment(p: BProc, dest, src: TLoc, flags: TAssignmentFlags) = # This function replaces all other methods for generating # the assignment operation in C. @@ -192,13 +212,13 @@ proc genAssignment(p: BProc, dest, src: TLoc, flags: TAssignmentFlags) = of tyRef: genRefAssign(p, dest, src, flags) of tySequence: - if not (needToCopy in flags): + if needToCopy notin flags: genRefAssign(p, dest, src, flags) else: appcg(p, cpsStmts, "#genericSeqAssign($1, $2, $3);$n", [addrLoc(dest), rdLoc(src), genTypeInfo(p.module, dest.t)]) of tyString: - if not (needToCopy in flags): + if needToCopy notin flags: genRefAssign(p, dest, src, flags) else: if (dest.s == OnStack) or not (optRefcGC in gGlobalOptions): @@ -209,23 +229,15 @@ proc genAssignment(p: BProc, dest, src: TLoc, flags: TAssignmentFlags) = else: appcg(p, cpsStmts, "#unsureAsgnRef((void**) $1, #copyString($2));$n", [addrLoc(dest), rdLoc(src)]) - of tyTuple: - if needsComplexAssignment(dest.t): - appcg(p, cpsStmts, "#genericAssign((void*)$1, (void*)$2, $3);$n", - [addrLoc(dest), addrLoc(src), genTypeInfo(p.module, dest.t)]) - else: - appcg(p, cpsStmts, "$1 = $2;$n", [rdLoc(dest), rdLoc(src)]) - of tyObject: + of tyTuple, tyObject: # XXX: check for subtyping? if needsComplexAssignment(dest.t): - appcg(p, cpsStmts, "#genericAssign((void*)$1, (void*)$2, $3);$n", - [addrLoc(dest), addrLoc(src), genTypeInfo(p.module, dest.t)]) + genGenericAsgn(p, dest, src, flags) else: appcg(p, cpsStmts, "$1 = $2;$n", [rdLoc(dest), rdLoc(src)]) of tyArray, tyArrayConstr: if needsComplexAssignment(dest.t): - appcg(p, cpsStmts, "#genericAssign((void*)$1, (void*)$2, $3);$n", - [addrLoc(dest), addrLoc(src), genTypeInfo(p.module, dest.t)]) + genGenericAsgn(p, dest, src, flags) else: appcg(p, cpsStmts, "memcpy((void*)$1, (NIM_CONST void*)$2, sizeof($1));$n", @@ -363,10 +375,11 @@ proc binaryArithOverflow(p: BProc, e: PNode, d: var TLoc, m: TMagic) = proc unaryArithOverflow(p: BProc, e: PNode, d: var TLoc, m: TMagic) = const - opr: array[mUnaryMinusI..mAbsI64, string] = ["((NI$2)-($1))", # UnaryMinusI - "-($1)", # UnaryMinusI64 - "(NI$2)abs($1)", # AbsI - "($1 > 0? ($1) : -($1))"] # AbsI64 + opr: array[mUnaryMinusI..mAbsI64, string] = [ + mUnaryMinusI: "((NI$2)-($1))", + mUnaryMinusI64: "-($1)", + mAbsI: "(NI$2)abs($1)", + mAbsI64: "($1 > 0? ($1) : -($1))"] var a: TLoc t: PType @@ -760,47 +773,68 @@ proc genEcho(p: BProc, n: PNode) = appcg(p, cpsStmts, "#rawEchoNL();$n") proc genCall(p: BProc, t: PNode, d: var TLoc) = - var - param: PSym - invalidRetType: bool - typ: PType - pl: PRope # parameter list - op, list, a: TLoc - length: int + var op, a: TLoc # this is a hotspot in the compiler initLocExpr(p, t.sons[0], op) - pl = con(op.r, "(") #typ := getUniqueType(t.sons[0].typ); - typ = t.sons[0].typ # getUniqueType() is too expensive here! + var pl = con(op.r, "(") + var typ = t.sons[0].typ # getUniqueType() is too expensive here! assert(typ.kind == tyProc) - invalidRetType = isInvalidReturnType(typ.sons[0]) - length = sonsLen(t) + var invalidRetType = isInvalidReturnType(typ.sons[0]) + var length = sonsLen(t) for i in countup(1, length - 1): initLocExpr(p, t.sons[i], a) # generate expression for param assert(sonsLen(typ) == sonsLen(typ.n)) if (i < sonsLen(typ)): assert(typ.n.sons[i].kind == nkSym) - param = typ.n.sons[i].sym + var param = typ.n.sons[i].sym if ccgIntroducedPtr(param): app(pl, addrLoc(a)) else: app(pl, rdLoc(a)) else: app(pl, rdLoc(a)) - if (i < length - 1) or (invalidRetType and (typ.sons[0] != nil)): - app(pl, ", ") - if (typ.sons[0] != nil) and invalidRetType: - # XXX (detected by pegs module 64bit): p(result, result) is not - # correct here. Thus we always allocate a temporary: - if d.k == locNone: getTemp(p, typ.sons[0], d) - app(pl, addrLoc(d)) - app(pl, ")") - if (typ.sons[0] != nil) and not invalidRetType: - if d.k == locNone: getTemp(p, typ.sons[0], d) - assert(d.t != nil) # generate an assignment to d: - initLoc(list, locCall, nil, OnUnknown) - list.r = pl - genAssignment(p, d, list, {}) # no need for deep copying + if i < length - 1: app(pl, ", ") + if typ.sons[0] != nil: + if invalidRetType: + if length > 1: app(pl, ", ") + # beware of 'result = p(result)'. We always allocate a temporary: + if d.k in {locTemp, locNone}: + # We already got a temp. Great, special case it: + if d.k == locNone: getTemp(p, typ.sons[0], d) + app(pl, addrLoc(d)) + app(pl, ")") + app(p.s[cpsStmts], pl) + app(p.s[cpsStmts], ';' & tnl) + else: + var tmp: TLoc + getTemp(p, typ.sons[0], tmp) + app(pl, addrLoc(tmp)) + app(pl, ")") + app(p.s[cpsStmts], pl) + app(p.s[cpsStmts], ';' & tnl) + genAssignment(p, d, tmp, {}) # no need for deep copying + else: + app(pl, ")") + if d.k == locNone: getTemp(p, typ.sons[0], d) + assert(d.t != nil) # generate an assignment to d: + var list: TLoc + initLoc(list, locCall, nil, OnUnknown) + list.r = pl + genAssignment(p, d, list, {}) # no need for deep copying else: + app(pl, ")") app(p.s[cpsStmts], pl) app(p.s[cpsStmts], ';' & tnl) + + when false: + app(pl, ")") + if (typ.sons[0] != nil) and not invalidRetType: + if d.k == locNone: getTemp(p, typ.sons[0], d) + assert(d.t != nil) # generate an assignment to d: + initLoc(list, locCall, nil, OnUnknown) + list.r = pl + genAssignment(p, d, list, {}) # no need for deep copying + else: + app(p.s[cpsStmts], pl) + app(p.s[cpsStmts], ';' & tnl) proc genStrConcat(p: BProc, e: PNode, d: var TLoc) = # <Nimrod code> diff --git a/rod/ccgstmts.nim b/rod/ccgstmts.nim index e87305065..da6ca377f 100755 --- a/rod/ccgstmts.nim +++ b/rod/ccgstmts.nim @@ -162,6 +162,7 @@ proc genWhileStmt(p: BProc, t: PNode) = a: TLoc Labl: TLabel length: int + inc(p.withinLoop) genLineDir(p, t) assert(sonsLen(t) == 2) inc(p.labels) @@ -179,6 +180,7 @@ proc genWhileStmt(p: BProc, t: PNode) = if p.blocks[length].id > 0: appf(p.s[cpsStmts], "} $1: ;$n", [Labl]) else: app(p.s[cpsStmts], '}' & tnl) setlen(p.blocks, len(p.blocks) - 1) + dec(p.withinLoop) proc genBlock(p: BProc, t: PNode, d: var TLoc) = inc(p.labels) @@ -619,21 +621,52 @@ proc genPragma(p: BProc, n: PNode) = if (sfDeadCodeElim in p.module.module.flags): addPendingModule(p.module) else: nil - -proc genAsgn(p: BProc, e: PNode) = - var a: TLoc - genLineDir(p, e) # BUGFIX - InitLocExpr(p, e.sons[0], a) - assert(a.t != nil) - expr(p, e.sons[1], a) -proc genFastAsgn(p: BProc, e: PNode) = - var a: TLoc - genLineDir(p, e) # BUGFIX +proc FieldDiscriminantCheckNeeded(p: BProc, asgn: PNode): bool = + if optFieldCheck in p.options: + var le = asgn.sons[0] + if le.kind == nkCheckedFieldExpr: + var field = le.sons[0].sons[1].sym + result = sfDiscriminant in field.flags + elif le.kind == nkDotExpr: + var field = le.sons[1].sym + result = sfDiscriminant in field.flags + +proc genDiscriminantCheck(p: BProc, a, tmp: TLoc, objtype: PType, + field: PSym) = + var t = skipTypes(objtype, abstractVar) + assert t.kind == tyObject + discard genTypeInfo(p.module, t) + var L = lengthOrd(field.typ) + if not IntSetContainsOrIncl(p.module.declaredThings, field.id): + appcg(p.module, cfsVars, "extern $1", + discriminatorTableDecl(p.module, t, field)) + appcg(p, cpsStmts, + "#FieldDiscriminantCheck((NI)(NU)($1), (NI)(NU)($2), $3, $4);$n", + [rdLoc(a), rdLoc(tmp), discriminatorTableName(p.module, t, field), + intLiteral(L+1)]) + +proc asgnFieldDiscriminant(p: BProc, e: PNode) = + var a, tmp: TLoc + var dotExpr = e.sons[0] + var d: PSym + if dotExpr.kind == nkCheckedFieldExpr: dotExpr = dotExpr.sons[0] InitLocExpr(p, e.sons[0], a) - incl(a.flags, lfNoDeepCopy) - assert(a.t != nil) - expr(p, e.sons[1], a) + getTemp(p, a.t, tmp) + expr(p, e.sons[1], tmp) + genDiscriminantCheck(p, a, tmp, dotExpr.sons[0].typ, dotExpr.sons[1].sym) + genAssignment(p, a, tmp, {}) + +proc genAsgn(p: BProc, e: PNode, fastAsgn: bool) = + genLineDir(p, e) + if not FieldDiscriminantCheckNeeded(p, e): + var a: TLoc + InitLocExpr(p, e.sons[0], a) + if fastAsgn: incl(a.flags, lfNoDeepCopy) + assert(a.t != nil) + expr(p, e.sons[1], a) + else: + asgnFieldDiscriminant(p, e) proc genStmts(p: BProc, t: PNode) = var @@ -657,8 +690,8 @@ proc genStmts(p: BProc, t: PNode) = nkCallStrLit: genLineDir(p, t) initLocExpr(p, t, a) - of nkAsgn: genAsgn(p, t) - of nkFastAsgn: genFastAsgn(p, t) + of nkAsgn: genAsgn(p, t, fastAsgn=false) + of nkFastAsgn: genAsgn(p, t, fastAsgn=true) of nkDiscardStmt: genLineDir(p, t) initLocExpr(p, t.sons[0], a) @@ -684,7 +717,7 @@ proc genStmts(p: BProc, t: PNode) = (sfExportc in prc.flags and lfExportLib in prc.loc.flags) or (prc.kind == skMethod): # we have not only the header: - if (t.sons[codePos].kind != nkEmpty) or (lfDynamicLib in prc.loc.flags): + if t.sons[codePos].kind != nkEmpty or lfDynamicLib in prc.loc.flags: genProc(p.module, prc) else: internalError(t.info, "genStmts(" & $t.kind & ')') diff --git a/rod/ccgtypes.nim b/rod/ccgtypes.nim index 43b4f173d..1920da599 100755 --- a/rod/ccgtypes.nim +++ b/rod/ccgtypes.nim @@ -7,9 +7,6 @@ # distribution, for details about the copyright. # -#var -# newDummyVar: int # just to check the symbol file mechanism - # ------------------------- Name Mangling -------------------------------- proc mangle(name: string): string = @@ -566,46 +563,51 @@ proc genTypeInfoAux(m: BModule, typ: PType, name: PRope) = base = toRope("0") genTypeInfoAuxBase(m, typ, name, base) +proc discriminatorTableName(m: BModule, objtype: PType, d: PSym): PRope = + if objType.sym == nil: + InternalError(d.info, "anonymous obj with discriminator") + result = ropef("NimDT_$1_$2", [ + toRope(objType.sym.name.s), toRope(d.name.s)]) + +proc discriminatorTableDecl(m: BModule, objtype: PType, d: PSym): PRope = + discard cgsym(m, "TNimNode") + var tmp = discriminatorTableName(m, objtype, d) + result = ropef("TNimNode* $1[$2];$n", [tmp, toRope(lengthOrd(d.typ)+1)]) + proc genObjectFields(m: BModule, typ: PType, n: PNode, expr: PRope) = - var - tmp, tmp2: PRope - length, x, y: int - field: PSym - b: PNode case n.kind of nkRecList: - length = sonsLen(n) - if length == 1: + var L = sonsLen(n) + if L == 1: genObjectFields(m, typ, n.sons[0], expr) - elif length > 0: - tmp = getTempName() - appf(m.s[cfsTypeInit1], "static TNimNode* $1[$2];$n", - [tmp, toRope(length)]) - for i in countup(0, length - 1): - tmp2 = getNimNode(m) + elif L > 0: + var tmp = getTempName() + appf(m.s[cfsTypeInit1], "static TNimNode* $1[$2];$n", [tmp, toRope(L)]) + for i in countup(0, L-1): + var tmp2 = getNimNode(m) appf(m.s[cfsTypeInit3], "$1[$2] = &$3;$n", [tmp, toRope(i), tmp2]) genObjectFields(m, typ, n.sons[i], tmp2) appf(m.s[cfsTypeInit3], "$1.len = $2; $1.kind = 2; $1.sons = &$3[0];$n", - [expr, toRope(length), tmp]) - else: - appf(m.s[cfsTypeInit3], "$1.len = $2; $1.kind = 2;$n", - [expr, toRope(length)]) + [expr, toRope(L), tmp]) + else: + appf(m.s[cfsTypeInit3], "$1.len = $2; $1.kind = 2;$n", [expr, toRope(L)]) of nkRecCase: - length = sonsLen(n) assert(n.sons[0].kind == nkSym) - field = n.sons[0].sym - tmp = getTempName() + var field = n.sons[0].sym + var tmp = discriminatorTableName(m, typ, field) + var L = lengthOrd(field.typ) + assert L > 0 appf(m.s[cfsTypeInit3], "$1.kind = 3;$n" & "$1.offset = offsetof($2, $3);$n" & "$1.typ = $4;$n" & "$1.name = $5;$n" & "$1.sons = &$6[0];$n" & "$1.len = $7;$n", [expr, getTypeDesc(m, typ), field.loc.r, - genTypeInfo(m, field.typ), makeCString(field.name.s), - tmp, toRope(lengthOrd(field.typ))]) - appf(m.s[cfsTypeInit1], "static TNimNode* $1[$2];$n", - [tmp, toRope(lengthOrd(field.typ) + 1)]) - for i in countup(1, length - 1): - b = n.sons[i] # branch - tmp2 = getNimNode(m) + genTypeInfo(m, field.typ), + makeCString(field.name.s), + tmp, toRope(L)]) + appf(m.s[cfsData], "TNimNode* $1[$2];$n", [tmp, toRope(L+1)]) + for i in countup(1, sonsLen(n)-1): + var b = n.sons[i] # branch + var tmp2 = getNimNode(m) genObjectFields(m, typ, lastSon(b), tmp2) case b.kind of nkOfBranch: @@ -613,8 +615,8 @@ proc genObjectFields(m: BModule, typ: PType, n: PNode, expr: PRope) = internalError(b.info, "genObjectFields; nkOfBranch broken") for j in countup(0, sonsLen(b) - 2): if b.sons[j].kind == nkRange: - x = int(getOrdValue(b.sons[j].sons[0])) - y = int(getOrdValue(b.sons[j].sons[1])) + var x = int(getOrdValue(b.sons[j].sons[0])) + var y = int(getOrdValue(b.sons[j].sons[1])) while x <= y: appf(m.s[cfsTypeInit3], "$1[$2] = &$3;$n", [tmp, toRope(x), tmp2]) inc(x) @@ -623,10 +625,10 @@ proc genObjectFields(m: BModule, typ: PType, n: PNode, expr: PRope) = [tmp, toRope(getOrdValue(b.sons[j])), tmp2]) of nkElse: appf(m.s[cfsTypeInit3], "$1[$2] = &$3;$n", - [tmp, toRope(lengthOrd(field.typ)), tmp2]) + [tmp, toRope(L), tmp2]) else: internalError(n.info, "genObjectFields(nkRecCase)") of nkSym: - field = n.sym + var field = n.sym appf(m.s[cfsTypeInit3], "$1.kind = 1;$n" & "$1.offset = offsetof($2, $3);$n" & "$1.typ = $4;$n" & "$1.name = $5;$n", [expr, getTypeDesc(m, typ), @@ -634,10 +636,9 @@ proc genObjectFields(m: BModule, typ: PType, n: PNode, expr: PRope) = else: internalError(n.info, "genObjectFields") proc genObjectInfo(m: BModule, typ: PType, name: PRope) = - var tmp: PRope if typ.kind == tyObject: genTypeInfoAux(m, typ, name) else: genTypeInfoAuxBase(m, typ, name, toRope("0")) - tmp = getNimNode(m) + var tmp = getNimNode(m) genObjectFields(m, typ, typ.n, tmp) appf(m.s[cfsTypeInit3], "$1->node = &$2;$n", [name, tmp]) @@ -742,21 +743,15 @@ proc genTypeInfo(m: BModule, typ: PType): PRope = [result, toRope(typeToString(t))]) if dataGenerated: return case t.kind - of tyEmpty: - result = toRope("0") + of tyEmpty: result = toRope("0") of tyPointer, tyProc, tyBool, tyChar, tyCString, tyString, tyInt..tyFloat128, tyVar: genTypeInfoAuxBase(gNimDat, t, result, toRope("0")) - of tyRef, tyPtr, tySequence, tyRange: - genTypeInfoAux(gNimDat, t, result) - of tyArrayConstr, tyArray: - genArrayInfo(gNimDat, t, result) - of tySet: - genSetInfo(gNimDat, t, result) - of tyEnum: - genEnumInfo(gNimDat, t, result) - of tyObject: - genObjectInfo(gNimDat, t, result) + of tyRef, tyPtr, tySequence, tyRange: genTypeInfoAux(gNimDat, t, result) + of tyArrayConstr, tyArray: genArrayInfo(gNimDat, t, result) + of tySet: genSetInfo(gNimDat, t, result) + of tyEnum: genEnumInfo(gNimDat, t, result) + of tyObject: genObjectInfo(gNimDat, t, result) of tyTuple: if t.n != nil: genObjectInfo(gNimDat, t, result) else: genTupleInfo(gNimDat, t, result) diff --git a/rod/cgen.nim b/rod/cgen.nim index 7df9f3d11..51aee87ab 100755 --- a/rod/cgen.nim +++ b/rod/cgen.nim @@ -77,6 +77,7 @@ type sendClosure: PType # closure record type that we pass receiveClosure: PType # closure record type that we get module: BModule # used to prevent excessive parameter passing + withinLoop: int # > 0 if we are within a loop TTypeSeq = seq[PType] TCGen = object of TPassContext # represents a C source file @@ -131,8 +132,7 @@ proc addPendingModule(m: BModule) = proc findPendingModule(m: BModule, s: PSym): BModule = var ms = getModule(s) - if ms.id == m.module.id: - return m + if ms.id == m.module.id: return m for i in countup(0, high(gPendingModules)): result = gPendingModules[i] if result.module.id == ms.id: return @@ -231,6 +231,9 @@ proc appcg(m: BModule, c: var PRope, frmt: TFormatStr, args: openarray[PRope]) = app(c, ropecg(m, frmt, args)) +proc appcg(m: BModule, s: TCFileSection, frmt: TFormatStr, + args: openarray[PRope]) = + app(m.s[s], ropecg(m, frmt, args)) proc appcg(p: BProc, s: TCProcSection, frmt: TFormatStr, args: openarray[PRope]) = @@ -248,7 +251,7 @@ proc rdLoc(a: TLoc): PRope = proc addrLoc(a: TLoc): PRope = result = a.r - if not (lfIndirect in a.flags): result = con("&", result) + if lfIndirect notin a.flags: result = con("&", result) proc rdCharLoc(a: TLoc): PRope = # read a location that may need a char-cast: @@ -285,7 +288,7 @@ proc genRefAssign(p: BProc, dest, src: TLoc, flags: TAssignmentFlags) proc zeroVar(p: BProc, loc: TLoc, containsGCref: bool) = if skipTypes(loc.t, abstractVarRange).Kind notin {tyArray, tyArrayConstr, tySet, tyTuple, tyObject}: - if containsGcref: + if containsGcref and p.WithInLoop > 0: appf(p.s[cpsInit], "$1 = 0;$n", [rdLoc(loc)]) var nilLoc: TLoc initLoc(nilLoc, locTemp, loc.t, onStack) @@ -295,7 +298,7 @@ proc zeroVar(p: BProc, loc: TLoc, containsGCref: bool) = else: appf(p.s[cpsStmts], "$1 = 0;$n", [rdLoc(loc)]) else: - if containsGcref: + if containsGcref and p.WithInLoop > 0: appf(p.s[cpsInit], "memset((void*)$1, 0, sizeof($2));$n", [addrLoc(loc), rdLoc(loc)]) appcg(p, cpsStmts, "#genericReset((void*)$1, $2);$n", diff --git a/rod/commands.nim b/rod/commands.nim index 83c01f0ed..d3eaf94a9 100755 --- a/rod/commands.nim +++ b/rod/commands.nim @@ -200,12 +200,12 @@ proc ProcessSpecificNote(arg: string, state: TSpecialWord, pass: TCmdlinePass, var id = "" # arg = "X]:on|off" var i = 0 var n = hintMin - while (i < len(arg) + 0) and (arg[i] != ']'): + while i < len(arg) and (arg[i] != ']'): add(id, arg[i]) inc(i) - if (i < len(arg) + 0) and (arg[i] == ']'): inc(i) + if i < len(arg) and (arg[i] == ']'): inc(i) else: InvalidCmdLineOption(pass, arg, info) - if (i < len(arg) + 0) and (arg[i] in {':', '='}): inc(i) + if i < len(arg) and (arg[i] in {':', '='}): inc(i) else: InvalidCmdLineOption(pass, arg, info) if state == wHint: var x = findStr(msgs.HintsToStr, id) diff --git a/rod/rst.nim b/rod/rst.nim index dace43a44..85b0cf54e 100755 --- a/rod/rst.nim +++ b/rod/rst.nim @@ -432,7 +432,7 @@ proc matchesHyperlink(h: PRstNode, filename: string): bool = result = matchesHyperlink(h.sons[0], filename) elif h.kind == rnHyperlink: var s = addNodes(h.sons[1]) - if startsWith(s, filename) and (s[len(filename) + 0] == '#'): result = true + if startsWith(s, filename) and (s[len(filename)] == '#'): result = true else: result = false else: result = false diff --git a/rod/semcall.nim b/rod/semcall.nim index 294c0399b..294c0399b 100644..100755 --- a/rod/semcall.nim +++ b/rod/semcall.nim diff --git a/rod/semexprs.nim b/rod/semexprs.nim index 7a14b931a..712ce4e6e 100755 --- a/rod/semexprs.nim +++ b/rod/semexprs.nim @@ -412,10 +412,8 @@ proc semDirectCallAnalyseEffects(c: PContext, n: PNode, flags: TExprFlags): PNode = var symflags = {skProc, skMethod, skConverter} if efWantIterator in flags: - symflags = {skIterator} - elif efAllowType in flags: # for ``type countup(1,3)``, see ``tests/ttoseq``. - symflags.incl(skIterator) + symflags = {skIterator} result = semDirectCall(c, n, symflags) if result != nil: if result.sons[0].kind != nkSym: @@ -571,7 +569,6 @@ proc lookupInRecordAndBuildCheck(c: PContext, n, r: PNode, field: PIdent, check: var PNode): PSym = # transform in a node that contains the runtime check for the # field, if it is in a case-part... - var s, it, inExpr, notExpr: PNode result = nil case r.kind of nkRecList: @@ -583,9 +580,9 @@ proc lookupInRecordAndBuildCheck(c: PContext, n, r: PNode, field: PIdent, if (r.sons[0].kind != nkSym): IllFormedAst(r) result = lookupInRecordAndBuildCheck(c, n, r.sons[0], field, check) if result != nil: return - s = newNodeI(nkCurly, r.info) + var s = newNodeI(nkCurly, r.info) for i in countup(1, sonsLen(r) - 1): - it = r.sons[i] + var it = r.sons[i] case it.kind of nkOfBranch: result = lookupInRecordAndBuildCheck(c, n, lastSon(it), field, check) @@ -597,7 +594,7 @@ proc lookupInRecordAndBuildCheck(c: PContext, n, r: PNode, field: PIdent, addSon(check, ast.emptyNode) # make space for access node s = newNodeI(nkCurly, n.info) for j in countup(0, sonsLen(it) - 2): addSon(s, copyTree(it.sons[j])) - inExpr = newNodeI(nkCall, n.info) + var inExpr = newNodeI(nkCall, n.info) addSon(inExpr, newIdentNode(getIdent("in"), n.info)) addSon(inExpr, copyTree(r.sons[0])) addSon(inExpr, s) #writeln(output, renderTree(inExpr)); @@ -609,11 +606,11 @@ proc lookupInRecordAndBuildCheck(c: PContext, n, r: PNode, field: PIdent, if check == nil: check = newNodeI(nkCheckedFieldExpr, n.info) addSon(check, ast.emptyNode) # make space for access node - inExpr = newNodeI(nkCall, n.info) + var inExpr = newNodeI(nkCall, n.info) addSon(inExpr, newIdentNode(getIdent("in"), n.info)) addSon(inExpr, copyTree(r.sons[0])) addSon(inExpr, s) - notExpr = newNodeI(nkCall, n.info) + var notExpr = newNodeI(nkCall, n.info) addSon(notExpr, newIdentNode(getIdent("not"), n.info)) addSon(notExpr, inExpr) addSon(check, semExpr(c, notExpr)) diff --git a/rod/semtempl.nim b/rod/semtempl.nim index 1fbcbe227..7782c7b42 100755 --- a/rod/semtempl.nim +++ b/rod/semtempl.nim @@ -152,7 +152,7 @@ proc transformToExpr(n: PNode): PNode = if realStmt >= 0: result = transformToExpr(n.sons[realStmt]) else: n.kind = nkStmtListExpr of nkBlockStmt: - n.kind = nkBlockExpr + n.kind = nkBlockExpr #nkIfStmt: n.kind := nkIfExpr; // this is not correct! else: nil diff --git a/rod/semtypes.nim b/rod/semtypes.nim index 8dae5c27b..d2e6cd7f5 100755 --- a/rod/semtypes.nim +++ b/rod/semtypes.nim @@ -574,8 +574,10 @@ proc semTypeNode(c: PContext, n: PNode, prev: PType): PType = case n.kind of nkEmpty: nil of nkTypeOfExpr: + # for ``type countup(1,3)``, see ``tests/ttoseq``. + # XXX We should find a better solution. checkSonsLen(n, 1) - result = semExprWithType(c, n.sons[0], {efAllowType}).typ + result = semExprWithType(c, n.sons[0], {efWantIterator}).typ of nkPar: if sonsLen(n) == 1: result = semTypeNode(c, n.sons[0], prev) else: GlobalError(n.info, errTypeExpected) diff --git a/rod/semtypinst.nim b/rod/semtypinst.nim index 6427d7858..6427d7858 100644..100755 --- a/rod/semtypinst.nim +++ b/rod/semtypinst.nim diff --git a/rod/suggest.nim b/rod/suggest.nim index 6f4babe63..6f4babe63 100644..100755 --- a/rod/suggest.nim +++ b/rod/suggest.nim diff --git a/tests/accept/compile/tfib.nim b/tests/accept/compile/tfib.nim index 09a4d5038..09a4d5038 100644..100755 --- a/tests/accept/compile/tfib.nim +++ b/tests/accept/compile/tfib.nim diff --git a/tests/accept/compile/tlinearscanend.nim b/tests/accept/compile/tlinearscanend.nim new file mode 100755 index 000000000..15fd0c70a --- /dev/null +++ b/tests/accept/compile/tlinearscanend.nim @@ -0,0 +1,24 @@ + +import strutils + +var x = 343 + +case stdin.readline.parseInt +of 0: + echo "most common case" +of 1: + {.linearScanEnd.} + echo "second most common case" +of 2: echo "unlikely: use branch table" +else: + echo "unlikely too: use branch table" + + +case x +of 23: echo "23" +of 343: echo "343" +of 21: echo "21" +else: + {.linearScanEnd.} + echo "default" + diff --git a/tests/accept/compile/tmacro1.nim b/tests/accept/compile/tmacro1.nim index e96997c47..e96997c47 100644..100755 --- a/tests/accept/compile/tmacro1.nim +++ b/tests/accept/compile/tmacro1.nim diff --git a/tests/accept/compile/tsortdev.nim b/tests/accept/compile/tsortdev.nim index 1246d7581..1246d7581 100644..100755 --- a/tests/accept/compile/tsortdev.nim +++ b/tests/accept/compile/tsortdev.nim diff --git a/tests/accept/compile/ttypeconverter1.nim b/tests/accept/compile/ttypeconverter1.nim new file mode 100644 index 000000000..9553f6568 --- /dev/null +++ b/tests/accept/compile/ttypeconverter1.nim @@ -0,0 +1,6 @@ + +converter p(i: int): bool = return i != 0 + +if 0: + echo "foo" + diff --git a/tests/accept/run/tmacro2.nim b/tests/accept/run/tmacro2.nim index 6aa9bb57d..6aa9bb57d 100644..100755 --- a/tests/accept/run/tmacro2.nim +++ b/tests/accept/run/tmacro2.nim diff --git a/tests/accept/run/tmacro3.nim b/tests/accept/run/tmacro3.nim index fa2040e92..fa2040e92 100644..100755 --- a/tests/accept/run/tmacro3.nim +++ b/tests/accept/run/tmacro3.nim diff --git a/tests/accept/run/tmacros1.nim b/tests/accept/run/tmacros1.nim index 572ebb914..572ebb914 100644..100755 --- a/tests/accept/run/tmacros1.nim +++ b/tests/accept/run/tmacros1.nim diff --git a/tests/accept/run/tpegs.nim b/tests/accept/run/tpegs.nim new file mode 100755 index 000000000..c127f6fec --- /dev/null +++ b/tests/accept/run/tpegs.nim @@ -0,0 +1,1769 @@ +discard """ + output: '''this +is +an +example +d +e +f +('keyvalue' 'key'*)''' +""" +# PEGs module turned out to be a good test to detect memory management bugs. + +include "system/inclrtl" + +const + useUnicode = true ## change this to deactivate proper UTF-8 support + +import + strutils + +when useUnicode: + import unicode + +const + InlineThreshold = 5 ## number of leaves; -1 to disable inlining + MaxSubpatterns* = 10 ## defines the maximum number of subpatterns that + ## can be captured. More subpatterns cannot be captured! + +type + TPegKind = enum + pkEmpty, + pkAny, ## any character (.) + pkAnyRune, ## any Unicode character (_) + pkNewLine, ## CR-LF, LF, CR + pkLetter, ## Unicode letter + pkLower, ## Unicode lower case letter + pkUpper, ## Unicode upper case letter + pkTitle, ## Unicode title character + pkWhitespace, ## Unicode whitespace character + pkTerminal, + pkTerminalIgnoreCase, + pkTerminalIgnoreStyle, + pkChar, ## single character to match + pkCharChoice, + pkNonTerminal, + pkSequence, ## a b c ... --> Internal DSL: peg(a, b, c) + pkOrderedChoice, ## a / b / ... --> Internal DSL: a / b or /[a, b, c] + pkGreedyRep, ## a* --> Internal DSL: *a + ## a+ --> (a a*) + pkGreedyRepChar, ## x* where x is a single character (superop) + pkGreedyRepSet, ## [set]* (superop) + pkGreedyAny, ## .* or _* (superop) + pkOption, ## a? --> Internal DSL: ?a + pkAndPredicate, ## &a --> Internal DSL: &a + pkNotPredicate, ## !a --> Internal DSL: !a + pkCapture, ## {a} --> Internal DSL: capture(a) + pkBackRef, ## $i --> Internal DSL: backref(i) + pkBackRefIgnoreCase, + pkBackRefIgnoreStyle, + pkSearch, ## @a --> Internal DSL: @a + pkCapturedSearch, ## {@} a --> Internal DSL: @@a + pkRule, ## a <- b + pkList, ## a, b + pkStartAnchor ## ^ --> Internal DSL: startAnchor() + TNonTerminalFlag = enum + ntDeclared, ntUsed + TNonTerminal {.final.} = object ## represents a non terminal symbol + name: string ## the name of the symbol + line: int ## line the symbol has been declared/used in + col: int ## column the symbol has been declared/used in + flags: set[TNonTerminalFlag] ## the nonterminal's flags + rule: TNode ## the rule that the symbol refers to + TNode {.final, shallow.} = object + case kind: TPegKind + of pkEmpty..pkWhitespace: nil + of pkTerminal, pkTerminalIgnoreCase, pkTerminalIgnoreStyle: term: string + of pkChar, pkGreedyRepChar: ch: char + of pkCharChoice, pkGreedyRepSet: charChoice: ref set[char] + of pkNonTerminal: nt: PNonTerminal + of pkBackRef..pkBackRefIgnoreStyle: index: range[1..MaxSubpatterns] + else: sons: seq[TNode] + PNonTerminal* = ref TNonTerminal + + TPeg* = TNode ## type that represents a PEG + +proc term*(t: string): TPeg {.rtl, extern: "npegs$1Str".} = + ## constructs a PEG from a terminal string + if t.len != 1: + result.kind = pkTerminal + result.term = t + else: + result.kind = pkChar + result.ch = t[0] + +proc termIgnoreCase*(t: string): TPeg {. + rtl, extern: "npegs$1".} = + ## constructs a PEG from a terminal string; ignore case for matching + result.kind = pkTerminalIgnoreCase + result.term = t + +proc termIgnoreStyle*(t: string): TPeg {. + rtl, extern: "npegs$1".} = + ## constructs a PEG from a terminal string; ignore style for matching + result.kind = pkTerminalIgnoreStyle + result.term = t + +proc term*(t: char): TPeg {.rtl, extern: "npegs$1Char".} = + ## constructs a PEG from a terminal char + assert t != '\0' + result.kind = pkChar + result.ch = t + +proc charSet*(s: set[char]): TPeg {.rtl, extern: "npegs$1".} = + ## constructs a PEG from a character set `s` + assert '\0' notin s + result.kind = pkCharChoice + new(result.charChoice) + result.charChoice^ = s + +proc len(a: TPeg): int {.inline.} = return a.sons.len +proc add(d: var TPeg, s: TPeg) {.inline.} = add(d.sons, s) + +proc copyPeg(a: TPeg): TPeg = + result.kind = a.kind + case a.kind + of pkEmpty..pkWhitespace: nil + of pkTerminal, pkTerminalIgnoreCase, pkTerminalIgnoreStyle: + result.term = a.term + of pkChar, pkGreedyRepChar: + result.ch = a.ch + of pkCharChoice, pkGreedyRepSet: + new(result.charChoice) + result.charChoice^ = a.charChoice^ + of pkNonTerminal: result.nt = a.nt + of pkBackRef..pkBackRefIgnoreStyle: + result.index = a.index + else: + result.sons = a.sons + +proc addChoice(dest: var TPeg, elem: TPeg) = + var L = dest.len-1 + if L >= 0 and dest.sons[L].kind == pkCharChoice: + # caution! Do not introduce false aliasing here! + case elem.kind + of pkCharChoice: + dest.sons[L] = charSet(dest.sons[L].charChoice^ + elem.charChoice^) + of pkChar: + dest.sons[L] = charSet(dest.sons[L].charChoice^ + {elem.ch}) + else: add(dest, elem) + else: add(dest, elem) + +template multipleOp(k: TPegKind, localOpt: expr) = + result.kind = k + result.sons = @[] + for x in items(a): + if x.kind == k: + for y in items(x.sons): + localOpt(result, y) + else: + localOpt(result, x) + if result.len == 1: + result = result.sons[0] + +proc `/`*(a: openArray[TPeg]): TPeg {. + rtl, extern: "npegsOrderedChoice".} = + ## constructs an ordered choice with the PEGs in `a` + multipleOp(pkOrderedChoice, addChoice) + +proc addSequence(dest: var TPeg, elem: TPeg) = + var L = dest.len-1 + if L >= 0 and dest.sons[L].kind == pkTerminal: + # caution! Do not introduce false aliasing here! + case elem.kind + of pkTerminal: + dest.sons[L] = term(dest.sons[L].term & elem.term) + of pkChar: + dest.sons[L] = term(dest.sons[L].term & elem.ch) + else: add(dest, elem) + else: add(dest, elem) + +proc sequence*(a: openArray[TPeg]): TPeg {. + rtl, extern: "npegs$1".} = + ## constructs a sequence with all the PEGs from `a` + multipleOp(pkSequence, addSequence) + +proc `?`*(a: TPeg): TPeg {.rtl, extern: "npegsOptional".} = + ## constructs an optional for the PEG `a` + if a.kind in {pkOption, pkGreedyRep, pkGreedyAny, pkGreedyRepChar, + pkGreedyRepSet}: + # a* ? --> a* + # a? ? --> a? + result = a + else: + result.kind = pkOption + result.sons = @[a] + +proc `*`*(a: TPeg): TPeg {.rtl, extern: "npegsGreedyRep".} = + ## constructs a "greedy repetition" for the PEG `a` + case a.kind + of pkGreedyRep, pkGreedyRepChar, pkGreedyRepSet, pkGreedyAny, pkOption: + assert false + # produces endless loop! + of pkChar: + result.kind = pkGreedyRepChar + result.ch = a.ch + of pkCharChoice: + result.kind = pkGreedyRepSet + result.charChoice = a.charChoice # copying a reference suffices! + of pkAny, pkAnyRune: + result.kind = pkGreedyAny + else: + result.kind = pkGreedyRep + result.sons = @[a] + +proc `@`*(a: TPeg): TPeg {.rtl, extern: "npegsSearch".} = + ## constructs a "search" for the PEG `a` + result.kind = pkSearch + result.sons = @[a] + +proc `@@`*(a: TPeg): TPeg {.rtl, + extern: "npgegsCapturedSearch".} = + ## constructs a "captured search" for the PEG `a` + result.kind = pkCapturedSearch + result.sons = @[a] + +when false: + proc contains(a: TPeg, k: TPegKind): bool = + if a.kind == k: return true + case a.kind + of pkEmpty, pkAny, pkAnyRune, pkGreedyAny, pkNewLine, pkTerminal, + pkTerminalIgnoreCase, pkTerminalIgnoreStyle, pkChar, pkGreedyRepChar, + pkCharChoice, pkGreedyRepSet: nil + of pkNonTerminal: return true + else: + for i in 0..a.sons.len-1: + if contains(a.sons[i], k): return true + +proc `+`*(a: TPeg): TPeg {.rtl, extern: "npegsGreedyPosRep".} = + ## constructs a "greedy positive repetition" with the PEG `a` + return sequence(a, *a) + +proc `&`*(a: TPeg): TPeg {.rtl, extern: "npegsAndPredicate".} = + ## constructs an "and predicate" with the PEG `a` + result.kind = pkAndPredicate + result.sons = @[a] + +proc `!`*(a: TPeg): TPeg {.rtl, extern: "npegsNotPredicate".} = + ## constructs a "not predicate" with the PEG `a` + result.kind = pkNotPredicate + result.sons = @[a] + +proc any*: TPeg {.inline.} = + ## constructs the PEG `any character`:idx: (``.``) + result.kind = pkAny + +proc anyRune*: TPeg {.inline.} = + ## constructs the PEG `any rune`:idx: (``_``) + result.kind = pkAnyRune + +proc newLine*: TPeg {.inline.} = + ## constructs the PEG `newline`:idx: (``\n``) + result.kind = pkNewline + +proc UnicodeLetter*: TPeg {.inline.} = + ## constructs the PEG ``\letter`` which matches any Unicode letter. + result.kind = pkLetter + +proc UnicodeLower*: TPeg {.inline.} = + ## constructs the PEG ``\lower`` which matches any Unicode lowercase letter. + result.kind = pkLower + +proc UnicodeUpper*: TPeg {.inline.} = + ## constructs the PEG ``\upper`` which matches any Unicode lowercase letter. + result.kind = pkUpper + +proc UnicodeTitle*: TPeg {.inline.} = + ## constructs the PEG ``\title`` which matches any Unicode title letter. + result.kind = pkTitle + +proc UnicodeWhitespace*: TPeg {.inline.} = + ## constructs the PEG ``\white`` which matches any Unicode + ## whitespace character. + result.kind = pkWhitespace + +proc startAnchor*: TPeg {.inline.} = + ## constructs the PEG ``^`` which matches the start of the input. + result.kind = pkStartAnchor + +proc endAnchor*: TPeg {.inline.} = + ## constructs the PEG ``$`` which matches the end of the input. + result = !any() + +proc capture*(a: TPeg): TPeg {.rtl, extern: "npegsCapture".} = + ## constructs a capture with the PEG `a` + result.kind = pkCapture + result.sons = @[a] + +proc backref*(index: range[1..MaxSubPatterns]): TPeg {. + rtl, extern: "npegs$1".} = + ## constructs a back reference of the given `index`. `index` starts counting + ## from 1. + result.kind = pkBackRef + result.index = index-1 + +proc backrefIgnoreCase*(index: range[1..MaxSubPatterns]): TPeg {. + rtl, extern: "npegs$1".} = + ## constructs a back reference of the given `index`. `index` starts counting + ## from 1. Ignores case for matching. + result.kind = pkBackRefIgnoreCase + result.index = index-1 + +proc backrefIgnoreStyle*(index: range[1..MaxSubPatterns]): TPeg {. + rtl, extern: "npegs$1".}= + ## constructs a back reference of the given `index`. `index` starts counting + ## from 1. Ignores style for matching. + result.kind = pkBackRefIgnoreStyle + result.index = index-1 + +proc spaceCost(n: TPeg): int = + case n.kind + of pkEmpty: nil + of pkTerminal, pkTerminalIgnoreCase, pkTerminalIgnoreStyle, pkChar, + pkGreedyRepChar, pkCharChoice, pkGreedyRepSet, + pkAny..pkWhitespace, pkGreedyAny: + result = 1 + of pkNonTerminal: + # we cannot inline a rule with a non-terminal + result = InlineThreshold+1 + else: + for i in 0..n.len-1: + inc(result, spaceCost(n.sons[i])) + if result >= InlineThreshold: break + +proc nonterminal*(n: PNonTerminal): TPeg {. + rtl, extern: "npegs$1".} = + ## constructs a PEG that consists of the nonterminal symbol + assert n != nil + if ntDeclared in n.flags and spaceCost(n.rule) < InlineThreshold: + when false: echo "inlining symbol: ", n.name + result = n.rule # inlining of rule enables better optimizations + else: + result.kind = pkNonTerminal + result.nt = n + +proc newNonTerminal*(name: string, line, column: int): PNonTerminal {. + rtl, extern: "npegs$1".} = + ## constructs a nonterminal symbol + new(result) + result.name = name + result.line = line + result.col = column + +template letters*: expr = + ## expands to ``charset({'A'..'Z', 'a'..'z'})`` + charset({'A'..'Z', 'a'..'z'}) + +template digits*: expr = + ## expands to ``charset({'0'..'9'})`` + charset({'0'..'9'}) + +template whitespace*: expr = + ## expands to ``charset({' ', '\9'..'\13'})`` + charset({' ', '\9'..'\13'}) + +template identChars*: expr = + ## expands to ``charset({'a'..'z', 'A'..'Z', '0'..'9', '_'})`` + charset({'a'..'z', 'A'..'Z', '0'..'9', '_'}) + +template identStartChars*: expr = + ## expands to ``charset({'A'..'Z', 'a'..'z', '_'})`` + charset({'a'..'z', 'A'..'Z', '_'}) + +template ident*: expr = + ## same as ``[a-zA-Z_][a-zA-z_0-9]*``; standard identifier + sequence(charset({'a'..'z', 'A'..'Z', '_'}), + *charset({'a'..'z', 'A'..'Z', '0'..'9', '_'})) + +template natural*: expr = + ## same as ``\d+`` + +digits + +# ------------------------- debugging ----------------------------------------- + +proc esc(c: char, reserved = {'\0'..'\255'}): string = + case c + of '\b': result = "\\b" + of '\t': result = "\\t" + of '\c': result = "\\c" + of '\L': result = "\\l" + of '\v': result = "\\v" + of '\f': result = "\\f" + of '\e': result = "\\e" + of '\a': result = "\\a" + of '\\': result = "\\\\" + of 'a'..'z', 'A'..'Z', '0'..'9', '_': result = $c + elif c < ' ' or c >= '\128': result = '\\' & $ord(c) + elif c in reserved: result = '\\' & c + else: result = $c + +proc singleQuoteEsc(c: Char): string = return "'" & esc(c, {'\''}) & "'" + +proc singleQuoteEsc(str: string): string = + result = "'" + for c in items(str): add result, esc(c, {'\''}) + add result, '\'' + +proc charSetEscAux(cc: set[char]): string = + const reserved = {'^', '-', ']'} + result = "" + var c1 = 0 + while c1 <= 0xff: + if chr(c1) in cc: + var c2 = c1 + while c2 < 0xff and chr(succ(c2)) in cc: inc(c2) + if c1 == c2: + add result, esc(chr(c1), reserved) + elif c2 == succ(c1): + add result, esc(chr(c1), reserved) & esc(chr(c2), reserved) + else: + add result, esc(chr(c1), reserved) & '-' & esc(chr(c2), reserved) + c1 = c2 + inc(c1) + +proc CharSetEsc(cc: set[char]): string = + if card(cc) >= 128+64: + result = "[^" & CharSetEscAux({'\1'..'\xFF'} - cc) & ']' + else: + result = '[' & CharSetEscAux(cc) & ']' + +proc toStrAux(r: TPeg, res: var string) = + case r.kind + of pkEmpty: add(res, "()") + of pkAny: add(res, '.') + of pkAnyRune: add(res, '_') + of pkLetter: add(res, "\\letter") + of pkLower: add(res, "\\lower") + of pkUpper: add(res, "\\upper") + of pkTitle: add(res, "\\title") + of pkWhitespace: add(res, "\\white") + + of pkNewline: add(res, "\\n") + of pkTerminal: add(res, singleQuoteEsc(r.term)) + of pkTerminalIgnoreCase: + add(res, 'i') + add(res, singleQuoteEsc(r.term)) + of pkTerminalIgnoreStyle: + add(res, 'y') + add(res, singleQuoteEsc(r.term)) + of pkChar: add(res, singleQuoteEsc(r.ch)) + of pkCharChoice: add(res, charSetEsc(r.charChoice^)) + of pkNonTerminal: add(res, r.nt.name) + of pkSequence: + add(res, '(') + toStrAux(r.sons[0], res) + for i in 1 .. high(r.sons): + add(res, ' ') + toStrAux(r.sons[i], res) + add(res, ')') + of pkOrderedChoice: + add(res, '(') + toStrAux(r.sons[0], res) + for i in 1 .. high(r.sons): + add(res, " / ") + toStrAux(r.sons[i], res) + add(res, ')') + of pkGreedyRep: + toStrAux(r.sons[0], res) + add(res, '*') + of pkGreedyRepChar: + add(res, singleQuoteEsc(r.ch)) + add(res, '*') + of pkGreedyRepSet: + add(res, charSetEsc(r.charChoice^)) + add(res, '*') + of pkGreedyAny: + add(res, ".*") + of pkOption: + toStrAux(r.sons[0], res) + add(res, '?') + of pkAndPredicate: + add(res, '&') + toStrAux(r.sons[0], res) + of pkNotPredicate: + add(res, '!') + toStrAux(r.sons[0], res) + of pkSearch: + add(res, '@') + toStrAux(r.sons[0], res) + of pkCapturedSearch: + add(res, "{@}") + toStrAux(r.sons[0], res) + of pkCapture: + add(res, '{') + toStrAux(r.sons[0], res) + add(res, '}') + of pkBackRef: + add(res, '$') + add(res, $r.index) + of pkBackRefIgnoreCase: + add(res, "i$") + add(res, $r.index) + of pkBackRefIgnoreStyle: + add(res, "y$") + add(res, $r.index) + of pkRule: + toStrAux(r.sons[0], res) + add(res, " <- ") + toStrAux(r.sons[1], res) + of pkList: + for i in 0 .. high(r.sons): + toStrAux(r.sons[i], res) + add(res, "\n") + of pkStartAnchor: + add(res, '^') + +proc `$` *(r: TPeg): string {.rtl, extern: "npegsToString".} = + ## converts a PEG to its string representation + result = "" + toStrAux(r, result) + +# --------------------- core engine ------------------------------------------- + +type + TCaptures* {.final.} = object ## contains the captured substrings. + matches: array[0..maxSubpatterns-1, tuple[first, last: int]] + ml: int + origStart: int + +proc bounds*(c: TCaptures, + i: range[0..maxSubpatterns-1]): tuple[first, last: int] = + ## returns the bounds ``[first..last]`` of the `i`'th capture. + result = c.matches[i] + +when not useUnicode: + type + TRune = char + template fastRuneAt(s, i, ch: expr) = + ch = s[i] + inc(i) + template runeLenAt(s, i: expr): expr = 1 + + proc isAlpha(a: char): bool {.inline.} = return a in {'a'..'z','A'..'Z'} + proc isUpper(a: char): bool {.inline.} = return a in {'A'..'Z'} + proc isLower(a: char): bool {.inline.} = return a in {'a'..'z'} + proc isTitle(a: char): bool {.inline.} = return false + proc isWhiteSpace(a: char): bool {.inline.} = return a in {' ', '\9'..'\13'} + +proc rawMatch*(s: string, p: TPeg, start: int, c: var TCaptures): int {. + rtl, extern: "npegs$1".} = + ## low-level matching proc that implements the PEG interpreter. Use this + ## for maximum efficiency (every other PEG operation ends up calling this + ## proc). + ## Returns -1 if it does not match, else the length of the match + case p.kind + of pkEmpty: result = 0 # match of length 0 + of pkAny: + if s[start] != '\0': result = 1 + else: result = -1 + of pkAnyRune: + if s[start] != '\0': + result = runeLenAt(s, start) + else: + result = -1 + of pkLetter: + if s[start] != '\0': + var a: TRune + result = start + fastRuneAt(s, result, a) + if isAlpha(a): dec(result, start) + else: result = -1 + else: + result = -1 + of pkLower: + if s[start] != '\0': + var a: TRune + result = start + fastRuneAt(s, result, a) + if isLower(a): dec(result, start) + else: result = -1 + else: + result = -1 + of pkUpper: + if s[start] != '\0': + var a: TRune + result = start + fastRuneAt(s, result, a) + if isUpper(a): dec(result, start) + else: result = -1 + else: + result = -1 + of pkTitle: + if s[start] != '\0': + var a: TRune + result = start + fastRuneAt(s, result, a) + if isTitle(a): dec(result, start) + else: result = -1 + else: + result = -1 + of pkWhitespace: + if s[start] != '\0': + var a: TRune + result = start + fastRuneAt(s, result, a) + if isWhitespace(a): dec(result, start) + else: result = -1 + else: + result = -1 + of pkGreedyAny: + result = len(s) - start + of pkNewLine: + if s[start] == '\L': result = 1 + elif s[start] == '\C': + if s[start+1] == '\L': result = 2 + else: result = 1 + else: result = -1 + of pkTerminal: + result = len(p.term) + for i in 0..result-1: + if p.term[i] != s[start+i]: + result = -1 + break + of pkTerminalIgnoreCase: + var + i = 0 + a, b: TRune + result = start + while i < len(p.term): + fastRuneAt(p.term, i, a) + fastRuneAt(s, result, b) + if toLower(a) != toLower(b): + result = -1 + break + dec(result, start) + of pkTerminalIgnoreStyle: + var + i = 0 + a, b: TRune + result = start + while i < len(p.term): + while true: + fastRuneAt(p.term, i, a) + if a != TRune('_'): break + while true: + fastRuneAt(s, result, b) + if b != TRune('_'): break + if toLower(a) != toLower(b): + result = -1 + break + dec(result, start) + of pkChar: + if p.ch == s[start]: result = 1 + else: result = -1 + of pkCharChoice: + if contains(p.charChoice^, s[start]): result = 1 + else: result = -1 + of pkNonTerminal: + var oldMl = c.ml + when false: echo "enter: ", p.nt.name + result = rawMatch(s, p.nt.rule, start, c) + when false: echo "leave: ", p.nt.name + if result < 0: c.ml = oldMl + of pkSequence: + var oldMl = c.ml + result = 0 + assert(not isNil(p.sons)) + for i in 0..high(p.sons): + var x = rawMatch(s, p.sons[i], start+result, c) + if x < 0: + c.ml = oldMl + result = -1 + break + else: inc(result, x) + of pkOrderedChoice: + var oldMl = c.ml + for i in 0..high(p.sons): + result = rawMatch(s, p.sons[i], start, c) + if result >= 0: break + c.ml = oldMl + of pkSearch: + var oldMl = c.ml + result = 0 + while start+result < s.len: + var x = rawMatch(s, p.sons[0], start+result, c) + if x >= 0: + inc(result, x) + return + inc(result) + result = -1 + c.ml = oldMl + of pkCapturedSearch: + var idx = c.ml # reserve a slot for the subpattern + inc(c.ml) + result = 0 + while start+result < s.len: + var x = rawMatch(s, p.sons[0], start+result, c) + if x >= 0: + if idx < maxSubpatterns: + c.matches[idx] = (start, start+result-1) + #else: silently ignore the capture + inc(result, x) + return + inc(result) + result = -1 + c.ml = idx + of pkGreedyRep: + result = 0 + while true: + var x = rawMatch(s, p.sons[0], start+result, c) + # if x == 0, we have an endless loop; so the correct behaviour would be + # not to break. But endless loops can be easily introduced: + # ``(comment / \w*)*`` is such an example. Breaking for x == 0 does the + # expected thing in this case. + if x <= 0: break + inc(result, x) + of pkGreedyRepChar: + result = 0 + var ch = p.ch + while ch == s[start+result]: inc(result) + of pkGreedyRepSet: + result = 0 + while contains(p.charChoice^, s[start+result]): inc(result) + of pkOption: + result = max(0, rawMatch(s, p.sons[0], start, c)) + of pkAndPredicate: + var oldMl = c.ml + result = rawMatch(s, p.sons[0], start, c) + if result >= 0: result = 0 # do not consume anything + else: c.ml = oldMl + of pkNotPredicate: + var oldMl = c.ml + result = rawMatch(s, p.sons[0], start, c) + if result < 0: result = 0 + else: + c.ml = oldMl + result = -1 + of pkCapture: + var idx = c.ml # reserve a slot for the subpattern + inc(c.ml) + result = rawMatch(s, p.sons[0], start, c) + if result >= 0: + if idx < maxSubpatterns: + c.matches[idx] = (start, start+result-1) + #else: silently ignore the capture + else: + c.ml = idx + of pkBackRef..pkBackRefIgnoreStyle: + if p.index >= c.ml: return -1 + var (a, b) = c.matches[p.index] + var n: TPeg + n.kind = succ(pkTerminal, ord(p.kind)-ord(pkBackRef)) + n.term = s.copy(a, b) + result = rawMatch(s, n, start, c) + of pkStartAnchor: + if c.origStart == start: result = 0 + else: result = -1 + of pkRule, pkList: assert false + +proc match*(s: string, pattern: TPeg, matches: var openarray[string], + start = 0): bool {.rtl, extern: "npegs$1Capture".} = + ## returns ``true`` if ``s[start..]`` matches the ``pattern`` and + ## the captured substrings in the array ``matches``. If it does not + ## match, nothing is written into ``matches`` and ``false`` is + ## returned. + var c: TCaptures + c.origStart = start + result = rawMatch(s, pattern, start, c) == len(s) -start + if result: + for i in 0..c.ml-1: + matches[i] = copy(s, c.matches[i][0], c.matches[i][1]) + +proc match*(s: string, pattern: TPeg, + start = 0): bool {.rtl, extern: "npegs$1".} = + ## returns ``true`` if ``s`` matches the ``pattern`` beginning from ``start``. + var c: TCaptures + c.origStart = start + result = rawMatch(s, pattern, start, c) == len(s)-start + +proc matchLen*(s: string, pattern: TPeg, matches: var openarray[string], + start = 0): int {.rtl, extern: "npegs$1Capture".} = + ## the same as ``match``, but it returns the length of the match, + ## if there is no match, -1 is returned. Note that a match length + ## of zero can happen. It's possible that a suffix of `s` remains + ## that does not belong to the match. + var c: TCaptures + c.origStart = start + result = rawMatch(s, pattern, start, c) + if result >= 0: + for i in 0..c.ml-1: + matches[i] = copy(s, c.matches[i][0], c.matches[i][1]) + +proc matchLen*(s: string, pattern: TPeg, + start = 0): int {.rtl, extern: "npegs$1".} = + ## the same as ``match``, but it returns the length of the match, + ## if there is no match, -1 is returned. Note that a match length + ## of zero can happen. It's possible that a suffix of `s` remains + ## that does not belong to the match. + var c: TCaptures + c.origStart = start + result = rawMatch(s, pattern, start, c) + +proc find*(s: string, pattern: TPeg, matches: var openarray[string], + start = 0): int {.rtl, extern: "npegs$1Capture".} = + ## returns the starting position of ``pattern`` in ``s`` and the captured + ## substrings in the array ``matches``. If it does not match, nothing + ## is written into ``matches`` and -1 is returned. + for i in start .. s.len-1: + if matchLen(s, pattern, matches, i) >= 0: return i + return -1 + # could also use the pattern here: (!P .)* P + +proc findBounds*(s: string, pattern: TPeg, matches: var openarray[string], + start = 0): tuple[first, last: int] {. + rtl, extern: "npegs$1Capture".} = + ## returns the starting position and end position of ``pattern`` in ``s`` + ## and the captured + ## substrings in the array ``matches``. If it does not match, nothing + ## is written into ``matches`` and (-1,0) is returned. + for i in start .. s.len-1: + var L = matchLen(s, pattern, matches, i) + if L >= 0: return (i, i+L-1) + return (-1, 0) + +proc find*(s: string, pattern: TPeg, + start = 0): int {.rtl, extern: "npegs$1".} = + ## returns the starting position of ``pattern`` in ``s``. If it does not + ## match, -1 is returned. + for i in start .. s.len-1: + if matchLen(s, pattern, i) >= 0: return i + return -1 + +iterator findAll*(s: string, pattern: TPeg, start = 0): string = + ## yields all matching captures of pattern in `s`. + var matches: array[0..MaxSubpatterns-1, string] + var i = start + while i < s.len: + var L = matchLen(s, pattern, matches, i) + if L < 0: break + for k in 0..maxSubPatterns-1: + if isNil(matches[k]): break + yield matches[k] + inc(i, L) + +proc findAll*(s: string, pattern: TPeg, start = 0): seq[string] {. + rtl, extern: "npegs$1".} = + ## returns all matching captures of pattern in `s`. + ## If it does not match, @[] is returned. + accumulateResult(findAll(s, pattern, start)) + +template `=~`*(s: string, pattern: TPeg): expr = + ## This calls ``match`` with an implicit declared ``matches`` array that + ## can be used in the scope of the ``=~`` call: + ## + ## .. code-block:: nimrod + ## + ## if line =~ peg"\s* {\w+} \s* '=' \s* {\w+}": + ## # matches a key=value pair: + ## echo("Key: ", matches[0]) + ## echo("Value: ", matches[1]) + ## elif line =~ peg"\s*{'#'.*}": + ## # matches a comment + ## # note that the implicit ``matches`` array is different from the + ## # ``matches`` array of the first branch + ## echo("comment: ", matches[0]) + ## else: + ## echo("syntax error") + ## + when not definedInScope(matches): + var matches: array[0..maxSubpatterns-1, string] + match(s, pattern, matches) + +# ------------------------- more string handling ------------------------------ + +proc contains*(s: string, pattern: TPeg, start = 0): bool {. + rtl, extern: "npegs$1".} = + ## same as ``find(s, pattern, start) >= 0`` + return find(s, pattern, start) >= 0 + +proc contains*(s: string, pattern: TPeg, matches: var openArray[string], + start = 0): bool {.rtl, extern: "npegs$1Capture".} = + ## same as ``find(s, pattern, matches, start) >= 0`` + return find(s, pattern, matches, start) >= 0 + +proc startsWith*(s: string, prefix: TPeg, start = 0): bool {. + rtl, extern: "npegs$1".} = + ## returns true if `s` starts with the pattern `prefix` + result = matchLen(s, prefix, start) >= 0 + +proc endsWith*(s: string, suffix: TPeg, start = 0): bool {. + rtl, extern: "npegs$1".} = + ## returns true if `s` ends with the pattern `prefix` + for i in start .. s.len-1: + if matchLen(s, suffix, i) == s.len - i: return true + +proc replacef*(s: string, sub: TPeg, by: string): string {. + rtl, extern: "npegs$1".} = + ## Replaces `sub` in `s` by the string `by`. Captures can be accessed in `by` + ## with the notation ``$i`` and ``$#`` (see strutils.`%`). Examples: + ## + ## .. code-block:: nimrod + ## "var1=key; var2=key2".replace(peg"{\ident}'='{\ident}", "$1<-$2$2") + ## + ## Results in: + ## + ## .. code-block:: nimrod + ## + ## "var1<-keykey; val2<-key2key2" + result = "" + var i = 0 + var caps: array[0..maxSubpatterns-1, string] + while i < s.len: + var x = matchLen(s, sub, caps, i) + if x <= 0: + add(result, s[i]) + inc(i) + else: + addf(result, by, caps) + inc(i, x) + add(result, copy(s, i)) + +proc replace*(s: string, sub: TPeg, by = ""): string {. + rtl, extern: "npegs$1".} = + ## Replaces `sub` in `s` by the string `by`. Captures cannot be accessed + ## in `by`. + result = "" + var i = 0 + var caps: array[0..maxSubpatterns-1, string] + while i < s.len: + var x = matchLen(s, sub, caps, i) + if x <= 0: + add(result, s[i]) + inc(i) + else: + addf(result, by, caps) + inc(i, x) + add(result, copy(s, i)) + +proc parallelReplace*(s: string, subs: openArray[ + tuple[pattern: TPeg, repl: string]]): string {. + rtl, extern: "npegs$1".} = + ## Returns a modified copy of `s` with the substitutions in `subs` + ## applied in parallel. + result = "" + var i = 0 + var caps: array[0..maxSubpatterns-1, string] + while i < s.len: + block searchSubs: + for j in 0..high(subs): + var x = matchLen(s, subs[j][0], caps, i) + if x > 0: + addf(result, subs[j][1], caps) + inc(i, x) + break searchSubs + add(result, s[i]) + inc(i) + # copy the rest: + add(result, copy(s, i)) + +proc transformFile*(infile, outfile: string, + subs: openArray[tuple[pattern: TPeg, repl: string]]) {. + rtl, extern: "npegs$1".} = + ## reads in the file `infile`, performs a parallel replacement (calls + ## `parallelReplace`) and writes back to `outfile`. Calls ``quit`` if an + ## error occurs. This is supposed to be used for quick scripting. + var x = readFile(infile) + if not isNil(x): + var f: TFile + if open(f, outfile, fmWrite): + write(f, x.parallelReplace(subs)) + close(f) + else: + quit("cannot open for writing: " & outfile) + else: + quit("cannot open for reading: " & infile) + +iterator split*(s: string, sep: TPeg): string = + ## Splits the string `s` into substrings. + ## + ## Substrings are separated by the PEG `sep`. + ## Examples: + ## + ## .. code-block:: nimrod + ## for word in split("00232this02939is39an22example111", peg"\d+"): + ## writeln(stdout, word) + ## + ## Results in: + ## + ## .. code-block:: nimrod + ## "this" + ## "is" + ## "an" + ## "example" + ## + var + first = 0 + last = 0 + while last < len(s): + var x = matchLen(s, sep, last) + if x > 0: inc(last, x) + first = last + while last < len(s): + inc(last) + x = matchLen(s, sep, last) + if x > 0: break + if first < last: + yield copy(s, first, last-1) + +proc split*(s: string, sep: TPeg): seq[string] {. + rtl, extern: "npegs$1".} = + ## Splits the string `s` into substrings. + accumulateResult(split(s, sep)) + +# ------------------- scanner ------------------------------------------------- + +type + TModifier = enum + modNone, + modVerbatim, + modIgnoreCase, + modIgnoreStyle + TTokKind = enum ## enumeration of all tokens + tkInvalid, ## invalid token + tkEof, ## end of file reached + tkAny, ## . + tkAnyRune, ## _ + tkIdentifier, ## abc + tkStringLit, ## "abc" or 'abc' + tkCharSet, ## [^A-Z] + tkParLe, ## '(' + tkParRi, ## ')' + tkCurlyLe, ## '{' + tkCurlyRi, ## '}' + tkCurlyAt, ## '{@}' + tkArrow, ## '<-' + tkBar, ## '/' + tkStar, ## '*' + tkPlus, ## '+' + tkAmp, ## '&' + tkNot, ## '!' + tkOption, ## '?' + tkAt, ## '@' + tkBuiltin, ## \identifier + tkEscaped, ## \\ + tkBackref, ## '$' + tkDollar, ## '$' + tkHat ## '^' + + TToken {.final.} = object ## a token + kind: TTokKind ## the type of the token + modifier: TModifier + literal: string ## the parsed (string) literal + charset: set[char] ## if kind == tkCharSet + index: int ## if kind == tkBackref + + TPegLexer = object ## the lexer object. + bufpos: int ## the current position within the buffer + buf: cstring ## the buffer itself + LineNumber: int ## the current line number + lineStart: int ## index of last line start in buffer + colOffset: int ## column to add + filename: string + +const + tokKindToStr: array[TTokKind, string] = [ + "invalid", "[EOF]", ".", "_", "identifier", "string literal", + "character set", "(", ")", "{", "}", "{@}", + "<-", "/", "*", "+", "&", "!", "?", + "@", "built-in", "escaped", "$", "$", "^" + ] + +proc HandleCR(L: var TPegLexer, pos: int): int = + assert(L.buf[pos] == '\c') + inc(L.linenumber) + result = pos+1 + if L.buf[result] == '\L': inc(result) + L.lineStart = result + +proc HandleLF(L: var TPegLexer, pos: int): int = + assert(L.buf[pos] == '\L') + inc(L.linenumber) + result = pos+1 + L.lineStart = result + +proc init(L: var TPegLexer, input, filename: string, line = 1, col = 0) = + L.buf = input + L.bufpos = 0 + L.lineNumber = line + L.colOffset = col + L.lineStart = 0 + L.filename = filename + +proc getColumn(L: TPegLexer): int {.inline.} = + result = abs(L.bufpos - L.lineStart) + L.colOffset + +proc getLine(L: TPegLexer): int {.inline.} = + result = L.linenumber + +proc errorStr(L: TPegLexer, msg: string, line = -1, col = -1): string = + var line = if line < 0: getLine(L) else: line + var col = if col < 0: getColumn(L) else: col + result = "$1($2, $3) Error: $4" % [L.filename, $line, $col, msg] + +proc handleHexChar(c: var TPegLexer, xi: var int) = + case c.buf[c.bufpos] + of '0'..'9': + xi = (xi shl 4) or (ord(c.buf[c.bufpos]) - ord('0')) + inc(c.bufpos) + of 'a'..'f': + xi = (xi shl 4) or (ord(c.buf[c.bufpos]) - ord('a') + 10) + inc(c.bufpos) + of 'A'..'F': + xi = (xi shl 4) or (ord(c.buf[c.bufpos]) - ord('A') + 10) + inc(c.bufpos) + else: nil + +proc getEscapedChar(c: var TPegLexer, tok: var TToken) = + inc(c.bufpos) + case c.buf[c.bufpos] + of 'r', 'R', 'c', 'C': + add(tok.literal, '\c') + Inc(c.bufpos) + of 'l', 'L': + add(tok.literal, '\L') + Inc(c.bufpos) + of 'f', 'F': + add(tok.literal, '\f') + inc(c.bufpos) + of 'e', 'E': + add(tok.literal, '\e') + Inc(c.bufpos) + of 'a', 'A': + add(tok.literal, '\a') + Inc(c.bufpos) + of 'b', 'B': + add(tok.literal, '\b') + Inc(c.bufpos) + of 'v', 'V': + add(tok.literal, '\v') + Inc(c.bufpos) + of 't', 'T': + add(tok.literal, '\t') + Inc(c.bufpos) + of 'x', 'X': + inc(c.bufpos) + var xi = 0 + handleHexChar(c, xi) + handleHexChar(c, xi) + if xi == 0: tok.kind = tkInvalid + else: add(tok.literal, Chr(xi)) + of '0'..'9': + var val = ord(c.buf[c.bufpos]) - ord('0') + Inc(c.bufpos) + var i = 1 + while (i <= 3) and (c.buf[c.bufpos] in {'0'..'9'}): + val = val * 10 + ord(c.buf[c.bufpos]) - ord('0') + inc(c.bufpos) + inc(i) + if val > 0 and val <= 255: add(tok.literal, chr(val)) + else: tok.kind = tkInvalid + of '\0'..'\31': + tok.kind = tkInvalid + elif c.buf[c.bufpos] in strutils.letters: + tok.kind = tkInvalid + else: + add(tok.literal, c.buf[c.bufpos]) + Inc(c.bufpos) + +proc skip(c: var TPegLexer) = + var pos = c.bufpos + var buf = c.buf + while true: + case buf[pos] + of ' ', '\t': + Inc(pos) + of '#': + while not (buf[pos] in {'\c', '\L', '\0'}): inc(pos) + of '\c': + pos = HandleCR(c, pos) + buf = c.buf + of '\L': + pos = HandleLF(c, pos) + buf = c.buf + else: + break # EndOfFile also leaves the loop + c.bufpos = pos + +proc getString(c: var TPegLexer, tok: var TToken) = + tok.kind = tkStringLit + var pos = c.bufPos + 1 + var buf = c.buf + var quote = buf[pos-1] + while true: + case buf[pos] + of '\\': + c.bufpos = pos + getEscapedChar(c, tok) + pos = c.bufpos + of '\c', '\L', '\0': + tok.kind = tkInvalid + break + elif buf[pos] == quote: + inc(pos) + break + else: + add(tok.literal, buf[pos]) + Inc(pos) + c.bufpos = pos + +proc getDollar(c: var TPegLexer, tok: var TToken) = + var pos = c.bufPos + 1 + var buf = c.buf + if buf[pos] in {'0'..'9'}: + tok.kind = tkBackref + tok.index = 0 + while buf[pos] in {'0'..'9'}: + tok.index = tok.index * 10 + ord(buf[pos]) - ord('0') + inc(pos) + else: + tok.kind = tkDollar + c.bufpos = pos + +proc getCharSet(c: var TPegLexer, tok: var TToken) = + tok.kind = tkCharSet + tok.charset = {} + var pos = c.bufPos + 1 + var buf = c.buf + var caret = false + if buf[pos] == '^': + inc(pos) + caret = true + while true: + var ch: char + case buf[pos] + of ']': + inc(pos) + break + of '\\': + c.bufpos = pos + getEscapedChar(c, tok) + pos = c.bufpos + ch = tok.literal[tok.literal.len-1] + of '\C', '\L', '\0': + tok.kind = tkInvalid + break + else: + ch = buf[pos] + Inc(pos) + incl(tok.charset, ch) + if buf[pos] == '-': + if buf[pos+1] == ']': + incl(tok.charset, '-') + inc(pos) + else: + inc(pos) + var ch2: char + case buf[pos] + of '\\': + c.bufpos = pos + getEscapedChar(c, tok) + pos = c.bufpos + ch2 = tok.literal[tok.literal.len-1] + of '\C', '\L', '\0': + tok.kind = tkInvalid + break + else: + ch2 = buf[pos] + Inc(pos) + for i in ord(ch)+1 .. ord(ch2): + incl(tok.charset, chr(i)) + c.bufpos = pos + if caret: tok.charset = {'\1'..'\xFF'} - tok.charset + +proc getSymbol(c: var TPegLexer, tok: var TToken) = + var pos = c.bufpos + var buf = c.buf + while true: + add(tok.literal, buf[pos]) + Inc(pos) + if buf[pos] notin strutils.IdentChars: break + c.bufpos = pos + tok.kind = tkIdentifier + +proc getBuiltin(c: var TPegLexer, tok: var TToken) = + if c.buf[c.bufpos+1] in strutils.Letters: + inc(c.bufpos) + getSymbol(c, tok) + tok.kind = tkBuiltin + else: + tok.kind = tkEscaped + getEscapedChar(c, tok) # may set tok.kind to tkInvalid + +proc getTok(c: var TPegLexer, tok: var TToken) = + tok.kind = tkInvalid + tok.modifier = modNone + setlen(tok.literal, 0) + skip(c) + case c.buf[c.bufpos] + of '{': + inc(c.bufpos) + if c.buf[c.bufpos] == '@' and c.buf[c.bufpos+1] == '}': + tok.kind = tkCurlyAt + inc(c.bufpos, 2) + add(tok.literal, "{@}") + else: + tok.kind = tkCurlyLe + add(tok.literal, '{') + of '}': + tok.kind = tkCurlyRi + inc(c.bufpos) + add(tok.literal, '}') + of '[': + getCharset(c, tok) + of '(': + tok.kind = tkParLe + Inc(c.bufpos) + add(tok.literal, '(') + of ')': + tok.kind = tkParRi + Inc(c.bufpos) + add(tok.literal, ')') + of '.': + tok.kind = tkAny + inc(c.bufpos) + add(tok.literal, '.') + of '_': + tok.kind = tkAnyRune + inc(c.bufpos) + add(tok.literal, '_') + of '\\': + getBuiltin(c, tok) + of '\'', '"': getString(c, tok) + of '$': getDollar(c, tok) + of '\0': + tok.kind = tkEof + tok.literal = "[EOF]" + of 'a'..'z', 'A'..'Z', '\128'..'\255': + getSymbol(c, tok) + if c.buf[c.bufpos] in {'\'', '"'} or + c.buf[c.bufpos] == '$' and c.buf[c.bufpos+1] in {'0'..'9'}: + case tok.literal + of "i": tok.modifier = modIgnoreCase + of "y": tok.modifier = modIgnoreStyle + of "v": tok.modifier = modVerbatim + else: nil + setLen(tok.literal, 0) + if c.buf[c.bufpos] == '$': + getDollar(c, tok) + else: + getString(c, tok) + if tok.modifier == modNone: tok.kind = tkInvalid + of '+': + tok.kind = tkPlus + inc(c.bufpos) + add(tok.literal, '+') + of '*': + tok.kind = tkStar + inc(c.bufpos) + add(tok.literal, '+') + of '<': + if c.buf[c.bufpos+1] == '-': + inc(c.bufpos, 2) + tok.kind = tkArrow + add(tok.literal, "<-") + else: + add(tok.literal, '<') + of '/': + tok.kind = tkBar + inc(c.bufpos) + add(tok.literal, '/') + of '?': + tok.kind = tkOption + inc(c.bufpos) + add(tok.literal, '?') + of '!': + tok.kind = tkNot + inc(c.bufpos) + add(tok.literal, '!') + of '&': + tok.kind = tkAmp + inc(c.bufpos) + add(tok.literal, '!') + of '@': + tok.kind = tkAt + inc(c.bufpos) + add(tok.literal, '@') + if c.buf[c.bufpos] == '@': + tok.kind = tkCurlyAt + inc(c.bufpos) + add(tok.literal, '@') + of '^': + tok.kind = tkHat + inc(c.bufpos) + add(tok.literal, '^') + else: + add(tok.literal, c.buf[c.bufpos]) + inc(c.bufpos) + +proc arrowIsNextTok(c: TPegLexer): bool = + # the only look ahead we need + var pos = c.bufpos + while c.buf[pos] in {'\t', ' '}: inc(pos) + result = c.buf[pos] == '<' and c.buf[pos+1] == '-' + +# ----------------------------- parser ---------------------------------------- + +type + EInvalidPeg* = object of EInvalidValue ## raised if an invalid + ## PEG has been detected + TPegParser = object of TPegLexer ## the PEG parser object + tok: TToken + nonterms: seq[PNonTerminal] + modifier: TModifier + captures: int + identIsVerbatim: bool + skip: TPeg + +proc pegError(p: TPegParser, msg: string, line = -1, col = -1) = + var e: ref EInvalidPeg + new(e) + e.msg = errorStr(p, msg, line, col) + raise e + +proc getTok(p: var TPegParser) = + getTok(p, p.tok) + if p.tok.kind == tkInvalid: pegError(p, "invalid token") + +proc eat(p: var TPegParser, kind: TTokKind) = + if p.tok.kind == kind: getTok(p) + else: pegError(p, tokKindToStr[kind] & " expected") + +proc parseExpr(p: var TPegParser): TPeg + +proc getNonTerminal(p: var TPegParser, name: string): PNonTerminal = + for i in 0..high(p.nonterms): + result = p.nonterms[i] + if cmpIgnoreStyle(result.name, name) == 0: return + # forward reference: + result = newNonTerminal(name, getLine(p), getColumn(p)) + add(p.nonterms, result) + +proc modifiedTerm(s: string, m: TModifier): TPeg = + case m + of modNone, modVerbatim: result = term(s) + of modIgnoreCase: result = termIgnoreCase(s) + of modIgnoreStyle: result = termIgnoreStyle(s) + +proc modifiedBackref(s: int, m: TModifier): TPeg = + case m + of modNone, modVerbatim: result = backRef(s) + of modIgnoreCase: result = backRefIgnoreCase(s) + of modIgnoreStyle: result = backRefIgnoreStyle(s) + +proc builtin(p: var TPegParser): TPeg = + # do not use "y", "skip" or "i" as these would be ambiguous + case p.tok.literal + of "n": result = newLine() + of "d": result = charset({'0'..'9'}) + of "D": result = charset({'\1'..'\xff'} - {'0'..'9'}) + of "s": result = charset({' ', '\9'..'\13'}) + of "S": result = charset({'\1'..'\xff'} - {' ', '\9'..'\13'}) + of "w": result = charset({'a'..'z', 'A'..'Z', '_', '0'..'9'}) + of "W": result = charset({'\1'..'\xff'} - {'a'..'z','A'..'Z','_','0'..'9'}) + of "a": result = charset({'a'..'z', 'A'..'Z'}) + of "A": result = charset({'\1'..'\xff'} - {'a'..'z', 'A'..'Z'}) + of "ident": result = tpegs.ident + of "letter": result = UnicodeLetter() + of "upper": result = UnicodeUpper() + of "lower": result = UnicodeLower() + of "title": result = UnicodeTitle() + of "white": result = UnicodeWhitespace() + else: pegError(p, "unknown built-in: " & p.tok.literal) + +proc token(terminal: TPeg, p: TPegParser): TPeg = + if p.skip.kind == pkEmpty: result = terminal + else: result = sequence(p.skip, terminal) + +proc primary(p: var TPegParser): TPeg = + case p.tok.kind + of tkAmp: + getTok(p) + return &primary(p) + of tkNot: + getTok(p) + return !primary(p) + of tkAt: + getTok(p) + return @primary(p) + of tkCurlyAt: + getTok(p) + return @@primary(p).token(p) + else: nil + case p.tok.kind + of tkIdentifier: + if p.identIsVerbatim: + var m = p.tok.modifier + if m == modNone: m = p.modifier + result = modifiedTerm(p.tok.literal, m).token(p) + getTok(p) + elif not arrowIsNextTok(p): + var nt = getNonTerminal(p, p.tok.literal) + incl(nt.flags, ntUsed) + result = nonTerminal(nt).token(p) + getTok(p) + else: + pegError(p, "expression expected, but found: " & p.tok.literal) + of tkStringLit: + var m = p.tok.modifier + if m == modNone: m = p.modifier + result = modifiedTerm(p.tok.literal, m).token(p) + getTok(p) + of tkCharSet: + if '\0' in p.tok.charset: + pegError(p, "binary zero ('\\0') not allowed in character class") + result = charset(p.tok.charset).token(p) + getTok(p) + of tkParLe: + getTok(p) + result = parseExpr(p) + eat(p, tkParRi) + of tkCurlyLe: + getTok(p) + result = capture(parseExpr(p)).token(p) + eat(p, tkCurlyRi) + inc(p.captures) + of tkAny: + result = any().token(p) + getTok(p) + of tkAnyRune: + result = anyRune().token(p) + getTok(p) + of tkBuiltin: + result = builtin(p).token(p) + getTok(p) + of tkEscaped: + result = term(p.tok.literal[0]).token(p) + getTok(p) + of tkDollar: + result = endAnchor() + getTok(p) + of tkHat: + result = startAnchor() + getTok(p) + of tkBackref: + var m = p.tok.modifier + if m == modNone: m = p.modifier + result = modifiedBackRef(p.tok.index, m).token(p) + if p.tok.index < 0 or p.tok.index > p.captures: + pegError(p, "invalid back reference index: " & $p.tok.index) + getTok(p) + else: + pegError(p, "expression expected, but found: " & p.tok.literal) + getTok(p) # we must consume a token here to prevent endless loops! + while true: + case p.tok.kind + of tkOption: + result = ?result + getTok(p) + of tkStar: + result = *result + getTok(p) + of tkPlus: + result = +result + getTok(p) + else: break + +proc seqExpr(p: var TPegParser): TPeg = + result = primary(p) + while true: + case p.tok.kind + of tkAmp, tkNot, tkAt, tkStringLit, tkCharset, tkParLe, tkCurlyLe, + tkAny, tkAnyRune, tkBuiltin, tkEscaped, tkDollar, tkBackref, + tkHat, tkCurlyAt: + result = sequence(result, primary(p)) + of tkIdentifier: + if not arrowIsNextTok(p): + result = sequence(result, primary(p)) + else: break + else: break + +proc parseExpr(p: var TPegParser): TPeg = + result = seqExpr(p) + while p.tok.kind == tkBar: + getTok(p) + result = result / seqExpr(p) + +proc parseRule(p: var TPegParser): PNonTerminal = + if p.tok.kind == tkIdentifier and arrowIsNextTok(p): + result = getNonTerminal(p, p.tok.literal) + if ntDeclared in result.flags: + pegError(p, "attempt to redefine: " & result.name) + result.line = getLine(p) + result.col = getColumn(p) + getTok(p) + eat(p, tkArrow) + result.rule = parseExpr(p) + incl(result.flags, ntDeclared) # NOW inlining may be attempted + else: + pegError(p, "rule expected, but found: " & p.tok.literal) + +proc rawParse(p: var TPegParser): TPeg = + ## parses a rule or a PEG expression + while p.tok.kind == tkBuiltin: + case p.tok.literal + of "i": + p.modifier = modIgnoreCase + getTok(p) + of "y": + p.modifier = modIgnoreStyle + getTok(p) + of "skip": + getTok(p) + p.skip = ?primary(p) + else: break + if p.tok.kind == tkIdentifier and arrowIsNextTok(p): + result = parseRule(p).rule + while p.tok.kind != tkEof: + discard parseRule(p) + else: + p.identIsVerbatim = true + result = parseExpr(p) + if p.tok.kind != tkEof: + pegError(p, "EOF expected, but found: " & p.tok.literal) + for i in 0..high(p.nonterms): + var nt = p.nonterms[i] + if ntDeclared notin nt.flags: + pegError(p, "undeclared identifier: " & nt.name, nt.line, nt.col) + elif ntUsed notin nt.flags and i > 0: + pegError(p, "unused rule: " & nt.name, nt.line, nt.col) + +proc parsePeg*(pattern: string, filename = "pattern", line = 1, col = 0): TPeg = + ## constructs a TPeg object from `pattern`. `filename`, `line`, `col` are + ## used for error messages, but they only provide start offsets. `parsePeg` + ## keeps track of line and column numbers within `pattern`. + var p: TPegParser + init(TPegLexer(p), pattern, filename, line, col) + p.tok.kind = tkInvalid + p.tok.modifier = modNone + p.tok.literal = "" + p.tok.charset = {} + p.nonterms = @[] + p.identIsVerbatim = false + getTok(p) + result = rawParse(p) + +proc peg*(pattern: string): TPeg = + ## constructs a TPeg object from the `pattern`. The short name has been + ## chosen to encourage its use as a raw string modifier:: + ## + ## peg"{\ident} \s* '=' \s* {.*}" + result = parsePeg(pattern, "pattern") + +proc escapePeg*(s: string): string = + ## escapes `s` so that it is matched verbatim when used as a peg. + result = "" + var inQuote = false + for c in items(s): + case c + of '\0'..'\31', '\'', '"', '\\': + if inQuote: + result.add('\'') + inQuote = false + result.add("\\x") + result.add(toHex(ord(c), 2)) + else: + if not inQuote: + result.add('\'') + inQuote = true + result.add(c) + if inQuote: result.add('\'') + +when isMainModule: + assert escapePeg("abc''def'") == r"'abc'\x27\x27'def'\x27" + #assert match("(a b c)", peg"'(' @ ')'") + assert match("W_HI_Le", peg"\y 'while'") + assert(not match("W_HI_L", peg"\y 'while'")) + assert(not match("W_HI_Le", peg"\y v'while'")) + assert match("W_HI_Le", peg"y'while'") + + assert($ +digits == $peg"\d+") + assert "0158787".match(peg"\d+") + assert "ABC 0232".match(peg"\w+\s+\d+") + assert "ABC".match(peg"\d+ / \w+") + + for word in split("00232this02939is39an22example111", peg"\d+"): + writeln(stdout, word) + + assert matchLen("key", ident) == 3 + + var pattern = sequence(ident, *whitespace, term('='), *whitespace, ident) + assert matchLen("key1= cal9", pattern) == 11 + + var ws = newNonTerminal("ws", 1, 1) + ws.rule = *whitespace + + var expr = newNonTerminal("expr", 1, 1) + expr.rule = sequence(capture(ident), *sequence( + nonterminal(ws), term('+'), nonterminal(ws), nonterminal(expr))) + + var c: TCaptures + var s = "a+b + c +d+e+f" + assert rawMatch(s, expr.rule, 0, c) == len(s) + var a = "" + for i in 0..c.ml-1: + a.add(copy(s, c.matches[i][0], c.matches[i][1])) + assert a == "abcdef" + #echo expr.rule + + #const filename = "lib/devel/peg/grammar.txt" + #var grammar = parsePeg(newFileStream(filename, fmRead), filename) + #echo "a <- [abc]*?".match(grammar) + assert find("_____abc_______", term("abc"), 2) == 5 + assert match("_______ana", peg"A <- 'ana' / . A") + assert match("abcs%%%", peg"A <- ..A / .A / '%'") + + if "abc" =~ peg"{'a'}'bc' 'xyz' / {\ident}": + assert matches[0] == "abc" + else: + assert false + + var g2 = peg"""S <- A B / C D + A <- 'a'+ + B <- 'b'+ + C <- 'c'+ + D <- 'd'+ + """ + assert($g2 == "((A B) / (C D))") + assert match("cccccdddddd", g2) + assert("var1=key; var2=key2".replacef(peg"{\ident}'='{\ident}", "$1<-$2$2") == + "var1<-keykey; var2<-key2key2") + assert "var1=key; var2=key2".endsWith(peg"{\ident}'='{\ident}") + + if "aaaaaa" =~ peg"'aa' !. / ({'a'})+": + assert matches[0] == "a" + else: + assert false + + var matches: array[0..5, string] + if match("abcdefg", peg"c {d} ef {g}", matches, 2): + assert matches[0] == "d" + assert matches[1] == "g" + else: + assert false + + for x in findAll("abcdef", peg"{.}", 3): + echo x + + if "f(a, b)" =~ peg"{[0-9]+} / ({\ident} '(' {@} ')')": + assert matches[0] == "f" + assert matches[1] == "a, b" + else: + assert false + + assert match("eine übersicht und außerdem", peg"(\letter \white*)+") + # ß is not a lower cased letter?! + assert match("eine übersicht und auerdem", peg"(\lower \white*)+") + assert match("EINE ÜBERSICHT UND AUSSERDEM", peg"(\upper \white*)+") + assert(not match("456678", peg"(\letter)+")) + + assert("var1 = key; var2 = key2".replacef( + peg"\skip(\s*) {\ident}'='{\ident}", "$1<-$2$2") == + "var1<-keykey;var2<-key2key2") + + assert match("prefix/start", peg"^start$", 7) + + # tricky test to check for false aliasing: + block: + var a = term"key" + echo($sequence(sequence(a, term"value"), *a)) + diff --git a/tests/accept/run/tsort.nim b/tests/accept/run/tsort.nim index 306e7e360..306e7e360 100644..100755 --- a/tests/accept/run/tsort.nim +++ b/tests/accept/run/tsort.nim diff --git a/tests/accept/run/ttoseq.nim b/tests/accept/run/ttoseq.nim index d3332cce6..d3332cce6 100644..100755 --- a/tests/accept/run/ttoseq.nim +++ b/tests/accept/run/ttoseq.nim diff --git a/tests/gc/gcleak.nim b/tests/gc/gcleak.nim index a44c16f59..6eb1d24b0 100644..100755 --- a/tests/gc/gcleak.nim +++ b/tests/gc/gcleak.nim @@ -5,7 +5,10 @@ type proc MakeObj(): TTestObj = result.x = "Hello" -while true: +for i in 1 .. 100_000_000: var obj = MakeObj() + if getOccupiedMem() > 300_000: quit("still a leak!") # echo GC_getstatistics() +echo "no leak: ", getOccupiedMem() + diff --git a/tests/gc/gcleak2.nim b/tests/gc/gcleak2.nim index 5ab9da7c9..f9a80f269 100644..100755 --- a/tests/gc/gcleak2.nim +++ b/tests/gc/gcleak2.nim @@ -7,14 +7,13 @@ proc MakeObj(): TTestObj = result.x = "Hello" result.s = @[1,2,3] -#while true: -# var obj = MakeObj() -# echo GC_getstatistics() - proc inProc() = - while true: + for i in 1 .. 100_000_000: var obj: TTestObj obj = MakeObj() + if getOccupiedMem() > 300_000: quit("still a leak!") inProc() +echo "no leak: ", getOccupiedMem() + diff --git a/tests/reject/ttypenoval.nim b/tests/reject/ttypenoval.nim index ed91b05e2..ed91b05e2 100644..100755 --- a/tests/reject/ttypenoval.nim +++ b/tests/reject/ttypenoval.nim diff --git a/todo.txt b/todo.txt index ab892de8c..a72e03483 100755 --- a/todo.txt +++ b/todo.txt @@ -22,11 +22,13 @@ High priority (version 0.9.0) Bugs ---- - proc (x: int) is passable to proc (x: var int) !? -- detected by pegs module 64bit: p(result, result) should use a temporary! - the parser allows empty object case branches - Exception matching needs to take subclasses into account - pegs: the anchor '^' does not work because many procs use a linear search and matchLen() +- BUG: generic assign still buggy + - Optimization: If we use a temporary for the result anyway the code gen + should make use of this fact to generate better code... To implement @@ -51,18 +53,13 @@ Low priority - find a way to reintroduce the cleanup() pass for C code generation: this is hard because of partial evaluation --> symbol files will fix this as a side effect -- implement better error handling: errornous top level statements are ignored - and not part of the symbal table --> for REPL - --> this needs deletion operation for symbol table! - floating point checks for EcmaScript - enhance `` notation for identifier concatenation: `concat` a `these` - prefer proc in current module over other procs with same overloading result? - real types for template results - generalized case statement (requires better transf) -- normalize for the DOM - tlastmod returns wrong results on BSD (Linux, MacOS X: works) -- nested tuple unpacking -- fast assignment optimization for TPeg +- nested tuple unpacking; no auto-unpacking in 'for' loops! - better error messages for used keywords as identifiers - case statement branches should support constant sets diff --git a/tools/nimgrep.cfg b/tools/nimgrep.cfg index b931ff567..b931ff567 100644..100755 --- a/tools/nimgrep.cfg +++ b/tools/nimgrep.cfg diff --git a/web/news.txt b/web/news.txt index 805c9b200..3521b97cc 100755 --- a/web/news.txt +++ b/web/news.txt @@ -66,7 +66,7 @@ Additions - Added ``math.floor``. - The *interactive mode* (REPL) has been improved and documented for the first time. -- Added the ``linearScanEnd`` and ``unroll`` pragmas. +- Added the ``linearScanEnd``, ``unroll``, ``shallow`` pragmas. - The compiler now might use a hashing for string case statements depending on the number of string literals in the case statement. |