author | Andrey Makarov <ph.makarov@gmail.com> | 2022-09-14 19:28:01 +0300 |
---|---|---|
committer | GitHub <noreply@github.com> | 2022-09-14 18:28:01 +0200 |
commit | 2140d05f34f7976ed7f7058baa952490ee3fb859 (patch) | |
tree | 119f3225f9a255775521fa9b525394d909143af7 | |
parent | 08faa04d78aca9e619ba518fb9b4ab4e07635455 (diff) | |
download | Nim-2140d05f34f7976ed7f7058baa952490ee3fb859.tar.gz |
nimgrep: add `--inContext` and `--notinContext` options (#19528)
* nimgrep: add `--matchContext` and `--noMatchContext` options
* Rename options for uniformity
* Revise option names, add `--parentPath` options
* Revert --bin deprecation
* Copy-paste an original test from quantimnot
  The origin was: https://gist.githubusercontent.com/quantimnot/5d23b32fe0936ffc453220d20a87b9e2/raw/96544656d52332118295e55aa73718c389e5d194/tnimgrep.nim
* Change ! to n
* Attempt to fix test
* Fix test on Windows
* Change --contentsFile -> --inFile, add more tests
* Bump
* Change --parentPath to --dirpath
-rw-r--r-- | doc/nimgrep.md | 87 |
-rw-r--r-- | doc/nimgrep_cmdline.txt | 48 |
-rw-r--r-- | tests/tools/tnimgrep.nim | 402 |
-rw-r--r-- | tools/nimgrep.nim | 248 |
4 files changed, 682 insertions, 103 deletions
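The `doc/nimgrep.md` hunk in this commit states that a filter option repeated several times acts as a logical OR over its patterns, while different filter options are joined by a logical AND, and that negative filters reject on any hit. A minimal sketch of that rule follows. It is not the code from `tools/nimgrep.nim`: the names `anyHit` and `accepts` are hypothetical, and plain substring matching stands in for the regex and PEG patterns that nimgrep actually compiles.

```nim
import std/strutils

proc anyHit(patterns: seq[string], s: string): bool =
  ## True if at least one pattern occurs in `s`.
  ## Assumption: substring search replaces nimgrep's regex/PEG matching.
  for pat in patterns:
    if s.contains(pat):
      return true
  result = false

proc accepts(includePats, excludePats: seq[string], s: string): bool =
  ## Hypothetical helper combining one positive and one negative list.
  # Positive list: empty means "no restriction", otherwise any hit suffices (OR).
  if includePats.len > 0 and not anyHit(includePats, s):
    return false
  # Negative list: a single hit rejects, i.e. NOT (PAT1 OR PAT2).
  if anyHit(excludePats, s):
    return false
  # Both lists passed, so the two options are joined by AND.
  result = true

when isMainModule:
  let none: seq[string] = @[]
  doAssert accepts(@["tests", "tools"], none, "tools")
  doAssert not accepts(@["tests", "tools"], none, "doc")
  doAssert not accepts(none, @[".git", ".hg"], ".hg")
  doAssert accepts(@["tests"], @[".git"], "tests")
  doAssert not accepts(@["tests"], @["tests"], "tests")
  echo "all filter checks passed"
```

Under these assumptions, `accepts(@["tests"], @[".git"], name)` corresponds roughly to `--dirname:tests --notdirname:'\.git'` applied to a single directory name; the real tool additionally walks every component of the path for `--dirname` and prunes whole directories for `--notdirname`.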
diff --git a/doc/nimgrep.md b/doc/nimgrep.md index e000efb46..8fb86a9d3 100644 --- a/doc/nimgrep.md +++ b/doc/nimgrep.md @@ -34,6 +34,66 @@ Command line switches .. include:: nimgrep_cmdline.txt +Path filter options +------------------- + +Let us assume we have file `dirA/dirB/dirC/file.nim`. +Filesystem path options will match for these parts of the path: + +| option | matches for | +| :------------------ | :-------------------------------- | +| `--[not]extensions` | ``nim`` | +| `--[not]filename` | ``file.nim`` | +| `--[not]dirname` | ``dirA`` and ``dirB`` and ``dirC`` | +| `--[not]dirpath` | ``dirA/dirB/dirC`` | + +Combining multiple filter options together and negating them +------------------------------------------------------------ + +Options for filtering can be provided multiple times so they form a list, +which works as: +* positive filters + `--filename`, `--dirname`, `--dirpath`, `--inContext`, + `--inFile` accept files/matches if *any* pattern from the list is hit +* negative filters + `--notfilename`, `--notdirname`, `--notdirpath`, `--notinContext`, + `--notinFile` accept files/matches if *no* pattern from the list is hit. + +In other words the same filtering option repeated many times means logical OR. + +.. Important:: + Different filtering options are related by logical AND: they all must + be true for a match to be accepted. + E.g. `--filename:F --dirname:D1 --notdirname:D2` means + `filename(F) AND dirname(D1) AND (NOT dirname(D2))`. + +So negative filtering patterns are effectively related by logical OR also: +`(NOT PAT1) AND (NOT PAT2) == NOT (PAT1 OR PAT2)`:literal: in pseudo-code. + +That means you can always use only 1 such an option with logical OR, e.g. +`--notdirname:PAT1 --notdirname:PAT2` is fully equivalent to +`--notdirname:'PAT1|PAT2'`. + +.. Note:: + If you want logical AND on patterns you should compose 1 appropriate pattern, + possibly combined with multi-line mode `(?s)`:literal:. + E.g. 
to require that multi-line context of matches has occurences of + **both** PAT1 and PAT2 use positive lookaheads (`(?=PAT)`:literal:): + ```cmd + nimgrep --inContext:'(?s)(?=.*PAT1)(?=.*PAT2)' + ``` + +Meaning of `^`:literal: and `$`:literal: +======================================== + +`nimgrep`:cmd: PCRE engine is run in a single-line mode so +`^`:literal: matches the beginning of whole input *file* and +`$`:literal: matches the end of *file* (or whole input *string* for +options like `--filename`). + +Add the `(?m)`:literal: modifier to the beginning of your pattern for +`^`:literal: and `$`:literal: to match the beginnings and ends of *lines*. + Examples ======== @@ -51,23 +111,18 @@ All examples below use default PCRE Regex patterns: + To exclude version control directories (Git, Mercurial=hg, Subversion=svn) from the search: - ```cmd - nimgrep --excludeDir:'^\.git$' --excludeDir:'^\.hg$' --excludeDir:'^\.svn$' - # short: --ed:'^\.git$' --ed:'^\.hg$' --ed:'^\.svn$' + nimgrep --notdirname:'^\.git$' --notdirname:'^\.hg$' --notdirname:'^\.svn$' + # short: --ndi:'^\.git$' --ndi:'^\.hg$' --ndi:'^\.svn$' ``` - -+ To search only in paths containing the `tests` sub-directory recursively: - ++ To search only in paths containing the `tests`:literal: sub-directory + recursively: ```cmd - nimgrep --recursive --includeDir:'(^|/)tests($|/)' - # short: -r --id:'(^|/)tests($|/)' + nimgrep --recursive --dirname:'^tests$' + # short: -r --di:'^tests$' + # or using --dirpath: + nimgrep --recursive --dirpath:'(^|/)tests($|/)' + # short: -r --pa:'(^|/)tests($|/)' ``` - - .. Attention:: note the subtle difference between `--excludeDir`:option: and - `--includeDir`:option:\: the former is applied to relative directory entries - and the latter is applied to the whole paths - -+ Nimgrep can search multi-line, e.g. to find files containing `import` - and then `strutils` use pattern `'import(.|\n)*?strutils'`:option:. - ++ Nimgrep can search multi-line, e.g. 
to find files containing `import`:literal: + and then `strutils`:literal: use pattern `'import(.|\n)*?strutils'`:literal:. diff --git a/doc/nimgrep_cmdline.txt b/doc/nimgrep_cmdline.txt index 4ec344495..73f29f524 100644 --- a/doc/nimgrep_cmdline.txt +++ b/doc/nimgrep_cmdline.txt @@ -46,8 +46,7 @@ Options: nimgrep --filenames # In current dir nimgrep --filenames "" DIRECTORY # Note empty pattern "", lists all files in DIRECTORY - -* Interpret patterns: +* Interprete patterns: --peg PATTERN and PAT are Peg --re PATTERN and PAT are regular expressions (default) --rex, -x use the "extended" syntax for the regular expression @@ -62,28 +61,45 @@ Options: * File system walk: --recursive, -r process directories recursively --follow follow all symlinks when processing recursively - --ext:EX1|EX2|... only search the files with the given extension(s), - empty one ("--ext") means files with missing extension - --noExt:EX1|... exclude files having given extension(s), use empty one to - skip files with no extension (like some binary files are) - --includeFile:PAT search only files whose names contain pattern PAT - --excludeFile:PAT skip files whose names contain pattern PAT - --includeDir:PAT search only files with their whole directory path - containing PAT - --excludeDir:PAT skip directories whose name (not path) - contain pattern PAT - --if,--ef,--id,--ed abbreviations of the 4 options above --sortTime, -s[:asc|desc] order files by the last modification time (default: off): ascending (recent files go last) or descending -* Filter file content: - --match:PAT select files containing a (not displayed) match of PAT - --noMatch:PAT select files not containing any match of PAT +* Filter files (based on filesystem paths): + + .. Hint:: Instead of `not` you can type just `n` for negative options below. + + --ex[tensions]:EX1|EX2|... + only search the files with the given extension(s), + empty one (`--ex`) means files with missing extension + --notex[tensions]:EX1|EX2|... 
+ exclude files having given extension(s), use empty one to + skip files with no extension (like some binary files are) + --fi[lename]:PAT search only files whose name matches pattern PAT + --notfi[lename]:PAT skip files whose name matches pattern PAT + --di[rname]:PAT select files that in their path have a directory name + that matches pattern PAT + --notdi[rname]:PAT do not descend into directories whose name (not path) + matches pattern PAT + --dirp[ath]:PAT select only files whose whole relative directory path + matches pattern PAT + --notdirp[ath]:PAT skip files whose whole relative directory path + matches pattern PAT + +* Filter files (based on file contents): + --inF[ile]:PAT select files containing a (not displayed) match of PAT + --notinF[ile]:PAT skip files containing a match of PAT --bin:on|off|only process binary files? (detected by \0 in first 1K bytes) (default: on - binary and text files treated the same way) --text, -t process only text files, the same as `--bin:off` +* Filter matches: + --inC[ontext]:PAT select only matches containing a match of PAT in their + surrounding context (multiline with `-c`, `-a`, `-b`) + --notinC[ontext]:PAT + skip matches not containing a match of PAT + in their surrounding context + * Represent results: --nocolor output will be given without any colors --color[:on] force color even if output is redirected (default: auto) diff --git a/tests/tools/tnimgrep.nim b/tests/tools/tnimgrep.nim new file mode 100644 index 000000000..e97b979f1 --- /dev/null +++ b/tests/tools/tnimgrep.nim @@ -0,0 +1,402 @@ +discard """ + output: ''' + +[Suite] nimgrep filesystem + +[Suite] nimgrep contents filtering +''' +""" +## Authors: quantimnot, a-mr + +import osproc, os, streams, unittest, strutils + +#======= +# setup +#======= + +var process: Process +var ngStdOut, ngStdErr: string +var ngExitCode: int +let previousDir = getCurrentDir() +let tempDir = getTempDir() +let testFilesRoot = tempDir / "nimgrep_test_files" + +template 
nimgrep(optsAndArgs): untyped = + process = startProcess(previousDir / "bin/nimgrep " & optsAndArgs, + options = {poEvalCommand}) + ngExitCode = process.waitForExit + ngStdOut = process.outputStream.readAll + ngStdErr = process.errorStream.readAll + +func fixSlash(s: string): string = + if DirSep == '/': + result = s + else: # on Windows + result = s.replace('/', DirSep) + +func initString(len = 1000, val = ' '): string = + result = newString(len) + for i in 0..<len: + result[i] = val + +# Create test file hierarchy. +createDir testFilesRoot +setCurrentDir testFilesRoot +createDir "a" / "b" +createDir "c" / "b" +createDir ".hidden" +writeFile("do_not_create_another_file_with_this_pattern_KJKJHSFSFKASHFBKAF", "PATTERN") +writeFile("a" / "b" / "only_the_pattern", "PATTERN") +writeFile("c" / "b" / "only_the_pattern", "PATTERN") +writeFile(".hidden" / "only_the_pattern", "PATTERN") +writeFile("null_in_first_1k", "\0PATTERN") +writeFile("null_after_first_1k", initString(1000) & "\0") +writeFile("empty", "") +writeFile("context_match_filtering", """ +- +CONTEXTPAT +- +PATTERN +- +- +- + +- +- +- +PATTERN +- +- +- +""") +writeFile("only_the_pattern.txt", "PATTERN") +writeFile("only_the_pattern.ascii", "PATTERN") + + +#======= +# tests +#======= + +suite "nimgrep filesystem": + + test "`--filename` with matching file": + nimgrep "-r --filename:KJKJHSFSFKASHFBKAF PATTERN" + check ngExitCode == 0 + check ngStdErr.len == 0 + check ngStdOut == fixSlash dedent""" + ./do_not_create_another_file_with_this_pattern_KJKJHSFSFKASHFBKAF:1: PATTERN + 1 matches + """ + + + test "`--dirname` with matching dir": + nimgrep "-r --dirname:.hid PATTERN" + check ngExitCode == 0 + check ngStdErr.len == 0 + check ngStdOut == fixSlash dedent""" + .hidden/only_the_pattern:1: PATTERN + 1 matches + """ + + let only_the_pattern = fixSlash dedent""" + a/b/only_the_pattern:1: PATTERN + c/b/only_the_pattern:1: PATTERN + 2 matches + """ + + let only_a = fixSlash dedent""" + a/b/only_the_pattern:1: 
PATTERN + 1 matches + """ + + test "`--dirname` with matching grandparent path segment": + nimgrep "-r --dirname:a PATTERN" + check ngExitCode == 0 + check ngStdErr.len == 0 + check ngStdOut == only_a + + test "`--dirpath` with matching grandparent path segment": + nimgrep "-r --dirp:a PATTERN" + check ngExitCode == 0 + check ngStdErr.len == 0 + check ngStdOut == only_a + + test "`--dirpath` with matching grandparent path segment": + nimgrep "-r --dirpath:a/b PATTERN".fixSlash + check ngExitCode == 0 + check ngStdErr.len == 0 + check ngStdOut == only_a + + + test "`--dirname` with matching parent path segment": + nimgrep "-r --dirname:b PATTERN" + check ngExitCode == 0 + check ngStdErr.len == 0 + check ngStdOut == only_the_pattern + + test "`--dirpath` with matching parent path segment": + nimgrep "-r --dirpath:b PATTERN" + check ngExitCode == 0 + check ngStdErr.len == 0 + check ngStdOut == only_the_pattern + + + let patterns_without_directory_a_b = fixSlash dedent""" + ./context_match_filtering:4: PATTERN + ./context_match_filtering:12: PATTERN + ./do_not_create_another_file_with_this_pattern_KJKJHSFSFKASHFBKAF:1: PATTERN + ./null_in_first_1k:1: """ & "\0PATTERN\n" & dedent""" + ./only_the_pattern.ascii:1: PATTERN + ./only_the_pattern.txt:1: PATTERN + .hidden/only_the_pattern:1: PATTERN + c/b/only_the_pattern:1: PATTERN + 8 matches + """ + + let patterns_without_directory_b = fixSlash dedent""" + ./context_match_filtering:4: PATTERN + ./context_match_filtering:12: PATTERN + ./do_not_create_another_file_with_this_pattern_KJKJHSFSFKASHFBKAF:1: PATTERN + ./null_in_first_1k:1: """ & "\0PATTERN\n" & dedent""" + ./only_the_pattern.ascii:1: PATTERN + ./only_the_pattern.txt:1: PATTERN + .hidden/only_the_pattern:1: PATTERN + 7 matches + """ + + test "`--ndirname` not matching grandparent path segment": + nimgrep "-r --ndirname:a PATTERN" + check ngExitCode == 0 + check ngStdErr.len == 0 + check ngStdOut == patterns_without_directory_a_b + + test "`--ndirname` not matching 
parent path segment": + nimgrep "-r --ndirname:b PATTERN" + check ngExitCode == 0 + check ngStdErr.len == 0 + check ngStdOut == patterns_without_directory_b + + test "`--notdirpath` not matching grandparent path segment": + nimgrep "-r --notdirpath:a PATTERN" + check ngExitCode == 0 + check ngStdErr.len == 0 + check ngStdOut == patterns_without_directory_a_b + + test "`--notdirpath` not matching parent path segment": + nimgrep "-r --ndirp:b PATTERN" + check ngExitCode == 0 + check ngStdErr.len == 0 + check ngStdOut == patterns_without_directory_b + + test "`--notdirpath` with matching grandparent/parent path segment": + nimgrep "-r --ndirp:a/b PATTERN".fixSlash + check ngExitCode == 0 + check ngStdErr.len == 0 + check ngStdOut == patterns_without_directory_a_b + + + test "`--text`, `-t`, `--bin:off` with file containing a null in first 1k chars": + nimgrep "-r --text PATTERN null_in_first_1k" + check ngExitCode == 0 + check ngStdErr.len == 0 + check ngStdOut == "0 matches\n" + checkpoint "`--text`" + nimgrep "-r -t PATTERN null_in_first_1k" + check ngExitCode == 0 + check ngStdErr.len == 0 + check ngStdOut == "0 matches\n" + checkpoint "`-t`" + nimgrep "-r --bin:off PATTERN null_in_first_1k" + check ngExitCode == 0 + check ngStdErr.len == 0 + check ngStdOut == "0 matches\n" + checkpoint "`--binary:off`" + + + test "`--bin:only` with file containing a null in first 1k chars": + nimgrep "--bin:only -@ PATTERN null_in_first_1k null_after_first_1k only_the_pattern.txt" + check ngExitCode == 0 + check ngStdErr.len == 0 + check ngStdOut == dedent""" + null_in_first_1k:1: ^@PATTERN + 1 matches + """ + + + test "`--bin:only` with file containing a null after first 1k chars": + nimgrep "--bin:only PATTERN null_after_first_1k only_the_pattern.txt" + check ngExitCode == 0 + check ngStdErr.len == 0 + check ngStdOut == "0 matches\n" + + + # TODO: we need to throw a warning if e.g. 
both extension was provided and + # inappropriate filename was directly provided via command line + # + # test "`--ext:doesnotexist` without a matching file": + # # skip() # FIXME: this test fails + # nimgrep "--ext:doesnotexist PATTERN context_match_filtering only_the_pattern.txt" + # check ngExitCode == 0 + # check ngStdErr.len == 0 + # check ngStdOut == """ + #0 matches + #""" + # + # + # test "`--ext:txt` with a matching file": + # nimgrep "--ext:txt PATTERN context_match_filtering only_the_pattern.txt" + # check ngExitCode == 0 + # check ngStdErr.len == 0 + # check ngStdOut == """ + #only_the_pattern.txt:1: PATTERN + #1 matches + #""" + # + # + # test "`--ext:txt|doesnotexist` with some matching files": + # nimgrep "--ext:txt|doesnotexist PATTERN context_match_filtering only_the_pattern.txt only_the_pattern.ascii" + # check ngExitCode == 0 + # check ngStdErr.len == 0 + # check ngStdOut == """ + #only_the_pattern.txt:1: PATTERN + #1 matches + #""" + # + # + # test "`--ext` with some matching files": + # nimgrep "--ext PATTERN context_match_filtering only_the_pattern.txt only_the_pattern.ascii" + # check ngExitCode == 0 + # check ngStdErr.len == 0 + # check ngStdOut == """ + #context_match_filtering:4: PATTERN + #context_match_filtering:12: PATTERN + #2 matches + #""" + # + # + # test "`--ext:txt --ext` with some matching files": + # nimgrep "--ext:txt --ext PATTERN context_match_filtering only_the_pattern.txt only_the_pattern.ascii" + # check ngExitCode == 0 + # check ngStdErr.len == 0 + # check ngStdOut == """ + #context_match_filtering:4: PATTERN + #context_match_filtering:12: PATTERN + #only_the_pattern.txt:1: PATTERN + #3 matches + #""" + + +suite "nimgrep contents filtering": + + test "`--inFile` with matching file": + nimgrep "-r --inf:CONTEXTPAT PATTERN" + check ngExitCode == 0 + check ngStdErr.len == 0 + check ngStdOut == fixSlash dedent""" + ./context_match_filtering:4: PATTERN + ./context_match_filtering:12: PATTERN + 2 matches + """ + + + test 
"`--notinFile` with matching files": + nimgrep "-r --ninf:CONTEXTPAT PATTERN" + check ngExitCode == 0 + check ngStdErr.len == 0 + check ngStdOut == fixSlash dedent""" + ./do_not_create_another_file_with_this_pattern_KJKJHSFSFKASHFBKAF:1: PATTERN + ./null_in_first_1k:1: """ & "\0PATTERN\n" & dedent""" + ./only_the_pattern.ascii:1: PATTERN + ./only_the_pattern.txt:1: PATTERN + .hidden/only_the_pattern:1: PATTERN + a/b/only_the_pattern:1: PATTERN + c/b/only_the_pattern:1: PATTERN + 7 matches + """ + + + test "`--inContext` with missing context option": + # Using `--inContext` implies default -c:1 is used + nimgrep "-r --inContext:CONTEXTPAT PATTERN" + check ngExitCode == 0 + check ngStdErr.len == 0 + check ngStdOut == "0 matches\n" + + + test "`--inContext` with PAT matching PATTERN": + # This tests the scenario where PAT always matches PATTERN and thus + # has the same effect as excluding the `inContext` option. + # I'm not sure of the desired behaviour here. + nimgrep "--context:2 --inc:PAT PATTERN context_match_filtering" + check ngExitCode == 0 + check ngStdErr.len == 0 + check ngStdOut == dedent""" + context_match_filtering:2 CONTEXTPAT + context_match_filtering:3 - + context_match_filtering:4: PATTERN + context_match_filtering:5 - + context_match_filtering:6 - + + context_match_filtering:10 - + context_match_filtering:11 - + context_match_filtering:12: PATTERN + context_match_filtering:13 - + context_match_filtering:14 - + + 2 matches + """ + + + test "`--inContext` with PAT in context": + nimgrep "--context:2 --inContext:CONTEXTPAT PATTERN context_match_filtering" + check ngExitCode == 0 + check ngStdErr.len == 0 + check ngStdOut == dedent""" + context_match_filtering:2 CONTEXTPAT + context_match_filtering:3 - + context_match_filtering:4: PATTERN + context_match_filtering:5 - + context_match_filtering:6 - + + 1 matches + """ + + + test "`--notinContext` with PAT matching some contexts": + nimgrep "--context:2 --ninContext:CONTEXTPAT PATTERN 
context_match_filtering" + check ngExitCode == 0 + check ngStdErr.len == 0 + check ngStdOut == dedent""" + context_match_filtering:10 - + context_match_filtering:11 - + context_match_filtering:12: PATTERN + context_match_filtering:13 - + context_match_filtering:14 - + + 1 matches + """ + + + test "`--notinContext` with PAT not matching any of the contexts": + nimgrep "--context:1 --ninc:CONTEXTPAT PATTERN context_match_filtering" + check ngExitCode == 0 + check ngStdErr.len == 0 + check ngStdOut == dedent""" + context_match_filtering:3 - + context_match_filtering:4: PATTERN + context_match_filtering:5 - + + context_match_filtering:11 - + context_match_filtering:12: PATTERN + context_match_filtering:13 - + + 2 matches + """ + + +#========= +# cleanup +#========= + +setCurrentDir previousDir +removeDir testFilesRoot diff --git a/tools/nimgrep.nim b/tools/nimgrep.nim index a589cfb14..ef8aa5570 100644 --- a/tools/nimgrep.nim +++ b/tools/nimgrep.nim @@ -95,26 +95,34 @@ type filename: string, fileResult: FileResult] WalkOpt = tuple # used for walking directories/producing paths extensions: seq[string] - skipExtensions: seq[string] - excludeFile: seq[string] - includeFile: seq[string] - includeDir : seq[string] - excludeDir : seq[string] + notExtensions: seq[string] + filename: seq[string] + notFilename: seq[string] + dirPath: seq[string] + notDirPath: seq[string] + dirname : seq[string] + notDirname : seq[string] WalkOptComp[Pat] = tuple # a compiled version of the previous - excludeFile: seq[Pat] - includeFile: seq[Pat] - includeDir : seq[Pat] - excludeDir : seq[Pat] + filename: seq[Pat] + notFilename: seq[Pat] + dirname : seq[Pat] + notDirname : seq[Pat] + dirPath: seq[Pat] + notDirPath: seq[Pat] SearchOpt = tuple # used for searching inside a file - patternSet: bool # to distinguish uninitialized 'pattern' and empty one - pattern: string # main PATTERN - checkMatch: string # --match - checkNoMatch: string # --nomatch - checkBin: Bin # --bin + patternSet: bool # To 
distinguish uninitialized/empty 'pattern' + pattern: string # Main PATTERN + inFile: seq[string] # --inFile, --inf + notInFile: seq[string] # --notinFile, --ninf + inContext: seq[string] # --inContext, --inc + notInContext: seq[string] # --notinContext, --ninc + checkBin: Bin # --bin, --text SearchOptComp[Pat] = tuple # a compiled version of the previous pattern: Pat - checkMatch: Pat - checkNoMatch: Pat + inFile: seq[Pat] + notInFile: seq[Pat] + inContext: seq[Pat] + notInContext: seq[Pat] SinglePattern[PAT] = tuple # compile single pattern for replacef pattern: PAT Column = tuple # current column info for the cropping (--limit) feature @@ -807,6 +815,33 @@ template declareCompiledPatterns(compiledStruct: untyped, body {.hint[XDeclaredButNotUsed]: on.} +template ensureIncluded(includePat: seq[Pattern], str: string, + body: untyped) = + if includePat.len != 0: + var matched = false + for pat in includePat: + if str.contains(pat): + matched = true + break + if not matched: + body + +template ensureExcluded(excludePat: seq[Pattern], str: string, + body: untyped) = + {.warning[UnreachableCode]: off.} + for pat in excludePat: + if str.contains(pat, 0): + body + break + {.warning[UnreachableCode]: on.} + +func checkContext(context: string, searchOptC: SearchOptComp[Pattern]): bool = + ensureIncluded searchOptC.inContext, context: + return false + ensureExcluded searchOptC.notInContext, context: + return false + result = true + iterator processFile(searchOptC: SearchOptComp[Pattern], filename: string, yieldContents=false): Output = var buffer: string @@ -836,13 +871,13 @@ iterator processFile(searchOptC: SearchOptComp[Pattern], filename: string, reason = "text file" if not reject: - if searchOpt.checkMatch != "": - reject = not contains(buffer, searchOptC.checkMatch, 0) + ensureIncluded searchOptC.inFile, buffer: + reject = true reason = "doesn't contain a requested match" if not reject: - if searchOpt.checkNoMatch != "": - reject = contains(buffer, 
searchOptC.checkNoMatch, 0) + ensureExcluded searchOptC.notInFile, buffer: + reject = true reason = "contains a forbidden match" if reject: @@ -852,20 +887,50 @@ iterator processFile(searchOptC: SearchOptComp[Pattern], filename: string, else: var found = false var cnt = 0 - for output in searchFile(searchOptC.pattern, buffer): - found = true - if optCount notin options: - yield output - else: - if output.kind in {blockFirstMatch, blockNextMatch}: - inc(cnt) + let skipCheckContext = (searchOpt.notInContext.len == 0 and + searchOpt.inContext.len == 0) + if skipCheckContext: + for output in searchFile(searchOptC.pattern, buffer): + found = true + if optCount notin options: + yield output + else: + if output.kind in {blockFirstMatch, blockNextMatch}: + inc(cnt) + else: + var context: string + var outputAccumulator: seq[Output] + for outp in searchFile(searchOptC.pattern, buffer): + if outp.kind in {blockFirstMatch, blockNextMatch}: + outputAccumulator.add outp + context.add outp.pre + context.add outp.match.match + elif outp.kind == blockEnd: + outputAccumulator.add outp + context.add outp.blockEnding + # context has been formed, now check it: + if checkContext(context, searchOptC): + found = true + for output in outputAccumulator: + if optCount notin options: + yield output + else: + if output.kind in {blockFirstMatch, blockNextMatch}: + inc(cnt) + context = "" + outputAccumulator.setLen 0 + # end `if skipCheckContext`. 
if optCount in options and cnt > 0: yield Output(kind: justCount, matches: cnt) if yieldContents and found and optCount notin options: yield Output(kind: fileContents, buffer: move(buffer)) - -proc hasRightFileName(path: string, walkOptC: WalkOptComp[Pattern]): bool = +proc hasRightPath(path: string, walkOptC: WalkOptComp[Pattern]): bool = + if not ( + walkOpt.extensions.len > 0 or walkOpt.notExtensions.len > 0 or + walkOpt.filename.len > 0 or walkOpt.notFilename.len > 0 or + walkOpt.notDirPath.len > 0 or walkOpt.dirPath.len > 0): + return true let filename = path.lastPathPart let ex = filename.splitFile.ext.substr(1) # skip leading '.' if walkOpt.extensions.len != 0: @@ -875,31 +940,44 @@ proc hasRightFileName(path: string, walkOptC: WalkOptComp[Pattern]): bool = matched = true break if not matched: return false - for x in walkOpt.skipExtensions: + for x in walkOpt.notExtensions: if os.cmpPaths(x, ex) == 0: return false - if walkOptC.includeFile.len != 0: - var matched = false - for pat in walkOptC.includeFile: - if filename.contains(pat): - matched = true - break - if not matched: return false - for pat in walkOptC.excludeFile: - if filename.contains(pat): return false - let dirname = path.parentDir - if walkOptC.includeDir.len != 0: - var matched = false - for pat in walkOptC.includeDir: - if dirname.contains(pat): - matched = true + ensureIncluded walkOptC.filename, filename: + return false + ensureExcluded walkOptC.notFilename, filename: + return false + let parent = path.parentDir + ensureExcluded walkOptC.notDirPath, parent: + return false + ensureIncluded walkOptC.dirPath, parent: + return false + result = true + +proc isRightDirectory(path: string, walkOptC: WalkOptComp[Pattern]): bool = + ## --dirname can be only checked when the final path is known + ## so this proc is suitable for files only. 
+ if walkOptC.dirname.len > 0: + var badDirname = false + var (nextParent, dirname) = splitPath(path) + # check that --dirname matches for one of directories in parent path: + while dirname != "": + badDirname = false + ensureIncluded walkOptC.dirname, dirname: + badDirname = true + if not badDirname: break - if not matched: return false + (nextParent, dirname) = splitPath(nextParent) + if badDirname: # badDirname was set to true for all the dirs + return false result = true -proc hasRightDirectory(path: string, walkOptC: WalkOptComp[Pattern]): bool = - let dirname = path.lastPathPart - for pat in walkOptC.excludeDir: - if dirname.contains(pat): return false +proc descendToDirectory(path: string, walkOptC: WalkOptComp[Pattern]): bool = + ## --notdirname can be checked for directories immediately for optimization to + ## prevent descending into undesired directories. + if walkOptC.notDirname.len > 0: + let dirname = path.lastPathPart + ensureExcluded walkOptC.notDirname, dirname: + return false result = true iterator walkDirBasic(dir: string, walkOptC: WalkOptComp[Pattern]): string @@ -908,22 +986,24 @@ iterator walkDirBasic(dir: string, walkOptC: WalkOptComp[Pattern]): string var timeFiles = newSeq[(times.Time, string)]() while dirStack.len > 0: let d = dirStack.pop() + let rightDirForFiles = d.isRightDirectory(walkOptC) var files = newSeq[string]() var dirs = newSeq[string]() for kind, path in walkDir(d): case kind of pcFile: - if path.hasRightFileName(walkOptC): + if path.hasRightPath(walkOptC) and rightDirForFiles: files.add(path) of pcLinkToFile: - if optFollow in options and path.hasRightFileName(walkOptC): + if optFollow in options and path.hasRightPath(walkOptC) and + rightDirForFiles: files.add(path) of pcDir: - if optRecursive in options and path.hasRightDirectory(walkOptC): + if optRecursive in options and path.descendToDirectory(walkOptC): dirs.add path of pcLinkToDir: if optFollow in options and optRecursive in options and - 
path.hasRightDirectory(walkOptC): + path.descendToDirectory(walkOptC): dirs.add path if sortTime: # sort by time - collect files before yielding for file in files: @@ -948,10 +1028,12 @@ iterator walkDirBasic(dir: string, walkOptC: WalkOptComp[Pattern]): string iterator walkRec(paths: seq[string]): tuple[error: string, filename: string] {.closure.} = declareCompiledPatterns(walkOptC, WalkOptComp): - walkOptC.excludeFile.add walkOpt.excludeFile.compileArray() - walkOptC.includeFile.add walkOpt.includeFile.compileArray() - walkOptC.includeDir.add walkOpt.includeDir.compileArray() - walkOptC.excludeDir.add walkOpt.excludeDir.compileArray() + walkOptC.notFilename.add walkOpt.notFilename.compileArray() + walkOptC.filename.add walkOpt.filename.compileArray() + walkOptC.dirname.add walkOpt.dirname.compileArray() + walkOptC.notDirname.add walkOpt.notDirname.compileArray() + walkOptC.dirPath.add walkOpt.dirPath.compileArray() + walkOptC.notDirPath.add walkOpt.notDirPath.compileArray() for path in paths: if dirExists(path): for p in walkDirBasic(path, walkOptC): @@ -1030,8 +1112,10 @@ template processFileResult(pattern: Pattern; filename: string, proc run1Thread() = declareCompiledPatterns(searchOptC, SearchOptComp): compile1Pattern(searchOpt.pattern, searchOptC.pattern) - compile1Pattern(searchOpt.checkMatch, searchOptC.checkMatch) - compile1Pattern(searchOpt.checkNoMatch, searchOptC.checkNoMatch) + searchOptC.inFile.add searchOpt.inFile.compileArray() + searchOptC.notInFile.add searchOpt.notInFile.compileArray() + searchOptC.inContext.add searchOpt.inContext.compileArray() + searchOptC.notInContext.add searchOpt.notInContext.compileArray() if optPipe in options: processFileResult(searchOptC.pattern, "-", processFile(searchOptC, "-", @@ -1073,8 +1157,10 @@ proc worker(initSearchOpt: SearchOpt) {.thread.} = searchOpt = initSearchOpt # init thread-local var declareCompiledPatterns(searchOptC, SearchOptComp): compile1Pattern(searchOpt.pattern, searchOptC.pattern) - 
compile1Pattern(searchOpt.checkMatch, searchOptC.checkMatch) - compile1Pattern(searchOpt.checkNoMatch, searchOptC.checkNoMatch) + searchOptC.inFile.add searchOpt.inFile.compileArray() + searchOptC.notInFile.add searchOpt.notInFile.compileArray() + searchOptC.inContext.add searchOpt.inContext.compileArray() + searchOptC.notInContext.add searchOpt.notInContext.compileArray() while true: let (fileNo, filename) = searchRequestsChan.recv() var fileResult: FileResult @@ -1197,15 +1283,35 @@ for kind, key, val in getopt(): nWorkers = countProcessors() else: nWorkers = parseNonNegative(val, key) - of "ext": walkOpt.extensions.add val.split('|') - of "noext", "no-ext": walkOpt.skipExtensions.add val.split('|') - of "excludedir", "exclude-dir", "ed": walkOpt.excludeDir.add val - of "includedir", "include-dir", "id": walkOpt.includeDir.add val - of "includefile", "include-file", "if": walkOpt.includeFile.add val - of "excludefile", "exclude-file", "ef": walkOpt.excludeFile.add val - of "match": searchOpt.checkMatch = val - of "nomatch": - searchOpt.checkNoMatch = val + of "extensions", "ex", "ext": walkOpt.extensions.add val.split('|') + of "nextensions", "notextensions", "nex", "notex", + "noext", "no-ext": # 2 deprecated options + walkOpt.notExtensions.add val.split('|') + of "dirname", "di": + walkOpt.dirname.add val + of "ndirname", "notdirname", "ndi", "notdi", + "excludedir", "ed": # 2 deprecated options + walkOpt.notDirname.add val + of "dirpath", "dirp", + "includedir", "id": # 2 deprecated options + walkOpt.dirPath.add val + of "ndirpath", "notdirpath", "ndirp", "notdirp": + walkOpt.notDirPath.add val + of "filename", "fi", + "includefile", "include-file", "if": # 3 deprecated options + walkOpt.filename.add val + of "nfilename", "nfi", "notfilename", "notfi", + "excludefile", "exclude-file", "ef": # 3 deprecated options + walkOpt.notFilename.add val + of "infile", "inf", + "matchfile", "match", "mf": # 3 deprecated options + searchOpt.inFile.add val + of "ninfile", 
"notinfile", "ninf", "notinf", + "nomatchfile", "nomatch", "nf": # 3 options are deprecated + searchOpt.notInFile.add val + of "incontext", "inc": searchOpt.inContext.add val + of "nincontext", "notincontext", "ninc", "notinc": + searchOpt.notInContext.add val of "bin": case val of "on": searchOpt.checkBin = biOn |