diff options
author | havardjohn <havard.mjaavatten@outlook.com> | 2022-09-04 02:47:09 +0200 |
---|---|---|
committer | GitHub <noreply@github.com> | 2022-09-03 20:47:09 -0400 |
commit | 23e0160af283bb0bb573a86145e6c1c792780d49 (patch) | |
tree | d95dbd0c940f44d3265af57073f78ba1bb6934dd /lib | |
parent | a6189fbb988ae9c9e6760cb901e792e043b9086b (diff) | |
download | Nim-23e0160af283bb0bb573a86145e6c1c792780d49.tar.gz |
Add improved Windows UNC path support in std/os (#20281)
* Add improved Windows UNC path support in std/os Original issue: `std/os.createDir` tries to create every component of the given path as a directory. The problem is that `createDir` interprets every backslash/slash as a path separator. For a UNC path this is incorrect. E.g. one UNC form is `\\Server\Volume\Path`. It's an error to create the `\\Server` directory, as well as creating `\\Server\Volume`. Add `ntpath.nim` module with `splitDrive` proc. This implements UNC path parsing as implemented in the Python `ntpath.py` module. The following UNC forms are supported: * `\\Server\Volume\Path` * `\\?\Volume\Path` * `\\?\UNC\Server\Volume\Path` Improves support for UNC paths in various procs in `std/os`: --- * pathnorm.addNormalizePath * Issue: This had incomplete support for UNC paths * The UNC prefix (first 2 characters of a UNC path) was assumed to be exactly `\\`, but it can be `//` and `\/`, etc. as well * Also, the UNC prefix must be normalized to the `dirSep` argument of `addNormalizePath` * Resolution: Changed to account for different UNC prefixes, and normalizing the prefixes according to `dirSep` * Affected procs that get tests: `relativePath`, `joinPath` * Issue: The server/volume part of UNC paths can be stripped when normalizing `..` path components * This error should be negligable, so ignoring this * splitPath * Now make sure the UNC drive is not split; return the UNC drive as `head` if the UNC drive is the only component of the path * Consequently fixes `extractFilename`, `lastPathPart` * parentDir / `/../` * Strip away drive before working on the path, prepending the drive after all work is done - prevents stripping UNC components * Return empty string if drive component is the only component; this is the behavior for POSIX paths as well * Alternative implementation: Just call something like `pathnorm.normalizePath(path & "/..")` for the whole proc - maybe too big of a change * tailDir * If drive is present in path, just split that from path and return path * parentDirs iterator * Uses `parentDir` for going backwards * When going forwards, first `splitDrive`, yield the drive field, and then iterate over path field as normal * splitFile * Make sure path parsing stops at end of drive component * createDir * Fixed by skipping drive part before creating directories * Alternative implementation: use `parentDirs` iterator instead of iterating over characters * Consequence is that it will try to create the root directory * isRootDir * Changed to treat UNC drive alone as root (e.g. "//?/c:" is root) * This change prevents the empty string being yielded by the `parentDirs` iterator with `fromRoot = false` * Internal `sameRoot` * The "root" refers to the drive, so `splitDrive` can be used here This adds UNC path support to all procs that could use it in std/os. I don't think any more work has to be done to support UNC paths. For the future, I believe the path handling code can be refactored due to duplicate code. There are multiple ways of manipulating paths, such as manually searching string for path separator and also having a path normalizer (pathnorm.nim). If all path manipulation used `pathnorm.nim`, and path component splitting used `parentDirs` iterator, then a lot of code could be removed. Tests --- Added test file for `pathnorm.nim` and `ntpath.nim`. `pathnorm.normalizePath` has no tests, so I'm adding a few unit tests. `ntpath.nim` contains tests copied from Python's test suite. Added integration tests to `tos.nim` that tests UNC paths. Removed incorrect `relativePath` runnableExamples from being tested on Windows: --- `relativePath("/Users///me/bar//z.nim", "//Users/", '/') == "me/bar/z.nim"` This is incorrect on Windows because the `/` and `//` are not the same root. `/` (or `\`) is expanded to the drive in the current working directory (e.g. `C:\`). `//` (or `\\`), however, are the first two characters of a UNC path. The following holds true for normal Windows installations: * `dirExists("/Users") != dirExists("//Users")` * `dirExists("\\Users") != dirExists("\\\\Users")` Fixes #19103 Questions: --- * Should the `splitDrive` proc be in `os.nim` instead with copyright notice above the proc? * Is it fine to put most of the new tests into the `runnableExamples` section of the procs in std/os? * [skipci] Apply suggestions from code review Co-authored-by: Clay Sweetser <Varriount@users.noreply.github.com> * [skip ci] Update lib/pure/os.nim Co-authored-by: Clay Sweetser <Varriount@users.noreply.github.com> * Move runnableExamples tests in os.nim to tos.nim * tests/topt_no_cursor: Change from using splitFile to splitDrive `splitFile` can no longer be used in the test, because it generates different ARC code on Windows and Linux. This replaces `splitFile` with `splitDrive`, because it generates same ARC code on Windows and Linux, and returns a tuple. I assume the test wants a proc that returns a tuple. * Drop copyright attribute to Python Co-authored-by: Clay Sweetser <Varriount@users.noreply.github.com>
Diffstat (limited to 'lib')
-rw-r--r-- | lib/pure/os.nim | 105 | ||||
-rw-r--r-- | lib/pure/pathnorm.nim | 17 | ||||
-rw-r--r-- | lib/std/private/ntpath.nim | 61 |
3 files changed, 136 insertions, 47 deletions
diff --git a/lib/pure/os.nim b/lib/pure/os.nim index 82ccd9051..5541e6f15 100644 --- a/lib/pure/os.nim +++ b/lib/pure/os.nim @@ -236,6 +236,9 @@ proc `/`*(head, tail: string): string {.noSideEffect, inline.} = result = joinPath(head, tail) +when doslikeFileSystem: + import std/private/ntpath + proc splitPath*(path: string): tuple[head, tail: string] {. noSideEffect, rtl, extern: "nos$1".} = ## Splits a directory into `(head, tail)` tuple, so that @@ -258,8 +261,14 @@ proc splitPath*(path: string): tuple[head, tail: string] {. assert splitPath("bin") == ("", "bin") assert splitPath("") == ("", "") + when doslikeFileSystem: + let (drive, splitpath) = splitDrive(path) + let stop = drive.len + else: + const stop = 0 + var sepPos = -1 - for i in countdown(len(path)-1, 0): + for i in countdown(len(path)-1, stop): if path[i] in {DirSep, AltSep}: sepPos = i break @@ -272,8 +281,12 @@ proc splitPath*(path: string): tuple[head, tail: string] {. ) result.tail = substr(path, sepPos+1) else: - result.head = "" - result.tail = path + when doslikeFileSystem: + result.head = drive + result.tail = splitpath + else: + result.head = "" + result.tail = path proc isAbsolute*(path: string): bool {.rtl, noSideEffect, extern: "nos$1", raises: [].} = ## Checks whether a given `path` is absolute. @@ -318,9 +331,6 @@ when doslikeFileSystem: (path[0] == DirSep and (path.len == 1 or path[1] notin {DirSep, AltSep, ':'}))) - proc isUNCPrefix(path: string): bool {.noSideEffect, raises: [].} = - path[0] == DirSep and path[1] == DirSep - proc sameRoot(path1, path2: string): bool {.noSideEffect, raises: [].} = ## Return true if path1 and path2 have a same root. ## @@ -330,40 +340,12 @@ when doslikeFileSystem: assert(isAbsolute(path1)) assert(isAbsolute(path2)) - let - len1 = path1.len - len2 = path2.len - assert(len1 != 0 and len2 != 0) - if isAbsFromCurrentDrive(path1) and isAbsFromCurrentDrive(path2): - return true - elif len1 == 1 or len2 == 1: - return false + result = true + elif cmpIgnoreCase(splitDrive(path1).drive, splitDrive(path2).drive) == 0: + result = true else: - if path1[1] == ':' and path2[1] == ':': - return path1[0].toLowerAscii() == path2[0].toLowerAscii() - else: - var - p1, p2: PathIter - pp1 = next(p1, path1) - pp2 = next(p2, path2) - if pp1[1] - pp1[0] == 1 and pp2[1] - pp2[0] == 1 and - isUNCPrefix(path1) and isUNCPrefix(path2): - #UNC - var h = 0 - while p1.hasNext(path1) and p2.hasNext(path2) and h < 2: - pp1 = next(p1, path1) - pp2 = next(p2, path2) - let diff = pp1[1] - pp1[0] - if diff != pp2[1] - pp2[0]: - return false - for i in 0..diff: - if path1[i + pp1[0]] !=? path2[i + pp2[0]]: - return false - inc h - return h == 2 - else: - return false + result = false proc relativePath*(path, base: string, sep = DirSep): string {. rtl, extern: "nos$1".} = @@ -384,7 +366,8 @@ proc relativePath*(path, base: string, sep = DirSep): string {. runnableExamples: assert relativePath("/Users/me/bar/z.nim", "/Users/other/bad", '/') == "../../me/bar/z.nim" assert relativePath("/Users/me/bar/z.nim", "/Users/other", '/') == "../me/bar/z.nim" - assert relativePath("/Users///me/bar//z.nim", "//Users/", '/') == "me/bar/z.nim" + when not doslikeFileSystem: # On Windows, UNC-paths start with `//` + assert relativePath("/Users///me/bar//z.nim", "//Users/", '/') == "me/bar/z.nim" assert relativePath("/Users/me/bar/z.nim", "/Users/me", '/') == "bar/z.nim" assert relativePath("", "/users/moo", '/') == "" assert relativePath("foo", ".", '/') == "foo" @@ -497,6 +480,9 @@ proc parentDir*(path: string): string {. assert parentDir("a//./") == "." assert parentDir("a/b/c/..") == "a" result = pathnorm.normalizePath(path) + when doslikeFileSystem: + let (drive, splitpath) = splitDrive(result) + result = splitpath var sepPos = parentDirPos(result) if sepPos >= 0: result = substr(result, 0, sepPos) @@ -507,6 +493,13 @@ proc parentDir*(path: string): string {. result = "" else: result = "." + when doslikeFileSystem: + if result.len == 0: + discard + elif drive.len > 0 and result.len == 1 and result[0] in {DirSep, AltSep}: + result = drive + else: + result = drive & result proc tailDir*(path: string): string {. noSideEffect, rtl, extern: "nos$1".} = @@ -526,6 +519,10 @@ proc tailDir*(path: string): string {. assert tailDir("usr/local/bin") == "local/bin" var i = 0 + when doslikeFileSystem: + let (drive, splitpath) = path.splitDrive + if drive != "": + return splitpath.strip(chars = {DirSep, AltSep}, trailing = false) while i < len(path): if path[i] in {DirSep, AltSep}: while i < len(path) and path[i] in {DirSep, AltSep}: inc i @@ -544,6 +541,9 @@ proc isRootDir*(path: string): bool {. assert not isRootDir("/a") assert not isRootDir("a/b/c") + when doslikeFileSystem: + if splitDrive(path).path == "": + return true result = parentDirPos(path) < 0 iterator parentDirs*(path: string, fromRoot=false, inclusive=true): string = @@ -588,7 +588,11 @@ iterator parentDirs*(path: string, fromRoot=false, inclusive=true): string = current = current.parentDir yield current else: - for i in countup(0, path.len - 2): # ignore the last / + when doslikeFileSystem: + let start = path.splitDrive.drive.len + else: + const start = 0 + for i in countup(start, path.len - 2): # ignore the last / # deal with non-normalized paths such as /foo//bar//baz if path[i] in {DirSep, AltSep} and (i == 0 or path[i-1] notin {DirSep, AltSep}): @@ -608,11 +612,15 @@ proc `/../`*(head, tail: string): string {.noSideEffect.} = assert "a/b/c" /../ "d/e" == "a/b/d/e" assert "a" /../ "d/e" == "a/d/e" + when doslikeFileSystem: + let (drive, head) = splitDrive(head) let sepPos = parentDirPos(head) if sepPos >= 0: result = substr(head, 0, sepPos-1) / tail else: result = head / tail + when doslikeFileSystem: + result = drive / result proc normExt(ext: string): string = if ext == "" or ext[0] == ExtSep: result = ext # no copy needed here @@ -680,7 +688,13 @@ proc splitFile*(path: string): tuple[dir, name, ext: string] {. var namePos = 0 var dotPos = 0 - for i in countdown(len(path) - 1, 0): + when doslikeFileSystem: + let (drive, _) = splitDrive(path) + let stop = len(drive) + result.dir = drive + else: + const stop = 0 + for i in countdown(len(path) - 1, stop): if path[i] in {DirSep, AltSep} or i == 0: if path[i] in {DirSep, AltSep}: result.dir = substr(path, 0, if likely(i >= 1): i - 1 else: 0) @@ -2541,8 +2555,13 @@ proc createDir*(dir: string) {.rtl, extern: "nos$1", ## * `moveDir proc`_ var omitNext = false when doslikeFileSystem: - omitNext = isAbsolute(dir) - for i in 1.. dir.len-1: + var start = 1 + if isAbsolute(dir): + omitNext = true + start = dir.splitDrive.drive.len + 1 + else: + let start = 1 + for i in start.. dir.len-1: if dir[i] in {DirSep, AltSep}: if omitNext: omitNext = false diff --git a/lib/pure/pathnorm.nim b/lib/pure/pathnorm.nim index 10a2a0b67..a71ae0762 100644 --- a/lib/pure/pathnorm.nim +++ b/lib/pure/pathnorm.nim @@ -29,10 +29,6 @@ proc next*(it: var PathIter; x: string): (int, int) = if not it.notFirst and x[it.i] in {DirSep, AltSep}: # absolute path: inc it.i - when doslikeFileSystem: # UNC paths have leading `\\` - if hasNext(it, x) and x[it.i] == DirSep and - it.i+1 < x.len and x[it.i+1] != DirSep: - inc it.i else: while it.i < x.len and x[it.i] notin {DirSep, AltSep}: inc it.i if it.i > it.prev: @@ -56,10 +52,23 @@ proc isDotDot(x: string; bounds: (int, int)): bool = proc isSlash(x: string; bounds: (int, int)): bool = bounds[1] == bounds[0] and x[bounds[0]] in {DirSep, AltSep} +when doslikeFileSystem: + import std/private/ntpath + proc addNormalizePath*(x: string; result: var string; state: var int; dirSep = DirSep) = ## Low level proc. Undocumented. + when doslikeFileSystem: # Add Windows drive at start without normalization + var x = x + if result == "": + let (drive, file) = splitDrive(x) + x = file + result.add drive + for c in result.mitems: + if c in {DirSep, AltSep}: + c = dirSep + # state: 0th bit set if isAbsolute path. Other bits count # the number of path components. var it: PathIter diff --git a/lib/std/private/ntpath.nim b/lib/std/private/ntpath.nim new file mode 100644 index 000000000..7c8661bb7 --- /dev/null +++ b/lib/std/private/ntpath.nim @@ -0,0 +1,61 @@ +# This module is inspired by Python's `ntpath.py` module. + +import std/[ + strutils, +] + +# Adapted `splitdrive` function from the following commits in Python source +# code: +# 5a607a3ee5e81bdcef3f886f9d20c1376a533df4 (2009): Initial UNC handling (by Mark Hammond) +# 2ba0fd5767577954f331ecbd53596cd8035d7186 (2022): Support for "UNC"-device paths (by Barney Gale) +# +# FAQ: Why use `strip` below? `\\?\UNC` is the start of a "UNC symbolic link", +# which is a special UNC form. Running `strip` differentiates `\\?\UNC\` (a UNC +# symbolic link) from e.g. `\\?\UNCD` (UNCD is the server in the UNC path). +func splitDrive*(p: string): tuple[drive, path: string] = + ## Splits a Windows path into a drive and path part. The drive can be e.g. + ## `C:`. It can also be a UNC path (`\\server\drive\path`). + ## + ## The equivalent `splitDrive` for POSIX systems always returns empty drive. + ## Therefore this proc is only necessary on DOS-like file systems (together + ## with Nim's `doslikeFileSystem` conditional variable). + ## + ## This proc's use case is to extract `path` such that it can be manipulated + ## like a POSIX path. + runnableExamples: + doAssert splitDrive("C:") == ("C:", "") + doAssert splitDrive(r"C:\") == (r"C:", r"\") + doAssert splitDrive(r"\\server\drive\foo\bar") == (r"\\server\drive", r"\foo\bar") + doAssert splitDrive(r"\\?\UNC\server\share\dir") == (r"\\?\UNC\server\share", r"\dir") + + result = ("", p) + if p.len < 2: + return + const sep = '\\' + let normp = p.replace('/', sep) + if p.len > 2 and normp[0] == sep and normp[1] == sep and normp[2] != sep: + + # is a UNC path: + # vvvvvvvvvvvvvvvvvvvv drive letter or UNC path + # \\machine\mountpoint\directory\etc\... + # directory ^^^^^^^^^^^^^^^ + let start = block: + const unc = "\\\\?\\UNC" # Length is 7 + let idx = min(8, normp.len) + if unc == normp[0..<idx].strip(chars = {sep}, leading = false).toUpperAscii: + 8 + else: + 2 + let index = normp.find(sep, start) + if index == -1: + return + var index2 = normp.find(sep, index + 1) + + # a UNC path can't have two slashes in a row (after the initial two) + if index2 == index + 1: + return + if index2 == -1: + index2 = p.len + return (p[0..<index2], p[index2..^1]) + if p[1] == ':': + return (p[0..1], p[2..^1]) |