summary refs log tree commit diff stats
path: root/lib
diff options
context:
space:
mode:
authorhavardjohn <havard.mjaavatten@outlook.com>2022-09-04 02:47:09 +0200
committerGitHub <noreply@github.com>2022-09-03 20:47:09 -0400
commit23e0160af283bb0bb573a86145e6c1c792780d49 (patch)
treed95dbd0c940f44d3265af57073f78ba1bb6934dd /lib
parenta6189fbb988ae9c9e6760cb901e792e043b9086b (diff)
downloadNim-23e0160af283bb0bb573a86145e6c1c792780d49.tar.gz
Add improved Windows UNC path support in std/os (#20281)
* Add improved Windows UNC path support in std/os

Original issue: `std/os.createDir` tries to create every component of
the given path as a directory. The problem is that `createDir`
interprets every backslash/slash as a path separator. For a UNC path
this is incorrect. E.g. one UNC form is `\\Server\Volume\Path`. It's an
error to create the `\\Server` directory, as well as creating
`\\Server\Volume`.

Add `ntpath.nim` module with `splitDrive` proc. This implements UNC path
parsing as implemented in the Python `ntpath.py` module. The following
UNC forms are supported:

* `\\Server\Volume\Path`
* `\\?\Volume\Path`
* `\\?\UNC\Server\Volume\Path`

Improves support for UNC paths in various procs in `std/os`:
---

* pathnorm.addNormalizePath
  * Issue: This had incomplete support for UNC paths
    * The UNC prefix (first 2 characters of a UNC path) was assumed to
      be exactly `\\`, but it can be `//` and `\/`, etc. as well
    * Also, the UNC prefix must be normalized to the `dirSep` argument
      of `addNormalizePath`
  * Resolution: Changed to account for different UNC prefixes, and
    normalizing the prefixes according to `dirSep`
    * Affected procs that get tests: `relativePath`, `joinPath`
  * Issue: The server/volume part of UNC paths can be stripped when
    normalizing `..` path components
    * This error should be negligable, so ignoring this
* splitPath
  * Now make sure the UNC drive is not split; return the UNC drive as
    `head` if the UNC drive is the only component of the path
  * Consequently fixes `extractFilename`, `lastPathPart`
* parentDir / `/../`
  * Strip away drive before working on the path, prepending the drive
    after all work is done - prevents stripping UNC components
  * Return empty string if drive component is the only component; this
    is the behavior for POSIX paths as well
  * Alternative implementation: Just call something like
    `pathnorm.normalizePath(path & "/..")` for the whole proc - maybe
    too big of a change
* tailDir
  * If drive is present in path, just split that from path and return
    path
* parentDirs iterator
  * Uses `parentDir` for going backwards
  * When going forwards, first `splitDrive`, yield the drive field, and
    then iterate over path field as normal
* splitFile
  * Make sure path parsing stops at end of drive component
* createDir
  * Fixed by skipping drive part before creating directories
  * Alternative implementation: use `parentDirs` iterator instead of
    iterating over characters
    * Consequence is that it will try to create the root directory
* isRootDir
  * Changed to treat UNC drive alone as root (e.g. "//?/c:" is root)
  * This change prevents the empty string being yielded by the
    `parentDirs` iterator with `fromRoot = false`
* Internal `sameRoot`
  * The "root" refers to the drive, so `splitDrive` can be used here

This adds UNC path support to all procs that could use it in std/os. I
don't think any more work has to be done to support UNC paths. For the
future, I believe the path handling code can be refactored due to
duplicate code. There are multiple ways of manipulating paths, such as
manually searching string for path separator and also having a path
normalizer (pathnorm.nim). If all path manipulation used `pathnorm.nim`,
and path component splitting used `parentDirs` iterator, then a lot of
code could be removed.

Tests
---

Added test file for `pathnorm.nim` and `ntpath.nim`.
`pathnorm.normalizePath` has no tests, so I'm adding a few unit tests.
`ntpath.nim` contains tests copied from Python's test suite.

Added integration tests to `tos.nim` that tests UNC paths.

Removed incorrect `relativePath` runnableExamples from being tested on Windows:
---

`relativePath("/Users///me/bar//z.nim", "//Users/", '/') == "me/bar/z.nim"`

This is incorrect on Windows because the `/` and `//` are not the same
root. `/` (or `\`) is expanded to the drive in the current working
directory (e.g. `C:\`). `//` (or `\\`), however, are the first two
characters of a UNC path. The following holds true for normal Windows
installations:

* `dirExists("/Users") != dirExists("//Users")`
* `dirExists("\\Users") != dirExists("\\\\Users")`

Fixes #19103

Questions:
---

* Should the `splitDrive` proc be in `os.nim` instead with copyright
  notice above the proc?
* Is it fine to put most of the new tests into the `runnableExamples`
  section of the procs in std/os?

* [skipci] Apply suggestions from code review

Co-authored-by: Clay Sweetser <Varriount@users.noreply.github.com>

* [skip ci] Update lib/pure/os.nim

Co-authored-by: Clay Sweetser <Varriount@users.noreply.github.com>

* Move runnableExamples tests in os.nim to tos.nim

* tests/topt_no_cursor: Change from using splitFile to splitDrive

`splitFile` can no longer be used in the test, because it generates
different ARC code on Windows and Linux. This replaces `splitFile` with
`splitDrive`, because it generates same ARC code on Windows and Linux,
and returns a tuple. I assume the test wants a proc that returns a
tuple.

* Drop copyright attribute to Python

Co-authored-by: Clay Sweetser <Varriount@users.noreply.github.com>
Diffstat (limited to 'lib')
-rw-r--r--lib/pure/os.nim105
-rw-r--r--lib/pure/pathnorm.nim17
-rw-r--r--lib/std/private/ntpath.nim61
3 files changed, 136 insertions, 47 deletions
diff --git a/lib/pure/os.nim b/lib/pure/os.nim
index 82ccd9051..5541e6f15 100644
--- a/lib/pure/os.nim
+++ b/lib/pure/os.nim
@@ -236,6 +236,9 @@ proc `/`*(head, tail: string): string {.noSideEffect, inline.} =
 
   result = joinPath(head, tail)
 
+when doslikeFileSystem:
+  import std/private/ntpath
+
 proc splitPath*(path: string): tuple[head, tail: string] {.
   noSideEffect, rtl, extern: "nos$1".} =
   ## Splits a directory into `(head, tail)` tuple, so that
@@ -258,8 +261,14 @@ proc splitPath*(path: string): tuple[head, tail: string] {.
     assert splitPath("bin") == ("", "bin")
     assert splitPath("") == ("", "")
 
+  when doslikeFileSystem:
+    let (drive, splitpath) = splitDrive(path)
+    let stop = drive.len
+  else:
+    const stop = 0
+
   var sepPos = -1
-  for i in countdown(len(path)-1, 0):
+  for i in countdown(len(path)-1, stop):
     if path[i] in {DirSep, AltSep}:
       sepPos = i
       break
@@ -272,8 +281,12 @@ proc splitPath*(path: string): tuple[head, tail: string] {.
     )
     result.tail = substr(path, sepPos+1)
   else:
-    result.head = ""
-    result.tail = path
+    when doslikeFileSystem:
+      result.head = drive
+      result.tail = splitpath
+    else:
+      result.head = ""
+      result.tail = path
 
 proc isAbsolute*(path: string): bool {.rtl, noSideEffect, extern: "nos$1", raises: [].} =
   ## Checks whether a given `path` is absolute.
@@ -318,9 +331,6 @@ when doslikeFileSystem:
      (path[0] == DirSep and
       (path.len == 1 or path[1] notin {DirSep, AltSep, ':'})))
 
-  proc isUNCPrefix(path: string): bool {.noSideEffect, raises: [].} =
-    path[0] == DirSep and path[1] == DirSep
-
   proc sameRoot(path1, path2: string): bool {.noSideEffect, raises: [].} =
     ## Return true if path1 and path2 have a same root.
     ##
@@ -330,40 +340,12 @@ when doslikeFileSystem:
     assert(isAbsolute(path1))
     assert(isAbsolute(path2))
 
-    let
-      len1 = path1.len
-      len2 = path2.len
-    assert(len1 != 0 and len2 != 0)
-
     if isAbsFromCurrentDrive(path1) and isAbsFromCurrentDrive(path2):
-      return true
-    elif len1 == 1 or len2 == 1:
-      return false
+      result = true
+    elif cmpIgnoreCase(splitDrive(path1).drive, splitDrive(path2).drive) == 0:
+      result = true
     else:
-      if path1[1] == ':' and path2[1] == ':':
-        return path1[0].toLowerAscii() == path2[0].toLowerAscii()
-      else:
-        var
-          p1, p2: PathIter
-          pp1 = next(p1, path1)
-          pp2 = next(p2, path2)
-        if pp1[1] - pp1[0] == 1 and pp2[1] - pp2[0] == 1 and
-           isUNCPrefix(path1) and isUNCPrefix(path2):
-          #UNC
-          var h = 0
-          while p1.hasNext(path1) and p2.hasNext(path2) and h < 2:
-            pp1 = next(p1, path1)
-            pp2 = next(p2, path2)
-            let diff = pp1[1] - pp1[0]
-            if diff != pp2[1] - pp2[0]:
-              return false
-            for i in 0..diff:
-              if path1[i + pp1[0]] !=? path2[i + pp2[0]]:
-                return false
-            inc h
-          return h == 2
-        else:
-          return false
+      result = false
 
 proc relativePath*(path, base: string, sep = DirSep): string {.
   rtl, extern: "nos$1".} =
@@ -384,7 +366,8 @@ proc relativePath*(path, base: string, sep = DirSep): string {.
   runnableExamples:
     assert relativePath("/Users/me/bar/z.nim", "/Users/other/bad", '/') == "../../me/bar/z.nim"
     assert relativePath("/Users/me/bar/z.nim", "/Users/other", '/') == "../me/bar/z.nim"
-    assert relativePath("/Users///me/bar//z.nim", "//Users/", '/') == "me/bar/z.nim"
+    when not doslikeFileSystem: # On Windows, UNC-paths start with `//`
+      assert relativePath("/Users///me/bar//z.nim", "//Users/", '/') == "me/bar/z.nim"
     assert relativePath("/Users/me/bar/z.nim", "/Users/me", '/') == "bar/z.nim"
     assert relativePath("", "/users/moo", '/') == ""
     assert relativePath("foo", ".", '/') == "foo"
@@ -497,6 +480,9 @@ proc parentDir*(path: string): string {.
       assert parentDir("a//./") == "."
       assert parentDir("a/b/c/..") == "a"
   result = pathnorm.normalizePath(path)
+  when doslikeFileSystem:
+    let (drive, splitpath) = splitDrive(result)
+    result = splitpath
   var sepPos = parentDirPos(result)
   if sepPos >= 0:
     result = substr(result, 0, sepPos)
@@ -507,6 +493,13 @@ proc parentDir*(path: string): string {.
     result = ""
   else:
     result = "."
+  when doslikeFileSystem:
+    if result.len == 0:
+      discard
+    elif drive.len > 0 and result.len == 1 and result[0] in {DirSep, AltSep}:
+      result = drive
+    else:
+      result = drive & result
 
 proc tailDir*(path: string): string {.
   noSideEffect, rtl, extern: "nos$1".} =
@@ -526,6 +519,10 @@ proc tailDir*(path: string): string {.
     assert tailDir("usr/local/bin") == "local/bin"
 
   var i = 0
+  when doslikeFileSystem:
+    let (drive, splitpath) = path.splitDrive
+    if drive != "":
+      return splitpath.strip(chars = {DirSep, AltSep}, trailing = false)
   while i < len(path):
     if path[i] in {DirSep, AltSep}:
       while i < len(path) and path[i] in {DirSep, AltSep}: inc i
@@ -544,6 +541,9 @@ proc isRootDir*(path: string): bool {.
     assert not isRootDir("/a")
     assert not isRootDir("a/b/c")
 
+  when doslikeFileSystem:
+    if splitDrive(path).path == "":
+      return true
   result = parentDirPos(path) < 0
 
 iterator parentDirs*(path: string, fromRoot=false, inclusive=true): string =
@@ -588,7 +588,11 @@ iterator parentDirs*(path: string, fromRoot=false, inclusive=true): string =
       current = current.parentDir
       yield current
   else:
-    for i in countup(0, path.len - 2): # ignore the last /
+    when doslikeFileSystem:
+      let start = path.splitDrive.drive.len
+    else:
+      const start = 0
+    for i in countup(start, path.len - 2): # ignore the last /
       # deal with non-normalized paths such as /foo//bar//baz
       if path[i] in {DirSep, AltSep} and
           (i == 0 or path[i-1] notin {DirSep, AltSep}):
@@ -608,11 +612,15 @@ proc `/../`*(head, tail: string): string {.noSideEffect.} =
       assert "a/b/c" /../ "d/e" == "a/b/d/e"
       assert "a" /../ "d/e" == "a/d/e"
 
+  when doslikeFileSystem:
+    let (drive, head) = splitDrive(head)
   let sepPos = parentDirPos(head)
   if sepPos >= 0:
     result = substr(head, 0, sepPos-1) / tail
   else:
     result = head / tail
+  when doslikeFileSystem:
+    result = drive / result
 
 proc normExt(ext: string): string =
   if ext == "" or ext[0] == ExtSep: result = ext # no copy needed here
@@ -680,7 +688,13 @@ proc splitFile*(path: string): tuple[dir, name, ext: string] {.
 
   var namePos = 0
   var dotPos = 0
-  for i in countdown(len(path) - 1, 0):
+  when doslikeFileSystem:
+    let (drive, _) = splitDrive(path)
+    let stop = len(drive)
+    result.dir = drive
+  else:
+    const stop = 0
+  for i in countdown(len(path) - 1, stop):
     if path[i] in {DirSep, AltSep} or i == 0:
       if path[i] in {DirSep, AltSep}:
         result.dir = substr(path, 0, if likely(i >= 1): i - 1 else: 0)
@@ -2541,8 +2555,13 @@ proc createDir*(dir: string) {.rtl, extern: "nos$1",
   ## * `moveDir proc`_
   var omitNext = false
   when doslikeFileSystem:
-    omitNext = isAbsolute(dir)
-  for i in 1.. dir.len-1:
+    var start = 1
+    if isAbsolute(dir):
+      omitNext = true
+      start = dir.splitDrive.drive.len + 1
+  else:
+    let start = 1
+  for i in start.. dir.len-1:
     if dir[i] in {DirSep, AltSep}:
       if omitNext:
         omitNext = false
diff --git a/lib/pure/pathnorm.nim b/lib/pure/pathnorm.nim
index 10a2a0b67..a71ae0762 100644
--- a/lib/pure/pathnorm.nim
+++ b/lib/pure/pathnorm.nim
@@ -29,10 +29,6 @@ proc next*(it: var PathIter; x: string): (int, int) =
   if not it.notFirst and x[it.i] in {DirSep, AltSep}:
     # absolute path:
     inc it.i
-    when doslikeFileSystem: # UNC paths have leading `\\`
-      if hasNext(it, x) and x[it.i] == DirSep and
-          it.i+1 < x.len and x[it.i+1] != DirSep:
-        inc it.i
   else:
     while it.i < x.len and x[it.i] notin {DirSep, AltSep}: inc it.i
   if it.i > it.prev:
@@ -56,10 +52,23 @@ proc isDotDot(x: string; bounds: (int, int)): bool =
 proc isSlash(x: string; bounds: (int, int)): bool =
   bounds[1] == bounds[0] and x[bounds[0]] in {DirSep, AltSep}
 
+when doslikeFileSystem:
+  import std/private/ntpath
+
 proc addNormalizePath*(x: string; result: var string; state: var int;
     dirSep = DirSep) =
   ## Low level proc. Undocumented.
 
+  when doslikeFileSystem: # Add Windows drive at start without normalization
+    var x = x
+    if result == "":
+      let (drive, file) = splitDrive(x)
+      x = file
+      result.add drive
+      for c in result.mitems:
+        if c in {DirSep, AltSep}:
+          c = dirSep
+
   # state: 0th bit set if isAbsolute path. Other bits count
   # the number of path components.
   var it: PathIter
diff --git a/lib/std/private/ntpath.nim b/lib/std/private/ntpath.nim
new file mode 100644
index 000000000..7c8661bb7
--- /dev/null
+++ b/lib/std/private/ntpath.nim
@@ -0,0 +1,61 @@
+# This module is inspired by Python's `ntpath.py` module.
+
+import std/[
+  strutils,
+]
+
+# Adapted `splitdrive` function from the following commits in Python source
+# code:
+# 5a607a3ee5e81bdcef3f886f9d20c1376a533df4 (2009): Initial UNC handling (by Mark Hammond)
+# 2ba0fd5767577954f331ecbd53596cd8035d7186 (2022): Support for "UNC"-device paths (by Barney Gale)
+#
+# FAQ: Why use `strip` below? `\\?\UNC` is the start of a "UNC symbolic link",
+# which is a special UNC form. Running `strip` differentiates `\\?\UNC\` (a UNC
+# symbolic link) from e.g. `\\?\UNCD` (UNCD is the server in the UNC path).
+func splitDrive*(p: string): tuple[drive, path: string] =
+  ## Splits a Windows path into a drive and path part. The drive can be e.g.
+  ## `C:`. It can also be a UNC path (`\\server\drive\path`).
+  ##
+  ## The equivalent `splitDrive` for POSIX systems always returns empty drive.
+  ## Therefore this proc is only necessary on DOS-like file systems (together
+  ## with Nim's `doslikeFileSystem` conditional variable).
+  ##
+  ## This proc's use case is to extract `path` such that it can be manipulated
+  ## like a POSIX path.
+  runnableExamples:
+    doAssert splitDrive("C:") == ("C:", "")
+    doAssert splitDrive(r"C:\") == (r"C:", r"\")
+    doAssert splitDrive(r"\\server\drive\foo\bar") == (r"\\server\drive", r"\foo\bar")
+    doAssert splitDrive(r"\\?\UNC\server\share\dir") == (r"\\?\UNC\server\share", r"\dir")
+
+  result = ("", p)
+  if p.len < 2:
+    return
+  const sep = '\\'
+  let normp = p.replace('/', sep)
+  if p.len > 2 and normp[0] == sep and normp[1] == sep and normp[2] != sep:
+
+    # is a UNC path:
+    # vvvvvvvvvvvvvvvvvvvv drive letter or UNC path
+    # \\machine\mountpoint\directory\etc\...
+    #           directory ^^^^^^^^^^^^^^^
+    let start = block:
+      const unc = "\\\\?\\UNC" # Length is 7
+      let idx = min(8, normp.len)
+      if unc == normp[0..<idx].strip(chars = {sep}, leading = false).toUpperAscii:
+        8
+      else:
+        2
+    let index = normp.find(sep, start)
+    if index == -1:
+      return
+    var index2 = normp.find(sep, index + 1)
+
+    # a UNC path can't have two slashes in a row (after the initial two)
+    if index2 == index + 1:
+      return
+    if index2 == -1:
+      index2 = p.len
+    return (p[0..<index2], p[index2..^1])
+  if p[1] == ':':
+    return (p[0..1], p[2..^1])