summary refs log tree commit diff stats
path: root/lib/pure/pathnorm.nim
diff options
context:
space:
mode:
authorhavardjohn <havard.mjaavatten@outlook.com>2022-09-04 02:47:09 +0200
committerGitHub <noreply@github.com>2022-09-03 20:47:09 -0400
commit23e0160af283bb0bb573a86145e6c1c792780d49 (patch)
treed95dbd0c940f44d3265af57073f78ba1bb6934dd /lib/pure/pathnorm.nim
parenta6189fbb988ae9c9e6760cb901e792e043b9086b (diff)
downloadNim-23e0160af283bb0bb573a86145e6c1c792780d49.tar.gz
Add improved Windows UNC path support in std/os (#20281)
* Add improved Windows UNC path support in std/os

Original issue: `std/os.createDir` tries to create every component of
the given path as a directory. The problem is that `createDir`
interprets every backslash/slash as a path separator. For a UNC path
this is incorrect. E.g. one UNC form is `\\Server\Volume\Path`. It's an
error to create the `\\Server` directory, as well as creating
`\\Server\Volume`.

Add `ntpath.nim` module with `splitDrive` proc. This implements UNC path
parsing as implemented in the Python `ntpath.py` module. The following
UNC forms are supported:

* `\\Server\Volume\Path`
* `\\?\Volume\Path`
* `\\?\UNC\Server\Volume\Path`

Improves support for UNC paths in various procs in `std/os`:
---

* pathnorm.addNormalizePath
  * Issue: This had incomplete support for UNC paths
    * The UNC prefix (first 2 characters of a UNC path) was assumed to
      be exactly `\\`, but it can be `//` and `\/`, etc. as well
    * Also, the UNC prefix must be normalized to the `dirSep` argument
      of `addNormalizePath`
  * Resolution: Changed to account for different UNC prefixes, and
    normalizing the prefixes according to `dirSep`
    * Affected procs that get tests: `relativePath`, `joinPath`
  * Issue: The server/volume part of UNC paths can be stripped when
    normalizing `..` path components
    * This error should be negligable, so ignoring this
* splitPath
  * Now make sure the UNC drive is not split; return the UNC drive as
    `head` if the UNC drive is the only component of the path
  * Consequently fixes `extractFilename`, `lastPathPart`
* parentDir / `/../`
  * Strip away drive before working on the path, prepending the drive
    after all work is done - prevents stripping UNC components
  * Return empty string if drive component is the only component; this
    is the behavior for POSIX paths as well
  * Alternative implementation: Just call something like
    `pathnorm.normalizePath(path & "/..")` for the whole proc - maybe
    too big of a change
* tailDir
  * If drive is present in path, just split that from path and return
    path
* parentDirs iterator
  * Uses `parentDir` for going backwards
  * When going forwards, first `splitDrive`, yield the drive field, and
    then iterate over path field as normal
* splitFile
  * Make sure path parsing stops at end of drive component
* createDir
  * Fixed by skipping drive part before creating directories
  * Alternative implementation: use `parentDirs` iterator instead of
    iterating over characters
    * Consequence is that it will try to create the root directory
* isRootDir
  * Changed to treat UNC drive alone as root (e.g. "//?/c:" is root)
  * This change prevents the empty string being yielded by the
    `parentDirs` iterator with `fromRoot = false`
* Internal `sameRoot`
  * The "root" refers to the drive, so `splitDrive` can be used here

This adds UNC path support to all procs that could use it in std/os. I
don't think any more work has to be done to support UNC paths. For the
future, I believe the path handling code can be refactored due to
duplicate code. There are multiple ways of manipulating paths, such as
manually searching string for path separator and also having a path
normalizer (pathnorm.nim). If all path manipulation used `pathnorm.nim`,
and path component splitting used `parentDirs` iterator, then a lot of
code could be removed.

Tests
---

Added test file for `pathnorm.nim` and `ntpath.nim`.
`pathnorm.normalizePath` has no tests, so I'm adding a few unit tests.
`ntpath.nim` contains tests copied from Python's test suite.

Added integration tests to `tos.nim` that tests UNC paths.

Removed incorrect `relativePath` runnableExamples from being tested on Windows:
---

`relativePath("/Users///me/bar//z.nim", "//Users/", '/') == "me/bar/z.nim"`

This is incorrect on Windows because the `/` and `//` are not the same
root. `/` (or `\`) is expanded to the drive in the current working
directory (e.g. `C:\`). `//` (or `\\`), however, are the first two
characters of a UNC path. The following holds true for normal Windows
installations:

* `dirExists("/Users") != dirExists("//Users")`
* `dirExists("\\Users") != dirExists("\\\\Users")`

Fixes #19103

Questions:
---

* Should the `splitDrive` proc be in `os.nim` instead with copyright
  notice above the proc?
* Is it fine to put most of the new tests into the `runnableExamples`
  section of the procs in std/os?

* [skipci] Apply suggestions from code review

Co-authored-by: Clay Sweetser <Varriount@users.noreply.github.com>

* [skip ci] Update lib/pure/os.nim

Co-authored-by: Clay Sweetser <Varriount@users.noreply.github.com>

* Move runnableExamples tests in os.nim to tos.nim

* tests/topt_no_cursor: Change from using splitFile to splitDrive

`splitFile` can no longer be used in the test, because it generates
different ARC code on Windows and Linux. This replaces `splitFile` with
`splitDrive`, because it generates same ARC code on Windows and Linux,
and returns a tuple. I assume the test wants a proc that returns a
tuple.

* Drop copyright attribute to Python

Co-authored-by: Clay Sweetser <Varriount@users.noreply.github.com>
Diffstat (limited to 'lib/pure/pathnorm.nim')
-rw-r--r--lib/pure/pathnorm.nim17
1 files changed, 13 insertions, 4 deletions
diff --git a/lib/pure/pathnorm.nim b/lib/pure/pathnorm.nim
index 10a2a0b67..a71ae0762 100644
--- a/lib/pure/pathnorm.nim
+++ b/lib/pure/pathnorm.nim
@@ -29,10 +29,6 @@ proc next*(it: var PathIter; x: string): (int, int) =
   if not it.notFirst and x[it.i] in {DirSep, AltSep}:
     # absolute path:
     inc it.i
-    when doslikeFileSystem: # UNC paths have leading `\\`
-      if hasNext(it, x) and x[it.i] == DirSep and
-          it.i+1 < x.len and x[it.i+1] != DirSep:
-        inc it.i
   else:
     while it.i < x.len and x[it.i] notin {DirSep, AltSep}: inc it.i
   if it.i > it.prev:
@@ -56,10 +52,23 @@ proc isDotDot(x: string; bounds: (int, int)): bool =
 proc isSlash(x: string; bounds: (int, int)): bool =
   bounds[1] == bounds[0] and x[bounds[0]] in {DirSep, AltSep}
 
+when doslikeFileSystem:
+  import std/private/ntpath
+
 proc addNormalizePath*(x: string; result: var string; state: var int;
     dirSep = DirSep) =
   ## Low level proc. Undocumented.
 
+  when doslikeFileSystem: # Add Windows drive at start without normalization
+    var x = x
+    if result == "":
+      let (drive, file) = splitDrive(x)
+      x = file
+      result.add drive
+      for c in result.mitems:
+        if c in {DirSep, AltSep}:
+          c = dirSep
+
   # state: 0th bit set if isAbsolute path. Other bits count
   # the number of path components.
   var it: PathIter