author apense <apense@users.noreply.github.com> 2015-06-08 19:48:57 -0400
committer apense <apense@users.noreply.github.com> 2015-06-08 19:48:57 -0400
commit 0ee1672d69272aa75cf9be15dede34773a4fa487
tree 0c8b567b249b033d5c5d6d83f290294069205aa2 /lib/pure
parent c4009c61820190c188f6bcf7469754b3c40201e5
Updated whitespace ranges
Ranges sourced from <http://www.unicode.org/Public/7.0.0/ucd/PropList.txt>. Wikipedia also uses these ranges on its information page, <http://en.wikipedia.org/wiki/Whitespace_character#Unicode>. 0xfeff isn't included in that list, but it is a zero-width no-break space, so I guess keeping it makes sense. 0x200b is actually a format character, but it is the zero-width space. To match Unicode strictly, both 0x200b and 0xfeff would have to be removed.
Diffstat (limited to 'lib/pure')
-rw-r--r-- lib/pure/unicode.nim 10
1 files changed, 8 insertions, 2 deletions
diff --git a/lib/pure/unicode.nim b/lib/pure/unicode.nim
index 5fd3c2418..4446eaa0c 100644
--- a/lib/pure/unicode.nim
+++ b/lib/pure/unicode.nim
@@ -372,11 +372,17 @@ const
     0xfe74]  #
 
   spaceRanges = [
-    0x0009,  0x000a,  # tab and newline
+    0x0009,  0x000d,  # tab and newline
     0x0020,  0x0020,  # space
+    0x0085,  0x0085,  # next line
     0x00a0,  0x00a0,  #
-    0x2000,  0x200b,  #  -
+    0x1680,  0x1680,  # Ogham space mark
+    0x2000,  0x200b,  # en dash .. zero-width space
+    0x200e,  0x200f,  # LTR mark .. RTL mark (pattern whitespace)
     0x2028,  0x2029,  #  -     0x3000,  0x3000,  #
+    0x202f,  0x202f,  # narrow no-break space
+    0x205f,  0x205f,  # medium mathematical space
+    0x3000,  0x3000,  # ideographic space
     0xfeff,  0xfeff]  #
 
   toupperRanges = [