Nim - This repository contains the Nim compiler, Nim's stdlib, tools, and documentation. (mirror)


path: root/lib/pure/unidecode/unidecode.nim

	Commit message	Author	Date	Files	Lines
*	fix unidecode.unidecode example input string	Julian Fondren	2019-04-20	1	-1/+1
*	fixes #8768 properly	Araq	2018-08-30	1	-3/+3
*	unidecode module: change the default to: embed resource file into the applica...	Araq	2018-08-30	1	-10/+10
*	fixes #8768	Araq	2018-08-30	1	-2/+2
*	even more strict isNil handling for strings/seqs in order to detect bugs	Araq	2018-08-22	1	-1/+0
*	lib: Trim .nim files trailing whitespace	Adam Strzelecki	2015-09-04	1	-11/+11
*	Turn some test outputs into actual tests	Oleh Prypin	2015-04-21	1	-2/+1
*	Don't run non-test code when defined(testing)	Oleh Prypin	2015-04-21	1	-1/+2
*	big rename	Araq	2014-08-27	1	-2/+2
*	Removes executable bit for text files.	Grzegorz Adam Hankiewicz	2013-03-16	1	-0/+0
*	year 2012 for most copyright headers	Araq	2012-01-02	1	-1/+1
*	further steps to get rid of deprecated endOfFile and readLine	Araq	2011-11-29	1	-1/+1
*	small bugfixes to make more tests green	Araq	2011-11-02	1	-1/+2
*	slurp uses path; unidecode is improved and threadsafe	Araq	2011-10-08	1	-19/+27
*	version 0.8.8	Andreas Rumpf	2010-03-14	1	-2/+2
*	fixed pango/pangoutils new wrappers	Andreas Rumpf	2010-02-26	1	-0/+0
*	continued work on html/xmlparser	rumpf_a@web.de	2010-02-14	1	-0/+0
*	further progress on the new XML processing modules	Andreas Rumpf	2010-02-12	1	-0/+0
*	more enhancements for the lib	Andreas Rumpf	2010-02-08	1	-0/+65

<H2>Test of invalid NCRs 128-159</H2>
<P>
Authoring tools on MS Windows, in particular MS FrontPage (a "WYSIWYG" HTML editor), generate invalid <DFN>Numeric Character References</DFN> for characters commonly found in positions 128...159 (0x80...0x9f) in Windows fonts. Although these are valid code points in <em>windows-1252</em> (and other windows-xxxx) charsets, NCRs always refer to the document character set in the SGML sense, not to the character encoding scheme (or charset). For HTML, the SGML document character set is fixed: it is always a subset of Unicode (or ISO 10646). In Unicode and its ISO-8859-1 subset, values 128...159 are C1 control characters, which must not appear in HTML. Valid NCRs for the intended characters use Unicode values greater than 256.
<p>
Lynx tries to interpret some of the invalid codes by assuming that they are windows-1252 code points.
<PRE>
You may want to press '\' to view the source of this test.
<em>Code invalid NCR  <tab id=c>valid NCR, description</em>
<em>     normal in ALT <a id=table></a> </em>
0x80  <IMG SRC=X ALT=""> <tab to=c>€ #EURO SIGN
0x81  <IMG SRC=X ALT="">  #NOT USED
0x82  <IMG SRC=X ALT=""> <tab to=c>‚ #SINGLE LOW-9 QUOTATION MARK
0x83  <IMG SRC=X ALT=""> <tab to=c>ƒ #LATIN SMALL LETTER F WITH HOOK
0x84  <IMG SRC=X ALT=""> <tab to=c>„ #DOUBLE LOW-9 QUOTATION MARK
0x85  <IMG SRC=X ALT=""> <tab to=c>… #HORIZONTAL ELLIPSIS
0x86  <IMG SRC=X ALT=""> <tab to=c>† #DAGGER
0x87  <IMG SRC=X ALT=""> <tab to=c>‡ #DOUBLE DAGGER
0x88  <IMG SRC=X ALT=""> <tab to=c>ˆ #MODIFIER LETTER CIRCUMFLEX ACCENT
0x89  <IMG SRC=X ALT=""> <tab to=c>‰ #PER MILLE SIGN
0x8a  <IMG SRC=X ALT=""> <tab to=c>Š #LATIN CAPITAL LETTER S WITH CARON
0x8b  <IMG SRC=X ALT=""> <tab to=c>‹ #SINGLE LEFT-POINTING ANGLE QUOTATION MARK
0x8c  <IMG SRC=X ALT=""> <tab to=c>Œ #LATIN CAPITAL LIGATURE OE
0x8d  <IMG SRC=X ALT="">  #NOT USED
0x8e  <IMG SRC=X ALT="">  #NOT USED
0x8f  <IMG SRC=X ALT="">  #NOT USED
0x90  <IMG SRC=X ALT="">  #NOT USED
0x91  <IMG SRC=X ALT=""> <tab to=c>‘ #LEFT SINGLE QUOTATION MARK
0x92  <IMG SRC=X ALT=""> <tab to=c>’ #RIGHT SINGLE QUOTATION MARK
0x93  <IMG SRC=X ALT=""> <tab to=c>“ #LEFT DOUBLE QUOTATION MARK
0x94  <IMG SRC=X ALT=""> <tab to=c>” #RIGHT DOUBLE QUOTATION MARK
0x95  <IMG SRC=X ALT=""> <tab to=c>• #BULLET
0x96  <IMG SRC=X ALT=""> <tab to=c>– #EN DASH
0x97  <IMG SRC=X ALT=""> <tab to=c>— #EM DASH
0x98  <IMG SRC=X ALT=""> <tab to=c>˜ #SMALL TILDE
0x99  <IMG SRC=X ALT=""> <tab to=c>™ #TRADE MARK SIGN
0x9a  <IMG SRC=X ALT=""> <tab to=c>š #LATIN SMALL LETTER S WITH CARON
0x9b  <IMG SRC=X ALT=""> <tab to=c>› #SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
0x9c  <IMG SRC=X ALT=""> <tab to=c>œ #LATIN SMALL LIGATURE OE
0x9d  <IMG SRC=X ALT="">  #NOT USED
0x9e  <IMG SRC=X ALT="">  #NOT USED
0x9f  <IMG SRC=X ALT=""> <tab to=c>Ÿ #LATIN CAPITAL LETTER Y WITH DIAERESIS
</PRE>
</BODY>
</HTML>
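The windows-1252 reinterpretation that Lynx applies to NCRs 128-159 can be sketched in Python. This is a minimal illustration under stated assumptions, not Lynx's code: the names decode_ncrs, repair_c1_ncr and c1_table are hypothetical, the windows-1252 mapping comes from Python's built-in cp1252 codec, and turning unassigned slots into U+FFFD is an assumed policy. Note that this codec maps 0x8E and 0x9E to Ž and ž, so it reports five unused slots (0x81, 0x8D, 0x8F, 0x90, 0x9D) where the test table lists seven.

```python
import re
import unicodedata

# Decimal (&#128;) or hexadecimal (&#x80;) numeric character references.
NCR_RE = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")


def repair_c1_ncr(value: int) -> str:
    """Return the character an NCR with this code point most likely meant.

    Values 128..159 are C1 control characters in Unicode, so (like Lynx)
    treat them as windows-1252 byte values instead. Assumption: slots that
    windows-1252 leaves unassigned become U+FFFD REPLACEMENT CHARACTER.
    """
    if 0x80 <= value <= 0x9F:
        try:
            return bytes([value]).decode("cp1252")
        except UnicodeDecodeError:  # unassigned in windows-1252
            return "\ufffd"
    if value > 0x10FFFF or 0xD800 <= value <= 0xDFFF:
        return "\ufffd"  # not a valid Unicode scalar value
    return chr(value)


def decode_ncrs(text: str) -> str:
    """Replace every NCR in text, reinterpreting the invalid C1 range."""
    def replace(match: re.Match) -> str:
        hex_digits, dec_digits = match.groups()
        value = int(hex_digits, 16) if hex_digits else int(dec_digits)
        return repair_c1_ncr(value)

    return NCR_RE.sub(replace, text)


def c1_table() -> list:
    """Rebuild a table like the test's: code, invalid NCR, valid NCR, name."""
    rows = []
    for value in range(0x80, 0xA0):
        char = repair_c1_ncr(value)
        if char == "\ufffd":
            rows.append(f"0x{value:02x}  &#{value};  #NOT USED")
        else:
            name = unicodedata.name(char)
            rows.append(f"0x{value:02x}  &#{value};  &#{ord(char)}; #{name}")
    return rows


if __name__ == "__main__":
    print(decode_ncrs("&#128; and &#8364; are both the euro sign"))
    for row in c1_table():
        print(row)
```

For example, decode_ncrs("&#128;") returns "€", the same character as the valid reference &#8364;. In practice, Python's standard html.unescape already follows the HTML5 rules for invalid character references, which remap this same range, so a hand-rolled decoder like this one is mainly useful for showing the mapping.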