README.org


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186

#+HTML_HEAD: <link rel="stylesheet" href="../../static/style.css">
#+HTML_HEAD: <link rel="icon" href="../../static/grus/favicon.png" type="image/png">
#+EXPORT_FILE_NAME: index
#+OPTIONS: toc:nil
#+TOC: headlines 3
#+TITLE: Grus

Grus is a simple word unjumbler written in Go.

| Project Home    | [[https://andinus.nand.sh/grus/][Grus]]           |
| Source Code     | [[https://tildegit.org/andinus/grus][Andinus / Grus]] |
| GitHub (Mirror) | [[https://github.com/andinus/grus][Grus - GitHub]]  |

*Tested on*:
- OpenBSD 6.6 (with /pledge/ & /unveil/)

* Documentation
| Demo Video  | System Information                 |
|-------------+------------------------------------|
| [[https://diode.zone/videos/watch/515e2528-a731-4c73-a0da-4f8da21a90c0][Grus v0.2.0]] | OpenBSD 6.6 (with /pledge/ & /unveil/) |

Grus stops the search as soon as it unjumbles the word, so no anagrams are
returned & maybe all dictionaries were not searched. However, this behaviour can
be changed with two environment variables documented below.

*Note*: If grus couldn't unjumble the word with first dictionary then it'll search
in next dictionary, search stops once the word gets unjumbled.

| Environment variable | Explanation                | Non-default values |
|----------------------+----------------------------+--------------------|
| =GRUS_SEARCH_ALL=      | Search in all dictionaries | 1 / true           |
| =GRUS_ANAGRAMS=        | Print all anagrams         | 1 / true           |
** Default Dictionaries
These files will be checked by default (in order).
- =/usr/local/share/dict/words=
- =/usr/local/share/dict/web2=
- =/usr/share/dict/words=
- =/usr/share/dict/web2=
- =/usr/share/dict/special/4bsd=
- =/usr/share/dict/special/math=
** Examples
#+BEGIN_SRC sh
# unjumble word
grus word

# print all anagrams
GRUS_ANAGRAMS=true grus word

# search for word in all dictionaries
GRUS_SEARCH_ALL=true grus word

# search for word in custom dictionaries too
grus word /path/to/dict1 /path/to/dict2

# search for word in all dictionaries
GRUS_SEARCH_ALL=1 grus word /path/to/dict1 /path/to/dict2

# search for word in all dictionaries & print all anagrams
GRUS_SEARCH_ALL=1 GRUS_ANAGRAMS=1 grus word
#+END_SRC
* Installation
** Pre-built binaries
Pre-built binaries are available for OpenBSD, FreeBSD, NetBSD, DragonFly BSD,
Linux & macOS.

This will just print the steps to install grus & you have to run those commands
manually. Piping directly to =sh= is not a good idea, don't run this unless you
understand what you're doing.
*** v0.2.0
#+BEGIN_SRC sh
curl -s https://tildegit.org/andinus/grus/raw/tag/v0.2.0/scripts/install.sh | sh
#+END_SRC
** Post install
You need to have a dictionary for grus to work, if you don't have one then you
can download the Webster's Second International Dictionary, all 234,936 words
worth. The 1934 copyright has lapsed.
#+BEGIN_SRC sh
curl -L -o /usr/local/share/dict/web2 \
     https://archive.org/download/grus-v0.2.0/web2
#+END_SRC

There is also another big dictionary with around half a million english words.
I'm not allowed to distribute it, you can get it directly from GitHub.
#+BEGIN_SRC sh
curl -o /usr/local/share/dict/words \
     https://raw.githubusercontent.com/dwyl/english-words/master/words.txt
#+END_SRC
* History
Initial version of Grus was just a simple shell script that used the slowest
method of unjumbling words, it checked every permutation of the word with all
words in the file with same length.

Later I rewrote the above logic in python, I wanted to use a better method. Next
version used logic similar to the current one. It still had to iterate through
all the words in the file but it eliminated lots of cases very quickly so it was
faster. It first used the length check then it used this little thing to match
the words.

#+BEGIN_SRC python
import collections

match = lambda s1, s2: collections.Counter(s1) == collections.Counter(s2)
#+END_SRC

I don't understand how it works but it's fast, faster than convert the string to
list & sorting the list. Actually I did that initially & you'll still find it in
grus-add script.

#+BEGIN_SRC python
lexical = ''.join(sorted(word))
if word == lexical:
    print(word)
#+END_SRC

This is equivalent to lexical.SlowSort in current version.

#+BEGIN_SRC go
package lexical

import (
	"sort"
	"strings"
)

// SlowSort returns string in lexical order. This function is slower
// than Lexical.
func SlowSort(word string) (sorted string) {
	// Convert word to a slice, sort the slice.
	t := strings.Split(word, "")
	sort.Strings(t)

	sorted = strings.Join(t, "")
	return
}
#+END_SRC

Next version was also in python & it was stupid, for some reason using a
database didn't cross my mind then. It sorted the word & then created a file
with name as lexical order of that word (if word is "test" then filename would
be "estt"), and it appended the word to that file.

It took user input & sorted the word, then it just had to print the file (if
word is "test" then it had to print "estt"). This was a lot faster than
iterating through all the words but we had to prepare the files before we could
do this.

This was very stupid because the dictionary I was using had around 1/2 million
words so this meant we got around half a million files, actually less than that
because anagrams got appended into a single file but it was still a lot of small
files. Handling that many small files is stupid.

I don't have previous versions of this program. I decided to rewrite this in Go,
this version does things differently & is faster than all previous versions.
Currently we first sort the word in lexical order, we do that by converting the
string to =[]rune= & sorting it, this is faster than lexical.SlowSort.
lexical.SlowSort converts the string to =[]string= & sorts it.

#+BEGIN_SRC go
package lexical

import "sort"

// Sort takes a string as input and returns the lexical order.
func Sort(word string) (sorted string) {
	// Convert the string to []rune.
	var r []rune
	for _, char := range word {
		r = append(r, char)
	}

	sort.Slice(r, func(i, j int) bool {
		return r[i] < r[j]
	})

	sorted = string(r)
	return
}
#+END_SRC

Instead of creating lots of small files, entries are stored in a sqlite3
database.

This was true till v0.1.0, v0.2.0 was rewritten & it dropped the use of database
or any form of pre-parsing the dictionary. Instead it would look through each
line of dictionary & unjumble the word, while this may be slower than previous
version but this is simpler.