# Actionable Plan: DSL Development Kit (dsk)

This document outlines the complete implementation plan for `dsk`, a command-line tool designed to streamline the creation of Domain-Specific Languages (DSLs).


## **Technical Requirements**

**Target Platforms:** macOS and Unix-like systems (Linux). Windows support is not included in the initial scope.

**Tree-sitter Version:** Use the latest version available through standard package managers (npm or Homebrew). This ensures easy installation and access to the most current features.

**Project Naming:** No specific constraints on DSL project names. The tool will handle standard naming conventions for C libraries and npm packages automatically.


## **Core Philosophy**

The `dsk` tool is built on two guiding principles:

1. **Convention over Configuration:** The tool enforces a standardized project structure and build process, eliminating boilerplate and ensuring consistency across all DSL projects.

2. **Example-Driven Inference:** Instead of requiring users to write complex regular expressions, the tool infers grammar rules from simple, intuitive examples of the DSL's syntax. This dramatically lowers the barrier to entry for language design.


## **Phase 0: `dsk` Tool Setup & Prerequisites**

**Objective:** Establish the development environment and project structure for the `dsk` command-line tool itself.

**Technology Stack:**

- **Language:** TypeScript

- **Runtime:** Bun (with Node.js as fallback)

- **CLI Framework:** `commander.js`

- **Core Utilities:**

  - `execa`: For running shell commands (like `tree-sitter`, `gcc`, `bun`, `npm`).

  - `fs-extra`: For robust file system operations.

  - `chalk`: For creating colorful and readable console output.

  - `chokidar`: For the file-watching system in `dsk dev`.

  - `inquirer`: For handling the interactive command-line prompts.

**Action Steps:**

1. **Initialize Project:** Create a new Bun project for the `dsk` tool.

       mkdir dsk-cli && cd dsk-cli
       bun init -y

2. **Install Dependencies:**

       bun add commander execa fs-extra chalk chokidar inquirer
       bun add -d typescript @types/node @types/fs-extra @types/inquirer

3. **Configure TypeScript:** Run `bunx tsc --init` to create a `tsconfig.json` file.

4. **Define Project Structure:** Create the following directory structure for the `dsk` tool's source code.

       dsk-cli/
       ├── src/
       │   ├── commands/         # Logic for each command (new, build, etc.)
       │   ├── utils/            # Helper functions (inference engine, shell wrappers)
       │   └── index.ts          # Main CLI entrypoint
       ├── templates/
       │   ├── default/          # Base template for a new DSL project
       │   └── js-addon/         # Template for the JS native addon
       ├── package.json
       └── tsconfig.json

5. **Enable `dsk` Command:** In `package.json`, add a `bin` entry to link the `dsk` command and run `bun link` to make it available during development.

       "bin": { "dsk": "./dist/index.js" }


## **Phase 1: Interactive Grammar Scaffolding (`dsk new --interactive`)**

**Objective:** Create a command that interactively prompts the user for examples of their DSL and generates a starter `grammar.js` file from them.


### **User Experience & Interaction Design**

This is the most critical UX component. The interaction should feel like a helpful conversation, not an interrogation.

- **Greeting:** Start with a friendly welcome that explains the process.

  > "Welcome to the `dsk` grammar scaffolder! I'll ask you a few questions about your new language. Just provide examples, and I'll build a starter grammar for you."

- **Clear, Non-Technical Prompts:** Phrase questions simply. Instead of "Define the token for a single-line comment," use "How do you write a single-line comment? (e.g., `//` or `#`)".

- **Progress Indicators:** Use `chalk` to show progress after each successful step, making the process feel tangible.

  > `✔ Comments defined.` `✔ Identifiers defined.`

- **Real-time Confirmation:** After inferring a pattern, always confirm with the user. This builds trust and provides an opportunity for correction.

  > `? I've inferred the pattern for your identifier as: /[a-zA-Z_]\w*/. Does this look correct? (Y/n)`

- **Graceful Failure & Escape Hatches:** If inference fails, don't crash. Explain the problem and offer a "power-user" alternative. The inference system is only a starting point; users are expected to edit the generated grammar by hand as needed.

  > `I couldn't determine a pattern from those examples. Would you like to provide a custom regular expression instead? (y/N)`

- **Summary & Next Steps:** Conclude the session with a summary of what was created and clear instructions on what to do next.

  > "All done! Your `grammar.js` has been created with rules for: Comments, Identifiers, Numbers, Strings, and Variable Declarations. To start editing and testing, run: `dsk dev`"


### **Action Steps**

1. **Implement the Inference Engine (`src/utils/inference.ts`):**

   - Create an extensible library of common token patterns (e.g., for identifiers, integers, floats, hex codes) with high-quality regular expressions. Ship with solid defaults but allow for user customization.

   - Write a function `inferPattern(validExamples, invalidExamples)` that:

     - Takes arrays of valid and invalid example strings.

     - Iterates through the internal pattern library.

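     - Returns the first pattern that matches all `validExamples` and none of the `invalidExamples`. If no pattern in the library qualifies, it returns no pattern so the caller can offer the custom-regex fallback.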

2. **Build the Interactive Command (`src/commands/new.ts`):**

   - Use the `inquirer` library to implement the **Question Flow** below, following the interaction guidelines in the UX section.

   - **Question Flow:**

     **Phase A: Language Architecture & Paradigm**

     - **Language Purpose:** "What is your language designed for? (e.g., configuration, scripting, domain modeling, data processing)"

     - **Programming Paradigm:** "What programming style does your language follow?"
       - Functional (immutable data, functions as first-class)
       - Object-Oriented (classes, inheritance, methods)
       - Procedural (step-by-step instructions, functions)
       - Declarative (describe what, not how)
       - Mixed (combination of above)

     - **Data Philosophy:** "How does your language handle data?"
       - Immutable by default (functional style)
       - Mutable variables (imperative style)
       - Mixed approach

     **Phase B: Core Language Features**

     - **Control Flow:** "What control structures does your language support?"
       - Conditionals (if/else, switch/match)
       - Loops (for, while, foreach)
       - Pattern matching
       - Exception handling (try/catch)
       - Early returns/breaks

     - **Data Structures:** "What built-in data structures does your language have?"
       - Arrays/Lists: `[1, 2, 3]`
       - Objects/Maps: `{key: value}`
       - Tuples: `(a, b, c)`
       - Sets: `{1, 2, 3}`
       - Custom structures

     - **Functions:** "How are functions defined in your language?"
       - Named functions: `function foo() { ... }`
       - Anonymous functions: `(x) => x + 1`
       - Methods on objects: `obj.method()`
       - First-class functions (can be passed around)

     **Phase C: Syntax & Tokens**

     - **Comments:** "How do you write a single-line comment? (e.g., `//` or `#`)"

     - **Identifiers:** "Provide 3-5 examples of a valid identifier (e.g., `myVar _val`)." -> "Now provide 2-3 examples of an invalid identifier (e.g., `1var my-var`)." -> Use the inference engine and confirm with the user.

     - **Numbers:** "Provide examples of numbers in your language (e.g., `42 3.14`)." -> Infer integer/float support.

     - **Strings:** "Provide an example of a string (e.g., `"hello"` or `'world'`)." -> Infer quote style.

     - **Variable Declarations:** "Show me how you declare a variable `x` with value `42`." -> Parse the example (e.g., `let x = 42;`) to identify the keyword (`let`), assignment operator (`=`), and statement terminator (`;`).

     - **Concrete Examples:** Based on the paradigm answers, ask for specific examples:
       - If OO: "Show me how you define a class with a method"
       - If Functional: "Show me how you define and call a function"
       - If Declarative: "Show me a typical declaration/rule in your language"

   - **Grammar Assembly:**

     - Collect the answers from the prompts.

     - **Paradigm-Aware Generation:** Use the architectural answers to influence grammar structure:
       - **Functional languages:** Prioritize expression-based rules, immutable bindings, function composition
       - **OO languages:** Generate class/method structures, inheritance syntax, member access
       - **Procedural languages:** Focus on statement sequences, mutable variables, imperative control flow
       - **Declarative languages:** Emphasize rule definitions, pattern matching, constraint syntax

     - **Feature-Driven Rules:** Generate Tree-sitter rules based on supported features:
       - Control flow structures (if/else, loops, match expressions)
       - Data structure literals (arrays, objects, tuples)
       - Function definition patterns (named, anonymous, methods)

     - Generate the corresponding Tree-sitter rule strings for each feature.

     - Assemble the rules into a complete, valid `grammar.js` file with appropriate precedence and associativity.

     - Save this generated grammar into the new project directory created by the standard `dsk new` logic.
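
The `inferPattern` function from step 1 could look roughly like the sketch below. The `TokenPattern` shape, the `PATTERN_LIBRARY` contents, and the choice to return `null` on failure are assumptions made for illustration; the plan does not fix any of them. Each candidate is anchored at both ends so that it has to cover the whole example rather than a substring.

```typescript
// Hypothetical shape for one entry in the pattern library.
interface TokenPattern {
  name: string;
  // Unanchored regex source, as it would appear inside grammar.js.
  source: string;
}

// A small illustrative library; order matters because the first match wins.
const PATTERN_LIBRARY: TokenPattern[] = [
  { name: "integer", source: "\\d+" },
  { name: "float", source: "\\d+\\.\\d+" },
  { name: "hex", source: "0[xX][0-9a-fA-F]+" },
  { name: "identifier", source: "[a-zA-Z_]\\w*" },
];

function inferPattern(
  validExamples: string[],
  invalidExamples: string[],
  library: TokenPattern[] = PATTERN_LIBRARY,
): TokenPattern | null {
  // With no valid examples every pattern would match vacuously, so bail out.
  if (validExamples.length === 0) return null;

  for (const candidate of library) {
    // Anchor the pattern so it must match the entire example.
    const whole = new RegExp(`^(?:${candidate.source})$`);
    const acceptsAllValid = validExamples.every((ex) => whole.test(ex));
    const rejectsAllInvalid = invalidExamples.every((ex) => !whole.test(ex));
    if (acceptsAllValid && rejectsAllInvalid) return candidate;
  }
  // Nothing fits: the caller can offer the custom-regex escape hatch.
  return null;
}

// The identifier examples from the question flow resolve to "identifier".
const identifier = inferPattern(["myVar", "_val"], ["1var", "my-var"]);
console.log(identifier?.name);

// Mixed integer and float examples match no single entry, so this is null.
const mixed = inferPattern(["42", "3.14"], []);
console.log(mixed);
```

The `null` result in the second call is the case where the prompt would fall back to asking for a custom regular expression. A real library would probably also need a combined number pattern, or would test integers and floats separately.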


## **Phase 2: Core Build Process (`dsk build`)**

**Objective:** Compile the `grammar.js` file into a C static library and a JavaScript package.

**Action Steps:**

1. **Implement the Build Command (`src/commands/build.ts`):**

   - Define the `dsk build` command using `commander`.

   - The command handler will orchestrate the following sub-tasks.

2. **Generate Parser:** Use `execa` to run `tree-sitter generate`. This is the first step and is required by all subsequent steps.

3. **Build C Library:**

   - Create `generated/c/lib` and `generated/c/include` directories.

   - Use `execa` to call the system's C compiler (e.g., `gcc` or `clang`) to compile `src/parser.c` into an object file. If the grammar has an external scanner, compile `src/scanner.c` the same way.

   - Use `execa` to call the system's archiver (`ar`) to bundle the object file(s) into a static library (`lib<dsl-name>.a`).

   - Generate a corresponding header file (`<dsl-name>.h`) with the function signature to access the language.

4. **Build JavaScript Package:**

   - Copy the template files from `dsk-cli/templates/js-addon/` into a `generated/js/` directory. This template contains:

     - `package.json`: Defines the JS package, uses a `__DSL_NAME__` placeholder for dynamic replacement, and includes `"gypfile": true` and an `"install": "node-gyp rebuild"` script.

     - `binding.gyp`: Build configuration for node-gyp. It specifies the sources, including `../../src/parser.c` and `../../src/scanner.c`, and includes the N-API headers.

     - `bindings.cc`: C++ bridge file that exports the Tree-sitter language function using node-addon-api.

     - `index.js`: Simple entry point that loads the compiled addon and exports the language object.

   - Dynamically update the `package.json` inside `generated/js/` with the correct DSL name, replacing the `__DSL_NAME__` placeholder.

   - **Implement Runtime Detection:**

     - Check if `bun` is available on the system's `PATH`.

     - If yes, set the package manager to `bun`.

     - If no, fall back to `npm`.

   - Use `execa` to run `bun install` or `npm install` within the `generated/js/` directory. This triggers `node-gyp` to compile the native addon, resulting in a `.node` file.

   - **Note:** The `dsk` tool itself targets Bun (with Node.js as a fallback), while the generated JavaScript packages support both the Bun and Node.js runtimes.
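
As a rough sketch of the C library step and the runtime detection, the functions below only compute what would be run and leave execution to `execa`. The function names, the `-fPIC` flag, the `parser.o` location, and the header layout are assumptions for illustration. The sketch also assumes a grammar without an external scanner and a DSL name that is a valid C identifier, since the parser generated by `tree-sitter generate` exports a function named `tree_sitter_<name>`.

```typescript
// One external command in argv form, ready to hand to a process runner.
interface Command {
  file: string;
  args: string[];
}

// Commands that turn src/parser.c into generated/c/lib/lib<dsl-name>.a.
function cLibraryCommands(dslName: string, compiler = "cc"): Command[] {
  const objectFile = "generated/c/lib/parser.o";
  return [
    { file: compiler, args: ["-c", "-fPIC", "-I", "src", "src/parser.c", "-o", objectFile] },
    { file: "ar", args: ["rcs", `generated/c/lib/lib${dslName}.a`, objectFile] },
  ];
}

// Contents of generated/c/include/<dsl-name>.h (hypothetical layout).
function cHeader(dslName: string): string {
  const guard = `TREE_SITTER_${dslName.toUpperCase()}_H_`;
  return [
    `#ifndef ${guard}`,
    `#define ${guard}`,
    "",
    "typedef struct TSLanguage TSLanguage;",
    "",
    "#ifdef __cplusplus",
    'extern "C" {',
    "#endif",
    "",
    `const TSLanguage *tree_sitter_${dslName}(void);`,
    "",
    "#ifdef __cplusplus",
    "}",
    "#endif",
    "",
    `#endif // ${guard}`,
    "",
  ].join("\n");
}

// Prefer bun when a `bun` executable is found on PATH, otherwise use npm.
// PATH is split on ":" because only macOS and Linux are in scope.
function pickPackageManager(
  pathEnv: string,
  isExecutable: (candidate: string) => boolean,
): "bun" | "npm" {
  const dirs = pathEnv.split(":").filter((dir) => dir.length > 0);
  return dirs.some((dir) => isExecutable(`${dir}/bun`)) ? "bun" : "npm";
}

const plan = cLibraryCommands("mydsl");
for (const cmd of plan) console.log([cmd.file, ...cmd.args].join(" "));
console.log(pickPackageManager("/usr/bin:/opt/bun/bin", (p) => p === "/opt/bun/bin/bun"));
```

Keeping these as pure functions makes the build plan easy to unit test. In the real command, `isExecutable` could wrap `fs.accessSync` with `fs.constants.X_OK`, and each `Command` would be passed to `execa`.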


## **Phase 3: Development Workflow (`dsk dev` & `dsk test`)**

**Objective:** Create a fast, iterative development loop for the DSL designer.

**Action Steps:**

1. **Implement Test Command (`src/commands/test.ts`):**

   - Create a simple wrapper that calls `tree-sitter test` using `execa`, ensuring its output is streamed directly to the user's console.

   - Include minimal integration tests for generated packages to verify basic functionality.

2. **Implement Dev Command (`src/commands/dev.ts`):**

   - Use the `chokidar` library to watch the `grammar.js` file for changes.

   - On initial startup, run a full `dsk build` and `dsk test` cycle and display the results.

   - When a change to `grammar.js` is detected:

     - Log a message to the console (e.g., "Change detected. Rebuilding...").

     - Trigger the `build` and `test` logic again.

     - Report success or failure clearly to the user.
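
One way to structure the rebuild loop is sketched below. The `Step`, `CycleResult`, `runCycle`, and `createRebuilder` names are hypothetical, and coalescing overlapping change events is an added design choice, not something the plan specifies. The idea is that a save arriving while a build is still running should trigger exactly one follow-up cycle and never start a second build in parallel.

```typescript
type Step = { name: string; run: () => Promise<void> };
type CycleResult =
  | { ok: true }
  | { ok: false; failedStep: string; message: string };

// Run the steps in order and stop at the first one that throws.
async function runCycle(steps: Step[]): Promise<CycleResult> {
  for (const step of steps) {
    try {
      await step.run();
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      return { ok: false, failedStep: step.name, message };
    }
  }
  return { ok: true };
}

// Returns a handler to call on every change event. If a cycle is already
// running, the change is remembered and one more cycle runs afterwards.
function createRebuilder(
  steps: Step[],
  report: (result: CycleResult) => void,
): () => Promise<void> {
  let running = false;
  let pending = false;

  return async function onChange(): Promise<void> {
    if (running) {
      pending = true;
      return;
    }
    running = true;
    try {
      do {
        pending = false;
        report(await runCycle(steps));
      } while (pending);
    } finally {
      running = false;
    }
  };
}

// Stand-in steps; the real ones would invoke the build and test logic.
const steps: Step[] = [
  { name: "build", run: async () => {} },
  { name: "test", run: async () => { throw new Error("1 test failed"); } },
];
const onChange = createRebuilder(steps, (r) =>
  console.log(r.ok ? "Build and tests passed." : `${r.failedStep} failed: ${r.message}`),
);
void onChange();
```

With `chokidar`, the handler would be attached with something like `chokidar.watch("grammar.js").on("change", onChange)`, and the same handler can be called once at startup for the initial build and test cycle.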


## **Phase 4: Editor Integration (`dsk highlight`)**

**Objective:** Generate syntax highlighting configurations for popular editors.

**Action Steps:**

1. **Implement Highlight Generation (`src/commands/highlight.ts`):**

   - **Generate Tree-sitter Highlight Queries:** Create `highlights.scm` files by mapping grammar rules to semantic categories (keywords, strings, comments, etc.).

   - **Editor Configuration Templates:** Provide configuration templates for:

     - **Neovim:** Tree-sitter configuration with `highlights.scm`
     - **Emacs:** `tree-sitter-mode` setup with language registration
     - **Micro:** Syntax file generation using Tree-sitter queries
     - **Helix:** Native Tree-sitter integration with `highlights.scm`
     - **Zed:** Tree-sitter language configuration
     - **VS Code:** TextMate grammar generation and basic language configuration

   - **Automatic Rule Mapping:** Map common grammar patterns to highlight categories:

     - Keywords (`if`, `let`, `function`) → `@keyword`
     - String literals → `@string`
     - Comments → `@comment`
     - Numbers → `@number`
     - Identifiers → `@variable`

2. **Output Structure:** Create `generated/editors/` directory with:

       generated/editors/
       ├── tree-sitter/
       │   └── highlights.scm          # Core highlight queries
       ├── neovim/
       │   └── setup-instructions.md   # Installation guide
       ├── emacs/
       │   └── dsl-mode.el            # Basic major mode
       ├── micro/
       │   └── dsl.yaml               # Micro syntax file
       ├── helix/
       │   └── languages.toml         # Helix language configuration
       ├── zed/
       │   └── config.json            # Zed language setup
       └── vscode/
           ├── syntaxes/
           │   └── dsl.tmLanguage.json # TextMate grammar
           └── language-configuration.json

3. **Optional Advanced Features (Future):**

   - **Basic LSP Server Template:** Tree-sitter-powered language server for symbol navigation and basic diagnostics
   - **Custom Linter Generator:** Use Tree-sitter queries to create pattern-based linters
   - **Grammar Documentation:** Auto-generate railroad diagrams and syntax examples

**Note:** This feature leverages Tree-sitter's existing editor ecosystem, keeping implementation complexity low while providing significant value.
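
The automatic rule mapping could be turned into a `highlights.scm` file along the lines of the sketch below. The `GrammarSummary` shape and the node names (`string`, `comment`, `number`, `identifier`) are assumptions about what the scaffolded grammar calls its rules. The output relies on two standard Tree-sitter query forms: a bracketed alternation of anonymous tokens for keywords, and a parenthesized node type followed by a capture.

```typescript
// Hypothetical summary of what the generator knows about a grammar.
interface GrammarSummary {
  // Anonymous tokens such as "if" or "let".
  keywords: string[];
  // Named node type mapped to its highlight capture name.
  nodeCaptures: Record<string, string>;
}

function generateHighlights(summary: GrammarSummary): string {
  const lines: string[] = [];

  if (summary.keywords.length > 0) {
    // Emit all keywords as a single alternation: [ "if" "let" ] @keyword
    lines.push("[");
    for (const keyword of summary.keywords) {
      // JSON.stringify supplies the double quotes and escapes for plain keywords.
      lines.push(`  ${JSON.stringify(keyword)}`);
    }
    lines.push("] @keyword", "");
  }

  // One pattern per named node: (comment) @comment
  for (const [nodeType, capture] of Object.entries(summary.nodeCaptures)) {
    lines.push(`(${nodeType}) @${capture}`);
  }

  return lines.join("\n") + "\n";
}

const highlights = generateHighlights({
  keywords: ["if", "let", "function"],
  nodeCaptures: {
    string: "string",
    comment: "comment",
    number: "number",
    identifier: "variable",
  },
});
console.log(highlights);
```

For the sample input, the output is a bracketed keyword list captured as `@keyword`, followed by `(string) @string`, `(comment) @comment`, `(number) @number`, and `(identifier) @variable`, matching the mapping table above.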


## **Phase 5: Packaging & Distribution (`dsk package`)**

**Objective:** Create final, distributable artifacts for the C and JavaScript targets.

**Action Steps:**

1. **Implement Package Command (`src/commands/package.ts`):**

   - Ensure artifacts are current by first running the full `dsk build` logic.

   - Create a top-level `dist/` directory.

   - **Package C Library:** Use a library like `archiver` to create a `.zip` file containing the `generated/c/include` and `generated/c/lib` directories.

   - **Package JS Library:**

     - Run `bun pm pack` or `npm pack` inside the `generated/js/` directory (prefer `bun pm pack` when Bun is available).

     - Move the resulting `.tgz` package from `generated/js/` into the top-level `dist/` directory.

   - Log a success message showing the final paths of the created package files.