<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<!-- 2024-03-22 Fri 14:08 -->
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>x86 Assembly from my understanding</title>
<meta name="author" content="Crystal" />
<meta name="generator" content="Org Mode" />
<link rel="stylesheet" type="text/css" href="../../src/css/colors.css"/>
<link rel="stylesheet" type="text/css" href="../../src/css/style.css"/>
<link rel="icon" type="image/x-icon" href="../../../favicon.png" />
</head>
<body>
<div id="org-div-home-and-up">
 <a accesskey="h" href=""> UP </a>
 |
 <a accesskey="H" href="https://crystal.tilde.institute/"> HOME </a>
</div><div id="content" class="content">
<h1 class="title">x86 Assembly from my understanding</h1>
<p>
Soooo this article (or maybe even a series of articles, who knows?) will be about x86 assembly, or rather, what I understood from it, and my road from the bottom up to (hopefully) a good level of understanding.
</p>
<div id="outline-container-orgb0eec26" class="outline-2">
<h2 id="orgb0eec26">Memory :</h2>
<div class="outline-text-2" id="text-orgb0eec26">
<p>
Memory is a sequence of octets (aka 8-bit bytes), each with a unique integer assigned to it called <b>The Effective Address (EA)</b>. In this particular CPU architecture (the i8086), an octet is designated by a pair: a segment number and an offset within that segment.
</p>


<ul class="org-ul">
<li>A segment is a block of 64 consecutive Koctets (1 Koctet = 1024 octets, so 65536 octets in total).</li>
<li>The offset specifies one particular octet within that segment.</li>
</ul>

<p>
The offset and the segment are each encoded in 16 bits, so each takes a value between 0 and 65535 (0000h to FFFFh).
</p>
</div>
<div id="outline-container-org57cd217" class="outline-4">
<h4 id="org57cd217">Important :</h4>
<div class="outline-text-4" id="text-org57cd217">
<p>
The relation between the Effective Address and the Segment &amp; Offset is as follow :
</p>

<p>
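<b>Effective address = 16 x segment + offset</b>. Keep in mind that this equation is written in decimal; since we will use hexadecimal from now on (by convention), the same formula becomes <b>10h x segment + offset</b>. Multiplying by 10h simply shifts the segment one hex digit to the left.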
</p>
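<p>
To double-check the formula, here is a tiny Python model of it (Python is only used to do the arithmetic; the function name <b>physical_address</b> is made up for this sketch). The result is kept to 20 bits, since the 8086 only has 20 address lines, which is why anything past FFFFFh wraps around to 0.
</p>
<div class="org-src-container">
<pre class="src src-python">def physical_address(segment, offset):
    """Model of the 8086 rule: physical = 10h * segment + offset (20-bit result)."""
    if segment not in range(0x10000) or offset not in range(0x10000):
        raise ValueError("segment and offset must both be 16-bit values")
    # Multiplying by 10h (16) shifts the segment left by one hex digit;
    # the modulo keeps only 20 bits, like the 8086's 20 address lines.
    return (segment * 0x10 + offset) % 0x100000

print(hex(physical_address(0x1230, 0x0045)))  # 0x12345
print(hex(physical_address(0xFFFF, 0x0010)))  # 0x0 (FFFF:0010 wraps around)
</pre>
</div>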
</div>
<ul class="org-ul">
<li><a id="orgcbdf7c0"></a>Example :<br />
<div class="outline-text-5" id="text-orgcbdf7c0">
<p>
Take the physical address <b>12345h</b> (or Effective Address, as this post calls it; strictly speaking, Intel&rsquo;s manuals reserve &ldquo;effective address&rdquo; for the offset part). The h suffix means hexadecimal, and the same number can also be written <b>0x12345</b>. Let the register <b>DS = 1230h</b> and the register <b>SI = 0045h</b>. The CPU calculates the physical address by multiplying the content of the segment register <b>DS</b> by 10h (16) and adding the content of the register <b>SI</b>, so we get: <b>1230h x 10h + 45h = 12300h + 45h = 12345h</b>
</p>


<p>
Now if you are a clever one (I know you are, since you are reading this &lt;3) you may say that the physical address <b>12345h</b> can be written in more than one way&#x2026; and you are right! More precisely, in <b>2<sup>12</sup> = 4096</b> different ways!!!
</p>
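<p>
Don&rsquo;t trust me on the 4096? A brute-force Python check (with a made-up helper, <b>segment_offset_pairs</b>) tries every 16-bit segment and keeps the ones whose offset still fits in 16 bits. Any address from FFF0h up to FFFFFh gets exactly 4096 pairs; lower addresses get fewer, because the segment can&rsquo;t go below 0.
</p>
<div class="org-src-container">
<pre class="src src-python">def segment_offset_pairs(physical):
    """List every (segment, offset) pair, both 16-bit, that maps to `physical`."""
    pairs = []
    for segment in range(0x10000):
        offset = physical - segment * 0x10
        if offset in range(0x10000):  # the offset must fit in 16 bits
            pairs.append((segment, offset))
    return pairs

pairs = segment_offset_pairs(0x12345)
print(len(pairs))                                           # 4096
print("%04X:%04X" % pairs[0], "%04X:%04X" % pairs[-1])      # 0235:FFF5 1234:0005
print((0x1230, 0x0045) in pairs)                            # True
</pre>
</div>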
</div>
</li>
</ul>
</div>
<div id="outline-container-org758f05f" class="outline-3">
<h3 id="org758f05f">Registers</h3>
<div class="outline-text-3" id="text-org758f05f">
<p>
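The 8086 CPU has 14 registers, each 16 bits wide. From the user&rsquo;s point of view, it has 3 groups of 4 registers of 16 bits (the general registers AX, BX, CX, DX; the pointer and index registers SP, BP, SI, DI; and the segment registers CS, DS, SS, ES), one status register (FLAGS) in which 9 bits are used as flags, and a 16-bit program counter, the instruction pointer (IP), which holds the offset of the next instruction and can&rsquo;t be read or written directly by the program (only jumps, calls, returns and interrupts change it).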
</p>
</div>
<div id="outline-container-orgbb2bc8f" class="outline-4">
<h4 id="orgbb2bc8f">General Registers</h4>
<div class="outline-text-4" id="text-orgbb2bc8f">
<p>
General registers are used for arithmetic, logic, and addressing.
</p>


<p>
Each half of AX, BX, CX and DX is also accessible as an 8-bit register (AH/AL, BH/BL, CH/CL, DH/DL), which made it easier to port programs written for the 8-bit 8080 (the 8086 can&rsquo;t run 8080 machine code as-is, though).
</p>
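<p>
In other words, AH is the high byte of AX and AL is its low byte, so writing one half leaves the other one untouched. A small Python model of that split (the helpers <b>split16</b> and <b>join16</b> are hypothetical):
</p>
<div class="org-src-container">
<pre class="src src-python">def split16(value):
    """Return the (high, low) bytes of a 16-bit value, e.g. AX -> (AH, AL)."""
    return value // 0x100, value % 0x100

def join16(high, low):
    """Rebuild a 16-bit register from its halves, e.g. (AH, AL) -> AX."""
    return high * 0x100 + low

ah, al = split16(0x1234)
print(hex(ah), hex(al))        # 0x12 0x34
print(hex(join16(0xAB, al)))   # 0xab34: writing AH = ABh leaves AL alone
</pre>
</div>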


<p>
Now here are the Registers we can find in this section:
</p>


<p>
<b>AX</b>: This is the accumulator. It is 16 bits wide and is divided into two 8-bit registers, AH and AL, for 8-bit instructions. It is generally used for arithmetic and logic instructions, but on the 8086 the accumulator doesn&rsquo;t have to be the destination operand. Example:
</p>
<div class="org-src-container">
<pre class="src src-asm"><span style="color: #89b4fa;">ADD</span> <span style="color: #cba6f7;">AX</span>, AX <span style="color: #6c7086;">;</span><span style="color: #6c7086;">(AX = AX + AX)</span>
</pre>
</div>

<p>
<b>BX</b>: This is the base register. It is 16 bits wide and is divided into two 8-bit registers, BH and BL, for 8-bit instructions. It is commonly used to hold an offset (a base address) for memory accesses. Example:
</p>
<div class="org-src-container">
<pre class="src src-asm"><span style="color: #89b4fa;">MOV</span> <span style="color: #cba6f7;">BX</span>, 0500h <span style="color: #6c7086;">; BX holds an offset (0500h)</span>
<span style="color: #89b4fa;">MOV</span> <span style="color: #cba6f7;">AL</span>, [BX]  <span style="color: #6c7086;">; AL = byte stored at DS:0500h</span>
</pre>
</div>

<p>
<b>CX</b>: This is the counter register. It is 16 bits wide and is divided into two 8-bit registers, CH and CL, for 8-bit instructions. It is used as the counter for loops (LOOP) and, through CL, as the count for shifts and rotations. Example:
</p>
<div class="org-src-container">
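<pre class="src src-asm"><span style="color: #89b4fa;">MOV</span> <span style="color: #cba6f7;">CX</span>, <span style="color: #fab387;">0005</span> <span style="color: #6c7086;">; run the loop body 5 times</span>
<span style="color: #89b4fa;">again:</span> <span style="color: #cba6f7;">INC</span> AX <span style="color: #6c7086;">; loop body</span>
<span style="color: #89b4fa;">LOOP</span> <span style="color: #cba6f7;">again</span> <span style="color: #6c7086;">; CX = CX - 1, jump back to again while CX != 0</span>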
</pre>
</div>
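<p>
LOOP first decrements CX and only then checks whether it reached 0, which has a fun consequence: if CX starts at 0, the body runs 65536 times, not zero times (JCXZ is the usual guard against that). A Python model of the counting, with a hypothetical <b>loop_iterations</b> helper:
</p>
<div class="org-src-container">
<pre class="src src-python">def loop_iterations(cx):
    """Count how many times a body ending in LOOP runs, starting with CX = cx."""
    iterations = 0
    while True:
        iterations += 1              # the loop body executes
        cx = (cx - 1) % 0x10000      # LOOP: CX = CX - 1 (16-bit wrap-around)
        if cx == 0:                  # ...and falls through once CX reaches 0
            return iterations

print(loop_iterations(5))  # 5
print(loop_iterations(0))  # 65536: CX wraps from 0 to FFFFh first
</pre>
</div>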

<p>
<b>DX</b>: This is the data register. It is 16 bits wide and is divided into two 8-bit registers, DH and DL, for 8-bit instructions. It is used in multiplication and division (it holds the upper half of the 32-bit value) and for input/output port addressing. Example:
</p>
<div class="org-src-container">
<pre class="src src-asm"><span style="color: #89b4fa;">MUL</span> <span style="color: #cba6f7;">BX</span> <span style="color: #6c7086;">; DX:AX = AX * BX (high word in DX, low word in AX)</span>
</pre>
</div>
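<p>
Since AX x BX can need up to 32 bits, the 8086 splits the unsigned product across two registers. A Python sketch of <b>MUL BX</b> (the <b>mul16</b> name is made up, and the sketch ignores the flags):
</p>
<div class="org-src-container">
<pre class="src src-python">def mul16(ax, bx):
    """Model of the unsigned 8086 `MUL BX`: DX:AX = AX * BX."""
    product = ax * bx                       # up to 32 bits wide
    dx, ax = product // 0x10000, product % 0x10000
    return dx, ax                           # high word, low word

dx, ax = mul16(0x1234, 0x0100)
print(hex(dx), hex(ax))  # 0x12 0x3400
</pre>
</div>
<p>
On the real CPU, MUL also sets CF and OF when the high half (DX) is non-zero, which is a cheap way to check whether the result still fits in AX alone.
</p>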
</div>
</div>
</div>
<div id="outline-container-org6de3be7" class="outline-3">
<h3 id="org6de3be7">Addressing and registers&#x2026;again</h3>
<div class="outline-text-3" id="text-org6de3be7">
</div>
<div id="outline-container-orgfe32dc7" class="outline-4">
<h4 id="orgfe32dc7">I realized what I wrote here before was almost gibberish, sooo here we go again I guess ?</h4>
<div class="outline-text-4" id="text-orgfe32dc7">
<p>
Well, let&rsquo;s take a step back to the notion of effective (physical) addresses vs. relative ones (offsets).
</p>
</div>
</div>
<div id="outline-container-org471cf7b" class="outline-4">
<h4 id="org471cf7b">Effective = 10h x Segment + Offset, Part 1</h4>
<div class="outline-text-4" id="text-org471cf7b">
<p>
When trying to access a specific memory location, we use the notation <b>[Segment:Offset]</b>. For example, assuming <b>DS = 0100h</b>, we want to write the value <b>0x0005</b> to the memory location at physical address <b>1234h</b>. What do we do?
</p>
</div>
<ul class="org-ul">
<li><a id="orgf021c83"></a>Answer :<br />
<div class="outline-text-5" id="text-orgf021c83">
<div class="org-src-container">
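<pre class="src src-asm"><span style="color: #89b4fa;">MOV</span> WORD PTR [DS:0234h], 0x0005 <span style="color: #6c7086;">; WORD PTR: store a 16-bit value (the size can't be guessed from [DS:0234h] alone)</span>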
</pre>
</div>

<p>
Why ? Let&rsquo;s break it down :
</p>



<div id="orgf6af9f9" class="figure">
<p><img src="../../src/gifs/lain-dance.gif" alt="lain-dance.gif" />
</p>
</div>


<p>
We already know that <b>Effective = 10h x Segment + Offset</b>, so here we have <b>1234h = 10h x DS + Offset</b>. Since <b>DS = 0100h</b>, we end up with the simple equation <b>1234h = 1000h + Offset</b>, therefore the offset is <b>0234h</b>.
</p>
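<p>
The same rearrangement as a Python sketch (the <b>offset_for</b> helper is hypothetical), including the check that the offset actually fits in 16 bits; if it doesn&rsquo;t, that segment simply can&rsquo;t reach the address.
</p>
<div class="org-src-container">
<pre class="src src-python">def offset_for(physical, segment):
    """Solve physical = 10h * segment + offset for the offset."""
    offset = physical - segment * 0x10
    if offset not in range(0x10000):
        raise ValueError("this address is not reachable from that segment")
    return offset

print("%04Xh" % offset_for(0x1234, 0x0100))  # 0234h
</pre>
</div>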


<p>
Simple, right? Now for another example.
</p>
</div>
</li>
</ul>
</div>
<div id="outline-container-org87609ad" class="outline-4">
<h4 id="org87609ad">Another example :</h4>
<div class="outline-text-4" id="text-org87609ad">
<p>
What if we now have this instruction ?
</p>
<div class="org-src-container">
<pre class="src src-asm">    <span style="color: #cba6f7;">MOV</span> [0234h], 0x0005
</pre>
</div>
<p>
What does it do? You might or might not be surprised that it does the exact same thing as the previous snippet. Why though? Because when an instruction accesses data in memory without naming a segment, the CPU itself uses <b>DS</b> by default (it is not the assembler guessing). So if you don&rsquo;t specify a segment, or a register (we will get to this later), the offset is treated as an offset within the DS segment.
</p>
</div>
</div>
<div id="outline-container-org6254e46" class="outline-4">
<h4 id="org6254e46">Segment + Register &lt;3</h4>
<div class="outline-text-4" id="text-org6254e46">
<p>
Consider <b>DS = 0100h</b> and <b>BX = BP = 0234h</b> and this code snippet:
</p>
<div class="org-src-container">
<pre class="src src-asm">    <span style="color: #cba6f7;">MOV</span> [BX], 0x0005 <span style="color: #6c7086;">; </span><span style="color: #a6e3a1; font-weight: bold;">NOTE</span><span style="color: #6c7086;"> : ITS NOT THE SAME AS MOV BX, 0x0005. Refer to earlier paragraphs</span>
</pre>
</div>


<p>
Well, you guessed it right: it also does the same thing. But now consider this:
</p>
<div class="org-src-container">
<pre class="src src-asm">    <span style="color: #cba6f7;">MOV</span> [BP], 0x0005
</pre>
</div>

<p>
If you answered that it&rsquo;s the same thing, you are wrong. As I hinted before, the default segment depends on which register holds the offset: [BX] uses DS, but [BP] uses SS, so this one writes to SS:0234h instead. Here is the explicit equivalent of the two instructions above:
</p>
<div class="org-src-container">
<pre class="src src-asm">    <span style="color: #cba6f7;">MOV</span> [DS:BX], 0x0005
    <span style="color: #cba6f7;">MOV</span> [SS:BP], 0x0005
</pre>
</div>

<p>
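The general rule of thumb is as follows:
</p>
<ul class="org-ul">
<li>If the offset is a direct address or uses BX, SI or DI, the segment used is DS.</li>
<li>If it uses BP (alone or combined, like [BP+SI]), the segment is SS. SP is paired with SS too, but only implicitly, by stack instructions such as PUSH and POP ([SP] is not a valid 8086 addressing mode).</li>
<li>Instructions are always fetched from CS:IP, and string instructions such as MOVSB write to ES:DI.</li>
</ul>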
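<p>
Here is the rule of thumb as a Python sketch, applied to the <b>[BX]</b> and <b>[BP]</b> example. <b>DS = 0100h</b> comes from the example above, but <b>SS = 0200h</b> is a made-up value (in a real .COM program SS would equal DS); the helper names are hypothetical, and segment overrides and string instructions are ignored.
</p>
<div class="org-src-container">
<pre class="src src-python">def default_segment(offset_registers):
    """Default segment for a memory operand built from these registers
    (rule of thumb only: overrides and string instructions are ignored)."""
    return "SS" if "BP" in offset_registers else "DS"

def physical_address(segment, offset):
    return (segment * 0x10 + offset) % 0x100000   # 20-bit result

segments = {"DS": 0x0100, "SS": 0x0200}   # SS = 0200h is a made-up value
registers = {"BX": 0x0234, "BP": 0x0234}

for reg in ("BX", "BP"):
    seg = default_segment([reg])
    addr = physical_address(segments[seg], registers[reg])
    print("[%s] -> %s:%04Xh -> %05Xh" % (reg, seg, registers[reg], addr))
# [BX] -> DS:0234h -> 01234h
# [BP] -> SS:0234h -> 02234h
</pre>
</div>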
</div>
<ul class="org-ul">
<li><a id="orgb5208d9"></a>Note<br />
<div class="outline-text-5" id="text-orgb5208d9">
<p>
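When DOS loads a .COM program (the kind built with <b>org 100h</b>, as in this post), it sets CS, DS, ES and SS to the same segment before the program starts, so these segments are implicit. AKA: if we want to access some data in memory, we just need to specify its offset. (For .EXE programs, DOS does not point DS at your data, so the program usually loads DS itself.) Also, you can&rsquo;t load an immediate value directly into a segment register, and CS can&rsquo;t be the destination of a MOV at all, so something like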
</p>
<div class="org-src-container">
<pre class="src src-asm"><span style="color: #89b4fa;">MOV</span> <span style="color: #cba6f7;">DS</span>, 0x0005 <span style="color: #6c7086;">; INVALID: no immediate-to-segment-register MOV</span>
<span style="color: #89b4fa;">MOV</span> <span style="color: #cba6f7;">AX</span>, 0x0005 <span style="color: #6c7086;">; VALID: load the value into a general register first...</span>
<span style="color: #89b4fa;">MOV</span> <span style="color: #cba6f7;">DS</span>, AX <span style="color: #6c7086;">; ...then copy it into DS</span>
</pre>
</div>
</div>
</li>
</ul>
</div>
</div>
</div>
<div id="outline-container-org660d1f4" class="outline-2">
<h2 id="org660d1f4">The ACTUAL thing :</h2>
<div class="outline-text-2" id="text-org660d1f4">
<p>
Enough technical rambling; now we shall go to the fun part, the ACTUAL CODE. But first, some names you should be familiar with:
</p>

<ul class="org-ul">
<li><b>Mnemonics</b>: or <b>Instructions</b>, are the&#x2026;well&#x2026;instructions executed by the CPU, like <b>MOV</b>, <b>ADD</b>, <b>MUL</b>, etc. The assembler treats them as case <b>insensitive</b>, but I like them better in UPPERCASE.</li>
<li><b>Operands</b>: these are the arguments passed to an instruction, like <b>dst</b> and <b>src</b> in <b>MOV dst, src</b>. They can be a register, a memory location, a variable, or an immediate value.</li>
</ul>
</div>
<div id="outline-container-orgf7c0650" class="outline-3">
<h3 id="orgf7c0650">Structure of an assembly program :</h3>
<div class="outline-text-3" id="text-orgf7c0650">
<p>
While there is no &ldquo;standard&rdquo; structure, I prefer to go with this one:
</p>

<div class="org-src-container">
<pre class="src src-asm">    <span style="color: #cba6f7;">org</span> 100h
<span style="color: #cba6f7;">.data</span>
                                <span style="color: #6c7086;">; </span><span style="color: #6c7086;">variables and constants</span>

<span style="color: #cba6f7;">.code</span>
                                <span style="color: #6c7086;">; </span><span style="color: #6c7086;">instructions</span>
</pre>
</div>
</div>
</div>
<div id="outline-container-orgc1466ec" class="outline-3">
<h3 id="orgc1466ec">MOV dst, src</h3>
<div class="outline-text-3" id="text-orgc1466ec">
<p>
The MOV instruction copies the second operand (src) into the first operand (dst). The source can be a memory location, an immediate value, or a register; the destination can be a register or a memory location. Both operands must have the same size, and at most one of them can be a memory location (there is no memory-to-memory MOV).
</p>


<p>
These types of operands are supported:
</p>
<div class="org-src-container">
<pre class="src src-asm"><span style="color: #89b4fa;">MOV</span> <span style="color: #cba6f7;">REG</span>, memory
<span style="color: #89b4fa;">MOV</span> <span style="color: #cba6f7;">memory</span>, REG
<span style="color: #89b4fa;">MOV</span> <span style="color: #cba6f7;">REG</span>, REG
<span style="color: #89b4fa;">MOV</span> <span style="color: #cba6f7;">memory</span>, immediate
<span style="color: #89b4fa;">MOV</span> <span style="color: #cba6f7;">REG</span>, immediate
</pre>
</div>
<p>
<b>REG</b>: AX, BX, CX, DX, AH, AL, BL, BH, CH, CL, DH, DL, DI, SI, BP, SP.
</p>

<p>
<b>memory</b>: [BX], [BX+SI+7], variable
</p>

<p>
<b>immediate</b>: 5, -24, 3Fh, 10001101b
</p>


<p>
For segment registers, only these types of MOV are supported:
</p>
<div class="org-src-container">
<pre class="src src-asm"><span style="color: #89b4fa;">MOV</span> <span style="color: #cba6f7;">SREG</span>, memory
<span style="color: #89b4fa;">MOV</span> <span style="color: #cba6f7;">memory</span>, SREG
<span style="color: #89b4fa;">MOV</span> <span style="color: #cba6f7;">REG</span>, SREG
<span style="color: #89b4fa;">MOV</span> <span style="color: #cba6f7;">SREG</span>, REG
</pre>
</div>
<p>
<b>SREG</b>: DS, ES, SS, and, only as the second operand, CS.
</p>

<p>
<b>REG</b>: AX, BX, CX, DX, DI, SI, BP, SP (16-bit registers only, since segment registers are 16 bits wide).
</p>

<p>
<b>memory</b>: [BX], [BX+SI+7], variable
</p>
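<p>
The two tables above boil down to a few rules, sketched here as a toy Python checker. The operand kinds (&ldquo;reg&rdquo;, &ldquo;sreg&rdquo;, &ldquo;mem&rdquo;, &ldquo;imm&rdquo;) are labels invented for this sketch, and it only checks which combinations are allowed, not operand sizes.
</p>
<div class="org-src-container">
<pre class="src src-python"># Operand kinds: "reg" (general register), "sreg" (segment register),
# "mem" (memory location), "imm" (immediate value).
ALLOWED = {
    ("reg", "mem"), ("mem", "reg"), ("reg", "reg"),
    ("mem", "imm"), ("reg", "imm"),
    ("sreg", "mem"), ("mem", "sreg"), ("reg", "sreg"), ("sreg", "reg"),
}

def mov_is_valid(dst_kind, src_kind, dst_name=""):
    """Check the combination used in `MOV dst, src` against the tables above."""
    if dst_name.upper() == "CS":
        return False              # CS is only allowed as the second operand
    return (dst_kind, src_kind) in ALLOWED

print(mov_is_valid("mem", "mem"))          # False: no memory-to-memory MOV
print(mov_is_valid("sreg", "imm", "DS"))   # False: MOV DS, 0x0005
print(mov_is_valid("sreg", "reg", "DS"))   # True:  MOV DS, AX
print(mov_is_valid("sreg", "reg", "CS"))   # False: MOV CS, AX
</pre>
</div>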
</div>
<div id="outline-container-org508d45b" class="outline-4">
<h4 id="org508d45b">Note : The MOV instruction <b>cannot</b> be used to set the value of the CS and IP registers</h4>
</div>
</div>
<div id="outline-container-orgb475e10" class="outline-3">
<h3 id="orgb475e10">Variables :</h3>
<div class="outline-text-3" id="text-orgb475e10">
<p>
Let&rsquo;s say you want to use a specific value multiple times in your code, do you prefer to call it using something like <b>var1</b> or <b>E4F9:0011</b> ? If your answer is the second option, you can gladly skip this section, or even better, seek therapy.
</p>

<p>
Anyways, we have two types of variables: <b>bytes</b> and <b>words (which are two bytes)</b>. To define a variable, we use the following syntax:
</p>

<div class="org-src-container">
<pre class="src src-asm"><span style="color: #89b4fa;">name</span> <span style="color: #cba6f7;">DB</span> value <span style="color: #6c7086;">; </span><span style="color: #6c7086;">To Define a Byte</span>
<span style="color: #89b4fa;">name</span> <span style="color: #cba6f7;">DW</span> value <span style="color: #6c7086;">; </span><span style="color: #6c7086;">To Define a Word</span>
</pre>
</div>

<ul class="org-ul">
<li><b>name</b> - can be any combination of letters and digits, though it should start with a letter. It&rsquo;s possible to declare unnamed variables by not specifying the name (such a variable has an address but no name).</li>
<li><b>value</b> - can be any numeric value in any supported numbering system (hexadecimal, binary, or decimal), or the &ldquo;?&rdquo; symbol for variables that are not initialized.</li>
</ul>
</div>
<div id="outline-container-org3d5d0c5" class="outline-4">
<h4 id="org3d5d0c5">Example code :</h4>
<div class="outline-text-4" id="text-org3d5d0c5">
<div class="org-src-container">
<pre class="src src-asm">    <span style="color: #cba6f7;">org</span> 100h
    <span style="color: #cba6f7;">.data</span>
    <span style="color: #cba6f7;">x</span> db <span style="color: #fab387;">33</span>
    <span style="color: #cba6f7;">y</span> dw 1350h

    <span style="color: #cba6f7;">.code</span>
    <span style="color: #cba6f7;">MOV</span> AL, x
    <span style="color: #cba6f7;">MOV</span> BX, y
</pre>
</div>
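<p>
One thing this example hides: the 8086 is little-endian, so the word <b>y = 1350h</b> is stored low byte first (50h, then 13h). A Python model of the bytes the .data section above would contain, assuming <b>y</b> is placed right after <b>x</b> with no padding:
</p>
<div class="org-src-container">
<pre class="src src-python"># Bytes of the .data section above: x db 33 followed by y dw 1350h.
x = (33).to_bytes(1, "little")        # one byte: 21h (33 in decimal)
y = (0x1350).to_bytes(2, "little")    # one word: low byte first
data = x + y
print(data.hex(" "))                  # 21 50 13

# MOV BX, y reads the two bytes back in the same order.
bx = int.from_bytes(data[1:3], "little")
print(hex(bx))                        # 0x1350
</pre>
</div>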
</div>
</div>
<div id="outline-container-orgd4e5244" class="outline-4">
<h4 id="orgd4e5244">Arrays :</h4>
<div class="outline-text-4" id="text-orgd4e5244">
<p>
We can also define arrays instead of single values, using comma-separated values, like this for example:
</p>
<div class="org-src-container">
<pre class="src src-asm">    <span style="color: #cba6f7;">a</span> db 48h, 65h, 6Ch, 6Fh, 00H
    <span style="color: #cba6f7;">b</span> db 'Hello', <span style="color: #fab387;">0</span>
</pre>
</div>

<p>
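Surprise surprise, the arrays a and b are identical! The reason behind it is that characters are converted to their ASCII values before being stored in memory!!! Wonderful, right? And guess what, accessing values in assembly LOOKS JUST LIKE C!!! (With one catch: the number in brackets is a byte offset from the start of the variable, not an element index, which only makes a difference for DW arrays.)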
</p>
<div class="org-src-container">
<pre class="src src-asm">    <span style="color: #cba6f7;">MOV</span> AL, a[<span style="color: #fab387;">0</span>] <span style="color: #6c7086;">; </span><span style="color: #6c7086;">Copies 48h to AL</span>
    <span style="color: #cba6f7;">MOV</span> BL, b[<span style="color: #fab387;">0</span>] <span style="color: #6c7086;">; </span><span style="color: #6c7086;">Also Copies 48h to BL</span>
</pre>
</div>
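<p>
Python uses the same ASCII codes, so it can list the bytes that <b>b db 'Hello', 0</b> produces (a quick check, not assembler output):
</p>
<div class="org-src-container">
<pre class="src src-python">b = "Hello".encode("ascii") + bytes([0])   # b db 'Hello', 0
print(b.hex(" "))                          # 48 65 6c 6c 6f 00
print(hex(b[0]))                           # 0x48, like MOV BL, b[0]
</pre>
</div>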
<p>
You can also use any of the memory index registers BX, SI, DI, BP (keep in mind that BP uses the SS segment by default, as seen above), for example:
</p>
<div class="org-src-container">
<pre class="src src-asm"><span style="color: #89b4fa;">MOV</span> <span style="color: #cba6f7;">SI</span>, <span style="color: #fab387;">3</span>
<span style="color: #89b4fa;">MOV</span> <span style="color: #cba6f7;">AL</span>, a[SI]
</pre>
</div>
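<p>
One difference from C is worth spelling out: the number in brackets (or in SI) is a byte offset from the start of the variable, not an element index. For DB arrays that&rsquo;s the same thing, but for a DW array the n-th word (counting from 0) starts at byte 2 x n. A Python model with a hypothetical array <b>w dw 1111h, 2222h, 3333h</b>:
</p>
<div class="org-src-container">
<pre class="src src-python"># Hypothetical array: w dw 1111h, 2222h, 3333h (stored little-endian)
memory = b"".join(v.to_bytes(2, "little") for v in (0x1111, 0x2222, 0x3333))

def read_word(mem, byte_offset):
    """What MOV AX, w[byte_offset] would load: two bytes, low byte first."""
    return int.from_bytes(mem[byte_offset:byte_offset + 2], "little")

print(hex(read_word(memory, 2)))   # 0x2222: w[2] is the 2nd element
print(hex(read_word(memory, 1)))   # 0x2211: w[1] straddles two elements
</pre>
</div>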

<p>
If you need to declare a large array, you can use the <b>DUP</b> operator.
The syntax for <b>DUP</b> is:
</p>

<p>
number DUP ( value(s) )
</p>
<ul class="org-ul">
<li><b>number</b> - number of duplicates to make (any constant value).</li>
<li><b>value</b> - expression that DUP will duplicate.</li>
</ul>

<p>
For example:
</p>
<div class="org-src-container">
<pre class="src src-asm"><span style="color: #89b4fa;">c</span> <span style="color: #cba6f7;">DB</span> <span style="color: #fab387;">5</span> DUP(<span style="color: #fab387;">9</span>)
<span style="color: #6c7086;">;</span><span style="color: #6c7086;">is an alternative way of declaring:</span>
<span style="color: #89b4fa;">c</span> <span style="color: #cba6f7;">DB</span> <span style="color: #fab387;">9</span>, <span style="color: #fab387;">9</span>, <span style="color: #fab387;">9</span>, <span style="color: #fab387;">9</span>, <span style="color: #fab387;">9</span>
</pre>
</div>
<p>
One more example:
</p>
<div class="org-src-container">
<pre class="src src-asm"><span style="color: #89b4fa;">d</span> <span style="color: #cba6f7;">DB</span> <span style="color: #fab387;">5</span> DUP(<span style="color: #fab387;">1</span>, <span style="color: #fab387;">2</span>)
<span style="color: #6c7086;">;</span><span style="color: #6c7086;">is an alternative way of declaring:</span>
<span style="color: #89b4fa;">d</span> <span style="color: #cba6f7;">DB</span> <span style="color: #fab387;">1</span>, <span style="color: #fab387;">2</span>, <span style="color: #fab387;">1</span>, <span style="color: #fab387;">2</span>, <span style="color: #fab387;">1</span>, <span style="color: #fab387;">2</span>, <span style="color: #fab387;">1</span>, <span style="color: #fab387;">2</span>, <span style="color: #fab387;">1</span>, <span style="color: #fab387;">2</span>
</pre>
</div>
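<p>
Conceptually, DUP is just repetition of the value list. A Python model (the <b>dup</b> helper is hypothetical) reproduces both declarations above:
</p>
<div class="org-src-container">
<pre class="src src-python">def dup(count, *values):
    """Model of `count DUP(values)`: repeat the value list `count` times."""
    return list(values) * count

print(dup(5, 9))      # c DB 5 DUP(9)    -> [9, 9, 9, 9, 9]
print(dup(5, 1, 2))   # d DB 5 DUP(1, 2) -> [1, 2, 1, 2, 1, 2, 1, 2, 1, 2]
</pre>
</div>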
<p>
Of course, you can use DW instead of DB if you need to store values larger than 255 or smaller than -128 (a word goes up to 65535, or down to -32768). DW cannot be used to declare strings.
</p>
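<p>
Those limits come from the size: 8 bits give 256 patterns, read either as 0 to 255 (unsigned) or as -128 to 127 (two&rsquo;s complement), and 16 bits work the same way. A Python sketch of which values fit (the <b>fits</b> helper is hypothetical):
</p>
<div class="org-src-container">
<pre class="src src-python">def fits(value, size_in_bytes):
    """True if `value` can be stored with DB (1 byte) or DW (2 bytes),
    read either as unsigned or as two's-complement signed."""
    bits = 8 * size_in_bytes
    lowest = -(2 ** (bits - 1))      # -128 for DB, -32768 for DW
    highest = 2 ** bits - 1          # 255 for DB, 65535 for DW
    return highest >= value >= lowest

print(fits(255, 1), fits(256, 1), fits(-129, 1))   # True False False
print(fits(256, 2), fits(-32768, 2))               # True True
</pre>
</div>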
</div>
</div>
</div>
</div>
</div>
<div id="postamble" class="status">
<p class="author">Author: Crystal</p>
<p class="date">Created: 2024-03-22 Fri 14:08</p>
</div>
</body>
</html>