-rw-r--r-- | blog/asm/1.html | 260 |
-rw-r--r-- | src/org/blog/assembly/1.org | 113 |
2 files changed, 337 insertions, 36 deletions
diff --git a/blog/asm/1.html b/blog/asm/1.html index fcaae55..124d81d 100644 --- a/blog/asm/1.html +++ b/blog/asm/1.html @@ -3,7 +3,7 @@ "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en"> <head> -<!-- 2024-03-17 Sun 21:28 --> +<!-- 2024-03-22 Fri 14:08 --> <meta http-equiv="Content-Type" content="text/html;charset=utf-8" /> <meta name="viewport" content="width=device-width, initial-scale=1" /> <title>x86 Assembly from my understanding</title> @@ -23,9 +23,9 @@ <p> Soooo this article (or maybe even a series of articles, who knows ?) will be about x86 assembly, or rather, what I understood from it and my road from the bottom-up hopefully reaching a good level of understanding </p> -<div id="outline-container-orgd547ad6" class="outline-2"> -<h2 id="orgd547ad6">Memory :</h2> -<div class="outline-text-2" id="text-orgd547ad6"> +<div id="outline-container-orgb0eec26" class="outline-2"> +<h2 id="orgb0eec26">Memory :</h2> +<div class="outline-text-2" id="text-orgb0eec26"> <p> Memory is a sequence of octets (Aka 8bits) that each have a unique integer assigned to them called <b>The Effective Address (EA)</b>, in this particular CPU Architecture (the i8086), the octet is designated by a couple (A segment number, and the offset in the segment) </p> @@ -40,9 +40,9 @@ Memory is a sequence of octets (Aka 8bits) that each have a unique integer assig The offset and segment are encoded in 16bits, so they take a value between 0 and 65535 </p> </div> -<div id="outline-container-org024e482" class="outline-4"> -<h4 id="org024e482">Important :</h4> -<div class="outline-text-4" id="text-org024e482"> +<div id="outline-container-org57cd217" class="outline-4"> +<h4 id="org57cd217">Important :</h4> +<div class="outline-text-4" id="text-org57cd217"> <p> The relation between the Effective Address and the Segment & Offset is as follow : </p> @@ -52,8 +52,8 @@ The relation between the Effective Address and the Segment & Offset 
is as fo </p> </div> <ul class="org-ul"> -<li><a id="org6cfa3c7"></a>Example :<br /> -<div class="outline-text-5" id="text-org6cfa3c7"> +<li><a id="orgcbdf7c0"></a>Example :<br /> +<div class="outline-text-5" id="text-orgcbdf7c0"> <p> Let the Physical address (Or Effective Address, these two terms are interchangeable) <b>12345h</b> (the h refers to Hexadecimal, which can also be written like this <b>0x12345</b>), the register <b>DS = 1230h</b> and the register <b>SI = 0045h</b>, the CPU calculates the physical address by multiplying the content of the segment register <b>DS</b> by 10h (or 16) and adding the content of the register <b>SI</b>. so we get : <b>1230h x 10h + 45h = 12345h</b> </p> @@ -66,16 +66,16 @@ Now if you are a clever one ( I know you are, since you are reading this <3 ) </li> </ul> </div> -<div id="outline-container-org6b34cdf" class="outline-3"> -<h3 id="org6b34cdf">Registers</h3> -<div class="outline-text-3" id="text-org6b34cdf"> +<div id="outline-container-org758f05f" class="outline-3"> +<h3 id="org758f05f">Registers</h3> +<div class="outline-text-3" id="text-org758f05f"> <p> The 8086 CPU has 14 registers of 16bits of size. From the POV of the user, the 8086 has 3 groups of 4 registers of 16bits. One state register of 9bits and a counting program of 16bits inaccessible to the user (whatever this means). </p> </div> -<div id="outline-container-org67926ce" class="outline-4"> -<h4 id="org67926ce">General Registers</h4> -<div class="outline-text-4" id="text-org67926ce"> +<div id="outline-container-orgbb2bc8f" class="outline-4"> +<h4 id="orgbb2bc8f">General Registers</h4> +<div class="outline-text-4" id="text-orgbb2bc8f"> <p> General registers contribute to arithmetic’s and logic and addressing too. 
</p> @@ -126,28 +126,28 @@ Now here are the Registers we can find in this section: </div> </div> </div> -<div id="outline-container-org824a260" class="outline-3"> -<h3 id="org824a260">Addressing and registers…again</h3> -<div class="outline-text-3" id="text-org824a260"> +<div id="outline-container-org6de3be7" class="outline-3"> +<h3 id="org6de3be7">Addressing and registers…again</h3> +<div class="outline-text-3" id="text-org6de3be7"> </div> -<div id="outline-container-orgaa8f029" class="outline-4"> -<h4 id="orgaa8f029">I realized what I wrote here before was almost gibberish, sooo here we go again I guess ?</h4> -<div class="outline-text-4" id="text-orgaa8f029"> +<div id="outline-container-orgfe32dc7" class="outline-4"> +<h4 id="orgfe32dc7">I realized what I wrote here before was almost gibberish, sooo here we go again I guess ?</h4> +<div class="outline-text-4" id="text-orgfe32dc7"> <p> Well lets take a step back to the notion of effective addresses VS relative ones. </p> </div> </div> -<div id="outline-container-org85a2533" class="outline-4"> -<h4 id="org85a2533">Effective = 10h x Segment + Offset . Part1</h4> -<div class="outline-text-4" id="text-org85a2533"> +<div id="outline-container-org471cf7b" class="outline-4"> +<h4 id="org471cf7b">Effective = 10h x Segment + Offset . Part1</h4> +<div class="outline-text-4" id="text-org471cf7b"> <p> When trying to access a specific memory space, we use this annotation <b>[Segment:Offset]</b>, so for example, and assuming <b>DS = 0100h</b>. We want to write the value <b>0x0005</b> to the memory space defined by the physical address <b>1234h</b>, what do we do ? </p> </div> <ul class="org-ul"> -<li><a id="orgae0f70c"></a>Answer :<br /> -<div class="outline-text-5" id="text-orgae0f70c"> +<li><a id="orgf021c83"></a>Answer :<br /> +<div class="outline-text-5" id="text-orgf021c83"> <div class="org-src-container"> <pre class="src src-asm"><span style="color: #89b4fa;">MOV</span> [DS:0234h], 0x0005 </pre> @@ -159,7 +159,7 @@ Why ? 
Let’s break it down : -<div id="orgd01a20f" class="figure"> +<div id="orgf6af9f9" class="figure"> <p><img src="../../src/gifs/lain-dance.gif" alt="lain-dance.gif" /> </p> </div> @@ -177,9 +177,9 @@ Simple, right ?, now for another example </li> </ul> </div> -<div id="outline-container-org811633c" class="outline-4"> -<h4 id="org811633c">Another example :</h4> -<div class="outline-text-4" id="text-org811633c"> +<div id="outline-container-org87609ad" class="outline-4"> +<h4 id="org87609ad">Another example :</h4> +<div class="outline-text-4" id="text-org87609ad"> <p> What if we now have this instruction ? </p> @@ -192,9 +192,9 @@ What does it do ? You might or might not be surprised that it does the exact sam </p> </div> </div> -<div id="outline-container-orge6219c5" class="outline-4"> -<h4 id="orge6219c5">Segment + Register <3</h4> -<div class="outline-text-4" id="text-orge6219c5"> +<div id="outline-container-org6254e46" class="outline-4"> +<h4 id="org6254e46">Segment + Register <3</h4> +<div class="outline-text-4" id="text-org6254e46"> <p> Consider <b>DS = 0100h</b> and <b>BX = BP = 0234h</b> and this code snippet: </p> @@ -230,8 +230,8 @@ The General rule of thumb is as follows : </ul> </div> <ul class="org-ul"> -<li><a id="orga9c3a1b"></a>Note<br /> -<div class="outline-text-5" id="text-orga9c3a1b"> +<li><a id="orgb5208d9"></a>Note<br /> +<div class="outline-text-5" id="text-orgb5208d9"> <p> The values of the registers CS DS and SS are automatically initialized by the OS when launching the program. So these segments are implicit. AKA : If we want to access a specific data in memory, we just need to specify its offset. 
Also you can’t write directly into the DS or CS segment registers, so something like </p> @@ -246,10 +246,198 @@ The values of the registers CS DS and SS are automatically initialized by the OS </div> </div> </div> +<div id="outline-container-org660d1f4" class="outline-2"> +<h2 id="org660d1f4">The ACTUAL thing :</h2> +<div class="outline-text-2" id="text-org660d1f4"> +<p> +Enough technical rambling, and now we shall go to the fun part, the ACTUAL CODE. But first, some names you should be familiar with : +</p> + +<ul class="org-ul"> +<li><b>Mnemonics</b> : Or <b>Instructions</b>, are the…well…Instructions executed by the CPU like <b>MOV</b> , <b>ADD</b>, <b>MUL</b>…etc, they are case <b>insensitive</b> but I like them better in UPPERCASE.</li> +<li><b>Operands</b> : These are the arguments passed to the instructions, like <b>MOV dst, src</b>, and they can be anything from a register or a memory location to a variable or an immediate value.</li> +</ul> +</div> +<div id="outline-container-orgf7c0650" class="outline-3"> +<h3 id="orgf7c0650">Structure of an assembly program :</h3> +<div class="outline-text-3" id="text-orgf7c0650"> +<p> +While there is no “standard” structure, I prefer to go with this one : +</p> + +<div class="org-src-container"> +<pre class="src src-asm"> <span style="color: #cba6f7;">org</span> 100h +<span style="color: #cba6f7;">.data</span> + <span style="color: #6c7086;">; </span><span style="color: #6c7086;">variables and constants</span> + +<span style="color: #cba6f7;">.code</span> + <span style="color: #6c7086;">; </span><span style="color: #6c7086;">instructions</span> +</pre> +</div> +</div> +</div> +<div id="outline-container-orgc1466ec" class="outline-3"> +<h3 id="orgc1466ec">MOV dst, src</h3> +<div class="outline-text-3" id="text-orgc1466ec"> +<p> +The MOV instruction copies the Second operand (src) to the First operand (dst)… The source can be a memory location, an immediate value, or a general-purpose register (AX BX CX DX). 
As for the Destination, it can be a general-purpose register or a memory location. +</p> + + +<p> +these types of operands are supported: +</p> +<div class="org-src-container"> +<pre class="src src-asm"><span style="color: #89b4fa;">MOV</span> <span style="color: #cba6f7;">REG</span>, memory +<span style="color: #89b4fa;">MOV</span> <span style="color: #cba6f7;">memory</span>, REG +<span style="color: #89b4fa;">MOV</span> <span style="color: #cba6f7;">REG</span>, REG +<span style="color: #89b4fa;">MOV</span> <span style="color: #cba6f7;">memory</span>, immediate +<span style="color: #89b4fa;">MOV</span> <span style="color: #cba6f7;">REG</span>, immediate +</pre> +</div> +<p> +<b>REG</b>: AX, BX, CX, DX, AH, AL, BL, BH, CH, CL, DH, DL, DI, SI, BP, SP. +</p> + +<p> +<b>memory</b>: [BX], [BX+SI+7], variable +</p> + +<p> +<b>immediate</b>: 5, -24, 3Fh, 10001101b +</p> + + +<p> +for segment registers only these types of MOV are supported: +</p> +<div class="org-src-container"> +<pre class="src src-asm"><span style="color: #89b4fa;">MOV</span> <span style="color: #cba6f7;">SREG</span>, memory +<span style="color: #89b4fa;">MOV</span> <span style="color: #cba6f7;">memory</span>, SREG +<span style="color: #89b4fa;">MOV</span> <span style="color: #cba6f7;">REG</span>, SREG +<span style="color: #89b4fa;">MOV</span> <span style="color: #cba6f7;">SREG</span>, REG +<span style="color: #89b4fa;">SREG</span>: <span style="color: #cba6f7;">DS</span>, ES, SS, and only as second operand: CS. +</pre> +</div> +<p> +<b>REG</b>: AX, BX, CX, DX, AH, AL, BL, BH, CH, CL, DH, DL, DI, SI, BP, SP. 
+</p> + +<p> +<b>memory</b>: [BX], [BX+SI+7], variable +</p> +</div> +<div id="outline-container-org508d45b" class="outline-4"> +<h4 id="org508d45b">Note : The MOV instruction <b>cannot</b> be used to set the value of the CS and IP registers</h4> +</div> +</div> +<div id="outline-container-orgb475e10" class="outline-3"> +<h3 id="orgb475e10">Variables :</h3> +<div class="outline-text-3" id="text-orgb475e10"> +<p> +Let’s say you want to use a specific value multiple times in your code, do you prefer to call it using something like <b>var1</b> or <b>E4F9:0011</b> ? If your answer is the second option, you can gladly skip this section, or even better, seek therapy. +</p> + +<p> +Anyways, we have two types of variables, <b>bytes</b> and <b>words(which are two bytes)</b>, and to define a variable, we use the following syntax +</p> + +<div class="org-src-container"> +<pre class="src src-asm"><span style="color: #89b4fa;">name</span> <span style="color: #cba6f7;">DB</span> value <span style="color: #6c7086;">; </span><span style="color: #6c7086;">To Define a Byte</span> +<span style="color: #89b4fa;">name</span> <span style="color: #cba6f7;">DW</span> value <span style="color: #6c7086;">; </span><span style="color: #6c7086;">To Define a Word</span> +</pre> +</div> + +<p> +<b>name</b> - can be any letter or digit combination, though it should start with a letter. It’s possible to declare unnamed variables by not specifying the name (this variable will have an address but no name). +<b>value</b> - can be any numeric value in any supported numbering system (hexadecimal, binary, or decimal), or “?” symbol for variables that are not initialized. 
+</p> +</div> +<div id="outline-container-org3d5d0c5" class="outline-4"> +<h4 id="org3d5d0c5">Example code :</h4> +<div class="outline-text-4" id="text-org3d5d0c5"> +<div class="org-src-container"> +<pre class="src src-asm"> <span style="color: #cba6f7;">org</span> 100h + <span style="color: #cba6f7;">.data</span> + <span style="color: #cba6f7;">x</span> db <span style="color: #fab387;">33</span> + <span style="color: #cba6f7;">y</span> dw 1350h + + <span style="color: #cba6f7;">.code</span> + <span style="color: #cba6f7;">MOV</span> AL, x + <span style="color: #cba6f7;">MOV</span> BX, y +</pre> +</div> +</div> +</div> +<div id="outline-container-orgd4e5244" class="outline-4"> +<h4 id="orgd4e5244">Arrays :</h4> +<div class="outline-text-4" id="text-orgd4e5244"> +<p> +We can also define Arrays instead of single values using comma-separated values, like this for example +</p> +<div class="org-src-container"> +<pre class="src src-asm"> <span style="color: #cba6f7;">a</span> db 48h, 65h, 6Ch, 6Ch, 6Fh, 00H + <span style="color: #cba6f7;">b</span> db 'Hello', <span style="color: #fab387;">0</span> +</pre> +</div> + +<p> +Surprise Surprise, the arrays a and b are identical, the reason behind it is that characters are first converted to their ASCII values then stored in memory!!! Wonderful right ? And guess what, accessing values in assembly IS THE SAME AS IN C !!! 
+</p> +<div class="org-src-container"> +<pre class="src src-asm"> <span style="color: #cba6f7;">MOV</span> AL, a[<span style="color: #fab387;">0</span>] <span style="color: #6c7086;">; </span><span style="color: #6c7086;">Copies 48h to AL</span> + <span style="color: #cba6f7;">MOV</span> BL, b[<span style="color: #fab387;">0</span>] <span style="color: #6c7086;">; </span><span style="color: #6c7086;">Also Copies 48h to BL</span> +</pre> +</div> +<p> +You can also use any of the memory index registers BX, SI, DI, BP, for example: +</p> +<div class="org-src-container"> +<pre class="src src-asm"><span style="color: #89b4fa;">MOV</span> <span style="color: #cba6f7;">SI</span>, <span style="color: #fab387;">3</span> +<span style="color: #89b4fa;">MOV</span> <span style="color: #cba6f7;">AL</span>, a[SI] +</pre> +</div> + +<p> +If you need to declare a large array you can use the DUP operator. +The syntax for <b>DUP</b>: +</p> + +<p> +number DUP ( value(s) ) +<b>number</b> - number of duplicates to make (any constant value). +<b>value</b> - expression that DUP will duplicate. 
+</p> + +<p> +For example: +</p> +<div class="org-src-container"> +<pre class="src src-asm"><span style="color: #89b4fa;">c</span> <span style="color: #cba6f7;">DB</span> <span style="color: #fab387;">5</span> DUP(<span style="color: #fab387;">9</span>) +<span style="color: #6c7086;">;</span><span style="color: #6c7086;">is an alternative way of declaring:</span> +<span style="color: #89b4fa;">c</span> <span style="color: #cba6f7;">DB</span> <span style="color: #fab387;">9</span>, <span style="color: #fab387;">9</span>, <span style="color: #fab387;">9</span>, <span style="color: #fab387;">9</span>, <span style="color: #fab387;">9</span> +</pre> +</div> +<p> +One more example: +</p> +<div class="org-src-container"> +<pre class="src src-asm"><span style="color: #89b4fa;">d</span> <span style="color: #cba6f7;">DB</span> <span style="color: #fab387;">5</span> DUP(<span style="color: #fab387;">1</span>, <span style="color: #fab387;">2</span>) +<span style="color: #6c7086;">;</span><span style="color: #6c7086;">is an alternative way of declaring:</span> +<span style="color: #89b4fa;">d</span> <span style="color: #cba6f7;">DB</span> <span style="color: #fab387;">1</span>, <span style="color: #fab387;">2</span>, <span style="color: #fab387;">1</span>, <span style="color: #fab387;">2</span>, <span style="color: #fab387;">1</span>, <span style="color: #fab387;">2</span>, <span style="color: #fab387;">1</span>, <span style="color: #fab387;">2</span>, <span style="color: #fab387;">1</span>, <span style="color: #fab387;">2</span> +</pre> +</div> +<p> +Of course, you can use DW instead of DB if it’s required to keep values larger than 255, or smaller than -128. DW cannot be used to declare strings. 
+</p> +</div> +</div> +</div> +</div> </div> <div id="postamble" class="status"> <p class="author">Author: Crystal</p> -<p class="date">Created: 2024-03-17 Sun 21:28</p> +<p class="date">Created: 2024-03-22 Fri 14:08</p> </div> </body> </html> diff --git a/src/org/blog/assembly/1.org b/src/org/blog/assembly/1.org index e268824..3fd21e4 100644 --- a/src/org/blog/assembly/1.org +++ b/src/org/blog/assembly/1.org @@ -128,3 +128,116 @@ The values of the registers CS DS and SS are automatically initialized by the OS MOV DS, 0x0005 ; Is INVALID MOV DS, AX ; This one is VALID #+END_SRC + +* The ACTUAL thing : +Enough technical rambling, and now we shall go to the fun part, the ACTUAL CODE. But first, some names you should be familiar with : + +- *Mnemonics* : Or *Instructions*, are the...well...Instructions executed by the CPU like *MOV* , *ADD*, *MUL*...etc, they are case *insensitive* but I like them better in UPPERCASE. +- *Operands* : These are the arguments passed to the instructions, like *MOV dst, src*, and they can be anything from a register or a memory location to a variable or an immediate value. + +** Structure of an assembly program : +While there is no "standard" structure, I prefer to go with this one : + +#+BEGIN_SRC asm + org 100h +.data + ; variables and constants + +.code + ; instructions +#+END_SRC +** MOV dst, src +The MOV instruction copies the Second operand (src) to the First operand (dst)... The source can be a memory location, an immediate value, or a general-purpose register (AX BX CX DX). As for the Destination, it can be a general-purpose register or a memory location. + + +these types of operands are supported: +#+BEGIN_SRC asm +MOV REG, memory +MOV memory, REG +MOV REG, REG +MOV memory, immediate +MOV REG, immediate +#+END_SRC +*REG*: AX, BX, CX, DX, AH, AL, BL, BH, CH, CL, DH, DL, DI, SI, BP, SP. 
+ +*memory*: [BX], [BX+SI+7], variable + +*immediate*: 5, -24, 3Fh, 10001101b + + +for segment registers only these types of MOV are supported: +#+BEGIN_SRC asm +MOV SREG, memory +MOV memory, SREG +MOV REG, SREG +MOV SREG, REG +SREG: DS, ES, SS, and only as second operand: CS. +#+END_SRC +*REG*: AX, BX, CX, DX, AH, AL, BL, BH, CH, CL, DH, DL, DI, SI, BP, SP. + +*memory*: [BX], [BX+SI+7], variable + +*** Note : The MOV instruction *cannot* be used to set the value of the CS and IP registers +** Variables : +Let's say you want to use a specific value multiple times in your code, do you prefer to call it using something like *var1* or *E4F9:0011* ? If your answer is the second option, you can gladly skip this section, or even better, seek therapy. + +Anyways, we have two types of variables, *bytes* and *words(which are two bytes)*, and to define a variable, we use the following syntax + +#+BEGIN_SRC asm +name DB value ; To Define a Byte +name DW value ; To Define a Word +#+END_SRC + +*name* - can be any letter or digit combination, though it should start with a letter. It's possible to declare unnamed variables by not specifying the name (this variable will have an address but no name). +*value* - can be any numeric value in any supported numbering system (hexadecimal, binary, or decimal), or "?" symbol for variables that are not initialized. + +*** Example code : +#+BEGIN_SRC asm + org 100h + .data + x db 33 + y dw 1350h + + .code + MOV AL, x + MOV BX, y +#+END_SRC + +*** Arrays : +We can also define Arrays instead of single values using comma-separated values, like this for example +#+BEGIN_SRC asm + a db 48h, 65h, 6Ch, 6Ch, 6Fh, 00H + b db 'Hello', 0 +#+END_SRC + +Surprise Surprise, the arrays a and b are identical, the reason behind it is that characters are first converted to their ASCII values then stored in memory!!! Wonderful right ? And guess what, accessing values in assembly IS THE SAME AS IN C !!! 
+#+BEGIN_SRC asm + MOV AL, a[0] ; Copies 48h to AL + MOV BL, b[0] ; Also Copies 48h to BL +#+END_SRC +You can also use any of the memory index registers BX, SI, DI, BP, for example: +#+BEGIN_SRC asm +MOV SI, 3 +MOV AL, a[SI] +#+END_SRC + +If you need to declare a large array you can use the DUP operator. +The syntax for *DUP*: + +number DUP ( value(s) ) +*number* - number of duplicates to make (any constant value). +*value* - expression that DUP will duplicate. + +For example: +#+BEGIN_SRC asm +c DB 5 DUP(9) +;is an alternative way of declaring: +c DB 9, 9, 9, 9, 9 +#+END_SRC +One more example: +#+BEGIN_SRC asm +d DB 5 DUP(1, 2) +;is an alternative way of declaring: +d DB 1, 2, 1, 2, 1, 2, 1, 2, 1, 2 +#+END_SRC +Of course, you can use DW instead of DB if it's required to keep values larger than 255, or smaller than -128. DW cannot be used to declare strings.
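
A minimal sketch that strings together the pieces this commit introduces (variables, arrays, DUP and MOV). It assumes emu8086-style syntax and a COM-style program loaded at offset 100h. The labels msg, nines, total and start are made up for illustration and are not part of the commit.

#+BEGIN_SRC asm
        org 100h            ; COM-style program: execution starts at offset 100h

        jmp start           ; jump over the data so it is never executed as code

msg     db 'Hello', 0       ; stored as 48h, 65h, 6Ch, 6Ch, 6Fh, 00h
nines   db 5 dup(9)         ; same as: db 9, 9, 9, 9, 9
total   dw 1350h            ; a single 16-bit word

start:
        mov al, msg[0]      ; AL = 48h ('H')
        mov si, 3
        mov bl, msg[si]     ; BL = 6Ch (the second 'l')
        mov cl, nines[4]    ; CL = 9, the last duplicated byte
        mov dx, total       ; DX = 1350h
        ret                 ; hand control back to the OS (COM convention)
#+END_SRC

The jmp at the top is a design choice for this layout: with the data placed before the instructions, the CPU would otherwise start by trying to execute the bytes of msg.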