COMP2300-Lab-2 Solved

Introduction
A CPU/MCU is built out of logic gates, which are combined into different components, including registers for storing values and adders (as part of the ALU) for adding numbers together.

Today you will see how to explain to your CPU that your plan is to calculate 2+2, and see how words (i.e. numbers) in memory can be formed and interpreted as opcodes (operation codes) which will instruct your CPU to do things (e.g. add two values).

Exercise 1: 1+2
Fork the lab 2 template on GitLab, then clone it to your computer with Git, as you did last week in lab 1.

Your job in this first exercise is to write an assembly program which calculates 1+2 and leaves the result in register 1 (r1).

Remember from last week’s lab that you can see the values in your discoboard’s registers in the registers pane while debugging.

 

You can set the display format for a specific register in the register view: right-click the register, select “set number format” and then select the desired format. This will help you make sense of the value of a register.

ARM assembly syntax
This is probably the first time you’ve written any ARM assembly code, so for this course we’ve prepared a cheat sheet to help you out. It looks pretty intimidating at first - mostly because it crams a lot of information into a small space. So let’s pick one line of the cheat sheet - the sub instruction - and pick it apart.

First, the syntax column:

sub{s}<c><q> {<Rd>,} <Rn>, <Rm> {,<shift>}

The first token on the line is the instruction name, and after that is the (comma-separated) argument list. Conveniently, all of our assembly instructions will have a similar format.

Anything in braces ({}) is optional, e.g. the s at the end of sub{s} means that it can be either sub or subs. Adding s to instructions will cause flags to be set by the operation - we’ll cover this later in the course.
The <c> and <q> parts relate to the condition codes and opcode size boxes on the second page of the cheat sheet. They are also optional and we’ll visit these later in the course.
{<Rd>,} is the destination register (e.g. r3 or r11), where the result of the instruction is stored. If you do not specify a destination register, <Rn> will be used instead.
<Rn>, <Rm> are the two operands (arguments) for the sub instruction.
Finally, the optional {,<shift>} is for using the discoboard’s barrel shifter to shift the value in <Rm>. We’ll also cover this later in the course.
There are a couple of other parts of the syntax which aren’t covered in the sub instruction:

Instructions which use constant values will use decimal by default. You can prefix your values to indicate a different base: 0b for binary (e.g. 0b1101101), a leading 0 for octal (e.g. 0125) or 0x for hexadecimal (e.g. 0xef20).
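To see that a prefix only changes how the digits are read, not the value itself, here is a small Python sketch. `parse_literal` is a hypothetical helper written for this illustration (it is not part of the assembler), and it only handles the binary, hexadecimal and plain decimal cases; octal is deliberately left out.

```python
def parse_literal(text: str) -> int:
    """Hypothetical helper: read an integer literal using its base prefix."""
    lowered = text.lower()
    if lowered.startswith("0b"):
        return int(lowered[2:], 2)   # binary digits
    if lowered.startswith("0x"):
        return int(lowered[2:], 16)  # hexadecimal digits
    return int(lowered, 10)          # plain decimal (octal not handled here)


for literal in ("0b1101101", "0xef20", "109"):
    print(f"{literal:>10} = {parse_literal(literal)}")
```

Note that 0b1101101 and 109 are the same number written in two different bases.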
When it comes to load & store operations, square brackets [] indicate that the instruction should use the memory address in the register, e.g. [r2] tells the discoboard to “use the memory address in r2” for that instruction.
You won’t need to know all of this stuff to complete this lab, so just remember that it’s here if you need to come back to it.

The semantic column on your cheat sheet describes what the instruction does. For example, the semantic for the sub instruction is Rd(n) := Rn - Rm{shifted}, which in English translates to something like:

 in the Rd register (or Rn, if Rd was not specified) store the result of subtracting the value in the Rm register (with an optional bit-shift, if present) from the value in the Rn register.
You can probably see why we use assembly language for telling our CPU what to do rather than English - it’s much less wordy.

The flags column of the cheat sheet specifies which of the special condition code flags that instruction sets if the optional s suffix is present. (We’ll cover this in next week’s lab, but if you’re curious there’s a box on the second page of the cheat sheet which lists the flags.)

Since there is a lot of information in the ARM instruction syntax, you don’t need to memorize everything. Just keep the cheat sheet nearby and take a closer look when you need to find a specific syntax. You will be given an assembly instruction cheat-sheet in exams.

Please be aware that there are several instructions that can use different sets of arguments. For example, the sub instruction can be:

sub with <Rn>, <Rm> {,<shift>}
sub with <Rn>, #<const>

Both of them perform the same subtraction, just with different parameters: we can subtract either two registers, or a register and a constant value.

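The semantic Rd(n) := Rn - Rm can be mimicked in a few lines of Python. This is a rough model under stated assumptions: registers are plain integers in a dictionary, results wrap around at 32 bits as a real register would, and the optional shift and flag-setting are ignored. It shows both the register form and the constant form of sub.

```python
MASK32 = 0xFFFFFFFF  # Cortex-M4 registers hold 32 bits


def sub(rn: int, operand2: int) -> int:
    """Model of Rd := Rn - operand2, wrapped to 32 bits (no shift, no flags)."""
    return (rn - operand2) & MASK32


regs = {"r0": 3, "r2": 1}
regs["r1"] = sub(regs["r0"], regs["r2"])  # register form, like: sub r1, r0, r2
regs["r3"] = sub(regs["r0"], 5)           # constant form, like: sub r3, r0, #5
print(hex(regs["r1"]), hex(regs["r3"]))
```

The second result shows why the register view can surprise you: 3 - 5 does not produce a "negative" bit pattern of its own, it wraps around to 0xfffffffe.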
The task
To actually complete your “1 + 2” task, you’ll need to

get number 1 into any register
add 2 to the register value and put the result in r1
Look over the cheat sheet—which assembly instructions allow you to place a constant value into a register? There are also a number of machine instructions which will implement an addition - which one do you want, and why?

Once you’ve written a program which you think will do what you want, step through it with the debugger and make sure that the value which r1 holds at the end is actually 3.

The method you use to find “1 + 2” is only one solution to the problem. There are other ways that we can use to answer the question using different instructions. Can you think of any?

Save & make a commit now that you have finished Exercise 1. It’s a good idea to keep a version once you have completed each exercise in the lab; that’s what a version control system is for, right?

Exercise 2: reverse engineering with the memory viewer
Now we really look under the hood and leave no bit unturned. Start a debugging session with your program from the previous exercise and step through until you get to main, then leave it “hanging”—do not execute any further. It will “pause” the program execution for the moment.

Although you follow the execution inside VSCode, all the register and memory values it displays are read directly from your discoboard.

Do you know where each of your numbers is represented in your program when it’s actually running on your discoboard?

To do that we need to be able to view the memory in the discoboard. In VSCode you can look at your memory directly using the Memory View: type memory in the command palette and select the Cortex Debug view memory option. VSCode will then ask you for the starting address and the number of bytes to read; for example, you could start at address 0x080001dc and read 512 bytes.

 

This might look overwhelming, but the 2D “grid” layout is pretty simple: the hex numbers down the left-hand side are the base memory addresses, and the hex numbers along the top represent the “offset” of that particular byte from the base address. So, to work out the exact address of a particular byte, add the row and column values (base + offset).
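The base + offset arithmetic can be checked with two small Python helpers; both names are hypothetical and exist only for this illustration. The inverse helper assumes the memory view uses 16-byte rows that start at multiples of 0x10, so compare that assumption against what your own view shows.

```python
ROW_WIDTH = 16  # assumption: 16 bytes per row, rows start at multiples of 0x10


def byte_address(row_base: int, column_offset: int) -> int:
    """Address of the byte at a given row (base) and column (offset)."""
    return row_base + column_offset


def row_and_column(address: int) -> tuple[int, int]:
    """Inverse: the row base and column offset where an address appears."""
    return address - address % ROW_WIDTH, address % ROW_WIDTH


print(hex(byte_address(0x080001E0, 0xC)))
print([hex(part) for part in row_and_column(0x080001DC)])
```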

You can find the bytes in memory which correspond to the instructions you wrote in your main.S file using this view. The trick is figuring out where to look: what should the starting address be? Even your humble discoboard has a lot of addressable memory. Discuss with your lab neighbour: where should you look to find your program? The value in the pc (program counter) may be a good place to start. See if you can figure out exactly what this value represents.

There’s an assembler directive called .hword which you can use to put 16-bit numbers into your program (“hword” is short for half-word, and comes from the fact that your discoboard’s CPU uses 32-bit “words”).

Modify your program to use the .hword directive to put some data into your program, so it looks something like this:

.syntax unified
.global main
.type main, %function

main:
  nop
  .hword 0xdead
  .hword 0xbeef
  b main
.size main, .-main

Note we need to add a nop instruction here for the debugger to work correctly - 0xdead isn’t a real instruction, and our CPU can get confused.

What do you think a nop instruction does? Aside from this very specific case, when do you think such an instruction might be useful?

Build & upload your program and open a memory viewer session. Use the address of your first instruction as the base address, and load enough bytes to view your entire program. Can you see the 0xdead and 0xbeef values you put into your program?

If you can’t see them exactly, can you see something which looks suspiciously like them? What do you notice?

Endianness
To make sense of the numbers displayed in the memory view, we need to talk about endianness.

Values are stored in memory as individual bytes (i.e. 8-bit numbers, which can be represented with two hex digits). Endianness refers to the order in which these bytes are arranged into larger numbers (e.g. 32-bit words). In the little-endian format used by our discoboards, the byte stored at the lowest address is the least significant byte (LSB). Big-endian is the opposite: the byte stored at the lowest address is the most significant byte (MSB).
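To connect this with the 0xdead and 0xbeef experiment, Python's `struct` module can pack the same two 16-bit values little-endian (`<`) and big-endian (`>`), so you can compare the resulting byte order with your memory view. This only models the layout; the real bytes come from your assembled program.

```python
import struct

# Two unsigned 16-bit half-words ("H"), in the same order as the .hword lines.
little = struct.pack("<HH", 0xDEAD, 0xBEEF)  # little-endian, like the discoboard
big = struct.pack(">HH", 0xDEAD, 0xBEEF)     # big-endian, for comparison

print(little.hex(" "))  # byte order as it appears in the memory view
print(big.hex(" "))
```

In the little-endian version each half-word has its two bytes swapped, which is why 0xdead shows up as "ad de" in memory.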

Here’s an example: suppose we have the number 0x01 stored at a lower memory address (e.g. 0x000001e0), and the number 0xF1 stored at the next, higher memory address (0x000001e1).

 

If we tell our CPU to read a half-word (16 bits) from the memory address 0x000001e0, a little-endian CPU reads the value 0xF101 (the 0x01 at the lower address is treated as the less significant byte). A big-endian CPU reads the value 0x01F1 (the 0x01 at the lower address is treated as the more significant byte).
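The same example expressed with Python's `int.from_bytes`, which takes the byte order as an argument:

```python
two_bytes = bytes([0x01, 0xF1])  # 0x01 at 0x000001e0, 0xF1 at 0x000001e1

little = int.from_bytes(two_bytes, "little")  # lowest address = least significant
big = int.from_bytes(two_bytes, "big")        # lowest address = most significant
print(hex(little), hex(big))
```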

When reading four bytes from memory, the CPU can read them as four 8-bit bytes, two 16-bit half-words, or one 32-bit word. The endianness format applies every time bytes are combined into bigger units: in the little-endian format, the byte at the lowest address always becomes the least significant byte of the half-word or word.
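A sketch of the three ways of reading the same four bytes, using `struct` with little-endian formats (`H` for a 16-bit half-word, `I` for a 32-bit word). The four byte values are made up for illustration.

```python
import struct

raw = bytes([0x01, 0xF1, 0x23, 0x45])  # hypothetical bytes at addresses a .. a+3

as_bytes = list(raw)                      # four 8-bit reads
as_halfwords = struct.unpack("<HH", raw)  # two 16-bit reads
as_word = struct.unpack("<I", raw)[0]     # one 32-bit read

print([hex(b) for b in as_bytes])
print([hex(h) for h in as_halfwords])
print(hex(as_word))
```

The two half-words are 0xf101 and 0x4523, and the single word is 0x4523f101: the half-word at the lower address becomes the less significant half of the word.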

 

You need to be aware of this byte ordering to make sense of the memory view.

How might you figure out on your discoboard’s Cortex-M4 CPU whether a 32-bit instruction is read as one 32-bit word or as two 16-bit half-words? (hint: have a look at A5.1 in the ARM®v7-M Architecture Reference Manual).
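If you want to check your answer: section A5.1 describes the Thumb instruction stream as a sequence of half-words, where each instruction is either one 16-bit half-word or a 32-bit instruction made of two consecutive half-words. A half-word starts a 32-bit instruction when its bits [15:11] are 0b11101, 0b11110 or 0b11111. A Python sketch of that test follows; `is_32bit_thumb` is a hypothetical helper, and the sample half-words are the encodings of `movs r0, #1`, the first half of a `mov.w` with an immediate, `nop`, and a 16-bit unconditional branch.

```python
def is_32bit_thumb(first_halfword: int) -> bool:
    """True if this half-word starts a 32-bit Thumb instruction (rule from A5.1)."""
    top5 = (first_halfword >> 11) & 0b11111
    return top5 in (0b11101, 0b11110, 0b11111)


for hw in (0x2001, 0xF04F, 0xBF00, 0xE000):
    print(hex(hw), "32-bit" if is_32bit_thumb(hw) else "16-bit")
```

Because the CPU fetches and decodes half-word by half-word, each half of a 32-bit instruction is stored little-endian on its own, with the first half-word at the lower address.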

According to Wikipedia, Danny Cohen introduced the terms Little-Endian and Big-Endian for byte ordering in an article from 1980. In this technical and political examination of byte ordering issues, the “endian” names were drawn from Jonathan Swift’s 1726 satire, Gulliver’s Travels, in which civil war erupts over whether the big end or the little end of a boiled egg is the proper end to crack open, which is analogous to counting from the end that contains the most significant bit or the least significant bit.

Instruction encoding(s)
Now that you know how bytes fit together into words, let’s get back to the task of figuring out how the instructions in your program are encoded in memory.

You can use the known half-words you put into your program earlier as landmarks. Update your program like so to add a single instruction (i.e. one line of assembly code) from the 1+2 program you wrote in Exercise 1.

.syntax unified
.global main
.type main, %function

main:
  nop
  .hword 0xdead
  @ put a single "real" assembly instruction here from your 1+2 program
  .hword 0xbeef
  b main
.size main, .-main

Build, upload and start a new debug session, then find your program again in the memory view. What does your instruction look like in memory? Try making a note of the bytes, then modify the instruction arguments (e.g. change the number, or the register you’re using) and see how the bytes change in memory (you’ll need to re-build & run your program and call the view memory command each time you do).

Discuss with your neighbour: what do you think the different bits (and bytes) in the instruction mean? How does the discoboard know what to do with them? And if you’ve figured that out, why doesn’t your program actually work as written?
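One way to explore these questions is to slice a half-word into its fields. The sketch below assumes the instruction you copied was a `movs` with a small constant, which the assembler can emit as the 16-bit MOV (immediate) encoding T1: the bits 00100, then a 3-bit Rd, then an 8-bit immediate. For a different instruction you would use that instruction's field layout from the manual instead. `decode_mov_imm_t1` is a hypothetical helper.

```python
def decode_mov_imm_t1(hw: int) -> dict:
    """Split a half-word into the MOV (immediate) T1 fields: 00100 | Rd | imm8."""
    return {
        "opcode": (hw >> 11) & 0b11111,  # bits 15..11, 0b00100 for this encoding
        "Rd": (hw >> 8) & 0b111,         # bits 10..8, destination register r0-r7
        "imm8": hw & 0xFF,               # bits 7..0, the constant
    }


# `movs r0, #1` assembles to 0x2001, which the memory view shows as "01 20".
print(decode_mov_imm_t1(0x2001))
```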

To fully make sense of these instruction encodings you need more than just your cheat sheet: you need the ARM®v7-M Architecture Reference Manual. Dig into the deepest levels of the manual by going to section A7.7 Alphabetical list of ARMv7-M Thumb instructions (page A7-184). Use the bookmarks in your pdf viewer to navigate to the relevant instructions inside this huge document.

For each instruction you will see a number of encodings. They detail, bit by bit, the different ways of specifying the machine instructions that your discoboard CPU understands.
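While working through the encodings, it helps to see a value in decimal, hex and binary side by side. A small Python helper (hypothetical, not a course tool) can print all three, grouping the binary digits into 4-bit nibbles so they line up with the hex digits:

```python
def show(value: int, bits: int = 16) -> str:
    """Format a value as decimal, hex and binary, grouping binary digits in 4s."""
    binary = format(value, f"0{bits}b")
    nibbles = " ".join(binary[i:i + 4] for i in range(0, len(binary), 4))
    return f"dec {value} | hex 0x{value:0{bits // 4}x} | bin {nibbles}"


print(show(0x2001))
print(show(0x48, bits=8))
```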
 
Commit your “reverse engineering” program with a comment about what the instruction looks like in memory. It doesn’t matter that it doesn’t actually run at this point—you’ll get there in the next exercise.

Can you tell which specific encoding has been used for the instruction you wrote earlier in Exercise 1? Note that not every encoding can express every version of the instruction, but sometimes a more complex encoding can also express what a simpler form could have done. Can you hint the assembler towards the specific encoding you want?

Exercise 3: hand-crafted instructions
Now that you’ve identified the spot in memory where your instructions live, in this exercise we turn our approach around and program the CPU by writing specific numbers directly to memory locations. Where we earlier inserted 0xdead and 0xbeef, we are going to insert hex values that correspond to the machine code for real instructions.

Instead of calculating 1+2, you are going to make the CPU calculate 3-1 by putting the right numbers into memory.

In fact, you have been doing this all along, except that the assembler has helped you by converting your human-readable instructions into their raw machine code representations. Replace your program with the following assembly code:

.syntax unified
.global main
.type main, %function

main:
  nop
  .hword 0xffff
  .hword 0xffff
  b main
.size main, .-main

This time we want you to put on your assembler hat and figure out the actual .hword values which will make the CPU load 3 into r1 and subtract 1 from it. Remember that it’ll be similar to the words you looked at in the memory viewer earlier, but some of the bits will be different (since we’re dealing with -, 3 and 1 instead of +, 1 and 2).
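A sketch of this hand-assembly in Python, assuming you use the 16-bit encodings MOV (immediate) T1 (00100 | Rd | imm8) and SUB (immediate) T1 (0001111 | imm3 | Rn | Rd). Outside an IT block both forms set the flags, so the disassembler should show them as `movs` and `subs`. The function names are hypothetical helpers for this illustration. Because .hword stores its value little-endian, writing the half-word value directly gives the same bytes the assembler would produce.

```python
def mov_imm_t1(rd: int, imm8: int) -> int:
    """MOV (immediate) T1: 00100 | Rd | imm8, i.e. movs rd, #imm8."""
    if not (0 <= rd <= 7 and 0 <= imm8 <= 0xFF):
        raise ValueError("T1 needs a low register (r0-r7) and an 8-bit constant")
    return (0b00100 << 11) | (rd << 8) | imm8


def sub_imm_t1(rd: int, rn: int, imm3: int) -> int:
    """SUB (immediate) T1: 0001111 | imm3 | Rn | Rd, i.e. subs rd, rn, #imm3."""
    if not (0 <= rd <= 7 and 0 <= rn <= 7 and 0 <= imm3 <= 7):
        raise ValueError("T1 needs low registers (r0-r7) and a 3-bit constant")
    return (0b0001111 << 9) | (imm3 << 6) | (rn << 3) | rd


for hw in (mov_imm_t1(1, 3), sub_imm_t1(1, 1, 1)):
    print(f".hword 0x{hw:04x}")
```

If you prefer the other 16-bit SUB encoding (T2: 00111 | Rdn | imm8), `subs r1, #1` would be 0x3901 instead.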

In section A7.7 of the Architecture Reference Manual, look up the entries for SUB (immediate) and MOV (immediate); the SUB encodings closely mirror the ADD encodings you looked at earlier.

Note the line at the bottom:

.size main, .-main

This tells the assembler the size of the main function, and it is essential for the disassembler to work correctly. The disassembler view can be opened by typing Cortex-Debug: View Disassembly (Function) in the command palette during an active debug session. It will then ask which function to disassemble; type the function name (e.g. main).

 

Bring up the disassembler view for main—you’re now looking at the program as it will be understood by the CPU.

Looking at the disassembled code (i.e. the way the CPU will interpret the instructions) did you see what you intended? Will your new, hand-assembled program show the correct result in r1 after it has been run? If it doesn’t, what might have gone wrong?

You can now cheer as loud as your upbringing allows that you will never again have to hand-craft the bits needed to instruct the CPU to do this, and you can leave this job to the assembler from now on.

As a side effect, you also learnt something about security: your system can be compromised by injecting some data into memory (an array of numbers, a string or anything which the host system would accept) and making the CPU somehow stumble into executing it.

Discuss with your lab neighbour: after completing this exercise, how would you explain the way a CPU works to your grandma/grandpa?

Make a commit now that you’ve knocked down Exercise 3. Congrats!

Exercise 4: Boolean logic, bit vectors, and labels
Load some new data into register r1 by adding this assembly code to your program:

  @ load "COPE" into r1
  ldr r1, string

loop:
  nop
  b loop

string:
  .ascii "COPE"

This code introduces a new assembler feature: labels.

If you’re the kind of person who likes documentation, labels are described in the GNU assembler (as) manual.

In the above code there are two new labels: string and loop. A label is a way of attaching a human-readable name to a location in your program. Always remember that a label is just a name attached to a memory location: when the assembler builds your program, each label corresponds to a specific memory address, which you can store in a register, do arithmetic on, etc.

While you can use just about anything for a label name, try to use something informative, as if you were naming a function or variable.

What will you see in r1 after the ldr r1, string line? Can you guess the address of string and find it in the memory viewer?
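To check your guess: `ldr r1, string` loads the 32-bit word stored at the `string` label (not the label's address), so r1 receives the four ASCII bytes combined little-endian. A one-line Python model of that load:

```python
word = int.from_bytes(b"COPE", "little")  # bytes 43 4f 50 45, lowest address first
print(hex(word))
```

So 'C' (0x43) ends up in the least significant byte of r1, and 'E' (0x45) in the most significant byte.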

You’ll notice a new .ascii assembler directive in this code: this allows you to put data into your program using the ASCII encoding. It works like the .hword directive you used earlier, except for the data format: while .hword takes numbers, .ascii takes characters and encodes each one as a single byte value. Any ASCII table lists each character and its encoded value (for example, C is 0x43).

Your goal in this exercise is to isolate and modify individual bytes within the "COPE" word:

first, change it into "HOPE" and store in r2
then, change it into "HOPS" and store in r3
Each of these steps requires isolating and manipulating one 8-bit (1-character) part of the 32-bit word without messing with the rest of it. What boolean logic or arithmetic operations can you use to modify the appropriate bits and bytes?

It might be helpful to use a piece of paper here: write out what the "COPE" data looks like in memory (remember endianness!), and figure out what operations you need to make the transformations into "HOPS".

There are several ways to do this, how many can you think of? Show your program to your neighbour or tutor to get ideas about how it could be done differently.
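Here is a sketch of three of those approaches in Python, operating on the 32-bit value that `ldr` loads. The comments name the ARM instructions each step resembles (bic, orr, eor, add), but the Python only models the bit arithmetic; it is not a drop-in solution. `ascii_word` is a hypothetical helper.

```python
def ascii_word(text: str) -> int:
    """Pack four ASCII characters little-endian, as ldr would load them."""
    return int.from_bytes(text.encode("ascii"), "little")


cope = ascii_word("COPE")

# Approach 1: clear the byte (like bic), then OR in the new character (like orr).
hope = (cope & 0xFFFFFF00) | ord("H")               # byte 0: 'C' -> 'H'
hops = (hope & 0x00FFFFFF) | (ord("S") << 24)       # byte 3: 'E' -> 'S'

# Approach 2: XOR with old ^ new in each byte position (like eor).
hops_xor = cope ^ (ord("C") ^ ord("H")) ^ ((ord("E") ^ ord("S")) << 24)

# Approach 3: add the differences (no carry crosses a byte boundary here).
hops_add = cope + (ord("H") - ord("C")) + ((ord("S") - ord("E")) << 24)

print(hex(hope), hex(hops), hex(hops_xor), hex(hops_add))
```

Approach 1 works whatever the old character was; approaches 2 and 3 rely on knowing the original byte values in advance.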

You might have used one (or more) large numbers in your bit manipulation adventures. How many bits is that number? Do you think it will fit into the instruction encoding? Have a look at the disassembly, and cross-check it with the instruction manual. Can you make sense of what’s happening here? (Hint: look up “modified immediate constants” in the Architecture Reference Manual.) Give it a crack, but we will revisit this again later in the course.
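As a head start: 32-bit Thumb data-processing instructions cannot hold an arbitrary 32-bit constant. The ARMv7-M manual describes a 12-bit "modified immediate" scheme that can produce an 8-bit value on its own, one of three byte-replication patterns (0x00XY00XY, 0xXY00XY00, 0xXYXYXYXY), or an 8-bit value with its top bit set, rotated right by 8 to 31 bits. The Python below is our own brute-force check of that rule (`is_modified_immediate` is a hypothetical helper, not an assembler routine).

```python
def ror32(value: int, amount: int) -> int:
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


def is_modified_immediate(value: int) -> bool:
    """Brute-force check: can `value` be a Thumb-2 modified immediate constant?"""
    value &= 0xFFFFFFFF
    b0 = value & 0xFF         # XY for the 000000XY, 00XY00XY and XYXYXYXY forms
    b1 = (value >> 8) & 0xFF  # XY for the XY00XY00 form
    if value in (b0, b0 * 0x00010001, b1 * 0x01000100, b0 * 0x01010101):
        return True
    # Otherwise: an 8-bit value 1bcdefgh (top bit set) rotated right by 8..31.
    return any(ror32(0x80 | low7, rot) == value
               for rot in range(8, 32) for low7 in range(128))


for constant in (0x48, 0xFF000000, 0x53000000, 0x45504F43):
    print(hex(constant), is_modified_immediate(constant))
```

Masks such as 0xFF000000 pass this check, while a constant whose set bits are spread across the whole word, like the "COPE" value itself, does not.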

Finalise your program so that the main function performs the "COPE" - "HOPE" - "HOPS" transformation and leaves the "HOPS" value in r3. Commit & push your changes up to your repo on the GitLab server.

Summary
Congratulations! In this week’s lab you learned how to

watch what happens in memory as your program executes
use the .hword and .ascii assembler directives to insert bytes into your programs
figure out the exact sequence of bits to get the discoboard to do what you want
isolate and manipulate specific bits in a word of data
read and write numbers in different bases: decimal, binary, hex
Make sure you log out to terminate your session, and pack up your board and USB cable carefully.
