

ARM Single-Cycle Processor Implementation in Digital

Due Tuesday, December 3rd at 11:59pm. Interactive Grading on Wednesday, December 4th.

NOTE: If you don't start early, you will not finish this project. It is worth 17% of your grade.
NOTE: You must work individually on this project.

For this project you are going to implement a single-cycle processor for a subset of the ARMv7 instruction set using digital logic in Digital.

Your implementation should be able to execute the machine code versions of the following Project03 functions: quadratic, sum_array, find_max, fib_iter, and fib_rec.

Unlike Project05, for this project you can use any pre-built Digital component.

The Single Cycle Processor is based on the following main sub-circuits:
  • The PC (program counter) - a 32-bit register
  • Instruction memory - a sub-circuit with a ROM component for each test program.
  • The Register File - a sub-circuit that contains 15 32-bit registers
  • An ALU (arithmetic logic unit for basic calculations such as add, sub, mov) - An adder plus additional logic
  • Sign Extension Unit - a custom sub-circuit
  • Data memory - A RAM component
  • The Control Unit
  • The Data Path - busses that connect data between different sub-circuits
  • The Control Path - wires that connect the control unit to the different sub-circuits and MUXes
Your job is to construct or configure each of these components for use in your processor. You will need to implement both a Data Path and a Control Path. The Control Unit will be the most complicated sub-circuit you need to develop.
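As a rough illustration of what the Control Unit has to decide, here is a minimal Python sketch that maps a 32-bit instruction word to a handful of control signals. The signal names (reg_write, alu_src_imm, mem_write, mem_to_reg, branch, link) are assumptions made for this example, not names required by the project. The sketch ignores condition codes, bx, and push/pop; the bit positions follow the standard ARM encodings for data-processing, ldr/str, and b/bl instructions.

```python
def control_signals(instr):
    """Return a dict of (hypothetical) control signals for one ARM instruction word."""
    sig = {"reg_write": False, "alu_src_imm": False, "mem_write": False,
           "mem_to_reg": False, "branch": False, "link": False}
    op = (instr >> 26) & 0x3                  # bits 27:26 select the instruction class
    if op == 0b00:                            # data processing (add, sub, mov, cmp, ...)
        opcode = (instr >> 21) & 0xF
        sig["alu_src_imm"] = bool((instr >> 25) & 1)   # I bit: operand2 is an immediate
        # tst/teq/cmp/cmn (opcodes 0b1000 to 0b1011) only update flags
        sig["reg_write"] = not (0b1000 <= opcode <= 0b1011)
    elif op == 0b01:                          # single data transfer (ldr, str)
        load = bool((instr >> 20) & 1)        # L bit: 1 = ldr, 0 = str
        sig["alu_src_imm"] = not ((instr >> 25) & 1)   # for ldr/str, I = 0 means immediate offset
        sig["reg_write"] = load
        sig["mem_to_reg"] = load
        sig["mem_write"] = not load
    elif ((instr >> 25) & 0x7) == 0b101:      # b / bl
        sig["branch"] = True
        sig["link"] = bool((instr >> 24) & 1) # L bit: bl also writes the link register
    return sig
```

In Digital the same decisions would be made with combinational logic on the instruction bits rather than if statements; the sketch is only meant to help tabulate which signals each instruction class needs.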

I will go over processor design and the various components you need to implement in class. You can also use the following slides from the book Digital Design and Computer Architecture: ARM Edition:

In order to prepare your Project03 programs for execution in your processor implementation you will need to do a couple of things:

1. Create standalone assembly code in which each program has an assembly main. The main function will set up the stack if needed by initializing the SP to the top of the stack in Data Memory, set up any input data and arguments, then use bl to call the target function. You can end the main code with an infinite loop: "end: b end". Note that in addition to adding main, you need to remove any ".global" or ".func" directives from your assembly code.

2. You will use the "as" program on your Raspberry Pi to create .o files. We will then use the "objdump" command to extract the binary machine code and create an input file that we can load into the instruction memory ROM. See below for a Python script that can create the Digital-friendly input files.
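One quick sanity check on the machine code from step 2 is the "end: b end" loop from step 1, which should assemble to the word eafffffe. The Python sketch below shows how a branch target can be computed from such a word: the low 24 bits are sign-extended, shifted left by 2, and added to the branch address plus 8. The function names are hypothetical, and this models only the arithmetic, not the Sign Extension Unit circuit itself.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def branch_target(instr, pc):
    """Return the address that a b/bl located at address pc jumps to."""
    imm24 = instr & 0x00FFFFFF                # bits 23:0 hold a signed word offset
    offset = sign_extend(imm24, 24) << 2      # word offset -> byte offset
    return (pc + 8 + offset) & 0xFFFFFFFF     # ARM branches are relative to pc + 8


# "end: b end" placed at address 0x10 branches back to itself.
assert branch_target(0xEAFFFFFE, 0x10) == 0x10
```

If your processor's branch lands one or two instructions away from where this arithmetic says it should, the pc + 8 adjustment is a likely place to look.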

You will create an Instruction memory component that consists of several ROM components, where each ROM component will hold a single program.

Design and Implementation Strategy

I highly recommend that you take an incremental approach to developing your solution. Trying to implement everything first and then testing will not work well, and you won't understand how the entire processor works. Instead, try to implement just enough of the processor to support a single instruction, like add or mov. Then incrementally add more to the Data Path, Control Path, and Control Unit to gradually support additional instructions. This is similar to the approach I recommended for the ARMemu project.
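To make the "one instruction first" idea concrete, here is a small Python sketch of a reference model that supports only mov and add. It is an illustration under stated assumptions, not a required design: the function names are hypothetical, register operands are assumed to have no shift applied, condition codes are ignored, and a 16-entry list stands in for the register file so that register numbers can be used directly as indices.

```python
OPCODE_ADD = 0b0100
OPCODE_MOV = 0b1101


def decode(instr):
    """Split a data-processing instruction word into its fields."""
    return {
        "cond": (instr >> 28) & 0xF,
        "imm": (instr >> 25) & 0x1,        # I bit: operand2 is an immediate
        "opcode": (instr >> 21) & 0xF,
        "rn": (instr >> 16) & 0xF,
        "rd": (instr >> 12) & 0xF,
        "operand2": instr & 0xFFF,
    }


def ror32(value, amount):
    """Rotate a 32-bit value right by amount bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


def step(instr, regs):
    """Execute one mov or add instruction, updating regs in place."""
    f = decode(instr)
    if f["imm"]:
        # Immediate: 8-bit value rotated right by twice the 4-bit rotate field.
        op2 = ror32(f["operand2"] & 0xFF, 2 * ((f["operand2"] >> 8) & 0xF))
    else:
        op2 = regs[f["operand2"] & 0xF]    # assumes Rm with no shift
    if f["opcode"] == OPCODE_MOV:
        regs[f["rd"]] = op2
    elif f["opcode"] == OPCODE_ADD:
        regs[f["rd"]] = (regs[f["rn"]] + op2) & 0xFFFFFFFF
    else:
        raise NotImplementedError("opcode not supported yet")


regs = [0] * 16
step(0xE3A00005, regs)   # mov r0, #5
step(0xE3A01007, regs)   # mov r1, #7
step(0xE0802001, regs)   # add r2, r0, r1
assert regs[2] == 12
```

A model like this can be grown one instruction at a time alongside the circuit, and its register contents compared against what you see when single stepping in Digital.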

You need to submit your Digital circuit files to GitHub.

Note that for sum_array and find_max you will need to pre-populate RAM with the numbers that make up the input array. In the main function you can allocate memory on the stack and build a test array there.

Implementation Requirements

  • You should put the top-level of your CPU on the "main" circuit.
  • Ensure that it is easy to see the state of the CPU (registers). That is, it should be easy to see the final result of each test function, and you should be able to single step through a program.
  • Make each program you support selectable via a "program selection" input. You can implement this using several ROM components and a MUX to select the appropriate instruction word.
  • Be prepared to explain all of your design and implementation choices during interactive grading.
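The "program selection" requirement above can be modeled in a few lines of Python: every ROM is read at the same address and a MUX picks one output. This is only a sketch. The function names are hypothetical, and it assumes each ROM holds one 32-bit word per address, so the PC is divided by 4 to form the ROM address; out-of-range reads are assumed to return 0.

```python
def load_rom(text):
    """Parse ROM file text: a "v2.0 raw" header followed by hex words."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "v2.0 raw":
        raise ValueError("missing 'v2.0 raw' header")
    return [int(word, 16) for line in lines[1:] for word in line.split()]


def fetch(roms, prog_sel, pc):
    """Return the instruction word at pc from the ROM chosen by prog_sel."""
    index = pc >> 2                              # byte address -> word address
    # Every ROM is read at the same address, as it would be in the circuit.
    outputs = [rom[index] if index < len(rom) else 0 for rom in roms]
    return outputs[prog_sel]                     # the MUX selects one ROM output


rom_a = load_rom("v2.0 raw\ne3a00001\neafffffe\n")
rom_b = load_rom("v2.0 raw\ne3a00002\n")
assert fetch([rom_a, rom_b], 0, 4) == 0xEAFFFFFE
assert fetch([rom_a, rom_b], 1, 0) == 0xE3A00002
```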
Extra Credit (1 point each)
  • 24 hours early
  • Be creative in displaying the processor state at the top schematic level.
  • Replace all Digital components with your own components built from AND, OR, and NOT gates.
  • Implement additional instructions to get find_str() working.
  • Implement a pipelined version of your ARM subset. See the DDCA Chapter 7 Slides.
  • For the pipelined processor implement branch prediction.
  • Implement a direct mapped cache and integrate it into your single-cycle processor.
  • Implement a fully associative cache and integrate it into your single-cycle processor.
  • Implement a multi-way set associative cache and integrate it into your single-cycle processor.
  • In Digital it is possible to connect a custom processor to a remote debugger. Figure out how to use gdb to debug programs running in your processor.


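import sys

# Characters that may start a machine-code word in the objdump output.
hexdigits =  ['0','1','2','3','4','5','6','7','8','9','a','b','c','d','e','f']

# Header line expected at the top of the ROM input file.
print("v2.0 raw")

for line in sys.stdin:
    tokens = line.split()
    # Instruction lines look like "<address>: <8 hex digits> <mnemonic> ...".
    # Skip anything that does not have an 8-character hex word as its second token.
    if len(tokens) < 2:
        continue
    if len(tokens[1]) != 8:
        continue
    if tokens[1][0] not in hexdigits:
        continue
    print(tokens[1])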


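On a Raspberry Pi (SCRIPT.py is a placeholder; replace it with the name under which you saved the script above):


$ objdump -d file.o | python SCRIPT.py > file_rom.txt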


Make sure your .s files DO NOT have the .global or .func directives. These will result in incorrect offset values in the branches.

Greg Benson,
Nov 14, 2019, 9:11 AM