Projects: Extending Machine Instructions

Instruction examples
In some figures, we use a simplified assembler notation based on similarly simplified RISC-type instruction formats. The processor’s native instruction is followed by a vertical line and the corresponding mnemonics of the extension.

Sideband effects
The instructions the processor core executes are neither modified nor tapped. The additional effects are caused by the extensions alone. Tasks of this kind – loading registers, saving and restoring register contents, toggling signals, temporarily inhibiting interrupts, and the like – arise quite frequently. Conventionally, they are handled via general-purpose I/O (GPIO) signals, and asserting and deasserting them requires dedicated I/O instructions. In our design, however, they become sideband effects: machine instructions may be accompanied by microinstructions that act independently.

Disabling interrupts temporarily
Sometimes there are sequences of instructions that must not be interrupted. A straightforward example is the generation of a pulse by setting an output bit, waiting the required time, and finally clearing the bit. If this sequence is interrupted, the width of the pulse can increase unpredictably. Conventionally, we would disable the interrupts by a DI instruction and enable them by an EI instruction after the pulse has been generated. Our alternative is a bit position in the microinstructions that can disable the interrupts temporarily without impeding the interrupt control exerted by the interrupt enable flag (IF) in the processor’s flag register.
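
As a rough behavioral sketch in Python (not the actual circuitry), the gating can be modeled as an AND of the interrupt request, the processor's IF flag, and the inverted inhibit bit. The function name interrupt_accepted and the bit name inhibit are assumptions made for illustration only.

```python
def interrupt_accepted(irq: bool, if_flag: bool, inhibit: bool) -> bool:
    """Return True if a pending interrupt request reaches the processor core.

    irq     -- an interrupt request is pending
    if_flag -- the processor's own interrupt enable flag (IF); never modified here
    inhibit -- hypothetical microinstruction bit accompanying the current instruction
    """
    return irq and if_flag and not inhibit


# Pulse generation: the three instructions forming the pulse carry inhibit = 1,
# the instruction after the pulse does not.
pulse_sequence = [("SET bit", True), ("WAIT", True), ("CLEAR bit", True), ("ADD", False)]

# An interrupt request is pending the whole time and IF is set.
accepted = [interrupt_accepted(True, True, inhibit) for _, inhibit in pulse_sequence]
```

In this model, the request is held off during the pulse and accepted with the first instruction that carries no inhibit bit. IF itself is never touched, so a program that has disabled interrupts via DI keeps them disabled.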

Loading registers
Conventionally, application-specific registers will be loaded via I/O instructions. Here we solve this problem by accompanying microinstructions. Typical registers to be loaded this way are backup registers, capture registers, assembly registers (for outputs wider than a machine word), history buffers (for debugging and error handling), and the like. The advantage of this principle is that loading such registers does not require additional clock cycles and hence does not affect the real-time behavior of the machine. In our example, the content of a working register is to be saved into a backup register by a SAVE micro-order and restored by a RESTORE micro-order. WRK_DI/DO symbolize the input and output signals of the register flip-flops by which the working register is connected to the ambient circuitry.
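
The SAVE and RESTORE micro-orders can be sketched as a small register model in Python. The class and method names are hypothetical, and the clocking behavior (SAVE captures the content the working register held before the clock edge, RESTORE takes priority over a normal load) is an assumption, not something the description specifies.

```python
class WorkingRegister:
    """Working register with a backup register controlled by micro-orders."""

    def __init__(self, value=0):
        self.wrk = value      # working register, visible at WRK_DO
        self.backup = 0       # backup register

    def clock(self, wrk_di=None, save=False, restore=False):
        """Model one clock edge.

        wrk_di  -- new data at WRK_DI, or None if the register is not loaded
        save    -- SAVE micro-order: copy the old working content into backup
        restore -- RESTORE micro-order: copy backup into the working register
        """
        old_wrk = self.wrk
        if restore:
            self.wrk = self.backup
        elif wrk_di is not None:
            self.wrk = wrk_di
        if save:
            self.backup = old_wrk


reg = WorkingRegister(5)
reg.clock(wrk_di=9, save=True)   # load new data and save the old content in one cycle
reg.clock(restore=True)          # bring the old content back
```

Both micro-orders take effect in the same clock cycle as the accompanying machine instruction, which is the point of the scheme: no extra cycle is spent on the backup.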

Micro-orders cause various registers to be loaded. SENSE captures data from the outside world (a), and EMIT causes the contents of various output registers to appear in the outside world at once (b). CKPT (Checkpoint) loads signals to be saved for debugging or error handling into a log-out register or history buffer, respectively (c). In this example, it is read out serially.

Emit output data
The data to be output here are literals. Such never-changing data could be embedded in our microinstructions. OPC = the extension’s opcode; SELECT = output register selection; EMIT = output the literal. In the instruction example, the literal 0x1234 is loaded into the output register out_reg2 while a RISC-type ADD instruction is executed.
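
A minimal Python sketch of this instruction example follows. The field names mirror SELECT and EMIT from the caption; the data layout, the function name accompany, and the representation of the native instruction as a string are assumptions for illustration. Field widths are not modeled.

```python
from dataclasses import dataclass


@dataclass
class EmitExtension:
    select: int    # SELECT: which output register receives the literal
    emit: bool     # EMIT: output the literal
    literal: int   # the never-changing data embedded in the microinstruction


def accompany(native_instruction, ext, out_regs):
    """Apply the extension as a sideband effect and pass the instruction on unchanged."""
    if ext.emit:
        out_regs[ext.select] = ext.literal
    return native_instruction


out_regs = {1: 0, 2: 0}
# ADD R1, R2, R3 | EMIT out_reg2, 0x1234
passed_on = accompany("ADD R1, R2, R3", EmitExtension(select=2, emit=True, literal=0x1234), out_regs)
```

The processor core sees only the ADD instruction; the output register is loaded alongside it.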

Concurrent output
Data moved between processor and memory are diverted as output data. They may come from the memory or the processor core. Memory contents are tapped during read cycles, immediate values (literals) during instruction-fetch cycles, and data out of the processor during write cycles.

The microinstruction taps the data the processor reads out from or writes into the memory, respectively. OPC = the extension’s opcode; SELECT = output register selection; DAOR = Data Output while Reading; DAOW = Data Output while Writing. In instruction example (a), a load instruction causes the addressed memory content to be loaded into the output register out_reg1 too. In instruction example (b), a store instruction causes the data to be stored to be loaded into the output register out_reg2 too.
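
The two instruction examples can be sketched in Python as a tap on the data path. The function tap_data, the cycle names, and the dictionary-based registers and memory are hypothetical modeling choices; only the mnemonics DAOR and DAOW come from the description above.

```python
# Which kind of bus cycle each micro-order listens to.
CYCLE_OF = {"DAOR": "read", "DAOW": "write"}


def tap_data(cycle, bus_data, micro_order, select, out_regs):
    """Copy the data on the bus into an output register if the micro-order
    matches the current cycle. The bus data itself passes through unchanged."""
    if CYCLE_OF.get(micro_order) == cycle:
        out_regs[select] = bus_data
    return bus_data


out_regs = {"out_reg1": None, "out_reg2": None}
memory = {0x40: 0xAB}

# (a) load instruction: R1 receives the memory content, out_reg1 gets a copy.
r1 = tap_data("read", memory[0x40], "DAOR", "out_reg1", out_regs)

# (b) store instruction: the memory receives the data, out_reg2 gets a copy.
memory[0x41] = tap_data("write", 0xCD, "DAOW", "out_reg2", out_regs)
```

Because the function returns the bus data untouched, the load and the store behave exactly as they would without the extension.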

The microinstruction taps the immediate value (literal) contained in the instruction. Because the literal appears during instruction fetch, it must be buffered. OPC = the extension’s opcode; SELECT = output register selection; DAOI = Data Output Immediate. In the instruction example, a load immediate instruction causes the immediate value (literal) to be loaded into the output register out_reg1 too.

Even the memory address could be output
In a read cycle, the address will be tapped and the read-in data ignored. In a write cycle, both address and data may be output, provided the writing into the memory can be inhibited. To mention a historical example, an 8-bit processor may output 16 or even 24 bits at once this way. A 32-bit processor could output up to 64 bits.
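
The output widths follow from simple concatenation: an 8-bit processor with a 16-bit address bus yields 16 bits from the address alone and 16 + 8 = 24 bits from address plus data; a 32-bit processor with 32-bit addresses yields 32 + 32 = 64 bits. A short Python sketch, in which the function name and the bit ordering (address in the upper bits) are assumptions:

```python
def combined_output(address, data, addr_bits=16, data_bits=8):
    """Concatenate the address and data of one write cycle into a single output word.

    The address occupies the upper bits and the data the lower bits;
    this ordering is an arbitrary choice for the sketch.
    """
    if not 0 <= address < (1 << addr_bits):
        raise ValueError("address does not fit into addr_bits")
    if not 0 <= data < (1 << data_bits):
        raise ValueError("data does not fit into data_bits")
    return (address << data_bits) | data


# 8-bit processor, 16-bit address: a store of 0x56 to address 0x1234
word24 = combined_output(0x1234, 0x56)

# 32-bit processor, 32-bit address
word64 = combined_output(0xFFFFFFFF, 0xFFFFFFFF, addr_bits=32, data_bits=32)
```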

The microinstruction taps the memory address too. OPC = the extension’s opcode; SELECT = output register selection; ADOR = Address Output while Reading; ADOW = Address Output while Writing; DADOW = Data and Address Output while Writing. In both instruction examples (a) and (b), a load or store instruction causes the memory address to be loaded into an output register. In the instruction example (c), a store instruction causes the address and the data to be stored to be loaded into two output registers, addressed by out_reg3.

Input by injecting data
The extensions we have described above could be characterized as passive or sideband operations, because the interface between memory and processor core is not touched at all or only tapped. To support input operations, however, we must inject data from outside into the data paths. In our block diagrams, injecting input data is illustrated by data selectors or 2-to-1 multiplexers – exactly as it must be done in typical FPGA implementations, where tri-state buses cannot be implemented. In a legacy implementation based on a conventional microprocessor, tri-state drivers would be used instead.

Input data are to be injected during a read cycle if they are to be delivered to the processor, or during a write cycle if they are to be written into the memory. Data coming from outside must be synchronized before being fed into a data path.

Input data are read into the processor core. OPC = the extension’s opcode; SELECT = input register selection; DAIR = Data Input while Reading. In the instruction example, during the data access of a load instruction, the content of the input register in_reg1 will be injected into the data path and thus loaded into the destination register R1 instead of the addressed memory content.

Input data are written into the memory. OPC = the extension’s opcode; SELECT = input register selection; DAIW = Data Input while Writing. In the instruction example, during the data access of a store instruction, the content of the input register in_reg1 will be injected into the data path and thus written into the memory instead of the processor’s register content.
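
Both injection cases amount to a 2-to-1 multiplexer in the data path. The following Python sketch models them; the function names and the dictionary-based memory are assumptions, and synchronization of the external data is not modeled.

```python
def mux2(select, normal, injected):
    """2-to-1 multiplexer: pass the injected value when select is set."""
    return injected if select else normal


def read_cycle(memory, address, dair, in_reg):
    """Data delivered to the processor. With DAIR set, the input register
    replaces the addressed memory content."""
    return mux2(dair, memory[address], in_reg)


def write_cycle(memory, address, cpu_data, daiw, in_reg):
    """Data stored into the memory. With DAIW set, the input register
    replaces the processor's register content."""
    memory[address] = mux2(daiw, cpu_data, in_reg)


memory = {0x20: 0x11, 0x21: 0x00}
in_reg1 = 0x5A

r1_plain = read_cycle(memory, 0x20, dair=False, in_reg=in_reg1)   # ordinary load
r1_input = read_cycle(memory, 0x20, dair=True, in_reg=in_reg1)    # load with DAIR
write_cycle(memory, 0x21, cpu_data=0x77, daiw=True, in_reg=in_reg1)  # store with DAIW
```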

Executing instructions conditionally
In computer architecture, the principle is known as predication. Instructions are addressed and fetched sequentially. Predicates decide whether a fetched instruction is executed or not. We intervene during instruction fetch and let the instruction either pass unmodified or substitute it with a no-operation (NOP) instruction code.

Our predicates are conditions selected from the outside world. In contrast to some well-known architectures, we are not limited to the content of a predicate register or a few condition bits but can select any number of predicates from arbitrary sources. A branch on condition is programmed by an unconditional jump accompanied by the appropriate extension.

The details of the implementation are somewhat tricky. The extension may belong to the instruction that is to be executed conditionally (as shown in Figure 13). Because the condition can be selected no earlier than at the beginning of the instruction fetch cycle, it could be necessary to insert a wait state. Alternatively, the extension could accompany the previous instruction. Then we have to ensure that interrupts between these two instructions are inhibited.

If the condition is not satisfied, the processor core will receive a NOP instead of the instruction and thus skip it. OPC = the extension’s opcode; SAMPLE = capture the conditions; CNDSEL = condition selection; CPL = complement (invert) the condition. In the instruction example, a jump to an error-handling routine will be skipped if no parity error has been detected. If more than one instruction is to be executed conditionally, the condition must be captured at the beginning of this instruction block and then kept. To this end, latching (sampling) the conditions is controlled by the SAMPLE microinstruction bit.
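
The interplay of SAMPLE, CNDSEL, and CPL can be sketched in Python as follows. The class name, the packing of the external conditions into an integer, and the opcode values are illustrative assumptions; timing issues such as wait states are left out.

```python
NOP = 0x00  # illustrative NOP opcode


class Predicator:
    """Substitutes a NOP for a fetched instruction whose condition is not satisfied."""

    def __init__(self):
        self.latched = 0  # conditions captured by the last SAMPLE

    def fetch(self, instruction, conditions, sample, cndsel, cpl):
        """Return the instruction code delivered to the processor core.

        conditions -- external condition signals, one per bit
        sample     -- SAMPLE: latch the current conditions
        cndsel     -- CNDSEL: bit number of the selected condition
        cpl        -- CPL: complement the selected condition
        """
        if sample:
            self.latched = conditions
        condition = bool((self.latched >> cndsel) & 1)
        if cpl:
            condition = not condition
        return instruction if condition else NOP


JUMP = 0xC3        # illustrative opcode of the jump to the error handler
PARITY_ERROR = 2   # hypothetical bit position of the parity error signal

p = Predicator()
no_error = p.fetch(JUMP, 0b000, sample=True, cndsel=PARITY_ERROR, cpl=False)
error = p.fetch(JUMP, 0b100, sample=True, cndsel=PARITY_ERROR, cpl=False)
# Without SAMPLE, the previously latched condition is still in effect.
kept = p.fetch(JUMP, 0b000, sample=False, cndsel=PARITY_ERROR, cpl=False)
```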

Injecting NOPs and other instructions
The design idea of feeding the processor a NOP instruction instead of the instruction it has addressed can be applied beyond the conditional execution of instructions. The basic idea is to inject something else during the instruction fetch phase.

Outputting instructions instead of executing them
When we inject NOPs, the instructions read out of the memory could be tapped for other purposes. In memory locations accompanied by appropriate extensions, arbitrary content could be stored. The stored words could be special-purpose instructions controlling an accelerator or merely immediate data to be output. Thus it is possible to use the access width of the program memory and the instruction fetch cycles for output purposes. Consecutive instruction fetches – without data accesses in between – often achieve a far higher data rate than programmed output loops.

Instructions extended this way never make it into the processor. It will receive NOPs instead. So the memory may contain arbitrary bit patterns. They may serve as special instructions (a) of an accelerator or a peripheral control unit or may be output immediately (b).
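
A short Python sketch of this diversion, assuming one extension bit per program memory word (called divert here, a hypothetical name) and an illustrative NOP code:

```python
NOP = 0x00  # illustrative NOP opcode


def fetch_with_diversion(program_memory, divert_flags):
    """Walk through consecutive instruction fetches.

    Words whose extension bit is set are output and replaced by a NOP;
    all other words reach the processor core unchanged.
    Returns (words seen by the core, words sent to the output).
    """
    to_core, to_output = [], []
    for word, divert in zip(program_memory, divert_flags):
        if divert:
            to_output.append(word)
            to_core.append(NOP)
        else:
            to_core.append(word)
    return to_core, to_output


# Two ordinary instructions enclosing two arbitrary bit patterns.
core_stream, output_stream = fetch_with_diversion(
    [0x3E, 0xDEAD, 0xBEEF, 0x76],
    [False, True, True, False],
)
```

Each diverted word costs one fetch cycle and no data access, which is where the data rate advantage over a programmed output loop comes from.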

Injecting modified or completely other instructions
Instead of NOPs, individual bits, bit fields, or complete instructions could be injected. Today, it is doubtless not appropriate to implement application-specific circuits this way. Instead of beefing up a processor with such tricks, we will simply choose a more powerful model.

The EXEC instruction
A sometimes useful application, however, could be to extend the processor's instruction set by an EXECUTE instruction. Such an instruction causes a memory or register content to be executed as an instruction. If this instruction causes a branch, the program continues in the direction of the branch. Otherwise, the instruction following the EXECUTE instruction is executed. EXECUTE may be thought of as a subroutine consisting of only one instruction. In the early days of computer development, it was common practice to modify instructions in the application program or to create them on the fly. For some time now, however, so-called pure procedures have been preferred. These are programs that may not be changed during execution. Here, the EXECUTE instruction is some kind of backdoor. This way, you may create your own instructions even in pure procedures. Sometimes, this may come in handy to speed up program sequences or circumvent shortcomings of the architecture.
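
The semantics of EXECUTE can be illustrated with a toy interpreter in Python. The instruction set (LOADI, ADD, JMP, EXEC), the tuple encoding, and the register names are invented for this sketch and do not correspond to any particular processor.

```python
def step(instr, regs, pc):
    """Execute one instruction and return the next program counter."""
    op = instr[0]
    if op == "LOADI":                  # load an immediate value into a register
        regs[instr[1]] = instr[2]
    elif op == "ADD":                  # regs[a] = regs[a] + regs[b]
        regs[instr[1]] += regs[instr[2]]
    elif op == "JMP":                  # branch: continue at the target
        return instr[1]
    elif op == "EXEC":
        # The register content is executed as if it stood in place of EXEC.
        # A branch is followed; otherwise execution continues after EXEC.
        return step(regs[instr[1]], regs, pc)
    return pc + 1


def run(program, regs):
    """Run the (unmodified) program until the program counter leaves it."""
    pc = 0
    while pc < len(program):
        pc = step(program[pc], regs, pc)
    return regs


# The program assembles an ADD instruction in register X and executes it.
program = [
    ("LOADI", "R1", 1),
    ("LOADI", "R2", 41),
    ("LOADI", "X", ("ADD", "R1", "R2")),
    ("EXEC", "X"),
]
regs = run(program, {})
```

The program list is never written to while it runs, so it remains a pure procedure; the instruction created on the fly lives only in a register.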

Modifying or substituting instructions from outside. Bits or bit fields of the instruction may be injected (a). Think, for example, of an address field or a literal value set or modified according to external conditions. Complete instructions could also be injected (b). Here a program-accessible register is shown to be loaded with an instruction that has been assembled by the application program. Injecting such an instruction is equivalent to the EXECUTE instruction provided in some legacy architectures. We could also think of supplying a complete instruction from outside, for example, from another processor in a multiprocessor system.

Attaching an accelerator
A conventional accelerator is operated as some kind of I/O device and thus incurs considerable software overhead. On how to attach accelerators, see, for example, [9] to [12].

The accelerator is an autonomous device with program-accessible registers at the inputs and outputs. First, the program running in the processor loads the operands into the input registers and starts the operation to be executed. The processor waits for the accelerator to finish. Then it fetches the results. One parameter of an I/O instruction (IN, OUT) addresses a processor register, the other is an I/O address selecting a register in the accelerator.

In our alternative solution, machine instructions are extended outside the processor so that they act like microinstructions. The accelerator is operated by extensions. They accompany instructions that provide the operands, select and initiate the operation and fetch the result.

The processor reads the operands from memory and loads them into its registers. At the same time, they are tapped to be loaded into the accelerator’s registers. In our example, the operation will start immediately after the last parameter has been entered. Until the result is available, the processor will be held in a wait state. The last of the extended instructions will fetch the result and load it into a processor register. Thus the software overhead typical of such additional circuitry is eliminated. The accelerator behaves as if it were an inherent part of the processor core instead of some kind of an afterthought.
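
The sequence can be sketched in Python. The class Accelerator, its method tap, and the choice of a multiplication as the operation are assumptions for illustration; the wait state is not modeled because the result is computed instantly here.

```python
class Accelerator:
    """Accelerator whose input registers are loaded by tapping the data path."""

    def __init__(self, operation, operand_count):
        self.operation = operation
        self.operand_count = operand_count
        self.operands = []
        self.result = None

    def tap(self, value):
        """Capture an operand that the processor is just reading from memory.
        The operation starts as soon as the last operand has been entered.
        The value passes through unchanged to the processor."""
        self.operands.append(value)
        if len(self.operands) == self.operand_count:
            self.result = self.operation(*self.operands)
            self.operands = []
        return value


memory = {0x10: 6, 0x11: 7}
acc = Accelerator(lambda a, b: a * b, operand_count=2)
regs = {}

regs["R1"] = acc.tap(memory[0x10])   # load, operand tapped
regs["R2"] = acc.tap(memory[0x11])   # load, last operand tapped, operation starts
regs["R3"] = acc.result              # load whose data is replaced by the result
```

Three ordinary load instructions suffice; there are no separate I/O instructions for loading operands, starting the operation, polling, or fetching the result.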

Supporting an unlimited number of breakpoints
Additional memory bits support an unlimited number of breakpoints. During normal operation, the additional memory may serve as an error-checking memory (parity or ECC).

A bit position in the additional memory causes an address-compare event that triggers an interrupt. If the entire memory is extended this way, you can set any number of breakpoints, up to single-stepping through the instructions. In contrast, the typical built-in breakpoint provisions of microcontrollers support only a few breakpoints (for example, four).
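
A minimal Python model of such a memory, with one stop bit per word. The class and function names are hypothetical, and the program is assumed to run straight through the addresses without branches.

```python
class TaggedMemory:
    """Memory with one additional bit per word that marks a breakpoint."""

    def __init__(self, size):
        self.words = [0] * size
        self.stop_bits = [False] * size

    def set_breakpoint(self, address):
        self.stop_bits[address] = True

    def clear_breakpoint(self, address):
        self.stop_bits[address] = False

    def fetch(self, address):
        """Return the word and the stop bit, which would trigger the interrupt."""
        return self.words[address], self.stop_bits[address]


def run_until_stop(mem, start):
    """Fetch consecutive addresses and return the first one whose stop bit is set,
    or None if the end of the memory is reached."""
    for address in range(start, len(mem.words)):
        _, stop = mem.fetch(address)
        if stop:
            return address
    return None


mem = TaggedMemory(16)
mem.set_breakpoint(5)
mem.set_breakpoint(9)
first_stop = run_until_stop(mem, 0)
second_stop = run_until_stop(mem, first_stop + 1)
```

Setting the stop bit at every address turns the same mechanism into single-stepping.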

To establish debugging mode, clear the control register and read and rewrite the whole memory content. To set a breakpoint, set INJECT and INJECT HI and read from and write to the desired address. Clear the INJECT bits, set ENABLE TRACE and the desired trace conditions.
To return to normal operation, clear ENABLE TRACE, set ENABLE PARITY and ENABLE INJECT and read and rewrite the whole memory content. Then set ENABLE ERROR SIGNALIZATION.

This is a single-board computer (SBC) from the 1980s. It features a ninth memory bit, serving as a parity bit or an address-compare stop bit. Details in [4] and [5]. The arrow points to the 32k × 9-bit DRAM memory, populated by 18 DRAMs of 16 kbits each.

The so-called SBC frame, displayed on a CRT. It allows for viewing and altering the content of the processor’s registers, setting up compare stop modes, and single-stepping through the program. Menu items are selected via cursor keys (no mouse in those bygone times). A selected menu item is displayed inversely (dark characters in a white rectangle). In fields filled with zeros, hexadecimal numbers may be entered.

Suitable processor cores
A processor core is considered to be suitable if it executes the instructions as they are fetched out of the memory. Processor cores with internal instruction buffering (such as the venerable 8086) or built-in, inaccessible instruction caches, with deep pipelines and speculative instruction execution are out of the question. Such machines are anyway not particularly well-suited when it comes to interacting with the outside world.

Our design ideas have been proven with conventional 8-bit microprocessors. The Zilog Z80, Rockwell 6502, Motorola 6800, and Intel 8051 are historical examples. For Z80-based solutions see [4] to [8]. It should pose no particular difficulty to adapt soft cores like MicroBlaze or NIOS appropriately. With ARM, MIPS, RISC-V, and similar architectures, we should be able to implement our proposals provided we choose a suitable processor core. Maybe even high-performance cores could be adapted by appropriately loading page attribute tables, memory type range registers, and the like. In this respect, programs using our extensions can be likened to device drivers controlling the physical I/O circuitry.

Accessing the memory
We don't want to interfere with the addressing. We will only tap the data paths (and occasionally, the address paths too). A few of our extensions may require inserting wait states. Others require writing into the memory to be inhibited and particular error signals to be ignored. When data or address paths are only tapped, the processor core will not be affected. When something is injected into the data path, our extensions will appear only as a somewhat slower memory. If required, a wait state is to be inserted. Nevertheless, the extended instruction will be faster than the sequence of instructions that would otherwise be required to produce the same effect.

Extending a conventional microprocessor system
When reading instructions, the control storage is addressed in the same way as the conventional memory. The extension – principally an additional microinstruction – is loaded into the control storage data register (CSDR). The extension control circuitry energizes control signals to load data or addresses into output registers and feed register contents or literals to the data bus.

1	Conventional memories
2	Control storage and control storage data register. Here the control storage is shown as a ROM. In practice, it is often a RAM that can be loaded in a particular access mode.
3	Extension control circuitry.
4	Bidirectional data bus buffer. Disconnects the conventional memories from the data bus when instruction or input data are to be injected.
5	Sideband output. The output data are literals in the microinstruction or come out of the extension control circuitry.
6	Address output. The data address of the current instruction is used as a bit pattern to be output.
7	Data output. The bit pattern on the data bus is output. It may come out of the processor or out of the memory.
8	Inject input data. If injected in a read cycle, they will be read by the processor; if injected in a write cycle, they will be stored.
9	Inject an instruction. An instruction from outside is fed to the processor instead of the instruction read out of the memory. A typical example is a NOP instruction causing the fetched instruction to be skipped.