Sunday, July 29, 2012

What is the difference between Von Neumann Architecture and Harvard Architecture ?

The Von Neumann and the Harvard Architecture is one important concept introduced in the basics of Computer Organization a subject which is included in the Engineering curriculum in 3rd semester for Information Technology Engineering and 4th semester for Computer Engineering in  Pune University.

 Today let us have a deeper look into both the architectures and then discuss the difference between the two.

The Von Neumann Architecture

In the von Neumann Architecture, the computer consisted of a CPU, memory and I/O devices. The program is stored in the memory. The CPU fetches an instruction from the memory at a time and executes it.
Thus, the instructions are executed sequentially which is a slow process. Neumann m/c are called control flow computer because instruction are executed sequentially as controlled by a program counter. To increase the speed, parallel processing of computer have been developed in which serial CPU’s are connected in parallel to solve a problem. Even in parallel computers, the basic building blocks are Neumann processors.
The von Neumann architecture is a design model for a stored-program digital computer that uses a processing unit and a single separate storage structure to hold both instructions and data. It is named after mathematician and early computer scientist John von Neumann. Such a computer implements a universal Turing machine, and the common "referential model" of specifying sequential architectures, in contrast with parallel architectures.
One shared memory for instructions (program) and data with one data bus and one address bus between processor and memory. Instructions and data have to be fetched in sequential order (known as the Von Neuman Bottleneck), limiting the operation bandwidth. Its design is simpler than that of the Harvard architecture. It is mostly used to interface to external memory.

Harvard Architecture

The term originated from the Harvard Mark 1 relay-based computer, which stored instructions on punched tape and data in relay latches.

Harvard Architecture: The Harvard architecture uses physically separate memories for their instructions and data, requiring dedicated buses for each of them. Instructions and operands can therefore be fetched simultaneously.
Different program and data bus widths are possible, allowing program and data memory to be better optimized to the architectural requirements. E.g.: If the instruction format requires 14 bits then program bus and memory can be made 14-bit wide, while the data bus and data memory remain 8-bit wide.

Harvard architecture is a computer architecture with physically separate storage and signal pathways for instructions and data. The term originated from the Harvard Mark I relay-based computer, which stored instructions on punched tape (24 bits wide) and data in electro-mechanical counters (23 digits wide). These early machines had limited data storage, entirely contained within the data processing unit, and provided no access to the instruction storage as data, making loading and modifying programs an entirely offline process.

Memory details

In a Harvard architecture, there is no need to make the two memories share characteristics. In particular, the word width, timing, implementation technology, and  memory address structure can differ. Instruction memory is often wider than data memory. In some systems, instructions can be stored in read-only memory while data  memory generally requires read-write memory. In some systems, there is much more instruction memory than data memory so instruction addresses are much wider than data addresses.

Contrast with other computer architectures

In a computer with the contrasting von Neumann architecture (and no cache), the CPU  can be either reading an instruction or reading/writing data from/to the memory. Both  cannot occur at the same time since the instructions and data use the same bus system. In a  computer using the Harvard architecture, the CPU can both read an instruction  and erform a data memory access at the same time, even without a cache. A Harvard architecture computer can thus be faster for a given circuit complexity because instruction fetches and data access do not contend for a single memory pathway.

The Modified Harvard architecture is very much like the Harvard architecture but provides a pathway between the instruction memory and the CPU that allows words  from the instruction memory to be treated as read-only data. This allows constant data, particularly text strings, to be accessed without first having to be copied into data memory, thus preserving more data memory for read/write variables. Special machine language instructions are provided to read data from the instruction memory. Most modern computers that are documented as Harvard Architecture are, in fact, Modified
Harvard Architecture.

In recent years the speed of the CPU has grown many times in comparison to the access speed of the main memory. Care needs to be taken to reduce the number of times main memory is accessed in order to maintain performance. If, for instance, every instruction run in the CPU requires an access to memory, the computer gains nothing for increased CPU speed — a problem referred to as being memory bound.

It is possible to make extremely fast memory but this is only practical for small amounts of memory for both cost and signal routing reasons. The solution is to provide a small amount of very fast memory known as a CPU cache which holds recently accessed data. As long as the memory that the CPU needs is in the cache, the performance hit is much smaller than it is when the cache has to turn around and get the data from the main memory. Cache tuning is an important aspect of computer  design.

Modern high performance CPU chip designs incorporate aspects of both Harvard and  von Neumann architecture. On-chip cache memory is divided into an instruction cache and a data cache. Harvard architecture is used as the CPU accesses the cache. In the case of a cache miss, however, the data is retrieved from the main memory, which is  not divided into separate instruction and data sections. Thus, while a von Neumann architecture is presented to the programmer, the hardware implementation gains the efficiencies of the Harvard architecture.

Harvard architectures are also frequently used in:
  • Specialized digital signal processors, DSPs, commonly used in audio or video processing products. For example, Blackfin processors by Analog Devices, Inc. use a Harvard architecture.
  •  Most general purpose small microcontrollers used in many electronics applications, such as the PIC by Microchip Technology, Inc., and AVR by Atmel Corp. These processors are characterized by having small amounts of program and data memory, and take advantage of the Harvard architecture and reduced instruction sets (RISC) to ensure that most instructions can be executed within only one machine cycle, which is not necessarily one clock cycle. The separate storage means the program and data memories can have different bit depths.

PICs have an 8-bit data word but (depending on specific range of PICs) a 12-, 14-, or 16-bit program word. This allows a single instruction to contain a full-size data constant. Other RISC architectures, for example the ARM, typically must use at least two instructions to load a full-size constant.

What is the difference between a von Neumann architecture and a Harvard architecture?

Harvard architecture has separate data and instruction busses, allowing transfers to be performed simultaneously on both busses. A von Neumann architecture has only one bus which is used for both data transfers and instruction fetches, and therefore data transfers and instruction fetches must be scheduled - they can not be performed at the same time.
It is possible to have two separate memory systems for a Harvard architecture. As long as data and instructions can be fed in at the same time, then it doesn't matter whether it comes from a cache or memory. But there are problems with this. Compilers generally embed data (literal pools) within the code, and it is often also necessary to be able to write to the instruction memory space, for example in the case of self modifying code, or, if an ARM debugger is used, to set software breakpoints in memory. If there are two completely separate, isolated memory systems, this is not possible. There must be some kind of bridge between the memory systems to allow this.
Using a simple, unified memory system together with a Harvard architecture is highly inefficient. Unless it is possible to feed data into both busses at the same time, it might be better to use a von Neumann architecture processor.
Use of caches
At higher clock speeds, caches are useful as the memory speed is proportionally slower. Harvard architectures tend to be targeted at higher performance systems, and so caches are nearly always used in such systems.
Von Neumann architectures usually have a single unified cache, which stores both instructions and data. The proportion of each in the cache is variable, which may be a good thing. It would in principle be possible to have separate instruction and data caches, storing data and instructions separately. This probably would not be very useful as it would only be possible to ever access one cache at a time.
Caches for Harvard architectures are very useful. Such a system would have separate caches for each bus. Trying to use a shared cache on a Harvard architecture would be very inefficient since then only one bus can be fed at a time. Having two caches means it is possible to feed both buses simultaneously....exactly what is necessary for a Harvard architecture.
This also allows to have a very simple unified memory system, using the same address space for both instructions and data. This gets around the problem of literal pools and self modifying code. What it does mean, however, is that when starting with empty caches, it is necessary to fetch instructions and data from the single memory system, at the same time. Obviously, two memory accesses are needed therefore before the core has all the data needed. This performance will be no better than a von Neumann architecture. However, as the caches fill up, it is much more likely that the instruction or data value has already been cached, and so only one of the two has to be fetched from memory. The other can be supplied directly from the cache with no additional delay. The best performance is achieved when both instructions and data are supplied by the caches, with no need to access external memory at all.
This is the most sensible compromise and the architecture used by ARMs Harvard processor cores. Two separate memory systems can perform better, but would be difficult to implement.




Post a Comment