Registers are the fastest memories in a computer. So would it be possible to build a computer with just registers, not even caches? I'm even thinking of replacing the magnetic disks with registers, although registers are naturally volatile. Do we have some kind of nonvolatile register for that use? It would be so fast! I'm just wondering whether that could happen or not.
Most of these answers address whether it would be practical. David Johnstone's answer also mentions the fact that a register name has to be encoded in every instruction that touches it. Further to this, in most modern instruction sets an instruction always has its operand registers coded into it. E.g. there's the

    mov %eax, %ebx

instruction, and there's the

    mov %eax, %ecx

instruction. It may so happen that their binary representations break down into fields like opcode | src reg | dest reg, differing only in that dest reg is equal to 3 rather than 2 -- but it also may not! (I haven't checked how these particular instructions are encoded on the 386, but I recall that instruction set has examples of instructions that break down neatly into fields like this, and examples that don't; there's a sketch of the idea at the end of this answer.)

The problem is that most interesting programs are going to want to operate on locations of information determined at runtime. E.g. in this iteration of the loop we want to look at byte 37; in the next iteration we will be interested in byte 38, and so on.
I won't prove it, but I suspect that in order to get anything approaching Turing completeness, your programs would need either:

- instructions that can read and write locations determined at runtime (i.e. indirect addressing), or
- self-modifying code.
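As an aside, here is a minimal sketch of the encoding point above (Python; the 0x89 opcode and the ModRM byte layout are the standard x86 encoding of MOV between 32-bit registers, but treat this as an illustration rather than an assembler):

    # Sketch: the two mov instructions above really do differ only in the
    # register field of one byte. MOV r/m32, r32 is opcode 0x89 followed by
    # a ModRM byte packing (mod, reg, rm) as 2 + 3 + 3 bits.
    def modrm(mod, reg, rm):
        return (mod << 6) | (reg << 3) | rm

    EAX, ECX, EBX = 0, 1, 3          # x86 register numbers
    print(bytes([0x89, modrm(0b11, EAX, EBX)]).hex())  # mov %eax, %ebx -> 89c3
    print(bytes([0x89, modrm(0b11, EAX, ECX)]).hex())  # mov %eax, %ecx -> 89c1

Other instructions in the same ISA don't decompose this cleanly, which is the "but it also may not" above.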
At school we had a theoretical computer with 100 registers (plus an accumulator) and 10 instructions, each of which was a three-digit decimal number. The first digit indicated the operation (load, save, arithmetic, jump, conditional jump, halt) and the last two the register to operate on. Many sample programs could be written for this, like the factorial function. But it soon became apparent that a static program could only operate on a fixed set of data. If you wanted to write a loop to sum the values in a list, you would need a LOAD instruction that pointed to a different input register on each iteration. This meant you would arithmetically calculate the new code for the load instruction each time, and patch the code just prior to running that instruction.
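For fun, here's a runnable Python sketch of a machine in that spirit, with an invented opcode assignment (the original machine's exact encoding isn't specified above): a program that sums a five-element list by patching its own ADD instruction each time around the loop.

    # Toy machine: 100 cells holding three-digit decimal numbers, plus an
    # accumulator. First digit = operation, last two digits = cell address.
    # Opcodes (invented for this sketch): 0 HALT, 1 LOAD, 2 STORE, 3 ADD,
    # 4 SUB, 5 JMP, 6 JZ (jump if the accumulator is zero).
    def run(mem):
        acc, pc = 0, 0
        while True:
            op, arg = divmod(mem[pc], 100)
            pc += 1
            if op == 0: return                   # HALT
            elif op == 1: acc = mem[arg]         # LOAD
            elif op == 2: mem[arg] = acc         # STORE
            elif op == 3: acc += mem[arg]        # ADD
            elif op == 4: acc -= mem[arg]        # SUB
            elif op == 5: pc = arg               # JMP
            elif op == 6 and acc == 0: pc = arg  # JZ

    mem = [0] * 100
    mem[:12] = [182, 390, 282,   # load sum, add list item (patched!), store sum
                101, 381, 201,   # load the ADD instruction, add 1, store it back
                180, 481, 280,   # decrement the loop counter
                611, 500, 0]     # if counter == 0 jump to HALT, else loop
    mem[80], mem[81], mem[82] = 5, 1, 0    # counter, constant 1, running sum
    mem[90:95] = [10, 20, 30, 40, 50]      # the list to sum

    run(mem)
    print(mem[82])   # 150 -- and mem[1] is now 395, the patched instruction

The instruction at cell 1 starts out as 390 (ADD cell 90) and is rewritten to 391, 392, ... on successive iterations -- exactly the patch-before-use trick described above.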
Modern GPUs have about 5 MB of registers and very small caches (compared to CPUs). So yes, it is possible to have a processor with lots of registers.

But you still need a memory hierarchy (registers -> scratchpad/caches -> device memory -> CPU memory). Note also that GPUs are completely different beasts, in the sense that they are built with massive parallelism as a goal from day one, and that GPUs are not general-purpose processors but coprocessors.

Each GPU thread eats up some registers -- the whole GPU program is register-allocated -- resulting in thousands of threads that can execute/pause/resume in parallel. Threads are used to hide memory latency on GPUs, whereas on CPUs huge caches are used for that purpose. Think of it as Hyper-Threading pushed to the extreme.
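A back-of-envelope sketch of that trade-off (the register-file size below is in the ballpark of a modern GPU streaming multiprocessor, but it's an assumed figure, not the spec of any particular chip):

    # How a GPU's register file bounds the number of resident threads:
    # the fewer registers each thread needs, the more threads can be
    # kept in flight to hide memory latency.
    REGS_PER_SM = 65536                      # 32-bit registers per SM (~256 KB)
    for regs_per_thread in (32, 64, 128):
        resident = REGS_PER_SM // regs_per_thread
        print(f"{regs_per_thread} regs/thread -> {resident} resident threads per SM")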
Hot off the rouncer hardware-theory plate ->

If you manage to link every permutation of the address bits to an individual word, then you could have a RAM-like register system. Imagine using NAND gates to form the address groups (in other words, link the complement of the address to the flip-flop). One NOT, and you've done the addressing with wires alone plus the little NOT switch, which could be a solenoid-type coil that inverts the value. Then every register ORs into the same output -- the content pins -- and only the register whose address was passed gets power through to the output content pins.

Simples.
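What's being described is essentially a one-hot address decoder feeding a wired-OR output bus. Here's a small Python model of that structure (the decoder-plus-OR shape is textbook; nothing in it is specific to the proposal above):

    # Each word drives a shared output bus through an AND of the address
    # lines (inverted where the word's index has a 0 bit); the per-word
    # outputs are then OR'ed together onto the "content pins".
    def read(words, addr, addr_bits=4):
        out = 0
        for i, word in enumerate(words):
            # select is 1 only when every address bit matches this word's index
            select = all(((addr >> b) & 1) == ((i >> b) & 1)
                         for b in range(addr_bits))
            out |= word if select else 0     # wired-OR onto the output
        return out

    regs = list(range(100, 116))             # sixteen one-word "registers"
    print(read(regs, 0b0101))                # -> 105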
The reason you get so little register memory is that it's incredibly expensive. This is why we have the memory hierarchy.
Registers are fast because most of them are connected directly to most of the functional units. While a program is loading one register, another register is feeding the ALU and yet another is receiving a result from some other functional unit.

Registers are built from logic elements such as flip-flops, so most of the values in most of the registers are available at the same time, all the time. This is different from a memory, where only a selected address is available at any one time and only a very limited number of read ports is available -- typically just one read circuit.

However, this kind of implementation and interconnection is what uses up the die space on a microprocessor. When that is used up, you start adding memory for additional storage.
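The read-port point deserves a tiny illustration (a toy cycle count, assuming each port can serve one read per cycle):

    # A 3-read-port register file delivers three operands in one cycle;
    # a single-port memory serializes the same three reads.
    def cycles(num_reads, read_ports):
        return -(-num_reads // read_ports)   # ceiling division

    print(cycles(3, read_ports=3))           # register file: 1 cycle
    print(cycles(3, read_ports=1))           # single-port RAM: 3 cycles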
There have been architectures with extra banks of registers (SPARC's register windows, for example).
The problem with that is that registers live inside the CPU. Being inside the CPU, and being small, is precisely why they have minimal latency. If you scale that up -- say you build one big processor with enough flip-flops to hold everything in registers -- then the heat dissipation, the energy consumption, the cost, and so on become enormous. And as the area increases, the latency increases too. So basically there isn't much to gain by doing this; it's actually worse.
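One crude way to see the "bigger means slower" point: a common back-of-envelope model has wire delay growing with the side length of a square storage array, i.e. roughly with the square root of its capacity. The constant below is made up; only the trend matters:

    import math

    # Relative access latency under a sqrt-of-capacity wire-delay model.
    for kib in (1, 32, 1024, 32768):
        bits = kib * 1024 * 8
        print(f"{kib:>6} KiB -> relative latency ~ {math.sqrt(bits):8.0f}")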