I came across a problem about adding a load byte (lb) instruction to a single-cycle datapath without changing the data memory. The solution given was the following:
(Image: http://img214.imageshack.us/img214/7107/99897101.jpg)
This is actually quite a realistic question; most memory systems are entirely word-based, and individual bytes are typically only dealt with inside the processor. When you see a “bus error” on many computers, this often means that the processor tried to access a memory address that was not properly word-aligned, and the memory system raised an exception.
Anyway, because byte addresses might not be a multiple of 4, we cannot pass them to memory directly. However, we can still get at any byte, because every byte can be found within some word, and all word addresses are multiples of 4. So the first thing we do is to make sure we get the right word. If we take the high 30 bits of the address (i.e., ALUresult[31-2]) and combine them with two 0 bits at the low end (this is what the “left shift 2” unit is really doing), we have the byte address of the word that contains the desired byte. This is just the byte’s own address, rounded down to a multiple of 4. This change means that lw will now also round addresses down to multiples of 4, but that’s OK since non-aligned addresses wouldn’t work for lw anyway with this memory unit.
OK, now we get the data word back from memory. How do we get the byte we want out of it? Well, note that the byte’s byte-offset within the word is just given by the low-order 2 bits of the byte’s address. So, we simply use those 2 bits to select the appropriate byte out of the word using a mux. Note the use of big-endian byte numbering, as is appropriate for MIPS.
Next, we have to zero-extend the byte to 32 bits (i.e., just combine it with 24 zeros at its high end), because the problem specifies to do so. Actually, this was a slight mistake in the question: in reality, the lbu instruction zero-extends the byte, but lb sign-extends it. Oh, well.
Finally, we have to extend the MemtoReg-controlled mux to accept one new input: the zero-extended byte for the lb case. The MemtoReg control signal must be widened to 2 bits. The original 0 and 1 cases change to 00 and 01, respectively, and we add a new case 10 which is only used in the case of lb.
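
To make the steps in the explanation concrete, here is a minimal Python sketch of the same data flow. It is only an illustration under stated assumptions, not the actual circuit: the function names (`word_address`, `select_byte`, `load_byte`, `writeback`) and the dictionary used as a word-addressed memory are hypothetical, and the mapping of MemtoReg 00 to the ALU result and 01 to the memory word is assumed from the usual textbook convention rather than stated in the solution.

```python
# Hypothetical model of the modified datapath. Memory only understands
# word-aligned byte addresses, so it is a dict keyed by multiples of 4.

def word_address(alu_result: int) -> int:
    """Take ALUresult[31:2] and append two 0 bits at the low end.

    The shift left by 2 is applied to the 30-bit value, not to the full
    address, so the net effect is rounding down to a multiple of 4.
    """
    high_30_bits = (alu_result & 0xFFFFFFFF) >> 2
    return high_30_bits << 2

def select_byte(word: int, alu_result: int) -> int:
    """4-to-1 mux controlled by ALUresult[1:0], big-endian numbering.

    Offset 0 picks the most significant byte, offset 3 the least.
    The result is below 256, so it is already zero-extended to 32 bits.
    """
    offset = alu_result & 0b11
    return (word >> (8 * (3 - offset))) & 0xFF

def load_byte(memory: dict, alu_result: int) -> int:
    """Fetch the containing word, then pick the requested byte out of it."""
    word = memory[word_address(alu_result)]
    return select_byte(word, alu_result)

def writeback(mem_to_reg: int, alu_result: int, mem_word: int, byte_ext: int) -> int:
    """MemtoReg mux widened to 2 bits (00/01 assignment is an assumption)."""
    inputs = {0b00: alu_result, 0b01: mem_word, 0b10: byte_ext}
    return inputs[mem_to_reg]

# Example: one word at address 0x1004 holding the bytes AA BB CC DD.
memory = {0x1004: 0xAABBCCDD}
for address in range(0x1004, 0x1008):
    print(hex(address), "->", hex(load_byte(memory, address)))
# 0x1004 -> 0xaa
# 0x1005 -> 0xbb
# 0x1006 -> 0xcc
# 0x1007 -> 0xdd
```

All four byte addresses map to the same word address 0x1004; only the low 2 bits differ, and those drive the byte-select mux.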
I still don't quite understand how this works, even after reading the explanation. In particular, how does left-shifting the ALU result by 2 give the byte address? And if I wanted to load a halfword, would I shift left by 1 to get the address of the halfword? Also, what would be a better way to implement load byte and load halfword if we were allowed to modify the data memory? (The problem above does not allow modifying the data memory.)