ARM Bootloader: Disable MMU and Caches

According to some tutorials, we disable the MMU and the I/D-caches at the beginning of the bootloader. If I understand correctly, the aim is to use physical addresses directly in the program. Please correct me if I'm wrong. Thank you!

Secondly, we do this to disable MMU and Caches:

    mrc P15, 0, R0, C1, C0, 0
    bic R0, R0, #0x00002300 @ clear bits 13, 9:8
    bic R0, R0, #0x00000087 @ clear bits 7, 2:0
    orr R0, R0, #0x00000002 @ set bit 1 (A) Align
    orr R0, R0, #0x00001000 @ set bit 12 (I) I-Cache
    mcr P15, 0, R0, C1, C0, 0

The D-cache, the MMU and data address alignment fault checking are disabled by clearing bits 2:0, but why is one of those bits (mask 0x2, which is bit 1, the A bit for alignment checking) set again in the very next instruction? To make sure the manipulation is valid?

My last question: why is the D-cache disabled while the I-cache is enabled? To speed up instruction processing?

Answer 1:

My last question: why is the D-cache disabled while the I-cache is enabled? To speed up instruction processing?

The MMU has settings that determine which memory regions are cacheable. If the MMU is off but the data cache is on (where that is possible), you cannot safely talk to peripherals. A read of the UART status register, for example, goes through the cache just like any other data access, and whatever status was read stays in the cache for subsequent reads until that cache line is evicted. Only then do you get one more shot at the actual register.

Say you have some code that polls the UART status register, waiting for a character in the rx buffer. If the first read shows no character, that status goes into the cache and you remain in the loop forever: you never reach the status register again, you only get the cached copy. If there was a character, that status is cached too. You read the rx register and perhaps do something with it; when you come back, if the status has not been evicted from the data cache, you get the stale status, which says there is a character. Your rx buffer read may or may not be cached as well, so you may get the stale value from the cache, or whatever the peripheral returns when there is no new value, or you might get a new value. What you do not get in any of these situations is proper access to the peripheral.

When the MMU is on, you use it to mark the address space used by that peripheral as non-(data)-cacheable, and you do not have this problem. With the MMU off, you need the data cache off on ARM systems.
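The polling hazard can be reproduced on a host machine with a toy model. This is only a sketch: `struct toy_uart`, `RX_READY` and both read functions are invented for illustration and do not correspond to any real UART or cache controller.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model, not real hardware: one device register behind a one-entry cache. */
struct toy_uart {
    uint32_t status;        /* what the peripheral currently reports */
    bool     line_valid;    /* is a copy held in the "D-cache"?      */
    uint32_t cached_status; /* the copy held in the "D-cache"        */
};

#define RX_READY 0x1u /* hypothetical "character available" flag */

/* Read through the cache: the device is only consulted on a miss. */
static uint32_t read_status_cached(struct toy_uart *u)
{
    if (!u->line_valid) {
        u->cached_status = u->status;
        u->line_valid = true;
    }
    return u->cached_status;
}

/* Uncached read: always goes to the device. */
static uint32_t read_status_uncached(const struct toy_uart *u)
{
    return u->status;
}
```

If the first cached read happens before a character arrives, every later cached read keeps reporting "no character" even after the device sets `RX_READY`, while the uncached read sees the change at once. A polling loop built on `read_status_cached` would therefore never terminate in this model.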

Leaving the I-cache on is okay because instruction fetches only read instructions. For a bare-metal application that is fine, and it helps, for example, if you are using a flash with a potential for read disturb (SPI or I2C flashes). The problem is that this application is a bootloader, so you must take some extra care. Suppose some code at address 0x8000 has run at least once, and the bootloader itself lives at, say, address 0x10000000, which allows it to load a new program at 0x8000. That load uses data accesses, so it does not go through the instruction cache. The instruction cache may therefore still hold some or all of the code from the last time you were in the 0x8000 area, and when you branch to the bootloaded code at 0x8000 you get either the old program from the cache or a nasty mixture of old and new program, depending on which parts are cached. So if your bootloader allows the I-cache to be on, you need to invalidate the cache before branching to bootloaded code.
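The stale-code scenario can be sketched the same way on a host. This is a toy model under stated assumptions: `struct toy_system` and its three functions are hypothetical names, and the "I-cache" here simply remembers every word it has fetched. On a real core the invalidate step is a CP15 cache maintenance operation whose exact form depends on the core, so check the reference manual for your part.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_WORDS 4

/* Toy model, not real hardware: RAM plus an I-cache that caches every word. */
struct toy_system {
    uint32_t ram[TOY_WORDS];
    uint32_t icache[TOY_WORDS];
    bool     icache_valid[TOY_WORDS];
};

/* Data-side store: updates RAM only, as the bootloader's load does. */
static void data_write(struct toy_system *s, unsigned i, uint32_t insn)
{
    s->ram[i] = insn;
}

/* Instruction fetch: served from the I-cache once a word has been fetched. */
static uint32_t insn_fetch(struct toy_system *s, unsigned i)
{
    if (!s->icache_valid[i]) {
        s->icache[i] = s->ram[i];
        s->icache_valid[i] = true;
    }
    return s->icache[i];
}

/* Invalidate the whole I-cache so the next fetch goes back to RAM. */
static void icache_invalidate_all(struct toy_system *s)
{
    for (unsigned i = 0; i < TOY_WORDS; i++)
        s->icache_valid[i] = false;
}
```

In this model, fetching a word, overwriting it with `data_write`, and fetching again returns the old word. Only after `icache_invalidate_all` does the fetch return the newly loaded one, which is why the invalidate has to come before the branch.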

Lastly, if you or anyone using this bootloader wants to use JTAG, you have the same problem, only worse. The debugger writes the new program to RAM using data cycles that do not go through the I-cache, so when you tell it to run the new program you get one of three outcomes: 1) only the new program, 2) a mixture of the new program and old program fragments from the cache, or 3) the old program from the cache.

So the D-cache is bad without an MMU because of things that are not in RAM: peripherals and so on. The I-cache is a use-at-your-own-risk feature, and you can mitigate the risk except when JTAG is used for debugging.

If you have concerns about read disturb in your (external) flash, or have confirmed it, then I recommend this: turn on the I-cache, use a tight loop to copy your application to RAM, branch to the RAM copy and run there, turn off the I-cache (or use it at your own risk), and do not touch the flash again, certainly not with heavy read accesses to small areas. A tight UART polling loop, like the one you might have in a command-line parser, is a really good place to get hit by read disturb.
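A minimal sketch of the copy step, assuming word-aligned source and destination. The name `copy_image` and its parameters are hypothetical, and the real flash and RAM addresses would come from your linker script or board documentation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy an image word by word from flash to RAM.
 * Each flash word is read exactly once by this loop. */
static void copy_image(volatile uint32_t *ram_dst,
                       const volatile uint32_t *flash_src,
                       size_t words)
{
    for (size_t i = 0; i < words; i++)
        ram_dst[i] = flash_src[i];
}
```

With the I-cache on, the loop's own instructions can be served from the cache after the first iteration, so the repeated accesses to one small flash area that cause read disturb are avoided. The data reads walk through the image once and do not revisit it.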

Answer 2:

You did not specify which ARM you are working on. Capabilities vary from one ARM to another (there is a huge gap between an ARM9 and an ARM Cortex-A15).

In the given code, bit 1 (mask 0x2, the A bit) is cleared and then set, but that does not matter, because those changes are made in R0. The ARM's behavior does not change until the write to the CP15 register (done by the instruction mcr P15, 0, R0, C1, C0, 0).
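The arithmetic can be checked on a host by applying the same masks to an ordinary variable, which is all the bic/orr instructions do to R0. This is a sketch: `next_control_value` and the `SCTLR_*` macro names are made up for illustration, and the bit positions are the ones named in the question (M = bit 0, A = bit 1, C = bit 2, I = bit 12).

```c
#include <stdint.h>

/* Bit masks for the CP15 c1 control register fields named in the question. */
#define SCTLR_M (1u << 0)   /* MMU enable */
#define SCTLR_A (1u << 1)   /* alignment fault checking */
#define SCTLR_C (1u << 2)   /* D-cache enable */
#define SCTLR_I (1u << 12)  /* I-cache enable */

/* Hypothetical helper: mirrors the bic/bic/orr/orr sequence on a plain
 * variable, the way the real code works on R0 between the mrc and the mcr. */
static uint32_t next_control_value(uint32_t r0)
{
    r0 &= ~0x00002300u;  /* bic: clear bits 13, 9:8 */
    r0 &= ~0x00000087u;  /* bic: clear bits 7, 2:0  */
    r0 |=  0x00000002u;  /* orr: set bit 1 (A)      */
    r0 |=  0x00001000u;  /* orr: set bit 12 (I)     */
    return r0;
}
```

Whatever value goes in, the result has M and C clear and A and I set. For example, an input of 0 yields 0x00001002. Only that final value is written back, so the intermediate state where A was briefly clear never reaches the hardware.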

As for enabling the D-cache and I-cache, it is only a matter of choice; there is no requirement. On the products I work on, the bootloader enables the L1 I-cache, the D-cache, the L2 cache, and the MMU (and it disables all of that before jumping to Linux). Be sure to follow the ARM documentation on cache invalidation and memory barriers (for your actual ARM core) if you use the caches and MMU in your bootloader.