I have been reading Wikipedia's article on the K programming language, and this is what I saw:
The small size of the interpreter and compact syntax of the language makes it possible for K applications to fit entirely within the level 1 cache of the processor.
I am intrigued. How is it possible to have the whole program in L1 cache? Say the CPU has a 256 KB L1 cache. Say my program is much smaller than that and needs very little memory (just for the call stack and such). Say it doesn't need any libraries (although a program that runs under an OS would need to load kernel32.dll or whatever). And doesn't the OS automatically allocate some minimal memory for any program (for executable code, stack, and heap)?
Thank you.
I think what they're saying is not that the entire program fits in L1 cache, but that all the code that runs most of the time fits in the L1 cache.
Yes, the OS allocates lots of other structures, but those are hit rarely enough to not matter.
Of course, this is all speculation -- I know nothing about the 'K' language.
I believe they are referring to the advantage that the main executing code will fit in the L1 cache, regardless of how much memory is allocated to the program. Once the K application is loaded, memory that is allocated but never touched does not affect performance (i.e. it does not take away the benefit of the running code being entirely in L1 cache).
You are confusing the entire program code with the most frequently executed code.
For interpreted languages, the interpreter core is certainly among the most frequently executed code. Having the most frequently executed code in cache speeds up execution in the same way as having the most frequently accessed data in cache does.
The key part is "most frequently" - it's not necessary to have all the code/data cached to see a significant acceleration.
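To make the "most frequently" point concrete, here is a rough sketch that simulates a tiny LRU cache over a made-up access trace. Everything in it is an assumption for illustration: the trace (an 8-line hot loop plus one rarely used line per iteration), the cache sizes, and the LRU policy. Real L1 caches are set-associative and behave differently in detail.

```python
from collections import OrderedDict

def hit_rate(accesses, capacity):
    """Fraction of accesses served by an LRU cache holding `capacity` lines."""
    cache = OrderedDict()
    hits = 0
    total = 0
    for line in accesses:
        total += 1
        if line in cache:
            hits += 1
            cache.move_to_end(line)        # mark as most recently used
        else:
            cache[line] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used line
    return hits / total if total else 0.0

def make_trace(hot_lines=8, cold_lines=1000, rounds=1000):
    """Hypothetical trace: each round runs the hot loop once, then touches one cold line."""
    trace = []
    for r in range(rounds):
        trace.extend(range(hot_lines))              # the hot loop body
        trace.append(hot_lines + r % cold_lines)    # one rarely used line
    return trace

trace = make_trace()
fits = hit_rate(trace, capacity=16)      # hot loop fits, cold code does not
too_small = hit_rate(trace, capacity=4)  # hot loop itself does not fit
print(fits, too_small)
```

With these invented numbers, the 16-line cache holds only a small fraction of the 1008 distinct lines yet serves roughly 89% of accesses, while the 4-line cache, which cannot hold the 8-line loop, serves none. What matters is whether the hot code fits, not whether everything does.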
The interpreter runs as a normal program managed by the OS. The interpreted program runs within the memory space of the interpreter, in the data segment. Many K programs may easily fit into the L1 cache completely, even though the entire interpreter may not. The main interpreter loop will probably fit, though.
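As a minimal sketch of that split between interpreter and interpreted program, here is a toy stack-machine interpreter. The opcodes (`push`, `add`, `mul`) are invented for this example and have nothing to do with K's actual implementation. The point is only that the program being run is plain data, and the small dispatch loop is the code that executes over and over.

```python
def run(program):
    """Run a toy program given as a list of (opcode, argument) tuples."""
    stack = []
    pc = 0
    while pc < len(program):   # dispatch loop: the frequently executed code
        op, arg = program[pc]  # the interpreted program is read as data
        pc += 1
        if op == "push":
            stack.append(arg)
        elif op == "add":
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b = stack.pop()
            a = stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack

# (2 + 3) * 4 expressed in the hypothetical instruction set
program = [("push", 2), ("push", 3), ("add", None), ("push", 4), ("mul", None)]
print(run(program))
```

However long `program` gets, the instructions the CPU actually executes are the handful inside the `while` loop, so a compact loop plus a compact program representation is what has a chance of staying in cache.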