What is the difference between the kernel space and the user space? Do kernel space, kernel threads, kernel processes and kernel stack mean the same thing? Also, why do we need this differentiation?
相关问题
- Kernel oops Oops: 80000005 on arm embedded system
- Why should we check WIFEXITED after wait in order
- Linux kernel behaviour on heap overrun or stack ov
- Where is the standard kernel libraries to let kern
- What is the difference between /dev/mem, /dev/kmem
相关文章
- How do I get to see DbgPrint output from my kernel
- Enable dynamic debug for multiple files at boot
- What happens to dynamic allocated memory when call
- Is it possible to run 16 bit code in an operating
- How to arrange a Makefile to compile a kernel modu
- Difference in ABI between x86_64 Linux functions a
- What does “exposition only” mean? Why use it?
- Why are there no debug symbols in my vmlinux when
CPU rings are the most clear distinction
In x86 protected mode, the CPU is always in one of 4 rings. The Linux kernel only uses 0 and 3:
This is the most hard and fast definition of kernel vs userland.
Why Linux does not use rings 1 and 2: CPU Privilege Rings: Why rings 1 and 2 aren't used?
How is the current ring determined?
The current ring is selected by a combination of:
global descriptor table: a in-memory table of GDT entries, and each entry has a field
Privl
which encodes the ring.The LGDT instruction sets the address to the current descriptor table.
See also: http://wiki.osdev.org/Global_Descriptor_Table
the segment registers CS, DS, etc., which point to the index of an entry in the GDT.
For example,
CS = 0
means the first entry of the GDT is currently active for the executing code.What can each ring do?
The CPU chip is physically built so that:
ring 0 can do anything
ring 3 cannot run several instructions and write to several registers, most notably:
cannot change its own ring! Otherwise, it could set itself to ring 0 and rings would be useless.
In other words, cannot modify the current segment descriptor, which determines the current ring.
cannot modify the page tables: How does x86 paging work?
In other words, cannot modify the CR3 register, and paging itself prevents modification of the page tables.
This prevents one process from seeing the memory of other processes for security / ease of programming reasons.
cannot register interrupt handlers. Those are configured by writing to memory locations, which is also prevented by paging.
Handlers run in ring 0, and would break the security model.
In other words, cannot use the LGDT and LIDT instructions.
cannot do IO instructions like
in
andout
, and thus have arbitrary hardware accesses.Otherwise, for example, file permissions would be useless if any program could directly read from disk.
More precisely thanks to Michael Petch: it is actually possible for the OS to allow IO instructions on ring 3, this is actually controlled by the Task state segment.
What is not possible is for ring 3 to give itself permission to do so if it didn't have it in the first place.
Linux always disallows it. See also: Why doesn't Linux use the hardware context switch via the TSS?
How do programs and operating systems transition between rings?
when the CPU is turned on, it starts running the initial program in ring 0 (well kind of, but it is a good approximation). You can think this initial program as being the kernel (but it is normally a bootloader that then calls the kernel still in ring 0).
when a userland process wants the kernel to do something for it like write to a file, it uses an instruction that generates an interrupt such as
int 0x80
orsyscall
to signal the kernel. x86-64 Linux syscall hello world example:compile and run:
GitHub upstream.
When this happens, the CPU calls an interrupt callback handler which the kernel registered at boot time. Here is a concrete baremetal example that registers a handler and uses it.
This handler runs in ring 0, which decides if the kernel will allow this action, do the action, and restart the userland program in ring 3. x86_64
when the
exec
system call is used (or when the kernel will start/init
), the kernel prepares the registers and memory of the new userland process, then it jumps to the entry point and switches the CPU to ring 3If the program tries to do something naughty like write to a forbidden register or memory address (because of paging), the CPU also calls some kernel callback handler in ring 0.
But since the userland was naughty, the kernel might kill the process this time, or give it a warning with a signal.
When the kernel boots, it setups a hardware clock with some fixed frequency, which generates interrupts periodically.
This hardware clock generates interrupts that run ring 0, and allow it to schedule which userland processes to wake up.
This way, scheduling can happen even if the processes are not making any system calls.
What is the point of having multiple rings?
There are two major advantages of separating kernel and userland:
How to play around with it?
I've created a bare metal setup that should be a good way to manipulate rings directly: https://github.com/cirosantilli/x86-bare-metal-examples
I didn't have the patience to make a userland example unfortunately, but I did go as far as paging setup, so userland should be feasible. I'd love to see a pull request.
Alternatively, Linux kernel modules run in ring 0, so you can use them to try out privileged operations, e.g. read the control registers: How to access the control registers cr0,cr2,cr3 from a program? Getting segmentation fault
Here is a convenient QEMU + Buildroot setup to try it out without killing your host.
The downside of kernel modules is that other kthreads are running and could interfere with your experiments. But in theory you can take over all interrupt handlers with your kernel module and own the system, that would be an interesting project actually.
Negative rings
While negative rings are not actually referenced in the Intel manual, there are actually CPU modes which have further capabilities than ring 0 itself, and so are a good fit for the "negative ring" name.
One example is the hypervisor mode used in virtualization.
For further details see: https://security.stackexchange.com/questions/129098/what-is-protection-ring-1
ARM
In ARM, the rings are called Exception Levels instead, but the main ideas remain the same.
There exist 4 exception levels in ARMv8, commonly used as:
EL0: userland
EL1: kernel ("supervisor" in ARM terminology).
Entered with the
svc
instruction (SuperVisor Call), previously known asswi
before unified assembly, which is the instruction used to make Linux system calls. Hello world ARMv8 example:GitHub upstream.
Test it out with QEMU on Ubuntu 16.04:
Here is a concrete baremetal example that registers an SVC handler and does an SVC call.
EL2: hypervisors, for example Xen.
Entered with the
hvc
instruction (HyperVisor Call).A hypervisor is to an OS, what an OS is to userland.
For example, Xen allows you to run multiple OSes such as Linux or Windows on the same system at the same time, and it isolates the OSes from one another for security and ease of debug, just like Linux does for userland programs.
Hypervisors are a key part of today's cloud infrastructure: they allow multiple servers to run on a single hardware, keeping hardware usage always close to 100% and saving a lot of money.
AWS for example used Xen until 2017 when its move to KVM made the news.
EL3: yet another level. TODO example.
Entered with the
smc
instruction (Secure Mode Call)The ARMv8 Architecture Reference Model DDI 0487C.a - Chapter D1 - The AArch64 System Level Programmer's Model - Figure D1-1 illustrates this beautifully:
Note how ARM, maybe due to the benefit of hindsight, has a better naming convention for the privilege levels than x86, without the need for negative levels: 0 being the lower and 3 highest. Higher levels tend to be created more often than lower ones.
The current EL can be queried with the
MRS
instruction: what is the current execution mode/exception level, etc?ARM does not require all exception levels to be present to allow for implementations that don't need the feature to save chip area. ARMv8 "Exception levels" says:
QEMU for example defaults to EL1, but EL2 and EL3 can be enabled with command line options: qemu-system-aarch64 entering el1 when emulating a53 power up
Code snippets tested on Ubuntu 18.10.
Kernel space and user space is the separation of the privileged operating system functions and the restricted user applications. The separation is necessary to prevent user applications from ransacking your computer. It would be a bad thing if any old user program could start writing random data to your hard drive or read memory from another user program's memory space.
User space programs cannot access system resources directly so access is handled on the program's behalf by the operating system kernel. The user space programs typically make such requests of the operating system through system calls.
Kernel threads, processes, stack do not mean the same thing. They are analogous constructs for kernel space as their counterparts in user space.
Kernel Space and User Space are logical spaces.
Most of the modern processors are designed to run in different privileged mode. x86 machines can run in 4 different privileged modes.
And a particular machine instruction can be executed when in/above particular privileged mode.
Because of this design you are giving a system protection or sand-boxing the execution environment.
Kernel is a piece of code, which manages your hardware and provide system abstraction. So it needs to have access for all the machine instruction. And it is most trusted piece of software. So i should be executed with the highest privilege. And Ring level 0 is the most privileged mode. So Ring Level 0 is also called as Kernel Mode.
User Application are piece of software which comes from any third party vendor, and you can't completely trust them. Someone with malicious intent can write a code to crash your system if he had complete access to all the machine instruction. So application should be provided with access to limited set of instructions. And Ring Level 3 is the least privileged mode. So all your application run in that mode. Hence that Ring Level 3 is also called User Mode.
Note: I am not getting Ring Levels 1 and 2. They are basically modes with intermediate privilege. So may be device driver code are executed with this privilege. AFAIK, linux uses only Ring Level 0 and 3 for kernel code execution and user application respectively.
So any operation happening in kernel mode can be considered as kernel space. And any operation happening in user mode can be considered as user space.
In Linux there are two space 1st is user space and another one is kernal space. user space consist of only user application which u want to run. as the kernal service there is process management, file management, signal handling, memory management, thread management, and so many services are present there. if u run the application from the user space that appliction interact with only kernal service. and that service is interact with device driver which is present between hardware and kernal. the main benefit of kernal space and user space seperation is we can acchive a security by the virus.bcaz of all user application present in user space, and service is present in kernal space. thats why linux doesn,t affect from the virus.