
Linux IOMMU page tables

Posted 2019-03-30 11:42

Question:

I've been reading about IOMMU support in Linux and have some questions regarding page tables in IOMMU:

  1. Does the IOMMU use the CPU MMU page tables for storing the VA->PA mapping?
  2. If not, i.e. the virtual addresses are different, are the mappings created per device or per IOMMU unit?

I haven't looked at any driver code yet so it would be great if anyone can point me to some sample driver code.

Thanks in advance.

Answer 1:

Does the IOMMU use the CPU MMU page tables for storing the VA->PA mapping?

No. An OS runs many processes, and each process has its own VA->PA mapping (each runs in a separate virtual address space), so there is no single CPU page table for the IOMMU to reuse.

Physical memory is controlled by the memory controller, and two kinds of agents want to access it: the CPU and the external bus controller. The CPU has its own address translation, and the bus controller has its own.

If not, i.e. the virtual addresses are different, are the mappings created per device or per IOMMU unit?

Mappings are created according to the capabilities of the IOMMU. A simple IOMMU may have one global mapping for the root controller of the device bus (the PCI Express root complex). A complex IOMMU such as Intel's VT-d may have several mappings or nested translations, selected by per-port rules. (Two devices behind the same bridge will typically share one translation.)

https://www.kernel.org/doc/Documentation/Intel-IOMMU.txt

The Intel IOMMU driver allocates a virtual address per domain. Each PCIE device has its own domain (hence protection). Devices under p2p bridges share the virtual address with all devices under the p2p bridge due to transaction id aliasing for p2p bridges.
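
The per-domain rule quoted above can be illustrated with a toy model. This is only a sketch: `Domain`, `ToyIommu`, `attach` and the device addresses are made-up names for illustration, not kernel data structures, and the aliasing behind a bridge is modelled simply by keying the domain on the bridge's ID instead of the device's.

```python
# Toy model only: all names here are hypothetical, not the kernel's structures.

class Domain:
    """One I/O virtual address space with its own page table."""
    def __init__(self, name):
        self.name = name
        self.page_table = {}  # I/O virtual page number -> physical page number


class ToyIommu:
    def __init__(self):
        self.alias = {}    # device -> requester ID the IOMMU actually sees
        self.domains = {}  # requester ID -> Domain

    def attach(self, device, bridge=None):
        # Assumption: behind a bridge, transactions carry the bridge's ID,
        # so every device below it ends up in the same domain.
        rid = bridge if bridge is not None else device
        self.alias[device] = rid
        return self.domains.setdefault(rid, Domain("domain-" + rid))

    def domain_of(self, device):
        return self.domains[self.alias[device]]


iommu = ToyIommu()
nic = iommu.attach("01:00.0")                      # own domain
gpu = iommu.attach("02:00.0")                      # own domain
dev_a = iommu.attach("04:00.0", bridge="03:00.0")  # behind a bridge
dev_b = iommu.attach("04:01.0", bridge="03:00.0")  # behind the same bridge

nic.page_table[0x10] = 0x8000  # a mapping created for the NIC only

print(nic is gpu)              # separate domains, hence protection
print(dev_a is dev_b)          # shared domain behind the bridge
print(0x10 in gpu.page_table)  # the NIC's mapping is not visible to the GPU
```

In this model a mapping installed for one device is invisible to a device in another domain, while the two devices behind the bridge necessarily see the same mappings.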

https://www.kernel.org/doc/Documentation/DMA-API-HOWTO.txt

In some systems, bus addresses are identical to CPU physical addresses, but in general they are not. IOMMUs and host bridges can produce arbitrary mappings between physical and bus addresses.

(See also the diagram after "Here's a picture and some examples:" in https://www.kernel.org/doc/Documentation/DMA-API-HOWTO.txt)

virtual address (X)... The virtual memory system maps X to a physical address (Y) in system RAM. The driver can use virtual address X to access the buffer, but the device itself cannot because DMA doesn't go through the CPU virtual memory system.

In some simple systems, the device can do DMA directly to physical address Y. But in many others, there is IOMMU hardware that translates DMA addresses to physical addresses, e.g., it translates Z to Y.
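
The relationship between X, Y and Z can be sketched with two independent lookup tables. This is a toy model under stated assumptions: the page size, the page numbers and the `translate` helper are invented for illustration, and real MMUs and IOMMUs walk multi-level tables in hardware rather than a flat dictionary.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


def translate(page_table, addr):
    """Translate addr through a flat page-number -> page-number table."""
    page, offset = divmod(addr, PAGE_SIZE)
    if page not in page_table:
        raise LookupError("fault: no mapping for address " + hex(addr))
    return page_table[page] * PAGE_SIZE + offset


# Hypothetical numbers: both tables point at the same physical page.
cpu_mmu = {0x7F000: 0x12345}  # virtual page of X -> physical page of Y
iommu = {0x00010: 0x12345}    # bus (DMA) page of Z -> physical page of Y

X = 0x7F000 * PAGE_SIZE + 0x80  # what the driver dereferences
Z = 0x00010 * PAGE_SIZE + 0x80  # what the device is programmed with

Y = translate(cpu_mmu, X)       # CPU path: X -> Y
assert translate(iommu, Z) == Y  # device path: Z -> Y, same physical byte

# Handing the CPU virtual address X to the device does not work,
# because the IOMMU has no entry for it.
try:
    translate(iommu, X)
    device_can_use_x = True
except LookupError:
    device_can_use_x = False
print(device_can_use_x)
```

The two tables are populated independently, which is why a driver has to ask the DMA API for Z instead of passing X or Y to the device.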

See also https://events.linuxfoundation.org/sites/events/files/slides/20140429-dma.pdf (2014) and http://www.linuxplumbersconf.org/2014/wp-content/uploads/2014/10/LPC2014_IOMMU.txt

See also the paper at http://developer.amd.com/wordpress/media/2012/10/IOMMU-ben-yehuda.pdf (2012) for the history of device memory remapping and IOMMU usage for virtualization.



Answer 2:

While osgx's answer is accurate for the historical use of IOMMUs in the kernel, shared virtual memory (SVM) use cases, especially with PCIe PASID, will require sharing or shadowing page tables between the IOMMU and the CPU, so that a pointer/VA (say, to a pinned buffer) can be passed directly from a user-space driver to the device without any dma_map-related kernel services. Of course, this will require new APIs that let user space request SVM/shared page tables.
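
The shared-page-table idea can be sketched as a toy model in which the IOMMU, given a PASID, looks up the very same table the CPU side uses. Everything here is an assumption made for illustration (`Process`, `SvmIommu`, `bind`, the PASID value and the page numbers); it is not a real kernel or user-space API, and it ignores page faults, pinning and invalidation.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


class Process:
    def __init__(self, pasid):
        self.pasid = pasid
        self.page_table = {}  # virtual page -> physical page (CPU side)


class SvmIommu:
    """Toy IOMMU that selects a process's own page table by PASID."""
    def __init__(self):
        self.bound = {}  # PASID -> page table object shared with the CPU side

    def bind(self, process):
        # Shared, not copied: later CPU-side changes are seen by the device.
        self.bound[process.pasid] = process.page_table

    def translate(self, pasid, vaddr):
        page, offset = divmod(vaddr, PAGE_SIZE)
        return self.bound[pasid][page] * PAGE_SIZE + offset


proc = Process(pasid=7)
iommu = SvmIommu()
iommu.bind(proc)

# A mapping created after the bind is still visible to the device.
proc.page_table[0x7F000] = 0x12345
ptr = 0x7F000 * PAGE_SIZE + 0x10  # a plain user-space pointer

device_view = iommu.translate(proc.pasid, ptr)
cpu_view = proc.page_table[ptr // PAGE_SIZE] * PAGE_SIZE + ptr % PAGE_SIZE
assert device_view == cpu_view  # same pointer, same physical address
```

The point of the model is that no separate DMA address is created: the device is handed the pointer itself, and the PASID tells the IOMMU whose address space to resolve it in.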

See https://archive.fosdem.org/2016/schedule/event/intel_svm/attachments/slides/1269/export/events/attachments/intel_svm/slides/1269/FOSDEM_2016___SVM_on_Intel_Graphics.pdf