Access PCI memory BAR with low latency (Linux)

Asked 2019-09-02 16:55

Background:

I have a PCI card that is basically a clock. It gets the time via GPS and stores the current time in a certain register.

Goal:

I want to read a limited number of registers/bytes (for example, the current time) over and over again, with the lowest possible latency. (The clock provides very high precision, and I think I will lose precision the higher the latency is.) The operating system is Red Hat. The programming language is C/C++. I also want to write to the device memory; latency is not an issue there.

Possible Ways to go:

I see these ways. If you see another, please tell me:

  1. Write a Linux kernel module (driver) that creates a character device (or one character device for each register to read). A user-space application can then do a read() on the /dev/ file(s).
  2. DMA
  3. mmap the sysfs resourceX file into user space from a user-space application (via the mmap system call) (like here, for example).
  4. Write a Linux kernel module (driver) that implements an mmap file operation.

Questions:

  1. Which is the way with the lowest latency when it comes to the actual reading of the register? I am aware that mmap causes a lot of overhead in the kernel, but as far as I understand, that overhead only occurs at initialisation.
  2. Is way 3 a legit way to go? It looks like a hack to me. How can I determine the /sys/ path automatically from the application?
  3. Is there a difference between ways 3 and 4? I am new to PCI driver programming, and I don't think I really understood how way 4 works. I read this (and other chapters of that book), but maybe you can give me a hint or an example. I would appreciate that.

2 Answers
爱情/是我丢掉的垃圾 · 2019-09-02 17:25

Method 3 or 4 should work fine. There is no difference between them with respect to latency. Latency would be on the order of 100 ns.

Method 4 would be needed if you need to initialize the device, or control which applications are allowed to access it, or enforce one reader at a time, etc. Method 3 does seem like a bit of a hack because it skips all of this. But it is simpler if you don’t need such things.

A character device is definitely higher latency, because it requires a kernel transition each time the device is read.

The latency of a DMA method depends entirely on how frequently the device writes the time to memory. It is lower latency for the CPU to access memory than MMIO, but if the device only does DMA once a millisecond, then that would be your latency. Also, that method generates a lot of useless DMA traffic, since the CPU would read the value far less often than it is written.

我命由我不由天 · 2019-09-02 17:33

Adding to @prl's answer...

Method 3 seems perfectly legit to me. That's what it's for. You may want to take a look at the kernel documentation file: https://www.kernel.org/doc/Documentation/filesystems/sysfs-pci.txt
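
For illustration, here is a minimal user-space sketch of method 3: open a BAR's resourceN file, mmap it, and poll a 32-bit register through a volatile pointer. The sysfs path and the 0x10 register offset are placeholders I made up; substitute your card's actual address and the offset from its data sheet, and note that this assumes BAR0 is at least one page long.

```c
/*
 * Method 3 sketch: mmap a PCI BAR via its sysfs resource file and poll a
 * register. The device path and the 0x10 offset are hypothetical.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    /* Hypothetical device path; see the search sketch below for finding it. */
    const char *path = "/sys/bus/pci/devices/0000:03:00.0/resource0";
    const off_t reg_offset = 0x10;           /* hypothetical time register */

    int fd = open(path, O_RDWR | O_SYNC);
    if (fd < 0) { perror("open"); return 1; }

    /* Map the first page of BAR0; resource0 starts at the BAR's base. */
    size_t map_len = (size_t)sysconf(_SC_PAGESIZE);
    void *bar = mmap(NULL, map_len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (bar == MAP_FAILED) { perror("mmap"); return 1; }

    /* volatile forces the compiler to re-read the MMIO register each time. */
    volatile uint32_t *time_reg =
        (volatile uint32_t *)((uint8_t *)bar + reg_offset);

    for (int i = 0; i < 10; i++)
        printf("time register: 0x%08x\n", *time_reg);

    munmap(bar, map_len);
    close(fd);
    return 0;
}
```

The O_SYNC flag and the volatile qualifier are there so that each read actually goes to the device instead of being cached or optimized away.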

You can also use the /sys filesystem to find your device. First, note the vendor ID and device ID for your clock card (and subsystem vendor / device if necessary), then you can easily walk the /sys/devices hierarchy, looking for a matching device (using the vendor, device, etc. special files). Once you've found it, you presumably know which resourceN file to open from the device's data sheet, then mmap it at the appropriate offset and you're done.
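
As a rough sketch of that search (the 0x1234 vendor and 0xabcd device IDs are placeholders; use your card's real IDs, which you can see with lspci -nn):

```c
/* Locate a PCI device by vendor/device ID under sysfs and print the path of
 * its resource0 file. The 0x1234/0xabcd IDs are placeholders. */
#include <dirent.h>
#include <stdio.h>

static unsigned long read_hex_attr(const char *dev, const char *attr)
{
    char path[512];
    unsigned long val = 0;
    FILE *f;

    snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/%s", dev, attr);
    f = fopen(path, "r");
    if (f) {
        if (fscanf(f, "%lx", &val) != 1)  /* files contain e.g. "0x10ee\n" */
            val = 0;
        fclose(f);
    }
    return val;
}

int main(void)
{
    DIR *d = opendir("/sys/bus/pci/devices");
    struct dirent *e;

    if (!d) { perror("opendir"); return 1; }

    while ((e = readdir(d)) != NULL) {
        if (e->d_name[0] == '.')
            continue;
        if (read_hex_attr(e->d_name, "vendor") == 0x1234 &&
            read_hex_attr(e->d_name, "device") == 0xabcd) {
            printf("/sys/bus/pci/devices/%s/resource0\n", e->d_name);
            closedir(d);
            return 0;
        }
    }
    closedir(d);
    fprintf(stderr, "device not found\n");
    return 1;
}
```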

That all assumes that your device is already configured and enabled. Typically, a PCI device is not enabled to do anything when the system boots. Some driver needs to claim the device and initialize/configure it. Once that is done, if the time is accessible just by reading a register or two, you can go with method 3. (I'm not sure; it may be possible for a PCI device to be self-initializing, but I've never seen one. I think something probably needs to enable its memory space at the very least. That could likely be done from user space if the setup is small/simple enough.)
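
If enabling memory space really is all the card needs (you'd have to verify that for your particular device), one user-space possibility is writing 1 to the device's sysfs enable file, roughly like this (path again hypothetical, and root is required):

```c
/* Sketch: ask the kernel to enable a PCI device (memory/IO decoding) by
 * writing "1" to its sysfs enable file. The device path is hypothetical. */
#include <stdio.h>

int main(void)
{
    FILE *f = fopen("/sys/bus/pci/devices/0000:03:00.0/enable", "w");

    if (!f) { perror("fopen"); return 1; }
    fputs("1\n", f);
    fclose(f);
    return 0;
}
```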

The primary difference with method 4 is that the driver controlling the device explicitly provides support for mmap'ing the area. For the user-space application, there is little difference between the two methods aside from the device name used. With method 4, the driver will probably provide a symbolic device name such as /dev/clock0 for the user-space application to use (and presumably the application then doesn't need to go find the device; it just knows the device file name to open).

From user-space, you will do the mmap operation in much the same way with either method. In method 4, the driver internally supplies the physical address to map -- and possibly the offset -- instead of the generic PCI subsystem doing so, but either way, it's just open + mmap.
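
To give a feel for what the driver side of method 4 looks like, here is a heavily trimmed sketch of a character-device mmap handler that maps BAR0 of an already-probed PCI device into the calling process. All names are made up, and probe(), the char-device registration, and most error handling are omitted:

```c
/* Method 4 sketch: a kernel module's mmap file operation that exposes BAR0
 * of a PCI device. clock_pdev would be set in the driver's probe(). */
#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/module.h>
#include <linux/pci.h>

static struct pci_dev *clock_pdev;    /* filled in by the (omitted) probe() */

static int clock_mmap(struct file *filp, struct vm_area_struct *vma)
{
    unsigned long bar_start = pci_resource_start(clock_pdev, 0);
    unsigned long bar_len   = pci_resource_len(clock_pdev, 0);
    unsigned long vsize     = vma->vm_end - vma->vm_start;

    if (vsize > bar_len)
        return -EINVAL;

    /* MMIO must not be cached by the CPU. */
    vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);

    /* Map the BAR's physical pages into the caller's address space. */
    return io_remap_pfn_range(vma, vma->vm_start,
                              bar_start >> PAGE_SHIFT,
                              vsize, vma->vm_page_prot);
}

static const struct file_operations clock_fops = {
    .owner = THIS_MODULE,
    .mmap  = clock_mmap,
};

MODULE_LICENSE("GPL");
```

Once such a driver registers /dev/clock0 with those file operations, the user-space side is the same open + mmap pattern as in the method 3 sketch above.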

Linux driver programming is not terribly difficult, but there's a significant learning curve there if you haven't done it before, so I definitely wouldn't go with method 4 unless there were a real need to do so.
