Question:

I have some Fortran code compiled with the Intel Fortran compiler (ifort). When I profile it with gprof, most of the time appears to be spent in I/O operations, which I think are finding the end of the files, but I cannot find any more documentation on this:

index % time    self  children    called     name
                                                 <spontaneous>
[1]     20.6    0.07    0.00                 _IO_wfile_seekoff [1]
-----------------------------------------------
                                                 <spontaneous>
[2]     20.6    0.07    0.00                 sforcepf_ [2]
-----------------------------------------------
                                                 <spontaneous>
[3]     20.6    0.02    0.05                 _IO_wfile_underflow [3]
                0.01    0.04  258716/258717      strncmp [4]
-----------------------------------------------
                0.00    0.00       1/258717      _IO_wdefault_doallocate [15]
                0.01    0.04  258716/258717      _IO_wfile_underflow [3]
[4]     14.7    0.01    0.04  258717         strncmp [4]
                0.04    0.00 3104592/3109256     strerror_r [5]
-----------------------------------------------
                0.00    0.00    4664/3109256     __strcmp_sse42 [14]
                0.04    0.00 3104592/3109256     strncmp [4]
[5]     11.8    0.04    0.00 3109256         strerror_r [5]
-----------------------------------------------

So the question is: is this I/O specific to Linux, to ifort, or to Fortran? I am trying to optimize this code and have found no useful information about these terms on Google.

Answer 1:

You write Fortran statements. The Intel Fortran compiler translates those statements into assembler including calls to system functions. For example, strncmp is an ISO C standard function to compare parts of strings. So it looks like you are writing Fortran statements to compare strings, and the Intel Fortran compiler is calling an existing function to implement the comparisons. Some of those system functions will themselves be implemented (in part) by calls to even more fundamental functions provided on your platform.

gprof is showing you the calls to the functions that it finds referred to in the products of your compilation. Most of what you see is specific to Linux I/O -- on a Windows machine the I/O would use similar functions with different names. It's possible that some of what you see is specific to the Intel compilers: all Intel compilers may use the same (Intel-created) function for some operation, and that function in turn uses platform-specific lower-level functions.

Unless you are prepared to rewrite these low-level functions, and take the risk that you will screw them up for other programs using the same functions, then just about the only optimisation you can make is to call them less often. For example, if you have reason to think that reading past the end of a file is an expensive I/O operation, and if your program strategy is to read a file until you read past the end and then deal with the error that arises, then you may want to implement a superior program strategy. That will be easier than re-writing the low-level I/O routines which deal with the consequences of your strategy.
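
As a rough illustration of "calling them less often", here is one possible alternative to probing for end-of-file record by record: ask for the file size once, read the whole file in a single stream read, and work on the in-memory copy. This is only a sketch, not necessarily what the asker's code needs; the file name `slurp_demo.txt` is made up, and it assumes a compiler with Fortran 2003 stream access and deferred-length character support.

```fortran
program slurp_file
  implicit none
  character(len=*), parameter :: fname = 'slurp_demo.txt'  ! hypothetical file name
  character(len=:), allocatable :: buf
  integer :: u, nbytes, nlines, i

  ! Create a small input file so the sketch is self-contained.
  open(newunit=u, file=fname, status='replace', action='write')
  write(u, '(a)') 'alpha', 'beta', 'gamma'
  close(u)

  ! Ask for the size once instead of discovering the end by failing reads.
  inquire(file=fname, size=nbytes)
  if (nbytes < 0) stop 'file size could not be determined'

  allocate(character(len=nbytes) :: buf)

  ! One unformatted stream read pulls in the whole file.
  open(newunit=u, file=fname, access='stream', form='unformatted', &
       status='old', action='read')
  read(u) buf
  close(u, status='delete')

  ! All further work happens in memory; here we just count line breaks.
  nlines = 0
  do i = 1, nbytes
    if (buf(i:i) == new_line('a')) nlines = nlines + 1
  end do

  print '(a,i0,a,i0,a)', 'read ', nbytes, ' bytes containing ', nlines, ' line breaks'
end program slurp_file
```

For very large files a default integer may be too small for the size, and reading in a few big chunks would be the more realistic variant.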

Answer 2:

Suppose you write the following in any language

loop for a long time
  write something to somewhere

and profile it with gprof.

gprof suspends sampling during I/O or any other blocked state, so time spent waiting never appears in the profile. This program does very little computation, but of the cycles it does spend, most are spent going in and out of the built-in library routines that start I/O and wait for it to finish.

So if your program is like that, it's not surprising that that's what you see.
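
One way to check whether your program is of this kind is to compare wall-clock time with CPU time around the I/O-heavy part; a large gap suggests time spent blocked, which the profile will not show. The sketch below is only an illustration: the file name `cpu_vs_wall.tmp` and the loop count are arbitrary, and on a fast local disk with buffering the two numbers may come out close to each other.

```fortran
program cpu_vs_wall
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
  integer(int64) :: tick0, tick1, rate
  real(real64)   :: cpu0, cpu1
  integer        :: u, i

  call system_clock(tick0, rate)   ! wall-clock ticks and ticks per second
  call cpu_time(cpu0)              ! processor time used so far

  ! The "loop for a long time, write something to somewhere" pattern.
  open(newunit=u, file='cpu_vs_wall.tmp', status='replace', action='write')
  do i = 1, 200000
    write(u, '(i0)') i
  end do
  close(u, status='delete')

  call cpu_time(cpu1)
  call system_clock(tick1)

  print '(a,f10.4,a)', 'wall-clock: ', real(tick1 - tick0, real64) / real(rate, real64), ' s'
  print '(a,f10.4,a)', 'CPU time:   ', cpu1 - cpu0, ' s'
end program cpu_vs_wall
```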

There's a lot more to this issue.

Answer 3:

Looks like you are seeing Fortran I/O operations. Formatted I/O is quite slow in ifort. If standard input/standard output redirection is used, it gets even worse; and still worse with pipes -- Intel docs specifically warn against doing it. gfortran is not nearly as bad, but still pretty slow.

Some possibilities are:

try to do as few I/O calls as possible (e.g. move them out of loops)
avoid redirection and read/write files directly instead
check blocksize, buffercount and other I/O related options in open()
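
To illustrate the first point, here is a minimal sketch of replacing one formatted write per element with a single write statement for the whole array. The file name `batched_demo.txt` and the array size are made up; the compiler-specific open() options mentioned above are left out so the example stays portable.

```fortran
program batched_write
  implicit none
  integer, parameter :: n = 1000
  real    :: x(n)
  integer :: u, i

  x = [(real(i), i = 1, n)]

  open(newunit=u, file='batched_demo.txt', status='replace', action='write')

  ! Pattern to avoid: one I/O statement per element.
  !   do i = 1, n
  !     write(u, '(es15.7)') x(i)
  !   end do

  ! Single I/O statement for the whole array. The format holds one edit
  ! descriptor, so format reversion still puts one value on each record.
  write(u, '(es15.7)') x

  close(u)
end program batched_write
```

The resulting file has the same layout in both cases; only the number of calls into the runtime's I/O layer changes.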

If this is insufficient, and I/O is your major bottleneck, you may consider:

looking into stream I/O in ifort: it is faster, and you can do things like buffering yourself to avoid making multiple calls. It may, however, introduce portability problems, since other compilers may not support it yet or may do it differently. Don't do it on standard input/output (it might work in ifort, but it's undocumented and won't work with other compilers).
using iso_c_binding to call a C function -- e.g. if you are writing to standard output, you can call puts() from libc. This is even faster and actually quite portable since it's standard, and in fact every compiler on every OS I've done it on (Win32/linux64/sparc solaris) requires (and automatically links) libc anyway; but it's rather ugly, and you have to take care of things like null-termination yourself (e.g. by writing a wrapper function), which obscures the code and can introduce bugs.
Don't mix any of these methods with regular I/O on the same file!
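
A minimal sketch of both ideas, under a few assumptions: the names `libc_puts`, `put_line` and `stream_demo.txt` are invented for illustration, and the string concatenation assumes that the default character kind is the same as `c_char`, which I believe holds for ifort and gfortran. The output lines are assembled in memory and written with one stream write, and the wrapper takes care of null-termination before calling puts(). Note that the program does no regular Fortran I/O on standard output.

```fortran
module libc_puts
  use, intrinsic :: iso_c_binding, only: c_int, c_char, c_null_char
  implicit none
  private
  public :: put_line

  interface
    ! C prototype: int puts(const char *s);
    function c_puts(s) bind(C, name='puts') result(rc)
      import :: c_int, c_char
      character(kind=c_char), intent(in) :: s(*)
      integer(c_int) :: rc
    end function c_puts
  end interface

contains

  ! Hypothetical wrapper: appends the terminating null that C expects.
  subroutine put_line(text)
    character(len=*), intent(in) :: text
    integer(c_int) :: rc
    rc = c_puts(trim(text) // c_null_char)
  end subroutine put_line

end module libc_puts

program fast_output_demo
  use libc_puts, only: put_line
  implicit none
  character(len=:), allocatable :: buffer
  character(len=32) :: field
  integer :: u, i

  ! Do the buffering ourselves: build all lines in memory first.
  buffer = ''
  do i = 1, 5
    write(field, '(i0)') i*i     ! internal write, formats into a string only
    buffer = buffer // trim(field) // new_line('a')
  end do

  ! One unformatted stream write sends the whole buffer to the file.
  open(newunit=u, file='stream_demo.txt', access='stream', &
       form='unformatted', status='replace', action='write')
  write(u) buffer
  close(u)

  ! Standard output goes through libc only, never through print/write.
  call put_line('wrote stream_demo.txt')
end program fast_output_demo
```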

If you are doing string comparisons explicitly in your code, these would eventually call strncmp() too. String operations are also a bit slow in ifort (although nowhere near as bad as I/O), so if you are doing A LOT of comparisons, you might gain a few seconds by calling strncmp() directly, but I would advise against that -- the gain is not that large, and again, it obscures the code.