Fortran: How do I allocate arrays when reading a file of unknown size?

Posted 2020-05-05 04:01

My typical use of Fortran begins with reading in a file of unknown size (usually 5-100 MB). My current approach to array allocation involves reading the file twice: first to determine the size of the problem (so the arrays can be allocated), and a second time to read the data into those arrays.

Are there better approaches to size determination and array allocation? I just read about automatic reallocation on assignment (example below) in another post, and it seemed much easier.

array = [array, new_data]

What are all the options and their pros and cons?

1 Answer
霸刀☆藐视天下
Answered 2020-05-05 04:53

I'll bite, though the question is teetering close to off-topicality. Your options are:

  1. Read the file once to get the array size, allocate, read again.
  2. Read piece-by-piece, (re-)allocating as you go. Choose the size of each piece as you wish (or, perhaps, as you think is likely to be fastest for your case).
  3. Always, always, work with files which contain metadata to tell an interested program how much data there is; for example a block header line telling you how many data elements are in the next block.

Option 3 is the best by far. A little extra thought, and about one whole line of code, at the beginning of a project saves so much wasted time and effort down the line. You don't have to jump to HDF5 or a similar heavyweight file design; just adopt enough discipline to last the useful life of the contents of the file. For iteration-by-iteration dumps from your simulation of the universe, a home-brewed approach will do (be honest, you're the only person who's ever going to look at them). For data gathered at an approximate cost of $1M per TB (satellite observations, offshore seismic traces, etc.), use HDF5 or something similar.
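For a plain text file, the metadata can be as simple as a count in the first record. Here is a minimal sketch of option 3, assuming a hypothetical file 'data.txt' whose first line holds the number of values that follow, one value per line:

    program read_with_header
       implicit none
       integer :: n, i, u
       real, allocatable :: x(:)

       open(newunit=u, file='data.txt', status='old', action='read')
       read(u, *) n              ! the metadata: how many values follow
       allocate(x(n))
       do i = 1, n
          read(u, *) x(i)
       end do
       close(u)
    end program read_with_header

Writing that count costs the producing program one extra write statement, which is the "one whole line of code" mentioned above.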

Option 1 is fine too. It's not like you have to wait for the tapes to rewind between reads any more. (Well, some do, but they're in a niche these days, and a de-archiving system will often move files from tape to disk if they're to be used.)
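For comparison, a minimal sketch of option 1 on the same hypothetical one-value-per-line 'data.txt', this time without a header:

    program read_twice
       implicit none
       integer :: n, i, u, stat
       real, allocatable :: x(:)
       real :: tmp

       open(newunit=u, file='data.txt', status='old', action='read')
       n = 0
       do                        ! first pass: just count the records
          read(u, *, iostat=stat) tmp
          if (stat /= 0) exit
          n = n + 1
       end do
       allocate(x(n))
       rewind(u)                 ! second pass: read the data for real
       do i = 1, n
          read(u, *) x(i)
       end do
       close(u)
    end program read_twice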

Option 2 is a faff. It may also be the worst performing, but on all but the largest files the worst performance may be within a nano-century of the best. If that's important to you, then check it out.
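Should you want to try it anyway, here is a minimal sketch of option 2 using the allocation-on-assignment idiom from the question (Fortran 2003 onwards). Reading one value at a time means every iteration reallocates and copies the whole array; reading larger chunks, or doubling a scratch buffer and trimming at the end, would reduce that cost:

    program read_growing
       implicit none
       integer :: u, stat
       real, allocatable :: x(:)
       real :: tmp

       allocate(x(0))            ! start empty; x must be allocated before
                                 ! it appears on the right-hand side
       open(newunit=u, file='data.txt', status='old', action='read')
       do
          read(u, *, iostat=stat) tmp
          if (stat /= 0) exit
          x = [x, tmp]           ! reallocate-and-copy on each assignment
       end do
       close(u)
    end program read_growing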

If you want quantification of my opinions, run your own experiments on your own files on your own hardware.

PS: I haven't really got a clue how much it costs to gather 1 TB of satellite or seismic data; it's a factoid invented to support an argument.
