Error while parsing a very large (10 GB) XML file

Posted 2019-09-13 20:16

Context
I'm currently working on a project involving OSM (OpenStreetMap) data. In order to manipulate geographic objects, I have to convert the data (an OSM XML file) into an R object. The osmar package lets me do this, but it fails to parse the raw XML data.

The error

Error in paste(file, collapse = "\n") : result would exceed 2^31-1 bytes

The code

require(osmar)
# Read the whole 10 GB .osm file from disk and convert it into an osmar object
osmar_obj <- get_osm("anything", source = osmsource_file("my filename"))

Inside the get_osm function, the code calls ret <- xmlParse(raw), which triggers the error after a few seconds.
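For context, this is roughly what the file source does before parsing, reconstructed from the error message rather than from the actual osmar source (the variable names are assumptions): the whole file is read and pasted into one string, and a single R character string is capped at 2^31 - 1 bytes (about 2 GB), so a 10 GB file fails at that point no matter how much RAM the machine has.

library(XML)
lines <- readLines("my filename")       # ~10 GB of XML read into a character vector
raw   <- paste(lines, collapse = "\n")  # fails: one R string cannot exceed 2^31 - 1 bytes
ret   <- xmlParse(raw)                  # never reached for a file this size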

The question
How am I supposed to read a large XML file (here 10 GB), knowing that I have 64 GB of memory?

Thanks a lot!

1 Answer
叼着烟拽天下
Answered 2019-09-13 20:39

This is the solution I came up with, even though it is not 100% satisfying.

  1. Transform the .osm file in your shell by removing every newline except the last one.
  2. Run the exact same code as before, skipping the paste step, which is no longer needed since you have just done the equivalent in the shell (a sketch follows below).

Profit :)
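To illustrate, here is a minimal sketch of the two steps; the tr command and the way the paste is skipped are assumptions about one way of doing it, not the exact commands from the answer:

# Step 1 -- in the shell, not in R (assumed command):
#   tr -d '\n' < map.osm > map_oneline.osm && printf '\n' >> map_oneline.osm

# Step 2 -- rerun the parsing code with the paste() step removed, for example
# by handing the (already single-line) file straight to xmlParse, so that no
# 10 GB string ever has to be built inside R:
library(XML)
ret <- xmlParse("map_oneline.osm")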

Obviously, I'm not very happy with it, because modifying the data file in the shell is more of a trick than an actual solution :(
