Extract data from large Excel files

Published 2019-09-09 07:39

Question:

I'm using Pentaho Data Integration to create a transformation that loads xlsx files into MySQL, but I can't import data from large files with the "Excel 2007 XLSX (Apache POI Streaming)" spreadsheet type. It gives me out-of-memory errors.

Answer 1:

Did you try this option?

Advanced settings -> Generation mode -> Less memory consumed for large excel (Event mode)

(You need to check "Read excel2007 file format" first)



Answer 2:

I would recommend increasing the JVM memory allocation before running the transformation. By default, Pentaho Data Integration (PDI, also known as Kettle) ships with a low memory limit, which causes problems when running ETL jobs on large files. Raise the -Xmx value in spoon.bat so that it sets a larger maximum heap size.

If you are using Spoon on Windows, edit the line shown below in spoon.bat.

if "%PENTAHO_DI_JAVA_OPTIONS%"=="" set PENTAHO_DI_JAVA_OPTIONS="-Xmx512m" "-XX:MaxPermSize=256m"

If you are using Kitchen or Pan, edit kitchen.bat or pan.bat in the same way. On Linux, make the change in the corresponding .sh files.
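
As an alternative to editing the scripts, the spoon.bat line above only applies its default when PENTAHO_DI_JAVA_OPTIONS is empty, so setting that variable before launching should have the same effect. The following is a minimal sketch for Linux, assuming the .sh launchers check the same variable in the same way; the 2048m value and the transformation path are placeholders, not recommendations from the original answer.

```shell
# Assumed value: pick a heap limit that fits the machine's available RAM.
export PENTAHO_DI_JAVA_OPTIONS="-Xmx2048m"

# Confirm what the launcher scripts will see.
echo "PENTAHO_DI_JAVA_OPTIONS=$PENTAHO_DI_JAVA_OPTIONS"

# Then start the transformation from the PDI install directory.
# The path below is hypothetical; replace it with your own .ktr file.
# ./pan.sh -file=/path/to/transformation.ktr
```

On Java 8 and later the -XX:MaxPermSize flag is ignored, so it is left out here.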



Tags: kettle pdi