i'm currently working with Excel files (*.xlsm) and Apache POI , and i have been cracking my head over a task. I receive some excel files that have PDFs embedded in it and i want to extract them and rename them based on the row and column they are in. This seems weird as i know the embedded objects are represented as images ,they can occupy more than one cell and technically they are not "In" the cell.
The following code snippet lets me extract the embedded PDFs but they are named OleObject[1..2..3.etc..] wich doesnt give me any clue.
inStream = new FileInputStream(file);
XSSFWorkbook workbook = new XSSFWorkbook(inStream);
for (PackagePart pPart : workbook.getAllEmbedds()) {
String contentType = pPart.getContentType();
if (contentType.equals("application/vnd.openxmlformats-officedocument.oleObject")){
POIFSFileSystem fs = new POIFSFileSystem(pPart.getInputStream());
TikaInputStream stream = TikaInputStream.get(fs.createDocumentInputStream("CONTENTS"));
byte[] bytes = IOUtil.toByteArray(stream);
stream.close();
OutputStream outStream = new FileOutputStream(new File(ROOT_DIRECTORY.getAbsolutePath()+"\\PDF"+i+".pdf"));
IOUtil.copy(bytes, outStream);
outStream.close();
}}
I wanted to know if org.openxmlformats.schemas.spreadsheetml.x2006.main.CTWorksheet will let me see the xml code of the excell sheet and maybe eith taht i can get the info i need. Like this.
<oleObjects><mc:AlternateContent xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"><mc:Choice Requires="x14"><oleObject progId="Acrobat Document" dvAspect="DVASPECT_ICON" shapeId="1028" r:id="rId4"><objectPr defaultSize="0" r:id="rId5"><anchor moveWithCells="1"><from><xdr:col>8</xdr:col><xdr:colOff>0</xdr:colOff><xdr:row>11</xdr:row><xdr:rowOff>0</xdr:rowOff></from><to><xdr:col>8</xdr:col><xdr:colOff>1143000</xdr:colOff><xdr:row>13</xdr:row><xdr:rowOff>171450</xdr:rowOff></to></anchor></objectPr></oleObject></mc:Choice><mc:Fallback><oleObject progId="Acrobat Document" dvAspect="DVASPECT_ICON" shapeId="1028" r:id="rId4"/></mc:Fallback></mc:AlternateContent></oleObjects>
--
<objectPr defaultSize="0" r:id="rId5"><anchor moveWithCells="1"><from><xdr:col>8</xdr:col><xdr:colOff>0</xdr:colOff><xdr:row>11</xdr:row><xdr:rowOff>0</xdr:rowOff></from><to><xdr:col>8</xdr:col><xdr:colOff>1143000</xdr:colOff><xdr:row>13</xdr:row><xdr:rowOff>171450</xdr:rowOff></to></anchor></objectPr>
I guess using the anchor information would be possible but im just unable to find how to get it.
Hope this information makes things clear on what im trying to do .
Thanks in advance.