Tc, Tw and Tz operators with PDFBox

Published 2019-07-20 00:12

Question:

I am reading an existing PDF document with PDFBox, locating each Tj operator, and then setting the word spacing (Tw), character spacing (Tc), and horizontal scaling (Tz) before writing out the modified document. My problem is that when I open the modified document to inspect its structure, the values of the Tc, Tw, and Tz operators differ slightly from the values I set. How can I prevent this change?

Consider this code:

import java.io.IOException;
import java.io.OutputStream;
import java.util.List;

import org.apache.pdfbox.cos.COSDictionary;
import org.apache.pdfbox.cos.COSFloat;
import org.apache.pdfbox.cos.COSString;
import org.apache.pdfbox.exceptions.COSVisitorException;
import org.apache.pdfbox.pdfparser.PDFStreamParser;
import org.apache.pdfbox.pdfwriter.ContentStreamWriter;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDStream;
import org.apache.pdfbox.util.PDFOperator;

// Written against the PDFBox 1.x API (COSVisitorException, getAllPages).
// tes, Test1 and src are declared elsewhere in the class (not shown).
public static void main(String[] args) throws IOException, COSVisitorException {
    tes = new Test1();
    tes.CreatePdf(src);
    PDDocument doc = PDDocument.load("doc.pdf");
    List pages = doc.getDocumentCatalog().getAllPages();
    for (int i = 0; i < pages.size(); i++) {
        PDPage page = (PDPage) pages.get(i);
        PDStream contents = page.getContents();
        COSDictionary dic = page.getCOSDictionary();
        System.out.println(dic.getCOSObject());
        PDFStreamParser parser = new PDFStreamParser(contents.getStream());
        parser.parse();
        List tokens = parser.getTokens();
        System.out.println(tokens);
        for (int j = 0; j < tokens.size(); j++) {
            Object next = tokens.get(j);
            if (next instanceof PDFOperator) {
                PDFOperator op = (PDFOperator) next;
                // Tj and TJ are the two operators that display strings in a PDF
                if (op.getOperation().equals("Tj")) {
                    // Replace the original "(string) Tj" pair with three Tj calls,
                    // each preceded by its own Tc / Tw / Tz settings.
                    tokens.set(j - 1, COSFloat.get("0.00416145"));
                    tokens.set(j, PDFOperator.getOperator("Tc"));
                    tokens.add(++j, new COSString("he"));
                    tokens.add(++j, PDFOperator.getOperator("Tj"));
                    tokens.add(++j, COSFloat.get("0.001611215"));
                    tokens.add(++j, PDFOperator.getOperator("Tc"));
                    tokens.add(++j, COSFloat.get("0.0067152"));
                    tokens.add(++j, PDFOperator.getOperator("Tw"));
                    tokens.add(++j, new COSString("llo w"));
                    tokens.add(++j, PDFOperator.getOperator("Tj"));
                    tokens.add(++j, COSFloat.get("100.001410144"));
                    tokens.add(++j, PDFOperator.getOperator("Tz"));
                    tokens.add(++j, new COSString("orld"));
                    tokens.add(++j, PDFOperator.getOperator("Tj"));
                }
            }
        }
        // Now that the tokens are updated, replace the page content stream.
        PDStream updatedStream = new PDStream(doc);
        OutputStream out = updatedStream.createOutputStream();
        ContentStreamWriter tokenWriter = new ContentStreamWriter(out);
        tokenWriter.writeTokens(tokens);
        out.close(); // make sure all stream data is written before attaching it
        page.setContents(updatedStream);
    }
    doc.save("a.pdf");
    doc.close();
}

The resulting content stream in the saved file looks like this:
6 0 obj
<<
/Length 8 0 R
>>
stream
BT
 /F0 12 Tf
 15 385 Td
 0.0041614501 Tc
 (he) Tj
 0.0016112149 Tc
 0.0067151999 Tw
 (llo w) Tj
 100.001411438 Tz
 (orld) Tj
 ET

endstream
endobj
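
Comparing the operands in the stream with the literals in the code, the differences look like what single-precision rounding would produce. The sketch below uses plain Java only, with no PDFBox calls. The class and method names are hypothetical, and it is an assumption on my part (not confirmed for this PDFBox version) that the operand is held as a 32-bit float and written with at most ten fractional digits.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Hypothetical helper, not part of PDFBox: shows what a decimal literal looks
// like after being narrowed to a 32-bit float and printed with at most ten
// fractional digits.
class FloatRoundTrip {

    static String roundTrip(String literal) {
        // Narrowing to single precision is where the digits change.
        float narrowed = Float.parseFloat(literal);
        // Use '.' as the decimal separator regardless of the default locale.
        DecimalFormat format =
                new DecimalFormat("0.##########", DecimalFormatSymbols.getInstance(Locale.US));
        return format.format(narrowed);
    }

    public static void main(String[] args) {
        String[] literals = {"0.00416145", "0.001611215", "0.0067152", "100.001410144"};
        for (String literal : literals) {
            System.out.println(literal + " -> " + roundTrip(literal));
        }
    }
}
```

For the four literals used in the code, this prints the same four operand values that appear in the stream, which suggests the change is a precision effect rather than a parsing or writing bug.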

Best regards,