Replace data in a PDF file

Published 2019-09-09 11:20

Question:

I have to replace any string enclosed between << and >> in a PDF file. However, I'm unable to do so.

// JDK classes plus the PDFBox 1.x classes used below.
import java.io.IOException;
import java.io.OutputStream;
import java.util.List;
import java.util.Scanner;

import org.apache.pdfbox.cos.COSArray;
import org.apache.pdfbox.cos.COSString;
import org.apache.pdfbox.exceptions.COSVisitorException;
import org.apache.pdfbox.pdfparser.PDFStreamParser;
import org.apache.pdfbox.pdfwriter.ContentStreamWriter;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDStream;
import org.apache.pdfbox.util.PDFOperator;

public void doIt( String inputFile, String outputFile) throws IOException, COSVisitorException
{
    PDDocument doc = null;
    // One Scanner for the whole run; creating a new one per token can lose buffered input.
    Scanner in = new Scanner(System.in);
    try
    {
        doc = PDDocument.load( inputFile );
        List pages = doc.getDocumentCatalog().getAllPages();
        for( int i=0; i<pages.size(); i++ )
        {
            // Parse the page's content stream into operands and operators.
            PDPage page = (PDPage)pages.get( i );
            PDStream contents = page.getContents();
            PDFStreamParser parser = new PDFStreamParser(contents.getStream());
            parser.parse();
            List tokens = parser.getTokens();
            for( int j=0; j<tokens.size(); j++ )
            {
                Object next = tokens.get( j );
                if( next instanceof PDFOperator )
                {
                    PDFOperator op = (PDFOperator)next;
                    if( op.getOperation().equals( "Tj" ))
                    {
                        // Tj takes a single string operand, stored just before the operator.
                        COSString previous = (COSString)tokens.get( j-1 );
                        String string = previous.getString();
                        if(string.startsWith("<<") && string.endsWith(">>"))
                        {
                            System.out.println(string);
                            System.out.println("enter the word to be replaced");
                            String string2=in.nextLine();
                            // Overwrite the whole placeholder. The original
                            // string.replaceAll( string, string2 ) treated it as a regex.
                            previous.reset();
                            previous.append( string2.getBytes("ISO-8859-1") );
                        }
                    }
                    else if( op.getOperation().equals( "TJ" ))
                    {
                        // TJ takes an array that mixes strings with numeric position adjustments.
                        COSArray previous = (COSArray)tokens.get( j-1 );
                        for( int k=0; k<previous.size(); k++ )
                        {
                            Object arrElement = previous.getObject( k );
                            if(arrElement instanceof COSString)
                            {
                                COSString cosString = (COSString)arrElement;
                                String string = cosString.getString();
                                if(string.startsWith("<<") && string.endsWith(">>"))
                                {
                                    System.out.println(string);
                                    System.out.println("enter the word to be replaced");
                                    String string2=in.nextLine();
                                    cosString.reset();
                                    cosString.append( string2.getBytes("ISO-8859-1") );
                                }
                            }
                        }
                    }
                }
            }
            // Write the modified tokens into a new stream and attach it to the page.
            PDStream updatedStream = new PDStream(doc);
            OutputStream out = updatedStream.createOutputStream();
            ContentStreamWriter tokenWriter = new ContentStreamWriter(out);
            tokenWriter.writeTokens(tokens);
            page.setContents(updatedStream);
        }
        doc.save( outputFile );
        System.out.println("Done!! Now You can Open.");
    }
    finally
    {
        if( doc != null )
        {
            doc.close();
        }
    }
}

Answer 1:

Please read the intro of chapter 6 of my book. You're assuming that PDF is a format for editing text. PDF wasn't designed for word processing.
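
As a rough illustration of why this matters (a sketch under stated assumptions, not a fix): a content stream holds drawing instructions rather than words, and the operand of a TJ operator is an array that mixes string fragments with numeric position adjustments, so a placeholder such as <<name>> may be split across several fragments. The snippet below is plain Java with no PDF library; the class TjFragments, its methods, and the sample array are hypothetical and only simulate such an operand.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical simulation of a TJ operand; not part of any PDF library.
final class TjFragments {

    // Concatenates the string elements of the simulated array.
    // Numeric elements stand in for kerning adjustments and are skipped.
    public static String joinText(List<Object> tjArray) {
        StringBuilder sb = new StringBuilder();
        for (Object element : tjArray) {
            if (element instanceof String) {
                sb.append((String) element);
            }
        }
        return sb.toString();
    }

    // The same check the question applies to each string it finds.
    public static boolean isPlaceholder(String s) {
        return s.startsWith("<<") && s.endsWith(">>");
    }

    // Applies the check fragment by fragment, as the question's TJ loop does.
    public static boolean anyFragmentMatches(List<Object> tjArray) {
        for (Object element : tjArray) {
            if (element instanceof String && isPlaceholder((String) element)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Assumed sample, as if the stream contained: [(<<na) -20 (me) 15 (>>)] TJ
        List<Object> tjArray = Arrays.<Object>asList("<<na", -20, "me", 15, ">>");

        System.out.println(anyFragmentMatches(tjArray));           // prints "false"
        System.out.println(isPlaceholder(joinText(tjArray)));      // prints "true"
    }
}
```

With this sample, the per-fragment check finds nothing, while the joined text does match. In a real file even the joined text may not match, because strings can be stored as font-specific character codes rather than readable characters, and replacing them does not reflow the surrounding text.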

Of course, maybe you're asking how to create a static form as explained in section 6.3.5 of my book, but I doubt the static nature of AcroForm technology will meet your needs. A pure XFA form (dynamic PDF) may solve your problem, but explaining XFA isn't something that can be done within the scope of an answer on SO: the XFA spec is several hundred pages long. As indicated in the comments by Duncan Jones, you should first do some preliminary work.