Convert PDF to clean SVG? [closed]

Posted 2020-02-07 14:12

I'm attempting to convert a PDF to SVG. However, the converter I am currently using turns every letter of every piece of text into its own path, so if I change the text in the SVG's source, it looks ugly.

I was wondering which PDF-to-SVG converter produces the cleanest output, ideally one that doesn't create paths for text areas that don't need them. Since PDF and SVG are fairly similar formats, I assume there are some good converters out there.

Tags: pdf svg
9 answers
爷、活的狠高调
Reply #2 · 2020-02-07 14:35

Bash script to convert each page of a PDF into its own SVG file.

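#!/bin/bash
#
#  Make one PDF per page using PDF toolkit (pdftk).
#  Convert each single-page PDF to SVG using Inkscape.
#  Note: --without-gui / --file / --export-plain-svg=FILE is Inkscape 0.92 syntax.
#
set -eu

inputPdf=$1
base=${inputPdf%.pdf}   # strip only the .pdf extension, not everything after the first dot

# pdftk dump_data prints a line such as "NumberOfPages: 5"
pageCnt=$(pdftk "$inputPdf" dump_data | grep NumberOfPages | cut -d " " -f 2)

for i in $(seq 1 "$pageCnt"); do
    echo "converting page $i..."
    pdftk "$inputPdf" cat "$i" output "${base}_${i}.pdf"
    inkscape --without-gui "--file=${base}_${i}.pdf" "--export-plain-svg=${base}_${i}.svg"
done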

To generate PNG files instead, use --export-png (and similarly for other formats).
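The script above uses Inkscape 0.92 flags, which Inkscape 1.x no longer accepts. A minimal sketch of the same loop for Inkscape 1.x, assuming both `inkscape` (1.x) and `pdftk` are on PATH: Inkscape's `--pdf-page` option selects the page at import time, so the per-page pdftk split can be skipped. The function names `page_count` and `pdf_to_svgs` are hypothetical.

```bash
#!/bin/bash
# Sketch, not the answer's script: assumes Inkscape 1.x and pdftk on PATH.
set -euo pipefail

# Print the page count from `pdftk FILE dump_data` (line "NumberOfPages: N").
page_count() {
    pdftk "$1" dump_data | awk '/^NumberOfPages:/ {print $2}'
}

# Write NAME_1.svg, NAME_2.svg, ... next to NAME.pdf.
pdf_to_svgs() {
    local input=$1
    local base=${input%.pdf}
    local pages i
    pages=$(page_count "$input")
    for ((i = 1; i <= pages; i++)); do
        echo "converting page $i..." >&2
        # --pdf-page picks the page on import, so no pdftk split is needed.
        inkscape "$input" --pdf-page="$i" \
            --export-plain-svg --export-filename="${base}_${i}.svg"
    done
}

# Convert the PDF named on the command line (skipped when no argument is given).
if [[ "${BASH_SOURCE[0]}" == "$0" && $# -gt 0 ]]; then
    pdf_to_svgs "$1"
fi
```

Usage: save it as, say, `pdf_to_svgs.sh` and run `./pdf_to_svgs.sh drawing.pdf`. Progress messages go to stderr so stdout stays clean.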

Reply #3 · 2020-02-07 14:37

Inkscape is used by many people on Wikipedia to convert PDF to SVG.

http://inkscape.org/

They even have a handy guide on how to do so!

http://en.wikipedia.org/wiki/Wikipedia:Graphic_Lab/Resources/PDF_conversion_to_SVG#Conversion_with_Inkscape

叛逆
Reply #4 · 2020-02-07 14:37

I found that xfig did an excellent job:

pstoedit -f fig foo.pdf foo.fig
xfig foo.fig

Then export to SVG from within xfig.

It did a much better job than Inkscape. Actually, it was probably pstoedit that did the work.
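If you want to skip the interactive xfig step, a sketch of the same pipeline without a GUI: it swaps the manual export for `fig2dev -L svg`, which is a different tool (from the fig2dev/transfig package), not what the answer used. It assumes `pstoedit` (which needs Ghostscript to read PDF) and `fig2dev` are on PATH, and a single-page input; `pdf_to_svg_via_fig` is a hypothetical name.

```bash
#!/bin/bash
# Sketch: assumes pstoedit (plus Ghostscript) and fig2dev on PATH.
# fig2dev replaces the manual "export to SVG" step in xfig.
set -euo pipefail

# foo.pdf -> foo.fig -> foo.svg (single-page input assumed)
pdf_to_svg_via_fig() {
    local input=$1
    local base=${input%.pdf}
    pstoedit -f fig "$input" "${base}.fig"
    fig2dev -L svg "${base}.fig" "${base}.svg"
}

# Convert every PDF named on the command line.
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
    for f in "$@"; do
        pdf_to_svg_via_fig "$f"
    done
fi
```

The intermediate `.fig` file is kept, so you can still open it in xfig if the automatic export needs touching up.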

狗以群分
Reply #5 · 2020-02-07 14:40

This topic is quite old, but here is a handy solution that I found:

http://www.cityinthesky.co.uk/opensource/pdf2svg/

It offers a tool, pdf2svg, which, once installed, does exactly this job from the command line. I've tested it with flawless results so far, including with bitmaps.

EDIT: My mistake: this tool also converts letters to paths, so it does not address the original question. However, it still does a good job and can be useful to anyone who does not intend to edit the SVG source, so I'll leave the post.
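Since the whole question hinges on whether text survives as text, a quick check helps before committing to a converter. A rough sketch (not a real XML parse) that counts `<text>` and `<path>` start tags in an SVG with `grep`; a page full of paths and no text elements suggests the glyphs were outlined. `svg_text_report` is a hypothetical helper name.

```bash
#!/bin/bash
# Rough heuristic, not an XML parser: counts <text> and <path> start tags.
# The ($|...) alternative also catches tags split across lines, e.g. "<text\n  x=...".
svg_text_report() {
    local svg=$1 texts paths
    texts=$(grep -oE '<text([[:space:]>]|$)' "$svg" | wc -l)
    paths=$(grep -oE '<path([[:space:]>/]|$)' "$svg" | wc -l)
    # $(( )) also strips the leading spaces some wc implementations print
    echo "$svg: $((texts)) text elements, $((paths)) path elements"
}
```

For example, `svg_text_report page_1.svg` after each converter makes it easy to compare outputs side by side.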

神经病院院长
Reply #6 · 2020-02-07 14:42

Here is the process I ended up using. The main tool was Inkscape, which converted text reasonably well.

  • used Adobe Acrobat Pro actions with JavaScript to split the PDF into single pages
  • ran Inkscape Portable 0.48.5 from Windows Cmd to convert to SVG
  • made some manual edits to a particular SVG XML attribute I was having issues with by using Windows Cmd and Windows PowerShell

Separate Pages: Adobe Acrobat Pro with JavaScript

Using Adobe Acrobat Pro Actions (formerly Batch Processing), create a custom action that separates PDF pages into individual files. Alternatively, you may be able to split PDFs with Ghostscript.

Acrobat JavaScript Action to split pages

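/* Extract Pages to Folder */

// Strip the folder path and the .pdf extension, keeping the bare file name
var re = /.*\/|\.pdf$/ig;
var filename = this.path.replace(re, "");

// Save each page as <filename>_s001.pdf, <filename>_s002.pdf, ...
for (var i = 0; i < this.numPages; i++) {
    this.extractPages({
        nStart: i,
        nEnd: i,
        cPath: filename + "_s" + ("000000" + (i + 1)).slice(-3) + ".pdf"
    });
}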
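For the Ghostscript alternative mentioned above, a sketch assuming `gs` is on PATH: with the `pdfwrite` device, a `%03d` in the output name should make Ghostscript write one file per page, named like the Acrobat script's output. `split_pdf_pages` is a hypothetical name. Note that pdfwrite rewrites the PDF rather than copying pages verbatim, so check that fonts and text come through unchanged before converting.

```bash
#!/bin/bash
# Sketch: assumes Ghostscript (gs) on PATH.
# report.pdf -> report_s001.pdf, report_s002.pdf, ...
split_pdf_pages() {
    local input=$1
    local base=${input%.pdf}
    # -o implies -dBATCH -dNOPAUSE; %03d is replaced by the page number.
    gs -q -sDEVICE=pdfwrite -o "${base}_s%03d.pdf" "$input"
}
```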

PDF to SVG Conversion: Inkscape with Windows CMD batch file

I created a Windows Cmd batch file that loops through all PDF files in a folder and converts them to SVG.

Batch file to convert PDF to SVG in current folder

:: ===== SETUP =====
@echo off
CLS
echo Starting SVG conversion...
echo.

:: setup working directory (if different)
REM set "_work_dir=%~dp0"
set "_work_dir=%CD%"

:: setup counter
set "count=1"

:: setup file search and save string
set "_work_x1=pdf"
set "_work_x2=svg"
set "_work_file_str=*.%_work_x1%"

:: setup inkscape commands
set "_inkscape_path=D:\InkscapePortable\App\Inkscape\"
set "_inkscape_cmd=%_inkscape_path%inkscape.exe"

:: ===== FIND FILES IN WORKING DIRECTORY =====
:: Output from DIR last element is single  carriage return character. 
:: Carriage return characters are directly removed after percent expansion, 
:: but not with delayed expansion.

pushd "%_work_dir%"
FOR /f "tokens=*" %%A IN ('DIR /A:-D /O:N /B %_work_file_str%') DO (
    CALL :subroutine "%%A"
)
popd

:: ===== CONVERT PDF TO SVG WITH INKSCAPE =====

:subroutine
echo.
IF NOT [%1]==[] (

    echo %count%:%1
    set /A count+=1

    start "" /D "%_work_dir%" /W "%_inkscape_cmd%" --without-gui --file="%~n1.%_work_x1%" --export-dpi=300 --export-plain-svg="%~n1.%_work_x2%"

) ELSE (
    echo End of output
)
echo.

GOTO :eof

:: ===== INKSCAPE REFERENCE =====

:: print inkscape help
REM "%_inkscape_cmd%" --help > "%~dp0\inkscape_help.txt"
REM "%_inkscape_cmd%" --verb-list > "%~dp0\inkscape_verb_list.txt"
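On Linux or macOS, the core loop of the batch file can be sketched in bash. This assumes Inkscape 0.92 is on PATH and passes the same flags as the batch file; `convert_folder` is a hypothetical name. Like `DIR /O:N`, bash expands the glob in sorted name order.

```bash
#!/bin/bash
# Sketch of the batch file's loop; assumes Inkscape 0.92 on PATH
# (the same --without-gui/--file flags the batch file passes).
convert_folder() {
    local dir=${1:-.}
    local count=1 f
    shopt -s nullglob            # no matches -> the loop simply does nothing
    for f in "$dir"/*.pdf; do
        echo "$count:$f" >&2
        count=$((count + 1))
        inkscape --without-gui --file="$f" --export-dpi=300 \
            --export-plain-svg="${f%.pdf}.svg"
    done
    shopt -u nullglob
    echo "End of output" >&2
}
```

Usage: `convert_folder /path/to/split/pages` (defaults to the current directory).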

Cleanup attributes: Windows Cmd and PowerShell

I realize it is not best practice to brute-force edit SVG/XML tags or attributes by hand, because of potential variations in the markup; an XML parser should be used instead. However, I had a simple issue: the stroke width on one drawing was very small, and on another the font family was being misidentified. So I modified the previous Windows Cmd batch script to do a simple find and replace. The only changes were to the search string definitions and replacing the Inkscape call with a PowerShell command. The PowerShell command performs the find and replace and saves the modified file with an added suffix. I also found some other references that would be better suited to parsing or modifying the resulting SVG files if other minor cleanup is needed.

Modifications to manually find and replace SVG XML data

:: setup file search and save string
set "_work_x1=svg"
set "_work_x2=svg"
set "_work_s2=_mod"
set "_work_file_str=*.%_work_x1%"

powershell -Command "(Get-Content '%~n1.%_work_x1%') | ForEach-Object {$_ -replace 'stroke-width:0.06', 'stroke-width:1'} | ForEach-Object {$_ -replace 'font-family:Times Roman','font-family:Times New Roman'} | Set-Content '%~n1%_work_s2%.%_work_x2%'"
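The same two replacements can be sketched with `sed` on Linux or macOS. It is still the brute-force approach, not an XML parse, and `fix_svg_styles` is a hypothetical name. One deliberate difference from the PowerShell line: the stroke-width pattern escapes the dot and requires a following `;` or `"`, so a value such as `0.065` is left alone instead of becoming `15`.

```bash
#!/bin/bash
# Brute-force find and replace, like the PowerShell one-liner; not an XML parser.
# drawing.svg -> drawing_mod.svg (original left untouched)
fix_svg_styles() {
    local in=$1
    local out="${in%.svg}_mod.svg"
    sed -e 's/stroke-width:0\.06\([;"]\)/stroke-width:1\1/g' \
        -e 's/font-family:Times Roman/font-family:Times New Roman/g' \
        "$in" > "$out"
}
```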

Hope this might help someone

References

Adobe Acrobat Pro Actions and JavaScript references to Separate Pages

GhostScript references to Separate Pages

Inkscape Command Line references for PDF to SVG Conversion

Windows Cmd Batch File Script references

XML tag/attribute replacement research

Ridiculous、
Reply #7 · 2020-02-07 14:49

Here is a Node.js REST API wrapping two PDF rendering tools: https://github.com/pumppi/pdf2images

The tools are pdf2svg and ImageMagick's convert.
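If you only need the pdf2svg part, it can be called directly without the REST wrapper. A sketch, assuming a pdf2svg build that accepts `all` as the page argument, in which case the output name must contain `%d` for the page number; `pdf2svg_all_pages` is a hypothetical name.

```bash
#!/bin/bash
# Sketch: assumes pdf2svg on PATH and support for the "all" page argument.
# drawing.pdf -> drawing_1.svg, drawing_2.svg, ...
pdf2svg_all_pages() {
    local input=$1
    pdf2svg "$input" "${input%.pdf}_%d.svg" all
}
```

As noted in an earlier reply, pdf2svg outlines glyphs as paths, so this suits display rather than text editing.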
