David Mathog
2010-05-06 21:19:15 UTC
Ghostscript may be used to extract pages from a PDF file with a
command like this:
gs -sDEVICE=pdfwrite \
-dNOPAUSE -dBATCH -dSAFER \
-dFirstPage=48 -dLastPage=48 \
-sOutputFile=onepage.pdf input.pdf
The problem is, while that page looks the same as the original in a
PDF reader, it seems to be an image rather than an "object"
representation. That is, open the original PDF in something like
Acrobat or PDF XChange Viewer and "search" and "text selection" work,
whereas in the extracted one neither function works. Presumably this
is because the text has been rasterized.
Is it possible to use gs to extract ranges of pages, preferably also
reducing the resolution of the embedded images, but leaving the text
as text? I frequently need to reduce the size of PDF files, but the
savings should come entirely from lowering the resolution of the
images, and the text should remain as accessible as it was in the
original.
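To make the request concrete, here is an untested sketch of the kind of command I have in mind. It assumes the pdfwrite device honors the distiller-style image parameters (DownsampleColorImages, ColorImageResolution and the Gray equivalents). The function name build_gs_cmd and the 150 dpi target are only illustrative, and the function just prints the command rather than running it. It addresses only the image-resolution half of the question; whether the text survives as text is exactly what I am unsure about.

```shell
#!/bin/sh
# Sketch only: print a gs command that extracts a page range and asks
# pdfwrite to downsample color and grayscale images to a target dpi.
# build_gs_cmd is a hypothetical helper name, not part of Ghostscript.
# Usage: build_gs_cmd FIRST LAST DPI OUTPUT INPUT
build_gs_cmd() {
    first=$1
    last=$2
    dpi=$3
    out=$4
    in=$5
    # printf repeats the '%s ' format once per argument, so the whole
    # command comes out on a single line.
    printf '%s ' gs -sDEVICE=pdfwrite -dNOPAUSE -dBATCH -dSAFER \
        "-dFirstPage=$first" "-dLastPage=$last" \
        -dDownsampleColorImages=true "-dColorImageResolution=$dpi" \
        -dDownsampleGrayImages=true "-dGrayImageResolution=$dpi" \
        "-sOutputFile=$out" "$in"
    echo
}

# Example: show the command for page 48 at 150 dpi without executing it.
build_gs_cmd 48 48 150 onepage.pdf input.pdf
```

To actually run it, copy the printed line into a shell once it looks right. Searchability of the result could then be checked by opening it in a viewer as described above.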
If Ghostscript cannot do this, is there another Linux tool that can?
Thanks,
David Mathog