

Using a thousand dollar computer to simulate a one cent piece of paper with zero day exploits

Command line tips

Reduce size of bloated PDF:

gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dCompatibilityLevel=1.5 \
    -dPDFSETTINGS=/ebook \
    -sOutputFile=output.pdf input.pdf
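
If you would rather drive this from a script, here is a minimal Python sketch that builds the same argument list. The function names `shrink_args` and `shrink` are made up for illustration. The preset names are the standard `-dPDFSETTINGS` values; the batch flags (`-q -dNOPAUSE -dBATCH`) are added on the assumption that you want Ghostscript to exit when it is done rather than wait at its prompt.

```python
import subprocess

# Ghostscript's -dPDFSETTINGS presets, roughly from smallest to largest output.
PRESETS = ("screen", "ebook", "printer", "prepress", "default")


def shrink_args(src, dst, preset="ebook"):
    """Build the Ghostscript argument list for recompressing src into dst."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset: {preset!r}")
    return [
        "gs",
        "-q", "-dNOPAUSE", "-dBATCH",   # run non-interactively, then exit
        "-sDEVICE=pdfwrite",
        "-dCompatibilityLevel=1.5",
        f"-dPDFSETTINGS=/{preset}",
        f"-sOutputFile={dst}",
        src,
    ]


def shrink(src, dst, preset="ebook"):
    """Run Ghostscript; raises CalledProcessError if gs exits non-zero."""
    subprocess.run(shrink_args(src, dst, preset), check=True)
```

Calling `shrink("input.pdf", "output.pdf")` should be equivalent to the command above, provided `gs` is on your PATH.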

or, wrapped up into a nice little script, ShrinkPDF: (90 is the dpi here.)

./ in.pdf out.pdf 90

This works to concatenate PDFs:

gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite \
    -dPDFSETTINGS=/prepress -sOutputFile=output.pdf input*.pdf
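
One thing to watch with `input*.pdf`: the shell expands the glob in lexicographic order, so `input10.pdf` lands before `input2.pdf`. A rough Python sketch that sorts numerically before building the same command follows; `natural_key` and `concat_args` are hypothetical helper names, not part of any library.

```python
import glob
import re


def natural_key(name):
    """Split a name into text and integer chunks so that 2 sorts before 10."""
    return [int(chunk) if chunk.isdigit() else chunk.lower()
            for chunk in re.split(r"(\d+)", name)]


def concat_args(pattern, dst):
    """Build the Ghostscript argument list joining all files matching pattern."""
    sources = sorted(glob.glob(pattern), key=natural_key)
    return ["gs", "-dBATCH", "-dNOPAUSE", "-q", "-sDEVICE=pdfwrite",
            "-dPDFSETTINGS=/prepress", f"-sOutputFile={dst}", *sources]
```

Pass the result to `subprocess.run(..., check=True)` to actually run it.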

EPS to PDF conversion:

ps2pdf14 -dEPSCrop Logo.eps

Quick and dirty RGB-CMYK using recent ghostscript? Hmm.

Diff PDFs? “(Scientific) Reviews: you reviewed version A of a paper, and receive version B, and wonder what the changes are.”
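
A crude fallback, if all you need is the changed wording: dump both versions to text (for instance with poppler's `pdftotext`, assuming you have it installed) and diff the text. This is a different technique from a dedicated PDF diff tool, and it ignores layout and figures entirely. The function name `diff_text` is invented for this sketch.

```python
import difflib


def diff_text(old_text, new_text):
    """Return unified-diff lines between two text dumps of a paper."""
    return list(difflib.unified_diff(
        old_text.splitlines(), new_text.splitlines(),
        fromfile="version_A", tofile="version_B", lineterm=""))
```

Feed it the contents of the two text dumps and print the result line by line.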


Programmatic editing and generation


pdfrw is a Python library and utility that reads and writes PDF files.

Here is a gentle HOWTO. You can use it to put matplotlib plots in reportlab PDFs

svglib provides a pure Python library that can convert SVG to PDF, and a command line utility for the same, svg2pdf. You can use it to add SVGs to PDFs.

reportlab is far more famous and even includes a modicum of typesetting. It doesn't edit PDFs so much, but it generates them pretty well. Its integration with other things is often a little weak, e.g. if you thought that dropping in LaTeX equations or HTML snippets would be simple. OTOH it includes its own chart generation and so on. Use it if this is a natural way to make two columns for you:

from reportlab.platypus import BaseDocTemplate, Frame, Paragraph, PageTemplate
from reportlab.lib.styles import getSampleStyleSheet
import random

words = "lorem ipsum dolor sit amet consetetur sadipscing elitr sed diam nonumy eirmod tempor invidunt ut labore et".split()

styles = getSampleStyleSheet()
elements = []  # the flowables that make up the document

doc = BaseDocTemplate('basedoc.pdf', showBoundary=1)

# Two columns: split the usable width in half, with a 12pt gutter between the frames
frame1 = Frame(doc.leftMargin, doc.bottomMargin, doc.width/2-6, doc.height, id='col1')
frame2 = Frame(doc.leftMargin+doc.width/2+6, doc.bottomMargin, doc.width/2-6, doc.height, id='col2')

# 1000 random words, enough to overflow from the first column into the second
elements.append(Paragraph(" ".join(random.choice(words) for _ in range(1000)), styles['Normal']))
doc.addPageTemplates([PageTemplate(id='TwoCol', frames=[frame1, frame2])])

# start the construction of the pdf
doc.build(elements)

PyPDF2 is another Python PDF library.

scribus is a reasonable open-source desktop publishing tool. If your content cannot be laid out automatically it is a good choice, e.g. for posters. It includes a Python API, albeit a reputedly quirky one, which is AFAICT Python 2.

For all that, it's the cleanest way I have yet seen of generating PDFs, so might be a goer for you.

Crop marks

There are a few options.

None makes it clear which of TrimBox, BleedBox, CropBox or ArtBox is what you truly want. This might clarify it slightly but I vagued out.

You can add crop marks to a PDF document with different PDF tools, e.g. pdftk:

  1. Export the first page with crop marks to a PDF file (your_cropmark.pdf)

  2. Join it with your PDF document (your_document.pdf) in the command line:

pdftk your_document.pdf multistamp your_cropmark.pdf output result.pdf

NOTE: you can also set PDF cropping values with GhostScript for printing:

  1. Create a plain text file (pdfmark.txt) with the right cropping values (e.g. this is a 5mm crop of A4):

[/CropBox [14.17 14.17 581.1 827.72] /PAGES pdfmark

  2. Convert your_document.pdf using the previous file (pdfmark.txt):

gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite \
    -sOutputFile=result.pdf \
    pdfmark.txt your_document.pdf

Alternatively, pass the pdfmark directly on the command line:

gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite \
    -sOutputFile=result.pdf \
    -c "[/CropBox [14.17 14.17 581.1 827.72] /PAGES pdfmark" \
    -f your_document.pdf
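
The CropBox numbers are in PostScript points (72 per inch), measured from the bottom-left corner of the page. For other page sizes or margins, a small Python sketch along these lines does the arithmetic; `cropbox_pdfmark` is a made-up helper name, and it assumes you want the same margin trimmed from all four edges.

```python
MM_TO_PT = 72 / 25.4  # 72 points per inch, 25.4 mm per inch


def cropbox_pdfmark(width_mm, height_mm, margin_mm):
    """Return a pdfmark line cropping margin_mm off every edge of every page."""
    margin = margin_mm * MM_TO_PT
    box = [margin, margin,                      # lower-left corner
           width_mm * MM_TO_PT - margin,        # upper-right x
           height_mm * MM_TO_PT - margin]       # upper-right y
    coords = " ".join(str(round(v, 2)) for v in box)
    return f"[/CropBox [{coords}] /PAGES pdfmark"


# A4 is 210 x 297 mm; a 5 mm crop reproduces the values used above.
print(cropbox_pdfmark(210, 297, 5))
# → [/CropBox [14.17 14.17 581.1 827.72] /PAGES pdfmark
```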

Color conversion

Nightmares. Colour management is generally complicated. Ghostscript colour management specifically is complicated, and has many moving parts plus rapid changes; e.g. the -dUseCIEColor option was removed in ghostscript 9, because it is apparently a noob feature which has broken functionality. Its replacement is broken documentation.


NOTE II: optional color conversion of RGB PDF with GhostScript:

PDF to TIFF example.

gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite \
    -sColorConversionStrategy=CMYK \
    -sColorConversionStrategyForImages=CMYK \
    -dProcessColorModel=/DeviceCMYK \
    -dCompatibilityLevel=1.5 \
    -sOutputFile=result_cmyk.pdf \
    your_document.pdf


gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite \
    -sColorConversionStrategy=Gray \
    -dProcessColorModel=/DeviceGray \
    -dCompatibilityLevel=1.5 \
    -sOutputFile=result_gray.pdf \
    your_document.pdf