
Complete Guide to qpdf for PDF Repair and Transformation

qpdf is where most PDF problems end. It is free, cross-platform, actively maintained, and operates directly on PDF structure rather than rendering and re-rendering the document. For repair, merging, splitting, rotation, encryption, and decryption, it is the first tool to try and usually the only one needed. For the problems it doesn’t solve (visual rendering damage, content stream corruption, severe truncation), no other structural tool will help either; you escalate to Ghostscript for re-rendering or accept that the file is gone.

This guide covers installation, the recipes that handle the most common tasks, how to read qpdf’s diagnostic output, and what qpdf does not do.

When to use qpdf

qpdf is the right tool for:

Repairing damaged PDF structure. Rebuilds the cross-reference table by scanning for object markers, regenerates the trailer, and writes a clean file. Works on the majority of “file won’t open” problems, including the common failure mode where the xref is malformed but the object content is intact.

Merging, splitting, extracting, and reordering pages. Fast and lossless. Does not re-render. Preserves form fields, annotations, and metadata.

Rotating specific pages. Can apply rotation as a metadata flag or flatten it into the content stream.

Encrypting and decrypting PDFs. Supports 40-bit, 128-bit, and 256-bit encryption. Can remove a password from a file when you know the password.

Inspecting PDFs. qpdf --check validates structure and reports warnings. qpdf --json exposes the internal object graph for scripting or custom analysis.

qpdf is not the right tool for:

Changing visual content. qpdf does not edit text, rearrange layout, or modify the rendering of a page. For that, use Acrobat, PDF-XChange Editor, or similar.

Recovering from severe corruption. If the content streams themselves are damaged — not just the index — qpdf will report warnings and produce a file that is technically valid but visually broken. Ghostscript’s re-rendering approach sometimes salvages these cases; qpdf structurally cannot.

OCR or text extraction. qpdf knows nothing about page content semantics. Pair it with pdftotext (from Poppler) for extraction or OCRmyPDF for OCR.

Removing unknown passwords. qpdf removes encryption when you supply the password. It does not crack passwords.

Installation

macOS

The simplest route is Homebrew:

brew install qpdf

This installs the command-line tool, the fix-qdf companion tool, the man pages, and the libqpdf shared library. MacPorts users can run sudo port install qpdf instead; the end result is the same.

After installation, verify with:

qpdf --version

Linux

qpdf is packaged by every major distribution:

# Debian, Ubuntu, Mint
sudo apt install qpdf

# Fedora
sudo dnf install qpdf

# RHEL / CentOS with EPEL
sudo yum install epel-release
sudo yum install qpdf

# Arch
sudo pacman -S qpdf

Distribution packages may lag the upstream release by a version or two. For the current version (12.3.2 at time of writing), either build from source or use a distribution that tracks upstream more closely.

Windows

Download the official Windows installer from the qpdf releases page on GitHub. The installer places qpdf.exe in a folder you can add to your PATH, after which qpdf is available from Command Prompt, PowerShell, or any shell.

Alternatively, if you use Windows Subsystem for Linux, install the Linux package inside WSL and call it from there — often simpler than managing the Windows binary.

Verify with:

qpdf --version

Common recipes

All examples assume input.pdf exists in the current directory. Output files are named to make the purpose of each recipe obvious.

Repair a damaged PDF

The most common repair recipe. Forces qpdf to read, rebuild the structure, and write a clean output:

qpdf --linearize input.pdf output.pdf

--linearize produces a web-optimized file as a side effect. If that’s not wanted, use --object-streams=preserve instead, which rebuilds structure without linearizing; preserve is qpdf’s default, so plain qpdf input.pdf output.pdf behaves the same way. Either way qpdf writes a fresh xref table, which is what actually fixes the file.

If qpdf reports warnings but still produces output, the file is structurally readable but qpdf encountered non-standard constructs. The output is usually fine; inspect it before discarding the original.

Diagnose without modifying

To find out what’s wrong with a file before attempting a fix:

qpdf --check input.pdf

This prints a structural assessment. Exit codes are informative:

  • 0 — no problems detected.
  • 2 — errors were found and qpdf could not fully process the file.
  • 3 — problems were found but qpdf recovered. The resulting file, if you proceed with a transformation, may still be damaged.

Running --check combined with another operation (for example qpdf --check --linearize input.pdf output.pdf) triggers recovery before the transformation runs, which can help with severely damaged files.
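The exit codes above lend themselves to a small wrapper for scripts. The following is a minimal sketch, assuming a POSIX shell; the function name check_pdf and the verdict strings are illustrative and not part of qpdf.

```shell
# check_pdf FILE: run qpdf --check and translate the exit code into a verdict.
# The function name and the verdict strings are hypothetical.
check_pdf() {
    status=0
    qpdf --check "$1" >/dev/null 2>&1 || status=$?
    case "$status" in
        0) echo "clean" ;;
        3) echo "recovered-with-warnings" ;;
        2) echo "errors" ;;
        *) echo "unexpected-exit-$status" ;;
    esac
    # Pass qpdf's own exit code through so callers can branch on it too.
    return "$status"
}
```

Called as check_pdf input.pdf, it prints one word per file, which makes it easy to sort a directory of PDFs into those that need attention and those that don’t.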

Extract specific pages

Extract page 1 only:

qpdf input.pdf --pages . 1 -- page1.pdf

The . refers to the current input file. The -- delimits the end of the --pages specification.

Extract a range, for example pages 1 to 5:

qpdf input.pdf --pages . 1-5 -- pages1-5.pdf

Extract a mixed selection, for example pages 1-3, 5, and 6-10:

qpdf --empty --pages input.pdf 1-3,5,6-10 -- selection.pdf

--empty starts with a blank PDF and inserts the selected pages into it. The r prefix denotes pages counted from the end — r1 is the last page, so 1-3,8-r1 means “pages 1 through 3 and page 8 through the last page.”
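The r prefix is handy for “last N pages” extractions where the page count isn’t known in advance. A minimal sketch, assuming a POSIX shell; last_pages and extract_tail are hypothetical helper names.

```shell
# last_pages N: print a qpdf page range covering the final N pages.
# Helper names are hypothetical; the r-prefix syntax is qpdf's own.
last_pages() {
    printf 'r%s-r1\n' "$1"
}

# extract_tail IN N OUT: copy the last N pages of IN into OUT.
extract_tail() {
    qpdf "$1" --pages . "$(last_pages "$2")" -- "$3"
}
```

For example, extract_tail input.pdf 3 tail.pdf runs qpdf with the range r3-r1, which selects the third-from-last page through the last page.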

Merge multiple PDFs

Concatenate two or more PDFs in order:

qpdf --empty --pages file1.pdf file2.pdf file3.pdf -- combined.pdf

Merge selected pages from multiple files:

qpdf --empty --pages file1.pdf 1,6-8 file2.pdf 3-5 -- combined.pdf

Each file’s page range follows its filename.
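For scripted merges, the same command can be wrapped so the output name comes first and the inputs follow in order. This is a sketch under the assumption of a POSIX shell; merge_pdfs is a hypothetical name.

```shell
# merge_pdfs OUT FILE...: concatenate the given PDFs, in argument order, into OUT.
# Hypothetical helper; it only assembles the --empty --pages command shown above.
merge_pdfs() {
    out=$1
    shift
    if [ "$#" -lt 2 ]; then
        echo "merge_pdfs: need at least two input files" >&2
        return 1
    fi
    qpdf --empty --pages "$@" -- "$out"
}
```

A call such as merge_pdfs combined.pdf chapter-*.pdf relies on the shell expanding the glob in sorted order, so zero-padded filenames keep the chapters in sequence.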

Split a PDF into individual pages

Write each page to its own file:

qpdf --split-pages=1 input.pdf out_%d.pdf

The %d placeholder is replaced with the zero-padded page number. Use --split-pages=5 to produce files of five pages each, and so on; for multi-page groups, %d becomes the page range covered by each file.
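Splitting a long document can produce many files, so it is often tidier to send them to their own directory. A minimal sketch, assuming a POSIX shell; split_pdf and the part- filename prefix are arbitrary choices for illustration.

```shell
# split_pdf IN DIR [N]: write IN as N-page chunks (default 1) into DIR.
# Hypothetical helper; the "part-" prefix is an arbitrary naming choice.
split_pdf() {
    chunk=${3:-1}
    mkdir -p "$2" || return 1
    qpdf --split-pages="$chunk" "$1" "$2/part-%d.pdf"
}
```

split_pdf input.pdf pages writes one file per page into pages/, and split_pdf input.pdf chunks 5 writes five-page files into chunks/.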

Rotate specific pages

Rotate pages 2, 4, and 6 by 90 degrees clockwise, and pages 7 through 8 by 180 degrees:

qpdf --rotate=+90:2,4,6 --rotate=+180:7-8 input.pdf rotated.pdf

The + prefix means “add this rotation to whatever the page has.” An unsigned number sets the rotation absolutely, which is rarely what you want because it ignores any rotation already applied.
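The difference between relative and absolute angles is easiest to see as arithmetic. The sketch below models the resulting page rotation in plain shell arithmetic; it does not call qpdf, and resulting_rotation is a hypothetical name. It also handles a - prefix, which qpdf accepts for subtracting from the existing rotation.

```shell
# resulting_rotation CURRENT SPEC: print the page rotation after applying SPEC.
# SPEC is +N or -N (relative to CURRENT) or a bare N (absolute).
# This is an illustrative model of the arithmetic, not qpdf code.
resulting_rotation() {
    current=$1
    spec=$2
    case "$spec" in
        +*) result=$(( (current + ${spec#+}) % 360 )) ;;
        -*) result=$(( (current - ${spec#-} + 360) % 360 )) ;;
        *)  result=$(( spec % 360 )) ;;
    esac
    echo "$result"
}
```

For a page already rotated 90 degrees, resulting_rotation 90 +90 prints 180, while resulting_rotation 90 90 prints 90 and leaves the page as it was.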

Remove a password

If you know the password and want a new unencrypted copy:

qpdf --password=your-password --decrypt input.pdf output.pdf

This requires the password and does not crack or recover a lost one.
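In batch jobs it helps to decrypt only the files that are actually encrypted, and to keep the password off the command line. A minimal sketch, assuming a POSIX shell and qpdf’s --is-encrypted and --password-file options; decrypt_if_needed is a hypothetical helper name.

```shell
# decrypt_if_needed IN OUT PASSWORD_FILE
# Hypothetical helper: decrypts IN to OUT only when qpdf reports IN as encrypted.
decrypt_if_needed() {
    # --is-encrypted exits 0 when the file is encrypted, non-zero otherwise.
    if qpdf --is-encrypted "$1"; then
        # --password-file reads the password from the first line of the file,
        # which keeps it out of the process list and shell history.
        qpdf --password-file="$3" --decrypt "$1" "$2"
    else
        # Not encrypted: a plain copy is enough.
        cp -- "$1" "$2"
    fi
}
```

The password file should be readable only by the user running the job; that is a deployment choice outside what this sketch enforces.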

Add a password

Encrypt a file with a user password (required to open) and owner password (required to change permissions):

qpdf --encrypt user-password owner-password 256 -- input.pdf encrypted.pdf

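The 256 specifies 256-bit AES encryption, the default modern choice. With 128, qpdf uses RC4 unless you add --use-aes=y after the key length; 40 selects the legacy 40-bit RC4 algorithm. Recent qpdf releases treat RC4 as weak and refuse to write it unless you also pass --allow-weak-crypto. The -- delimits the encryption options from the file arguments.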

Modify a file in place

Most qpdf operations require separate input and output files. To modify in place without an intermediate filename:

qpdf --replace-input --linearize input.pdf

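qpdf writes to a temporary file next to the input and replaces the original only after the write completes, so a failed operation leaves the original untouched. If qpdf finished with warnings, it keeps the original alongside the new file with a .~qpdf-orig suffix.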
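When the file matters, an explicit backup before an in-place rewrite costs little. The following is a sketch, assuming a POSIX shell; rewrite_in_place and the .bak suffix are arbitrary choices for illustration.

```shell
# rewrite_in_place FILE: keep a backup copy, then rewrite FILE in place.
# Hypothetical helper; the .bak suffix is an arbitrary choice for this sketch.
rewrite_in_place() {
    cp -p -- "$1" "$1.bak" || return 1
    status=0
    qpdf --replace-input --linearize "$1" || status=$?
    # 0 = clean, 3 = succeeded with warnings; anything else means failure.
    if [ "$status" -ne 0 ] && [ "$status" -ne 3 ]; then
        echo "qpdf failed (exit $status); backup kept at $1.bak" >&2
    fi
    return "$status"
}
```

The backup is left in place on success as well, so removing it after inspecting the result is up to the caller.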

Reading qpdf’s diagnostic output

When qpdf encounters problems, the messages follow a predictable format. Understanding them helps decide whether to proceed with the output or try another approach.

A typical warning sequence for a damaged file looks like:

WARNING: input.pdf: reported number of objects (7) is not one plus the highest object number (7)
WARNING: input.pdf: file is damaged
WARNING: input.pdf (object 5 0, offset 348368): expected 5 0 obj
WARNING: input.pdf: Attempting to reconstruct cross-reference table
qpdf: operation succeeded with warnings

The key line is Attempting to reconstruct cross-reference table — qpdf detected a broken xref and rebuilt it by scanning the file for object markers. When you see this followed by operation succeeded with warnings and exit code 3, the output file is usually readable and the repair worked.

When qpdf cannot recover:

qpdf: input.pdf: file is damaged and cannot be processed

Exit code 2. The file’s structure is too damaged to rebuild. Escalate to Ghostscript re-rendering or accept the file is unrecoverable.
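The same reading of the output can be automated when triaging many files. A minimal sketch, assuming a POSIX shell and that warnings start with WARNING: as in the sample above; summarize_check and its output format are hypothetical.

```shell
# summarize_check FILE: print qpdf's exit code, the number of warnings,
# and whether the cross-reference table had to be reconstructed.
# Hypothetical helper; the one-line output format is an arbitrary choice.
summarize_check() {
    status=0
    log=$(qpdf --check "$1" 2>&1) || status=$?
    # grep -c exits non-zero on zero matches, so guard it with "|| true".
    warnings=$(printf '%s\n' "$log" | grep -c '^WARNING:' || true)
    if printf '%s\n' "$log" | grep -q 'Attempting to reconstruct cross-reference table'; then
        rebuilt=yes
    else
        rebuilt=no
    fi
    echo "exit=$status warnings=$warnings xref_rebuilt=$rebuilt"
}
```

A line reporting exit=3 with xref_rebuilt=yes corresponds to the recoverable case described above, while exit=2 marks a file to escalate.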

Limitations and known issues

Content stream damage is largely invisible to qpdf. qpdf validates structure, not semantics. A file with a valid xref and intact trailer can still render incorrectly if the content streams (the actual drawing instructions for each page) are corrupted. In that case qpdf --check can report no problems even though the output looks wrong when viewed.

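Rewriting changes byte-level structure. Every qpdf run writes a new file, so the output never matches the original byte for byte, and --linearize reorders objects on top of that. If you’re using the output for forensic purposes, digital signature verification, or any workflow that depends on the exact byte sequence of the original, keep the original untouched and work on a copy. When you only need the least structural change, avoid --linearize and use --object-streams=preserve.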

Encryption removal invalidates signatures. Removing or re-applying encryption breaks digital signatures on the file. Use --remove-restrictions when dealing with signed files you need to modify; qpdf preserves the visual signature appearance but disables its cryptographic validity.

Large files consume memory. qpdf loads structural data into memory. For PDFs over a few hundred megabytes, consider processing on a machine with substantial RAM.

No help with visual-only problems. If a PDF’s structure is intact but the pages display blank, garbled, or with wrong fonts, qpdf will report a clean file. The problem is in the rendering layer, not the structure.

Alternatives

pikepdf is a Python library that uses libqpdf internally, offering a programmatic API for the same capabilities. For scripted or batch workflows, pikepdf is often more convenient than shelling out to qpdf. See the complete guide to pikepdf.

Ghostscript takes a fundamentally different approach: it re-interprets and re-renders the PDF. This sometimes recovers files qpdf cannot, because it doesn’t depend on the structure being parseable. The cost is significant — form fields, annotations, digital signatures, bookmarks, and tagged accessibility structure are commonly lost. Use it as a last resort, not a default. See the complete guide to Ghostscript for PDF recovery.

Adobe Acrobat has automatic repair behavior built in. Opening a damaged PDF in Acrobat often produces a repair prompt and a saved output. For users who already have Acrobat and don’t need command-line tooling, this is often the fastest path.

PDFtk was historically a popular command-line PDF tool. Its original version is no longer maintained, and the available forks have had mixed support. qpdf has effectively replaced it for new workflows.

Commercial GUI tools like Stellar Repair for PDF, Wondershare Repairit, and Recoverit automate the same techniques qpdf uses, with a graphical interface and a gentler learning curve. Useful for a one-off urgent problem; not worth the recurring cost for regular use.

Last verified: April 2026