
PDF Repair: A Complete Guide to Fixing Corrupted PDF Files

A corrupted PDF is a stressful find, especially if the file is a contract, tax document, or the only copy of something you need today. The good news: most PDF corruption is recoverable, often in a few minutes, using free tools you probably already have. The harder truth: some PDFs are genuinely beyond repair, and knowing when to stop trying matters too.

This guide covers how PDF files are built, what goes wrong, how to fix the common cases in order of effort, which tools handle which problems, and how to recognise the rare PDF that can’t be saved.

Common problems

Most PDF problems fall into a handful of categories. If yours matches one of these, skip straight to the specific guide — each links to the fastest path to a working file.

The PDF won’t open at all. Acrobat or another reader shows an error like “There was an error opening this document” or “The file is damaged and could not be repaired.” This is the most common scenario and has the highest recovery rate. Start with the guide to PDFs that won’t open — the first two strategies resolve most cases.

The PDF opens blank or black. The file structure is intact enough for the reader to display something, but the content isn’t rendering. This often indicates a problem with embedded fonts, images, or the content stream. See the guide to blank or black PDFs for the diagnosis and fix path.

The PDF shows a specific error message. Errors like "file is damaged and cannot be repaired", "cross-reference table not found", or "invalid or corrupted PDF file" each point to different underlying problems. The error-specific guides give a targeted fix for each one rather than a generic repair sequence.

The PDF is password-protected and won’t open. There are two very different scenarios here. If you have the password and the file still rejects it, the encryption dictionary is likely damaged. If you’ve lost the password, recovery is a separate problem with separate tools. See password-protected PDF problems for both cases.

PDF fonts or images are missing or wrong. The page structure is intact but specific resources are broken. This usually indicates partial stream corruption rather than a structural problem. See PDF fonts missing or incorrect and PDF images missing.

A PDF produced from conversion is corrupted. If the broken PDF was created by exporting from Word, scanning, or running through an OCR tool, the problem often lies in the conversion rather than the file itself. See conversion-related PDF corruption.

A PDF downloaded from email or the web won’t open. Transfer problems, whether an incomplete download or bytes altered along the way, are the single most common cause of PDF problems. Before trying any repair tool, re-download the file; in many cases a fresh copy opens without issue. See PDFs corrupted on download for the full diagnostic sequence.

Understanding PDF files

To repair a PDF effectively, it helps to understand how one is structured. A PDF is not a single monolithic document but a collection of numbered objects — pages, fonts, images, form fields, metadata — stitched together by a map called the cross-reference table, usually abbreviated as the xref.
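To make the object-and-xref layout concrete, here is a minimal Python sketch that writes a tiny one-page PDF by hand. It is an illustration under simplifying assumptions, not a production writer: one revision, a classic uncompressed xref table, and a blank page. The function name build_minimal_pdf is hypothetical. The point to notice is that each xref entry records the byte offset where its object starts, so the writer has to measure its own output as it goes.

```python
def build_minimal_pdf() -> bytes:
    """Write a one-page, blank PDF with a classic xref table (illustrative only)."""
    # Object bodies, numbered 1..3 in list order.
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",  # 1: root object (document catalog)
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",  # 2: page tree
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 200 200] /Resources << >> >>",  # 3: blank page
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset where "N 0 obj" begins
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    xref_offset = len(out)
    size = len(objects) + 1  # entry 0 is the mandatory head of the free list
    out += b"xref\n0 %d\n" % size
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        # Each entry is exactly 20 bytes: 10-digit offset, 5-digit generation, type, 2-byte EOL.
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % size
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset  # "%%%%" formats to a literal "%%"
    return bytes(out)
```

Writing the result with open("minimal.pdf", "wb").write(build_minimal_pdf()) should give a file that ordinary readers open as a single blank page. Shift any object by one byte without updating its xref entry and you have reproduced the most common kind of corruption described below.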

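The xref table sits near the end of the file and lists the byte offset where each object begins. A reader starts at the very end: the startxref keyword just before the closing %%EOF marker gives the offset of the xref, and the trailer dictionary that follows the xref names the root object (the document catalog). From the root, the reader follows pointers to pages, fonts, and images to build the document. (Files written as PDF 1.5 or later may store the same table as a compressed cross-reference stream instead.) This design is what makes PDFs fast to open and flexible to edit, but it also creates a specific failure mode: if the recorded offsets are wrong, even by a single byte, the reader looks for objects in the wrong places, and the file appears entirely broken even though most of the content is still intact.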
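The reading path above can be sketched in Python: find the last startxref, jump to the xref table it points at, and check that every recorded offset really lands on an "N G obj" marker. This is a simplified sketch, assuming a classic text xref table and reading only the newest revision; files that use cross-reference streams, or that chain revisions through /Prev, need more work. The names read_xref and misplaced_objects are hypothetical.

```python
import re


def read_xref(pdf: bytes) -> dict:
    """Follow startxref to a classic xref table and return {object number: offset}."""
    pos = pdf.rfind(b"startxref")  # the last one belongs to the newest revision
    if pos == -1:
        raise ValueError("no startxref keyword; the file may be truncated")
    match = re.match(rb"startxref\s+(\d+)", pdf[pos:])
    if match is None:
        raise ValueError("startxref is not followed by an offset")
    xref_pos = int(match.group(1))
    if not pdf.startswith(b"xref", xref_pos):
        raise ValueError("startxref does not point at an xref table")
    end = pdf.find(b"trailer", xref_pos)
    if end == -1:
        raise ValueError("xref table has no trailer")

    tokens = pdf[xref_pos:end].split()[1:]  # drop the "xref" keyword itself
    offsets = {}
    i = 0
    while i + 1 < len(tokens):
        first, count = int(tokens[i]), int(tokens[i + 1])  # subsection header
        i += 2
        for number in range(first, first + count):
            offset, _generation, kind = tokens[i:i + 3]
            i += 3
            if kind == b"n":  # "n" = object in use, "f" = free entry
                offsets[number] = int(offset)
    return offsets


def misplaced_objects(pdf: bytes, offsets: dict) -> list:
    """Return object numbers whose recorded offset does not land on "N G obj"."""
    bad = []
    for number, offset in sorted(offsets.items()):
        if re.match(rb"%d\s+\d+\s+obj" % number, pdf[offset:offset + 32]) is None:
            bad.append(number)
    return bad
```

An empty list from misplaced_objects means the map and the file agree; any entries name the objects a strict reader would fail to find.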

This matters because most PDF corruption is xref corruption, not content corruption. The pages, fonts, and images are usually still there. They’re just unreachable because the map is wrong. Recovery tools work by scanning the raw bytes for object markers (a pattern like 5 0 obj), recalculating where each object actually starts, and rebuilding the xref from scratch. Once the map is correct, the file opens.
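A stripped-down version of that scan fits in a few lines of Python. It is a sketch of the technique, not what any particular tool does internally: it will also pick up look-alike "N G obj" text inside uncompressed stream data, and it cannot see objects packed inside compressed object streams (PDF 1.5 and later). Real tools such as qpdf handle those cases. The name rebuild_xref is hypothetical.

```python
import re

# "N G obj"; the lookbehind ensures a match never starts in the middle of a longer number.
OBJ_MARKER = re.compile(rb"(?<![0-9])(\d+)\s+(\d+)\s+obj\b")


def rebuild_xref(pdf: bytes) -> dict:
    """Recover {object number: offset} by scanning raw bytes, ignoring the old xref."""
    offsets = {}
    for match in OBJ_MARKER.finditer(pdf):
        # A later definition of the same number wins, as it would after an incremental update.
        offsets[int(match.group(1))] = match.start()
    return offsets
```

Because the scan ignores the old table entirely, it produces correct offsets even when every byte after the header has shifted.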

A few other structural details are worth knowing. PDFs support incremental updates: when you edit a PDF, the editor often doesn’t rewrite the whole file. It appends the changes to the end and adds a new xref table that shadows the old one. This is efficient but fragile — if any of the appended xref tables are malformed, or if the wrong one is read, the file may appear broken. This is why saving through a different application sometimes “fixes” a PDF: the new save produces a single clean xref instead of a chain of shadows.
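A rough way to see whether a file carries a chain of incremental updates is to count its end-of-revision markers, since each saved revision ends with startxref, an offset, and %%EOF. Treat this Python sketch as a heuristic under stated assumptions: linearized files can contain an extra marker near the start, and a damaged file may hold partial ones. The function name is hypothetical.

```python
import re

# Every saved revision ends with "startxref <offset> %%EOF".
REVISION_END = re.compile(rb"startxref\s+(\d+)\s+%%EOF")


def revision_xref_offsets(pdf: bytes) -> list:
    """Return the xref offset recorded by each revision, in file order (oldest first)."""
    return [int(match.group(1)) for match in REVISION_END.finditer(pdf)]
```

Several offsets suggest the file has been edited in place several times, which is exactly the situation where re-saving through another application collapses the chain into a single clean xref.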

PDFs also come in a linearized (or “web-optimized”) variant, where the xref data is placed at the beginning of the file as well as the end, so the first page can render before the rest has downloaded. Linearization is purely a performance optimization; a linearized PDF that loses its linearization hints can still be opened, just more slowly.
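You can tell whether a file is linearized by looking for the /Linearized key in its first object, which the specification places within the first 1024 bytes. That dictionary's /L entry records the file length at the time of linearization, so a mismatch means bytes were added or removed afterwards (for example by an incremental update) and the hints are stale. The Python sketch below is a heuristic, assuming the first /L entry in the window belongs to that dictionary; the function name is hypothetical.

```python
import re


def linearization_status(pdf: bytes) -> str:
    """Classify a file as not linearized, linearized, or linearized but since modified."""
    head = pdf[:1024]  # the linearization dictionary should sit within the first 1024 bytes
    if b"/Linearized" not in head:
        return "not linearized"
    match = re.search(rb"/L\s+(\d+)", head)  # /L = file length when it was linearized
    if match is None:
        return "linearized (no /L entry found)"
    if int(match.group(1)) != len(pdf):
        # The file changed after linearization; readers typically ignore the stale hints.
        return "linearization stale"
    return "linearized"
```

A stale result is harmless on its own; it only means the file will open as an ordinary, non-optimized PDF.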

The practical upshot: a PDF that looks completely broken usually has intact content hidden behind a damaged index. Rebuilding that index is what “repair” actually means for most PDFs.

Why PDFs become corrupted

The actual causes of PDF corruption, ranked by how often they’re the culprit:

Interrupted or partial download. The file is incomplete — the last few kilobytes never arrived, and the trailing xref table is either missing or truncated. Symptoms include file sizes that don’t match the expected size, and errors that specifically reference the end of the file. This cause alone accounts for a large fraction of reported PDF problems.
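The symptoms of an incomplete download can be checked mechanically: compare the size on disk with the size you expected, and look for the startxref keyword and %%EOF marker near the end of the file. In this Python sketch, expected_size is an assumption (you might take it from the server's Content-Length header or from the sender), and the 1024-byte tail window mirrors how lenient readers commonly look for the end marker. The function name is hypothetical.

```python
import os
from typing import Optional


def truncation_signs(path: str, expected_size: Optional[int] = None) -> list:
    """List symptoms of an incomplete download; an empty list means none were found."""
    problems = []
    size = os.path.getsize(path)
    if expected_size is not None and size != expected_size:
        problems.append(f"size is {size} bytes, expected {expected_size}")
    with open(path, "rb") as f:
        f.seek(max(0, size - 1024))  # only the tail matters for the end markers
        tail = f.read()
    if b"%%EOF" not in tail:
        problems.append("no %%EOF marker in the last 1024 bytes")
    if b"startxref" not in tail:
        problems.append("no startxref keyword in the last 1024 bytes")
    return problems
```

If either end marker is missing, re-downloading is almost always faster than repairing.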

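Transfer corruption. Some intermediate step, such as a flaky network, a badly behaved email gateway, or a file system or upload step that encoded the file wrongly (a text-mode transfer that rewrites line endings is a classic example), altered bytes in the file. Even a single changed byte can break an object or the xref itself, and any inserted or removed bytes shift every later object away from the offset the xref records.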
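The line-ending case shows why inserted bytes are so damaging. This Python sketch simulates a hypothetical transfer step that rewrites every LF as CRLF and reports how far each object moved; the helper names are illustrative only.

```python
import re

OBJ_MARKER = re.compile(rb"(\d+)\s+(\d+)\s+obj")


def object_offsets(data: bytes) -> dict:
    """Map object number to the byte offset of its "N G obj" marker."""
    return {int(m.group(1)): m.start() for m in OBJ_MARKER.finditer(data)}


def text_mode_transfer(data: bytes) -> bytes:
    """Simulate a hypothetical transfer step that rewrites LF line endings as CRLF."""
    return data.replace(b"\n", b"\r\n")


def shifted_objects(before: bytes, after: bytes) -> dict:
    """Return {object number: shift in bytes} for objects whose offset changed."""
    old, new = object_offsets(before), object_offsets(after)
    return {n: new[n] - old[n] for n in old if n in new and new[n] != old[n]}
```

Each object moves by the number of line endings that precede it, so an xref written before the transfer is wrong for every object after the first line, and the error grows through the file.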

Improper file generation. The tool that produced the PDF wrote a technically invalid file. Scanning software, OCR tools, and PDF libraries written without careful adherence to the PDF specification are common sources. These files often open in Adobe Acrobat (which is lenient and auto-repairs) but fail in stricter readers.

Truncation from a full disk. The file was partially written before the device ran out of space. The symptoms match an interrupted download, but the broken file sits on your own disk.

Storage media errors. The file was written correctly, but the storage medium (a failing drive, corrupt flash memory, a bad USB stick) returned the wrong bytes when the file was read or copied. Often irreversible without a backup.

Editing with incompatible tools. A file edited by two different PDF editors in sequence sometimes ends up with conflicting xref chains that no single reader can resolve correctly.

Version or compatibility issues. The file uses a feature from a newer PDF version than the reader supports, or it triggers a bug in a particular Acrobat version. Updating the reader resolves these without touching the file itself.

The repair tool landscape

No single tool is the right answer for every PDF problem. The tools below are the ones you’ll actually reach for, with honest notes on each.

Adobe Acrobat. Acrobat’s repair behaviour is built in and automatic: when you open a damaged PDF, Acrobat often attempts recovery silently. If the recovery succeeds, the file opens; if it fails, you get the familiar “could not be repaired” error. Acrobat is among the most tolerant readers in common use, so if a file won’t open in Acrobat, other GUI readers are unlikely to do much better, although browsers and Preview occasionally succeed where Acrobat fails. Acrobat’s own installation can also become corrupted and trigger false-positive errors on otherwise fine files; on Windows, the Help menu’s Repair Installation option addresses that.

qpdf. A free, open-source command-line tool that operates directly on PDF structure. It’s the single best general-purpose repair tool for structurally damaged PDFs: when the xref is damaged, qpdf attempts to reconstruct it by scanning the file for objects, and every rewrite produces a fresh, consistent xref. Running qpdf --linearize input.pdf output.pdf rebuilds the xref and also produces a cleanly structured, web-optimized output file; qpdf input.pdf output.pdf performs the same rewrite without linearizing. qpdf --check input.pdf diagnoses structural problems and reports what it finds, which is useful for understanding what’s wrong before attempting a fix. qpdf is available on Windows, macOS, and Linux, and its exit codes distinguish between clean files (0), files with warnings (3), and files with errors that could not be fully processed (2). See the complete guide to qpdf for full recipes.
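If you call qpdf from a script, the exit status is the part worth interpreting. This Python sketch wraps the two commands above and maps qpdf’s exit codes (0 clean, 2 errors, 3 warnings) to words. The wrapper names are hypothetical, and it assumes qpdf is installed and on PATH.

```python
import shutil
import subprocess

# qpdf's exit statuses: 0 = no problems, 2 = errors, 3 = warnings only.
QPDF_STATUS = {0: "clean", 2: "errors", 3: "warnings"}


def describe_qpdf_exit(code: int) -> str:
    return QPDF_STATUS.get(code, f"unexpected exit code {code}")


def run_qpdf(args: list) -> str:
    """Run qpdf with the given arguments and summarise its exit status."""
    if shutil.which("qpdf") is None:
        raise FileNotFoundError("qpdf is not installed or not on PATH")
    result = subprocess.run(["qpdf", *args], capture_output=True, text=True)
    for stream in (result.stdout, result.stderr):  # --check reports on stdout; warnings go to stderr
        if stream.strip():
            print(stream.strip())
    return describe_qpdf_exit(result.returncode)
```

Typical use is run_qpdf(["--check", "input.pdf"]) followed by run_qpdf(["--linearize", "input.pdf", "output.pdf"]). A “warnings” result from the rewrite usually still means an output file was written, but open it and check before trusting it.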

pikepdf. A Python library that wraps qpdf’s core engine with a more programmatic interface. For scripted or batch repair, pikepdf is often more convenient than calling qpdf directly: opening and re-saving a file with pikepdf has the same xref-rebuilding effect as rewriting it with qpdf (passing linearize=True to save() matches qpdf --linearize), with cleaner integration into workflows that process many files. See the complete guide to pikepdf.
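A batch pass over a folder might look like the Python sketch below. It uses only pikepdf.open, Pdf.save, and pikepdf.PdfError. The folder layout and the “.repaired.pdf” naming scheme are hypothetical choices, and the import is guarded so the helpers still load where pikepdf isn’t installed.

```python
from __future__ import annotations

from pathlib import Path

try:
    import pikepdf  # pip install pikepdf
except ImportError:  # keep the helpers importable without pikepdf
    pikepdf = None


def repaired_path(src: Path, out_dir: Path) -> Path:
    """Hypothetical naming scheme: report.pdf -> out_dir/report.repaired.pdf."""
    return out_dir / f"{src.stem}.repaired.pdf"


def repair_folder(in_dir: Path, out_dir: Path) -> dict[str, str]:
    """Open and re-save every PDF in in_dir; the re-save writes a fresh xref."""
    if pikepdf is None:
        raise RuntimeError("pikepdf is not installed")
    out_dir.mkdir(parents=True, exist_ok=True)
    results = {}
    for src in sorted(in_dir.glob("*.pdf")):
        try:
            with pikepdf.open(src) as pdf:
                pdf.save(repaired_path(src, out_dir))  # never overwrite the original
            results[src.name] = "saved"
        except pikepdf.PdfError as exc:  # also covers password errors
            results[src.name] = f"failed: {exc}"
    return results
```

Writing to a separate folder keeps the originals untouched, so a file that fails here can still be handed to Ghostscript or a commercial tool later.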

Ghostscript. A PostScript and PDF interpreter that can re-render a PDF entirely: its pdfwrite device interprets every page and writes a new PDF from scratch rather than patching the original structure. This succeeds on some files that qpdf and pikepdf can’t salvage, because Ghostscript applies its own error recovery while interpreting the pages and then discards the damaged structure instead of carrying it forward. The tradeoff is significant: Ghostscript re-rendering commonly loses form fields, annotations, digital signatures, bookmarks, and tagged accessibility structure. Use it only when the alternatives have failed and the visual content is what matters. See the complete guide to Ghostscript for PDF recovery for the caveats in full.
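A pdfwrite re-render boils down to a single command. This Python sketch builds it with the -o option, which sets the output file and implies the batch, no-pause flags. It assumes Ghostscript is installed; the executable is usually gs, or gswin64c on 64-bit Windows. The helper names are hypothetical.

```python
import platform
import subprocess


def ghostscript_binary() -> str:
    # The console executable is usually "gswin64c" on 64-bit Windows and "gs" elsewhere.
    return "gswin64c" if platform.system() == "Windows" else "gs"


def gs_rewrite_command(src: str, dst: str) -> list:
    """Build a command that re-renders src into a brand-new PDF via the pdfwrite device."""
    # -o sets the output file and implies -dBATCH -dNOPAUSE (no interactive prompts).
    return [ghostscript_binary(), "-o", dst, "-sDEVICE=pdfwrite", src]


def gs_rewrite(src: str, dst: str) -> None:
    subprocess.run(gs_rewrite_command(src, dst), check=True)  # raises if gs exits non-zero
```

Always write to a new file name and keep the damaged original. The output is lossy in the ways described above, and the original may still be useful to another tool.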

Browser-based PDF viewers (Chrome, Firefox, Edge). Modern browsers have their own PDF rendering engines that sometimes open files Acrobat refuses. This isn’t a “repair” in any technical sense — the file is unchanged — but if you just need to read the contents once, a browser is worth trying as the first step.

Apple Preview (macOS). Preview is tolerant of certain structural oddities that Acrobat rejects. Opening a damaged PDF in Preview and exporting to a new PDF sometimes produces a clean file. Not a reliable fix for serious corruption, but worth trying on Macs before resorting to command-line tools.

Commercial PDF repair tools (Stellar Repair for PDF, Recoverit, Wondershare Repairit, Kernel for PDF). Paid tools aimed at non-technical users. They automate what qpdf and pikepdf do for free, with a GUI and slightly more robust handling of severely damaged files. Worth considering if you have one urgent file and don’t want to learn command-line tools. Not worth considering for repeated use — the one-off licence fees add up fast, and the underlying techniques are the same as the free tools.

When a PDF can’t be repaired

Some PDFs are genuinely beyond recovery. Recognising these early saves hours of effort.

The file is severely truncated. If a large part of a PDF is missing, for example a download that stopped at 3 MB of a 10 MB file, the content itself is gone, not just the index. Rebuilding the xref may salvage objects that arrived before the cut, but no tool can reconstruct bytes that were never received. The only recourse for the rest is to obtain a fresh copy.

The file has been altered or encrypted by malware. Ransomware produces files that look like corrupted PDFs but are actually encrypted data that no longer has readable PDF structure. Repair tools will not help; recovery requires either the decryption key or a backup.

The xref is intact but the content streams are damaged. This is rarer than the inverse but does happen — especially with storage media errors. Partial recovery may be possible (some pages render, others don’t) but a full reconstruction is not.

The file was never a valid PDF. Occasionally a file is saved with a .pdf extension but is actually an HTML error page, an empty file, or a file in a completely different format. Checking the first few bytes (a valid PDF normally starts with %PDF-) takes seconds and rules this out.
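That first-bytes check can be scripted. This Python sketch classifies a file by its opening bytes. The categories are illustrative rather than exhaustive, and the “leading junk” case reflects the common leniency of readers that accept a %PDF- header appearing shortly after the start. The function names are hypothetical.

```python
def identify(head: bytes) -> str:
    """Guess what a supposed PDF really is from its first bytes."""
    if not head:
        return "empty file"
    if head.startswith(b"%PDF-"):
        return "pdf"
    if b"%PDF-" in head[:1024]:
        return "pdf with leading junk"  # lenient readers often tolerate this
    lowered = head[:1024].lstrip().lower()
    if lowered.startswith((b"<!doctype html", b"<html")):
        return "html page"
    if head.startswith(b"PK\x03\x04"):
        return "zip archive"
    return "unknown"


def identify_file(path: str) -> str:
    with open(path, "rb") as f:
        return identify(f.read(1024))
```

An “html page” result almost always means the download link returned an error or login page instead of the document, so the fix is to fetch the file again rather than repair it.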

For unrecoverable files, the practical options are: request the file again from the original source, restore from backup, or — if the content is visible in a degraded form somewhere, such as a browser cache or email preview — recreate the important parts manually. No amount of tooling can recover content that no longer exists in the file.

File repair problems often span formats. If you’re dealing with PDF corruption that originated from Word or Excel (for example, a PDF exported from a damaged Word document), start with the Word repair guide or the Excel repair guide to fix the source, then regenerate the PDF. For PDFs inside damaged ZIP archives or email attachments that won’t extract, the archive repair guide covers the extraction side before PDF-specific repair becomes relevant.