Word Repair: A Complete Guide to Recovering Damaged Documents
Discovering that a Word document is corrupted is stressful, especially when it is a contract, a dissertation, or the only copy of work you cannot easily redo. The good news: most Word corruption is recoverable, often using features that ship with Word itself or with free alternatives like LibreOffice. The harder truth: some documents are genuinely beyond repair, and knowing when to stop trying matters too.
This guide covers how Word documents are structured, what goes wrong, how to fix the common cases in order of effort, which tools handle which problems, and how to recognize the rare document that cannot be saved.
Common problems
Most Word problems fall into a handful of categories. If yours matches one of these, skip straight to the specific guide — each links to the fastest path to a working file.
The document won’t open at all. Word displays an error and refuses to load the file. This is the most common scenario and has the highest recovery rate. Start with the guide to Word documents that won’t open — Word’s own Open and Repair feature resolves a large fraction of these in under a minute.
The document opens but is blank. The file structure is intact enough for Word to load, but the page area is empty. This usually indicates damage to document.xml inside the DOCX archive, or — for older .doc files — a damaged content stream. See the guide to Word documents that open blank.
The text appears garbled, with strange characters or random symbols. The usual causes are an encoding mismatch, a font substitution failure, or low-level damage to the text run elements in document.xml. See the guide to Word documents with garbled text.
The document shows a specific error message. Errors like "Word experienced an error trying to open the file", "The file is corrupt and cannot be opened", or "Word was unable to read this document. It may be corrupt" each point to different underlying problems. The error-specific guides give a targeted fix for each one rather than a generic recovery sequence.
Tables, images, or other content elements are missing or broken. The document opens but specific structural elements are damaged. Tables can lose rows or display as plain text, images can fail to render, embedded objects can show as placeholders. See tables broken or missing and images missing after recovery.
The document is password-protected and won’t open. Two distinct scenarios. If you have the password and the file rejects it, the encryption metadata is likely damaged. If you have lost the password, recovery is a separate problem with separate tools. See password-protected Word problems.
Track changes, comments, or revision history disappeared after recovery. Common after using aggressive recovery tools or the “Recover Text from Any File” converter — text comes back, but revision metadata does not. See track changes and comments missing after recovery.
A document downloaded from email won’t open. Transfer corruption is one of the most common causes of Word problems. Before any repair attempt, re-download the file — many cases resolve immediately. See Word documents from email that won’t open for the full diagnostic sequence.
A document opens in Compatibility Mode and behaves oddly. Older .doc files opened in newer Word versions trigger a compatibility shim that disables newer features. This is not corruption, but it can look like corruption to users who run into layout drift or missing functionality. See Compatibility Mode issues.
Understanding Word documents
To recover a Word document effectively, it helps to understand how one is built. Word uses two distinct file formats, and they fail in different ways.
The modern format is .docx. This is the default since Word 2007. A DOCX file is not a single binary blob — it is a ZIP archive containing a structured set of XML files. If you rename report.docx to report.zip and open it with any archive tool, you will see [Content_Types].xml at the root, a word/ folder containing document.xml (the main content), styles.xml, settings.xml, a theme/ folder, a media/ folder for embedded images, and a _rels/ folder defining the relationships between parts. This structure is defined by the OOXML specification (ECMA-376 / ISO/IEC 29500).
This matters enormously for recovery. A DOCX file can fail in two distinct ways. The ZIP container itself can be damaged — unreadable archive structure, truncation, bad CRC. Or one of the XML parts inside can be malformed — a missing closing tag, an unescaped character, a corrupted relationship. Container damage requires a ZIP repair approach. XML damage often allows manual recovery — extract the archive, fix the broken file in any text editor, repackage it. Most “the file is corrupt” errors on DOCX are one of these two failure modes, and both are usually fixable.
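The two failure modes described above can be told apart programmatically. The following is a minimal sketch using only the Python standard library; the function name triage_docx and the shape of its return value are illustrative assumptions, not part of any Word tooling.

```python
import zipfile
import zlib
import xml.etree.ElementTree as ET


def triage_docx(path):
    """Classify DOCX damage as container-level or XML-level.

    Returns a list of (part_name, problem) tuples. An empty list means
    neither check found a problem; it does not prove the file is healthy.
    """
    try:
        archive = zipfile.ZipFile(path)
    except zipfile.BadZipFile as exc:
        # The ZIP structure itself is unreadable (for example, truncation).
        return [("<container>", f"unreadable ZIP: {exc}")]

    problems = []
    with archive:
        for name in archive.namelist():
            try:
                data = archive.read(name)  # read() also verifies the CRC-32
            except (zipfile.BadZipFile, zlib.error, EOFError) as exc:
                problems.append((name, f"cannot decompress: {exc}"))
                continue
            if name.endswith((".xml", ".rels")):
                try:
                    ET.fromstring(data)
                except ET.ParseError as exc:
                    line, column = exc.position
                    problems.append(
                        (name, f"malformed XML at line {line}, column {column}")
                    )
    return problems
```

A "<container>" entry points toward ZIP repair; an entry naming a specific part, such as word/document.xml, points toward the extract, fix, and repackage route.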
The legacy format is .doc. Used by Word 97 through Word 2003 and still produced by some applications for compatibility. A DOC file is a binary OLE Compound File (also called Compound File Binary Format, or CFBF) — the same underlying container Microsoft used for legacy Excel and PowerPoint files. Inside a DOC file, content is stored in named streams within a structured binary container, with no human-readable XML.
DOC files fail differently from DOCX. The binary structure is opaque to most tools, manual repair is impractical, and recovery generally depends on a tool that understands the format — Word itself, LibreOffice, or a commercial repair utility. The good news is that the DOC format has been around long enough that mature tools handle most corruption patterns well. The bad news is that severely damaged DOC files are harder to triage than DOCX files because you cannot easily inspect the contents to see what is wrong.
The Normal template (normal.dotm) is a separate corruption surface specific to Word. Word loads normal.dotm from %APPDATA%\Microsoft\Templates\ on Windows (the Mac equivalent lives in the templates folder under ~/Library/Group Containers/) every time it starts. If this template is corrupted, every document Word opens may behave incorrectly — slow loading, crashes, error messages on documents that are themselves fine. When multiple unrelated documents misbehave the same way, suspect the template before suspecting the documents. Renaming or deleting normal.dotm forces Word to recreate it from defaults on next launch, which resolves a surprising fraction of “all my Word documents are broken” reports.
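Renaming the template, rather than deleting it, keeps a copy in case it held macros or custom styles worth salvaging. The sketch below is one way to do that; the function name and the backup naming scheme are hypothetical, and the templates folder must be supplied by the caller because its location varies by platform. Word should be closed first.

```python
import time
from pathlib import Path


def set_aside_normal_template(templates_dir):
    """Rename normal.dotm so Word rebuilds it from defaults on next launch.

    Returns the backup path, or None if no template was found.
    """
    folder = Path(templates_dir)
    if not folder.is_dir():
        return None
    for candidate in folder.iterdir():
        # Match case-insensitively: the file is often spelled Normal.dotm.
        if candidate.is_file() and candidate.name.lower() == "normal.dotm":
            stamp = time.strftime("%Y%m%d-%H%M%S")
            backup = candidate.with_name(f"{candidate.name}.{stamp}.bak")
            candidate.rename(backup)
            return backup
    return None
```

On Windows, the folder can be obtained with os.path.expandvars(r"%APPDATA%\Microsoft\Templates").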
The practical upshot: a DOCX that won’t open is probably either a damaged ZIP container or a malformed XML part, both of which have well-understood fixes. A DOC that won’t open requires the right tool but generally responds to standard recovery options. And a Word installation behaving badly across all files is probably a template problem, not a document problem.
Why Word documents become corrupted
The actual causes of Word corruption, ranked by how often they are the culprit:
Interrupted download or transfer. The file is incomplete — the last few kilobytes never arrived, leaving a truncated ZIP container (DOCX) or a chopped binary stream (DOC). Symptoms include file sizes that do not match what the sender reported, and errors that specifically reference a problem reading the file. This single cause accounts for a large fraction of reported Word problems.
OneDrive or SharePoint sync conflicts. Two devices edited the same document while offline, or sync was interrupted mid-write. The result can be a file with damaged internal references, a ~$filename.docx lock file that will not go away, or a “conflicted copy” marker. OneDrive’s version history is the first place to look — yesterday’s version is often available even when today’s is broken.
Improper generation by third-party tools. Many applications can produce DOCX files — accounting software, content management systems, exporters from Google Docs or Pages, Word add-ins. Some produce technically invalid DOCX that Word reads via auto-repair quirks but stricter readers reject. These files often open fine in Word and fail in LibreOffice, or vice versa.
Truncation from a full disk. The file was partially written when the device ran out of space. The result is a structurally invalid DOCX (the ZIP central directory is missing or points past the end of the file) or a truncated DOC. The symptoms resemble an interrupted download, but here the truncated file is the one that was written to the local disk, so there may be no complete copy to fetch again.
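A quick way to spot a truncated DOCX is to look for the ZIP end-of-central-directory record, which sits at the very end of a complete archive. This is a minimal sketch; the function name is illustrative, and finding the record shows only that the tail of the file arrived, not that the rest is intact.

```python
EOCD_SIGNATURE = b"PK\x05\x06"  # end-of-central-directory record marker
MAX_EOCD_SEARCH = 22 + 65535    # fixed record size plus maximum comment length


def has_zip_end_record(path):
    """Return True if the tail of the file contains a ZIP end record."""
    with open(path, "rb") as f:
        f.seek(0, 2)                      # jump to the end to learn the size
        size = f.tell()
        f.seek(max(0, size - MAX_EOCD_SEARCH))
        tail = f.read()
    return EOCD_SIGNATURE in tail
```

A False result on a .docx file is strong evidence of truncation, whether from an interrupted transfer or a full disk.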
Storage media errors. The file is intact in its source location but the storage medium (a failing drive, corrupt flash memory, a bad USB stick) returned the wrong bytes when it was read or copied. Often irreversible without a backup.
Track changes and revision history accumulation. Documents carrying hundreds or thousands of unresolved tracked changes can develop parsing problems. The fix is usually to accept all changes (Review > Accept > Accept All Changes) and save, but if the document will not open at all, the revisions need to be stripped from the underlying XML manually.
Embedded object damage. Excel sheets, OLE objects, equations, or charts embedded in a Word document can become damaged independently. The host document is fine but Word fails when it tries to render the broken embed. Often resolvable by extracting the DOCX, deleting the offending embed from the word/embeddings/ folder, and repackaging.
Template corruption. As covered above — normal.dotm damage produces symptoms that mimic file corruption across multiple unrelated documents.
Antivirus or security software interference. Some security tools intercept Word’s file writes mid-save, leaving a half-written file. Rarer than it used to be but still happens with aggressive endpoint protection products.
The repair tool landscape
No single tool is the right answer for every Word problem. The tools below are the ones that actually resolve the cases you will meet, with honest notes on each.
Word’s “Open and Repair” feature. Built into every modern version of Word. From the Open dialog, select the file but do not double-click — click the dropdown arrow next to the Open button and choose “Open and Repair.” Word reads the file with extra tolerance for corruption and attempts to reconstruct the document. This is the first thing to try, and it succeeds on a large fraction of cases — especially DOCX files with minor structural damage. Cost: nothing, no setup, no risk to the original file. See the complete guide to Microsoft’s Open and Repair feature.
Word’s AutoRecover files. Word periodically saves recovery copies of open documents. After a crash or unexpected close, File > Info > Manage Document > Recover Unsaved Documents shows what is available. This recovers work that was never saved, not damaged saved files — but for the common case of “Word crashed and now my document is gone,” it is often the answer. AutoRecover files live in %APPDATA%\Microsoft\Word\ on Windows and ~/Library/Containers/com.microsoft.Word/Data/Library/Preferences/AutoRecovery/ on Mac.
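When the Recover Unsaved Documents dialog shows nothing, it can help to look in the AutoRecover folder directly. The sketch below lists candidate files newest first. It assumes the .asd extension that Word for Windows uses for AutoRecover files and the .wbk extension for backup copies; file naming on Mac differs, so suffixes=None lists every file instead. The function name is hypothetical.

```python
from pathlib import Path


def list_autorecover_files(folder, suffixes=(".asd", ".wbk")):
    """Return recovery candidates in folder, newest first.

    Pass suffixes=None to list every file regardless of extension.
    """
    folder = Path(folder)
    if not folder.is_dir():
        return []
    candidates = [
        p for p in folder.iterdir()
        if p.is_file() and (suffixes is None or p.suffix.lower() in suffixes)
    ]
    # Sort by modification time so the most recent snapshot comes first.
    return sorted(candidates, key=lambda p: p.stat().st_mtime, reverse=True)
```

Copy any candidate to another folder before opening it, since Word may clean up the AutoRecover folder on its own.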
LibreOffice Writer. LibreOffice is often more tolerant of damaged Word files than Word itself. Files that produce errors in Word frequently open without complaint in LibreOffice — opening the file there and saving as a new DOCX often produces a clean version. LibreOffice is free, runs on Windows, Mac, and Linux, and can be run in headless mode for batch processing:
soffice --headless --convert-to docx --outdir recovered broken.docx
For users without a Word license who need to recover a single document, LibreOffice is the simplest option. See the complete guide to LibreOffice for Word and Excel repair.
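For more than a handful of files, the headless conversion can be driven from a script. The following is a sketch, assuming soffice is on the PATH and that writing results to a separate output folder (via the --outdir option) is wanted so the originals are never overwritten. The function names are illustrative.

```python
import subprocess
from pathlib import Path


def build_convert_command(source, outdir, soffice="soffice"):
    """Build the argument list for one headless conversion to DOCX."""
    return [
        soffice, "--headless",
        "--convert-to", "docx",
        "--outdir", str(outdir),
        str(source),
    ]


def convert_all(folder, outdir, soffice="soffice"):
    """Convert every .doc/.docx in folder; return {file name: succeeded}."""
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    results = {}
    for source in sorted(Path(folder).glob("*.doc*")):
        if source.name.startswith("~$"):
            continue  # skip Word lock files
        subprocess.run(
            build_convert_command(source, outdir, soffice),
            capture_output=True, text=True,
        )
        # Judge success by whether the output file appeared, not by exit code.
        results[source.name] = (outdir / (source.stem + ".docx")).exists()
    return results
```

Files reported as False are the ones to escalate to the other approaches in this section.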
The “Recover Text from Any File” converter. A built-in Word feature most users do not know exists. In the Open dialog, change the file type dropdown to “Recover Text from Any File (*.*)” and open the damaged file. Word strips out everything except plain text and presents what it can read. Formatting, images, tables, and structure are lost — this is a last-resort option when nothing else works and the priority is getting the words back. The text often comes through with stray binary characters that need cleanup.
Manual ZIP-and-XML repair (DOCX only). Because DOCX is a ZIP archive of XML files, a damaged DOCX can sometimes be fixed by hand. Rename the file to .zip, extract it with any archive tool, identify the malformed XML file (usually word/document.xml; open it in a text editor and look for unmatched tags or invalid characters), fix it, and repackage as a ZIP with the original .docx extension. When repackaging, zip the contents of the extracted folder rather than the folder itself, so that [Content_Types].xml stays at the root of the archive. This sounds intimidating but works on a specific class of corruption that no automated tool handles cleanly: a single character or tag in document.xml causing the entire file to fail to parse. The free utility xmllint reports the line of the parsing error and marks the offending position with a caret:
xmllint --noout word/document.xml
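The repackaging step is where manual repairs most often go wrong, usually because the archive ends up with an extra top-level folder. The sketch below rebuilds a .docx from an extracted folder with every part stored relative to that folder. The function name is hypothetical, and listing [Content_Types].xml first simply mirrors the conventional layout; it is an assumption that the order is harmless, not a stated requirement.

```python
import zipfile
from pathlib import Path


def repack_docx(extracted_dir, output_path):
    """Zip the contents of extracted_dir into output_path as a DOCX.

    output_path should be outside extracted_dir.
    """
    root = Path(extracted_dir)
    files = sorted(p for p in root.rglob("*") if p.is_file())
    # Stable sort: move [Content_Types].xml to the front, keep the rest ordered.
    files.sort(key=lambda p: p.name != "[Content_Types].xml")
    with zipfile.ZipFile(output_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in files:
            # Archive names are relative to the root, with forward slashes.
            archive.write(path, path.relative_to(root).as_posix())
    return output_path
```

After repackaging, open the result with Open and Repair first; a hand-edited file may still carry minor inconsistencies that Word can tidy up.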
Apple Pages (Mac). Pages can open DOCX files and is sometimes more tolerant than Word. Opening a damaged file in Pages and exporting back to DOCX produces a file that may open in Word. Not a comprehensive solution, but worth trying as a quick fix on a Mac.
Word Online (Word for the web). Microsoft’s browser-based Word sometimes opens files that the desktop version refuses, because it uses a different parsing path. Upload to OneDrive, try opening in Word for the web, and download the saved version if it succeeds.
pandoc. A general-purpose document converter. Running pandoc broken.docx -o recovered.txt extracts text and basic structure from a DOCX, often working when other tools fail. Useful as a fallback for getting content out when losing formatting is acceptable.
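Converters that parse the XML strictly will stop at the first malformed tag. When the ZIP container still reads but word/document.xml does not parse, a pattern-based pull of the text runs can still get the words out. This is a rough sketch, not a substitute for a real parser: it assumes the conventional w: namespace prefix that Word writes, and it discards all formatting, tables, and images. The function name is illustrative.

```python
import html
import re
import zipfile

PARAGRAPH_END = re.compile(r"</w:p>")
# Matches <w:t> and <w:t xml:space="preserve">, but not <w:tbl>, <w:tab/>, etc.
TEXT_RUN = re.compile(r"<w:t(?:\s[^>]*)?>(.*?)</w:t>", re.DOTALL)


def salvage_text(docx_path):
    """Extract paragraph text from document.xml without parsing the XML."""
    with zipfile.ZipFile(docx_path) as archive:
        raw = archive.read("word/document.xml").decode("utf-8", errors="replace")
    paragraphs = []
    for chunk in PARAGRAPH_END.split(raw):
        text = "".join(TEXT_RUN.findall(chunk))
        if text:
            # Turn entities such as &amp; back into plain characters.
            paragraphs.append(html.unescape(text))
    return "\n".join(paragraphs)
```

Because it never builds a document tree, this approach tolerates unclosed tags that would stop a strict parser.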
Commercial repair tools (Stellar Repair for Word, Recoverit, Wondershare Repairit, Kernel for Word). Paid tools aimed at non-technical users. They automate similar techniques to the free options above, with a GUI and slightly more robust handling of severely damaged files. Worth considering if you have one urgent file and do not want to learn the manual approaches. Not worth considering for repeated use — the underlying techniques are available for free, and the licensing fees add up. See the complete guide to Stellar Repair for Word for a detailed assessment.
When a Word document can’t be repaired
Some Word documents are genuinely beyond recovery. Recognizing these early saves hours of effort.
The file is severely truncated. If a substantial portion of the file is missing — for example, a download that stopped at 200 KB of a 2 MB document — the content is gone, not just the index. No tool can reconstruct bytes that were never received. The only recourse is to obtain a fresh copy.
The file has been encrypted by malware. Ransomware encrypts the contents of a file, so the result looks like a corrupted Word document even though the underlying data is intact and merely unreadable without the key. Repair tools will not help; recovery requires either the decryption key or a backup.
The DOCX archive is intact but every internal XML file is shredded. Rare, but it happens with certain storage failures or aggressive content modification. The container reads fine, but document.xml, styles.xml, and the relationships are all damaged simultaneously. Partial recovery via the “Recover Text from Any File” converter may yield some content; full reconstruction is not possible.
The file was never a valid Word document. Occasionally a file is saved with a .doc or .docx extension but is actually an HTML email export, an empty file, or a file in a completely different format. Checking the file’s actual contents takes seconds and rules this out: DOCX files start with the bytes PK (the ZIP magic number, 50 4B 03 04 in hex); DOC files start with the OLE Compound File signature D0 CF 11 E0 A1 B1 1A E1. A quick look in a hex editor, or running the file command on Mac or Linux, confirms the actual format.
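Where neither a hex editor nor the file command is at hand, the same check takes a few lines of Python. This sketch reads only the first bytes of the file; the function name and the returned labels are illustrative, and a ZIP or OLE signature shows only the container type, not that the contents are a valid Word document.

```python
ZIP_MAGIC = b"PK\x03\x04"                       # ZIP local file header
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")   # OLE Compound File signature


def sniff_word_format(path):
    """Guess the real container format from the first bytes of a file."""
    with open(path, "rb") as f:
        head = f.read(8)
    if not head:
        return "empty"
    if head.startswith(ZIP_MAGIC):
        return "zip (possibly docx)"
    if head.startswith(OLE_MAGIC):
        return "ole (possibly doc)"
    if head.startswith(b"{\\rtf"):
        return "rtf"
    if head.lstrip().lower().startswith((b"<html", b"<!doc")):
        return "html"
    return "unknown"
```

A result of "html", "rtf", or "empty" for a file named .docx means no Word repair technique applies; the file needs to be renamed or obtained again.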
The damage occurred during a save that overwrote the only good copy. If a corrupted file was saved over the only good copy, no backup exists, no version history is available, and no recovery file remains in the AutoRecover folder, the original content is gone. This is why backup discipline matters more than repair tooling.
For genuinely unrecoverable documents, the practical options are: request the file again from the original source, restore from backup, restore from version history (OneDrive, SharePoint, Google Drive, Time Machine, or any sync tool), or — if the content is visible somewhere in degraded form, such as a file preview, an email thread, or a printed copy — recreate the important parts manually. No amount of tooling can recover content that no longer exists in the file or anywhere else.
Related categories
Word and Excel share architectural DNA — both modern formats are ZIP archives of XML files, both legacy formats are OLE compound documents, and many of the same recovery techniques apply across them. If you are dealing with Excel corruption alongside Word corruption (a common pattern, since they often originate from the same OneDrive sync problem or storage failure), see the Excel repair guide. For documents converted to or from PDF — for example, a Word document exported as PDF that will not open, or a PDF imported into Word that produces a broken file — see the PDF repair guide. For Word documents stored inside damaged ZIP archives or email attachments that will not extract, the archive repair guide covers the extraction side before document-specific repair becomes relevant.
Last verified: April 2026