Archive Repair: A Complete Guide to Fixing Corrupted ZIP, RAR, and 7z Files
A corrupted archive is a particular kind of bad news. A multi-gigabyte ZIP that won’t extract is often the only copy of work you spent days assembling, and the error messages — unexpected end of archive, CRC failed, cannot open file as archive — give little hint of whether the contents are gone or just temporarily out of reach. Most damaged archives are partially or fully recoverable using free tools, often in minutes. The harder truth: archive formats vary enormously in how well they tolerate damage, and the strategies that work for ZIP do not work for 7z, and vice versa.
This guide covers how the major archive formats are structured, what goes wrong, how to fix the common cases for each format, which tools handle which problems, and how to recognize an archive that genuinely cannot be saved.
Common problems
Most archive problems fall into a handful of categories. If yours matches one of these, skip to the specific guide for the fastest fix.
The ZIP shows “unexpected end of archive.” The single most common ZIP failure. Either the file was truncated during download or transfer, or the central directory at the end of the file is damaged. See the guide to “unexpected end of archive” errors for the diagnostic and fix sequence.
Extraction starts but fails partway through. Some files come out of the archive cleanly, then extraction stops with a CRC error or data error message. The archive is partially intact. See ZIP extraction fails partway for partial recovery strategies.
A RAR archive shows “CRC failed” or “checksum error.” RAR’s per-file integrity check has detected corruption. If the archive was created with a recovery record, WinRAR can often reconstruct the damaged data. If not, partial extraction is the realistic outcome. See RAR CRC failed errors.
A 7z archive won’t extract. 7z is more brittle than ZIP under damage: the index of its contents lives in a single header at the end of the file, it has no recovery record, and it uses solid compression by default. See 7z won’t extract for the available approaches.
The archive is password-protected and the password is lost. Recovery prospects depend heavily on which encryption mode was used. Legacy ZipCrypto is genuinely weak; modern AES-256 (used in 7z and modern ZIP) is effectively uncrackable. See ZIP password lost for an honest assessment of what’s possible.
A large download was interrupted. The archive is technically valid but truncated. Some tools can extract everything before the truncation point even though the central directory is missing. See incomplete ZIP downloads.
The archive came from a different operating system and won’t open. Filename encoding, path separator differences, and macOS resource fork artifacts can all cause cross-platform extraction failures. See cross-platform ZIP issues.
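The filename-encoding failure in the last item can be illustrated with a minimal Python sketch. It assumes one specific cause: the archive stored UTF-8 names without marking them as such, so the extractor decoded them as CP437. The function name repair_cp437_mojibake is hypothetical, and other encoding mix-ups need different handling.

```python
import unicodedata


def repair_cp437_mojibake(name: str) -> str:
    """Undo a wrong CP437 decode of bytes that were really UTF-8.

    Assumption: the garbled name came from UTF-8 bytes read as CP437.
    If the round trip fails, the name is returned unchanged.
    """
    try:
        fixed = name.encode("cp437").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return name
    # macOS tools often store decomposed (NFD) names; compose them so the
    # result compares equal to names typed on other systems.
    return unicodedata.normalize("NFC", fixed)


original = unicodedata.normalize("NFC", "résumé.txt")
garbled = original.encode("utf-8").decode("cp437")  # what a CP437 reader shows
print(repair_cp437_mojibake(garbled) == original)
```

Names that are plain ASCII pass through untouched, so the function is safe to apply to every entry in a listing.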
Understanding archive formats
Different archive formats have very different internal structures, and those structural differences determine what kinds of damage they can tolerate. Understanding the format you’re dealing with explains why some recovery techniques work for one format but not another.
ZIP
A ZIP file is a sequence of compressed file entries followed by a central directory at the end. Each entry has a local file header (with the filename and basic metadata), the compressed data, and an optional data descriptor. After all entries comes the central directory — a complete index of every file in the archive, with offsets pointing back to where each one starts — and finally the end of central directory record (EOCD), which sits at the very end of the file and points to the central directory.
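As a rough illustration of that layout, the Python sketch below locates the EOCD record and decodes its fixed 22-byte portion. The helper name read_eocd is hypothetical, and the sketch ignores ZIP64 archives, which use additional records.

```python
import io
import struct
import zipfile

EOCD_SIG = b"PK\x05\x06"
EOCD_FMT = "<4sHHHHIIH"  # 22 bytes, little-endian


def read_eocd(data: bytes):
    """Locate and decode the end of central directory record, or return None."""
    pos = data.rfind(EOCD_SIG)
    if pos < 0 or pos + struct.calcsize(EOCD_FMT) > len(data):
        return None
    (_sig, _disk, _cd_disk, _entries_here, total_entries,
     cd_size, cd_offset, comment_len) = struct.unpack_from(EOCD_FMT, data, pos)
    return {"eocd_offset": pos, "total_entries": total_entries,
            "cd_size": cd_size, "cd_offset": cd_offset,
            "comment_len": comment_len}


# Build a small archive in memory so the example is self-contained.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    for name in ("a.txt", "b.txt", "c.txt"):
        zf.writestr(name, "hello " + name)
info = read_eocd(buf.getvalue())
# In a simple archive the central directory ends exactly where the EOCD begins.
print(info["total_entries"],
      info["cd_offset"] + info["cd_size"] == info["eocd_offset"])
```

If read_eocd returns None for a file that should be a ZIP, the end of the file is missing or damaged, which is the situation the next paragraph addresses.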
This design has two important consequences for repair. First, a ZIP can usually be partially recovered even if the end is damaged, because each file entry is self-contained: tools can scan for local file headers (which start with a fixed signature) and extract them without needing the central directory. Info-ZIP’s zip -FF and 7-Zip do exactly this. Second, a missing or damaged central directory looks like total failure but usually isn’t; the files are still in the archive, just unindexed.
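The signature scan can be sketched in a few lines of Python. This is a simplified illustration, not how any particular repair tool is implemented: salvage_entries is a hypothetical name, and the sketch handles only stored and deflate entries whose sizes are recorded in the local header (no encryption, ZIP64, or data descriptors). A real tool must also cope with the signature bytes appearing by chance inside compressed data.

```python
import io
import struct
import zipfile
import zlib

LFH_SIG = b"PK\x03\x04"
LFH_FMT = "<4sHHHHHIIIHH"  # 30-byte local file header, little-endian
LFH_LEN = struct.calcsize(LFH_FMT)


def salvage_entries(data: bytes):
    """Yield (name, payload, crc_ok) for entries found by signature scan."""
    pos = data.find(LFH_SIG)
    while pos >= 0 and pos + LFH_LEN <= len(data):
        (_sig, _ver, flags, method, _time, _date, crc, csize, _usize,
         name_len, extra_len) = struct.unpack_from(LFH_FMT, data, pos)
        name_start = pos + LFH_LEN
        start = name_start + name_len + extra_len
        name = data[name_start:name_start + name_len].decode("utf-8", "replace")
        raw = data[start:start + csize]
        payload = None
        # Skip entries that use a data descriptor (bit 3) or are cut short.
        if not flags & 0x08 and len(raw) == csize:
            try:
                if method == 0:        # stored
                    payload = raw
                elif method == 8:      # raw deflate stream
                    payload = zlib.decompress(raw, -15)
            except zlib.error:
                payload = None
        if payload is not None:
            yield name, payload, zlib.crc32(payload) == crc
        pos = data.find(LFH_SIG, pos + 4)


# Demo: remove the central directory plus the last 3 bytes of entry data.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    for name in ("a.txt", "b.txt", "c.txt"):
        zf.writestr(name, ("data for " + name + "\n") * 50)
whole = buf.getvalue()
damaged = whole[: whole.find(b"PK\x01\x02") - 3]
recovered = list(salvage_entries(damaged))
print([(name, ok) for name, _payload, ok in recovered])
```

In the demo the first two entries come back intact while the third, whose data was cut, is skipped. That is the partial-recovery outcome described above.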
ZIP entries each carry a CRC-32 checksum. If the compressed data is altered in transit, the CRC will not match on extraction and the extractor reports a CRC error for that specific entry while leaving other entries unaffected.
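A short Python sketch shows the per-entry nature of the check using the standard zipfile module. The helper name check_entries is hypothetical. It reads every entry to the end, which is what triggers the CRC comparison, and records which ones fail.

```python
import io
import zipfile
import zlib


def check_entries(zf: zipfile.ZipFile):
    """Return {name: error message or None} by fully reading every entry."""
    results = {}
    for info in zf.infolist():
        try:
            with zf.open(info) as fh:
                while fh.read(64 * 1024):
                    pass  # reading to EOF triggers the CRC-32 comparison
            results[info.filename] = None
        except (zipfile.BadZipFile, zlib.error, EOFError, OSError) as exc:
            results[info.filename] = str(exc)
    return results


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:  # stored, so payload bytes are visible
    zf.writestr("a.txt", "first file")
    zf.writestr("b.txt", "second file")
    zf.writestr("c.txt", "third file")
raw = bytearray(buf.getvalue())
raw[raw.find(b"second file")] ^= 0xFF  # simulate one corrupted byte in b.txt
with zipfile.ZipFile(io.BytesIO(bytes(raw))) as zf:
    results = check_entries(zf)
print(results)
```

Only b.txt is reported as damaged; the entries on either side of it still read cleanly.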
ZIP encryption comes in two flavors. The legacy ZipCrypto, present since the format’s early days, is a weak XOR-based stream cipher. It is genuinely breakable: a known-plaintext attack needs only about a dozen bytes of known content from one encrypted entry to recover the cipher’s internal keys, which is enough to decrypt the archive without ever learning the password, often in minutes. AES encryption (128, 192, or 256-bit), most commonly in the WinZip AES format that 7-Zip and WinRAR also support, is cryptographically strong; no known attack breaks it, so recovery comes down to guessing the password.
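Which flavor an archive uses can usually be read from the entry metadata without the password. The Python sketch below is a minimal illustration. The function names are hypothetical, and the rules it applies are: general-purpose flag bit 0 marks an encrypted entry, compression method 99 marks WinZip AES, and flag bit 6 marks PKWARE strong encryption.

```python
import io
import zipfile

AES_METHOD = 99  # compression-method value used by the WinZip AES extension


def encryption_kind(flag_bits: int, compress_type: int) -> str:
    """Classify one entry from its flag bits and compression method."""
    if not flag_bits & 0x1:            # bit 0: entry is encrypted
        return "none"
    if compress_type == AES_METHOD:
        return "AES (WinZip AE-x)"
    if flag_bits & 0x40:               # bit 6: PKWARE strong encryption
        return "PKWARE strong encryption"
    return "ZipCrypto (legacy)"


def report(zf: zipfile.ZipFile):
    """Map each entry name to its apparent encryption kind."""
    return {i.filename: encryption_kind(i.flag_bits, i.compress_type)
            for i in zf.infolist()}


# Python's zipfile cannot write encrypted archives, so the demo archive
# is unencrypted; point report() at a real file to inspect it.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("plain.txt", "not encrypted")
with zipfile.ZipFile(buf) as zf:
    print(report(zf))
```

An archive reporting ZipCrypto is a candidate for the known-plaintext approach; one reporting AES is not.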
RAR
RAR is a proprietary format developed by Eugene Roshal and maintained by RARLAB. Two major versions are in use: RAR4 (the older format) and RAR5 (the modern default). Both share the same basic structure — a series of headers followed by compressed data — but RAR5 includes meaningful improvements to integrity protection.
RAR’s distinguishing repair feature is the recovery record: optional redundant data added at archive creation that allows recovery from a limited amount of damage, bounded by the size of the record. If a recovery record is present, WinRAR’s repair function can often reconstruct corrupted data losslessly. If no recovery record was added, repair becomes much harder. The recovery record is not added by default; the user creating the archive has to choose to include it.
RAR also supports recovery volumes for multi-part archives. These are separate .rev files that can substitute for any missing or damaged volume in the set, similar in spirit to RAID parity. Like recovery records, they have to be created intentionally.
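The parity idea behind that comparison can be shown with a toy Python sketch. It illustrates only the RAID-style concept using simple XOR; RAR’s actual recovery-volume encoding is more sophisticated and is not reproduced here. The sample data and the helper name xor_blocks are made up for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


volumes = [b"part-one", b"part-two", b"part-3!!"]  # equal-length "volumes"
parity = xor_blocks(volumes)                       # the extra recovery volume

# Lose any single volume, then rebuild it from the survivors plus parity.
lost_index = 1
survivors = [v for i, v in enumerate(volumes) if i != lost_index]
rebuilt = xor_blocks(survivors + [parity])
print(rebuilt == volumes[lost_index])
```

One parity block can replace any one missing volume but not two, which mirrors why the number of recovery volumes limits how many damaged parts can be replaced.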
RAR encrypts password-protected archives with AES: AES-256 in RAR5 and AES-128 in the older RAR4 format. As with modern ZIP encryption, both are strong; lost passwords are not realistically recoverable.
7z
7-Zip’s native format has a simpler layout than ZIP, and that simplicity costs resilience. The format places a small header at the start of the file pointing to a more detailed end header at the file’s end. The end header describes how the data is compressed and where each file starts within the compressed stream, so unlike ZIP there are no per-file headers to scan for if the end header is lost.
7z does not include a recovery record. Its design prioritizes compression efficiency over resilience to damage. This means damaged 7z files have less repair capability than damaged ZIP or RAR: there is no redundancy to fall back on, and the recovery option is limited to whatever 7-Zip can extract before hitting the corruption.
Solid compression compounds this. By default, 7z uses solid mode, which compresses multiple files together into a single stream. The compression ratio is significantly better than per-file compression, but it means damage to any part of the stream loses every file from that point forward. A corrupted byte halfway through a solid 7z archive may render the second half of the contents unrecoverable.
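7z encryption uses AES-256. Encrypting the file headers as well as the data is optional (the “Encrypt file names” checkbox in the GUI, or -mhe=on on the command line). With header encryption on, an encrypted 7z archive reveals nothing about its contents, not even the filenames, without the password; with it off, the file list is readable and only the file data is protected.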
TAR and GZ
TAR (tape archive) is the oldest of these formats, originally designed for sequential write to magnetic tape. It is structurally extremely simple: each file in the archive is preceded by a 512-byte header (containing filename, size, permissions, and a checksum) and followed by its data, padded to a 512-byte boundary. There is no central directory and no compression.
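The header layout is simple enough to decode by hand. The Python sketch below reads the name, size, and checksum from one 512-byte header using the classic field offsets (name at 0, size at 124, checksum at 148). The function name parse_tar_header is hypothetical, and the sketch ignores extensions such as long-name records and binary-encoded sizes.

```python
import io
import tarfile


def parse_tar_header(block: bytes):
    """Decode name, size, and checksum validity from one 512-byte header."""
    if len(block) != 512:
        raise ValueError("a TAR header is exactly 512 bytes")
    name = block[0:100].split(b"\0", 1)[0].decode("utf-8", "replace")
    size = int(block[124:136].strip(b"\0 ") or b"0", 8)    # octal ASCII
    stored = int(block[148:156].strip(b"\0 ") or b"0", 8)  # octal ASCII
    # The checksum is the byte sum of the header with the checksum field
    # itself (offsets 148-155) counted as eight ASCII spaces.
    computed = sum(block[:148]) + 8 * 0x20 + sum(block[156:])
    return {"name": name, "size": size, "checksum_ok": stored == computed}


# Build a one-file archive in memory and inspect its first header.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tf:
    member = tarfile.TarInfo("notes.txt")
    member.size = 100
    tf.addfile(member, io.BytesIO(b"x" * 100))
header = buf.getvalue()[:512]
print(parse_tar_header(header))
```

A header whose checksum does not match is how an extractor recognizes that a block is damaged or is not a header at all.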
This simplicity is a recovery advantage. A damaged TAR file can usually be partially recovered by skipping past the damaged section and resuming with the next valid header. GNU tar attempts this resynchronization when it meets an invalid header, and its -i (--ignore-zeros) option keeps it from stopping at blocks of zeros, which damaged regions and concatenated archives often contain. The format’s lack of compression also means that random byte corruption affects only the file containing the damaged byte, not the archive as a whole.
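Python’s standard tarfile module exposes the same idea through its ignore_zeros option, which makes the reader skip empty and invalid blocks and keep looking for the next valid header. The sketch below is a small demonstration on an in-memory archive; list_salvageable is a hypothetical helper name.

```python
import io
import tarfile


def list_salvageable(data: bytes):
    """Return (name, content) pairs that tarfile can still reach."""
    found = []
    # ignore_zeros=True: skip empty and invalid blocks instead of stopping.
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:",
                      ignore_zeros=True) as tf:
        for member in tf:
            if member.isfile():
                found.append((member.name, tf.extractfile(member).read()))
    return found


# Build a three-file archive, then damage the header of the middle file.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tf:
    for name, body in (("a.txt", b"alpha"), ("b.txt", b"beta"),
                       ("c.txt", b"gamma")):
        member = tarfile.TarInfo(name)
        member.size = len(body)
        tf.addfile(member, io.BytesIO(body))
raw = bytearray(buf.getvalue())
raw[raw.find(b"b.txt")] = ord("X")  # header checksum no longer matches
salvaged = list_salvageable(bytes(raw))
print([name for name, _body in salvaged])
```

The file with the damaged header is lost, but the files before and after it are still recovered.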
TAR is almost always combined with a separate compression layer. .tar.gz (or .tgz) wraps a TAR archive in gzip compression; .tar.bz2 uses bzip2; .tar.xz uses xz (LZMA2). The combined format inherits the compression layer’s damage characteristics. A gzip stream is a sequence of deflate blocks, so damage in one part of a .tar.gz doesn’t necessarily destroy everything after it, but standard gzip stops at the first error; recovering later data requires a tool that can skip the damaged block and resynchronize at a later block boundary.
Standalone GZ files contain a single compressed file (gzip is not an archive format — it compresses, it does not bundle). The GZ format includes a CRC-32 checksum and original file size at the footer. Limited partial recovery is possible from damaged GZ files using specialized tools like gzrecover, though success is not guaranteed.
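The footer is easy to inspect directly. The Python sketch below reads the last 8 bytes of a gzip file and compares them with values computed from the original data. It assumes a single-member gzip file; the helper name read_gzip_trailer is hypothetical.

```python
import gzip
import struct
import zlib


def read_gzip_trailer(gz: bytes):
    """Return (crc32, isize) stored in the last 8 bytes of a gzip member."""
    if len(gz) < 18:  # 10-byte header + 8-byte trailer is the bare minimum
        raise ValueError("too short to be a gzip file")
    return struct.unpack("<II", gz[-8:])


payload = b"archive contents\n" * 1000
gz = gzip.compress(payload)
crc, isize = read_gzip_trailer(gz)
# The size field holds the original length modulo 2**32.
print(crc == zlib.crc32(payload), isize == len(payload) % 2**32)
```

On a truncated download the last 8 bytes are not the real footer, so these values will not match what the decompressor computes, which is why truncation shows up as a CRC or length error.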
Why archives become corrupted
The actual causes, ranked by how often they’re the culprit:
Interrupted or partial download. By far the most common cause, especially for large archives. The file is incomplete, and the central directory or end markers that the extractor needs are simply not there. Symptoms include a file size on disk that doesn’t match the expected size, and errors that reference end of file or central directory.
Transfer corruption. A network glitch, a misbehaving proxy, or a buggy file system altered bytes during transfer. The file is the right size but a small region is wrong. Symptom: CRC errors on specific entries while others extract cleanly.
Storage media errors. A failing drive, bad sectors, corrupted flash memory, or a bad USB stick returned wrong bytes when the file was read. Often unrecoverable from the affected medium; check whether the original source still has a clean copy.
Truncation from a full disk. The archive was being written when the disk filled up, leaving a partial file. Same symptoms as an interrupted download but the file exists in its broken state on the local disk.
Improper archive creation. The tool that created the archive wrote a technically invalid file. Older or buggy archive utilities, bespoke scripts that don’t follow format specifications correctly, and certain web-based archive creators are common sources. These archives often open in tolerant tools (7-Zip is famously forgiving) but fail in stricter ones.
Cross-platform encoding mismatches. Archives created with non-ASCII filenames on one system can fail to extract on another if filename encoding wasn’t specified correctly. Particularly common between Windows (CP437/UTF-8 hybrid) and macOS (UTF-8 with NFD normalization).
Editing or repacking with incompatible tools. An archive opened, modified, and re-saved by a tool that doesn’t handle every format feature correctly may end up with internal inconsistencies. Rarer than transit corruption but does happen.
Email gateway processing. Many corporate email systems unpack and repack attachments for virus scanning. Some implementations damage archives in the process, particularly older ZIP archives with non-standard features. Asking the sender to share the file via a different channel often produces a clean copy.
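The symptoms of the first two causes above can be turned into a rough first-pass triage for ZIP files. The Python sketch below is a heuristic under stated assumptions, not a definitive diagnostic: triage_zip is a hypothetical name, expected_size is whatever size the source advertised (if known), and the checks rely only on the standard zipfile module.

```python
import os
import tempfile
import zipfile
import zlib


def triage_zip(path, expected_size=None):
    """Map a ZIP file to the most likely damage category (rough heuristic)."""
    actual = os.path.getsize(path)
    if expected_size is not None and actual < expected_size:
        return f"truncated: {actual} of {expected_size} bytes present"
    if not zipfile.is_zipfile(path):
        return "no end-of-central-directory record found"
    try:
        with zipfile.ZipFile(path) as zf:
            bad = zf.testzip()  # name of the first failing entry, or None
    except (zipfile.BadZipFile, zlib.error) as exc:
        return f"archive unreadable: {exc}"
    return f"entry fails its CRC check: {bad}" if bad else "no damage detected"


# Demo on throwaway files: one intact archive and one cut in half.
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.zip")
    with zipfile.ZipFile(good, "w") as zf:
        for name in ("a.txt", "b.txt", "c.txt"):
            zf.writestr(name, "contents of " + name)
    full_size = os.path.getsize(good)
    cut = os.path.join(tmp, "cut.zip")
    with open(good, "rb") as src, open(cut, "wb") as dst:
        dst.write(src.read()[: full_size // 2])
    demo_results = [triage_zip(good), triage_zip(cut),
                    triage_zip(cut, full_size)]
print(demo_results)
```

A size shortfall points to an interrupted download or a full disk; a correct size with a failing entry points to transfer or storage corruption.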
The repair tool landscape
No single tool handles every archive format and every damage scenario. The tools below are the ones you’ll actually reach for, with honest notes on each.
7-Zip. Free, open-source, and unusually tolerant of damaged archives. Handles ZIP, 7z, RAR (read-only), TAR, GZ, and many other formats. The general repair pattern is to test the archive with the t command, then run x (extract) and let 7-Zip pull out whatever it can; the GUI version reports per-file errors in a way that makes partial recovery practical. 7-Zip is usually the first tool to try for any archive problem, regardless of format. See the complete guide to 7-Zip.
WinRAR. The canonical RAR tool: commercial, but the trial version extracts indefinitely. WinRAR’s repair function (Tools > Repair Archive in the GUI, or WinRAR r archive.rar from the command line) is the standard remedy for damaged RAR files. If a recovery record was included when the archive was created, WinRAR can often reconstruct the damage losslessly. Without a recovery record, it does its best with what’s intact. WinRAR can also repair ZIP files, sometimes successfully where other tools fail, though for ZIP-specific damage Info-ZIP’s zip -FF is usually better.
Info-ZIP zip (with -F and -FF). The Info-ZIP zip command, available on macOS and most Linux systems, includes a “fixfix” mode that aggressively rebuilds a damaged ZIP. These options belong to zip, not to unzip. The invocation zip -FF damaged.zip --out fixed.zip scans the damaged archive for local file headers, ignores the broken central directory, and writes a new archive containing whatever it can recover, which unzip or any other extractor can then open. There is also a less aggressive -F mode for minor damage, which trusts the central directory where it can. On Windows, the same functionality is available via WSL or by installing Info-ZIP separately.
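The standard tar command (with -i for ignore-zeros). For damaged TAR archives, tar -ixf damaged.tar tells tar not to treat blocks of zeros (which corrupted areas often look like) as the end of the archive and to continue past them. This is often enough to extract the salvageable contents. The --ignore-failed-read option, sometimes suggested alongside it, only affects archive creation and does nothing during extraction. For a compressed archive such as damaged.tar.gz, the same command helps only if the compression layer is intact; if the damage is in the gzip stream itself, recover that layer first.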
gzip and gzrecover. For damaged GZ files, gzip -t damaged.gz tests integrity and reports an error if the stream is truncated, undecodable, or fails its checksum. gzrecover (from gzrt, the gzip recovery toolkit) attempts partial recovery of corrupted gzip files, including .tar.gz archives where the damage is in the compression layer rather than the TAR data.
Commercial archive repair tools (DiskInternals ZIP Repair, DataNumen Archive Repair, Stellar Repair for ZIP). Paid tools aimed at non-technical users. They automate what the free command-line tools do, with a GUI and slightly more aggressive recovery for severely damaged files. Worth considering for one urgent file when learning command-line tools is unappealing. Not worth ongoing license cost for repeated use — the underlying techniques are the same as the free tools.
Format-specific utilities. A few points about packaging are worth knowing. Info-ZIP’s zip and unzip are separate packages, and the -F and -FF repair options belong to zip. On Linux, the 7z command from p7zip uses the same engine as 7-Zip on Windows. For most users, 7-Zip and zip -FF cover the common cases.
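Several of the tools above follow the same pattern: decode until the data stops making sense, then keep what was produced. For the gzip case, that pattern can be sketched with Python’s zlib module. This is a simplified stand-in for illustration, not a reimplementation of gzrecover; salvage_gzip is a hypothetical name. It stops at the first error rather than trying to resynchronize, and after byte corruption (as opposed to truncation) the tail of the output may be unreliable.

```python
import gzip
import zlib


def salvage_gzip(gz: bytes):
    """Decompress as much of a gzip stream as possible.

    Returns (data, complete); decoding stops at the first error.
    """
    d = zlib.decompressobj(wbits=31)  # 31 = expect a gzip header and trailer
    out = bytearray()
    try:
        for i in range(0, len(gz), 4096):
            out += d.decompress(gz[i:i + 4096])
            if d.eof:
                break
    except zlib.error:
        pass  # keep what was decoded before the damage
    return bytes(out), d.eof


payload = b"".join(b"record %d\n" % i for i in range(20000))
gz = gzip.compress(payload)
partial, complete = salvage_gzip(gz[: len(gz) // 2])  # simulate a cut-off file
print(len(partial), "of", len(payload), "bytes recovered; complete:", complete)
```

For a .tar.gz, the recovered bytes are a truncated TAR stream, which can then be handed to tar for extraction of the files that are complete.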
When an archive can’t be repaired
Some archives are genuinely beyond recovery. Recognizing these early saves hours of effort.
Severely truncated archives where most of the data is missing. If a download stopped at 100 MB of a 1 GB archive, no tool can reconstruct the missing 900 MB. Partial extraction may recover the files that were complete before the truncation point, but the rest is gone. Re-downloading is the only path to a complete archive.
Encrypted archives with strong AES encryption and a lost password. AES-128 and AES-256, used by modern ZIP, RAR, and 7z, are not breakable by current methods. Tools claiming to “recover” passwords from these archives are running brute-force or dictionary attacks; against a non-trivial password, these are computationally infeasible. The legacy ZipCrypto is different — genuinely weak and often crackable — but few modern archives use it.
Solid 7z archives with damage in the compressed stream. Because solid compression chains files together, damage partway through means everything after the damage point is unrecoverable. There is no partial recovery within a damaged solid block — only files that came before the damage can be salvaged.
Archives where the damage is invisible to repair tools but real to extraction. Occasionally an archive passes every integrity check (CRCs match, headers parse correctly) but the extracted files are still corrupt. Because the checksum is computed from the data the archiver was given, this means the files were already damaged when they were archived. These cases are rare and unrecoverable from the archive; the data was wrong before the integrity check was computed.
Archives that were never valid to begin with. Sometimes a file is saved with a .zip or .rar extension but is actually an HTML error page, an empty file, or a file in a completely different format. Checking the first few bytes (a ZIP starts with PK, a RAR of either version with Rar!, a 7z with 7z) takes seconds and rules this out.
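The first-bytes check from the last item can be automated with a short Python sketch. The signature table covers only the formats discussed in this guide, and the function name sniff_archive and its return labels are hypothetical.

```python
SIGNATURES = [
    (b"PK\x03\x04", "zip"),
    (b"PK\x05\x06", "zip (empty archive)"),
    (b"Rar!\x1a\x07\x01\x00", "rar5"),
    (b"Rar!\x1a\x07\x00", "rar4"),
    (b"7z\xbc\xaf\x27\x1c", "7z"),
    (b"\x1f\x8b", "gzip"),
]


def sniff_archive(head: bytes) -> str:
    """Guess the real format from the first bytes of a file."""
    for magic, kind in SIGNATURES:
        if head.startswith(magic):
            return kind
    # A TAR file has no leading magic; "ustar" sits at offset 257 instead.
    if head[257:262] == b"ustar":
        return "tar"
    if head.lstrip()[:1] == b"<":
        return "looks like HTML/XML, not an archive"
    return "unknown"


# Typical use: read the first 512 bytes of the suspect file, for example
# with open(path, "rb") as fh: head = fh.read(512)
print(sniff_archive(b"<!DOCTYPE html><html>404 Not Found</html>"))
print(sniff_archive(b"PK\x03\x04" + b"\x00" * 26))
```

A result that disagrees with the file extension usually means the download failed and the server returned something other than the archive.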
For unrecoverable archives, the practical options are: request the file again from the original source, restore from backup, or — if the data exists somewhere in another form — recreate the important parts manually. No tool can recover content that no longer exists in any form.
Related categories
Archive problems often intersect with other file-format problems. If you’ve extracted a damaged archive successfully and discovered that the files inside are themselves corrupted, see the PDF repair guide, the Word repair guide, or the Excel repair guide for format-specific recovery once the file is out of the archive. If the archive arrived as an email attachment and the underlying issue is email transfer corruption, the same diagnostic patterns apply across all attached file types.
Last verified: April 2026