Description
Lziprecover is a data recovery tool and decompressor for files in the lzip
compressed data format (.lz). Lziprecover is able to repair slightly damaged
files (up to one single-byte error per member), produce a correct file by
merging the good parts of two or more damaged copies, reproduce a missing
(zeroed) sector using a reference file, extract data from damaged files,
decompress files, and test integrity of files.
Lziprecover can remove the damaged members from multimember files, for
example multimember tar.lz archives.
Lziprecover provides random access to the data in multimember files; it only
decompresses the members containing the desired data.
Lziprecover facilitates the management of metadata stored as trailing data
in lzip files.
Lziprecover is not a replacement for regular backups, but a last line of
defense for the case where the backups are also damaged.
The lzip file format is designed for data sharing and long-term archiving,
taking into account both data integrity and decoder availability:
* The lzip format provides very safe integrity checking and some data
recovery means. The program lziprecover can repair bit flip errors
(one of the most common forms of data corruption) in lzip files, and
provides data recovery capabilities, including error-checked merging
of damaged copies of a file.
* The lzip format is as simple as possible (but not simpler). The lzip
manual provides the source code of a simple decompressor along with a
detailed explanation of how it works, so that with the only help of the
lzip manual it would be possible for a digital archaeologist to extract
the data from a lzip file long after quantum computers eventually
render LZMA obsolete.
* Additionally the lzip reference implementation is copylefted, which
guarantees that it will remain free forever.
A nice feature of the lzip format is that a corrupt byte is easier to repair
the nearer it is from the beginning of the file. Therefore, with the help of
lziprecover, losing an entire archive just because of a corrupt byte near
the beginning is a thing of the past.
Compression may be good for long-term archiving. For compressible data,
multiple compressed copies may provide redundancy in a more useful form and
may have a better chance of surviving intact than one uncompressed copy
using the same amount of storage space. This is specially true if the format
provides recovery capabilities like those of lziprecover, which is able to
find and combine the good parts of several damaged copies.
Lziprecover is able to recover or decompress files produced by any of the
compressors in the lzip family: lzip, plzip, minilzip/lzlib, clzip, and
pdlzip.
If the cause of file corruption is a damaged medium, the combination
GNU ddrescue + lziprecover is the recommended option for recovering data
from damaged lzip files.
If a file is too damaged for lziprecover to repair it, all the recoverable
data in all members of the file can be extracted in one step with the
command 'lziprecover -cd -i file.lz > file'.
When recovering data, lziprecover takes as arguments the names of the
damaged files and writes zero or more recovered files depending on the
operation selected and whether the recovery succeeded or not. The damaged
files themselves are kept unchanged.
When decompressing or testing file integrity, lziprecover behaves like lzip
or lunzip.
To give you an idea of its possibilities, when merging two copies, each of
them with one damaged area affecting 1 percent of the copy, the probability
of obtaining a correct file is about 98 percent. With three such copies the
probability rises to 99.97 percent. For large files (a few MB) with small
errors (one sector damaged per copy), the probability approaches 100 percent
even with only two copies. (Supposing that the errors are randomly located
inside each copy).
The lziprecover package also includes unzcrash, a program written to test
robustness to decompression of corrupted data, inspired by unzcrash.c from
Julian Seward's bzip2. Type 'make unzcrash' in the lziprecover source
directory to build it. Then try 'unzcrash --help'.
Copyright (C) 2009-2022 Antonio Diaz Diaz.
This file is free documentation: you have unlimited permission to copy,
distribute, and modify it.
The file Makefile.in is a data file used by configure to produce the
Makefile. It has the same copyright owner and permissions that configure
itself.
2022-01-21 Antonio Diaz Diaz <antonio@gnu.org>
* Version 1.23 released.
* Decompression time has been reduced by 5-12% depending on the file.
* main_common.cc (getnum): Show option name and valid range if error.
* dump_remove.cc (dump_members): Check tty except for --dump=tdata.
* Option '-U, --unzcrash' now takes an argument ('1' or 'B<size>').
* mtester.cc (duplicate_buffer): Use an external buffer.
* repair.cc (debug_decompress): Continue decoding on CRC mismatch.
* unzcrash.cc: Make zcmp_command a string of unlimited size.
Use execvp instead of popen to avoid invoking /bin/sh.
Print byte or block position in messages.
* New file common.h.
* Improve several descriptions in manual, '--help', and man page.
* lziprecover.texi: Change GNU Texinfo category to 'Compression'.
(Reported by Alfred M. Szmidt).
2021-01-02 Antonio Diaz Diaz <antonio@gnu.org>
* Version 1.22 released.
* New options '-e, --reproduce', '--lzip-level', '--lzip-name',
'--reference-file', and '-E, --debug-reproduce'.
* Remove '--dump-tdata', '--remove-tdata', and '--strip-tdata'.
* main.cc (main): Report an error if a file name is empty.
Make '-o' behave like '-c', but writing to file.
Make '-c' and '-o' check whether the output is a terminal only once.
Do not open output if input is a terminal.
* main.cc (decompress): With '-i', ignore data errors, keep files.
* range_dec.cc: '-i -D' now decompresses a truncated last member.
* '-i -D' now returns 0 if only ignored errors are found.
* '-i' now considers any block > 36 with header a member, not a gap.
* Replace 'decompressed', 'compressed' with 'out', 'in' in output.
* Fix several compiler warnings. (Reported by Nissanka Gooneratne).
* lzip_index.cc: Improve messages for corruption in last header.
* New debug options '-M, --md5sum' and '-U, --unzcrash'.
* main.cc: Set a valid invocation_name even if argc == 0.
* Document extraction from tar.lz in manual, '--help', and man page.
* New files lunzcrash.cc, md5.h, md5.cc, nrep_stats.cc, reproduce.cc.
* lziprecover.texi: New chapter 'Reproducing one sector'.
New sections 'Merging with a backup' and 'Reproducing a mailbox'.
Document the debug options for experts.
* check.sh: Lzip 1.16 or newer is required to run the tests.
* testsuite: Add 9 new test files.
2019-01-04 Antonio Diaz Diaz <antonio@gnu.org>
* Version 1.21 released.
* Rename File_* to Lzip_*.
* New options '--dump', '--remove', and '--strip'. They
replace '--dump-tdata', '--remove-tdata', and '--strip-tdata',
which are now aliases and will be removed in version 1.22.
* lzip.h (Lzip_trailer): New function 'verify_consistency'.
* lzip_index.cc: Lzip_index now detects gaps between members,
some kinds of corrupt trailers and
some fake trailers embedded in trailing data.
* split.cc: Use Lzip_index to split members, gaps and trailing data.
* split.cc: Verify last member before writing anything.
* list.cc (list_files): With '-i', ignore format errors, show gaps.
* range_dec.cc: With '-i', ignore a truncated last member.
* main.cc (main): Check return value of close( infd ).
* Improve and add new diagnostic messages.
* Print '\n' instead of '\r' if !isatty( 1 ) in merge, repair.
* main.cc: Compile on DOS with DJGPP.
* lziprecover.texi: New chapter 'Tarlz'.
* configure: Accept appending to CXXFLAGS; 'CXXFLAGS+=OPTIONS'.
* INSTALL: Document use of CXXFLAGS+='-D __USE_MINGW_ANSI_STDIO'.
* New test files fox.lz, fox6_sc[1-6].lz.
2018-02-12 Antonio Diaz Diaz <antonio@gnu.org>
* Version 1.20 released.
* split.cc: Fix splitting of files > 64 KiB broken since 1.16.
* New options '--dump-tdata', '--remove-tdata', '--strip-tdata', and
'--loose-trailing'.
* Improve corrupt header detection to HD=3.
* main.cc: Show corrupt or truncated header in multimember file.
* Replace 'bits/byte' with inverse compression ratio in output.
* Show progress of decompression at verbosity level 2 (-vv).
* Show progress of decompression only if stderr is a terminal.
* main.cc: Show final diagnostic when testing multiple files.
* decoder.cc (verify_trailer): Show stored sizes also in hex.
Show dictionary size at verbosity level 4 (-vvvv).
2017-04-10 Antonio Diaz Diaz <antonio@gnu.org>
* Version 1.19 released.
* merge.cc: Fix members with thousands of scattered errors.
* Option '-a' now works with '-l' and '-D'.
* The output of option '-l, --list' has been simplified.
* main.cc: Continue testing if any input file is a terminal.
* main.cc: Show trailing data in both hexadecimal and ASCII.
* lzip_index.cc: Improve detection of bad dict and trailing data.
* lzip_index.cc: Skip trailing data more efficiently.
* lzip.h: Unify messages for bad magic, trailing data, etc.
* New struct Bad_byte allows delta and flip modes for bad_value.
* unzcrash.cc: New option '-e, --set-byte'.
2016-05-12 Antonio Diaz Diaz <antonio@gnu.org>
* Version 1.18 released.
* New option '-a, --trailing-error'.
* merge.cc (open_input_files): Use CRC to test identical files.
* repair.cc (repair_file): Detect gross damage before repairing.
* repair.cc: Repair a damaged dictionary size in the header.
* repair.cc: Try bytes at offsets 7 to 11 first.
* Decompression time has been reduced by 2%.
* main.cc (decompress): Print up to 6 bytes of trailing data
when '-tvvvv' is specified.
* decoder.cc (verify_trailer): Remove test of final code.
* main.cc (main): Delete '--output' file if infd is a terminal.
* main.cc (main): Don't use stdin more than once.
* Use 'close_and_set_permissions' and 'set_signals' in all modes.
* range_dec.cc (list_file): Show dictionary size and size of
trailing data (if any) with '-lv'.
* New options '-A, --alone-to-lz', '-W, --debug-decompress', and
'-X, --show-packets'.
* Change short name of option '--debug-delay' to '-Y'.
* Change short name of option '--debug-repair' to '-Z'.
* unzcrash.cc: New options '-B, --block', '-d, --delta',
'-t, --truncate', and '-z, --zcmp'.
* unzcrash.cc: Read files as large as RAM allows.
* unzcrash.cc: Compare output using zcmp if decompressor returns 0.
* unzcrash.cc: Accept negative position and size.
* lziprecover.texi: New chapter 'Trailing data'.
* configure: Avoid warning on some shells when testing for g++.
* Makefile.in: Detect the existence of install-info.
* check.sh: Don't check error messages.
* check.sh: A POSIX shell is required to run the tests.
2015-05-28 Antonio Diaz Diaz <antonio@gnu.org>
* Version 1.17 released.
* New block selection algorithm makes merge up to 100 times faster.
* repair.cc: Repair time has been reduced by 15%.
* New options '-y, --debug-delay' and '-z, --debug-repair'.
* Makefile.in: New targets 'install*-compress'.
* testsuite/unzcrash.cc: Move to top directory.
* lziprecover.texi: New chapter 'File names'.
2014-08-29 Antonio Diaz Diaz <antonio@gnu.org>
* Version 1.16 released.
* New class LZ_mtester makes repair up to 10 times faster.
* main.cc (close_and_set_permissions): Behave like 'cp -p'.
* lziprecover.texinfo: Rename to lziprecover.texi.
* Change license to GPL version 2 or later.
2013-09-14 Antonio Diaz Diaz <antonio@gnu.org>
* Version 1.15 released.
* repair.cc: Repair multimember files with up to one byte error
per member.
* merge.cc: Merge multimember files.
* main.cc (show_header): Don't show header version.
* lziprecover.texinfo: New chapters 'Repairing files',
'Merging files', and 'Unzcrash'.
2013-05-31 Antonio Diaz Diaz <antonio@gnu.org>
* Version 1.14 released.
* New option '-i, --ignore-errors'.
* Option '-l, --list' now accepts more than one file.
* Decompression time has been reduced by 12%.
* split.cc: Use as few digits as possible in file names.
* split.cc: In verbose mode show names of files being created.
* main.cc (show_header): Show header version if verbosity >= 4.
* configure: Options now accept a separate argument.
* Makefile.in: New targets 'install-as-lzip' and 'install-bin'.
* main.cc: Use 'setmode' instead of '_setmode' on Windows and OS/2.
2012-02-24 Antonio Diaz Diaz <ant_diaz@teleline.es>
* Version 1.13 released.
* Lziprecover is now distributed in its own package. Until
version 1.12 it was included in the lzip package. Previous
entries in this file are taken from there.
* lziprecover.cc: Rename to main.cc.
* New files merge.cc, repair.cc, split.cc, and range_dec.cc.
* main.cc: Add decompressor options (-c, -d, -k, -t) so that
an external decompressor is not needed for recovery nor for
"make check".
* New option '-D, --range-decompress', which extracts a range of
bytes decompressing only the members containing the desired data.
* New option '-l, --list', which prints correct total file sizes
even for multimember files.
* merge.cc, repair.cc: Remove output file if recovery fails.
* Change quote characters in messages as advised by GNU Standards.
* split.cc: Use Boyer-Moore algorithm to search for headers.
* configure: Rename 'datadir' to 'datarootdir'.
2011-04-30 Antonio Diaz Diaz <ant_diaz@teleline.es>
* Version 1.12 released.
* lziprecover.cc: If '-v' is not specified show errors only.
* unzcrash.cc: Use Arg_parser.
* unzcrash.cc: New options '-b, --bits', '-p, --position', and
'-s, --size'.
2010-09-16 Antonio Diaz Diaz <ant_diaz@teleline.es>
* Version 1.11 released.
* lziprecover.cc: New option '-m, --merge', which tries to produce a
correct file by merging the good parts of two or more damaged copies.
* lziprecover.cc: New option '-R, --repair' for repairing a
1-byte error in single-member files.
* decoder.cc (decode_member): Detect file errors earlier to improve
efficiency of lziprecover's new repair capability.
This change also prevents (harmless) access to uninitialized
memory when decompressing a corrupt file.
* lziprecover.cc: New options '-f, --force' and '-o, --output'.
* lziprecover.cc: New option '-s, --split' to select the until
now only operation of splitting multimember files.
* lziprecover.cc: If no operation is specified, warn the user and do
nothing.
Add new comment