bzip2-1.0.2

Julian Seward 2001-12-30 22:13:13 +01:00
parent 795b859eee
commit 099d844292
31 changed files with 1465 additions and 626 deletions

88
CHANGES

@ -134,7 +134,7 @@ Several minor bugfixes and enhancements:
* Advance the version number to 1.0, so as to counteract the * Advance the version number to 1.0, so as to counteract the
(false-in-this-case) impression some people have that programs (false-in-this-case) impression some people have that programs
with version numbers less than 1.0 are in someway, experimental, with version numbers less than 1.0 are in some way, experimental,
pre-release versions. pre-release versions.
* Create an initial Makefile-libbz2_so to build a shared library. * Create an initial Makefile-libbz2_so to build a shared library.
@ -165,3 +165,89 @@ There are no functionality changes or bug fixes relative to version
1.0.0. This is just a documentation update + a fix for minor Win32 1.0.0. This is just a documentation update + a fix for minor Win32
build problems. For almost everyone, upgrading from 1.0.0 to 1.0.1 is build problems. For almost everyone, upgrading from 1.0.0 to 1.0.1 is
utterly pointless. Don't bother. utterly pointless. Don't bother.
1.0.2
~~~~~
A bug fix release, addressing various minor issues which have appeared
in the 18 or so months since 1.0.1 was released. Most of the fixes
are to do with file-handling or documentation bugs. To the best of my
knowledge, there have been no data-loss-causing bugs reported in the
compression/decompression engine of 1.0.0 or 1.0.1.
Note that this release does not improve the rather crude build system
for Unix platforms. The general plan here is to autoconfiscate/
libtoolise 1.0.2 soon after release, and release the result as 1.1.0
or perhaps 1.2.0. That, however, is still just a plan at this point.
Here are the changes in 1.0.2. Bug-reporters and/or patch-senders in
parentheses.
* Fix an infinite segfault loop in 1.0.1 when a directory is
encountered in -f (force) mode.
(Trond Eivind Glomsrod, Nicholas Nethercote, Volker Schmidt)
* Avoid double fclose() of output file on certain I/O error paths.
(Solar Designer)
* Don't fail with internal error 1007 when fed a long stream (> 48MB)
of byte 251. Also print useful message suggesting that 1007s may be
caused by bad memory.
(noticed by Juan Pedro Vallejo, fixed by me)
* Fix uninitialised variable silly bug in demo prog dlltest.c.
(Jorj Bauer)
* Remove 512-MB limitation on recovered file size for bzip2recover
on selected platforms which support 64-bit ints. At the moment
all GCC supported platforms, and Win32.
(me, Alson van der Meulen)
* Hard-code header byte values, to give correct operation on platforms
using EBCDIC as their native character set (IBM's OS/390).
(Leland Lucius)
* Copy file access times correctly.
(Marty Leisner)
* Add distclean and check targets to Makefile.
(Michael Carmack)
* Parameterise use of ar and ranlib in Makefile. Also add $(LDFLAGS).
(Rich Ireland, Bo Thorsen)
* Pass -p (create parent dirs as needed) to mkdir during make install.
(Jeremy Fusco)
* Dereference symlinks when copying file permissions in -f mode.
(Volker Schmidt)
* Majorly simplify implementation of uInt64_qrm10.
(Bo Lindbergh)
* Check the input file still exists before deleting the output one,
when aborting in cleanUpAndFail().
(Joerg Prante, Robert Linden, Matthias Krings)
Also a bunch of patches courtesy of Philippe Troin, the Debian maintainer
of bzip2:
* Wrapper scripts (with manpages): bzdiff, bzgrep, bzmore.
* Spelling changes and minor enhancements in bzip2.1.
* Avoid race condition between creating the output file and setting its
interim permissions safely, by using fopen_output_safely().
No changes to bzip2recover since there is no issue with file
permissions there.
* do not print senseless report with -v when compressing an empty
file.
* bzcat -f works on non-bzip2 files.
* do not try to escape shell meta-characters on unix (the shell takes
care of these).
* added --fast and --best aliases for -1 -9 for gzip compatibility.
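The last entry above adds --fast and --best as aliases for -1 and -9, and the bzip2.1 page in this same commit describes -1 to -9 as selecting block sizes of 100 k to 900 k. A minimal sketch of that mapping follows. The helper name block_size_k is hypothetical and is not part of bzip2; it only illustrates the arithmetic.

```shell
#!/bin/sh
# Hypothetical helper (not part of bzip2): map a level flag to the
# block size in units of k, following the "100 k, 200 k .. 900 k"
# description in bzip2.1. --fast and --best are treated as aliases
# for -1 and -9, as the CHANGES entry states.
block_size_k() {
    case "$1" in
        --fast) level=1 ;;
        --best) level=9 ;;
        -[1-9]) level=${1#-} ;;   # strip the leading dash: -5 becomes 5
        *) echo "unknown level flag: $1" >&2; return 1 ;;
    esac
    echo $((level * 100))
}

block_size_k --fast
block_size_k -5
block_size_k --best
```

As the manual page notes, --best merely selects the default behaviour, so it maps to the same value as -9.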

View File

@ -1,6 +1,6 @@
This program, "bzip2" and associated library "libbzip2", are This program, "bzip2" and associated library "libbzip2", are
copyright (C) 1996-2000 Julian R Seward. All rights reserved. copyright (C) 1996-2002 Julian R Seward. All rights reserved.
Redistribution and use in source and binary forms, with or without Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions modification, are permitted provided that the following conditions
@ -35,5 +35,5 @@ SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Julian Seward, Cambridge, UK. Julian Seward, Cambridge, UK.
jseward@acm.org jseward@acm.org
bzip2/libbzip2 version 1.0 of 21 March 2000 bzip2/libbzip2 version 1.0.2 of 30 December 2001

View File

@ -1,9 +1,20 @@
SHELL=/bin/sh SHELL=/bin/sh
# To assist in cross-compiling
CC=gcc CC=gcc
AR=ar
RANLIB=ranlib
LDFLAGS=
# Suitably paranoid flags to avoid bugs in gcc-2.7
BIGFILES=-D_FILE_OFFSET_BITS=64 BIGFILES=-D_FILE_OFFSET_BITS=64
CFLAGS=-Wall -Winline -O2 -fomit-frame-pointer -fno-strength-reduce $(BIGFILES) CFLAGS=-Wall -Winline -O2 -fomit-frame-pointer -fno-strength-reduce $(BIGFILES)
# Where you want it installed when you do 'make install'
PREFIX=/usr
OBJS= blocksort.o \ OBJS= blocksort.o \
huffman.o \ huffman.o \
crctable.o \ crctable.o \
@ -15,20 +26,21 @@ OBJS= blocksort.o \
all: libbz2.a bzip2 bzip2recover test all: libbz2.a bzip2 bzip2recover test
bzip2: libbz2.a bzip2.o bzip2: libbz2.a bzip2.o
$(CC) $(CFLAGS) -o bzip2 bzip2.o -L. -lbz2 $(CC) $(CFLAGS) $(LDFLAGS) -o bzip2 bzip2.o -L. -lbz2
bzip2recover: bzip2recover.o bzip2recover: bzip2recover.o
$(CC) $(CFLAGS) -o bzip2recover bzip2recover.o $(CC) $(CFLAGS) $(LDFLAGS) -o bzip2recover bzip2recover.o
libbz2.a: $(OBJS) libbz2.a: $(OBJS)
rm -f libbz2.a rm -f libbz2.a
ar cq libbz2.a $(OBJS) $(AR) cq libbz2.a $(OBJS)
@if ( test -f /usr/bin/ranlib -o -f /bin/ranlib -o \ @if ( test -f $(RANLIB) -o -f /usr/bin/ranlib -o \
-f /usr/ccs/bin/ranlib ) ; then \ -f /bin/ranlib -o -f /usr/ccs/bin/ranlib ) ; then \
echo ranlib libbz2.a ; \ echo $(RANLIB) libbz2.a ; \
ranlib libbz2.a ; \ $(RANLIB) libbz2.a ; \
fi fi
check: test
test: bzip2 test: bzip2
@cat words1 @cat words1
./bzip2 -1 < sample1.ref > sample1.rb2 ./bzip2 -1 < sample1.ref > sample1.rb2
@ -45,14 +57,12 @@ test: bzip2
cmp sample3.tst sample3.ref cmp sample3.tst sample3.ref
@cat words3 @cat words3
PREFIX=/usr
install: bzip2 bzip2recover install: bzip2 bzip2recover
if ( test ! -d $(PREFIX)/bin ) ; then mkdir $(PREFIX)/bin ; fi if ( test ! -d $(PREFIX)/bin ) ; then mkdir -p $(PREFIX)/bin ; fi
if ( test ! -d $(PREFIX)/lib ) ; then mkdir $(PREFIX)/lib ; fi if ( test ! -d $(PREFIX)/lib ) ; then mkdir -p $(PREFIX)/lib ; fi
if ( test ! -d $(PREFIX)/man ) ; then mkdir $(PREFIX)/man ; fi if ( test ! -d $(PREFIX)/man ) ; then mkdir -p $(PREFIX)/man ; fi
if ( test ! -d $(PREFIX)/man/man1 ) ; then mkdir $(PREFIX)/man/man1 ; fi if ( test ! -d $(PREFIX)/man/man1 ) ; then mkdir -p $(PREFIX)/man/man1 ; fi
if ( test ! -d $(PREFIX)/include ) ; then mkdir $(PREFIX)/include ; fi if ( test ! -d $(PREFIX)/include ) ; then mkdir -p $(PREFIX)/include ; fi
cp -f bzip2 $(PREFIX)/bin/bzip2 cp -f bzip2 $(PREFIX)/bin/bzip2
cp -f bzip2 $(PREFIX)/bin/bunzip2 cp -f bzip2 $(PREFIX)/bin/bunzip2
cp -f bzip2 $(PREFIX)/bin/bzcat cp -f bzip2 $(PREFIX)/bin/bzcat
@ -67,7 +77,26 @@ install: bzip2 bzip2recover
chmod a+r $(PREFIX)/include/bzlib.h chmod a+r $(PREFIX)/include/bzlib.h
cp -f libbz2.a $(PREFIX)/lib cp -f libbz2.a $(PREFIX)/lib
chmod a+r $(PREFIX)/lib/libbz2.a chmod a+r $(PREFIX)/lib/libbz2.a
cp -f bzgrep $(PREFIX)/bin/bzgrep
ln $(PREFIX)/bin/bzgrep $(PREFIX)/bin/bzegrep
ln $(PREFIX)/bin/bzgrep $(PREFIX)/bin/bzfgrep
chmod a+x $(PREFIX)/bin/bzgrep
cp -f bzmore $(PREFIX)/bin/bzmore
ln $(PREFIX)/bin/bzmore $(PREFIX)/bin/bzless
chmod a+x $(PREFIX)/bin/bzmore
cp -f bzdiff $(PREFIX)/bin/bzdiff
ln $(PREFIX)/bin/bzdiff $(PREFIX)/bin/bzcmp
chmod a+x $(PREFIX)/bin/bzdiff
cp -f bzgrep.1 bzmore.1 bzdiff.1 $(PREFIX)/man/man1
chmod a+r $(PREFIX)/man/man1/bzgrep.1
chmod a+r $(PREFIX)/man/man1/bzmore.1
chmod a+r $(PREFIX)/man/man1/bzdiff.1
echo ".so man1/bzgrep.1" > $(PREFIX)/man/man1/bzegrep.1
echo ".so man1/bzgrep.1" > $(PREFIX)/man/man1/bzfgrep.1
echo ".so man1/bzmore.1" > $(PREFIX)/man/man1/bzless.1
echo ".so man1/bzdiff.1" > $(PREFIX)/man/man1/bzcmp.1
distclean: clean
clean: clean:
rm -f *.o libbz2.a bzip2 bzip2recover \ rm -f *.o libbz2.a bzip2 bzip2recover \
sample1.rb2 sample2.rb2 sample3.rb2 \ sample1.rb2 sample2.rb2 sample3.rb2 \
@ -93,7 +122,7 @@ bzip2.o: bzip2.c
bzip2recover.o: bzip2recover.c bzip2recover.o: bzip2recover.c
$(CC) $(CFLAGS) -c bzip2recover.c $(CC) $(CFLAGS) -c bzip2recover.c
DISTNAME=bzip2-1.0.1 DISTNAME=bzip2-1.0.2
tarfile: tarfile:
rm -f $(DISTNAME) rm -f $(DISTNAME)
ln -sf . $(DISTNAME) ln -sf . $(DISTNAME)
@ -112,6 +141,7 @@ tarfile:
$(DISTNAME)/Makefile \ $(DISTNAME)/Makefile \
$(DISTNAME)/manual.texi \ $(DISTNAME)/manual.texi \
$(DISTNAME)/manual.ps \ $(DISTNAME)/manual.ps \
$(DISTNAME)/manual.pdf \
$(DISTNAME)/LICENSE \ $(DISTNAME)/LICENSE \
$(DISTNAME)/bzip2.1 \ $(DISTNAME)/bzip2.1 \
$(DISTNAME)/bzip2.1.preformatted \ $(DISTNAME)/bzip2.1.preformatted \
@ -138,4 +168,25 @@ tarfile:
$(DISTNAME)/Y2K_INFO \ $(DISTNAME)/Y2K_INFO \
$(DISTNAME)/unzcrash.c \ $(DISTNAME)/unzcrash.c \
$(DISTNAME)/spewG.c \ $(DISTNAME)/spewG.c \
$(DISTNAME)/mk251.c \
$(DISTNAME)/bzdiff \
$(DISTNAME)/bzdiff.1 \
$(DISTNAME)/bzmore \
$(DISTNAME)/bzmore.1 \
$(DISTNAME)/bzgrep \
$(DISTNAME)/bzgrep.1 \
$(DISTNAME)/Makefile-libbz2_so $(DISTNAME)/Makefile-libbz2_so
gzip -v $(DISTNAME).tar
# For rebuilding the manual from sources on my RedHat 7.2 box
manual: manual.ps manual.pdf manual.html
manual.ps: manual.texi
tex manual.texi
dvips -o manual.ps manual.dvi
manual.pdf: manual.ps
ps2pdf manual.ps
manual.html: manual.texi
texi2html -split_chapter manual.texi
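With CC, AR, RANLIB and LDFLAGS now Makefile variables, a cross build can override them on the make command line. The sketch below only prints such a command rather than running it. The toolchain prefix arm-linux- and the helper name cross_make_args are assumptions chosen for illustration, and -static is just one example of a linker flag that could be passed through LDFLAGS.

```shell
#!/bin/sh
# Hypothetical helper: build the variable overrides for a cross toolchain
# whose tools share a common prefix (the prefix itself is an assumption).
cross_make_args() {
    prefix=$1
    printf 'CC=%sgcc AR=%sar RANLIB=%sranlib\n' "$prefix" "$prefix" "$prefix"
}

# Print the command instead of executing it. The explicit targets avoid
# "all", which depends on "test" and would try to run the freshly built
# ./bzip2 on the build host.
echo "make $(cross_make_args arm-linux-) LDFLAGS=-static libbz2.a bzip2"
```

Variables given on the make command line take precedence over the assignments at the top of the Makefile, which is why no edit to the file is needed.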

View File

@ -1,8 +1,9 @@
# This Makefile builds a shared version of the library, # This Makefile builds a shared version of the library,
# libbz2.so.1.0.1, with soname libbz2.so.1.0, # libbz2.so.1.0.2, with soname libbz2.so.1.0,
# at least on x86-Linux (RedHat 5.2), # at least on x86-Linux (RedHat 7.2),
# with gcc-2.7.2.3. Please see the README file for some # with gcc-2.96 20000731 (Red Hat Linux 7.1 2.96-98).
# Please see the README file for some
# important info about building the library like this. # important info about building the library like this.
SHELL=/bin/sh SHELL=/bin/sh
@ -19,13 +20,13 @@ OBJS= blocksort.o \
bzlib.o bzlib.o
all: $(OBJS) all: $(OBJS)
$(CC) -shared -Wl,-soname -Wl,libbz2.so.1.0 -o libbz2.so.1.0.1 $(OBJS) $(CC) -shared -Wl,-soname -Wl,libbz2.so.1.0 -o libbz2.so.1.0.2 $(OBJS)
$(CC) $(CFLAGS) -o bzip2-shared bzip2.c libbz2.so.1.0.1 $(CC) $(CFLAGS) -o bzip2-shared bzip2.c libbz2.so.1.0.2
rm -f libbz2.so.1.0 rm -f libbz2.so.1.0
ln -s libbz2.so.1.0.1 libbz2.so.1.0 ln -s libbz2.so.1.0.2 libbz2.so.1.0
clean: clean:
rm -f $(OBJS) bzip2.o libbz2.so.1.0.1 libbz2.so.1.0 bzip2-shared rm -f $(OBJS) bzip2.o libbz2.so.1.0.2 libbz2.so.1.0 bzip2-shared
blocksort.o: blocksort.c blocksort.o: blocksort.c
$(CC) $(CFLAGS) -c blocksort.c $(CC) $(CFLAGS) -c blocksort.c

89
README

@ -1,15 +1,15 @@
This is the README for bzip2, a block-sorting file compressor, version This is the README for bzip2, a block-sorting file compressor, version
1.0. This version is fully compatible with the previous public 1.0.2. This version is fully compatible with the previous public
releases, bzip2-0.1pl2, bzip2-0.9.0 and bzip2-0.9.5. releases, versions 0.1pl2, 0.9.0, 0.9.5, 1.0.0 and 1.0.1.
bzip2-1.0 is distributed under a BSD-style license. For details, bzip2-1.0.2 is distributed under a BSD-style license. For details,
see the file LICENSE. see the file LICENSE.
Complete documentation is available in Postscript form (manual.ps) or Complete documentation is available in Postscript form (manual.ps),
html (manual_toc.html). A plain-text version of the manual page is PDF (manual.pdf, amazingly enough) or html (manual_toc.html). A
available as bzip2.txt. A statement about Y2K issues is now included plain-text version of the manual page is available as bzip2.txt.
in the file Y2K_INFO. A statement about Y2K issues is now included in the file Y2K_INFO.
HOW TO BUILD -- UNIX HOW TO BUILD -- UNIX
@ -33,34 +33,41 @@ not actually execute them.
HOW TO BUILD -- UNIX, shared library libbz2.so. HOW TO BUILD -- UNIX, shared library libbz2.so.
Do 'make -f Makefile-libbz2_so'. This Makefile seems to work for Do 'make -f Makefile-libbz2_so'. This Makefile seems to work for
Linux-ELF (RedHat 5.2 on an x86 box), with gcc. I make no claims Linux-ELF (RedHat 7.2 on an x86 box), with gcc. I make no claims
that it works for any other platform, though I suspect it probably that it works for any other platform, though I suspect it probably
will work for most platforms employing both ELF and gcc. will work for most platforms employing both ELF and gcc.
bzip2-shared, a client of the shared library, is also build, but bzip2-shared, a client of the shared library, is also built, but not
not self-tested. So I suggest you also build using the normal self-tested. So I suggest you also build using the normal Makefile,
Makefile, since that conducts a self-test. since that conducts a self-test. A second reason to prefer the
version statically linked to the library is that, on x86 platforms,
building shared objects makes a valuable register (%ebx) unavailable
to gcc, resulting in a slowdown of 10%-20%, at least for bzip2.
Important note for people upgrading .so's from 0.9.0/0.9.5 to Important note for people upgrading .so's from 0.9.0/0.9.5 to version
version 1.0. All the functions in the library have been renamed, 1.0.X. All the functions in the library have been renamed, from (eg)
from (eg) bzCompress to BZ2_bzCompress, to avoid namespace pollution. bzCompress to BZ2_bzCompress, to avoid namespace pollution.
Unfortunately this means that the libbz2.so created by Unfortunately this means that the libbz2.so created by
Makefile-libbz2_so will not work with any program which used an Makefile-libbz2_so will not work with any program which used an older
older version of the library. Sorry. I do encourage library version of the library. Sorry. I do encourage library clients to
clients to make the effort to upgrade to use version 1.0, since make the effort to upgrade to use version 1.0, since it is both faster
it is both faster and more robust than previous versions. and more robust than previous versions.
HOW TO BUILD -- Windows 95, NT, DOS, Mac, etc. HOW TO BUILD -- Windows 95, NT, DOS, Mac, etc.
It's difficult for me to support compilation on all these platforms. It's difficult for me to support compilation on all these platforms.
My approach is to collect binaries for these platforms, and put them My approach is to collect binaries for these platforms, and put them
on the master web page (http://sourceware.cygnus.com/bzip2). Look on the master web page (http://sources.redhat.com/bzip2). Look there.
there. However (FWIW), bzip2-1.0 is very standard ANSI C and should However (FWIW), bzip2-1.0.X is very standard ANSI C and should compile
compile unmodified with MS Visual C. For Win32, there is one unmodified with MS Visual C. If you have difficulties building, you
important caveat: in bzip2.c, you must set BZ_UNIX to 0 and might want to read README.COMPILATION.PROBLEMS.
BZ_LCCWIN32 to 1 before building. If you have difficulties building,
you might want to read README.COMPILATION.PROBLEMS. At least using MS Visual C++ 6, you can build from the unmodified
sources by issuing, in a command shell:
nmake -f makefile.msc
(you may need to first run the MSVC-provided script VCVARS32.BAT
so as to set up paths to the MSVC tools correctly).
VALIDATION VALIDATION
@ -138,29 +145,37 @@ WHAT'S NEW IN 0.9.5 ?
* Many small improvements in file and flag handling. * Many small improvements in file and flag handling.
* A Y2K statement. * A Y2K statement.
WHAT'S NEW IN 1.0 WHAT'S NEW IN 1.0.0 ?
See the CHANGES file. See the CHANGES file.
WHAT'S NEW IN 1.0.2 ?
See the CHANGES file.
I hope you find bzip2 useful. Feel free to contact me at I hope you find bzip2 useful. Feel free to contact me at
jseward@acm.org jseward@acm.org
if you have any suggestions or queries. Many people mailed me with if you have any suggestions or queries. Many people mailed me with
comments, suggestions and patches after the releases of bzip-0.15, comments, suggestions and patches after the releases of bzip-0.15,
bzip-0.21, bzip2-0.1pl2 and bzip2-0.9.0, and the changes in bzip2 are bzip-0.21, and bzip2 versions 0.1pl2, 0.9.0, 0.9.5, 1.0.0 and 1.0.1,
largely a result of this feedback. I thank you for your comments. and the changes in bzip2 are largely a result of this feedback.
I thank you for your comments.
At least for the time being, bzip2's "home" is (or can be reached via) At least for the time being, bzip2's "home" is (or can be reached via)
http://www.muraroa.demon.co.uk. http://sources.redhat.com/bzip2.
Julian Seward Julian Seward
jseward@acm.org jseward@acm.org
Cambridge, UK Cambridge, UK (and what a great town this is!)
18 July 1996 (version 0.15)
25 August 1996 (version 0.21) 18 July 1996 (version 0.15)
7 August 1997 (bzip2, version 0.1) 25 August 1996 (version 0.21)
29 August 1997 (bzip2, version 0.1pl2) 7 August 1997 (bzip2, version 0.1)
23 August 1998 (bzip2, version 0.9.0) 29 August 1997 (bzip2, version 0.1pl2)
8 June 1999 (bzip2, version 0.9.5) 23 August 1998 (bzip2, version 0.9.0)
4 Sept 1999 (bzip2, version 0.9.5d) 8 June 1999 (bzip2, version 0.9.5)
5 May 2000 (bzip2, version 1.0pre8) 4 Sept 1999 (bzip2, version 0.9.5d)
5 May 2000 (bzip2, version 1.0pre8)
30 December 2001 (bzip2, version 1.0.2pre1)

View File

@ -117,11 +117,11 @@ Known problems as of 1.0pre8:
All that said: you might be able to get somewhere All that said: you might be able to get somewhere
by finding the line in Makefile-libbz2_so which says by finding the line in Makefile-libbz2_so which says
$(CC) -shared -Wl,-soname -Wl,libbz2.so.1.0 -o libbz2.so.1.0.1 $(OBJS) $(CC) -shared -Wl,-soname -Wl,libbz2.so.1.0 -o libbz2.so.1.0.2 $(OBJS)
and replacing with and replacing with
($CC) -G -shared -o libbz2.so.1.0.1 -h libbz2.so.1.0 $(OBJS) $(CC) -G -shared -o libbz2.so.1.0.2 -h libbz2.so.1.0 $(OBJS)
If gcc objects to the combination -fpic -fPIC, get rid of If gcc objects to the combination -fpic -fPIC, get rid of
the second one, leaving just "-fpic". the second one, leaving just "-fpic".

View File

@ -8,7 +8,7 @@
This file is a part of bzip2 and/or libbzip2, a program and This file is a part of bzip2 and/or libbzip2, a program and
library for lossless, block-sorting data compression. library for lossless, block-sorting data compression.
Copyright (C) 1996-2000 Julian R Seward. All rights reserved. Copyright (C) 1996-2002 Julian R Seward. All rights reserved.
Redistribution and use in source and binary forms, with or without Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions modification, are permitted provided that the following conditions
@ -981,7 +981,14 @@ void mainSort ( UInt32* ptr,
} }
} }
AssertH ( copyStart[ss]-1 == copyEnd[ss], 1007 ); AssertH ( (copyStart[ss]-1 == copyEnd[ss])
||
/* Extremely rare case missing in bzip2-1.0.0 and 1.0.1.
Necessity for this case is demonstrated by compressing
a sequence of approximately 48.5 million of character
251; 1.0.0/1.0.1 will then die here. */
(copyStart[ss] == 0 && copyEnd[ss] == nblock-1),
1007 )
for (j = 0; j <= 255; j++) ftab[(j << 8) + ss] |= SETMASK; for (j = 0; j <= 255; j++) ftab[(j << 8) + ss] |= SETMASK;
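The comment in the new assertion says the missing case shows up when compressing roughly 48.5 million copies of character 251. The commit ships mk251.c for generating such input; the shell function below is an independent sketch of the same idea, not a copy of that program. It assumes a head that accepts -c (common, but not guaranteed by POSIX) and a readable /dev/zero.

```shell
#!/bin/sh
# Hypothetical helper: emit N bytes, each with value 251 (octal 373).
# LC_ALL=C keeps tr working on raw bytes regardless of locale.
mk251() {
    head -c "$1" /dev/zero | LC_ALL=C tr '\000' '\373'
}

# Small demonstration: dump 8 generated bytes as unsigned decimals.
mk251 8 | od -An -tu1

# To exercise the assertion path one would feed a long stream to bzip2,
# for example (byte count is an approximation of the figure in the comment):
#   mk251 48500000 | bzip2 -c > /dev/null
```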

76
bzdiff Normal file

@ -0,0 +1,76 @@
#!/bin/sh
# sh is buggy on RS/6000 AIX 3.2. Replace above line with #!/bin/ksh
# Bzcmp/diff wrapped for bzip2,
# adapted from zdiff by Philippe Troin <phil@fifi.org> for Debian GNU/Linux.
# Bzcmp and bzdiff are used to invoke the cmp or the diff pro-
# gram on compressed files. All options specified are passed
# directly to cmp or diff. If only 1 file is specified, then
# the files compared are file1 and an uncompressed file1.gz.
# If two files are specified, then they are uncompressed (if
# necessary) and fed to cmp or diff. The exit status from cmp
# or diff is preserved.
PATH="/usr/bin:$PATH"; export PATH
prog=`echo $0 | sed 's|.*/||'`
case "$prog" in
*cmp) comp=${CMP-cmp} ;;
*) comp=${DIFF-diff} ;;
esac
OPTIONS=
FILES=
for ARG
do
case "$ARG" in
-*) OPTIONS="$OPTIONS $ARG";;
*) if test -f "$ARG"; then
FILES="$FILES $ARG"
else
echo "${prog}: $ARG not found or not a regular file"
exit 1
fi ;;
esac
done
if test -z "$FILES"; then
echo "Usage: $prog [${comp}_options] file [file]"
exit 1
fi
tmp=`tempfile -d /tmp -p bz` || {
echo 'cannot create a temporary file' >&2
exit 1
}
set $FILES
if test $# -eq 1; then
FILE=`echo "$1" | sed 's/.bz2$//'`
bzip2 -cd "$FILE.bz2" | $comp $OPTIONS - "$FILE"
STAT="$?"
elif test $# -eq 2; then
case "$1" in
*.bz2)
case "$2" in
*.bz2)
F=`echo "$2" | sed 's|.*/||;s|.bz2$||'`
bzip2 -cdfq "$2" > $tmp
bzip2 -cdfq "$1" | $comp $OPTIONS - $tmp
STAT="$?"
/bin/rm -f $tmp;;
*) bzip2 -cdfq "$1" | $comp $OPTIONS - "$2"
STAT="$?";;
esac;;
*) case "$2" in
*.bz2)
bzip2 -cdfq "$2" | $comp $OPTIONS "$1" -
STAT="$?";;
*) $comp $OPTIONS "$1" "$2"
STAT="$?";;
esac;;
esac
exit "$STAT"
else
echo "Usage: $prog [${comp}_options] file [file]"
exit 1
fi
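In the single-file branch above, the suffix is stripped with sed 's/.bz2$//'. In a regular expression an unescaped dot matches any character, so that expression also trims names that merely end in "bz2" after some other character. A stricter alternative, shown purely as a sketch with a hypothetical function name, is POSIX suffix removal, which treats ".bz2" literally:

```shell
#!/bin/sh
# Hypothetical helper: remove a literal ".bz2" suffix, if present.
strip_bz2() {
    printf '%s\n' "${1%.bz2}"
}

strip_bz2 data.tar.bz2                 # prints: data.tar
strip_bz2 notesXbz2                    # prints: notesXbz2 (unchanged)

# For comparison, the sed expression from the script:
echo notesXbz2 | sed 's/.bz2$//'       # prints: notes
```

Escaping the dot in the sed expression (s/\.bz2$//) would have the same effect as the parameter expansion.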

47
bzdiff.1 Normal file

@ -0,0 +1,47 @@
\"Shamelessly copied from zmore.1 by Philippe Troin <phil@fifi.org>
\"for Debian GNU/Linux
.TH BZDIFF 1
.SH NAME
bzcmp, bzdiff \- compare bzip2 compressed files
.SH SYNOPSIS
.B bzcmp
[ cmp_options ] file1
[ file2 ]
.br
.B bzdiff
[ diff_options ] file1
[ file2 ]
.SH DESCRIPTION
.I Bzcmp
and
.I bzdiff
are used to invoke the
.I cmp
or the
.I diff
program on bzip2 compressed files. All options specified are passed
directly to
.I cmp
or
.IR diff "."
If only 1 file is specified, then the files compared are
.I file1
and an uncompressed
.IR file1 ".bz2."
If two files are specified, then they are uncompressed if necessary and fed to
.I cmp
or
.IR diff "."
The exit status from
.I cmp
or
.I diff
is preserved.
.SH "SEE ALSO"
cmp(1), diff(1), bzmore(1), bzless(1), bzgrep(1), bzip2(1)
.SH BUGS
Messages from the
.I cmp
or
.I diff
programs refer to temporary filenames instead of those specified.

71
bzgrep Normal file

@ -0,0 +1,71 @@
#!/bin/sh
# Bzgrep wrapped for bzip2,
# adapted from zgrep by Philippe Troin <phil@fifi.org> for Debian GNU/Linux.
## zgrep notice:
## zgrep -- a wrapper around a grep program that decompresses files as needed
## Adapted from a version sent by Charles Levert <charles@comm.polymtl.ca>
PATH="/usr/bin:$PATH"; export PATH
prog=`echo $0 | sed 's|.*/||'`
case "$prog" in
*egrep) grep=${EGREP-egrep} ;;
*fgrep) grep=${FGREP-fgrep} ;;
*) grep=${GREP-grep} ;;
esac
pat=""
while test $# -ne 0; do
case "$1" in
-e | -f) opt="$opt $1"; shift; pat="$1"
if test "$grep" = grep; then # grep is buggy with -e on SVR4
grep=egrep
fi;;
-A | -B) opt="$opt $1 $2"; shift;;
-*) opt="$opt $1";;
*) if test -z "$pat"; then
pat="$1"
else
break;
fi;;
esac
shift
done
if test -z "$pat"; then
echo "grep through bzip2 files"
echo "usage: $prog [grep_options] pattern [files]"
exit 1
fi
list=0
silent=0
op=`echo "$opt" | sed -e 's/ //g' -e 's/-//g'`
case "$op" in
*l*) list=1
esac
case "$op" in
*h*) silent=1
esac
if test $# -eq 0; then
bzip2 -cdfq | $grep $opt "$pat"
exit $?
fi
res=0
for i do
if test -f "$i"; then :; else if test -f "$i.bz2"; then i="$i.bz2"; fi; fi
if test $list -eq 1; then
bzip2 -cdfq "$i" | $grep $opt "$pat" 2>&1 > /dev/null && echo $i
r=$?
elif test $# -eq 1 -o $silent -eq 1; then
bzip2 -cdfq "$i" | $grep $opt "$pat"
r=$?
else
bzip2 -cdfq "$i" | $grep $opt "$pat" | sed "s|^|${i}:|"
r=$?
fi
test "$r" -ne 0 && res="$r"
done
exit $res
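The script chooses which grep to run from the name it was invoked under, with the GREP, EGREP and FGREP environment variables as overrides. That dispatch can be isolated into a small function for experimentation. The name pick_grep is hypothetical; the case patterns and defaults mirror the ones in the script above.

```shell
#!/bin/sh
# Hypothetical helper mirroring bzgrep's dispatch on the invocation name.
pick_grep() {
    prog=${1##*/}                      # basename, like the sed 's|.*/||' above
    case "$prog" in
        *egrep) echo "${EGREP-egrep}" ;;
        *fgrep) echo "${FGREP-fgrep}" ;;
        *)      echo "${GREP-grep}" ;;
    esac
}

pick_grep /usr/bin/bzegrep
pick_grep bzfgrep
pick_grep ./bzgrep
```

Setting GREP=fgrep before calling pick_grep bzgrep selects fgrep, which matches the "GREP=fgrep bzgrep string files" example in bzgrep.1.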

56
bzgrep.1 Normal file

@ -0,0 +1,56 @@
\"Shamelessly copied from zmore.1 by Philippe Troin <phil@fifi.org>
\"for Debian GNU/Linux
.TH BZGREP 1
.SH NAME
bzgrep, bzfgrep, bzegrep \- search possibly bzip2 compressed files for a regular expression
.SH SYNOPSIS
.B bzgrep
[ grep_options ]
.BI [\ -e\ ] " pattern"
.IR filename ".\|.\|."
.br
.B bzegrep
[ egrep_options ]
.BI [\ -e\ ] " pattern"
.IR filename ".\|.\|."
.br
.B bzfgrep
[ fgrep_options ]
.BI [\ -e\ ] " pattern"
.IR filename ".\|.\|."
.SH DESCRIPTION
.IR Bzgrep
is used to invoke the
.I grep
on bzip2-compressed files. All options specified are passed directly to
.I grep.
If no file is specified, then the standard input is decompressed
if necessary and fed to grep.
Otherwise the given files are uncompressed if necessary and fed to
.I grep.
.PP
If
.I bzgrep
is invoked as
.I bzegrep
or
.I bzfgrep
then
.I egrep
or
.I fgrep
is used instead of
.I grep.
If the GREP environment variable is set,
.I bzgrep
uses it as the
.I grep
program to be invoked. For example:
for sh: GREP=fgrep bzgrep string files
for csh: (setenv GREP fgrep; bzgrep string files)
.SH AUTHOR
Charles Levert (charles@comm.polymtl.ca). Adapted to bzip2 by Philippe
Troin <phil@fifi.org> for Debian GNU/Linux.
.SH "SEE ALSO"
grep(1), egrep(1), fgrep(1), bzdiff(1), bzmore(1), bzless(1), bzip2(1)

56
bzip2.1

@ -1,7 +1,7 @@
.PU .PU
.TH bzip2 1 .TH bzip2 1
.SH NAME .SH NAME
bzip2, bunzip2 \- a block-sorting file compressor, v1.0 bzip2, bunzip2 \- a block-sorting file compressor, v1.0.2
.br .br
bzcat \- decompresses files to stdout bzcat \- decompresses files to stdout
.br .br
@ -197,7 +197,7 @@ to decompress.
.TP .TP
.B \-z --compress .B \-z --compress
The complement to \-d: forces compression, regardless of the The complement to \-d: forces compression, regardless of the
invokation name. invocation name.
.TP .TP
.B \-t --test .B \-t --test
Check integrity of the specified file(s), but don't decompress them. Check integrity of the specified file(s), but don't decompress them.
@ -211,6 +211,10 @@ existing output files. Also forces
.I bzip2 .I bzip2
to break hard links to break hard links
to files, which it otherwise wouldn't do. to files, which it otherwise wouldn't do.
bzip2 normally declines to decompress files which don't have the
correct magic header bytes. If forced (-f), however, it will pass
such files through unmodified. This is how GNU gzip behaves.
.TP .TP
.B \-k --keep .B \-k --keep
Keep (don't delete) input files during compression Keep (don't delete) input files during compression
@ -239,9 +243,13 @@ information which is primarily of interest for diagnostic purposes.
.B \-L --license -V --version .B \-L --license -V --version
Display the software version, license terms and conditions. Display the software version, license terms and conditions.
.TP .TP
.B \-1 to \-9 .B \-1 (or \-\-fast) to \-9 (or \-\-best)
Set the block size to 100 k, 200 k .. 900 k when compressing. Has no Set the block size to 100 k, 200 k .. 900 k when compressing. Has no
effect when decompressing. See MEMORY MANAGEMENT below. effect when decompressing. See MEMORY MANAGEMENT below.
The \-\-fast and \-\-best aliases are primarily for GNU gzip
compatibility. In particular, \-\-fast doesn't make things
significantly faster.
And \-\-best merely selects the default behaviour.
.TP .TP
.B \-- .B \--
Treats all subsequent arguments as file names, even if they start Treats all subsequent arguments as file names, even if they start
@ -352,11 +360,11 @@ undamaged.
.I bzip2recover .I bzip2recover
takes a single argument, the name of the damaged file, takes a single argument, the name of the damaged file,
and writes a number of files "rec0001file.bz2", and writes a number of files "rec00001file.bz2",
"rec0002file.bz2", etc, containing the extracted blocks. "rec00002file.bz2", etc, containing the extracted blocks.
The output filenames are designed so that the use of The output filenames are designed so that the use of
wildcards in subsequent processing -- for example, wildcards in subsequent processing -- for example,
"bzip2 -dc rec*file.bz2 > recovered_data" -- lists the files in "bzip2 -dc rec*file.bz2 > recovered_data" -- processes the files in
the correct order. the correct order.
.I bzip2recover .I bzip2recover
@ -397,27 +405,31 @@ I/O error messages are not as helpful as they could be.
tries hard to detect I/O errors and exit cleanly, but the details of tries hard to detect I/O errors and exit cleanly, but the details of
what the problem is sometimes seem rather misleading. what the problem is sometimes seem rather misleading.
This manual page pertains to version 1.0 of This manual page pertains to version 1.0.2 of
.I bzip2. .I bzip2.
Compressed Compressed data created by this version is entirely forwards and
data created by this version is entirely forwards and backwards backwards compatible with the previous public releases, versions
compatible with the previous public releases, versions 0.1pl2, 0.9.0 0.1pl2, 0.9.0, 0.9.5, 1.0.0 and 1.0.1, but with the following
and 0.9.5, exception: 0.9.0 and above can correctly decompress multiple
but with the following exception: 0.9.0 and above can correctly concatenated compressed files. 0.1pl2 cannot do this; it will stop
decompress multiple concatenated compressed files. 0.1pl2 cannot do after decompressing just the first file in the stream.
this; it will stop after decompressing just the first file in the
stream.
.I bzip2recover .I bzip2recover
uses 32-bit integers to represent bit positions in versions prior to this one, 1.0.2, used 32-bit integers to represent
compressed files, so it cannot handle compressed files more than 512 bit positions in compressed files, so it could not handle compressed
megabytes long. This could easily be fixed. files more than 512 megabytes long. Version 1.0.2 and above uses
64-bit ints on some platforms which support them (GNU supported
targets, and Windows). To establish whether or not bzip2recover was
built with such a limitation, run it without arguments. In any event
you can build yourself an unlimited version if you can recompile it
with MaybeUInt64 set to be an unsigned 64-bit integer.
.SH AUTHOR .SH AUTHOR
Julian Seward, jseward@acm.org. Julian Seward, jseward@acm.org.
http://sourceware.cygnus.com/bzip2 http://sources.redhat.com/bzip2
http://www.muraroa.demon.co.uk
The ideas embodied in The ideas embodied in
.I bzip2 .I bzip2
@ -434,6 +446,8 @@ indebted for their help, support and advice. See the manual in the
source distribution for pointers to sources of documentation. Christian source distribution for pointers to sources of documentation. Christian
von Roques encouraged me to look for faster sorting algorithms, so as to von Roques encouraged me to look for faster sorting algorithms, so as to
speed up compression. Bela Lubkin encouraged me to improve the speed up compression. Bela Lubkin encouraged me to improve the
worst-case compression performance. Many people sent patches, helped worst-case compression performance.
The bz* scripts are derived from those of GNU gzip.
Many people sent patches, helped
with portability problems, lent machines, gave advice and were generally with portability problems, lent machines, gave advice and were generally
helpful. helpful.
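The 512-megabyte figure in the bzip2recover paragraph of this manual page follows from the width of the bit-position counter: a 32-bit unsigned integer can address 2^32 bit positions, and dividing by 8 bits per byte gives 512 MiB. The sketch below reproduces that arithmetic. It assumes the shell evaluates $(( )) with integers wider than 32 bits, which holds for common shells on 64-bit systems, and the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: largest file size, in MiB, whose bit positions
# fit in an unsigned counter of the given width.
max_mib_for_bits() {
    echo $(( (1 << $1) / 8 / 1024 / 1024 ))
}

max_mib_for_bits 32    # prints: 512
```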

View File

@ -1,11 +1,9 @@
bzip2(1) bzip2(1) bzip2(1) bzip2(1)
NNAAMMEE NNAAMMEE
bzip2, bunzip2 - a block-sorting file compressor, v1.0 bzip2, bunzip2 - a block-sorting file compressor, v1.0.2
bzcat - decompresses files to stdout bzcat - decompresses files to stdout
bzip2recover - recovers data from damaged bzip2 files bzip2recover - recovers data from damaged bzip2 files
@ -22,20 +20,20 @@ DDEESSCCRRIIPPTTIIOONN
sorting text compression algorithm, and Huffman coding. sorting text compression algorithm, and Huffman coding.
Compression is generally considerably better than that Compression is generally considerably better than that
achieved by more conventional LZ77/LZ78-based compressors, achieved by more conventional LZ77/LZ78-based compressors,
and approaches the performance of the PPM family of sta- and approaches the performance of the PPM family of sta­
tistical compressors. tistical compressors.
The command-line options are deliberately very similar to The command-line options are deliberately very similar to
those of _G_N_U _g_z_i_p_, but they are not identical. those of _G_N_U _g_z_i_p_, but they are not identical.
_b_z_i_p_2 expects a list of file names to accompany the com- _b_z_i_p_2 expects a list of file names to accompany the com­
mand-line flags. Each file is replaced by a compressed mand-line flags. Each file is replaced by a compressed
version of itself, with the name "original_name.bz2". version of itself, with the name "original_name.bz2".
Each compressed file has the same modification date, per- Each compressed file has the same modification date, per­
missions, and, when possible, ownership as the correspond- missions, and, when possible, ownership as the correspond­
ing original, so that these properties can be correctly ing original, so that these properties can be correctly
restored at decompression time. File name handling is restored at decompression time. File name handling is
naive in the sense that there is no mechanism for preserv- naive in the sense that there is no mechanism for preserv­
ing original file names, permissions, ownerships or dates ing original file names, permissions, ownerships or dates
in filesystems which lack these concepts, or have serious in filesystems which lack these concepts, or have serious
file name length restrictions, such as MS-DOS. file name length restrictions, such as MS-DOS.
@ -58,18 +56,6 @@ DDEESSCCRRIIPPTTIIOONN
filename.bz2 becomes filename filename.bz2 becomes filename
filename.bz becomes filename filename.bz becomes filename
filename.tbz2 becomes filename.tar filename.tbz2 becomes filename.tar
1
bzip2(1) bzip2(1)
filename.tbz becomes filename.tar filename.tbz becomes filename.tar
anyothername becomes anyothername.out anyothername becomes anyothername.out
@ -78,23 +64,23 @@ bzip2(1) bzip2(1)
guess the name of the original file, and uses the original guess the name of the original file, and uses the original
name with _._o_u_t appended. name with _._o_u_t appended.
As with compression, supplying no filenames causes decom- As with compression, supplying no filenames causes decom­
pression from standard input to standard output. pression from standard input to standard output.
_b_u_n_z_i_p_2 will correctly decompress a file which is the con- _b_u_n_z_i_p_2 will correctly decompress a file which is the con­
catenation of two or more compressed files. The result is catenation of two or more compressed files. The result is
the concatenation of the corresponding uncompressed files. the concatenation of the corresponding uncompressed files.
Integrity testing (-t) of concatenated compressed files is Integrity testing (-t) of concatenated compressed files is
also supported. also supported.
You can also compress or decompress files to the standard You can also compress or decompress files to the standard
output by giving the -c flag. Multiple files may be com- output by giving the -c flag. Multiple files may be com­
pressed and decompressed like this. The resulting outputs pressed and decompressed like this. The resulting outputs
are fed sequentially to stdout. Compression of multiple are fed sequentially to stdout. Compression of multiple
files in this manner generates a stream containing multi- files in this manner generates a stream containing multi­
ple compressed file representations. Such a stream can be ple compressed file representations. Such a stream can be
decompressed correctly only by _b_z_i_p_2 version 0.9.0 or decompressed correctly only by _b_z_i_p_2 version 0.9.0 or
later. Earlier versions of _b_z_i_p_2 will stop after decom- later. Earlier versions of _b_z_i_p_2 will stop after decom­
pressing the first file in the stream. pressing the first file in the stream.
_b_z_c_a_t (or _b_z_i_p_2 _-_d_c_) decompresses all specified files to _b_z_c_a_t (or _b_z_i_p_2 _-_d_c_) decompresses all specified files to
@ -115,7 +101,7 @@ bzip2(1) bzip2(1)
As a self-check for your protection, _b_z_i_p_2 uses 32-bit As a self-check for your protection, _b_z_i_p_2 uses 32-bit
CRCs to make sure that the decompressed version of a file CRCs to make sure that the decompressed version of a file
is identical to the original. This guards against corrup- is identical to the original. This guards against corrup­
tion of the compressed data, and against undetected bugs tion of the compressed data, and against undetected bugs
in _b_z_i_p_2 (hopefully very unlikely). The chances of data in _b_z_i_p_2 (hopefully very unlikely). The chances of data
corruption going undetected is microscopic, about one corruption going undetected is microscopic, about one
@ -125,17 +111,6 @@ bzip2(1) bzip2(1)
you recover the original uncompressed data. You can use you recover the original uncompressed data. You can use
_b_z_i_p_2_r_e_c_o_v_e_r to try to recover data from damaged files. _b_z_i_p_2_r_e_c_o_v_e_r to try to recover data from damaged files.
2
bzip2(1) bzip2(1)
Return values: 0 for a normal exit, 1 for environmental Return values: 0 for a normal exit, 1 for environmental
problems (file not found, invalid flags, I/O errors, &c), problems (file not found, invalid flags, I/O errors, &c),
2 to indicate a corrupt compressed file, 3 for an internal 2 to indicate a corrupt compressed file, 3 for an internal
@ -154,8 +129,8 @@ OOPPTTIIOONNSS
and forces _b_z_i_p_2 to decompress. and forces _b_z_i_p_2 to decompress.
--zz ----ccoommpprreessss --zz ----ccoommpprreessss
The complement to -d: forces compression, regard- The complement to -d: forces compression,
less of the invokation name. regardless of the invocation name.
--tt ----tteesstt --tt ----tteesstt
Check integrity of the specified file(s), but don't Check integrity of the specified file(s), but don't
@ -168,6 +143,11 @@ OOPPTTIIOONNSS
forces _b_z_i_p_2 to break hard links to files, which it forces _b_z_i_p_2 to break hard links to files, which it
otherwise wouldn't do. otherwise wouldn't do.
bzip2 normally declines to decompress files which
don't have the correct magic header bytes. If
forced (-f), however, it will pass such files
through unmodified. This is how GNU gzip behaves.
--kk ----kkeeeepp --kk ----kkeeeepp
Keep (don't delete) input files during compression Keep (don't delete) input files during compression
or decompression. or decompression.
@ -190,23 +170,11 @@ OOPPTTIIOONNSS
--qq ----qquuiieett --qq ----qquuiieett
Suppress non-essential warning messages. Messages Suppress non-essential warning messages. Messages
pertaining to I/O errors and other critical events pertaining to I/O errors and other critical events
3
bzip2(1) bzip2(1)
will not be suppressed. will not be suppressed.
--vv ----vveerrbboossee --vv ----vveerrbboossee
Verbose mode -- show the compression ratio for each Verbose mode -- show the compression ratio for each
file processed. Further -v's increase the ver- file processed. Further -v's increase the ver­
bosity level, spewing out lots of information which bosity level, spewing out lots of information which
is primarily of interest for diagnostic purposes. is primarily of interest for diagnostic purposes.
@ -214,20 +182,24 @@ bzip2(1) bzip2(1)
Display the software version, license terms and Display the software version, license terms and
conditions. conditions.
--11 ttoo --99 --11 ((oorr ----ffaasstt)) ttoo --99 ((oorr ----bbeesstt))
Set the block size to 100 k, 200 k .. 900 k when Set the block size to 100 k, 200 k .. 900 k when
compressing. Has no effect when decompressing. compressing. Has no effect when decompressing.
See MEMORY MANAGEMENT below. See MEMORY MANAGEMENT below. The --fast and --best
aliases are primarily for GNU gzip compatibility.
In particular, --fast doesn't make things signifi­
cantly faster. And --best merely selects the
default behaviour.
---- Treats all subsequent arguments as file names, even ---- Treats all subsequent arguments as file names, even
if they start with a dash. This is so you can han- if they start with a dash. This is so you can han­
dle files with names beginning with a dash, for dle files with names beginning with a dash, for
example: bzip2 -- -myfilename. example: bzip2 -- -myfilename.
----rreeppeettiittiivvee--ffaasstt ----rreeppeettiittiivvee--bbeesstt ----rreeppeettiittiivvee--ffaasstt ----rreeppeettiittiivvee--bbeesstt
These flags are redundant in versions 0.9.5 and These flags are redundant in versions 0.9.5 and
above. They provided some coarse control over the above. They provided some coarse control over the
behaviour of the sorting algorithm in earlier ver- behaviour of the sorting algorithm in earlier ver­
sions, which was sometimes useful. 0.9.5 and above sions, which was sometimes useful. 0.9.5 and above
have an improved algorithm which renders these have an improved algorithm which renders these
flags irrelevant. flags irrelevant.
@ -238,7 +210,7 @@ MMEEMMOORRYY MMAANNAAGGEEMMEENNTT
affects both the compression ratio achieved, and the affects both the compression ratio achieved, and the
amount of memory needed for compression and decompression. amount of memory needed for compression and decompression.
The flags -1 through -9 specify the block size to be The flags -1 through -9 specify the block size to be
100,000 bytes through 900,000 bytes (the default) respec- 100,000 bytes through 900,000 bytes (the default) respec­
tively. At decompression time, the block size used for tively. At decompression time, the block size used for
compression is read from the header of the compressed compression is read from the header of the compressed
file, and _b_u_n_z_i_p_2 then allocates itself just enough memory file, and _b_u_n_z_i_p_2 then allocates itself just enough memory
@ -256,18 +228,6 @@ MMEEMMOORRYY MMAANNAAGGEEMMEENNTT
Larger block sizes give rapidly diminishing marginal Larger block sizes give rapidly diminishing marginal
returns. Most of the compression comes from the first two returns. Most of the compression comes from the first two
4
bzip2(1) bzip2(1)
or three hundred k of block size, a fact worth bearing in or three hundred k of block size, a fact worth bearing in
mind when using _b_z_i_p_2 on small machines. It is also mind when using _b_z_i_p_2 on small machines. It is also
important to appreciate that the decompression memory important to appreciate that the decompression memory
@ -278,13 +238,13 @@ bzip2(1) bzip2(1)
_b_u_n_z_i_p_2 will require about 3700 kbytes to decompress. To _b_u_n_z_i_p_2 will require about 3700 kbytes to decompress. To
support decompression of any file on a 4 megabyte machine, support decompression of any file on a 4 megabyte machine,
_b_u_n_z_i_p_2 has an option to decompress using approximately _b_u_n_z_i_p_2 has an option to decompress using approximately
half this amount of memory, about 2300 kbytes. Decompres- half this amount of memory, about 2300 kbytes. Decompres­
sion speed is also halved, so you should use this option sion speed is also halved, so you should use this option
only where necessary. The relevant flag is -s. only where necessary. The relevant flag is -s.
In general, try and use the largest block size memory con- In general, try and use the largest block size memory con­
straints allow, since that maximises the compression straints allow, since that maximises the compression
achieved. Compression and decompression speed are virtu- achieved. Compression and decompression speed are virtu­
ally unaffected by block size. ally unaffected by block size.
Another significant point applies to files which fit in a Another significant point applies to files which fit in a
@ -300,11 +260,11 @@ bzip2(1) bzip2(1)
Here is a table which summarises the maximum memory usage Here is a table which summarises the maximum memory usage
for different block sizes. Also recorded is the total for different block sizes. Also recorded is the total
compressed size for 14 files of the Calgary Text Compres- compressed size for 14 files of the Calgary Text Compres­
sion Corpus totalling 3,141,622 bytes. This column gives sion Corpus totalling 3,141,622 bytes. This column gives
some feel for how compression varies with block size. some feel for how compression varies with block size.
These figures tend to understate the advantage of larger These figures tend to understate the advantage of larger
block sizes for larger files, since the Corpus is domi- block sizes for larger files, since the Corpus is domi­
nated by smaller files. nated by smaller files.
Compress Decompress Decompress Corpus Compress Decompress Decompress Corpus
@ -321,22 +281,9 @@ bzip2(1) bzip2(1)
-9 7600k 3700k 2350k 828642 -9 7600k 3700k 2350k 828642
5
bzip2(1) bzip2(1)
RREECCOOVVEERRIINNGG DDAATTAA FFRROOMM DDAAMMAAGGEEDD FFIILLEESS RREECCOOVVEERRIINNGG DDAATTAA FFRROOMM DDAAMMAAGGEEDD FFIILLEESS
_b_z_i_p_2 compresses files in blocks, usually 900kbytes long. _b_z_i_p_2 compresses files in blocks, usually 900kbytes long.
Each block is handled independently. If a media or trans- Each block is handled independently. If a media or trans­
mission error causes a multi-block .bz2 file to become mission error causes a multi-block .bz2 file to become
damaged, it may be possible to recover data from the damaged, it may be possible to recover data from the
undamaged blocks in the file. undamaged blocks in the file.
@ -353,19 +300,19 @@ RREECCOOVVEERRIINNGG DDAATTAA FFRROOMM DDAAMMAAGGEEDD F
the integrity of the resulting files, and decompress those the integrity of the resulting files, and decompress those
which are undamaged. which are undamaged.
_b_z_i_p_2_r_e_c_o_v_e_r takes a single argument, the name of the dam- _b_z_i_p_2_r_e_c_o_v_e_r takes a single argument, the name of the dam­
aged file, and writes a number of files "rec0001file.bz2", aged file, and writes a number of files
"rec0002file.bz2", etc, containing the extracted blocks. "rec00001file.bz2", "rec00002file.bz2", etc, containing
The output filenames are designed so that the use of the extracted blocks. The output filenames are
wildcards in subsequent processing -- for example, "bzip2 designed so that the use of wildcards in subsequent pro­
-dc rec*file.bz2 > recovered_data" -- lists the files in cessing -- for example, "bzip2 -dc rec*file.bz2 > recov­
the correct order. ered_data" -- processes the files in the correct order.
_b_z_i_p_2_r_e_c_o_v_e_r should be of most use dealing with large .bz2 _b_z_i_p_2_r_e_c_o_v_e_r should be of most use dealing with large .bz2
files, as these will contain many blocks. It is clearly files, as these will contain many blocks. It is clearly
futile to use it on damaged single-block files, since a futile to use it on damaged single-block files, since a
damaged block cannot be recovered. If you wish to min- damaged block cannot be recovered. If you wish to min­
imise any potential data loss through media or transmis- imise any potential data loss through media or transmis­
sion errors, you might consider compressing with a smaller sion errors, you might consider compressing with a smaller
block size. block size.
@ -379,31 +326,19 @@ PPEERRFFOORRMMAANNCCEE NNOOTTEESS
better than previous versions in this respect. The ratio better than previous versions in this respect. The ratio
between worst-case and average-case compression time is in between worst-case and average-case compression time is in
the region of 10:1. For previous versions, this figure the region of 10:1. For previous versions, this figure
was more like 100:1. You can use the -vvvv option to mon- was more like 100:1. You can use the -vvvv option to mon­
itor progress in great detail, if you want. itor progress in great detail, if you want.
Decompression speed is unaffected by these phenomena. Decompression speed is unaffected by these phenomena.
_b_z_i_p_2 usually allocates several megabytes of memory to _b_z_i_p_2 usually allocates several megabytes of memory to
operate in, and then charges all over it in a fairly ran- operate in, and then charges all over it in a fairly ran­
dom fashion. This means that performance, both for com- dom fashion. This means that performance, both for com­
pressing and decompressing, is largely determined by the pressing and decompressing, is largely determined by the
6
bzip2(1) bzip2(1)
speed at which your machine can service cache misses. speed at which your machine can service cache misses.
Because of this, small changes to the code to reduce the Because of this, small changes to the code to reduce the
miss rate have been observed to give disproportionately miss rate have been observed to give disproportionately
large performance improvements. I imagine _b_z_i_p_2 will per- large performance improvements. I imagine _b_z_i_p_2 will per­
form best on machines with very large caches. form best on machines with very large caches.
@ -413,50 +348,51 @@ CCAAVVEEAATTSS
but the details of what the problem is sometimes seem but the details of what the problem is sometimes seem
rather misleading. rather misleading.
This manual page pertains to version 1.0 of _b_z_i_p_2_. Com- This manual page pertains to version 1.0.2 of _b_z_i_p_2_. Com­
pressed data created by this version is entirely forwards pressed data created by this version is entirely forwards
and backwards compatible with the previous public and backwards compatible with the previous public
releases, versions 0.1pl2, 0.9.0 and 0.9.5, but with the releases, versions 0.1pl2, 0.9.0, 0.9.5, 1.0.0 and 1.0.1,
following exception: 0.9.0 and above can correctly decom- but with the following exception: 0.9.0 and above can cor­
press multiple concatenated compressed files. 0.1pl2 can- rectly decompress multiple concatenated compressed files.
not do this; it will stop after decompressing just the 0.1pl2 cannot do this; it will stop after decompressing
first file in the stream. just the first file in the stream.
_b_z_i_p_2_r_e_c_o_v_e_r versions prior to this one, 1.0.2, used
32-bit integers to represent bit positions in compressed
files, so it could not handle compressed files more than
512 megabytes long. Version 1.0.2 and above uses 64-bit
ints on some platforms which support them (GNU supported
targets, and Windows). To establish whether or not
bzip2recover was built with such a limitation, run it
without arguments. In any event you can build yourself an
unlimited version if you can recompile it with MaybeUInt64
set to be an unsigned 64-bit integer.
_b_z_i_p_2_r_e_c_o_v_e_r uses 32-bit integers to represent bit posi-
tions in compressed files, so it cannot handle compressed
files more than 512 megabytes long. This could easily be
fixed.
AAUUTTHHOORR AAUUTTHHOORR
Julian Seward, jseward@acm.org. Julian Seward, jseward@acm.org.
http://sourceware.cygnus.com/bzip2 http://sources.redhat.com/bzip2
http://www.muraroa.demon.co.uk
The ideas embodied in _b_z_i_p_2 are due to (at least) the fol- The ideas embodied in _b_z_i_p_2 are due to (at least) the fol­
lowing people: Michael Burrows and David Wheeler (for the lowing people: Michael Burrows and David Wheeler (for the
block sorting transformation), David Wheeler (again, for block sorting transformation), David Wheeler (again, for
the Huffman coder), Peter Fenwick (for the structured cod- the Huffman coder), Peter Fenwick (for the structured cod­
ing model in the original _b_z_i_p_, and many refinements), and ing model in the original _b_z_i_p_, and many refinements), and
Alistair Moffat, Radford Neal and Ian Witten (for the Alistair Moffat, Radford Neal and Ian Witten (for the
arithmetic coder in the original _b_z_i_p_)_. I am much arithmetic coder in the original _b_z_i_p_)_. I am much
indebted for their help, support and advice. See the man- indebted for their help, support and advice. See the man­
ual in the source distribution for pointers to sources of ual in the source distribution for pointers to sources of
documentation. Christian von Roques encouraged me to look documentation. Christian von Roques encouraged me to look
for faster sorting algorithms, so as to speed up compres- for faster sorting algorithms, so as to speed up compres­
sion. Bela Lubkin encouraged me to improve the worst-case sion. Bela Lubkin encouraged me to improve the worst-case
compression performance. Many people sent patches, helped compression performance. The bz* scripts are derived from
with portability problems, lent machines, gave advice and those of GNU gzip. Many people sent patches, helped with
were generally helpful. portability problems, lent machines, gave advice and were
generally helpful.
bzip2(1)
7
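The 512 megabyte figure in the CAVEATS text follows from simple arithmetic: a 32-bit bit position can address 2^32 bits, which is 2^29 bytes. The sketch below works that out for any index width. The helper name `max_bytes_for_bit_index` is hypothetical and does not appear in bzip2recover; it only illustrates the calculation.

```c
#include <stdint.h>

/* Hypothetical helper, not part of bzip2recover.
   Returns the number of bytes whose bit positions fit in an unsigned
   integer of width_bits bits. Assumes 3 <= width_bits <= 64. */
uint64_t max_bytes_for_bit_index(unsigned width_bits)
{
    /* 2^width_bits bits divided by 8 bits per byte = 2^(width_bits - 3) */
    return (uint64_t)1 << (width_bits - 3);
}
```

With a 32-bit index this gives 536,870,912 bytes (512 megabytes), which matches the limit quoted for versions before 1.0.2. With a 64-bit index the limit is 2^61 bytes, far beyond any practical file size.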

535
bzip2.c

@ -7,7 +7,7 @@
This file is a part of bzip2 and/or libbzip2, a program and This file is a part of bzip2 and/or libbzip2, a program and
library for lossless, block-sorting data compression. library for lossless, block-sorting data compression.
Copyright (C) 1996-2000 Julian R Seward. All rights reserved. Copyright (C) 1996-2002 Julian R Seward. All rights reserved.
Redistribution and use in source and binary forms, with or without Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions modification, are permitted provided that the following conditions
@ -113,13 +113,16 @@
/*-- /*--
Generic 32-bit Unix. Generic 32-bit Unix.
Also works on 64-bit Unix boxes. Also works on 64-bit Unix boxes.
This is the default.
--*/ --*/
#define BZ_UNIX 1 #define BZ_UNIX 1
/*-- /*--
Win32, as seen by Jacob Navia's excellent Win32, as seen by Jacob Navia's excellent
port of (Chris Fraser & David Hanson)'s excellent port of (Chris Fraser & David Hanson)'s excellent
lcc compiler. lcc compiler. Or with MS Visual C.
This is selected automatically if compiled by a compiler which
defines _WIN32, not including the Cygwin GCC.
--*/ --*/
#define BZ_LCCWIN32 0 #define BZ_LCCWIN32 0
@ -156,6 +159,7 @@
--*/ --*/
#if BZ_UNIX #if BZ_UNIX
# include <fcntl.h>
# include <sys/types.h> # include <sys/types.h>
# include <utime.h> # include <utime.h>
# include <unistd.h> # include <unistd.h>
@ -164,8 +168,9 @@
# define PATH_SEP '/' # define PATH_SEP '/'
# define MY_LSTAT lstat # define MY_LSTAT lstat
# define MY_S_IFREG S_ISREG
# define MY_STAT stat # define MY_STAT stat
# define MY_S_ISREG S_ISREG
# define MY_S_ISDIR S_ISDIR
# define APPEND_FILESPEC(root, name) \ # define APPEND_FILESPEC(root, name) \
root=snocString((root), (name)) root=snocString((root), (name))
@ -180,19 +185,23 @@
# else # else
# define NORETURN /**/ # define NORETURN /**/
# endif # endif
# ifdef __DJGPP__ # ifdef __DJGPP__
# include <io.h> # include <io.h>
# include <fcntl.h> # include <fcntl.h>
# undef MY_LSTAT # undef MY_LSTAT
# undef MY_STAT
# define MY_LSTAT stat # define MY_LSTAT stat
# define MY_STAT stat
# undef SET_BINARY_MODE # undef SET_BINARY_MODE
# define SET_BINARY_MODE(fd) \ # define SET_BINARY_MODE(fd) \
do { \ do { \
int retVal = setmode ( fileno ( fd ), \ int retVal = setmode ( fileno ( fd ), \
O_BINARY ); \ O_BINARY ); \
ERROR_IF_MINUS_ONE ( retVal ); \ ERROR_IF_MINUS_ONE ( retVal ); \
} while ( 0 ) } while ( 0 )
# endif # endif
# ifdef __CYGWIN__ # ifdef __CYGWIN__
# include <io.h> # include <io.h>
# include <fcntl.h> # include <fcntl.h>
@ -200,11 +209,11 @@
# define SET_BINARY_MODE(fd) \ # define SET_BINARY_MODE(fd) \
do { \ do { \
int retVal = setmode ( fileno ( fd ), \ int retVal = setmode ( fileno ( fd ), \
O_BINARY ); \ O_BINARY ); \
ERROR_IF_MINUS_ONE ( retVal ); \ ERROR_IF_MINUS_ONE ( retVal ); \
} while ( 0 ) } while ( 0 )
# endif # endif
#endif #endif /* BZ_UNIX */
@ -217,46 +226,23 @@
# define PATH_SEP '\\' # define PATH_SEP '\\'
# define MY_LSTAT _stat # define MY_LSTAT _stat
# define MY_STAT _stat # define MY_STAT _stat
# define MY_S_IFREG(x) ((x) & _S_IFREG) # define MY_S_ISREG(x) ((x) & _S_IFREG)
# define MY_S_ISDIR(x) ((x) & _S_IFDIR)
# define APPEND_FLAG(root, name) \ # define APPEND_FLAG(root, name) \
root=snocString((root), (name)) root=snocString((root), (name))
# if 0
/*-- lcc-win32 seems to expand wildcards itself --*/
# define APPEND_FILESPEC(root, spec) \
do { \
if ((spec)[0] == '-') { \
root = snocString((root), (spec)); \
} else { \
struct _finddata_t c_file; \
long hFile; \
hFile = _findfirst((spec), &c_file); \
if ( hFile == -1L ) { \
root = snocString ((root), (spec)); \
} else { \
int anInt = 0; \
while ( anInt == 0 ) { \
root = snocString((root), \
&c_file.name[0]); \
anInt = _findnext(hFile, &c_file); \
} \
} \
} \
} while ( 0 )
# else
# define APPEND_FILESPEC(root, name) \ # define APPEND_FILESPEC(root, name) \
root = snocString ((root), (name)) root = snocString ((root), (name))
# endif
# define SET_BINARY_MODE(fd) \ # define SET_BINARY_MODE(fd) \
do { \ do { \
int retVal = setmode ( fileno ( fd ), \ int retVal = setmode ( fileno ( fd ), \
O_BINARY ); \ O_BINARY ); \
ERROR_IF_MINUS_ONE ( retVal ); \ ERROR_IF_MINUS_ONE ( retVal ); \
} while ( 0 ) } while ( 0 )
#endif #endif /* BZ_LCCWIN32 */
/*---------------------------------------------*/ /*---------------------------------------------*/
@ -338,6 +324,7 @@ typedef
struct { UChar b[8]; } struct { UChar b[8]; }
UInt64; UInt64;
static static
void uInt64_from_UInt32s ( UInt64* n, UInt32 lo32, UInt32 hi32 ) void uInt64_from_UInt32s ( UInt64* n, UInt32 lo32, UInt32 hi32 )
{ {
@ -351,6 +338,7 @@ void uInt64_from_UInt32s ( UInt64* n, UInt32 lo32, UInt32 hi32 )
n->b[0] = (UChar) (lo32 & 0xFF); n->b[0] = (UChar) (lo32 & 0xFF);
} }
static static
double uInt64_to_double ( UInt64* n ) double uInt64_to_double ( UInt64* n )
{ {
@ -364,77 +352,6 @@ double uInt64_to_double ( UInt64* n )
return sum; return sum;
} }
static
void uInt64_add ( UInt64* src, UInt64* dst )
{
Int32 i;
Int32 carry = 0;
for (i = 0; i < 8; i++) {
carry += ( ((Int32)src->b[i]) + ((Int32)dst->b[i]) );
dst->b[i] = (UChar)(carry & 0xFF);
carry >>= 8;
}
}
static
void uInt64_sub ( UInt64* src, UInt64* dst )
{
Int32 t, i;
Int32 borrow = 0;
for (i = 0; i < 8; i++) {
t = ((Int32)dst->b[i]) - ((Int32)src->b[i]) - borrow;
if (t < 0) {
dst->b[i] = (UChar)(t + 256);
borrow = 1;
} else {
dst->b[i] = (UChar)t;
borrow = 0;
}
}
}
static
void uInt64_mul ( UInt64* a, UInt64* b, UInt64* r_hi, UInt64* r_lo )
{
UChar sum[16];
Int32 ia, ib, carry;
for (ia = 0; ia < 16; ia++) sum[ia] = 0;
for (ia = 0; ia < 8; ia++) {
carry = 0;
for (ib = 0; ib < 8; ib++) {
carry += ( ((Int32)sum[ia+ib])
+ ((Int32)a->b[ia]) * ((Int32)b->b[ib]) );
sum[ia+ib] = (UChar)(carry & 0xFF);
carry >>= 8;
}
sum[ia+8] = (UChar)(carry & 0xFF);
if ((carry >>= 8) != 0) panic ( "uInt64_mul" );
}
for (ia = 0; ia < 8; ia++) r_hi->b[ia] = sum[ia+8];
for (ia = 0; ia < 8; ia++) r_lo->b[ia] = sum[ia];
}
static
void uInt64_shr1 ( UInt64* n )
{
Int32 i;
for (i = 0; i < 8; i++) {
n->b[i] >>= 1;
if (i < 7 && (n->b[i+1] & 1)) n->b[i] |= 0x80;
}
}
static
void uInt64_shl1 ( UInt64* n )
{
Int32 i;
for (i = 7; i >= 0; i--) {
n->b[i] <<= 1;
if (i > 0 && (n->b[i-1] & 0x80)) n->b[i]++;
}
}
static static
Bool uInt64_isZero ( UInt64* n ) Bool uInt64_isZero ( UInt64* n )
@ -445,49 +362,23 @@ Bool uInt64_isZero ( UInt64* n )
return 1; return 1;
} }
static
/* Divide *n by 10, and return the remainder. */
static
Int32 uInt64_qrm10 ( UInt64* n ) Int32 uInt64_qrm10 ( UInt64* n )
{ {
/* Divide *n by 10, and return the remainder. Long division UInt32 rem, tmp;
is difficult, so we cheat and instead multiply by
0xCCCC CCCC CCCC CCCD, which is 0.8 (viz, 0.1 << 3).
*/
Int32 i; Int32 i;
UInt64 tmp1, tmp2, n_orig, zero_point_eight; rem = 0;
for (i = 7; i >= 0; i--) {
zero_point_eight.b[1] = zero_point_eight.b[2] = tmp = rem * 256 + n->b[i];
zero_point_eight.b[3] = zero_point_eight.b[4] = n->b[i] = tmp / 10;
zero_point_eight.b[5] = zero_point_eight.b[6] = rem = tmp % 10;
zero_point_eight.b[7] = 0xCC; }
zero_point_eight.b[0] = 0xCD; return rem;
n_orig = *n;
/* divide n by 10,
by multiplying by 0.8 and then shifting right 3 times */
uInt64_mul ( n, &zero_point_eight, &tmp1, &tmp2 );
uInt64_shr1(&tmp1); uInt64_shr1(&tmp1); uInt64_shr1(&tmp1);
*n = tmp1;
/* tmp1 = 8*n, tmp2 = 2*n */
uInt64_shl1(&tmp1); uInt64_shl1(&tmp1); uInt64_shl1(&tmp1);
tmp2 = *n; uInt64_shl1(&tmp2);
/* tmp1 = 10*n */
uInt64_add ( &tmp2, &tmp1 );
/* n_orig = n_orig - 10*n */
uInt64_sub ( &tmp1, &n_orig );
/* n_orig should now hold quotient, in range 0 .. 9 */
for (i = 7; i >= 1; i--)
if (n_orig.b[i] != 0) panic ( "uInt64_qrm10(1)" );
if (n_orig.b[0] > 9)
panic ( "uInt64_qrm10(2)" );
return (int)n_orig.b[0];
} }
/* ... and the Whole Entire Point of all this UInt64 stuff is /* ... and the Whole Entire Point of all this UInt64 stuff is
so that we can supply the following function. so that we can supply the following function.
*/ */
@ -504,7 +395,8 @@ void uInt64_toAscii ( char* outbuf, UInt64* n )
nBuf++; nBuf++;
} while (!uInt64_isZero(&n_copy)); } while (!uInt64_isZero(&n_copy));
outbuf[nBuf] = 0; outbuf[nBuf] = 0;
for (i = 0; i < nBuf; i++) outbuf[i] = buf[nBuf-i-1]; for (i = 0; i < nBuf; i++)
outbuf[i] = buf[nBuf-i-1];
} }
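The replacement uInt64_qrm10 in the hunk above is schoolbook long division in base 256: it walks from the most significant byte to the least, carrying the remainder down. The following is a minimal standalone sketch of that idea plus the decimal conversion built on it. The names `U64Bytes`, `u64_from_halves`, `u64_qrm10`, `u64_is_zero` and `u64_to_ascii` are illustrative stand-ins rather than the identifiers used in bzip2.c, and the byte order (least significant byte first) is taken from the `n->b[0] = lo32 & 0xFF` line in the diff.

```c
#include <stdint.h>

/* Illustrative stand-in for the diff's UInt64: 8 bytes, b[0] least significant. */
typedef struct { unsigned char b[8]; } U64Bytes;

/* Build the 8-byte value from its low and high 32-bit halves. */
U64Bytes u64_from_halves(uint32_t lo, uint32_t hi)
{
    U64Bytes n;
    for (int i = 0; i < 4; i++) {
        n.b[i]     = (unsigned char)((lo >> (8 * i)) & 0xFF);
        n.b[i + 4] = (unsigned char)((hi >> (8 * i)) & 0xFF);
    }
    return n;
}

/* Divide *n by 10 in place and return the remainder.
   Each step divides (remainder * 256 + next byte) by 10; since the
   remainder is at most 9, the intermediate value never exceeds 2559. */
int u64_qrm10(U64Bytes *n)
{
    uint32_t rem = 0;
    for (int i = 7; i >= 0; i--) {
        uint32_t tmp = rem * 256 + n->b[i];
        n->b[i] = (unsigned char)(tmp / 10);
        rem = tmp % 10;
    }
    return (int)rem;
}

/* Return 1 if every byte is zero, otherwise 0. */
int u64_is_zero(const U64Bytes *n)
{
    for (int i = 0; i < 8; i++)
        if (n->b[i] != 0) return 0;
    return 1;
}

/* Write n in decimal to out, which must hold at least 21 chars. */
void u64_to_ascii(char *out, const U64Bytes *n)
{
    char buf[32];
    int nbuf = 0;
    U64Bytes copy = *n;
    /* Digits come out least significant first, so collect then reverse. */
    do {
        buf[nbuf++] = (char)('0' + u64_qrm10(&copy));
    } while (!u64_is_zero(&copy));
    for (int i = 0; i < nbuf; i++)
        out[i] = buf[nbuf - i - 1];
    out[nbuf] = 0;
}
```

Compared with the removed multiply-by-0.8 approach, this needs no add, subtract, multiply or shift helpers, which is presumably why the diff can delete those functions.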
@ -566,35 +458,38 @@ void compressStream ( FILE *stream, FILE *zStream )
if (ret == EOF) goto errhandler_io; if (ret == EOF) goto errhandler_io;
if (zStream != stdout) { if (zStream != stdout) {
ret = fclose ( zStream ); ret = fclose ( zStream );
outputHandleJustInCase = NULL;
if (ret == EOF) goto errhandler_io; if (ret == EOF) goto errhandler_io;
} }
outputHandleJustInCase = NULL;
if (ferror(stream)) goto errhandler_io; if (ferror(stream)) goto errhandler_io;
ret = fclose ( stream ); ret = fclose ( stream );
if (ret == EOF) goto errhandler_io; if (ret == EOF) goto errhandler_io;
if (nbytes_in_lo32 == 0 && nbytes_in_hi32 == 0)
nbytes_in_lo32 = 1;
if (verbosity >= 1) { if (verbosity >= 1) {
Char buf_nin[32], buf_nout[32]; if (nbytes_in_lo32 == 0 && nbytes_in_hi32 == 0) {
UInt64 nbytes_in, nbytes_out; fprintf ( stderr, " no data compressed.\n");
double nbytes_in_d, nbytes_out_d; } else {
uInt64_from_UInt32s ( &nbytes_in, Char buf_nin[32], buf_nout[32];
nbytes_in_lo32, nbytes_in_hi32 ); UInt64 nbytes_in, nbytes_out;
uInt64_from_UInt32s ( &nbytes_out, double nbytes_in_d, nbytes_out_d;
nbytes_out_lo32, nbytes_out_hi32 ); uInt64_from_UInt32s ( &nbytes_in,
nbytes_in_d = uInt64_to_double ( &nbytes_in ); nbytes_in_lo32, nbytes_in_hi32 );
nbytes_out_d = uInt64_to_double ( &nbytes_out ); uInt64_from_UInt32s ( &nbytes_out,
uInt64_toAscii ( buf_nin, &nbytes_in ); nbytes_out_lo32, nbytes_out_hi32 );
uInt64_toAscii ( buf_nout, &nbytes_out ); nbytes_in_d = uInt64_to_double ( &nbytes_in );
fprintf ( stderr, "%6.3f:1, %6.3f bits/byte, " nbytes_out_d = uInt64_to_double ( &nbytes_out );
"%5.2f%% saved, %s in, %s out.\n", uInt64_toAscii ( buf_nin, &nbytes_in );
nbytes_in_d / nbytes_out_d, uInt64_toAscii ( buf_nout, &nbytes_out );
(8.0 * nbytes_out_d) / nbytes_in_d, fprintf ( stderr, "%6.3f:1, %6.3f bits/byte, "
100.0 * (1.0 - nbytes_out_d / nbytes_in_d), "%5.2f%% saved, %s in, %s out.\n",
buf_nin, nbytes_in_d / nbytes_out_d,
buf_nout (8.0 * nbytes_out_d) / nbytes_in_d,
); 100.0 * (1.0 - nbytes_out_d / nbytes_in_d),
buf_nin,
buf_nout
);
}
} }
return; return;
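The verbose statistics in the hunk above divide by the input size, so the new code reports empty input separately instead of forcing the byte count to 1. A small sketch of the same arithmetic, using plain doubles in place of the UInt64 helpers, might look like this. The function name `format_ratio` and its return convention are assumptions made for illustration; only the format string and the "no data compressed" message come from the diff.

```c
#include <stdio.h>

/* Illustrative only. Formats compression statistics into out.
   Returns -1 if there was no input (nothing meaningful to divide by),
   0 otherwise. Assumes n_out > 0 whenever n_in > 0. */
int format_ratio(char *out, size_t outlen, double n_in, double n_out)
{
    if (n_in == 0.0) {
        snprintf(out, outlen, " no data compressed.");
        return -1;
    }
    snprintf(out, outlen,
             "%6.3f:1, %6.3f bits/byte, %5.2f%% saved",
             n_in / n_out,                   /* compression ratio      */
             (8.0 * n_out) / n_in,           /* output bits per input byte */
             100.0 * (1.0 - n_out / n_in));  /* percentage saved       */
    return 0;
}
```

For example, 1000 bytes in and 250 bytes out yields a ratio of 4.000:1, 2.000 bits/byte and 75.00% saved.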
@ -652,7 +547,7 @@ Bool uncompressStream ( FILE *zStream, FILE *stream )
while (bzerr == BZ_OK) { while (bzerr == BZ_OK) {
nread = BZ2_bzRead ( &bzerr, bzf, obuf, 5000 ); nread = BZ2_bzRead ( &bzerr, bzf, obuf, 5000 );
if (bzerr == BZ_DATA_ERROR_MAGIC) goto errhandler; if (bzerr == BZ_DATA_ERROR_MAGIC) goto trycat;
if ((bzerr == BZ_OK || bzerr == BZ_STREAM_END) && nread > 0) if ((bzerr == BZ_OK || bzerr == BZ_STREAM_END) && nread > 0)
fwrite ( obuf, sizeof(UChar), nread, stream ); fwrite ( obuf, sizeof(UChar), nread, stream );
if (ferror(stream)) goto errhandler_io; if (ferror(stream)) goto errhandler_io;
@ -668,9 +563,9 @@ Bool uncompressStream ( FILE *zStream, FILE *stream )
if (bzerr != BZ_OK) panic ( "decompress:bzReadGetUnused" ); if (bzerr != BZ_OK) panic ( "decompress:bzReadGetUnused" );
if (nUnused == 0 && myfeof(zStream)) break; if (nUnused == 0 && myfeof(zStream)) break;
} }
closeok:
if (ferror(zStream)) goto errhandler_io; if (ferror(zStream)) goto errhandler_io;
ret = fclose ( zStream ); ret = fclose ( zStream );
if (ret == EOF) goto errhandler_io; if (ret == EOF) goto errhandler_io;
@ -680,11 +575,26 @@ Bool uncompressStream ( FILE *zStream, FILE *stream )
if (ret != 0) goto errhandler_io; if (ret != 0) goto errhandler_io;
if (stream != stdout) { if (stream != stdout) {
ret = fclose ( stream ); ret = fclose ( stream );
outputHandleJustInCase = NULL;
if (ret == EOF) goto errhandler_io; if (ret == EOF) goto errhandler_io;
} }
outputHandleJustInCase = NULL;
if (verbosity >= 2) fprintf ( stderr, "\n " ); if (verbosity >= 2) fprintf ( stderr, "\n " );
return True; return True;
trycat:
if (forceOverwrite) {
rewind(zStream);
while (True) {
if (myfeof(zStream)) break;
nread = fread ( obuf, sizeof(UChar), 5000, zStream );
if (ferror(zStream)) goto errhandler_io;
if (nread > 0) fwrite ( obuf, sizeof(UChar), nread, stream );
if (ferror(stream)) goto errhandler_io;
}
goto closeok;
}
errhandler: errhandler:
BZ2_bzReadClose ( &bzerr_dummy, bzf ); BZ2_bzReadClose ( &bzerr_dummy, bzf );
switch (bzerr) { switch (bzerr) {
@ -832,7 +742,7 @@ void cadvise ( void )
stderr, stderr,
"\nIt is possible that the compressed file(s) have become corrupted.\n" "\nIt is possible that the compressed file(s) have become corrupted.\n"
"You can use the -tvv option to test integrity of such files.\n\n" "You can use the -tvv option to test integrity of such files.\n\n"
"You can use the `bzip2recover' program to *attempt* to recover\n" "You can use the `bzip2recover' program to attempt to recover\n"
"data from undamaged sections of corrupted files.\n\n" "data from undamaged sections of corrupted files.\n\n"
); );
} }
@ -855,28 +765,55 @@ void showFileNames ( void )
static static
void cleanUpAndFail ( Int32 ec ) void cleanUpAndFail ( Int32 ec )
{ {
IntNative retVal; IntNative retVal;
struct MY_STAT statBuf;
if ( srcMode == SM_F2F if ( srcMode == SM_F2F
&& opMode != OM_TEST && opMode != OM_TEST
&& deleteOutputOnInterrupt ) { && deleteOutputOnInterrupt ) {
if (noisy)
fprintf ( stderr, "%s: Deleting output file %s, if it exists.\n", /* Check whether input file still exists. Delete output file
progName, outName ); only if input exists to avoid loss of data. Joerg Prante, 5
if (outputHandleJustInCase != NULL) January 2002. (JRS 06-Jan-2002: other changes in 1.0.2 mean
fclose ( outputHandleJustInCase ); this is less likely to happen. But to be ultra-paranoid, we
retVal = remove ( outName ); do the check anyway.) */
if (retVal != 0) retVal = MY_STAT ( inName, &statBuf );
if (retVal == 0) {
if (noisy)
fprintf ( stderr,
"%s: Deleting output file %s, if it exists.\n",
progName, outName );
if (outputHandleJustInCase != NULL)
fclose ( outputHandleJustInCase );
retVal = remove ( outName );
if (retVal != 0)
fprintf ( stderr,
"%s: WARNING: deletion of output file "
"(apparently) failed.\n",
progName );
} else {
fprintf ( stderr, fprintf ( stderr,
"%s: WARNING: deletion of output file (apparently) failed.\n", "%s: WARNING: deletion of output file suppressed\n",
progName );
fprintf ( stderr,
"%s: since input file no longer exists. Output file\n",
progName ); progName );
fprintf ( stderr,
"%s: `%s' may be incomplete.\n",
progName, outName );
fprintf ( stderr,
"%s: I suggest doing an integrity test (bzip2 -tv)"
" of it.\n",
progName );
}
} }
if (noisy && numFileNames > 0 && numFilesProcessed < numFileNames) { if (noisy && numFileNames > 0 && numFilesProcessed < numFileNames) {
fprintf ( stderr, fprintf ( stderr,
"%s: WARNING: some files have not been processed:\n" "%s: WARNING: some files have not been processed:\n"
"\t%d specified on command line, %d not processed yet.\n\n", "%s: %d specified on command line, %d not processed yet.\n\n",
progName, numFileNames, progName, progName,
numFileNames - numFilesProcessed ); numFileNames, numFileNames - numFilesProcessed );
} }
setExit(ec); setExit(ec);
exit(exitValue); exit(exitValue);
@ -915,14 +852,16 @@ void crcError ( void )
static static
void compressedStreamEOF ( void ) void compressedStreamEOF ( void )
{ {
fprintf ( stderr, if (noisy) {
"\n%s: Compressed file ends unexpectedly;\n\t" fprintf ( stderr,
"perhaps it is corrupted? *Possible* reason follows.\n", "\n%s: Compressed file ends unexpectedly;\n\t"
progName ); "perhaps it is corrupted? *Possible* reason follows.\n",
perror ( progName ); progName );
showFileNames(); perror ( progName );
cadvise(); showFileNames();
cleanUpAndFail( 2 ); cadvise();
}
cleanUpAndFail( 2 );
} }
@ -1038,6 +977,11 @@ void configError ( void )
/*--- The main driver machinery ---*/ /*--- The main driver machinery ---*/
/*---------------------------------------------------*/ /*---------------------------------------------------*/
/* All rather crufty. The main problem is that input files
are stat()d multiple times before use. This should be
cleaned up.
*/
/*---------------------------------------------*/ /*---------------------------------------------*/
static static
void pad ( Char *s ) void pad ( Char *s )
@ -1081,6 +1025,32 @@ Bool fileExists ( Char* name )
} }
/*---------------------------------------------*/
/* Open an output file safely with O_EXCL and good permissions.
This avoids a race condition in versions < 1.0.2, in which
the file was first opened and then had its interim permissions
set safely. We instead use open() to create the file with
the interim permissions required. (--- --- rw-).
For non-Unix platforms, if we are not worrying about
security issues, this simply behaves like fopen.
*/
FILE* fopen_output_safely ( Char* name, const char* mode )
{
# if BZ_UNIX
FILE* fp;
IntNative fh;
fh = open(name, O_WRONLY|O_CREAT|O_EXCL, S_IWUSR|S_IRUSR);
if (fh == -1) return NULL;
fp = fdopen(fh, mode);
if (fp == NULL) close(fh);
return fp;
# else
return fopen(name, mode);
# endif
}
/*---------------------------------------------*/ /*---------------------------------------------*/
/*-- /*--
if in doubt, return True if in doubt, return True
@ -1093,7 +1063,7 @@ Bool notAStandardFile ( Char* name )
i = MY_LSTAT ( name, &statBuf ); i = MY_LSTAT ( name, &statBuf );
if (i != 0) return True; if (i != 0) return True;
if (MY_S_IFREG(statBuf.st_mode)) return False; if (MY_S_ISREG(statBuf.st_mode)) return False;
return True; return True;
} }
@ -1115,42 +1085,66 @@ Int32 countHardLinks ( Char* name )
/*---------------------------------------------*/ /*---------------------------------------------*/
static /* Copy modification date, access date, permissions and owner from the
void copyDatePermissionsAndOwner ( Char *srcName, Char *dstName ) source to destination file. We have to copy this meta-info off
{ into fileMetaInfo before starting to compress / decompress it,
because doing it afterwards means we get the wrong access time.
To complicate matters, in compress() and decompress() below, the
sequence of tests preceding the call to saveInputFileMetaInfo()
involves calling fileExists(), which in turn establishes its result
by attempting to fopen() the file, and if successful, immediately
fclose()ing it again. So we have to assume that the fopen() call
does not cause the access time field to be updated.
Reading of the man page for stat() (man 2 stat) on RedHat 7.2 seems
to imply that merely doing open() will not affect the access time.
Therefore we merely need to hope that the C library only does
open() as a result of fopen(), and not any kind of read()-ahead
cleverness.
It sounds pretty fragile to me. Whether this carries across
robustly to arbitrary Unix-like platforms (or even works robustly
on this one, RedHat 7.2) is unknown to me. Nevertheless ...
*/
#if BZ_UNIX #if BZ_UNIX
static
struct MY_STAT fileMetaInfo;
#endif
static
void saveInputFileMetaInfo ( Char *srcName )
{
# if BZ_UNIX
IntNative retVal;
/* Note use of stat here, not lstat. */
retVal = MY_STAT( srcName, &fileMetaInfo );
ERROR_IF_NOT_ZERO ( retVal );
# endif
}
static
void applySavedMetaInfoToOutputFile ( Char *dstName )
{
# if BZ_UNIX
IntNative retVal; IntNative retVal;
struct MY_STAT statBuf;
struct utimbuf uTimBuf; struct utimbuf uTimBuf;
retVal = MY_LSTAT ( srcName, &statBuf ); uTimBuf.actime = fileMetaInfo.st_atime;
ERROR_IF_NOT_ZERO ( retVal ); uTimBuf.modtime = fileMetaInfo.st_mtime;
uTimBuf.actime = statBuf.st_atime;
uTimBuf.modtime = statBuf.st_mtime;
retVal = chmod ( dstName, statBuf.st_mode ); retVal = chmod ( dstName, fileMetaInfo.st_mode );
ERROR_IF_NOT_ZERO ( retVal ); ERROR_IF_NOT_ZERO ( retVal );
retVal = utime ( dstName, &uTimBuf ); retVal = utime ( dstName, &uTimBuf );
ERROR_IF_NOT_ZERO ( retVal ); ERROR_IF_NOT_ZERO ( retVal );
retVal = chown ( dstName, statBuf.st_uid, statBuf.st_gid ); retVal = chown ( dstName, fileMetaInfo.st_uid, fileMetaInfo.st_gid );
/* chown() will in many cases return with EPERM, which can /* chown() will in many cases return with EPERM, which can
be safely ignored. be safely ignored.
*/ */
#endif # endif
}
/*---------------------------------------------*/
static
void setInterimPermissions ( Char *dstName )
{
#if BZ_UNIX
IntNative retVal;
retVal = chmod ( dstName, S_IRUSR | S_IWUSR );
ERROR_IF_NOT_ZERO ( retVal );
#endif
} }
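The save-then-apply split introduced in this hunk can be sketched in isolation. This is only an illustration under POSIX assumptions; `snapshot_meta` and `apply_meta` are hypothetical names, and ownership (chown) is left out for brevity.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <utime.h>

/* Hypothetical helper: record the source file's metadata BEFORE it is
   read, so the saved access time is not disturbed by our own reads. */
static int snapshot_meta(const char *src, struct stat *saved)
{
    return stat(src, saved);   /* stat, not lstat: follow symlinks */
}

/* Hypothetical helper: apply the saved permissions and timestamps to
   the finished output file. Returns 0 on success, -1 on failure. */
static int apply_meta(const char *dst, const struct stat *saved)
{
    struct utimbuf t;
    t.actime  = saved->st_atime;
    t.modtime = saved->st_mtime;
    if (chmod(dst, saved->st_mode & 07777) != 0) return -1;
    return utime(dst, &t);
}
```

The snapshot has to be taken before the input is opened for reading; calling stat() afterwards would return an access time that reflects the compression run itself.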
@ -1158,10 +1152,19 @@ void setInterimPermissions ( Char *dstName )
static static
Bool containsDubiousChars ( Char* name ) Bool containsDubiousChars ( Char* name )
{ {
Bool cdc = False; # if BZ_UNIX
/* On unix, files can contain any characters and the file expansion
* is performed by the shell.
*/
return False;
# else /* ! BZ_UNIX */
/* On non-unix (Win* platforms), wildcard characters are not allowed in
* filenames.
*/
for (; *name != '\0'; name++) for (; *name != '\0'; name++)
if (*name == '?' || *name == '*') cdc = True; if (*name == '?' || *name == '*') return True;
return cdc; return False;
# endif /* BZ_UNIX */
} }
@ -1201,6 +1204,7 @@ void compress ( Char *name )
FILE *inStr; FILE *inStr;
FILE *outStr; FILE *outStr;
Int32 n, i; Int32 n, i;
struct MY_STAT statBuf;
deleteOutputOnInterrupt = False; deleteOutputOnInterrupt = False;
@ -1246,6 +1250,16 @@ void compress ( Char *name )
return; return;
} }
} }
if ( srcMode == SM_F2F || srcMode == SM_F2O ) {
MY_STAT(inName, &statBuf);
if ( MY_S_ISDIR(statBuf.st_mode) ) {
fprintf( stderr,
"%s: Input file %s is a directory.\n",
progName,inName);
setExit(1);
return;
}
}
if ( srcMode == SM_F2F && !forceOverwrite && notAStandardFile ( inName )) { if ( srcMode == SM_F2F && !forceOverwrite && notAStandardFile ( inName )) {
if (noisy) if (noisy)
fprintf ( stderr, "%s: Input file %s is not a normal file.\n", fprintf ( stderr, "%s: Input file %s is not a normal file.\n",
@ -1253,11 +1267,15 @@ void compress ( Char *name )
setExit(1); setExit(1);
return; return;
} }
if ( srcMode == SM_F2F && !forceOverwrite && fileExists ( outName ) ) { if ( srcMode == SM_F2F && fileExists ( outName ) ) {
fprintf ( stderr, "%s: Output file %s already exists.\n", if (forceOverwrite) {
progName, outName ); remove(outName);
setExit(1); } else {
return; fprintf ( stderr, "%s: Output file %s already exists.\n",
progName, outName );
setExit(1);
return;
}
} }
if ( srcMode == SM_F2F && !forceOverwrite && if ( srcMode == SM_F2F && !forceOverwrite &&
(n=countHardLinks ( inName )) > 0) { (n=countHardLinks ( inName )) > 0) {
@ -1267,6 +1285,12 @@ void compress ( Char *name )
return; return;
} }
if ( srcMode == SM_F2F ) {
/* Save the file's meta-info before we open it. Doing it later
means we mess up the access times. */
saveInputFileMetaInfo ( inName );
}
switch ( srcMode ) { switch ( srcMode ) {
case SM_I2O: case SM_I2O:
@ -1306,7 +1330,7 @@ void compress ( Char *name )
case SM_F2F: case SM_F2F:
inStr = fopen ( inName, "rb" ); inStr = fopen ( inName, "rb" );
outStr = fopen ( outName, "wb" ); outStr = fopen_output_safely ( outName, "wb" );
if ( outStr == NULL) { if ( outStr == NULL) {
fprintf ( stderr, "%s: Can't create output file %s: %s.\n", fprintf ( stderr, "%s: Can't create output file %s: %s.\n",
progName, outName, strerror(errno) ); progName, outName, strerror(errno) );
@ -1321,7 +1345,6 @@ void compress ( Char *name )
setExit(1); setExit(1);
return; return;
}; };
setInterimPermissions ( outName );
break; break;
default: default:
@ -1343,7 +1366,7 @@ void compress ( Char *name )
/*--- If there was an I/O error, we won't get here. ---*/ /*--- If there was an I/O error, we won't get here. ---*/
if ( srcMode == SM_F2F ) { if ( srcMode == SM_F2F ) {
copyDatePermissionsAndOwner ( inName, outName ); applySavedMetaInfoToOutputFile ( outName );
deleteOutputOnInterrupt = False; deleteOutputOnInterrupt = False;
if ( !keepInputFiles ) { if ( !keepInputFiles ) {
IntNative retVal = remove ( inName ); IntNative retVal = remove ( inName );
@ -1364,6 +1387,7 @@ void uncompress ( Char *name )
Int32 n, i; Int32 n, i;
Bool magicNumberOK; Bool magicNumberOK;
Bool cantGuess; Bool cantGuess;
struct MY_STAT statBuf;
deleteOutputOnInterrupt = False; deleteOutputOnInterrupt = False;
@ -1405,6 +1429,16 @@ void uncompress ( Char *name )
setExit(1); setExit(1);
return; return;
} }
if ( srcMode == SM_F2F || srcMode == SM_F2O ) {
MY_STAT(inName, &statBuf);
if ( MY_S_ISDIR(statBuf.st_mode) ) {
fprintf( stderr,
"%s: Input file %s is a directory.\n",
progName,inName);
setExit(1);
return;
}
}
if ( srcMode == SM_F2F && !forceOverwrite && notAStandardFile ( inName )) { if ( srcMode == SM_F2F && !forceOverwrite && notAStandardFile ( inName )) {
if (noisy) if (noisy)
fprintf ( stderr, "%s: Input file %s is not a normal file.\n", fprintf ( stderr, "%s: Input file %s is not a normal file.\n",
@ -1419,11 +1453,15 @@ void uncompress ( Char *name )
progName, inName, outName ); progName, inName, outName );
/* just a warning, no return */ /* just a warning, no return */
} }
if ( srcMode == SM_F2F && !forceOverwrite && fileExists ( outName ) ) { if ( srcMode == SM_F2F && fileExists ( outName ) ) {
fprintf ( stderr, "%s: Output file %s already exists.\n", if (forceOverwrite) {
progName, outName ); remove(outName);
setExit(1); } else {
return; fprintf ( stderr, "%s: Output file %s already exists.\n",
progName, outName );
setExit(1);
return;
}
} }
if ( srcMode == SM_F2F && !forceOverwrite && if ( srcMode == SM_F2F && !forceOverwrite &&
(n=countHardLinks ( inName ) ) > 0) { (n=countHardLinks ( inName ) ) > 0) {
@ -1433,6 +1471,12 @@ void uncompress ( Char *name )
return; return;
} }
if ( srcMode == SM_F2F ) {
/* Save the file's meta-info before we open it. Doing it later
means we mess up the access times. */
saveInputFileMetaInfo ( inName );
}
switch ( srcMode ) { switch ( srcMode ) {
case SM_I2O: case SM_I2O:
@ -1463,7 +1507,7 @@ void uncompress ( Char *name )
case SM_F2F: case SM_F2F:
inStr = fopen ( inName, "rb" ); inStr = fopen ( inName, "rb" );
outStr = fopen ( outName, "wb" ); outStr = fopen_output_safely ( outName, "wb" );
if ( outStr == NULL) { if ( outStr == NULL) {
fprintf ( stderr, "%s: Can't create output file %s: %s.\n", fprintf ( stderr, "%s: Can't create output file %s: %s.\n",
progName, outName, strerror(errno) ); progName, outName, strerror(errno) );
@ -1478,7 +1522,6 @@ void uncompress ( Char *name )
setExit(1); setExit(1);
return; return;
}; };
setInterimPermissions ( outName );
break; break;
default: default:
@ -1501,7 +1544,7 @@ void uncompress ( Char *name )
/*--- If there was an I/O error, we won't get here. ---*/ /*--- If there was an I/O error, we won't get here. ---*/
if ( magicNumberOK ) { if ( magicNumberOK ) {
if ( srcMode == SM_F2F ) { if ( srcMode == SM_F2F ) {
copyDatePermissionsAndOwner ( inName, outName ); applySavedMetaInfoToOutputFile ( outName );
deleteOutputOnInterrupt = False; deleteOutputOnInterrupt = False;
if ( !keepInputFiles ) { if ( !keepInputFiles ) {
IntNative retVal = remove ( inName ); IntNative retVal = remove ( inName );
@ -1539,6 +1582,7 @@ void testf ( Char *name )
{ {
FILE *inStr; FILE *inStr;
Bool allOK; Bool allOK;
struct MY_STAT statBuf;
deleteOutputOnInterrupt = False; deleteOutputOnInterrupt = False;
@ -1565,6 +1609,16 @@ void testf ( Char *name )
setExit(1); setExit(1);
return; return;
} }
if ( srcMode != SM_I2O ) {
MY_STAT(inName, &statBuf);
if ( MY_S_ISDIR(statBuf.st_mode) ) {
fprintf( stderr,
"%s: Input file %s is a directory.\n",
progName,inName);
setExit(1);
return;
}
}
switch ( srcMode ) { switch ( srcMode ) {
@ -1603,6 +1657,7 @@ void testf ( Char *name )
} }
/*--- Now the input handle is sane. Do the Biz. ---*/ /*--- Now the input handle is sane. Do the Biz. ---*/
outputHandleJustInCase = NULL;
allOK = testStream ( inStr ); allOK = testStream ( inStr );
if (allOK && verbosity >= 1) fprintf ( stderr, "ok\n" ); if (allOK && verbosity >= 1) fprintf ( stderr, "ok\n" );
@ -1619,7 +1674,7 @@ void license ( void )
"bzip2, a block-sorting file compressor. " "bzip2, a block-sorting file compressor. "
"Version %s.\n" "Version %s.\n"
" \n" " \n"
" Copyright (C) 1996-2000 by Julian Seward.\n" " Copyright (C) 1996-2002 by Julian Seward.\n"
" \n" " \n"
" This program is free software; you can redistribute it and/or modify\n" " This program is free software; you can redistribute it and/or modify\n"
" it under the terms set out in the LICENSE file, which is included\n" " it under the terms set out in the LICENSE file, which is included\n"
@ -1658,6 +1713,8 @@ void usage ( Char *fullProgName )
" -V --version display software version & license\n" " -V --version display software version & license\n"
" -s --small use less memory (at most 2500k)\n" " -s --small use less memory (at most 2500k)\n"
" -1 .. -9 set block size to 100k .. 900k\n" " -1 .. -9 set block size to 100k .. 900k\n"
" --fast alias for -1\n"
" --best alias for -9\n"
"\n" "\n"
" If invoked as `bzip2', default action is to compress.\n" " If invoked as `bzip2', default action is to compress.\n"
" as `bunzip2', default action is to decompress.\n" " as `bunzip2', default action is to decompress.\n"
@ -1666,9 +1723,9 @@ void usage ( Char *fullProgName )
" If no file names are given, bzip2 compresses or decompresses\n" " If no file names are given, bzip2 compresses or decompresses\n"
" from standard input to standard output. You can combine\n" " from standard input to standard output. You can combine\n"
" short flags, so `-v -4' means the same as -v4 or -4v, &c.\n" " short flags, so `-v -4' means the same as -v4 or -4v, &c.\n"
#if BZ_UNIX # if BZ_UNIX
"\n" "\n"
#endif # endif
, ,
BZ2_bzlibVersion(), BZ2_bzlibVersion(),
@ -1818,11 +1875,11 @@ IntNative main ( IntNative argc, Char *argv[] )
/*-- Set up signal handlers for mem access errors --*/ /*-- Set up signal handlers for mem access errors --*/
signal (SIGSEGV, mySIGSEGVorSIGBUScatcher); signal (SIGSEGV, mySIGSEGVorSIGBUScatcher);
#if BZ_UNIX # if BZ_UNIX
#ifndef __DJGPP__ # ifndef __DJGPP__
signal (SIGBUS, mySIGSEGVorSIGBUScatcher); signal (SIGBUS, mySIGSEGVorSIGBUScatcher);
#endif # endif
#endif # endif
copyFileName ( inName, "(none)" ); copyFileName ( inName, "(none)" );
copyFileName ( outName, "(none)" ); copyFileName ( outName, "(none)" );
@ -1933,6 +1990,8 @@ IntNative main ( IntNative argc, Char *argv[] )
if (ISFLAG("--exponential")) workFactor = 1; else if (ISFLAG("--exponential")) workFactor = 1; else
if (ISFLAG("--repetitive-best")) redundant(aa->name); else if (ISFLAG("--repetitive-best")) redundant(aa->name); else
if (ISFLAG("--repetitive-fast")) redundant(aa->name); else if (ISFLAG("--repetitive-fast")) redundant(aa->name); else
if (ISFLAG("--fast")) blockSize100k = 1; else
if (ISFLAG("--best")) blockSize100k = 9; else
if (ISFLAG("--verbose")) verbosity++; else if (ISFLAG("--verbose")) verbosity++; else
if (ISFLAG("--help")) { usage ( progName ); exit ( 0 ); } if (ISFLAG("--help")) { usage ( progName ); exit ( 0 ); }
else else

132
bzip2.txt

@ -1,7 +1,6 @@
NAME NAME
bzip2, bunzip2 - a block-sorting file compressor, v1.0 bzip2, bunzip2 - a block-sorting file compressor, v1.0.2
bzcat - decompresses files to stdout bzcat - decompresses files to stdout
bzip2recover - recovers data from damaged bzip2 files bzip2recover - recovers data from damaged bzip2 files
@ -18,20 +17,20 @@ DESCRIPTION
sorting text compression algorithm, and Huffman coding. sorting text compression algorithm, and Huffman coding.
Compression is generally considerably better than that Compression is generally considerably better than that
achieved by more conventional LZ77/LZ78-based compressors, achieved by more conventional LZ77/LZ78-based compressors,
and approaches the performance of the PPM family of sta- and approaches the performance of the PPM family of sta­
tistical compressors. tistical compressors.
The command-line options are deliberately very similar to The command-line options are deliberately very similar to
those of GNU gzip, but they are not identical. those of GNU gzip, but they are not identical.
bzip2 expects a list of file names to accompany the com- bzip2 expects a list of file names to accompany the com­
mand-line flags. Each file is replaced by a compressed mand-line flags. Each file is replaced by a compressed
version of itself, with the name "original_name.bz2". version of itself, with the name "original_name.bz2".
Each compressed file has the same modification date, per- Each compressed file has the same modification date, per­
missions, and, when possible, ownership as the correspond- missions, and, when possible, ownership as the correspond­
ing original, so that these properties can be correctly ing original, so that these properties can be correctly
restored at decompression time. File name handling is restored at decompression time. File name handling is
naive in the sense that there is no mechanism for preserv- naive in the sense that there is no mechanism for preserv­
ing original file names, permissions, ownerships or dates ing original file names, permissions, ownerships or dates
in filesystems which lack these concepts, or have serious in filesystems which lack these concepts, or have serious
file name length restrictions, such as MS-DOS. file name length restrictions, such as MS-DOS.
@ -62,23 +61,23 @@ DESCRIPTION
guess the name of the original file, and uses the original guess the name of the original file, and uses the original
name with .out appended. name with .out appended.
As with compression, supplying no filenames causes decom- As with compression, supplying no filenames causes decom­
pression from standard input to standard output. pression from standard input to standard output.
bunzip2 will correctly decompress a file which is the con- bunzip2 will correctly decompress a file which is the con­
catenation of two or more compressed files. The result is catenation of two or more compressed files. The result is
the concatenation of the corresponding uncompressed files. the concatenation of the corresponding uncompressed files.
Integrity testing (-t) of concatenated compressed files is Integrity testing (-t) of concatenated compressed files is
also supported. also supported.
You can also compress or decompress files to the standard You can also compress or decompress files to the standard
output by giving the -c flag. Multiple files may be com- output by giving the -c flag. Multiple files may be com­
pressed and decompressed like this. The resulting outputs pressed and decompressed like this. The resulting outputs
are fed sequentially to stdout. Compression of multiple are fed sequentially to stdout. Compression of multiple
files in this manner generates a stream containing multi- files in this manner generates a stream containing multi­
ple compressed file representations. Such a stream can be ple compressed file representations. Such a stream can be
decompressed correctly only by bzip2 version 0.9.0 or decompressed correctly only by bzip2 version 0.9.0 or
later. Earlier versions of bzip2 will stop after decom- later. Earlier versions of bzip2 will stop after decom­
pressing the first file in the stream. pressing the first file in the stream.
bzcat (or bzip2 -dc) decompresses all specified files to bzcat (or bzip2 -dc) decompresses all specified files to
@ -99,7 +98,7 @@ DESCRIPTION
As a self-check for your protection, bzip2 uses 32-bit As a self-check for your protection, bzip2 uses 32-bit
CRCs to make sure that the decompressed version of a file CRCs to make sure that the decompressed version of a file
is identical to the original. This guards against corrup- is identical to the original. This guards against corrup­
tion of the compressed data, and against undetected bugs tion of the compressed data, and against undetected bugs
in bzip2 (hopefully very unlikely). The chances of data in bzip2 (hopefully very unlikely). The chances of data
corruption going undetected is microscopic, about one corruption going undetected is microscopic, about one
@ -127,8 +126,8 @@ OPTIONS
and forces bzip2 to decompress. and forces bzip2 to decompress.
-z --compress -z --compress
The complement to -d: forces compression, regard- The complement to -d: forces compression,
less of the invokation name. regardless of the invocation name.
-t --test -t --test
Check integrity of the specified file(s), but don't Check integrity of the specified file(s), but don't
@ -141,6 +140,11 @@ OPTIONS
forces bzip2 to break hard links to files, which it forces bzip2 to break hard links to files, which it
otherwise wouldn't do. otherwise wouldn't do.
bzip2 normally declines to decompress files which
don't have the correct magic header bytes. If
forced (-f), however, it will pass such files
through unmodified. This is how GNU gzip behaves.
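The pass-through behaviour described above amounts to a magic-number check followed by a plain byte copy. A rough sketch, not bzip2's actual code: the function names are hypothetical, and the only format fact relied on is that a bzip2 stream begins with the bytes 'B', 'Z', 'h' and a block-size digit '1' to '9'.

```c
#include <stdio.h>

/* Hypothetical helper: does the buffer start with a bzip2 stream header? */
static int has_bzip2_magic(const unsigned char *p, size_t n)
{
    return n >= 4 && p[0] == 'B' && p[1] == 'Z' && p[2] == 'h'
                  && p[3] >= '1' && p[3] <= '9';
}

/* Hypothetical helper: copy `in` to `out` unchanged, as a forced run
   does for input without the magic bytes.
   Returns 0 on success, -1 on an I/O error. */
static int pass_through(FILE *in, FILE *out)
{
    unsigned char buf[5000];
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0)
        if (fwrite(buf, 1, n, out) != n) return -1;
    return ferror(in) ? -1 : 0;
}
```

A caller would rewind the input after the failed magic check and then call the copy routine, which is the shape of the trycat path added to uncompressStream() in bzip2.c.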
-k --keep -k --keep
Keep (don't delete) input files during compression Keep (don't delete) input files during compression
or decompression. or decompression.
@ -167,7 +171,7 @@ OPTIONS
-v --verbose -v --verbose
Verbose mode -- show the compression ratio for each Verbose mode -- show the compression ratio for each
file processed. Further -v's increase the ver- file processed. Further -v's increase the ver­
bosity level, spewing out lots of information which bosity level, spewing out lots of information which
is primarily of interest for diagnostic purposes. is primarily of interest for diagnostic purposes.
@ -175,20 +179,24 @@ OPTIONS
Display the software version, license terms and Display the software version, license terms and
conditions. conditions.
-1 to -9 -1 (or --fast) to -9 (or --best)
Set the block size to 100 k, 200 k .. 900 k when Set the block size to 100 k, 200 k .. 900 k when
compressing. Has no effect when decompressing. compressing. Has no effect when decompressing.
See MEMORY MANAGEMENT below. See MEMORY MANAGEMENT below. The --fast and --best
aliases are primarily for GNU gzip compatibility.
In particular, --fast doesn't make things signifi­
cantly faster. And --best merely selects the
default behaviour.
-- Treats all subsequent arguments as file names, even -- Treats all subsequent arguments as file names, even
if they start with a dash. This is so you can han- if they start with a dash. This is so you can han­
dle files with names beginning with a dash, for dle files with names beginning with a dash, for
example: bzip2 -- -myfilename. example: bzip2 -- -myfilename.
--repetitive-fast --repetitive-best --repetitive-fast --repetitive-best
These flags are redundant in versions 0.9.5 and These flags are redundant in versions 0.9.5 and
above. They provided some coarse control over the above. They provided some coarse control over the
behaviour of the sorting algorithm in earlier ver- behaviour of the sorting algorithm in earlier ver­
sions, which was sometimes useful. 0.9.5 and above sions, which was sometimes useful. 0.9.5 and above
have an improved algorithm which renders these have an improved algorithm which renders these
flags irrelevant. flags irrelevant.
@ -199,7 +207,7 @@ MEMORY MANAGEMENT
affects both the compression ratio achieved, and the affects both the compression ratio achieved, and the
amount of memory needed for compression and decompression. amount of memory needed for compression and decompression.
The flags -1 through -9 specify the block size to be The flags -1 through -9 specify the block size to be
100,000 bytes through 900,000 bytes (the default) respec- 100,000 bytes through 900,000 bytes (the default) respec­
tively. At decompression time, the block size used for tively. At decompression time, the block size used for
compression is read from the header of the compressed compression is read from the header of the compressed
file, and bunzip2 then allocates itself just enough memory file, and bunzip2 then allocates itself just enough memory
@ -227,13 +235,13 @@ MEMORY MANAGEMENT
bunzip2 will require about 3700 kbytes to decompress. To bunzip2 will require about 3700 kbytes to decompress. To
support decompression of any file on a 4 megabyte machine, support decompression of any file on a 4 megabyte machine,
bunzip2 has an option to decompress using approximately bunzip2 has an option to decompress using approximately
half this amount of memory, about 2300 kbytes. Decompres- half this amount of memory, about 2300 kbytes. Decompres­
sion speed is also halved, so you should use this option sion speed is also halved, so you should use this option
only where necessary. The relevant flag is -s. only where necessary. The relevant flag is -s.
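The 3700 kbyte and 2300 kbyte figures above are consistent with the rules of thumb the bzip2 manual states elsewhere in this section: 400k + 8 x block size to compress, 100k + 4 x block size to decompress, and 100k + 2.5 x block size with -s. The sketch below just evaluates those formulas; the function names are hypothetical, and the results are estimates in kbytes, not measurements.

```c
/* level is the -1 .. -9 flag; the block size is level * 100 kbytes. */

/* Hypothetical helper: approximate memory needed to compress. */
static int compress_kb(int level)
{
    return 400 + 8 * (level * 100);
}

/* Hypothetical helper: approximate memory needed to decompress. */
static int decompress_kb(int level)
{
    return 100 + 4 * (level * 100);
}

/* Hypothetical helper: decompression with -s, using 2.5 x block size.
   2.5 * (level * 100) is written as 250 * level to stay in integers. */
static int decompress_small_kb(int level)
{
    return 100 + 250 * level;
}
```

For the default block size (level 9) the decompression estimate is 100 + 4 x 900 = 3700 kbytes, matching the text, and the -s estimate is 100 + 2.5 x 900 = 2350 kbytes, which the text rounds to "about 2300".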
In general, try and use the largest block size memory con- In general, try and use the largest block size memory con­
straints allow, since that maximises the compression straints allow, since that maximises the compression
achieved. Compression and decompression speed are virtu- achieved. Compression and decompression speed are virtu­
ally unaffected by block size. ally unaffected by block size.
Another significant point applies to files which fit in a Another significant point applies to files which fit in a
@ -249,11 +257,11 @@ MEMORY MANAGEMENT
Here is a table which summarises the maximum memory usage Here is a table which summarises the maximum memory usage
for different block sizes. Also recorded is the total for different block sizes. Also recorded is the total
compressed size for 14 files of the Calgary Text Compres- compressed size for 14 files of the Calgary Text Compres­
sion Corpus totalling 3,141,622 bytes. This column gives sion Corpus totalling 3,141,622 bytes. This column gives
some feel for how compression varies with block size. some feel for how compression varies with block size.
These figures tend to understate the advantage of larger These figures tend to understate the advantage of larger
block sizes for larger files, since the Corpus is domi- block sizes for larger files, since the Corpus is domi­
nated by smaller files. nated by smaller files.
Compress Decompress Decompress Corpus Compress Decompress Decompress Corpus
@ -272,7 +280,7 @@ MEMORY MANAGEMENT
RECOVERING DATA FROM DAMAGED FILES RECOVERING DATA FROM DAMAGED FILES
bzip2 compresses files in blocks, usually 900kbytes long. bzip2 compresses files in blocks, usually 900kbytes long.
Each block is handled independently. If a media or trans- Each block is handled independently. If a media or trans­
mission error causes a multi-block .bz2 file to become mission error causes a multi-block .bz2 file to become
damaged, it may be possible to recover data from the damaged, it may be possible to recover data from the
undamaged blocks in the file. undamaged blocks in the file.
@ -289,19 +297,19 @@ RECOVERING DATA FROM DAMAGED FILES
the integrity of the resulting files, and decompress those the integrity of the resulting files, and decompress those
which are undamaged. which are undamaged.
bzip2recover takes a single argument, the name of the dam- bzip2recover takes a single argument, the name of the dam­
aged file, and writes a number of files "rec0001file.bz2", aged file, and writes a number of files
"rec0002file.bz2", etc, containing the extracted blocks. "rec00001file.bz2", "rec00002file.bz2", etc, containing
The output filenames are designed so that the use of the extracted blocks. The output filenames are
wildcards in subsequent processing -- for example, "bzip2 designed so that the use of wildcards in subsequent pro­
-dc rec*file.bz2 > recovered_data" -- lists the files in cessing -- for example, "bzip2 -dc rec*file.bz2 > recov­
the correct order. ered_data" -- processes the files in the correct order.
bzip2recover should be of most use dealing with large .bz2 bzip2recover should be of most use dealing with large .bz2
files, as these will contain many blocks. It is clearly files, as these will contain many blocks. It is clearly
futile to use it on damaged single-block files, since a futile to use it on damaged single-block files, since a
damaged block cannot be recovered. If you wish to min- damaged block cannot be recovered. If you wish to min­
imise any potential data loss through media or transmis- imise any potential data loss through media or transmis­
sion errors, you might consider compressing with a smaller sion errors, you might consider compressing with a smaller
block size. block size.
@ -315,19 +323,19 @@ PERFORMANCE NOTES
better than previous versions in this respect. The ratio better than previous versions in this respect. The ratio
between worst-case and average-case compression time is in between worst-case and average-case compression time is in
the region of 10:1. For previous versions, this figure the region of 10:1. For previous versions, this figure
was more like 100:1. You can use the -vvvv option to mon- was more like 100:1. You can use the -vvvv option to mon­
itor progress in great detail, if you want. itor progress in great detail, if you want.
Decompression speed is unaffected by these phenomena. Decompression speed is unaffected by these phenomena.
bzip2 usually allocates several megabytes of memory to bzip2 usually allocates several megabytes of memory to
operate in, and then charges all over it in a fairly ran- operate in, and then charges all over it in a fairly ran­
dom fashion. This means that performance, both for com- dom fashion. This means that performance, both for com­
pressing and decompressing, is largely determined by the pressing and decompressing, is largely determined by the
speed at which your machine can service cache misses. speed at which your machine can service cache misses.
Because of this, small changes to the code to reduce the Because of this, small changes to the code to reduce the
miss rate have been observed to give disproportionately miss rate have been observed to give disproportionately
large performance improvements. I imagine bzip2 will per- large performance improvements. I imagine bzip2 will per­
form best on machines with very large caches. form best on machines with very large caches.
@ -337,40 +345,46 @@ CAVEATS
but the details of what the problem is sometimes seem but the details of what the problem is sometimes seem
rather misleading. rather misleading.
This manual page pertains to version 1.0 of bzip2. Com- This manual page pertains to version 1.0.2 of bzip2. Com­
pressed data created by this version is entirely forwards pressed data created by this version is entirely forwards
and backwards compatible with the previous public and backwards compatible with the previous public
releases, versions 0.1pl2, 0.9.0 and 0.9.5, but with the releases, versions 0.1pl2, 0.9.0, 0.9.5, 1.0.0 and 1.0.1,
following exception: 0.9.0 and above can correctly decom- but with the following exception: 0.9.0 and above can cor­
press multiple concatenated compressed files. 0.1pl2 can- rectly decompress multiple concatenated compressed files.
not do this; it will stop after decompressing just the 0.1pl2 cannot do this; it will stop after decompressing
first file in the stream. just the first file in the stream.
bzip2recover uses 32-bit integers to represent bit posi- bzip2recover versions prior to this one, 1.0.2, used
tions in compressed files, so it cannot handle compressed 32-bit integers to represent bit positions in compressed
files more than 512 megabytes long. This could easily be files, so it could not handle compressed files more than
fixed. 512 megabytes long. Version 1.0.2 and above uses 64-bit
ints on some platforms which support them (GNU supported
targets, and Windows). To establish whether or not
bzip2recover was built with such a limitation, run it
without arguments. In any event you can build yourself an
unlimited version if you can recompile it with MaybeUInt64
set to be an unsigned 64-bit integer.
AUTHOR AUTHOR
Julian Seward, jseward@acm.org. Julian Seward, jseward@acm.org.
http://sourceware.cygnus.com/bzip2 http://sources.redhat.com/bzip2
http://www.muraroa.demon.co.uk
The ideas embodied in bzip2 are due to (at least) the fol- The ideas embodied in bzip2 are due to (at least) the fol­
lowing people: Michael Burrows and David Wheeler (for the lowing people: Michael Burrows and David Wheeler (for the
block sorting transformation), David Wheeler (again, for block sorting transformation), David Wheeler (again, for
the Huffman coder), Peter Fenwick (for the structured cod- the Huffman coder), Peter Fenwick (for the structured cod­
ing model in the original bzip, and many refinements), and ing model in the original bzip, and many refinements), and
Alistair Moffat, Radford Neal and Ian Witten (for the Alistair Moffat, Radford Neal and Ian Witten (for the
arithmetic coder in the original bzip). I am much arithmetic coder in the original bzip). I am much
indebted for their help, support and advice. See the man- indebted for their help, support and advice. See the man­
ual in the source distribution for pointers to sources of ual in the source distribution for pointers to sources of
documentation. Christian von Roques encouraged me to look documentation. Christian von Roques encouraged me to look
for faster sorting algorithms, so as to speed up compres- for faster sorting algorithms, so as to speed up compres­
sion. Bela Lubkin encouraged me to improve the worst-case sion. Bela Lubkin encouraged me to improve the worst-case
compression performance. Many people sent patches, helped compression performance. The bz* scripts are derived from
with portability problems, lent machines, gave advice and those of GNU gzip. Many people sent patches, helped with
were generally helpful. portability problems, lent machines, gave advice and were
generally helpful.


@ -9,7 +9,7 @@
salvage from damaged files created by the accompanying salvage from damaged files created by the accompanying
bzip2-1.0 program. bzip2-1.0 program.
Copyright (C) 1996-2000 Julian R Seward. All rights reserved. Copyright (C) 1996-2002 Julian R Seward. All rights reserved.
Redistribution and use in source and binary forms, with or without Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions modification, are permitted provided that the following conditions
@ -57,6 +57,29 @@
#include <stdlib.h> #include <stdlib.h>
#include <string.h> #include <string.h>
/* This program records bit locations in the file to be recovered.
That means that if 64-bit ints are not supported, we will not
be able to recover .bz2 files over 512MB (2^32 bits) long.
On GNU supported platforms, we take advantage of the 64-bit
int support to circumvent this problem. Ditto MSVC.
This change occurred in version 1.0.2; all prior versions have
the 512MB limitation.
*/
#ifdef __GNUC__
typedef unsigned long long int MaybeUInt64;
# define MaybeUInt64_FMT "%Lu"
#else
#ifdef _MSC_VER
typedef unsigned __int64 MaybeUInt64;
# define MaybeUInt64_FMT "%I64u"
#else
typedef unsigned int MaybeUInt64;
# define MaybeUInt64_FMT "%u"
#endif
#endif
typedef unsigned int UInt32; typedef unsigned int UInt32;
typedef int Int32; typedef int Int32;
typedef unsigned char UChar; typedef unsigned char UChar;
@ -66,13 +89,25 @@ typedef unsigned char Bool;
#define False ((Bool)0) #define False ((Bool)0)
Char inFileName[2000]; #define BZ_MAX_FILENAME 2000
Char outFileName[2000];
Char progName[2000];
UInt32 bytesOut = 0; Char inFileName[BZ_MAX_FILENAME];
UInt32 bytesIn = 0; Char outFileName[BZ_MAX_FILENAME];
Char progName[BZ_MAX_FILENAME];
MaybeUInt64 bytesOut = 0;
MaybeUInt64 bytesIn = 0;
/*---------------------------------------------------*/
/*--- Header bytes ---*/
/*---------------------------------------------------*/
#define BZ_HDR_B 0x42 /* 'B' */
#define BZ_HDR_Z 0x5a /* 'Z' */
#define BZ_HDR_h 0x68 /* 'h' */
#define BZ_HDR_0 0x30 /* '0' */
/*---------------------------------------------------*/ /*---------------------------------------------------*/
/*--- I/O errors ---*/ /*--- I/O errors ---*/
@ -116,6 +151,23 @@ void mallocFail ( Int32 n )
} }
/*---------------------------------------------*/
void tooManyBlocks ( Int32 max_handled_blocks )
{
fprintf ( stderr,
"%s: `%s' appears to contain more than %d blocks\n",
progName, inFileName, max_handled_blocks );
fprintf ( stderr,
"%s: and cannot be handled. To fix, increase\n",
progName );
fprintf ( stderr,
"%s: BZ_MAX_HANDLED_BLOCKS in bzip2recover.c, and recompile.\n",
progName );
exit ( 1 );
}
/*---------------------------------------------------*/ /*---------------------------------------------------*/
/*--- Bit stream I/O ---*/ /*--- Bit stream I/O ---*/
/*---------------------------------------------------*/ /*---------------------------------------------------*/
@ -254,27 +306,37 @@ Bool endsInBz2 ( Char* name )
/*--- ---*/ /*--- ---*/
/*---------------------------------------------------*/ /*---------------------------------------------------*/
/* This logic isn't really right when it comes to Cygwin. */
#ifdef _WIN32
# define BZ_SPLIT_SYM '\\' /* path splitter on Windows platform */
#else
# define BZ_SPLIT_SYM '/' /* path splitter on Unix platform */
#endif
#define BLOCK_HEADER_HI 0x00003141UL #define BLOCK_HEADER_HI 0x00003141UL
#define BLOCK_HEADER_LO 0x59265359UL #define BLOCK_HEADER_LO 0x59265359UL
#define BLOCK_ENDMARK_HI 0x00001772UL #define BLOCK_ENDMARK_HI 0x00001772UL
#define BLOCK_ENDMARK_LO 0x45385090UL #define BLOCK_ENDMARK_LO 0x45385090UL
/* Increase if necessary. However, a .bz2 file with > 50000 blocks
would have an uncompressed size of at least 40GB, so the chances
are low you'll need to up this.
*/
#define BZ_MAX_HANDLED_BLOCKS 50000
UInt32 bStart[20000]; MaybeUInt64 bStart [BZ_MAX_HANDLED_BLOCKS];
UInt32 bEnd[20000]; MaybeUInt64 bEnd [BZ_MAX_HANDLED_BLOCKS];
UInt32 rbStart[20000]; MaybeUInt64 rbStart[BZ_MAX_HANDLED_BLOCKS];
UInt32 rbEnd[20000]; MaybeUInt64 rbEnd [BZ_MAX_HANDLED_BLOCKS];
Int32 main ( Int32 argc, Char** argv ) Int32 main ( Int32 argc, Char** argv )
{ {
FILE* inFile; FILE* inFile;
FILE* outFile; FILE* outFile;
BitStream* bsIn, *bsWr; BitStream* bsIn, *bsWr;
Int32 currBlock, b, wrBlock; Int32 b, wrBlock, currBlock, rbCtr;
UInt32 bitsRead; MaybeUInt64 bitsRead;
Int32 rbCtr;
UInt32 buffHi, buffLo, blockCRC; UInt32 buffHi, buffLo, blockCRC;
Char* p; Char* p;
@ -282,11 +344,37 @@ Int32 main ( Int32 argc, Char** argv )
strcpy ( progName, argv[0] ); strcpy ( progName, argv[0] );
inFileName[0] = outFileName[0] = 0; inFileName[0] = outFileName[0] = 0;
fprintf ( stderr, "bzip2recover 1.0: extracts blocks from damaged .bz2 files.\n" ); fprintf ( stderr,
"bzip2recover 1.0.2: extracts blocks from damaged .bz2 files.\n" );
if (argc != 2) { if (argc != 2) {
fprintf ( stderr, "%s: usage is `%s damaged_file_name'.\n", fprintf ( stderr, "%s: usage is `%s damaged_file_name'.\n",
progName, progName ); progName, progName );
switch (sizeof(MaybeUInt64)) {
case 8:
fprintf(stderr,
"\trestrictions on size of recovered file: None\n");
break;
case 4:
fprintf(stderr,
"\trestrictions on size of recovered file: 512 MB\n");
fprintf(stderr,
"\tto circumvent, recompile with MaybeUInt64 as an\n"
"\tunsigned 64-bit int.\n");
break;
default:
fprintf(stderr,
"\tsizeof(MaybeUInt64) is not 4 or 8 -- "
"configuration error.\n");
break;
}
exit(1);
}
if (strlen(argv[1]) >= BZ_MAX_FILENAME-20) {
fprintf ( stderr,
"%s: supplied filename is suspiciously (>= %d chars) long. Bye!\n",
progName, (int)strlen(argv[1]) );
exit(1); exit(1);
} }
@ -316,7 +404,8 @@ Int32 main ( Int32 argc, Char** argv )
(bitsRead - bStart[currBlock]) >= 40) { (bitsRead - bStart[currBlock]) >= 40) {
bEnd[currBlock] = bitsRead-1; bEnd[currBlock] = bitsRead-1;
if (currBlock > 0) if (currBlock > 0)
fprintf ( stderr, " block %d runs from %d to %d (incomplete)\n", fprintf ( stderr, " block %d runs from " MaybeUInt64_FMT
" to " MaybeUInt64_FMT " (incomplete)\n",
currBlock, bStart[currBlock], bEnd[currBlock] ); currBlock, bStart[currBlock], bEnd[currBlock] );
} else } else
currBlock--; currBlock--;
@ -330,17 +419,22 @@ Int32 main ( Int32 argc, Char** argv )
( (buffHi & 0x0000ffff) == BLOCK_ENDMARK_HI ( (buffHi & 0x0000ffff) == BLOCK_ENDMARK_HI
&& buffLo == BLOCK_ENDMARK_LO) && buffLo == BLOCK_ENDMARK_LO)
) { ) {
if (bitsRead > 49) if (bitsRead > 49) {
bEnd[currBlock] = bitsRead-49; else bEnd[currBlock] = bitsRead-49;
} else {
bEnd[currBlock] = 0; bEnd[currBlock] = 0;
}
if (currBlock > 0 && if (currBlock > 0 &&
(bEnd[currBlock] - bStart[currBlock]) >= 130) { (bEnd[currBlock] - bStart[currBlock]) >= 130) {
fprintf ( stderr, " block %d runs from %d to %d\n", fprintf ( stderr, " block %d runs from " MaybeUInt64_FMT
" to " MaybeUInt64_FMT "\n",
rbCtr+1, bStart[currBlock], bEnd[currBlock] ); rbCtr+1, bStart[currBlock], bEnd[currBlock] );
rbStart[rbCtr] = bStart[currBlock]; rbStart[rbCtr] = bStart[currBlock];
rbEnd[rbCtr] = bEnd[currBlock]; rbEnd[rbCtr] = bEnd[currBlock];
rbCtr++; rbCtr++;
} }
if (currBlock >= BZ_MAX_HANDLED_BLOCKS)
tooManyBlocks(BZ_MAX_HANDLED_BLOCKS);
currBlock++; currBlock++;
bStart[currBlock] = bitsRead; bStart[currBlock] = bitsRead;
@ -400,10 +494,25 @@ Int32 main ( Int32 argc, Char** argv )
wrBlock++; wrBlock++;
} else } else
if (bitsRead == rbStart[wrBlock]) { if (bitsRead == rbStart[wrBlock]) {
outFileName[0] = 0; /* Create the output file name, correctly handling leading paths.
sprintf ( outFileName, "rec%4d", wrBlock+1 ); (31.10.2001 by Sergey E. Kusikov) */
for (p = outFileName; *p != 0; p++) if (*p == ' ') *p = '0'; Char* split;
strcat ( outFileName, inFileName ); Int32 ofs, k;
for (k = 0; k < BZ_MAX_FILENAME; k++)
outFileName[k] = 0;
strcpy (outFileName, inFileName);
split = strrchr (outFileName, BZ_SPLIT_SYM);
if (split == NULL) {
split = outFileName;
} else {
++split;
}
/* Now split points to the start of the basename. */
ofs = split - outFileName;
sprintf (split, "rec%5d", wrBlock+1);
for (p = split; *p != 0; p++) if (*p == ' ') *p = '0';
strcat (outFileName, inFileName + ofs);
if ( !endsInBz2(outFileName)) strcat ( outFileName, ".bz2" ); if ( !endsInBz2(outFileName)) strcat ( outFileName, ".bz2" );
fprintf ( stderr, " writing block %d to `%s' ...\n", fprintf ( stderr, " writing block %d to `%s' ...\n",
@ -416,8 +525,10 @@ Int32 main ( Int32 argc, Char** argv )
exit(1); exit(1);
} }
bsWr = bsOpenWriteStream ( outFile ); bsWr = bsOpenWriteStream ( outFile );
bsPutUChar ( bsWr, 'B' ); bsPutUChar ( bsWr, 'Z' ); bsPutUChar ( bsWr, BZ_HDR_B );
bsPutUChar ( bsWr, 'h' ); bsPutUChar ( bsWr, '9' ); bsPutUChar ( bsWr, BZ_HDR_Z );
bsPutUChar ( bsWr, BZ_HDR_h );
bsPutUChar ( bsWr, BZ_HDR_0 + 9 );
bsPutUChar ( bsWr, 0x31 ); bsPutUChar ( bsWr, 0x41 ); bsPutUChar ( bsWr, 0x31 ); bsPutUChar ( bsWr, 0x41 );
bsPutUChar ( bsWr, 0x59 ); bsPutUChar ( bsWr, 0x26 ); bsPutUChar ( bsWr, 0x59 ); bsPutUChar ( bsWr, 0x26 );
bsPutUChar ( bsWr, 0x53 ); bsPutUChar ( bsWr, 0x59 ); bsPutUChar ( bsWr, 0x53 ); bsPutUChar ( bsWr, 0x59 );
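The output-name hunk above keeps any leading directory, then inserts `rec` plus a zero-padded block number in front of the basename. A self-contained sketch of the same idea follows. It is not the commit's code: `make_rec_name` is a hypothetical helper, it uses `snprintf` with `%05d` in place of `sprintf` with `%5d` followed by space-to-zero replacement, and its `.bz2` suffix test is an assumption about what `endsInBz2` checks.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build "<dir>recNNNNN<basename>[.bz2]" from in_path.
   sep is the path separator ('/' or '\\').
   Returns 0 on success, -1 if the result does not fit in out. */
static int make_rec_name(char *out, size_t out_len,
                         const char *in_path, int block_no, char sep)
{
    const char *base;
    const char *suffix;
    size_t dir_len, n;
    int written;

    assert(out != NULL && in_path != NULL && out_len > 0);

    /* Find the start of the basename: just past the last separator. */
    base = strrchr(in_path, sep);
    base = (base == NULL) ? in_path : base + 1;
    dir_len = (size_t)(base - in_path);

    /* Append ".bz2" only when the name does not already end with it. */
    n = strlen(base);
    suffix = (n > 4 && strcmp(base + n - 4, ".bz2") == 0) ? "" : ".bz2";

    written = snprintf(out, out_len, "%.*srec%05d%s%s",
                       (int)dir_len, in_path, block_no, base, suffix);
    return (written >= 0 && (size_t)written < out_len) ? 0 : -1;
}
```

With these assumptions, `make_rec_name(buf, sizeof buf, "/tmp/data/file.bz2", 1, '/')` produces `/tmp/data/rec00001file.bz2`, so the recovered blocks land next to the damaged file and a `rec*file.bz2` wildcard lists them in block order.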

bzlib.c

@ -8,7 +8,7 @@
This file is a part of bzip2 and/or libbzip2, a program and This file is a part of bzip2 and/or libbzip2, a program and
library for lossless, block-sorting data compression. library for lossless, block-sorting data compression.
Copyright (C) 1996-2000 Julian R Seward. All rights reserved. Copyright (C) 1996-2002 Julian R Seward. All rights reserved.
Redistribution and use in source and binary forms, with or without Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions modification, are permitted provided that the following conditions
@ -93,10 +93,39 @@ void BZ2_bz__AssertH__fail ( int errcode )
"component, you should also report this bug to the author(s)\n" "component, you should also report this bug to the author(s)\n"
"of that program. Please make an effort to report this bug;\n" "of that program. Please make an effort to report this bug;\n"
"timely and accurate bug reports eventually lead to higher\n" "timely and accurate bug reports eventually lead to higher\n"
"quality software. Thanks. Julian Seward, 21 March 2000.\n\n", "quality software. Thanks. Julian Seward, 30 December 2001.\n\n",
errcode, errcode,
BZ2_bzlibVersion() BZ2_bzlibVersion()
); );
if (errcode == 1007) {
fprintf(stderr,
"\n*** A special note about internal error number 1007 ***\n"
"\n"
"Experience suggests that a common cause of internal error 1007\n"
"is unreliable memory or other hardware. The 1007 assertion\n"
"just happens to cross-check the results of huge numbers of\n"
"memory reads/writes, and so acts (unintendedly) as a stress\n"
"test of your memory system.\n"
"\n"
"I suggest the following: try compressing the file again,\n"
"possibly monitoring progress in detail with the -vv flag.\n"
"\n"
"* If the error cannot be reproduced, and/or happens at different\n"
" points in compression, you may have a flaky memory system.\n"
" Try a memory-test program. I have used Memtest86\n"
" (www.memtest86.com). At the time of writing it is free (GPLd).\n"
" Memtest86 tests memory much more thoroughly than your BIOS's\n"
" power-on test, and may find failures that the BIOS doesn't.\n"
"\n"
"* If the error can be repeatably reproduced, this is a bug in\n"
" bzip2, and I would very much like to hear about it. Please\n"
" let me know, and, ideally, save a copy of the file causing the\n"
" problem -- without which I will be unable to investigate it.\n"
"\n"
);
}
exit(3); exit(3);
} }
#endif #endif
@ -1402,7 +1431,7 @@ BZFILE * bzopen_or_bzdopen
smallMode = 1; break; smallMode = 1; break;
default: default:
if (isdigit((int)(*mode))) { if (isdigit((int)(*mode))) {
blockSize100k = *mode-'0'; blockSize100k = *mode-BZ_HDR_0;
} }
} }
mode++; mode++;


@ -8,7 +8,7 @@
This file is a part of bzip2 and/or libbzip2, a program and This file is a part of bzip2 and/or libbzip2, a program and
library for lossless, block-sorting data compression. library for lossless, block-sorting data compression.
Copyright (C) 1996-2000 Julian R Seward. All rights reserved. Copyright (C) 1996-2002 Julian R Seward. All rights reserved.
Redistribution and use in source and binary forms, with or without Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions modification, are permitted provided that the following conditions
@ -110,8 +110,10 @@ typedef
#define BZ_EXPORT #define BZ_EXPORT
#endif #endif
/* Need a definition for FILE */
#include <stdio.h>
#ifdef _WIN32 #ifdef _WIN32
# include <stdio.h>
# include <windows.h> # include <windows.h>
# ifdef small # ifdef small
/* windows.h define small to char */ /* windows.h define small to char */


@ -8,7 +8,7 @@
This file is a part of bzip2 and/or libbzip2, a program and This file is a part of bzip2 and/or libbzip2, a program and
library for lossless, block-sorting data compression. library for lossless, block-sorting data compression.
Copyright (C) 1996-2000 Julian R Seward. All rights reserved. Copyright (C) 1996-2002 Julian R Seward. All rights reserved.
Redistribution and use in source and binary forms, with or without Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions modification, are permitted provided that the following conditions
@ -76,7 +76,7 @@
/*-- General stuff. --*/ /*-- General stuff. --*/
#define BZ_VERSION "1.0.1, 23-June-2000" #define BZ_VERSION "1.0.2, 30-Dec-2001"
typedef char Char; typedef char Char;
typedef unsigned char Bool; typedef unsigned char Bool;
@ -137,6 +137,13 @@ extern void bz_internal_error ( int errcode );
#define BZFREE(ppp) (strm->bzfree)(strm->opaque,(ppp)) #define BZFREE(ppp) (strm->bzfree)(strm->opaque,(ppp))
/*-- Header bytes. --*/
#define BZ_HDR_B 0x42 /* 'B' */
#define BZ_HDR_Z 0x5a /* 'Z' */
#define BZ_HDR_h 0x68 /* 'h' */
#define BZ_HDR_0 0x30 /* '0' */
/*-- Constants for the back end. --*/ /*-- Constants for the back end. --*/
#define BZ_MAX_ALPHA_SIZE 258 #define BZ_MAX_ALPHA_SIZE 258

bzmore Normal file

@ -0,0 +1,61 @@
#!/bin/sh
# Bzmore wrapped for bzip2,
# adapted from zmore by Philippe Troin <phil@fifi.org> for Debian GNU/Linux.
PATH="/usr/bin:$PATH"; export PATH
prog=`echo $0 | sed 's|.*/||'`
case "$prog" in
*less) more=less ;;
*) more=more ;;
esac
if test "`echo -n a`" = "-n a"; then
# looks like a SysV system:
n1=''; n2='\c'
else
n1='-n'; n2=''
fi
oldtty=`stty -g 2>/dev/null`
if stty -cbreak 2>/dev/null; then
cb='cbreak'; ncb='-cbreak'
else
# 'stty min 1' resets eof to ^a on both SunOS and SysV!
cb='min 1 -icanon'; ncb='icanon eof ^d'
fi
if test $? -eq 0 -a -n "$oldtty"; then
trap 'stty $oldtty 2>/dev/null; exit' 0 2 3 5 10 13 15
else
trap 'stty $ncb echo 2>/dev/null; exit' 0 2 3 5 10 13 15
fi
if test $# = 0; then
if test -t 0; then
echo usage: $prog files...
else
bzip2 -cdfq | eval $more
fi
else
FIRST=1
for FILE
do
if test $FIRST -eq 0; then
echo $n1 "--More--(Next file: $FILE)$n2"
stty $cb -echo 2>/dev/null
ANS=`dd bs=1 count=1 2>/dev/null`
stty $ncb echo 2>/dev/null
echo " "
if test "$ANS" = 'e' -o "$ANS" = 'q'; then
exit
fi
fi
if test "$ANS" != 's'; then
echo "------> $FILE <------"
bzip2 -cdfq "$FILE" | eval $more
fi
if test -t 1; then
FIRST=0
fi
done
fi

bzmore.1 Normal file

@ -0,0 +1,152 @@
.\"Shamelessly copied from zmore.1 by Philippe Troin <phil@fifi.org>
.\"for Debian GNU/Linux
.TH BZMORE 1
.SH NAME
bzmore, bzless \- file perusal filter for crt viewing of bzip2 compressed text
.SH SYNOPSIS
.B bzmore
[ name ... ]
.br
.B bzless
[ name ... ]
.SH NOTE
In the following description,
.I bzless
and
.I less
can be used interchangeably with
.I bzmore
and
.I more.
.SH DESCRIPTION
.I Bzmore
is a filter which allows examination of compressed or plain text files
one screenful at a time on a soft-copy terminal.
.I bzmore
works on files compressed with
.I bzip2
and also on uncompressed files.
If a file does not exist,
.I bzmore
looks for a file of the same name with the addition of a .bz2 suffix.
.PP
.I Bzmore
normally pauses after each screenful, printing --More--
at the bottom of the screen.
If the user then types a carriage return, one more line is displayed.
If the user hits a space,
another screenful is displayed. Other possibilities are enumerated later.
.PP
.I Bzmore
looks in the file
.I /etc/termcap
to determine terminal characteristics,
and to determine the default window size.
On a terminal capable of displaying 24 lines,
the default window size is 22 lines.
Other sequences which may be typed when
.I bzmore
pauses, and their effects, are as follows (\fIi\fP is an optional integer
argument, defaulting to 1) :
.PP
.IP \fIi\|\fP<space>
display
.I i
more lines, (or another screenful if no argument is given)
.PP
.IP ^D
display 11 more lines (a ``scroll'').
If
.I i
is given, then the scroll size is set to \fIi\|\fP.
.PP
.IP d
same as ^D (control-D)
.PP
.IP \fIi\|\fPz
same as typing a space except that \fIi\|\fP, if present, becomes the new
window size. Note that the window size reverts back to the default at the
end of the current file.
.PP
.IP \fIi\|\fPs
skip \fIi\|\fP lines and print a screenful of lines
.PP
.IP \fIi\|\fPf
skip \fIi\fP screenfuls and print a screenful of lines
.PP
.IP "q or Q"
quit reading the current file; go on to the next (if any)
.PP
.IP "e or q"
When the prompt --More--(Next file:
.IR file )
is printed, this command causes bzmore to exit.
.PP
.IP s
When the prompt --More--(Next file:
.IR file )
is printed, this command causes bzmore to skip the next file and continue.
.PP
.IP =
Display the current line number.
.PP
.IP \fIi\|\fP/expr
search for the \fIi\|\fP-th occurrence of the regular expression \fIexpr.\fP
If the pattern is not found,
.I bzmore
goes on to the next file (if any).
Otherwise, a screenful is displayed, starting two lines before the place
where the expression was found.
The user's erase and kill characters may be used to edit the regular
expression.
Erasing back past the first column cancels the search command.
.PP
.IP \fIi\|\fPn
search for the \fIi\|\fP-th occurrence of the last regular expression entered.
.PP
.IP !command
invoke a shell with \fIcommand\|\fP.
Each occurrence of the character `!' in "command" is replaced with the
previous shell command. The sequence "\\!" is replaced by "!".
.PP
.IP ":q or :Q"
quit reading the current file; go on to the next (if any)
(same as q or Q).
.PP
.IP .
(dot) repeat the previous command.
.PP
The commands take effect immediately, i.e., it is not necessary to
type a carriage return.
Up to the time when the command character itself is given,
the user may hit the line kill character to cancel the numerical
argument being formed.
In addition, the user may hit the erase character to redisplay the
--More-- message.
.PP
At any time when output is being sent to the terminal, the user can
hit the quit key (normally control\-\\).
.I Bzmore
will stop sending output, and will display the usual --More--
prompt.
The user may then enter one of the above commands in the normal manner.
Unfortunately, some output is lost when this is done, due to the
fact that any characters waiting in the terminal's output queue
are flushed when the quit signal occurs.
.PP
The terminal is set to
.I noecho
mode by this program so that the output can be continuous.
What you type will thus not show on your terminal, except for the / and !
commands.
.PP
If the standard output is not a teletype, then
.I bzmore
acts just like
.I bzcat,
except that a header is printed before each file.
.SH FILES
.DT
/etc/termcap Terminal data base
.SH "SEE ALSO"
more(1), less(1), bzip2(1), bzdiff(1), bzgrep(1)


@ -8,7 +8,7 @@
This file is a part of bzip2 and/or libbzip2, a program and This file is a part of bzip2 and/or libbzip2, a program and
library for lossless, block-sorting data compression. library for lossless, block-sorting data compression.
Copyright (C) 1996-2000 Julian R Seward. All rights reserved. Copyright (C) 1996-2002 Julian R Seward. All rights reserved.
Redistribution and use in source and binary forms, with or without Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions modification, are permitted provided that the following conditions
@ -663,10 +663,10 @@ void BZ2_compressBlock ( EState* s, Bool is_last_block )
/*-- If this is the first block, create the stream header. --*/ /*-- If this is the first block, create the stream header. --*/
if (s->blockNo == 1) { if (s->blockNo == 1) {
BZ2_bsInitWrite ( s ); BZ2_bsInitWrite ( s );
bsPutUChar ( s, 'B' ); bsPutUChar ( s, BZ_HDR_B );
bsPutUChar ( s, 'Z' ); bsPutUChar ( s, BZ_HDR_Z );
bsPutUChar ( s, 'h' ); bsPutUChar ( s, BZ_HDR_h );
bsPutUChar ( s, (UChar)('0' + s->blockSize100k) ); bsPutUChar ( s, (UChar)(BZ_HDR_0 + s->blockSize100k) );
} }
if (s->nblock > 0) { if (s->nblock > 0) {


@ -8,7 +8,7 @@
This file is a part of bzip2 and/or libbzip2, a program and This file is a part of bzip2 and/or libbzip2, a program and
library for lossless, block-sorting data compression. library for lossless, block-sorting data compression.
Copyright (C) 1996-2000 Julian R Seward. All rights reserved. Copyright (C) 1996-2002 Julian R Seward. All rights reserved.
Redistribution and use in source and binary forms, with or without Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions modification, are permitted provided that the following conditions


@ -8,7 +8,7 @@
This file is a part of bzip2 and/or libbzip2, a program and This file is a part of bzip2 and/or libbzip2, a program and
library for lossless, block-sorting data compression. library for lossless, block-sorting data compression.
Copyright (C) 1996-2000 Julian R Seward. All rights reserved. Copyright (C) 1996-2002 Julian R Seward. All rights reserved.
Redistribution and use in source and binary forms, with or without Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions modification, are permitted provided that the following conditions
@ -235,18 +235,18 @@ Int32 BZ2_decompress ( DState* s )
switch (s->state) { switch (s->state) {
GET_UCHAR(BZ_X_MAGIC_1, uc); GET_UCHAR(BZ_X_MAGIC_1, uc);
if (uc != 'B') RETURN(BZ_DATA_ERROR_MAGIC); if (uc != BZ_HDR_B) RETURN(BZ_DATA_ERROR_MAGIC);
GET_UCHAR(BZ_X_MAGIC_2, uc); GET_UCHAR(BZ_X_MAGIC_2, uc);
if (uc != 'Z') RETURN(BZ_DATA_ERROR_MAGIC); if (uc != BZ_HDR_Z) RETURN(BZ_DATA_ERROR_MAGIC);
GET_UCHAR(BZ_X_MAGIC_3, uc) GET_UCHAR(BZ_X_MAGIC_3, uc)
if (uc != 'h') RETURN(BZ_DATA_ERROR_MAGIC); if (uc != BZ_HDR_h) RETURN(BZ_DATA_ERROR_MAGIC);
GET_BITS(BZ_X_MAGIC_4, s->blockSize100k, 8) GET_BITS(BZ_X_MAGIC_4, s->blockSize100k, 8)
if (s->blockSize100k < '1' || if (s->blockSize100k < (BZ_HDR_0 + 1) ||
s->blockSize100k > '9') RETURN(BZ_DATA_ERROR_MAGIC); s->blockSize100k > (BZ_HDR_0 + 9)) RETURN(BZ_DATA_ERROR_MAGIC);
s->blockSize100k -= '0'; s->blockSize100k -= BZ_HDR_0;
if (s->smallDecompress) { if (s->smallDecompress) {
s->ll16 = BZALLOC( s->blockSize100k * 100000 * sizeof(UInt16) ); s->ll16 = BZALLOC( s->blockSize100k * 100000 * sizeof(UInt16) );
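The header check in this hunk compares each byte against a fixed numeric value and then maps the fourth byte to a block size of 1 to 9. A standalone sketch of that validation follows. The function `parse_stream_header` and the `HDR_*` macro names are hypothetical; only the byte values come from the diff. One plausible reason for preferring hex constants (an assumption, not stated in this hunk) is that a character literal such as `'B'` depends on the compiler's execution character set, while `0x42` does not.

```c
#include <assert.h>
#include <stddef.h>

/* Byte values as defined by BZ_HDR_B/Z/h/0 in the diff; the macro names
   here are local to this sketch. */
#define HDR_B 0x42  /* 'B' in ASCII */
#define HDR_Z 0x5a  /* 'Z' in ASCII */
#define HDR_h 0x68  /* 'h' in ASCII */
#define HDR_0 0x30  /* '0' in ASCII */

/* Hypothetical helper: validate the 4-byte stream header.
   Returns the block size in units of 100k (1..9), or -1 if the
   buffer is too short or the magic bytes do not match. */
static int parse_stream_header(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);

    if (len < 4)
        return -1;
    if (buf[0] != HDR_B || buf[1] != HDR_Z || buf[2] != HDR_h)
        return -1;
    if (buf[3] < HDR_0 + 1 || buf[3] > HDR_0 + 9)
        return -1;

    return buf[3] - HDR_0;
}
```

A header of `0x42 0x5a 0x68 0x39` parses as block size 9, matching the `BZ_HDR_0 + 9` byte that bzip2recover writes at the start of each recovered file.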


@ -19,7 +19,7 @@
#ifdef _WIN32 #ifdef _WIN32
#define BZ2_LIBNAME "libbz2-1.0.0.DLL" #define BZ2_LIBNAME "libbz2-1.0.2.DLL"
#include <windows.h> #include <windows.h>
static int BZ2DLLLoaded = 0; static int BZ2DLLLoaded = 0;
@ -130,8 +130,8 @@ int main(int argc,char *argv[])
}else{ }else{
fp_w = stdout; fp_w = stdout;
} }
if((BZ2fp_r == NULL && (BZ2fp_r = BZ2_bzdopen(fileno(stdin),"rb"))==NULL) if((fn_r == NULL && (BZ2fp_r = BZ2_bzdopen(fileno(stdin),"rb"))==NULL)
|| (BZ2fp_r != NULL && (BZ2fp_r = BZ2_bzopen(fn_r,"rb"))==NULL)){ || (fn_r != NULL && (BZ2fp_r = BZ2_bzopen(fn_r,"rb"))==NULL)){
printf("can't bz2openstream\n"); printf("can't bz2openstream\n");
exit(1); exit(1);
} }


@ -8,7 +8,7 @@
This file is a part of bzip2 and/or libbzip2, a program and This file is a part of bzip2 and/or libbzip2, a program and
library for lossless, block-sorting data compression. library for lossless, block-sorting data compression.
Copyright (C) 1996-2000 Julian R Seward. All rights reserved. Copyright (C) 1996-2002 Julian R Seward. All rights reserved.
Redistribution and use in source and binary forms, with or without Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions modification, are permitted provided that the following conditions


@ -4,7 +4,7 @@
# Fixed up by JRS for bzip2-0.9.5d release. # Fixed up by JRS for bzip2-0.9.5d release.
CC=cl CC=cl
CFLAGS= -DWIN32 -MD -Ox -D_FILE_OFFSET_BITS=64 CFLAGS= -DWIN32 -MD -Ox -D_FILE_OFFSET_BITS=64 -nologo
OBJS= blocksort.obj \ OBJS= blocksort.obj \
huffman.obj \ huffman.obj \


@ -2,10 +2,10 @@
@setfilename bzip2.info @setfilename bzip2.info
@ignore @ignore
This file documents bzip2 version 1.0, and associated library This file documents bzip2 version 1.0.2, and associated library
libbzip2, written by Julian Seward (jseward@acm.org). libbzip2, written by Julian Seward (jseward@acm.org).
Copyright (C) 1996-2000 Julian R Seward Copyright (C) 1996-2002 Julian R Seward
Permission is granted to make and distribute verbatim copies of Permission is granted to make and distribute verbatim copies of
this manual provided the copyright notice and this permission notice this manual provided the copyright notice and this permission notice
@ -30,8 +30,8 @@ END-INFO-DIR-ENTRY
@titlepage @titlepage
@title bzip2 and libbzip2 @title bzip2 and libbzip2
@subtitle a program and library for data compression @subtitle a program and library for data compression
@subtitle copyright (C) 1996-2000 Julian Seward @subtitle copyright (C) 1996-2002 Julian Seward
@subtitle version 1.0 of 21 March 2000 @subtitle version 1.0.2 of 30 December 2001
@author Julian Seward @author Julian Seward
@end titlepage @end titlepage
@ -40,11 +40,17 @@ END-INFO-DIR-ENTRY
@parskip 2mm @parskip 2mm
@end iftex @end iftex
@node Top, Overview, (dir), (dir) @node Top,,, (dir)
The following text is the License for this software. You should
find it identical to that contained in the file LICENSE in the
source distribution.
@bf{------------------ START OF THE LICENSE ------------------}
This program, @code{bzip2}, This program, @code{bzip2},
and associated library @code{libbzip2}, are and associated library @code{libbzip2}, are
Copyright (C) 1996-2000 Julian R Seward. All rights reserved. Copyright (C) 1996-2002 Julian R Seward. All rights reserved.
Redistribution and use in source and binary forms, with or without Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions modification, are permitted provided that the following conditions
@ -82,14 +88,16 @@ Julian Seward, Cambridge, UK.
@code{jseward@@acm.org} @code{jseward@@acm.org}
@code{http://sourceware.cygnus.com/bzip2} @code{bzip2}/@code{libbzip2} version 1.0.2 of 30 December 2001.
@bf{------------------ END OF THE LICENSE ------------------}
Web sites:
@code{http://sources.redhat.com/bzip2}
@code{http://www.cacheprof.org} @code{http://www.cacheprof.org}
@code{http://www.muraroa.demon.co.uk}
@code{bzip2}/@code{libbzip2} version 1.0 of 21 March 2000.
PATENTS: To the best of my knowledge, @code{bzip2} does not use any patented PATENTS: To the best of my knowledge, @code{bzip2} does not use any patented
algorithms. However, I do not have the resources available to carry out algorithms. However, I do not have the resources available to carry out
a full patent search. Therefore I cannot give any guarantee of the a full patent search. Therefore I cannot give any guarantee of the
@ -101,7 +109,6 @@ above statement.
@node Overview, Implementation, Top, Top
@chapter Introduction @chapter Introduction
@code{bzip2} compresses files using the Burrows-Wheeler @code{bzip2} compresses files using the Burrows-Wheeler
@ -134,7 +141,7 @@ and nothing else.
@unnumberedsubsubsec NAME @unnumberedsubsubsec NAME
@itemize @itemize
@item @code{bzip2}, @code{bunzip2} @item @code{bzip2}, @code{bunzip2}
- a block-sorting file compressor, v1.0 - a block-sorting file compressor, v1.0.2
@item @code{bzcat} @item @code{bzcat}
- decompresses files to stdout - decompresses files to stdout
@item @code{bzip2recover} @item @code{bzip2recover}
@ -264,6 +271,11 @@ This really performs a trial decompression and throws away the result.
Force overwrite of output files. Normally, @code{bzip2} will not overwrite Force overwrite of output files. Normally, @code{bzip2} will not overwrite
existing output files. Also forces @code{bzip2} to break hard links existing output files. Also forces @code{bzip2} to break hard links
to files, which it otherwise wouldn't do. to files, which it otherwise wouldn't do.
@code{bzip2} normally declines to decompress files which don't have the
correct magic header bytes. If forced (@code{-f}), however, it will
pass such files through unmodified. This is how GNU @code{gzip}
behaves.
@item -k --keep @item -k --keep
Keep (don't delete) input files during compression Keep (don't delete) input files during compression
or decompression. or decompression.
@ -286,9 +298,13 @@ Further @code{-v}'s increase the verbosity level, spewing out lots of
information which is primarily of interest for diagnostic purposes. information which is primarily of interest for diagnostic purposes.
@item -L --license -V --version @item -L --license -V --version
Display the software version, license terms and conditions. Display the software version, license terms and conditions.
@item -1 to -9 @item -1 (or --fast) to -9 (or --best)
Set the block size to 100 k, 200 k .. 900 k when compressing. Has no Set the block size to 100 k, 200 k .. 900 k when compressing. Has no
effect when decompressing. See MEMORY MANAGEMENT below. effect when decompressing. See MEMORY MANAGEMENT below.
The @code{--fast} and @code{--best} aliases are primarily for GNU
@code{gzip} compatibility. In particular, @code{--fast} doesn't make
things significantly faster. And @code{--best} merely selects the
default behaviour.
@item -- @item --
Treats all subsequent arguments as file names, even if they start Treats all subsequent arguments as file names, even if they start
with a dash. This is so you can handle files with names beginning with a dash. This is so you can handle files with names beginning
@ -389,21 +405,19 @@ integrity of the resulting files, and decompress those which are
undamaged. undamaged.
@code{bzip2recover} @code{bzip2recover}
takes a single argument, the name of the damaged file, takes a single argument, the name of the damaged file, and writes a
and writes a number of files @code{rec0001file.bz2}, number of files @code{rec00001file.bz2}, @code{rec00002file.bz2}, etc,
@code{rec0002file.bz2}, etc, containing the extracted blocks. containing the extracted blocks. The output filenames are designed so
The output filenames are designed so that the use of that the use of wildcards in subsequent processing -- for example,
wildcards in subsequent processing -- for example, @code{bzip2 -dc rec*file.bz2 > recovered_data} -- processes the files in
@code{bzip2 -dc rec*file.bz2 > recovered_data} -- lists the files in the correct order.
the correct order.
@code{bzip2recover} should be of most use dealing with large @code{.bz2} @code{bzip2recover} should be of most use dealing with large @code{.bz2}
files, as these will contain many blocks. It is clearly files, as these will contain many blocks. It is clearly futile to use
futile to use it on damaged single-block files, since a it on damaged single-block files, since a damaged block cannot be
damaged block cannot be recovered. If you wish to minimise recovered. If you wish to minimise any potential data loss through
any potential data loss through media or transmission errors, media or transmission errors, you might consider compressing with a
you might consider compressing with a smaller smaller block size.
block size.
@unnumberedsubsubsec PERFORMANCE NOTES @unnumberedsubsubsec PERFORMANCE NOTES
@ -435,22 +449,31 @@ I/O error messages are not as helpful as they could be. @code{bzip2}
tries hard to detect I/O errors and exit cleanly, but the details of tries hard to detect I/O errors and exit cleanly, but the details of
what the problem is sometimes seem rather misleading. what the problem is sometimes seem rather misleading.
This manual page pertains to version 1.0 of @code{bzip2}. Compressed This manual page pertains to version 1.0.2 of @code{bzip2}. Compressed
data created by this version is entirely forwards and backwards data created by this version is entirely forwards and backwards
compatible with the previous public releases, versions 0.1pl2, 0.9.0 and compatible with the previous public releases, versions 0.1pl2, 0.9.0,
0.9.5, but with the following exception: 0.9.0 and above can correctly 0.9.5, 1.0.0 and 1.0.1, but with the following exception: 0.9.0 and
decompress multiple concatenated compressed files. 0.1pl2 cannot do above can correctly decompress multiple concatenated compressed files.
this; it will stop after decompressing just the first file in the 0.1pl2 cannot do this; it will stop after decompressing just the first
stream. file in the stream.
@code{bzip2recover} versions prior to this one, 1.0.2, used 32-bit
integers to represent bit positions in compressed files, so it could not
handle compressed files more than 512 megabytes long. Version 1.0.2 and
above uses 64-bit ints on some platforms which support them (GNU
supported targets, and Windows). To establish whether or not
@code{bzip2recover} was built with such a limitation, run it without
arguments. In any event you can build yourself an unlimited version if
you can recompile it with @code{MaybeUInt64} set to be an unsigned
64-bit integer.
@code{bzip2recover} uses 32-bit integers to represent bit positions in
compressed files, so it cannot handle compressed files more than 512
megabytes long. This could easily be fixed.
@unnumberedsubsubsec AUTHOR @unnumberedsubsubsec AUTHOR
Julian Seward, @code{jseward@@acm.org}. Julian Seward, @code{jseward@@acm.org}.
@code{http://sources.redhat.com/bzip2}
The ideas embodied in @code{bzip2} are due to (at least) the following The ideas embodied in @code{bzip2} are due to (at least) the following
people: Michael Burrows and David Wheeler (for the block sorting people: Michael Burrows and David Wheeler (for the block sorting
transformation), David Wheeler (again, for the Huffman coder), Peter transformation), David Wheeler (again, for the Huffman coder), Peter
@ -461,8 +484,9 @@ indebted for their help, support and advice. See the manual in the
source distribution for pointers to sources of documentation. Christian source distribution for pointers to sources of documentation. Christian
von Roques encouraged me to look for faster sorting algorithms, so as to von Roques encouraged me to look for faster sorting algorithms, so as to
speed up compression. Bela Lubkin encouraged me to improve the speed up compression. Bela Lubkin encouraged me to improve the
worst-case compression performance. Many people sent patches, helped worst-case compression performance. The @code{bz*} scripts are derived
with portability problems, lent machines, gave advice and were generally from those of GNU @code{gzip}. Many people sent patches, helped with
portability problems, lent machines, gave advice and were generally
helpful. helpful.
@end quotation @end quotation
@ -1769,16 +1793,20 @@ was compiled with @code{BZ_NO_STDIO} set.
For a normal compile, an assertion failure yields the message For a normal compile, an assertion failure yields the message
@example @example
bzip2/libbzip2: internal error number N. bzip2/libbzip2: internal error number N.
This is a bug in bzip2/libbzip2, 1.0 of 21-Mar-2000. This is a bug in bzip2/libbzip2, 1.0.2, 30-Dec-2001.
Please report it to me at: jseward@@acm.org. If this happened Please report it to me at: jseward@@acm.org. If this happened
when you were using some program which uses libbzip2 as a when you were using some program which uses libbzip2 as a
component, you should also report this bug to the author(s) component, you should also report this bug to the author(s)
of that program. Please make an effort to report this bug; of that program. Please make an effort to report this bug;
timely and accurate bug reports eventually lead to higher timely and accurate bug reports eventually lead to higher
quality software. Thanks. Julian Seward, 21 March 2000. quality software. Thanks. Julian Seward, 30 December 2001.
@end example @end example
where @code{N} is some error code number. @code{exit(3)} where @code{N} is some error code number. If @code{N == 1007}, it also
is then called. prints some extra text advising the reader that unreliable memory is
often associated with internal error 1007. (This is a
frequently-observed-phenomenon with versions 1.0.0/1.0.1).
@code{exit(3)} is then called.
For a @code{stdio}-free library, assertion failures result For a @code{stdio}-free library, assertion failures result
in a call to a function declared as: in a call to a function declared as:
@ -2056,10 +2084,10 @@ Maybe this isn't what you want.
If you want a compressor and/or library which is faster, uses less If you want a compressor and/or library which is faster, uses less
memory but gets pretty good compression, and has minimal latency, memory but gets pretty good compression, and has minimal latency,
consider Jean-loup consider Jean-loup
Gailly's and Mark Adler's work, @code{zlib-1.1.2} and Gailly's and Mark Adler's work, @code{zlib-1.1.3} and
@code{gzip-1.2.4}. Look for them at @code{gzip-1.2.4}. Look for them at
@code{http://www.cdrom.com/pub/infozip/zlib} and @code{http://www.zlib.org} and
@code{http://www.gzip.org} respectively. @code{http://www.gzip.org} respectively.
For something faster and lighter still, you might try Markus F X J For something faster and lighter still, you might try Markus F X J

mk251.c Normal file

@ -0,0 +1,16 @@
/* Spew out a long sequence of the byte 251. When fed to bzip2
versions 1.0.0 or 1.0.1, causes it to die with internal error
1007 in blocksort.c. This assertion misses an extremely rare
case, which is fixed in this version (1.0.2) and above.
*/
#include <stdio.h>
int main ()
{
int i;
for (i = 0; i < 48500000 ; i++)
putchar(251);
return 0;
}


@ -8,7 +8,7 @@
This file is a part of bzip2 and/or libbzip2, a program and This file is a part of bzip2 and/or libbzip2, a program and
library for lossless, block-sorting data compression. library for lossless, block-sorting data compression.
Copyright (C) 1996-2000 Julian R Seward. All rights reserved. Copyright (C) 1996-2002 Julian R Seward. All rights reserved.
Redistribution and use in source and binary forms, with or without Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions modification, are permitted provided that the following conditions

words3

@ -15,8 +15,8 @@ not actually execute them.
Instructions for use are in the preformatted manual page, in the file Instructions for use are in the preformatted manual page, in the file
bzip2.txt. For more detailed documentation, read the full manual. bzip2.txt. For more detailed documentation, read the full manual.
It is available in Postscript form (manual.ps) and HTML form It is available in Postscript form (manual.ps), PDF form (manual.pdf),
(manual_toc.html). and HTML form (manual_toc.html).
You can also do "bzip2 --help" to see some helpful information. You can also do "bzip2 --help" to see some helpful information.
"bzip2 -L" displays the software license. "bzip2 -L" displays the software license.