XZ Embedded
===========
XZ Embedded is a relatively small, limited implementation of the .xz
file format. Currently only decoding is implemented.
XZ Embedded was written for use in the Linux kernel, but the code can
be easily used in other environments too, including regular userspace
applications. See userspace/xzminidec.c for an example program.
NOTE: The version of XZ Embedded in the Linux kernel lacks a few
build-time-selectable optional features that are present in the
upstream XZ Embedded project: support for concatenated .xz files,
CRC64, and ignoring unsupported check types. These aren't in Linux
because they don't seem useful there and they would add to the
code size.
This README contains information that is useful only when the copy
of XZ Embedded isn't part of the Linux kernel tree. You should also
read linux/Documentation/staging/xz.rst even if you aren't using
XZ Embedded as part of Linux; information in that file is not
repeated in this README.
Conformance to the .xz file format specification
As of the .xz file format specification version 1.2.0, this
decompressor implementation has the following limitations:
- SHA-256 isn't supported. It can be ignored as an unsupported
  check type if that feature is enabled at build time.
- Delta filter is not included.
- BCJ filters don't support non-default start offset.
- LZMA2 supports at most 3 GiB dictionary.
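
To illustrate the last limitation: the .xz format stores the LZMA2
dictionary size in a single properties byte. The sketch below decodes
that byte as the format specification describes. The function name
lzma2_dict_size() is hypothetical and not part of XZ Embedded, and
returning 0 for values above 39 is only this sketch's way of signalling
the 3 GiB cap.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of the XZ Embedded API): decode the
 * LZMA2 dictionary size in bytes from the one-byte filter property.
 * Returns 0 when the value is above the 3 GiB limit described above.
 */
uint64_t lzma2_dict_size(uint8_t props)
{
	uint64_t size;

	/* 40 would mean 4 GiB - 1, which is over the 3 GiB limit. */
	if (props > 39)
		return 0;

	/* The size is 2 or 3 times a power of two, starting at 4 KiB. */
	size = 2 + (props & 1);
	size <<= (props >> 1) + 11;
	return size;
}
```

The largest accepted value, 39, gives 3 << 30 bytes, which is where
the 3 GiB figure comes from.
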
There are a couple of corner cases where things have been simplified
at the expense of detecting errors as early as possible. These should
not matter in practice since they don't cause security issues, but
they are good to know about when testing the code with the test files
from XZ Utils.
Compiler requirements
XZ Embedded should compile with any C99 or C11 compiler. The code
should also still be GNU-C89 compatible because GNU-C89 was used in
the Linux kernel until 2022. GNU-C89 support will likely be dropped
at some point.
Embedding into userspace applications
To embed the XZ decoder, copy the following files into a single
directory in your source code tree:
linux/include/linux/xz.h
linux/lib/xz/xz_crc32.c
linux/lib/xz/xz_dec_lzma2.c
linux/lib/xz/xz_dec_stream.c
linux/lib/xz/xz_lzma2.h
linux/lib/xz/xz_private.h
linux/lib/xz/xz_stream.h
userspace/xz_config.h
Alternatively, xz.h may be placed into a different directory but then
that directory must be in the compiler include path when compiling
the .c files.
Your code should use only the functions declared in xz.h. The rest of
the .h files are meant only for internal use in XZ Embedded.
You may want to modify xz_config.h to be more suitable for your build
environment. You should probably at least skim through it even if the
default file works as is.
Supporting concatenated .xz files
Regular .xz files can be concatenated as is and the xz command line
tool will decompress all streams from a concatenated file (a few
other popular formats and tools support this too). Such .xz files
are more common than one might think because pxz, an early
threaded XZ compressor, created them.
The xz_dec_run() function will stop after decompressing one stream.
This is good when XZ data is stored inside some other file format.
However, if one is decompressing regular standalone .xz files, one
will want to decompress all streams in the file. This is easy with
xz_dec_catrun(). To include support for xz_dec_catrun(), you need
to #define XZ_DEC_CONCATENATED in xz_config.h or in compiler flags.
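
A multi-call decoding loop around xz_dec_catrun() can be sketched
roughly as follows. It is a simplified sketch in the spirit of
userspace/xzminidec.c, not a replacement for it. It assumes that the
files listed earlier are in the include path, that XZ_DEC_CONCATENATED
is defined in the compiler flags so that every file sees it, and that
the internal CRC32 is in use. The 64 MiB dictionary limit and the
buffer sizes are arbitrary choices.

```c
#include <stdint.h>
#include <stdio.h>
#include "xz.h"

/* Decompress every concatenated stream from stdin to stdout. */
int main(void)
{
	static uint8_t in[BUFSIZ];
	static uint8_t out[BUFSIZ];
	struct xz_buf b = { .in = in, .out = out, .out_size = sizeof(out) };
	struct xz_dec *s;
	enum xz_ret ret;
	int status = 1; /* Failure unless the decoder reports XZ_STREAM_END. */

	/* The internal CRC tables must be set up before decoding. */
	xz_crc32_init();
#ifdef XZ_USE_CRC64
	xz_crc64_init();
#endif

	/* Multi-call mode; the dictionary limit here is an assumption. */
	s = xz_dec_init(XZ_DYNALLOC, 1 << 26);
	if (s == NULL)
		return 1;

	for (;;) {
		/* Refill the input buffer once it has been consumed. */
		if (b.in_pos == b.in_size) {
			b.in_size = fread(in, 1, sizeof(in), stdin);
			b.in_pos = 0;
			if (ferror(stdin))
				break;
		}

		/* The last argument (finish) is true when input has run out. */
		ret = xz_dec_catrun(s, &b, b.in_size == 0);

		/* Flush when the buffer is full or decoding has stopped. */
		if (b.out_pos == sizeof(out) || ret != XZ_OK) {
			if (fwrite(out, 1, b.out_pos, stdout) != b.out_pos)
				break;
			b.out_pos = 0;
		}

		if (ret == XZ_OK)
			continue;

		/* Anything other than XZ_STREAM_END is treated as an error. */
		if (ret == XZ_STREAM_END)
			status = 0;
		break;
	}

	xz_dec_end(s);
	return status;
}
```
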
Integrity check support
XZ Embedded always supports the integrity check types None and
CRC32. Support for CRC64 is optional. SHA-256 is currently not
supported in XZ Embedded although the .xz format does support it.
The xz tool from XZ Utils uses CRC64 by default, but CRC32 is usually
enough in embedded systems, and leaving out CRC64 keeps the code size
smaller.
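
Which check type a file uses can be read from its 12-byte Stream
Header before any decoding is attempted. The sketch below follows the
header layout from the .xz file format specification. The function
name peek_xz_check_id() and its return convention are hypothetical and
not part of xz.h, and the sketch deliberately skips verifying the
CRC32 that protects the Stream Flags.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Check IDs defined by the .xz file format specification. */
enum {
	CHECK_NONE = 0x00,
	CHECK_CRC32 = 0x01,
	CHECK_CRC64 = 0x04,
	CHECK_SHA256 = 0x0A
};

/*
 * Hypothetical helper: return the Check ID (0..15) stored in a Stream
 * Header, or -1 if buf does not look like the start of a .xz stream.
 */
int peek_xz_check_id(const uint8_t *buf, size_t size)
{
	static const uint8_t magic[6] = { 0xFD, '7', 'z', 'X', 'Z', 0x00 };

	/* Magic (6 bytes) + Stream Flags (2) + CRC32 of the flags (4). */
	if (size < 12 || memcmp(buf, magic, sizeof(magic)) != 0)
		return -1;

	/* The first flags byte and the high bits of the second are reserved. */
	if (buf[6] != 0x00 || (buf[7] & 0xF0) != 0)
		return -1;

	return buf[7] & 0x0F;
}
```

A result of CHECK_CRC64 means the file needs a decoder built with
CRC64 support; CHECK_SHA256 means the check cannot be verified by
XZ Embedded at all.
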
If you want support for CRC64, you need to copy linux/lib/xz/xz_crc64.c
into your application, and #define XZ_USE_CRC64 in xz_config.h or in
compiler flags.
When using the internal CRC32 or CRC64, their lookup tables need to be
initialized with xz_crc32_init() and xz_crc64_init(), respectively.
See xz.h for details.
To use external CRC32 or CRC64 code instead of the code from
xz_crc32.c or xz_crc64.c, the following #defines may be used
in xz_config.h or in compiler flags:
#define XZ_INTERNAL_CRC32 0
#define XZ_INTERNAL_CRC64 0
Then it is up to you to provide compatible xz_crc32() or xz_crc64()
functions.
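
As an illustration of what a compatible replacement has to compute,
here is a minimal bitwise CRC32. Its prototype is assumed to match the
one xz.h declares for the internal version, so verify it against your
copy of xz.h. The sketch trades speed for size by using no lookup
table; in a real build you would more likely map xz_crc32() onto a CRC
routine that your platform already provides.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Bitwise CRC32 (IEEE 802.3 polynomial, reflected form 0xEDB88320).
 * Pass 0 as crc for the first call; pass the previous return value
 * to continue the calculation over more data.
 */
uint32_t xz_crc32(const uint8_t *buf, size_t size, uint32_t crc)
{
	size_t i;
	int bit;

	crc = ~crc;

	for (i = 0; i < size; ++i) {
		crc ^= buf[i];

		/* Process one input byte, least significant bit first. */
		for (bit = 0; bit < 8; ++bit) {
			if (crc & 1)
				crc = (crc >> 1) ^ UINT32_C(0xEDB88320);
			else
				crc >>= 1;
		}
	}

	return ~crc;
}
```
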
If the .xz file being decompressed uses an integrity check type that
isn't supported by XZ Embedded, it is treated as an error and the
file cannot be decompressed. For multi-call mode, this can be modified
by #defining XZ_DEC_ANY_CHECK. Then xz_dec_run() will return
XZ_UNSUPPORTED_CHECK when an unsupported check type is detected. After
that, decompression can be continued normally except that the
integrity check won't be verified. In single-call mode there's
no way to continue decoding, so XZ_DEC_ANY_CHECK is almost useless
in single-call mode.
BCJ filter support
If you want support for one or more BCJ filters, you need to copy
linux/lib/xz/xz_dec_bcj.c into your application, and use appropriate
#defines in xz_config.h or in compiler flags. You don't need these
#defines in the code that just uses XZ Embedded via xz.h, but having
them always #defined doesn't hurt either.
#define           Instruction set     BCJ filter endianness
XZ_DEC_X86        x86-32 or x86-64    Little endian only
XZ_DEC_POWERPC    PowerPC             Big endian only
XZ_DEC_IA64       Itanium (IA-64)     Big or little endian
XZ_DEC_ARM        ARM                 Little endian instructions
XZ_DEC_ARMTHUMB   ARM-Thumb           Big or little endian
XZ_DEC_ARM64      ARM64               Big or little endian
XZ_DEC_SPARC      SPARC               Big or little endian
XZ_DEC_RISCV      RISC-V              Big or little endian
While some architectures are (partially) bi-endian, the endianness
setting doesn't change the endianness of the instructions on all
architectures. That's why many filters work for both big and little
endian executables (Itanium and ARM based architectures have little
endian instructions and SPARC has big endian instructions).
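
One way to choose the #defines is to enable only the filter that
matches the architecture the code is built for, on the assumption that
the compressed data holds executables for that same architecture. The
fragment below is a sketch meant for xz_config.h. It relies on macros
predefined by GCC and Clang; other compilers may spell them
differently.

```c
/* Enable the BCJ filter matching the build target (GCC/Clang macros). */
#if defined(__i386__) || defined(__x86_64__)
#	define XZ_DEC_X86
#elif defined(__powerpc__) && defined(__BYTE_ORDER__) \
		&& __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
	/* The PowerPC filter handles big endian code only. */
#	define XZ_DEC_POWERPC
#elif defined(__ia64__)
#	define XZ_DEC_IA64
#elif defined(__aarch64__)
#	define XZ_DEC_ARM64
#elif defined(__arm__)
	/* 32-bit ARM systems commonly mix ARM and Thumb code. */
#	define XZ_DEC_ARM
#	define XZ_DEC_ARMTHUMB
#elif defined(__sparc__)
#	define XZ_DEC_SPARC
#elif defined(__riscv)
#	define XZ_DEC_RISCV
#endif
```

If the data may contain executables for other architectures, define
the corresponding filters unconditionally instead.
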
Notes about shared libraries
If you are including XZ Embedded into a shared library, you should
rename the xz_* functions to prevent symbol conflicts in case your
library is linked against some other library or application that
also has XZ Embedded in it (which may even be a different version
of XZ Embedded).
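
One way to do the renaming without editing the XZ Embedded sources is
a small header of macros that is seen before xz.h everywhere, both by
the XZ Embedded .c files and by your own code. The header name and the
mylib_ prefix below are hypothetical, and the list is not complete:
check xz.h and the internal headers of your version for every symbol
that ends up exported.

```c
/* mylib_xz_rename.h (hypothetical): include this before xz.h. */
#ifndef MYLIB_XZ_RENAME_H
#define MYLIB_XZ_RENAME_H

/* Public decoder functions declared in xz.h. */
#define xz_dec_init     mylib_xz_dec_init
#define xz_dec_run      mylib_xz_dec_run
#define xz_dec_catrun   mylib_xz_dec_catrun
#define xz_dec_reset    mylib_xz_dec_reset
#define xz_dec_end      mylib_xz_dec_end

/* CRC helpers; only needed when the internal CRC code is used. */
#define xz_crc32_init   mylib_xz_crc32_init
#define xz_crc32        mylib_xz_crc32
#define xz_crc64_init   mylib_xz_crc64_init
#define xz_crc64        mylib_xz_crc64

#endif /* MYLIB_XZ_RENAME_H */
```
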
Please don't create a shared library of XZ Embedded itself unless
it is fine to rebuild everything depending on that shared library
every time you upgrade to a newer version of XZ Embedded. There are
no API or ABI stability guarantees between different versions of
XZ Embedded.
Contact information
Email: Lasse Collin <[email protected]>
IRC: Larhzu on #tukaani on Libera Chat
GitHub: https://github.com/tukaani-project/xz-embedded