lz4frame: new API: LZ4F_compressBegin_usingDict()
Note: for now, the dictionary is effectively used only once per frame,
rather than once per block when blocks are independent.
(No impact when blocks are linked: the dictionary is only meant to be used once anyway.)
Also:
- clarify that the default lz4frame block size is 64 KB
- refactor the dictionary paragraph of the LZ4 Frame spec
- update the manual
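
For reference, the refactored dictionary paragraph describes Dict-ID as an unsigned 32-bit little-endian value that is only present when the associated flag is set. The sketch below shows one way a reader of a frame header might extract it. `read_dict_id` is a hypothetical helper, not part of the lz4frame API. The byte layout it assumes (4-byte magic number, FLG byte with the Dict-ID flag in bit 0 and the content-size flag in bit 3, BD byte, optional 8-byte content size) is taken from the frame format as I understand it, and the header checksum is deliberately not verified.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of lz4frame): extracts Dict-ID from the
 * start of an LZ4 frame.
 * Returns  1 and stores the value in *dictID when the field is present,
 *          0 when the frame carries no Dict-ID,
 *         -1 when the header is truncated or not a version-01 LZ4 frame.
 * Assumption: the header checksum byte is NOT validated here. */
static int read_dict_id(const uint8_t* hdr, size_t size, uint32_t* dictID)
{
    static const uint8_t magic[4] = { 0x04, 0x22, 0x4D, 0x18 }; /* 0x184D2204, little-endian */
    size_t i;
    size_t pos = 6;                      /* offset right after magic + FLG + BD */
    uint8_t flg;

    assert(hdr != NULL && dictID != NULL);
    if (size < 7) return -1;             /* magic(4) + FLG + BD + header checksum */
    for (i = 0; i < 4; i++) {
        if (hdr[i] != magic[i]) return -1;
    }

    flg = hdr[4];
    if ((flg >> 6) != 1) return -1;      /* version bits must be 01 */
    if (flg & 0x08) pos += 8;            /* skip optional 8-byte content size */
    if (!(flg & 0x01)) return 0;         /* Dict-ID flag not set */
    if (size < pos + 4 + 1) return -1;   /* Dict-ID(4) + header checksum(1) */

    /* Assemble the 32-bit value from little-endian bytes. */
    *dictID = (uint32_t)hdr[pos]
            | ((uint32_t)hdr[pos + 1] << 8)
            | ((uint32_t)hdr[pos + 2] << 16)
            | ((uint32_t)hdr[pos + 3] << 24);
    return 1;
}
```

A return value of 0 is not an error: Dict-ID is optional, and the choice of dictionary can also be conveyed by the application context.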
diff --git a/doc/lz4_Frame_format.md b/doc/lz4_Frame_format.md
index fb31508..24f2734 100644
--- a/doc/lz4_Frame_format.md
+++ b/doc/lz4_Frame_format.md
@@ -16,7 +16,7 @@
### Version
-1.6.3 (12/09/2022)
+1.6.4 (28/12/2023)
Introduction
@@ -219,24 +219,26 @@
__Dictionary ID__
+A dictionary is useful to compress short input sequences.
+When present, the compressor can take advantage of dictionary's content
+as a kind of “known prefix” to encode the input in a more compact manner.
+
+When the frame descriptor defines independent blocks,
+every block is initialized with the same dictionary.
+If the frame descriptor defines linked blocks,
+the dictionary is only used once, at the beginning of the frame.
+
+The compressor and the decompressor must employ exactly the same dictionary for the data to be decodable.
+
+The Dict-ID field is offered as a way to help the decoder determine
+which dictionary must be used to correctly decode the compressed frame.
Dict-ID is only present if the associated flag is set.
It's an unsigned 32-bits value, stored using little-endian convention.
-A dictionary is useful to compress short input sequences.
-The compressor can take advantage of the dictionary context
-to encode the input in a more compact manner.
-It works as a kind of “known prefix” which is used by
-both the compressor and the decompressor to “warm-up” reference tables.
+Within a single frame, only a single Dict-ID field can be defined.
-The decompressor can use Dict-ID identifier to determine
-which dictionary must be used to correctly decode data.
-The compressor and the decompressor must use exactly the same dictionary.
-It's presumed that the 32-bits dictID uniquely identifies a dictionary.
-
-Within a single frame, a single dictionary can be defined.
-When the frame descriptor defines independent blocks,
-each block will be initialized with the same dictionary.
-If the frame descriptor defines linked blocks,
-the dictionary will only be used once, at the beginning of the frame.
+Note that the Dict-ID field is optional.
+Knowledge of which dictionary to employ can also be passed out-of-band,
+for example, it could be implied by the context of the application.
__Header Checksum__
@@ -397,6 +399,8 @@
Version changes
---------------
+1.6.4 : minor clarifications for Dictionaries
+
1.6.3 : minor : clarify Data Block
1.6.2 : clarifies specification of _EndMark_
@@ -429,6 +433,6 @@
0.6 : settled : stream size uses 8 bytes, endian convention is little endian
-0.5: added copyright notice
+0.5 : added copyright notice
0.4 : changed format to Google Doc compatible OpenDocument
diff --git a/doc/lz4_manual.html b/doc/lz4_manual.html
index 9cdd965..1a6df55 100644
--- a/doc/lz4_manual.html
+++ b/doc/lz4_manual.html
@@ -160,16 +160,17 @@
</p></pre><BR>
-<pre><b>int LZ4_compress_destSize (const char* src, char* dst, int* srcSizePtr, int targetDstSize);
+<pre><b>int LZ4_compress_destSize(const char* src, char* dst, int* srcSizePtr, int targetDstSize);
</b><p> Reverse the logic : compresses as much data as possible from 'src' buffer
- into already allocated buffer 'dst', of size >= 'targetDestSize'.
+ into already allocated buffer 'dst', of size >= 'dstCapacity'.
This function either compresses the entire 'src' content into 'dst' if it's large enough,
or fill 'dst' buffer completely with as much data as possible from 'src'.
note: acceleration parameter is fixed to "default".
- *srcSizePtr : will be modified to indicate how many bytes where read from 'src' to fill 'dst'.
+ *srcSizePtr : in+out parameter. Initially contains size of input.
+ Will be modified to indicate how many bytes were read from 'src' to fill 'dst'.
New value is necessarily <= input value.
- @return : Nb bytes written into 'dst' (necessarily <= targetDestSize)
+ @return : Nb bytes written into 'dst' (necessarily <= dstCapacity)
or 0 if compression fails.
Note : from v1.8.2 to v1.9.1, this function had a bug (fixed in v1.9.2+):
@@ -185,6 +186,12 @@
</p></pre><BR>
+<pre><b>int LZ4_compress_fast_extState_destSize(void* state, const char* src, char* dst, int *srcSizePtr, int dstCapacity, int acceleration);
+</b><p> Same as LZ4_compress_destSize(), using an externally allocated state.
+ Also exposes control of @acceleration.
+
+</p></pre><BR>
+
<pre><b>int LZ4_decompress_safe_partial (const char* src, char* dst, int srcSize, int targetOutputSize, int dstCapacity);
</b><p> Decompress an LZ4 compressed block, of size 'srcSize' at position 'src',
into destination buffer 'dst' of size 'dstCapacity'.
@@ -271,10 +278,10 @@
LZ4_loadDict() triggers a reset, so any previous data will be forgotten.
The same dictionary will have to be loaded on decompression side for successful decoding.
Dictionary are useful for better compression of small data (KB range).
- While LZ4 accept any input as dictionary,
- results are generally better when using Zstandard's Dictionary Builder.
+ While LZ4 itself accepts any input as dictionary, dictionary efficiency is also a topic.
+ When in doubt, employ the Zstandard's Dictionary Builder.
Loading a size of 0 is allowed, and is the same as reset.
- @return : loaded dictionary size, in bytes (necessarily <= 64 KB)
+ @return : loaded dictionary size, in bytes (note: only the last 64 KB are loaded)
</p></pre><BR>
@@ -446,6 +453,12 @@
</p></pre><BR>
+<pre><b>int LZ4_compress_destSize_extState(void* state, const char* src, char* dst, int* srcSizePtr, int targetDstSize, int acceleration);
+</b><p> Same as LZ4_compress_destSize(), but using an externally allocated state.
+ Also: exposes @acceleration
+
+</p></pre><BR>
+
<pre><b>LZ4LIB_STATIC_API void
LZ4_attach_dictionary(LZ4_stream_t* workingStream,
const LZ4_stream_t* dictionaryStream);
@@ -542,7 +555,7 @@
If you need static allocation, declare or allocate an LZ4_stream_t object.
</p></pre><BR>
-<pre><b>LZ4_stream_t* LZ4_initStream (void* buffer, size_t size);
+<pre><b>LZ4_stream_t* LZ4_initStream (void* stateBuffer, size_t size);
</b><p> An LZ4_stream_t structure must be initialized at least once.
This is automatically done when invoking LZ4_createStream(),
but it's not when the structure is simply declared on stack (for example).
diff --git a/doc/lz4frame_manual.html b/doc/lz4frame_manual.html
index 1246b53..c6e0dfe 100644
--- a/doc/lz4frame_manual.html
+++ b/doc/lz4frame_manual.html
@@ -18,7 +18,10 @@
<li><a href="#Chapter8">Compression</a></li>
<li><a href="#Chapter9">Decompression functions</a></li>
<li><a href="#Chapter10">Streaming decompression functions</a></li>
-<li><a href="#Chapter11">Bulk processing dictionary API</a></li>
+<li><a href="#Chapter11">Advanced compression operations</a></li>
+<li><a href="#Chapter12">Dictionary compression API</a></li>
+<li><a href="#Chapter13">Bulk processing dictionary API</a></li>
+<li><a href="#Chapter14">Custom memory allocation</a></li>
</ol>
<hr>
<a name="Chapter1"></a><h2>Introduction</h2><pre>
@@ -76,13 +79,13 @@
} LZ4F_frameType_t;
</b></pre><BR>
<pre><b>typedef struct {
- LZ4F_blockSizeID_t blockSizeID; </b>/* max64KB, max256KB, max1MB, max4MB; 0 == default */<b>
- LZ4F_blockMode_t blockMode; </b>/* LZ4F_blockLinked, LZ4F_blockIndependent; 0 == default */<b>
- LZ4F_contentChecksum_t contentChecksumFlag; </b>/* 1: frame terminated with 32-bit checksum of decompressed data; 0: disabled (default) */<b>
+ LZ4F_blockSizeID_t blockSizeID; </b>/* max64KB, max256KB, max1MB, max4MB; 0 == default (LZ4F_max64KB) */<b>
+ LZ4F_blockMode_t blockMode; </b>/* LZ4F_blockLinked, LZ4F_blockIndependent; 0 == default (LZ4F_blockLinked) */<b>
+ LZ4F_contentChecksum_t contentChecksumFlag; </b>/* 1: add a 32-bit checksum of frame's decompressed data; 0 == default (disabled) */<b>
LZ4F_frameType_t frameType; </b>/* read-only field : LZ4F_frame or LZ4F_skippableFrame */<b>
unsigned long long contentSize; </b>/* Size of uncompressed content ; 0 == unknown */<b>
unsigned dictID; </b>/* Dictionary ID, sent by compressor to help decoder select correct dictionary; 0 == no dictID provided */<b>
- LZ4F_blockChecksum_t blockChecksumFlag; </b>/* 1: each block followed by a checksum of block's compressed data; 0: disabled (default) */<b>
+ LZ4F_blockChecksum_t blockChecksumFlag; </b>/* 1: each block followed by a checksum of block's compressed data; 0 == default (disabled) */<b>
} LZ4F_frameInfo_t;
</b><p> makes it possible to set or read frame parameters.
Structure must be first init to 0, using memset() or LZ4F_INIT_FRAMEINFO,
@@ -105,14 +108,6 @@
<a name="Chapter5"></a><h2>Simple compression function</h2><pre></pre>
-<pre><b>size_t LZ4F_compressFrameBound(size_t srcSize, const LZ4F_preferences_t* preferencesPtr);
-</b><p> Returns the maximum possible compressed size with LZ4F_compressFrame() given srcSize and preferences.
- `preferencesPtr` is optional. It can be replaced by NULL, in which case, the function will assume default preferences.
- Note : this result is only usable with LZ4F_compressFrame().
- It may also be relevant to LZ4F_compressUpdate() _only if_ no flush() operation is ever performed.
-
-</p></pre><BR>
-
<pre><b>size_t LZ4F_compressFrame(void* dstBuffer, size_t dstCapacity,
const void* srcBuffer, size_t srcSize,
const LZ4F_preferences_t* preferencesPtr);
@@ -134,6 +129,19 @@
</p></pre><BR>
+<pre><b>size_t LZ4F_compressFrameBound(size_t srcSize, const LZ4F_preferences_t* preferencesPtr);
+</b><p> Returns the maximum possible compressed size with LZ4F_compressFrame() given srcSize and preferences.
+ `preferencesPtr` is optional. It can be replaced by NULL, in which case, the function will assume default preferences.
+ Note : this result is only usable with LZ4F_compressFrame().
+ It may also be relevant to LZ4F_compressUpdate() _only if_ no flush() operation is ever performed.
+
+</p></pre><BR>
+
+<pre><b>int LZ4F_compressionLevel_max(void); </b>/* v1.8.0+ */<b>
+</b><p> @return maximum allowed compression level (currently: 12)
+
+</p></pre><BR>
+
<a name="Chapter6"></a><h2>Advanced compression functions</h2><pre></pre>
<pre><b>typedef struct {
@@ -372,9 +380,12 @@
<pre><b>typedef enum { LZ4F_LIST_ERRORS(LZ4F_GENERATE_ENUM)
_LZ4F_dummy_error_enum_for_c89_never_used } LZ4F_errorCodes;
</b></pre><BR>
+<a name="Chapter11"></a><h2>Advanced compression operations</h2><pre></pre>
+
<pre><b>LZ4FLIB_STATIC_API size_t LZ4F_getBlockSize(LZ4F_blockSizeID_t blockSizeID);
-</b><p> Return, in scalar format (size_t),
- the maximum block size associated with blockSizeID.
+</b><p> @return, in scalar format (size_t),
+ the maximum block size associated with @blockSizeID,
+ or an error code (can be tested using LZ4F_isError()) if @blockSizeID is invalid.
</p></pre><BR>
<pre><b>LZ4FLIB_STATIC_API size_t
@@ -382,21 +393,54 @@
void* dstBuffer, size_t dstCapacity,
const void* srcBuffer, size_t srcSize,
const LZ4F_compressOptions_t* cOptPtr);
-</b><p> LZ4F_uncompressedUpdate() can be called repetitively to add as much data uncompressed data as necessary.
+</b><p> LZ4F_uncompressedUpdate() can be called repetitively to add data stored as uncompressed blocks.
Important rule: dstCapacity MUST be large enough to store the entire source buffer as
no compression is done for this operation
If this condition is not respected, LZ4F_uncompressedUpdate() will fail (result is an errorCode).
After an error, the state is left in a UB state, and must be re-initialized or freed.
- If previously a compressed block was written, buffered data is flushed
+ If previously a compressed block was written, buffered data is flushed first,
before appending uncompressed data is continued.
- This is only supported when LZ4F_blockIndependent is used
+ This operation is only supported when LZ4F_blockIndependent is used.
`cOptPtr` is optional : NULL can be provided, in which case all options are set to default.
@return : number of bytes written into `dstBuffer` (it can be zero, meaning input data was just buffered).
or an error code if it fails (which can be tested using LZ4F_isError())
</p></pre><BR>
-<a name="Chapter11"></a><h2>Bulk processing dictionary API</h2><pre></pre>
+<a name="Chapter12"></a><h2>Dictionary compression API</h2><pre></pre>
+
+<pre><b>LZ4FLIB_STATIC_API size_t
+LZ4F_compressBegin_usingDict(LZ4F_cctx* cctx,
+ void* dstBuffer, size_t dstCapacity,
+ const void* dictBuffer, size_t dictSize,
+ const LZ4F_preferences_t* prefsPtr);
+</b><p> Inits dictionary compression streaming, and writes the frame header into dstBuffer.
+ `dstCapacity` must be >= LZ4F_HEADER_SIZE_MAX bytes.
+ `prefsPtr` is optional : you may provide NULL as argument,
+ however, it's the only way to provide dictID in the frame header.
+ `dictBuffer` must outlive the compression session.
+ @return : number of bytes written into dstBuffer for the header,
+ or an error code (which can be tested using LZ4F_isError())
+ NOTE: this entry point doesn't fully follow the spec:
+ when a frame consists of independent blocks,
+ each block should be compressed using the dictionary.
+ But currently, only the first block uses the dictionary.
+ This is still decodable, but less efficient.
+
+</p></pre><BR>
+
+<pre><b>LZ4FLIB_STATIC_API size_t
+LZ4F_decompress_usingDict(LZ4F_dctx* dctxPtr,
+ void* dstBuffer, size_t* dstSizePtr,
+ const void* srcBuffer, size_t* srcSizePtr,
+ const void* dict, size_t dictSize,
+ const LZ4F_decompressOptions_t* decompressOptionsPtr);
+</b><p> Same as LZ4F_decompress(), using a predefined dictionary.
+ Dictionary is used "in place", without any preprocessing.
+ It must remain accessible throughout the entire frame decoding.
+</p></pre><BR>
+
+<a name="Chapter13"></a><h2>Bulk processing dictionary API</h2><pre></pre>
<pre><b>LZ4FLIB_STATIC_API LZ4F_CDict* LZ4F_createCDict(const void* dictBuffer, size_t dictSize);
LZ4FLIB_STATIC_API void LZ4F_freeCDict(LZ4F_CDict* CDict);
@@ -429,23 +473,15 @@
const LZ4F_CDict* cdict,
const LZ4F_preferences_t* prefsPtr);
</b><p> Inits streaming dictionary compression, and writes the frame header into dstBuffer.
- dstCapacity must be >= LZ4F_HEADER_SIZE_MAX bytes.
+ `dstCapacity` must be >= LZ4F_HEADER_SIZE_MAX bytes.
`prefsPtr` is optional : you may provide NULL as argument,
however, it's the only way to provide dictID in the frame header.
+ `cdict` must outlive the compression session.
@return : number of bytes written into dstBuffer for the header,
or an error code (which can be tested using LZ4F_isError())
</p></pre><BR>
-<pre><b>LZ4FLIB_STATIC_API size_t
-LZ4F_decompress_usingDict(LZ4F_dctx* dctxPtr,
- void* dstBuffer, size_t* dstSizePtr,
- const void* srcBuffer, size_t* srcSizePtr,
- const void* dict, size_t dictSize,
- const LZ4F_decompressOptions_t* decompressOptionsPtr);
-</b><p> Same as LZ4F_decompress(), using a predefined dictionary.
- Dictionary is used "in place", without any preprocessing.
- It must remain accessible throughout the entire frame decoding.
-</p></pre><BR>
+<a name="Chapter14"></a><h2>Custom memory allocation</h2><pre></pre>
<pre><b>typedef void* (*LZ4F_AllocFunction) (void* opaqueState, size_t size);
typedef void* (*LZ4F_CallocFunction) (void* opaqueState, size_t size);