unpublish static-only functions

Functions declared with LZ4LIB_STATIC_API (the LZ4_STATIC_LINKING_ONLY section of lz4.h)
are no longer published in the dll by default.
To publish them, opt in by defining the macro LZ4_PUBLISH_STATIC_FUNCTIONS.
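
Below is a minimal sketch of how such an opt-in gate can work. It assumes that LZ4LIB_STATIC_API expands to the regular export marker (LZ4LIB_API) only when LZ4_PUBLISH_STATIC_FUNCTIONS is defined at library build time, and to nothing otherwise. The DEMOLIB_* macros and demo_* functions are hypothetical stand-ins, not lz4.h itself.

```c
#include <assert.h>

/* Hypothetical stand-in for LZ4LIB_API: the marker that exports a symbol.
 * GCC/Clang visibility is shown here. A Windows dll would use
 * __declspec(dllexport) instead. */
#if defined(__GNUC__) && (__GNUC__ >= 4)
#  define DEMOLIB_API __attribute__((visibility("default")))
#else
#  define DEMOLIB_API
#endif

/* Hypothetical stand-in for LZ4LIB_STATIC_API: the opt-in gate.
 * Static-only functions receive the export marker only when
 * DEMOLIB_PUBLISH_STATIC_FUNCTIONS is defined. */
#ifdef DEMOLIB_PUBLISH_STATIC_FUNCTIONS
#  define DEMOLIB_STATIC_API DEMOLIB_API
#  define DEMOLIB_STATIC_PUBLISHED 1
#else
#  define DEMOLIB_STATIC_API
#  define DEMOLIB_STATIC_PUBLISHED 0
#endif

/* Stable function: always marked for export. */
DEMOLIB_API int demo_stable(int x) { return x + 1; }

/* Unstable, static-only function: always callable when linked statically,
 * but not marked for export unless the gate is opened. */
DEMOLIB_STATIC_API int demo_static_only(int x) { return x * 2; }

/* Reports whether the gate was opened for this build. */
int demo_static_published(void) { return DEMOLIB_STATIC_PUBLISHED; }

/* Checks the default (gate closed) configuration. */
void demo_self_check(void)
{
    assert(demo_stable(1) == 2);
    assert(demo_static_only(3) == 6);
#ifndef DEMOLIB_PUBLISH_STATIC_FUNCTIONS
    assert(demo_static_published() == 0);
#endif
}
```

Some builds only export marked symbols. These include a Windows dll, or a GCC/Clang build using -fvisibility=hidden. In such builds, an empty static marker keeps the function out of the exported symbol table, while static linking is unaffected.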

Also used this opportunity to update a number of API comments in lz4.h.
diff --git a/doc/lz4_manual.html b/doc/lz4_manual.html
index 6ebf8d2..39a48f3 100644
--- a/doc/lz4_manual.html
+++ b/doc/lz4_manual.html
@@ -21,7 +21,7 @@
 </ol>
 <hr>
 <a name="Chapter1"></a><h2>Introduction</h2><pre>
-  LZ4 is lossless compression algorithm, providing compression speed at 400 MB/s per core,
+  LZ4 is a lossless compression algorithm, providing compression speed at 500 MB/s per core,
   scalable with multi-cores CPU. It features an extremely fast decoder, with speed in
   multiple GB/s per core, typically reaching RAM speed limits on multi-core systems.
 
@@ -37,15 +37,15 @@
 
   An additional format, called LZ4 frame specification (doc/lz4_Frame_format.md),
   take care of encoding standard metadata alongside LZ4-compressed blocks.
-  If your application requires interoperability, it's recommended to use it.
-  A library is provided to take care of it, see lz4frame.h.
+  The frame format is required for interoperability.
+  It is delivered through a companion API, declared in lz4frame.h.
 <BR></pre>
 
 <a name="Chapter2"></a><h2>Version</h2><pre></pre>
 
 <pre><b>int LZ4_versionNumber (void);  </b>/**< library version number; useful to check dll version */<b>
 </b></pre><BR>
-<pre><b>const char* LZ4_versionString (void);   </b>/**< library version string; unseful to check dll version */<b>
+<pre><b>const char* LZ4_versionString (void);   </b>/**< library version string; useful to check dll version */<b>
 </b></pre><BR>
 <a name="Chapter3"></a><h2>Tuning parameter</h2><pre></pre>
 
@@ -53,8 +53,8 @@
 # define LZ4_MEMORY_USAGE 14
 #endif
 </b><p> Memory usage formula : N->2^N Bytes (examples : 10 -> 1KB; 12 -> 4KB ; 16 -> 64KB; 20 -> 1MB; etc.)
- Increasing memory usage improves compression ratio
- Reduced memory usage may improve speed, thanks to cache effect
+ Increasing memory usage improves compression ratio.
+ Reduced memory usage may improve speed, thanks to better cache locality.
  Default value is 14, for 16KB, which nicely fits into Intel x86 L1 cache
  
 </p></pre><BR>
@@ -68,21 +68,21 @@
     It also runs faster, so it's a recommended setting.
     If the function cannot compress 'src' into a more limited 'dst' budget,
     compression stops *immediately*, and the function result is zero.
-    Note : as a consequence, 'dst' content is not valid.
-    Note 2 : This function is protected against buffer overflow scenarios (never writes outside 'dst' buffer, nor read outside 'source' buffer).
+    In that case, 'dst' content is undefined (invalid).
         srcSize : max supported value is LZ4_MAX_INPUT_SIZE.
         dstCapacity : size of buffer 'dst' (which must be already allocated)
-        return  : the number of bytes written into buffer 'dst' (necessarily <= dstCapacity)
-                  or 0 if compression fails 
+       @return  : the number of bytes written into buffer 'dst' (necessarily <= dstCapacity)
+                  or 0 if compression fails
+    Note : This function is protected against buffer overflow scenarios (never writes outside 'dst' buffer, nor reads outside 'src' buffer).
 </p></pre><BR>
 
 <pre><b>int LZ4_decompress_safe (const char* src, char* dst, int compressedSize, int dstCapacity);
 </b><p>    compressedSize : is the exact complete size of the compressed block.
     dstCapacity : is the size of destination buffer, which must be already allocated.
-    return : the number of bytes decompressed into destination buffer (necessarily <= dstCapacity)
+   @return : the number of bytes decompressed into destination buffer (necessarily <= dstCapacity)
              If destination buffer is not large enough, decoding will stop and output an error code (negative value).
              If the source stream is detected malformed, the function will stop decoding and return a negative result.
-             This function is protected against malicious data packets.
+    Note : This function is protected against malicious data packets (never writes outside 'dst' buffer, nor reads outside 'src' buffer).
 </p></pre><BR>
 
 <a name="Chapter5"></a><h2>Advanced Functions</h2><pre></pre>
@@ -107,10 +107,11 @@
 
 <pre><b>int LZ4_sizeofState(void);
 int LZ4_compress_fast_extState (void* state, const char* src, char* dst, int srcSize, int dstCapacity, int acceleration);
-</b><p>    Same compression function, just using an externally allocated memory space to store compression state.
-    Use LZ4_sizeofState() to know how much memory must be allocated,
-    and allocate it on 8-bytes boundaries (using malloc() typically).
-    Then, provide this buffer as 'void* state' to compression function.
+</b><p>  Same as LZ4_compress_fast(), using an externally allocated memory space for its state.
+  Use LZ4_sizeofState() to know how much memory must be allocated,
+  and allocate it on 8-byte boundaries (using `malloc()` typically).
+  Then, provide this buffer as `void* state` to compression function.
+ 
 </p></pre><BR>
 
 <pre><b>int LZ4_compress_destSize (const char* src, char* dst, int* srcSizePtr, int targetDstSize);
@@ -132,7 +133,7 @@
   and now `LZ4_decompress_safe()` can be as fast and sometimes faster than `LZ4_decompress_fast()`.
   Moreover, LZ4_decompress_fast() is not protected vs malformed input, as it doesn't perform full validation of compressed data.
   As a consequence, this function is no longer recommended, and may be deprecated in future versions.
-  It's only remaining specificity is that it can decompress data without knowing its compressed size.
+  Its last remaining specificity is that it can decompress data without knowing its compressed size.
 
   originalSize : is the uncompressed size to regenerate.
                  `dst` must be already allocated, its size must be >= 'originalSize' bytes.
@@ -175,13 +176,6 @@
 
 <a name="Chapter6"></a><h2>Streaming Compression Functions</h2><pre></pre>
 
-<pre><b>LZ4_stream_t* LZ4_createStream(void);
-int           LZ4_freeStream (LZ4_stream_t* streamPtr);
-</b><p>  LZ4_createStream() will allocate and initialize an `LZ4_stream_t` structure.
-  LZ4_freeStream() releases its memory.
- 
-</p></pre><BR>
-
 <pre><b>void LZ4_resetStream (LZ4_stream_t* streamPtr);
 </b><p>  An LZ4_stream_t structure can be allocated once and re-used multiple times.
   Use this function to start compressing a new stream.
@@ -198,7 +192,7 @@
 
 <pre><b>int LZ4_compress_fast_continue (LZ4_stream_t* streamPtr, const char* src, char* dst, int srcSize, int dstCapacity, int acceleration);
 </b><p>  Compress 'src' content using data from previously compressed blocks, for better compression ratio.
-  'dst' buffer must be already allocated.
+  'dst' buffer must be already allocated.
   If dstCapacity >= LZ4_compressBound(srcSize), compression is guaranteed to succeed, and runs faster.
 
  @return : size of compressed block
@@ -206,10 +200,10 @@
 
   Note 1 : Each invocation to LZ4_compress_fast_continue() generates a new block.
            Each block has precise boundaries.
+           Each block must be decompressed separately, calling LZ4_decompress_*() with relevant metadata.
            It's not possible to append blocks together and expect a single invocation of LZ4_decompress_*() to decompress them together.
-           Each block must be decompressed separately, calling LZ4_decompress_*() with associated metadata.
 
-  Note 2 : The previous 64KB of source data is __assumed__ to remain present, unmodified, at same address in memory!
+  Note 2 : The previous 64KB of source data is __assumed__ to remain present, unmodified, at the same address in memory !
 
   Note 3 : When input is structured as a double-buffer, each buffer can have any size, including < 64 KB.
            Make sure that buffers are separated, by at least one byte.
@@ -217,7 +211,7 @@
 
   Note 4 : If input buffer is a ring-buffer, it can have any size, including < 64 KB.
 
-  Note 5 : After an error, the stream status is invalid, it can only be reset or freed.
+  Note 5 : After an error, the stream status is undefined (invalid), it can only be reset or freed.
  
 </p></pre><BR>
 
@@ -250,7 +244,7 @@
 </p></pre><BR>
 
 <pre><b>int LZ4_decoderRingBufferSize(int maxBlockSize);
-#define LZ4_DECODER_RING_BUFFER_SIZE(mbs) (65536 + 14 + (mbs))  </b>/* for static allocation; mbs presumed valid */<b>
+#define LZ4_DECODER_RING_BUFFER_SIZE(maxBlockSize) (65536 + 14 + (maxBlockSize))  </b>/* for static allocation; maxBlockSize presumed valid */<b>
 </b><p>  Note : in a ring buffer scenario (optional),
   blocks are presumed decompressed next to each other
   up to the moment there is not enough remaining space for next block (remainingSize < maxBlockSize),
@@ -295,32 +289,34 @@
 </b><p>  These decoding functions work the same as
   a combination of LZ4_setStreamDecode() followed by LZ4_decompress_*_continue()
   They are stand-alone, and don't need an LZ4_streamDecode_t structure.
-  Dictionary is presumed stable : it must remain accessible and unmodified during next decompression.
+  Dictionary is presumed stable : it must remain accessible and unmodified during decompression.
+  Performance tip : Decompression speed can be substantially increased
+                    when dst == dictStart + dictSize.
  
 </p></pre><BR>
 
 <a name="Chapter8"></a><h2>Unstable declarations</h2><pre>
- Declarations in this section should be considered unstable.
- Use at your own peril, etc., etc.
- They may be removed in the future.
- Their signatures may change.
+ Declarations in this section must be considered unstable.
+ Their signatures may change, and they may be removed in the future.
+ They are therefore only safe to depend on
+ when the caller is statically linked against the library.
+ To access their declarations, define LZ4_STATIC_LINKING_ONLY.
 <BR></pre>
 
-<pre><b>void LZ4_resetStream_fast (LZ4_stream_t* streamPtr);
+<pre><b>LZ4LIB_STATIC_API void LZ4_resetStream_fast (LZ4_stream_t* streamPtr);
 </b><p>  Use this, like LZ4_resetStream(), to prepare a context for a new chain of
   calls to a streaming API (e.g., LZ4_compress_fast_continue()).
 
   Note:
-  Using this in advance of a non- streaming-compression function is redundant,
-  and potentially bad for performance, since they all perform their own custom
-  reset internally.
+  Using this in advance of a non-streaming-compression function is redundant,
+  since they all perform their own custom reset internally.
 
   Differences from LZ4_resetStream():
   When an LZ4_stream_t is known to be in a internally coherent state,
-  it can often be prepared for a new compression with almost no work, only
-  sometimes falling back to the full, expensive reset that is always required
-  when the stream is in an indeterminate state (i.e., the reset performed by
-  LZ4_resetStream()).
+  it can often be prepared for a new compression with almost no work,
+  only sometimes falling back to the full, expensive reset
+  that is always required when the stream is in an indeterminate state
+  (i.e., the reset performed by LZ4_resetStream()).
 
   LZ4_streams are guaranteed to be in a valid state when:
   - returned from LZ4_createStream()
@@ -339,22 +335,21 @@
  
 </p></pre><BR>
 
-<pre><b>int LZ4_compress_fast_extState_fastReset (void* state, const char* src, char* dst, int srcSize, int dstCapacity, int acceleration);
+<pre><b>LZ4LIB_STATIC_API int LZ4_compress_fast_extState_fastReset (void* state, const char* src, char* dst, int srcSize, int dstCapacity, int acceleration);
 </b><p>  A variant of LZ4_compress_fast_extState().
 
-  Using this variant avoids an expensive initialization step. It is only safe
-  to call if the state buffer is known to be correctly initialized already
-  (see above comment on LZ4_resetStream_fast() for a definition of "correctly
-  initialized"). From a high level, the difference is that this function
-  initializes the provided state with a call to something like
-  LZ4_resetStream_fast() while LZ4_compress_fast_extState() starts with a
-  call to LZ4_resetStream().
+  Using this variant avoids an expensive initialization step.
+  It is only safe to call if the state buffer is known to be correctly initialized already
+  (see the comment on LZ4_resetStream_fast() above for a definition of "correctly initialized").
+  From a high level, the difference is that
+  this function initializes the provided state with a call to something like LZ4_resetStream_fast()
+  while LZ4_compress_fast_extState() starts with a call to LZ4_resetStream().
  
 </p></pre><BR>
 
-<pre><b>void LZ4_attach_dictionary(LZ4_stream_t *working_stream, const LZ4_stream_t *dictionary_stream);
-</b><p>  This is an experimental API that allows for the efficient use of a
-  static dictionary many times.
+<pre><b>LZ4LIB_STATIC_API void LZ4_attach_dictionary(LZ4_stream_t* workingStream, const LZ4_stream_t* dictionaryStream);
+</b><p>  This is an experimental API that allows
+  efficient use of a static dictionary many times.
 
   Rather than re-loading the dictionary buffer into a working context before
   each compression, or copying a pre-loaded dictionary's LZ4_stream_t into a
@@ -365,8 +360,8 @@
   Currently, only streams which have been prepared by LZ4_loadDict() should
   be expected to work.
 
-  Alternatively, the provided dictionary stream pointer may be NULL, in which
-  case any existing dictionary stream is unset.
+  Alternatively, the provided dictionaryStream may be NULL,
+  in which case any existing dictionary stream is unset.
 
   If a dictionary is provided, it replaces any pre-existing stream history.
   The dictionary contents are the only history that can be referenced and
@@ -381,9 +376,9 @@
 </p></pre><BR>
 
 <a name="Chapter9"></a><h2>Private definitions</h2><pre>
- Do not use these definitions.
- They are exposed to allow static allocation of `LZ4_stream_t` and `LZ4_streamDecode_t`.
- Using these definitions will expose code to API and/or ABI break in future versions of the library.
+ Do not use these definitions directly.
+ They are only exposed to allow static allocation of `LZ4_stream_t` and `LZ4_streamDecode_t`.
+ Accessing members will expose code to API and/or ABI breaks in future versions of the library.
 <BR></pre>
 
 <pre><b>typedef struct {
@@ -406,11 +401,11 @@
     unsigned long long table[LZ4_STREAMSIZE_U64];
     LZ4_stream_t_internal internal_donotuse;
 } ;  </b>/* previously typedef'd to LZ4_stream_t */<b>
-</b><p> information structure to track an LZ4 stream.
- init this structure before first use.
- note : only use in association with static linking !
-        this definition is not API/ABI safe,
-        it may change in a future version !
+</b><p>  information structure to track an LZ4 stream.
+  init this structure with LZ4_resetStream() before first use.
+  note : only use in association with static linking !
+         this definition is not API/ABI safe,
+         it may change in a future version !
  
 </p></pre><BR>
 
@@ -420,11 +415,11 @@
     unsigned long long table[LZ4_STREAMDECODESIZE_U64];
     LZ4_streamDecode_t_internal internal_donotuse;
 } ;   </b>/* previously typedef'd to LZ4_streamDecode_t */<b>
-</b><p> information structure to track an LZ4 stream during decompression.
- init this structure  using LZ4_setStreamDecode (or memset()) before first use
- note : only use in association with static linking !
-        this definition is not API/ABI safe,
-        and may change in a future version !
+</b><p>  information structure to track an LZ4 stream during decompression.
+  init this structure using LZ4_setStreamDecode() before first use.
+  note : only use in association with static linking !
+         this definition is not API/ABI safe,
+         and may change in a future version !
  
 </p></pre><BR>