| <?xml version="1.0"?> |
| <!-- |
| |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| |
| --> |
| <document> |
| <properties> |
| <title>Commons Compress TAR package</title> |
| <author email="[email protected]">Commons Documentation Team</author> |
| </properties> |
| <body> |
| <section name="The TAR package"> |
| |
| <p>In addition to the information stored |
| in <code>ArchiveEntry</code> a <code>TarArchiveEntry</code> |
| stores various attributes including information about the |
| original owner and permissions.</p> |
| |
| <p>There are several different dialects of the TAR format, maybe |
| even different TAR formats. The tar package contains special |
| cases in order to read many of the existing dialects and will by |
| default try to create archives in the original format (often |
| called "ustar"). This original format didn't support file names |
| longer than 100 characters or bigger than 8 GiB and the tar |
| package will by default fail if you try to write an entry that |
| goes beyond those limits. "ustar" is the common denominator of |
| all the existing tar dialects and is understood by most of the |
| existing tools.</p> |
| |
| <p>The tar package does not support the full POSIX tar standard |
| nor more modern GNU extension of said standard.</p> |
| |
| <subsection name="Long File Names"> |
| |
| <p>The <code>longFileMode</code> option of |
| <code>TarArchiveOutputStream</code> controls how files with |
| names longer than 100 characters are handled. The possible |
| choices are:</p> |
| |
| <ul> |
| <li><code>LONGFILE_ERROR</code>: throw an exception if such a |
| file is added. This is the default.</li> |
| <li><code>LONGFILE_TRUNCATE</code>: truncate such names.</li> |
| <li><code>LONGFILE_GNU</code>: use a GNU tar variant now |
| refered to as "oldgnu" of storing such names. If you choose |
| the GNU tar option, the archive can not be extracted using |
| many other tar implementations like the ones of OpenBSD, |
| Solaris or MacOS X.</li> |
| <li><code>LONGFILE_POSIX</code>: use a PAX <a |
| href="http://pubs.opengroup.org/onlinepubs/009695399/utilities/pax.html#tag_04_100_13_03">extended |
| header</a> as defined by POSIX 1003.1. Most modern tar |
| implementations are able to extract such archives. <em>since |
| Commons Compress 1.4</em></li> |
| </ul> |
| |
| <p><code>TarArchiveInputStream</code> will recognize the GNU |
| tar as well as the POSIX extensions (starting with Commons |
| Compress 1.2) for long file names and reads the longer names |
| transparently.</p> |
| </subsection> |
| |
| <subsection name="Big Numeric Values"> |
| |
| <p>The <code>bigNumberMode</code> option of |
| <code>TarArchiveOutputStream</code> controls how files larger |
| than 8GiB or with other big numeric values that can't be |
| encoded in traditional header fields are handled. The |
| possible choices are:</p> |
| |
| <ul> |
| <li><code>BIGNUMBER_ERROR</code>: throw an exception if such an |
| entry is added. This is the default.</li> |
| <li><code>BIGNUMBER_STAR</code>: use a variant first |
| introduced by Jörg Schilling's <a |
| href="http://developer.berlios.de/projects/star">star</a> |
| and later adopted by GNU and BSD tar. This method is not |
| supported by all implementations.</li> |
| <li><code>BIGNUMBER_POSIX</code>: use a PAX <a |
| href="http://pubs.opengroup.org/onlinepubs/009695399/utilities/pax.html#tag_04_100_13_03">extended |
| header</a> as defined by POSIX 1003.1. Most modern tar |
| implementations are able to extract such archives.</li> |
| </ul> |
| |
| <p>Starting with Commons Compress 1.4 |
| <code>TarArchiveInputStream</code> will recognize the star as |
| well as the POSIX extensions for big numeric values and reads them |
| transparently.</p> |
| </subsection> |
| |
| <subsection name="File Name Encoding"> |
| <p>The original ustar format only supports 7-Bit ASCII file |
| names, later implementations use the platform's default |
| encoding to encode file names. The POSIX standard recommends |
| using PAX extension headers for non-ASCII file names |
| instead.</p> |
| |
| <p>Commons Compress 1.1 to 1.3 assumed file names would be |
| encoded using ISO-8859-1. Starting with Commons Compress 1.4 |
| you can specify the encoding to expect (to use when writing) |
| as a parameter to <code>TarArchiveInputStream</code> |
| (<code>TarArchiveOutputStream</code>), it now defaults to the |
| platform's default encoding.</p> |
| |
| <p>Since Commons Compress 1.4 another optional parameter - |
| <code>addPaxHeadersForNonAsciiNames</code> - of |
| <code>TarArchiveOutputStream</code> controls whether PAX |
| extension headers will be written for non-ASCII file names. |
| By default they will not be written to preserve space. |
| <code>TarArchiveInputStream</code> will read them |
| transparently if present.</p> |
| </subsection> |
| |
| <subsection name="Sparse files"> |
| |
| <p><code>TarArchiveInputStream</code> will recognize sparse |
| file entries stored using the "oldgnu" format |
| (<code>--sparse-version=0.0</code> in GNU tar) but is not |
| able to extract them correctly. <a href="#Unsupported |
| Features"><code>canReadEntryData</code></a> will return false |
| on such entries. The other variants of sparse files can |
| currently not be detected at all.</p> |
| </subsection> |
| |
| <subsection name="Consuming Archives Completely"> |
| |
| <p>The end of a tar archive is signalled by two consecutive |
| records of all zeros. Unfortunately not all tar |
| implementations adhere to this and some only write one record |
| to end the archive. Commons Compress will always write two |
| records but stop reading an archive as soon as finds one |
| record of all zeros.</p> |
| |
| <p>Prior to version 1.5 this could leave the second EOF record |
| inside the stream when <code>getNextEntry</code> or |
| <code>getNextTarEntry</code> returned <code>null</code> |
| Starting with version 1.5 <code>TarArchiveInputStream</code> |
| will try to read a second record as well if present, |
| effectively consuming the archive completely.</p> |
| |
| </subsection> |
| |
| <subsection name="PAX Extended Header"> |
| <p>The tar package has supported reading PAX extended headers |
| since 1.3 for local headers and 1.11 for global headers. The |
| following entries of PAX headers are applied when reading:</p> |
| |
| <dl> |
| <dt>path</dt> |
| <dd>set the entry's name</dd> |
| |
| <dt>linkpath</dt> |
| <dd>set the entry's link name</dd> |
| |
| <dt>gid</dt> |
| <dd>set the entry's group id</dd> |
| |
| <dt>gname</dt> |
| <dd>set the entry's group name</dd> |
| |
| <dt>uid</dt> |
| <dd>set the entry's user id</dd> |
| |
| <dt>uname</dt> |
| <dd>set the entry's user name</dd> |
| |
| <dt>size</dt> |
| <dd>set the entry's size</dd> |
| |
| <dt>mtime</dt> |
| <dd>set the entry's modification time</dd> |
| |
| <dt>SCHILY.devminor</dt> |
| <dd>set the entry's minor device number</dd> |
| |
| <dt>SCHILY.devmajor</dt> |
| <dd>set the entry's major device number</dd> |
| </dl> |
| |
| <p>in addition some fields used by GNU tar and star used to |
| signal sparse entries are supported and are used for the |
| <code>is*GNUSparse</code> and <code>isStarSparse</code> |
| methods.</p> |
| |
| <p>Some PAX extra headers may be set when writing archives, |
| for example for non-ASCII names or big numeric values. This |
| depends on various setting of the output stream - see the |
| previous sections.</p> |
| |
| <p>Since 1.15 you can directly access all PAX extension |
| headers that have been found when reading an entry or specify |
| extra headers to be written to a (local) PAX extended header |
| entry.</p> |
| |
| <p>Some hints if you try to set extended headers:</p> |
| |
| <ul> |
| <li>pax header keywords should be ascii. star/gnutar |
| (SCHILY.xattr.* ) do not check for this. libarchive/bsdtar |
| (LIBARCHIVE.xattr.*) uses URL-Encoding.</li> |
| <li>pax header values should be encoded as UTF-8 characters |
| (including trailing <code>\0</code>). star/gnutar |
| (SCHILY.xattr.*) do not check for this. libarchive/bsdtar |
| (LIBARCHIVE.xattr.*) encode values using Base64.</li> |
| <li>libarchive/bsdtar will read SCHILY.xattr headers, but |
| will not generate them.</li> |
| <li>gnutar will complain about LIBARCHIVE.xattr (and any |
| other unknown) headers and will neither encode nor decode |
| them.</li> |
| </ul> |
| </subsection> |
| |
| </section> |
| </body> |
| </document> |