| jsoup changelog |
| |
| Release 1.17.1 [PENDING] |
| * Improvement: in Jsoup.connect(), added support for request-level authentication, supporting authentication to |
| proxies and to servers. |
| <https://github.com/jhy/jsoup/pull/2046> |
| |
| * Improvement: in the Elements list, added direct support for `#set(index, element)`, `#remove(index)`, |
| `#remove(object)`, `#clear()`, `#removeAll(collection)`, `#retainAll(collection)`, `#removeIf(filter)`, |
| `#replaceAll(operator)`. These methods update the original DOM, as well as the Elements list. |
| <https://github.com/jhy/jsoup/pull/2017> |
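
Here is a minimal Java sketch of write-through list mutation. It is not jsoup's implementation. DomNode, DomParent, and WriteThroughSelection are hypothetical names, and the parent's children list stands in for the DOM.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a DOM node; uses identity equality (Object.equals).
class DomNode {
    final String tag;
    DomNode(String tag) { this.tag = tag; }
}

// Hypothetical parent whose children list plays the role of the DOM.
class DomParent {
    final List<DomNode> children = new ArrayList<>();
}

// A selection list whose removals write through to the parent's children,
// loosely mirroring an ArrayList-based selection that also updates the DOM.
class WriteThroughSelection extends ArrayList<DomNode> {
    private final DomParent parent;

    WriteThroughSelection(DomParent parent, List<DomNode> selected) {
        super(selected);
        this.parent = parent;
    }

    @Override
    public DomNode remove(int index) {
        DomNode removed = super.remove(index);
        parent.children.remove(removed); // also detach from the "DOM"
        return removed;
    }

    @Override
    public void clear() {
        parent.children.removeAll(this); // detach every selected node first
        super.clear();
    }
}
```

Each method has to be overridden on its own. ArrayList's bulk operations, such as removeIf() and removeAll(), do not route through remove(int), which is presumably why the entry lists each method separately.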
| |
| * Improvement: added the NodeIterator class, to efficiently traverse a node tree using the Iterator interface. Also |
| added the Element#stream() and Node#nodeStream() methods, which return a Stream, to enable fluent, composable stream |
| pipelines of node traversals. |
| <https://github.com/jhy/jsoup/pull/2051> |
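
An iterator-based traversal can be sketched with a small hypothetical tree type. TreeNode and DepthFirstIterator are illustrative names, not jsoup classes. The iterator walks nodes in document order using an explicit stack, and the standard Spliterators and StreamSupport helpers wrap it as a Stream.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical tree node, standing in for a DOM node.
class TreeNode {
    final String name;
    final List<TreeNode> children = new ArrayList<>();

    TreeNode(String name, TreeNode... kids) {
        this.name = name;
        children.addAll(List.of(kids));
    }
}

// Depth-first, document-order iterator using an explicit stack (no recursion).
class DepthFirstIterator implements Iterator<TreeNode> {
    private final Deque<TreeNode> stack = new ArrayDeque<>();

    DepthFirstIterator(TreeNode root) { stack.push(root); }

    @Override
    public boolean hasNext() { return !stack.isEmpty(); }

    @Override
    public TreeNode next() {
        if (stack.isEmpty()) throw new NoSuchElementException();
        TreeNode current = stack.pop();
        // Push children in reverse so the first child is visited next.
        for (int i = current.children.size() - 1; i >= 0; i--)
            stack.push(current.children.get(i));
        return current;
    }

    // Expose the traversal as a sequential, ordered Stream.
    static Stream<TreeNode> stream(TreeNode root) {
        return StreamSupport.stream(
            Spliterators.spliteratorUnknownSize(new DepthFirstIterator(root),
                Spliterator.ORDERED | Spliterator.NONNULL),
            false);
    }
}
```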
| |
| * Improvement: when changing the OutputSettings syntax to XML, the EscapeMode now defaults to xhtml. |
| |
| * Improvement: added the `:is(selector list)` pseudo-selector, which finds elements that match any of the selectors in |
| the selector list. Useful for making large ORed selectors more readable. |
| |
| * Improvement: repackaged the library with native (vs automatic) JPMS module support. |
| <https://github.com/jhy/jsoup/pull/2025> |
| |
| * Improvement: improved the fidelity of source positions when tracking is enabled. Implicitly created or closed |
| elements are now also tracked, and are detectable via Range.isImplicit(). |
| <https://github.com/jhy/jsoup/pull/2056> |
| |
| * Improvement: when source tracking is enabled, the source position for attribute names and values is now available. |
| Attribute#sourceRange() provides the ranges. |
| <https://github.com/jhy/jsoup/pull/2057> |
| |
| * Improvement: when running concurrently under Java 21+ Virtual Threads, virtual threads could be pinned to their |
| carrier platform thread when parsing an input stream. To improve performance, particularly when parsing fetched |
| URLs, the internal ConstrainableInputStream has been replaced by ControllableInputStream, which avoids the locking |
| that caused the pinning. |
| <https://github.com/jhy/jsoup/issues/2054> |
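
Here is a rough sketch of the approach, using a hypothetical SizeCappedInputStream class. It is a FilterInputStream that enforces a maximum size and declares no synchronized methods of its own. It models only a size cap and is not the actual ControllableInputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical size-capping wrapper with no synchronized methods, so this
// class itself holds no monitor while a (virtual) thread blocks in read().
class SizeCappedInputStream extends FilterInputStream {
    private long remaining; // bytes still allowed to be read

    SizeCappedInputStream(InputStream in, long maxBytes) {
        super(in);
        this.remaining = maxBytes;
    }

    @Override
    public int read() throws IOException {
        if (remaining <= 0) return -1; // cap reached: report end of stream
        int b = in.read();
        if (b != -1) remaining--;
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) return 0;
        if (remaining <= 0) return -1;
        int n = in.read(buf, off, (int) Math.min(len, remaining));
        if (n > 0) remaining -= n;
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = in.skip(Math.min(n, Math.max(remaining, 0)));
        remaining -= skipped; // skipped bytes count against the cap too
        return skipped;
    }

    @Override
    public boolean markSupported() {
        return false; // mark/reset would desynchronize the byte count
    }

    // Convenience for in-memory use: read everything the cap allows.
    static byte[] readCapped(byte[] data, long maxBytes) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8];
        try (InputStream in = new SizeCappedInputStream(new ByteArrayInputStream(data), maxBytes)) {
            int n;
            while ((n = in.read(buf, 0, buf.length)) != -1) out.write(buf, 0, n);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

This class adds no monitors of its own. Whether the wrapped stream synchronizes internally is outside its control.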
| |
| * Bugfix: when outputting with XML syntax, HTML elements that were parsed as data nodes (<script> and <style>) should |
| be emitted as CDATA nodes, so that they can be parsed correctly by an XML parser. |
| <https://github.com/jhy/jsoup/pull/1720> |
| |
| * Bugfix: the Immediate Parent selector `>` could match elements above the root context element, causing incorrect |
| elements to be returned when used on elements other than the root document. |
| <https://github.com/jhy/jsoup/issues/2018> |
| |
| * Bugfix: in a sub-query such as `p:has(> span, > i)`, combinators following the `,` Or combinator would be |
| incorrectly skipped, such that the sub-query was parsed as `i` instead of `> i`. |
| <https://github.com/jhy/jsoup/issues/1707> |
| |
| * Bugfix: in W3CDom, if the jsoup input document contained an empty doctype, the conversion would fail with a |
| DOMException. Now, the empty doctype is discarded and the conversion continues. |
| |
| * Bugfix: when cleaning a document containing SVG elements (or other foreign elements that have preserved case names), |
| the cleaned output would be incorrectly nested if the safelist had a different case than the input document. |
| <https://github.com/jhy/jsoup/issues/2049> |
| |
| * Bugfix: when cleaning a document, the output style of unknown self-closing tags from the input was not preserved in |
| the output. (So a <foo /> in the input, if safe-listed, would be output as <foo></foo>.) |
| <https://github.com/jhy/jsoup/issues/2049> |
| |
| * Build Improvement: added a local test proxy implementation, for proxy integration tests. |
| <https://github.com/jhy/jsoup/pull/2029> |
| |
| * Build Improvement: added tests for HTTPS request support, using a local self-signed cert. Includes proxy tests. |
| <https://github.com/jhy/jsoup/pull/2032> |
| |
| * Change: the InputStream returned in Connection.Response.bodyStream() is no longer a ConstrainableInputStream, and |
| so is not subject to settings such as timeout or maximum size. It is now a plain BufferedInputStream around the |
| response stream. Whilst this behaviour was not documented, you may have been inadvertently relying on those |
| constraints. The constraints are still applied to other methods such as .parse() and .bufferUp(). So if you do want |
| a constrained BufferedInputStream, you can call Connection.Response.bufferUp().bodyStream(). |
| <https://github.com/jhy/jsoup/issues/2054> |
| |
| Release 1.16.2 [20-Oct-2023] |
| * Improvement: optimized the performance of complex CSS selectors, by adding a cost-based query planner. Evaluators |
| are sorted by their relative execution cost, and executed in order from lowest to highest cost. This speeds the |
| matching process by ensuring that simpler evaluations (such as a tag name match) are conducted prior to more |
| complex evaluations (such as an attribute regex, or a deep child scan with a :has). |
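
Here is a minimal sketch of cost-ordered evaluation. CostedEvaluator and CostPlanner are hypothetical types with hand-assigned costs. jsoup's own evaluator classes and cost values are not shown.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical evaluator: a match test plus an estimated relative cost.
class CostedEvaluator<T> {
    final String name;
    final int cost;
    final Predicate<T> test;

    CostedEvaluator(String name, int cost, Predicate<T> test) {
        this.name = name;
        this.cost = cost;
        this.test = test;
    }
}

class CostPlanner {
    // Combine evaluators with AND semantics, sorted cheapest first. The loop
    // returns on the first failure, so expensive checks only run for
    // candidates that already passed every cheaper one.
    static <T> Predicate<T> allOf(List<CostedEvaluator<T>> evaluators) {
        List<CostedEvaluator<T>> ordered = new ArrayList<>(evaluators);
        ordered.sort(Comparator.comparingInt(e -> e.cost));
        return candidate -> {
            for (CostedEvaluator<T> e : ordered)
                if (!e.test.test(candidate)) return false;
            return true;
        };
    }
}
```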
| |
| * Improvement: added support for <svg> and <math> tags (and their children). This includes tag namespaces and case |
| preservation on applicable tags and attributes. |
| <https://github.com/jhy/jsoup/pull/2008> |
| |
| * Improvement: when converting jsoup Documents to W3C Documents in W3CDom, HTML documents will be placed in the |
| `http://www.w3.org/1999/xhtml` namespace by default, per the HTML5 spec. This can be controlled by setting |
| `W3CDom#namespaceAware(false)`. |
| <https://github.com/jhy/jsoup/pull/1848> |
| |
| * Improvement: speed optimized the Structural Evaluators by memoizing previous evaluations. Particularly the `~` |
| (any preceding sibling) and `:nth-of-type` selectors are improved. |
| <https://github.com/jhy/jsoup/issues/1956> |
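
Here is one way to memoize a structural evaluation such as `:nth-of-type`. SibEl and NthOfTypeMemo are hypothetical names, and this is not necessarily how jsoup caches its results.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical element with a tag name, a parent, and ordered children.
class SibEl {
    final String tag;
    SibEl parent;
    final List<SibEl> children = new ArrayList<>();

    SibEl(String tag) { this.tag = tag; }

    SibEl append(SibEl child) {
        child.parent = this;
        children.add(child);
        return this;
    }
}

// Memoizes each element's 1-based position among its same-tag siblings.
// One pass over a parent's children fills the memo for all of them, so
// evaluating :nth-of-type across N siblings costs O(N) rather than O(N^2).
class NthOfTypeMemo {
    private final Map<SibEl, Integer> memo = new IdentityHashMap<>();
    int siblingScans = 0; // how many times a child list was scanned

    int positionOfType(SibEl el) {
        if (el.parent == null) return 1; // a root has no siblings
        Integer cached = memo.get(el);
        if (cached != null) return cached;
        siblingScans++;
        Map<String, Integer> seen = new HashMap<>();
        for (SibEl sibling : el.parent.children)
            memo.put(sibling, seen.merge(sibling.tag, 1, Integer::sum));
        return memo.get(el);
    }
}
```

A memo like this is only valid while the tree is unchanged. It would need to be discarded after any DOM modification.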
| |
| * Improvement: improved the performance of the Element methods nextElementSibling(), previousElementSibling(), |
| firstElementSibling(), lastElementSibling(), firstElementChild(), and lastElementChild(). They now filter/skip in |
| place within the child-node list, instead of allocating and scanning a complete filtered Element list. |
| |
| * Improvement: optimized internal methods that previously called Element.children() to use filter/skip child-node list |
| accessors instead, reducing new Element List allocations. |
| |
| * Improvement: tweaked the performance of parsing :pseudo selectors. |
| |
| * Improvement: when using the `:empty` pseudo-selector, blank text nodes are now considered empty. Previously, |
| an element containing any whitespace was not considered empty. |
| <https://github.com/jhy/jsoup/issues/1976> |
| |
| * Improvement: in forms, <input type="image"> is now excluded from formData() (and hence from form submissions). |
| <https://github.com/jhy/jsoup/pull/2010> |
| |
| * Improvement: in Safelist, made isSafeTag and isSafeAttribute public methods, for extensibility. |
| <https://github.com/jhy/jsoup/issues/1780> |
| |
| * Bugfix: `form` elements and empty elements (such as `img`) did not have their attributes de-duplicated. |
| <https://github.com/jhy/jsoup/pull/1950> |
| |
| * Bugfix: if Document.OutputSettings was cloned from a clone, an NPE would be thrown when used. |
| <https://github.com/jhy/jsoup/pull/1964> |
| |
| * Bugfix: in Jsoup.connect(url), URL paths containing a %2B were incorrectly recoded to a '+', or a '+' was recoded |
| to a ' '. Fixed by reverting to the previous behavior of not encoding supplied paths, other than normalizing to |
| ASCII. |
| <https://github.com/jhy/jsoup/issues/1952> |
| |
| * Bugfix: in Jsoup.connect(url), strings containing supplemental characters (e.g. emoji) were not URL escaped |
| correctly. |
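
One common way this goes wrong is iterating a Java String by char, which splits a supplementary character into two surrogate halves. Below is a minimal sketch of code-point-aware percent-encoding. PathEncoder is a hypothetical name, and it only escapes non-ASCII characters.

```java
import java.nio.charset.StandardCharsets;

class PathEncoder {
    // Percent-encode each non-ASCII code point as its UTF-8 bytes. Iterating
    // by code point keeps a surrogate pair (e.g. an emoji) together as one
    // character instead of encoding two invalid halves.
    static String encodeNonAscii(String path) {
        StringBuilder out = new StringBuilder();
        path.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                out.appendCodePoint(cp); // ASCII passes through unchanged
            } else {
                byte[] bytes = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
                for (byte b : bytes)
                    out.append('%').append(String.format("%02X", b & 0xFF));
            }
        });
        return out.toString();
    }
}
```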
| |
| * Bugfix: in Jsoup.connect(url), the ConstrainableInputStream would clear Thread interrupts when reading the body. |
| This precluded callers from spawning a thread, running a number of requests for a length of time, then joining that |
| thread after interrupting it. |
| <https://github.com/jhy/jsoup/issues/1991> |
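
The static Thread.interrupted() tests and clears the interrupt flag. Thread.currentThread().isInterrupted() only tests it. Below is a sketch of a read loop that stops early on interrupt without clearing the flag, using a hypothetical InterruptibleCopy class.

```java
class InterruptibleCopy {
    // Copies up to `chunks` chunks, stopping early if this thread is interrupted.
    // isInterrupted() only tests the flag; the static Thread.interrupted() would
    // also clear it, hiding the interrupt from the code that owns the thread.
    static int copyChunks(int chunks) {
        int copied = 0;
        for (int i = 0; i < chunks; i++) {
            if (Thread.currentThread().isInterrupted()) break; // flag left set
            copied++; // stands in for reading and writing one buffer
        }
        return copied;
    }
}
```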
| |
| * Bugfix: when tracking HTML source positions, the closing tags for H1...H6 elements were not tracked correctly. |
| <https://github.com/jhy/jsoup/issues/1987> |
| |
| * Bugfix: in Jsoup.connect(), a DELETE method request did not support a request body. |
| <https://github.com/jhy/jsoup/issues/1972> |
| |
| * Bugfix: when calling Element.cssSelector() on an extremely deeply nested element, a StackOverflowError could occur. |
| A StackOverflowError could also occur when running that selector as a query. |
| <https://github.com/jhy/jsoup/issues/2001> |
| |
| * Bugfix: appending a node back to its original Element after empty() would throw an IndexOutOfBoundsException. |
| Also, now the child nodes that were removed have their parent node cleared, fully detaching them from the original |
| parent. |
| <https://github.com/jhy/jsoup/issues/2013> |
| |
| * Bugfix: in Jsoup.Connection when adding headers, the value may have been assumed to be an incorrectly decoded |
| ISO_8859_1 string, and re-encoded as UTF-8. The value is now left as-is. |
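
To see why leaving header values as-is is safer, consider the kind of re-decoding that was removed. The exact former logic may differ, and HeaderRecoding is a hypothetical name. Reinterpreting an already-correct value's ISO-8859-1 bytes as UTF-8 corrupts it:

```java
import java.nio.charset.StandardCharsets;

class HeaderRecoding {
    // Treat the value as UTF-8 bytes that were wrongly decoded as ISO-8859-1,
    // and "repair" it. Applied to a value that was already correct, this
    // produces replacement characters instead of the original text.
    static String reinterpretAsUtf8(String value) {
        byte[] latin1 = value.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.UTF_8);
    }
}
```

For a value that really was mis-decoded, such as "cafÃ©", the same call recovers "café". The problem is that, in general, a correct value cannot be told apart from a mis-decoded one.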
| |
| * Change: removed previously deprecated methods Document#normalise, Element#forEach(org.jsoup.helper.Consumer<>), |
| Node#forEach(org.jsoup.helper.Consumer<>), and the org.jsoup.helper.Consumer interface; the latter being a |
| previously required compatibility shim prior to Android's de-sugaring support. |
| |
| * Change: the previous compatibility shim org.jsoup.UncheckedIOException is deprecated in favor of the now supported |
| java.io.UncheckedIOException. If you are catching the former, modify your code to catch the latter instead. |
| <https://github.com/jhy/jsoup/pull/1989> |
| |
| * Change: blocked noscript tags from being added to Safelists, due to incompatibilities between parsers with and |
| without script-mode enabled. |
| |
| Release 1.16.1 [29-Apr-2023] |
| * Improvement: in Jsoup.connect(url), natively support URLs with Unicode characters in the path or query string, |
| without having to be escaped by the caller. |
| <https://github.com/jhy/jsoup/issues/1914> |
| |
| * Improvement: Calling Node.remove() on a node with no parent is now a no-op, vs a validation error. |
| <https://github.com/jhy/jsoup/issues/1898> |
| |
| * Bugfix: aligned the HTML Tree Builder processing steps for AfterBody and AfterAfterBody to the updated WHATWG |
| standard, to not pop the stack to close <body> or <html> elements. This prevents an errant </html> closing preceding |
| structure. Also added appropriate error message outputs in this case. |
| <https://github.com/jhy/jsoup/issues/1851> |
| |
| * Bugfix: Corrected support for ruby elements (<ruby>, <rp>, <rt>, and <rtc>) to current spec. |
| <https://github.com/jhy/jsoup/issues/1294> |
| |
| * Bugfix: When using Node.before(node) or Node.after(node), if the incoming node was a sibling of the context node, |
| the incoming node may be inserted into the wrong relative location. |
| <https://github.com/jhy/jsoup/issues/1898> |
| |
| * Bugfix: In Jsoup.connect(url), if the input URL had components that were already % escaped, they would be escaped |
| again, causing errors when fetched. |
| <https://github.com/jhy/jsoup/issues/1902> |
| |
| * Bugfix: when tracking input source positions, text in tables that was fostered had invalid positions. |
| <https://github.com/jhy/jsoup/issues/1927> |
| |
| * Bugfix: If the Document.OutputSettings class was initialized, and then Entities.escape(String) called, an NPE may be |
| thrown due to a class loading circular dependency. |
| <https://github.com/jhy/jsoup/issues/1910> |
| |
| * Bugfix: when pretty-printing, the first inline Element or Comment in a block would not be wrap-indented if it were |
| preceded by a blank text node. |
| <https://github.com/jhy/jsoup/issues/1906> |
| |
| * Bugfix: when pretty-printing a <pre> containing block tags, those tags were incorrectly indented. |
| <https://github.com/jhy/jsoup/issues/1891> |
| |
| * Bugfix: when pretty-printing nested inlineable blocks (such as a <p> in a <td>), the inner element should be |
| indented. |
| <https://github.com/jhy/jsoup/issues/1926> |
| |
| * Bugfix: <br> tags should be wrap-indented when in block tags (and not when in inline tags). |
| <https://github.com/jhy/jsoup/issues/1911> |
| |
| * Bugfix: the contents of a sufficiently large <textarea> with un-escaped HTML closing tags may be incorrectly parsed |
| to an empty node. |
| <https://github.com/jhy/jsoup/issues/1929> |
| |
| Release 1.15.4 [18-Feb-2023] |
| * Improvement: added the ability to escape CSS selectors (tags, IDs, classes) to match elements that don't follow |
| regular CSS syntax. For example, to match by classname <p class="one.two">, use document.select("p.one\\.two"); |
| <https://github.com/jhy/jsoup/issues/838> |
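
For building such selectors programmatically, a simplified escaping helper might look like the following. CssEscape.escapeIdent is a hypothetical name. This version ignores the CSS rules for leading digits and non-ASCII characters.

```java
class CssEscape {
    // Backslash-escape any character outside [A-Za-z0-9_-] so the value can be
    // used literally as a tag name, ID, or class name in a selector.
    static String escapeIdent(String ident) {
        StringBuilder out = new StringBuilder(ident.length());
        for (char c : ident.toCharArray()) {
            boolean plain = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9') || c == '_' || c == '-';
            if (!plain) out.append('\\');
            out.append(c);
        }
        return out.toString();
    }
}
```

For example, "p." + CssEscape.escapeIdent("one.two") produces the same selector string as the entry's example.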
| |
| * Improvement: when pretty-printing, wrap text that follows a <br> tag. |
| <https://github.com/jhy/jsoup/issues/1858> |
| |
| * Improvement: when pretty-printing, normalize newlines that follow self-closing tags in custom tags. |
| <https://github.com/jhy/jsoup/issues/1852> |
| |
| * Improvement: when pretty-printing, collapse non-significant whitespace between a block and an inline tag. |
| <https://github.com/jhy/jsoup/issues/1802> |
| |
| * Improvement: in Element#forEach and Node#forEachNode, use java.util.function.Consumer instead of the previous |
| Android compatibility shim org.jsoup.helper.Consumer. Consequently, the latter has been deprecated. |
| <https://github.com/jhy/jsoup/pull/1870> |
| |
| * Improvement: added a new method Document#forms(), to conveniently retrieve a List<FormElement> containing the <form> |
| elements in a document. |
| |
| * Improvement: added a new method Document#expectForm(query), to find the first matching FormElement, or blow up |
| trying. |
| |
| * Bugfix: URLs containing characters such as [ and ] were not escaped correctly, and would throw a |
| MalformedURLException when fetched. |
| <https://github.com/jhy/jsoup/issues/1873> |
| |
| * Bugfix: Element.cssSelector would create invalid selectors for elements where the tag name, ID, or classnames needed |
| to be escaped (e.g. if a class name contained a ':' or '.'). |
| <https://github.com/jhy/jsoup/issues/1742> |
| |
| * Bugfix: element.text() should have a space between a block and an inline element. |
| <https://github.com/jhy/jsoup/issues/1877> |
| |
| * Bugfix: if a Node or an Element was replaced with itself, that node would incorrectly be orphaned. |
| <https://github.com/jhy/jsoup/issues/1843> |
| |
| * Bugfix: form data on a previous request was copied to a new request in newRequest(), resulting in an accumulation of |
| form data when executing multi-step form submissions, or in data being incorrectly sent with later requests. Now, |
| newRequest() only copies session-related settings (cookies, proxy settings, user-agent, etc.), and not the request |
| data or body. |
| <https://github.com/jhy/jsoup/issues/1778> |
| |
| * Bugfix: fixed an issue in Safelist.removeAttributes which could throw a ConcurrentModificationException when using |
| the ":all" pseudo-attribute. |
| |
| * Bugfix: given extremely deeply nested HTML, a number of methods in Element could throw a StackOverflowError due |
| to excessive recursion. Namely: #data(), #hasText(), #parents(), and #wrap(html). |
| <https://github.com/jhy/jsoup/issues/1864> |
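
The general fix for this kind of StackOverflowError is to replace recursion with an explicit stack on the heap. Below is a minimal sketch of an iterative hasText()-style check over a hypothetical DeepNode tree (not jsoup's Element).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical tree node with optional text.
class DeepNode {
    final List<DeepNode> children = new ArrayList<>();
    String text = "";

    // Build a chain nested `depth` levels deep, with text at the innermost node.
    static DeepNode chain(int depth, String leafText) {
        DeepNode root = new DeepNode();
        DeepNode current = root;
        for (int i = 0; i < depth; i++) {
            DeepNode child = new DeepNode();
            current.children.add(child);
            current = child;
        }
        current.text = leafText;
        return root;
    }

    // Iterative "does this node or any descendant have non-blank text?" check.
    // The explicit stack lives on the heap, so nesting depth is bounded by
    // memory rather than by the thread's stack size.
    boolean hasText() {
        Deque<DeepNode> stack = new ArrayDeque<>();
        stack.push(this);
        while (!stack.isEmpty()) {
            DeepNode node = stack.pop();
            if (!node.text.trim().isEmpty()) return true;
            for (DeepNode child : node.children) stack.push(child);
        }
        return false;
    }
}
```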
| |
| * Change: deprecated the unused Document#normalise() method. Normalization occurs during the HTML tree construction, |
| and no longer as a distinct phase. |
| |
| Release 1.15.3 [2022-Aug-24] |
| * Security: fixed an issue where the jsoup cleaner may incorrectly sanitize crafted XSS attempts if |
| Safelist.preserveRelativeLinks is enabled. |
| <https://github.com/jhy/jsoup/security/advisories/GHSA-gp7f-rwcx-9369> |
| |
| * Improvement: the Cleaner will preserve the source position of cleaned elements, if source tracking is enabled in the |
| original parse. |
| |
| * Improvement: the error messages output from Validate are more descriptive. Exceptions are now ValidationExceptions |
| (extending IllegalArgumentException). Stack traces do not include the Validate class, to make it simpler to see |
| where the exception originated. Common validation errors including malformed URLs and empty selector results have |
| more explicit error messages. |
| |
| * Bugfix: the DataUtil would incorrectly read from InputStreams that returned fewer bytes per read than requested. This |
| led to incorrect results when parsing from chunked server responses, for example. |
| <https://github.com/jhy/jsoup/issues/1807> |
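
InputStream.read(byte[], int, int) may return fewer bytes than requested even before end of stream. Below is a minimal sketch of the correct loop, with a test stream that returns one byte per call. ReadFully and trickle are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class ReadFully {
    // A single read() may legally return fewer bytes than requested (e.g. one
    // chunk of a chunked HTTP response). Loop until the buffer is full or EOF.
    static int readFully(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n == -1) break; // EOF: return what was read so far
            total += n;
        }
        return total;
    }

    // Test stream that returns at most one byte per read call.
    static InputStream trickle(byte[] data) {
        return new ByteArrayInputStream(data) {
            @Override
            public int read(byte[] b, int off, int len) {
                return super.read(b, off, Math.min(len, 1));
            }
        };
    }
}
```

On Java 9 and later, InputStream.readNBytes(byte[], int, int) implements the same loop.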
| |
| * Build Improvement: added implementation version and related fields to the jar manifest. |
| <https://github.com/jhy/jsoup/issues/1809> |
| |
| *** Release 1.15.2 [2022-Jul-04] |
| * Improvement: added the ability to track the position (line, column, index) in the original input source from where |
| a given node was parsed. Accessible via Node.sourceRange() and Element.endSourceRange(). |
| <https://github.com/jhy/jsoup/pull/1790> |
| |
| * Improvement: added Element.firstElementChild(), Element.lastElementChild(), Node.firstChild(), Node.lastChild(), |
| as convenient accessors to those child nodes and elements. |
| |
| * Improvement: added Element.expectFirst(cssQuery), which is just like Element.selectFirst(), but instead of returning |
| a null if there is no match, will throw an IllegalArgumentException. This is useful if you want to simply abort |
| processing if an expected match is not found. |
| |
| * Improvement: when pretty-printing HTML, doctypes are emitted on a newline if there is a preceding comment. |
| <https://github.com/jhy/jsoup/pull/1664> |
| |
| * Improvement: when pretty-printing, trim the leading and trailing spaces of text nodes in block tags when possible, |
| so that they are indented correctly. |
| <https://github.com/jhy/jsoup/issues/1798> |
| |
| * Improvement: in Element#selectXpath(), disable namespace awareness. This makes it possible to always select elements |
| by their simple local name, regardless of whether an xmlns attribute was set. |
| <https://github.com/jhy/jsoup/issues/1801> |
| |
| * Bugfix: when using the readToByteBuffer method, such as in Connection.Response.body(), if the document has not |
| already been parsed and must be read fully, and there is any maximum buffer size being applied, only the default |
| internal buffer size was read. |
| <https://github.com/jhy/jsoup/issues/1774> |
| |
| * Bugfix: when serializing HTML, newlines in elements descending from a pre tag were incorrectly skipped. That caused |
| what should have been preformatted output to instead be a run of text. |
| <https://github.com/jhy/jsoup/issues/1776> |
| |
| * Bugfix: when pretty-print serializing HTML, newlines separating phrasing content (e.g. a <span> tag within a <p> tag) |
| would be incorrectly skipped, instead of normalized to a space. Additionally, improved space normalization at other |
| end-of-line occurrences, and whitespace handling after a closing </body>. |
| <https://github.com/jhy/jsoup/issues/1787> |
| |
| *** Release 1.15.1 [2022-May-15] |
| * Change: removed previously deprecated methods and classes (including org.jsoup.safety.Whitelist; use |
| org.jsoup.safety.Safelist instead). |
| |
| * Improvement: when converting jsoup Documents to W3C Documents in W3CDom, preserve HTML-valid attribute names if the |
| input document is using the HTML syntax. (Previously, would always coerce using the more restrictive XML syntax.) |
| <https://github.com/jhy/jsoup/pull/1648> |
| |
| * Improvement: added the :containsWholeText(text) selector, to match against non-normalized Element text. That can be |
| useful when elements can only be distinguished by e.g. specific case, or leading whitespace, etc. |
| <https://github.com/jhy/jsoup/issues/1636> |
| |
| * Improvement: added Element#wholeOwnText() to retrieve the original (non-normalized) ownText of an Element. Also |
| added the :containsWholeOwnText(text) selector, to match against that. BR elements are now treated as newlines |
| in the wholeText methods. |
| <https://github.com/jhy/jsoup/issues/1636> |
| |
| * Improvement: added the :matchesWholeText(regex) and :matchesWholeOwnText(regex) selectors, to match against whole |
| (non-normalized, case sensitive) element text and own text, respectively. |
| <https://github.com/jhy/jsoup/issues/1636> |
| |
| * Improvement: when evaluating an XPath query against a context element, the complete document is now visible to the |
| query, vs only the context element's sub-tree. This enables support for queries that reach outside the element (to |
| its parents or siblings), e.g. ancestor-or-self::*. |
| <https://github.com/jhy/jsoup/issues/1652> |
| |
| * Improvement: allow a maxPaddingWidth on the indent level in OutputSettings when pretty printing. This defaults to |
| 30 to limit the indent level for very deeply nested elements, and may be disabled by setting to -1. |
| <https://github.com/jhy/jsoup/pull/1655> |
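
The capping behaviour can be sketched as a small helper. Indent.padding is hypothetical. It assumes the cap applies to the number of padding spaces, with -1 meaning no cap, as described above.

```java
import java.util.Arrays;

class Indent {
    // Hypothetical helper: returns `width` spaces, capped at maxPaddingWidth.
    // A maxPaddingWidth of -1 disables the cap entirely.
    static String padding(int width, int maxPaddingWidth) {
        if (width < 0) throw new IllegalArgumentException("width must be >= 0");
        if (maxPaddingWidth != -1) width = Math.min(width, maxPaddingWidth);
        char[] spaces = new char[width];
        Arrays.fill(spaces, ' ');
        return new String(spaces);
    }
}
```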
| |
| * Improvement: when cloning a Node or an Element, the clone gets a cloned OwnerDocument containing only that clone, so |
| as to preserve applicable settings, such as the Pretty Print settings. |
| <https://github.com/jhy/jsoup/issues/763> |
| |
| * Improvement: added a convenience method Jsoup.parse(File). |
| <https://github.com/jhy/jsoup/issues/1693> |
| |
| * Improvement: in the NodeTraversor, added default implementations for NodeVisitor.tail() and NodeFilter.tail(), so |
| that code using only head() methods can be written as lambdas. |
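
The language feature involved is a default method on an interface with a single abstract method. Below is a sketch with hypothetical HeadTailVisitor and VTree types. jsoup's NodeVisitor declares head(Node, int) and tail(Node, int).

```java
import java.util.List;

// Hypothetical visitor: one abstract method, so a lambda can implement it.
interface HeadTailVisitor<T> {
    void head(T node, int depth);

    // Default no-op, so visitors that only need head() can ignore tail().
    default void tail(T node, int depth) {}
}

// Hypothetical tree type used to drive the visitor.
class VTree {
    final String name;
    final List<VTree> kids;

    VTree(String name, VTree... kids) {
        this.name = name;
        this.kids = List.of(kids);
    }

    // head() on the way down, tail() on the way back up (recursive, for brevity).
    void traverse(HeadTailVisitor<VTree> visitor, int depth) {
        visitor.head(this, depth);
        for (VTree kid : kids) kid.traverse(visitor, depth + 1);
        visitor.tail(this, depth);
    }
}
```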
| |
| * Improvement: in NodeTraversor, added support for removing nodes via Node.remove() during NodeVisitor.head(). |
| <https://github.com/jhy/jsoup/issues/1699> |
| |
| * Improvement: added Node.forEachNode(Consumer<Node>) and Element.forEach(Consumer<Element>) methods, to efficiently |
| traverse the DOM with a functional interface. |
| <https://github.com/jhy/jsoup/issues/1700> |
| |
| * Bugfix: boolean attribute names should be case-insensitive, but were not when the parser was configured to preserve |
| case. |
| <https://github.com/jhy/jsoup/issues/1656> |
| |
| * Bugfix: when reading from SequenceInputStreams across the buffer, the input stream was closed too early, resulting |
| in missed content. |
| <https://github.com/jhy/jsoup/pull/1671> |
| |
| * Bugfix: a comment with all dashes (<!----->) should not emit a parse error. |
| <https://github.com/jhy/jsoup/issues/1667> |
| |
| * Bugfix: when throwing a SelectorParseException for an invalid selector, don't try to String.format the input, as |
| that could throw an IllegalFormatException. |
| <https://github.com/jhy/jsoup/issues/1691> |
| |
| * Bugfix: when serializing HTML with Pretty Print enabled, extraneous whitespace may be added on closing tags, or |
| extra newlines may be added at the end of script blocks. |
| <https://github.com/jhy/jsoup/issues/1688> |
| <https://github.com/jhy/jsoup/issues/1689> |
| |
| * Bugfix: when copy-creating a Safelist from another, perform a deep-copy of the original's settings, so that changes |
| to the original after creation do not affect the copy. |
| <https://github.com/jhy/jsoup/pull/1763> |
| |
| * Bugfix [Fuzz]: speed improvement when parsing crafted HTML containing very deeply and incorrectly stacked |
| formatting elements with many attributes. |
| <https://github.com/jhy/jsoup/issues/1695> |
| |
| * Bugfix [Fuzz]: during parsing, a StackOverflowError was possible given crafted HTML with hundreds of nested |
| table elements followed by invalid formatting elements. |
| <https://github.com/jhy/jsoup/issues/1697> |
| |
| *** Release 1.14.3 [2021-Sep-30] |
| * Improvement: added native XPath support in Element#selectXpath(String) |
| <https://github.com/jhy/jsoup/pull/1629> |
| |
| * Improvement: added full support for the <template> tag, per the HTML5 parser spec. |
| <https://github.com/jhy/jsoup/issues/1634> |
| |
| * Improvement: added support in CharacterReader to track newlines, so that parse errors can be reported more |
| intuitively. |
| <https://github.com/jhy/jsoup/pull/1624> |
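
Here is a simplified sketch of mapping a character offset to a line and column: record each newline offset once, then binary-search the offsets. LineIndex is a hypothetical class that indexes a whole string up front. The entry describes tracking inside jsoup's buffered CharacterReader instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LineIndex {
    private final int[] newlineOffsets; // sorted offsets of each '\n'

    LineIndex(String input) {
        List<Integer> offsets = new ArrayList<>();
        for (int i = 0; i < input.length(); i++)
            if (input.charAt(i) == '\n') offsets.add(i);
        newlineOffsets = offsets.stream().mapToInt(Integer::intValue).toArray();
    }

    // 1-based line number: one more than the number of newlines before pos.
    int lineNumber(int pos) {
        int i = Arrays.binarySearch(newlineOffsets, pos);
        int newlinesBefore = i >= 0 ? i : -i - 1;
        return newlinesBefore + 1;
    }

    // 1-based column: distance from the first character of pos's line.
    int columnNumber(int pos) {
        int line = lineNumber(pos);
        int lineStart = line == 1 ? 0 : newlineOffsets[line - 2] + 1;
        return pos - lineStart + 1;
    }
}
```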
| |
| * Improvement: tracked parse errors now have more details, including the erroneous token, to help clarify the errors. |
| |
| * Improvement: speed and memory optimizations for the :has(subquery) selector. |
| |
| * Improvement: the :contains(text) and :containsOwn(text) selectors are now whitespace normalized, aligning to the |
| document text that they are matching against. |
| <https://github.com/jhy/jsoup/issues/876> |
| |
| * Improvement: in Element, speed optimized adopting all of an element's child nodes into a currently empty element. |
| Improves the HTML adoption agency algorithm when adopting elements with many children. |
| <https://github.com/jhy/jsoup/issues/1638> |
| |
| * Improvement: increased the parse speed when in RCData (e.g. <title>) and unescaped <tag> tokens are found, by |
| memoizing the </title> scan and reducing GC. |
| <https://github.com/jhy/jsoup/issues/1644> |
| |
| * Improvement: when parsing custom tags (in HTML or XML), added a flyweight cache on Tag.valueOf(name) to reduce |
| memory overhead when many tags are repeated. Also tuned other areas of the parser when many very deeply stacked |
| custom elements were present. |
| <https://github.com/jhy/jsoup/issues/1646> |
| |
| * Bugfix: when tracking errors or checking for validity in the Cleaner, errors were incorrectly raised for missing |
| optional closing tags. |
| |
| * Bugfix: the OSGi bundle meta-data incorrectly set a version on the import of javax.annotation (used as a build-time |
| dependency for nullability assertions). |
| <https://github.com/jhy/jsoup/issues/1616> |
| |
| * Bugfix: the Attributes::equals() method was sensitive to the order of its contents, but it should not be. |
| <https://github.com/jhy/jsoup/issues/1492> |
| |
| * Bugfix: when the HTML parser was configured to preserve case, Element text methods would miss adding whitespace for |
| "BR" tags. |
| |
| * Bugfix: attribute names are now normalized & validated correctly for the specific output syntax (HTML or XML). |
| Previously, syntactically invalid attribute names could be output by the html() methods. Such attributes are still |
| available in the DOM, and will be normalized if possible on output. |
| <https://github.com/jhy/jsoup/issues/1474> |
| |
| * Bugfix [Fuzz]: fixed an IOOB when an empty select tag was followed by a body tag that needed reparenting. |
| <https://github.com/jhy/jsoup/issues/1639> |
| |
| * Build Improvement: fixed nullability annotations for Node.equals(other) and other equals methods. |
| <https://github.com/jhy/jsoup/issues/1628> |
| |
| * Build Improvement: added JDK 17 to the CI builds. |
| <https://github.com/jhy/jsoup/pull/1641> |
| |
| *** Release 1.14.2 [2021-Aug-15] |
| * Improvement: support Pattern.quote \Q and \E escapes in the selector regex matchers. |
| <https://github.com/jhy/jsoup/pull/1536> |
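
Pattern.quote() produces exactly such a \Q...\E literal, so a value containing regex metacharacters can be matched literally. Here is a minimal sketch. QuoteDemo is a hypothetical name.

```java
import java.util.regex.Pattern;

class QuoteDemo {
    // Pattern.quote wraps the text in \Q...\E, so metacharacters such as '.'
    // are matched literally rather than as regex syntax.
    static boolean containsLiteral(String text, String literal) {
        return Pattern.compile(Pattern.quote(literal)).matcher(text).find();
    }
}
```

The quoted form can then be embedded in a regex selector, for example a query such as td:matches(\Q1.5\E).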
| |
| * Improvement: Element.absUrl() now supports tel: URLs, and other URLs that are already absolute but that Java does |
| not have input stream handlers for. |
| <https://github.com/jhy/jsoup/issues/1610> |
| |
| * Bugfix: when serializing output, escape control characters below 0x20. This improves XML output |
| compatibility, and makes HTML output with these characters easier to read (as they're otherwise invisible). |
| <https://github.com/jhy/jsoup/issues/1556> |
| |
| * Bugfix: the *|el wildcard namespace selector now also matches elements with no namespace. |
| <https://github.com/jhy/jsoup/issues/1565> |
| |
| * Bugfix: corrected a potential case of the parser input stream not being closed immediately on a read exception. |
| |
| * Bugfix: when making an HTTP POST, if the request write fails, make sure the connection is immediately cleaned up. |
| |
| * Bugfix: in the XML parser, XML processing instructions without attributes would be serialized as if they did. |
| <https://github.com/jhy/jsoup/issues/770> |
| |
| * Bugfix: updated the HtmlTreeBuilder resetInsertionMode to the current spec for supported elements. |
| <https://github.com/jhy/jsoup/issues/1491> |
| |
| * Bugfix: fixed an NPE when parsing fragment HTML into a standalone table element. |
| <https://github.com/jhy/jsoup/issues/1603> |
| |
| * Bugfix: fixed an NPE when parsing fragment heading HTML into a standalone p element. |
| <https://github.com/jhy/jsoup/issues/1601> |
| |
| * Bugfix: fixed an IOOB when parsing a formatting fragment into a standalone p element. |
| <https://github.com/jhy/jsoup/issues/1602> |
| |
| * Bugfix: tag names must start with an ascii-alpha character. |
| <https://github.com/jhy/jsoup/issues/1006> |
| |
| * Bugfix [Fuzz]: fixed a slow parse when a tag or an attribute name has thousands of null characters in it. |
| <https://github.com/jhy/jsoup/issues/1580> |
| |
| * Bugfix [Fuzz]: the adoption agency algorithm can have an incorrect bookmark position. |
| <https://github.com/jhy/jsoup/issues/1576> |
| |
| * Bugfix [Fuzz]: malformed HTML could result in null elements on the stack. |
| <https://github.com/jhy/jsoup/issues/1579> |
| |
| * Bugfix [Fuzz]: malformed deeply nested table elements could create a stack overflow. |
| <https://github.com/jhy/jsoup/issues/1577> |
| |
| * Bugfix [Fuzz]: Speed optimized parsing of malformed HTML that creates elements with thousands of attributes, by |
| limiting the attribute count per element to 512 when parsing (in real-world HTML, P99 is ~ 8). |
| <https://github.com/jhy/jsoup/issues/1578> |
| |
| * Bugfix [Fuzz]: Speed improvement for the foster formatting elements algorithm, by limiting how far up a crafted stack |
| to scan. |
| <https://github.com/jhy/jsoup/issues/1593> |
| |
| * Bugfix [Fuzz]: Speed improvement when parsing crafted HTML when transferring form attributes. |
| <https://github.com/jhy/jsoup/issues/1595> |
| |
| * Bugfix [Fuzz]: Speed improvement when the stack was thousands of items deep, and non-matching close tags sent. |
| <https://github.com/jhy/jsoup/issues/1596> |
| |
| * Bugfix [Fuzz]: Speed improvement when an attribute name is 600K of quote characters or otherwise needs accumulation |
| vs being able to read in one hit. |
| <https://github.com/jhy/jsoup/issues/1605> |
| |
| * Bugfix [Fuzz]: Speed improvement when closing missing empty tags (in an XML comment processed as HTML) when |
| thousands deep in the stack. |
| <https://github.com/jhy/jsoup/issues/1606> |
| |
| * Bugfix [Fuzz]: Fix a potential stack-overflow in the parser given crafted HTML, when the parser looped in the |
| InSelectInTable state. |
| |
| * Bugfix [Fuzz]: Fix an IOOB when the HTML root was cleared from the stack and then attributes were merged onto it. |
| <https://github.com/jhy/jsoup/issues/1611> |
| |
| * Bugfix [Fuzz]: Improved the speed of parsing when crafted HTML contains hundreds of active formatting elements |
| that were copied for all new elements (similar to an amplification attack). The number of considered active |
| formatting elements that will be cloned when mis-nested is now capped to 12. |
| <https://github.com/jhy/jsoup/issues/1613> |
| |
| *** Release 1.14.1 [2021-Jul-10] |
| * Change: updated the minimum supported Java version from Java 7 to Java 8. |
| |
| * Change: updated the minimum Android API level from 8 to 10. |
| |
| * Change: although Node#childNodes() returns an UnmodifiableList as a view into its children, it was still |
| directly backed by the internal child list. That made some uses, such as looping and moving those children to |
| another element, throw a ConcurrentModificationException. Now this method returns its own list so that they are |
| separated and changes to the parent's contents will not impact the children view. This aligns with similar methods |
| such as Element#children(). If you have code that iterates this list and makes parenting changes to its contents, |
| you may need to make a code update. |
| <https://github.com/jhy/jsoup/issues/1431> |
| |
| * Change: the org.jsoup.Connection interface has been modified to introduce new methods for sessions and the cookie |
| store. If you have a custom implementation of this interface, you will need to add implementations of these methods. |
| |
| * Improvement: added HTTP request session management support with Jsoup.newSession(). This extends the Connection |
| implementation to support (optional) sessions, which allow request defaults (timeout, proxy, etc) to be set once and |
| then applied to all requests within that session. |
| |
| Cookies are re-implemented to correctly support path and domain filtering when used within a session. A default |
| in-memory cookie store is used for the session, or a custom implementation (perhaps disk-persistent, or pre-set) |
| can be used instead. |
| |
| Forms submitted using the FormElement#submit() use the same session that was used to fetch the document and so pass |
| cookies and other defaults appropriately. |
| |
| The session is multi-thread safe and can execute multiple requests concurrently. If the user accidentally tries to |
| execute the same request object across multiple threads (vs calling Connection#newRequest()), |
| that is detected cleanly and a clear exception is thrown (vs weird blowups in input stream reading, or forcing |
| everything through a synchronized bottleneck). |
| <https://github.com/jhy/jsoup/pull/1476> |
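A sketch of setting defaults once on a session and deriving a request from it, assuming jsoup 1.14.1 or later. The class name and user-agent string are hypothetical, and nothing is fetched here:

```java
import org.jsoup.Connection;
import org.jsoup.Jsoup;

class SessionDefaults {
    // Hypothetical user-agent string, for illustration only.
    static final String AGENT = "ExampleBot/1.0";

    /** Creates a session whose defaults are copied into every request made from it. */
    static Connection newSession() {
        return Jsoup.newSession()
                .timeout(45 * 1000) // milliseconds
                .userAgent(AGENT);
    }

    /** Prepares (but does not execute) one request; call this once per thread or task. */
    static Connection prepare(Connection session, String url) {
        return session.newRequest().url(url);
    }
}
```

In concurrent code, each thread would call prepare(...) and then get() on its own request, rather than sharing one request object.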
| |
| * Improvement: renamed the Whitelist class to Safelist, with the goal of more inclusive language. A shim is provided |
| for backwards compatibility (source and binary). This shim is marked as deprecated and will be removed in the |
| jsoup 1.15.1 release. |
| <https://github.com/jhy/jsoup/pull/1464> |
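For example, cleaning untrusted HTML with the renamed class might look like this, assuming jsoup 1.14.1 or later (Safelist.basic() is one of the built-in safelists; the class name is hypothetical):

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Safelist;

class CleanExample {
    /** Returns body HTML containing only the elements and attributes the basic safelist permits. */
    static String clean(String untrusted) {
        // basic() allows simple text markup and links, and enforces rel="nofollow" on links.
        return Jsoup.clean(untrusted, Safelist.basic());
    }
}
```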
| |
| * Improvement: added support for Internationalized Domain Names (IDNs) in Jsoup.Connect. |
| <https://github.com/jhy/jsoup/issues/1300> |
| |
| * Improvement: added support for loading and parsing gzipped HTML files in Jsoup.parse(File in, charset, baseUri). |
| |
| * Improvement: reduced thread contention in HttpConnection and Document. |
| <https://github.com/jhy/jsoup/pull/1455> |
| |
| * Improvement: better parsing performance when under high thread concurrency |
| <https://github.com/jhy/jsoup/pull/1402> |
| |
| * Improvement: added Element#id(String) ID attribute setter. |
| |
| * Improvement: in Document, #body() and #head() accessors will now automatically create those elements, if they were |
| missing (e.g. if the Document was not parsed from HTML). Additionally, the #body() method returns the frameset |
| element (instead of null) for frameset documents. |
| |
| * Improvement: when cleaning a document, the output settings of the original document are cloned into the cleaned |
| document. |
| <https://github.com/jhy/jsoup/issues/1417> |
| |
| * Improvement: when parsing XML, disable pretty-printing by default. |
| <https://github.com/jhy/jsoup/issues/1168> |
| |
| * Improvement: much better performance in Node#clone() for large and deeply nested documents. Complexity was O(n^2) or |
| worse, now O(n). |
| |
| * Improvement: during traversal using the NodeTraversor, nodes may now be replaced with Node#replaceWith(Node). |
| <https://github.com/jhy/jsoup/issues/1289> |
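A minimal sketch of replacing nodes during traversal, assuming jsoup 1.14.1 or later (the class name is hypothetical). Each <img> is swapped for a text node holding its alt text while the NodeTraversor is still walking the tree:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
import org.jsoup.select.NodeTraversor;
import org.jsoup.select.NodeVisitor;

class ReplaceImages {
    /** Replaces each <img> with a text node of its alt text; returns the body's normalized text. */
    static String imagesToAltText(String html) {
        Document doc = Jsoup.parse(html);
        NodeTraversor.traverse(new NodeVisitor() {
            @Override
            public void head(Node node, int depth) {
                if (node instanceof Element && ((Element) node).tagName().equals("img")) {
                    // The traversor picks up the replacement node and continues from it.
                    node.replaceWith(new TextNode(node.attr("alt")));
                }
            }

            @Override
            public void tail(Node node, int depth) {
                // nothing to do on the way back up
            }
        }, doc.body());
        return doc.body().text();
    }
}
```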
| |
| * Improvement: added Element#appendChildren and Element#prependChildren, as convenience methods in addition to |
| Element#insertChildren(index, children), for bulk moving nodes. |
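A sketch of bulk moving with prependChildren (appendChildren works the same way at the end of the child list), assuming jsoup 1.14.1 or later; the class name and markup are hypothetical:

```java
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;

class BulkMove {
    /** Moves all items from #src to the front of #dest; returns #dest's item texts joined by commas. */
    static String prependAll() {
        Document doc = Jsoup.parse("<ul id=src><li>a</li><li>b</li></ul><ul id=dest><li>c</li></ul>");
        Element src = doc.selectFirst("#src");
        Element dest = doc.selectFirst("#dest");

        List<Node> moving = src.childNodes(); // a copy of src's children
        dest.prependChildren(moving);         // reparents them, keeping their order
        return String.join(",", dest.select("li").eachText());
    }
}
```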
| |
| * Improvement: better cleanup of relative URLs that contain too many .. segments. |
| <https://github.com/jhy/jsoup/pull/1482> |
| |
| * Build Improvement: integrated jsoup into the OSS Fuzz project, which semi-randomly generates millions of different |
| HTML and XML input files, searching for areas to improve in the parser for increased robustness and throughput. |
| <https://github.com/jhy/jsoup/issues/1502> |
| |
| * Build Improvement: integrated with GitHub's CodeQL static code analyzer. |
| <https://github.com/jhy/jsoup/pull/1494> |
| |
| * Build Improvement: moved to GitHub Workflows for build verification. |
| |
| * Build Improvement: updated Jetty (used for integration tests; not bundled) to 9.4.42. |
| |
| * Build Improvement: added nullability annotations and initial settings. |
| <https://github.com/jhy/jsoup/pull/1467> |
| |
| * Bugfix: corrected the adoption agency algorithm, to handle cases where e.g. an <a> tag incorrectly nests further <a> |
| tags. |
| <https://github.com/jhy/jsoup/pull/1517> <https://github.com/jhy/jsoup/issues/845> |
| |
| * Bugfix: when parsing HTML, could throw NPEs on some tags (isindex or table>input). |
| <https://github.com/jhy/jsoup/issues/1404> |
| |
| * Bugfix: in HttpConnection.Request, headers beginning with "sec-" (e.g. Sec-Fetch-Mode) were silently discarded by |
| the underlying Java HttpURLConnection. These are now settable correctly. |
| <https://github.com/jhy/jsoup/issues/1461> |
| |
| * Bugfix: when adding child Nodes to a Node, could incorrectly reparent all nodes if the first parent had the same |
| length of children as the incoming node list. |
| |
| * Bugfix: when wrapping an orphaned element, would throw an NPE. |
| |
| * Bugfix: when wrapping an element with HTML that included multiple sibling elements, those siblings were incorrectly |
| added as children of the wrapper instead of siblings. |
| |
| * Bugfix: when setting the content of a script or style tag via the Element#html(String) method, the content is now |
| treated as a DataNode, not a TextNode. This means that characters like '<' will no longer be incorrectly escaped. |
| As a related ergonomic improvement, the same behavior applies for Element#text(String) (i.e. the content will be |
| treated as a DataNode, despite calling the text() method). |
| <https://github.com/jhy/jsoup/issues/1419> |
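A short sketch of the new behavior, assuming jsoup 1.14.1 or later (the class name is hypothetical). The script body keeps its raw '<' on output:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class ScriptContent {
    /** Creates a script element and sets its body through html(String). */
    static Element script(String js) {
        Document doc = Jsoup.parse("");
        Element script = doc.body().appendElement("script");
        script.html(js); // stored as a DataNode rather than a TextNode, so it is not entity-escaped
        return script;
    }
}
```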
| |
| * Bugfix: when wrapping HTML around an existing element with Element#wrap(String), will now take the content as |
| provided and ignore normal HTML tree-building rules. This allows for e.g. a div tag to be placed inside of p tags. |
| |
| * Bugfix: the Elements#forms() method should return the selected immediate elements that are Forms, not children. |
| <https://github.com/jhy/jsoup/pull/1403> |
| |
| * Bugfix: when creating a selector for an element with Element#cssSelector, if the element used a non-unique ID |
| attribute, the returned selector may not match the desired element. |
| <https://github.com/jhy/jsoup/issues/1085> |
| |
| * Bugfix: corrected the toString() methods of the Evaluator classes. |
| |
| * Bugfix: when converting a jsoup document to a W3C document (in W3CDom#convert), if a tag had XML illegal characters, |
| a DOMException would be thrown. Now instead, that tag is represented as a text node. |
| <https://github.com/jhy/jsoup/issues/1093> |
| |
| * Bugfix: if an HTML file ended with an open noscript tag, an "EOF" string would appear in the HTML output. |
| |
| * Bugfix: when parsing a document as XML, automatically set the output syntax to XML, and ensure that "<" characters |
| in attributes are escaped as "&lt;" (which is not required in HTML as the quoted attribute contents are safe, but is |
| required in XML). |
| <https://github.com/jhy/jsoup/issues/1420> |
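A sketch of XML parsing and output under these rules, assuming jsoup 1.14.1 or later (the class and method names are hypothetical). The '<' in the attribute is serialized as &lt;, and the output settings use XML syntax with pretty-printing off:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;

class XmlOutput {
    static Document parse(String xml) {
        return Jsoup.parse(xml, "", Parser.xmlParser());
    }

    /** Parses and re-serializes the input. */
    static String roundTrip(String xml) {
        return parse(xml).outerHtml();
    }

    static boolean usesXmlSyntax(String xml) {
        return parse(xml).outputSettings().syntax() == Document.OutputSettings.Syntax.xml;
    }

    static boolean prettyPrints(String xml) {
        return parse(xml).outputSettings().prettyPrint();
    }
}
```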
| |
| * Bugfix: [Fuzz] when parsing an attribute key containing "abs:abs", a validation error would be incorrectly |
| thrown. |
| <https://github.com/jhy/jsoup/issues/1541> |
| |
| * Bugfix: [Fuzz] could NPE while parsing in resetInsertionMode(). |
| <https://github.com/jhy/jsoup/issues/1538> |
| |
| * Bugfix: [Fuzz] when parsing XML, could Stack Overflow when parsing XML declarations. |
| <https://github.com/jhy/jsoup/issues/1539> |
| |
| * Bugfix: [Fuzz] fixed a potential Stack Overflow when parsing mis-nested tfoot tags, and updated the tree parser for |
| this situation to match the updated HTML5 spec. |
| <https://github.com/jhy/jsoup/issues/1543> |
| |
| * Bugfix: [Fuzz] fixed a potentially slow HTML parse when tags are nested extremely deep (e.g. 88K depth), by limiting |
| the formatting tag search depth to 256. In practice, it's generally between 4 - 8. |
| <https://github.com/jhy/jsoup/issues/1544> |
| |
| * Bugfix: [Fuzz] when parsing an unterminated RCDATA token (e.g. a <title> tag), could throw an IO Exception "No |
| buffer left to unconsume" when trying to rewind the buffer. |
| <https://github.com/jhy/jsoup/issues/1542> |
| |
| *** Release 1.13.1 [2020-Feb-29] |
| * Improvement: added Element#closest(selector), which walks up the tree to find the nearest element matching the |
| selector. |
| <https://github.com/jhy/jsoup/issues/1326> |
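For example, finding the card that encloses a given element (a sketch assuming a recent jsoup release; the class name and markup are hypothetical):

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class ClosestExample {
    /** Returns the id of the nearest div.card enclosing the first <span>, or null if there is none. */
    static String cardIdForSpan(String html) {
        Document doc = Jsoup.parse(html);
        Element span = doc.selectFirst("span");
        if (span == null) return null;
        Element card = span.closest("div.card"); // tests the span itself, then each ancestor
        return card == null ? null : card.id();
    }
}
```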
| |
| * Improvement: memory optimizations, reducing the retained size of a Document by ~ 39%, and allocations by ~ 9%: |
| 1. Attributes holder in Elements is only created if the element has attributes |
| 2. Only track the baseUri in an element when it is set via DOM to a new value for a given tree |
| 3. After parsing, do not retain the input character reader (and associated buffers) in the Document#parser |
| |
| * Improvement: substantial parse speed improvements vs 1.12.x (bringing back to par with previous releases). |
| <https://github.com/jhy/jsoup/issues/1327> |
| |
| * Improvement: when pretty-printing, comments in inline tags are not pushed to a newline |
| |
| * Improvement: added Attributes#hasDeclaredValueForKey(key) and Attributes#hasDeclaredValueForKeyIgnoreCase(key), to |
| check whether an attribute was declared with a value (e.g. value="") rather than as a bare boolean attribute (e.g. |
| disabled). Useful in place of the deprecated and removed BooleanAttribute class and instanceof test. |
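A sketch of distinguishing a bare boolean attribute from one declared with a value, assuming a recent jsoup release (the class name is hypothetical):

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Attributes;

class DeclaredValues {
    /** Parses an input with one bare attribute (disabled) and one empty-valued attribute (value). */
    static Attributes inputAttributes() {
        return Jsoup.parse("<input disabled value=''>").selectFirst("input").attributes();
    }
}
```

Both attributes are present according to hasKey, but only value reports a declared value.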
| |
| * Improvement: removed old methods and classes that were marked deprecated in previous releases. |
| |
| * Improvement: added Element#select(Evaluator) and Element#selectFirst(Evaluator), to allow re-use of a parsed CSS |
| selector if using the same evaluator many times. |
| <https://github.com/jhy/jsoup/issues/1319> |
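A sketch of parsing a query once with QueryParser and reusing the resulting Evaluator, assuming a recent jsoup release (the class name is hypothetical):

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Evaluator;
import org.jsoup.select.QueryParser;

class ReusedEvaluator {
    // The CSS query is parsed once, rather than on every select call.
    private static final Evaluator LINKS = QueryParser.parse("a[href]");

    static int countLinks(String html) {
        Document doc = Jsoup.parse(html);
        return doc.select(LINKS).size();
    }
}
```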
| |
| * Improvement: added Elements#forms(), Elements#textNodes(), Elements#dataNodes(), and Elements#comments(), as a |
| convenient way to get access to these node types directly from an element selection. |
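For example, collecting the text nodes and comments directly under each selected <p> (a sketch assuming a recent jsoup release; the class name and markup are hypothetical). Only direct children are returned, so the text inside <b> is not included:

```java
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Comment;
import org.jsoup.nodes.TextNode;

class SelectionNodes {
    static final String HTML = "<p>One<b>bold</b>Two<!-- note --></p><p>Three</p>";

    /** Direct child text nodes of every selected <p>, in document order. */
    static List<TextNode> paragraphText() {
        return Jsoup.parse(HTML).select("p").textNodes();
    }

    /** Direct child comments of every selected <p>. */
    static List<Comment> paragraphComments() {
        return Jsoup.parse(HTML).select("p").comments();
    }
}
```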
| |
| * Improvement: preserve whitespace before html and head tag, if pretty-printing is off. |
| |
| * Bugfix: in a <select> tag, a second <optgroup> would not automatically close an earlier open <optgroup> |
| <https://github.com/jhy/jsoup/issues/1313> |
| |
| * Bugfix: in CharacterReader when parsing an input stream, could throw a Mark Invalid exception if the reader was |
| marked, a bufferUp occurred, and then the reader was rewound. |
| <https://github.com/jhy/jsoup/issues/1324> |
| |
| * Bugfix: empty tags and form tags did not have their attributes normalized (lower-cased by default) |
| <https://github.com/jhy/jsoup/pull/1323> |
| |
| * Bugfix: when case preservation was enabled, the HTML pretty-print formatter didn't indent capitalized tags correctly. |
| |
| * Bugfix: ensure that script and style contents are parsed into DataNodes, not TextNodes, when in case-sensitive |
| parse mode. |
| |
| *** Release 1.12.2 [2020-Feb-08] |
| * Improvement: the :has() selector now supports relative selectors. For example, the query |
| "div:has(> a)" will select all "div" elements that have at least one direct child "a" element. |
| <https://github.com/jhy/jsoup/pull/1214> |
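A sketch of the relative :has() form, assuming a recent jsoup release (the class name is hypothetical). Only the first div matches, because the second one's <a> is a grandchild rather than a direct child:

```java
import org.jsoup.Jsoup;
import org.jsoup.select.Elements;

class HasChildSelector {
    /** Selects divs that have at least one direct <a> child. */
    static Elements divsWithDirectLink(String html) {
        return Jsoup.parse(html).select("div:has(> a)");
    }
}
```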
| |
| * Improvement: added Element chaining methods for various overridden methods on Node. |
| <https://github.com/jhy/jsoup/issues/1193> |
| |
| * Improvement: ensure HTTP keepalives work when fetching content via body() and bodyAsBytes(). |
| <https://github.com/jhy/jsoup/issues/1232> |
| |
| * Improvement: set the default max body size in Jsoup.Connection to 2MB (up from 1MB) so fewer people get trimmed |
| content if they have not set it, but still in sensible bounds. Also updated the default user-agent to improve |
| default compatibility. |
| |
| * Improvement: dramatic speed improvement when bulk inserting child nodes into an element (wrapping contents). |
| <https://github.com/jhy/jsoup/issues/1281> |
| |
| * Improvement: added Element#childrenSize() as a convenience to get the size of an element's element children. |
| <https://github.com/jhy/jsoup/pull/1291> |
| |
| * Improvement: in W3CDom.asString, allow the output mode to be specified as HTML or as XML. By default, it checks |
| the content and selects the mode automatically. |
| |
| * Improvement: added a Document#documentType() method, to get a doc's doctype. |
| |
| * Improvement: added #name(), #publicId(), and #systemId() methods to DocumentType, to fetch those fields. |
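For example, reading the doctype fields of a parsed document (a sketch assuming a recent jsoup release; the class name and the pipe-joined format are hypothetical):

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.DocumentType;

class DoctypeInfo {
    /** Returns "name|publicId|systemId" for the parsed doctype, or null if the input has none. */
    static String describe(String html) {
        DocumentType type = Jsoup.parse(html).documentType();
        if (type == null) return null;
        return type.name() + "|" + type.publicId() + "|" + type.systemId();
    }
}
```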
| |
| * Improvement: in W3CDom conversions from jsoup documents, retain the DocumentType, and be able to serialize it. |
| <https://github.com/jhy/jsoup/issues/1183> |
| |
| * Bugfix: on pages fetched by Jsoup.Connection, a "Mark Invalid" exception might be incorrectly thrown, or the page may |
| miss some data. This occurred on larger pages when the file transfer was chunked, and an invalid HTML entity |
| happened to cross a chunk boundary. |
| <https://github.com/jhy/jsoup/issues/1218> |
| |
| * Bugfix: if an element has duplicate attributes, retain the first rather than the last attribute with the same name. |
| Names are matched case-insensitively in HTML and case-sensitively in XML. |
| <https://github.com/jhy/jsoup/issues/1219> |
| |
| * Bugfix: don't submit input type=button form elements. |
| <https://github.com/jhy/jsoup/issues/1231> |
| |
| * Bugfix: handle error position reporting correctly and don't blow up in some edge cases. |
| <https://github.com/jhy/jsoup/issues/1251> |
| <https://github.com/jhy/jsoup/pull/1253> |
| |
| * Bugfix: handle the ^= (starts with) selector correctly when the prefix starts with a space. |
| <https://github.com/jhy/jsoup/pull/1280> |
| |
| * Bugfix: don't strip out zero-width-joiners (or zero-width-non-joiners) when normalizing text. That breaks combined |
| emoji (and other text semantics). 🤦‍♂️ |
| <https://github.com/jhy/jsoup/issues/1269> |
| |
| * Bugfix: Evaluator.TagEndsWith (namespaced elements) and Tag disagreed in case-sensitivity. Now correctly matches |
| case-insensitively. |
| <https://github.com/jhy/jsoup/issues/1257> |
| |
| * Bugfix: Don't throw an exception if a selector ends in a space, just trim it. |
| <https://github.com/jhy/jsoup/issues/1274> |
| |
| * Bugfix: HTML parser adds redundant text when parsing self-closing textarea. |
| <https://github.com/jhy/jsoup/issues/1220> |
| |
| * Bugfix: Don't add spurious whitespace or newlines to HTML or text for inline tags. |
| <https://github.com/jhy/jsoup/issues/1305> |
| <https://github.com/jhy/jsoup/issues/731> |
| |
| * Bugfix: TextNode.outerHtml() wouldn't normalize correctly without a parent. |
| <https://github.com/jhy/jsoup/issues/1309> |
| |
| * Bugfix: Removed binary input detection as it was causing too many false positives. |
| <https://github.com/jhy/jsoup/issues/1250> |
| |
| * Bugfix: when cloning a TextNode, if .attributes() was hit before the clone() method, the text value would only be a |
| shallow clone. |
| <https://github.com/jhy/jsoup/issues/1176> |
| |
| * Various code hygiene updates. |
| |
| *** Release 1.12.1 [2019-May-12] |
| * Change: removed deprecated method to disable TLS cert checking Connection.validateTLSCertificates(). |
| |
| * Change: some internal methods have been rearranged; if you extended any of the Jsoup internals you may need to make |
| updates. |
| |
| * Improvement: documents now remember their parser, so when later manipulating them, the correct HTML or XML tree |
| builder is reused, as are the parser settings like case preservation. |
| <https://github.com/jhy/jsoup/issues/769> |
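A sketch of the effect, assuming a recent jsoup release (the class name is hypothetical). Because the document remembers its XML parser, a later html(String) call keeps mixed-case tag names:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

class RememberedParser {
    /** Replaces the root's content in an XML-parsed document; returns the new child's tag name. */
    static String appendedTagName() {
        Document doc = Jsoup.parse("<Root></Root>", "", Parser.xmlParser());
        Element root = doc.child(0);
        root.html("<Item>1</Item>"); // parsed with the document's XML parser, which preserves case
        return root.child(0).tagName();
    }
}
```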
| |
| * Improvement: Jsoup now detects the character set of the input if specified in an XML Declaration, when using the |
| HTML parser. Previously that only happened when the XML parser was specified. |
| <https://github.com/jhy/jsoup/issues/1009> |
| |
| * Improvement: if the document's input character set does not support encoding, flip it to one that does. |
| <https://github.com/jhy/jsoup/issues/1007> |
| |
| * Improvement: if a start tag is missing a > and a new tag is seen with a <, treat that as a new tag. (This differs |
| from the HTML5 spec, which would make an attribute with a name beginning with <, but in practice this impacts too |
| many pages.) |
| <https://github.com/jhy/jsoup/issues/797> |
| |
| * Improvement: performance tweaks when parsing start tags, data, tables. |
| |
| * Improvement: added Element.nextElementSiblings() and Element.previousElementSiblings() |
| <https://github.com/jhy/jsoup/pull/1054> |
| |
| * Improvement: treat center tags as block tags. |
| <https://github.com/jhy/jsoup/pull/1113> |
| |
| * Improvement: allow forms to be submitted with Content-Type=multipart/form-data without requiring a file upload; |
| automatically set the mime boundary. |
| <https://github.com/jhy/jsoup/pull/1058> |
| |
| * Improvement: Jsoup will now detect if an input file or URL is binary, and will refuse to attempt to parse it, with |
| an IO exception. This prevents runaway processing time and wasted effort creating meaningless parsed DOM trees. |
| <https://github.com/jhy/jsoup/issues/1192> |
| |
| * Bugfix: when using the tag case preserving parsing settings, certain HTML tree building rules were not followed |
| for upper case tags. |
| <https://github.com/jhy/jsoup/issues/1149> |
| |
| * Bugfix: when converting a Jsoup document to a W3C DOM, if an element is namespaced but not in a defined namespace, |
| set it to the global namespace. |
| <https://github.com/jhy/jsoup/issues/848> |
| |
| * Bugfix: attributes created with the Attribute constructor with just spaces for names would incorrectly pass |
| validation. |
| <https://github.com/jhy/jsoup/issues/1159> |
| |
| * Bugfix: some pseudo XML Declarations were incorrectly handled when using the XML Parser, leading to an IOOB |
| exception when parsing. |
| <https://github.com/jhy/jsoup/issues/1139> |
| |
| * Bugfix: when parsing URL parameter names in an attribute that is not correctly HTML encoded, and near the end of the |
| current buffer, those parameters may be incorrectly dropped. (Improved CharacterReader mark/reset support.) |
| <https://github.com/jhy/jsoup/pull/1154> |
| |
| * Bugfix: boolean attribute values would be returned as null, vs an empty string, when accessed via the |
| Attribute#getValue() method. |
| <https://github.com/jhy/jsoup/issues/1065> |
| |
| * Bugfix: orphan Attribute objects (i.e. created outside of a parse or an Element) would throw an NPE on |
| Attribute#setValue(val) |
| <https://github.com/jhy/jsoup/issues/1107> |
| |
| * Bugfix: Element.shallowClone() was not making a clone of its attributes. |
| <https://github.com/jhy/jsoup/issues/1201> |
| |
| * Bugfix: fixed an ArrayIndexOutOfBoundsException in HttpConnection.looksLikeUtf8 when testing small strings in |
| specific ranges. |
| <https://github.com/jhy/jsoup/issues/1172> |
| |
| * Updated jetty-server (which is used for integration tests) to latest 9.2 series (9.2.28). |
| |
| *** Release 1.11.3 [2018-Apr-15] |
| * Improvement: CDATA sections are now treated as whitespace preserving (regardless of the containing element), and are |
| round-tripped into output HTML. |
| <https://github.com/jhy/jsoup/issues/406> |
| <https://github.com/jhy/jsoup/issues/965> |
| |
| * Improvement: added support for Deflate encoding. |
| <https://github.com/jhy/jsoup/pull/982> |
| |
| * Improvement: when parsing <pre> tags, skip the first newline if present. |
| <https://github.com/jhy/jsoup/issues/825> |
| |
| * Improvement: support nested quotes for attribute selection queries. |
| <https://github.com/jhy/jsoup/pull/988> |
| |
| * Improvement: character references from Windows-1252 that are not valid Unicode are mapped to the appropriate |
| Unicode replacement. |
| <https://github.com/jhy/jsoup/pull/1046> |
| |
| * Improvement: accept a custom SSL socket factory in Jsoup.Connection. |
| <https://github.com/jhy/jsoup/pull/1038> |
| |
| * Bugfix: "Mark has been invalidated" exception was thrown when parsing some URLs on Android <= 6. |
| <https://github.com/jhy/jsoup/issues/990> |
| |
| * Bugfix: The Element.text() for <div>One</div>Two was "OneTwo", not "One Two". |
| <https://github.com/jhy/jsoup/issues/812> |
| |
| * Bugfix: boolean attributes with empty string values were not collapsing in HTML output. |
| <https://github.com/jhy/jsoup/issues/985> |
| |
| * Bugfix: when using the XML Parser set to lowercase normalize tags, uppercase closing tags were not correctly |
| handled. |
| <https://github.com/jhy/jsoup/issues/998> |
| |
| * Bugfix: when parsing from a URL, an end tag could be read incorrectly if it started on a buffer boundary. |
| <https://github.com/jhy/jsoup/issues/995> |
| |
| * Bugfix: when parsing from a URL, if the remote server failed to complete its write (i.e. it writes less than the |
| Content Length header promised on a gzipped stream), the parse method would incorrectly throw an unchecked |
| exception. It now throws the declared IOException. |
| <https://github.com/jhy/jsoup/issues/980> |
| |
| * Bugfix: leaf nodes (such as text nodes) were throwing an unsupported operation exception on childNodes(), instead |
| of just returning an empty list. |
| <https://github.com/jhy/jsoup/issues/1032> |
| |
| * Bugfix: documents with a leading UTF-8 BOM did not have that BOM consumed, so it acted as a zero width no-break |
| space, which could impact the parse tree. |
| <https://github.com/jhy/jsoup/issues/1003> |
| |
| * Bugfix: when parsing an invalid XML declaration, the parse would fail. |
| <https://github.com/jhy/jsoup/issues/1015> |
| |
| *** Release 1.11.2 [2017-Nov-19] |
| * Improvement: added a new pseudo selector :matchText, which allows text nodes to match as if they were elements. |
| This enables finding text that is only marked by a "br" tag, for example. |
| <https://github.com/jhy/jsoup/issues/550> |
| |
| * Change: marked Connection.validateTLSCertificates() as deprecated. |
| |
| * Improvement: normalize invisible characters (like soft-hyphens) in Element.text(). |
| <https://github.com/jhy/jsoup/issues/978> |
| |
| * Improvement: added Element.wholeText(), to easily get the un-normalized text value of an element and its children. |
| <https://github.com/jhy/jsoup/pull/564> |
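A sketch comparing the two methods, assuming a recent jsoup release (the class name is hypothetical). text() normalizes whitespace, while wholeText() returns it as written:

```java
import org.jsoup.Jsoup;

class TextVsWholeText {
    /** Normalized text of the first <p>. */
    static String text(String html) {
        return Jsoup.parse(html).selectFirst("p").text();
    }

    /** Un-normalized text of the first <p> and its descendants. */
    static String wholeText(String html) {
        return Jsoup.parse(html).selectFirst("p").wholeText();
    }
}
```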
| |
| * Bugfix: in a deep DOM stack, a StackOverflowError could occur when generating implied end tags. |
| <https://github.com/jhy/jsoup/issues/966> |
| |
| * Bugfix: when parsing attribute values that happened to cross a buffer boundary, a character was dropped. |
| <https://github.com/jhy/jsoup/issues/967> |
| |
| * Bugfix: fixed an issue that prevented using infinite timeouts in Jsoup.Connection. |
| <https://github.com/jhy/jsoup/issues/968> |
| |
| * Bugfix: whitespace preserving tags were not honoured when nested more than two levels deep. |
| <https://github.com/jhy/jsoup/issues/722> |
| |
| * Bugfix: an unterminated comment token at the end of the HTML input would cause an out of bounds exception. |
| <https://github.com/jhy/jsoup/issues/972> |
| |
| * Bugfix: an NPE in the Cleaner which would occur if an <a href> attribute value was missing. |
| <https://github.com/jhy/jsoup/issues/973> |
| |
| * Bugfix: when serializing the same document in multiple threads on Android, with a character set that is not ASCII |
| or UTF-8, an encoding exception could occur. |
| <https://github.com/jhy/jsoup/issues/970> |
| |
| * Bugfix: removing a form value from the DOM would not remove it from FormData. |
| <https://github.com/jhy/jsoup/pull/969> |
| |
| * Bugfix: in the W3CDom transformer, siblings were incorrectly inheriting namespaces defined on previous siblings. |
| <https://github.com/jhy/jsoup/issues/977> |
| |
| *** Release 1.11.1 [2017-Nov-06] |
| * Updated language level to Java 7 from Java 5. To maintain Android support (minimum API level 8), try-with-resources |
| are not used. |
| <https://github.com/jhy/jsoup/issues/899> |
| |
| * When loading content from a URL or a file, the content is now parsed as it streams in from the network or disk, |
| rather than being fully buffered before parsing. This substantially reduces memory consumption & large garbage |
| objects when loading large files. Note that this change means that a response, once parsed, may not be parsed |
| again from the same response object unless you call response.bufferUp() first, which will buffer the full response |
| into memory. |
| <https://github.com/jhy/jsoup/issues/904> |
| |
| * Added Connection.Response.bodyStream(), a method to get the response body as an input stream. This is useful for |
| saving a large response straight to a file, without buffering fully into memory first. |
| |
| * Performance improvements in text and HTML generation (through less GC). |
| |
| * Reduced memory consumption of text, scripts, and comments in the DOM by 40%, by refactoring the node |
| hierarchy to not track childnodes or attributes by default for leaf nodes. For the average document, that's about a |
| 30% memory reduction. |
| <https://github.com/jhy/jsoup/issues/911> |
| |
| * Reduced memory consumption of Elements by refactoring their Attributes to be a simple pair of arrays, vs a |
| LinkedHashSet. |
| <https://github.com/jhy/jsoup/issues/911> |
| |
| * Added support for Element.selectFirst(query), to efficiently find the first matching element. |
| |
| * Added Element.appendTo(parent) to simplify slinging elements about. |
| <https://github.com/jhy/jsoup/pull/662> |
| |
| * Added support for multiple headers with the same name in Jsoup.Connect |
| |
| * Added Element.shallowClone() and Node.shallowClone(), to allow cloning nodes without getting all their children. |
| <https://github.com/jhy/jsoup/issues/900> |
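For example, copying an element's tag and attributes without its contents (a sketch assuming a recent jsoup release; the class name is hypothetical):

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

class ShallowCloneExample {
    /** Returns a parentless copy of the card div, with its attributes but none of its children. */
    static Element cloneOfCard() {
        Element card = Jsoup.parse("<div class=card id=c1><p>Body</p></div>").selectFirst("div.card");
        return card.shallowClone();
    }
}
```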
| |
| * Updated Element.text() and the :contains(text) selector to treat the &nbsp; (non-breaking space) character as a space. |
| |
| * Updated Jsoup.connect().timeout() to implement a total connect + combined read timeout. Previously it specified |
| connect and buffer read times only, so to implement a combined total timeout, you had to have another thread send |
| an interrupt. |
| |
| * Improved performance of Node.addChildren (was quadratic) |
| <https://github.com/jhy/jsoup/pull/930> |
| |
| * Added missing support for template tags in tables |
| <https://github.com/jhy/jsoup/pull/901> |
| |
| * In Jsoup.connect file uploads, added the ability to set the uploaded files' mimetype. |
| <https://github.com/jhy/jsoup/issues/936> |
| |
| * Improved Node traversal, including less object creation, and partial and filtering traversor support. |
| <https://github.com/jhy/jsoup/pull/849> |
| |
| * Bugfix: if a document was re-decoded after character set detection, the HTML parser was not reset correctly, |
| which could lead to an incorrect DOM. |
| <https://github.com/jhy/jsoup/issues/877> |
| |
| * Bugfix: attributes with the same name but different case would be incorrectly treated as different attributes. |
| <https://github.com/jhy/jsoup/pull/903> |
| |
| * Bugfix: self-closing tags for known empty elements were incorrectly treated as errors. |
| <https://github.com/jhy/jsoup/issues/868> |
| |
| * Bugfix: fixed an issue where a self-closing title, noframes, or style tag would cause the rest of the page to be |
| incorrectly parsed as data or text. |
| <https://github.com/jhy/jsoup/issues/906> |
| |
| * Bugfix: fixed an issue with unknown mixed-case tags |
| <https://github.com/jhy/jsoup/pull/942> |
| |
| * Bugfix: fixed an issue where the entity resources were left open after startup, causing a warning. |
| <https://github.com/jhy/jsoup/pull/928> |
| |
| * Bugfix: fixed an issue where Element.getElementsByIndexLessThan(index) would incorrectly provide the root element |
| <https://github.com/jhy/jsoup/pull/918> |
| |
| * Improved parse time for pages with exceptionally deeply nested tags. |
| <https://github.com/jhy/jsoup/issues/955> |
| |
| * Improvement / workaround: modified the Entities implementation to load its data from a .class vs from a jar resource. |
| Faster, and safer on Android. |
| <https://github.com/jhy/jsoup/issues/959> |
| |
| *** Release 1.10.3 [2017-Jun-11] |
| * Added Elements.eachText() and Elements.eachAttr(name), which return a list of each Element's text or attribute |
| values, respectively. This makes it simpler to, for example, get a list of each URL on a page: |
| List<String> urls = doc.select("a").eachAttr("abs:href"); |
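A runnable version of that one-liner, assuming a recent jsoup release (the class name, markup, and base URI are hypothetical). Elements without the attribute are skipped by eachAttr:

```java
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class EachValues {
    static final String HTML = "<a href='/one'>First</a><a href='two'>Second</a><a>No link</a>";

    /** Absolute URLs of every link that has an href, resolved against the base URI. */
    static List<String> absoluteLinks() {
        Document doc = Jsoup.parse(HTML, "https://example.com/docs/");
        return doc.select("a").eachAttr("abs:href");
    }

    /** Text of every link, in document order. */
    static List<String> linkTexts() {
        return Jsoup.parse(HTML).select("a").eachText();
    }
}
```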
| |
| * Improved selector validation for :contains(...) with unbalanced quotes. |
| <https://github.com/jhy/jsoup/issues/803> |
| |
| * Improved the speed of index based CSS selectors and other methods that use elementSiblingIndex, by a factor of 34x. |
| <https://github.com/jhy/jsoup/pull/862> |
| |
| * Added Node.clearAttributes(), to simplify removing of all attributes of a Node / Element. |
| <https://github.com/jhy/jsoup/issues/829> |
| |
| * Bugfix: if an attribute name started or ended with a control character, the parse would fail with a validation |
| exception. |
| <https://github.com/jhy/jsoup/issues/793> |
| |
| * Bugfix: Element.hasClass() and the ".classname" selector would not find the class attribute case-insensitively. |
| <https://github.com/jhy/jsoup/issues/814> |
| |
| * Bugfix: In Jsoup.Connection, if a redirect contained a query string with %xx escapes, they would be double escaped |
| before the redirect was followed, leading to fetching an incorrect location. |
| |
| * Bugfix: In Jsoup.Connection, if a request body was set and the connection was redirected, the body would incorrectly |
| still be sent. |
| <https://github.com/jhy/jsoup/pull/881> |
| |
| * Bugfix: In DataUtil when detecting the character set from meta data, and there are two Content-Types defined, use |
| the one that defines a character set. |
| <https://github.com/jhy/jsoup/pull/835> |
| |
| * Bugfix: when parsing unknown tags in case-sensitive HTML mode, end tags would not close scope correctly. |
| <https://github.com/jhy/jsoup/issues/819> |
| |
| * In Jsoup.Connection, ensure there is no Content-Type set when being redirected to a GET. |
| <https://github.com/jhy/jsoup/pull/895> |
| |
| * Bugfix: in certain locales (Turkey specifically), lowercasing and case insensitivity could fail for specific items. |
| <https://github.com/jhy/jsoup/pull/820> |
| |
| * Bugfix: after an element was cloned, changes to its child list were not notifying the element correctly. |
| <https://github.com/jhy/jsoup/issues/951> |
| |
| *** Release 1.10.2 [2017-Jan-02] |
| * Improved startup time, particularly on Android, by reducing garbage generation and CPU execution time when loading |
| the HTML entity files. About 1.72x faster in this area. |
| |
| * Added Element.is(query) to check if an element matches this CSS query. |
| |
| * Added new methods to Elements: next(query), nextAll(query), prev(query), prevAll(query) to select next and previous |
| element siblings from a current selection, with optional selectors. |
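A sketch of these sibling methods together with Element.is(query), assuming a recent jsoup release (the class name and markup are hypothetical):

```java
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

class SiblingQueries {
    static final String HTML = "<ul><li>a</li><li class=current>b</li><li>c</li><li class=skip>d</li></ul>";

    /** Whether the li at the given index matches the .current class selector. */
    static boolean isCurrent(int index) {
        Element li = Jsoup.parse(HTML).select("li").get(index);
        return li.is(".current");
    }

    /** Texts of all following siblings of li.current that match the query. */
    static List<String> nextItems(String query) {
        Elements current = Jsoup.parse(HTML).select("li.current");
        return current.nextAll(query).eachText();
    }
}
```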
| |
| * Added Node.root() to get the topmost ancestor of a Node. |
| |
| * Added the new selector :containsData(), to find elements that hold data, like script and style tags. |
| |
| * Changed Jsoup.isValid(bodyHtml) to validate that the input contains only body HTML that is safe according to the |
| safelist, and does not include HTML errors. And in the Jsoup.Cleaner.isValid(Document) method, make sure the doc |
| only includes body HTML. |
| <https://github.com/jhy/jsoup/issues/245> |
| <https://github.com/jhy/jsoup/issues/632> |
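A sketch of validating untrusted input, assuming jsoup 1.14.1 or later, where the safelist class is named Safelist (earlier releases call it Whitelist); the class name is hypothetical:

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Safelist;

class ValidateInput {
    /** True only if the body HTML is already acceptable to the basic safelist. */
    static boolean isSafe(String bodyHtml) {
        return Jsoup.isValid(bodyHtml, Safelist.basic());
    }
}
```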
| |
| * In Safelists, validate that a removed protocol exists before removing said protocol. |
| |
| * Allow the Jsoup.Connect thread to be interrupted when reading the input stream; helps when reading from a long stream |
| of data that never hits a read timeout. |
| <https://github.com/jhy/jsoup/pull/712> |
| |
| * Jsoup.Connect now uses a desktop user agent by default. Many developers were getting caught by not specifying the |
| user agent, and sending the default 'Java'. That causes many servers to return different content than what they would |
| to a desktop browser, and what the developer was expecting. |
| |
| * Increased the default connect/read timeout in Jsoup.Connect to 30 seconds. |
| |
| * Jsoup.Connect now detects if a header value is actually in UTF-8, rather than the ISO-8859-1 specified by HTTP, and |
| converts the header value appropriately. This improves compatibility with servers that are configured incorrectly. |
| |
| * Bugfix: in Jsoup.Connect, URLs containing non-URL-safe characters were not correctly encoded to be URL-safe. |
| <https://github.com/jhy/jsoup/issues/706> |
| |
| * Bugfix: a "SYSTEM" flag in doctype tags would be incorrectly removed. |
| <https://github.com/jhy/jsoup/issues/408> |
| |
| * Bugfix: removing attributes from an Element with removeAttr() would cause a ConcurrentModificationException. |
| |
| * Bugfix: the contents of Comment nodes were not returned by Element.data() |
| |
| * Bugfix: if the source was checked out on Windows with git autocrlf=true, Entities.load would fail because of the \r char. |
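| |
| The header-charset change in this release can be sketched with the JDK alone, assuming the HTTP client has already |
| decoded each header byte as one ISO-8859-1 character. The hypothetical HeaderCharsetFix helper recovers the raw |
| bytes and keeps a UTF-8 reading only if those bytes are valid UTF-8; jsoup's actual heuristic may differ. |

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: re-decodes a header value as UTF-8 when its bytes form valid UTF-8.
class HeaderCharsetFix {

    static String fixHeaderEncoding(String value) {
        boolean hasHighBytes = false;
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c > 0xFF) return value;   // not an ISO-8859-1 decoded string; leave untouched
            if (c >= 0x80) hasHighBytes = true;
        }
        if (!hasHighBytes) return value;  // pure ASCII reads the same in both charsets

        // Each ISO-8859-1 char maps back to exactly one original byte.
        byte[] raw = value.getBytes(StandardCharsets.ISO_8859_1);
        CharsetDecoder utf8 = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return utf8.decode(ByteBuffer.wrap(raw)).toString();
        } catch (CharacterCodingException e) {
            return value;                 // not valid UTF-8: keep the ISO-8859-1 reading
        }
    }
}
```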
| |
| *** Release 1.10.1 [2016-Oct-23] |
| * New feature: added the option to preserve case for tags and/or attributes, with ParseSettings. By default, the HTML |
| parser will continue to normalize tag names and attribute names to lower case, and the XML parser will now preserve |
| case, according to the relevant spec. The CSS selectors for tags and attributes remain case insensitive, per the CSS |
| spec. |
| |
| * Improved support for extended HTML entities, including supplemental characters and multiple character references. |
| Also reduced memory consumption of the entity tables. |
| <https://github.com/jhy/jsoup/issues/602> |
| <https://github.com/jhy/jsoup/issues/603> |
| |
| * Added support for *|E wildcard namespace selectors. |
| <https://github.com/jhy/jsoup/pull/724> |
| |
| * Added support for setting multiple connection headers at once with Connection.headers(Map) |
| <https://github.com/jhy/jsoup/pull/725> |
| |
| * Added support for setting/overriding the response character set in Connection.Response, for cases where the charset |
| is not defined by the server, or is defined incorrectly. |
| <https://github.com/jhy/jsoup/issues/743> |
| |
| * Improved performance of class selectors by reducing memory allocation and garbage collection. |
| <https://github.com/jhy/jsoup/pull/753> |
| |
| * Improved performance of HTML output by reducing the creation of temporary attribute list iterators. |
| <https://github.com/jhy/jsoup/pull/755> |
| |
| * Fixed an issue when converting to the W3CDom XML, where valid (but ugly) HTML attribute names containing characters |
| like '"' could not be converted into valid XML attribute names. These attribute names are now normalized if possible, |
| or not added to the XML DOM. |
| <https://github.com/jhy/jsoup/issues/721> |
| |
| * Fixed an OOB exception when loading an empty-body URL and parsing with the XML parser. |
| <https://github.com/jhy/jsoup/issues/727> |
| |
| * Fixed an issue where attribute names starting with a slash would be parsed incorrectly. |
| <https://github.com/jhy/jsoup/pull/748> |
| |
| * Stopped reusing charset encoders from OutputSettings, to make output thread-safe. |
| <https://github.com/jhy/jsoup/issues/740> |
| |
| * Fixed an issue in connections with a requestBody where a custom content-type header could be ignored. |
| <https://github.com/jhy/jsoup/issues/756> |
| |
| *** Release 1.9.2 [2016-May-17] |
| * Fixed an issue where tag names that contained non-ascii characters but started with an ascii character |
| would cause the parser to get stuck in an infinite loop. |
| <https://github.com/jhy/jsoup/issues/704> |
| |
| * In XML documents, detect the charset from the XML prolog - <?xml encoding="UTF-8"?> |
| <https://github.com/jhy/jsoup/issues/701> |
| |
| * Fixed an issue where created XML documents would have an incorrect prolog. |
| <https://github.com/jhy/jsoup/issues/652> |
| |
| * Fixed an issue where you could not use an attribute selector to find values containing unbalanced braces or |
| parentheses. |
| <https://github.com/jhy/jsoup/issues/611> |
| |
| * Fixed an issue where namespaced tags (like <fb:comment>) would cause Element.cssSelector() to fail. |
| <https://github.com/jhy/jsoup/pull/677> |
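| |
| A rough sketch of reading the charset from an XML prolog, as described in this release. XmlPrologCharset is a |
| hypothetical name, and the regular expression is a simplification that assumes the declaration appears at the start |
| of the text; it is not necessarily how jsoup does it. |

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts the encoding pseudo-attribute from a leading XML declaration.
class XmlPrologCharset {

    // Matches "<?xml ... encoding='X'" at the start of the input (single or double quotes).
    // The captured name follows the XML EncName production: a letter, then [A-Za-z0-9._-].
    private static final Pattern ENCODING = Pattern.compile(
            "^\\s*<\\?xml\\s[^>]*?\\bencoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    static Optional<String> detect(String head) {
        Matcher m = ENCODING.matcher(head);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```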
| |
| *** Release 1.9.1 [2016-Apr-16] |
| * Added support for HTTP and SOCKS request proxies, specifiable per connection. |
| <https://github.com/jhy/jsoup/pull/570> |
| |
| * Added support for sending plain HTTP request bodies in POST and PUT requests, with Connection.requestBody(String). |
| |
| * Added support in Jsoup.Connect for HEAD, OPTIONS, TRACE. |
| <https://github.com/jhy/jsoup/issues/613> |
| |
| * Added support for HTTP 307 Temporary Redirect (replays posts, if applicable). |
| <https://github.com/jhy/jsoup/pull/666> |
| |
| * Performance improvements when parsing HTML, particularly for Android Dalvik. |
| |
| * Added support for writing HTML into Appendable objects (like OutputStreamWriter), to enable stream serialization. |
| <https://github.com/jhy/jsoup/pull/470/> |
| |
| * Added support for XML namespaces when converting jsoup documents to W3C documents. |
| <https://github.com/jhy/jsoup/pull/672> |
| |
| * Added support for UTF-16 and UTF-32 character set detection from byte-order-marks (BOM). |
| <https://github.com/jhy/jsoup/issues/695> |
| |
| * Added support for tags with non-ascii (unicode) letters. |
| <https://github.com/jhy/jsoup/issues/667> |
| |
| * Added Connection.data(key) to retrieve a data KeyVal by its key. Useful to update form data before submission. |
| |
| * Fixed an issue in the Parent selector where it would not match against the root element it was applied to. |
| <https://github.com/jhy/jsoup/pull/619> |
| |
| * Fixed an issue where elements.select(query) would not return every matching element if they had the same content. |
| <https://github.com/jhy/jsoup/issues/614> |
| |
| * Added not-null validators to Element.appendText() and Element.prependText() |
| <https://github.com/jhy/jsoup/issues/690> |
| |
| * Fixed an issue when moving nodes using Element.insert(index, children) where the sibling index would be set |
| incorrectly, leading to the original nodes being lost. |
| <https://github.com/jhy/jsoup/issues/689> |
| |
| * Reverted Node.equals() and Node.hashCode() back to identity (object) comparisons, as deep content inspection |
| had negative performance impacts and hashkey stability problems. Functionality replaced with Node.hasSameContent(). |
| <https://github.com/jhy/jsoup/issues/688> |
| |
| * In Jsoup.Connect, if the same header key is seen multiple times, combine their values with a comma per the HTTP RFC, |
| instead of keeping just one value. Also fixes an issue where header values could be out of order. |
| <https://github.com/jhy/jsoup/issues/618> |
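| |
| The byte-order-mark detection added in this release can be sketched as a prefix check on the first bytes of the |
| input. BomSniffer is a hypothetical name; note that the four-byte UTF-32 marks must be tested before the two-byte |
| UTF-16 marks, because the UTF-32LE mark begins with the UTF-16LE one. |

```java
// Hypothetical helper: maps a leading byte-order mark to a charset name, or null if none.
class BomSniffer {

    static String detect(byte[] b) {
        // Longest marks first: FF FE 00 00 (UTF-32LE) also starts with FF FE (UTF-16LE).
        if (startsWith(b, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32BE";
        if (startsWith(b, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32LE";
        if (startsWith(b, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(b, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(b, 0xFF, 0xFE)) return "UTF-16LE";
        return null;
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            // Mask to compare the unsigned byte value against the 0x00-0xFF prefix.
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }
}
```

| A UTF-16LE document whose first character after the BOM is U+0000 would be misread as UTF-32LE by this ordering; |
| such input is rare in practice. |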
| |
| *** Release 1.8.3 [2015-Aug-02] |
| * Added support for custom boolean attributes. |
| <https://github.com/jhy/jsoup/pull/555> |
| |
| * When fetching XML URLs, automatically switch to the XML parser instead of the HTML parser. |
| <https://github.com/jhy/jsoup/pull/574> |
| |
| * Performance improvement on parsing larger HTML pages. On Android KitKat, around 1.7x faster. On Android |
| Lollipop, ~ 1.3x faster. Improvements largely from re-ordering the HtmlTreeBuilder methods based on analysis of |
| various websites; also from further memory reduction for nodes with no children, and other tweaks. |
| |
| * Fixed an issue in Element.getElementSiblingIndex (and related methods) where sibling elements with the same content |
| would incorrectly have the same sibling index. |
| <https://github.com/jhy/jsoup/issues/554> |
| |
| * Fixed an issue where unexpected elements in a badly nested table could be moved to the wrong location in the |
| document. |
| <https://github.com/jhy/jsoup/issues/552> |
| |
| * Fixed an issue where a table nested within a TH cell would parse to an incorrect tree. |
| <https://github.com/jhy/jsoup/issues/575> |
| |
| * When serializing a document using the XHTML encoding entities, if the character set did not support &nbsp; chars |
| (such as Shift_JIS), the character would be skipped. For visibility, jsoup will now always output &#xa0; when using |
| XHTML encoding entities (as &nbsp; is not defined), regardless of the output character set. |
| <https://github.com/jhy/jsoup/issues/523> |
| |
| * Fixed an issue when resolving URLs, if the absolute URL had no path, the relative URL was not normalized correctly. |
| Also fixed an issue where connections that were redirected to a relative URL did not have the same normalization |
| rules as a URL read from Node.absUrl(String). |
| <https://github.com/jhy/jsoup/issues/585> |
| |
| * When serialising XML, ensure that '<' characters in attributes are escaped, per spec. Not required in HTML. |
| <https://github.com/jhy/jsoup/issues/528> |
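| |
| The empty-path case in the URL resolution fix can be illustrated with java.net.URI. RFC 3986 (section 5.2.3) says a |
| base with an authority and an empty path merges as if its path were "/", but java.net.URI implements the older |
| RFC 2396 and can drop that slash (yielding something like "http://example.comfoo"). The hypothetical UrlResolve |
| helper below normalizes the base first; it is a sketch, not jsoup's code. |

```java
import java.net.URI;

// Hypothetical helper: resolves a relative URL, patching bases like "http://example.com"
// whose authority is present but whose path is empty.
class UrlResolve {

    static String resolve(String base, String relative) {
        URI b = URI.create(base);
        // RFC 3986 section 5.2.3: a base with an authority and an empty path merges as if
        // its path were "/". Rebuild the base with that "/" before calling resolve().
        if (b.isAbsolute() && b.getRawAuthority() != null && "".equals(b.getRawPath())) {
            String query = b.getRawQuery() == null ? "" : "?" + b.getRawQuery();
            b = URI.create(b.getScheme() + "://" + b.getRawAuthority() + "/" + query);
        }
        return b.resolve(relative).toString();
    }
}
```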
| |
| *** Release 1.8.2 [2015-Apr-13] |
| * Performance improvements for parsing HTML on Android, of 1.5x to 1.9x, with larger parses getting a bigger |
| speed increase. For non-Android JREs, around 1.1x to 1.2x. |
| |
| * Dramatic performance improvement in HTML serialization on Android (KitKat and later), of 115x. Improvement by working |
| around a character set encoding speed regression in Android. |
| <https://github.com/jhy/jsoup/issues/383> |
| |
| * Performance improvement for the class name selector on Android (.class) of 2.5x to 14x. Around 1.2x |
| on non-Android JREs. |
| |
| * File upload support. Added the ability to specify input streams for POST data, which will upload content in |
| MIME multipart/form-data encoding. |
| |
| * Add a meta-charset element to documents when setting the character set, so that the document's charset is |
| unambiguous. |
| <https://github.com/jhy/jsoup/pull/486> |
| |
| * Added ability to disable TLS (SSL) certificate validation. Helpful if you're hitting a host with a bad cert, |
| or your JDK doesn't support SNI. |
| <https://github.com/jhy/jsoup/pull/343> |
| |
| * Added ability to further tweak the canned Cleaner Safelists by removing existing settings. |
| <https://github.com/jhy/jsoup/pull/449> |
| |
| * Added option in Cleaner Safelist to allow linking to in-page anchors (#) |
| <https://github.com/jhy/jsoup/pull/441> |
| |
| * Use a lowercase doctype tag for HTML5 documents. |
| |
| * Added support for 201 Created with redirect, and other status codes. Treats any HTTP status code 2xx or 3xx as an OK |
| response, and follows redirects whenever there is a Location header. |
| <https://github.com/jhy/jsoup/issues/312> |
| |
| * Added support for HTTP method verbs PUT, DELETE, and PATCH. |
| |
| * Added support for overriding the default POST character set of UTF-8 |
| <https://github.com/jhy/jsoup/pull/491> |
| |
| * W3C DOM support: added ability to convert from a jsoup document to a W3C document, with the W3CDom helper class. |
| |
| * In the HtmlToPlainText example program, added the ability to filter using a CSS selector. Also clarified |
| the usage documentation. |
| |
| * Fixed validation of cookie names in HttpConnection cookie methods. |
| <https://github.com/jhy/jsoup/pull/377> |
| |
| * Fixed an issue where <option> tags would be missed when preparing a form for submission if missing a selected |
| attribute. |
| |
| * Fixed an issue where submitting a form would incorrectly include radio and checkbox values without the checked |
| attribute. |
| |
| * Fixed an issue where Element.classNames() would return a set containing an empty class; and may have extraneous |
| whitespace. |
| <https://github.com/jhy/jsoup/pull/469> |
| |
| * Fixed an issue where attributes selected by value were not correctly space normalized. |
| <https://github.com/jhy/jsoup/pull/526> |
| |
| * In head+noscript elements, treat content as character data, instead of jumping out of head parsing. |
| <https://github.com/jhy/jsoup/pull/540> |
| |
| * Fixed performance issue when parsing HTML with elements with many children that need re-parenting. |
| <https://github.com/jhy/jsoup/pull/506> |
| |
| * Fixed an issue where a server returning an unsupported character set response would cause a runtime |
| UnsupportedCharsetException, instead of falling back to the default UTF-8 charset. |
| <https://github.com/jhy/jsoup/pull/509> |
| |
| * Fixed an issue where Jsoup.Connection would throw an IOException when reading a page with zero content-length. |
| <https://github.com/jhy/jsoup/issues/538> |
| |
| * Improved the equals() and hashCode() methods in Node, to consider all their child content, for DOM tree comparisons. |
| <https://github.com/jhy/jsoup/issues/537> |
| |
| * Improved performance in Selector when searching multiple roots. |
| <https://github.com/jhy/jsoup/issues/518> |
| |
| *** Release 1.8.1 [2014-Sep-27] |
| * Introduced the ability to choose between HTML and XML output, and made HTML the default. This means img tags are |
| output as <img>, not <img />. XML is the default when using the XmlTreeBuilder. Control this with the |
| Document.OutputSettings.syntax() method. |
| |
| * Improved the performance of Element.text() by 3.2x |
| |
| * Improved the performance of Element.html() by 1.7x |
| |
| * Improved file read time by 2x, giving around a 10% speed improvement to file parses. |
| <https://github.com/jhy/jsoup/issues/248> |
| |
| * Tightened the scope of what characters are escaped in attributes and textnodes, to align with the spec. Also, when |
| using the extended escape entities map, only escape a character if the current output charset does not support it. |
| This produces smaller, more legible HTML, with greater control over the output (by setting charset and escape mode). |
| |
| * If pretty-print is disabled, don't trim outer whitespace in Element.html() |
| <https://github.com/jhy/jsoup/issues/368> |
| |
| * In the HTML Cleaner, allow span tags in the basic safelist, and span and div tags in the relaxed safelist. |
| |
| * Added Element.cssSelector(), which returns a unique CSS selector/path for an element. |
| <https://github.com/jhy/jsoup/pull/459> |
| |
| * Fixed an issue where <svg><img/></svg> was parsed as <svg><image/></svg> |
| <https://github.com/jhy/jsoup/issues/364> |
| |
| * Fixed an issue where a UTF-8 BOM character was not detected if the HTTP response did not specify a charset, and |
| the HTML body did, leading to the head contents incorrectly being parsed into the body. Changed the behavior so that |
| when the UTF-8 BOM is detected, it will take precedence for determining the charset to decode with. |
| <https://github.com/jhy/jsoup/issues/348> |
| |
| * Relaxed doctype validation, allowing doctypes to not specify a name. |
| <https://github.com/jhy/jsoup/issues/460> |
| |
| * Fixed an issue in parsing a base URI when loading a URL containing a http-equiv element. |
| <https://github.com/jhy/jsoup/issues/440> |
| |
| * Fixed an issue for Java 1.5 / Android 2.2 compatibility, and verify it doesn't regress. |
| <https://github.com/jhy/jsoup/issues/375> |
| <https://github.com/jhy/jsoup/pull/403> |
| |
| * Fixed an issue that would throw an NPE when trying to set invalid HTML into a title element. |
| <https://github.com/jhy/jsoup/pull/410> |
| |
| * Added support for quoted attribute values in CSS Selectors |
| <https://github.com/jhy/jsoup/pull/400> |
| |
| * Fixed support for nth-of-type selectors with unknown tags. |
| <https://github.com/jhy/jsoup/pull/402> |
| |
| * Added support for 'application/*+xml' mimetypes. |
| <https://github.com/jhy/jsoup/pull/444> |
| |
| * Fixed support for allowing script tags in cleaner Safelists. |
| <https://github.com/jhy/jsoup/issues/299> |
| <https://github.com/jhy/jsoup/issues/388> |
| |
| * In FormElements, don't submit disabled inputs, and use 'on' as checkbox value default. |
| <https://github.com/jhy/jsoup/issues/489> |
| |
| *** Release 1.7.3 [2013-Nov-10] |
| * Introduced FormElement, providing easy access to form controls and their data, and the ability to submit forms |
| with Jsoup.Connect. |
| |
| * Reduced GC impact during HTML parsing, with 17% fewer objects created, and 3% faster parses. |
| |
| * Reduced CSS selection time by 26% for common queries. |
| |
| * Improved HTTP character set detection. |
| <https://github.com/jhy/jsoup/pull/325> <https://github.com/jhy/jsoup/issues/321> |
| |
| * Added Document.location, to get the URL the document was retrieved from. Helpful if connection was redirected. |
| <https://github.com/jhy/jsoup/pull/306> |
| |
| * Fixed support for self-closing script tags. |
| <https://github.com/jhy/jsoup/issues/305> |
| |
| * Fixed a crash when reading an unterminated CDATA section. |
| <https://github.com/jhy/jsoup/issues/349> |
| |
| * Fixed an issue where elements added via the adoption agency algorithm did not preserve their attributes. |
| <https://github.com/jhy/jsoup/issues/313> |
| |
| * Fixed an issue when cloning a document with extremely nested elements that could cause a stack-overflow. |
| <https://github.com/jhy/jsoup/issues/290> |
| |
| * Fixed an issue when connecting or redirecting to a URL that contains a space. |
| <https://github.com/jhy/jsoup/pull/354> <https://github.com/jhy/jsoup/issues/114> |
| |
| * Added support for the HTTP/1.1 Temporary Redirect (307) status code. |
| <https://github.com/jhy/jsoup/issues/452> |
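| |
| One JDK-only way to encode a URL that contains a space or other illegal characters is the multi-argument java.net.URI |
| constructor, which quotes illegal characters per component, combined with toASCIIString() for non-ASCII text. The |
| UrlEncode helper is hypothetical. Because these constructors always quote '%', the components passed in must be |
| unencoded. |

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper: builds a URL-safe string from unencoded components.
class UrlEncode {

    static String encode(String scheme, String host, String path, String query) {
        try {
            // The 7-argument constructor quotes illegal characters (such as spaces) per component;
            // toASCIIString() then percent-encodes any remaining non-ASCII characters as UTF-8.
            return new URI(scheme, null, host, -1, path, query, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```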
| |
| *** Release 1.7.2 [2013-Jan-27] |
| * Added support for supplementary characters outside of the Basic Multilingual Plane. |
| <https://github.com/jhy/jsoup/issues/288> <https://github.com/jhy/jsoup/pull/289> |
| |
| * Added support for structural pseudo CSS selectors, including :first-child, :last-child, :nth-child, :nth-last-child, |
| :first-of-type, :last-of-type, :nth-of-type, :nth-last-of-type, :only-child, :only-of-type, :empty, and :root |
| <https://github.com/jhy/jsoup/pull/208> |
| |
| * Added a maximum response body size to Jsoup.Connection, to prevent running out of memory when trying to read |
| extremely large documents. The default is 1MB. |
| |
| * Refactored the Cleaner to traverse rather than recurse child nodes, to avoid the risk of overflowing the stack. |
| <https://github.com/jhy/jsoup/issues/246> |
| |
| * Added Element.insertChildren(), to easily insert a list of child nodes at a specific index. |
| <https://github.com/jhy/jsoup/issues/239> |
| |
| * Added Node.childNodesCopy(), to create an independent copy of a Node's children. |
| |
| * When parsing in XML mode, preserve XML declarations (<?xml ... ?>). |
| <https://github.com/jhy/jsoup/issues/242> |
| |
| * Introduced Parser.parseXmlFragment(), to allow easy parsing of XML fragments. |
| <https://github.com/jhy/jsoup/issues/279> |
| |
| * Allow Safelist test methods to be extended |
| <https://github.com/jhy/jsoup/issues/85> |
| |
| * Added Document.OutputSettings.outline mode, to aid HTML debugging by printing out in outline mode, similar to |
| browser HTML inspectors. |
| <https://github.com/jhy/jsoup/issues/273> |
| |
| * When parsing, allow all tags to self-close. Tags that aren't expected to self-close will get an end tag. |
| <https://github.com/jhy/jsoup/issues/258> |
| |
| * Fixed an issue when parsing <textarea>/RCData tags containing unescaped closing tags that would drop the trailing >. |
| |
| * Corrected the javadoc for Element#child() to note that it throws IndexOutOfBounds. |
| <https://github.com/jhy/jsoup/issues/277> |
| |
| * When cloning an Element, reset the classnames set so as not to hold a pointer to the source's. |
| <https://github.com/jhy/jsoup/issues/278> |
| |
| * Limit how far up the stack the formatting adoption agency algorithm will travel, to prevent the chance of a run-away |
| parse when the HTML stack is hopelessly deep. |
| <https://github.com/jhy/jsoup/issues/234> |
| |
| * Modified Element.text() to build text by traversing child nodes rather than recursing. This avoids stack-overflow |
| errors when the DOM is very deep and the VM stack-size is low. |
| <https://github.com/jhy/jsoup/issues/271> |
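| |
| The maximum body size added in this release amounts to a capped read. The hypothetical CappedReader below is a sketch |
| that stops after maxBytes and treats 0 as unlimited; it assumes that an oversized body is truncated rather than |
| rejected. The 1MB default would correspond to maxBytes = 1024 * 1024. |

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper: reads a response body but stops after maxBytes (0 = unlimited).
class CappedReader {

    static byte[] readCapped(InputStream in, int maxBytes) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int remaining = maxBytes;
        try {
            while (maxBytes == 0 || remaining > 0) {
                int want = maxBytes == 0 ? buf.length : Math.min(buf.length, remaining);
                int n = in.read(buf, 0, want);
                if (n == -1) break;              // end of stream before the cap
                out.write(buf, 0, n);
                remaining -= n;                  // only consulted when a cap is set
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();                // truncated at maxBytes if the body was larger
    }
}
```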
| |
| *** Release 1.7.1 [2012-Sep-23] |
| * Improved parse time, now 2.3x faster than previous release, with lower memory consumption. |
| |
| * Reduced memory consumption when selecting elements. |
| |
| * Introduced finer granularity of exceptions in Jsoup.connect, including HttpStatusException and |
| UnsupportedMimeTypeException. |
| <https://github.com/jhy/jsoup/issues/229> |
| |
| * Fixed an issue when determining the Windows-1254 character-set from a meta tag when run in the Turkish locale. |
| <https://github.com/jhy/jsoup/issues/191> |
| |
| * Fixed whitespace preservation in <textarea> tags. |
| <https://github.com/jhy/jsoup/issues/167> |
| |
| * In jsoup.connect, fail faster if the return content type is not supported. |
| <https://github.com/jhy/jsoup/issues/153> |
| |
| * In jsoup.clean, allow custom OutputSettings, to control pretty printing, character set, and entity escaping. |
| <https://github.com/jhy/jsoup/issues/148> |
| |
| * Fixed an issue that prevented frameset documents from being cleaned by the Cleaner. |
| <https://github.com/jhy/jsoup/issues/154> |
| |
| * Fixed an issue when normalising whitespace for strings containing high-surrogate characters. |
| <https://github.com/jhy/jsoup/issues/214> |
| |
| * If a server doesn't specify a content-type header, treat that as OK. |
| <https://github.com/jhy/jsoup/issues/213> |
| |
| * If a server returns an unsupported character-set header, attempt to decode the content with the default charset |
| (UTF-8), instead of bailing with an unsupported charset exception. |
| <https://github.com/jhy/jsoup/issues/215> |
| |
| * Removed an unnecessary synchronisation in Tag.valueOf, allowing multi-threaded parsing to run faster. |
| <https://github.com/jhy/jsoup/issues/238> |
| |
| * Made entity decoding less greedy, so that non-entities are less likely to be incorrectly treated as entities. |
| <https://github.com/jhy/jsoup/issues/224> |
| |
| * Whitespace normalise document.title() output. |
| <https://github.com/jhy/jsoup/issues/168> |
| |
| * In Jsoup.connection, enforce a connection disconnect after every connect. This precludes keep-alive connections to |
| the same host, but in practise many implementations will leak connections, particularly on error. |
| |
| *** Release 1.6.3 [2012-May-28] |
| * Fixed parsing of group-or commas in CSS selectors, to correctly handle sub-queries containing commas. |
| <https://github.com/jhy/jsoup/issues/179> |
| |
| * If a node has no parent, return null on previousSibling and nextSibling instead of throwing a null pointer exception. |
| <https://github.com/jhy/jsoup/issues/184> |
| |
| * Updated Node.siblingNodes() and Element.siblingElements() to exclude the current node (a node is not its own sibling). |
| |
| * Fixed HTML entity parser to correctly parse entities like frac14 (letter + number combo). |
| <https://github.com/jhy/jsoup/issues/145> |
| |
| * Fixed issue where contents of a script tag within a comment could be incorrectly parsed. |
| <https://github.com/jhy/jsoup/issues/115> |
| |
| * Fixed GAE support: load HTML entities from a file on startup, instead of embedding in the class. |
| |
| * Fixed NPE when HTML fragment parsing a <style> tag |
| <https://github.com/jhy/jsoup/issues/189> |
| |
| * Fixed issue with :all pseudo-tag in HTML sanitizer when cleaning tags previously defined in safelist |
| <https://github.com/jhy/jsoup/issues/156> |
| |
| * Fixed NPE in Parser.parseFragment() when context parameter is null. |
| <https://github.com/jhy/jsoup/issues/195> |
| |
| * In HTML Safelists, when defining allowed attributes for a tag, automatically add the tag to the allowed list. |
| |
| *** Release 1.6.2 [2012-Mar-27] |
| * Added a simplified XML parsing mode, which can usefully parse valid and invalid XML, but does not enforce any HTML |
| document structure or special tag behaviour. |
| |
| * Added the optional ability to track errors when tokenising and parsing. |
| |
| * Added jsoup.connect.cookies(Map) method, to set multiple cookies at once, possibly from a prior request. |
| |
| * Added Element.textNodes() and Element.dataNodes(), to easily access an element's children text nodes and data nodes. |
| |
| * Added an example program that demonstrates how to format HTML as plain-text, and the use of the NodeVisitor interface. |
| |
| * Added Node.traverse() and Elements.traverse() methods, to iterate through a node's descendants. |
| |
| * Updated jsoup.connect so that when requests made as POSTs are redirected, the redirect is followed as a GET. |
| <https://github.com/jhy/jsoup/issues/120> |
| |
| * Updated the Cleaner and Safelists to optionally preserve relative links in elements, instead of converting them |
| to absolute links. |
| |
| * Updated the Cleaner to support custom allowed protocols such as "cid:" and "data:". |
| <https://github.com/jhy/jsoup/issues/127> |
| |
| * Updated handling of <base href> tags, to act on only the first one seen when parsing, to align with modern browsers. |
| |
| * Updated Node.setBaseUri(), to recursively set on all the node's descendants. |
| |
| * Fixed handling of null characters within comments. |
| <https://github.com/jhy/jsoup/issues/121> |
| |
| * Tweaked escaped entity detection in attributes to not treat &entity_... as an entity form. |
| <https://github.com/jhy/jsoup/issues/129> |
| |
| * Fixed doctype tokeniser to allow whitespace between name and public identifier. |
| |
| * Fixed issue where comments within a table tag would be duplicate-fostered into body. |
| <https://github.com/jhy/jsoup/pull/165> |
| |
| * Fixed an issue where a spurious byte-order-mark at the start of a document would cause the parser to miss head |
| contents. |
| <https://github.com/jhy/jsoup/issues/134> |
| |
| * Fixed an issue where content after a frameset could cause an NPE crash. Now correctly implements spec and ignores |
| the trailing content. |
| <https://github.com/jhy/jsoup/issues/162> |
| |
| * Tweaked whitespace checks to align with HTML spec |
| <https://github.com/jhy/jsoup/pull/175> |
| |
| * Tweaked HTML output of closing script and style tags to not add an extraneous newline when pretty-printing. |
| |
| * Substantially reduced default memory allocation within Node.outerHtml, to reduce memory pressure when serialising |
| smaller DOMs. |
| <https://github.com/jhy/jsoup/issues/143> |
| |
| *** Release 1.6.1 [2011-Jul-02] |
| * Fixed Java 1.5 compatibility. |
| <https://github.com/jhy/jsoup/issues/103> |
| |
| * Fixed an issue when parsing <script> tags in body where the tokeniser wouldn't switch to the InScript state, which |
| meant that data wasn't parsed correctly. |
| <https://github.com/jhy/jsoup/issues/104> |
| |
| * Fixed an issue with a missing quote when serialising DocumentType nodes. |
| <https://github.com/jhy/jsoup/issues/109> |
| |
| * Fixed issue where a single 0 character was lexed incorrectly as a null character. |
| <https://github.com/jhy/jsoup/issues/107> |
| |
| * Fixed normalisation of carriage returns to newlines on input HTML. |
| <https://github.com/jhy/jsoup/issues/110> |
| |
| * Disabled memory mapped files when loading files from disk, to improve compatibility in Windows environments. |
| |
| *** Release 1.6.0 [2011-Jun-13] |
| * HTML5 conformant parser. Complete reimplementation of HTML tokenisation and parsing, to implement the |
| http://whatwg.org/html spec. This ensures jsoup parses HTML identically to current modern browsers. |
| |
| * When parsing files from disk, files are loaded via memory mapping, to increase parse speed. |
| |
| * Reduced memory overhead and lowered garbage collector pressure with Attribute, Node and Element model optimisations. |
| |
| * Improved "abs:" absolute URL handling in Elements.attr("abs:href") and Node.hasAttr("abs:href"). |
| <https://github.com/jhy/jsoup/issues/97> |
| |
| * Fixed cookie handling issue in jsoup.Connect where empty cookies would cause a validation exception. |
| <https://github.com/jhy/jsoup/issues/87> |
| |
| * Added jsoup.Connect configuration options to allow HTTP errors to be ignored, and the content-type to be ignored. |
| Contributed by Jesse Piascik (piascikj) |
| <https://github.com/jhy/jsoup/pull/78> |
| |
| * Added Node.before(node) and Node.after(node), to allow existing nodes to be moved, or new nodes to be inserted, into |
| precise DOM positions. |
| |
| * Added Node.unwrap() and Elements.unwrap(), to remove a node but keep its contents. Useful for e.g. removing unwanted |
| formatting tags. |
| <https://github.com/jhy/jsoup/issues/100> |
| |
| * Now handles unclosed <title> tags in document by breaking out of the title at the next start tag, instead of |
| eating up to the end of the document. |
| <https://github.com/jhy/jsoup/issues/82> |
| |
| * Added OSGi bundle support to the jsoup package jar. |
| <https://github.com/jhy/jsoup/issues/98> |
| |
| *** Release 1.5.2 [2011-Feb-27] |
| * Fixed issue with selector parser where some boolean AND + OR combined queries (e.g. "meta[http-equiv], meta[content]") |
| were being parsed incorrectly as OR only queries (e.g. former as "meta, [http-equiv], meta[content]") |
| |
| * Fixed issue where a content-type specified in a meta tag may not be reliably detected, due to the above issue. |
| |
| * Updated Element.text() and Element.ownText() methods to ensure <br> tags output as whitespace. |
| |
| * Tweaked Element.outerHtml() method to not generate initial newline on first output element. |
| |
| *** Release 1.5.1 [2011-Feb-19] |
| |
| * Integrated new single-pass selector evaluators, contributed by knz (Anton Kazennikov). This significantly speeds up |
| the execution of combined selector queries. |
| |
| * Implemented workaround to fix Scala support. Contributed by bbeck (Brandon Beck). |
| |
| * Added ability to change an element's tag with Element.tagName(String), and to change many at once |
| with Elements.tagName(String). |
| |
| * Added Node.wrap(html), Node.before(html), and Node.after(html), to allow HTML to be easily added to all nodes. These |
| functions were previously supported on Elements only. |
| |
| * Added TextNode.splitText(index), which allows a text node to be split into two nodes at a specified index point. |
| This is convenient if you need to surround some text in an element. |
| |
| * Updated Jsoup.Connection so that cookies set on a redirect response will be included on both the redirected request |
| and response. |
| |
| * Infinite redirection loops in Jsoup.Connect are now prevented. |
| |
| * Allow Jsoup.Connect to parse application/xml and application/xhtml+xml responses. |
| |
| * Modified Jsoup.Connect to always follow relative links, regardless of the underlying HTTP sub-system. |
| |
| * Defined U (underline) element as an inline tag. |
| |
| * Force strict entity matching (must be &xxx; and not &xxx) in element attributes. |
| |
| * Implemented clone method for Elements (contributed by knz). |
| |
| * Fixed tokeniser optimisation when scanning for missing data element close tags. |
| |
| * Fixed issue when using descendant regex attribute selectors. |
| |
| *** Release 1.4.1 [2010-Nov-23] |
| |
| * Added ability to load and parse HTML from an input stream. |
| |
| * Implemented Node.clone() to create deep, independent copies of Nodes, Elements, and Documents. |
| |
| * Added :not() selector, to find elements that do not match the selector. E.g. div:not(.logo) finds divs that |
| do not have the "logo" class name. |
| |
| * Added Elements.not(selector) method, to remove undesired results from selector results. |
| |
| * Implemented DataNode.setWholeData() to allow updating of script and style data contents. |
| |
| * Relaxed parse rules of H1 - H6, to allow nested content. This is against spec, but matches browser and publisher |
| behaviour. |
| |
| * Relaxed parse rule of SPAN to treat as block, to allow nested block content. |
| |
| * Fixed issue in jsoup.connect when extracting character set from content-type header; now supports quoted |
| charset declaration. |
| |
| * Fixed support for jsoup.connect to follow redirects between http & https URLs. |
| |
| * Document normalisation now more enthusiastically enforces the correct document structure. |
| |
| * Support node.outerHtml() when the node has no parent (e.g. after it has been removed from its DOM tree). |
| |
| * Fixed support for HTML entities with numbers in name (e.g. &frac34;, &sup1;). |
| |
| * Fixed absolute URL generation from relative URLs which are only query strings. |
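| |
| Query-only references are a known trap. java.net.URI follows RFC 2396 resolution rules, and under those rules |
| resolving ?q=1 against http://example.com/a/b.html can drop the last path segment and yield |
| http://example.com/a/?q=1. RFC 3986 keeps the segment and gives http://example.com/a/b.html?q=1. |
| The following is a minimal workaround sketch that uses a hypothetical resolve() helper. It prefixes the base path |
| onto a query-only reference before resolving. |
```java
import java.net.URI;

class QueryOnlyResolve {
    /** Resolves rel against base, keeping the base path when rel is only a query string. */
    static String resolve(String base, String rel) {
        URI baseUri = URI.create(base);
        if (rel.startsWith("?")) {
            // Re-attach the full base path so its last segment (e.g. b.html) is not dropped.
            rel = baseUri.getRawPath() + rel;
        }
        return baseUri.resolve(rel).toString();
    }

    public static void main(String[] args) {
        System.out.println(resolve("http://example.com/a/b.html", "?q=1"));   // http://example.com/a/b.html?q=1
        System.out.println(resolve("http://example.com/a/b.html", "c.html")); // http://example.com/a/c.html
    }
}
```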
| |
| *** Release 1.3.3 [2010-Sep-19] |
| * Implemented Elements.empty() and Elements.remove(). This allows easy element removal, like: |
| doc.select("iframe").remove(); |
| |
| * Fixed issue in Entities when unescaping &#36; ("$") |
| <http://github.com/jhy/jsoup/issues/issue/34> |
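| |
| This entry does not state the cause. One plausible cause of this kind of bug, offered here as an assumption, is |
| decoding entities with Matcher.appendReplacement. That method reads "$" in the replacement string as a group |
| reference and throws. Matcher.quoteReplacement makes the replacement literal. The following is a minimal sketch for |
| decimal numeric entities only. The class name is hypothetical. |
```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NumericEntityUnescape {
    // Decimal numeric character references only, e.g. &#36;
    private static final Pattern NUMERIC = Pattern.compile("&#(\\d+);");

    static String unescape(String in) {
        Matcher m = NUMERIC.matcher(in);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String ch = new String(Character.toChars(Integer.parseInt(m.group(1))));
            // Passing "$" straight to appendReplacement would be parsed as a
            // group reference and throw; quoteReplacement escapes it.
            m.appendReplacement(out, Matcher.quoteReplacement(ch));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("Price: &#36;5")); // Price: $5
    }
}
```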
| |
| * Added restricted XHTML output entity option |
| <http://github.com/jhy/jsoup/issues/issue/35> |
| |
| *** Release 1.3.2 [2010-Aug-30] |
| * Treat HTTP headers as case insensitive in Jsoup.Connection. Improves compatibility for HTTP responses. |
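| |
| One standard-library way to get case-insensitive header lookup is a TreeMap ordered by String.CASE_INSENSITIVE_ORDER, |
| so that "Content-Type" and "content-type" are the same key. This is sketched here as an assumption, not as jsoup's |
| internal storage. The class and newHeaders() method are hypothetical. |
```java
import java.util.Map;
import java.util.TreeMap;

class CaseInsensitiveHeaders {
    /** Creates a header map whose keys compare case-insensitively. */
    static Map<String, String> newHeaders() {
        return new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
    }

    public static void main(String[] args) {
        Map<String, String> headers = newHeaders();
        headers.put("Content-Type", "text/html");
        System.out.println(headers.get("content-type")); // text/html

        // Same key under the comparator, so the value is replaced, not duplicated.
        headers.put("CONTENT-TYPE", "application/xml");
        System.out.println(headers.size());              // 1
    }
}
```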
| |
| * Improved malformed table parsing by implementing ignorable end tags. |
| |
| *** Release 1.3.1 [2010-Aug-23] |
| * Removed dependency on Apache Commons-lang. Jsoup now has no external dependencies. |
| |
| * Added new Connection implementation, to enable easier and richer HTTP requests that parse to Documents. This includes |
| support for gzip responses, cookies, headers, data parameters, user-agent, referrer, etc. |
| |
| * Added Element.ownText() method, to get only the direct text of an element, not including the text of its children. |
| |
| * Added support for selectors :containsOwn(text) and :matchesOwn(regex), to supplement Element.ownText(). |
| |
| * Added support for non-pretty-printed HTML output, to more closely mirror the input HTML. |
| |
| * Further speed optimisations for parsing and output generation. |
| |
| * Fixed support for case-sensitive HTML escape entities. |
| <http://github.com/jhy/jsoup/issues/issue/31> |
| |
| * Fixed issue when parsing tags with keyless attributes. |
| <http://github.com/jhy/jsoup/issues/issue/32> |
| |
| *** Release 1.2.3 [2010-Aug-04] |
| * Added support for automatic input character set detection and decoding. Jsoup now automatically detects the encoding |
| character set when parsing HTML from a File or URL. The parser checks the content-type header, then the |
| <meta http-equiv> or <meta charset> tag, and finally falls back to UTF-8. |
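| |
| The detection order above can be sketched as a chain of fallbacks. The detect() method and its regular expressions are |
| hypothetical simplifications, not jsoup's code. A real implementation would parse the <meta> elements rather than |
| scan the text with a regex. |
```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharsetDetection {
    // Simplified patterns for illustration only.
    private static final Pattern HEADER_CHARSET =
            Pattern.compile("(?i)\\bcharset\\s*=\\s*[\"']?([^\\s,;\"']+)");
    // Matches both <meta charset=...> and <meta http-equiv=... content="...; charset=...">.
    private static final Pattern META_CHARSET =
            Pattern.compile("(?i)<meta[^>]*?charset\\s*=\\s*[\"']?([^\\s,;\"'/>]+)");

    /** Picks a charset: Content-Type header first, then a <meta> tag, then UTF-8. */
    static String detect(String contentTypeHeader, String html) {
        if (contentTypeHeader != null) {
            Matcher m = HEADER_CHARSET.matcher(contentTypeHeader);
            if (m.find()) return m.group(1);
        }
        if (html != null) {
            Matcher m = META_CHARSET.matcher(html);
            if (m.find()) return m.group(1);
        }
        return "UTF-8";
    }

    public static void main(String[] args) {
        System.out.println(detect("text/html; charset=ISO-8859-1", "<meta charset=\"utf-8\">")); // ISO-8859-1
        System.out.println(detect(null,
                "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=windows-1252\">")); // windows-1252
        System.out.println(detect("text/html", "<title>x</title>")); // UTF-8
    }
}
```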
| |
| * Added ability to configure the document's output charset, to control which characters are HTML escaped, and which |
| are kept intact. The output charset defaults to the document's input charset. This simplifies non-ascii output. |
| |
| * Added full support for all new HTML5 tags. |
| |
| * Added support for HTML5 dataset custom data attributes, with the Element.dataset() map. |
| |
| * Added support for the [^attributePrefix] selector query, to find elements with attributes starting with a prefix. |
| Useful for finding elements with datasets: [^data-] matches <p data-name="jsoup"> |
| |
| * Added support for namespaced elements (<fb:name>) and selectors to find them (fb|name) |
| |
| * Implemented Node.ownerDocument DOM method |
| |
| * Improved implicit table element handling (particularly around thead, tbody, and tfoot). |
| |
| * Improved HTML output format for empty elements and auto-detected self closing tags |
| |
| * Changed DT & DD tags to block-mode tags, to follow practice over spec |
| |
| * Added support for tag names with - and _ (<abc_foo>, <abc-foo>) |
| |
| * Handle tags with internal trailing space (<foo >) |
| |
| * Fixed support for character class regular expressions in [attr~=regex] selector |
| |
| *** Release 1.2.2 [2010-Jul-11] |
| |
| * Performance optimisation: |
| - core HTML parser engine now 3.5 times faster |
| - HTML generator now 2.5 times faster |
| - much lower memory use and garbage collection time |
| |
| * Added support for :matches(regex) selector, to find elements containing text matching a regular expression |
| |
| * Added support for [key~=regex] attribute selector, to find elements with attribute values matching a regular expression |
| |
| * Upgraded the selector query parser to allow nested selectors like 'div:has(p:matches(regex))' |
| |
| *** Release 1.2.1 [2010-Jun-21] |
| * Added .before(html) and .after(html) methods to Element and Elements, to insert sibling HTML |
| |
| * Added :contains(text) selector, to search for elements containing the specified text |
| |
| * Added :has(selector) pseudo-selector |
| <http://github.com/jhy/jsoup/issues/issue/20> |
| |
| * Added Element#parents and Elements#parents to retrieve an element's ancestor chain |
| <http://github.com/jhy/jsoup/issues/issue/20> |
| |
| * Fixes an issue where appending / prepending rows to a table (or to similar implicit |
| element structures) would create redundant wrapping elements |
| <http://github.com/jhy/jsoup/issues/issue/21> |
| |
| * Improved implicit close tag heuristic detection when parsing malformed HTML |
| |
| * Fixes an issue where text content after a script (or other data-node) was |
| incorrectly added to the data node. |
| <http://github.com/jhy/jsoup/issues/issue/22> |
| |
| * Fixes an issue where text order was incorrect when parsing pre-document |
| HTML. |
| <http://github.com/jhy/jsoup/issues/issue/23> |
| |
| *** Release 1.1.1 [2010-Jun-08] |
| * Added selector support for :eq, :lt, and :gt |
| <http://github.com/jhy/jsoup/issues/issue/16> |
| |
| * Added TextNode#text and TextNode#text(String) |
| <http://github.com/jhy/jsoup/issues/issue/18> |
| |
| * Throw exception if trying to parse non-text content |
| <http://github.com/jhy/jsoup/issues/issue/17> |
| |
| * Added Node#remove and Node#replaceWith |
| <http://github.com/jhy/jsoup/issues/issue/19> |
| |
| * Allow _ and - in CSS ID selectors (per CSS spec). |
| <http://github.com/jhy/jsoup/issues/issue/10> |
| |
| * Relative links are resolved to absolute when cleaning, to normalize |
| output and to verify safe protocol. (Were previously discarded.) |
| <http://github.com/jhy/jsoup/issues/issue/12> |
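| |
| A relative href has no protocol to check, so it must be made absolute first. The following is a minimal sketch of |
| resolve-then-verify. It assumes a hypothetical allowed-protocol set and method name; jsoup's cleaner configures |
| permitted protocols through its own API. |
```java
import java.net.URI;
import java.util.Set;

class LinkCheck {
    // Hypothetical allowed protocols for an <a href> attribute.
    static final Set<String> ALLOWED = Set.of("http", "https", "mailto");

    /** Returns the absolute URL if its protocol is allowed, otherwise null (drop the attribute). */
    static String absolutizeIfSafe(String baseUri, String href) {
        try {
            URI abs = URI.create(baseUri).resolve(href.trim());
            String scheme = abs.getScheme();
            return scheme != null && ALLOWED.contains(scheme.toLowerCase()) ? abs.toString() : null;
        } catch (IllegalArgumentException e) {
            return null; // unparseable URL: treat as unsafe
        }
    }

    public static void main(String[] args) {
        System.out.println(absolutizeIfSafe("http://example.com/docs/", "page.html"));      // http://example.com/docs/page.html
        System.out.println(absolutizeIfSafe("http://example.com/", "javascript:alert(1)")); // null
    }
}
```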
| |
| * Allow combinators at start of selector query, for query refinements |
| <http://github.com/jhy/jsoup/issues/issue/13> |
| |
| * Added Element#val() and #val(String) methods, for form values |
| <http://github.com/jhy/jsoup/issues/issue/14> |
| |
| * Changed textarea contents to parse as TextNodes, not DataNodes, |
| so that the contents are visible to text() (and to val(), as a textarea is treated as form input) |
| |
| * Fixed support for Java 1.5 |
| |
| *** Release 0.3.1 (2010-Feb-20) |
| * New features: supports Elements#html(), html(String), |
| prepend(String), append(String); bulk versions of the corresponding |
| methods in Element. |
| |
| * New feature: Jsoup.isValid(html, safelist) method for user input |
| form validation. |
| |
| * Improved Elements.attr(String) to return the value from the first |
| matched element that has the attribute. |
| |
| * Fixed assertion error when cleaning HTML with empty attribute |
| <http://github.com/jhy/jsoup/issues/issue/7> |
| |
| *** Release 0.2.2 (2010-Feb-07) |
| * jsoup packages are now available in the Maven central repository. |
| |
| * New feature: supports Element#addClass, removeClass, toggleClass; |
| also the corresponding bulk class methods on Elements. |
| * New feature: supports Element#wrap(html) and Elements#wrap(html). |
| * New selector syntax: supports E + F adjacent sibling selector |
| * New selector syntax: supports E ~ F general sibling selector (F preceded by a sibling E) |
| * New: supports Element#elementSiblingIndex() |
| |
| * Improved document normalisation. |
| * Improved HTML string output format (pretty-print) |
| |
| * Fixed absolute URL resolution issue when a base tag has no href. |
| |
| *** Release 0.1.2 (2010-Feb-02) |
| * Fixed unrecognised tag handler to be more permissive |
| <http://github.com/jhy/jsoup/issues/issue/1> |
| |
| *** Release 0.1.1 (2010-Jan-31) |
| * Initial beta release of jsoup |