| <?xml version="1.0"?> |
| <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN" |
| "http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd" [ |
| <!ENTITY % local.common.attrib "xmlns:xi CDATA #FIXED 'http://www.w3.org/2003/XInclude'"> |
| <!ENTITY version SYSTEM "version.xml"> |
| ]> |
| <chapter id="what-is-harfbuzz"> |
| <title>What is HarfBuzz?</title> |
| <para> |
| HarfBuzz is a <emphasis>text-shaping engine</emphasis>. If you |
| give HarfBuzz a font and a string containing a sequence of Unicode |
| codepoints, HarfBuzz selects and positions the corresponding |
| glyphs from the font, applying all of the necessary layout rules |
| and font features. HarfBuzz then returns the resulting glyph |
| sequence, correctly arranged for the language and writing |
| system. |
| </para> |
| <para> |
| HarfBuzz can properly shape all of the world's major writing |
| systems. It runs on all major operating systems and software |
| platforms and it supports the modern font formats in use |
| today. |
| </para> |
| <section id="what-is-text-shaping"> |
| <title>What is text shaping?</title> |
| <para> |
| Text shaping is the process of translating a string of character |
| codes (such as Unicode codepoints) into a properly arranged |
| sequence of glyphs that can be rendered onto a screen or into |
| final output form for inclusion in a document. |
| </para> |
| <para> |
| The shaping process is dependent on the input string, the active |
| font, the script (or writing system) that the string is in, and |
| the language that the string is in. |
| </para> |
| <para> |
| Modern software systems generally only deal with strings in the |
| Unicode encoding scheme (although legacy systems and documents may |
| involve other encodings). |
| </para> |
| <para> |
| There are several font formats that a program might |
| encounter, each of which has a set of standard text-shaping |
| rules. |
| </para> |
| <para>The dominant format is <ulink |
| url="http://www.microsoft.com/typography/otspec/">OpenType</ulink>. The |
| OpenType specification defines a series of <ulink url="https://github.com/n8willis/opentype-shaping-documents">shaping models</ulink> for |
| various scripts from around the world. These shaping models depend on |
| the font including certain features in its <literal>GSUB</literal> |
| and <literal>GPOS</literal> tables. |
| </para> |
| <para> |
| Alternatively, OpenType fonts can include shaping features for |
| the <ulink url="https://graphite.sil.org/">Graphite</ulink> shaping model. |
| </para> |
| <para> |
| TrueType fonts can also include OpenType shaping |
| features. Alternatively, TrueType fonts can include <ulink url="https://developer.apple.com/fonts/TrueType-Reference-Manual/RM09/AppendixF.html">Apple |
| Advanced Typography</ulink> (AAT) tables to implement shaping |
| support. AAT fonts are generally only found on macOS and iOS systems. |
| </para> |
| <para> |
| Text strings will usually be tagged with script and language |
| tags that provide the context needed to perform text shaping |
| correctly. The necessary <ulink |
| url="https://docs.microsoft.com/en-us/typography/opentype/spec/scripttags">script</ulink> |
| and <ulink |
| url="https://docs.microsoft.com/en-us/typography/opentype/spec/languagetags">language</ulink> |
| tags are defined by OpenType. |
| </para> |
| </section> |
| |
| <section id="why-do-i-need-a-shaping-engine"> |
| <title>Why do I need a shaping engine?</title> |
| <para> |
| Text shaping is an integral part of preparing text for |
| display. Before a Unicode sequence can be rendered, the |
| codepoints in the sequence must be mapped to the corresponding |
| glyphs provided in the font, and those glyphs must be positioned |
| correctly relative to each other. For many of the scripts |
| supported in Unicode, these steps involve script-specific layout |
| rules, including complex joining, reordering, and positioning |
| behavior. Implementing these rules is the job of the shaping engine. |
| </para> |
| <para> |
| Text shaping is a fairly low-level operation. HarfBuzz is |
| used directly by text-handling libraries like <ulink |
| url="https://www.pango.org/">Pango</ulink>, as well as by the layout |
| engines in Firefox, LibreOffice, and Chromium. Unless you are |
| <emphasis>writing</emphasis> one of these layout engines |
| yourself, you will probably not need to use HarfBuzz directly: normally, |
| a layout engine, toolkit, or other library will turn text into |
| glyphs for you. |
| </para> |
| <para> |
| However, if you <emphasis>are</emphasis> writing a layout engine |
| or graphics library yourself, then you will need to perform text |
| shaping, and this is where HarfBuzz can help you. |
| </para> |
| <para> |
| Here are some specific scenarios where a text-shaping engine |
| like HarfBuzz helps you: |
| </para> |
| <itemizedlist> |
| <listitem> |
| <para> |
| OpenType fonts contain a set of glyphs (that is, shapes |
| to represent the letters, numbers, punctuation marks, and |
| all other symbols), which are indexed by a <literal>glyph ID</literal>. |
| </para> |
| <para> |
| A particular glyph ID within the font does not necessarily |
| correlate to a predictable Unicode codepoint. For instance, |
| some fonts have the letter "a" as glyph ID 1, but |
| many others do not. In order to retrieve the right glyph |
| from the font to display "a", you need to consult |
| the table inside the font (the <literal>cmap</literal> |
| table) that maps Unicode codepoints to glyph IDs. In other |
| words, <emphasis>text shaping turns codepoints into glyph |
| IDs</emphasis>. |
| </para> |
| </listitem> |
| <listitem> |
| <para> |
| Many OpenType fonts contain ligatures: combinations of |
| characters that are rendered as a single unit. For instance, |
| it is common for the <literal>fi</literal> letter |
| combination to appear in print as the single ligature glyph |
| "ﬁ". |
| </para> |
| <para> |
| Whether you should render an "f, i" sequence |
| as <literal>fi</literal> or as "ﬁ" does not |
| depend on the input text. Instead, it depends on whether |
| or not the font includes an "ﬁ" glyph and on the |
| level of ligature application you wish to perform. The font |
| and the amount of ligature application used are under your |
| control. In other words, <emphasis>text shaping involves |
| querying the font's ligature tables and determining what |
| substitutions should be made</emphasis>. |
| </para> |
| </listitem> |
| <listitem> |
| <para> |
| While ligatures like "ﬁ" are optional typographic |
| refinements, some languages <emphasis>require</emphasis> certain |
| substitutions to be made in order to display text correctly. |
| </para> |
| <para> |
| For example, in Tamil, when the letter "TTA" (ட) |
| is followed by "U" (உ), the sequence of Unicode |
| characters "டஉ" must be replaced by the single |
| glyph "டு" from the font. |
| </para> |
| <para> |
| But "டு" does not have a Unicode codepoint. To |
| find this glyph, you need to consult the table inside |
| the font (the <literal>GSUB</literal> table) that contains |
| substitution information. In other words, <emphasis>text shaping |
| chooses the correct glyph for a sequence of characters |
| provided</emphasis>. |
| </para> |
| </listitem> |
| <listitem> |
| <para> |
| Similarly, an Arabic letter can have up to four different |
| variants, depending on the position in which it appears |
| within a sequence. Inside a font, there will be separate |
| glyphs for the initial, medial, final, and isolated forms of |
| a letter, each at a different glyph ID. |
| </para> |
| <para> |
| Unicode only assigns one codepoint per character, so a |
| Unicode string will not tell you which glyph variant to use |
| for each character. To decide, you need to analyze the whole |
| string and determine the appropriate glyph for each character |
| based on its position. In other words, <emphasis>text |
| shaping chooses the correct form of the letter by its |
| position and returns the correct glyph from the font</emphasis>. |
| </para> |
| </listitem> |
| <listitem> |
| <para> |
| Other languages involve marks and accents that need to be |
| rendered in specific positions relative to a base character. For |
| instance, the Moldovan language includes the Cyrillic letter |
| "zhe" (ж) with a breve accent, like so: "ӂ". |
| </para> |
| <para> |
| Some fonts will provide this character as a single |
| zhe-with-breve glyph, but other fonts will not and, instead, |
| will expect the rendering engine to form the character by |
| superimposing the separate "ж" and "˘" |
| glyphs. |
| </para> |
| <para> |
| But exactly where you should draw the breve depends on the |
| height and width of the preceding zhe glyph. To find the |
| right position, you need to consult the table inside |
| the font (the <literal>GPOS</literal> table) that contains |
| positioning information. |
| In other words, <emphasis>text shaping tells you whether you |
| have a precomposed glyph within your font or if you need to |
| compose a glyph yourself out of combining marks—and, |
| if so, where to position those marks.</emphasis> |
| </para> |
| </listitem> |
| </itemizedlist> |
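| <para> |
| The first two scenarios above (codepoint-to-glyph mapping and |
| ligature substitution) can be sketched in a few lines of Python. |
| This is only an illustration under stated assumptions: the |
| TOY_CMAP and TOY_LIGATURES tables and their glyph IDs are |
| hypothetical stand-ins for a font's cmap and GSUB data, and the |
| helper functions are not part of the HarfBuzz API. |
| </para> |
| <programlisting> |
```python
# Toy tables standing in for a font's cmap and GSUB data.
# The glyph IDs are hypothetical and not taken from any real font.
TOY_CMAP = {ord("f"): 10, ord("i"): 11, ord("n"): 12, ord("e"): 13}
TOY_LIGATURES = {(10, 11): 20}  # glyphs for "f" + "i" become one ligature glyph
NOTDEF = 0  # glyph ID 0 conventionally marks a missing glyph


def map_to_glyphs(text, cmap):
    """Step 1: look up a glyph ID for every codepoint."""
    return [cmap.get(ord(ch), NOTDEF) for ch in text]


def apply_ligatures(glyphs, ligatures):
    """Step 2: replace adjacent glyph pairs that have a ligature entry."""
    out = []
    i = 0
    while i != len(glyphs):
        pair = tuple(glyphs[i:i + 2])
        if pair in ligatures:
            out.append(ligatures[pair])
            i += 2
        else:
            out.append(glyphs[i])
            i += 1
    return out


glyphs = map_to_glyphs("fine", TOY_CMAP)
shaped = apply_ligatures(glyphs, TOY_LIGATURES)
print(glyphs)  # [10, 11, 12, 13]
print(shaped)  # [20, 12, 13]
```
| </programlisting> |
| <para> |
| A real shaping engine reads these tables from the font itself and |
| applies substitutions according to the script, the language, and |
| the features that are enabled, which this sketch leaves out. |
| </para> |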
| <para> |
| If tasks like these are something that you need to do, then you |
| need a text shaping engine. You could use Uniscribe if you are |
| writing Windows software; you could use CoreText on macOS; or |
| you could use HarfBuzz. |
| </para> |
| <note> |
| <para> |
| In the rest of this manual, the text will assume that the reader |
| is the implementor of a text-layout engine. |
| </para> |
| </note> |
| </section> |
| |
| |
| <section id="what-does-harfbuzz-do"> |
| <title>What does HarfBuzz do?</title> |
| <para> |
| HarfBuzz provides text shaping through a cross-platform |
| C API that accepts sequences of Unicode codepoints as input. Currently, |
| the following OpenType shaping models are supported: |
| </para> |
| <itemizedlist> |
| <listitem> |
| <para> |
| Indic (covering Devanagari, Bengali, Gujarati, |
| Gurmukhi, Kannada, Malayalam, Oriya, Tamil, Telugu, and |
| Sinhala) |
| </para> |
| </listitem> |
| <listitem> |
| <para> |
| Arabic (covering Arabic, N'Ko, Syriac, and Mongolian) |
| </para> |
| </listitem> |
| <listitem> |
| <para> |
| Thai and Lao |
| </para> |
| </listitem> |
| <listitem> |
| <para> |
| Khmer |
| </para> |
| </listitem> |
| <listitem> |
| <para> |
| Myanmar |
| </para> |
| </listitem> |
| |
| <listitem> |
| <para> |
| Tibetan |
| </para> |
| </listitem> |
| |
| <listitem> |
| <para> |
| Hangul |
| </para> |
| </listitem> |
| |
| <listitem> |
| <para> |
| Hebrew |
| </para> |
| </listitem> |
| <listitem> |
| <para> |
| The Universal Shaping Engine or <emphasis>USE</emphasis> |
| (covering complex scripts not covered by the above shaping |
| models) |
| </para> |
| </listitem> |
| <listitem> |
| <para> |
| A default shaping model for non-complex scripts |
| (covering Latin, Cyrillic, Greek, Armenian, Georgian, Tifinagh, |
| and many others) |
| </para> |
| </listitem> |
| <listitem> |
| <para> |
| Emoji (including emoji modifier sequences, flag sequences, |
| and ZWJ sequences) |
| </para> |
| </listitem> |
| </itemizedlist> |
| |
| <para> |
| In addition to OpenType shaping, HarfBuzz supports the latest |
| version of Graphite shaping (the "Graphite 2" model) and AAT |
| shaping. |
| </para> |
| |
| <para> |
| HarfBuzz can read and understand TrueType fonts (.ttf), TrueType |
| collections (.ttc), and OpenType fonts (.otf, including those |
| fonts that contain TrueType-style outlines and those that |
| contain PostScript CFF or CFF2 outlines). |
| </para> |
| |
| <para> |
| HarfBuzz is designed and tested to run on top of the FreeType |
| font renderer. It can run on Linux, Android, Windows, macOS, and |
| iOS systems. |
| </para> |
| |
| <para> |
| In addition to its core shaping functionality, HarfBuzz provides |
| functions for accessing other font features, including optional |
| GSUB and GPOS OpenType features, as well as |
| all color-font formats (<literal>CBDT</literal>, |
| <literal>sbix</literal>, <literal>COLR/CPAL</literal>, and |
| <literal>SVG-OT</literal>) and OpenType variable fonts. HarfBuzz |
| also includes a font-subsetting feature. HarfBuzz can perform |
| some low-level math-shaping operations, although it does not |
| currently perform full shaping for mathematical typesetting. |
| </para> |
| |
| <para> |
| A suite of command-line utilities is also provided in the |
| source-code tree, designed to help users test and debug |
| HarfBuzz's features on real-world fonts and input. |
| </para> |
| </section> |
| |
| <section id="what-harfbuzz-doesnt-do"> |
| <title>What HarfBuzz doesn't do</title> |
| <para> |
| HarfBuzz will take a Unicode string, shape it, and give you the |
| information required to lay it out correctly on a single |
| horizontal (or vertical) line using the font provided. That is the |
| extent of HarfBuzz's responsibility. |
| </para> |
| <para> |
| It is important to note that if you are implementing a complete |
| text-layout engine you may have other responsibilities that |
| HarfBuzz will <emphasis>not</emphasis> help you with. For example: |
| </para> |
| <itemizedlist> |
| <listitem> |
| <para> |
| HarfBuzz won't help you with bidirectionality. If you want to |
| lay out text that includes a mix of Hebrew and English, you |
| will need to ensure that each buffer provided to HarfBuzz |
| has all of its characters running in the same direction and that the |
| directionality of the buffer is set correctly. This may mean |
| segmenting the text before it is placed into HarfBuzz buffers. For |
| example, the user will hit the keys in the following |
| sequence: |
| </para> |
| <programlisting> |
| A B C [space] ג ב א [space] D E F |
| </programlisting> |
| <para> |
| but will expect to see in the output: |
| </para> |
| <programlisting> |
| ABC אבג DEF |
| </programlisting> |
| <para> |
| This reordering is called <emphasis>bidi processing</emphasis> |
| ("bidi" is short for bidirectional). The Unicode Bidirectional |
| Algorithm, published as Unicode Standard Annex #9, tells you how |
| to process a string of mixed directionality. |
| Before sending your string to HarfBuzz, you may need to apply the |
| bidi algorithm to it. Libraries such as <ulink |
| url="http://icu-project.org/">ICU</ulink> and <ulink |
| url="http://fribidi.org/">fribidi</ulink> can do this for you. |
| </para> |
| </listitem> |
| <listitem> |
| <para> |
| HarfBuzz won't help you with text that contains different font |
| properties. For instance, if you have the string "a |
| <emphasis>huge</emphasis> breakfast", and you expect |
| "huge" to be italic, then you will need to send three |
| strings to HarfBuzz: <literal>a</literal>, in your Roman font; |
| <literal>huge</literal> using your italic font; and |
| <literal>breakfast</literal> using your Roman font again. |
| </para> |
| <para> |
| Similarly, if you change the font, font size, script, |
| language, or direction within your string, then you will |
| need to shape each run independently and output them |
| independently. HarfBuzz expects to shape a run of characters |
| that all share the same properties. |
| </para> |
| </listitem> |
| <listitem> |
| <para> |
| HarfBuzz won't help you with line breaking, hyphenation, or |
| justification. As mentioned above, HarfBuzz lays out the string |
| along a <emphasis>single line</emphasis> of, notionally, |
| infinite length. If you want to find out where the potential |
| word, sentence and line break points are in your text, you |
| could use the ICU library's break iterator functions. |
| </para> |
| <para> |
| HarfBuzz can tell you how wide a shaped piece of text is, which is |
| useful input to a justification algorithm, but it knows nothing |
| about paragraphs, lines or line lengths. Nor will it adjust the |
| space between words to fit them proportionally into a line. |
| </para> |
| </listitem> |
| </itemizedlist> |
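| <para> |
| The segmentation step described above can be sketched as follows. |
| This is a deliberately naive illustration, not the Unicode |
| Bidirectional Algorithm: it only splits a string wherever the |
| strong direction of the characters changes, and it leaves neutral |
| characters such as spaces attached to the run that precedes them. |
| The split_runs function is a hypothetical helper written for this |
| example. In practice, a library such as ICU or fribidi would do |
| this work. |
| </para> |
| <programlisting> |
```python
import unicodedata

STRONG_RTL = {"R", "AL"}  # Bidi_Class values of strong right-to-left characters
STRONG_LTR = {"L"}        # Bidi_Class value of strong left-to-right characters


def split_runs(text, base="ltr"):
    """Split text into (direction, substring) runs.

    Neutral characters (spaces, punctuation) stay with the current run.
    This is NOT the full Unicode Bidirectional Algorithm.
    """
    runs = []
    current = base
    start = 0
    for i, ch in enumerate(text):
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in STRONG_RTL:
            direction = "rtl"
        elif bidi_class in STRONG_LTR:
            direction = "ltr"
        else:
            direction = current
        if direction != current:
            if i != start:
                runs.append((current, text[start:i]))
            current = direction
            start = i
    if start != len(text):
        runs.append((current, text[start:]))
    return runs


# "ABC", three Hebrew letters (alef, bet, gimel), then "DEF".
for direction, run in split_runs("ABC \u05D0\u05D1\u05D2 DEF"):
    print(direction, len(run))
# ltr 4
# rtl 4
# ltr 3
```
| </programlisting> |
| <para> |
| Each resulting run could then be placed in its own buffer with the |
| matching direction set. A complete implementation would also need |
| to split runs on changes of font, script, and language, as |
| described in the list above. |
| </para> |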
| <para> |
| If you are implementing a layout engine, HarfBuzz will help you with the |
| interface between your text and your font, and that's something |
| that you'll need. What you then do with the glyphs that your font |
| returns is up to you. |
| </para> |
| </section> |
| |
| <section id="why-is-it-called-harfbuzz"> |
| <title>Why is it called HarfBuzz?</title> |
| <para> |
| HarfBuzz began its life as text-shaping code within the FreeType |
| project (and you will see references to the FreeType authors |
| within the source code copyright declarations), but was later |
| extracted into its own project. This project is maintained by |
| Behdad Esfahbod, who named it HarfBuzz. Originally, it was a |
| shaping engine for OpenType fonts: "HarfBuzz" is |
| Persian for "open type". |
| </para> |
| </section> |
| </chapter> |