# The WebAssembly shaper

If the standard OpenType shaping engine doesn't give you enough flexibility, Harfbuzz allows you to write your own shaping engine in WebAssembly and embed it into your font! Any font which contains a `Wasm` table will be passed to the WebAssembly shaper.

## What you can and can't do: the WASM shaper's role in shaping

The Harfbuzz shaping engine, unlike its counterparts CoreText and DirectWrite, is only responsible for a small part of the text rendering process. Specifically, Harfbuzz is purely responsible for *shaping*; although Harfbuzz does have APIs for accessing glyph outlines, other libraries in the free software text rendering stack are typically responsible for segmenting text into runs, scaling and rasterizing outlines, setting text on lines, and so on.

Harfbuzz is therefore restricted to turning a buffer of codepoints for a segmented run of the same script, language, font, and variation settings into glyphs and positioning them. This is also all that you can do with the WASM shaper: you can influence the process of mapping a string of characters into an array of glyphs, and you can determine how those glyphs are positioned and their advance widths, but you cannot manipulate outlines, variations, or line breaks, or affect text layout between runs of text that differ in font, variation, language, script or OpenType feature selection.

## The WASM shaper interface

The WASM code inside a font is expected to export a function called `shape` which takes five int32 arguments and returns an int32 status value. (Zero for failure, any other value for success.) The first three arguments are tokens which can be passed to the API functions exported to your WASM code by the host shaping engine; the last two describe the user-selected features:

* A *shape plan* token, which can largely be ignored.
* A *font* token.
* A *buffer* token.
* A *feature* array.
* The number of features.

The general goal of WASM shaping involves receiving and manipulating a *buffer contents* structure, which holds an array of *infos* and an array of *positions* (as defined below). Initially this buffer will represent an input string in Unicode codepoints. By the end of your `shape` function, it should represent a set of glyph IDs and their positions. (User-supplied WASM code will manipulate the buffer through *buffer tokens*; the `buffer_copy_contents` and `buffer_set_contents` API functions, defined below, use these tokens to exchange buffer information with the host shaping engine.)

* The `buffer_contents_t` structure

| type | field | description |
| - | - | - |
| uint32 | length | Number of items (characters or glyphs) in the buffer |
| glyph_info_t | infos | An array of `length` glyph infos |
| glyph_position_t | positions | An array of `length` glyph positions |

* The `glyph_info_t` structure

| type | field | description |
| - | - | - |
| uint32 | codepoint | (On input) A Unicode codepoint. (On output) A glyph ID. |
| uint32 | mask | Unused in WASM; can be user-defined |
| uint32 | cluster | Index of start of this graphical cluster in input string |
| uint32 | var1 | Reserved |
| uint32 | var2 | Reserved |

The `cluster` field is used to map glyphs in the output glyph stream back to characters in the input Unicode sequence for hit testing, cursor positioning, etc. It must be set to a monotonically increasing value across the buffer.

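As a minimal sketch of the cluster rule above, the following helper checks a list of cluster values before they are written back to the host. The function name `clusters_are_monotonic` is hypothetical, and the sketch assumes that "monotonically increasing" permits equal neighbouring values, since several glyphs can belong to the same cluster.

```rust
/// Returns true if cluster values never decrease from one item to the next.
/// Equal neighbours are accepted (an assumption of this sketch), because
/// several glyphs may share a single cluster.
pub fn clusters_are_monotonic(clusters: &[u32]) -> bool {
    clusters.windows(2).all(|pair| pair[0] <= pair[1])
}
```

A shaper could run a check like this in debug builds just before handing the buffer back.
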
* The `glyph_position_t` structure

| type | field | description |
| - | - | - |
| int32 | x_advance | X advance of the glyph |
| int32 | y_advance | Y advance of the glyph |
| int32 | x_offset | X offset of the glyph |
| int32 | y_offset | Y offset of the glyph |
| uint32 | var | Reserved |

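For readers working without a wrapper crate, the two per-item tables above could be mirrored in Rust roughly as follows. This is only a sketch: the type names `GlyphInfo` and `GlyphPosition` are illustrative (they are not claimed to match any existing crate), and the field order is assumed to follow the tables exactly.

```rust
/// Illustrative mirror of `glyph_info_t`: five consecutive uint32 fields.
#[repr(C)]
#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub struct GlyphInfo {
    pub codepoint: u32, // Unicode codepoint on input, glyph ID on output
    pub mask: u32,      // unused in WASM; can be user-defined
    pub cluster: u32,   // index of the start of the cluster in the input
    pub var1: u32,      // reserved
    pub var2: u32,      // reserved
}

/// Illustrative mirror of `glyph_position_t`: four int32 fields and one uint32.
#[repr(C)]
#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub struct GlyphPosition {
    pub x_advance: i32,
    pub y_advance: i32,
    pub x_offset: i32,
    pub y_offset: i32,
    pub var: u32, // reserved
}
```

With `#[repr(C)]`, each structure occupies 20 bytes, so an array of `length` items is a contiguous run of `20 * length` bytes.
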
* The `feature_t` array

To communicate user-selected OpenType features to the user-defined WASM shaper, the host shaping engine passes an array of feature structures:

| type | field | description |
| - | - | - |
| uint32 | tag | Byte-encoded feature tag |
| uint32 | value | Value: 0=off, 1=on, other values used for alternate selection |
| uint32 | start | Index into the input string representing start of the active region for this feature selection (0=start of string) |
| uint32 | end | Index into the input string representing end of the active region for this feature selection (-1, i.e. the largest uint32 value, =end of string) |

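The sketch below shows one way to work with feature records. It makes two assumptions that the table does not spell out: that "byte-encoded" means the four tag bytes are packed big-endian into a uint32 (as HarfBuzz's `HB_TAG` macro does), and that `end` is exclusive (as in HarfBuzz's `hb_feature_t`). The names `Feature`, `tag_from_bytes`, `feature_is_active_at` and `END_OF_STRING` are hypothetical.

```rust
/// The "-1" end marker from the table, seen as an unsigned value.
pub const END_OF_STRING: u32 = u32::MAX;

/// Illustrative mirror of one `feature_t` record.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Feature {
    pub tag: u32,
    pub value: u32,
    pub start: u32,
    pub end: u32,
}

/// Packs a four-byte tag such as b"liga" into a uint32.
/// Assumption: big-endian packing, first character in the top byte.
pub fn tag_from_bytes(bytes: [u8; 4]) -> u32 {
    u32::from_be_bytes(bytes)
}

/// True if the feature is switched on and `index` lies in [start, end).
/// Assumption: `end` is exclusive.
pub fn feature_is_active_at(feature: &Feature, index: u32) -> bool {
    feature.value != 0 && feature.start <= index && index < feature.end
}
```

For example, a feature with `start: 0` and `end: END_OF_STRING` covers the whole run, while `start: 2, end: 5` covers the items at indexes 2, 3 and 4.
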
## API functions available

To assist the shaping code in mapping codepoints to glyphs, the host shaping engine exports the following functions to your WASM code. Note that these are the low level API functions; WASM authors may prefer to use higher-level abstractions around these functions, such as the `harfbuzz-wasm` Rust crate provided by Harfbuzz.

### Sub-shaping

* `shape_with`

```C
bool shape_with(
    uint32 font_token,
    uint32 buffer_token,
    feature_t* features,
    uint32 num_features,
    char* shaper
)
```

Run another shaping engine's shaping process on the given font and buffer. The only shaping engine guaranteed to be available is `ot`, the OpenType shaper, but others may also be available. This allows the WASM author to process a buffer "normally", before further manipulating it.

### Buffer access

* `buffer_copy_contents`

```C
bool buffer_copy_contents(
    uint32 buffer_token,
    buffer_contents_t* buffer_contents
)
```

Retrieves the contents of the host shaping engine's buffer into the `buffer_contents` structure. This should typically be called at the beginning of shaping.

* `buffer_set_contents`

```C
bool buffer_set_contents(
    uint32 buffer_token,
    buffer_contents_t* buffer_contents
)
```

Copy the `buffer_contents` structure back into the host shaping engine's buffer. This should typically be called at the end of shaping.

* `buffer_contents_free`

```C
bool buffer_contents_free(buffer_contents_t* buffer_contents)
```

Releases the memory taken up by the buffer contents structure.

* `buffer_contents_realloc`

```C
bool buffer_contents_realloc(
    buffer_contents_t* buffer_contents,
    uint32 size
)
```

Requests that the buffer contents structure be resized to the given size.

* `buffer_get_direction`

```C
uint32 buffer_get_direction(uint32 buffer_token)
```

Returns the buffer's direction:

* 0 = invalid
* 4 = left to right
* 5 = right to left
* 6 = top to bottom
* 7 = bottom to top

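The numeric direction codes listed above can be wrapped in a small enum so that shaper code does not compare against bare integers. This is a sketch only: the `Direction` type and its methods are hypothetical names, and mapping every unlisted value to `Invalid` is a choice made here, not something the API specifies.

```rust
/// Illustrative decoding of the value returned by `buffer_get_direction`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Direction {
    Invalid,
    LeftToRight,
    RightToLeft,
    TopToBottom,
    BottomToTop,
}

impl Direction {
    /// Maps the raw code to a variant; unlisted codes become `Invalid`
    /// (an assumption of this sketch).
    pub fn from_raw(raw: u32) -> Direction {
        match raw {
            4 => Direction::LeftToRight,
            5 => Direction::RightToLeft,
            6 => Direction::TopToBottom,
            7 => Direction::BottomToTop,
            _ => Direction::Invalid,
        }
    }

    /// True for the two horizontal directions.
    pub fn is_horizontal(self) -> bool {
        matches!(self, Direction::LeftToRight | Direction::RightToLeft)
    }
}
```
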
* `buffer_get_script`

```C
uint32 buffer_get_script(uint32 buffer_token)
```

Returns the byte-encoded OpenType script tag of the buffer.

* `buffer_reverse`

```C
void buffer_reverse(uint32 buffer_token)
```

Reverses the order of items in the buffer.

* `buffer_reverse_clusters`

```C
void buffer_reverse_clusters(uint32 buffer_token)
```

Reverses the order of items in the buffer while keeping items of the same cluster together.

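To illustrate the difference between the two reversal functions, here is a host-independent sketch of cluster-preserving reversal on plain data. Items are modelled as hypothetical `(glyph_id, cluster)` pairs, and the sketch assumes that a cluster is a run of consecutive items with the same cluster value; it does not call the host API.

```rust
/// Reverses the order of clusters while keeping the items inside each
/// cluster in their original order. Items are (glyph_id, cluster) pairs.
pub fn reverse_clusters(items: &[(u32, u32)]) -> Vec<(u32, u32)> {
    // Split the slice into runs of consecutive items sharing a cluster value.
    let mut groups: Vec<&[(u32, u32)]> = Vec::new();
    let mut start = 0;
    for i in 1..=items.len() {
        if i == items.len() || items[i].1 != items[start].1 {
            groups.push(&items[start..i]);
            start = i;
        }
    }
    // Emit the runs in reverse order, each run left intact.
    groups.into_iter().rev().flatten().copied().collect()
}
```

A plain reversal of `[(10, 0), (11, 0), (12, 1)]` would put glyph 11 before glyph 10; the cluster-preserving version moves glyph 12 to the front but keeps 10 ahead of 11.
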
## Font handling functions

(In the following functions, a *font* is a specific instantiation of a *face* at a particular scale factor and variation position.)

* `font_create`

```C
uint32 font_create(uint32 face_token)
```

Returns a new *font token* from the given *face token*.

* `font_get_face`

```C
uint32 font_get_face(uint32 font_token)
```

Creates a new *face token* from the given *font token*.

* `font_get_scale`

```C
void font_get_scale(
    uint32 font_token,
    int32* x_scale,
    int32* y_scale
)
```

Writes the scale of the given font into `x_scale` and `y_scale`.

* `font_get_glyph`

```C
uint32 font_get_glyph(
    uint32 font_token,
    uint32 codepoint,
    uint32 variation_selector
)
```

Returns the nominal glyph ID for the given codepoint, using the `cmap` table of the font to map Unicode codepoint (and variation selector) to glyph ID.

* `font_get_glyph_h_advance`/`font_get_glyph_v_advance`

```C
uint32 font_get_glyph_h_advance(uint32 font_token, uint32 glyph_id)
uint32 font_get_glyph_v_advance(uint32 font_token, uint32 glyph_id)
```

Returns the default horizontal and vertical advance respectively for the given glyph ID at the current scale and variation settings.

* `font_get_glyph_extents`

```C
typedef struct
{
    uint32 x_bearing;
    uint32 y_bearing;
    uint32 width;
    uint32 height;
} glyph_extents_t;

bool font_get_glyph_extents(
    uint32 font_token,
    uint32 glyph_id,
    glyph_extents_t* extents
)
```

Returns the glyph's extents for the given glyph ID at the current scale and variation settings.

* `font_glyph_to_string`

```C
void font_glyph_to_string(
    uint32 font_token,
    uint32 glyph_id,
    char* string,
    uint32 size
)
```

Copies the name of the given glyph, or, if no name is available, a string of the form `gXXXX` into the given string.

* `font_copy_glyph_outline`

```C
typedef struct
{
    float x;
    float y;
    uint32_t type;
} glyph_outline_point_t;

typedef struct
{
    uint32_t n_points;
    glyph_outline_point_t* points;
    uint32_t n_contours;
    uint32_t* contours;
} glyph_outline_t;

bool font_copy_glyph_outline(
    uint32 font_token,
    uint32 glyph_id,
    glyph_outline_t* outline
);
```

Copies the outline of the given glyph ID, at current scale and variation settings, into the outline structure provided. The outline structure returns an array of points (specifying coordinates and whether the point is oncurve or offcurve) and an array of indexes into the points array representing the end of each contour, similar to the `glyf` table structure.

* `font_copy_coords`/`font_set_coords`

```C
typedef struct
{
    uint32 length;
    int32* coords;
} coords_t;

bool font_copy_coords(uint32 font_token, coords_t* coords);
bool font_set_coords(uint32 font_token, coords_t* coords);
```

`font_copy_coords` copies the font's variation coordinates into the given structure; the resulting structure has `length` equal to the number of variation axes, with each member of the `coords` array being an F2DOT14 encoding of the normalized variation value.

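F2DOT14 is a fixed-point format with 14 fractional bits, so a raw value of 16384 represents 1.0. A minimal sketch of converting between the raw `coords` members and floating-point normalized values follows; the function names are hypothetical, and clamping to the normalized range of -1.0 to 1.0 is a choice made in this sketch.

```rust
/// Converts a raw F2DOT14 coordinate to a float (16384 represents 1.0).
pub fn f2dot14_to_f32(raw: i32) -> f32 {
    raw as f32 / 16384.0
}

/// Converts a normalized value to raw F2DOT14, clamping it to the
/// normalized range [-1.0, 1.0] first (a choice of this sketch).
pub fn f32_to_f2dot14(value: f32) -> i32 {
    (value.clamp(-1.0, 1.0) * 16384.0).round() as i32
}
```

For instance, a raw coordinate of 8192 corresponds to a position halfway between an axis's default and its maximum.
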
`font_set_coords` sets the font's variation coordinates. Because the WASM shaper is only responsible for shaping and positioning, not outline drawing, the user should *not* expect this to affect the rendered outlines; the function is only useful in very limited circumstances, such as when instantiating a second variable font and sub-shaping a buffer using this new font.

## Face handling functions

* `face_create`

```C
typedef struct
{
    uint32_t length;
    char* data;
} blob_t;

uint32 face_create(blob_t* blob)
```

Creates a new *face token* from the given binary data.

* `face_copy_table`

```C
void face_copy_table(uint32 face_token, uint32 tag, blob_t* blob)
```

Copies the binary data in the OpenType table referenced by `tag` into the supplied `blob` structure.

* `face_get_upem`

```C
uint32 face_get_upem(uint32 face_token)
```

Returns the units-per-em of the font face.

### Other functions

* `blob_free`

```C
void blob_free(blob_t* blob)
```

Frees the memory allocated to a blob structure.

* `glyph_outline_free`

```C
void glyph_outline_free(glyph_outline_t* glyph_outline)
```

Frees the memory allocated to a glyph outline structure.

* `script_get_horizontal_direction`

```C
uint32 script_get_horizontal_direction(uint32 tag)
```

Returns the horizontal direction for the given ISO 15924 script tag. For return values, see `buffer_get_direction` above.

* `debugprint` / `debugprint1` ... `debugprint4`

```C
void debugprint(char* str)
void debugprint1(char* str, int32 arg1)
void debugprint2(char* str, int32 arg1, int32 arg2)
void debugprint3(char* str, int32 arg1, int32 arg2, int32 arg3)
void debugprint4(
    char* str,
    int32 arg1,
    int32 arg2,
    int32 arg3,
    int32 arg4
)
```

Produces a debugging message in the host shaper's log output; the variants `debugprint1` ... `debugprint4` append a comma-separated list of the integer arguments to the message.

## Enabling the WASM shaper when building Harfbuzz

First, you will need the `wasm-micro-runtime` library installed on your computer. Download `wasm-micro-runtime` from [its GitHub repository](https://github.com/bytecodealliance/wasm-micro-runtime/tree/main); then follow [the instructions for building](https://github.com/bytecodealliance/wasm-micro-runtime/blob/main/product-mini/README.md), except run the cmake command from the repository root directory and add the `-DWAMR_BUILD_REF_TYPES=1` flag to the `cmake` line. (You may want to enable "fast JIT".) Then, install it.

So, for example:

```
$ cmake -B build -DWAMR_BUILD_REF_TYPES=1 -DWAMR_BUILD_FAST_JIT=1
$ cmake --build build --parallel
$ sudo cmake --build build --target install
```

(If you don't want to install `wasm-micro-runtime` globally, you can copy `libiwasm.*` and `libvmlib.a` into a directory that your compiler can see when building Harfbuzz.)

Once `wasm-micro-runtime` is installed, to enable the WASM shaper, you need to add the string `-Dwasm=enabled` to your meson build line. For example:

```
$ meson setup build -Dwasm=enabled
...
  Additional shapers
    Graphite2                 : NO
    WebAssembly (experimental): YES
...
$ meson compile -C build
```

## How to write a shaping engine in Rust

You may write shaping engines in any language that compiles to WASM, by conforming to the API described above, but Rust is particularly easy, and we provide a high-level interface wrapper (the `harfbuzz-wasm` crate) which makes the process easier. Here are the steps to create an example shaping engine in Rust. (These examples can also be found in `src/wasm/sample/rust`.)

* First, install wasm-pack, which helps us to generate optimized WASM files. It writes some Javascript bridge code that we don't need, but it makes the build and deployment process much easier:

```
$ cargo install wasm-pack
```

* Now let's create a new library:

```
$ cargo new --lib hello-wasm
```

* We need the target to be a dynamic library, and we're going to use `wasm-bindgen` to export our Rust function to WASM, so let's put these lines in the `Cargo.toml`. The Harfbuzz sources contain a Rust crate which makes it easy to create the shaper, so we'll specify that as a dependency as well:

```toml
[lib]
crate-type = ["cdylib"]
[dependencies]
wasm-bindgen = "0.2"
harfbuzz-wasm = { path = "your-harfbuzz-source/src/wasm/rust/harfbuzz-wasm"}
```

* And now we'll create our shaper code. In `src/lib.rs`:

```rust
use wasm_bindgen::prelude::*;

#[wasm_bindgen]
pub fn shape(_shape_plan: u32, _font_ref: u32, _buf_ref: u32, _features: u32, _num_features: u32) -> i32 {
    1 // success!
}
```

This exports a shaping function which takes five arguments (tokens representing the shape plan, the font and the buffer, followed by the feature array and the number of features) and returns a status value. We can pass these tokens back to Harfbuzz in order to use its native functions on the font and buffer objects. More on native functions later - let's get this shaper compiled and added into a font:

* To compile the shaper, run `wasm-pack build --target nodejs`:

```
[INFO]: 🎯  Checking for the Wasm target...
[INFO]: 🌀  Compiling to Wasm...
   Compiling hello-wasm v0.1.0 (...)
    Finished release [optimized] target(s) in 0.20s
[WARN]: ⚠️   origin crate has no README
[INFO]: ⬇️  Installing wasm-bindgen...
[INFO]: Optimizing wasm binaries with `wasm-opt`...
[INFO]: Optional fields missing from Cargo.toml: 'description', 'repository', and 'license'. These are not necessary, but recommended
[INFO]: ✨   Done in 0.40s
```

You'll find the output WASM file in `pkg/hello_wasm_bg.wasm`.

* Now we need to get it into a font.

We provide a utility to do this called `addTable.py` in the `src/` directory:

```
$ python3 ~/harfbuzz/src/addTable.py test.ttf test-wasm.ttf pkg/hello_wasm_bg.wasm
```

And now we can run it!

```
$ hb-shape test-wasm.ttf abc --shapers=wasm
[cent=0|sterling=1|fraction=2]
```

(The `--shapers=wasm` isn't necessary, as any font with a `Wasm` table will be sent to the WASM shaper if it's enabled, but it proves the point.)

Congratulations! Our shaper did nothing, but in Rust! Now let's do something - it's time for the Hello World of WASM shaping.

* To say hello world, we're going to have to use a native function.

In debugging builds of Harfbuzz, we can print some output from the WebAssembly module to the host's standard output using the `debug` function. To make this easier, we've got the `harfbuzz-wasm` crate:

```rust
use harfbuzz_wasm::debug;
use wasm_bindgen::prelude::*;

#[wasm_bindgen]
pub fn shape(_shape_plan: u32, _font_ref: u32, _buf_ref: u32, _features: u32, _num_features: u32) -> i32 {
    debug("Hello from Rust!\n");
    1
}
```

With this compiled into a WASM module, and installed into our font again, finally our fonts can talk to us!

```
$ hb-shape test-wasm.ttf abc
Hello from Rust!
[cent=0|sterling=1|fraction=2]
```

Now let's start to do some actual, you know, *shaping*. The first thing a shaping engine normally does is (a) map the items in the buffer from Unicode codepoints into glyphs in the font, and (b) set the advance width of the buffer items to the default advance width for those glyphs. We're going to need to interrogate the font for this information, and write back to the buffer. Harfbuzz provides us with opaque tokens for the font and buffer, but we can turn those into useful Rust structures using the `harfbuzz-wasm` crate again:

```rust
use wasm_bindgen::prelude::*;
use harfbuzz_wasm::{Font, GlyphBuffer};

#[wasm_bindgen]
pub fn shape(_shape_plan: u32, font_ref: u32, buf_ref: u32, _features: u32, _num_features: u32) -> i32 {
    let font = Font::from_ref(font_ref);
    let mut buffer = GlyphBuffer::from_ref(buf_ref);
    for item in buffer.glyphs.iter_mut() {
        // Map character to glyph
        item.codepoint = font.get_glyph(item.codepoint, 0);
        // Set advance width
        item.x_advance = font.get_glyph_h_advance(item.codepoint);
    }
    1
}
```

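To experiment with the map-then-advance loop above outside a WASM host, the font can be modelled with plain lookup tables. Everything in this sketch is hypothetical: `MockFont`, `Item` and `shape_nominal` are illustrative stand-ins, not part of the `harfbuzz-wasm` crate. It also assumes that unmapped codepoints fall back to glyph 0 and, as a simplification, uses the character index as the cluster value.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a font: a cmap and an advance table.
pub struct MockFont {
    pub cmap: HashMap<u32, u32>,     // codepoint -> glyph ID
    pub advances: HashMap<u32, i32>, // glyph ID -> horizontal advance
}

impl MockFont {
    /// Nominal glyph lookup; unmapped codepoints fall back to glyph 0.
    pub fn get_glyph(&self, codepoint: u32) -> u32 {
        self.cmap.get(&codepoint).copied().unwrap_or(0)
    }

    /// Default advance; glyphs missing from the table get 0 in this sketch.
    pub fn get_glyph_h_advance(&self, glyph: u32) -> i32 {
        self.advances.get(&glyph).copied().unwrap_or(0)
    }
}

/// One shaped item: glyph ID, cluster and horizontal advance.
#[derive(Debug, PartialEq)]
pub struct Item {
    pub codepoint: u32,
    pub cluster: u32,
    pub x_advance: i32,
}

/// Maps each character to its nominal glyph and default advance.
pub fn shape_nominal(font: &MockFont, text: &str) -> Vec<Item> {
    text.chars()
        .enumerate()
        .map(|(index, ch)| {
            let glyph = font.get_glyph(ch as u32);
            Item {
                codepoint: glyph,
                cluster: index as u32,
                x_advance: font.get_glyph_h_advance(glyph),
            }
        })
        .collect()
}
```

Keeping the shaping logic in a function over plain data like this makes it possible to unit-test it natively and call it from the exported `shape` function afterwards.
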
The `GlyphBuffer`, unlike in Harfbuzz, combines positioning and information in a single structure, to save you having to zip and unzip all the time. It also takes care of marshalling the buffer back to Harfbuzz-land; when a `GlyphBuffer` is dropped, it writes its contents back through the reference into Harfbuzz's address space. (If you want a different representation of buffer items, you can have one: `GlyphBuffer` is implemented as a `Buffer<Glyph>`, and if you make your own struct which implements the `BufferItem` trait, you can make a buffer out of that instead.)

One easy way to write your own shapers is to make use of OpenType shaping for the majority of your shaping work, and then make changes to the pre-shaped buffer afterwards. You can do this using the `Font.shape_with` method. Run this on a buffer reference, and then construct your `GlyphBuffer` object afterwards:

```rust
use harfbuzz_wasm::{Font, GlyphBuffer};
use tiny_rng::{Rand, Rng};
use wasm_bindgen::prelude::*;

#[wasm_bindgen]
pub fn shape(_shape_plan: u32, font_ref: u32, buf_ref: u32, _features: u32, _num_features: u32) -> i32 {
    let mut rng = Rng::from_seed(123456);

    // Use the default OpenType shaper
    let font = Font::from_ref(font_ref);
    font.shape_with(buf_ref, "ot");

    // Now we have a buffer with glyph ids, advance widths etc.
    // already filled in.
    let mut buffer = GlyphBuffer::from_ref(buf_ref);
    for item in buffer.glyphs.iter_mut() {
        // Randomize it!
        item.x_offset = ((rng.rand_u32() as i32) >> 24) - 120;
        item.y_offset = ((rng.rand_u32() as i32) >> 24) - 120;
    }

    1
}
```

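The offset arithmetic in the example above is worth unpacking. Casting the random `u32` to `i32` and shifting right by 24 bits is an arithmetic (sign-preserving) shift, which leaves a value from -128 to 127; subtracting 120 then gives offsets from -248 to 7 font units. The sketch below isolates that expression in a hypothetical helper called `jitter` so the range can be checked natively.

```rust
/// The offset expression from the example above, isolated for testing.
/// `raw as i32` reinterprets the bits; `>> 24` on an i32 is an arithmetic
/// shift, so the intermediate value lies in -128..=127.
pub fn jitter(raw: u32) -> i32 {
    ((raw as i32) >> 24) - 120
}
```

Because the result is skewed towards negative values, glyphs drift mostly left and down; subtracting a smaller constant would centre the jitter around zero.
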
See the documentation for the `harfbuzz-wasm` crate for all the other