.. Blame: Andrew Hsieh, 9a7616f, 2013-05-21 20:32:42 +0800

:mod:`stringprep` --- Internet String Preparation
=================================================

.. module:: stringprep
   :synopsis: String preparation, as per RFC 3454
   :deprecated:
.. moduleauthor:: Martin v. Löwis <martin@v.loewis.de>
.. sectionauthor:: Martin v. Löwis <martin@v.loewis.de>


.. versionadded:: 2.3

When identifying things (such as host names) on the Internet, it is often
necessary to compare such identifications for "equality". Exactly how this
comparison is performed may depend on the application domain, e.g. whether it
should be case-insensitive or not. It may also be necessary to restrict the
possible identifications, for example to allow only identifications consisting
of "printable" characters.

:rfc:`3454` defines a procedure for "preparing" Unicode strings in internet
protocols. Before strings are passed onto the wire, they are processed with the
preparation procedure, after which they have a certain normalized form. The RFC
defines a set of tables, which can be combined into profiles. Each profile must
define which tables it uses, and which other optional parts of the
``stringprep`` procedure are part of the profile. One example of a
``stringprep`` profile is ``nameprep``, which is used for internationalized
domain names.

The module :mod:`stringprep` only exposes the tables from :rfc:`3454`. As these
tables would be very large if represented as dictionaries or lists, the
module uses the Unicode character database internally. The module source code
itself was generated using the ``mkstringprep.py`` utility.

As a result, these tables are exposed as functions, not as data structures.
There are two kinds of tables in the RFC: sets and mappings. For a set,
:mod:`stringprep` provides the "characteristic function", i.e. a function that
returns true if the parameter is part of the set. For a mapping, it provides the
mapping function: given the key, it returns the associated value. Below is a
list of all functions available in the module.
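
As a minimal illustration of the two kinds of functions, the sketch below
queries one set table and one mapping table. The sample characters were chosen
for this example; each argument is a single-character Unicode string.

```python
import stringprep

# Set table: the characteristic function tells whether a character is a member.
# U+00AD (SOFT HYPHEN) is listed in table B.1; the letter "a" is not.
soft_hyphen_in_b1 = stringprep.in_table_b1(u"\u00ad")
letter_in_b1 = stringprep.in_table_b1(u"a")

# Mapping table: the function returns the value associated with the key.
folded = stringprep.map_table_b2(u"A")

print(soft_hyphen_in_b1)  # True
print(letter_in_b1)       # False
print(folded)             # a
```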


.. function:: in_table_a1(code)

   Determine whether *code* is in table A.1 (Unassigned code points in Unicode 3.2).


.. function:: in_table_b1(code)

   Determine whether *code* is in table B.1 (Commonly mapped to nothing).


.. function:: map_table_b2(code)

   Return the mapped value for *code* according to table B.2 (Mapping for
   case-folding used with NFKC).


.. function:: map_table_b3(code)

   Return the mapped value for *code* according to table B.3 (Mapping for
   case-folding used with no normalization).


.. function:: in_table_c11(code)

   Determine whether *code* is in table C.1.1 (ASCII space characters).


.. function:: in_table_c12(code)

   Determine whether *code* is in table C.1.2 (Non-ASCII space characters).


.. function:: in_table_c11_c12(code)

   Determine whether *code* is in table C.1 (Space characters, union of C.1.1 and
   C.1.2).
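
One possible use of the space tables is to replace every non-ASCII space with
U+0020. The helper ``normalize_spaces`` below is a hypothetical name used only
in this sketch; whether such a replacement is appropriate depends on the
profile being implemented.

```python
import stringprep


def normalize_spaces(text):
    """Replace characters from table C.1.2 with an ASCII space.

    Hypothetical helper for illustration; it is not part of stringprep.
    """
    return u"".join(
        u" " if stringprep.in_table_c12(ch) else ch for ch in text
    )


# U+00A0 (NO-BREAK SPACE) and U+3000 (IDEOGRAPHIC SPACE) are non-ASCII spaces.
cleaned = normalize_spaces(u"a\u00a0b\u3000c")
print(cleaned)  # a b c
```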


.. function:: in_table_c21(code)

   Determine whether *code* is in table C.2.1 (ASCII control characters).


.. function:: in_table_c22(code)

   Determine whether *code* is in table C.2.2 (Non-ASCII control characters).


.. function:: in_table_c21_c22(code)

   Determine whether *code* is in table C.2 (Control characters, union of C.2.1 and
   C.2.2).
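
The control-character tables can be used to locate characters that a profile
would reject. The sketch below reports the positions of such characters;
``find_control_positions`` is a hypothetical helper introduced for this
example.

```python
import stringprep


def find_control_positions(text):
    """Return the indexes of all characters of text found in table C.2.

    Hypothetical helper for illustration; it is not part of stringprep.
    """
    return [
        index
        for index, ch in enumerate(text)
        if stringprep.in_table_c21_c22(ch)
    ]


# U+0000 is an ASCII control character (C.2.1);
# U+200D (ZERO WIDTH JOINER) is a non-ASCII control character (C.2.2).
positions = find_control_positions(u"ab\u0000c\u200d")
print(positions)  # [2, 4]
```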


.. function:: in_table_c3(code)

   Determine whether *code* is in table C.3 (Private use).


.. function:: in_table_c4(code)

   Determine whether *code* is in table C.4 (Non-character code points).


.. function:: in_table_c5(code)

   Determine whether *code* is in table C.5 (Surrogate codes).


.. function:: in_table_c6(code)

   Determine whether *code* is in table C.6 (Inappropriate for plain text).


.. function:: in_table_c7(code)

   Determine whether *code* is in table C.7 (Inappropriate for canonical
   representation).


.. function:: in_table_c8(code)

   Determine whether *code* is in table C.8 (Change display properties or are
   deprecated).


.. function:: in_table_c9(code)

   Determine whether *code* is in table C.9 (Tagging characters).


.. function:: in_table_d1(code)

   Determine whether *code* is in table D.1 (Characters with bidirectional property
   "R" or "AL").


.. function:: in_table_d2(code)

   Determine whether *code* is in table D.2 (Characters with bidirectional property
   "L").

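
To show how the table functions might be combined into a profile, here is a
rough, nameprep-like sketch. The function name ``prepare``, the choice of
tables, and the use of :exc:`ValueError` are assumptions made for this example;
it also normalizes with the current Unicode database rather than Unicode 3.2,
so it should not be taken as a conforming implementation of any profile.

```python
import stringprep
import unicodedata


def prepare(label):
    """Hypothetical nameprep-like profile built from the table functions."""
    # Map: drop characters from table B.1, case-fold the rest with table B.2.
    mapped = u"".join(
        stringprep.map_table_b2(ch)
        for ch in label
        if not stringprep.in_table_b1(ch)
    )

    # Normalize with NFKC. Assumption: the current Unicode database is
    # used here instead of the Unicode 3.2 data that RFC 3454 refers to.
    normalized = unicodedata.normalize("NFKC", mapped)

    # Prohibit: reject any character found in the selected C tables.
    prohibited = (
        stringprep.in_table_c12,
        stringprep.in_table_c22,
        stringprep.in_table_c3,
        stringprep.in_table_c4,
        stringprep.in_table_c5,
        stringprep.in_table_c6,
        stringprep.in_table_c7,
        stringprep.in_table_c8,
        stringprep.in_table_c9,
    )
    for ch in normalized:
        if any(check(ch) for check in prohibited):
            raise ValueError("prohibited character %r" % ch)

    # Bidi: a string with R/AL characters must not contain L characters,
    # and must both start and end with an R/AL character.
    rand_al = [stringprep.in_table_d1(ch) for ch in normalized]
    if any(rand_al):
        if any(stringprep.in_table_d2(ch) for ch in normalized):
            raise ValueError("mixed right-to-left and left-to-right text")
        if not (rand_al[0] and rand_al[-1]):
            raise ValueError("right-to-left text must start and end with R/AL")
    return normalized


print(prepare(u"Ex\u00adAMPLE"))  # example
```

A private-use character such as U+E000 (table C.3), or a Hebrew letter
followed by a Latin letter, makes this sketch raise :exc:`ValueError`.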