No. | Date | Rel. Note | Data | Charts | Spec | Delta | GitHub Tag | Delta DTD | CLDR JSON |
---|---|---|---|---|---|---|---|---|---|
46 | 2024-10- | Charts46 | LDML46 | Δ46 | ΔDtd46 | 46.0.0-BETA2 |
This is a beta version of CLDR v46.
The data is available at release-46-beta2, and the specification is available at tr35/proposed.html. Feedback is welcome via tickets. (The CLDR site is undergoing a migration to Markdown, so the UI for navigation is temporary.)
Unicode CLDR provides key building blocks for software supporting the world's languages. CLDR data is used by all major software systems (including all mobile phones) for their software internationalization and localization, adapting software to the conventions of different languages.
The most significant changes in this release were:
For more details, see below.
Count | Level | Usage | Examples |
---|---|---|---|
97 | Modern | Suitable for full UI internationalization | čeština, Ελληνικά, Беларуская, ᏣᎳᎩ, Ქართული, Հայերեն, עברית, اردو, አማርኛ, नेपाली, অসমীয়া, বাংলা, ਪੰਜਾਬੀ, ગુજરાતી, ଓଡ଼ିଆ, தமிழ், తెలుగు, ಕನ್ನಡ, മലയാളം, සිංහල, ไทย, ລາວ, မြန်မာ, ខ្មែរ, 한국어, 中文, 日本語, … |
16 | Moderate | Suitable for “document content” internationalization, e.g., in a spreadsheet | Akan, Balóchi [Látin], brezhoneg, Cebuano, føroyskt, IsiXhosa, Māori, sardu, veneto, Wolof, татар, тоҷикӣ, कांगड़ी, … |
55 | Basic | Suitable for locale selection, e.g., choice of language on a mobile phone | Basa Sunda, emakhuwa, Esperanto, eʋegbe, Frysk, Malti, босански (ћирилица), କୁୱି (ଅଡ଼ିଆ), కువి (తెలుగు), ᱥᱟᱱᱛᱟᱲᱤ, ᓀᐦᐃᓇᐍᐏᐣ, ꆈꌠꉙ, … |
± | New Level | Locales |
---|---|---|
📈 | Modern | Nigerian Pidgin, Tigrinya |
📈 | Moderate | Akan, Baluchi (Latin), Kangri, Tajik, Tatar, Wolof |
📈 | Basic | Ewe, Ga, Kinyarwanda, Konkani (Latin), Northern Sotho, Oromo, Sichuan Yi, Southern Sotho, Tswana |
📉 | Basic* | Chuvash, Anii |
* Note: Each release, the number of items needed for Modern and Moderate increases. So locales without active contributors may drop down in coverage level.
For a full listing, see Coverage Levels.
The following are the most significant changes to the specification (LDML).
There are many more changes that are important to implementations, such as changes to certain identifier syntax and various algorithms. See the Modifications section of the specification for details.
- `alt='official'` to represent cases where an official value differs from the customary value. Currently added for a small number of language names, decimal separators, and grouping separators.

For a full listing, see Delta DTDs.
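
As a rough illustration of how a consumer might choose between a customary value and its `alt='official'` counterpart, here is a minimal Python sketch. The XML fragment and the `pick_value` helper are hypothetical and exist only for this example; the values are not real CLDR data, and real LDML files carry more structure and inheritance than is shown here.

```python
import xml.etree.ElementTree as ET

# Hypothetical LDML-like fragment; the values are illustrative, not real CLDR data.
SAMPLE = """
<symbols numberSystem="latn">
  <decimal>,</decimal>
  <decimal alt="official">.</decimal>
  <group>.</group>
</symbols>
"""

def pick_value(parent, tag, prefer_official=False):
    """Return the text of `tag`, optionally preferring the alt='official' variant."""
    customary = None
    official = None
    for elem in parent.findall(tag):
        alt = elem.get("alt")
        if alt is None:
            customary = elem.text
        elif alt == "official":
            official = elem.text
    # Fall back to the customary value when no official variant exists.
    if prefer_official and official is not None:
        return official
    return customary

symbols = ET.fromstring(SAMPLE)
print(pick_value(symbols, "decimal"))                        # customary value
print(pick_value(symbols, "decimal", prefer_official=True))  # official value
print(pick_value(symbols, "group", prefer_official=True))    # no official variant, so customary
```

The fallback to the customary value is a design choice of this sketch: most items have no `alt='official'` variant, so a caller asking for official values still gets a usable result.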
- `ZWG` added; because it was late in the cycle, many locales will support only the code (no symbol or name).
- `iso8601`. This is not the same as the ISO 8601 standard format, which is designed just for data interchange: it is all ASCII, doesn't have all the options for fields (like “Sunday”, “BC”, or “AM”), and does not contain spaces. The CLDR `iso8601` calendar uses patterns in the order: era, year, month, day, day-of-week, hour, minute, second, day-period, timezone.
- `CST6CDT → America/Chicago`, `EST → America/Panama`, `EST5EDT → America/New_York`, `MST7MDT → America/Denver`, `PST8PDT → America/Los_Angeles`.
- `portion-per-1e9` (aka per-billion), `night` (for hotel stays), `light-speed` (as an internal prefix for light-second, light-minute, etc.)
- `meter-per-second`. More preference changes are planned for the next release.
- `acy_Grek → acy_Grek_CY` is unnecessary, because the mapping `acy → acy_Latn_CY` is sufficient. For the reason why, see the algorithm in Likely Subtags.
- `ZZ`, and `001` is limited to artificial languages such as Interlingua. The only other macroregion code is in `und_419 → es_Latn_419` (Spanish‧Latin‧Latin America).
- `desired="uk" → supported="ru"` (so that Ukrainian (`uk`) doesn't fall back to Russian (`ru`)).
- `desired="scn" → supported="it"` (Sicilian → Italian).
- … (`gom`) to Konkani (`kok`).
- `Han → Latn`, reflecting new data in Unicode 16.0.
- … (`kaa`) and Konkani (`kok`); defaultContent mappings have been added for Kazakh (`kk`), Ladin (`lld`), Latgalian (`ltg`), Mócheno (`mhn`), and Chinese (Latin, China) (`zh_Latn_CN`).
- `AE`
- … (`en_HK`, `en_MY`, `en_IL`)

For a full listing, see BCP47 Delta and Supplemental Delta.
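
To see why an `acy_Grek → acy_Grek_CY` mapping is redundant once `acy → acy_Latn_CY` exists, the following Python sketch may help. It is a simplified approximation of the Likely Subtags lookup, not the normative algorithm: the `LIKELY` table holds only the one entry mentioned above, and the `parse` and `add_likely_subtags` helpers, along with the exact lookup order, are assumptions made for this illustration.

```python
# Illustrative subset of likely-subtag data; only the `acy` entry comes from the text above.
LIKELY = {"acy": "acy_Latn_CY"}

def parse(tag):
    """Split a simple language[_Script][_REGION] tag into its three fields."""
    parts = tag.split("_")
    lang, script, region = parts[0], "", ""
    for part in parts[1:]:
        if len(part) == 4 and part.isalpha():
            script = part   # four letters: treat as a script code
        else:
            region = part   # otherwise: treat as a region code
    return lang, script, region

def add_likely_subtags(tag):
    """Fill in missing script/region fields from the first matching table entry."""
    lang, script, region = parse(tag)
    # Try progressively less specific keys (simplified lookup order).
    candidates = [
        "_".join(p for p in (lang, script, region) if p),
        "_".join(p for p in (lang, region) if p),
        "_".join(p for p in (lang, script) if p),
        lang,
    ]
    for key in candidates:
        if key in LIKELY:
            m_lang, m_script, m_region = parse(LIKELY[key])
            # Only empty fields are filled in from the match; given fields are kept.
            return "_".join((lang or m_lang, script or m_script, region or m_region))
    return tag

print(add_likely_subtags("acy"))       # acy_Latn_CY
print(add_likely_subtags("acy_Grek"))  # acy_Grek_CY
```

Because the explicitly supplied script `Grek` is preserved and only the empty region is taken from the `acy` entry, the result `acy_Grek_CY` falls out without a dedicated mapping.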
For a full listing, see Delta Data.
The CLDR Technical Committee decided to continue the tech preview phase for Message Format in version 46. The plan is to have a final version of the specification in a 46.1 release before the end of 2024.
The most significant changes since v45 were:
Implementers should be aware of the following normative changes after the start of the tech preview period.
- `name` and `literal` values, including requiring keys to use NFC
- `bad-option` error for bad digit size options in `:number` and `:integer` functions
- `:number` and `:integer` functions
- `:date` and `:datetime` date formatting from `short` to `medium`
- `variable`
- `:test:function`, `:test:select` and `:test:format` functions for implementation testing

In addition to the above, the test suite has been significantly modified and updated. Updated tech preview implementations will be available in ICU (Java and C++) and in JavaScript.
The usage model for emoji search keywords is that
In this release WhatsApp emoji search keyword data has been incorporated. In the process of doing that, the maximum number of search keywords per emoji has been increased, and the keywords have been simplified in most locales by breaking up multi-word keywords. An example would be white flag (🏳️), formerly having 3 keyword phrases of [white waving flag | white flag | waving flag], now being replaced by the simpler 3 single keywords [white | waving | flag]. The simpler version typically works as well or better in practice.
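
The white flag example above can be reproduced with a short Python sketch. This is only an illustration of the idea of breaking up multi-word keywords; the `simplify_keywords` helper is hypothetical and is not the procedure CLDR actually used to produce its data.

```python
def simplify_keywords(phrases):
    """Break multi-word keyword phrases into unique single words, keeping first-seen order."""
    seen = set()
    result = []
    for phrase in phrases:
        for word in phrase.split():
            if word not in seen:
                seen.add(word)
                result.append(word)
    return result

print(simplify_keywords(["white waving flag", "white flag", "waving flag"]))
# ['white', 'waving', 'flag']
```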
There are two significant changes to the CLDR root collation (CLDR default sort order).
The DUCET is the Unicode Collation Algorithm default sort order. The CLDR root collation is a tailoring of the DUCET. These sort orders have differed in the relative order of groups of characters including extenders, currency symbols, and non-decimal-digit numeric characters.
Starting with CLDR 46 and Unicode 16.0, the order of these groups is the same. In both sort orders, non-decimal-digit numeric characters now sort after decimal digits, and the CLDR root collation no longer tailors any currency symbols (making some of them sort like letter sequences, as in the DUCET).
These changes eliminate sort order differences among almost all regular characters between the CLDR root collation and the DUCET. See the CLDR root collation documentation for details.
CLDR includes data for sorting Han (CJK) characters in radical-stroke order. It used to distinguish traditional and simplified forms of radicals on a higher level than sorting by the number of residual strokes. Starting with CLDR 46, the CLDR radical-stroke order matches that of the Unicode Radical-Stroke Index (large PDF). Its sorting algorithm is defined in UAX #38. Traditional vs. simplified forms of radicals are distinguished on a lower level than the number of residual strokes. This also has an effect on alphabetic indexes for radical-stroke sort orders, where only the traditional forms of radicals are now available as index characters.
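
The change in level ordering can be pictured as a change in the sort key. The Python sketch below is a deliberate simplification: it models only three fields (radical number, whether the radical is a simplified form, and residual stroke count) and ignores the remaining steps of the UAX #38 algorithm. The four sample entries for radical 149 (言 / 讠) and their stroke counts are assumptions chosen for illustration.

```python
# (character, radical number, radical is a simplified form, residual strokes)
# Sample values are assumed for illustration only.
ENTRIES = [
    ("语", 149, True, 7),
    ("計", 149, False, 2),
    ("计", 149, True, 2),
    ("語", 149, False, 7),
]

def old_key(entry):
    """Before CLDR 46: radical form is compared before residual strokes."""
    _, radical, simplified, residual = entry
    return (radical, simplified, residual)

def new_key(entry):
    """From CLDR 46: residual strokes are compared before radical form."""
    _, radical, simplified, residual = entry
    return (radical, residual, simplified)

def order(entries, key):
    """Return the characters as one string in sorted order."""
    return "".join(char for char, *_ in sorted(entries, key=key))

print(order(ENTRIES, old_key))  # 計語计语
print(order(ENTRIES, new_key))  # 計计語语
```

With the old key, all characters under the traditional radical form come first; with the new key, characters are grouped by residual stroke count and the radical form only breaks ties.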
- `cldr-transforms` package. The JSON file contains transform metadata, and the `_rulesFile` key indicates an external (`.txt`) file containing the actual rules. [CLDR-16720][].

The CLDR site is in the process of being moved to Markdown source (GFM), which will regularize the formatting and make it easier to maintain and extend than with Google Sites. The URLs will remain the same. This process should be completed before release.
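
For consumers of the `cldr-transforms` JSON package mentioned above, one way to locate the external rule files is to search the metadata for `_rulesFile` keys. The Python sketch below makes no assumption about where in the JSON tree the key appears. The sample document, including the `transforms`, `Example-Latin`, and `_direction` names, is hypothetical, and the real file layout may differ.

```python
import json

def find_rules_files(node):
    """Recursively collect every string value stored under a `_rulesFile` key."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "_rulesFile" and isinstance(value, str):
                found.append(value)
            else:
                found.extend(find_rules_files(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_rules_files(item))
    return found

# Hypothetical metadata; the real file layout may differ.
sample = json.loads(
    '{"transforms": {"Example-Latin": {"_direction": "both", "_rulesFile": "Example-Latin.txt"}}}'
)
print(find_rules_files(sample))  # ['Example-Latin.txt']
```

Each returned name would then be resolved relative to the JSON file and read as plain text to obtain the actual rules.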
Most files added in this release were for new locales, with some files being added for existing locales that increased coverage, such as /testData/personNameTest/ak.txt.
The following new /common/testData/ files have been added:
A few /common/testData/ files have been replaced:
TBD
- `light-speed` data was withdrawn from many locales, because the purpose (as an internal prefix for light-second, light-minute, etc.) was misunderstood. Implementations may hold off supporting it until the data is complete, which is expected for CLDR v47.

Many people have made significant contributions to CLDR and LDML; see the Acknowledgments page for a full listing. We'd especially like to acknowledge the work done by interns this release: Chris Pyle, Helena Aytenfisu (ህሊና የሺጥላ አይተንፍሱ), and Emiyare Cyril Ikwut-Ukwa.
The Unicode Terms of Use apply to CLDR data; in particular, see Exhibit 1.
For web pages with different views of CLDR data, see http://cldr.unicode.org/index/charts.