Han character ordering using radical-stroke indexes.
Implements the sorting algorithm from UAX #38, computing 64-bit collation keys based on radical number, residual stroke count, simplification level, Unicode block, and code point value.
The radical data is pre-parsed from FractionalUCA.txt's
[radical N=...] entries during the build pipeline and shipped
in priv/localize/collation_table.etf. At runtime it lives in
:persistent_term and is loaded alongside the main collation
table by Localize.Collation.Table.
Functions
@spec block_index(non_neg_integer()) :: non_neg_integer()
Get the CJK block index for a codepoint.
Arguments
cp - an integer codepoint.
Returns
An integer block index.
Examples
iex> Localize.Collation.Han.block_index(0x4E00)
0
iex> Localize.Collation.Han.block_index(0x3400)
1
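As a rough illustration of the mapping, the sketch below assigns an index by code point range. It is a minimal sketch, not the library's implementation: the module name `HanBlockSketch`, the index 2 for Extension B, and the fallback value 255 are assumptions. Only the indexes 0 and 1 come from the examples above.

```elixir
defmodule HanBlockSketch do
  # Hypothetical stand-in for Localize.Collation.Han.block_index/1.
  # Indexes 0 and 1 match the documented examples. The index for
  # Extension B and the fallback value are assumptions for illustration.
  @fallback 255

  @spec block_index(non_neg_integer()) :: non_neg_integer()
  # CJK Unified Ideographs (U+4E00..U+9FFF)
  def block_index(cp) when cp in 0x4E00..0x9FFF, do: 0
  # CJK Unified Ideographs Extension A (U+3400..U+4DBF)
  def block_index(cp) when cp in 0x3400..0x4DBF, do: 1
  # CJK Unified Ideographs Extension B (U+20000..U+2A6DF), assumed index
  def block_index(cp) when cp in 0x20000..0x2A6DF, do: 2
  # Anything else falls through to the assumed fallback
  def block_index(_cp), do: @fallback
end
```

Range guards keep each block on one line, so further blocks can be added as extra clauses without touching the existing ones.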
@spec collation_elements(non_neg_integer()) :: [Localize.Collation.Element.t()] | nil
Compute collation elements for a Han character using radical-stroke ordering.
Arguments
codepoint - an integer codepoint for a CJK Unified Ideograph.
Returns
[element, element] - two CEs encoding the radical-stroke key.
nil - if the character has no radical data.
@spec compute_key(
        non_neg_integer(),
        non_neg_integer(),
        non_neg_integer(),
        non_neg_integer(),
        non_neg_integer()
      ) :: non_neg_integer()
Compute the 64-bit sorting key per UAX #38.
Arguments
radical - the Kangxi radical number (1-214).
residual_strokes - the residual stroke count after removing the radical.
simplification - the simplification level (0 for traditional).
block - the CJK block index.
codepoint - the Unicode codepoint.
Returns
A 64-bit integer encoding all components of the radical-stroke sort key.
Examples
iex> Localize.Collation.Han.compute_key(1, 0, 0, 0, 0x4E00)
17592186064384
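The result above equals 2^44 + 0x4E00, which suggests that the radical is shifted left by 44 bits and the code point occupies the lowest bits. The following is a minimal sketch of such bit packing. The module name `HanKeySketch` and the shifts for residual strokes, simplification, and block are hypothetical; only the radical shift and the code point position are inferred from the example.

```elixir
defmodule HanKeySketch do
  import Bitwise

  # Assumed field positions. Only @radical_shift (44) and the code point in
  # the lowest bits follow from the documented example; the other three
  # shifts are hypothetical and chosen so that the fields do not overlap.
  @radical_shift 44
  @strokes_shift 36
  @simplification_shift 32
  @block_shift 24

  @spec compute_key(
          non_neg_integer(),
          non_neg_integer(),
          non_neg_integer(),
          non_neg_integer(),
          non_neg_integer()
        ) :: non_neg_integer()
  def compute_key(radical, residual_strokes, simplification, block, codepoint) do
    # More significant fields dominate the integer comparison, so sorting
    # by key orders characters by radical first, then by the later fields.
    (radical <<< @radical_shift) |||
      (residual_strokes <<< @strokes_shift) |||
      (simplification <<< @simplification_shift) |||
      (block <<< @block_shift) |||
      codepoint
  end
end
```

Under these assumptions, any character filed under radical 1 sorts before any character filed under radical 2, whatever its stroke count or code point, because the radical occupies the most significant bits.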
@spec ensure_loaded() :: :ok
Ensure the Han radical data is available.
The data is loaded as a side effect of Localize.Collation.Table.ensure_loaded/0
because both are stored in the same pre-generated ETF file.
Returns
:ok - the radical data is loaded and ready.
@spec key_to_elements(non_neg_integer()) :: [Localize.Collation.Element.t()]
Convert a 64-bit radical-stroke key to two collation elements.
Arguments
key - a 64-bit integer radical-stroke key from compute_key/5.
Returns
A list of two collation element tuples.
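One way to spread a 64-bit key over two values without losing its ordering is to split it into high and low 32-bit halves. The sketch below shows that idea only. The module `HanKeySplitSketch` and the `{:half, integer}` tuples are hypothetical placeholders; how the library actually distributes the key across the fields of `Localize.Collation.Element.t()` is not shown on this page.

```elixir
defmodule HanKeySplitSketch do
  import Bitwise

  @mask32 0xFFFFFFFF

  # Hypothetical: returns plain {:half, integer} tuples instead of real
  # collation elements, whose internal shape is not documented here.
  @spec split(non_neg_integer()) :: [{:half, non_neg_integer()}]
  def split(key) when is_integer(key) and key >= 0 do
    high = key >>> 32
    low = key &&& @mask32
    [{:half, high}, {:half, low}]
  end
end
```

Because Elixir compares lists and tuples element by element, comparing two split results gives the same answer as comparing the original keys, as long as both keys fit in 64 bits.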
@spec parse_radical_line(String.t()) ::
        {:ok, pos_integer(), [{non_neg_integer(), non_neg_integer(), non_neg_integer()}]} | :skip
Parse a radical definition line from FractionalUCA.txt.
This function is used by the build pipeline (data/collation.ex)
to extract radical data from the CLDR source file. It is a pure
parser and does no I/O.
Arguments
line - a trimmed line from FractionalUCA.txt.
Returns
{:ok, radical_num, members} - the radical number and member list.
:skip - the line is not a radical definition.
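The recognise-or-skip step can be sketched with a single regular expression. This is a simplified illustration under stated assumptions, not the library's parser: the module `RadicalLineSketch` is hypothetical, the optional apostrophe after the number is an assumption about how variant radical forms may be marked, and the text after `=` is returned as a raw string instead of being decoded into member tuples.

```elixir
defmodule RadicalLineSketch do
  # Returns {:ok, radical_number, raw_body} for a line shaped like
  # "[radical N=...]" and :skip for anything else. Decoding raw_body into
  # the 3-tuples of the real return type is deliberately left out.
  @spec parse(String.t()) :: {:ok, pos_integer(), String.t()} | :skip
  def parse(line) do
    # Captures: the radical number, optional apostrophes (assumed marker),
    # and the non-empty text between "=" and the closing bracket.
    case Regex.run(~r/^\[radical (\d+)('*)=(.+)\]$/u, line) do
      [_full, number, _marker, body] -> {:ok, String.to_integer(number), body}
      nil -> :skip
    end
  end
end
```

Like the documented function, the sketch performs no I/O, so it can be mapped over the lines of a file that the caller has already read and trimmed.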