Localize.Collation.Han (Localize v0.14.0)

Han character ordering using radical-stroke indexes.

Implements the radical-stroke sorting algorithm described in UAX #38. Each Han character is assigned a 64-bit collation key computed from its radical number, residual stroke count, simplification level, Unicode block, and code point value.

The radical data is pre-parsed from FractionalUCA.txt's [radical N=...] entries during the build pipeline and shipped in priv/localize/collation_table.etf. At runtime it lives in :persistent_term and is loaded alongside the main collation table by Localize.Collation.Table.
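The load path described above (an ETF binary decoded once and cached in :persistent_term) can be sketched as follows. This is a minimal, self-contained illustration and not the library's implementation: the module name RadicalDataSketch, the cache key, and the data shape (codepoint => {radical, residual_strokes}) are all assumptions made for the example.

```elixir
defmodule RadicalDataSketch do
  # Hypothetical cache key; the key Localize actually uses is not documented here.
  @key {__MODULE__, :han_radicals}

  # Serialize a term to ETF, as a build step might do before writing a file to priv/.
  def encode(data), do: :erlang.term_to_binary(data)

  # Decode the ETF binary and cache it in :persistent_term.
  # Calling this again after a successful load is a no-op.
  def ensure_loaded(etf_binary) do
    case :persistent_term.get(@key, nil) do
      nil ->
        data = :erlang.binary_to_term(etf_binary)
        :persistent_term.put(@key, data)
        :ok

      _already_loaded ->
        :ok
    end
  end

  # Look up the cached entry for a codepoint; nil when there is none.
  def lookup(codepoint) do
    @key |> :persistent_term.get(%{}) |> Map.get(codepoint)
  end
end

# Assumed data shape for illustration only: codepoint => {radical, residual_strokes}.
etf = RadicalDataSketch.encode(%{0x4E00 => {1, 0}})
:ok = RadicalDataSketch.ensure_loaded(etf)
RadicalDataSketch.lookup(0x4E00)
```

Reads from :persistent_term do not copy the stored term, which suits large read-only tables; updating or erasing a stored term is comparatively expensive, so the data should be written once at load time.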

Summary

Functions

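block_index(cp)

Get the CJK block index for a codepoint.

collation_elements(codepoint)

Compute collation elements for a Han character using radical-stroke ordering.

compute_key(radical, residual_strokes, simplification, block, codepoint)

Compute the 64-bit sorting key per UAX #38.

ensure_loaded()

Ensure the Han radical data is available.

key_to_elements(key)

Convert a 64-bit radical-stroke key to two collation elements.

parse_radical_line(line)

Parse a radical definition line from FractionalUCA.txt.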

Functions

block_index(cp)

@spec block_index(non_neg_integer()) :: non_neg_integer()

Get the CJK block index for a codepoint.

Arguments

  • cp — an integer codepoint.

Returns

An integer block index.

Examples

iex> Localize.Collation.Han.block_index(0x4E00)
0

iex> Localize.Collation.Han.block_index(0x3400)
1

collation_elements(codepoint)

@spec collation_elements(non_neg_integer()) :: [Localize.Collation.Element.t()] | nil

Compute collation elements for a Han character using radical-stroke ordering.

Arguments

  • codepoint — an integer codepoint for a CJK Unified Ideograph.

Returns

  • [element, element] — two CEs encoding the radical-stroke key.

  • nil — if the character has no radical data.

compute_key(radical, residual_strokes, simplification, block, codepoint)

Compute the 64-bit sorting key per UAX #38.

Arguments

  • radical — the Kangxi radical number (1-214).

  • residual_strokes — the residual stroke count after removing the radical.

  • simplification — the simplification level (0 for traditional).

  • block — the CJK block index.

  • codepoint — the Unicode codepoint.

Returns

A 64-bit integer encoding all components of the radical-stroke sort key.

Examples

iex> Localize.Collation.Han.compute_key(1, 0, 0, 0, 0x4E00)
17592186064384
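The documented result equals 2^44 + 0x4E00, which suggests the radical is shifted left by 44 bits and the codepoint occupies the low bits. The sketch below packs the five components on that basis. Only the radical shift and the codepoint position follow from the example; the widths and positions chosen for residual_strokes, simplification, and block are assumptions for illustration and may differ from the library's layout. KeySketch is a hypothetical module name.

```elixir
defmodule KeySketch do
  import Bitwise

  # Assumed layout (only the radical shift of 44 and the low-order
  # codepoint are implied by the documented example):
  #   radical           bits 44 and up
  #   residual_strokes  bits 36..43  (assumed)
  #   simplification    bits 32..35  (assumed)
  #   block             bits 24..31  (assumed)
  #   codepoint         bits 0..23
  def compute_key(radical, residual_strokes, simplification, block, codepoint) do
    (radical <<< 44) |||
      (residual_strokes <<< 36) |||
      (simplification <<< 32) |||
      (block <<< 24) |||
      codepoint
  end
end

# Packing the most significant field highest means plain integer
# comparison orders by radical first, then strokes, and so on.
keys = [
  KeySketch.compute_key(2, 0, 0, 0, 0x4E28),
  KeySketch.compute_key(1, 1, 0, 0, 0x4E01),
  KeySketch.compute_key(1, 0, 0, 0, 0x4E00)
]

sorted = Enum.sort(keys)
```

Because each field sits in its own bit range, comparing two keys as integers is equivalent to comparing the tuples {radical, residual_strokes, simplification, block, codepoint} field by field, provided every value fits its assumed width.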

ensure_loaded()

@spec ensure_loaded() :: :ok

Ensure the Han radical data is available.

The data is loaded as a side effect of Localize.Collation.Table.ensure_loaded/0, because the radical data and the main collation table are stored in the same pre-generated ETF file.

Returns

  • :ok — the radical data is loaded and ready.

key_to_elements(key)

@spec key_to_elements(non_neg_integer()) :: [Localize.Collation.Element.t()]

Convert a 64-bit radical-stroke key to two collation elements.

Arguments

  • key — a 64-bit radical-stroke key, such as one returned by compute_key/5.

Returns

A list of two element tuples.

parse_radical_line(line)

@spec parse_radical_line(String.t()) ::
  {:ok, pos_integer(),
   [{non_neg_integer(), non_neg_integer(), non_neg_integer()}]}
  | :skip

Parse a radical definition line from FractionalUCA.txt.

This function is used by the build pipeline (data/collation.ex) to extract radical data from the CLDR source file. It is a pure parser and does no I/O.

Arguments

  • line — a trimmed line from FractionalUCA.txt.

Returns

  • {:ok, radical_num, members} — the radical number and member list.

  • :skip — the line is not a radical definition.