Localize.Collation.Table (Localize v0.14.0)


Persistent-term-backed collation element table.

Loads the pre-generated collation table from priv/localize/collation_table.etf into :persistent_term for fast concurrent lookups. Reading from :persistent_term does not copy the stored term onto the caller's heap, which suits data that is written once and never modified.
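The load-once pattern described above can be sketched as follows. This is a minimal illustration and not the module's actual implementation: the module name `TableSketch`, the storage key, and the toy table shape (a map keyed by codepoint lists) are all assumptions. In practice the binary would be read from the ETF file; here it is built in memory so the sketch stays self-contained.

```elixir
defmodule TableSketch do
  # Hypothetical storage key; the key Localize uses is not documented here.
  @key {__MODULE__, :table}

  # Decode an ETF binary and store it on the first call.
  # Later calls find the term already stored and do nothing.
  def ensure_loaded(etf_binary) when is_binary(etf_binary) do
    case :persistent_term.get(@key, :not_loaded) do
      :not_loaded ->
        :persistent_term.put(@key, :erlang.binary_to_term(etf_binary))
        :ok

      _table ->
        :ok
    end
  end

  # Read the shared term; :persistent_term.get/1 does not copy it
  # onto the calling process's heap.
  def fetch(codepoints) when is_list(codepoints) do
    case Map.fetch(:persistent_term.get(@key), codepoints) do
      {:ok, elements} -> {:ok, elements}
      :error -> :unmapped
    end
  end
end

# Toy table standing in for the real ETF file contents.
etf = :erlang.term_to_binary(%{[0x41] => [:toy_element]})
:ok = TableSketch.ensure_loaded(etf)
{:ok, [:toy_element]} = TableSketch.fetch([0x41])
```

Two processes calling `ensure_loaded/1` at the same moment could both decode and store the table in this sketch; since they would store the same term, that is harmless here, but a real loader might serialize the first load.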

The ETF file is generated from FractionalUCA.txt during the build pipeline by Localize.Data.Collation.generate_collation_table/0.

Handles both single codepoint mappings and contractions (multi-codepoint sequences).

Summary

Functions

  • child_spec/1 - Returns a specification to start this module under a supervisor.

  • contraction_starters/1 - Check if a codepoint begins any multi-codepoint contraction.

  • ensure_loaded/0 - Ensure the collation table is loaded.

  • longest_match/1 - Find the longest matching entry for the given codepoint sequence.

  • longest_match_with_overlay/2 - Find the longest matching entry, checking a tailoring overlay first.

  • lookup/1 - Look up collation elements for a codepoint or codepoint sequence.

  • lookup_with_overlay/2 - Look up collation elements with a tailoring overlay checked first.

Functions

child_spec(init_arg)

Returns a specification to start this module under a supervisor.

See Supervisor.

contraction_starters(codepoint)

@spec contraction_starters(non_neg_integer()) :: [pos_integer()]

Check if a codepoint begins any multi-codepoint contraction, and report the lengths of those contractions.

Arguments

  • codepoint - an integer codepoint to check.

Returns

A list of contraction lengths that start with this codepoint, or [] if this codepoint does not begin any contractions.
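To illustrate what such a starter index might look like, here is a minimal sketch that derives it from a toy table. The module name `StartersSketch`, the table shape (a map keyed by codepoint lists), and the descending order of the returned lengths are assumptions made for the example; the source does not document how the real index is built or ordered.

```elixir
defmodule StartersSketch do
  # Build %{first_codepoint => [length, ...]} from the multi-codepoint
  # keys of a toy table. Lengths are sorted longest first (an assumption).
  def index(table) when is_map(table) do
    table
    |> Map.keys()
    |> Enum.filter(&(is_list(&1) and length(&1) > 1))
    |> Enum.group_by(&hd/1, &length/1)
    |> Map.new(fn {cp, lengths} ->
      {cp, lengths |> Enum.uniq() |> Enum.sort(:desc)}
    end)
  end

  # Mirrors the documented return shape: a list of lengths, or [].
  def contraction_starters(index, codepoint) do
    Map.get(index, codepoint, [])
  end
end

# Toy table: "c", "ch", "chh" and "ll" with placeholder elements.
table = %{
  [0x63] => [:c],
  [0x63, 0x68] => [:ch],
  [0x63, 0x68, 0x68] => [:chh],
  [0x6C, 0x6C] => [:ll]
}

index = StartersSketch.index(table)
[3, 2] = StartersSketch.contraction_starters(index, 0x63)
[] = StartersSketch.contraction_starters(index, 0x41)
```

Returning the candidate lengths rather than a boolean lets a caller try only the sequence lengths that can possibly match.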

ensure_loaded()

@spec ensure_loaded() :: :ok

Ensure the collation table is loaded.

Loads the pre-generated collation table ETF on first call. Subsequent calls are no-ops.

Returns

  • :ok - the table is loaded and ready for lookups.

Examples

iex> Localize.Collation.Table.ensure_loaded()
:ok

longest_match(codepoints)

@spec longest_match([non_neg_integer()]) ::
  {[non_neg_integer()], [Localize.Collation.Element.t()], [non_neg_integer()]}
  | {:unmapped, non_neg_integer(), [non_neg_integer()]}
  | :done

Find the longest matching entry for the given codepoint sequence.

Tries contractions from longest to shortest, falling back to a single codepoint lookup.

Arguments

  • codepoints - a list of integer codepoints to match against.

Returns

  • {matched_cps, elements, remaining_cps} - a successful match.

  • {:unmapped, codepoint, remaining_cps} - the first codepoint has no table entry.

  • :done - the input list is empty.
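The longest-to-shortest strategy and the three return shapes can be sketched against a toy table. This is an illustration only, not the library's code: `LongestMatchSketch`, the explicit `table` and `starters` arguments, and the placeholder elements are assumptions, and the real function takes only the codepoint list.

```elixir
defmodule LongestMatchSketch do
  # `table` maps codepoint lists to placeholder elements.
  # `starters` maps a first codepoint to candidate contraction lengths,
  # longest first.
  def longest_match([], _table, _starters), do: :done

  def longest_match([cp | rest] = cps, table, starters) do
    # Try each contraction length, then fall back to a single codepoint.
    lengths = Map.get(starters, cp, []) ++ [1]

    Enum.find_value(lengths, {:unmapped, cp, rest}, fn len ->
      {candidate, remaining} = Enum.split(cps, len)

      # Skip lengths longer than the remaining input.
      with true <- length(candidate) == len,
           {:ok, elements} <- Map.fetch(table, candidate) do
        {candidate, elements, remaining}
      else
        _ -> nil
      end
    end)
  end
end

table = %{[0x63] => [:c], [0x63, 0x68] => [:ch], [0x68] => [:h]}
starters = %{0x63 => [2]}

# "cha": the contraction "ch" wins over the single codepoint "c".
{[0x63, 0x68], [:ch], [0x61]} =
  LongestMatchSketch.longest_match([0x63, 0x68, 0x61], table, starters)

# "a" has no entry in the toy table.
{:unmapped, 0x61, []} = LongestMatchSketch.longest_match([0x61], table, starters)

:done = LongestMatchSketch.longest_match([], table, starters)
```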

longest_match_with_overlay(codepoints, overlay)

@spec longest_match_with_overlay([non_neg_integer()], map() | nil) ::
  {[non_neg_integer()], [Localize.Collation.Element.t()], [non_neg_integer()]}
  | {:unmapped, non_neg_integer(), [non_neg_integer()]}
  | :done

Find the longest matching entry, checking a tailoring overlay first.

Arguments

  • codepoints - a list of integer codepoints to match.

  • overlay - a tailoring overlay map, or nil for root-only lookups.

Returns

Same as longest_match/1.
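Because both matching functions share the same three return shapes, one driver loop can consume either of them. The sketch below shows such a loop under stated assumptions: `CollectSketch` and `match_fun` are hypothetical names, and `match_fun` stands in for a one-argument matcher such as `longest_match/1` or a closure over `longest_match_with_overlay/2`. How unmapped codepoints should really be handled is outside this sketch; it simply records them.

```elixir
defmodule CollectSketch do
  # Repeatedly apply `match_fun` until the input is exhausted,
  # collecting elements in input order.
  def collect(codepoints, match_fun, acc \\ []) do
    case match_fun.(codepoints) do
      :done ->
        Enum.reverse(acc)

      {:unmapped, cp, rest} ->
        # Record the unmapped codepoint and keep going.
        collect(rest, match_fun, [{:unmapped, cp} | acc])

      {_matched, elements, rest} ->
        # Prepend the elements in reverse; the final reverse restores order.
        collect(rest, match_fun, Enum.reverse(elements, acc))
    end
  end
end

# Toy matcher standing in for the real functions.
toy_match = fn
  [] -> :done
  [0x63, 0x68 | rest] -> {[0x63, 0x68], [:ch], rest}
  [0x61 | rest] -> {[0x61], [:a1, :a2], rest}
  [cp | rest] -> {:unmapped, cp, rest}
end

[:ch, :a1, :a2, {:unmapped, 0x7A}] =
  CollectSketch.collect([0x63, 0x68, 0x61, 0x7A], toy_match)
```

The `{:unmapped, cp, rest}` clause is listed before the general three-element tuple so that it is not shadowed by the successful-match pattern.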

lookup(codepoint)

@spec lookup(non_neg_integer() | [non_neg_integer()]) ::
  {:ok, [Localize.Collation.Element.t()]} | :unmapped

Look up collation elements for a codepoint or codepoint sequence.

Arguments

  • codepoint - a single integer codepoint, or a list of integer codepoints (contraction).

Returns

  • {:ok, [element]} - the collation elements for the entry.

  • :unmapped - no entry found in the table.

Examples

iex> Localize.Collation.Table.ensure_loaded()
iex> {:ok, elements} = Localize.Collation.Table.lookup(0x0041)
iex> Localize.Collation.Element.primary(hd(elements)) > 0
true

iex> Localize.Collation.Table.ensure_loaded()
iex> Localize.Collation.Table.lookup(0x10FFFF)
:unmapped

lookup_with_overlay(codepoint, overlay)

@spec lookup_with_overlay(non_neg_integer() | [non_neg_integer()], map() | nil) ::
  {:ok, [Localize.Collation.Element.t()]} | :unmapped

Look up collation elements with a tailoring overlay checked first.

Arguments

  • codepoint - a single integer codepoint, or a list of integer codepoints.

  • overlay - a map of tailoring entries, or nil for root-only lookups.

Returns

Same as lookup/1, but checks the overlay map before falling back to the root table.
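The overlay-first behavior can be pictured as a two-level map lookup. The sketch below is an assumption-laden illustration: `OverlaySketch`, the explicit `root` argument, and the choice to key both maps by codepoint lists are hypothetical, since the source does not describe the overlay's internal shape.

```elixir
defmodule OverlaySketch do
  # A nil overlay means a root-only lookup.
  def lookup_with_overlay(key, nil, root), do: lookup(root, key)

  def lookup_with_overlay(key, overlay, root) when is_map(overlay) do
    # A tailored entry wins; otherwise fall back to the root table.
    case lookup(overlay, key) do
      {:ok, elements} -> {:ok, elements}
      :unmapped -> lookup(root, key)
    end
  end

  # Accept a single codepoint or a list, normalizing to a list key.
  defp lookup(table, key) do
    case Map.fetch(table, List.wrap(key)) do
      {:ok, elements} -> {:ok, elements}
      :error -> :unmapped
    end
  end
end

root = %{[0x61] => [:root_a], [0x62] => [:root_b]}
overlay = %{[0x61] => [:tailored_a]}

{:ok, [:tailored_a]} = OverlaySketch.lookup_with_overlay(0x61, overlay, root)
{:ok, [:root_b]} = OverlaySketch.lookup_with_overlay(0x62, overlay, root)
{:ok, [:root_a]} = OverlaySketch.lookup_with_overlay(0x61, nil, root)
:unmapped = OverlaySketch.lookup_with_overlay(0x7A, overlay, root)
```

Keeping the tailoring in a small separate map leaves the shared root table untouched, so different locales can apply different overlays to the same root data.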

start_link(options \\ [])