This document is mechanically formatted from the XML file for the LGR. It provides additional summary data and explanatory text.
The XML file remains the sole normative specification of the LGR.
- Description
- Repertoire
- Variant Sets
- Classes, Rules and Actions
  - Character Classes
  - Whole label evaluation and context rules
  - Actions
- Table of References
Description
Number of elements in repertoire | 78
Number of ranges in repertoire | 0
Number of code point sequences | 2
The following table lists the repertoire by code point (or code point sequence). The data in the Script and Name columns are extracted from the Unicode Character Database. Where the comment in the original LGR is equal to the character name, it has been suppressed.
For any code point or sequence for which a variant is defined, the Variants column provides a link to the associated variant set or, if the code point or sequence maps only to itself, the variant type of that mapping.
# | Code Point | Glyph | Script | Name | Tags | Required Context | Variants | Comment | References
1 | U+0061 | a | Latin | LATIN SMALL LETTER A | sc:Latn | | set 1 | Not part of repertoire | [0]
2 | U+0069 | i | Latin | LATIN SMALL LETTER I | sc:Latn | | set 2 | Not part of repertoire | [0]
3 | U+006E | n | Latin | LATIN SMALL LETTER N | sc:Latn | | set 3 | Not part of repertoire | [0]
4 | U+006F | o | Latin | LATIN SMALL LETTER O | sc:Latn | | set 4 | Not part of repertoire | [0]
5 | U+0070 | p | Latin | LATIN SMALL LETTER P | sc:Latn | | set 5 | Not part of repertoire | [0]
6 | U+0073 U+0073 | s s [ss] | Latin | LATIN SMALL LETTER S LATIN SMALL LETTER S | | | set 6 | Not part of repertoire |
7 | U+0075 | u | Latin | LATIN SMALL LETTER U | sc:Latn | | set 7 | Not part of repertoire | [0]
8 | U+0076 | v | Latin | LATIN SMALL LETTER V | sc:Latn | | set 8 | Not part of repertoire | [0]
9 | U+0079 | y | Latin | LATIN SMALL LETTER Y | sc:Latn | | set 9 | Not part of repertoire | [0]
10 | U+00DF | ß | Latin | LATIN SMALL LETTER SHARP S | sc:Latn | | set 6 | Not part of repertoire | [0]
11 | U+00E1 | á | Latin | LATIN SMALL LETTER A WITH ACUTE | sc:Latn | | set 1 | Not part of repertoire | [0]
12 | U+00ED | í | Latin | LATIN SMALL LETTER I WITH ACUTE | sc:Latn | | set 2 | Not part of repertoire | [0]
13 | U+00EF | ï | Latin | LATIN SMALL LETTER I WITH DIAERESIS | sc:Latn | | set 2 | Not part of repertoire | [0]
14 | U+00F3 | ó | Latin | LATIN SMALL LETTER O WITH ACUTE | sc:Latn | | set 4 | Not part of repertoire | [0]
15 | U+00FA | ú | Latin | LATIN SMALL LETTER U WITH ACUTE | sc:Latn | | set 7 | Not part of repertoire | [0]
16 | U+00FC | ü | Latin | LATIN SMALL LETTER U WITH DIAERESIS | sc:Latn | | set 7 | Not part of repertoire | [0]
17 | U+0131 | ı | Latin | LATIN SMALL LETTER DOTLESS I | sc:Latn | | set 2 | Not part of repertoire | [0]
18 | U+0144 | ń | Latin | LATIN SMALL LETTER N WITH ACUTE | sc:Latn | | set 3 | Not part of repertoire | [0]
19 | U+014B | ŋ | Latin | LATIN SMALL LETTER ENG | sc:Latn | | set 3 | Not part of repertoire | [0]
20 | U+01A1 | ơ | Latin | LATIN SMALL LETTER O WITH HORN | sc:Latn | | set 10 | Not part of repertoire | [0]
21 | U+025B | ɛ | Latin | LATIN SMALL LETTER OPEN E | sc:Latn | | set 11 | Not part of repertoire | [0]
22 | U+0263 | ɣ | Latin | LATIN SMALL LETTER GAMMA | sc:Latn | | set 9 | Not part of repertoire | [0]
23 | U+0269 | ɩ | Latin | LATIN SMALL LETTER IOTA | sc:Latn | | set 2 | Not part of repertoire | [0]
24 | U+028B | ʋ | Latin | LATIN SMALL LETTER V WITH HOOK | sc:Latn | | set 7 | Not part of repertoire | [0]
25 | U+0390 | ΐ | Greek | GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS | sc:Grek | | set 2 | | [0], [101]
26 | U+03AC | ά | Greek | GREEK SMALL LETTER ALPHA WITH TONOS | sc:Grek | | set 1 | | [0], [101]
27 | U+03AD | έ | Greek | GREEK SMALL LETTER EPSILON WITH TONOS | sc:Grek | | set 11 | | [0], [101]
28 | U+03AE | ή | Greek | GREEK SMALL LETTER ETA WITH TONOS | sc:Grek | | set 3 | | [0], [101]
29 | U+03AF | ί | Greek | GREEK SMALL LETTER IOTA WITH TONOS | sc:Grek | | set 2 | | [0], [101]
30 | U+03B0 | ΰ | Greek | GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS | sc:Grek | | set 7 | | [0], [101]
31 | U+03B1 | α | Greek | GREEK SMALL LETTER ALPHA | sc:Grek | | set 1 | | [0], [101]
32 | U+03B2 | β | Greek | GREEK SMALL LETTER BETA | sc:Grek | | set 6 | | [0], [101]
33 | U+03B3 | γ | Greek | GREEK SMALL LETTER GAMMA | sc:Grek | | set 9 | | [0], [101]
34 | U+03B4 | δ | Greek | GREEK SMALL LETTER DELTA | sc:Grek | | | | [0], [101]
35 | U+03B5 | ε | Greek | GREEK SMALL LETTER EPSILON | sc:Grek | | set 11 | | [0], [101]
36 | U+03B6 | ζ | Greek | GREEK SMALL LETTER ZETA | sc:Grek | | | | [0], [101]
37 | U+03B7 | η | Greek | GREEK SMALL LETTER ETA | sc:Grek | | set 3 | | [0], [101]
38 | U+03B8 | θ | Greek | GREEK SMALL LETTER THETA | sc:Grek | | | | [0], [101]
39 | U+03B9 | ι | Greek | GREEK SMALL LETTER IOTA | sc:Grek | | set 2 | | [0], [101]
40 | U+03BA | κ | Greek | GREEK SMALL LETTER KAPPA | sc:Grek | | set 12 | | [0], [101]
41 | U+03BB | λ | Greek | GREEK SMALL LETTER LAMDA | sc:Grek | | | | [0], [101]
42 | U+03BC | μ | Greek | GREEK SMALL LETTER MU | sc:Grek | | | | [0], [101]
43 | U+03BD | ν | Greek | GREEK SMALL LETTER NU | sc:Grek | | set 8 | | [0], [101]
44 | U+03BE | ξ | Greek | GREEK SMALL LETTER XI | sc:Grek | | | | [0], [101]
45 | U+03BF | ο | Greek | GREEK SMALL LETTER OMICRON | sc:Grek | | set 4 | | [0], [101]
46 | U+03C0 | π | Greek | GREEK SMALL LETTER PI | sc:Grek | | | | [0], [101]
47 | U+03C1 | ρ | Greek | GREEK SMALL LETTER RHO | sc:Grek | | set 5 | | [0], [101]
48 | U+03C2 | ς | Greek | GREEK SMALL LETTER FINAL SIGMA | sc:Grek | | set 10 | | [0], [101]
49 | U+03C3 | σ | Greek | GREEK SMALL LETTER SIGMA | sc:Grek | | set 10 | | [0], [101]
50 | U+03C4 | τ | Greek | GREEK SMALL LETTER TAU | sc:Grek | | set 13 | | [0], [101]
51 | U+03C5 | υ | Greek | GREEK SMALL LETTER UPSILON | sc:Grek | | set 7 | | [0], [101]
52 | U+03C6 | φ | Greek | GREEK SMALL LETTER PHI | sc:Grek | | set 14 | | [0], [101]
53 | U+03C7 | χ | Greek | GREEK SMALL LETTER CHI | sc:Grek | | | | [0], [101]
54 | U+03C8 | ψ | Greek | GREEK SMALL LETTER PSI | sc:Grek | | | | [0], [101]
55 | U+03C9 | ω | Greek | GREEK SMALL LETTER OMEGA | sc:Grek | | set 15 | | [0], [101]
56 | U+03CA | ϊ | Greek | GREEK SMALL LETTER IOTA WITH DIALYTIKA | sc:Grek | | set 2 | | [0], [101]
57 | U+03CB | ϋ | Greek | GREEK SMALL LETTER UPSILON WITH DIALYTIKA | sc:Grek | | set 7 | | [0], [101]
58 | U+03CC | ό | Greek | GREEK SMALL LETTER OMICRON WITH TONOS | sc:Grek | | set 4 | | [0], [101]
59 | U+03CD | ύ | Greek | GREEK SMALL LETTER UPSILON WITH TONOS | sc:Grek | | set 7 | | [0], [101]
60 | U+03CE | ώ | Greek | GREEK SMALL LETTER OMEGA WITH TONOS | sc:Grek | | set 15 | | [0], [101]
61 | U+0430 | а | Cyrillic | CYRILLIC SMALL LETTER A | sc:Cyrl | | set 1 | Not part of repertoire | [0]
62 | U+043A | к | Cyrillic | CYRILLIC SMALL LETTER KA | sc:Cyrl | | set 12 | Not part of repertoire | [0]
63 | U+043E | о | Cyrillic | CYRILLIC SMALL LETTER O | sc:Cyrl | | set 4 | Not part of repertoire | [0]
64 | U+0440 | р | Cyrillic | CYRILLIC SMALL LETTER ER | sc:Cyrl | | set 5 | Not part of repertoire | [0]
65 | U+0442 | т | Cyrillic | CYRILLIC SMALL LETTER TE | sc:Cyrl | | set 13 | Not part of repertoire | [0]
66 | U+0443 | у | Cyrillic | CYRILLIC SMALL LETTER U | sc:Cyrl | | set 9 | Not part of repertoire | [0]
67 | U+0444 | ф | Cyrillic | CYRILLIC SMALL LETTER EF | sc:Cyrl | | set 14 | Not part of repertoire | [0]
68 | U+0455 U+0455 | ѕ ѕ [ѕѕ] | Cyrillic | CYRILLIC SMALL LETTER DZE CYRILLIC SMALL LETTER DZE | | | set 6 | Not part of repertoire |
69 | U+0456 | і | Cyrillic | CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I | sc:Cyrl | | set 2 | Not part of repertoire | [0]
70 | U+0457 | ї | Cyrillic | CYRILLIC SMALL LETTER YI | sc:Cyrl | | set 2 | Not part of repertoire | [0]
71 | U+04AF | ү | Cyrillic | CYRILLIC SMALL LETTER STRAIGHT U | sc:Cyrl | | set 9 | Not part of repertoire | [0]
72 | U+0572 | ղ | Armenian | ARMENIAN SMALL LETTER GHAD | sc:Armn | | set 3 | Not part of repertoire | [0]
73 | U+0578 | ո | Armenian | ARMENIAN SMALL LETTER VO | sc:Armn | | set 3 | Not part of repertoire | [0]
74 | U+057D | ս | Armenian | ARMENIAN SMALL LETTER SEH | sc:Armn | | set 7 | Not part of repertoire | [0]
75 | U+0582 | ւ | Armenian | ARMENIAN SMALL LETTER YIWN | sc:Armn | | set 2 | Not part of repertoire | [0]
76 | U+0585 | օ | Armenian | ARMENIAN SMALL LETTER OH | sc:Armn | | set 4 | Not part of repertoire | [0]
77 | U+1E45 | ṅ | Latin | LATIN SMALL LETTER N WITH DOT ABOVE | sc:Latn | | set 3 | Not part of repertoire | [0]
78 | U+1EC9 | ỉ | Latin | LATIN SMALL LETTER I WITH HOOK ABOVE | sc:Latn | | set 2 | Not part of repertoire | [0]
Legend
- Code Point: A code point or code point sequence.
- Name: Shows the character or sequence name from the Unicode Character Database.
- Glyph: The shape displayed depends on the fonts available to your browser.
- Script: Shows the script property value from the Unicode Character Database. Combining marks may have the value Inherited, and code points used with more than one script may have the value Common.
- References: Links to the references associated with the code point or sequence, if any.
- Tags: LGR-defined tag values. Any tags matching the Unicode script property are suppressed in this view.
- Required Context: Link to the rule defining the required context a code point or sequence must satisfy. If prefixed by "not:", identifies a context that must not occur.
- Variants: A link to the variant set the code point or sequence is a member of, except where a code point or sequence maps only to itself, in which case the type of that mapping is listed.
- Comment: If the comment in this row consists only of the code point or sequence name, it is suppressed in this view.
Number of variant sets | 15
Largest variant set | 13
Ordinary Variants by Type | out-of-repertoire-var (42), blocked (382), r-diac (11), base (11), r-final (1), nonfinal (1)
The following tables list all variant sets defined in this LGR, except for singleton sets. Each table lists all variant mapping pairs of the set; one per row. Mappings are assumed to be symmetric: each row documents both forward (→) and reverse (←) mapping directions. In each table, the mappings are sorted by Source value in ascending code point order; shading is used to group mappings from the same source code point or sequence.
Where the forward and reverse mappings have the same type, a single value is given in the Type(s) column; otherwise, the types for the forward and reverse mappings, along with their comments and references, are listed one above the other.
A mapping where source and target are the same is reflexive. Variant sets consisting of only a single reflexive mapping are not shown as a set. Instead, the variant type of the mapping is listed in the Variants column of the Repertoire by Code Point table. Reflexive mappings that are part of a larger set are indicated with a “≡”.
In any LGR with variant specifications that are well behaved, all members within each variant set are defined as variants of each other; the mappings in each set are symmetric and transitive; and all variant sets are disjoint.
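The well-behavedness conditions above can be checked mechanically. The following is a minimal sketch, not part of the LGR: the function names and the representation of mappings as (source, target) pairs are assumptions made for illustration, and reflexive mappings are ignored when comparing members. It groups code points into variant sets and then verifies that every pair of distinct members of a set is mapped in both directions, which covers symmetry and transitivity; the sets are disjoint by construction.

```python
from itertools import permutations


def variant_sets(mappings):
    """Group code points into variant sets (connected components)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for source, target in mappings:
        parent[find(source)] = find(target)

    sets = {}
    for cp in list(parent):
        sets.setdefault(find(cp), set()).add(cp)
    return sorted(sets.values(), key=min)


def is_well_behaved(mappings):
    """True if every two distinct members of a set map to each other."""
    pairs = set(mappings)
    return all(
        (a, b) in pairs
        for members in variant_sets(mappings)
        for a, b in permutations(members, 2)
    )


# Mappings taken from two of the variant set tables in this document
# (U+0076/U+03BD and U+03C9/U+03CE), with symmetric rows written out.
MAPPINGS = [
    (0x0076, 0x0076), (0x0076, 0x03BD), (0x03BD, 0x0076),
    (0x03C9, 0x03CE), (0x03CE, 0x03C9), (0x03CE, 0x03CE),
]

print(variant_sets(MAPPINGS))
print(is_well_behaved(MAPPINGS))
```

Dropping any one of the non-reflexive pairs from `MAPPINGS` makes `is_well_behaved` return `False`, because the set then lacks a mapping in one direction.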
Common Legend
- Source: Source of the mapping pair.
- Target: Destination of the mapping pair.
- Glyph: The shape displayed for source or target depends on the fonts available to your browser.
- → - forward: Indicates that variant Type, References and Comment apply to the mapping from source to target.
- ← - reverse: Indicates that variant Type, References and Comment apply to the reverse mapping from target to source.
- ↔ - both: Indicates that variant Type, References and Comment apply to both forward and reverse mapping.
- ≡ - reflexive: Indicates that variant Type, References and Comment are for a reflexive mapping where source equals target.
- 🞩 - not in LGR: Indicates that the variant is not in the LGR.
- Type: The type of the variant mapping. There are some predefined variant types such as “allocatable” and “blocked”, while others are defined specifically for each LGR.
- References: One or more reference IDs (optional). A "/" separates references for reverse / forward mappings, if different.
- Comment: A descriptive comment (optional). A "/" separates comments for reverse / forward mappings, if different.
# | Source | Glyph | Target | Glyph | | Type(s) | References | Comment
1 | U+0076 | v | U+0076 | v | ≡ | out-of-repertoire-var | | Out-of-repertoire
2 | U+0076 | v | U+03BD | ν | ↔ | blocked | |
# | Source | Glyph | Target | Glyph | | Type(s) | References | Comment
1 | U+03BA | κ | U+043A | к | ↔ | blocked | |
2 | U+043A | к | U+043A | к | ≡ | out-of-repertoire-var | | Out-of-repertoire
# | Source | Glyph | Target | Glyph | | Type(s) | References | Comment
1 | U+03C4 | τ | U+0442 | т | ↔ | blocked | |
2 | U+0442 | т | U+0442 | т | ≡ | out-of-repertoire-var | | Out-of-repertoire
# | Source | Glyph | Target | Glyph | | Type(s) | References | Comment
1 | U+03C6 | φ | U+0444 | ф | ↔ | blocked | |
2 | U+0444 | ф | U+0444 | ф | ≡ | out-of-repertoire-var | | Out-of-repertoire
# | Source | Glyph | Target | Glyph | | Type(s) | References | Comment
1 | U+03C9 | ω | U+03CE | ώ | → | blocked | |
  | | | | | ← | base | |
2 | U+03CE | ώ | U+03CE | ώ | ≡ | r-diac | |
The following table lists all top-level classes with their definition and the regular expression defining their members.
Name | Definition | Count | Members | References | Comment
implicit | Tag=sc:Armn | 5 | {U+0572 U+0578 U+057D U+0582 U+0585} | |
implicit | Tag=sc:Cyrl | 10 | {U+0430 U+043A U+043E U+0440 U+0442 U+0443 U+0444 U+0456 U+0457 U+04AF} | |
implicit | Tag=sc:Grek | 36 | {U+0390 U+03AC U+03AD U+03AE U+03AF U+03B0 U+03B1 U+03B2 U+03B3 U+03B4 U+03B5 U+03B6 U+03B7 U+03B8 U+03B9 …} | |
implicit | Tag=sc:Latn | 25 | {U+0061 U+0069 U+006E U+006F U+0070 U+0075 U+0076 U+0079 U+00DF U+00E1 U+00ED U+00EF U+00F3 U+00FA U+00FC …} | |
Legend
- Members or Ranges: Lists the members of the class as code points (xxx) or as ranges of code points (xxx-yyy). Any class too numerous to list in full is elided with "...".
- Tag=ttt: An anonymous class implicitly defined based on tag value.
- [: :] - named character set: Reference to a named character set [:name:].
- (∩,∪,\,△) - set operators: Sets may be combined by set operators (∩ = intersection, ∪ = union, \ = difference, △ = symmetric difference).
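As an illustration of how an implicit class follows from tag values, the sketch below collects single code points by tag and prints them in the notation of the Members column. It is not the tool that produced this document: the data layout and function names are assumptions, only the Cyrillic and Armenian rows of the repertoire are included, and code point sequences are skipped on the assumption that only single code points join a tag-based class.

```python
from collections import defaultdict

# Code points copied from the Cyrillic and Armenian rows of the repertoire.
CYRILLIC = [0x0430, 0x043A, 0x043E, 0x0440, 0x0442,
            0x0443, 0x0444, 0x0456, 0x0457, 0x04AF]
ARMENIAN = [0x0572, 0x0578, 0x057D, 0x0582, 0x0585]

# Hypothetical layout: (tuple of code points, tuple of tags) per entry.
REPERTOIRE = (
    [((cp,), ("sc:Cyrl",)) for cp in CYRILLIC]
    + [((0x0455, 0x0455), ())]  # sequence, listed without a tag
    + [((cp,), ("sc:Armn",)) for cp in ARMENIAN]
)


def implicit_classes(repertoire):
    """Map each tag to the sorted single code points carrying it."""
    classes = defaultdict(set)
    for code_points, tags in repertoire:
        if len(code_points) != 1:
            continue  # assumption: sequences are not class members
        for tag in tags:
            classes[tag].add(code_points[0])
    return {tag: sorted(members) for tag, members in classes.items()}


def format_members(members):
    """Render members in the {U+XXXX ...} notation of the table."""
    return "{" + " ".join(f"U+{cp:04X}" for cp in members) + "}"


for tag, members in sorted(implicit_classes(REPERTOIRE).items()):
    print(f"Tag={tag}", len(members), format_members(members))
```

For the two scripts included, the counts and member lists agree with the corresponding rows of the class table.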
The following table lists all the top-level (named) rules defined in the LGR and indicates whether they are used as a trigger in an action or as a context (when or not-when) for a code point. (Any use of context rules for variants is not indicated.)
Name | Regular Expression | Used as Trigger | Used as Context | Anchor | References | Comment
leading-combining-mark | (start) ([:class property:gc=Mn:]∪[:class property:gc=Mc:]) | True | False | False | | Default WLE rule matching labels with leading combining marks ⍟
Legend
- Used as Trigger: This rule triggers one of the actions listed below.
- Used as Context: This rule defines a required context for a code point.
- Anchor: This rule has a placeholder for the code point for which it is evaluated.
- Regular Expression: A regular expression equivalent to the rule, shown in the standard notation with some extensions as noted:
  - ⚓ - context anchor: In a regex, the ⚓ signifies a placeholder for the actual code point when a context is evaluated. The code point must occur at the position corresponding to the anchor. Rules containing an anchor cannot be used as triggers.
  - (...)← - look-behind: If present, encloses required context preceding the anchor.
  - →(...) - look-ahead: If present, encloses required context following the anchor.
  - (: :) - rule reference: Non-recursive reference to a named rule.
  - [: :] - character set either named, implicit or property: Reference to a named character set [:name:], an implicit character set [:class tag=val:] or a given Unicode property [:class property:prop=val:]. A leading "^" before name or tag indicates the set complement.
  - (|) - choice operator: When a rule offers several choices, the choices are separated by the (|) operator and each choice is represented by a set enclosed in parentheses.
  - (∩,∪,\,△) - set operators: Sets may be combined by set operators (∩ = intersection, ∪ = union, \ = difference, △ = symmetric difference).
  - Ø - empty set: Indicates that the following set is empty, either as the result of set operations or because none of its elements are part of the repertoire defined here. An empty set that is not optional means that a rule can never match.
  - {m}, {m, n}, {m,} - count: {m, n} indicates that the preceding element is evaluated from m to n times; {m} alone means it is evaluated exactly m times (equivalent to {m,m}); {m,} means it is evaluated at least m times. No count indicates that the element is evaluated once (equivalent to "{1}").
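The leading-combining-mark rule listed above matches a label whose first code point has General_Category Mn or Mc. A minimal sketch of an equivalent check follows, using the Python standard library's `unicodedata` module. The function name is hypothetical, and the check reflects the Unicode version bundled with the Python interpreter rather than the version the LGR was built against.

```python
import unicodedata


def has_leading_combining_mark(label: str) -> bool:
    """True if the label starts with a nonspacing or spacing combining mark."""
    return bool(label) and unicodedata.category(label[0]) in ("Mn", "Mc")


# U+0301 COMBINING ACUTE ACCENT has General_Category Mn.
print(has_leading_combining_mark("\u0301\u03b1"))  # True
print(has_leading_combining_mark("\u03b1\u03b2"))  # False
```

In the actions table, a label for which this rule matches receives the disposition invalid.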
The following table lists the actions that are used to assign dispositions to labels and variant labels, based on the specified conditions.
The order of actions defines their precedence: the first action triggered by a label is the one defining its disposition.
# | Condition | Rule / Variant Set | | Disposition | References | Comment
1 | if label match | leading-combining-mark | → | invalid | | labels with leading combining marks are invalid ⍟
2 | if at least one variant is in | {out-of-repertoire-var} | → | invalid | | any variant label with a code point out of repertoire is invalid ⍟
3 | if at least one variant is in | {blocked} | → | blocked | | any variant label containing blocked variants is blocked ⍟
4 | if all variants are in | {r-diac,r-final} | → | valid | | any original label is valid
5 | if all variants are in | {nonfinal,base} | → | allocatable | | any label with all unaccented vowels and all standard sigmas is allocatable
6 | if all variants are in | {r-final,base} | → | allocatable | | any label with all unaccented vowels and sigmas as applied for is allocatable
7 | if all variants are in | {r-diac,nonfinal} | → | allocatable | | any label with all vowels as applied for and all standard sigmas is allocatable
8 | if at least one variant is in | {nonfinal,base} | → | blocked | | any variant label with a mix of vowel accents or sigma forms is blocked
9 | if all variants are in | {allocatable} | → | allocatable | | variant labels with all variants allocatable are allocatable ⍟
10 | if any label (catch-all) | | → | valid | | catch all (default action) ⍟
Legend
- {...} - variant type set: In the "Rule / Variant Set" column, the notation {...} denotes a set of variant types.
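The precedence rule for actions (the first action triggered defines the disposition) can be sketched as an ordered scan over the ten actions in the table. This is an illustrative model only, not the normative evaluation procedure: the tuple layout and function name are assumptions, and a label is reduced to the set of variant types used in it plus the names of the whole-label rules it matches. The sketch also assumes that an "all variants" condition needs at least one variant type to trigger; for this table, a label with no variant types ends up as valid under either reading (action 4 or action 10).

```python
# (condition, rule name or set of variant types, disposition), in table order.
ACTIONS = [
    ("match", "leading-combining-mark", "invalid"),
    ("any", {"out-of-repertoire-var"}, "invalid"),
    ("any", {"blocked"}, "blocked"),
    ("all", {"r-diac", "r-final"}, "valid"),
    ("all", {"nonfinal", "base"}, "allocatable"),
    ("all", {"r-final", "base"}, "allocatable"),
    ("all", {"r-diac", "nonfinal"}, "allocatable"),
    ("any", {"nonfinal", "base"}, "blocked"),
    ("all", {"allocatable"}, "allocatable"),
    ("catch-all", None, "valid"),
]


def disposition(variant_types, matched_rules=frozenset()):
    """Return the disposition of the first action that triggers."""
    types = set(variant_types)
    for condition, operand, result in ACTIONS:
        if condition == "match":
            triggered = operand in matched_rules
        elif condition == "any":
            triggered = bool(types & operand)
        elif condition == "all":
            # Assumption: an empty type set does not satisfy "all".
            triggered = bool(types) and types <= operand
        else:  # catch-all
            triggered = True
        if triggered:
            return result


print(disposition({"r-diac"}))             # valid (action 4)
print(disposition({"base"}))               # allocatable (action 5)
print(disposition({"r-diac", "base"}))     # blocked (action 8)
print(disposition({"blocked", "base"}))    # blocked (action 3)
```

Because the scan stops at the first match, a label that uses a blocked variant is blocked by action 3 even if all its other variant types would satisfy one of the later allocatable actions.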