This document doesn't pretend to be the written Torah for a font designer. It is just a compilation of many notes, both paper and electronic, which used to clutter my table and computer, and which would inevitably be lost unless I organized them into a single document and put it on the Web. The primary purpose of this document is to serve me. But since Culmus is an open-source project, and the notion of "source" for artistic items is quite obscure, I hereby declare this document to be part of the source. As such, it immediately attains the right to be published here.
Basic Hebrew characters - see chart
The Unicode standard reserves a range of 112 characters in 0x0590-0x05FF. This range includes basic Hebrew letters with final forms, diacritics and cantillation marks (Tiberian system), special Hebrew punctuation (maqaf, sof-pasuk, geresh and gershayim) and Yiddish digraphs. Please note that complete Yiddish support also requires four vowels from Alphabetic Presentation Forms.
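As a rough illustration of the range described above, the following Python sketch walks the block 0x0590-0x05FF and counts the assigned code points by Unicode general category (Lo for letters, Mn for combining marks, and so on). The exact counts depend on the Unicode version bundled with the Python interpreter, so they are printed rather than stated here; the function name is my own.

```python
import unicodedata
from collections import Counter

# The Hebrew block, 0x0590..0x05FF inclusive (112 code points).
HEBREW_BLOCK = range(0x0590, 0x0600)

def summarize_hebrew_block():
    """Count assigned code points in the block by general category."""
    counts = Counter()
    for cp in HEBREW_BLOCK:
        ch = chr(cp)
        # Unassigned code points have no name in the Unicode database.
        if unicodedata.name(ch, ""):
            counts[unicodedata.category(ch)] += 1
    return counts

if __name__ == "__main__":
    print(len(HEBREW_BLOCK), "code points reserved")
    for category, n in sorted(summarize_hebrew_block().items()):
        print(category, n)
```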
Hebrew ligatures and special forms - see chart
Another range of Unicode is called "Alphabetic Presentation Forms" and part of it (0xFB1D - 0xFB4F) is devoted to Hebrew ligatures and special forms. This includes all letters combined with dagesh, several Yiddish and Ladino ligatures, wide letters and special forms.
Notes: (1) The Yiddish language requires the following vowels: 0xFB1D, 0xFB1F, 0xFB2E, 0xFB2F.
(2) The only Ladino ligature included in Unicode is aleph-lamed (0xFB4F). Unlike this one, other common ligatures can be produced using the basic forms.
(3) The Unicode standard doesn't define the purpose of alternative ayin (0xFB20). Contrary to the official chart, I draw the outline for this glyph so that it doesn't descend below the baseline. This form of the letter ayin can be used when you need to position a diacritical mark below it.
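The presentation forms mentioned in the notes above can be inspected with Python's standard unicodedata module. The sketch below lists the components of the four Yiddish vowels and of the aleph-lamed ligature, and shows that NFC normalization turns a precomposed vowel back into a base letter plus a mark, so a font that wants to render normalized Yiddish text should probably handle both spellings. The helper name is an assumption of this example.

```python
import unicodedata

YIDDISH_VOWELS = [0xFB1D, 0xFB1F, 0xFB2E, 0xFB2F]
ALEF_LAMED = 0xFB4F

def components(cp):
    """Return the decomposition of a code point as a list of integers.

    Compatibility tags such as <compat> are skipped.
    """
    parts = unicodedata.decomposition(chr(cp)).split()
    return [int(p, 16) for p in parts if not p.startswith("<")]

if __name__ == "__main__":
    for cp in YIDDISH_VOWELS + [ALEF_LAMED]:
        parts = " ".join(f"U+{c:04X}" for c in components(cp))
        print(f"U+{cp:04X} {unicodedata.name(chr(cp))}: {parts}")
    # NFC does not keep the precomposed presentation form.
    normalized = unicodedata.normalize("NFC", chr(0xFB2E))
    print([f"U+{ord(c):04X}" for c in normalized])
```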
Miscellaneous additions
A. A New Israeli Sheqel sign is defined at 0x20AA.
B. The Microsoft specification defines five combinations (0xE801 - 0xE805) in the Private Use Area, which include vav with holam haser, final kaf with shva and qamats, and lamed with holam haser with or without dagesh. All these combinations can be produced by an OpenType rendering engine and are therefore not necessary, but for convenience I include final kaf with shva and qamats (0xE802 - 0xE803).
C. Microsoft also recommends including the following characters: LTR (0x200E), RTL (0x200F) and dotted circle (0x25CC). The reasoning is described at Handling Invalid Combining Marks.
D. In the spirit of the Microsoft recommendation, I also include the following characters: Zero Width Non-Joiner (0x200C) and Zero Width Joiner (0x200D). Their purpose is as follows: ZWJ may force ligation of aleph and lamed into the aleph-lamed ligature, which would not normally occur. ZWNJ may prevent the same conversion when using the "JUD " (Ladino) language tag, which would otherwise force a ligation. Microsoft has an opinion on this topic too: Suggested glyphs for complex scripts.
E. Always make sure that your font includes the character "zero" (0x0030), because the presence of this character declares the font as ASCII-enabled. If your font is not marked as ASCII-enabled, most software will use only its Hebrew part, and substitute punctuation and digits from some other font.
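Item D can be illustrated with a small Python sketch that inserts ZWJ or ZWNJ between aleph and lamed in a string. It only prepares the text; whether a ligature actually appears or disappears depends on the font and on the shaping engine, and the function names are invented for this example.

```python
ALEF, LAMED = "\u05D0", "\u05DC"
ZWNJ, ZWJ = "\u200C", "\u200D"

def request_alef_lamed_ligature(text):
    """Insert ZWJ between aleph and lamed to ask for the ligature."""
    return text.replace(ALEF + LAMED, ALEF + ZWJ + LAMED)

def suppress_alef_lamed_ligature(text):
    """Insert ZWNJ between aleph and lamed to ask for separate letters."""
    return text.replace(ALEF + LAMED, ALEF + ZWNJ + LAMED)

if __name__ == "__main__":
    word = ALEF + LAMED
    for variant in (request_alef_lamed_ligature(word),
                    suppress_alef_lamed_ligature(word)):
        print(" ".join(f"U+{ord(c):04X}" for c in variant))
```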
Biblical typesetting
The task of biblical typesetting, apart from the difficulty of positioning diacritics properly and plenty of other problems, requires several very rare glyph forms. A very good explanation of some of them can be found at the site of Mordechai Pinchas Sofer, in the section "Scribal oddities".
Broken vav
Appears only once in the entire Bible (Numeri 25:12, in the word "shalom"). An explanation of its meaning can be found at Broken Vav.
Reversed nun
Appears twice in the Bible (Numeri 10:35-36). An explanation can be found at Inverted Nunim of Sefer Binsoa. Note that the tradition doesn't define whether the letter should be flipped upside-down or mirrored right-to-left. In my edition of the Bible, the nun is simply turned 180°, probably because this way the typesetter could use a common glyph and avoid producing a special one.
Bowed lamed
Some designers argue that bowed lamed was introduced in order to squeeze more lines into a piece of paper in the times when paper was rare and expensive. For this reason, nowadays, bowed lamed makes the text look old-fashioned and sometimes is considered a bad typographic style. If you still wish to use it, please download "Frank Curled Lamed" from the Developers' area.
I decided to introduce my own grouping of glyphs, to help users easily find out which features are supported by each font.
Basic set
- 22 Hebrew letters + 5 final forms (U+05D0 - U+05EA)
- Digits (U+0030 - U+0039)
I sometimes tend to design old style digits, which usually have descenders in 34579 and ascenders in 68. When the font has heavy horizontal elements (as in Frank-Ruehl and most classic Ashkenazi-style fonts), designing a digit such as 5 can be challenging, because its two horizontal strokes are too close to each other and produce an excessively black glyph. In this case turning to the old style form can give better results.
- Punctuation
Basic: Exclamation mark (U+0021), double quote (U+0022), single quote (U+0027), comma (U+002C), hyphen (U+002D), period (U+002E), colon (U+003A), semicolon (U+003B), question mark (U+003F), ellipsis (U+2026).
Hebrew: Geresh (U+05F3), gershayim (U+05F4).
Dashes: En-dash (U+2013), em-dash (U+2014), direct speech dash (U+2015).
Quotes: Left quote (U+2018), right quote (U+2019), single base quote (U+201A), left double quote (U+201C), right double quote (U+201D), double base quote (U+201E).
In older books, which are mostly typeset in Drugulin and sometimes in Frank-Ruehl, double quotes and geresh/gershayim are usually aligned with the mean line of the letters. I deliberately choose to raise quotes considerably, as I want them to be distinctive and highly visible, just like any other punctuation. Geresh and gershayim are also raised, but to a smaller extent. In their case one of my concerns is that in standalone counting a geresh can be confused with "yod".
- Parentheses
Ordinary parentheses (U+0028, U+0029), left and right brackets (U+005B, U+005D)
- Mathematical symbols
Number sign (U+0023), percent sign (U+0025), asterisk (U+002A), plus (U+002B), slash and backslash (U+002F, U+005C), less, equal and greater signs (U+003C - U+003E), minus (U+2212), alternative plus (U+FB29).
People commonly use hyphen instead of minus, but hyphen is not really a minus, and hyphen is also significantly shorter. The minus has the same width and leads as the plus.
- Currency symbols
New sheqel (U+20AA), dollar sign (U+0024), euro (U+20AC), pound (U+00A3).
The new sheqel is naturally a must, and the dollar and euro are frequently used on Hebrew internet sites too. As for the pound, somebody once asked me for it, and I think it's nice.
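The hyphen versus minus distinction noted under mathematical symbols can be shown in a few lines of Python. This is only a sketch under a simplifying assumption: a hyphen counts as a minus when it opens a negative number or stands between two digits with spaces around it. Real text would need more careful rules, and the function name is invented here.

```python
import re
import unicodedata

HYPHEN_MINUS = "\u002D"
MINUS_SIGN = "\u2212"

def use_true_minus(text):
    """Replace hyphens that act as minus signs with U+2212."""
    # Negative numbers: a hyphen before a digit, not glued to a word or hyphen.
    text = re.sub(r"(?<![\w-])-(?=\d)", MINUS_SIGN, text)
    # Spaced binary minus between two digits: "7 - 3".
    text = re.sub(r"(?<=\d) - (?=\d)", f" {MINUS_SIGN} ", text)
    return text

if __name__ == "__main__":
    print(unicodedata.name(HYPHEN_MINUS), "vs", unicodedata.name(MINUS_SIGN))
    print(use_true_minus("7 - 3 = 4, and -5 is negative"))
```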
Extended set
- Diacritics (nikud, shin and sin dots, dagesh, rafe and varika)
- Precomposed forms with dagesh
- Forms of sin/shin with dot and with/without dagesh
- Yiddish and Ladino letters
- Microsoft precomposed forms (0xE801-0xE803)
- Misc symbols (NIS, zero-width spaces, dotted circle, alternative ayin and alternative plus)
Biblical forms
- Cantillation marks
- Masoretic letterforms
Obsolete glyphs
- Wide forms
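The Basic set listed above is specific enough to be checked mechanically. The following Python sketch takes the set of code points a font covers and reports what is missing from each Basic set group. How the code points are obtained is left open: one possibility, assuming the third-party fontTools package is installed, is `set(TTFont(path).getBestCmap())`. The group names and the function name are my own labels.

```python
# Code points of the Basic set, grouped as in the list above.
BASIC_SET = {
    "letters": list(range(0x05D0, 0x05EB)),
    "digits": list(range(0x0030, 0x003A)),
    "punctuation": [0x21, 0x22, 0x27, 0x2C, 0x2D, 0x2E, 0x3A, 0x3B, 0x3F,
                    0x2026, 0x05F3, 0x05F4, 0x2013, 0x2014, 0x2015,
                    0x2018, 0x2019, 0x201A, 0x201C, 0x201D, 0x201E],
    "parentheses": [0x28, 0x29, 0x5B, 0x5D],
    "math": [0x23, 0x25, 0x2A, 0x2B, 0x2F, 0x5C, 0x3C, 0x3D, 0x3E,
             0x2212, 0xFB29],
    "currency": [0x20AA, 0x24, 0x20AC, 0xA3],
}

def missing_by_group(covered):
    """Map each group name to the code points absent from `covered`."""
    return {
        group: [cp for cp in cps if cp not in covered]
        for group, cps in BASIC_SET.items()
    }

if __name__ == "__main__":
    # Hypothetical font that contains only the Hebrew letters.
    covered = set(range(0x05D0, 0x05EB))
    for group, missing in missing_by_group(covered).items():
        status = "complete" if not missing else f"{len(missing)} missing"
        print(f"{group}: {status}")
```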
Hebrew romanization
Glyph | UTF-16 encoding (precomposed) | Components | UTF-16 encodings (components) | Reference
ʾ | U+02BE | | | ISO 259
ḥ | U+1E25 | h + ◌̣ | U+0068 U+0323 | ISO 259, Open Siddur
ṭ | U+1E6D | t + ◌̣ | U+0074 U+0323 | ISO 259, Open Siddur
ʿ | U+02BF | | | ISO 259
ṣ | U+1E63 | s + ◌̣ | U+0073 U+0323 | ISO 259
s̀ | | s + ◌̀ | U+0073 U+0300 | ISO 259
ś | U+015B | s + ◌́ | U+0073 U+0301 | ISO 259
š | U+0161 | s + ◌̌ | U+0073 U+030C | ISO 259
å | U+00E5 | a + ◌̊ | U+0061 U+030A | ISO 259
ȩ | U+0229 | e + ◌̧ | U+0065 U+0327 | ISO 259
ŵ | U+0175 | w + ◌̂ | U+0077 U+0302 | ISO 259
ẇ | U+1E87 | w + ◌̇ | U+0077 U+0307 | ISO 259
° | U+00B0 | | | ISO 259
ă | U+0103 | a + ◌̆ | U+0061 U+0306 | ISO 259
ŏ | U+014F | o + ◌̆ | U+006F U+0306 | ISO 259
ḝ | U+1E1D | e + ◌̧ + ◌̆ | U+0065 U+0327 U+0306 | ISO 259
ḃ | U+1E03 | b + ◌̇ | U+0062 U+0307 | ISO 259-2
ḣ | U+1E23 | h + ◌̇ | U+0068 U+0307 | ISO 259-2
k̇ | | k + ◌̇ | U+006B U+0307 | ISO 259-2
ṗ | U+1E57 | p + ◌̇ | U+0070 U+0307 | ISO 259-2
â | U+00E2 | a + ◌̂ | U+0061 U+0302 | Open Siddur
ō | U+014D | o + ◌̄ | U+006F U+0304 | Open Siddur
é | U+00E9 | e + ◌́ | U+0065 U+0301 | Open Siddur
ə | U+0259 | | | Open Siddur
ū | U+016B | u + ◌̄ | U+0075 U+0304 | Open Siddur
ḇ | | b + ◌̱ | U+0062 U+0331 | Open Siddur
ŋ | U+014B | | | Open Siddur
ḳ | U+1E33 | k + ◌̣ | U+006B U+0323 | Open Siddur
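The relation between the precomposed and component columns of the romanization table can be checked with Unicode normalization. The Python sketch below covers a sample of the rows: composing the components with NFC should give the precomposed code point, while the two sequences listed without a precomposed form (s with grave, k with dot above) should stay as two code points. The sample selection and the names are mine.

```python
import unicodedata

# Sample rows: precomposed character -> base letter plus combining marks.
SAMPLE_ROWS = {
    "\u1E25": "h\u0323",
    "\u1E6D": "t\u0323",
    "\u1E63": "s\u0323",
    "\u015B": "s\u0301",
    "\u0161": "s\u030C",
    "\u1E1D": "e\u0327\u0306",
    "\u1E03": "b\u0307",
    "\u1E57": "p\u0307",
    "\u016B": "u\u0304",
    "\u1E33": "k\u0323",
}

# Sequences for which the table lists no precomposed code point.
SEQUENCE_ONLY = ["s\u0300", "k\u0307"]

def mismatches():
    """Return the rows whose components do not compose to the listed glyph."""
    return [
        (pre, seq) for pre, seq in SAMPLE_ROWS.items()
        if unicodedata.normalize("NFC", seq) != pre
    ]

if __name__ == "__main__":
    print("mismatching rows:", mismatches())
    for seq in SEQUENCE_ONLY:
        composed = unicodedata.normalize("NFC", seq)
        print(seq, "stays at", len(composed), "code points")
```

A font meant for this romanization therefore needs working mark positioning for at least the sequence-only rows, since no single glyph slot exists for them in Unicode.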
I wrote a few scripts to automate Hebrew OpenType features, such as dagesh substitutions, diacritics placement, etc. See details.
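The scripts themselves are not reproduced here. As a loose illustration of what automating dagesh substitutions could look like, the Python sketch below derives "letter + dagesh" pairs from the Unicode decomposition data and prints substitution rules in feature-file syntax. Everything specific is an assumption of this example: the uniXXXX glyph names, the choice of the ccmp feature, and the omission of script and language declarations.

```python
import unicodedata

def dagesh_rules():
    """Build one substitution rule per precomposed letter-with-dagesh form."""
    rules = []
    for cp in range(0xFB1D, 0xFB50):
        parts = unicodedata.decomposition(chr(cp)).split()
        # Keep only forms that decompose to exactly base letter + U+05BC.
        if len(parts) == 2 and parts[1] == "05BC":
            rules.append(f"    sub uni{parts[0]} uni05BC by uni{cp:04X};")
    return rules

def ccmp_feature():
    """Wrap the rules in a feature block (script declarations omitted)."""
    body = "\n".join(dagesh_rules())
    return f"feature ccmp {{\n{body}\n}} ccmp;\n"

if __name__ == "__main__":
    print(ccmp_feature())
```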
Some people say that Hebrew fonts don't need kerning. Indeed, non-kerned Hebrew fonts are quite bearable. Nevertheless, kerning never hurts, even though software rarely supports it. I will not present a list of recommended kerning pairs, because these can vary significantly with style. If you want an example, take a newspaper and look for the letters "יב" or "יג" in a headline set in Haim.
Now to the work...
First you will need a couple of pages of random garbage. I created a Perl script, heblorem.pl, which outputs such garbage in iso-8859-8 encoding. The special feature of this script is that its output contains every possible combination of adjacent letters, so no pair can be missed. Now load the garbage into some program which supports kerned fonts (or doesn't, but just for the first pass), print it on a good laser printer, and start squinting. The rest is obvious.
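For readers without Perl at hand, here is a minimal Python sketch of the same idea. It is not heblorem.pl: it simply shuffles all ordered pairs of the 27 letters in U+05D0-U+05EA and glues a few pairs into each word, so every pair appears adjacent at least once. It ignores the rule that final forms belong at the end of a word, and the function name and parameters are invented for this example.

```python
import itertools
import random

# 22 letters plus 5 final forms.
LETTERS = [chr(cp) for cp in range(0x05D0, 0x05EB)]

def kerning_test_text(seed=0, pairs_per_word=3):
    """Return pseudo-words that contain every ordered pair of letters."""
    rng = random.Random(seed)
    pairs = [a + b for a, b in itertools.product(LETTERS, repeat=2)]
    rng.shuffle(pairs)
    words = [
        "".join(pairs[i:i + pairs_per_word])
        for i in range(0, len(pairs), pairs_per_word)
    ]
    return " ".join(words)

if __name__ == "__main__":
    text = kerning_test_text()
    # Write the same single-byte encoding the original script uses.
    with open("garbage.txt", "wb") as out:
        out.write(text.encode("iso8859_8"))
```

Each pair stays intact inside a word, which is what guarantees full coverage; the pairs formed where two chunks meet are a bonus.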