Discussion:
Using TrueType or CID fonts
John Schnell
2005-04-21 23:56:47 UTC
I'm trying to figure out how to access Asian glyphs in TrueType and CID
fonts using PostScript. I have not been able to find anything like a
tutorial on the subject. Does anyone know of any?

I found a snippet of PS code on a Japanese site. It's apparently
designed to print a character from the Windows MS-Mincho TrueType font;
here it is:

/MS-Mincho-H
/H /CMap findresource
[/MS-Mincho /CIDFont findresource]
composefont pop
/MS-Mincho-H findfont 100 scalefont setfont
72 570 moveto
<6000> show

The composefont operator defines a 'CID' font, which is then scaled
and set to be the current font. The show command is intended to print
the character with Unicode value 0x6474. This code works but, of
course, prints the wrong glyph, I presume, because the CID value 6000
does not map to Unicode 0x6000.

Is this a reasonable approach and, if so how can I get the correct CID
value?

Also, I tried substituting '6000 glyphshow' for '<6000> show' (the red
book says that glyphshow will take a CID number if a CID font is
current) but both my printer and Ghostscript 8.51 choke on this.

Can anyone shed some light on any of this?

John Schnell
Ken Sharp
2005-04-22 07:44:47 UTC
Post by John Schnell
I'm trying to figure out how to access Asian glyphs in TrueType and CID
fonts using PostScript. I have not been able to find anything like a
tutorial on the subject. Does anyone know of any?
Not really, it's not a terribly hard subject, *if* you are a native
speaker....

It's a bit like ASCII; as long as you are using a keyboard with the Latin
character set, which is programmed to emit ASCII values in response to
key presses, then it all 'works out'.
Post by John Schnell
I found a snippet of PS code on a Japanese site. It's apparently
designed to print a character from the Windows MS-Mincho TrueType font,
First, you can't use a TrueType font in a PostScript interpreter, you
can use a Type 42 font (which is a TT font bundled up in a different
way), and some PostScript interpreters will allow you to use a TT font
from disk, but there is no standard for this. You can also use a CIDFont
with TrueType outlines (I forget the font type), possibly this is what
you mean here ?
Post by John Schnell
The composefont operator defines a 'CID' font,
Not really. A CIDFont exists separately; it's a collection of character
outlines listed by number. The CIDSystemInfo entry tells you what those
numbers 'mean'. The 'Registry' and 'Ordering' keys in the CIDSystemInfo
dictionary essentially tell you which glyph is described by each
character ID.

A CMap is effectively the same as the Encoding array in a regular
PostScript font. It maps a numeric value to a character ID, which allows
the interpreter to find the glyph.

Don't be fooled by the fact that Latin PostScript strings look like
ASCII; it's not the 'A' that the interpreter uses to draw the glyph. The
0x41 ASCII value tells the interpreter to look in element 65 (0x41) of
the Encoding array. There it finds a glyph name (probably /A), which it
uses to extract the font outline from the CharStrings dictionary.
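
To make the Encoding lookup concrete, here is a minimal sketch using the
usual re-encoding idiom. It assumes a Level 2 or later interpreter with a
Helvetica (or substitute) font using a 256-element Encoding; the name
/Helvetica-Swapped is made up for the illustration.

```postscript
%!PS
% Copy the font dictionary, leaving out FID so definefont accepts the copy.
/Helvetica findfont
dup length dict begin
  { 1 index /FID ne { def } { pop pop } ifelse } forall
  /Encoding Encoding 256 array copy def   % private copy of the Encoding array
  Encoding 16#41 /B put                   % element 65 now holds the name /B
  currentdict
end
/Helvetica-Swapped exch definefont pop    % hypothetical font name

/Helvetica-Swapped findfont 24 scalefont setfont
72 700 moveto
(A) show      % byte 0x41 selects element 65, so the outline named /B is drawn
showpage
```

The string still looks like an 'A' in the file, but the glyph drawn is
whatever name sits at that index of the Encoding array.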

A CMap has a similar purpose for a CIDFont. You use composefont to
create a CID-keyed instance of a CIDFont.
Post by John Schnell
which is then scaled
and set to be the current font. The show command is intended to print
the character with Unicode value 0x6474. This code works but, of
course, prints the wrong glyph, I presume, because the CID value 6000
does not map to Unicode 0x6000.
The character code 0x6000 is looked up in the CMap, which
then gives the character ID in the CIDFont. Whether this is the same
glyph as Unicode code point 0x6476 depends on the mapping from the CMap.
It might be, or it might not.
Post by John Schnell
Is this a reasonable approach and, if so how can I get the correct CID
value?
The 'correct' CID depends on the CMap, not the CIDFont. By the way, /H
is a pretty unusual name for a CMap. Even the identity CMaps are usually
described as /Identity-H and /Identity-V. More commonly you will find a
CMap which maps a standard language encoding, for example 83pv-RKSJ-H.

If you want to use Unicode, then you need a Unicode CMap, that is a CMap
which will convert the Unicode code points into Character IDs. Because
Unicode covers numerous languages, and a CIDFont only usually contains
glyphs for a single language, you need a CMap which is specific for
Unicode conversion to that language, it will only map a range of
possible Unicode code points.

Normally this is achieved by having the CIDSystemInfo of the CMap the
same as the CIDSystemInfo of the CIDFont. For example, if both the font
and the CMap have a Registry of 'Adobe' and an Ordering of 'Japan1' then
they will be compatible, you can use the CMap to convert from some
(unspecified) ordering, and get the 'correct' glyphs rendered.
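
One way to check that compatibility is to print the Registry and Ordering
of both resources and compare them by eye. This is only a sketch: both
resource names are assumptions (use whatever is installed on your system),
/ShowInfo is a made-up helper, and it assumes each CIDSystemInfo is a
single dictionary, which is the common case but not guaranteed for every
CMap.

```postscript
%!PS
% dict ShowInfo -   prints "Registry-Ordering" for a CMap or CIDFont dictionary
/ShowInfo {
  /CIDSystemInfo get
  dup /Registry get print (-) print
  /Ordering get =
} bind def

/UniJIS-UCS2-H /CMap findresource ShowInfo     % assumed CMap name
/MS-Mincho /CIDFont findresource ShowInfo      % assumed CIDFont name
```

If the two lines printed differ, composing that CMap with that CIDFont is
unlikely to give the glyphs you expect.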
Post by John Schnell
Also, I tried substituting '6000 glyphshow' for '<6000> show' (the red
book says that glyphshow will take a CID number if a CID font is
current) but both my printer and Ghostscript 8.51 choke on this.
What do you mean by 'choke' ? Do you get an error (if so, what) ? How
are you supplying the CIDfont ? What CMap did you use ?


Ken
Russell Lang
2005-04-22 10:31:33 UTC
Post by John Schnell
I'm trying to figure out how to access Asian glyphs in TrueType and CID
fonts using PostScript. I have not been able to find anything like a
tutorial on the subject. Does anyone know of any?
If you are using ghostscript 8.x, then you can make it substitute a TrueType
font for a CID font. This is controlled by lib/cidfmap.
If you have the Microsoft East Asian language support installed on Windows,
(Control Panel, Regional and Language Options, Languages, Install Files
for East Asian Languages) then when installing ghostscript 8.51 select
"Use Windows TrueType fonts for Chinese, Japanese and Korean".
This will update lib/cidfmap to use the installed Windows fonts.
There are some caveats with doing this. The characters may be
a different size and some characters may be missing.

Once you have done this, have a look at the example files in GS 7.07
"examples/cjk". Some of these won't work because of missing CMap
files, but something like the following does work.
/Ryumin-Light-RKSJ-H findfont 25 scalefont setfont
(-ì.M Ghostscript) show
John Schnell
2005-04-24 00:48:21 UTC
I'm using ghostscript 8.51, and my cidfmap file seems to be correctly
set up as you've described. The examples you cited are indeed helpful.
It works as advertised. Thanks.

I was laboring under the assumption that PostScript was all text, but
these examples contain bits of binary data, do they not? I presume I
need to set up my keyboard properly to enter Asian characters before I
can actually enter a statement like the one above into Ghostscript by
hand? Or does it not work that way? Are the Asian characters in the
show command above CID or Unicode ... or something else I'm missing?

If all I know is the Unicode value and the font, what do I do to get the
correct PostScript code? Would using glyphshow help? The docs say
glyphshow can take a numeric (CID) operand if the current font is CID.
Does this work in Ghostscript? Is there an example around? Why would
one prefer glyphshow over show (or vice versa)? I've seen examples like
'/uniXXXX glyphshow' (where XXXX is some Unicode value); apparently
there are PS fonts out there where some of the high-order glyphs are
named in this manner. Do you know anything about that? There's so much
to learn it's hard to know what to focus on.
Thanks for any help you can provide.

John
Ken Sharp
2005-04-25 08:37:45 UTC
Post by John Schnell
I was laboring under the assumption that PostScript was all text, but
these examples contain bits of binary data, do they not?
Yes, but the 'text' in PostScript isn't text. I know it *looks* like
text, because most programs use an Encoding which happens to map the
normal ASCII values in their usual positions. But it doesn't have to be
so. Somewhere else recently I tried to explain this to another poster.
Hmm, I see that was in comp.text.pdf, here's what I said there:

-----------------------------------------------------------------------
You need to apply the Encoding in use with the font at the time. I know
it looks like the text in the strings is ASCII, but that's a false
impression. The outlines in a font are stored by *name*, usually names
like /A, /B, /C etc, but not necessarily so; names like /G01 are not
uncommon.

The way the text string you see in the PDF file is mapped to the named
glyph in the font program is via the Encoding array. The numeric
character value is looked up in a 256 element array, which maps the
number to a name. For example, the Encoding array might contain /A at
index 0x41, /B at 0x42 etc.

This is quite common, and since these are ASCII values, it *looks* like
PDF text uses ASCII values.

However, because it's not that simple, this will inevitably fail
eventually. A subset font, for instance, usually does not bother
Encoding its glyphs with ASCII values. A font with 9 glyphs may well
position them at Encoding positions 0 to 8. For example:

/Encoding:
Index Glyph Name
0 /A
1 /n
2 /space
3 /e
4 /x
5 /a
6 /m
7 /p
8 /l

Now the text string will look like this:

(\000\001\002\003\004\005\006\007\010\003)

but the text in Acrobat will read 'An example'.
--------------------------------------------------------

All the points above relate to PostScript as much as to PDF.

The other thing you need to bear in mind is that CJKV fonts have *far*
more than 256 glyphs in them. So you can't use a single byte (as above)
to access all the glyphs.

For old-style (type 0 'OCF') fonts, all glyphs are encoded using two
bytes. However, for CID-keyed fonts, the number of bytes required to
encode a font is variable (in fact it's part of the information in the
CMap). I believe one of the Chinese foundries has a font which is
encoded using between 1 and 5 bytes.
Post by John Schnell
I presume I
need to set up my keyboard properly to enter Asian characters before I
can actually enter a statement like the one above into Ghostscript by
hand?
Not really, you need to enter the correct byte sequence. You may find it
easier to use a hex string instead of binary, of course.
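
As a sketch of the hex-string route, the following needs nothing but plain
ASCII in the file. It assumes the same Ryumin-Light-RKSJ-H font is
available (for example through the Ghostscript cidfmap substitution
mentioned earlier in the thread); the three kanji are just an illustrative
choice, not the characters from the example file.

```postscript
%!PS
/Ryumin-Light-RKSJ-H findfont 25 scalefont setfont
72 700 moveto
<93FA 967B 8CEA> show   % Shift-JIS byte pairs for the three kanji of "Nihongo"
( Ghostscript) show     % ASCII bytes are single-byte codes in this encoding
showpage
```

White space inside the angle brackets is ignored, so the bytes can be
grouped per character to keep the string readable.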
Post by John Schnell
Or does it not work that way? Are the Asian characters in the
show command above CID or Unicode ... or something else I'm missing?
In the case above they are in 'Shift-JIS', which is what the RKSJ in the
CMap name means, and a horizontal writing mode, which is what the -H
means. Shift-JIS is a standard 'encoding' in Japanese, somewhat
(vaguely) akin to ASCII in Latin fonts.

Don't forget that there are many possible encodings, and don't fall into
the trap of assuming that ASCII is the only Latin representation; there
are a number of coding standards in 'English'. The OP in my reply on
comp.text.pdf was having problems with a file made on a Mac, because it
used Mac Roman encoding, instead of WinAnsi.

If you go back further you might even recall encodings such as
EBCDIC....
Post by John Schnell
If all I know is the Unicode value and the font, what do I do to get the
correct PostScript code?
You need to compose the CIDFont using a Unicode CMap, and then put the
Unicode bytes in the text string. That's the 'easy' way to do it.
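
A minimal sketch of that approach, modelled on the snippet from the
original post. It assumes the UniJIS-UCS2-H CMap and an MS-Mincho CIDFont
resource are both available to the interpreter; /MS-Mincho-UniJIS is a
made-up name for the composed font, and U+6F22 is just an example code
point.

```postscript
%!PS
/MS-Mincho-UniJIS                        % hypothetical name for the new font
/UniJIS-UCS2-H /CMap findresource        % Unicode (UCS-2) to Adobe-Japan1 CIDs
[ /MS-Mincho /CIDFont findresource ]     % assumed CIDFont resource name
composefont pop
/MS-Mincho-UniJIS findfont 100 scalefont setfont
72 570 moveto
<6F22> show      % U+6F22 written as two big-endian bytes
showpage
```

The string passed to show is simply the code point written as two
big-endian bytes; the CMap does the conversion to a Character ID.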
Post by John Schnell
Would using glyphshow help?
Not really, then you would need to know what the Character IDs in the
CIDFont 'mean' and you would have to manually convert the Unicode code
point to a proper Character ID. This is what the CMap does for you
automatically. If you use a standard CMap it's even better, because you
don't care if the Ordering is different in the font, the CMap will take
care of that, whereas if you do it yourself you need to handle the
Ordering directly.
Post by John Schnell
The docs say
glyphshow can take a numeric (CID) operand if the current font is CID.
Does this work in ghostscript?
I'm sure it does, but see my other reply about what is a CIDFont, and
what is a CID-keyed font.
Post by John Schnell
Is there an example around? Why would
one prefer glyphshow over show (or vice versa)? I've seen examples like
'/uniXXXX glyphshow' (where XXXX is some unicode value), apparently
there are ps fonts out there where some of the high order glyphs are
named in this manner.
That's a *name* ('/' introduces a name in PostScript), not a numeric,
which means it's not a CIDFont. If you read the snippet from
comp.text.pdf above, you'll see that glyphs in regular fonts can be
named anything you like (glyphs in CIDFonts don't have names, they have
Character IDs).

What has happened here is that someone has embedded a font where the
glyph names are constructed from Unicode values. This is most definitely
not a standard. Probably you have an embedded TrueType font in the file,
and the application (or PostScript driver) has created the font for
download.

It doesn't help you, because glyphs in CIDFonts won't have names.



Ken

John Schnell
2005-04-24 00:27:36 UTC
Clearly I'm not a 'native speaker', but I'm trying to learn. Thanks
very much for your time and patience.
Post by Ken Sharp
What do you mean by 'choke' ? Do you get an error (if so, what) ?
Yes, I get 'invalid type' (or something like that), as if glyphshow
doesn't like a number operand.
Post by Ken Sharp
How are you supplying the CIDfont ? What CMap did you use ?
As above with CMap H (which I got off the Adobe site) and using
composefont.

Hmmm...

There is a lot to digest here. I have the feeling from your comments
that I am not on the right track. Let me drop back a bit and ask a more
general question, closer to what I am really trying to do.

What I really want to do is generate an EPS file. For simplicity's sake
let's say this EPS file's job is to print some given Unicode character
from some (currently installed) font where it is defined. Clearly the
correct PostScript code to generate would depend on several things -- the
type of the font, for example. If we pick the simplest case, say a CID
PostScript font (or an OpenType CID PostScript font - I have obtained
the Korean font MunhwaHoonmin-Std-Regular from Adobe), how would I go
about displaying the character? It sounds like it's not much code, but
I'm not sure how to proceed. Can you guide me a bit?

John
Ken Sharp
2005-04-25 08:12:03 UTC
Post by John Schnell
Post by Ken Sharp
What do you mean by 'choke' ? Do you get an error (if so, what) ?
Yes, I get 'invalid type' (or something like that), as if glyphshow
doesn't like a number operand.
typecheck would be the normal error. Of course, that could mean that you
are trying to apply a CIDFont operation (glyphshow with a number) to a
non-CIDFont...
Post by John Schnell
Post by Ken Sharp
How are you supplying the CIDfont ? What CMap did you use ?
As above with CMap H (which I got off the Adobe site) and using
composefont.
When it says a 'CIDFont', it really means a 'CIDFont', not a CID-keyed
instance of a font. If you use the font before it's been composed with a
CMap, I think you will find it works. After a CIDFont has been composed
with a CMap using composefont you *actually* create a Type 0 font, not a
CIDFont.

glyphshow is a funny operator, it works with fonts in ways quite unlike
other operators. The normal operation (not CIDFont) is for it to extract
a glyph from a font by *name* (eg /space) instead of using a numeric
position in an Encoding (eg 0x20) to access the named glyph.

Similarly, with a CIDFont, it accesses the glyph using the Character ID,
instead of using a numeric index to the CMap to access the glyph by
Character ID. That is, the numeric parameter to glyphshow is a Character
ID, not an index to be passed through the CMap.
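
A rough sketch of the difference, reusing the names from the original
post. It assumes a LanguageLevel 3 interpreter with the H CMap and an
MS-Mincho CIDFont resource available; the CID value 1200 is arbitrary.

```postscript
%!PS
% The composed font is a Type 0 (composite) font, not a CIDFont.
/MS-Mincho-H /H /CMap findresource
[ /MS-Mincho /CIDFont findresource ] composefont
/FontType get =                 % prints 0

% The CIDFont resource itself, scaled and set as the current font,
% is what an integer operand to glyphshow is meant for.
/MS-Mincho /CIDFont findresource 100 scalefont setfont
72 570 moveto
1200 glyphshow                  % which glyph CID 1200 is depends on the Ordering
showpage
```

With the composed font current, the same integer operand should give the
typecheck error described above.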
Post by John Schnell
What I really want to do is generate an EPS file. For simplicity's sake
let's say this EPS file's job is to print some given Unicode character
from some (currently installed) font where it is defined.
OK, but is the font a CIDFont, a regular PostScript font (eg Type 0) or
a TrueType font ? Is the font installed on the printer, or will you be
downloading the font too ?
Post by John Schnell
Clearly the
correct PostScript code to generate would depend on several things -- the
type of the font, for example.
Yes, see above. If the font is installed on the printer then you either
need to make assumptions about its type, or write some quite complex
PostScript program code to deal with the various possibilities.
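
As a very small sketch of the kind of test involved, the following only
reports how a name is available and does not try to handle each case. It
assumes a LanguageLevel 3 interpreter (so that the CIDFont resource
category exists); /TargetFont is a made-up key and the font name is the
one from the question.

```postscript
%!PS
/TargetFont /MunhwaHoonmin-Std-Regular def   % placeholder name from the thread

TargetFont /CIDFont resourcestatus {
  pop pop
  (available as a CIDFont resource; compose it with a CMap) =
} {
  TargetFont /Font resourcestatus {
    pop pop
    TargetFont findfont /FontType get
    (available as an ordinary font, FontType ) print =
  } {
    (not installed; the font would have to be embedded in the file) =
  } ifelse
} ifelse
```

A real program would then branch on the result: composefont for a CIDFont,
a check on FontType for anything else, or embedding the font as a fallback.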
Post by John Schnell
If we pick the simplest case, say a CID
PostScript font (or an OpenType CID PostScript font - I have obtained
the Korean font MunhwaHoonmin-Std-Regular from Adobe), how would I go
about displaying the character? It sounds like it's not much code, but
I'm not sure how to proceed. Can you guide me a bit?
Well, you need to know which character you want first. Usually you will
work with one of the standard Encodings for the language in question,
say Big Five for Chinese, or Simplified Traditional perhaps. I'm not
well up on what Encodings are used in Korean, but since this is the most
complex of the usual CJKV languages, you may like to start with
Japanese, or Chinese unless you have a good reason to work with Korean.

Anyway, once you know what 'encoding' your PostScript code is going to
use you then compose the font using the correct CMap, eg Big Five. The
CMap in this case maps the character codes in the Encoded text to a
Character ID in the font.

[aside]
You need to use the right CMap for the CIDFont of course, because the
actual Character IDs in the CIDFont are quite arbitrary (this is what
the Ordering tells you in the CIDFont dictionary and the CMap). If you
had a CIDFont with an ordering of Adobe and a CMap with an ordering of
(say) FontWorks, then composing the two would yield a font which didn't
work as expected.
[/aside]


You are correct in that the PostScript code is quite simple to create,
the problem is in getting the character codes to match. Knowing which of
the various possible encodings is being used, and knowing how to access
a particular glyph in that encoding is the difficult part, and why it's
useful to be a native speaker. For example, you know that 0x41 is a
capital 'A' in normal Latin character encodings (whether WinAnsi,
MacRoman, StandardEncoding or almost anything else).

You haven't said how you are going to determine what glyphs you want to
use, but I'm assuming Unicode, which is why I said that what you need is
a Unicode CMap. This will convert the Unicode code points into the
correct Character ID for the font in question. Adobe *do* publish a
number of CMaps, and I *think* that several of them are Unicode CMaps.
I'm not an expert in the Adobe naming conventions, and I've never had to
care what the CMaps *mean*, but 'UniCNS-UCS2-H' may do what you want.


In fact, it's often easier to use a TrueType font, and embed a subset as
a CIDFont with TrueType outlines (I forget the CIDFont type).

The reason for this is that you can assign any character IDs you like to
the glyphs, so what you do is assign them IDs which match the Unicode
code points. Then you either use glyphshow, or more usually compose the
font with an Identity CMap, either Identity-V or Identity-H depending on
the writing direction.

Of course that means learning how to read a TrueType collection, and how
to create a CIDFont, which isn't trivial.


There's a lot to digest there, so feel free to come back if there are
any unclear points.


Ken