Post by John Schnell
> I was laboring under the assumption that PostScript was all text, but
> these examples contain bits of binary data, do they not?
Yes, but the 'text' in PostScript isn't text. I know it *looks* like
text, because most programs use an Encoding which happens to put the
glyphs at their usual ASCII positions. But it doesn't have to be
so. Somewhere else recently I tried to explain this to another poster.
Hmm, I see that was in comp.text.pdf; here's what I said there:
-----------------------------------------------------------------------
You need to apply the Encoding in use with the font at the time. I know
it looks like the text in the strings is ASCII, but that's a false
impression. The outlines in a font are stored by *name*, usually names
like /A, /B, /C etc., but not necessarily so; names like /G01 are not
uncommon.
The way the text string you see in the PDF file is mapped to the named
glyph in the font program is via the Encoding array. The numeric
character value is looked up in a 256-element array, which maps the
number to a name. For example, the Encoding array might contain /A at
index 0x41, /B at 0x42 etc.
This is quite common, and since these are ASCII values, it *looks* like
PDF text uses ASCII values.
However, it's not that simple, and relying on ASCII will fail sooner or
later. A subset font, for instance, usually does not bother placing its
glyphs at their ASCII positions. A font with 9 glyphs may well
position them at Encoding positions 0 to 8. For example:
/Encoding:
Index  Glyph name
0      /A
1      /n
2      /space
3      /e
4      /x
5      /a
6      /m
7      /p
8      /l
Now the text string will look like this:
(\000\001\002\003\004\005\006\007\010\003)
but the text in Acrobat will read 'An example'. (String escapes are
octal, so index 8 is written \010.)
--------------------------------------------------------
All the points above relate to PostScript as much as to PDF.
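To make the Encoding lookup concrete, here is a minimal Python sketch (Python is used only as a modelling language; the helper names `make_encoding`, `parse_ps_string`, `glyph_names` and `render` are hypothetical). It builds a 256-entry Encoding array, turns a PostScript literal string with octal escapes into bytes, and maps each byte to a glyph name, once with ASCII-like positions and once with the subset Encoding from the table above.

```python
# Model of a simple (single-byte) font: each byte of the string indexes a
# 256-entry Encoding array, and each entry is a glyph *name*, not a character.

def make_encoding(assignments):
    """Build a 256-entry Encoding; unassigned slots hold /.notdef."""
    encoding = ["/.notdef"] * 256
    for index, name in assignments.items():
        encoding[index] = name
    return encoding


def parse_ps_string(literal):
    """Convert a PostScript literal string like (abc) into bytes.

    Handles plain ASCII characters and backslash + 1-3 octal digit escapes
    only; the other escape forms are omitted to keep the sketch short.
    """
    body = literal[1:-1]  # strip the enclosing parentheses
    out = bytearray()
    i = 0
    while i < len(body):
        if body[i] == "\\":
            j = i + 1
            while j < len(body) and j < i + 4 and body[j] in "01234567":
                j += 1
            if j == i + 1:
                raise ValueError("non-octal escape not handled in this sketch")
            out.append(int(body[i + 1:j], 8) & 0xFF)
            i = j
        else:
            out.append(ord(body[i]))
            i += 1
    return bytes(out)


def glyph_names(data, encoding):
    """Look each byte up in the Encoding array."""
    return [encoding[b] for b in data]


def render(names):
    """Toy name-to-text step: enough for single-letter names and /space."""
    return "".join(" " if n == "/space" else n[1:] for n in names)


ascii_like = make_encoding({0x41: "/A", 0x42: "/B"})
subset = make_encoding(dict(enumerate(
    ["/A", "/n", "/space", "/e", "/x", "/a", "/m", "/p", "/l"])))

data = parse_ps_string(r"(\000\001\002\003\004\005\006\007\010\003)")
print(glyph_names(b"AB", ascii_like))    # ['/A', '/B']
print(render(glyph_names(data, subset)))  # An example
```

Read as ASCII, `data` is nothing but control characters; the words only appear once the font's own Encoding is applied.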
The other thing you need to bear in mind is that CJKV fonts have *far*
more than 256 glyphs in them. So you can't use a single byte (as above)
to access all the glyphs.
For old-style (Type 0 'OCF') fonts, all glyphs are encoded using two
bytes. However, for CID-keyed fonts, the number of bytes required to
encode a character is variable (in fact it's part of the information in
the CMap). I believe one of the Chinese foundries has a font which is
encoded using between 1 and 5 bytes.
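A sketch of how the variable length gets resolved: the CMap declares codespace ranges, and the string is consumed one code at a time by finding a range whose length and per-byte bounds match. This Python model assumes per-byte ('rectangular') matching and shortest-match-first, and uses ranges in the style of a Shift-JIS (RKSJ) CMap; the values are illustrative and should be checked against a real CMap's `begincodespacerange` section. `RKSJ_LIKE_RANGES`, `in_range` and `split_codes` are hypothetical names.

```python
# Illustrative codespace ranges, written as (low, high) byte strings.
RKSJ_LIKE_RANGES = [
    (b"\x00", b"\x80"),
    (b"\xa0", b"\xdf"),
    (b"\x81\x40", b"\x9f\xfc"),
    (b"\xe0\x40", b"\xfc\xfc"),
]


def in_range(code, low, high):
    """True if code has the range's length and every byte is within bounds."""
    return len(code) == len(low) and all(
        lo <= b <= hi for b, lo, hi in zip(code, low, high))


def split_codes(data, ranges):
    """Split data into character codes, trying shorter code lengths first."""
    lengths = sorted({len(low) for low, _ in ranges})
    codes, i = [], 0
    while i < len(data):
        for n in lengths:
            candidate = data[i:i + n]
            if len(candidate) == n and any(
                    in_range(candidate, lo, hi) for lo, hi in ranges):
                codes.append(candidate)
                i += n
                break
        else:
            raise ValueError(f"no codespace range matches at offset {i}")
    return codes


print(split_codes(b"A\x82\xa0\xb1", RKSJ_LIKE_RANGES))
```

Nothing in `split_codes` fixes the code length; it comes entirely from the ranges, which is why the same mechanism copes with 1-byte, 2-byte or longer codes.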
Post by John Schnell
> I presume I need to set up my keyboard properly to enter Asian
> characters before I can actually enter a statement like the one above
> into Ghostscript by hand?
Not really; you need to enter the correct byte sequence. You may find it
easier to use a hex string instead of binary, of course.
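Generating the bytes with a tool and writing them as a PostScript hex string (pairs of hex digits between `<` and `>`) sidesteps the keyboard entirely. A small Python sketch, assuming Python's built-in `shift_jis` codec matches what the CMap expects (it is close to, but not necessarily identical with, every Adobe RKSJ CMap); `ps_hex_string` is a hypothetical helper:

```python
def ps_hex_string(data: bytes) -> str:
    """Format raw bytes as a PostScript hexadecimal string, e.g. <93FA>."""
    return "<" + data.hex().upper() + ">"


# Shift-JIS bytes for two kanji, via Python's built-in shift_jis codec.
sjis = "日本".encode("shift_jis")
print(ps_hex_string(sjis) + " show")
```

The printed line can be pasted into a PostScript program once a font composed with a Shift-JIS CMap is the current font.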
Post by John Schnell
> Or does it not work that way? Are the Asian characters in the
> show command above CID or Unicode ... or something else I'm missing?
In the case above they are in Shift-JIS, which is what the RKSJ in the
CMap name means, and use horizontal writing mode, which is what the -H
means. Shift-JIS is a standard 'encoding' in Japanese, somewhat
(vaguely) akin to ASCII in Latin fonts.
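The naming convention can be read mechanically. A heuristic Python sketch (the function `describe_cmap_name` is hypothetical, and real CMap names are not guaranteed to follow the pattern) that picks out the Shift-JIS marker and the writing mode, using the PostScript WMode values 0 for horizontal and 1 for vertical:

```python
def describe_cmap_name(name: str) -> dict:
    """Split a CMap name such as '90ms-RKSJ-H' into its conventional parts.

    Heuristic: assumes the last hyphen-separated field is the writing mode.
    """
    *fields, mode = name.split("-")
    if mode not in ("H", "V"):
        raise ValueError(f"{name!r} does not end in -H or -V")
    return {
        "fields": fields,
        "shift_jis": "RKSJ" in fields,
        "wmode": 0 if mode == "H" else 1,  # WMode 0 = horizontal, 1 = vertical
    }


print(describe_cmap_name("90ms-RKSJ-H"))
```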
Don't forget that there are many possible encodings, and don't fall into
the trap of assuming that ASCII is the only Latin representation; there
are a number of coding standards in 'English'. The OP in my reply on
comp.text.pdf was having problems with a file made on a Mac, because it
used Mac Roman encoding instead of WinAnsi.
If you go back further you might even recall encodings such as
EBCDIC....
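The Mac Roman versus WinAnsi problem is easy to demonstrate: the same byte names different characters under different single-byte encodings. In this Python sketch, the `cp1252` codec stands in for WinAnsi (close, but not guaranteed identical) and `cp037` is one of several EBCDIC code pages.

```python
raw = b"\x8e"
mac = raw.decode("mac_roman")   # 'é'
win = raw.decode("cp1252")      # 'Ž'
ebcdic_a = "A".encode("cp037")  # b'\xc1', not b'A'

print(mac, win, ebcdic_a)
```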
Post by John Schnell
> If all I know is the Unicode value and the font, what do I do to get the
> correct PostScript code?
You need to compose the CIDFont using a Unicode CMap, and then put the
Unicode bytes in the text string. That's the 'easy' way to do it.
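What 'the Unicode bytes' are depends on the CMap. Assuming a CMap that reads UTF-16 big-endian codes (Adobe's UniJIS-UTF16-H is one example; a UCS-2 CMap such as UniJIS-UCS2-H only covers the two-byte, BMP case), a Python sketch for producing the text string might look like this. `unicode_to_ps_hex` is a hypothetical name.

```python
def unicode_to_ps_hex(code_point: int) -> str:
    """PostScript hex string holding one code point as UTF-16BE bytes.

    Code points above U+FFFF become a surrogate pair (4 bytes).
    """
    return "<" + chr(code_point).encode("utf-16-be").hex().upper() + ">"


print(unicode_to_ps_hex(0x65E5))   # <65E5>
print(unicode_to_ps_hex(0x1F600))  # <D83DDE00>
```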
Post by John Schnell
> Would using glyphshow help?
Not really; then you would need to know what the Character IDs in the
CIDFont 'mean', and you would have to manually convert the Unicode code
point to the proper Character ID. This is what the CMap does for you
automatically. If you use a standard CMap it's even better, because you
don't need to care which Ordering the font uses; the CMap will take
care of that, whereas if you do it yourself you need to handle the
Ordering directly.
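The conversion a CMap performs can be modelled as a table of code ranges, each mapped to a starting CID, in the spirit of a CMap's `begincidrange` entries. In this Python sketch the class `ToyCMap` and all the CID numbers are invented for illustration (they are not taken from any real Adobe CMap). It shows the same Unicode value landing on different CIDs under two different Orderings, which is the bookkeeping you take on yourself with glyphshow.

```python
from bisect import bisect_right


class ToyCMap:
    """Map character codes to CIDs via (lo, hi, first_cid) ranges.

    Ranges are assumed not to overlap. Unmapped codes fall back to CID 0,
    used here as the .notdef glyph.
    """

    def __init__(self, ranges):
        self.ranges = sorted(ranges)
        self.lows = [lo for lo, _, _ in self.ranges]

    def cid_for(self, code):
        i = bisect_right(self.lows, code) - 1  # last range starting <= code
        if i >= 0:
            lo, hi, first_cid = self.ranges[i]
            if code <= hi:
                return first_cid + (code - lo)
        return 0


# Two hypothetical Orderings that place the same characters at different CIDs.
ordering_a = ToyCMap([(0x0020, 0x007E, 1), (0x65E5, 0x65E5, 2000)])
ordering_b = ToyCMap([(0x0020, 0x007E, 100), (0x65E5, 0x65E5, 42)])

for cmap in (ordering_a, ordering_b):
    print([cmap.cid_for(cp) for cp in (0x41, 0x65E5, 0x3000)])
```

With a standard CMap, the font's own CMap carries this table, so the text string can stay in Unicode whichever Ordering the CIDFont uses.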
Post by John Schnell
> The docs say glyphshow can take a numeric (CID) operand if the current
> font is CID. Does this work in Ghostscript?
I'm sure it does, but see my other reply about what is a CIDFont, and
what is a CID-keyed font.
Post by John Schnell
> Is there an example around? Why would one prefer glyphshow over show
> (or vice versa)? I've seen examples like '/uniXXXX glyphshow' (where
> XXXX is some Unicode value); apparently there are PS fonts out there
> where some of the high-order glyphs are named in this manner.
That's a *name* ('/' introduces a name in PostScript), not a number,
which means it's not a CIDFont. If you read the snippet from
comp.text.pdf above, you'll see that glyphs in regular fonts can be
named anything you like (glyphs in CIDFonts don't have names, they have
Character IDs).
What has happened here is that someone has embedded a font where the
glyph names are constructed from Unicode values. This is most definitely
not a standard. Probably you have an embedded TrueType font in the file,
and the application (or PostScript driver) has created the font for
download.
It doesn't help you, because glyphs in CIDFonts won't have names.
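For regular fonts that do follow that naming pattern, going between a code point and the glyph name is plain string handling, which is all '/uniXXXX glyphshow' relies on. A hypothetical Python sketch (function names invented; it assumes exactly four uppercase hex digits after 'uni', and it does nothing for CIDFonts or for names like /G01):

```python
import re

# "uni" followed by exactly four uppercase hex digits, with an optional
# leading slash as the name would appear in PostScript source.
UNI_NAME = re.compile(r"/?uni([0-9A-F]{4})")


def code_point_from_glyph_name(name):
    """Return the code point encoded in a /uniXXXX name, or None."""
    m = UNI_NAME.fullmatch(name)
    return int(m.group(1), 16) if m else None


def glyph_name_for(code_point):
    """Build the /uniXXXX name for a BMP code point."""
    if not 0 <= code_point <= 0xFFFF:
        raise ValueError("the four-digit uniXXXX pattern only covers the BMP")
    return f"/uni{code_point:04X}"


print(glyph_name_for(0x65E5) + " glyphshow")  # /uni65E5 glyphshow
print(code_point_from_glyph_name("/G01"))      # None
```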
Ken