I am following the documentation on apple.com. I managed to read the records described under "The 'cmap' encoding subtables". I am 100% sure that platformID and platformSpecificID are correct, but offset looks suspicious. Here is the data:
array(3) {
  [0]=>
  array(3) {
    ["platform_id"]=>
    int(0)
    ["specific_id"]=>
    int(3)
    ["offset"]=>
    int(532)
  }
  [1]=>
  array(3) {
    ["platform_id"]=>
    int(1)
    ["specific_id"]=>
    int(0)
    ["offset"]=>
    int(28)
  }
  [2]=>
  array(3) {
    ["platform_id"]=>
    int(3)
    ["specific_id"]=>
    int(1)
    ["offset"]=>
    int(532)
  }
}
The offset is the same (532) for two of the subtables. Can anyone explain this to me? And is this offset relative to the current position or to the beginning of the file?
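For reference, the layout these values come from (a uint16 version, a uint16 numberSubtables, then one 8-byte record per subtable: uint16 platformID, uint16 platformSpecificID, uint32 offset, all big-endian) can be read with PHP's unpack() as in the minimal sketch below. It is a standalone function rather than the class method used in the question; the function name is hypothetical, and it assumes $font holds the whole file and $cmapOffset is the 'cmap' offset taken from the table directory. It treats each stored offset as relative to the start of the 'cmap' table.

```php
<?php
// Minimal sketch (hypothetical helper): read the 'cmap' index and its
// encoding records. Assumes $font holds the raw font file and $cmapOffset is
// the 'cmap' table offset taken from the font's table directory.
function readCmapEncodingRecords(string $font, int $cmapOffset): array
{
    // uint16 version, uint16 numberSubtables (all values are big-endian)
    $header = unpack('nversion/nnumberSubtables', $font, $cmapOffset);

    $records = [];
    for ($i = 0; $i < $header['numberSubtables']; $i++) {
        // Each record is 8 bytes: uint16 platformID, uint16 platformSpecificID, uint32 offset
        $record = unpack('nplatformID/nplatformSpecificID/Noffset', $font, $cmapOffset + 4 + 8 * $i);
        // The stored offset counts from the start of the 'cmap' table,
        // so add $cmapOffset to get a position in the file
        $record['filePosition'] = $cmapOffset + $record['offset'];
        $records[] = $record;
    }
    return $records;
}
```

Records that share an offset, like the platformID 0 and platformID 3 entries at 532 here, point at one subtable, so it only has to be parsed once.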
Part 2
OK, so I managed to get to the format tables using this:
private function parseCmapTable($table)
{
    $this->position = $table['offset'];
    // http://developer.apple.com/fonts/ttrefman/RM06/Chap6cmap.html
    // General table information
    $data = array
    (
        'version' => $this->getUint16(),
        'number_subtables' => $this->getUint16(),
    );
    $sub_tables = array();
    for($i = 0; $i < $data['number_subtables']; $i++)
    {
        // http://developer.apple.com/fonts/ttrefman/RM06/Chap6cmap.html
        // The 'cmap' encoding subtables
        $sub_tables[] = array
        (
            'platform_id' => $this->getUint16(),
            'specific_id' => $this->getUint16(),
            'offset' => $this->getUint32(),
        );
    }
    // http://developer.apple.com/fonts/ttrefman/RM06/Chap6cmap.html
    // The 'cmap' formats
    $formats = array();
    foreach($sub_tables as $t)
    {
        // http://stackoverflow.com/questions/5322019/character-to-glyph-mapping-table/5322267#5322267
        // Subtable offsets are relative to the start of the 'cmap' table
        $this->position = $table['offset'] + $t['offset'];
        $format = array
        (
            'format' => $this->getUint16(),
            'length' => $this->getUint16(),
            'language' => $this->getUint16(),
        );
        if($format['format'] == 4)
        {
            // NOTE: in the font file, end_code, start_code, id_delta and id_range_offset
            // are each arrays of segCount (= seg_count_X2 / 2) uint16 values, and
            // glyph_index_array fills the rest of the subtable; this reads one uint16 per field
            $format += array
            (
                'seg_count_X2' => $this->getUint16(),
                'search_range' => $this->getUint16(),
                'entry_selector' => $this->getUint16(),
                'range_shift' => $this->getUint16(),
                'end_code[segCount]' => $this->getUint16(),
                'reserved_pad' => $this->getUint16(),
                'start_code[segCount]' => $this->getUint16(),
                'id_delta[segCount]' => $this->getUint16(),
                'id_range_offset[segCount]' => $this->getUint16(),
                'glyph_index_array[variable]' => $this->getUint16(),
            );
            // Overwrite the header fields with values computed from the ones read above
            $backup = $format;
            $format['seg_count_X2'] = $backup['seg_count_X2']*2;
            $format['search_range'] = 2 * (2 * floor(log($backup['seg_count_X2'], 2)));
            $format['entry_selector'] = log($backup['search_range']/2, 2);
            $format['range_shift'] = (2 * $backup['seg_count_X2']) - $backup['search_range'];
        }
        // Keyed by offset, so a subtable shared by several encoding records is stored once
        $formats[$t['offset']] = $format;
    }
    die(var_dump( $sub_tables, $formats ));
}
The output:
array(3) {
  [0]=>
  array(3) {
    ["platform_id"]=>
    int(0)
    ["specific_id"]=>
    int(3)
    ["offset"]=>
    int(532)
  }
  [1]=>
  array(3) {
    ["platform_id"]=>
    int(1)
    ["specific_id"]=>
    int(0)
    ["offset"]=>
    int(28)
  }
  [2]=>
  array(3) {
    ["platform_id"]=>
    int(3)
    ["specific_id"]=>
    int(1)
    ["offset"]=>
    int(532)
  }
}
array(2) {
  [532]=>
  array(13) {
    ["format"]=>
    int(4)
    ["length"]=>
    int(658)
    ["language"]=>
    int(0)
    ["seg_count_X2"]=>
    int(192)
    ["search_range"]=>
    float(24)
    ["entry_selector"]=>
    float(5)
    ["range_shift"]=>
    int(128)
    ["end_code[segCount]"]=>
    int(48)
    ["reserved_pad"]=>
    int(58)
    ["start_code[segCount]"]=>
    int(64)
    ["id_delta[segCount]"]=>
    int(69)
    ["id_range_offset[segCount]"]=>
    int(70)
    ["glyph_index_array[variable]"]=>
    int(90)
  }
  [28]=>
  array(3) {
    ["format"]=>
    int(6)
    ["length"]=>
    int(504)
    ["language"]=>
    int(0)
  }
}
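In format 4, everything after rangeShift is an array: endCode[segCount], a uint16 reservedPad, startCode[segCount], idDelta[segCount], idRangeOffset[segCount], and then glyphIdArray up to the end of the subtable, where segCount = segCountX2 / 2. Because the code above reads a single uint16 for each of these, the values printed from end_code onwards (48, 58, 64, 69, 70, 90) are most likely just the first six endCode entries. Likewise, the seg_count_X2 stored in the file is 96 (the code prints 192 because it doubles it), so segCount is 48 and the derived fields should be searchRange = 64, entrySelector = 5, rangeShift = 32. The following is a minimal sketch of reading the arrays with unpack(); the function names are hypothetical, and it assumes $font holds the whole file and $pos is the subtable's position in it.

```php
<?php
// Minimal sketch (hypothetical helpers): parse a format 4 'cmap' subtable
// that starts at byte $pos of $font.
function parseCmapFormat4(string $font, int $pos): array
{
    // Seven uint16 header fields (14 bytes)
    $t = unpack('nformat/nlength/nlanguage/nsegCountX2/nsearchRange/nentrySelector/nrangeShift', $font, $pos);
    $t['segCount'] = intdiv($t['segCountX2'], 2);
    $p = $pos + 14;

    // Read $count consecutive big-endian uint16 values and advance $p
    $readArray = function (int $count) use ($font, &$p): array {
        $values = $count > 0 ? array_values(unpack("n{$count}", $font, $p)) : [];
        $p += 2 * $count;
        return $values;
    };

    $t['endCode'] = $readArray($t['segCount']);
    $p += 2; // skip reservedPad
    $t['startCode'] = $readArray($t['segCount']);
    $t['idDelta'] = $readArray($t['segCount']);       // unsigned here; applied modulo 65536
    $t['idRangeOffset'] = $readArray($t['segCount']);
    // Whatever remains up to 'length' bytes is glyphIdArray
    $t['glyphIdArray'] = $readArray(max(0, intdiv($pos + $t['length'] - $p, 2)));
    return $t;
}

// searchRange, entrySelector and rangeShift are derived from segCount,
// which makes them a useful sanity check on the parsed header
function expectedSearchFields(int $segCount): array
{
    $entrySelector = 0;                          // floor(log2(segCount)), in integers
    while ((2 << $entrySelector) <= $segCount) {
        $entrySelector++;
    }
    $searchRange = 2 * (1 << $entrySelector);    // 2 * largest power of 2 <= segCount
    return [
        'searchRange' => $searchRange,
        'entrySelector' => $entrySelector,
        'rangeShift' => 2 * $segCount - $searchRange,
    ];
}
```

Computing entrySelector with an integer loop instead of log() avoids floating-point results like the float(24) and float(5) in the output above.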
Now, how do I get from here to the Unicode character codes? I tried reading the documentation, but it is too vague for a novice.
http://developer.apple.com/fonts/ttrefman/RM06/Chap6cmap.html
The offset is from the beginning of the 'cmap' table, not from the current position or the beginning of the file. What your data says is that the Mac table (platformID 1) starts at offset 28, while the Unicode (platformID 0) and Windows (platformID 3) mappings share the same subtable, which starts at byte offset 532.
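To get character codes out of those subtables, here is a minimal sketch with hypothetical function names. For format 4 it assumes the subtable has already been read into 0-based arrays endCode, startCode, idDelta, idRangeOffset and glyphIdArray plus segCount. A code c belongs to the first segment whose endCode is >= c, and has a glyph only if that segment's startCode is <= c. If idRangeOffset is 0 the glyph is (c + idDelta) mod 65536; otherwise idRangeOffset is a byte offset, counted from its own position, into glyphIdArray. The format 6 Mac subtable simply covers firstCode to firstCode + entryCount - 1; note that for platformID 1 / specific ID 0 those are Mac Roman codes, not Unicode.

```php
<?php
// Glyph index for code $c in segment $i of a parsed format 4 subtable $t
// (0 means "missing glyph"). $t holds 0-based arrays plus 'segCount'.
function format4SegmentGlyph(array $t, int $i, int $c): int
{
    if ($t['idRangeOffset'][$i] === 0) {
        return ($c + $t['idDelta'][$i]) & 0xFFFF;      // idDelta arithmetic is modulo 65536
    }
    // idRangeOffset[i] is a byte offset from &idRangeOffset[i]; glyphIdArray
    // directly follows the idRangeOffset array, which gives this index into it
    $index = intdiv($t['idRangeOffset'][$i], 2) + ($c - $t['startCode'][$i]) - ($t['segCount'] - $i);
    $glyph = $t['glyphIdArray'][$index] ?? 0;
    return $glyph === 0 ? 0 : (($glyph + $t['idDelta'][$i]) & 0xFFFF);
}

// Look up one character code; segments are sorted by endCode
function format4GlyphId(array $t, int $c): int
{
    for ($i = 0; $i < $t['segCount']; $i++) {
        if ($t['endCode'][$i] >= $c) {
            return $t['startCode'][$i] <= $c ? format4SegmentGlyph($t, $i, $c) : 0;
        }
    }
    return 0;
}

// List every character code that maps to a real (non-zero) glyph
function format4CharCodes(array $t): array
{
    $codes = [];
    for ($i = 0; $i < $t['segCount']; $i++) {
        for ($c = $t['startCode'][$i]; $c <= $t['endCode'][$i]; $c++) {
            if (format4SegmentGlyph($t, $i, $c) !== 0) {
                $codes[] = $c;
            }
        }
    }
    return $codes;
}

// Format 6 (trimmed table) at byte $pos of $font: returns code => glyph index
function parseCmapFormat6(string $font, int $pos): array
{
    $h = unpack('nformat/nlength/nlanguage/nfirstCode/nentryCount', $font, $pos);
    $map = [];
    if ($h['entryCount'] > 0) {
        $glyphs = unpack("n{$h['entryCount']}", $font, $pos + 10);
        foreach (array_values($glyphs) as $k => $glyph) {
            $map[$h['firstCode'] + $k] = $glyph;   // codes outside the range map to glyph 0
        }
    }
    return $map;
}
```

For this font, the platformID 3 / specific ID 1 record (offset 532, format 4) uses Unicode BMP code points as its character codes, so format4CharCodes() applied to that subtable should list the Unicode characters the font covers.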