Discussion:
64-bit, unsigned integer arithmetic in PostScript
Terry Burton
2018-07-04 22:25:36 UTC
Hi,

I'm currently looking at what it would take to implement (for Barcode Writer in Pure PostScript) a somewhat "unique" barcode standard that is in the process of being drafted.

In its current form, the error correction and interleaving routines of this symbology rely upon a well-defined source of stable, pseudo-randomised, 32-bit numbers generated according to a variant of Knuth's MMIX LCG. This is essentially a loop in which seed is a 64-bit, unsigned integer with well-defined overflow behaviour:

forever {
    seed = 6364136223846793005 * seed + 1
    emit seed >> 32
}

Firstly, it is quite a challenge to emulate this efficiently in PostScript using (typically) 32-bit signed integers that are implicitly converted into reals upon overflow. (I'm aware that recent Ghostscript has 64-bit, *signed* integers.) I wonder whether anyone has handled this kind of thing in PostScript before? Clearly creating a naive "bigint" library (i.e. doing all operations on strings/arrays) would suck when it comes to performance. Nevertheless, partitioning the 64-bit number into 16-bit digits and carefully handling the tricky carries and overflows from the arithmetic multiply and add operations may ultimately yield sufficient performance. I haven't attempted this yet on the basis that something well-tested may already exist!
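
For a flavour of the digit approach, a minimal sketch, assuming 32-bit signed integers - the product of two 16-bit digits fits, and mod/idiv split it back into a low digit and a carry:

% two sample digits, both below 0x8000 here so the product stays under 2^31
/a 16#7f2d def /b 16#75c7 def
a b mul                % full product, still a 32-bit signed integer
dup 16#10000 mod       % low 16 bits of the product
exch 16#10000 idiv     % high 16 bits: the carry into the next digit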

Secondly, prior to use each pseudorandom 32-bit, unsigned output is scaled to a precise integer within the desired range (variable) by dividing by 2^32 then multiplying by the width of the range. I'm not even convinced that all C implementations would produce stable output because whilst they may implement the IEEE 754 floating-point representation they may also "cheap out" on proper rounding of division. For PostScript, synthesising the floating-point division of an initial 32-bit, unsigned integer by "MAX_INT" then performing a multiply operation on the result seems like it would be a horrendous mess prone to round-off errors.

(To remedy the first point I'll be recommending that the standard change to a different PRNG that is compatible with shorter data types and does not rely on C-like datatype semantics. To remedy the second I'll recommend scaling the PRNG output to a desired range using iteration rather than division, e.g. https://stackoverflow.com/a/10984975/2568535.)
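
A rough sketch of one such iterative scheme - mask-and-reject - which may differ in detail from the linked answer; the built-in rand (a nonnegative 31-bit integer) stands in for the real pseudorandom source:

/randrange {                      % n  ->  r in [0, n-1]
    /n exch def
    /mask 0 def                   % grow mask to the smallest 2^k-1 covering n-1
    { mask n 1 sub ge { exit } if
      /mask mask 1 bitshift 1 or def } loop
    { rand mask and dup n lt { exit } if pop } loop
} bind def

No division and no reals: each masked draw is uniform over [0, mask], and rejection keeps only values below n.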

But in the meantime, advice from anyone familiar with handling C-like datatype semantics in PostScript is gratefully received :-)


Thanks,

Terry
luser droog
2018-07-05 18:25:03 UTC
Post by Terry Burton
Hi,
I'm currently looking at what it would take to implement (for Barcode Writer in Pure PostScript) a somewhat "unique" barcode standard that is in the process of being drafted.
Some thoughts. From a C portability perspective, it should be reasonable
to use Ghostscript's 64-bit integers as long as you do an assert()-style guard
around the code that relies on this assumption.

Maybe something like this:

{ mark
  (16#1122334455667788) token pop exch pop    % -> mark parsed-value
  16#11223344 32 bitshift 16#55667788 add eq  % -> mark bool
  exch pop                                    % drop the mark, keep the boolean
} stopped {
  % rangecheck or other error: assume no 64-bit integers
  cleartomark
  /64-bit-okay false def
}{
  % no error: the comparison says whether 64-bit arithmetic worked
  /64-bit-okay exch def
} ifelse


64-bit-okay not {
  CODE-REQUIRES-64-BIT-INTEGERTYPE % poor-man's error message: executing this
                                   % undefined name aborts with /undefined
}{
  % code that assumes 64 bit
  % ...
} ifelse

Implementing a more widely portable solution is a bigger -- but separate --
problem.


For PRNGs my discovery has been that having the seed size the same as
the payload size makes the implementation very constrained. It'd be nice
if the seed could be an array of integers. This would affect 'srand' and
'rrand' but not 'rand' itself, as far as interfaces go.

This would make it substantially easier to write a PRNG with good
characteristics.
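
For instance, a sketch of that shape of interface (all names hypothetical, and the state update is a toy additive step, not a recommendation):

% seed with an array of at least two integers, each below 2^30
/xsrand { /xstate exch def /xi 0 def } def
/xrand {                      % -  ->  int in [0, 2^30-1]
    /xi xi 1 add xstate length mod def
    xstate xi get
    xstate xi 1 add xstate length mod get
    add 16#3fffffff and       % 30-bit words, so the add cannot overflow
    xstate xi 2 index put     % store the new word; a copy stays on the stack
} def

% usage: [3 1 4 1 5 9] xsrand   xrand xrand xrand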
John Reiser
2018-07-07 02:53:13 UTC
Post by Terry Burton
forever {
seed = 6364136223846793005 * seed + 1
emit seed >> 32
}
Split the previous seed into four digits in base 0x10000. The multiplier
6364136223846793005 == 0x5851_f42d_4c95_7f2d. Only the 0xf42d digit
needs the alternate representation 0xf42d == (0x10000 - 0x0bd3)
to enable 64 x 64 ==> 64-bit multiply by using individual multiplies
of 15 x 16 ==> 31 bits with no overflow in signed 32-bit arithmetic.
Neither the code nor the performance is horrible, even in Postscript.
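
A minimal sketch of that split (digit names arbitrary; 226759 is just an example seed):

% m = 0x5851_f42d_4c95_7f2d, most significant digit first; 0xf42d is
% recoded as -0x0bd3 and the borrowed 1 carries into the digit above,
% so every multiplier digit has magnitude below 2^15
/m0 16#5851 1 add def
/m1 16#f42d 16#10000 sub def
/m2 16#4c95 def
/m3 16#7f2d def
% seed digits need no recoding, even with bit 15 (0x8000) set
/s0 16#0000 def /s1 16#0000 def /s2 16#0003 def /s3 16#75c7 def
% any partial product, e.g. m1 s3 mul, now fits in 31 bits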
Post by Terry Burton
Secondly, prior to use each pseudorandom 32-bit, unsigned output is scaled to a precise integer within the desired range (variable) by dividing by 2^32 then multiplying by the width of the range. I'm not even convinced that all C implementations would produce stable output because whilst they may implement the IEEE 754 floating-point representation they may also "cheap out" on proper rounding of division. For PostScript, synthesising the floating-point division of an initial 32-bit, unsigned integer by "MAX_INT" then performing a multiply operation on the result seems like it would be a horrendous mess prone to round-off errors.
Dividing by 2**32 is exact. Where is the problem of "[im]proper rounding of division"?
Please use code, not words, to specify the computation in question.
Terry Burton
2018-07-10 01:01:01 UTC
Post by John Reiser
Post by Terry Burton
forever {
seed = 6364136223846793005 * seed + 1
emit seed >> 32
}
Split the previous seed into four digits in base 0x10000. The multiplier
6364136223846793005 == 0x5851_f42d_4c95_7f2d. Only the 0xf42d digit
needs the alternate representation 0xf42d == (0x10000 - 0x0bd3)
to enable 64 x 64 ==> 64-bit multiply by using individual multiplies
of 15 x 16 ==> 31 bits with no overflow in signed 32-bit arithmetic.
Neither the code nor the performance is horrible, even in Postscript.
Much appreciated, John, and sorry for the delayed response - struggling for time in front of the screen...

Avoiding the overflow by letting the first digit steal from the second (thereby pushing it negative) was the trick that was just out of reach.

I now have some working noddy code which I will post once I've cleaned it up to make it look more like sane PostScript.
Post by John Reiser
Post by Terry Burton
Secondly, prior to use each pseudorandom 32-bit, unsigned output is scaled to a precise integer within the desired range (variable) by dividing by 2^32 then multiplying by the width of the range. I'm not even convinced that all C implementations would produce stable output because whilst they may implement the IEEE 754 floating-point representation they may also "cheap out" on proper rounding of division. For PostScript, synthesising the floating-point division of an initial 32-bit, unsigned integer by "MAX_INT" then performing a multiply operation on the result seems like it would be a horrendous mess prone to round-off errors.
Dividing by 2**32 is exact. Where is the problem of "[im]proper rounding of division"?
Please use code, not words, to specify the computation in question.
Good point and I'll take a look at this next. For discrete processes it never seems like a good idea to round trip between integer arithmetic and floating point. But of course if no precision is traded then it doesn't matter.

I'll pull out the specifics if I have trouble and post working PostScript if not.

Thanks!
Terry Burton
2018-07-12 08:38:08 UTC
Post by Terry Burton
Post by John Reiser
Post by Terry Burton
forever {
seed = 6364136223846793005 * seed + 1
emit seed >> 32
}
Split the previous seed into four digits in base 0x10000. The multiplier
6364136223846793005 == 0x5851_f42d_4c95_7f2d. Only the 0xf42d digit
needs the alternate representation 0xf42d == (0x10000 - 0x0bd3)
to enable 64 x 64 ==> 64-bit multiply by using individual multiplies
of 15 x 16 ==> 31 bits with no overflow in signed 32-bit arithmetic.
Neither the code nor the performance is horrible, even in Postscript.
Much appreciated, John, and sorry for the delayed response - struggling for time in front of the screen...
Avoiding the overflow by letting the first digit steal from the second (thereby pushing it negative) was the trick that was just out of reach.
I now have some working noddy code which I will post once I've cleaned it up to make it look more like sane PostScript.
Thanks again for the hints. My working "lcg64_temper" implementation based on John's advice is here:

https://github.com/bwipp/postscriptbarcode/blob/b4520c7184e08cc8ea17b4d7f50cd5ffe49f0ff2/contrib/development/lcg64_temper.ps
Post by Terry Burton
Post by John Reiser
Post by Terry Burton
Secondly, prior to use each pseudorandom 32-bit, unsigned output is scaled to a precise integer within the desired range (variable) by dividing by 2^32 then multiplying by the width of the range. I'm not even convinced that all C implementations would produce stable output because whilst they may implement the IEEE 754 floating-point representation they may also "cheap out" on proper rounding of division. For PostScript, synthesising the floating-point division of an initial 32-bit, unsigned integer by "MAX_INT" then performing a multiply operation on the result seems like it would be a horrendous mess prone to round-off errors.
Dividing by 2**32 is exact. Where is the problem of "[im]proper rounding of division"?
Please use code, not words, to specify the computation in question.
Good point and I'll take a look at this next. For discrete processes it never seems like a good idea to round trip between integer arithmetic and floating point. But of course if no precision is traded then it doesn't matter.
The C code provided by the specification (trimmed version below) converts the uint32_t output of lcg64_temper to a float, then divides by 2^32-1, before multiplying and recasting to an int32_t. Unless I'm missing something fundamental it seems as though this process will lead to irrecoverable loss of precision that will cause rounding differences between implementations...

----

#include <stdio.h>
#include <inttypes.h>

static uint64_t lcg64_seed;

uint32_t temper(uint32_t x) {
    x ^= x>>11;
    x ^= x<<7 & 0x9D2C5680;
    x ^= x<<15 & 0xEFC60000;
    x ^= x>>18;
    return x;
}

uint32_t lcg64_temper() {
    lcg64_seed = 6364136223846793005ULL * lcg64_seed + 1;
    return temper(lcg64_seed >> 32);
}

int main() {
    int32_t data_length=1234;  /* Dummy, up to ~27000 bits */
    lcg64_seed = 226759;
    for (int32_t i=0; i < data_length; i++) {
        int32_t pos = (int32_t)( (float)lcg64_temper() / (float)UINT32_MAX * (data_length-i) );
        printf("Swapping bit %" PRId32 " with bit %" PRId32 "\n", data_length-1-i, pos);
    }
    return 0;
}
John Reiser
2018-07-14 23:54:23 UTC
Post by Terry Burton
https://github.com/bwipp/postscriptbarcode/blob/b4520c7184e08cc8ea17b4d7f50cd5ffe49f0ff2/contrib/development/lcg64_temper.ps
The shifting of two 16-bit [unsigned] numbers in function bitshift2() is tedious.
Instead, shift a native Postscript 32-bit signed integer in function temper(),
and correct for sign extension (AND-off the propagated sign bit) when the input
has the high bit set and the shift count is negative (thus a right shift).
It can be simpler and faster to do the AND for all right shifts, even though
a positive input will not need it.
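
A minimal sketch of that, assuming 32-bit integers, with the unconditional AND applied to every right shift:

/rshift {                     % x n  ->  x >>> n  (logical right shift, 1 <= n <= 31)
    /n exch def
    n neg bitshift            % native shift; may drag copies of the sign bit down
    16#7fffffff 1 n sub bitshift and   % mask 2^(32-n)-1 clears them
} bind def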
Post by Terry Burton
% Avoid 15-bit overflow when multiplying the digits by stealing from the second digit
A major point of my earlier comment about "15 x 16 ==> 31 bits with no overflow in signed 32-bit arithmetic" is: once the multiplier 6364136223846793005 == 0x5851_f42d_4c95_7f2d
has been re-coded by replacing 0xf42d by (0x10000 - 0x0bd3) then the
previous value of the 'seed' variable needs only to be split into
four 16-bit digits, and does not require re-coding even though a digit
may have its bit 15 (0x8000) set. There are (4+3+2+1) multiplies using 16 bits
from seed and 15 bits from the multiplier; each multiply giving 31 bits
and thus no overflow in 32-bit Postscript signed integers. Only the compile-time
re-coding of the constant multiplier is necessary; the dynamic digits of 'seed'
never need re-coding.
Post by Terry Burton
The C code provided by the specification (trimmed version below) converts the uint32_t output of lcg64_temper to a float, then divides by 2^32-1, before multiplying and recasting to an int32_t. Unless I'm missing something fundamental it seems as though this process will lead to irrecoverable loss of precision that will cause rounding differences between implementations...
int main() {
    int32_t data_length=1234;  /* Dummy, up to ~27000 bits */
    lcg64_seed = 226759;
    for (int32_t i=0; i < data_length; i++) {
        int32_t pos = (int32_t)( (float)lcg64_temper() / (float)UINT32_MAX * (data_length-i) );
There are some misconceptions in the quoted plain text [above] which precedes the quoted code.
The value of (float)UINT32_MAX is exactly 2**32 because the IEEE-754 conversion is rounded.
The value of (float)lcg64_temper() also is rounded to 24 bits of precision from [up to] 32 bits.
In nearly all cases this loses some precision, but there is no ambiguity in the result
because IEEE-754 specifies the rules which generate a repeatable result.
The division of those two floats is exact (merely subtract exponents).
The multiplication by (data_length - i) incurs some rounding of the 48-bit
intermediate mantissa to the 24 bits of a 'float'. Again, IEEE-754 guarantees
the same result in all implementations. The final conversion from float to int32_t is exact.
[If the implementation of the floating-point operations used here does not conform
to IEEE-754 then the implementation is not worth anyone's time or money. Really!]
Terry Burton
2018-07-15 00:40:26 UTC
Post by John Reiser
Post by Terry Burton
https://github.com/bwipp/postscriptbarcode/blob/b4520c7184e08cc8ea17b4d7f50cd5ffe49f0ff2/contrib/development/lcg64_temper.ps
The shifting of two 16-bit [unsigned] numbers in function bitshift2() is tedious.
Instead, shift a native Postscript 32-bit signed integer in function temper(),
and correct for sign extension (AND-off the propagated sign bit) when the input
has the high bit set and the shift count is negative (thus a right shift).
It can be simpler and faster to do the AND for all right shifts, even though
a positive input will not need it.
Thanks. My latest version (just committed) does that - the lcg64_temper procedure returns correct 32-bit patterns, albeit with PostScript representing these in signed format (and I'm currently forcing 64-bit interpreters into the same range, which might be a temporary measure):

https://github.com/bwipp/postscriptbarcode/blob/master/contrib/development/lcg64_temper.ps
Post by John Reiser
Post by Terry Burton
% Avoid 15-bit overflow when multiplying the digits by stealing from the second digit
A major point of my earlier comment about "15 x 16 ==> 31 bits with no overflow in signed 32-bit arithmetic" is: once the multiplier 6364136223846793005 == 0x5851_f42d_4c95_7f2d
has been re-coded by replacing 0xf42d by (0x10000 - 0x0bd3) then the
previous value of the 'seed' variable needs only to be split into
four 16-bit digits, and does not require re-coding even though a digit
may have its bit 15 (0x8000) set. There are (4+3+2+1) multiplies using 16 bits
from seed and 15 bits from the multiplier; each multiply giving 31 bits
and thus no overflow in 32-bit Postscript signed integers. Only the compile-time
re-coding of the constant multiplier is necessary; the dynamic digits of 'seed'
never need re-coding.
I need to read this carefully (not at 1:30AM!) but I think my code does what you state. The "borrowing" that I refer to in the comment is a representation of the multiplier in the source rather than a repeated runtime activity. (I do also borrow elsewhere from each digit but that's to force the digits representing the sum of the products to be positive to simplify the carry code, so there might be an optimisation/simplification here.)
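
(A concrete instance of that runtime borrow, in base 0x10000: the digit pair (5, -3) denotes 5*0x10000 - 3, and adding 0x10000 to the low digit while taking 1 from the high digit rewrites it as (4, 0xfffd) - the same value with both digits nonnegative.)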
Post by John Reiser
Post by Terry Burton
The C code provided by the specification (trimmed version below) converts the uint32_t output of lcg64_temper to a float, then divides by 2^32-1, before multiplying and recasting to an int32_t. Unless I'm missing something fundamental it seems as though this process will lead to irrecoverable loss of precision that will cause rounding differences between implementations...
int main() {
    int32_t data_length=1234;  /* Dummy, up to ~27000 bits */
    lcg64_seed = 226759;
    for (int32_t i=0; i < data_length; i++) {
        int32_t pos = (int32_t)( (float)lcg64_temper() / (float)UINT32_MAX * (data_length-i) );
There are some misconceptions in the quoted plain text [above] which precedes the quoted code.
The value of (float)UINT32_MAX is exactly 2**32 because the IEEE-754 conversion is rounded.
The value of (float)lcg64_temper() also is rounded to 24 bits of precision from [up to] 32 bits.
In nearly all cases this loses some precision, but there is no ambiguity in the result
because IEEE-754 specifies the rules which generate a repeatable result.
The division of those two floats is exact (merely subtract exponents).
The multiplication by (data_length - i) incurs some rounding of the 48-bit
intermediate mantissa to the 24 bits of a 'float'. Again, IEEE-754 guarantees
the same result in all implementations. The final conversion from float to int32_t is exact.
[If the implementation of the floating-point operations used here does not conform
to IEEE-754 then the implementation is not worth anyone's time or money. Really!]
Thanks again! I'll digest this and echo back my understanding with code.


All the best,

Terry
Terry Burton
2018-07-16 23:03:45 UTC
<...snip...>
Post by Terry Burton
Post by John Reiser
Post by Terry Burton
The C code provided by the specification (trimmed version below) converts the uint32_t output of lcg64_temper to a float, then divides by 2^32-1, before multiplying and recasting to an int32_t. Unless I'm missing something fundamental it seems as though this process will lead to irrecoverable loss of precision that will cause rounding differences between implementations...
int main() {
    int32_t data_length=1234;  /* Dummy, up to ~27000 bits */
    lcg64_seed = 226759;
    for (int32_t i=0; i < data_length; i++) {
        int32_t pos = (int32_t)( (float)lcg64_temper() / (float)UINT32_MAX * (data_length-i) );
There are some misconceptions in the quoted plain text [above] which precedes the quoted code.
The value of (float)UINT32_MAX is exactly 2**32 because the IEEE-754 conversion is rounded.
The value of (float)lcg64_temper() also is rounded to 24 bits of precision from [up to] 32 bits.
In nearly all cases this loses some precision, but there is no ambiguity in the result
because IEEE-754 specifies the rules which generate a repeatable result.
The division of those two floats is exact (merely subtract exponents).
The multiplication by (data_length - i) incurs some rounding of the 48-bit
intermediate mantissa to the 24 bits of a 'float'. Again, IEEE-754 guarantees
the same result in all implementations. The final conversion from float to int32_t is exact.
[If the implementation of the floating-point operations used here does not conform
to IEEE-754 then the implementation is not worth anyone's time or money. Really!]
Thanks again! I'll digest this and echo back my understanding with code.
Who knew computers could be so dependable... :-)

Indeed, with my latest code (link below) I get output identical to the C reference implementation whether PostScript's integers are 32-bit or 64-bit signed: the floating-point calculations are performed on lcg64_temper return values in the ranges [-2^31, 2^31-1] and [0, 2^32-1] respectively, correcting for the two's complement sign bit:

lcg64_temper
dup 0 lt {16#80000000 xor 2147483648.0 add} if % recover the unsigned value (as a real) from a negative 32-bit pattern
4294967295.0 div maxrand mul cvi               % scale to the target range, as in the C reference

John: Thanks so much for your clear thinking and careful explanations.

Implementation below (for archiving)...


%!PS

% Implement source of pseudorandom numbers by tempering the output of an LCG:
%
% Xn+1 = (6364136223846793005 * Xn + 1) mod 2^64

% uint32_t temper(uint32_t x) {
% x ^= x>>11;
% x ^= x<<7 & 0x9D2C5680;
% x ^= x<<15 & 0xEFC60000;
% x ^= x>>18;
% return x;
% }
%
% uint32_t lcg64_temper(uint64_t* seed) {
% *seed = 6364136223846793005ULL * *seed + 1;
% return temper(*seed >> 32);
% }

/lcg64_temper {

    % Multiply seed by 6364136223846793005 then add 1

    /p00 m0 s0 mul def /p01 m0 s1 mul def /p02 m0 s2 mul def /p03 m0 s3 mul def
    /p10 m1 s0 mul def /p11 m1 s1 mul def /p12 m1 s2 mul def /p13 m1 s3 mul def
    /p20 m2 s0 mul def /p21 m2 s1 mul def /p22 m2 s2 mul def /p23 m2 s3 mul def
    /p30 m3 s0 mul def /p31 m3 s1 mul def /p32 m3 s2 mul def /p33 m3 s3 mul def

    % Accumulate the base-0x10000 digits of the product (s3 least significant)
    /s3 p33 16#10000 mod 1 add def % 1 added here
    /s2 p32 16#10000 mod p33 16#10000 idiv add
        p23 16#10000 mod add def
    /s1 p31 16#10000 mod p32 16#10000 idiv add
        p22 16#10000 mod p23 16#10000 idiv add add
        p13 16#10000 mod add def
    /s0 p30 16#10000 mod p31 16#10000 idiv add
        p21 16#10000 mod p22 16#10000 idiv add add
        p12 16#10000 mod p13 16#10000 idiv add add
        p03 16#10000 mod add def

    % Make each digit positive by borrowing from the more significant digit
    /s3 s3 16#10000 add def
    /s2 s2 16#ffff add def
    /s1 s1 16#ffff add def
    /s0 s0 16#ffff add def

    % Carry
    /s2 s3 16#10000 idiv s2 add def /s3 s3 16#10000 mod def
    /s1 s2 16#10000 idiv s1 add def /s2 s2 16#10000 mod def
    /s0 s1 16#10000 idiv s0 add def /s1 s1 16#10000 mod def
    /s0 s0 16#10000 mod def

    % Temper most significant 32 bits
    s0 16#8000 sub 16#10000 mul s1 add 16#80000000 xor % assemble s0:s1 without overflowing signed 32-bit
    16#ffffffff and % identity with 32-bit integers; masks to 32 bits with 64-bit integers
    dup -11 bitshift xor
    dup 7 bitshift 16#62d3a980 neg and xor % 0x9D2C5680 - 0x100000000
    dup 15 bitshift 16#103a0000 neg and xor % 0xEFC60000 - 0x100000000
    16#ffffffff and
    dup -18 bitshift xor

} bind def


% Avoid 15-bit overflow when multiplying the digits by stealing from the second digit
% m = 6364136223846793005 = 0x 5851 f42d 4c95 7f2d
% s = 226759 = 0x 0000 0000 0003 75C7

/m0 16#5851 16#1 add def /m1 16#f42d 16#10000 sub def /m2 16#4c95 def /m3 16#7f2d def
/s0 16#0000 def /s1 16#0000 def /s2 16#0003 def /s3 16#75C7 def

/maxrand 50000 def
{
    lcg64_temper
    dup 0 lt {16#80000000 xor 2147483648.0 add} if % recover the unsigned value (as a real) from a negative 32-bit pattern
    4294967295.0 div maxrand mul cvi               % scale to the target range, as in the C reference
    ==
} loop
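
(A note for anyone replaying this from the archive: the program prints scaled values forever, so run it with something like "gs -dNODISPLAY -q lcg64_temper.ps" and interrupt it when done.)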
John Reiser
2018-07-17 04:19:11 UTC
Post by Terry Burton
Post by Terry Burton
        int32_t pos = (int32_t)( (float)lcg64_temper() / (float)UINT32_MAX * (data_length-i) );
... The final conversion from float to int32_t is exact. ...
Oops; it is rounded, not exact. The int32 has 32 bits of precision in contrast
to 24 bits for the float, but the exponent of the float usually causes some of its
mantissa bits to have place value less than 1.
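(Concretely: the product is in general fractional, e.g. 1233.7, and the (int32_t) cast simply discards the fraction, truncating toward zero.)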
Post by Terry Burton
4294967295.0 div maxrand mul cvi
"4294967295.0 div" always produces the same value as "/ (float)UINT32_MAX"
only because Postscript uses [IEEE-754] 32-bit single-precision to represent
large integers; the encoded value of both literal constants is exactly 2**32.
If Postscript chose [IEEE-754] 64-bit double precision
to represent large integers, then the encoded values would be different
and the formula (with eventual rounding) would produce different answers
in some rare cases.
The specification could be clearer by using "( 65536.0 * 65536 )"
to emphasize that the divisor actually is exactly 2**32.
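
A quick check of the literal-encoding point, assuming single-precision reals:

4294967295.0 4294967296.0 eq ==   % prints true: both literals encode as exactly 2^32
65536.0 65536 mul ==              % the suggested clearer spelling of the divisor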
