Long Ints from Data Streams in Javascript

September 2, 2015 by Jan Martin Borgersen

This was interesting, and I wasn’t able to find this specific issue discussed online, so here I go attempting to add to the general knowledge of the Internet.

The challenge: In Javascript, read 8 bytes out of a byte array (from a file or a network packet, for instance) and turn them into a number. 53-bit precision is okay (Javascript stores numbers as 64-bit floats, which give you 53 bits of integer precision). Use Uint8Array as our view into the byte array.
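Throughout, arr is a byte-level view over the incoming data. A minimal sketch of that setup (the names buf and arr are mine, not from any particular API):

var buf = new ArrayBuffer(8);  // stand-in for the real file or packet data
var arr = new Uint8Array(buf); // arr[0] will be the most significant byte here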

Other languages have 64-bit long ints, and this is straightforward. Other languages also have unsigned ints, which make it easy to deal with a 1 in the most significant bit. However, Javascript treats all numbers as 64-bit floats with 53-bit precision (the remaining bits hold the exponent and sign). To get around this, a variety of Big Integer libraries have popped up that provide 64-bit or string-backed arbitrary-length numbers. However, I’m working with code that began its life in Java, and still has to work in Java, so I don’t want to refactor much. Also, 53-bit precision is okay for this problem because these numbers are really only timestamps. I want to handle this math in an abstraction layer that gives me a regular Javascript number to use elsewhere.

The classic approach: bit shifting

The bit shifting approach works great in languages with 64-bit long ints and looks something like this:

var num = 0;
for (var i = 0; i < 8; i++) {
  if (i > 0) num <<= 8;  // make room for the next byte
  num |= arr[i] & 0xFF;  // OR it into the bottom eight bits
}

However, bitwise operators in Javascript coerce their operands to 32-bit signed integers (which fit nicely inside a 64-bit float). So this approach shifts your most significant four bytes out of the number entirely.
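A quick console check makes the coercion obvious (my own illustration):

> 0x12 * 0x100000000 | 0 // any bitwise op truncates the result to 32 bits
0
> 1 << 32 // shift counts are even taken mod 32
1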

This won’t work for us.

The higher-level math approach: multiply and add

Multiplying the number by 256 (0x100) and adding the next byte is the mathematical equivalent of left-shifting by 8, but it stays in floating-point land, so we keep the 21 bits above the 32-bit boundary. Moreover, since bit-shifting doesn’t buy us any better performance in Javascript, using regular math is just fine. This approach looks like this:

var num = 0;
for (var i = 0; i < 8; i++) {
  if (i > 0) num *= 0x100; // shift the previous bytes up
  num += arr[i] & 0xFF;    // add in the next byte
}

However, this produces some very wrong results when the number is negative and padded with 1’s all the way to the left. With multiply-and-add, num never actually goes negative; instead, after enough runs through the loop it blows past 2^53, loses precision, and lands way up near 2^64, a huge positive number rather than the negative value we want.
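For instance, feeding it the eight bytes of -2 (assuming the Uint8Array setup above):

> var arr = new Uint8Array([0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFE]); // -2
> var num = 0;
> for (var i = 0; i < 8; i++) { if (i > 0) num *= 0x100; num += arr[i] & 0xFF; }
> num
18446744073709552000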

The two 32-bit ints approach

I eventually decided that it was safer to read the 8 bytes into two 4-byte words and calculate the correct 53-bit number from those. This way I wasn’t going to lose any bits until the very last step, where anything past 53 bits of precision falls off the bottom.

Much like the multiply-by-256-and-add approach, this works great (and easily) for positive numbers:

var high = 0, low = 0;
for (var i = 0; i < 8; i++) {
  if (i < 4) {
    // first four bytes build the high word
    if (i > 0) high *= 0x100;
    high += arr[i] & 0xFF;
  } else {
    // last four bytes build the low word
    if (i > 4) low *= 0x100;
    low += arr[i] & 0xFF;
  }
}
var num = high * 0x100000000 + low;

However, negatives are a problem now because each word only uses the bottom 32 bits of a 53-bit float mantissa, so neither the high nor the low word ever looks like a negative number on its own. (Remember this, it’ll be important again later.) If the top bit of high is a 1, the 64-bit value is negative in two’s complement, but high * 0x100000000 + low just produces an enormous positive float up near 2^64, which is the wrong answer.
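Back in the console, with the two words of -2:

> var high = 0xFFFFFFFF, low = 0xFFFFFFFE; // the words of -2
> high * 0x100000000 + low
18446744073709552000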

We have to be smarter.

Two’s Complement Math

Negative numbers are represented in two’s complement form, which, if you remember your CS101, gives us one more representable number than a simple sign bit would. It also makes for some elegant bit-level math. But we’re in Javascript, so bit-level math isn’t elegant anymore; in fact, it keeps getting us into trouble here.

Negating a number in two’s complement is simple on paper: flip the bits and add one, and you have the negative representation of your value.
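With 8-bit values, for instance (my own illustration):

// negating 5:
//   5        = 0000 0101
//   flipped  = 1111 1010
//   plus one = 1111 1011, which is -5 in two's complement

So this feels like it should work: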

if (high >= 0x80000000) { // see if the 64-bit number is negative
  high ^= 0xFFFFFFFF;     // flip the bits of both words
  low ^= 0xFFFFFFFF;
  num = -(high * 0x100000000 + low + 1); // add one and negate
}

But here’s where Javascript gets weird again, and it’s all about bit-level math suddenly turning numbers into 32-bit signed ints. Fire up a console or node and try this:

> var num = 123;
> num ^= 0xFFFFFFFF;
-124

See that negative number? The XOR coerced num to a 32-bit signed int. If you flip the bits on the low word and the result has a 1 in bit 32, the number goes negative, high * 0x100000000 + low becomes a subtraction instead of an addition, and we get the wrong answer. We have the same problem we hit the last time we tried this line of code, except this time low is the negative value.

Special case the low word

So the way around this is to make sure, when we flip the bits of the low word, we never have a negative number. This seems to work:

if (high >= 0x80000000) {
  high ^= 0xFFFFFFFF;   // safe: the flipped high word fits in 31 bits
  var lowIsPositive = false;
  if (low < 0)
    low ^= 0xFFFFFFFF;
  else {
    lowIsPositive = true;
    low ^= 0x7FFFFFFF;  // flip only the bottom 31 bits...
  }
  num = -(high * 0x100000000
          + low
          + 1
          + (lowIsPositive ? 0x80000000 : 0)); // ...and finish bit 32 here
} else
  num = high * 0x100000000 + low;

One curious note: the last line of code still works without worrying about the sign of low because, when we read the bytes into the words, we used the multiply-and-add approach instead of the bit-shift approach. Even if the 32nd bit of low is a 1, low is still a positive number. Until, of course, we do some bitwise math on it and Javascript turns it into a 32-bit signed int.
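For reference, here is everything above gathered into one function. This is my own assembly of the pieces, with a hypothetical name readLong, assuming eight big-endian bytes in a Uint8Array:

function readLong(arr) {
  var high = 0, low = 0;
  for (var i = 0; i < 8; i++) {
    if (i < 4) {
      if (i > 0) high *= 0x100;
      high += arr[i] & 0xFF;
    } else {
      if (i > 4) low *= 0x100;
      low += arr[i] & 0xFF;
    }
  }
  if (high < 0x80000000)
    return high * 0x100000000 + low; // positive: the easy case
  high ^= 0xFFFFFFFF; // negative: two's complement, carefully
  var lowIsPositive = false;
  if (low < 0)
    low ^= 0xFFFFFFFF;
  else {
    lowIsPositive = true;
    low ^= 0x7FFFFFFF;
  }
  return -(high * 0x100000000
           + low
           + 1
           + (lowIsPositive ? 0x80000000 : 0));
}

// readLong(new Uint8Array([0, 0, 0, 0, 0, 0, 0, 1]))                         returns 1
// readLong(new Uint8Array([0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFE])) returns -2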

Morals

My takeaways from this exercise are: