Wednesday 21 January 2015

Bucket Based Compression for Data Transmission

This approach to transferring more information than is actually being transported is not time based, and it also makes much more stable and higher transmission rates possible.

A basic scenario is as follows. A person is given a binary stream of bits that needs to be transferred over some distance. Think of each bit as a colored ball: a black ball is binary zero and a white ball is binary one. At the destination, another person has placed four buckets, each labeled with one of the four possible two-bit combinations: 00, 01, 10 and 11.

The person transmitting the data takes the first three balls from the stream. The procedure for getting those three bits to the destination is simple: the first ball is thrown into the bucket whose label matches the next two bits. For instance, if the three balls are black, black, white (001), a black ball is thrown into the second bucket (01). The receiver checks the color of the ball that has arrived and appends the bucket label to it.
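The throwing procedure can be sketched in a few lines of Python. This is a minimal illustration, assuming the stream is a string of '0'/'1' characters whose length is a multiple of three; each throw is modeled as a (color, bucket label) pair, and the names `throw_balls` and `catch_balls` are hypothetical.

```python
BLACK, WHITE = 0, 1  # a black ball encodes bit 0, a white ball encodes bit 1


def throw_balls(bits: str) -> list[tuple[int, str]]:
    """Sender side (hypothetical helper): split the stream into 3-bit groups.
    The first bit becomes the ball's color, the last two pick the bucket."""
    if len(bits) % 3 != 0:
        raise ValueError("this sketch assumes a length that is a multiple of 3")
    throws = []
    for i in range(0, len(bits), 3):
        group = bits[i:i + 3]
        color = WHITE if group[0] == "1" else BLACK
        bucket = group[1:]  # e.g. "01" selects the second bucket
        throws.append((color, bucket))
    return throws


def catch_balls(throws: list[tuple[int, str]]) -> str:
    """Receiver side (hypothetical helper): read the ball's color,
    then append the label of the bucket it landed in."""
    return "".join(str(color) + bucket for color, bucket in throws)


throws = throw_balls("001")
print(throws)               # [(0, '01')]
print(catch_balls(throws))  # 001
```

Note that the receiver only recovers the two label bits because it can tell the buckets apart, so the bucket choice itself is part of what arrives.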

In this instance one ball has been thrown and three balls' worth of information has been received. The transmission-to-transport ratio is 3:1, a compression rate of roughly 67%, which is very hard to achieve with file compression. The method is also independent of the actual data.

The best thing about this approach is that the compression scales with the number of buckets: with 2^n buckets, each ball carries n extra bits. With 128 (2^7) buckets you get 7 bits of free data, an 8:1 transmission-to-transport ratio. With only 12.5% of the data actually being transported, you are transferring 8x more data than is being sent.
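The scaling rule can be checked with a small calculation. The sketch below assumes the number of buckets is a power of two, so each bucket label is log2(buckets) bits wide, and counts the ball's color as one more bit; `bits_per_throw` is a hypothetical helper name.

```python
def bits_per_throw(num_buckets: int) -> int:
    """Bits delivered by one ball: 1 for its color plus log2(num_buckets)
    for the bucket label. Assumes num_buckets is a power of two."""
    label_bits = num_buckets.bit_length() - 1
    if num_buckets < 1 or 2 ** label_bits != num_buckets:
        raise ValueError("num_buckets must be a power of two")
    return 1 + label_bits


for buckets in (2, 4, 128):
    ratio = bits_per_throw(buckets)
    print(f"{buckets:>3} buckets: {ratio}:1, {100 / ratio:.1f}% transported")
#   2 buckets: 2:1, 50.0% transported
#   4 buckets: 3:1, 33.3% transported
# 128 buckets: 8:1, 12.5% transported
```

The 4-bucket row matches the 3:1 example above, and the 128-bucket row matches the 8:1 figure.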

The problem is that I can't think of a way to adapt this to TCP/IP transmission, for a few reasons. First, a TCP/IP packet already contains the port number that would serve as the bucket number, so there is no real compression; you are just compressing the packet information. Second, a TCP/IP packet's minimum size is over 128 bits, so the ratios would apply per 128 bits instead of per bit. With 128 buckets, the 7 extra bits would ride on at least 128 transported bits, so the gain would be around 5.5% instead of 800%.
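The packet-size effect is plain arithmetic: the label bits stay fixed while the transported size grows. The sketch assumes the post's 128-bit lower bound for a packet and a power-of-two bucket count; `transfer_ratio` is a hypothetical name. For reference, minimal IPv4 and TCP headers are 20 bytes each (320 bits together), which would shrink the gain further.

```python
def transfer_ratio(packet_bits: int, num_buckets: int) -> float:
    """Bits delivered per bit transported when one bucket choice
    (log2(num_buckets) bits) accompanies each packet of packet_bits bits.
    Assumes num_buckets is a power of two."""
    label_bits = num_buckets.bit_length() - 1
    return (packet_bits + label_bits) / packet_bits


print(transfer_ratio(1, 128))                   # 8.0
print(f"{transfer_ratio(128, 128) - 1:.1%}")    # 5.5%
```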