|Table of Contents||Nanex Compression on an OPRA Direct Feed|
One day of data from a direct OPRA feed, October 18,
2006, was compressed with nanex compression
technology and analysed in
detail using a PowerEdge 1750 with dual 3.2GHz Xeon processors, 2 GB of
RAM and Fujitsu 146GB SCSI hard disks on a 533MHz front side bus.
Results are shown for the time period 9:30am to 4:00pm which is the
open and closing of trading. Data before and after this period is not
shown so that the most important time periods can be scaled to show
important details. In general, the off-hours period has similar
characteristics, but at much much lower and negligible values.
Figure 1 below illustrates the incredible size reduction
achieved by nanex compression.
Note how the uncompressed OPRA data peaks in the 90mbps range,
while the nanex compressed data is easily contained in 5 mbps. A standard T-1
Telco circuit is 1.5 mbps. Typical cable modems are capable of between
2 and 6 mbps.
2 below shows the same data in Figure 1, but plotted
on a logarithmic scale
to show compression details. Note how closely the compressed stream
tracks the source data even though the data from opening rotation
differs significantly from data after that period. This is due to
the fact that nanex
compression automatically and quickly adapts to characteristic changes in the input
stream; an important
feature for real-time financial data compression.
3 below shows the Compression Ratio which is simply the ratio of the
two lines in Figure 1. The compression ratio plotted over time
clearly illustrates the compression engine's ability to adapt to changing market
conditions. The inverse
compression ratio subtracted from 1.0 and expressed as a percentage is
another common way to describe the amount of compression. A Ratio of 20 to 1, shown simply as 20 in the
graph below, would become 95% (1.0 - 1/20)
The majority of OPRA packets during trading hours are 991 bytes in length. The second most common size is 941 bytes. The distribution of packets sizes with significant occurrences is shown in Figure 4 below. In this chart, note that there were approximately 42 million OPRA packets with a length of 991 bytes. Each OPRA packet contains multiple messages -- typically 14 to 15 quotes. A regular quote update is 65 bytes. If the NBBO (National Best Bid/Offer) changes and that information is not part of the quote message, there will be one or two 16 byte appendages for a total message size of 81 or 97 bytes respectively. Because OPRA packets cannot exceed 1000 bytes by definition, and messages are never split between packets, there is always some wasted transport space. As a general rule of thumb, the average OPRA quote message size is 66 bytes.
In one millisecond, there are 1,000 microseconds or 1 million nanoseconds. A 1 GHz machine has an instruction clock that ticks once per nanosecond or 1 million times per millisecond. In terms of cpu processing, a millisecond is a very, very long period of time and many software instructions (a few hundred thousand) can be executed during this interval.
Notes from the developer of nanex:
The time to
compress one OPRA packet is a remarkably steady 12 microseconds. The chart in
figure 6 shows the average compression time per packet in 1 second
intervals. A typical OPRA packet contains 15 quote messages.
Nanex compression can be used at the message level (it does not need to read the entire packet to begin compressing). However, since most existing software will processes OPRA data at the packet level, the analysis for this paper focused on the packet level for easier comparison and understanding. Nonetheless, figure 7 is presented below, showing the time spent compressing on a message level. Note the scale is in nanoseconds.
Figure 8 below, compares cpu times other common operations for reference and context. All operations were compiled for maximum performance and executed on the same hardware. The exception is the entry for the Interrupt response time of the newly released SLERT, Suse Linux Realtime, product which is included for context. Note that nanex compression time is a little over two inline memory copy operations.
Figure 9 shows the distribution of compression timings (in
microseconds) for one day of OPRA packets. Note the surprisingly narrow
range of timing results which is a rare and unexpected characteristic
of compression algorithms in general. Nanex
compression time is remarkably predictable: an
invaluable benefit for time sensitive tasks. For
example, a task that accumulates multiple OPRA streams into one could
better determine whether there was enough time to compress and include
another packet in the outbound stream or not.
Another task that benefits from consistent and predictable compression processing time is the determination of the maximum compression rate for a given processor. Figure 10, shows the estimated maximum processing rate in messages per second compared to the actual messages per second on October 18, 2006. The estimated maximum sustained rate is over 6 times the actual peak rate.
11 presents a pie chart view of the distribution of
packet compression times. Note that 99.9% of packets are
compressed in 20 microseconds or less, and 25% of packets are
compressed in less than 10 microseconds.
As expected, the time to uncompress a packet is much shorter than the time to compress it. Furthermore, uncompression time is even more constant and predictable than compression time. Figure 12 shows the distribution of uncompression times for one trading day. The most frequent uncompression time is just 7 microseconds. The uncompression time would smaller by 2 microseconds or more if the packets were uncompressed into a more efficient and ready to use binary format rather than the ASCII format used in the OPRA specification.
algorithms used by programs such as Winzip, zlib,
and gzip, are not written for real-time streaming applications. They
require a large "window" or sample of data before compression begins.
Having a large sample of data to scan for redundancy is a significant
to compression so it's not really a fair comparison. Nonetheless,
Figure 13 shows the results of compressing all the OPRA packets as if
it were a single file compared to nanex compression. The raw OPRA Direct packets would require
a file over 83 gigabytes. Winzip's compresses the file to just under 20
gigabytes (20 billion bytes), yielding a ratio of just over 4 to 1.
Nanex compression produces a file that is just 4.4 gigabytes, a ratio
of 19 to 1.
Figure 14 shows the comparison of compression times between
WinZip and nanex compression on a day's worth of OPRA packets in a
file. Not only is nanex compression 4 times smaller than WinZip, it's
also nearly 4 times faster.
compression produces very small packet sizes quickly and consistently.
It can easily handle OPRA's rapid growth in message update rates. With
it's ability to automatically adapt to changing input data, it is
likely to continue to run for many years without maintenance or changes
to the algorithm.
Copyright 2006 Nanex, LLC. All Rights Reserved.