Programming Trade Flow Solved

1 Introduction
Here you will assess trade flow as a means of generating profit opportunities in 3 cryptotoken markets. We stress the word “opportunity” because at high data rates like these, and given the markets’ price-time priority, it is far easier to identify desirable trades in the data stream than it is to inject oneself profitably into the fray.

2   Data
We have preprocessed level 2 exchange messages from the Coinbase WebSocket API for you into a more digestible format.

2.1   Treatment
Load the 2021 data for all 3 pairs from the class website. For each one, split it into test and training sets, with your training set containing the first 20% of the data and the test set containing the remainder.
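
A minimal sketch of the split, assuming "the first 20% of the data" means the first 20% of rows in time order (rather than the first 20% of the calendar span). The helper name `split_train_test` is hypothetical and not part of the course materials.

```python
def split_train_test(rows, train_fraction=0.20):
    """Return (train, test): the earliest train_fraction of rows, then the rest.

    Assumes `rows` is already sorted by time, so that no test-period data
    leaks into the training set.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


# Ten placeholder rows stand in for one pair's time-ordered trade records.
rows = list(range(10))
train, test = split_train_test(rows)
```

The same call would be repeated once per pair, so that each market gets its own training and test sets.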

2.2   Format
The data has the following structure[1]:

2.2.1     Trades

received_utc_nanoseconds   timestamp_utc_nanoseconds   PriceMillionths   SizeBillionths   Side
1618090137140737000        1618090137157544000         35690             1000000          -1
1618090137851379000        1618090137864544000         35700             29801980         2
1618270615253262000        1618270615358639000         35760             2926932560       -1
1618270616012160000        1618270616105583000         35760             16673940         -1

The Side is actually a sum of trade sides at the same price and time.

2.2.2     Book

received_utc_nanoseconds
Ask1PriceMillionths        35700         35700         35770         35770
Bid1PriceMillionths        35690         35690         35760         35760
Ask1SizeBillionths         11872084060   11872084060   1255039420    1255039420
Bid1SizeBillionths         32957203990   32957203990   24752612680   24752612680
Ask2PriceMillionths        35710         35710         35780         35780
Bid2PriceMillionths        35680         35680         35750         35750
Ask2SizeBillionths         31032423370   30332423370   31011776970   31011776970
Bid2SizeBillionths         45284575470   45284575470   41785630850   41785630850
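
The integer encoding can be unpacked along the following lines. This is only a sketch: the field names on `Trade`, the order of the two timestamp columns (received first, exchange timestamp second), and the helpers `parse_trade` and `ns_to_utc` are assumptions made for illustration. The only facts taken from the data are that prices are stored in millionths and sizes in billionths.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Trade:
    received_ns: int  # assumed: first timestamp column (time we received it)
    exchange_ns: int  # assumed: second timestamp column (exchange timestamp)
    price: float      # PriceMillionths / 1e6
    size: float       # SizeBillionths / 1e9
    side: int         # net sum of trade sides at this price and time


def parse_trade(received_ns, exchange_ns, price_millionths, size_billionths, side):
    """Convert one raw integer row into natural units."""
    return Trade(
        received_ns,
        exchange_ns,
        price_millionths / 1e6,
        size_billionths / 1e9,
        side,
    )


def ns_to_utc(ns):
    """Convert epoch nanoseconds to a UTC datetime (truncated to whole seconds)."""
    return datetime.fromtimestamp(ns // 1_000_000_000, tz=timezone.utc)


# First sample trade row from the data.
first = parse_trade(1618090137140737000, 1618090137157544000, 35690, 1000000, -1)
```

Keeping the raw integers for comparisons and converting to floats only for arithmetic such as returns avoids rounding surprises when matching prices across the trade and book data.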

3   Exercise
Write code to find the τ-interval trade flow $F_i$ just prior[2] to each trade data point[3][4] $i$. Compute the T-second forward returns $r_i$. Regress the returns against the flows in your training set to find a regression coefficient β.
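
One possible shape for these two computations is sketched below, under several assumptions that the assignment leaves open: flow is taken to be the sum of signed sizes over trades with timestamps in the half-open window from τ before trade i up to (but excluding) trade i's own timestamp, the forward return uses the last trade price at or before T after trade i, and the input is sorted by timestamp. The function names `trade_flow` and `forward_returns` are hypothetical.

```python
from bisect import bisect_left, bisect_right


def trade_flow(times_ns, signed_sizes, tau_ns):
    """Flow for each trade: sum of signed sizes with t_i - tau <= t < t_i.

    `times_ns` must be sorted ascending; ties are allowed. Using bisect_left
    for the upper edge excludes every trade sharing trade i's timestamp,
    including trade i itself.
    """
    prefix = [0]
    for s in signed_sizes:
        prefix.append(prefix[-1] + s)
    flows = []
    for t in times_ns:
        lo = bisect_left(times_ns, t - tau_ns)  # first trade inside the window
        hi = bisect_left(times_ns, t)           # first trade at time t (excluded)
        flows.append(prefix[hi] - prefix[lo])
    return flows


def forward_returns(times_ns, prices, horizon_ns):
    """Return from each trade's price to the last trade price at or before t_i + T."""
    out = []
    for t, p in zip(times_ns, prices):
        j = bisect_right(times_ns, t + horizon_ns) - 1
        out.append(prices[j] / p - 1.0)
    return out


# Toy data: five trades, two of them sharing a timestamp.
SECOND = 1_000_000_000
times = [0, 1 * SECOND, 1 * SECOND, 3 * SECOND, 4 * SECOND]
signed = [5, -2, 3, -1, 4]           # size signed by the sign of Side (assumed)
prices = [100, 101, 101, 102, 100]   # integer prices, e.g. in millionths

flows = trade_flow(times, signed, tau_ns=2 * SECOND)
rets = forward_returns(times, prices, horizon_ns=1 * SECOND)
```

Near the end of the sample the forward window runs past the last trade, so the final returns are computed from incomplete information; dropping those points before regressing is a reasonable precaution.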

For each data point in your test set you already have $F_i$, so your return prediction is $\hat{r}_i = \beta \cdot F_i$. Define a threshold $j$ for $\hat{r}_i$ and assume you might attempt to trade whenever $j < |\hat{r}_i|$.
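
The fit and the thresholding step might look like the sketch below. It assumes a regression through the origin (no intercept), so that the prediction is simply β times the flow, and it assumes the trade direction follows the sign of the predicted return. Both choices, and the names `fit_beta` and `trade_signals`, are illustrative rather than prescribed by the assignment.

```python
def fit_beta(flows, returns):
    """Least-squares slope with no intercept: sum(F * r) / sum(F * F)."""
    sxx = sum(f * f for f in flows)
    if sxx == 0:
        raise ValueError("all flows are zero; beta is undefined")
    return sum(f * r for f, r in zip(flows, returns)) / sxx


def trade_signals(flows, beta, threshold):
    """+1 (buy), -1 (sell) or 0 (no trade) for each flow, using threshold < |beta * F|."""
    signals = []
    for f in flows:
        r_hat = beta * f
        if abs(r_hat) > threshold:
            signals.append(1 if r_hat > 0 else -1)
        else:
            signals.append(0)
    return signals


# Toy training data in which returns are exactly proportional to flow.
train_flows = [1.0, 2.0, -1.0]
train_returns = [0.002, 0.004, -0.002]
beta = fit_beta(train_flows, train_returns)

# Apply the fitted beta to toy test-set flows with threshold j = 0.003.
test_flows = [0.5, -3.0, 2.0]
signals = trade_signals(test_flows, beta, threshold=0.003)
```

Fitting β only on the training set and applying it unchanged to the test set keeps the evaluation out of sample.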

4  Analysis
Assess the trading opportunities arising from using these return predictions in your test set. As part of this assessment, comment on the reliability of β, how you chose j, and what you might expect from using much longer training and test periods.
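
As a starting point for the assessment, the sketch below tallies how often the thresholded predictions point the right way and what the average realized return is in the predicted direction. It deliberately ignores fees, spread, and queue position, so it measures opportunities rather than achievable profit. The function name `summarize_opportunities` and the choice of statistics are assumptions.

```python
def summarize_opportunities(predicted, realized, threshold):
    """Summarize test-set points where |predicted return| exceeds the threshold."""
    # Realized return taken in the direction of the prediction.
    signed = [
        (1.0 if p > 0 else -1.0) * r
        for p, r in zip(predicted, realized)
        if abs(p) > threshold
    ]
    n = len(signed)
    if n == 0:
        return {"n_trades": 0, "hit_rate": None, "mean_signed_return": None}
    return {
        "n_trades": n,
        "hit_rate": sum(1 for s in signed if s > 0) / n,
        "mean_signed_return": sum(signed) / n,
    }


# Toy predictions and realized forward returns.
predicted = [0.004, -0.006, 0.001, 0.005]
realized = [0.002, -0.001, 0.003, -0.002]
summary = summarize_opportunities(predicted, realized, threshold=0.003)
```

Running the same summary over a grid of thresholds shows the trade-off behind the choice of j: a higher threshold yields fewer candidate trades, and the statistics computed from them become noisier.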

[1] Note that inaccuracies in clock settings, i.e. “clock skew”, can cause timestamps to appear later than the time at which they are recorded as having been received.

[2] We do not include the trade i data itself, because we are evaluating trade i in terms of the flow we would have been aware of just before it happened.
[3] NOTE: the trade data series does not necessarily have strictly increasing timestamps. Be sure not to include other trades at the same timestamp in your computation of $F_i$.
[4] It is not necessary to handle latency in your homework, but for your edification: a more careful implementation would account for lags. For a pessimistic approach we could choose $L$ as, say, twice the 99th percentile of computational and communications lag. The implementation would then use book data (not just trade data) to help compute the return from time $t_i + L$ to $t_i + L + T$ and run regressions using that. The idea here is that it takes approximately time $L$ to “do anything” about trade information.
