
This article is adapted and reorganized from Auquan's Pairs Trading Strategy (it is not original work), with two main modifications:

  • The US stocks are replaced with A-shares (mainly bank stocks). (Although A-shares cannot in practice be sold short, this is not a problem for research purposes.)
  • An ADF test of cointegration is added, to help readers understand the definition of cointegration more deeply.

Together these form a fairly complete introduction to pairs trading. All code in this article has been run on JoinQuant.


Pairs trading is a nice example of a strategy based on mathematical analysis.

The principle is as follows:
Let's say you have a pair of securities X and Y that have some underlying economic link. An example might be two companies that manufacture the same product, for example Pepsi and Coca Cola. You expect the spread (ratio or difference in prices) between these two to remain constant with time. However, from time to time, there might be a divergence in the spread between these two pairs. The divergence within a pair can be caused by temporary supply/demand changes, large buy/sell orders for one security, reaction for important news about one of the companies, and so on. When there is a temporary divergence between the two securities, i.e. one stock moves up while the other moves down, the pairs trade would be to short the outperforming stock and to long the underperforming one, betting that the "spread" between the two would eventually converge.

Pairs trading is a market neutral trading strategy enabling traders to profit from virtually any market conditions: uptrend, downtrend, or sideways movement.

We'll start by constructing an artificial example.

1. Explaining the Concept: start by generating two fake securities

We model X's daily returns by drawing from a normal distribution. Then we perform a cumulative sum to get the value of X on each day.

import numpy as np
import pandas as pd
import statsmodels
from statsmodels.tsa.stattools import coint
import matplotlib.pyplot as plt
# just set the seed for the random number generator
np.random.seed(107)
# Generate the daily returns
Xreturns = np.random.normal(0, 1, 100) 
# sum them and shift all the prices up
X = pd.Series(np.cumsum(Xreturns), name='X') + 50
X.plot(figsize=(15,7))
plt.show()

Now we generate Y. Y is supposed to have a deep economic link to X, so the price of Y should vary pretty similarly. We model this by taking X, shifting it up and adding some random noise drawn from a normal distribution.

noise = np.random.normal(0, 1, 100)
Y = X + 5 + noise
Y.name = 'Y'
pd.concat([X, Y], axis=1).plot(figsize=(15,7))
plt.show()

2. Cointegration

Cointegration, very loosely speaking, is a "different" form of correlation. If two series are cointegrated, the ratio between them will vary around a mean. For pairs trading to work between two time series, the expected value of the ratio over time must converge to the mean, i.e. they should be cointegrated. The time series we constructed above are cointegrated.

Cointegration, as distinct from correlation, requires two conditions to hold:
1) The log-price series of stock X and stock Y are both integrated of order one, i.e. I(1).
If the log price of stock X is a non-stationary time series (which is common knowledge) and its simple single-period return series (the first difference of the log prices can be treated as the simple single-period return) is a stationary time series, then the log price of stock X is integrated of order one.
2) The residual series of the regression constructed from stock X and stock Y is stationary.
If both conditions are satisfied, stock X and stock Y can be said to be cointegrated.

High correlation does not necessarily imply cointegration: even when two pairs of stocks have similar correlations, the likelihood that they are actually cointegrated can differ greatly. A quick check on the synthetic X and Y above illustrates the difference.
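
As a minimal sketch (reusing the synthetic X and Y generated above), correlation and the cointegration test answer different questions: the first measures how the series move together day to day, while the second asks whether their spread is stationary.

# Correlation of the two synthetic series (close to 1 by construction)
print('correlation:', X.corr(Y))
# Cointegration test: a small p-value supports cointegration
c_score, c_pvalue, _ = coint(X, Y)
print('cointegration p-value:', c_pvalue)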

(Y/X).plot(figsize=(15,7)) 
plt.axhline((Y/X).mean(), color='red', linestyle='--') 
plt.xlabel('Time')
plt.legend(['Price Ratio', 'Mean'])
plt.show()

3. Testing for Cointegration

Method 1: test for cointegration directly from the definition.

First, verify that the log prices of stock X and stock Y are non-stationary series.

For the stationarity test we use the ADF (Augmented Dickey-Fuller) test:

  • Null hypothesis: the series contains a unit root, i.e. it is non-stationary
  • Alternative hypothesis: the series has no unit root, i.e. it is stationary

Notes on the ADF(y, lags, trend, max_lags, method) function:

(Reference: 《量化投资以Python为工具》, p. 338)

  • The lags parameter determines the test model; its value is the number of lagged first differences included.
  • Compare the Test Statistic in the output against the Critical Values to see which region it falls in; if it is below the critical value at the chosen significance level (1% or 5%), reject the null hypothesis. (Alternatively, just look at the p-value: if it is small enough, reject the null hypothesis.)
# Daily closing prices of 600015 (Hua Xia Bank) and 601169 (Bank of Beijing), via JoinQuant's get_price
stock_list = ['600015.XSHG', '601169.XSHG']
sh = get_price(stock_list, start_date="2019-01-01", end_date="2020-04-30", frequency="daily", fields=['close'])['close']
PAf = sh['600015.XSHG']
PBf = sh['601169.XSHG']
# Work with log prices
PAflog = np.log(PAf)
PBflog = np.log(PBf)
from arch.unitroot import ADF
# ADF test on the log-price series of 600015
adfA = ADF(PAflog)
print(adfA.summary().as_text())
   Augmented Dickey-Fuller Results   
=====================================
Test Statistic                 -0.972
P-value                         0.763
Lags                                0
-------------------------------------
Trend: Constant
Critical Values: -3.45 (1%), -2.87 (5%), -2.57 (10%)
Null Hypothesis: The process contains a unit root.
Alternative Hypothesis: The process is weakly stationary.

From these results, the Test Statistic = -0.972 is greater than the critical values at the 1%, 5% and 10% levels, so we cannot reject the null hypothesis; the log-price series of 600015 (Hua Xia Bank) is therefore non-stationary. Likewise, the log-price series of 601169 (Bank of Beijing) is also non-stationary (the original text omits this check; a sketch of it follows).
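
The omitted check simply repeats the same test on the other log-price series; a minimal sketch:

# ADF test on the log-price series of 601169 (Bank of Beijing);
# we expect not to reject the unit-root null here either
adfB = ADF(PBflog)
print(adfB.summary().as_text())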

Second, verify that the simple single-period return series of stock X and stock Y are stationary.

# First difference of the log prices = simple single-period returns of 600015
retA = PAflog.diff()[1:]
adfretA = ADF(retA)
print(adfretA.summary().as_text())
 Augmented Dickey-Fuller Results   
=====================================
Test Statistic                -17.403
P-value                         0.000
Lags                                0
-------------------------------------
Trend: Constant
Critical Values: -3.45 (1%), -2.87 (5%), -2.57 (10%)
Null Hypothesis: The process contains a unit root.
Alternative Hypothesis: The process is weakly stationary.

From these results, the first-difference series of the log prices of 600015 (Hua Xia Bank), i.e. its simple single-period returns, is stationary. Likewise, the simple single-period return series of 601169 (Bank of Beijing) is also stationary (again the check is omitted in the text; a sketch follows).
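
Again, the omitted check mirrors the one above; a minimal sketch:

# ADF test on the return series of 601169; we expect to reject the unit-root null
retB = PBflog.diff()[1:]
adfretB = ADF(retB)
print(adfretB.summary().as_text())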

Finally, verify that the residual series of the regression constructed from stock X and stock Y is stationary.

import statsmodels.api as sm
# Regress the log price of 601169 on the log price of 600015 (with an intercept)
model = sm.OLS(PBflog, sm.add_constant(PAflog))
results = model.fit()
print(results.summary())
# regression intercept
alpha = results.params[0]
# regression slope coefficient
beta = results.params[1]
# residual (spread) series
spread = PBflog - beta*PAflog - alpha
# plot the residual series over time
spread.plot()
plt.title('Spread series')
# stationarity (ADF) test on the residuals
adfSpread = ADF(spread, trend='nc')  # the residuals have zero mean, so trend is set to 'nc' (no constant)
print(adfSpread.summary().as_text())
Augmented Dickey-Fuller Results   
=====================================
Test Statistic                 -3.555
P-value                         0.000
Lags                                3
-------------------------------------
Trend: No Trend
Critical Values: -2.57 (1%), -1.94 (5%), -1.62 (10%)
Null Hypothesis: The process contains a unit root.
Alternative Hypothesis: The process is weakly stationary.

From these results, the residual series is stationary.

At this point we can say that stock X and stock Y are cointegrated.

Method 2: There is a convenient test that lives in statsmodels.tsa.stattools. We should see a very low p-value, as we've artificially created two series that are as cointegrated as physically possible.
# compute the p-value of the cointegration test
# will inform us as to whether the ratio between the 2 timeseries is stationary
# around its mean
score, pvalue, _ = coint(X,Y)

Just look at the p-value: if it is small enough, we can conclude that stock X and stock Y are cointegrated. (Method 2 is clearly much simpler than Method 1, but Method 1 is more helpful for understanding the definition of cointegration.) A sketch of this decision rule applied to the two bank stocks follows.
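
As a small sketch, the same convenient test can be applied to the two bank-stock log-price series from Method 1 (the 0.05 threshold is just the conventional significance level, not anything prescribed by the library):

# coint() tests the null hypothesis of no cointegration
score, pvalue, _ = coint(PAflog, PBflog)
print('p-value:', pvalue)
if pvalue < 0.05:
    print('reject the null of no cointegration -> treat the pair as cointegrated')
else:
    print('cannot reject the null -> no evidence of cointegration')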

4. How to make a pairs trade?

Because two cointegrated time series (such as 600015.XSHG and 601169.XSHG above) drift towards and apart from each other, there will be times when the spread is high and times when the spread is low. We make a pairs trade by buying one security and selling another. This way, if both securities go down together or go up together, we neither make nor lose money — we are market neutral.

Going back to X and Y above that follow Y = ⍺ X + e, such that the ratio Y/X moves around its mean value ⍺, we make money when the ratio reverts to the mean. To do this we watch for times when X and Y are far apart, i.e. the ratio is too high or too low:

  • Going Long the Ratio This is when the ratio is smaller than usual and we expect it to increase. In the above example, we place a bet on this by buying Y and selling X.
  • Going Short the Ratio This is when the ratio is larger than usual and we expect it to decrease. In the above example, we place a bet on this by selling Y and buying X.

Note that we always have a “hedged position”: a short position makes money if the security sold loses value, and a long position will make money if a security gains value, so we’re immune to overall market movement. We only make or lose money if securities X and Y move relative to each other.
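
A tiny numeric sketch of this point, with invented numbers and (unlike the backtest later in the article) dollar-neutral sizing:

# Hypothetical position: long 100 yuan of notional in Y, short 100 yuan of notional in X
long_notional, short_notional = 100.0, 100.0
# Case 1: the whole market rises 2%; both legs move together and the P&L nets to zero
print('market move only:', long_notional*0.02 - short_notional*0.02)
# Case 2: the spread converges; Y (the underperformer) gains 3% while X loses 1%
print('spread convergence:', long_notional*0.03 - short_notional*(-0.01))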

5. Using Data to find securities that behave like this

The best way to do this is to start with securities you suspect may be cointegrated and perform a statistical test. If you just run statistical tests over all pairs, you’ll fall prey to multiple comparison bias.

Multiple comparisons bias is simply the increased chance of incorrectly generating a significant p-value when many tests are run. If 100 tests are run on random data, we should expect to see about 5 p-values below 0.05. If you compare n securities for cointegration, you will perform n(n-1)/2 comparisons, and you should expect to see many spuriously significant p-values, a number that grows as n grows. To avoid this, pick a small number of pairs you have reason to suspect might be cointegrated and test each individually. This results in less exposure to multiple comparisons bias.
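
A quick simulation illustrates the effect. The series below are independent random walks, so none of the pairs is truly cointegrated and every "hit" is a false positive (the exact count depends on the random seed):

# Run coint() on every pair of 20 independent random walks and count p-values < 0.05
np.random.seed(1)
fake = pd.DataFrame(np.cumsum(np.random.normal(0, 1, (250, 20)), axis=0))
false_hits = 0
for i in range(20):
    for j in range(i + 1, 20):
        _, p, _ = coint(fake[i], fake[j])
        if p < 0.05:
            false_hits += 1
print(false_hits, 'of', 20*19//2, 'pairs look "cointegrated" by chance')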

So let's try to find some securities that display cointegration. Let's work with a basket of China's bank stocks, such as "600000.XSHG", "600015.XSHG", "600016.XSHG", "600036.XSHG", "601009.XSHG", etc. These stocks operate in a similar segment and could have cointegrated prices. The function below scans through a list of securities and tests for cointegration between all pairs. It returns a cointegration test score matrix, a p-value matrix, and all pairs for which the p-value was less than 0.05. This method is prone to multiple comparisons bias, and in practice the securities should be subject to a second verification step. Let's ignore this for the sake of this example.

def find_cointegrated_pairs(data):
    # number of securities (columns) in data
    n = data.shape[1]
    # initialize the score and p-value matrices
    score_matrix = np.zeros((n, n))
    pvalue_matrix = np.ones((n, n))
    # extract the column names (security codes)
    keys = data.keys()
    # pairs whose cointegration p-value is below 0.05
    pairs = []
    for i in range(n):
        for j in range(i+1, n):
            S1 = data[keys[i]]
            S2 = data[keys[j]]
            result = coint(S1, S2)
            score = result[0]
            pvalue = result[1]
            score_matrix[i, j] = score
            pvalue_matrix[i, j] = pvalue
            if pvalue < 0.05:
                pairs.append((keys[i], keys[j]))
    return score_matrix, pvalue_matrix, pairs
stock_list = ["600000.XSHG", "600015.XSHG", "600016.XSHG", "600036.XSHG", "601009.XSHG","601166.XSHG", "601169.XSHG", "601328.XSHG", "601398.XSHG", "601988.XSHG", "601998.XSHG"]
data = get_price(stock_list, start_date="2019-01-01", end_date="2020-04-30", frequency="daily", fields=["close"])["close"]
# Heatmap to show the p-values of the cointegration test
# between each pair of stocks
scores, pvalues, pairs = find_cointegrated_pairs(data)
import seaborn
m = [0,0.2,0.4,0.6,0.8,1]
seaborn.heatmap(pvalues, 
                xticklabels=stock_list, 
                yticklabels=stock_list, 
                cmap='RdYlGn_r' , 
                mask = (pvalues >= 0.98))
plt.show()
print(pairs)
[('600015.XSHG', '601169.XSHG'), ('600036.XSHG', '601328.XSHG'), ('601166.XSHG', '601398.XSHG')]

Looks like '600015.XSHG' and '601169.XSHG' are cointegrated. Let's take a look at the prices to make sure there's nothing weird going on.

S1 = data['600015.XSHG']
S2 = data['601169.XSHG']
score, pvalue, _ = coint(S1, S2)
print(pvalue)
ratios = S1 / S2
ratios.plot(figsize=(15,7))
plt.axhline(ratios.mean())
plt.legend(['Price Ratio'])
plt.show()


The ratio does look like it moved around a stable mean. The absolute ratio isn't very useful in statistical terms. It is more helpful to normalize our signal by treating it as a z-score. The z-score is defined as:

Z Score (Value) = (Value - Mean) / Standard Deviation

WARNING
In practice this is usually done to give some scale to the data, but it assumes an underlying distribution, usually normal. However, much financial data is not normally distributed, and we must be very careful not to simply assume normality, or any specific distribution, when generating statistics. The true distribution of ratios could be very fat-tailed and prone to extreme values that mess up our model and result in large losses.
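
As a rough sanity check along these lines (a sketch, not a substitute for proper distributional analysis), we can look at the skewness and excess kurtosis of the ratio series and run a normality test:

from scipy import stats
# Large excess kurtosis suggests fatter tails than a normal distribution,
# in which case fixed z-score thresholds deserve extra caution
print('skewness:', stats.skew(ratios))
print('excess kurtosis:', stats.kurtosis(ratios))
print('normality test p-value:', stats.normaltest(ratios)[1])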

def zscore(series):
    return (series - series.mean()) / np.std(series)
zscore(ratios).plot(figsize=(15,7))
plt.axhline(zscore(ratios).mean(), color='black')
plt.axhline(1.0, color='red', linestyle='--')
plt.axhline(-1.0, color='green', linestyle='--')
plt.legend(['Ratio z-score', 'Mean', '+1', '-1'])
plt.show()

6. Simple Strategy:

  • Go "Long" the ratio whenever the z-score is below -1.0
  • Go "Short" the ratio when the z-score is above 1.0
  • Exit positions when the z-score approaches zero

This is just the tip of the iceberg, and only a very simplistic example to illustrate the concepts.

  • In practice you would want to compute a more optimal weighting for how many shares to hold for S1 and S2 (see the sketch after this list).
  • You would also want to trade using constantly updating statistics.
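
On the first point, one simple scheme (a sketch of just one possible weighting, not the "1 ratio" sizing used by the backtest below; the notional amount is hypothetical) is to make the two legs dollar-neutral:

# Hypothetical dollar-neutral sizing: equal notional on each leg,
# so a common market move affects both legs by the same amount
capital_per_leg = 100000.0
p1 = data['600015.XSHG'].iloc[-1]
p2 = data['601169.XSHG'].iloc[-1]
shares_s1 = int(capital_per_leg / p1)
shares_s2 = int(capital_per_leg / p2)
# Going long the ratio: buy shares_s1 of 600015 and sell shares_s2 of 601169
print(shares_s1, 'shares of 600015 against', shares_s2, 'shares of 601169')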

First, split the data into a 70% training set and a 30% test set.

ratios = data['600015.XSHG'] / data['601169.XSHG']
print(len(ratios))
le = int(len(ratios)*7/10)
train = ratios[:le]
test = ratios[le:]

In general taking a statistic over your whole sample size can be bad. For example, if the market is moving up, and both securities with it, then your average price over the last 3 years may not be representative of today. For this reason traders often use statistics that rely on rolling windows of the most recent data.

Instead of using the raw ratio values, let's use the 5-day moving average of the ratio to compute the z-score, with the 60-day moving average and 60-day standard deviation as the mean and standard deviation.

ratios_mavg5 = train.rolling(window=5,center=False).mean()
ratios_mavg60 = train.rolling(window=60,center=False).mean()
std_60 = train.rolling(window=60,center=False).std()
zscore_60_5 = (ratios_mavg5 - ratios_mavg60)/std_60
plt.figure(figsize=(15,7))
plt.plot(train.index, train.values)
plt.plot(ratios_mavg5.index, ratios_mavg5.values)
plt.plot(ratios_mavg60.index, ratios_mavg60.values)
plt.legend(['Ratio','5d Ratio MA', '60d Ratio MA'])
plt.ylabel('Ratio')
plt.show()

We can use the moving averages to compute the z-score of the ratio at each given time. This will tell us how extreme the ratio is and whether it's a good idea to enter a position at this time. Let's take a look at the z-score now.

# Take a rolling 60 day standard deviation
std_60 = train.rolling(window=60,center=False).std()
std_60.name = 'std 60d'
# Compute the z score for each day
zscore_60_5 = (ratios_mavg5 - ratios_mavg60)/std_60
zscore_60_5.name = 'z-score'
plt.figure(figsize=(15,7))
zscore_60_5.plot()
plt.axhline(0, color='black')
plt.axhline(1.0, color='red', linestyle='--')
plt.axhline(-1.0, color='green', linestyle='--')
plt.legend(['Rolling Ratio z-Score', 'Mean', '+1', '-1'])
plt.show()

The z-score doesn't mean much out of context; let's plot it next to the prices to get an idea of what it looks like.

# Plot the ratios and buy and sell signals from z score
plt.figure(figsize=(15,7))
train[60:].plot()
buy = train.copy()
sell = train.copy()
buy[zscore_60_5>-1] = 0
sell[zscore_60_5<1] = 0
buy[60:].plot(color='g', linestyle='None', marker='^')
sell[60:].plot(color='r', linestyle='None', marker='^')
x1,x2,y1,y2 = plt.axis()
plt.axis((x1,x2,ratios.min(),ratios.max()))
plt.legend(['Ratio', 'Buy Signal', 'Sell Signal'])
plt.show()

What does that mean for actual stocks that we are trading? Let’s take a look.

# Plot the prices and buy and sell signals from z score
plt.figure(figsize=(18,9))
S1 = data['600015.XSHG'].iloc[:le]
S2 = data['601169.XSHG'].iloc[:le]
S1[60:].plot(color='b')
S2[60:].plot(color='c')
buyR = 0*S1.copy()
sellR = 0*S1.copy()
# When buying the ratio, buy S1 and sell S2
buyR[buy!=0] = S1[buy!=0]
sellR[buy!=0] = S2[buy!=0]
# When selling the ratio, sell S1 and buy S2 
buyR[sell!=0] = S2[sell!=0]
sellR[sell!=0] = S1[sell!=0]
buyR[60:].plot(color='g', linestyle='None', marker='^')
sellR[60:].plot(color='r', linestyle='None', marker='^')
x1,x2,y1,y2 = plt.axis()
plt.axis((x1,x2,min(S1.min(),S2.min())*0.95,max(S1.max(),S2.max())*1.05))
plt.legend(['600015 (Hua Xia Bank)', '601169 (Bank of Beijing)', 'Buy Signal', 'Sell Signal'])
plt.show()

Notice how we sometimes make money on the short leg and sometimes on the long leg, and sometimes both.

Let's see what kind of profits this signal can generate. We write a simple backtester which buys 1 ratio (buy 1 share of 600015.XSHG and sell ratio x shares of 601169.XSHG) when the ratio is low, sells 1 ratio (sell 1 share of 600015.XSHG and buy ratio x shares of 601169.XSHG) when it is high, and calculates the PnL of these trades.

def trade(S1, S2, window1, window2):
    # If window length is 0, algorithm doesn't make sense, so exit
    if (window1 == 0) or (window2 == 0):
        return 0
    # Compute rolling mean and rolling standard deviation
    ratios = S1/S2
    ma1 = ratios.rolling(window=window1,center=False).mean()
    ma2 = ratios.rolling(window=window2,center=False).mean()
    std = ratios.rolling(window=window2,center=False).std()
    zscore = (ma1 - ma2)/std
    # Simulate trading
    # Start with no money and no positions
    money = 0
    countS1 = 0
    countS2 = 0
    for i in range(len(ratios)):
        # Sell short if the z-score is > 1
        if zscore[i] > 1:
            money += S1[i] - S2[i] * ratios[i]
            countS1 -= 1
            countS2 += ratios[i]
        # Buy long if the z-score is < -1
        elif zscore[i] < -1:
            money -= S1[i] - S2[i] * ratios[i]
            countS1 += 1
            countS2 -= ratios[i]
        # Clear positions if the z-score between -.5 and .5
        elif abs(zscore[i]) < 0.5:
            money += countS1*S1[i] + S2[i] * countS2
            countS1 = 0
            countS2 = 0
#         print('Z-score: '+ str(zscore[i]), countS1, countS2, S1[i] , S2[i])
    return money
trade(data['600015.XSHG'].iloc[:le], data['601169.XSHG'].iloc[:le], 5, 60)

The strategy seems profitable! (result = 12.716) Now we can optimize further by changing our moving-average windows, by changing the thresholds for buy/sell and exit positions, etc., and check for performance improvements on validation data. We could also try more sophisticated models such as Logistic Regression or SVM to make our +1/-1 predictions (a rough sketch of this idea follows the test-data check below).

Let's see how it does on the test data (result = 1.487):

trade(data['600015.XSHG'].iloc[le:], data['601169.XSHG'].iloc[le:], 5, 60)
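
As a rough illustration of the "more sophisticated models" idea mentioned above (the z-score feature, the 5-day-ahead label and the use of scikit-learn's LogisticRegression are all assumptions for illustration, not part of the original strategy):

from sklearn.linear_model import LogisticRegression
# Feature: the rolling z-score on the training period
# Label: whether the ratio rises over the next 5 days
feats = zscore_60_5.dropna()
future_ret = (train.shift(-5) / train - 1).reindex(feats.index)
valid = future_ret.notna()
X_feat = feats[valid].values.reshape(-1, 1)
y_label = (future_ret[valid] > 0).astype(int).values
clf = LogisticRegression()
clf.fit(X_feat, y_label)
print('in-sample accuracy:', clf.score(X_feat, y_label))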

7. Avoid Overfitting

Overfitting is the most dangerous pitfall of a trading strategy. In our model we used rolling parameter estimates, and we may wish to optimize the window length. We could simply iterate over all reasonable window lengths and pick the one for which our model performs best. Below we write a simple loop to score window lengths based on the PnL of the training data and find the best one; the result is 58.

# Find the window length 0-254 
# that gives the highest returns using this strategy
length_scores = [trade(data['600015.XSHG'].iloc[:le], 
                data['601169.XSHG'].iloc[:le], 5, l) 
                for l in range(255)]
best_length = np.argmax(length_scores)
print ('Best window length:', best_length)

Now we check the performance of our model on the test data, and we find that this window length is not the optimal one there! This is because our original choice was clearly overfitted to the in-sample data.

# Find the returns for test data
# using what we think is the best window length
length_scores2 = [trade(data['600015.XSHG'].iloc[le:], 
                  data['601169.XSHG'].iloc[le:],5, l) 
                  for l in range(255)]
# evaluate length_scores2 at the best_length found on the training data
print (best_length, 'day window:', length_scores2[best_length])
# Find the best window length based on this dataset, 
# and the returns using this window length
best_length2 = np.argmax(length_scores2)
print (best_length2, 'day window:', length_scores2[best_length2])

We can see that the first result above uses the window of 58 found on the training data, giving length_scores2 = 1.487, whereas the actual best window on the test data is best_length2 = 52, with a corresponding length_scores2 = 2.09.

We can see this if we also plot the PnL by window length separately for the training and test data.

plt.figure(figsize=(15,7))
plt.plot(length_scores)
plt.plot(length_scores2)
plt.xlabel('Window length')
plt.legend(['Training', 'Test'])
plt.show()