[FinRL] CH4. Example Code (2)
Contents
- Installation
- Framework Overview
- Main Component
- Dataset
- Train
- Backtest
- Examples
CH4. Examples - FinRL_PortfolioAllocation_NeurIPS_2020
Introduction
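- This post walks through and runs the FinRL_PortfolioAllocation_NeurIPS_2020 example.
- PortfolioAllocation is the StockTrading experiment with the following two inputs added:
  - return_lookback: close-price returns of each individual stock over the past year (adds a temporal relationship)
  - covs: covariance of close-price returns between different stocks (adds a spatial, cross-stock relationship)
- The code was modified slightly to fix a module error and to bring the data up to date:
  - Change 1: pinned pandas to version 1.5.3 because of a pandas module error
  - Change 2: extended the training/test period from 2008/01/01~2021/10/31 to 2008/01/01~2023/12/31
  - Change 3: merged the final result data so the end results are easier to compare and analyze
- FinRL always shows a higher return than the baseline (DJI), while the deviation becomes similar (and returns are also lower)
  - Baseline (DJI) annual return / Sharpe ratio: 13.64% / 1.19
  - FinRL-A2C annual return / Sharpe ratio: 17.01% / 1.39
  - FinRL-DDPG annual return / Sharpe ratio: 16.33% / 1.43
  - FinRL-PPO annual return / Sharpe ratio: 16.01% / 1.35
  - FinRL-SAC annual return / Sharpe ratio: 14.24% / 1.20
  - FinRL-TD3 annual return / Sharpe ratio: 15.76% / 1.34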
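For readers who want to see what the two reported figures measure, here is a minimal sketch of an annualized return and a Sharpe ratio computed from daily returns. It assumes 252 trading days per year and a zero risk-free rate, and the function names are illustrative only; it is not the pyfolio code that produced the numbers above.

```python
import math
import statistics

TRADING_DAYS = 252  # assumption: trading days per year


def annual_return(daily_returns):
    """Compound the daily returns, then scale the growth to one year."""
    growth = math.prod(1.0 + r for r in daily_returns)
    return growth ** (TRADING_DAYS / len(daily_returns)) - 1.0


def sharpe_ratio(daily_returns, risk_free_daily=0.0):
    """Annualized mean excess return divided by its sample standard deviation."""
    excess = [r - risk_free_daily for r in daily_returns]
    return statistics.mean(excess) / statistics.stdev(excess) * math.sqrt(TRADING_DAYS)


# Toy data, not results from the experiment
toy = [0.01, -0.005, 0.002, 0.007, -0.003]
print(f"annual return: {annual_return(toy):.2%}")
print(f"Sharpe ratio:  {sharpe_ratio(toy):.2f}")
```

A higher Sharpe ratio means more return per unit of day-to-day volatility, which is why DDPG can rank first on Sharpe ratio even though A2C has the higher annual return.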
GitHub - AI4Finance-Foundation/FinRL-Tutorials (github.com)
FinRL: A Deep Reinforcement Learning Library for Automated Stock Trading in Quantitative Finance (arxiv.org)
FinRL_PortfolioAllocation_NeurIPS_2020
1.1. Import Packages
import os
import sys
import pandas as pd
import numpy as np
import itertools
import warnings
warnings.simplefilter(action="ignore", category=FutureWarning)
from finrl import config
from finrl import config_tickers
from finrl.meta.env_portfolio_allocation.env_portfolio import StockPortfolioEnv
from finrl.meta.env_stock_trading.env_stocktrading import StockTradingEnv
from finrl.agents.stablebaselines3.models import DRLAgent
from finrl.plot import backtest_stats, backtest_plot, get_daily_return, get_baseline,convert_daily_return_to_pyfolio_ts
from pypfopt.efficient_frontier import EfficientFrontier
import pyfolio
import plotly.graph_objs as go
from pyfolio import timeseries
from stable_baselines3 import A2C, DDPG, PPO, SAC, TD3
from finrl.meta.preprocessor.yahoodownloader import YahooDownloader
from finrl.meta.preprocessor.preprocessors import FeatureEngineer, data_split
1.2. Create Folders
# create the output folders if they do not exist yet
for directory in (
    config.DATA_SAVE_DIR,
    config.TRAINED_MODEL_DIR,
    config.TENSORBOARD_LOG_DIR,
    config.RESULTS_DIR,
):
    os.makedirs("./" + directory, exist_ok=True)
2. Download Data
print(f"DOW_30_TICKER: {config_tickers.DOW_30_TICKER}")
TRAIN_START_DATE = '2008-01-01'
TRAIN_END_DATE = '2022-12-31'
TRADE_START_DATE = '2023-01-01'
TRADE_END_DATE = '2023-12-31'
df_raw = YahooDownloader(start_date = TRAIN_START_DATE, end_date = TRADE_END_DATE, ticker_list = config_tickers.DOW_30_TICKER).fetch_data()
fe = FeatureEngineer(
    use_technical_indicator=True,
    tech_indicator_list=config.INDICATORS,
    use_vix=False,
    use_turbulence=False,
    user_defined_feature=False,
)
processed = fe.preprocess_data(df_raw)
list_ticker = processed["tic"].unique().tolist()
list_date = list(pd.date_range(processed['date'].min(),processed['date'].max()).astype(str))
combination = list(itertools.product(list_date,list_ticker))
processed_full = pd.DataFrame(combination,columns=["date","tic"]).merge(processed,on=["date","tic"],how="left")
processed_full = processed_full[processed_full['date'].isin(processed['date'])]
processed_full = processed_full.sort_values(['date','tic'])
processed_full = processed_full.fillna(0)
processed_full.to_csv("./datasets/dataset_indicators.csv", index=False)
print("Shape of DataFrame: ", processed_full.shape)
print(processed_full.head())
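The grid-filling step above (every date paired with every ticker, then a left merge) can be easier to follow on a toy table. The sketch below uses made-up tickers and prices, and builds the grid only from dates that already appear in the data, so it skips the calendar-range and `isin` filter used in the post.

```python
import itertools

import pandas as pd

# Toy data: ticker "BBB" has no row for 2023-01-04
prices = pd.DataFrame(
    {
        "date": ["2023-01-03", "2023-01-03", "2023-01-04"],
        "tic": ["AAA", "BBB", "AAA"],
        "close": [10.0, 20.0, 11.0],
    }
)

dates = sorted(prices["date"].unique())
tickers = sorted(prices["tic"].unique())

# Every (date, ticker) pair, whether or not a row exists for it
grid = pd.DataFrame(list(itertools.product(dates, tickers)), columns=["date", "tic"])

# The left merge keeps all pairs; pairs without data get NaN, replaced by 0 here
full = grid.merge(prices, on=["date", "tic"], how="left").fillna(0)
print(full)
```

After this step every date has the same number of rows, which the environment relies on. Note that a filled-in 0 marks a missing row rather than a real price.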
3. Preprocess Data ★Important: add the spatial relationship (covariance) and temporal relationship (lookback return) data
# add covariance matrix as states
df = pd.read_csv("./datasets/dataset_indicators.csv")
df=df.sort_values(['date','tic'], ignore_index=True)
df.index = df.date.factorize()[0]
# look back is one year (252 trading days)
lookback = 252
cov_list = []
return_list = []
for i in range(lookback, len(df.index.unique())):
    # .loc slices by label, so the rows for dates i-lookback through i are all included
    data_lookback = df.loc[i - lookback:i, :]
    price_lookback = data_lookback.pivot_table(index="date", columns="tic", values="close")
    # daily returns inside the window (temporal relationship)
    return_lookback = price_lookback.pct_change().dropna()
    return_list.append(return_lookback)
    # covariance of returns across tickers (spatial relationship)
    covs = return_lookback.cov().values
    cov_list.append(covs)
df_cov = pd.DataFrame({'date':df.date.unique()[lookback:],'cov_list':cov_list,'return_list':return_list})
df = df.merge(df_cov, on='date')
df = df.sort_values(['date','tic']).reset_index(drop=True)
print("Shape of DataFrame: ", df.shape)
print(df.head())
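To check the shapes produced by the lookback loop, the same steps can be run on a tiny table. This is a sketch with invented tickers and prices and `lookback = 3` instead of 252; it only illustrates that the label-based `.loc` slice includes both end dates, so a window holds `lookback + 1` price rows and `lookback` return rows.

```python
import pandas as pd

lookback = 3  # the post uses 252 (about one trading year)

# Toy long-format prices: 5 dates x 2 tickers
dates = ["d0", "d1", "d2", "d3", "d4"]
df_toy = pd.DataFrame(
    {
        "date": [d for d in dates for _ in range(2)],
        "tic": ["AAA", "BBB"] * len(dates),
        "close": [10.0, 20.0, 11.0, 19.0, 12.0, 21.0, 11.0, 22.0, 13.0, 23.0],
    }
)
df_toy.index = df_toy["date"].factorize()[0]  # same integer label for all rows of a date

i = lookback
window = df_toy.loc[i - lookback:i, :]  # label slice: both ends are included
prices = window.pivot_table(index="date", columns="tic", values="close")
returns = prices.pct_change().dropna()  # one row fewer than prices
cov = returns.cov().values  # square matrix, one row and column per ticker

print(prices.shape, returns.shape, cov.shape)  # (4, 2) (3, 2) (2, 2)
```

With 252 as the lookback and the Dow 30 tickers, each date therefore carries one returns table and one ticker-by-ticker covariance matrix.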
4. Design Environment
# Environment for Portfolio Allocation
df_train = data_split(df, "2008-01-01", "2022-12-31")
stock_dimension = len(df_train.tic.unique())
state_space = stock_dimension
print(f"Stock Dimension: {stock_dimension}, State Space: {state_space}")
env_kwargs = {
    "hmax": 100,
    "initial_amount": 1000000,
    "transaction_cost_pct": 0.001,
    "state_space": state_space,
    "stock_dim": stock_dimension,
    "tech_indicator_list": config.INDICATORS,
    "action_space": stock_dimension,
    "reward_scaling": 1e-4,
}
e_train_gym = StockPortfolioEnv(df=df_train, **env_kwargs)
env_train, _ = e_train_gym.get_sb_env()
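As far as I understand the portfolio environment, the agent's raw action vector (one value per stock) is turned into portfolio weights with a softmax, and the step return is the weighted sum of each stock's one-day return. The sketch below illustrates that idea in plain Python; the function names are mine and this is not the StockPortfolioEnv source.

```python
import math


def softmax_weights(actions):
    """Map raw action values to positive weights that sum to 1."""
    top = max(actions)
    exps = [math.exp(a - top) for a in actions]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def portfolio_return(weights, prev_close, close):
    """Weighted sum of each stock's one-day simple return."""
    return sum(w * (c / p - 1.0) for w, p, c in zip(weights, prev_close, close))


# Equal raw actions give equal weights
weights = softmax_weights([0.0, 0.0, 0.0])
step_return = portfolio_return(weights, [100.0, 50.0, 20.0], [101.0, 50.0, 19.0])
print(weights, step_return)
```

Because the weights always sum to 1 and are never negative, this kind of setup is fully invested and long-only; the agent only decides how to split the capital.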
5. Implement DRL Algorithms
# Model 1: A2C
agent = DRLAgent(env=env_train)
A2C_PARAMS = {"n_steps": 5, "ent_coef": 0.005, "learning_rate": 0.0002}
model_a2c = agent.get_model(model_name="a2c", model_kwargs=A2C_PARAMS)
trained_a2c = agent.train_model(model=model_a2c, tb_log_name="a2c", total_timesteps=50000)
trained_a2c.save("./trained_models/trained_a2c.zip")
# Model 2: PPO
agent = DRLAgent(env=env_train)
PPO_PARAMS = {
    "n_steps": 2048,
    "ent_coef": 0.005,
    "learning_rate": 0.0001,
    "batch_size": 128,
}
model_ppo = agent.get_model("ppo", model_kwargs=PPO_PARAMS)
trained_ppo = agent.train_model(model=model_ppo, tb_log_name="ppo", total_timesteps=80000)
trained_ppo.save("./trained_models/trained_ppo.zip")
# Model 3: DDPG
agent = DRLAgent(env=env_train)
DDPG_PARAMS = {"batch_size": 128, "buffer_size": 50000, "learning_rate": 0.001}
model_ddpg = agent.get_model("ddpg", model_kwargs=DDPG_PARAMS)
trained_ddpg = agent.train_model(model=model_ddpg, tb_log_name="ddpg", total_timesteps=50000)
trained_ddpg.save("./trained_models/trained_ddpg.zip")
# Model 4: SAC
agent = DRLAgent(env=env_train)
SAC_PARAMS = {
    "batch_size": 128,
    "buffer_size": 100000,
    "learning_rate": 0.0003,
    "learning_starts": 100,
    "ent_coef": "auto_0.1",
}
model_sac = agent.get_model("sac", model_kwargs=SAC_PARAMS)
trained_sac = agent.train_model(model=model_sac, tb_log_name="sac", total_timesteps=50000)
trained_sac.save("./trained_models/trained_sac.zip")
# Model 5: TD3
agent = DRLAgent(env=env_train)
TD3_PARAMS = {"batch_size": 100, "buffer_size": 1000000, "learning_rate": 0.001}
model_td3 = agent.get_model("td3", model_kwargs=TD3_PARAMS)
trained_td3 = agent.train_model(model=model_td3, tb_log_name="td3", total_timesteps=30000)
trained_td3.save("./trained_models/trained_td3.zip")
6. Test
df_test = data_split(df, "2023-01-01", "2023-12-31")
print("Shape of Trade DataFrame: ", df_test.shape)
env_portfolio_kwargs = {
    "hmax": 100,
    "initial_amount": 1000000,
    "transaction_cost_pct": 0.001,
    "state_space": state_space,
    "stock_dim": stock_dimension,
    "tech_indicator_list": config.INDICATORS,
    "action_space": stock_dimension,
    "reward_scaling": 1e-4,
}
e_trade_gym = StockPortfolioEnv(df=df_test, **env_portfolio_kwargs)
a2c_agent = A2C.load(config.TRAINED_MODEL_DIR + "/trained_a2c")
ddpg_agent = DDPG.load(config.TRAINED_MODEL_DIR + "/trained_ddpg")
ppo_agent = PPO.load(config.TRAINED_MODEL_DIR + "/trained_ppo")
sac_agent = SAC.load(config.TRAINED_MODEL_DIR + "/trained_sac")
td3_agent = TD3.load(config.TRAINED_MODEL_DIR + "/trained_td3")
a2c_daily_return, a2c_actions = DRLAgent.DRL_prediction(model=a2c_agent, environment=e_trade_gym)
ddpg_daily_return, ddpg_actions = DRLAgent.DRL_prediction(model=ddpg_agent, environment=e_trade_gym)
ppo_daily_return, ppo_actions = DRLAgent.DRL_prediction(model=ppo_agent, environment=e_trade_gym)
sac_daily_return, sac_actions = DRLAgent.DRL_prediction(model=sac_agent, environment=e_trade_gym)
td3_daily_return, td3_actions = DRLAgent.DRL_prediction(model=td3_agent, environment=e_trade_gym)
DJI_df = get_baseline(ticker="^DJI", start=a2c_daily_return.loc[0, "date"], end=a2c_daily_return.loc[len(a2c_daily_return) - 1, "date"])
GSPC_df = get_baseline(ticker="^GSPC", start=a2c_daily_return.loc[0, "date"], end=a2c_daily_return.loc[len(a2c_daily_return) - 1, "date"])
KS11_df = get_baseline(ticker="^KS11", start=a2c_daily_return.loc[0, "date"], end=a2c_daily_return.loc[len(a2c_daily_return) - 1, "date"])
KQ11_df = get_baseline(ticker="^KQ11", start=a2c_daily_return.loc[0, "date"], end=a2c_daily_return.loc[len(a2c_daily_return) - 1, "date"])
DJI_returns = get_daily_return(DJI_df, value_col_name="close")
GSPC_returns = get_daily_return(GSPC_df, value_col_name="close")
KS11_returns = get_daily_return(KS11_df, value_col_name="close")
KQ11_returns = get_daily_return(KQ11_df, value_col_name="close")
a2c_daily_return.to_csv("./results/a2c_daily_return.csv")
ddpg_daily_return.to_csv("./results/ddpg_daily_return.csv")
ppo_daily_return.to_csv("./results/ppo_daily_return.csv")
sac_daily_return.to_csv("./results/sac_daily_return.csv")
td3_daily_return.to_csv("./results/td3_daily_return.csv")
a2c_actions.to_csv("./results/a2c_actions.csv")
ddpg_actions.to_csv("./results/ddpg_actions.csv")
ppo_actions.to_csv("./results/ppo_actions.csv")
sac_actions.to_csv("./results/sac_actions.csv")
td3_actions.to_csv("./results/td3_actions.csv")
7.1. BackTest-Stats
A2C_strat = convert_daily_return_to_pyfolio_ts(a2c_daily_return)
ddpg_strat = convert_daily_return_to_pyfolio_ts(ddpg_daily_return)
ppo_strat = convert_daily_return_to_pyfolio_ts(ppo_daily_return)
sac_strat = convert_daily_return_to_pyfolio_ts(sac_daily_return)
td3_strat = convert_daily_return_to_pyfolio_ts(td3_daily_return)
perf_func = timeseries.perf_stats
A2C_stats = perf_func(returns=A2C_strat, factor_returns=A2C_strat,positions=None, transactions=None, turnover_denom="AGB",)
DDPG_stats = perf_func(returns=ddpg_strat, factor_returns=ddpg_strat,positions=None, transactions=None, turnover_denom="AGB",)
PPO_stats = perf_func(returns=ppo_strat, factor_returns=ppo_strat,positions=None, transactions=None, turnover_denom="AGB",)
SAC_stats = perf_func(returns=sac_strat, factor_returns=sac_strat,positions=None, transactions=None, turnover_denom="AGB",)
TD3_stats = perf_func(returns=td3_strat, factor_returns=td3_strat,positions=None, transactions=None, turnover_denom="AGB",)
DJI_stats = backtest_stats(DJI_df, value_col_name="close")
GSPC_stats = backtest_stats(GSPC_df, value_col_name="close")
KS11_stats = backtest_stats(KS11_df, value_col_name="close")
KQ11_stats = backtest_stats(KQ11_df, value_col_name="close")
print("==============DRL Strategy Stats===========")
DRL_stats_all = pd.DataFrame()
DRL_stats_all = pd.concat([DRL_stats_all, A2C_stats], axis=1)
DRL_stats_all = pd.concat([DRL_stats_all, DDPG_stats], axis=1)
DRL_stats_all = pd.concat([DRL_stats_all, PPO_stats], axis=1)
DRL_stats_all = pd.concat([DRL_stats_all, SAC_stats], axis=1)
DRL_stats_all = pd.concat([DRL_stats_all, TD3_stats], axis=1)
DRL_stats_all.columns = ["A2C", "DDPG", "PPO", "SAC", "TD3"]
print(DRL_stats_all)
# baseline stats
print("==============Get Baseline Stats===========")
Baseline_stats_all = pd.DataFrame()
Baseline_stats_all = pd.concat([Baseline_stats_all, DJI_stats], axis=1)
Baseline_stats_all = pd.concat([Baseline_stats_all, GSPC_stats], axis=1)
Baseline_stats_all = pd.concat([Baseline_stats_all, KS11_stats], axis=1)
Baseline_stats_all = pd.concat([Baseline_stats_all, KQ11_stats], axis=1)
Baseline_stats_all.columns = ["DJI", "S&P500", "KOSPI", "KOSDAQ"]
print(Baseline_stats_all)
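The statistics tables printed above include, among other rows, a maximum drawdown. As a rough illustration of what that row measures, here is a minimal pure-Python sketch; the function name is my own and this is not the pyfolio implementation.

```python
def max_drawdown(daily_returns):
    """Largest peak-to-trough fall of the cumulative value (zero or negative)."""
    value = 1.0
    peak = 1.0
    worst = 0.0
    for r in daily_returns:
        value *= 1.0 + r
        peak = max(peak, value)
        worst = min(worst, value / peak - 1.0)
    return worst


# Toy data: up 10%, down 50%, up 20%
print(max_drawdown([0.10, -0.50, 0.20]))
```

In the toy series the value peaks after the first day and halves on the second, so the worst drawdown is about -50% even though the last day is positive.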
7.2. BackTest-Plot
# Min-Variance Portfolio Allocation
unique_tic = df_test.tic.unique()
unique_trade_date = df_test.date.unique()
# Calculate_portfolio_minimum_variance
portfolio = pd.DataFrame(index=range(1), columns=unique_trade_date)
initial_capital = 1000000
portfolio.loc[0, unique_trade_date[0]] = initial_capital
for i in range(len(unique_trade_date) - 1):
    df_temp = df[df.date == unique_trade_date[i]].reset_index(drop=True)
    df_temp_next = df[df.date == unique_trade_date[i + 1]].reset_index(drop=True)
    # calculate covariance matrix
    Sigma = df_temp.return_list[0].cov()
    # portfolio allocation
    ef_min_var = EfficientFrontier(None, Sigma, weight_bounds=(0, 0.1))
    # minimum variance
    raw_weights_min_var = ef_min_var.min_volatility()
    # get weights
    cleaned_weights_min_var = ef_min_var.clean_weights()
    # current capital
    cap = portfolio.iloc[0, i]
    # current cash invested for each stock
    current_cash = [element * cap for element in list(cleaned_weights_min_var.values())]
    # current held shares
    current_shares = list(np.array(current_cash) / np.array(df_temp.close))
    # next time period price
    next_price = np.array(df_temp_next.close)
    # next_price * current shares gives the next total account value
    portfolio.iloc[0, i + 1] = np.dot(current_shares, next_price)
# transpose so each row is one trade date; do not re-create the DataFrame here,
# or the account values computed in the loop are lost
portfolio = portfolio.T
portfolio.columns = ["account_value"]
print(a2c_daily_return)
a2c_cumpod = (a2c_daily_return.daily_return + 1).cumprod() - 1
ddpg_cumpod = (ddpg_daily_return.daily_return + 1).cumprod() - 1
ppo_cumpod = (ppo_daily_return.daily_return + 1).cumprod() - 1
sac_cumpod = (sac_daily_return.daily_return + 1).cumprod() - 1
td3_cumpod = (td3_daily_return.daily_return + 1).cumprod() - 1
min_var_cumpod = (portfolio.account_value.pct_change() + 1).cumprod() - 1
dji_cumpod = (DJI_returns + 1).cumprod() - 1
GSPC_cumpod = (GSPC_returns + 1).cumprod() - 1
KS11_cumpod = (KS11_returns + 1).cumprod() - 1
KQ11_cumpod = (KQ11_returns + 1).cumprod() - 1
# Plotly: DRL, Min-Variance, DJIA
time_ind = pd.Series(a2c_daily_return.date)
trace0_portfolio = go.Scatter(x=time_ind, y=a2c_cumpod, mode="lines", name="A2C (Portfolio Allocation)")
trace1_portfolio = go.Scatter(x=time_ind, y=ddpg_cumpod, mode="lines", name="DDPG (Portfolio Allocation)")
trace2_portfolio = go.Scatter(x=time_ind, y=ppo_cumpod, mode="lines", name="PPO (Portfolio Allocation)")
trace3_portfolio = go.Scatter(x=time_ind, y=sac_cumpod, mode="lines", name="SAC (Portfolio Allocation)")
trace4_portfolio = go.Scatter(x=time_ind, y=td3_cumpod, mode="lines", name="TD3 (Portfolio Allocation)")
trace5_portfolio = go.Scatter(x=time_ind, y=min_var_cumpod, mode="lines", name="Min-Variance")
trace6_portfolio = go.Scatter(x=time_ind, y=dji_cumpod, mode="lines", name="DJIA")
# trace7_portfolio = go.Scatter(x=time_ind, y=GSPC_cumpod, mode="lines", name="S&P500")
# trace8_portfolio = go.Scatter(x=time_ind, y=KS11_cumpod, mode="lines", name="KOSPI")
# trace9_portfolio = go.Scatter(x=time_ind, y=KQ11_cumpod, mode="lines", name="KOSDAQ")
fig = go.Figure()
fig.add_trace(trace0_portfolio)
fig.add_trace(trace1_portfolio)
fig.add_trace(trace2_portfolio)
fig.add_trace(trace3_portfolio)
fig.add_trace(trace4_portfolio)
fig.add_trace(trace5_portfolio)
fig.add_trace(trace6_portfolio)
# fig.add_trace(trace7_portfolio)
# fig.add_trace(trace8_portfolio)
# fig.add_trace(trace9_portfolio)
fig.update_layout(
    legend=dict(
        x=0,
        y=1,
        traceorder="normal",
        font=dict(family="sans-serif", size=10, color="black"),
        bgcolor="White",
        bordercolor="white",
        borderwidth=2,
    ),
)
fig.update_layout(
    title={
        # 'text': "Cumulative Return using FinRL",
        "y": 0.85,
        "x": 0.5,
        "xanchor": "center",
        "yanchor": "top",
    }
)
# with Transaction cost
fig.update_layout(
    # margin=dict(l=20, r=20, t=20, b=20),
    paper_bgcolor="rgba(1,1,0,0)",
    plot_bgcolor="rgba(1, 1, 0, 0)",
    # xaxis_title="Date",
    yaxis_title="Cumulative Return",
    xaxis={
        "type": "date",
        "tick0": time_ind[0],
        "tickmode": "linear",
        "dtick": 86400000.0 * 80,
    },
)
fig.update_xaxes(
    showline=True,
    linecolor="black",
    showgrid=True,
    gridwidth=1,
    gridcolor="LightSteelBlue",
    mirror=True,
)
fig.update_yaxes(
    showline=True,
    linecolor="black",
    showgrid=True,
    gridwidth=1,
    gridcolor="LightSteelBlue",
    mirror=True,
)
fig.update_yaxes(zeroline=True, zerolinewidth=1, zerolinecolor="LightSteelBlue")
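# write_image needs the kaleido package and an existing output folder
os.makedirs("images", exist_ok=True)
fig.write_image("images/all_PortfolioAllocation.webp")
fig.write_image("images/all_PortfolioAllocation.pdf")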
fig.show()
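Two pieces of arithmetic in this section are worth checking by hand: the one-day bookkeeping of the min-variance baseline (weights to cash to shares to next-day value) and the cumulative return `(r + 1).cumprod() - 1`. The sketch below reproduces both in plain Python with made-up weights and prices; the function names are illustrative and the weights are simply given, not optimized.

```python
import operator
from itertools import accumulate


def rebalance_step(capital, weights, prices, next_prices):
    """Buy shares at today's prices by weight, then value them at tomorrow's prices."""
    shares = [w * capital / p for w, p in zip(weights, prices)]
    return sum(s * q for s, q in zip(shares, next_prices))


def cumulative_returns(daily_returns):
    """Running product of (1 + r) minus 1, like (r + 1).cumprod() - 1 in pandas."""
    growth = accumulate((1.0 + r for r in daily_returns), operator.mul)
    return [g - 1.0 for g in growth]


# Half the capital in each of two stocks; only the first one moves (+10%)
next_value = rebalance_step(1_000_000, [0.5, 0.5], [100.0, 50.0], [110.0, 50.0])
print(next_value)  # 1050000.0

# +10% followed by -10% does not return to zero
print(cumulative_returns([0.10, -0.10]))
```

The second print ends slightly below zero (about -1%), a reminder that cumulative returns compound rather than add.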