[FinRL] CH4. Example Code (2)

SKY-STONE 2024. 1. 14. 20:35

Financial Reinforcement Learning(FinRL) Platform

Contents

  1. Installation
  2. Framework Overview
  3. Main Component
    • Dataset
    • Train
    • Backtest
  4. Examples

 

CH4. Examples - FinRL_PortfolioAllocation_NeurIPS_2020

Introduction
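  • This post summarizes and walks through FinRL_PortfolioAllocation_NeurIPS_2020.
  • PortfolioAllocation extends the StockTrading experiment with two additional inputs:
    • return_lookback: each stock's daily closing-price returns over the past year (adds the relationship over time)
    • covs: the covariance of closing-price returns between stocks (adds the relationship across stocks, i.e. the "spatial" relationship)
  • The code was modified slightly to fix module errors and to bring the data up to date:
    • Change 1: pinned pandas to version 1.5.3 because of a pandas module error
    • Change 2: extended the training/test period from 2008/01/01~2021/10/31 to 2008/01/01~2023/12/31
    • Change 3: merged the final result data so that the End Results are easier to compare and analyze
  • Every FinRL agent earned a higher return than the baseline (DJI) over the 2023 test year, although the deviation between them became similar (and returns came down as well)
    • Baseline (DJI) annualized return / Sharpe ratio: 13.64% / 1.19
    • FinRL-A2C annualized return / Sharpe ratio: 17.01% / 1.39
    • FinRL-DDPG annualized return / Sharpe ratio: 16.33% / 1.43
    • FinRL-PPO annualized return / Sharpe ratio: 16.01% / 1.35
    • FinRL-SAC annualized return / Sharpe ratio: 14.24% / 1.20
    • FinRL-TD3 annualized return / Sharpe ratio: 15.76% / 1.34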
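
For readers who want to see what the two numbers in each bullet measure, here is a minimal pure-Python sketch of an annualized return and a Sharpe ratio computed from daily returns. It assumes 252 trading days per year and a zero risk-free rate; the function names and the toy series are hypothetical, and the figures reported above come from pyfolio, not from this code.

```python
import math

TRADING_DAYS = 252  # assumed number of trading days per year


def annualized_return(daily_returns):
    """Compound the daily returns, then scale the growth to one year."""
    growth = 1.0
    for r in daily_returns:
        growth *= 1.0 + r
    years = len(daily_returns) / TRADING_DAYS
    return growth ** (1.0 / years) - 1.0


def sharpe_ratio(daily_returns, risk_free_daily=0.0):
    """Mean excess return over its sample standard deviation, annualized."""
    excess = [r - risk_free_daily for r in daily_returns]
    mean = sum(excess) / len(excess)
    var = sum((x - mean) ** 2 for x in excess) / (len(excess) - 1)
    return mean / math.sqrt(var) * math.sqrt(TRADING_DAYS)


# Toy series: alternating +1% and -0.5% days over one trading year.
toy = [0.01, -0.005] * 126
print(f"annualized return: {annualized_return(toy):.4f}")
print(f"Sharpe ratio:      {sharpe_ratio(toy):.2f}")
```

A higher Sharpe ratio with a similar return, as with DDPG versus A2C above, means the same kind of return was earned with less day-to-day volatility.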

 

GitHub - AI4Finance-Foundation/FinRL-Tutorials: Tutorials. Please star.

github.com

 

FinRL: A Deep Reinforcement Learning Library for Automated Stock Trading in Quantitative Finance

As deep reinforcement learning (DRL) has been recognized as an effective approach in quantitative finance, getting hands-on experiences is attractive to beginners. However, to train a practical DRL trading agent that decides where to trade, at what price,

arxiv.org


FinRL_PortfolioAllocation_NeurIPS_2020

1.1. Import Packages
import os
import sys
import pandas as pd
import numpy as np
import itertools
import warnings
warnings.simplefilter(action="ignore", category=FutureWarning)

from finrl import config
from finrl import config_tickers
from finrl.meta.env_portfolio_allocation.env_portfolio import StockPortfolioEnv
from finrl.meta.env_stock_trading.env_stocktrading import StockTradingEnv
from finrl.agents.stablebaselines3.models import DRLAgent
from finrl.plot import backtest_stats, backtest_plot, get_daily_return, get_baseline,convert_daily_return_to_pyfolio_ts
from pypfopt.efficient_frontier import EfficientFrontier

import pyfolio
import plotly.graph_objs as go
from pyfolio import timeseries
from stable_baselines3 import A2C, DDPG, PPO, SAC, TD3
from finrl.meta.preprocessor.yahoodownloader import YahooDownloader
from finrl.meta.preprocessor.preprocessors import FeatureEngineer, data_split

 

1.2. Create Folders
# Create the working directories; exist_ok=True replaces the explicit existence checks.
for folder in (config.DATA_SAVE_DIR, config.TRAINED_MODEL_DIR, config.TENSORBOARD_LOG_DIR, config.RESULTS_DIR):
    os.makedirs("./" + folder, exist_ok=True)

 

2. Download Data
print(f"DOW_30_TICKER: {config_tickers.DOW_30_TICKER}")

TRAIN_START_DATE = '2008-01-01'
TRAIN_END_DATE = '2022-12-31'
TRADE_START_DATE = '2023-01-01'
TRADE_END_DATE = '2023-12-31'

df_raw = YahooDownloader(start_date = TRAIN_START_DATE, end_date = TRADE_END_DATE, ticker_list = config_tickers.DOW_30_TICKER).fetch_data()
fe = FeatureEngineer(use_technical_indicator=True,
                     tech_indicator_list = config.INDICATORS,
                     use_vix=False,
                     use_turbulence=False,
                     user_defined_feature = False)

processed = fe.preprocess_data(df_raw)
list_ticker = processed["tic"].unique().tolist()
list_date = list(pd.date_range(processed['date'].min(),processed['date'].max()).astype(str))
combination = list(itertools.product(list_date,list_ticker))

processed_full = pd.DataFrame(combination,columns=["date","tic"]).merge(processed,on=["date","tic"],how="left")
processed_full = processed_full[processed_full['date'].isin(processed['date'])]
processed_full = processed_full.sort_values(['date','tic'])

processed_full = processed_full.fillna(0)

processed_full.to_csv("./datasets/dataset_indicators.csv", index=False)
print("Shape of DataFrame: ", processed_full.shape)
print(processed_full.head())

 

Shape of DataFrame & Dataframe Head()
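
The reindexing step in section 2 builds every (date, ticker) pair, left-joins the observed rows onto that grid, and fills the gaps with 0. The sketch below reproduces the idea in pure Python on three hypothetical rows; the tickers, dates, and prices are made up for illustration.

```python
import itertools

# Toy observations: ticker "BBB" has no row on 2023-01-04.
rows = [
    {"date": "2023-01-03", "tic": "AAA", "close": 10.0},
    {"date": "2023-01-03", "tic": "BBB", "close": 20.0},
    {"date": "2023-01-04", "tic": "AAA", "close": 10.5},
]

dates = sorted({r["date"] for r in rows})
tickers = sorted({r["tic"] for r in rows})
observed = {(r["date"], r["tic"]): r for r in rows}

# Cartesian product of dates and tickers, analogous to itertools.product in section 2.
full = []
for date, tic in itertools.product(dates, tickers):
    row = observed.get((date, tic))
    # A missing pair gets close = 0.0, mirroring fillna(0).
    full.append({"date": date, "tic": tic, "close": row["close"] if row else 0.0})

for r in full:
    print(r)
```

A filled 0 in the close column would distort any percent change computed from it later, so it may be worth checking how many rows were actually filled before moving on.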

 

3. Preprocess Data ★Important: adds the spatial relationship (covariance) and temporal relationship (lookback return) data
# add covariance matrix as states
df = pd.read_csv("./datasets/dataset_indicators.csv")
df=df.sort_values(['date','tic'], ignore_index=True)
df.index = df.date.factorize()[0]

# Lookback window: 252 trading days (about one year)
lookback=252
cov_list = []
return_list = []
for i in range(lookback, len(df.index.unique())):
  # .loc slicing is label-inclusive, so each window holds lookback + 1 dates
  data_lookback = df.loc[i-lookback:i,:]
  price_lookback=data_lookback.pivot_table(index = 'date',columns = 'tic', values = 'close')
  # daily returns per ticker (temporal feature)
  return_lookback = price_lookback.pct_change().dropna()
  return_list.append(return_lookback)
  # covariance of returns across tickers (cross-sectional feature)
  covs = return_lookback.cov().values
  cov_list.append(covs)
  
df_cov = pd.DataFrame({'date':df.date.unique()[lookback:],'cov_list':cov_list,'return_list':return_list})
df = df.merge(df_cov, on='date')
df = df.sort_values(['date','tic']).reset_index(drop=True)
print("Shape of DataFrame: ", df.shape)
print(df.head())

 

Shape of DataFrame & Dataframe Head()
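
The rolling lookback in section 3 can be hard to picture from the pandas calls alone. The following pure-Python sketch runs the same idea on two hypothetical tickers with a 3-day window: take an inclusive window of closes, turn it into simple returns, and compute the sample covariance matrix of those returns. The helper names and prices are illustrative only.

```python
def pct_change(prices):
    """Simple returns p[t] / p[t-1] - 1; one element shorter than prices."""
    return [b / a - 1.0 for a, b in zip(prices, prices[1:])]


def sample_cov(x, y):
    """Unbiased sample covariance (divides by n - 1, as pandas .cov() does)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def rolling_cov(close, lookback):
    """close maps ticker -> list of closes; returns one matrix per end day."""
    tickers = sorted(close)
    n_days = len(close[tickers[0]])
    matrices = []
    for i in range(lookback, n_days):
        # Inclusive window [i - lookback, i]: lookback + 1 prices, lookback returns.
        rets = {t: pct_change(close[t][i - lookback : i + 1]) for t in tickers}
        matrices.append([[sample_cov(rets[a], rets[b]) for b in tickers] for a in tickers])
    return matrices


toy_close = {
    "AAA": [10.0, 10.2, 10.1, 10.4, 10.3, 10.6],
    "BBB": [20.0, 19.8, 20.1, 20.0, 20.4, 20.2],
}
covs = rolling_cov(toy_close, lookback=3)
print(len(covs), "matrices of shape", len(covs[0]), "x", len(covs[0][0]))
```

The first `lookback` dates have no full window behind them, which is why the merged DataFrame in section 3 starts one year after the raw data.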

 

4. Design Environment
# Environment for Portfolio Allocation
df_train = data_split(df, "2008-01-01", "2022-12-31")
stock_dimension = len(df_train.tic.unique())
state_space = stock_dimension
print(f"Stock Dimension: {stock_dimension}, State Space: {state_space}")

env_kwargs = {
    "hmax": 100,
    "initial_amount": 1000000,
    "transaction_cost_pct": 0.001,
    "state_space": state_space,
    "stock_dim": stock_dimension,
    "tech_indicator_list": config.INDICATORS,
    "action_space": stock_dimension,
    "reward_scaling": 1e-4,
}

e_train_gym = StockPortfolioEnv(df=df_train, **env_kwargs)
env_train, _ = e_train_gym.get_sb_env()

 

Stock Dimension & State Space
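
To make the role of the action vector concrete, here is a minimal sketch of one portfolio step. It assumes the environment converts the agent's raw action into weights with a softmax and then applies those weights to the next day's price changes; that is my reading of StockPortfolioEnv and should be checked against the installed FinRL version. Transaction costs are ignored, and all names and numbers are hypothetical.

```python
import math


def softmax(actions):
    """Map raw action scores to positive weights that sum to 1."""
    peak = max(actions)  # subtract the max for numerical stability
    exps = [math.exp(a - peak) for a in actions]
    total = sum(exps)
    return [e / total for e in exps]


def step_value(value, weights, close_today, close_next):
    """Portfolio value after holding the given weights for one day."""
    day_return = sum(
        w * (nxt / cur - 1.0) for w, cur, nxt in zip(weights, close_today, close_next)
    )
    return value * (1.0 + day_return)


raw_actions = [0.2, 0.9, 0.4]  # hypothetical policy output for 3 stocks
weights = softmax(raw_actions)
new_value = step_value(1_000_000, weights, [10.0, 20.0, 30.0], [10.1, 19.8, 30.3])
print([round(w, 3) for w in weights], round(new_value, 2))
```

Because the weights always sum to 1, the agent in this setting is fully invested at every step; it chooses how to split the capital, not how much of it to hold in cash.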

5. Implement DRL Algorithms
# Model 1: A2C
agent = DRLAgent(env=env_train)
A2C_PARAMS = {"n_steps": 5, "ent_coef": 0.005, "learning_rate": 0.0002}
model_a2c = agent.get_model(model_name="a2c", model_kwargs=A2C_PARAMS)
trained_a2c = agent.train_model(model=model_a2c, tb_log_name="a2c", total_timesteps=50000)
trained_a2c.save("./trained_models/trained_a2c.zip")

# Model 2: PPO
agent = DRLAgent(env=env_train)
PPO_PARAMS = {
    "n_steps": 2048,
    "ent_coef": 0.005,
    "learning_rate": 0.0001,
    "batch_size": 128,
}
model_ppo = agent.get_model("ppo", model_kwargs=PPO_PARAMS)
trained_ppo = agent.train_model(model=model_ppo, tb_log_name="ppo", total_timesteps=80000)
trained_ppo.save("./trained_models/trained_ppo.zip")

# Model 3: DDPG
agent = DRLAgent(env=env_train)
DDPG_PARAMS = {"batch_size": 128, "buffer_size": 50000, "learning_rate": 0.001}
model_ddpg = agent.get_model("ddpg", model_kwargs=DDPG_PARAMS)
trained_ddpg = agent.train_model(model=model_ddpg, tb_log_name="ddpg", total_timesteps=50000)
trained_ddpg.save("./trained_models/trained_ddpg.zip")

# Model 4: SAC
agent = DRLAgent(env=env_train)
SAC_PARAMS = {
    "batch_size": 128,
    "buffer_size": 100000,
    "learning_rate": 0.0003,
    "learning_starts": 100,
    "ent_coef": "auto_0.1",
}
model_sac = agent.get_model("sac", model_kwargs=SAC_PARAMS)
trained_sac = agent.train_model(model=model_sac, tb_log_name="sac", total_timesteps=50000)
trained_sac.save("./trained_models/trained_sac.zip")

# Model 5: TD3
agent = DRLAgent(env=env_train)
TD3_PARAMS = {"batch_size": 100, "buffer_size": 1000000, "learning_rate": 0.001}
model_td3 = agent.get_model("td3", model_kwargs=TD3_PARAMS)
trained_td3 = agent.train_model(model=model_td3, tb_log_name="td3", total_timesteps=30000)
trained_td3.save("./trained_models/trained_td3.zip")

 

Training Log
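
The five training blocks in section 5 differ only in the model name, the hyperparameters, and the number of timesteps, so they can also be written as one table-driven loop. The sketch below is one possible arrangement, not part of the original notebook; `TRAINING_PLAN` and `train_all` are hypothetical names, and the agent is passed in as a factory so that the block itself does not import FinRL.

```python
# (model name, hyperparameters, training timesteps) copied from section 5.
TRAINING_PLAN = [
    ("a2c", {"n_steps": 5, "ent_coef": 0.005, "learning_rate": 0.0002}, 50_000),
    ("ppo", {"n_steps": 2048, "ent_coef": 0.005, "learning_rate": 0.0001,
             "batch_size": 128}, 80_000),
    ("ddpg", {"batch_size": 128, "buffer_size": 50_000, "learning_rate": 0.001}, 50_000),
    ("sac", {"batch_size": 128, "buffer_size": 100_000, "learning_rate": 0.0003,
             "learning_starts": 100, "ent_coef": "auto_0.1"}, 50_000),
    ("td3", {"batch_size": 100, "buffer_size": 1_000_000, "learning_rate": 0.001}, 30_000),
]


def train_all(make_agent, plan=TRAINING_PLAN, model_dir="./trained_models"):
    """Train and save every model in the plan.

    make_agent is a zero-argument callable returning an object with the
    get_model / train_model methods used in section 5, for example
    lambda: DRLAgent(env=env_train).
    """
    trained = {}
    for name, params, timesteps in plan:
        agent = make_agent()
        model = agent.get_model(name, model_kwargs=params)
        model = agent.train_model(model=model, tb_log_name=name, total_timesteps=timesteps)
        model.save(f"{model_dir}/trained_{name}.zip")
        trained[name] = model
    return trained
```

In the notebook this would be called as `trained = train_all(lambda: DRLAgent(env=env_train))`, after which `trained["a2c"]` plays the role of `trained_a2c`.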

 

6. Test
df_test = data_split(df, "2023-01-01", "2023-12-31")
print("Shape of Trade DataFrame: ", df_test.shape)

env_portfolio_kwargs = {
    "hmax": 100,
    "initial_amount": 1000000,
    "transaction_cost_pct": 0.001,
    "state_space": state_space,
    "stock_dim": stock_dimension,
    "tech_indicator_list": config.INDICATORS,
    "action_space": stock_dimension,
    "reward_scaling": 1e-4,
}

e_trade_gym = StockPortfolioEnv(df=df_test, **env_portfolio_kwargs)

a2c_agent = A2C.load(config.TRAINED_MODEL_DIR + "/trained_a2c")
ddpg_agent = DDPG.load(config.TRAINED_MODEL_DIR + "/trained_ddpg")
ppo_agent = PPO.load(config.TRAINED_MODEL_DIR + "/trained_ppo")
sac_agent = SAC.load(config.TRAINED_MODEL_DIR + "/trained_sac")
td3_agent = TD3.load(config.TRAINED_MODEL_DIR + "/trained_td3")

a2c_daily_return, a2c_actions = DRLAgent.DRL_prediction(model=a2c_agent, environment=e_trade_gym)
ddpg_daily_return, ddpg_actions = DRLAgent.DRL_prediction(model=ddpg_agent, environment=e_trade_gym)
ppo_daily_return, ppo_actions = DRLAgent.DRL_prediction(model=ppo_agent, environment=e_trade_gym)
sac_daily_return, sac_actions = DRLAgent.DRL_prediction(model=sac_agent, environment=e_trade_gym)
td3_daily_return, td3_actions = DRLAgent.DRL_prediction(model=td3_agent, environment=e_trade_gym)


DJI_df = get_baseline(ticker="^DJI", start=a2c_daily_return.loc[0, "date"], end=a2c_daily_return.loc[len(a2c_daily_return) - 1, "date"])
GSPC_df = get_baseline(ticker="^GSPC", start=a2c_daily_return.loc[0, "date"], end=a2c_daily_return.loc[len(a2c_daily_return) - 1, "date"])
KS11_df = get_baseline(ticker="^KS11", start=a2c_daily_return.loc[0, "date"], end=a2c_daily_return.loc[len(a2c_daily_return) - 1, "date"])
KQ11_df = get_baseline(ticker="^KQ11", start=a2c_daily_return.loc[0, "date"], end=a2c_daily_return.loc[len(a2c_daily_return) - 1, "date"])
DJI_returns = get_daily_return(DJI_df, value_col_name="close")
GSPC_returns = get_daily_return(GSPC_df, value_col_name="close")
KS11_returns = get_daily_return(KS11_df, value_col_name="close")
KQ11_returns = get_daily_return(KQ11_df, value_col_name="close")

a2c_daily_return.to_csv("./results/a2c_daily_return.csv")
ddpg_daily_return.to_csv("./results/ddpg_daily_return.csv")
ppo_daily_return.to_csv("./results/ppo_daily_return.csv")
sac_daily_return.to_csv("./results/sac_daily_return.csv")
td3_daily_return.to_csv("./results/td3_daily_return.csv")

a2c_actions.to_csv("./results/a2c_actions.csv")
ddpg_actions.to_csv("./results/ddpg_actions.csv")
ppo_actions.to_csv("./results/ppo_actions.csv")
sac_actions.to_csv("./results/sac_actions.csv")
td3_actions.to_csv("./results/td3_actions.csv")
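
The train and test sets in sections 4 and 6 are cut from the same DataFrame by date. The sketch below shows that kind of split in pure Python, assuming the split keeps dates in the half-open interval [start, end); whether `data_split` behaves exactly this way is an assumption worth confirming against the installed FinRL version. `split_by_date` and the sample dates are hypothetical.

```python
def split_by_date(rows, start, end):
    """Keep rows with start <= date < end; ISO date strings compare correctly."""
    return [r for r in rows if start <= r["date"] < end]


rows = [{"date": d} for d in ("2022-12-29", "2022-12-30", "2023-01-03", "2023-12-29")]
train = split_by_date(rows, "2008-01-01", "2022-12-31")
test = split_by_date(rows, "2023-01-01", "2023-12-31")
print(len(train), len(test))
```

With a half-open interval, an end date that is itself a trading day would be excluded, so end dates that fall on a weekend or holiday (as both do here) leave the split unambiguous.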

 

7.1. BackTest-Stats 
A2C_strat = convert_daily_return_to_pyfolio_ts(a2c_daily_return)
ddpg_strat = convert_daily_return_to_pyfolio_ts(ddpg_daily_return)
ppo_strat = convert_daily_return_to_pyfolio_ts(ppo_daily_return)
sac_strat = convert_daily_return_to_pyfolio_ts(sac_daily_return)
td3_strat = convert_daily_return_to_pyfolio_ts(td3_daily_return)

perf_func = timeseries.perf_stats
A2C_stats = perf_func(returns=A2C_strat, factor_returns=A2C_strat,positions=None, transactions=None, turnover_denom="AGB",)
DDPG_stats = perf_func(returns=ddpg_strat, factor_returns=ddpg_strat,positions=None, transactions=None, turnover_denom="AGB",)
PPO_stats = perf_func(returns=ppo_strat, factor_returns=ppo_strat,positions=None, transactions=None, turnover_denom="AGB",)
SAC_stats = perf_func(returns=sac_strat, factor_returns=sac_strat,positions=None, transactions=None, turnover_denom="AGB",)
TD3_stats = perf_func(returns=td3_strat, factor_returns=td3_strat,positions=None, transactions=None, turnover_denom="AGB",)
DJI_stats = backtest_stats(DJI_df, value_col_name="close")
GSPC_stats = backtest_stats(GSPC_df, value_col_name="close")
KS11_stats = backtest_stats(KS11_df, value_col_name="close")
KQ11_stats = backtest_stats(KQ11_df, value_col_name="close")

print("==============DRL Strategy Stats===========")
DRL_stats_all = pd.concat([A2C_stats, DDPG_stats, PPO_stats, SAC_stats, TD3_stats], axis=1)
DRL_stats_all.columns = ["A2C", "DDPG", "PPO", "SAC", "TD3"]
print(DRL_stats_all)

# baseline stats
print("==============Get Baseline Stats===========")
Baseline_stats_all = pd.concat([DJI_stats, GSPC_stats, KS11_stats, KQ11_stats], axis=1)
Baseline_stats_all.columns = ["DJI", "S&P500", "KOSPI", "KOSDAQ"]
print(Baseline_stats_all)

DRL Strategy Stats
Baseline Stats
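
Besides return and Sharpe ratio, the pyfolio stats printed in section 7.1 include a max drawdown row. As a rough illustration of what that metric measures, here is a minimal pure-Python sketch on a hypothetical return series; it is not pyfolio's implementation.

```python
def max_drawdown(daily_returns):
    """Largest peak-to-trough loss of the compounded equity curve (<= 0)."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in daily_returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1.0)
    return worst


toy = [0.10, -0.20, 0.05, -0.10, 0.30]
print(f"max drawdown: {max_drawdown(toy):.4f}")
```

In the toy series the equity peaks at 1.10 and bottoms at 0.8316, so the drawdown is measured from the peak, not from the starting value.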

 

7.2. BackTest-Plot
# Min-Variance Portfolio Allocation
unique_tic = df_test.tic.unique()
unique_trade_date = df_test.date.unique()

# Calculate_portfolio_minimum_variance
portfolio = pd.DataFrame(index=range(1), columns=unique_trade_date)
initial_capital = 1000000
portfolio.loc[0, unique_trade_date[0]] = initial_capital
for i in range(len(unique_trade_date) - 1):
    df_temp = df[df.date == unique_trade_date[i]].reset_index(drop=True)
    df_temp_next = df[df.date == unique_trade_date[i + 1]].reset_index(drop=True)
    # calculate covariance matrix
    Sigma = df_temp.return_list[0].cov()
    # portfolio allocation
    ef_min_var = EfficientFrontier(None, Sigma, weight_bounds=(0, 0.1))
    # minimum variance
    raw_weights_min_var = ef_min_var.min_volatility()
    # get weights
    cleaned_weights_min_var = ef_min_var.clean_weights()
    # current capital
    cap = portfolio.iloc[0, i]
    # current cash invested for each stock
    current_cash = [element * cap for element in list(cleaned_weights_min_var.values())]
    # current held shares
    current_shares = list(np.array(current_cash) / np.array(df_temp.close))
    # next time period price
    next_price = np.array(df_temp_next.close)
    ##next_price * current share to calculate next total account value
    portfolio.iloc[0, i + 1] = np.dot(current_shares, next_price)

# Transpose so each trade date becomes a row holding that day's account value.
portfolio = portfolio.T
portfolio.columns = ["account_value"]
print(a2c_daily_return)
a2c_cumpod = (a2c_daily_return.daily_return + 1).cumprod() - 1
ddpg_cumpod = (ddpg_daily_return.daily_return + 1).cumprod() - 1
ppo_cumpod = (ppo_daily_return.daily_return + 1).cumprod() - 1
sac_cumpod = (sac_daily_return.daily_return + 1).cumprod() - 1
td3_cumpod = (td3_daily_return.daily_return + 1).cumprod() - 1
min_var_cumpod = (portfolio.account_value.pct_change() + 1).cumprod() - 1
dji_cumpod = (DJI_returns + 1).cumprod() - 1
GSPC_cumpod = (GSPC_returns + 1).cumprod() - 1
KS11_cumpod = (KS11_returns + 1).cumprod() - 1
KQ11_cumpod = (KQ11_returns + 1).cumprod() - 1

# Plotly: DRL, Min-Variance, DJIA
time_ind = pd.Series(a2c_daily_return.date)
trace0_portfolio = go.Scatter(x=time_ind, y=a2c_cumpod, mode="lines", name="A2C (Portfolio Allocation)")
trace1_portfolio = go.Scatter(x=time_ind, y=ddpg_cumpod, mode="lines", name="DDPG (Portfolio Allocation)")
trace2_portfolio = go.Scatter(x=time_ind, y=ppo_cumpod, mode="lines", name="PPO (Portfolio Allocation)")
trace3_portfolio = go.Scatter(x=time_ind, y=sac_cumpod, mode="lines", name="SAC (Portfolio Allocation)")
trace4_portfolio = go.Scatter(x=time_ind, y=td3_cumpod, mode="lines", name="TD3 (Portfolio Allocation)")
trace5_portfolio = go.Scatter(x=time_ind, y=min_var_cumpod, mode="lines", name="Min-Variance")
trace6_portfolio = go.Scatter(x=time_ind, y=dji_cumpod, mode="lines", name="DJIA")
# trace7_portfolio = go.Scatter(x=time_ind, y=GSPC_cumpod, mode="lines", name="S&P500")
# trace8_portfolio = go.Scatter(x=time_ind, y=KS11_cumpod, mode="lines", name="KOSPI")
# trace9_portfolio = go.Scatter(x=time_ind, y=KQ11_cumpod, mode="lines", name="KOSDAQ")

fig = go.Figure()
fig.add_trace(trace0_portfolio)
fig.add_trace(trace1_portfolio)
fig.add_trace(trace2_portfolio)
fig.add_trace(trace3_portfolio)
fig.add_trace(trace4_portfolio)
fig.add_trace(trace5_portfolio)
fig.add_trace(trace6_portfolio)
# fig.add_trace(trace7_portfolio)
# fig.add_trace(trace8_portfolio)
# fig.add_trace(trace9_portfolio)

fig.update_layout(
    legend=dict(
        x=0,
        y=1,
        traceorder="normal",
        font=dict(family="sans-serif", size=10, color="black"),
        bgcolor="White",
        bordercolor="white",
        borderwidth=2,
    ),
)

fig.update_layout(
    title={
        #'text': "Cumulative Return using FinRL",
        "y": 0.85,
        "x": 0.5,
        "xanchor": "center",
        "yanchor": "top",
    }
)

# with Transaction cost
fig.update_layout(
    #    margin=dict(l=20, r=20, t=20, b=20),
    paper_bgcolor="rgba(1,1,0,0)",
    plot_bgcolor="rgba(1, 1, 0, 0)",
    # xaxis_title="Date",
    yaxis_title="Cumulative Return",
    xaxis={
        "type": "date",
        "tick0": time_ind[0],
        "tickmode": "linear",
        "dtick": 86400000.0 * 80,
    },
)
fig.update_xaxes(
    showline=True,
    linecolor="black",
    showgrid=True,
    gridwidth=1,
    gridcolor="LightSteelBlue",
    mirror=True,
)
fig.update_yaxes(
    showline=True,
    linecolor="black",
    showgrid=True,
    gridwidth=1,
    gridcolor="LightSteelBlue",
    mirror=True,
)
fig.update_yaxes(zeroline=True, zerolinewidth=1, zerolinecolor="LightSteelBlue")

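# write_image needs a static image export backend such as kaleido, and the target folder must exist.
os.makedirs("images", exist_ok=True)
fig.write_image("images/all_PortfolioAllocation.webp")
fig.write_image("images/all_PortfolioAllocation.pdf")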
fig.show()

End Result
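
The Min-Variance line in the plot is built by re-solving a minimum-variance allocation each day, buying at that day's close, and valuing the holdings at the next close. The sketch below shows the same roll-forward on two hypothetical stocks. Note the simplification: it uses the closed-form two-asset solution with no weight bounds, whereas section 7.2 uses PyPortfolioOpt with weights capped at 0.1 per stock. All names and numbers are illustrative.

```python
def min_var_weights_2(var_a, var_b, cov_ab):
    """Closed-form minimum-variance weights for two assets (no bounds)."""
    w_a = (var_b - cov_ab) / (var_a + var_b - 2.0 * cov_ab)
    return [w_a, 1.0 - w_a]


def roll_forward(capital, weights, close_today, close_next):
    """Buy shares at today's close with the given weights, value them tomorrow."""
    shares = [w * capital / p for w, p in zip(weights, close_today)]
    return sum(s * p for s, p in zip(shares, close_next))


def cumulative_return(values):
    """Cumulative return relative to the first account value."""
    return [v / values[0] - 1.0 for v in values]


weights = min_var_weights_2(var_a=0.04, var_b=0.01, cov_ab=0.0)
closes = [[10.0, 20.0], [10.5, 19.8], [10.2, 20.4]]  # toy closes: 3 days x 2 stocks
values = [1_000_000.0]
for today, nxt in zip(closes, closes[1:]):
    values.append(roll_forward(values[-1], weights, today, nxt))
print([round(w, 2) for w in weights], [round(c, 4) for c in cumulative_return(values)])
```

Apart from the first element, `cumulative_return` gives the same series as `(pct_change() + 1).cumprod() - 1` on the account values, which is the expression used for the plot.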