[FinRL] CH3. Platform Structure - (2) Train
Contents
- Installation
- Framework Overview
- Main Component
- Data
- Train
- Backtest
- Examples
CH3. Main Component - (2)Train
Introduction
- Load the data prepared in CH3-(1) and run reinforcement learning on it
- Load APIs (Part 1): install and import the APIs used throughout the code
- Load data (Part 2): load the data prepared in CH3-(1) and set up the reinforcement learning environment
- Train agents (Part 3): train the various agents in that environment
Reference: GitHub - AI4Finance-Foundation/FinRL-Tutorials (https://github.com/AI4Finance-Foundation/FinRL-Tutorials)
Stock NeurIPS2018 Part 2. Train
This series is a reproduction of the process in the paper Practical Deep Reinforcement Learning Approach for Stock Trading.
This is the second part of the NeurIPS2018 series, introducing how to use FinRL to turn the data into a Gym-style environment and train DRL agents on it.
Other demos can be found in the FinRL-Tutorials repo.
Part 1. Install Packages
## install required packages
!pip install swig
!pip install wrds
!pip install pyportfolioopt
## install finrl library
!pip install git+https://github.com/AI4Finance-Foundation/FinRL.git
import os
import pandas as pd
from finrl.meta.env_stock_trading.env_stocktrading import StockTradingEnv
from finrl.agents.stablebaselines3.models import DRLAgent
from stable_baselines3.common.logger import configure
from finrl import config_tickers
from finrl.main import check_and_make_directories
from finrl.config import INDICATORS, TRAINED_MODEL_DIR, RESULTS_DIR
check_and_make_directories([TRAINED_MODEL_DIR])
Part 2. Build A Market Environment in OpenAI Gym-style
The core elements in reinforcement learning are the agent and the environment. You can understand RL as the following process:
The agent acts in a world, which is the environment. It observes its current condition as a state and is allowed to take certain actions. After the agent executes an action, it arrives at a new state. At the same time, the environment gives the agent feedback called a reward, a numerical signal that tells how good or bad the new state is. The agent and the environment keep repeating this interaction.
The goal of the agent is to collect as much cumulative reward as possible. Reinforcement learning is the method by which the agent learns to improve its behavior and achieve that goal.
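As a minimal sketch of this loop in code (using the classic Gym reset/step interface that the rest of this notebook assumes; newer gymnasium releases return slightly different tuples):
# Generic agent-environment loop in the classic Gym style.
# `env` is any Gym-style environment, e.g. the StockTradingEnv built below,
# and `policy` maps a state to an action (a trained DRL agent, eventually).
def run_episode(env, policy):
    state = env.reset()                               # initial observation
    done = False
    cumulative_reward = 0.0
    while not done:
        action = policy(state)                        # agent chooses an action
        state, reward, done, info = env.step(action)  # environment feedback
        cumulative_reward += reward                   # agent tries to maximize this sum
    return cumulative_reward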
To achieve this in Python, we follow the OpenAI Gym style and build the stock data into an environment.
The state, action, and reward are specified as follows:
State s: The state space represents the agent's perception of the market environment. Just like a human trader analyzing various information, our agent passively observes price data and technical indicators derived from past data. It learns by interacting with the market environment (usually by replaying historical data).
Action a: The action space includes the allowed actions that the agent can take at each state. For example, a ∈ {−1, 0, 1}, where −1, 0, 1 represent selling, holding, and buying. When an action operates on multiple shares, a ∈ {−k, ..., −1, 0, 1, ..., k}; e.g., "Buy 10 shares of AAPL" or "Sell 10 shares of AAPL" correspond to 10 or −10, respectively.
Reward function r(s, a, s′): The reward is an incentive for the agent to learn a better policy. For example, it can be the change of the portfolio value when taking action a at state s and arriving at the new state s′, i.e., r(s, a, s′) = v′ − v, where v′ and v represent the portfolio values at states s′ and s, respectively (a worked example follows below).
Market environment: the 30 constituent stocks of the Dow Jones Industrial Average (DJIA) index, accessed at the starting date of the testing period.
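As a worked example of the reward above (a sketch only; inside FinRL's StockTradingEnv the portfolio values come from the environment's own accounting, and the raw change is multiplied by the reward_scaling factor set below):
# r(s, a, s') = v' - v: the change in portfolio value over one step,
# where v = cash + sum(price * holdings).
def step_reward(begin_total_asset, end_total_asset, reward_scaling=1e-4):
    return (end_total_asset - begin_total_asset) * reward_scaling

print(step_reward(1_000_000.0, 1_000_350.0))  # 0.035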
Read data
We first read the .csv file of our training data into a dataframe.
train = pd.read_csv('train_data.csv')
# If you are not using the data generated in Part 1 of this tutorial, make sure
# it has the columns and index in a form that can be turned into the environment.
# In that case, you can comment out and skip the following two lines.
train = train.set_index(train.columns[0])
train.index.names = ['']
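If you bring your own data instead of the train_data.csv produced in Part 1, a quick sanity check along these lines can save a confusing error later (a sketch; tic, close, date, and the technical-indicator columns are what the environment reads, and the exact list depends on the indicators you computed):
# Hypothetical sanity check for externally prepared data.
required = {"date", "tic", "close"} | set(INDICATORS)
missing = required - set(train.columns)
assert not missing, f"train data is missing columns: {sorted(missing)}"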
Construct the environment
Calculate and specify the parameters we need for constructing the environment.
stock_dimension = len(train.tic.unique())
state_space = 1 + 2*stock_dimension + len(INDICATORS)*stock_dimension
print(f"Stock Dimension: {stock_dimension}, State Space: {state_space}")
Stock Dimension: 29, State Space: 291
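For reference, the state vector concatenates the cash balance, one close price and one holding count per stock, and each technical indicator per stock, which is where the formula above comes from (a reading aid based on how StockTradingEnv lays out its state). With 29 tickers and the 8 indicators in INDICATORS:
# 1 (cash) + 29 (close prices) + 29 (holdings) + 8 * 29 (indicators) = 291
assert state_space == 1 + 2 * stock_dimension + len(INDICATORS) * stock_dimension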
buy_cost_list = sell_cost_list = [0.001] * stock_dimension
num_stock_shares = [0] * stock_dimension
env_kwargs = {
    "hmax": 100,                           # maximum number of shares to trade per order
    "initial_amount": 1000000,             # starting cash
    "num_stock_shares": num_stock_shares,  # initial holdings per stock
    "buy_cost_pct": buy_cost_list,         # transaction cost rate when buying
    "sell_cost_pct": sell_cost_list,       # transaction cost rate when selling
    "state_space": state_space,
    "stock_dim": stock_dimension,
    "tech_indicator_list": INDICATORS,
    "action_space": stock_dimension,       # one action per stock
    "reward_scaling": 1e-4                 # scales the change in portfolio value used as reward
}
e_train_gym = StockTradingEnv(df = train, **env_kwargs)
Environment for training
env_train, _ = e_train_gym.get_sb_env()
print(type(env_train))
<class 'stable_baselines3.common.vec_env.dummy_vec_env.DummyVecEnv'>
Part 3: Train DRL Agents
- Here, the DRL algorithms are from Stable Baselines 3, a library that implements popular DRL algorithms in PyTorch, succeeding its predecessor Stable Baselines.
- Users are also encouraged to try ElegantRL and Ray RLlib.
agent = DRLAgent(env = env_train)
# Set the corresponding values to 'True' for the algorithms that you want to use
if_using_a2c = True
if_using_ddpg = True
if_using_ppo = True
if_using_td3 = True
if_using_sac = True
Agent Training: 5 algorithms (A2C, DDPG, PPO, TD3, SAC)
Agent 1: A2C
agent = DRLAgent(env = env_train)
model_a2c = agent.get_model("a2c")
if if_using_a2c:
    # set up logger
    tmp_path = RESULTS_DIR + '/a2c'
    new_logger_a2c = configure(tmp_path, ["stdout", "csv", "tensorboard"])
    # Set new logger
    model_a2c.set_logger(new_logger_a2c)
{'n_steps': 5, 'ent_coef': 0.01, 'learning_rate': 0.0007}
Using cpu device
Logging to results/a2c
trained_a2c = agent.train_model(model=model_a2c,
                                tb_log_name='a2c',
                                total_timesteps=50000) if if_using_a2c else None
trained_a2c.save(TRAINED_MODEL_DIR + "/agent_a2c") if if_using_a2c else None
Agent 2: DDPG
agent = DRLAgent(env = env_train)
model_ddpg = agent.get_model("ddpg")
if if_using_ddpg:
    # set up logger
    tmp_path = RESULTS_DIR + '/ddpg'
    new_logger_ddpg = configure(tmp_path, ["stdout", "csv", "tensorboard"])
    # Set new logger
    model_ddpg.set_logger(new_logger_ddpg)
{'batch_size': 128, 'buffer_size': 50000, 'learning_rate': 0.001}
Using cpu device
Logging to results/ddpg
trained_ddpg = agent.train_model(model=model_ddpg,
                                 tb_log_name='ddpg',
                                 total_timesteps=50000) if if_using_ddpg else None
trained_ddpg.save(TRAINED_MODEL_DIR + "/agent_ddpg") if if_using_ddpg else None
Agent 3: PPO
agent = DRLAgent(env = env_train)
PPO_PARAMS = {
    "n_steps": 2048,
    "ent_coef": 0.01,
    "learning_rate": 0.00025,
    "batch_size": 128,
}
model_ppo = agent.get_model("ppo", model_kwargs=PPO_PARAMS)
if if_using_ppo:
    # set up logger
    tmp_path = RESULTS_DIR + '/ppo'
    new_logger_ppo = configure(tmp_path, ["stdout", "csv", "tensorboard"])
    # Set new logger
    model_ppo.set_logger(new_logger_ppo)
{'n_steps': 2048, 'ent_coef': 0.01, 'learning_rate': 0.00025, 'batch_size': 128}
Using cpu device
Logging to results/ppo
trained_ppo = agent.train_model(model=model_ppo,
                                tb_log_name='ppo',
                                total_timesteps=200000) if if_using_ppo else None
trained_ppo.save(TRAINED_MODEL_DIR + "/agent_ppo") if if_using_ppo else None
Agent 4: TD3
agent = DRLAgent(env = env_train)
TD3_PARAMS = {
    "batch_size": 100,
    "buffer_size": 1000000,
    "learning_rate": 0.001,
}
model_td3 = agent.get_model("td3", model_kwargs=TD3_PARAMS)
if if_using_td3:
    # set up logger
    tmp_path = RESULTS_DIR + '/td3'
    new_logger_td3 = configure(tmp_path, ["stdout", "csv", "tensorboard"])
    # Set new logger
    model_td3.set_logger(new_logger_td3)
{'batch_size': 100, 'buffer_size': 1000000, 'learning_rate': 0.001}
Using cpu device
Logging to results/td3
trained_td3 = agent.train_model(model=model_td3,
                                tb_log_name='td3',
                                total_timesteps=50000) if if_using_td3 else None
trained_td3.save(TRAINED_MODEL_DIR + "/agent_td3") if if_using_td3 else None
Agent 5: SAC
agent = DRLAgent(env = env_train)
SAC_PARAMS = {
    "batch_size": 128,
    "buffer_size": 100000,
    "learning_rate": 0.0001,
    "learning_starts": 100,
    "ent_coef": "auto_0.1",
}
model_sac = agent.get_model("sac", model_kwargs=SAC_PARAMS)
if if_using_sac:
    # set up logger
    tmp_path = RESULTS_DIR + '/sac'
    new_logger_sac = configure(tmp_path, ["stdout", "csv", "tensorboard"])
    # Set new logger
    model_sac.set_logger(new_logger_sac)
{'batch_size': 128, 'buffer_size': 100000, 'learning_rate': 0.0001, 'learning_starts': 100, 'ent_coef': 'auto_0.1'}
Using cpu device
Logging to results/sac
trained_sac = agent.train_model(model=model_sac,
                                tb_log_name='sac',
                                total_timesteps=70000) if if_using_sac else None
trained_sac.save(TRAINED_MODEL_DIR + "/agent_sac") if if_using_sac else None
Save the trained agent
Trained agents should have already been saved in the "trained_models" directory after you run the code blocks above.
For Colab users, the zip files should be at "./trained_models" or "/content/trained_models".
For users running in a local environment, the zip files should be at "./trained_models".
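To reuse a trained agent later (for example in the backtest notebook), it can be reloaded with the matching Stable Baselines 3 class; a minimal sketch for the A2C agent saved above:
from stable_baselines3 import A2C

# Loads trained_models/agent_a2c.zip written by trained_a2c.save(...) above.
loaded_a2c = A2C.load(TRAINED_MODEL_DIR + "/agent_a2c")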