[FinRL] CH3. Platform Components - (1) Data
Contents
- Installation
- Framework Overview
- Main Component
- Data
- Train
- Backtest
- Examples
CH3. Main Component - (1)Data
Introduction
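- For reinforcement learning, download stock price data through the Yahoo Finance API and then process it
- Load APIs (Part 1): install and import the APIs used throughout the code
- Download data (Part 2): download OHLCV* data through the Yahoo Finance API
- Process data (Part 3): add MACD, RSI, and the turbulence index to the OHLCV* data and process it
- Save data (Part 4): save the processed data as CSV files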
*OHLCV: Open, High, Low, Close, Volume
GitHub - AI4Finance-Foundation/FinRL-Tutorials: Tutorials. Please star.
Stock NeurIPS2018 Part 1. Data
This series reproduces the process described in the paper Practical Deep Reinforcement Learning Approach for Stock Trading.
This is the first part of the NeurIPS2018 series, introducing how to use FinRL to fetch and process the data that we need for ML/RL trading.
Other demos can be found in the FinRL-Tutorials repo.
Part 1. Install Packages
## install required packages
!pip install swig
!pip install wrds
!pip install pyportfolioopt
## install finrl library
!pip install git+https://github.com/AI4Finance-Foundation/FinRL.git
import pandas as pd
import numpy as np
import datetime
import yfinance as yf
from finrl.meta.preprocessor.yahoodownloader import YahooDownloader
from finrl.meta.preprocessor.preprocessors import FeatureEngineer, data_split
from finrl import config_tickers
from finrl.config import INDICATORS
import itertools
Part 2. Fetch data
yfinance is an open-source library that provides APIs for fetching historical data from Yahoo Finance. FinRL has a class called YahooDownloader that uses yfinance to fetch data from Yahoo Finance.
OHLCV: The downloaded data are in OHLCV form, corresponding to open, high, low, close, and volume, respectively. OHLCV matters because it contains most of the numerical information about a stock as a time series. From OHLCV, traders can derive further judgments and predictions, such as momentum, investor interest, and market trends.
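As a rough illustration of how such signals come out of OHLCV, the sketch below computes an intraday range and close-to-close returns. The `Bar` class and both helper functions are hypothetical names introduced here. The numbers are AAPL values for early January 2023 as returned by yfinance in this post.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Bar:
    """One daily OHLCV record (hypothetical container for illustration)."""
    date: str
    open: float
    high: float
    low: float
    close: float
    volume: int


# AAPL rows for early January 2023 (unadjusted Close column).
bars = [
    Bar("2023-01-03", 130.279999, 130.899994, 124.169998, 125.070000, 112117500),
    Bar("2023-01-04", 126.889999, 128.660004, 125.080002, 126.360001, 89113600),
    Bar("2023-01-05", 127.129997, 127.769997, 124.760002, 125.019997, 80962700),
]


def daily_range(bar: Bar) -> float:
    """High minus low: a crude measure of intraday volatility."""
    return bar.high - bar.low


def close_to_close_returns(series: list[Bar]) -> list[float]:
    """Simple return between consecutive closes: a basic momentum signal."""
    return [cur.close / prev.close - 1.0 for prev, cur in zip(series, series[1:])]


ranges = [daily_range(b) for b in bars]
returns = close_to_close_returns(bars)
print(ranges)
print(returns)
```

Volume can be treated the same way, for example by comparing each day's volume with a trailing average as a proxy for investor interest.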
Data for a single ticker
Here we provide two ways to fetch data for a single ticker. Let's take Apple Inc. (AAPL) as an example.
Using yfinance
aapl_df_yf = yf.download(tickers = "aapl", start='2023-01-01', end='2023-12-31')
[*********************100%%**********************] 1 of 1 completed
aapl_df_yf.head()
Date | Open | High | Low | Close | Adj Close | Volume |
---|---|---|---|---|---|---|
2023-01-03 | 130.279999 | 130.899994 | 124.169998 | 125.070000 | 124.374802 | 112117500 |
2023-01-04 | 126.889999 | 128.660004 | 125.080002 | 126.360001 | 125.657639 | 89113600 |
2023-01-05 | 127.129997 | 127.769997 | 124.760002 | 125.019997 | 124.325081 | 80962700 |
2023-01-06 | 126.010002 | 130.289993 | 124.889999 | 129.619995 | 128.899521 | 87754700 |
2023-01-09 | 130.470001 | 133.410004 | 129.889999 | 130.149994 | 129.426575 | 70790800 |
Using FinRL
FinRL's YahooDownloader reshapes the data frame into a form that is convenient for further processing. It uses the adjusted close price instead of the close price, and adds a column representing the day of the week (0-4 corresponding to Monday-Friday).
aapl_df_finrl = YahooDownloader(start_date = '2023-01-01',
end_date = '2023-12-31',
ticker_list = ['aapl']).fetch_data()
[*********************100%%**********************] 1 of 1 completed
Shape of DataFrame: (250, 8)
aapl_df_finrl.head()
index | date | open | high | low | close | volume | tic | day |
---|---|---|---|---|---|---|---|---|
0 | 2023-01-03 | 130.279999 | 130.899994 | 124.169998 | 124.374802 | 112117500 | aapl | 1 |
1 | 2023-01-04 | 126.889999 | 128.660004 | 125.080002 | 125.657639 | 89113600 | aapl | 2 |
2 | 2023-01-05 | 127.129997 | 127.769997 | 124.760002 | 124.325081 | 80962700 | aapl | 3 |
3 | 2023-01-06 | 126.010002 | 130.289993 | 124.889999 | 128.899521 | 87754700 | aapl | 4 |
4 | 2023-01-09 | 130.470001 | 133.410004 | 129.889999 | 129.426575 | 70790800 | aapl | 0 |
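The reshaping described above can be approximated with plain pandas. This is a minimal sketch, not FinRL's actual implementation: `to_finrl_style` is a hypothetical helper, and the input frame is typed in by hand from the AAPL rows shown in this section.

```python
import pandas as pd

# yfinance-style frame: a Date index plus separate Close and Adj Close columns.
yf_style = pd.DataFrame(
    {
        "Open": [130.279999, 126.889999, 130.470001],
        "High": [130.899994, 128.660004, 133.410004],
        "Low": [124.169998, 125.080002, 129.889999],
        "Close": [125.070000, 126.360001, 130.149994],
        "Adj Close": [124.374802, 125.657639, 129.426575],
        "Volume": [112117500, 89113600, 70790800],
    },
    index=pd.DatetimeIndex(["2023-01-03", "2023-01-04", "2023-01-09"], name="Date"),
)


def to_finrl_style(df: pd.DataFrame, ticker: str) -> pd.DataFrame:
    """Hypothetical helper that mirrors the layout described in the text."""
    out = df.reset_index()
    # Use the adjusted close as the close price, then drop the extra column.
    out["Close"] = out["Adj Close"]
    out = out.drop(columns=["Adj Close"])
    out.columns = ["date", "open", "high", "low", "close", "volume"]
    out["tic"] = ticker
    # pandas convention: Monday=0 ... Friday=4.
    out["day"] = out["date"].dt.dayofweek
    out["date"] = out["date"].dt.strftime("%Y-%m-%d")
    return out


finrl_style = to_finrl_style(yf_style, "aapl")
print(finrl_style)
```

The `day` values match the table: 2023-01-03 is a Tuesday (1) and 2023-01-09 is a Monday (0).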
Data for the chosen tickers
config_tickers.DOW_30_TICKER
['AXP', 'AMGN', 'AAPL', 'BA', 'CAT', 'CSCO', 'CVX', 'GS', 'HD', 'HON', 'IBM', 'INTC', 'JNJ', 'KO', 'JPM', 'MCD', 'MMM', 'MRK', 'MSFT', 'NKE', 'PG', 'TRV', 'UNH', 'CRM', 'VZ', 'V', 'WBA', 'WMT', 'DIS', 'DOW']
TRAIN_START_DATE = '2009-01-01'
TRAIN_END_DATE = '2020-07-01'
TRADE_START_DATE = '2020-07-01'
TRADE_END_DATE = '2021-10-29'
df_raw = YahooDownloader(start_date = TRAIN_START_DATE,
end_date = TRADE_END_DATE,
ticker_list = config_tickers.DOW_30_TICKER).fetch_data()
[*********************100%%**********************] 1 of 1 completed
Shape of DataFrame: (94301, 8)
df_raw.head()
index | date | open | high | low | close | volume | tic | day |
---|---|---|---|---|---|---|---|---|
0 | 2009-01-02 | 3.067143 | 3.251429 | 3.041429 | 2.747390 | 746015200 | AAPL | 4 |
1 | 2009-01-02 | 58.590000 | 59.080002 | 57.750000 | 42.737900 | 6547900 | AMGN | 4 |
2 | 2009-01-02 | 18.570000 | 19.520000 | 18.400000 | 15.144921 | 10955700 | AXP | 4 |
3 | 2009-01-02 | 42.799999 | 45.560001 | 42.779999 | 33.941090 | 7010200 | BA | 4 |
4 | 2009-01-02 | 44.910000 | 46.980000 | 44.709999 | 31.093384 | 7117200 | CAT | 4 |
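Because the result is one long frame with a row per date and ticker, an uneven row count per ticker is a quick hint that some symbols lack history for part of the range. A minimal sketch on toy data follows. The tickers and prices are made up for illustration, and only the column names follow `df_raw`.

```python
import pandas as pd

# Toy long-format frame in the same layout as df_raw (values are made up).
toy = pd.DataFrame(
    {
        "date": ["2021-01-04", "2021-01-04", "2021-01-05", "2021-01-05", "2021-01-06"],
        "tic": ["AAA", "BBB", "AAA", "BBB", "AAA"],
        "close": [10.0, 20.0, 10.5, None, 10.7],
    }
)

# Number of rows available per ticker.
rows_per_ticker = toy.groupby("tic").size()

# Tickers with fewer rows than there are distinct dates are missing some days.
n_dates = toy["date"].nunique()
incomplete = rows_per_ticker[rows_per_ticker < n_dates].index.tolist()

# Count of NaN cells per column.
nan_counts = toy.isna().sum()

print(rows_per_ticker)
print("incomplete tickers:", incomplete)
print(nan_counts)
```

The same three checks can be run on `df_raw` directly by replacing `toy` with it.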
Part 3: Preprocess Data
We need to check for missing data and do feature engineering to convert each data point into a state.
- Adding technical indicators. In practical trading, various information needs to be taken into account, such as historical prices, current holding shares, and technical indicators. Here, we demonstrate two trend-following technical indicators: MACD and RSI.
- Adding turbulence index. Risk aversion reflects whether an investor prefers to protect capital. It also influences one's trading strategy when facing different levels of market volatility. To control risk in a worst-case scenario, such as the financial crisis of 2007–2008, FinRL employs the turbulence index, which measures extreme fluctuations in asset prices.
Here, let's take MACD as an example. Moving average convergence/divergence (MACD) is one of the most commonly used indicators for identifying bull and bear markets. Its calculation is based on the EMA (exponential moving average), an indicator that measures trend direction over a period of time.
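To make the relationship between EMA and MACD concrete, here is a small pure-Python sketch. It assumes the conventional 12-period and 26-period spans and a recursive EMA seeded with the first observation. Indicator libraries may differ in seeding and weighting, so the values will not necessarily match FinRL's output exactly.

```python
def ema(values, span):
    """Recursive EMA with smoothing factor alpha = 2 / (span + 1).

    Seeded with the first observation (an assumption; libraries differ).
    """
    alpha = 2.0 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def macd_line(closes, fast=12, slow=26):
    """MACD line: fast EMA minus slow EMA of the closing price."""
    fast_ema = ema(closes, fast)
    slow_ema = ema(closes, slow)
    return [f - s for f, s in zip(fast_ema, slow_ema)]


rising = [100.0 + i for i in range(40)]  # steadily rising prices
flat = [100.0] * 40                      # constant prices

print(macd_line(rising)[-1])  # positive: the fast EMA tracks the rise more closely
print(macd_line(flat)[-1])    # approximately zero: both EMAs equal the price
print(macd_line(rising)[0])   # 0.0: both EMAs start from the same seed
```

A positive MACD line means the short-term average sits above the long-term one, which is commonly read as upward momentum. A negative value suggests the opposite.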
fe = FeatureEngineer(use_technical_indicator=True,
tech_indicator_list = INDICATORS,
use_vix=True,
use_turbulence=True,
user_defined_feature = False)
processed = fe.preprocess_data(df_raw)
Successfully added technical indicators
[*********************100%%**********************] 1 of 1 completed
Shape of DataFrame: (3228, 8)
Successfully added vix
Successfully added turbulence index
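For intuition about the turbulence index added here, the sketch below assumes the common definition of financial turbulence as a Mahalanobis-type distance between today's returns and their historical distribution. The `turbulence` function, the 252-day window, and the random returns are all illustrative assumptions, not FinRL's code.

```python
import numpy as np


def turbulence(current_returns, history_returns):
    """Mahalanobis-type distance of today's returns from their history.

    current_returns: array of shape (n_assets,)
    history_returns: array of shape (n_days, n_assets)
    """
    mu = history_returns.mean(axis=0)
    cov = np.cov(history_returns, rowvar=False)
    diff = current_returns - mu
    # pinv keeps the sketch usable when the covariance matrix is singular.
    return float(diff @ np.linalg.pinv(cov) @ diff)


rng = np.random.default_rng(0)
history = rng.normal(0.0, 0.01, size=(252, 3))  # made-up daily returns

calm_day = history.mean(axis=0)               # exactly at the historical mean
shock_day = np.array([-0.08, -0.07, -0.09])   # large joint drop

print(turbulence(calm_day, history))   # zero by construction
print(turbulence(shock_day, history))  # far larger than on a typical day
```

A trading environment can compare this value against a threshold and stop buying or liquidate positions when the market looks abnormally turbulent.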
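# Build the full (date, ticker) grid over every calendar day in the range.
list_ticker = processed["tic"].unique().tolist()
list_date = list(pd.date_range(processed['date'].min(),processed['date'].max()).astype(str))
combination = list(itertools.product(list_date,list_ticker))
# Left-join the processed data onto the grid; missing pairs become NaN.
processed_full = pd.DataFrame(combination,columns=["date","tic"]).merge(processed,on=["date","tic"],how="left")
# Keep only dates that actually appear in the data (drops non-trading days).
processed_full = processed_full[processed_full['date'].isin(processed['date'])]
processed_full = processed_full.sort_values(['date','tic'])
# Replace the remaining NaN values with 0.
processed_full = processed_full.fillna(0)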
processed_full.head()
index | date | tic | open | high | low | close | volume | day | macd | boll_ub | boll_lb | rsi_30 | cci_30 | dx_30 | close_30_sma | close_60_sma | vix | turbulence |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2009-01-02 | AAPL | 3.067143 | 3.251429 | 3.041429 | 2.747390 | 746015200.0 | 4.0 | 0.0 | 2.969345 | 2.641386 | 100.0 | 66.666667 | 100.0 | 2.747390 | 2.747390 | 39.189999 | 0.0 |
1 | 2009-01-02 | AMGN | 58.590000 | 59.080002 | 57.750000 | 42.737900 | 6547900.0 | 4.0 | 0.0 | 2.969345 | 2.641386 | 100.0 | 66.666667 | 100.0 | 42.737900 | 42.737900 | 39.189999 | 0.0 |
2 | 2009-01-02 | AXP | 18.570000 | 19.520000 | 18.400000 | 15.144921 | 10955700.0 | 4.0 | 0.0 | 2.969345 | 2.641386 | 100.0 | 66.666667 | 100.0 | 15.144921 | 15.144921 | 39.189999 | 0.0 |
3 | 2009-01-02 | BA | 42.799999 | 45.560001 | 42.779999 | 33.941090 | 7010200.0 | 4.0 | 0.0 | 2.969345 | 2.641386 | 100.0 | 66.666667 | 100.0 | 33.941090 | 33.941090 | 39.189999 | 0.0 |
4 | 2009-01-02 | CAT | 44.910000 | 46.980000 | 44.709999 | 31.093384 | 7117200.0 | 4.0 | 0.0 | 2.969345 | 2.641386 | 100.0 | 66.666667 | 100.0 | 31.093384 | 31.093384 | 39.189999 | 0.0 |
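The grid-filling step that produces `processed_full` is easier to follow on a tiny frame. The sketch below repeats the same operations on toy data with made-up tickers and prices. Ticker BBB has no row on 2021-01-08, and the date range spans a weekend.

```python
import itertools

import pandas as pd

# Toy processed frame (values are made up); BBB is missing on 2021-01-08.
processed_toy = pd.DataFrame(
    {
        "date": ["2021-01-07", "2021-01-07", "2021-01-08", "2021-01-11", "2021-01-11"],
        "tic": ["AAA", "BBB", "AAA", "AAA", "BBB"],
        "close": [10.0, 20.0, 10.5, 10.7, 20.4],
    }
)

tickers = processed_toy["tic"].unique().tolist()
calendar_days = list(
    pd.date_range(processed_toy["date"].min(), processed_toy["date"].max()).astype(str)
)

# Every (date, ticker) pair, whether or not data exists for it.
grid = pd.DataFrame(
    list(itertools.product(calendar_days, tickers)), columns=["date", "tic"]
)
full = grid.merge(processed_toy, on=["date", "tic"], how="left")

# Keep only dates that appear in the data (drops the weekend here).
full = full[full["date"].isin(processed_toy["date"])]
full = full.sort_values(["date", "tic"]).fillna(0)
print(full)
```

After this step every remaining date has one row per ticker. Note that a filled-in `close` of 0 is a placeholder, not a real price, so downstream code should treat such rows with care.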
Part 4: Save the Data
Split the data for training and trading
train = data_split(processed_full, TRAIN_START_DATE,TRAIN_END_DATE)
trade = data_split(processed_full, TRADE_START_DATE,TRADE_END_DATE)
print(len(train))
print(len(trade))
83897
9715
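TRAIN_END_DATE and TRADE_START_DATE are the same day, which only works without overlap if the split treats the end date as exclusive. The sketch below shows that half-open behaviour with a hypothetical `split_by_date` function. It is written on the assumption that `data_split` filters with `start <= date < end` and re-indexes rows by trading day, and it is not a copy of the library code.

```python
import pandas as pd


def split_by_date(df, start, end, date_col="date"):
    """Hypothetical stand-in for data_split: keep start <= date < end."""
    mask = (df[date_col] >= start) & (df[date_col] < end)
    out = df[mask].sort_values([date_col, "tic"], ignore_index=True)
    # Re-index so that all rows of the same trading day share one integer.
    out.index = out[date_col].factorize()[0]
    return out


# Toy data (made up): two tickers over four days around the split date.
toy = pd.DataFrame(
    {
        "date": ["2020-06-29", "2020-06-30", "2020-07-01", "2020-07-02"] * 2,
        "tic": ["AAA"] * 4 + ["BBB"] * 4,
        "close": [1.0, 1.1, 1.2, 1.3, 2.0, 2.1, 2.2, 2.3],
    }
)

train_toy = split_by_date(toy, "2020-06-29", "2020-07-01")
trade_toy = split_by_date(toy, "2020-07-01", "2020-07-03")
print(train_toy)
print(trade_toy)
```

With ISO-formatted date strings, plain string comparison orders the dates correctly, so no datetime conversion is needed for this check.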
Save data to CSV files
Colab users can open the virtual directory in Colab and download the files manually.
For users running in a local environment, the CSV files will be in the same directory as this notebook.
train.to_csv('train_data.csv')
trade.to_csv('trade_data.csv')
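One detail to keep in mind when loading these files later: `to_csv` writes the index as an unnamed first column by default. The sketch below uses an in-memory buffer and made-up rows to show the difference between reading the file back naively and passing `index_col=0`.

```python
import io

import pandas as pd

toy = pd.DataFrame(
    {"date": ["2020-07-01", "2020-07-01"], "tic": ["AAA", "BBB"], "close": [1.2, 2.2]},
    index=[0, 0],  # rows of the same day share an index value
)

buffer = io.StringIO()
toy.to_csv(buffer)  # the index is written as an unnamed first column

buffer.seek(0)
naive = pd.read_csv(buffer)  # index comes back as a column named "Unnamed: 0"

buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)  # index restored

print(naive.columns.tolist())
print(restored.columns.tolist())
```

The same `index_col=0` argument applies when reading `train_data.csv` and `trade_data.csv` from disk.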