TensorTrade: Algorithmic Trading with Reinforcement Learning
TensorTrade is an open-source Python library for reinforcement-learning-based algorithmic trading. It is highly flexible and works seamlessly with the Python tools we already use.
I recently came across a Python library called TensorTrade. If you have ever looked into reinforcement learning and thought, "Hmm... cool, but I'll pass," this library makes the whole subject feel far less intimidating. It is open source, highly flexible, and plays nicely with the Python tools we already rely on: NumPy, Pandas, Gym, TensorFlow, Keras... you get the idea.
Anyway, let me walk you through what I learned while using it. Nothing fancy, just enough to show how the pieces fit together.
1. Fetching price data
Before diving into reinforcement learning, you obviously need some historical data. I pull mine from yfinance because it is fast and doesn't require signing 12 different agreements.
Here is the full code I used:
import yfinance
import pandas_ta as ta
TICKER = 'TTRD' # Replace this with whatever ticker you're actually trading
TRAIN_START_DATE = '2021-02-09'
TRAIN_END_DATE = '2021-09-30'
EVAL_START_DATE = '2021-10-01'
EVAL_END_DATE = '2021-11-12'
yf_ticker = yfinance.Ticker(ticker=TICKER)
# Training set
df_training = yf_ticker.history(start=TRAIN_START_DATE, end=TRAIN_END_DATE, interval='60m')
df_training.drop(['Dividends', 'Stock Splits'], axis=1, inplace=True)
df_training["Volume"] = df_training["Volume"].astype(int)
df_training.ta.log_return(append=True, length=16)
df_training.ta.rsi(append=True, length=14)
df_training.ta.macd(append=True, fast=12, slow=26)
df_training.to_csv('training.csv')  # keep the index: it holds the 'Datetime' column the environment expects
# Evaluation set
df_evaluation = yf_ticker.history(start=EVAL_START_DATE, end=EVAL_END_DATE, interval='60m')
df_evaluation.drop(['Dividends', 'Stock Splits'], axis=1, inplace=True)
df_evaluation["Volume"] = df_evaluation["Volume"].astype(int)
df_evaluation.ta.log_return(append=True, length=16)
df_evaluation.ta.rsi(append=True, length=14)
df_evaluation.ta.macd(append=True, fast=12, slow=26)
df_evaluation.to_csv('evaluation.csv')  # again, keep the index so 'Datetime' is written out

Nothing special here: just data cleaning and a few indicators so the agent doesn't trade blindly.
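If you want to double-check what ended up in the file before handing it to the environment, a quick inspection helps. This is a minimal sketch; the indicator column names come from pandas_ta's defaults and may differ slightly between versions:

import pandas as pd

# Reload the training CSV exactly the way the environment will read it later
check = pd.read_csv('training.csv', parse_dates=['Datetime'])

# Columns appended by pandas_ta, e.g. 'LOGRET_16', 'RSI_14', 'MACD_12_26_9'
print(check.columns.tolist())
# The last rows should have no NaNs in the indicator columns
print(check.tail(3))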
2. Building the environment
TensorTrade works a bit like a simplified Gym environment. You feed it price data, and it gives your agent a sandbox in which to learn from its mistakes.
Here is the environment function. Don't worry, it isn't complicated:
import pandas as pd
from tensortrade.feed.core import DataFeed, Stream
from tensortrade.oms.instruments import Instrument
from tensortrade.oms.exchanges import Exchange, ExchangeOptions
from tensortrade.oms.services.execution.simulated import execute_order
from tensortrade.oms.wallets import Wallet, Portfolio
import tensortrade.env.default as default
def create_env(config):
    dataset = pd.read_csv(config["csv_filename"], parse_dates=['Datetime']).fillna(method='backfill')

    # Simulated exchange: a single price stream plus a flat commission
    commission = 0.0035
    price = Stream.source(list(dataset["Close"]), dtype="float").rename("USD-TTRD")
    exchange_options = ExchangeOptions(commission=commission)
    ttse_exchange = Exchange("TTSE", service=execute_order, options=exchange_options)(price)

    # Portfolio: start with 1000 USD in cash and an empty asset wallet
    USD = Instrument("USD", 2, "US Dollar")
    TTRD = Instrument("TTRD", 2, "TensorTrade Corp")
    cash = Wallet(ttse_exchange, 1000 * USD)
    asset = Wallet(ttse_exchange, TTRD)
    portfolio = Portfolio(USD, [cash, asset])

    # OHLCV feed used only for rendering
    renderer_feed = DataFeed([
        Stream.source(list(dataset["Datetime"])).rename("date"),
        Stream.source(list(dataset["Open"]), dtype="float").rename("open"),
        Stream.source(list(dataset["High"]), dtype="float").rename("high"),
        Stream.source(list(dataset["Low"]), dtype="float").rename("low"),
        Stream.source(list(dataset["Close"]), dtype="float").rename("close"),
        Stream.source(list(dataset["Volume"]), dtype="float").rename("volume")
    ])

    # Observation feed: every column except Datetime becomes a feature stream
    features = []
    for c in dataset.columns[1:]:
        features.append(
            Stream.source(list(dataset[c]), dtype="float").rename(c)
        )
    feed = DataFeed(features)
    feed.compile()

    # Simple-profit reward plus a buy/sell/hold action scheme
    reward_scheme = default.rewards.SimpleProfit(window_size=config["reward_window_size"])
    action_scheme = default.actions.BSH(cash=cash, asset=asset)

    env = default.create(
        feed=feed,
        portfolio=portfolio,
        action_scheme=action_scheme,
        reward_scheme=reward_scheme,
        renderer_feed=renderer_feed,
        renderer=[],
        window_size=config["window_size"],
        max_allowed_loss=config["max_allowed_loss"]
    )
    return env

At this point you effectively have a small trading environment in which your agent can buy, sell, or hold and wait for the best outcome.
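To convince yourself it really behaves like a Gym environment, you can build it from a config dict and step through it with random actions. This is a minimal sketch; the exact return signature of step() depends on your TensorTrade and Gym versions:

config = {
    "csv_filename": "training.csv",
    "window_size": 14,
    "reward_window_size": 7,
    "max_allowed_loss": 0.10,
}

env = create_env(config)
obs = env.reset()

# A handful of random actions, just to watch rewards come back
for _ in range(5):
    action = env.action_space.sample()
    obs, reward, done, info = env.step(action)
    print(action, reward, done)
    if done:
        break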
3. Running training with Ray
You can train an RL agent by hand if you enjoy suffering, but Ray makes hyperparameter tuning much simpler.
Here is the setup:
import ray
import os
from ray import tune
from ray.tune.registry import register_env
FC_SIZE = tune.grid_search([[256, 256], [1024], [128, 64, 32]])
LEARNING_RATE = tune.grid_search([0.001, 0.0005, 0.00001])
MINIBATCH_SIZE = tune.grid_search([5, 10, 20])
cwd = os.getcwd()
ray.init()
register_env("MyTrainingEnv", create_env)
env_config_training = {
    "window_size": 14,           # how many past bars the agent observes
    "reward_window_size": 7,
    "max_allowed_loss": 0.10,    # stop a training episode after a 10% drawdown
    "csv_filename": os.path.join(cwd, 'training.csv'),
}
env_config_evaluation = {
    "max_allowed_loss": 1.00,    # during evaluation, never cut the episode short
    "csv_filename": os.path.join(cwd, 'evaluation.csv'),
}
analysis = tune.run(
    run_or_experiment="PPO",
    name="MyExperiment1",
    metric="episode_reward_mean",
    mode="max",
    stop={"training_iteration": 5},
    config={
        "env": "MyTrainingEnv",
        "env_config": env_config_training,
        "framework": "torch",
        "num_workers": 1,
        "lr": LEARNING_RATE,
        "model": {"fcnet_hiddens": FC_SIZE},
        "sgd_minibatch_size": MINIBATCH_SIZE,
        "evaluation_interval": 1,
        "evaluation_config": {
            "env_config": env_config_evaluation,
            "explore": False
        }
    },
    num_samples=1,
    checkpoint_freq=1
)

Ray essentially tries the different combinations, runs your environment many times, and tells you which one worked best, without you sitting up all night tweaking parameters by hand.
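When the run finishes, the returned analysis object is how you get back to the winning trial. A minimal sketch, assuming the same older Ray release that accepts the tune.run() call above (these helper names changed in later versions):

# Best-performing trial according to the metric we optimized
best_trial = analysis.get_best_trial(metric="episode_reward_mean", mode="max")
print(best_trial.config["lr"], best_trial.config["model"]["fcnet_hiddens"])

# Checkpoint path you would later restore a PPO trainer from
best_checkpoint = analysis.get_best_checkpoint(best_trial, metric="episode_reward_mean", mode="max")
print(best_checkpoint)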
4. Writing your own reward function
This part surprised me: TensorTrade makes creating a custom reward far simpler than most RL libraries do.
Here is an example reward class:
from tensortrade.env.default.rewards import TensorTradeRewardScheme
from tensortrade.feed.core import DataFeed, Stream
class PBR(TensorTradeRewardScheme):
    """Reward scheme that pays out the price change times the current position."""

    registered_name = "pbr"

    def __init__(self, price: Stream):
        super().__init__()
        self.position = -1

        # Price differences and the current position, wired into a small feed
        r = Stream.sensor(price, lambda p: p.value, dtype="float").diff()
        position = Stream.sensor(self, lambda rs: rs.position, dtype="float")

        reward = (r * position).fillna(0).rename("reward")

        self.feed = DataFeed([reward])
        self.feed.compile()

    def on_action(self, action: int):
        # Map the discrete action onto a +1 / -1 position
        self.position = 1 if action == 0 else -1

    def get_reward(self, portfolio):
        return self.feed.next()["reward"]

    def reset(self):
        self.position = -1
        self.feed.reset()

Nothing special: it just ties each price change to whether you are long or short.
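To actually use it, swap it in for SimpleProfit inside create_env. The sketch below follows the pattern from TensorTrade's own Ray tutorial, where the reward scheme is attached to the BSH action scheme so that on_action is called on every decision; treat the attach call as an assumption if your version differs:

# Inside create_env, replacing the SimpleProfit scheme used earlier:
reward_scheme = PBR(price=price)

# Attach the reward scheme so PBR.on_action is notified of every action
action_scheme = default.actions.BSH(cash=cash, asset=asset).attach(reward_scheme)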
5. Closing thoughts
TensorTrade isn't perfect, and yes, if your reward function is bad your agent will absolutely fall apart. But as a framework for learning reinforcement learning through something as hands-on as trading, it is genuinely one of the simplest and most fun tools I have tried.
Original article: TensorTrade: Build AI-Driven Trading Algorithms with Reinforcement Learning
Translated and compiled by 汇智网; please credit the source when republishing.