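Software-based deep reinforcement learning (deep RL) agents have tremendous potential: they can execute trading strategies tirelessly and flawlessly, without the limits on memory capacity, speed, efficiency, and emotional discipline that constrain human traders. Profitable trading in the stock market requires carefully executing buy/sell trades on stock symbols while taking into account several market factors (such as trading conditions and macro and micro market conditions) as well as social, political, and company-specific changes. Deep RL agents have great potential for solving challenging real-world problems like this one, and the opportunities are many.

However, because deploying RL agents in the real world poses various challenges, there are only a few success stories of deep RL agents being used outside of games. This chapter develops RL agents for an interesting and rewarding real-world problem: stock market trading. It shows how to implement custom, OpenAI Gym-compatible stock market simulation environments with discrete and continuous action spaces, and how to build and train RL agents in these stock trading learning environments.

Specifically, this chapter covers the following:

• Building a stock market trading RL platform using real stock exchange data
• Building a stock market trading RL platform using price charts
• Building an advanced stock trading RL platform to train agents to mimic professional traders

5.1 Technical requirements

The code in this book has been tested extensively on Ubuntu 18.04 and Ubuntu 20.04, and it should work on later versions of Ubuntu with Python 3.6+ installed. With Python 3.6 installed, along with the necessary Python packages listed at the start of each section, the code should also run on Windows and macOS. We recommend creating and using a Python virtual environment named tf2rl-cookbook to install the packages and run the code in this book. Installing Miniconda or Anaconda is recommended for managing Python virtual environments.

5.2 Building a stock market trading RL platform using real stock exchange data

The stock market offers anyone an opportunity to participate, with great profit potential. Although the barrier to entry is low, not everyone can make consistently profitable trades. The main reasons are the dynamic nature of the market and the emotional factors that can influence people's behavior. RL agents take emotion out of the equation and can be trained to pursue consistent profits. This section implements a stock market trading environment that teaches RL agents how to trade stocks using real stock market data. Once the agents have been trained sufficiently, they can be deployed to trade (and profit) automatically.

5.2.1 Getting ready

To run the code successfully, make sure you have updated to the latest version. Activate the Python/Conda virtual environment named tf2rl-cookbook, and make sure the updated environment matches the latest Conda environment specification file (tfrl-cookbook.yml) in the book's code repository. If the following import statements run without issues, you are ready to get started:

import os
import random
from typing import Dict

import gym
import numpy as np
import pandas as pd
from gym import spaces

from trading_utils import TradeVisualizer

5.2.2 How to do it

Follow these steps to implement the stock market trading environment.

(1) Initialize the configurable parameters of the environment:

env_config = {
    "ticker": "TSLA",
    "opening_account_balance": 1000,
    # Number of steps (days) of data provided to the
    # agent in one observation
    "observation_horizon_sequence_length": 30,
    # Number of shares traded per buy/sell order
    "order_size": 1,
}

(2) Initialize the StockTradingEnv() class and load the stock market data for the configured ticker symbol:

class StockTradingEnv(gym.Env):
    def __init__(self, env_config: Dict = env_config):
        """Stock trading environment for RL agents

        Args:
            ticker (str, optional): Ticker symbol for the stock.
                Defaults to "MSFT".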
            env_config (Dict): Env configuration values
        """
        super(StockTradingEnv, self).__init__()
        self.ticker = env_config.get("ticker", "MSFT")
        data_dir = os.path.join(
            os.path.dirname(os.path.realpath(__file__)), "data"
        )
        self.ticker_file_stream = os.path.join(
            data_dir, f"{self.ticker}.csv"
        )

(3) Make sure the stock market data source exists, then load the data stream:

        assert os.path.isfile(
            self.ticker_file_stream
        ), f"Historical stock data file stream not found at: data/{self.ticker}.csv"
        # Stock market data stream. An offline file
        # stream is used. Alternatively, a web
        # API can be used to pull live data.
        # DataFrame: Date Open High Low Close AdjClose
        # Volume
        self.ohlcv_df = pd.read_csv(self.ticker_file_stream)

(4) Define the observation and action spaces to complete the __init__() method:

        self.opening_account_balance = env_config["opening_account_balance"]
        # Action: 0 -> Hold. 1 -> Buy. 2 -> Sell.
        self.action_space = spaces.Discrete(3)

        self.observation_features = [
            "Open",
            "High",
            "Low",
            "Close",
            "Adj Close",
            "Volume",
        ]
        self.horizon = env_config.get("observation_horizon_sequence_length")
        self.observation_space = spaces.Box(
            low=0,
            high=1,
            shape=(len(self.observation_features), self.horizon + 1),
            # np.float (an alias of the built-in float) was removed in
            # NumPy 1.24; np.float64 is the equivalent dtype
            dtype=np.float64,
        )
        self.order_size = env_config.get("order_size")

(5) Implement the get_observation() method to collect observations:

    def get_observation(self):
        # Get stock price info data table from input
        # (file/live) stream
        observation = (
            self.ohlcv_df.loc[
                self.current_step : self.current_step + self.horizon,
                self.observation_features,
            ]
            .to_numpy()
            .T
        )
        return observation

(6) Executing trade orders requires some setup first, so add that logic next:

    def execute_trade_action(self, action):
        if action == 0:  # Hold position
            return
        order_type = "buy" if action == 1 else "sell"

        # Stochastically determine the current stock
        # price based on Market Open & Close
        current_price = random.uniform(
            self.ohlcv_df.loc[self.current_step, "Open"],
            self.ohlcv_df.loc[self.current_step, "Close"],
        )

(7) With the setup done, add the logic for buying stock:

        if order_type == "buy":
            allowable_shares = int(self.cash_balance / current_price)
            if allowable_shares < self.order_size:
                # Not enough cash to
                # execute a buy order
                return
            # Simulate a BUY order and execute it at
            # current_price
            num_shares_bought = self.order_size
            current_cost = self.cost_basis * self.num_shares_held
            additional_cost = num_shares_bought * current_price

            # Deduct the purchase cost from the cash balance
            self.cash_balance -= additional_cost
            self.cost_basis = (current_cost + additional_cost) / (
                self.num_shares_held + num_shares_bought
            )
            self.num_shares_held += num_shares_bought

            self.trades.append(
                {
                    "type": "buy",
                    "step": self.current_step,
                    "shares": num_shares_bought,
                    "proceeds": additional_cost,
                }
            )

(8) Similarly, add the logic for selling stock:

        elif order_type == "sell":
            # Simulate a SELL order and execute it at
            # current_price
            if self.num_shares_held < self.order_size:
                # Not enough shares to execute a sell
                # order
                return
            num_shares_sold = self.order_size
            self.cash_balance += num_shares_sold * current_price
            # Remove the sold shares from the position
            self.num_shares_held -= num_shares_sold
            sale_proceeds = num_shares_sold * current_price

            self.trades.append(
                {
                    "type": "sell",
                    "step": self.current_step,
                    "shares": num_shares_sold,
                    "proceeds": sale_proceeds,
                }
            )

(9) Update the account value:

        # Update account value
        self.account_value = (
            self.cash_balance + self.num_shares_held * current_price
        )

(10) Launch and check the new environment:

if __name__ == "__main__":
    env = StockTradingEnv()
    obs = env.reset()
    for _ in range(600):
        action = env.action_space.sample()
        next_obs, reward, done, _ = env.step(action)
        env.render()

5.2.3 How it works

The observation is the stock price information over a time horizon specified in env_config: the open, high, low, and close prices and the volume (OHLCV). The action space is discrete and allows buy, sell, and hold trading actions. This is a starter environment for RL agents to learn stock market trading.

5.3 Building a stock market trading RL platform using price charts

Human traders look at several indicators on their price monitors to review and identify potential trades. Can an agent also look at candlestick price charts visually to trade stocks, instead of being given only a tabular/CSV representation? The answer is yes, and this section shows how to build a visually rich trading environment for RL agents.

5.3.1 Getting ready

To run the code successfully, make sure you have updated to the latest version. Activate the Python/Conda virtual environment named tf2rl-cookbook, and make sure the updated environment matches the latest Conda environment specification file (tfrl-cookbook.yml) in the book's code repository. If the following import statements run without issues, you are ready to get started:

import os
import random
from typing import Dict

import cv2
import gym
import numpy as np
import pandas as pd
from gym import spaces

from trading_utils import TradeVisualizer

5.3.2 How to do it

Follow the steps in this section to build a complete stock trading RL environment that lets the agent process visual stock charts and make trading decisions.

(1) Configure the learning environment as follows:

env_config = {
    "ticker": "TSLA",
    "opening_account_balance": 100000,
    # Number of steps (days) of data provided to the
    # agent in one observation
    "observation_horizon_sequence_length": 30,
    # Number of shares traded per buy/sell order
    "order_size": 1,
}

(2) Implement the initialization of the StockTradingVisualEnv() class:

class StockTradingVisualEnv(gym.Env):
    def __init__(self, env_config: Dict = env_config):
        """Stock trading environment for RL agents

        Args:
            ticker (str, optional): Ticker symbol for the stock.
                Defaults to "MSFT".
            env_config (Dict): Env configuration values
        """
        super(StockTradingVisualEnv, self).__init__()
        self.ticker = env_config.get("ticker", "MSFT")
        data_dir = os.path.join(
            os.path.dirname(os.path.realpath(__file__)), "data"
        )
        self.ticker_file_stream = os.path.join(
            data_dir, f"{self.ticker}.csv"
        )
        assert os.path.isfile(
            self.ticker_file_stream
        ), f"Historical stock data file stream not found at: data/{self.ticker}.csv"
        # Stock market data stream. An offline file
        # stream is used. Alternatively, a web
        # API can be used to pull live data.
        # DataFrame: Date Open High Low Close AdjClose
        # Volume
        self.ohlcv_df = pd.read_csv(self.ticker_file_stream)

(3) Complete the __init__() method:

        self.opening_account_balance = env_config["opening_account_balance"]
        self.action_space = spaces.Discrete(3)

        self.observation_features = [
            "Open",
            "High",
            "Low",
            "Close",
            "Adj Close",
            "Volume",
        ]
        self.obs_width, self.obs_height = 128, 128
        self.horizon = env_config.get("observation_horizon_sequence_length")
        self.observation_space = spaces.Box(
            low=0,
            high=255,
            shape=(128, 128, 3),
            dtype=np.uint8,
        )
        self.order_size = env_config.get("order_size")
        self.viz = None  # Visualizer

(4) Define the step() method of the environment:

    def step(self, action):
        # Execute one step within the trading environment
        self.execute_trade_action(action)
        self.current_step += 1

        # Profit (loss) relative to the opening balance
        reward = self.account_value - self.opening_account_balance
        done = self.account_value <= 0 or self.current_step >= len(
            self.ohlcv_df.loc[:, "Open"].values
        )
        obs = self.get_observation()
        return obs, reward, done, {}

(5) Step (4) uses two methods that are not yet defined. get_observation() needs an initialized TradeVisualizer, so implement the reset() method first:

    def reset(self):
        # Reset the state of the environment to an
        # initial state
        self.cash_balance = self.opening_account_balance
        self.account_value = self.opening_account_balance
        self.num_shares_held = 0
        self.cost_basis = 0
        self.current_step = 0
        self.trades = []
        if self.viz is None:
            self.viz = TradeVisualizer(
                self.ticker,
                self.ticker_file_stream,
                "TFRLCookbook Ch4StockTradingVisualEnv",
            )
        return self.get_observation()

(6) Implement the get_observation() method:

    def get_observation(self):
        """Return a view of the Ticker price chart as
        image observation

        Returns:
            img_observation (np.ndarray): Image of ticker
                candle stick plot with volume bars as
                observation
        """
        img_observation = self.viz.render_image_observation(
            self.current_step, self.horizon
        )
        img_observation = cv2.resize(
            img_observation,
            dsize=(128, 128),
            interpolation=cv2.INTER_CUBIC,
        )
        return img_observation

(7) The logic that executes the agent's trade actions is split across the next three steps:

    def execute_trade_action(self, action):
        if action == 0:  # Hold position
            return
        order_type = "buy" if action == 1 else "sell"

        # Stochastically determine the current stock
        # price based on Market Open & Close
        current_price = random.uniform(
            self.ohlcv_df.loc[self.current_step, "Open"],
            self.ohlcv_df.loc[self.current_step, "Close"],
        )

(8) Implement the logic that executes a "buy" order:

        if order_type == "buy":
            allowable_shares = int(self.cash_balance / current_price)
            if allowable_shares < self.order_size:
                return

            num_shares_bought = self.order_size
            current_cost = self.cost_basis * self.num_shares_held
            additional_cost = num_shares_bought * current_price

            # Deduct the purchase cost from the cash balance
            self.cash_balance -= additional_cost
            self.cost_basis = (current_cost + additional_cost) / \