# PROJECT DETAILS

This project is about stock trading, specifically the S&P 500 and some very in-demand stocks such as Apple, Amazon, and Google. The script shows the path I took to get the most out of the dataset visually, then explores a Ridge regression model that scored roughly 0.93 (R² on held-out data) when predicting Apple's next-day price. I then move on to the S&P 500 to show that an index can be modeled too, this time using an LSTM model in Keras. I hope you enjoy it! Check out the code on my GitHub with the button above, or share it with your network on LinkedIn!

# Math (Skip if you only want to see the fun stuff)

Ridge Regression is a way to build a model when the number of predictor variables exceeds the number of observations, or when the dataset has **multicollinearity** 🔔 (correlations between predictor variables). Ridge Regression uses L2 regularization, which adds a penalty to the loss. The penalty is equal to the sum of the squared magnitudes of the coefficients, scaled by a tuning parameter. **Penalty = Losing Money**

So it is a perfect fit!
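To make the penalty concrete, here is a minimal sketch in plain NumPy (not the scikit-learn `Ridge` used later in this post) of the closed-form ridge solution on made-up, nearly collinear data. The data, the `alpha` values, and the helper name `ridge_fit` are all assumptions for illustration.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge solution without an intercept:
    w = (X^T X + alpha * I)^-1 X^T y"""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

# Made-up data: x2 is almost a copy of x1, so the predictors are highly correlated
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = x1 + rng.normal(scale=0.01, size=50)
X_demo = np.column_stack([x1, x2])
y_demo = 3.0 * x1 + rng.normal(scale=0.1, size=50)

w_ols = ridge_fit(X_demo, y_demo, alpha=0.0)    # alpha = 0 is ordinary least squares
w_ridge = ridge_fit(X_demo, y_demo, alpha=1.0)  # alpha > 0 adds the L2 penalty

print("OLS coefficients:  ", w_ols)
print("Ridge coefficients:", w_ridge)
print("Sum of squared ridge coefficients:", np.sum(w_ridge ** 2))
```

With correlated predictors the unpenalized coefficients tend to be large and unstable, while the penalized ones are pulled toward smaller values.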

LSTM & Time Series: an LSTM is a recurrent neural network (RNN) that is trained using backpropagation through time and mitigates the vanishing gradient problem.

Long Short-Term Memory (LSTM) is a refinement of the Recurrent Neural Network (RNN), used in deep learning because its architecture is designed to capture patterns in sequential data, aka **STOCKS**

# 🍻

Cheers! Now let's begin!!

# IMPORT DATASETS AND LIBRARIES

```
# Data Manipulation
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# Data Visualization
import plotly.figure_factory as ff
import plotly.express as px
# Modeling
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
from tensorflow import keras
# Additional
from copy import copy
from scipy import stats
```

```
# Stock prices data
stocks_df = pd.read_csv('/Users/andrewdarmond/Documents/FinanceML/stock.csv')
# Stocks volume data
stocks_vol_df = pd.read_csv('/Users/andrewdarmond/Documents/FinanceML/stock_volume.csv')
```

```
# Sort the data based on Date
stocks_df = stocks_df.sort_values('Date')
```

```
# Sort the volume data based on Date
stocks_vol_df = stocks_vol_df.sort_values('Date')
```

# PERFORM EXPLORATORY DATA ANALYSIS AND VISUALIZATION

```
# Function to normalize stock prices based on their initial price
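def normalize(df):
    x = df.copy()
    # Skip the first column ('Date') and divide each series by its first value
    for i in x.columns[1:]:
        # .iloc[0] picks the first row by position, which stays correct after sorting
        x[i] = x[i] / x[i].iloc[0]
    return x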
```
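As a quick sanity check of the idea, here is a toy sketch on a made-up three-row frame. The tickers `AAA` and `BBB`, their prices, and the helper name `normalize_toy` are invented for illustration. It uses `.iloc[0]` so that "initial price" means the first row by position.

```python
import pandas as pd

# Made-up prices for two hypothetical tickers
toy_prices = pd.DataFrame({
    'Date': ['2020-01-01', '2020-01-02', '2020-01-03'],
    'AAA': [10.0, 11.0, 12.0],
    'BBB': [200.0, 190.0, 210.0],
})

def normalize_toy(df):
    x = df.copy()
    for col in x.columns[1:]:
        # Every series starts at 1.0, so growth is comparable across tickers
        x[col] = x[col] / x[col].iloc[0]
    return x

toy_normalized = normalize_toy(toy_prices)
print(toy_normalized)
```

After normalizing, a cheap stock and an expensive one share the same axis, which is what makes the combined line chart readable.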

```
# Function to plot interactive plots using Plotly Express
def interactive_plot(df, title):
    fig = px.line(title = title)
    for i in df.columns[1:]:
        fig.add_scatter(x = df['Date'], y = df[i], name = i)
    fig.show()
```

```
# plot interactive chart for stocks data
#interactive_plot(stocks_df, 'Stock Prices')
```

```
#interactive_plot(normalize(stocks_df), 'Normalize Stock Prices')
```

```
#interactive_plot(stocks_vol_df, 'Stocks Volume')
```

```
#interactive_plot(normalize(stocks_vol_df), 'Normalizes Stock Volume')
```

# PREPARE THE DATA BEFORE TRAINING THE MODEL

```
# Function to concatenate the date, stock price, and volume in one dataframe
def individual_stock(price_df, vol_df, name):
    return pd.DataFrame({'Date': price_df['Date'], 'Close': price_df[name], 'Volume': vol_df[name]})
```

```
# Function to return the input/output (target) data for the model
# Note that our goal is to predict the future stock price
# The target for each row is the next day's closing price
def trading_window(data):
    # Number of days to look ahead
    n = 1
    data['Target'] = data[['Close']].shift(-n)
    return data
```
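Here is a tiny sketch of what `shift(-1)` does, using a made-up four-row frame (the values and the name `toy` are invented for illustration):

```python
import pandas as pd

toy = pd.DataFrame({'Close': [10.0, 11.0, 12.0, 13.0]})
# Pull each next-day close up onto the current row
toy['Target'] = toy['Close'].shift(-1)
print(toy)
```

The last row has no "tomorrow", so its target is `NaN`, which is why that row is dropped before training.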

#### If you want to view the S&P 500 / AMZN / etc.: change 'AAPL' here!

```
# Let's test the functions and get individual stock prices and volumes for AAPL
price_volume_df = individual_stock(stocks_df, stocks_vol_df, 'AAPL')
price_volume_df
```

| | Date | Close | Volume |
|---|---|---|---|
| 0 | 2012-01-12 | 60.198570 | 53146800 |
| 1 | 2012-01-13 | 59.972858 | 56505400 |
| 2 | 2012-01-17 | 60.671429 | 60724300 |
| 3 | 2012-01-18 | 61.301430 | 69197800 |
| 4 | 2012-01-19 | 61.107143 | 65434600 |
| ... | ... | ... | ... |
| 2154 | 2020-08-05 | 440.250000 | 30498000 |
| 2155 | 2020-08-06 | 455.609985 | 50607200 |
| 2156 | 2020-08-07 | 444.450012 | 49453300 |
| 2157 | 2020-08-10 | 450.910004 | 53100900 |
| 2158 | 2020-08-11 | 437.500000 | 46871100 |

2159 rows × 3 columns

```
price_volume_target_df = trading_window(price_volume_df)
price_volume_target_df
```

| | Date | Close | Volume | Target |
|---|---|---|---|---|
| 0 | 2012-01-12 | 60.198570 | 53146800 | 59.972858 |
| 1 | 2012-01-13 | 59.972858 | 56505400 | 60.671429 |
| 2 | 2012-01-17 | 60.671429 | 60724300 | 61.301430 |
| 3 | 2012-01-18 | 61.301430 | 69197800 | 61.107143 |
| 4 | 2012-01-19 | 61.107143 | 65434600 | 60.042858 |
| ... | ... | ... | ... | ... |
| 2154 | 2020-08-05 | 440.250000 | 30498000 | 455.609985 |
| 2155 | 2020-08-06 | 455.609985 | 50607200 | 444.450012 |
| 2156 | 2020-08-07 | 444.450012 | 49453300 | 450.910004 |
| 2157 | 2020-08-10 | 450.910004 | 53100900 | 437.500000 |
| 2158 | 2020-08-11 | 437.500000 | 46871100 | NaN |

2159 rows × 4 columns

```
# Remove the last row as it will be a null value
price_volume_target_df = price_volume_target_df[:-1]
price_volume_target_df
```

| | Date | Close | Volume | Target |
|---|---|---|---|---|
| 0 | 2012-01-12 | 60.198570 | 53146800 | 59.972858 |
| 1 | 2012-01-13 | 59.972858 | 56505400 | 60.671429 |
| 2 | 2012-01-17 | 60.671429 | 60724300 | 61.301430 |
| 3 | 2012-01-18 | 61.301430 | 69197800 | 61.107143 |
| 4 | 2012-01-19 | 61.107143 | 65434600 | 60.042858 |
| ... | ... | ... | ... | ... |
| 2153 | 2020-08-04 | 438.660004 | 43267900 | 440.250000 |
| 2154 | 2020-08-05 | 440.250000 | 30498000 | 455.609985 |
| 2155 | 2020-08-06 | 455.609985 | 50607200 | 444.450012 |
| 2156 | 2020-08-07 | 444.450012 | 49453300 | 450.910004 |
| 2157 | 2020-08-10 | 450.910004 | 53100900 | 437.500000 |

2158 rows × 4 columns

```
# Scale the data
sc = MinMaxScaler(feature_range = (0,1))
price_volume_target_scaled_df = sc.fit_transform(price_volume_target_df.drop(columns = ['Date']))
```

```
# Create Feature and Target
X = price_volume_target_scaled_df[:, :2]
y = price_volume_target_scaled_df[:, 2:]
```

```
price_volume_target_scaled_df.shape
```

```
(2158, 3)
```

```
X.shape, y.shape
```

```
((2158, 2), (2158, 1))
```

### Splitting the data this way, since order is important in time series

### Note that we did not use train_test_split with its default settings, since it shuffles the data

```
split = int(0.75 * len(X))
X_train = X[:split]
y_train = y[:split]
X_test = X[split:]
y_test = y[split:]
```

```
X_train.shape, y_train.shape
```

```
((1618, 2), (1618, 1))
```

```
X_test.shape, y_test.shape
```

```
((540, 2), (540, 1))
```

```
# Define a data plotting function
print('''
APPLE
''')
def show_plot(data, title):
    plt.figure(figsize = (13, 5))
    plt.plot(data, linewidth = 3)
    plt.title(title)
    plt.xlabel(xlabel= 'Data Variable')
    plt.ylabel(ylabel= 'Accuracy Relativity to 1' )
    plt.grid()

show_plot(X_train, 'Training Data')
show_plot(X_test, 'Testing Data')
```

```
APPLE
```

# BUILD AND TRAIN A RIDGE LINEAR REGRESSION MODEL

```
regression_model = Ridge()
# Fit the model on the training data
regression_model.fit(X_train, y_train)
# Score the model on the test data (score() returns R^2 for regressors)
lr_accuracy = regression_model.score(X_test, y_test)
print('Ridge Regression Score:', lr_accuracy)
```

```
Ridge Regression Score: 0.9311227075637692
```
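A regressor's `score` in scikit-learn returns the coefficient of determination (R²) rather than a classification-style accuracy. Here is a rough sketch of what that number measures, using made-up arrays and a hypothetical helper named `r_squared`:

```python
import numpy as np

def r_squared(y_true, y_pred):
    """R^2 = 1 - (residual sum of squares) / (total sum of squares)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

y_true_demo = np.array([1.0, 2.0, 3.0, 4.0])
r2_perfect = r_squared(y_true_demo, y_true_demo)                     # perfect predictions
r2_mean = r_squared(y_true_demo, np.full(4, y_true_demo.mean()))     # always predicting the mean
print(r2_perfect, r2_mean)
```

For a trending price series, even a naive "tomorrow equals today" guess can score high on R², so it is worth comparing the model against that kind of baseline before reading too much into the number.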

```
# Predict over the full dataset (training and test rows)
predicted_prices = regression_model.predict(X)
```

```
# Flatten the (n, 1) prediction array into a plain list
predicted = []
for i in predicted_prices:
    predicted.append(i[0])
```

```
# Append the close values to the list
close = []
for i in price_volume_target_scaled_df:
    close.append(i[0])
```

```
# Create a dataframe based on the dates in the individual stock data
# .copy() avoids pandas' SettingWithCopyWarning when columns are added below
df_predicted = price_volume_target_df[['Date']].copy()
```

```
# Add the close values to the dataframe
df_predicted['Close'] = close
```

```
# Add the predicted values to the dataframe
df_predicted['Prediction'] = predicted
df_predicted
```

| | Date | Close | Prediction |
|---|---|---|---|
| 0 | 2012-01-12 | 0.011026 | 0.026286 |
| 1 | 2012-01-13 | 0.010462 | 0.025428 |
| 2 | 2012-01-17 | 0.012209 | 0.026527 |
| 3 | 2012-01-18 | 0.013785 | 0.027022 |
| 4 | 2012-01-19 | 0.013299 | 0.026992 |
| ... | ... | ... | ... |
| 2153 | 2020-08-04 | 0.957606 | 0.866550 |
| 2154 | 2020-08-05 | 0.961583 | 0.871436 |
| 2155 | 2020-08-06 | 1.000000 | 0.903353 |
| 2156 | 2020-08-07 | 0.972088 | 0.878730 |
| 2157 | 2020-08-10 | 0.988245 | 0.892666 |

2158 rows × 3 columns
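The Close and Prediction columns are in MinMax-scaled units, not dollars. Below is a sketch of the scaling formula and its inverse in plain NumPy, assuming the (0, 1) feature range used in this post. The prices and the helper names `minmax_scale` and `minmax_invert` are invented for illustration.

```python
import numpy as np

def minmax_scale(values):
    """Scale to [0, 1]: (x - min) / (max - min). Also return min and max for inverting."""
    lo, hi = values.min(), values.max()
    return (values - lo) / (hi - lo), lo, hi

def minmax_invert(scaled, lo, hi):
    """Undo the scaling: x = scaled * (max - min) + min."""
    return scaled * (hi - lo) + lo

prices_demo = np.array([60.0, 100.0, 250.0, 455.0])   # made-up dollar prices
scaled_demo, lo, hi = minmax_scale(prices_demo)
restored_demo = minmax_invert(scaled_demo, lo, hi)

print(scaled_demo)
print(restored_demo)
```

With a fitted `MinMaxScaler`, the per-column minimum and maximum are available as `data_min_` and `data_max_`, so the same inversion can be applied to a single column such as the target.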

```
# Plot the results
#interactive_plot(df_predicted, 'Original Vs. Predictions: Apple Stock(AAPL)')
```

# TRAIN AN LSTM TIME SERIES MODEL

#### If you want to view AAPL / AMZN / etc.: change 'sp500' here!

```
# Let's test the functions and get individual stock prices and volumes for sp500
price_volume_df = individual_stock(stocks_df, stocks_vol_df, 'sp500')
```

```
# Get the close and volume data as training data (Input)
training_data = price_volume_df.iloc[:, 1:3].values
```

```
# Normalize the data
sc = MinMaxScaler(feature_range= (0,1))
training_set_scaled = sc.fit_transform(training_data)
```

```
# Build the inputs and targets: each input is the previous day's scaled close,
# and each target is the current day's scaled close
X = []
y = []
for i in range(1, len(price_volume_df)):
    X.append(training_set_scaled[i-1:i, 0])
    y.append(training_set_scaled[i, 0])
```

```
# Convert the data into array format
X = np.array(X)
y = np.array(y)
```
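The loop above uses a one-day look-back. Here is a sketch of the same windowing generalized to a hypothetical `lookback` parameter (not part of the original code), run on a made-up series. The helper name `make_windows` is also an assumption.

```python
import numpy as np

def make_windows(series, lookback=1):
    """Turn a 1D series into (samples, lookback, 1) inputs and next-step targets."""
    X_win, y_win = [], []
    for i in range(lookback, len(series)):
        X_win.append(series[i - lookback:i])  # the previous `lookback` values
        y_win.append(series[i])               # the value to predict
    X_win = np.array(X_win)
    y_win = np.array(y_win)
    # LSTM layers expect 3D input: (samples, time steps, features)
    return X_win.reshape(X_win.shape[0], lookback, 1), y_win

series_demo = np.arange(6, dtype=float)       # made-up series: 0, 1, 2, 3, 4, 5
X_demo, y_demo = make_windows(series_demo, lookback=3)
print(X_demo.shape, y_demo)
```

With `lookback=1` this reduces to the one-day setup in this post; a longer window gives the LSTM an actual sequence to work with, at the cost of a few rows at the start of the series.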

```
# Split the data
split = int(0.7 * len(X))
X_train = X[:split]
y_train = y[:split]
X_test = X[split:]
y_test = y[split:]
```

```
# Reshape the 2D arrays to 3D arrays (samples, time steps, features) to feed into the model
X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))
X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))
```

```
# Create the model
inputs = keras.layers.Input(shape=(X_train.shape[1], X_train.shape[2]))
x = keras.layers.LSTM(150, return_sequences= True)(inputs)
x = keras.layers.Dropout(0.3)(x)
x = keras.layers.LSTM(150, return_sequences=True)(x)
x = keras.layers.Dropout(0.3)(x)
x = keras.layers.LSTM(150)(x)
outputs = keras.layers.Dense(1, activation='linear')(x)
model = keras.Model(inputs=inputs, outputs=outputs)
model.compile(optimizer='adam', loss="mse")
model.summary()
```

```
Model: "model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) [(None, 1, 1)] 0
_________________________________________________________________
lstm (LSTM) (None, 1, 150) 91200
_________________________________________________________________
dropout (Dropout) (None, 1, 150) 0
_________________________________________________________________
lstm_1 (LSTM) (None, 1, 150) 180600
_________________________________________________________________
dropout_1 (Dropout) (None, 1, 150) 0
_________________________________________________________________
lstm_2 (LSTM) (None, 150) 180600
_________________________________________________________________
dense (Dense) (None, 1) 151
=================================================================
Total params: 452,551
Trainable params: 452,551
Non-trainable params: 0
_________________________________________________________________
```
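The parameter counts in the summary can be checked by hand. This is a small sketch using the standard LSTM layout of four gates, each with input weights, recurrent weights, and a bias; the helper names `lstm_params` and `dense_params` are made up for this check.

```python
def lstm_params(input_dim, units):
    # 4 gates, each with (input_dim + units) weights per unit plus one bias per unit
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    # One weight per input per unit, plus one bias per unit
    return input_dim * units + units

counts = [
    lstm_params(1, 150),     # lstm: 1 input feature
    lstm_params(150, 150),   # lstm_1: fed by the 150 outputs of the previous LSTM
    lstm_params(150, 150),   # lstm_2
    dense_params(150, 1),    # dense
]
print(counts, sum(counts))
```

The dropout layers add no parameters, which is why they show 0 in the summary.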

```
# Train the model
history = model.fit(X_train, y_train, epochs= 20, batch_size= 32, validation_split= 0.2)
```

```
Epoch 1/20
38/38 [==============================] - 7s 55ms/step - loss: 0.0539 - val_loss: 0.0653
Epoch 2/20
38/38 [==============================] - 0s 8ms/step - loss: 0.0095 - val_loss: 0.0055
Epoch 3/20
38/38 [==============================] - 0s 11ms/step - loss: 0.0012 - val_loss: 5.9507e-04
Epoch 4/20
38/38 [==============================] - 0s 10ms/step - loss: 3.8284e-04 - val_loss: 2.3346e-04
Epoch 5/20
38/38 [==============================] - 0s 9ms/step - loss: 3.5239e-04 - val_loss: 8.1833e-05
Epoch 6/20
38/38 [==============================] - 0s 10ms/step - loss: 3.5036e-04 - val_loss: 6.2046e-05
Epoch 7/20
38/38 [==============================] - 0s 8ms/step - loss: 3.0313e-04 - val_loss: 4.0566e-05
Epoch 8/20
38/38 [==============================] - 0s 9ms/step - loss: 2.8564e-04 - val_loss: 6.2951e-05
Epoch 9/20
38/38 [==============================] - 0s 9ms/step - loss: 3.1342e-04 - val_loss: 5.7098e-05
Epoch 10/20
26/38 [===================>..........] - ETA: 0s - loss: 2.9808e-04
```

```
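# Make predictions over the full series
# X is still 2D here, so reshape it to the 3D input the model expects
predicted = model.predict(np.reshape(X, (X.shape[0], X.shape[1], 1)))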
```

```
test_predicted = []
for i in predicted:
    test_predicted.append(i[0])
```

```
# Skip the first row (it has no previous day); .copy() avoids SettingWithCopyWarning
df_predicted = price_volume_df[1:][['Date']].copy()
```

```
df_predicted['predictions'] = test_predicted
```

```
close = []
for i in training_set_scaled:
    close.append(i[0])
```

```
df_predicted['Close'] = close[1:]
df_predicted
```

```
# Plot the results
#interactive_plot(df_predicted, 'Original Vs Predictions: SP500')
```