Stock Market Analysis for Tech Stocks
In this project, we'll analyse data from the stock market for some technology stocks.
Again, we'll use Pandas to extract and analyse the information, visualise it, and look at different ways to analyse the risk of a stock, based on its performance history.
Here are the questions we'll try to answer:
- What was the change in a stock's price over time?
- What was the daily return average of a stock?
- What was the moving average of various stocks?
- What was the correlation between daily returns of different stocks?
- How much value do we put at risk by investing in a particular stock?
- How can we attempt to predict future stock behaviour?
1#Python Data Analysis imports2import pandas as pd3from pandas import Series,DataFrame4import numpy as np5
6#Visualisation imports7import matplotlib.pyplot as plt8import seaborn as sns9sns.set_style('whitegrid')10%matplotlib inline11
12#To grab stock data13from pandas.io.data import DataReader14from datetime import datetime15
16#To handle floats in Python 217from __future__ import division
We're going to analyse some tech stocks, and it seems like a good idea to look at their performance over the last year. We can create a list with the stock names, for future looping.
1#We're going to analyse stock info for Apple, Google, Microsoft, and Amazon2tech_list = ['AAPL','GOOG','MSFT','AMZN','YHOO']3
4#Setting the end date to today5end = datetime.now()6
7#Start date set to 1 year back8start = datetime(end.year-1,end.month,end.day) 9
10#Using Yahoo Finance to grab the stock data11for stock in tech_list:12 globals()[stock] = DataReader(stock,'yahoo',start,end) #The globals method sets the stock name to a global variable
Thanks to the globals method, Apple's stock data will be stored in the AAPL global variable dataframe. Let's see if that worked.
1AAPL.head()
Open | High | Low | Close | Volume | Adj Close | |
---|---|---|---|---|---|---|
Date | ||||||
2015-09-23 | 113.629997 | 114.720001 | 113.300003 | 114.320000 | 35756700 | 111.926895 |
2015-09-24 | 113.250000 | 115.500000 | 112.370003 | 115.000000 | 50219500 | 112.592660 |
2015-09-25 | 116.440002 | 116.690002 | 114.019997 | 114.709999 | 56151900 | 112.308730 |
2015-09-28 | 113.849998 | 114.570000 | 112.440002 | 112.440002 | 52109000 | 110.086252 |
2015-09-29 | 112.830002 | 113.510002 | 107.860001 | 109.059998 | 73365400 | 106.777002 |
1#Basic stats for Apple's Stock2AAPL.describe()
Open | High | Low | Close | Volume | Adj Close | |
---|---|---|---|---|---|---|
count | 253.000000 | 253.000000 | 253.000000 | 253.000000 | 2.530000e+02 | 253.000000 |
mean | 104.824941 | 105.777510 | 103.881146 | 104.858498 | 4.179762e+07 | 103.707287 |
std | 8.073718 | 8.110392 | 8.019398 | 8.075914 | 1.749642e+07 | 7.735402 |
min | 90.000000 | 91.669998 | 89.470001 | 90.339996 | 1.304640e+07 | 89.853242 |
25% | 97.320000 | 98.209999 | 96.580002 | 97.139999 | 2.944520e+07 | 96.348065 |
50% | 105.519997 | 106.309998 | 104.879997 | 105.790001 | 3.695570e+07 | 104.701886 |
75% | 110.629997 | 111.769997 | 109.410004 | 110.779999 | 4.896780e+07 | 109.220001 |
max | 123.129997 | 123.820000 | 121.620003 | 122.570000 | 1.333697e+08 | 120.004194 |
And that easily, we can make out what the stock's minimum, maximum, and average price was for the last year.
1#Some basic info about the dataframe2AAPL.info()
<class 'pandas.core.frame.DataFrame'>DatetimeIndex: 253 entries, 2015-09-23 to 2016-09-22Data columns (total 6 columns):Open 253 non-null float64High 253 non-null float64Low 253 non-null float64Close 253 non-null float64Volume 253 non-null int64Adj Close 253 non-null float64dtypes: float64(5), int64(1)memory usage: 13.8 KB
No missing info in the dataframe above, so we can go about our business.
What's the change in stock's price over time?
1#Plotting the stock's adjusted closing price using pandas2AAPL['Adj Close'].plot(legend=True,figsize=(12,5))
Similarily, we can plot change in a stock's volume being traded, over time.
1#Plotting the total volume being traded over time2AAPL['Volume'].plot(legend=True,figsize=(12,5))
What was the moving average of various stocks?
Let's check out the moving average for stocks over a 10, 20 and 50 day period of time. We'll add that information to the stock's dataframe.
1ma_day = [10,20,50]2
3for ma in ma_day:4 column_name = "MA for %s days" %(str(ma))5 6 AAPL[column_name] = AAPL['Adj Close'].rolling(window=ma,center=False).mean()7
8AAPL.tail()
Open | High | Low | Close | Volume | Adj Close | MA for 10 days | MA for 20 days | MA for 50 days | |
---|---|---|---|---|---|---|---|---|---|
Date | |||||||||
2016-09-16 | 115.120003 | 116.129997 | 114.040001 | 114.919998 | 79886900 | 114.919998 | 108.808999 | 108.1500 | 104.992706 |
2016-09-19 | 115.190002 | 116.180000 | 113.250000 | 113.580002 | 47023000 | 113.580002 | 109.393999 | 108.3610 | 105.341124 |
2016-09-20 | 113.050003 | 114.120003 | 112.510002 | 113.570000 | 34514300 | 113.570000 | 109.980999 | 108.6140 | 105.683375 |
2016-09-21 | 113.849998 | 113.989998 | 112.440002 | 113.550003 | 36003200 | 113.550003 | 110.499999 | 108.8490 | 106.016473 |
2016-09-22 | 114.349998 | 114.940002 | 114.000000 | 114.620003 | 31011700 | 114.620003 | 111.410000 | 109.1785 | 106.381911 |
1AAPL[['Adj Close','MA for 10 days','MA for 20 days','MA for 50 days']].plot(subplots=False,figsize=(12,5))
1<matplotlib.axes._subplots.AxesSubplot at 0x114a9c390>
Moving averages for more days have a smoother plot, as they're less reliable on daily fluctuations. So even though, Apple's stock has a slight dip near the start of September, it's generally been on an upward trend since mid-July.
What was the daily return average of a stock?
1#The daily return column can be created by using the percentage change over the adjusted closing price2AAPL['Daily Return'] = AAPL['Adj Close'].pct_change()3
4AAPL['Daily Return'].tail()
Date2016-09-16 -0.0056242016-09-19 -0.0116602016-09-20 -0.0000882016-09-21 -0.0001762016-09-22 0.009423Name: Daily Return, dtype: float64
1#Plotting the daily return2AAPL['Daily Return'].plot(figsize=(14,5),legend=True,linestyle='--',marker='o')
1sns.distplot(AAPL['Daily Return'].dropna(),bins=100,color='red')
Positive daily returns seem to be slightly more frequent than negative returns for Apple.
What was the correlation between daily returns of different stocks?
1#Reading just the 'Adj Close' column this time2close_df = DataReader(tech_list,'yahoo',start,end)['Adj Close']3
4close_df.tail()
AAPL | AMZN | GOOG | MSFT | YHOO | |
---|---|---|---|---|---|
Date | |||||
2016-09-16 | 114.919998 | 778.520020 | 768.880005 | 57.250000 | 43.669998 |
2016-09-19 | 113.580002 | 775.099976 | 765.700012 | 56.930000 | 43.189999 |
2016-09-20 | 113.570000 | 780.219971 | 771.409973 | 56.810001 | 42.790001 |
2016-09-21 | 113.550003 | 789.739990 | 776.219971 | 57.759998 | 44.139999 |
2016-09-22 | 114.620003 | 804.700012 | 787.210022 | 57.820000 | 44.150002 |
Everything works as expected.
Just as we did earlier, we can use Pandas' pct_change method to get the daily returns of our stocks.
1rets_df = close_df.pct_change()2
3rets_df.tail()
AAPL | AMZN | GOOG | MSFT | YHOO | |
---|---|---|---|---|---|
Date | |||||
2016-09-16 | -0.005624 | 0.011472 | -0.003732 | 0.001049 | -0.007274 |
2016-09-19 | -0.011660 | -0.004393 | -0.004136 | -0.005590 | -0.010992 |
2016-09-20 | -0.000088 | 0.006606 | 0.007457 | -0.002108 | -0.009261 |
2016-09-21 | -0.000176 | 0.012202 | 0.006235 | 0.016722 | 0.031549 |
2016-09-22 | 0.009423 | 0.018943 | 0.014158 | 0.001039 | 0.000227 |
Let's try creating a scatterplot to visualise any correlations between different stocks. First we'll visualise a scatterplot for the relationship between the daily return of a stock to itself.
1sns.jointplot('GOOG','GOOG',rets_df,kind='scatter',color='green')
As expected, the relationship is perfectly linear because we're trying to correlate something with itself. Now, let's check out the relationship between Google and Apple's daily returns.
1sns.jointplot('GOOG','AAPL',rets_df,kind='scatter')
There seems to be a minor correlation between the two stocks, looking at the figure above. The Pearson R Correlation Coefficient value of 0.45 echoes that sentiment.
But what about other combinations of stocks?
1sns.pairplot(rets_df.dropna())
Quick and dirty overarching visualisation of the scatterplots and histograms of daily returns of our stocks. To see the actual numbers for the correlation coefficients, we can use seaborn's corrplot method.
1sns.corrplot(rets_df.dropna(),annot=True)
Google and Microsoft seem to have the highest correlation. But another interesting thing to note is that all tech companies that we explored are positively correlated.
How much value do we put at risk by investing in a particular stock?
A basic way to quantify risk is to compare the expected return (which can be the mean of the stock's daily returns) with the standard deviation of the daily returns.
1rets = rets_df.dropna()2
3plt.figure(figsize=(8,5))4
5plt.scatter(rets.mean(),rets.std(),s=25)6
7plt.xlabel('Expected Return')8plt.ylabel('Risk')9
10
11#For adding annotatios in the scatterplot12for label,x,y in zip(rets.columns,rets.mean(),rets.std()):13 plt.annotate(14 label,15 xy=(x,y),xytext=(-120,20),16 textcoords = 'offset points', ha = 'right', va = 'bottom',17 arrowprops = dict(arrowstyle='->',connectionstyle = 'arc3,rad=-0.5'))
We'd want a stock to have a high expected return and a low risk; Google and Microsoft seem to be the safe options for that. Meanwhile, Yahoo and Amazon stocks have higher expected returns, but also have a higher risk
Value at Risk
We can treat Value at risk as the amount of money we could expect to lose for a given confidence interval. We'll use the 'Bootstrap' method and the 'Monte Carlo Method' to extract this value.
Bootstrap Method
Using this method, we calculate the empirical quantiles from a histogram of daily returns. The quantiles help us define our confidence interval.
1sns.distplot(AAPL['Daily Return'].dropna(),bins=100,color='purple')
To recap, our histogram for Apple's stock looked like the above. And our daily returns dataframe looked like:
1rets.head()
AAPL | AMZN | GOOG | MSFT | YHOO | |
---|---|---|---|---|---|
Date | |||||
2015-09-24 | 0.005948 | -0.004328 | 0.005527 | 0.000912 | -0.013450 |
2015-09-25 | -0.002522 | -0.017799 | -0.022100 | 0.000683 | -0.007157 |
2015-09-28 | -0.019789 | -0.038512 | -0.027910 | -0.014793 | -0.052523 |
2015-09-29 | -0.030061 | -0.015851 | 0.000134 | 0.003465 | 0.023913 |
2015-09-30 | 0.011370 | 0.031891 | 0.022606 | 0.018877 | 0.023001 |
1#Using Pandas built in qualtile method2rets['AAPL'].quantile(0.05)
-0.025722813451247724
The 0.05 empirical quantile of daily returns is at -0.019. This means that with 95% confidence, the worst daily loss will not exceed 2.57% (of the investment).
How can we attempt to predict future stock behaviour?
Monte Carlo Method
Check out this link for more info on the Monte Carlo method. In short: in this method, we run simulations to predict the future many times, and aggregate the results in the end for some quantifiable value.
1days = 3652
3#delta t4dt = 1/3655
6mu = rets.mean()['GOOG']7
8sigma = rets.std()['GOOG']9
10#Function takes in stock price, number of days to run, mean and standard deviation values11def stock_monte_carlo(start_price,days,mu,sigma):12 13 price = np.zeros(days)14 price[0] = start_price15 16 shock = np.zeros(days)17 drift = np.zeros(days)18 19 for x in xrange(1,days):20 21 #Shock and drift formulas taken from the Monte Carlo formula22 shock[x] = np.random.normal(loc=mu*dt,scale=sigma*np.sqrt(dt))23 24 drift[x] = mu * dt25 26 #New price = Old price + Old price*(shock+drift)27 price[x] = price[x-1] + (price[x-1] * (drift[x]+shock[x]))28 29 return price
We're going to run the simulation of Google stocks. Let's check out the opening value of the stock.
1GOOG.head()
Open | High | Low | Close | Volume | Adj Close | |
---|---|---|---|---|---|---|
Date | ||||||
2015-09-23 | 622.049988 | 628.929993 | 620.000000 | 622.359985 | 1470900 | 622.359985 |
2015-09-24 | 616.640015 | 627.320007 | 612.400024 | 625.799988 | 2240100 | 625.799988 |
2015-09-25 | 629.770020 | 629.770020 | 611.000000 | 611.969971 | 2174000 | 611.969971 |
2015-09-28 | 610.340027 | 614.604980 | 589.380005 | 594.890015 | 3127700 | 594.890015 |
2015-09-29 | 597.280029 | 605.000000 | 590.219971 | 594.969971 | 2309500 | 594.969971 |
Let's do a simulation of 100 runs, and plot them.
1start_price = 622.049 #Taken from above2
3for run in xrange(100):4 plt.plot(stock_monte_carlo(start_price,days,mu,sigma))5
6plt.xlabel('Days')7plt.ylabel('Price')8plt.title('Monte Carlo Analysis for Google')
1runs = 100002
3simulations = np.zeros(runs)4
5for run in xrange(runs):6 simulations[run] = stock_monte_carlo(start_price,days,mu,sigma)[days-1]
1q = np.percentile(simulations,1)2
3plt.hist(simulations,bins=200)4
5plt.figtext(0.6,0.8,s="Start price: $%.2f" %start_price)6
7plt.figtext(0.6,0.7,"Mean final price: $%.2f" % simulations.mean())8
9plt.figtext(0.6,0.6,"VaR(0.99): $%.2f" % (start_price -q,))10
11plt.figtext(0.15,0.6, "q(0.99): $%.2f" % q)12
13plt.axvline(x=q, linewidth=4, color='r')14
15plt.title(u"Final price distribution for Google Stock after %s days" %days, weight='bold')
We can infer from this that, Google's stock is pretty stable. The starting price that we had was USD622.05, and the average final price over 10,000 runs was USD623.36.
The red line indicates the value of stock at risk at the desired confidence interval. For every stock, we'd be risking USD18.38, 99% of the time.