Sajal Sharma

# Stock Market Analysis for Tech Stocks

13.10.2016 -
Python, Pandas, Seaborn, Financial Analysis

In this project, we'll analyse data from the stock market for some technology stocks.

Again, we'll use Pandas to extract and analyse the information, visualise it, and look at different ways to analyse the risk of a stock, based on its performance history.

Here are the questions we'll try to answer:

• What was the change in a stock's price over time?
• What was the daily return average of a stock?
• What was the moving average of various stocks?
• What was the correlation between daily returns of different stocks?
• How much value do we put at risk by investing in a particular stock?
• How can we attempt to predict future stock behaviour?
1#Python Data Analysis imports
2import pandas as pd
3from pandas import Series,DataFrame
4import numpy as np
5
6#Visualisation imports
7import matplotlib.pyplot as plt
8import seaborn as sns
9sns.set_style('whitegrid')
10%matplotlib inline
11
12#To grab stock data
14from datetime import datetime
15
16#To handle floats in Python 2
17from __future__ import division

We're going to analyse some tech stocks, and it seems like a good idea to look at their performance over the last year. We can create a list with the stock names, for future looping.

1#We're going to analyse stock info for Apple, Google, Microsoft, and Amazon
2tech_list = ['AAPL','GOOG','MSFT','AMZN','YHOO']
3
4#Setting the end date to today
5end = datetime.now()
6
7#Start date set to 1 year back
8start = datetime(end.year-1,end.month,end.day)
9
10#Using Yahoo Finance to grab the stock data
11for stock in tech_list:
12 globals()[stock] = DataReader(stock,'yahoo',start,end) #The globals method sets the stock name to a global variable

Thanks to the globals method, Apple's stock data will be stored in the AAPL global variable dataframe. Let's see if that worked.

Date
2015-09-23113.629997114.720001113.300003114.32000035756700111.926895
2015-09-24113.250000115.500000112.370003115.00000050219500112.592660
2015-09-25116.440002116.690002114.019997114.70999956151900112.308730
2015-09-28113.849998114.570000112.440002112.44000252109000110.086252
2015-09-29112.830002113.510002107.860001109.05999873365400106.777002
1#Basic stats for Apple's Stock
2AAPL.describe()
count253.000000253.000000253.000000253.0000002.530000e+02253.000000
mean104.824941105.777510103.881146104.8584984.179762e+07103.707287
std8.0737188.1103928.0193988.0759141.749642e+077.735402
min90.00000091.66999889.47000190.3399961.304640e+0789.853242
25%97.32000098.20999996.58000297.1399992.944520e+0796.348065
50%105.519997106.309998104.879997105.7900013.695570e+07104.701886
75%110.629997111.769997109.410004110.7799994.896780e+07109.220001
max123.129997123.820000121.620003122.5700001.333697e+08120.004194

And that easily, we can make out what the stock's minimum, maximum, and average price was for the last year.

1#Some basic info about the dataframe
2AAPL.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 253 entries, 2015-09-23 to 2016-09-22
Data columns (total 6 columns):
Open 253 non-null float64
High 253 non-null float64
Low 253 non-null float64
Close 253 non-null float64
Volume 253 non-null int64
dtypes: float64(5), int64(1)
memory usage: 13.8 KB

No missing info in the dataframe above, so we can go about our business.

### What's the change in stock's price over time?

1#Plotting the stock's adjusted closing price using pandas

Similarily, we can plot change in a stock's volume being traded, over time.

1#Plotting the total volume being traded over time
2AAPL['Volume'].plot(legend=True,figsize=(12,5))

### What was the moving average of various stocks?

Let's check out the moving average for stocks over a 10, 20 and 50 day period of time. We'll add that information to the stock's dataframe.

1ma_day = [10,20,50]
2
3for ma in ma_day:
4 column_name = "MA for %s days" %(str(ma))
5
7
8AAPL.tail()
OpenHighLowCloseVolumeAdj CloseMA for 10 daysMA for 20 daysMA for 50 days
Date
2016-09-16115.120003116.129997114.040001114.91999879886900114.919998108.808999108.1500104.992706
2016-09-19115.190002116.180000113.250000113.58000247023000113.580002109.393999108.3610105.341124
2016-09-20113.050003114.120003112.510002113.57000034514300113.570000109.980999108.6140105.683375
2016-09-21113.849998113.989998112.440002113.55000336003200113.550003110.499999108.8490106.016473
2016-09-22114.349998114.940002114.000000114.62000331011700114.620003111.410000109.1785106.381911
1AAPL[['Adj Close','MA for 10 days','MA for 20 days','MA for 50 days']].plot(subplots=False,figsize=(12,5))
1<matplotlib.axes._subplots.AxesSubplot at 0x114a9c390>

Moving averages for more days have a smoother plot, as they're less reliable on daily fluctuations. So even though, Apple's stock has a slight dip near the start of September, it's generally been on an upward trend since mid-July.

### What was the daily return average of a stock?

1#The daily return column can be created by using the percentage change over the adjusted closing price
3
4AAPL['Daily Return'].tail()
Date
2016-09-16 -0.005624
2016-09-19 -0.011660
2016-09-20 -0.000088
2016-09-21 -0.000176
2016-09-22 0.009423
Name: Daily Return, dtype: float64
1#Plotting the daily return
2AAPL['Daily Return'].plot(figsize=(14,5),legend=True,linestyle='--',marker='o')

1sns.distplot(AAPL['Daily Return'].dropna(),bins=100,color='red')

Positive daily returns seem to be slightly more frequent than negative returns for Apple.

### What was the correlation between daily returns of different stocks?

3
4close_df.tail()
AAPLAMZNGOOGMSFTYHOO
Date
2016-09-16114.919998778.520020768.88000557.25000043.669998
2016-09-19113.580002775.099976765.70001256.93000043.189999
2016-09-20113.570000780.219971771.40997356.81000142.790001
2016-09-21113.550003789.739990776.21997157.75999844.139999
2016-09-22114.620003804.700012787.21002257.82000044.150002

Everything works as expected.

Just as we did earlier, we can use Pandas' pct_change method to get the daily returns of our stocks.

1rets_df = close_df.pct_change()
2
3rets_df.tail()
AAPLAMZNGOOGMSFTYHOO
Date
2016-09-16-0.0056240.011472-0.0037320.001049-0.007274
2016-09-19-0.011660-0.004393-0.004136-0.005590-0.010992
2016-09-20-0.0000880.0066060.007457-0.002108-0.009261
2016-09-21-0.0001760.0122020.0062350.0167220.031549
2016-09-220.0094230.0189430.0141580.0010390.000227

Let's try creating a scatterplot to visualise any correlations between different stocks. First we'll visualise a scatterplot for the relationship between the daily return of a stock to itself.

1sns.jointplot('GOOG','GOOG',rets_df,kind='scatter',color='green')

As expected, the relationship is perfectly linear because we're trying to correlate something with itself. Now, let's check out the relationship between Google and Apple's daily returns.

1sns.jointplot('GOOG','AAPL',rets_df,kind='scatter')

There seems to be a minor correlation between the two stocks, looking at the figure above. The Pearson R Correlation Coefficient value of 0.45 echoes that sentiment.

But what about other combinations of stocks?

1sns.pairplot(rets_df.dropna())

Quick and dirty overarching visualisation of the scatterplots and histograms of daily returns of our stocks. To see the actual numbers for the correlation coefficients, we can use seaborn's corrplot method.

1sns.corrplot(rets_df.dropna(),annot=True)

Google and Microsoft seem to have the highest correlation. But another interesting thing to note is that all tech companies that we explored are positively correlated.

### How much value do we put at risk by investing in a particular stock?

A basic way to quantify risk is to compare the expected return (which can be the mean of the stock's daily returns) with the standard deviation of the daily returns.

1rets = rets_df.dropna()
2
3plt.figure(figsize=(8,5))
4
5plt.scatter(rets.mean(),rets.std(),s=25)
6
7plt.xlabel('Expected Return')
8plt.ylabel('Risk')
9
10
11#For adding annotatios in the scatterplot
12for label,x,y in zip(rets.columns,rets.mean(),rets.std()):
13 plt.annotate(
14 label,
15 xy=(x,y),xytext=(-120,20),
16 textcoords = 'offset points', ha = 'right', va = 'bottom',
17 arrowprops = dict(arrowstyle='->',connectionstyle = 'arc3,rad=-0.5'))

We'd want a stock to have a high expected return and a low risk; Google and Microsoft seem to be the safe options for that. Meanwhile, Yahoo and Amazon stocks have higher expected returns, but also have a higher risk

### Value at Risk

We can treat Value at risk as the amount of money we could expect to lose for a given confidence interval. We'll use the 'Bootstrap' method and the 'Monte Carlo Method' to extract this value.

Bootstrap Method

Using this method, we calculate the empirical quantiles from a histogram of daily returns. The quantiles help us define our confidence interval.

1sns.distplot(AAPL['Daily Return'].dropna(),bins=100,color='purple')

To recap, our histogram for Apple's stock looked like the above. And our daily returns dataframe looked like:

AAPLAMZNGOOGMSFTYHOO
Date
2015-09-240.005948-0.0043280.0055270.000912-0.013450
2015-09-25-0.002522-0.017799-0.0221000.000683-0.007157
2015-09-28-0.019789-0.038512-0.027910-0.014793-0.052523
2015-09-29-0.030061-0.0158510.0001340.0034650.023913
2015-09-300.0113700.0318910.0226060.0188770.023001
1#Using Pandas built in qualtile method
2rets['AAPL'].quantile(0.05)
-0.025722813451247724

The 0.05 empirical quantile of daily returns is at -0.019. This means that with 95% confidence, the worst daily loss will not exceed 2.57% (of the investment).

### How can we attempt to predict future stock behaviour?

Monte Carlo Method

Check out this link for more info on the Monte Carlo method. In short: in this method, we run simulations to predict the future many times, and aggregate the results in the end for some quantifiable value.

1days = 365
2
3#delta t
4dt = 1/365
5
6mu = rets.mean()['GOOG']
7
8sigma = rets.std()['GOOG']
9
10#Function takes in stock price, number of days to run, mean and standard deviation values
11def stock_monte_carlo(start_price,days,mu,sigma):
12
13 price = np.zeros(days)
14 price[0] = start_price
15
16 shock = np.zeros(days)
17 drift = np.zeros(days)
18
19 for x in xrange(1,days):
20
21 #Shock and drift formulas taken from the Monte Carlo formula
22 shock[x] = np.random.normal(loc=mu*dt,scale=sigma*np.sqrt(dt))
23
24 drift[x] = mu * dt
25
26 #New price = Old price + Old price*(shock+drift)
27 price[x] = price[x-1] + (price[x-1] * (drift[x]+shock[x]))
28
29 return price

We're going to run the simulation of Google stocks. Let's check out the opening value of the stock.

Date
2015-09-23622.049988628.929993620.000000622.3599851470900622.359985
2015-09-24616.640015627.320007612.400024625.7999882240100625.799988
2015-09-25629.770020629.770020611.000000611.9699712174000611.969971
2015-09-28610.340027614.604980589.380005594.8900153127700594.890015
2015-09-29597.280029605.000000590.219971594.9699712309500594.969971

Let's do a simulation of 100 runs, and plot them.

1start_price = 622.049 #Taken from above
2
3for run in xrange(100):
4 plt.plot(stock_monte_carlo(start_price,days,mu,sigma))
5
6plt.xlabel('Days')
7plt.ylabel('Price')

1runs = 10000
2
3simulations = np.zeros(runs)
4
5for run in xrange(runs):
6 simulations[run] = stock_monte_carlo(start_price,days,mu,sigma)[days-1]
1q = np.percentile(simulations,1)
2
3plt.hist(simulations,bins=200)
4
5plt.figtext(0.6,0.8,s="Start price: \$%.2f" %start_price)
6
7plt.figtext(0.6,0.7,"Mean final price: \$%.2f" % simulations.mean())
8
9plt.figtext(0.6,0.6,"VaR(0.99): \$%.2f" % (start_price -q,))
10
11plt.figtext(0.15,0.6, "q(0.99): \$%.2f" % q)
12
13plt.axvline(x=q, linewidth=4, color='r')
14
15plt.title(u"Final price distribution for Google Stock after %s days" %days, weight='bold')

We can infer from this that, Google's stock is pretty stable. The starting price that we had was USD622.05, and the average final price over 10,000 runs was USD623.36.

The red line indicates the value of stock at risk at the desired confidence interval. For every stock, we'd be risking USD18.38, 99% of the time.