Sajal Sharma

# Stock Market Analysis for Tech Stocks

13.10.2016 -
Python, Pandas, Seaborn, Financial Analysis

In this project, we'll analyse data from the stock market for some technology stocks.

Again, we'll use Pandas to extract and analyse the information, visualise it, and look at different ways to analyse the risk of a stock, based on its performance history.

Here are the questions we'll try to answer:

• What was the change in a stock's price over time?
• What was the daily return average of a stock?
• What was the moving average of various stocks?
• What was the correlation between daily returns of different stocks?
• How much value do we put at risk by investing in a particular stock?
• How can we attempt to predict future stock behaviour?
``````1#Python Data Analysis imports2import pandas as pd3from pandas import Series,DataFrame4import numpy as np5
6#Visualisation imports7import matplotlib.pyplot as plt8import seaborn as sns9sns.set_style('whitegrid')10%matplotlib inline11
12#To grab stock data13from pandas.io.data import DataReader14from datetime import datetime15
16#To handle floats in Python 217from __future__ import division``````

We're going to analyse some tech stocks, and it seems like a good idea to look at their performance over the last year. We can create a list with the stock names, for future looping.

``````1#We're going to analyse stock info for Apple, Google, Microsoft, and Amazon2tech_list = ['AAPL','GOOG','MSFT','AMZN','YHOO']3
4#Setting the end date to today5end = datetime.now()6
7#Start date set to 1 year back8start = datetime(end.year-1,end.month,end.day) 9
10#Using Yahoo Finance to grab the stock data11for stock in tech_list:12    globals()[stock] = DataReader(stock,'yahoo',start,end) #The globals method sets the stock name to a global variable``````

Thanks to the globals method, Apple's stock data will be stored in the AAPL global variable dataframe. Let's see if that worked.

``1AAPL.head()``
Date
2015-09-23113.629997114.720001113.300003114.32000035756700111.926895
2015-09-24113.250000115.500000112.370003115.00000050219500112.592660
2015-09-25116.440002116.690002114.019997114.70999956151900112.308730
2015-09-28113.849998114.570000112.440002112.44000252109000110.086252
2015-09-29112.830002113.510002107.860001109.05999873365400106.777002
``1#Basic stats for Apple's Stock2AAPL.describe()``
count253.000000253.000000253.000000253.0000002.530000e+02253.000000
mean104.824941105.777510103.881146104.8584984.179762e+07103.707287
std8.0737188.1103928.0193988.0759141.749642e+077.735402
min90.00000091.66999889.47000190.3399961.304640e+0789.853242
25%97.32000098.20999996.58000297.1399992.944520e+0796.348065
50%105.519997106.309998104.879997105.7900013.695570e+07104.701886
75%110.629997111.769997109.410004110.7799994.896780e+07109.220001
max123.129997123.820000121.620003122.5700001.333697e+08120.004194

And that easily, we can make out what the stock's minimum, maximum, and average price was for the last year.

``1#Some basic info about the dataframe2AAPL.info()``
``<class 'pandas.core.frame.DataFrame'>DatetimeIndex: 253 entries, 2015-09-23 to 2016-09-22Data columns (total 6 columns):Open         253 non-null float64High         253 non-null float64Low          253 non-null float64Close        253 non-null float64Volume       253 non-null int64Adj Close    253 non-null float64dtypes: float64(5), int64(1)memory usage: 13.8 KB``

No missing info in the dataframe above, so we can go about our business.

### What's the change in stock's price over time?

``1#Plotting the stock's adjusted closing price using pandas2AAPL['Adj Close'].plot(legend=True,figsize=(12,5))``

Similarily, we can plot change in a stock's volume being traded, over time.

``1#Plotting the total volume being traded over time2AAPL['Volume'].plot(legend=True,figsize=(12,5))``

### What was the moving average of various stocks?

Let's check out the moving average for stocks over a 10, 20 and 50 day period of time. We'll add that information to the stock's dataframe.

``````1ma_day = [10,20,50]2
3for ma in ma_day:4    column_name = "MA for %s days" %(str(ma))5    6    AAPL[column_name] = AAPL['Adj Close'].rolling(window=ma,center=False).mean()7
8AAPL.tail()``````
OpenHighLowCloseVolumeAdj CloseMA for 10 daysMA for 20 daysMA for 50 days
Date
2016-09-16115.120003116.129997114.040001114.91999879886900114.919998108.808999108.1500104.992706
2016-09-19115.190002116.180000113.250000113.58000247023000113.580002109.393999108.3610105.341124
2016-09-20113.050003114.120003112.510002113.57000034514300113.570000109.980999108.6140105.683375
2016-09-21113.849998113.989998112.440002113.55000336003200113.550003110.499999108.8490106.016473
2016-09-22114.349998114.940002114.000000114.62000331011700114.620003111.410000109.1785106.381911
``1AAPL[['Adj Close','MA for 10 days','MA for 20 days','MA for 50 days']].plot(subplots=False,figsize=(12,5))``
``1<matplotlib.axes._subplots.AxesSubplot at 0x114a9c390>``

Moving averages for more days have a smoother plot, as they're less reliable on daily fluctuations. So even though, Apple's stock has a slight dip near the start of September, it's generally been on an upward trend since mid-July.

### What was the daily return average of a stock?

``````1#The daily return column can be created by using the percentage change over the adjusted closing price2AAPL['Daily Return'] = AAPL['Adj Close'].pct_change()3
4AAPL['Daily Return'].tail()``````
``Date2016-09-16   -0.0056242016-09-19   -0.0116602016-09-20   -0.0000882016-09-21   -0.0001762016-09-22    0.009423Name: Daily Return, dtype: float64``
``1#Plotting the daily return2AAPL['Daily Return'].plot(figsize=(14,5),legend=True,linestyle='--',marker='o')``

``1sns.distplot(AAPL['Daily Return'].dropna(),bins=100,color='red')``

Positive daily returns seem to be slightly more frequent than negative returns for Apple.

### What was the correlation between daily returns of different stocks?

``````1#Reading just the 'Adj Close' column this time2close_df = DataReader(tech_list,'yahoo',start,end)['Adj Close']3
4close_df.tail()``````
AAPLAMZNGOOGMSFTYHOO
Date
2016-09-16114.919998778.520020768.88000557.25000043.669998
2016-09-19113.580002775.099976765.70001256.93000043.189999
2016-09-20113.570000780.219971771.40997356.81000142.790001
2016-09-21113.550003789.739990776.21997157.75999844.139999
2016-09-22114.620003804.700012787.21002257.82000044.150002

Everything works as expected.

Just as we did earlier, we can use Pandas' pct_change method to get the daily returns of our stocks.

``````1rets_df = close_df.pct_change()2
3rets_df.tail()``````
AAPLAMZNGOOGMSFTYHOO
Date
2016-09-16-0.0056240.011472-0.0037320.001049-0.007274
2016-09-19-0.011660-0.004393-0.004136-0.005590-0.010992
2016-09-20-0.0000880.0066060.007457-0.002108-0.009261
2016-09-21-0.0001760.0122020.0062350.0167220.031549
2016-09-220.0094230.0189430.0141580.0010390.000227

Let's try creating a scatterplot to visualise any correlations between different stocks. First we'll visualise a scatterplot for the relationship between the daily return of a stock to itself.

``1sns.jointplot('GOOG','GOOG',rets_df,kind='scatter',color='green')``

As expected, the relationship is perfectly linear because we're trying to correlate something with itself. Now, let's check out the relationship between Google and Apple's daily returns.

``1sns.jointplot('GOOG','AAPL',rets_df,kind='scatter')``

There seems to be a minor correlation between the two stocks, looking at the figure above. The Pearson R Correlation Coefficient value of 0.45 echoes that sentiment.

But what about other combinations of stocks?

``1sns.pairplot(rets_df.dropna())``

Quick and dirty overarching visualisation of the scatterplots and histograms of daily returns of our stocks. To see the actual numbers for the correlation coefficients, we can use seaborn's corrplot method.

``1sns.corrplot(rets_df.dropna(),annot=True)``

Google and Microsoft seem to have the highest correlation. But another interesting thing to note is that all tech companies that we explored are positively correlated.

### How much value do we put at risk by investing in a particular stock?

A basic way to quantify risk is to compare the expected return (which can be the mean of the stock's daily returns) with the standard deviation of the daily returns.

``````1rets = rets_df.dropna()2
3plt.figure(figsize=(8,5))4
5plt.scatter(rets.mean(),rets.std(),s=25)6
7plt.xlabel('Expected Return')8plt.ylabel('Risk')9
10
11#For adding annotatios in the scatterplot12for label,x,y in zip(rets.columns,rets.mean(),rets.std()):13    plt.annotate(14    label,15    xy=(x,y),xytext=(-120,20),16    textcoords = 'offset points', ha = 'right', va = 'bottom',17    arrowprops = dict(arrowstyle='->',connectionstyle = 'arc3,rad=-0.5'))``````

We'd want a stock to have a high expected return and a low risk; Google and Microsoft seem to be the safe options for that. Meanwhile, Yahoo and Amazon stocks have higher expected returns, but also have a higher risk

### Value at Risk

We can treat Value at risk as the amount of money we could expect to lose for a given confidence interval. We'll use the 'Bootstrap' method and the 'Monte Carlo Method' to extract this value.

Bootstrap Method

Using this method, we calculate the empirical quantiles from a histogram of daily returns. The quantiles help us define our confidence interval.

``1sns.distplot(AAPL['Daily Return'].dropna(),bins=100,color='purple')``

To recap, our histogram for Apple's stock looked like the above. And our daily returns dataframe looked like:

``1rets.head()``
AAPLAMZNGOOGMSFTYHOO
Date
2015-09-240.005948-0.0043280.0055270.000912-0.013450
2015-09-25-0.002522-0.017799-0.0221000.000683-0.007157
2015-09-28-0.019789-0.038512-0.027910-0.014793-0.052523
2015-09-29-0.030061-0.0158510.0001340.0034650.023913
2015-09-300.0113700.0318910.0226060.0188770.023001
``1#Using Pandas built in qualtile method2rets['AAPL'].quantile(0.05)``
``-0.025722813451247724``

The 0.05 empirical quantile of daily returns is at -0.019. This means that with 95% confidence, the worst daily loss will not exceed 2.57% (of the investment).

### How can we attempt to predict future stock behaviour?

Monte Carlo Method

Check out this link for more info on the Monte Carlo method. In short: in this method, we run simulations to predict the future many times, and aggregate the results in the end for some quantifiable value.

``````1days = 3652
3#delta t4dt = 1/3655
6mu = rets.mean()['GOOG']7
8sigma = rets.std()['GOOG']9
10#Function takes in stock price, number of days to run, mean and standard deviation values11def stock_monte_carlo(start_price,days,mu,sigma):12    13    price = np.zeros(days)14    price[0] = start_price15    16    shock = np.zeros(days)17    drift = np.zeros(days)18    19    for x in xrange(1,days):20        21        #Shock and drift formulas taken from the Monte Carlo formula22        shock[x] = np.random.normal(loc=mu*dt,scale=sigma*np.sqrt(dt))23        24        drift[x] = mu * dt25        26        #New price = Old price + Old price*(shock+drift)27        price[x] = price[x-1] + (price[x-1] * (drift[x]+shock[x]))28        29    return price``````

We're going to run the simulation of Google stocks. Let's check out the opening value of the stock.

``1GOOG.head()``
Date
2015-09-23622.049988628.929993620.000000622.3599851470900622.359985
2015-09-24616.640015627.320007612.400024625.7999882240100625.799988
2015-09-25629.770020629.770020611.000000611.9699712174000611.969971
2015-09-28610.340027614.604980589.380005594.8900153127700594.890015
2015-09-29597.280029605.000000590.219971594.9699712309500594.969971

Let's do a simulation of 100 runs, and plot them.

``````1start_price = 622.049 #Taken from above2
3for run in xrange(100):4    plt.plot(stock_monte_carlo(start_price,days,mu,sigma))5

``````1runs = 100002
3simulations = np.zeros(runs)4
5for run in xrange(runs):6    simulations[run] = stock_monte_carlo(start_price,days,mu,sigma)[days-1]``````
``````1q = np.percentile(simulations,1)2
3plt.hist(simulations,bins=200)4
5plt.figtext(0.6,0.8,s="Start price: \$%.2f" %start_price)6
7plt.figtext(0.6,0.7,"Mean final price: \$%.2f" % simulations.mean())8
9plt.figtext(0.6,0.6,"VaR(0.99): \$%.2f" % (start_price -q,))10
11plt.figtext(0.15,0.6, "q(0.99): \$%.2f" % q)12
13plt.axvline(x=q, linewidth=4, color='r')14
15plt.title(u"Final price distribution for Google Stock after %s days" %days, weight='bold')``````

We can infer from this that, Google's stock is pretty stable. The starting price that we had was USD622.05, and the average final price over 10,000 runs was USD623.36.

The red line indicates the value of stock at risk at the desired confidence interval. For every stock, we'd be risking USD18.38, 99% of the time.