Stock and investments analysis is a theme that can be deeply explored in programming. This includes R language, which already has a big literature, packages and functions developed in this matter. In this post, we’ll do a brief introduction to the subject using the packages quantmod
and ggplot2
.
For those who are beginning, R is a programming language and integrated enviroment focused in statistics, but with a lot of applications in different areas. The software download can be done here using your preferred mirror. I also recommend installing RStudio, which is an interface with a lot of additional resources for R.
The analyzed stock here will be PBR, from the brazilian company Petrobras, with data extracted from Yahoo Finance using the package quantmod. quantmod
is a well known package used to quantitave financial modelling. Also, we’ll use the package ggplot2 for data visualization.
Preparing the environment
The environment preparation is quite simple, even for those who are beginning or those who aren’t used to code. Firstly, we install and load the necessary packages.
rm(list=ls())install.packages("quantmod")install.packages("ggplot2")library(quantmod)library(ggplot2)
I think it’s important to highlight that we only install the packages once. In the next times we run the code, it’ll be necessary only to load them using the library
command.
After installing and loading the packages, now we’ll download the stock prices series and treat the data in order to get them in the best possible format for the analysis. We download the data using the function getSymbols
.
pbr < getSymbols("PBR", src = "yahoo", from = "20130101", to = "20170601", auto.assign = FALSE)
Let’s do it step by step!
The first argument PBR
is the symbol in Yahoo Finance of the stock that we’re going to analyze. If you want to look for other stock or asset, you can search its symbol here.
The argument src= "yahoo"
indicates the source of the data. We can also use other sources like Google Finance, FRED, Oanda, local databases, CSVs and many others.
The third and fourth arguments indicate the time period in which the data is going to be extracted, with data in the format “yyyymmdd”.
Lastly, the argument auto.assign = FALSE
allows us to name the dataset with the name we want. In case it’s TRUE
, the name automatically will be the symbol we’re looking for, that is, the first argument.
Addendum
If Yahoo Finance API is not working, it’s possible to download the prices straight from the website. You just have to search for the stock tick or company name. In the asset page, click on “Historical Data”, just like the image below:
Then, select the desired period and click on “Download Data”. The download file will be in the format csv
. Now, just move it for your working directory, that can be discovered using the command getwd()
.
With the file in the folder, use the following commands to read the base:
pbr < read.csv("PBR.csv")pbr[,1] < as.Date(pbr[,1])pbr < xts(pbr)pbr < pbr[,1]
The command read.csv()
reads the file and assigns it to an object. Then, we transform the first column of the base to date format. Next, we use the command xts()
to transform the base from dataframe
type to xts
. Lastly, we remove the first column (date), since now the price lines are already indexed by day.
Prices Visualization
Now let’s take a brief look at the data, just to know them better. Run the following lines and analyze each output.
head(pbr)tail(pbr)summary(pbr)str(pbr)
## PBR.Open PBR.High PBR.Low PBR.Close PBR.Volume PBR.Adjusted## 20120103 26.993 28.025 26.939 26.11 12754300 24.54040## 20120104 27.567 28.280 27.567 26.46 12351500 24.86936## 20120105 27.993 28.057 27.525 26.11 8568600 24.54040## 20120106 27.929 27.929 27.280 25.69 8532100 24.14565## 20120109 27.748 28.695 27.588 26.88 26046600 25.26411## 20120110 29.078 29.461 28.993 27.45 16966500 25.79985
## PBR.Open PBR.High PBR.Low PBR.Close PBR.Volume PBR.Adjusted## 20170523 8.76 8.91 8.74 8.83 22071200 8.83## 20170524 8.95 9.20 8.88 9.08 25863100 9.08## 20170525 9.07 9.25 8.81 8.89 30557400 8.89## 20170526 8.74 9.03 8.72 8.95 22845300 8.95## 20170530 8.85 8.90 8.69 8.70 21082600 8.70## 20170531 8.67 8.76 8.44 8.48 23066100 8.48
## Index PBR.Open PBR.High PBR.Low ## Min. :20130102 Min. : 2.88 Min. : 2.97 Min. : 2.71 ## 1st Qu.:20140208 1st Qu.: 7.05 1st Qu.: 7.20 1st Qu.: 6.82 ## Median :20150318 Median :10.25 Median :10.40 Median :10.08 ## Mean :20150317 Mean :10.67 Mean :10.86 Mean :10.46 ## 3rd Qu.:20160423 3rd Qu.:14.42 3rd Qu.:14.60 3rd Qu.:14.15 ## Max. :20170531 Max. :20.83 Max. :20.94 Max. :19.96 ## PBR.Close PBR.Volume PBR.Adjusted ## Min. : 2.900 Min. : 6046400 Min. : 2.900 ## 1st Qu.: 7.005 1st Qu.: 17193900 1st Qu.: 7.005 ## Median :10.210 Median : 24066300 Median :10.230 ## Mean :10.657 Mean : 27527831 Mean :10.814 ## 3rd Qu.:14.330 3rd Qu.: 33655150 3rd Qu.:14.615 ## Max. :20.650 Max. :164885500 Max. :20.650
## An 'xts' object on 20130102/20170531 containing:## Data: num [1:1111, 1:6] 18.9 18.7 19.2 19.2 18.9 ...##  attr(*, "dimnames")=List of 2## ..$ : NULL## ..$ : chr [1:6] "PBR.Open" "PBR.High" "PBR.Low" "PBR.Close" ...## Indexed by objects of class: [Date] TZ: UTC## xts Attributes: ## List of 2## $ src : chr "yahoo"## $ updated: POSIXct[1:1], format: "20170716 20:52:31"
With the commands head()
and tail()
we can see the first and last 6 lines of the base. There are 6 columns with: opening price, maximum and minimum prices, closing price, volume of transactions and adjusted price. Using the command summary()
we verify the descriptive statistics of each price series and volume. The command str()
returns the object structure. In this case, it’s a xts object, a time series.
Now let’s plot daily prices, using the Adjusted Price column, since it incorporates events like splits and dividends distribution, which can affect the series.
ggplot(pbr, aes(x = index(pbr), y = pbr[,6])) + geom_line(color = "darkblue") + ggtitle("Petrobras prices series") + xlab("Date") + ylab("Price") + theme(plot.title = element_text(hjust = 0.5)) + scale_x_date(date_labels = "%b %y", date_breaks = "6 months")
We created this graphic using the command ggplot
. First, we use the object pbr
as the series to be ploted. Then we indicate which elements will be in the axes: index(pbr)
, the date in xaxis, and the adjusted price column, pbr[,6]
, in yaxis. Next, we add the element to be ploted, in this case, a blue line: geom_line(color = "darkblue")
.
Afterwards, we include the title and names of the axes, with the commands ggtitle("Petrobras prices series")
, xlab("Date")
, ylab("Price")
. By standard, the graph title is aligned to the left. To centralize it, the command theme(plot.title = element_text(hjust = 0.5))
is used.
Lastly, to make the temporal axis more informative, we put the date tick at every 6 months in the format mmm aa
using scale_x_date(date_labels = "%b %y", date_breaks = "6 months")
.
In stocks Technical Analysis, a very used technique is the plot of Moving Averages in the prices graphs. A simple moving average is an arithmetic average from the last \(q\) days from a \(x_{t}\) series in the \(t\) time period. So, the moving average \(MA^{q}_{t}\) is given by:
\[MA^{q}_{t}= \frac{1}{q} \sum_{i=0}^{q1}x_{t1}\]
This indicator is interesting because it helps to identify trends and smooths noises from prices. That is, the bigger the days window for the MA calculation, smaller is the MA responsiveness to price variation. The smaller the window, the faster MA adapts itself to changes. Now let’s calculate two moving averages for the stock prices series, one with 10 days window and the other with 30 days:
pbr_mm < subset(pbr, index(pbr) >= "20160101")pbr_mm10 < rollmean(pbr_mm[,6], 10, fill = list(NA, NULL, NA), align = "right")pbr_mm30 < rollmean(pbr_mm[,6], 30, fill = list(NA, NULL, NA), align = "right")pbr_mm$mm10 < coredata(pbr_mm10)pbr_mm$mm30 < coredata(pbr_mm30)
First we subset the base for data since 2016 using the function subset()
. Then, we use the function rollmean()
, which takes as argument: the series \((x_t)\), in this case the adjusted price; the window of periods \((q)\); an optional fill argument, that is used to complete the days where it’s not possible to calculate the moving average, since the enough quantity of days hasn’t passed; the argument align
indicates if the moving average should be calculated using the periods in the left, in the center or in the right of the \(t\) day of the series. Lastly, we add the MA to two new columns in the initial database.
We calculated the two MA using 10 and 30 days of windows, filling the values with NA and using the periods in the left. Afterwards, we can plot both series in the same graphic of prices to identify trends. An existing theory in Technical Analysis is the one that when two MA of short and long term cross each other, there is an indication of buying or selling the stock. When the short term MA cross the long term upwards, there’s a buy a signal. When the opposite happens, there’s a sell signal.
Ploting the prices series and the moving averages for all days since 2016:
ggplot(pbr_mm, aes(x = index(pbr_mm))) + geom_line(aes(y = pbr_mm[,6], color = "PBR")) + ggtitle("Petrobras prices series") + geom_line(aes(y = pbr_mm$mm10, color = "MM10")) + geom_line(aes(y = pbr_mm$mm30, color = "MM30")) + xlab("Date") + ylab("Price") + theme(plot.title = element_text(hjust = 0.5), panel.border = element_blank()) + scale_x_date(date_labels = "%b %y", date_breaks = "3 months") + scale_colour_manual("Series", values=c("PBR"="gray40", "MM10"="firebrick4", "MM30"="darkcyan"))
To create the graph, we plot the line of prices and the lines of moving averages. In this case, we plot each line differently, creating a kind of nickname for the color of each one. Then, we add the line scale_colour_manual
, indicating the color of each nickname in order to make the color visible in the legend of the graph.
Verifying the plot, it’s possible to notice that there were 14 points in which the series crossed themselves and 1 point where they overlapped. Following the buy signal, we’d have been successful in 4 of 7 times. Following the sell indication, we’d have done right 5 in 7 times. In total, we’d be 9/14, that is 64% of success with a quite simple indicator. Of course this metric shouldn’d be used alone, many other informations should be considerated.
Returns!
We have seen how the stock price has changed over time. Now we’ll verify how the stock return has behaved in the same period. To do this, we first need to create a new object with the calculated returns, using the adjusted prices column:
pbr_ret < diff(log(pbr[,6]))pbr_ret < pbr_ret[1,]
What we’ve done here was using logarithm properties to calculate the logreturn of the stock. We did:
\[r_{t} = ln(1+R_t) = ln\bigg(\frac{P_t}{P_{t1}}\bigg) = ln(P_{t})ln(P_{t1}) \approx R_{t}\]
The diff
command calculates the difference of all values in any vector or element. With this, we only apply the difference to natural logarithms of stock prices.
In addition to this way, it’s possible to calculate returns differently. The package quantmod
has some interesting functions to do this. Firstly, it’s really simple to select only one column of prices of each stock. For example:
Op(pbr)Cl(pbr)Ad(pbr)
Each line will have as output the opening, closing and adjusted prices, respectively. The same accounts for other columns: Hi()
, Lo()
and Vo()
, for maximum and minimum prices and volume of transactions.
For the returns, we simply adapt and define which columns will be used, for example: ClCl()
will give us the returns using the closing prices from two periods; OpCl()
will result in the return from the closing price over the opening price from the same day.
Another interesting possibility given by quantmod
is the calculation of returns for different periods. For example, it’s possible to calculate the returns by day, week, month, quarter and year, just by using the following commands:
dailyReturn(pbr)weeklyReturn(pbr)monthlyReturn(pbr)quarterlyReturn(pbr)yearlyReturn(pbr)
All these commands make the analysis and the calculation much faster and simpler to do.
Now let’s verify some basic statistics from Petrobras returns.
summary(pbr_ret)## Index PBR.Adjusted ## Min. :20130103 Min. :0.1852413 ## 1st Qu.:20140210 1st Qu.:0.0209915 ## Median :20150318 Median :0.0005999 ## Mean :20150318 Mean :0.0007548 ## 3rd Qu.:20160424 3rd Qu.: 0.0177905 ## Max. :20170531 Max. : 0.1555103
sd(pbr_ret)## [1] 0.03713089
This indicates that in average the stock hasn’t performed well. Now we can plot the returns and see how they’ve done over time:
ggplot(pbr_ret, aes(x = index(pbr_ret), y = pbr_ret)) + geom_line(color = "deepskyblue4") + ggtitle("Petrobras returns series") + xlab("Date") + ylab("Return") + theme(plot.title = element_text(hjust = 0.5)) + scale_x_date(date_labels = "%b %y", date_breaks = "6 months")
To plot this last graph, we used the same parameters of the prices graph, changing only the line color.
Making a brief analysis of the graphic, it’s possible to see that the smallest return of the series happened around may. More specifically, in may 18, one day after the news release with recordings of brazilian president Michel Temer. In the audios, he agreed to give payments in exchange of silence from arrested politicians. This fact strongly impacted the stocks market, specially brazilian and public companies, like Petrobras.
Now let’s take a small look at the stock returns in 2017:
pbr_ret17 < subset(pbr_ret, index(pbr_ret) > "20170101")ggplot(pbr_ret17, aes(x = index(pbr_ret17), y = pbr_ret17)) + geom_line(color = "deepskyblue4") + ggtitle("Petrobras returns series in 2017") + xlab("Date") + ylab("Return") + theme(plot.title = element_text(hjust = 0.5)) + scale_x_date(date_labels = "%b %y", date_breaks = "1 months")
summary(pbr_ret17)sd(pbr_ret17)
## Index PBR.Adjusted ## Min. :20170103 Min. :0.185241 ## 1st Qu.:20170208 1st Qu.:0.014350 ## Median :20170317 Median :0.002689 ## Mean :20170317 Mean :0.001707 ## 3rd Qu.:20170424 3rd Qu.: 0.014977 ## Max. :20170531 Max. : 0.068795## [1] 0.03089105
We separated in an object all of the returns from 2017, using the function subset()
. With this, the descriptive statistics still suggest a negative average return. In other hand, the standard deviation is smaller, which indicates smaller risk. Furthermore, this negative return is given by, in majority, by the big fall in the day 18.
Conclusion
That’s all for this tutorial, folks. In future posts I pretend to do deeper analysis and also approach portfolio and risk theory. I hope you’ve liked!
 ← Previous Post
 Next Post →
Search Results for Stock and Investments Analysis in R Programming

"Introduction to Quantitative Trading Strategies in R"  This article provides an overview of quantitative trading strategies and how they can be implemented in R using the quantmod package. It covers topics such as data retrieval, data visualization, and the calculation of technical indicators. The article also includes code examples and explanations of the various functions available in the quantmod package.

"Getting Started with Financial Analysis in R"  This tutorial introduces the basics of financial analysis in R, including data retrieval, data manipulation, and data visualization. It covers topics such as calculating returns, moving averages, and volatility. The tutorial includes code examples and explanations of the various functions available in the quantmod package.

"Introduction to Stock Market Analysis with R"  This tutorial provides an introduction to stock market analysis using R. It covers topics such as data retrieval, data visualization, and the calculation of technical indicators. The tutorial includes code examples and explanations of the various functions available in the quantmod package.

"An Introduction to Financial Analysis in R"  This article provides an introduction to financial analysis in R, including data retrieval, data manipulation, and data visualization. It covers topics such as calculating returns, moving averages, and volatility. The article includes code examples and explanations of the various functions available in the quantmod package.

"Stock Market Analysis with R"  This tutorial provides an overview of stock market analysis using R. It covers topics such as data retrieval, data visualization, and the calculation of technical indicators. The tutorial includes code examples and explanations of the various functions available in the quantmod package.
Based on these search results, it is clear that there are several resources available for learning about stock and investments analysis in R programming. These resources cover topics such as data retrieval, data manipulation, data visualization, and the calculation of technical indicators. The quantmod package is commonly used for these tasks and provides a wide range of functions for financial analysis.