Download Financial Dataset Using Yahoo Finance in Python | A Complete Guide
This article was published as a part of the Data Science Blogathon
This article aims to help you build your own projects by showing how to collect stock market and crypto market data from the internet, load it into a DataFrame, and base your code on it. With real-world data in hand, you can create and experiment with your own ML models.
In this article, I will demonstrate two methods. Both use Yahoo Finance as the data source, since it is free and requires no registration. You can also use other data sources such as Quandl, Tiingo, or IEX Cloud.
The first approach uses the yfinance module in Python, which is very easy to work with. The second uses yahoofinancials, which requires a little more effort but gives back a lot of extra information in return. We will discuss that later; for now, we begin by importing the required modules into our code.
We need to load the following libraries:
import pandas as pd
import yfinance as yf
from yahoofinancials import YahooFinancials
If you do not have these libraries, you can install them via pip.
!pip install yfinance
!pip install yahoofinancials
First Method: How to use yfinance
The yfinance module was previously known as ‘fix_yahoo_finance’ and later became a library of its own. It is not an official Yahoo product. It is now a very popular, Python-friendly library that can be used either as a patch to pandas_datareader or as a standalone library. Many people use it to download stock prices as well as crypto prices. Without any further delay, let us begin by downloading the stock price of Apple:
aapl_df = yf.download('AAPL',
                      start='2019-01-01',
                      end='2021-06-12',
                      progress=False)
aapl_df.head()
By default the data interval is 1 day, but it can be set explicitly through the interval parameter with values like 1m, 5m, 15m, 30m, 60m, 1h, 1d, 1wk, 1mo, and more. The command above specifies a start and an end date, but both are optional, and you can also simply download the data with the code given below:
aapl_df = yf.download('AAPL')
The download function has many parameters, which you can find in the documentation; start and end are among the most commonly used. Because this download is small, the progress bar was turned off with progress=False. The progress bar is more useful for high-volume downloads.
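As a rough sketch of how the interval parameter might be used safely, the hypothetical wrapper below checks the interval before calling yf.download. The set of accepted values is an assumption based on the yfinance documentation at the time of writing, and download_prices is an illustrative name, not part of the library.

```python
# Intervals accepted by yfinance at the time of writing (an assumption;
# check the documentation of the version you have installed).
VALID_INTERVALS = {"1m", "2m", "5m", "15m", "30m", "60m", "90m",
                   "1h", "1d", "5d", "1wk", "1mo", "3mo"}


def download_prices(ticker, interval="1d", **kwargs):
    """Hypothetical wrapper: validate the interval, then call yf.download."""
    if interval not in VALID_INTERVALS:
        raise ValueError(f"Unsupported interval: {interval!r}")
    # Imported lazily so the validation above also works without yfinance.
    import yfinance as yf
    return yf.download(ticker, interval=interval, progress=False, **kwargs)
```

With network access, download_prices('AAPL', interval='1wk', start='2020-01-01', end='2020-12-31') would request weekly prices, while an unsupported value such as '7d' fails early with a clear error instead of an empty result.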
We can also download prices for more than one asset at a time by passing a list of ticker symbols (e.g. ['FB', 'MSFT', 'AAPL']) as the tickers argument. We can also provide the additional argument auto_adjust=True, so that all the prices are adjusted for potential corporate actions like splits.
Apart from the yf.download function, we can also use the Ticker class. You can execute the code below to download the last 5 years of Apple's stock prices.
ticker = yf.Ticker('AAPL')
aapl_df = ticker.history(period="5y")
aapl_df['Close'].plot(title="APPLE's stock price")
One advantage of using a Ticker object is that it exposes many other attributes beyond the price history. Some of the available ones are:
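A minimal sketch of the multi-ticker case follows. It assumes that a multi-ticker yf.download result has two-level columns (price field, then ticker), which is the default layout in the versions I am aware of. The helper names are hypothetical, and the sample frame uses made-up numbers so the column selection can be tried offline.

```python
import pandas as pd


def close_prices(prices: pd.DataFrame) -> pd.DataFrame:
    """Return one 'Close' column per ticker from a multi-ticker download."""
    # Columns are assumed to be two-level: (price field, ticker).
    # Selecting 'Close' leaves a frame with one column per ticker.
    return prices["Close"]


def download_close_prices(tickers, start, end):
    """Hypothetical helper: adjusted closing prices for a list of tickers."""
    import yfinance as yf  # lazy import; the call below needs network access
    prices = yf.download(tickers, start=start, end=end,
                         auto_adjust=True, progress=False)
    return close_prices(prices)


# Offline illustration: made-up numbers shaped like a two-ticker download.
columns = pd.MultiIndex.from_product([["Close", "Volume"], ["AAPL", "MSFT"]])
sample = pd.DataFrame(
    [[150.0, 300.0, 1000.0, 2000.0], [151.5, 302.0, 1100.0, 2100.0]],
    index=pd.to_datetime(["2021-06-10", "2021-06-11"]),
    columns=columns,
)
closes = close_prices(sample)
print(closes)
```

With network access, download_close_prices(['MSFT', 'AAPL'], '2021-01-01', '2021-06-12') would return the same shape with real prices.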
info – This returns a dictionary (similar to JSON) which contains a lot of information about the company, such as its full business name, summary, industry, the exchange it is listed on along with country and time zone, and more. It also includes the beta coefficient.
recommendations – This contains a historical list of recommendations made by different analysts regarding the stock, such as whether to buy or sell it.
actions – This displays the actions like splits and dividends.
major_holders – This method displays the major holders of the share along with other relevant details.
institutional_holders – This method shows all the institutional holders of a particular share.
calendar – This shows upcoming events such as earnings and important dividend dates for a company. You could even add these to your Google Calendar through code.
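To illustrate how the info dictionary might be consumed, here is a small sketch. The key names (longName, industry, exchange, beta) are assumptions about what Yahoo returns and can change over time, company_snapshot is a hypothetical helper, and the stand-in object below uses made-up values so the example runs offline.

```python
from types import SimpleNamespace


def company_snapshot(ticker):
    """Pick a few fields out of a Ticker-like object's `info` dictionary."""
    info = ticker.info
    # Key names are assumptions; .get() returns None for missing keys
    # instead of raising KeyError.
    wanted = ["longName", "industry", "exchange", "beta"]
    return {key: info.get(key) for key in wanted}


# Offline stand-in for yf.Ticker('AAPL'), with made-up values.
fake_ticker = SimpleNamespace(info={"longName": "Example Corp", "beta": 1.2})
snapshot = company_snapshot(fake_ticker)
print(snapshot)
```

With yfinance installed, company_snapshot(yf.Ticker('AAPL')) should work the same way. Attributes such as actions and institutional_holders come back as DataFrames, so they can be inspected directly with head().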
If you still want to explore more regarding the working of the functions, you can check out this GitHub repository of yfinance.
Second Method: How to use yahoofinancials?
The second method is to use the yahoofinancials module which is a bit tougher to work with but it provides much more information than yfinance. We will begin by downloading Apple’s stock prices.
To do this, we first create a YahooFinancials object by passing the Apple ticker symbol, and then call one of its methods to get the required data. The returned data is in a JSON format, so we reshape it a little so that it can be transformed into a DataFrame and displayed properly.
yahoo_financials = YahooFinancials('AAPL')
data = yahoo_financials.get_historical_price_data(start_date='2019-01-01',
                                                  end_date='2019-12-31',
                                                  time_interval='weekly')
aapl_df = pd.DataFrame(data['AAPL']['prices'])
aapl_df = aapl_df.drop('date', axis=1).set_index('formatted_date')
aapl_df.head()
On a technical level, obtaining historical stock prices takes a few more steps than with yfinance, but that is mostly due to the large volume of data returned. Now we move on to some of the important methods of yahoofinancials.
get_stock_quote_type_data() – This method returns a lot of generic information about a stock, similar to the info attribute in yfinance.
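The reshaping step above can be tried without a network connection. The sketch below uses a made-up payload that mimics the shape of the returned JSON as I understand it (a dictionary keyed by ticker, with a 'prices' list of records); real responses carry more keys and many more rows, and prices_to_frame is a hypothetical helper name.

```python
import pandas as pd

# Made-up payload shaped like the result of get_historical_price_data().
data = {
    "AAPL": {
        "currency": "USD",
        "prices": [
            {"date": 1546300800, "open": 38.7, "high": 39.7, "low": 35.5,
             "close": 37.1, "adjclose": 35.9, "volume": 1000000,
             "formatted_date": "2019-01-01"},
            {"date": 1546905600, "open": 37.4, "high": 38.6, "low": 36.5,
             "close": 38.1, "adjclose": 36.9, "volume": 900000,
             "formatted_date": "2019-01-08"},
        ],
    }
}


def prices_to_frame(data, ticker):
    """Turn the 'prices' list for one ticker into a date-indexed DataFrame."""
    frame = pd.DataFrame(data[ticker]["prices"])
    frame = frame.drop("date", axis=1)  # drop the Unix timestamp column
    # Parse the date strings so the index supports date-based operations.
    frame["formatted_date"] = pd.to_datetime(frame["formatted_date"])
    return frame.set_index("formatted_date")


aapl_df = prices_to_frame(data, "AAPL")
print(aapl_df)
```

Converting formatted_date to real timestamps is an optional extra over the original snippet; it makes later steps such as resampling or plotting against a time axis easier.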
get_summary_data() – This method returns a summary of the whole company along with useful data like the beta value, price to book value, and more.
get_stock_earnings_data() – This method returns information on the quarterly and yearly earnings of the company, along with the next date when the company will report its earnings.
get_financial_stmts() – This is another useful method, which retrieves the financial statements of a company for use in the analysis of a stock.
get_historical_price_data() – This method is similar to download() or Ticker.history() in yfinance and gets the prices of a stock for a given start_date, end_date, and time_interval.
Like yfinance, this module can also be used to download data for several companies at once. Cryptocurrency data can be downloaded as well, as shown in the following code.
yahoo_financials = YahooFinancials('BTC-USD')
data = yahoo_financials.get_historical_price_data("2019-07-10", "2021-05-30", "monthly")
btc_df = pd.DataFrame(data['BTC-USD']['prices'])
btc_df = btc_df.drop('date', axis=1).set_index('formatted_date')
btc_df.head()
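For the multi-ticker case, YahooFinancials also accepts a list of symbols, and the result is then keyed by ticker. The sketch below is one possible way to combine such a result into a single table of closing prices. It assumes that payload shape, uses made-up numbers in place of a real response, and closes_by_ticker is a hypothetical helper.

```python
import pandas as pd


def closes_by_ticker(data):
    """Build one 'close' column per ticker from a multi-ticker payload."""
    columns = {}
    for ticker, payload in data.items():
        prices = pd.DataFrame(payload["prices"])
        columns[ticker] = prices.set_index("formatted_date")["close"]
    # The Series are aligned on their date index; missing dates become NaN.
    return pd.DataFrame(columns)


# Made-up payload shaped like the result of
# YahooFinancials(['AAPL', 'BTC-USD']).get_historical_price_data(...)
sample = {
    "AAPL": {"prices": [
        {"formatted_date": "2021-04-01", "close": 131.0},
        {"formatted_date": "2021-05-01", "close": 124.0},
    ]},
    "BTC-USD": {"prices": [
        {"formatted_date": "2021-04-01", "close": 57000.0},
        {"formatted_date": "2021-05-01", "close": 37000.0},
    ]},
}
closes = closes_by_ticker(sample)
print(closes)
```

Aligning on the date index matters when the assets trade on different calendars: stocks have no weekend prices while cryptocurrencies do, so daily data will contain gaps for the stock columns.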
For more details about the module, you can check out its GitHub Repository.
All of this information is ultimately sourced from Yahoo Finance. You now know how to import any stock or cryptocurrency price and information dataset into your code and begin exploring and experimenting with it. Good luck with your adventures, and feel free to share your code with me on LinkedIn or reach out to me in case of any doubts or errors.
Thank you for reading till the end. I hope you are doing well, staying safe, and are already vaccinated or getting vaccinated soon.
About the Author :
Data Engineer & Python Developer | Freelance Tech Writer