Skip to content

================ by Jawad Haider

06 - Input Output


Image
Copyright Qalmaqihir
For more information, visit us at www.github.com/qalmaqihir/



NOTE: Typically we will just be either reading csv files directly or using pandas-datareader to pull data from the web. Consider this lecture just a quick overview of what is possible with pandas (we won’t be working with SQL or Excel files in this course)

Data Input and Output

This notebook is the reference code for getting input and output, pandas can read a variety of file types using its pd.read_ methods. Let’s take a look at the most common data types:

import numpy as np
import pandas as pd

CSV

Comma Separated Values files are text files that use commas as field delimeters.
Unless you’re running the virtual environment included with the course, you may need to install xlrd and openpyxl.
In your terminal/command prompt run:

conda install xlrd
conda install openpyxl

Then restart Jupyter Notebook. (or use pip install if you aren’t using the Anaconda Distribution)

CSV Input

df = pd.read_csv('example.csv')
df
a b c d
0 0 1 2 3
1 4 5 6 7
2 8 9 10 11
3 12 13 14 15

CSV Output

df.to_csv('example.csv',index=False)

Excel

Pandas can read and write MS Excel files. However, this only imports data, not formulas or images. A file that contains images or macros may cause the .read_excel()method to crash.

Excel Input

pd.read_excel('Excel_Sample.xlsx',sheet_name='Sheet1')
a b c d
0 0 1 2 3
1 4 5 6 7
2 8 9 10 11
3 12 13 14 15

Excel Output

df.to_excel('Excel_Sample.xlsx',sheet_name='Sheet1')

HTML

Pandas can read table tabs off of HTML.
Unless you’re running the virtual environment included with the course, you may need to install lxml, htmllib5, and BeautifulSoup4.
In your terminal/command prompt run:

conda install lxml
conda install html5lib
conda install beautifulsoup4

Then restart Jupyter Notebook. (or use pip install if you aren’t using the Anaconda Distribution)

HTML Input

Pandas read_html function will read tables off of a webpage and return a list of DataFrame objects:

df = pd.read_html('http://www.fdic.gov/bank/individual/failed/banklist.html')
df[0].head()
Bank Name City ST CERT Acquiring Institution Closing Date Updated Date
0 Washington Federal Bank for Savings Chicago IL 30570 Royal Savings Bank December 15, 2017 February 21, 2018
1 The Farmers and Merchants State Bank of Argonia Argonia KS 17719 Conway Bank October 13, 2017 February 21, 2018
2 Fayette County Bank Saint Elmo IL 1802 United Fidelity Bank, fsb May 26, 2017 July 26, 2017
3 Guaranty Bank, (d/b/a BestBank in Georgia & Mi... Milwaukee WI 30003 First-Citizens Bank & Trust Company May 5, 2017 March 22, 2018
4 First NBC Bank New Orleans LA 58302 Whitney Bank April 28, 2017 December 5, 2017

Great Job! Thats the end of this part.

Don't forget to give a star on github and follow for more curated Computer Science, Machine Learning materials