by Jawad Haider

07 - Pandas Exercise Solutions

Copyright Qalmaqihir. For more information, visit us at www.github.com/qalmaqihir/

1 Pandas Exercises - Solutions
2 Great Job! That's the end of this part.

Pandas Exercises - Solutions

TASK: Import pandas

# CODE HERE

import pandas as pd

TASK: Read in the bank.csv file located in the 01-Crash-Course-Pandas folder. Pay close attention to where the .csv file is located! If you can't figure this one out, please don't post to the QA forums; instead, run our solutions notebook directly to see how it's done.

df = pd.read_csv('bank.csv')
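If `bank.csv` is not in the notebook's working directory, `pd.read_csv('bank.csv')` raises `FileNotFoundError`. Below is a minimal sketch, assuming the file sits either next to the notebook or inside a `01-Crash-Course-Pandas` subfolder. The helper name `read_first_existing` and its `folders` parameter are hypothetical, not part of pandas.

```python
from pathlib import Path

import pandas as pd


def read_first_existing(filename, folders=(".", "01-Crash-Course-Pandas")):
    """Hypothetical helper: read `filename` from the first folder that contains it."""
    for folder in folders:
        candidate = Path(folder) / filename
        if candidate.is_file():
            return pd.read_csv(candidate)
    # None of the candidate folders held the file, so fail loudly with every path tried
    tried = [str(Path(f) / filename) for f in folders]
    raise FileNotFoundError(f"{filename} not found; tried {tried}")


# Usage (assumes bank.csv exists in one of the folders above):
# df = read_first_existing("bank.csv")
```

Listing the paths tried in the error message makes a wrong working directory obvious at a glance.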

TASK: Display the first 5 rows of the data set

# CODE HERE

df.head()

	age	job	marital	education	default	balance	housing	loan	contact	day	month	duration	campaign	pdays	previous	poutcome	y
0	30	unemployed	married	primary	no	1787	no	no	cellular	19	oct	79	1	-1	0	unknown	no
1	33	services	married	secondary	no	4789	yes	yes	cellular	11	may	220	1	339	4	failure	no
2	35	management	single	tertiary	no	1350	yes	no	cellular	16	apr	185	1	330	1	failure	no
3	30	management	married	tertiary	no	1476	yes	yes	unknown	3	jun	199	4	-1	0	unknown	no
4	59	blue-collar	married	secondary	no	0	yes	no	unknown	5	may	226	1	-1	0	unknown	no

TASK: What is the average (mean) age of the people in the dataset?

# CODE HERE

df['age'].mean()

41.17009511170095
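`.mean()` is the column sum divided by the number of non-missing values. Here is a small check on a toy frame that holds only the five ages from the `df.head()` output above, not the full dataset:

```python
import pandas as pd

# Toy frame built from the five ages shown in df.head() above (not the full dataset)
sample = pd.DataFrame({"age": [30, 33, 35, 30, 59]})

mean_age = sample["age"].mean()
# count() skips missing values, just like mean() does
manual = sample["age"].sum() / sample["age"].count()

print(mean_age)  # 37.4
```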

TASK: What is the marital status of the youngest person in the dataset?

# CODE HERE

# idxmin() returns the index label of the youngest person; .loc looks that label up
youngest = df['age'].idxmin()

df.loc[youngest, 'marital']

'single'
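`idxmin()` returns the index *label* of the first occurrence of the minimum, so `.loc` is the matching accessor, and ties are silently resolved in favor of the earliest row. Here is a sketch on a toy frame built from the five rows shown earlier, where two people share the minimum age:

```python
import pandas as pd

# Toy frame from the first five rows shown earlier; two people share the minimum age 30
sample = pd.DataFrame({
    "age": [30, 33, 35, 30, 59],
    "marital": ["married", "married", "single", "married", "married"],
})

youngest_label = sample["age"].idxmin()       # first label holding the minimum -> 0
print(sample.loc[youngest_label, "marital"])  # married

# Every row that ties for the minimum age, not just the first one
print(sample[sample["age"] == sample["age"].min()])
```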

TASK: How many unique job categories are there?

# CODE HERE

df['job'].nunique()
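`nunique()` drops missing values by default, so it can disagree with `len(df['job'].unique())`. Here is a sketch on a hypothetical Series with one missing entry:

```python
import pandas as pd

# Hypothetical Series with a missing value
jobs = pd.Series(["management", "blue-collar", "management", None])

print(jobs.nunique())              # 2 (missing values are ignored by default)
print(jobs.nunique(dropna=False))  # 3 (the missing value counts as its own category)
print(len(jobs.unique()))          # 3 (unique() keeps the missing value)
```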

TASK: How many people are there per job category? (Take a peek at the expected output)

# CODE HERE

df['job'].value_counts()

management       969
blue-collar      946
technician       768
admin.           478
services         417
retired          230
self-employed    183
entrepreneur     168
unemployed       128
housemaid        112
student           84
unknown           38
Name: job, dtype: int64
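`value_counts()` is essentially a group-and-count sorted by frequency, and `normalize=True` turns the counts into proportions. Here is a sketch on a toy column built from the job values in the five rows shown earlier:

```python
import pandas as pd

# Toy column using the job values from the five rows shown earlier
jobs = pd.Series(["unemployed", "services", "management", "management", "blue-collar"], name="job")

counts = jobs.value_counts()                # sorted from most to least frequent
shares = jobs.value_counts(normalize=True)  # same ordering, as fractions that sum to 1
via_groupby = jobs.groupby(jobs).size()     # same counts, but sorted by label instead

print(counts)
print(shares)
```

Depending on the pandas version, the printed Series name may be `job` or `count`; the counts themselves are the same.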

TASK: What percent of people in the dataset were married?

# CODE HERE

# Many, many ways to do this one! Here is just one way:
100*df['marital'].value_counts()['married']/len(df)

# df['marital'].value_counts()

61.86684361866843
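Here are two other routes to the same percentage, sketched on a toy column built from the five rows shown earlier. The boolean-mean version works because `True` counts as 1 and `False` as 0:

```python
import pandas as pd

# Toy column from the five rows shown earlier: four "married", one "single"
marital = pd.Series(["married", "married", "single", "married", "married"])

# The comparison gives a boolean Series; its mean is the fraction of True values
pct_bool = 100 * (marital == "married").mean()

# Same result via normalized value counts
pct_counts = 100 * marital.value_counts(normalize=True)["married"]

print(pct_bool, pct_counts)  # both approximately 80.0
```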

TASK: There is a column labeled “default”. Use pandas’ .map() method to create a new column called “default code” which contains a 0 if there was no default, or a 1 if there was a default. Then show the head of the dataframe with this new column.

df['default code'] = df['default'].map({'no':0,'yes':1})

df.head()

	age	job	marital	education	default	balance	housing	loan	contact	day	month	duration	campaign	pdays	previous	poutcome	y	default code
0	30	unemployed	married	primary	no	1787	no	no	cellular	19	oct	79	1	-1	0	unknown	no	0
1	33	services	married	secondary	no	4789	yes	yes	cellular	11	may	220	1	339	4	failure	no	0
2	35	management	single	tertiary	no	1350	yes	no	cellular	16	apr	185	1	330	1	failure	no	0
3	30	management	married	tertiary	no	1476	yes	yes	unknown	3	jun	199	4	-1	0	unknown	no	0
4	59	blue-collar	married	secondary	no	0	yes	no	unknown	5	may	226	1	-1	0	unknown	no	0
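With a dictionary argument, `.map()` turns any value that is not a key into `NaN`, which also upcasts the integer codes to floats. Here is a sketch on a hypothetical column with an unexpected entry, plus a comparison-based alternative that sidesteps the issue at the cost of silently treating unknown values as 0:

```python
import pandas as pd

# Hypothetical column with an unexpected value ("unknown") alongside "no"/"yes"
default = pd.Series(["no", "yes", "no", "unknown"])

codes = default.map({"no": 0, "yes": 1})
print(codes)               # "unknown" has no key in the dict, so it becomes NaN (dtype float64)
print(codes.isna().sum())  # 1

# Comparison-based alternative: anything other than "yes" becomes 0
codes_alt = (default == "yes").astype(int)
```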

TASK: Using pandas' .apply() method, create a new column called “marital code”. This column should contain a shortened code for each marital status: its first letter (for example, “m” for “married”, “s” for “single”, etc.). See if you can do this with a lambda expression. Lots of ways to do this one!

# CODE HERE

df['marital code'] = df['marital'].apply(lambda status: status[0])

df.head()

	age	job	marital	education	default	balance	housing	loan	contact	day	month	duration	campaign	pdays	previous	poutcome	y	default code	marital code
0	30	unemployed	married	primary	no	1787	no	no	cellular	19	oct	79	1	-1	0	unknown	no	0	m
1	33	services	married	secondary	no	4789	yes	yes	cellular	11	may	220	1	339	4	failure	no	0	m
2	35	management	single	tertiary	no	1350	yes	no	cellular	16	apr	185	1	330	1	failure	no	0	s
3	30	management	married	tertiary	no	1476	yes	yes	unknown	3	jun	199	4	-1	0	unknown	no	0	m
4	59	blue-collar	married	secondary	no	0	yes	no	unknown	5	may	226	1	-1	0	unknown	no	0	m
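pandas also exposes string operations through the `.str` accessor, and `.str[0]` picks the first character of each value without a Python-level lambda. It differs from the lambda on missing values: `.str[0]` returns `NaN` for them, whereas `status[0]` would raise a `TypeError`. Here is a sketch on a hypothetical column:

```python
import pandas as pd

# Hypothetical column of marital statuses
marital = pd.Series(["married", "single", "divorced"])

via_apply = marital.apply(lambda status: status[0])
via_str = marital.str[0]  # vectorized string indexing: first character of each value

print(via_str.tolist())  # ['m', 's', 'd']
```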

TASK: What was the longest lasting duration?

# CODE HERE

df['duration'].max()
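`.max()` returns only the value. To see who had the longest duration, `.idxmax()` gives the row label and `nlargest()` gives the top rows. Here is a sketch on a toy frame built from the five rows shown earlier:

```python
import pandas as pd

# Toy frame from the five rows shown earlier
sample = pd.DataFrame({
    "job": ["unemployed", "services", "management", "management", "blue-collar"],
    "duration": [79, 220, 185, 199, 226],
})

longest = sample["duration"].max()              # 226
row = sample.loc[sample["duration"].idxmax()]   # the whole row with the longest duration
print(longest, row["job"])                      # 226 blue-collar

top3 = sample.nlargest(3, "duration")           # the three longest, largest first
```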

TASK: What is the most common education level for people who are unemployed?

# CODE HERE

df[df['job']=='unemployed']['education'].value_counts()

secondary    68
tertiary     32
primary      26
unknown       2
Name: education, dtype: int64
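Reading the answer off the `value_counts()` output works, but `.idxmax()` on the counts (or `.mode()`) returns the label directly. Here is a sketch on hypothetical toy data, selecting with a single `.loc[mask, column]` instead of chaining two bracket lookups:

```python
import pandas as pd

# Hypothetical toy data: three unemployed people and one service worker
sample = pd.DataFrame({
    "job": ["unemployed", "unemployed", "unemployed", "services"],
    "education": ["secondary", "secondary", "tertiary", "primary"],
})

mask = sample["job"] == "unemployed"
# .loc with a boolean mask and a column name avoids chained indexing
edu = sample.loc[mask, "education"]

print(edu.value_counts().idxmax())  # secondary
print(edu.mode().iloc[0])           # secondary; mode() returns every tied value
```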

TASK: What is the average (mean) age of the people who are unemployed?

# CODE HERE

df[df['job']=='unemployed']['age'].mean()

40.90625
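Filtering answers one job at a time, while `groupby()` computes the mean age for every category in one pass. Here is a sketch on a toy frame built from the five rows shown earlier:

```python
import pandas as pd

# Toy frame from the five rows shown earlier
sample = pd.DataFrame({
    "job": ["unemployed", "services", "management", "management", "blue-collar"],
    "age": [30, 33, 35, 30, 59],
})

# Mean age for every job category at once
mean_by_job = sample.groupby("job")["age"].mean()
print(mean_by_job["management"])  # 32.5
print(mean_by_job["unemployed"])  # 30.0
```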

Great Job! That's the end of this part.

Don't forget to give a star on GitHub and follow for more curated Computer Science and Machine Learning materials.