Sunday, November 11, 2018

Introduction to Data Science


Data science is the process of using data to understand different things, and ultimately to understand the world. It is the art of uncovering the insights and trends that are hiding behind data.

It's when you translate data into a story: you use storytelling to generate insights. With these insights, you can make strategic choices for a company or institution.

Applications of Data Science

Companies can now use the information they gather from their customers to develop new products that respond to their customers' needs.

Google Search is an application of data science.

Data science can also forecast and predict consumer behavior based on past purchase history.

A data scientist is someone who finds solutions to problems by analyzing big or small data using appropriate tools, and then tells stories to communicate her findings to the relevant stakeholders.

Python Basics:

Download Anaconda using the link below:
https://www.anaconda.com/download/
Install Jupyter (Jupyter Notebook is included in the Anaconda distribution)
Launch Jupyter Notebook

Lists – A list can simply be defined by writing comma-separated values in square brackets.
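A short sketch of defining, indexing and modifying a list; the list name and values are made up for illustration:

```python
# A list is written as comma-separated values inside square brackets
squares = [1, 4, 9, 16, 25]

print(squares[0])     # 1  (first element)
print(squares[-1])    # 25 (last element)
print(squares[1:3])   # [4, 9]  (slice: elements at index 1 and 2)

# Lists are mutable: elements can be replaced and new ones appended
squares[0] = 0
squares.append(36)
print(squares)        # [0, 4, 9, 16, 25, 36]
```

Unlike strings and tuples, a list can be changed in place after it is created.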

 

Strings – Strings can simply be defined using single ( ' ), double ( " ) or triple ( ''' or """ ) quotes. Please note that Python strings are immutable, so you cannot change part of a string.
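A minimal sketch of the three quoting styles and of what immutability means in practice (the variable names are just for illustration):

```python
# Single, double and triple quotes all create str objects
s1 = 'data'
s2 = "science"
s3 = '''a triple-quoted string
can span several lines'''

greeting = s1 + " " + s2
print(greeting)  # data science

# Strings are immutable: assigning to a position raises a TypeError
try:
    greeting[0] = "D"
except TypeError as err:
    print("Cannot modify a string in place:", err)

# Instead, build a new string from pieces of the old one
greeting = "D" + greeting[1:]
print(greeting)  # Data science
```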

 

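Commonly used string methods:
lower(): returns a lowercase copy
upper(): returns an uppercase copy
strip(): removes leading and trailing whitespace
isdigit(): checks whether all characters are digits
isspace(): checks whether all characters are whitespace
find(): returns the index of a substring, or -1 if it is not found
replace(): replaces occurrences of a substring
split(): splits a string into a list of parts
join(): joins a list of strings using the string as separator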
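Each of the methods listed above returns a new value and leaves the original string unchanged. A quick sketch on a made-up sample string, with the resulting values shown as comments:

```python
text = "  Data Science 101  "

lowered = text.lower()                     # '  data science 101  '
upper = text.upper()                       # '  DATA SCIENCE 101  '
stripped = text.strip()                    # 'Data Science 101'
is_num = "101".isdigit()                   # True: every character is a digit
is_blank = "   ".isspace()                 # True: every character is whitespace
pos = text.find("Science")                 # 7: index of the first match (-1 if absent)
replaced = stripped.replace("101", "201")  # 'Data Science 201'
words = stripped.split()                   # ['Data', 'Science', '101'] (splits on whitespace)
joined = "-".join(words)                   # 'Data-Science-101'

print(lowered, upper, stripped, is_num, is_blank, pos, replaced, words, joined, sep="\n")
```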

 

 

Tuples – A tuple is written as a number of values separated by commas. Tuples are immutable, and on output they are always enclosed in parentheses so that nested tuples are interpreted correctly. Even though tuples themselves are immutable, they can hold mutable objects, such as lists, if needed.
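A small sketch of the tuple behavior described above: creation without parentheses, nested output, immutability, and a mutable list stored inside a tuple (names and values are illustrative):

```python
# Values separated by commas form a tuple; parentheses are optional on input
point = 3, 4
print(point)            # (3, 4)

# Nested tuples are printed with parentheses so the structure stays clear
nested = point, (5, 6)
print(nested)           # ((3, 4), (5, 6))

# Tuples are immutable...
try:
    point[0] = 10
except TypeError:
    print("tuples do not support item assignment")

# ...but they can hold mutable objects, and those objects can still change
record = ("scores", [90, 85])
record[1].append(77)
print(record)           # ('scores', [90, 85, 77])

# Tuple unpacking assigns each element to its own variable
x, y = point
```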

 

Dictionary – A dictionary is a set of key: value pairs, with the requirement that the keys are unique (within one dictionary). Dictionaries were unordered in older Python versions; since Python 3.7 they preserve insertion order. A pair of braces creates an empty dictionary: {}.
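A brief sketch of creating and updating a dictionary; the names and ages are invented for illustration. Note how assigning to an existing key replaces its value, since keys are unique:

```python
empty = {}                   # a pair of braces creates an empty dictionary
ages = {"Alice": 31, "Bob": 27}

ages["Carol"] = 45           # add a new key: value pair
ages["Alice"] = 32           # an existing key is updated, not duplicated
print(ages["Bob"])           # 27
print("Dave" in ages)        # False: membership tests look at keys
print(ages.get("Dave", 0))   # 0: get() returns a default for a missing key

# Iterate over key: value pairs
for name, age in ages.items():
    print(name, age)
```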

 

The first step in using Python libraries is to import them into our environment. There are several ways of doing so in Python:
import math as m

Here we have defined an alias m for the math library. We can now use various functions from the math library (e.g. factorial) by referencing them through the alias: m.factorial().
from math import *

Here you have imported the entire namespace of math, i.e. you can directly use factorial() without referring to math.
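The import styles above can be compared side by side in one short sketch. The plain `import math` and `from math import factorial` forms are two additional common variants, shown here for completeness:

```python
import math                   # plain import: functions are accessed as math.<name>
import math as m              # alias: functions are accessed as m.<name>
from math import factorial    # import a single name into the current namespace
from math import *            # import every public name from math

print(math.factorial(5))  # 120
print(m.factorial(5))     # 120
print(factorial(5))       # 120
print(sqrt(16))           # 4.0, usable directly because of the wildcard import
```

The wildcard form is convenient in a quick notebook, but in larger scripts it can silently overwrite names you already defined, so the alias style is usually the safer habit.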
 
Exploratory analysis in Python using Pandas

Pandas is one of the most useful data analysis libraries in Python. It has been instrumental in increasing the use of Python in the data science community.

Importing libraries and the data set:

Following are the libraries we will use during this tutorial:
  • numpy
  • matplotlib
  • pandas
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

df = pd.read_csv("C:/PYTHON/train.csv")
 
NOTE: Download the file (train.csv) using the link
 
Quick Data Exploration

Once you have read the dataset, you can look at the first few rows by using the head() function:
df.head(10)
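If you want to try these commands before downloading train.csv, a sketch like the one below can help. It assumes a few hypothetical rows with made-up values, using column names the tutorial mentions; since read_csv accepts any file-like object, io.StringIO stands in for the file:

```python
import io

import pandas as pd

# Hypothetical sample rows (made-up values), not the real train.csv contents
csv_text = """Education,ApplicantIncome,Credit_History,Property_Area
Graduate,5000,1,Urban
Not Graduate,3200,1,Rural
Graduate,4100,0,Urban
Graduate,15000,1,Semiurban
Not Graduate,2800,,Urban
"""

# The empty Credit_History field in the last row is read as a missing value (NaN)
df = pd.read_csv(io.StringIO(csv_text))

print(df.head(3))   # the first 3 rows
print(df.shape)     # (5, 4): 5 rows, 4 columns
```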
 
You can look at a summary of the numerical fields by using the describe() function:

df.describe()

The describe() function provides the count, mean, standard deviation (std), min, quartiles (25%, 50%, 75%) and max in its output.
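To see where those numbers come from, here is a sketch on a small, hypothetical income column (made-up values). The 50% row is the median, which can differ a lot from the mean when a few values are very large:

```python
import pandas as pd

# Hypothetical incomes (made-up values) for illustration only
df = pd.DataFrame({"ApplicantIncome": [2800, 3200, 4100, 5000, 15000]})

summary = df.describe()
print(summary)

# Rows of the output: count, mean, std, min, 25%, 50%, 75%, max
income = df["ApplicantIncome"]
print(summary.loc["mean", "ApplicantIncome"], income.mean())    # both 6020.0
print(summary.loc["50%", "ApplicantIncome"], income.median())   # both 4100.0
```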

For the non-numerical values (e.g. Property_Area, Credit_History, etc.), we can look at the frequency distribution to understand whether they make sense. The frequency table can be printed with the following command:

df['Property_Area'].value_counts()
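A sketch of what value_counts() returns, assuming a hypothetical Property_Area column with made-up values. The result is sorted from the most to the least frequent value:

```python
import pandas as pd

# Hypothetical Property_Area values (made-up) for illustration
df = pd.DataFrame({"Property_Area": ["Urban", "Rural", "Urban", "Semiurban", "Urban", "Rural"]})

counts = df["Property_Area"].value_counts()
print(counts)   # Urban 3, Rural 2, Semiurban 1

# normalize=True returns proportions instead of raw counts
print(df["Property_Area"].value_counts(normalize=True))
```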

Distribution analysis:

Let's start by plotting the histogram of ApplicantIncome using the following command:

df['ApplicantIncome'].hist(bins=50)
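A histogram splits the range of values into equal-width bins and counts how many values fall into each. The sketch below computes those counts with numpy.histogram on a hypothetical income column (made-up values) instead of drawing them; it uses 5 bins rather than 50 only to keep the printout short:

```python
import numpy as np
import pandas as pd

# Hypothetical incomes (made-up values); note the single very large one
income = pd.Series([2800, 3200, 4100, 5000, 15000, 3900, 4700, 6100, 2500, 3600])

# 5 equal-width bins between the minimum (2500) and maximum (15000)
counts, edges = np.histogram(income.dropna(), bins=5)
for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:8.0f} - {right:8.0f}: {n}")
```

Most values pile up in the first bin while one value sits alone at the far right, which is the long right tail you would also see in the plotted histogram.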

Next, we look at box plots to understand the distribution. A box plot for ApplicantIncome can be plotted by:

df.boxplot(column='ApplicantIncome')
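A box plot draws the quartiles (Q1, median, Q3) and, with matplotlib's default whisker setting, marks as outliers the points more than 1.5 × IQR beyond the box. The sketch below computes those quantities by hand for a hypothetical income column (made-up values):

```python
import pandas as pd

# Hypothetical incomes (made-up values) for illustration
income = pd.Series([2800, 3200, 4100, 5000, 15000, 3900, 4700, 6100, 2500, 3600])

q1, median, q3 = income.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1                      # interquartile range: the height of the box
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Points outside the fences are drawn individually as outliers
outliers = income[(income < lower_fence) | (income > upper_fence)]
print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
print("outliers:", outliers.tolist())   # [15000]
```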

The data contains people with different education levels. Let us segregate them by Education:

df.boxplot(column='ApplicantIncome', by='Education')

 

We can see that there is no substantial difference between the median income of graduates and non-graduates. But there is a higher number of graduates with very high incomes, which appear as outliers.
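The same comparison can be made numerically with groupby. This sketch assumes hypothetical rows with made-up values, chosen so that two graduates have very high incomes; it shows how a few outliers pull the group mean far above the median while the medians stay close:

```python
import pandas as pd

# Hypothetical rows (made-up values) for illustration
df = pd.DataFrame({
    "Education": ["Graduate", "Not Graduate", "Graduate", "Graduate",
                  "Not Graduate", "Graduate", "Not Graduate", "Graduate"],
    "ApplicantIncome": [4100, 3200, 5000, 15000, 3900, 3600, 4000, 20000],
})

# Summary statistics of ApplicantIncome for each Education group
stats = df.groupby("Education")["ApplicantIncome"].agg(["count", "mean", "median", "max"])
print(stats)
# Graduate:     count 5, mean 9540.0, median 5000.0, max 20000
# Not Graduate: count 3, mean 3700.0, median 3900.0, max 4000
```

Because the median ignores how extreme the largest values are, it is the better summary to compare when a group contains outliers like these.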

To be continued...


 

