Introduction to Data Science
Data science is the process of using data to understand the world. It is the art of uncovering the insights and trends hidden in data.
It means translating data into a story and using that storytelling to generate insights. With these insights, you can make strategic choices for a company or institution.
Application of Data Science
Companies can now use the information they gather from their customers to develop new products that respond to those customers' needs.
Google Search is an application of data science.
Data science can forecast and predict consumer behavior based on past purchase history.
A data scientist is someone who finds solutions to problems by analyzing big or small data using appropriate tools and then tells stories to communicate her findings to the relevant stakeholders.
Python Basics:
Download Anaconda using the link below:
https://www.anaconda.com/download/
Install Jupyter
Launch Jupyter Notebook
Lists – A list can be defined by writing comma-separated values in square brackets.
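As a small sketch of the above (the variable names and values are made up for illustration), a list can be created, indexed and changed like this:

```python
# A list is written as comma-separated values inside square brackets.
squares = [0, 1, 4, 9, 16]

# Items are read by zero-based index; negative indexes count from the end.
first = squares[0]
last = squares[-1]

# Lists are mutable: an item can be replaced and new items appended.
squares[2] = 40
squares.append(25)

print(first, last)   # 0 16
print(squares)       # [0, 1, 40, 9, 16, 25]
```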
Strings – Strings can be defined using single ( ' ), double ( " ) or triple ( ''' ) quotes. Note that Python strings are immutable, so you cannot change part of a string. Commonly used string methods include:
lower()
upper()
strip()
isdigit()
isspace()
find()
replace()
split()
join()
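The string methods listed above can be tried out on a sample value. This is only a sketch; the sample text and variable names are assumptions made for illustration:

```python
text = "  Data Science 101  "

cleaned = text.strip()                       # remove leading/trailing whitespace
lowered = cleaned.lower()                    # all lower case
uppered = cleaned.upper()                    # all upper case
position = cleaned.find("Science")           # index of first match, -1 if absent
swapped = cleaned.replace("101", "Basics")   # returns a new string
words = cleaned.split()                      # split on whitespace into a list
joined = "-".join(words)                     # glue list items with a separator

is_number = words[2].isdigit()               # True: "101" contains only digits
is_blank = "   ".isspace()                   # True: only whitespace characters

# Strings are immutable, so assigning to a position raises TypeError.
try:
    cleaned[0] = "d"
    raised = False
except TypeError:
    raised = True

print(cleaned, position, words, joined, raised)
```

Each method returns a new value and leaves the original string unchanged, which is what immutability means in practice.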
Tuples – A tuple is represented by a number of values separated by commas. Tuples are immutable, and the output is surrounded by parentheses so that nested tuples are processed correctly. Additionally, even though tuples are immutable, they can hold mutable data if needed.
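A minimal sketch of these tuple properties, using made-up names and values:

```python
# Values separated by commas form a tuple; parentheses are optional on input.
point = 3, 4
nested = point, (5, 6)

print(point)    # (3, 4)
print(nested)   # ((3, 4), (5, 6))

# Tuples are immutable: replacing an item raises TypeError.
try:
    point[0] = 10
    raised = False
except TypeError:
    raised = True

# A tuple can still hold a mutable object, such as a list, and that
# object can change even though the tuple itself cannot.
record = ("scores", [90, 85])
record[1].append(70)

print(raised)   # True
print(record)   # ('scores', [90, 85, 70])
```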
Dictionary – A dictionary is a set of key: value pairs, with the requirement that the keys are unique (within one dictionary). Since Python 3.7, dictionaries preserve the order in which keys were inserted. A pair of braces creates an empty dictionary: {}.
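For illustration, here is a short sketch of creating and using a dictionary. The keys and values are hypothetical:

```python
# A pair of braces creates an empty dictionary.
empty = {}

# Key: value pairs are written inside braces.
ages = {"alice": 30, "bob": 25}

# Assigning to a new key adds a pair.
ages["carol"] = 41

# Keys are unique, so assigning to an existing key overwrites its value.
ages["bob"] = 26

bob_age = ages["bob"]        # look up a value by key
missing = ages.get("dave")   # get() returns None when the key is absent

print(len(empty), bob_age, missing)   # 0 26 None
print(list(ages.keys()))              # ['alice', 'bob', 'carol']
```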

The first step is to learn to import libraries into our environment. There are several ways of doing so in Python:
import math as m – Here we have defined an alias m for the library math. We can now use various functions from the math library (e.g. factorial) by referencing them through the alias: m.factorial().
from math import * – Here you have imported the entire namespace of math, i.e. you can use factorial() directly without referring to math.
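The two import styles can be compared in a short sketch. Note one assumption: instead of the star form, this example imports two names explicitly, which behaves the same way for those names but avoids pulling every public name of math into the namespace:

```python
# Style 1: import the library under an alias and reference it through the alias.
import math as m

via_alias = m.factorial(5)

# Style 2: import names directly, so no prefix is needed.
# "from math import *" would make factorial, sqrt and every other public
# name in math available in the same prefix-free way.
from math import factorial, sqrt

direct = factorial(5)
root = sqrt(16)

print(via_alias, direct, root)   # 120 120 4.0
```

The alias form makes it clear where each function comes from, which helps when several libraries define functions with the same name.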
Exploratory analysis in Python using Pandas
Pandas is one of the most useful data analysis libraries in Python. It has been instrumental in increasing the use of Python in the data science community.
Importing libraries and the data set:
Following are the libraries we will use during this tutorial:
- numpy
- matplotlib
- pandas
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
df = pd.read_csv("C:/PYTHON/train.csv")
NOTE: Download the file (train.csv) using the link
Quick Data Exploration
Once you have read the dataset, you can look at the top few rows using the function head():
df.head(10)
You can look at a summary of the numerical fields using the describe() function:
df.describe()
The describe() function provides count, mean, standard deviation (std), min, quartiles and max in its output.
For the non-numerical values (e.g. Property_Area, Credit_History etc.), we can look at the frequency distribution to understand whether they make sense or not. The frequency table can be printed with the following command:
df['Property_Area'].value_counts()
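If train.csv is not at hand, the same exploration calls can be tried on a tiny stand-in table. This is only a sketch: the four rows below are made up for illustration and are not taken from the real data set.

```python
import pandas as pd

# Hypothetical rows standing in for train.csv (values are invented).
df = pd.DataFrame({
    "ApplicantIncome": [5000, 3000, 4000, 6000],
    "Property_Area": ["Urban", "Rural", "Urban", "Semiurban"],
})

top_rows = df.head(2)                             # first two rows
summary = df.describe()                           # numeric columns only by default
area_counts = df["Property_Area"].value_counts()  # frequency of each category

print(top_rows)
print(summary)
print(area_counts)
```

Here describe() summarizes only ApplicantIncome, because Property_Area holds text, while value_counts() reports how often each area appears.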
Distribution analysis:
Let's start by plotting the histogram of ApplicantIncome using the following command:
df['ApplicantIncome'].hist(bins=50)
Next, we look at box plots to understand the distributions. The box plot for ApplicantIncome can be plotted by:
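To see what the bins argument does, the counting behind a histogram can be sketched with numpy. The income values are made up, and 4 bins are used instead of 50 to keep the output readable:

```python
import numpy as np

# Hypothetical incomes (invented values, not from train.csv).
incomes = np.array([1000, 2000, 2500, 3000, 9000])

# Split the range from min to max into 4 equal-width bins and count
# how many values fall into each one.
counts, edges = np.histogram(incomes, bins=4)

print(edges)    # bin boundaries: 1000, 3000, 5000, 7000, 9000
print(counts)   # values per bin: 3, 1, 0, 1
```

A single very large value stretches the range and leaves empty bins in between, which is the kind of pattern that hints at extreme values in the income column.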
df.boxplot(column='ApplicantIncome')
Next, we look at people with different education levels. Let us segregate them by Education:
df.boxplot(column='ApplicantIncome', by='Education')

We can see that there is no substantial difference between the mean income of graduates and non-graduates. But there are a higher number of graduates with very high incomes, which appear as outliers.
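A comparison like this can also be checked numerically with groupby(). The sketch below uses a few invented rows, so its numbers only illustrate the method and do not reproduce the result from train.csv:

```python
import pandas as pd

# Hypothetical rows (invented values) with one very high graduate income.
df = pd.DataFrame({
    "Education": ["Graduate", "Graduate", "Graduate",
                  "Not Graduate", "Not Graduate", "Not Graduate"],
    "ApplicantIncome": [4000, 5000, 30000, 3500, 4500, 5500],
})

by_education = df.groupby("Education")["ApplicantIncome"]

medians = by_education.median()   # the middle line of each box in a box plot
maxima = by_education.max()       # very high values show up here

print(medians)
print(maxima)
```

In this made-up sample the medians of the two groups are close, while the maximum for graduates is far higher, which is the pattern a box plot shows as an outlier.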
To be continued...


