Friday 27 November 2020

Imputer in Python

# Check the data set for null values. If there are any, impute them with SimpleImputer
# (hint: use the 'median' strategy), then find the mean of 'enginesize' after imputing.
import pandas as pd
import numpy as np
# Import the SimpleImputer class
from sklearn.impute import SimpleImputer
# df is assumed to be an already loaded DataFrame with an 'enginesize' column
print(df.isnull().sum())  # number of missing values per column
imputer = SimpleImputer(missing_values=np.nan, strategy='median')
# fit_transform expects 2D input, so pass a one-column frame and take the first column back
df['enginesize'] = imputer.fit_transform(df[['enginesize']])[:, 0]
print(df['enginesize'].mean())
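
Since the post does not show the data set itself, here is a minimal self-contained sketch of the same step. The DataFrame and its values are made up for illustration; only the 'enginesize' column name comes from the exercise.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical sample data; the real data set is not shown in this post.
sample = pd.DataFrame({"enginesize": [130.0, np.nan, 152.0, 109.0, np.nan, 136.0]})

# Count missing values per column before imputing.
missing_before = sample.isnull().sum()

# Replace each NaN with the median of the observed values in the column.
median_imputer = SimpleImputer(missing_values=np.nan, strategy="median")
sample[["enginesize"]] = median_imputer.fit_transform(sample[["enginesize"]])

missing_after = sample.isnull().sum()
mean_after = sample["enginesize"].mean()
print(missing_before["enginesize"], missing_after["enginesize"], mean_after)
```

The median is a reasonable default for skewed columns because, unlike the mean, it is not pulled toward extreme values.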

Thursday 22 October 2020

Different Types of Calculation in Python Using the NumPy and Pandas Libraries

import pandas as pd
import numpy as np
from scipy.stats import kurtosis
from scipy.stats import skew

my_array = []
a = int(input("Size of array:"))  # size of array
for i in range(a):
    my_array.append(int(input("Element : ")))  # appending to array
my_array = np.array(my_array)  # storing into a numpy array
print(list(my_array))
x = np.round(my_array, 3)  # rounding to 3 decimal places
y = skew(x)  # calculating skewness value
print(round(y, 3))
z = kurtosis(x)  # calculating kurtosis value
print(round(z, 3))
print(np.var(list(x)))  # calculating variance
print(round(np.std(x), 3))  # calculating std deviation

Output:
Size of array:5
Element : 24
Element : 567
Element : 70
Element : 4
Element : 45
[24, 567, 70, 4, 45]
1.461
0.197
45637.2
213.629
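
A note on what these numbers mean: np.var and np.std default to the population formulas (ddof=0), and scipy.stats.kurtosis reports excess kurtosis by default (fisher=True). The sketch below recomputes the same statistics by hand from the central moments, as a rough cross-check under those defaults; the variable names are illustrative.

```python
import numpy as np

data = np.array([24, 567, 70, 4, 45])
deviations = data - data.mean()

# Population variance: mean of squared deviations (matches np.var with ddof=0).
pop_var = np.mean(deviations ** 2)
# Sample variance divides by n - 1 instead of n (np.var with ddof=1).
sample_var = np.var(data, ddof=1)

# Moment-based skewness: third central moment over the variance to the power 1.5.
skewness = np.mean(deviations ** 3) / pop_var ** 1.5
# Excess kurtosis: fourth central moment over the squared variance, minus 3.
excess_kurt = np.mean(deviations ** 4) / pop_var ** 2 - 3

print(round(pop_var, 1), round(sample_var, 1), round(skewness, 3), round(excess_kurt, 3))
```

If the five values are treated as a sample rather than the whole population, pass ddof=1 to np.var and np.std.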

Friday 16 October 2020

Different libraries in Python


Python’s statistics module is a built-in library for descriptive statistics. You can use it if your datasets are not too large or if you can’t rely on importing other libraries.
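
As a small illustration, the sketch below uses only the standard library on a short list of made-up values.

```python
import statistics

values = [24, 567, 70, 4, 45]

mean_value = statistics.mean(values)         # arithmetic mean
median_value = statistics.median(values)     # middle value of the sorted data
pop_variance = statistics.pvariance(values)  # population variance (divides by n)
sample_stdev = statistics.stdev(values)      # sample standard deviation (divides by n - 1)

print(mean_value, median_value, pop_variance, round(sample_stdev, 3))
```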

NumPy is a third-party library for numerical computing, optimized for working with single- and multi-dimensional arrays. Its primary type is the array type called ndarray. This library contains many routines for statistical analysis.
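
A minimal sketch of ndarray statistics, using a made-up 2x3 array. The axis argument selects whether a routine reduces over rows or columns.

```python
import numpy as np

# A two-dimensional ndarray with 2 rows and 3 columns.
matrix = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])

overall_mean = matrix.mean()        # mean of all six elements
column_means = matrix.mean(axis=0)  # one mean per column
row_std = matrix.std(axis=1)        # population standard deviation per row

print(matrix.shape, overall_mean, column_means, row_std)
```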

SciPy is a third-party library for scientific computing based on NumPy. It offers additional functionality compared to NumPy, including scipy.stats for statistical analysis.
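
For example, scipy.stats.describe collects several summary statistics in one call. The sketch below assumes a small made-up array; note that describe reports the sample variance (ddof=1) by default.

```python
import numpy as np
from scipy import stats

data = np.array([24, 567, 70, 4, 45])

# describe returns a result object with nobs, minmax, mean, variance, skewness, kurtosis.
summary = stats.describe(data)

print(summary.nobs, summary.minmax, summary.mean, summary.variance)
print(round(summary.skewness, 3), round(summary.kurtosis, 3))
```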

Pandas is a third-party library for numerical computing based on NumPy. It excels in handling labeled one-dimensional (1D) data with Series objects and two-dimensional (2D) data with DataFrame objects.
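
A short sketch of the two core objects, with made-up labels and values. A Series pairs each value with an index label, and a DataFrame is a table of labeled columns.

```python
import pandas as pd

# 1D labeled data: values addressed by index labels.
series = pd.Series([24, 567, 70], index=["a", "b", "c"])
label_lookup = series["b"]

# 2D labeled data: named columns sharing a row index.
frame = pd.DataFrame(
    {"x": [1, 2, 3], "y": [10.0, 20.0, 30.0]},
    index=["r1", "r2", "r3"],
)
column_means = frame.mean()  # one mean per column, returned as a Series

print(label_lookup)
print(column_means)
```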

Matplotlib is a third-party library for data visualization. It works well in combination with NumPy, SciPy, and Pandas.
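
As a rough sketch of that combination, the snippet below draws a histogram of a made-up NumPy array. The Agg backend is chosen here only so the example runs without a display; it is an assumption, not a requirement.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

values = np.array([24, 567, 70, 4, 45])

fig, ax = plt.subplots()
# hist returns the bin counts, the bin edges, and the drawn bar patches.
counts, bin_edges, patches = ax.hist(values, bins=5)
ax.set_xlabel("value")
ax.set_ylabel("count")
ax.set_title("Histogram of the sample values")
```

In an interactive session, drop the matplotlib.use line and call plt.show() to display the figure, or call fig.savefig with a file name to write it to disk.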