Friday, 27 November 2020

Imputer in Python

# Check for null values in the data set. If there are any, impute the missing values using SimpleImputer (hint: use strategy='median'), then find the mean of 'enginesize' after imputing.
import pandas as pd
import numpy as np
# Importing the SimpleImputer class 
from sklearn.impute import SimpleImputer
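# df is assumed to be a DataFrame loaded earlier that has an 'enginesize' column
print(df.isnull().sum())  # number of missing values in each column
imputer = SimpleImputer(missing_values=np.nan, strategy='median')
# SimpleImputer expects 2D input, so pass a one-column DataFrame and flatten the result
df['enginesize'] = imputer.fit_transform(df[['enginesize']]).ravel()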
df['enginesize'].mean()
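
The snippet above relies on a DataFrame `df` that is loaded elsewhere. As a self-contained sketch of the same median imputation, the example below builds a small made-up frame and runs the same steps. The `enginesize` values and the `carname` column are invented for illustration and are not taken from the original data set.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy data: two 'enginesize' values are missing
df = pd.DataFrame({
    "carname": ["a", "b", "c", "d", "e", "f"],
    "enginesize": [130, np.nan, 152, 109, np.nan, 136],
})

# Count missing values per column before imputing
missing_before = df.isnull().sum()
print(missing_before)

# The median of the observed values (109, 130, 136, 152) is 133
imputer = SimpleImputer(missing_values=np.nan, strategy="median")
df["enginesize"] = imputer.fit_transform(df[["enginesize"]]).ravel()

missing_after = df["enginesize"].isnull().sum()
mean_after = df["enginesize"].mean()
print(missing_after, round(mean_after, 3))
```

Because the median ignores how far the extreme values lie from the centre, it is usually a safer fill value than the mean when a column is skewed.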

Thursday, 22 October 2020

Different Types of Calculation in Python Using the NumPy and Pandas Libraries

import pandas as pd
import numpy as np
from scipy.stats import kurtosis
from scipy.stats import skew

my_array = []
a = int(input("Size of array:"))  # size of array
for i in range(a):
    my_array.append(int(input("Element : ")))  # appending to array
my_array = np.array(my_array)  # storing into a numpy array
print(list(my_array))
x = np.round(my_array, 3)  # rounding to 3 decimal places
y = skew(x)  # calculating skewness value
print(round(y, 3))
z = kurtosis(x)  # calculating kurtosis value
print(round(z, 3))
print(np.var(list(x)))  # calculating variance
print(round(np.std(x), 3))  # calculating std deviation
Output :
Size of array:5
Element : 24
Element : 567
Element : 70
Element : 4
Element : 45
[24, 567, 70, 4, 45]
1.461
0.197
45637.2
213.629
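
These figures depend on default conventions that are easy to miss: `np.var` and `np.std` divide by n (population formulas) unless `ddof=1` is passed, and `scipy.stats.kurtosis` reports excess (Fisher) kurtosis unless `fisher=False` is passed. A minimal sketch using the same five numbers, hard-coded here instead of read from `input()`:

```python
import numpy as np
from scipy.stats import kurtosis, skew

data = np.array([24, 567, 70, 4, 45])

pop_var = np.var(data)                       # divides by n
sample_var = np.var(data, ddof=1)            # divides by n - 1
excess_kurt = kurtosis(data)                 # Fisher definition: a normal distribution gives 0
pearson_kurt = kurtosis(data, fisher=False)  # Pearson definition: a normal distribution gives 3

print(round(pop_var, 1), round(sample_var, 1))
print(round(skew(data), 3), round(excess_kurt, 3), round(pearson_kurt, 3))
```

For a sample this small the two variance conventions differ noticeably, so it is worth stating which one a reported number uses.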

Friday, 16 October 2020

Different libraries in Python


Python’s statistics module is part of the standard library and provides descriptive statistics. You can use it if your datasets are not too large or if you can’t rely on importing other libraries.
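
As a rough illustration of what the standard library alone can do, the sketch below computes a few descriptive statistics with `statistics`. The sample numbers are chosen only for demonstration.

```python
import statistics

data = [24, 567, 70, 4, 45]

mean_value = statistics.mean(data)         # arithmetic mean
median_value = statistics.median(data)     # middle value of the sorted data
pop_variance = statistics.pvariance(data)  # population variance (divides by n)
pop_stdev = statistics.pstdev(data)        # population standard deviation

print(mean_value, median_value, pop_variance, round(pop_stdev, 3))
```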

NumPy is a third-party library for numerical computing, optimized for working with single- and multi-dimensional arrays. Its primary type is the array type called ndarray. This library contains many routines for statistical analysis.
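
A small sketch of the `ndarray` type and a few of its statistical routines; the array contents are arbitrary example values.

```python
import numpy as np

# A 2D ndarray with 2 rows and 3 columns
arr = np.array([[1, 2, 3],
                [4, 5, 6]])

print(type(arr).__name__, arr.shape)  # class name and (rows, columns)

col_means = arr.mean(axis=0)  # one mean per column
row_means = arr.mean(axis=1)  # one mean per row
overall_std = arr.std()       # population standard deviation of all six values
print(col_means, row_means, round(overall_std, 3))
```

The `axis` argument selects the dimension that is collapsed: `axis=0` aggregates down the rows and `axis=1` aggregates across the columns.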

SciPy is a third-party library for scientific computing based on NumPy. It offers additional functionality compared to NumPy, including scipy.stats for statistical analysis.
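
One example of the extra functionality in `scipy.stats` is `describe`, which returns several summary statistics in a single call. The sketch below uses arbitrary sample numbers; note that `describe` reports the sample variance (dividing by n - 1) by default.

```python
import numpy as np
from scipy import stats

data = np.array([24, 567, 70, 4, 45])
summary = stats.describe(data)

print(summary.nobs, summary.minmax)    # number of observations, (min, max)
print(summary.mean, summary.variance)  # mean and sample variance
print(round(summary.skewness, 3), round(summary.kurtosis, 3))
```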

Pandas is a third-party library for data analysis built on NumPy. It excels at handling labeled one-dimensional (1D) data with Series objects and labeled two-dimensional (2D) data with DataFrame objects.
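
To make the idea of labeled data concrete, here is a minimal sketch with one Series and one DataFrame. The labels (`car_a`, `car_b`, `car_c`) and all values are hypothetical.

```python
import pandas as pd

# 1D labeled data: a Series with a custom index
sizes = pd.Series([130, 152, 109], index=["car_a", "car_b", "car_c"], name="enginesize")
print(sizes["car_b"])  # look up a value by its label

# 2D labeled data: a DataFrame with row and column labels
cars = pd.DataFrame(
    {"enginesize": [130, 152, 109], "horsepower": [111, 154, 102]},
    index=["car_a", "car_b", "car_c"],
)
print(cars.loc["car_c", "horsepower"])  # row label, then column label

column_means = cars.mean()  # a Series with one mean per column
print(column_means)
```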

Matplotlib is a third-party library for data visualization. It works well in combination with NumPy, SciPy, and Pandas.
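
A short sketch of Matplotlib working with a NumPy array: a bar chart of a few values with a dashed line at their mean. The data, the output file name `values.png`, and the choice of the non-interactive `Agg` backend are all assumptions made for this example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

data = np.array([24, 567, 70, 4, 45])

fig, ax = plt.subplots()
ax.bar(range(len(data)), data)           # one bar per element
ax.axhline(data.mean(), linestyle="--")  # dashed horizontal line at the mean
ax.set_xlabel("Element index")
ax.set_ylabel("Value")
fig.savefig("values.png")                # hypothetical output file name
```

In an interactive session, the backend line can be dropped and `plt.show()` used in place of `fig.savefig`.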