Python's simplicity, combined with powerful libraries like NumPy, pandas, and matplotlib, makes it a go-to language for data science and machine learning.
Essential Libraries
- NumPy: Fast numerical computing with N-dimensional arrays
- pandas: Data manipulation and analysis with DataFrames
- matplotlib: Data visualization
- scikit-learn: Machine learning algorithms
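As a quick illustration of the first two entries, the sketch below builds a small NumPy array and wraps it in a pandas DataFrame. The values and the column names `height_cm` and `weight_kg` are made up for the example.

```python
import numpy as np
import pandas as pd

# A 2-D NumPy array: 3 rows, 2 columns (illustrative values only)
arr = np.array([[170.0, 65.0],
                [180.0, 80.0],
                [160.0, 55.0]])

# Vectorized arithmetic: column means without an explicit loop
col_means = arr.mean(axis=0)

# Wrap the same data in a DataFrame to get labeled columns
people = pd.DataFrame(arr, columns=["height_cm", "weight_kg"])

print(col_means)
print(people)
```

The array holds the raw numbers, while the DataFrame adds column labels and an index on top of them.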
Loading and Exploring Data
import pandas as pd
df = pd.read_csv('data.csv')
print(df.head())
print(df.describe())
df.info()  # info() prints its summary directly and returns None
Data Cleaning
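Before deciding whether to drop or fill, it can help to measure how much data is missing. The sketch below is a minimal example that uses a small in-memory DataFrame as a stand-in for `data.csv`; the `name` column and all values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the DataFrame loaded from data.csv
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy", "Ana"],
    "age": [34.0, np.nan, 29.0, 34.0],
})

# Number of missing values in each column
missing_per_column = df.isna().sum()
print(missing_per_column)

# Number of rows that exactly repeat an earlier row
duplicate_rows = df.duplicated().sum()
print(duplicate_rows)
```

If only a few rows are affected, dropping them is often acceptable. If a column has many gaps, filling them may preserve more of the data.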
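# Option 1: drop every row that contains a missing value
df = df.dropna()
# Option 2 (instead of dropping): fill missing ages with the column mean.
# Filling after dropna() has no effect, because no missing values remain.
df['age'] = df['age'].fillna(df['age'].mean())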
# Remove duplicates
df.drop_duplicates(inplace=True)
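The cleaning steps above can also be written without `inplace=True`, so that the original data stays untouched. This is a minimal sketch under stated assumptions: the sample rows, the `name` column, and the variable names `raw` and `cleaned` are hypothetical and stand in for whatever `data.csv` actually contains.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for data.csv
raw = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy", "Ana"],
    "age": [30.0, np.nan, 36.0, 30.0],
})

# Fill missing ages with the column mean (NaN is skipped when averaging)
mean_age = raw["age"].mean()
cleaned = raw.assign(age=raw["age"].fillna(mean_age))

# Drop exact duplicate rows and renumber the index
cleaned = cleaned.drop_duplicates().reset_index(drop=True)

print(cleaned)
```

Note that the mean here is computed before duplicates are removed, so repeated rows influence it. Swap the two steps if duplicates should not count toward the average.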