Python for Data Science: Getting Started

Python's simplicity, combined with powerful libraries like NumPy, pandas, and matplotlib, makes it a widely used language for data science and machine learning.

Essential Libraries

  • NumPy: Fast numerical computing with N-dimensional arrays
  • pandas: Data manipulation and analysis with DataFrames
  • matplotlib: Data visualization
  • scikit-learn: Machine learning algorithms
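
To make the first two entries concrete, here is a minimal sketch of a NumPy array and a pandas DataFrame built from the same values. The variable names and numbers are made up for illustration, and matplotlib and scikit-learn are left out to keep it short.

```python
import numpy as np
import pandas as pd

# NumPy: a 2-D array (2 rows, 3 columns) with vectorized arithmetic
arr = np.array([[1, 2, 3], [4, 5, 6]])
doubled = arr * 2              # element-wise multiply, no explicit loop
col_means = arr.mean(axis=0)   # mean of each column

# pandas: the same values as a DataFrame with labeled columns
# (the column names 'a', 'b', 'c' are illustrative)
frame = pd.DataFrame(arr, columns=["a", "b", "c"])
row_totals = frame.sum(axis=1)  # sum across the columns of each row

print(arr.shape)
print(col_means)
print(row_totals.tolist())
```

The array gives fast element-wise math, while the DataFrame adds column labels on top of the same data.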

Loading and Exploring Data

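import pandas as pd

# Load the CSV file into a DataFrame
df = pd.read_csv('data.csv')

print(df.head())      # first five rows
print(df.describe())  # summary statistics for numeric columns
df.info()             # column types and non-null counts; prints directly and returns None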
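
The snippet above needs a data.csv file on disk. As a self-contained sketch, the same kind of exploration can be tried on an in-memory CSV. The contents of csv_text and the column names (name, age, city) are hypothetical stand-ins, not the real file.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for 'data.csv'
csv_text = """name,age,city
Ana,34,Lisbon
Ben,,Oslo
Cy,29,Lisbon
"""

# read_csv accepts a file-like object as well as a path
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)    # (rows, columns)
print(sample.dtypes)   # column types inferred by read_csv

# Count missing values per column; a useful check before cleaning
missing = sample.isna().sum()
print(missing)

# Frequency of each value in a categorical column
city_counts = sample["city"].value_counts()
print(city_counts)
```

Checking isna().sum() early shows which columns will need attention in the cleaning step.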

Data Cleaning

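# Fill missing values in 'age' with the column mean.
# Assign the result back: fillna(inplace=True) on a selected column is a
# chained assignment that recent pandas versions warn about and that may not update df.
df['age'] = df['age'].fillna(df['age'].mean())

# Drop rows that still have missing values in other columns.
# Do this after filling; dropping first would leave nothing to fill.
df.dropna(inplace=True)

# Remove duplicate rows
df.drop_duplicates(inplace=True)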
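
To see how these cleaning steps interact, here is a small sketch on a made-up DataFrame. The 'name' column and all values are hypothetical. It uses the assign-back style rather than inplace=True, which is one reasonable choice, not the only one.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two missing ages, one missing name, one repeated row
raw = pd.DataFrame({
    "name": ["Ana", "Ben", "Ben", "Cy", None],
    "age": [34.0, np.nan, np.nan, 29.0, 41.0],
})

cleaned = raw.copy()  # leave the original untouched

# 1. Fill missing ages with the mean of the known ages
cleaned["age"] = cleaned["age"].fillna(cleaned["age"].mean())

# 2. Drop rows that still contain a missing value (the row with no name)
cleaned = cleaned.dropna()

# 3. Remove exact duplicate rows (the two 'Ben' rows match after filling)
cleaned = cleaned.drop_duplicates().reset_index(drop=True)

print(cleaned)
```

Filling before dropping keeps the rows whose only gap was 'age'. Running dropna first would discard them and leave fillna with nothing to do.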