📊 Data Science

Pandas Tutorial – Data Analysis with Python

Pandas is a widely used Python library for data analysis. It provides two main data structures: Series (1D) and DataFrame (2D table). Pandas can read and write CSV, Excel, JSON, and SQL data, and it provides SQL-like operations (filter, groupby, join) on that data once it is loaded.
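
As a rough illustration of the read/write and join operations mentioned above, the sketch below writes a small DataFrame to CSV text in memory, reads it back, and left-joins it against a lookup table. The names `people`, `countries`, and `buffer` are hypothetical and exist only for this example; the in-memory buffer stands in for a real file path.

```python
import io
import pandas as pd

# Hypothetical sample data for illustration only
people = pd.DataFrame({
    "name": ["Alice", "Bob"],
    "city": ["London", "Paris"],
})

# Write to CSV text in memory (a file path would work the same way)
buffer = io.StringIO()
people.to_csv(buffer, index=False)

# Read the CSV text back into a new DataFrame
restored = pd.read_csv(io.StringIO(buffer.getvalue()))
print(restored.equals(people))

# SQL-style left join against a hypothetical lookup table
countries = pd.DataFrame({
    "city": ["London", "Paris"],
    "country": ["UK", "France"],
})
joined = restored.merge(countries, on="city", how="left")
print(joined)
```

Passing `index=False` to `to_csv` keeps the row labels out of the file, so the round trip does not pick up an extra unnamed column.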

⏱️ 35 min read · 🎯 Advanced · 📅 Updated 2026

Pandas Series

A Series is a 1D labelled array, like a Python list with an index.

Python
import pandas as pd

# Create from list
scores = pd.Series([85, 92, 78, 95, 88], 
                   name="scores",
                   index=["Alice","Bob","Charlie","Diana","Eve"])

print(scores)
print(scores["Alice"])   # 85
print(scores.mean())     # 87.6
print(scores[scores > 90])
▶ Output
Alice      85
Bob        92
Charlie    78
Diana      95
Eve        88
Name: scores, dtype: int64
85
87.6
Bob      92
Diana    95
Name: scores, dtype: int64
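
To make "a list with an index" more concrete, here is a minimal sketch of what the index adds: lookup by label as well as by position, and arithmetic that lines values up by label. The `bonus` Series is hypothetical data invented for this example.

```python
import pandas as pd

# Create from a dict: keys become the index labels
scores = pd.Series({"Alice": 85, "Bob": 92, "Charlie": 78}, name="scores")

# Hypothetical bonus points, listed in a different order and missing Bob
bonus = pd.Series({"Charlie": 5, "Alice": 2})

by_label = scores.loc["Bob"]      # look up by index label
by_position = scores.iloc[0]      # look up by integer position

# Arithmetic aligns on labels, not positions; fill_value=0 treats
# labels missing from one side as 0 instead of producing NaN
total = scores.add(bonus, fill_value=0)
print(total)
```

With a plain `scores + bonus`, Bob would come out as NaN because he has no entry in `bonus`; `add(..., fill_value=0)` is one way to avoid that.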

Creating DataFrames

A DataFrame is a 2D table with labelled rows and columns.

Python
import pandas as pd

# From dict of lists
df = pd.DataFrame({
    "name":  ["Alice", "Bob", "Charlie", "Diana"],
    "age":   [25, 30, 35, 28],
    "score": [88, 92, 78, 95],
    "city":  ["London", "Paris", "NYC", "London"]
})

print(df)
print(df.shape)    # (4, 4)
print(df.dtypes)
▶ Output
      name  age  score    city
0    Alice   25     88  London
1      Bob   30     92   Paris
2  Charlie   35     78     NYC
3    Diana   28     95  London
(4, 4)
name     object
age       int64
score     int64
city     object
dtype: object
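
A dict of lists is not the only possible input. As a small sketch, the same kind of table can be built from a list of row dicts, and one column can then be promoted to the row labels. The variable names `records`, `df_records`, and `df_indexed` are chosen for this example only.

```python
import pandas as pd

# Each dict is one row; keys become column names
records = [
    {"name": "Alice", "age": 25},
    {"name": "Bob", "age": 30},
]
df_records = pd.DataFrame(records)
print(df_records.shape)

# Use the "name" column as the row labels instead of 0, 1, 2, ...
df_indexed = df_records.set_index("name")
print(df_indexed.loc["Bob", "age"])
```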

Selecting Data

Use [] to select columns, .loc[] to select by label, and .iloc[] to select by integer position.

Python
# Select column
print(df["name"])           # Series
print(df[["name","score"]]) # DataFrame

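# Select rows by label (.loc); label slices include both endpoints
print(df.loc[0])            # Row labelled 0 (the first row here)
print(df.loc[0:2, ["name","score"]])  # Labels 0 through 2 (3 rows), 2 cols

# Select rows by position (.iloc); position slices exclude the end
print(df.iloc[0:3, 0:2])   # First 3 rows, first 2 cols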
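
With the default 0, 1, 2, ... index, labels and positions look the same, which can hide the difference between .loc and .iloc. The sketch below rebuilds a smaller table with hypothetical string labels ("a" to "d") so the two are easier to tell apart.

```python
import pandas as pd

# Same kind of data as above, but with hypothetical string row labels
df = pd.DataFrame(
    {
        "name": ["Alice", "Bob", "Charlie", "Diana"],
        "score": [88, 92, 78, 95],
    },
    index=["a", "b", "c", "d"],
)

# .loc slices by label and includes the end label: rows a, b, c
by_label = df.loc["a":"c"]

# .iloc slices by position and excludes the end: rows 0 and 1
by_position = df.iloc[0:2]

# A single cell: row label first, then column label
cell = df.loc["b", "score"]

print(len(by_label), len(by_position), cell)
```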

Filtering Rows

Use boolean conditions to filter rows. Combine conditions with & (and) or | (or), and wrap each condition in parentheses.

Python
# Filter
high_scorers = df[df["score"] >= 90]
print(high_scorers)

# Multiple conditions
london_high = df[(df["city"] == "London") & (df["score"] > 80)]
print(london_high[["name","score"]])
▶ Output
    name  age  score    city
1    Bob   30     92   Paris
3  Diana   28     95  London
    name  score
0  Alice     88
3  Diana     95
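
Beyond comparisons joined with &, a few other filter forms come up often. The sketch below, which rebuilds the same sample table so it runs on its own, shows membership tests with isin(), negation with ~, and the string-based query() method as an alternative spelling of the same idea.

```python
import pandas as pd

df = pd.DataFrame({
    "name":  ["Alice", "Bob", "Charlie", "Diana"],
    "age":   [25, 30, 35, 28],
    "score": [88, 92, 78, 95],
    "city":  ["London", "Paris", "NYC", "London"],
})

# Rows whose city is any of the listed values
in_cities = df[df["city"].isin(["London", "Paris"])]

# ~ inverts a boolean mask: everyone not in London
not_london = df[~(df["city"] == "London")]

# query() expresses the same kind of condition as a string
london_top = df.query("score >= 90 and city == 'London'")

print(in_cities["name"].tolist())
print(not_london["name"].tolist())
print(london_top["name"].tolist())
```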

GroupBy – Aggregation

Group data and compute aggregates, like SQL GROUP BY.

Python
# Average score by city
print(df.groupby("city")["score"].mean())

# Multiple aggregations
print(df.groupby("city").agg({"score": ["mean","max"], "age": "mean"}))
▶ Output
city
London    91.5
NYC       78.0
Paris     92.0
Name: score, dtype: float64
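
The dict form of agg() returns two-level column headers. One alternative, sketched here on the same sample table, is named aggregation, where each output column is given a flat name of your choosing. The column names `mean_score`, `max_score`, and `mean_age` are picked for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "name":  ["Alice", "Bob", "Charlie", "Diana"],
    "age":   [25, 30, 35, 28],
    "score": [88, 92, 78, 95],
    "city":  ["London", "Paris", "NYC", "London"],
})

# Named aggregation: output_name=(input_column, function)
summary = df.groupby("city").agg(
    mean_score=("score", "mean"),
    max_score=("score", "max"),
    mean_age=("age", "mean"),
)
print(summary)
```

Flat column names are usually easier to work with if the summary is later filtered, joined, or written to a file.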