Pandas Series
A Series is a 1D labelled array, like a Python list with an index.
Python
import pandas as pd
# Create from list
scores = pd.Series([85, 92, 78, 95, 88],
                   name="scores",
                   index=["Alice", "Bob", "Charlie", "Diana", "Eve"])
print(scores)
print(scores["Alice"]) # 85
print(scores.mean()) # 87.6
print(scores[scores > 90])
Output
Alice      85
Bob        92
Charlie    78
Diana      95
Eve        88
Name: scores, dtype: int64
85
87.6
Bob      92
Diana    95
Creating DataFrames
A DataFrame is a 2D table with labelled rows and columns.
Python
import pandas as pd
# From dict of lists
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie", "Diana"],
    "age": [25, 30, 35, 28],
    "score": [88, 92, 78, 95],
    "city": ["London", "Paris", "NYC", "London"],
})
print(df)
print(df.shape) # (4, 4)
print(df.dtypes)
Output
      name  age  score    city
0    Alice   25     88  London
1      Bob   30     92   Paris
2  Charlie   35     78     NYC
3    Diana   28     95  London
(4, 4)
name     object
age       int64
score     int64
city     object
Selecting Data
Use [], .loc[], and .iloc[] to access rows and columns.
Python
# Select column
print(df["name"]) # Series
print(df[["name","score"]]) # DataFrame
# Select rows by label (.loc)
print(df.loc[0]) # First row
print(df.loc[0:2, ["name","score"]]) # Rows 0-2, 2 cols
# Select rows by position (.iloc)
print(df.iloc[0:3, 0:2]) # First 3 rows, first 2 cols
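The difference between .loc[] and .iloc[] is easiest to see when the row labels are not the default 0, 1, 2, and so on. The sketch below is a minimal illustration, assuming a small hypothetical DataFrame called people that is indexed by name (it is not part of the example above). It suggests that .loc[] slices by label and includes the end label, while .iloc[] slices by position and excludes the end position.

```python
import pandas as pd

# Hypothetical data for illustration, indexed by name instead of 0..n-1
people = pd.DataFrame(
    {"age": [25, 30, 35, 28], "score": [88, 92, 78, 95]},
    index=["Alice", "Bob", "Charlie", "Diana"],
)

# .loc slices by label; the end label ("Charlie") is included
by_label = people.loc["Alice":"Charlie", "score"]
print(by_label)

# .iloc slices by position; the end position (2) is excluded
by_position = people.iloc[0:2, 1]
print(by_position)

# A single cell: row label plus column label
print(people.loc["Diana", "score"])  # 95
```

Here by_label holds three rows and by_position holds two, even though both slices look like they end at the third row.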
Filtering Rows
Use boolean conditions to filter rows.
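A boolean condition on a column produces a Series of True/False values (a mask), and indexing the DataFrame with that mask keeps only the rows marked True. As a minimal sketch of that intermediate step, assuming a small hypothetical DataFrame called sample (not the tutorial's df):

```python
import pandas as pd

# Hypothetical data for illustration
sample = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "score": [88, 92, 78],
})

# The comparison is evaluated element-wise and returns a boolean Series
mask = sample["score"] >= 90
print(mask.tolist())  # [False, True, False]

# Indexing with the mask keeps only the rows where it is True
print(sample[mask])

# | combines conditions with OR; wrap each condition in parentheses
low_or_alice = sample[(sample["score"] < 80) | (sample["name"] == "Alice")]
print(low_or_alice["name"].tolist())  # ['Alice', 'Charlie']
```

The parentheses matter because & and | bind more tightly than comparison operators in Python.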
Python
# Filter
high_scorers = df[df["score"] >= 90]
print(high_scorers)
# Multiple conditions
london_high = df[(df["city"] == "London") & (df["score"] > 80)]
print(london_high[["name","score"]])
Output
    name  age  score    city
1    Bob   30     92   Paris
3  Diana   28     95  London
    name  score
0  Alice     88
3  Diana     95
GroupBy: Aggregation
Group data and compute aggregates, like SQL GROUP BY.
Python
# Average score by city
print(df.groupby("city")["score"].mean())
# Multiple aggregations
print(df.groupby("city").agg({"score": ["mean","max"], "age": "mean"}))
Output
city
London    91.5
NYC       78.0
Paris     92.0
Name: score, dtype: float64
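Passing a dict with a list of functions to .agg(), as in the second call above, yields two-level column labels. One alternative, sketched here assuming pandas 0.25 or later, is named aggregation, where each keyword gives the output column name and a (column, function) pair. The output names avg_score, max_score, and avg_age are illustrative choices, not part of the original example.

```python
import pandas as pd

# Same data as the DataFrame defined earlier, repeated so the snippet runs on its own
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie", "Diana"],
    "age": [25, 30, 35, 28],
    "score": [88, 92, 78, 95],
    "city": ["London", "Paris", "NYC", "London"],
})

# Each keyword becomes a flat output column: name=(source column, function)
summary = df.groupby("city").agg(
    avg_score=("score", "mean"),
    max_score=("score", "max"),
    avg_age=("age", "mean"),
)
print(summary)

# The result is indexed by city, so a single value can be read with .loc
print(summary.loc["London", "avg_score"])  # 91.5
```

The flat column names tend to be easier to work with downstream than the two-level labels, for example when selecting a single aggregate or writing the result to CSV.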