Key Machine Learning Concepts
Supervised learning: learn from labelled examples (classification, regression). Unsupervised learning: find patterns in unlabelled data (clustering, dimensionality reduction). Reinforcement learning: learn from rewards and penalties.
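To make the difference between supervised and unsupervised learning concrete, here is a minimal sketch on the iris dataset. The choice of `LogisticRegression` and `KMeans` is an assumption made for illustration; any classifier or clustering algorithm would show the same contrast.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Supervised: the labels y are passed to the model during training.
clf = LogisticRegression(max_iter=1000)
clf.fit(X, y)
supervised_pred = clf.predict(X[:5])

# Unsupervised: only X is used; the algorithm groups samples on its own.
km = KMeans(n_clusters=3, n_init=10, random_state=42)
cluster_ids = km.fit_predict(X)

print("Predicted classes:", supervised_pred)
print("Cluster ids:", cluster_ids[:5])
```

Note that the cluster ids are arbitrary group numbers, not class labels: cluster 0 does not have to correspond to class 0.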
# The ML workflow:
# 1. Collect and prepare data
# 2. Split into training and test sets
# 3. Choose and train a model
# 4. Evaluate on test set
# 5. Tune and improve
# 6. Deploy
print("Data → Model → Predictions")
scikit-learn: Python's ML Toolkit
scikit-learn provides a consistent API (fit, predict, score) across 50+ ML algorithms, plus tools for data preprocessing, model evaluation, and pipelines.
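What a consistent API means in practice can be sketched as follows: two different algorithms are trained and scored with identical calls, each wrapped in a pipeline that scales the features first. The two classifiers chosen here are assumptions for illustration, not a recommendation.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Every estimator exposes the same fit / predict / score methods,
# so swapping algorithms only changes the constructor call.
scores = {}
for estimator in (LogisticRegression(max_iter=1000), KNeighborsClassifier()):
    # The pipeline scales the features, then fits the classifier.
    pipe = make_pipeline(StandardScaler(), estimator)
    pipe.fit(X_train, y_train)
    scores[type(estimator).__name__] = pipe.score(X_test, y_test)

for name, acc in scores.items():
    print(f"{name}: {acc:.2%}")
```

Because the scaler lives inside the pipeline, it is fitted on the training data only and then reused on the test data, which avoids leaking test-set statistics into training.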
# pip install scikit-learn
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
# Load example dataset
iris = load_iris()
X, y = iris.data, iris.target
print(f"Samples: {len(X)}, Features: {X.shape[1]}")
print(f"Classes: {iris.target_names}")
Training Your First Model
Split data, train a classifier, and evaluate it.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report
iris = load_iris()
X, y = iris.data, iris.target
# Split: 80% train, 20% test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
# Train
model = DecisionTreeClassifier(max_depth=3, random_state=42)
model.fit(X_train, y_train)
# Evaluate
y_pred = model.predict(X_test)
# Per-class precision, recall, and F1 score
print(classification_report(y_test, y_pred, target_names=iris.target_names))
print(f"Accuracy: {accuracy_score(y_test, y_pred):.2%}")
Linear Regression Example
Predict continuous values (price, temperature, salary) with linear regression.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
# Synthetic house price data
np.random.seed(42)
sqft = np.random.randint(500, 3000, 200).reshape(-1, 1)
price = sqft * 250 + np.random.normal(0, 15000, (200,1))
# Fix random_state so the split is reproducible
X_train, X_test, y_train, y_test = train_test_split(
    sqft, price, test_size=0.2, random_state=42
)
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
# Root mean squared error, in the same units as price
print(f"RMSE: ${np.sqrt(mean_squared_error(y_test, y_pred)):,.0f}")
print(f"R² Score: {r2_score(y_test, y_pred):.3f}")
print(f"Predict 2000 sqft: ${model.predict([[2000]])[0][0]:,.0f}")
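One way to sanity-check a fitted linear model is to inspect its learned parameters. The sketch below rebuilds similar synthetic data (using NumPy's `default_rng` and a 1-D target, both assumptions that differ slightly from the snippet above) and confirms that the estimated slope lands near the 250 per square foot used to generate the prices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Similar synthetic setup: price = 250 * sqft + Gaussian noise.
rng = np.random.default_rng(42)
sqft = rng.integers(500, 3000, size=200).reshape(-1, 1)
price = sqft.ravel() * 250 + rng.normal(0, 15000, size=200)

model = LinearRegression().fit(sqft, price)

slope = model.coef_[0]        # estimated price per square foot
intercept = model.intercept_  # estimated price at 0 sqft

# A prediction is just intercept + slope * x.
manual = intercept + slope * 2000
predicted = model.predict([[2000]])[0]

print(f"Slope: {slope:.1f} per sqft, intercept: {intercept:,.0f}")
print(f"Manual: {manual:,.0f}  model.predict: {predicted:,.0f}")
```

With a 1-D target, `coef_` has one entry per feature and `predict` returns a flat array, so a single `[0]` index is enough to read the prediction.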