Batsman vs Bowler Cricket Analysis

1. Overview

Predict the probability of a batsman scoring 50+ against a particular bowler using historical head-to-head data.

2. Data Collection

Scraped ESPNcricinfo match logs (2010–2024)
Features: balls faced, dismissal type, bowler speed, pitch type

3. Modeling Approach

We frame this as a binary classification problem (50 or more runs vs. fewer than 50) and fit it with XGBoost. The pipeline has four steps:

Data scraping & cleaning with BeautifulSoup
Feature encoding (one-hot pitch, batsman handedness)
Train/test split by season
Hyperparameter tuning with xgboost.cv
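
The encoding and season-based split steps above could look roughly like the sketch below. The toy rows, the column names (`season`, `pitch_type`, `batsman_hand`, `balls_faced`), and the 2023 cutoff are all assumptions made for illustration, not the project's actual schema; only `score50` and `df_train` come from the snippet in the next section.

```python
import pandas as pd

# Toy rows for illustration only; column names and values are hypothetical.
df = pd.DataFrame({
    "season": [2021, 2022, 2023, 2024],
    "pitch_type": ["dry", "green", "flat", "dry"],
    "batsman_hand": ["right", "left", "right", "left"],
    "balls_faced": [34, 12, 51, 40],
    "score50": [1, 0, 1, 0],
})

# One-hot encode the categorical columns into 0/1 indicator columns.
encoded = pd.get_dummies(df, columns=["pitch_type", "batsman_hand"], dtype=int)

# Split by season so every test row comes from a later season than any training row.
cutoff = 2023  # assumed cutoff season
df_train = encoded[encoded["season"] < cutoff].drop(columns="season")
df_test = encoded[encoded["season"] >= cutoff].drop(columns="season")
```

Splitting on season rather than at random keeps matches from the held-out seasons out of training, which is closer to how the model would be used on upcoming fixtures.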

4. Code Snippet


import pandas as pd
import requests
import xgboost as xgb
from bs4 import BeautifulSoup

# 1. Scrape a match page
url = "https://www.espncricinfo.com/..."
response = requests.get(url, timeout=10)
response.raise_for_status()  # fail early on HTTP errors
html = response.text
soup = BeautifulSoup(html, "html.parser")
# … parsing logic …

# 2. Prepare DMatrix (df_train: parsed training rows with a binary "score50" label)
dtrain = xgb.DMatrix(df_train.drop(columns="score50"), label=df_train["score50"])

# 3. Cross-validate (5 folds, reporting AUC per boosting round)
params = {"objective": "binary:logistic", "max_depth": 4, "eta": 0.1}
cv = xgb.cv(params, dtrain, num_boost_round=100, nfold=5, metrics="auc")

5. Insights & Future Work

Model AUC: 0.82
Key features: bowler pace, pitch condition
Next: add weather data, extend to T20 vs. Test formats
Wrap into a Streamlit app for interactive queries
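
For readers unfamiliar with the AUC figure quoted above: it can be read as the probability that a randomly chosen 50+ innings receives a higher predicted score than a randomly chosen sub-50 innings. A minimal pure-Python sketch of that definition follows; the function name `roc_auc` and the five example values are made up for illustration and are not taken from the project's data.

```python
def roc_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Illustrative values only: 1 = scored 50+, 0 = did not.
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.3, 0.6, 0.7, 0.8]
auc = roc_auc(labels, scores)  # 5 of the 6 positive/negative pairs are ordered correctly
```

This pairwise version is quadratic in the number of rows, so it is only meant to show what the metric measures; a library implementation is the practical choice on full match logs.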