
Batsman vs Bowler Cricket Analysis

1. Overview

Predict the probability that a batsman scores 50 or more runs against a particular bowler, using historical head-to-head data.

2. Data Collection

3. Modeling Approach

We frame this as a binary classification problem (score >= 50 vs. < 50) and model it with XGBoost. The pipeline has four steps:

  1. Data scraping & cleaning with BeautifulSoup
  2. Feature encoding (one-hot encoding of pitch type and batsman handedness)
  3. Train/test split by season
  4. Hyperparameter tuning with xgboost.cv
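
Steps 2 and 3 can be sketched with pandas roughly as follows. This is a minimal illustration rather than the project's actual code: the column names (season, pitch, batsman_hand, prior_h2h_avg), the toy rows, and the cutoff season are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical toy data: one row per batsman-bowler innings.
# Column names and values are illustrative, not taken from the real dataset.
df = pd.DataFrame({
    "season": [2019, 2019, 2020, 2021, 2021, 2022],
    "pitch": ["dry", "green", "dry", "flat", "green", "flat"],
    "batsman_hand": ["right", "left", "right", "right", "left", "left"],
    "prior_h2h_avg": [38.5, 12.0, 41.2, 27.3, 9.8, 44.0],
    "score50": [1, 0, 1, 0, 0, 1],
})

# Step 2: one-hot encode the categorical columns; numeric columns pass through.
encoded = pd.get_dummies(df, columns=["pitch", "batsman_hand"], dtype=int)

# Step 3: split by season, training on earlier seasons and testing on later
# ones, so that no future match leaks into the training set.
cutoff = 2021  # assumed cutoff season
df_train = encoded[encoded["season"] < cutoff].drop(columns="season")
df_test = encoded[encoded["season"] >= cutoff].drop(columns="season")

print(list(df_train.columns))
print(len(df_train), len(df_test))
```

Encoding before splitting keeps the train and test frames on the same set of columns, even when a category happens to appear in only one of them.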

4. Code Snippet


import pandas as pd
import requests
import xgboost as xgb
from bs4 import BeautifulSoup

# 1. Scrape a match page
url = "https://www.espncricinfo.com/..."
response = requests.get(url, timeout=30)
response.raise_for_status()  # fail early on HTTP errors
soup = BeautifulSoup(response.text, "html.parser")
# … parsing logic …
# Assumption: the parsing step yields df_train, a pandas DataFrame of
# numeric features plus a binary "score50" label column.

# 2. Prepare DMatrix
dtrain = xgb.DMatrix(df_train.drop(columns="score50"), label=df_train["score50"])

# 3. Cross-validate
params = {"objective": "binary:logistic", "max_depth": 4, "eta": 0.1}
cv = xgb.cv(params, dtrain, num_boost_round=100, nfold=5)
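
One possible way to use the cross-validation output for tuning is to pick the boosting round with the lowest mean test loss. The sketch below assumes the result is a pandas DataFrame with one row per round and a column named test-logloss-mean, which is what xgb.cv is expected to return when log loss is the evaluation metric; the helper name best_num_rounds and the numbers in example_cv are made up for illustration.

```python
import pandas as pd


def best_num_rounds(cv_results: pd.DataFrame, metric: str = "logloss") -> int:
    """Return the 1-based boosting round with the lowest mean test metric."""
    column = f"test-{metric}-mean"
    return int(cv_results[column].to_numpy().argmin()) + 1


# Made-up numbers shaped like xgb.cv output, for illustration only.
example_cv = pd.DataFrame({
    "train-logloss-mean": [0.65, 0.60, 0.55, 0.50],
    "test-logloss-mean": [0.66, 0.63, 0.62, 0.64],
})

print(best_num_rounds(example_cv))  # prints 3
```

In this toy frame the training loss keeps falling while the test loss bottoms out at the third round, which is the usual overfitting signature. Passing early_stopping_rounds to xgb.cv is an alternative that stops the search automatically.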
    

5. Insights & Future Work