Predict the probability of a batsman scoring 50+ against a particular bowler using historical head-to-head data.
We frame this as a binary classification (score >= 50 vs. < 50) using XGBoost.
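Below is a minimal sketch of how the binary label and a few head-to-head features could be built with pandas. It assumes a hypothetical per-innings table, and the column names (`runs`, `balls_vs_bowler`, `dismissed_by_bowler`) and the `build_head_to_head` helper are illustrative only. Each feature is computed from earlier innings via `shift(1)`, so an innings' own result never leaks into its features.

```python
import pandas as pd

# Hypothetical input: one row per innings in which a batsman faced a given bowler.
# Column names are assumptions for illustration, not a real scraped schema.
innings = pd.DataFrame({
    "batsman": ["A", "A", "A", "A", "B", "B"],
    "bowler": ["X", "X", "X", "X", "X", "X"],
    "match_date": pd.to_datetime([
        "2020-01-01", "2020-06-01", "2021-01-01", "2021-06-01",
        "2020-03-01", "2021-03-01",
    ]),
    "runs": [12, 64, 51, 8, 30, 77],             # batsman's score in the innings
    "balls_vs_bowler": [10, 24, 18, 6, 15, 20],  # balls faced from this bowler
    "dismissed_by_bowler": [1, 0, 0, 1, 1, 0],
})


def build_head_to_head(df: pd.DataFrame) -> pd.DataFrame:
    """Add the binary target and cumulative head-to-head features (prior innings only)."""
    df = df.sort_values(["batsman", "bowler", "match_date"]).copy()
    # Target: 1 if the batsman scored 50 or more in the innings, else 0
    df["score50"] = (df["runs"] >= 50).astype(int)
    grp = df.groupby(["batsman", "bowler"])
    # Number of earlier meetings between this batsman and bowler
    df["h2h_innings"] = grp.cumcount()
    # shift(1) drops the current innings, so only past meetings are summed
    for src, dst in [("balls_vs_bowler", "h2h_balls"),
                     ("dismissed_by_bowler", "h2h_dismissals"),
                     ("score50", "h2h_prior_50s")]:
        df[dst] = grp[src].transform(lambda s: s.shift(1).cumsum()).fillna(0)
    return df


features = build_head_to_head(innings)
print(features[["batsman", "runs", "score50", "h2h_innings", "h2h_prior_50s"]])
```

Sorting by date before the cumulative sums matters: with an unsorted table the "prior" counts would mix in later matches.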
Match data is scraped with BeautifulSoup, and the model is evaluated with xgboost.cv:
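```python
import pandas as pd
import requests
import xgboost as xgb
from bs4 import BeautifulSoup

# 1. Scrape a match page
url = "https://www.espncricinfo.com/..."
resp = requests.get(url, timeout=30)
resp.raise_for_status()  # fail fast on HTTP errors instead of parsing an error page
soup = BeautifulSoup(resp.text, "html.parser")
# … parsing logic …
# The parsing step must produce df_train: numeric feature columns plus a
# binary "score50" label (1 if the batsman scored 50+, else 0).

# 2. Prepare DMatrix (drop the label; remaining columns must be numeric)
X = df_train.drop(columns="score50")
dtrain = xgb.DMatrix(X, label=df_train["score50"])

# 3. Cross-validate: 5-fold CV of a logistic model, stopping when test log loss stalls
params = {"objective": "binary:logistic", "max_depth": 4, "eta": 0.1, "eval_metric": "logloss"}
cv = xgb.cv(params, dtrain, num_boost_round=100, nfold=5, early_stopping_rounds=10, seed=0)
print(cv.tail(1))  # mean/std of train and test log loss at the last kept round
```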