Kobe Bryant and the Lakers (11-14) aren't doing as well as I had expected, given the players they acquired in the offseason. Everyone likes to point out that when he scores more than some number of points (e.g. 30), the Lakers have lost more often than they have won. So I took his stats for this season and had a look.
Using Random Forests™, we can assess the importance of each predictor variable (his stats) in predicting the outcome of the game (win/lost).
[sourcecode lang="r"]
#install if necessary
install.packages("randomForest")
library(randomForest)
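# R >= 4.0 no longer reads strings as factors by default; randomForest needs result as a factor
data <- read.table("kobe.csv", sep=",", header=TRUE, stringsAsFactors=TRUE)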
head(data)
  result   minute fg_made fg_attempt fg_percent trey_made trey_attempt trey_percent ft_made
1    win 40:02:00      12         21       57.1         2            6         33.3       8
2    win 43:36:00       9         29       31.0         1            8         12.5      11
3   lost 44:02:00      10         24       41.7         5           11         45.5       6
4   lost 40:43:00      16         28       57.1         3            9         33.3       7
5   lost 43:03:00       9         24       37.5         4            9         44.4      12
6   lost 43:30:00      11         24       45.8         4            5         80.0       9
  ft_attempt ft_percent offensive_rebound defensive_rebound rebound assist turnover steal block
1          9       88.9                 0                 4       4      6        5     1     0
2         13       84.6                 3                 4       7      7        3     2     1
3          6      100.0                 2                 8      10      6        5     1     1
4         10       70.0                 1                 4       5      2        5     1     1
5         14       85.7                 0                 0       0      3        2     1     0
6         10       90.0                 0                 3       3      7        5     3     2
  foul point
1    0    34
2    1    30
3    5    31
4    1    42
5    2    34
6    5    35
rf1 <- randomForest(result~., data=data, mtry=2, ntree=500, importance=TRUE)
importance(rf1, type=1)
                  MeanDecreaseAccuracy
minute                       0.0000000
fg_made                      5.3025854
fg_attempt                   0.6252638
fg_percent                   0.1072702
trey_made                    2.3203188
trey_attempt                 4.1491359
trey_percent                -1.9348577
ft_made                      1.7674257
ft_attempt                  -0.0889209
ft_percent                  -2.3835824
offensive_rebound           -0.4633745
defensive_rebound           -1.3406056
rebound                     -1.9043518
assist                       6.7749841
turnover                     3.0035312
steal                       -1.4873402
block                       -0.8102073
foul                         0.2128397
point                        5.1763946
[/sourcecode]
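For readers unfamiliar with the measure: type=1 asks importance() for the permutation-based mean decrease in accuracy. Roughly, the values of one predictor are shuffled and the resulting loss of prediction accuracy is recorded; a score near zero or below suggests the predictor carries little usable signal. The sketch below only illustrates that idea, under some loud assumptions: it uses R's built-in iris data rather than the Kobe data, a logistic regression as a stand-in model, and in-sample accuracy, whereas randomForest works tree by tree on out-of-bag samples. The names d, fit, accuracy and drop_in_acc are made up for this example.
[sourcecode lang="r"]
# Illustration on R's built-in iris data, NOT the Kobe data.
# Keep two species so the outcome is binary, like win/lost.
set.seed(1)
d <- droplevels(subset(iris, Species != "setosa"))

# Stand-in model (logistic regression); the post itself uses randomForest
fit <- glm(Species ~ ., data = d, family = binomial)

# Share of rows the fitted model classifies correctly
accuracy <- function(newdata) {
  pred <- ifelse(predict(fit, newdata, type = "response") > 0.5,
                 "virginica", "versicolor")
  mean(pred == newdata$Species)
}

base_acc <- accuracy(d)

# Shuffle one predictor at a time and record the drop in accuracy
predictors <- setdiff(names(d), "Species")
drop_in_acc <- sapply(predictors, function(v) {
  shuffled <- d
  shuffled[[v]] <- sample(shuffled[[v]])
  base_acc - accuracy(shuffled)
})
print(sort(drop_in_acc, decreasing = TRUE))
[/sourcecode]
A predictor whose shuffling barely changes the accuracy ends up near zero, which is how the near-zero and negative rows in the table above can be read.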
The number of assists he gets per game is more important in predicting the outcome of the game than his points (or fg_made, which is closely tied to points). Let's plot these and do some <i>t</i>-tests:
[sourcecode lang="r"]
library(ggplot2)
qplot(result, assist, data = data, geom="boxplot")
qplot(result, point, data = data, geom="boxplot")
t.test(subset(data$assist, data$result == 'lost'), subset(data$assist, data$result == 'win'))
Welch Two Sample t-test
data: subset(data$assist, data$result == "lost") and subset(data$assist, data$result == "win")
t = -3.6023, df = 22.801, p-value = 0.001517
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-4.713380 -1.273633
sample estimates:
mean of x mean of y
3.642857 6.636364
t.test(subset(data$point, data$result == 'lost'), subset(data$point, data$result == 'win'))
Welch Two Sample t-test
data: subset(data$point, data$result == "lost") and subset(data$point, data$result == "win")
t = 3.7755, df = 19.76, p-value = 0.001209
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
4.261713 14.803222
sample estimates:
mean of x mean of y
33.71429 24.18182
[/sourcecode]
And lastly, the correlation between assist and point:
[sourcecode lang="r"]
cor(data$assist,data$point)
[1] -0.5332513
[/sourcecode]
At least statistically speaking, he should get his teammates more involved (until Nash comes back).
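The 30-point claim from the top of the post can also be checked directly with a cross-tabulation. Here is a minimal sketch that retypes only the six games printed by head(data), since the full kobe.csv is not reproduced in this post; the counts are therefore illustrative and not the season totals. The names games, over30 and tab are made up for this example.
[sourcecode lang="r"]
# Six games copied from the head(data) output; not the full season
games <- data.frame(
  result = factor(c("win", "win", "lost", "lost", "lost", "lost")),
  assist = c(6, 7, 6, 2, 3, 7),
  point  = c(34, 30, 31, 42, 34, 35)
)

# Flag games in which he scored more than 30 points
games$over30 <- games$point > 30

# Cross-tabulate scoring level against the outcome of the game
tab <- table(over30 = games$over30, result = games$result)
print(tab)
[/sourcecode]
With the full data loaded, the same table should come from table(data$point > 30, data$result).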

This work is licensed under a Creative Commons
Attribution 4.0 International License.
