Kobe Bryant and the Lakers (11-14) aren't doing as well as I had expected given the players they acquired in the offseason. Everyone likes to point out that when he scores more than some number of points (e.g. 30), the Lakers have lost more often than they have won. So I took his stats for this season and had a look.
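As a quick illustration of how that claim can be checked, here is a minimal sketch that cross-tabulates "scored more than 30" against the result. It uses only the six games that show up in the head(data) output further down, so it is a toy check and not the full season; the names first_six and over_30 are made up for this sketch.

```r
# The six games printed by head(data) further down; NOT the full season.
first_six <- data.frame(
  result = factor(c("win", "win", "lost", "lost", "lost", "lost")),
  point  = c(34, 30, 31, 42, 34, 35)
)

# Flag the games in which he scored more than 30 points
first_six$over_30 <- first_six$point > 30

# Cross-tabulate the flag against the game result
tab <- table(over_30 = first_six$over_30, result = first_six$result)
tab

# With the whole file loaded as `data`, the same check would be:
# table(over_30 = data$point > 30, result = data$result)
```

Six games are far too few to conclude anything, but the same two lines work unchanged on the full data frame.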
Using Random Forests™, we can assess the importance of each predictor variable (his stats) in predicting the outcome of the game (win/lost).
[sourcecode language="r"]
# install if necessary
install.packages("randomForest")
library(randomForest)

# stringsAsFactors = TRUE so that result is read as a factor, which
# randomForest needs for classification (R >= 4.0 reads strings as
# character by default)
data <- read.table("kobe.csv", sep = ",", header = TRUE, stringsAsFactors = TRUE)
head(data)
  result   minute fg_made fg_attempt fg_percent trey_made trey_attempt trey_percent ft_made
1    win 40:02:00      12         21       57.1         2            6         33.3       8
2    win 43:36:00       9         29       31.0         1            8         12.5      11
3   lost 44:02:00      10         24       41.7         5           11         45.5       6
4   lost 40:43:00      16         28       57.1         3            9         33.3       7
5   lost 43:03:00       9         24       37.5         4            9         44.4      12
6   lost 43:30:00      11         24       45.8         4            5         80.0       9
  ft_attempt ft_percent offensive_rebound defensive_rebound rebound assist turnover steal block
1          9       88.9                 0                 4       4      6        5     1     0
2         13       84.6                 3                 4       7      7        3     2     1
3          6      100.0                 2                 8      10      6        5     1     1
4         10       70.0                 1                 4       5      2        5     1     1
5         14       85.7                 0                 0       0      3        2     1     0
6         10       90.0                 0                 3       3      7        5     3     2
  foul point
1    0    34
2    1    30
3    5    31
4    1    42
5    2    34
6    5    35

rf1 <- randomForest(result ~ ., data = data, mtry = 2, ntree = 500, importance = TRUE)
importance(rf1, type = 1)
                  MeanDecreaseAccuracy
minute                       0.0000000
fg_made                      5.3025854
fg_attempt                   0.6252638
fg_percent                   0.1072702
trey_made                    2.3203188
trey_attempt                 4.1491359
trey_percent                -1.9348577
ft_made                      1.7674257
ft_attempt                  -0.0889209
ft_percent                  -2.3835824
offensive_rebound           -0.4633745
defensive_rebound           -1.3406056
rebound                     -1.9043518
assist                       6.7749841
turnover                     3.0035312
steal                       -1.4873402
block                       -0.8102073
foul                         0.2128397
point                        5.1763946
[/sourcecode]
The number of assists he gets per game is more important in predicting the outcome of the game than his points (and fg_made, which is closely tied to points).
Let's plot these and do some <i>t</i>-tests:
[sourcecode language="r"]
library(ggplot2)
qplot(result, assist, data = data, geom = "boxplot")
qplot(result, point, data = data, geom = "boxplot")

t.test(subset(data$assist, data$result == 'lost'), subset(data$assist, data$result == 'win'))

    Welch Two Sample t-test

data:  subset(data$assist, data$result == "lost") and subset(data$assist, data$result == "win")
t = -3.6023, df = 22.801, p-value = 0.001517
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -4.713380 -1.273633
sample estimates:
mean of x mean of y
 3.642857  6.636364

t.test(subset(data$point, data$result == 'lost'), subset(data$point, data$result == 'win'))

    Welch Two Sample t-test

data:  subset(data$point, data$result == "lost") and subset(data$point, data$result == "win")
t = 3.7755, df = 19.76, p-value = 0.001209
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  4.261713 14.803222
sample estimates:
mean of x mean of y
 33.71429  24.18182
[/sourcecode]
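For anyone curious what t.test is doing under the hood, here is a rough sketch of Welch's test computed by hand and compared with the built-in. It only uses the assists from the six games printed by head(data), so the numbers will not match the full-season output above; welch_t, assist_lost and assist_win are names made up for this sketch.

```r
# Assists from the six games printed by head(data), split by result.
# Only six games: this illustrates the arithmetic, it is not a re-analysis.
assist_lost <- c(6, 2, 3, 7)
assist_win  <- c(6, 7)

# Welch's t-test by hand: the two variances are NOT pooled
welch_t <- function(x, y) {
  vx <- var(x) / length(x)   # squared standard error of mean(x)
  vy <- var(y) / length(y)   # squared standard error of mean(y)
  t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)
  # Welch-Satterthwaite approximation for the degrees of freedom
  df <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
  # two-sided p-value from the t distribution
  p <- 2 * pt(-abs(t_stat), df)
  c(t = t_stat, df = df, p.value = p)
}

by_hand  <- welch_t(assist_lost, assist_win)
built_in <- t.test(assist_lost, assist_win)   # Welch is the default

by_hand
# TRUE if the hand-rolled statistic agrees with t.test
all.equal(unname(by_hand["t"]), unname(built_in$statistic))
```

The fractional degrees of freedom in the output above (22.801 and 19.76) come from that Welch-Satterthwaite step.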
And lastly, the correlation between assist and point:
[sourcecode language="r"]
cor(data$assist, data$point)
[1] -0.5332513
[/sourcecode]
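To get a feel for whether a correlation of that size could just be noise, here is a small sketch of the usual t-test for a Pearson correlation, worked from the reported value alone. It assumes the file holds all 25 games (11 wins plus 14 losses); r, n, t_stat and p_value are just names for this sketch.

```r
r <- -0.5332513   # correlation between assist and point reported above
n <- 25           # ASSUMED number of games: 11 wins + 14 losses

# Under the null hypothesis of zero correlation, this statistic follows
# a t distribution with n - 2 degrees of freedom
t_stat  <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

round(c(t = t_stat, p.value = p_value), 4)
```

This works out to a t statistic of about -3.0 on 23 degrees of freedom, with a two-sided p-value below 0.01. With the data loaded, cor.test(data$assist, data$point) runs the same test directly.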
At least statistically speaking, he should get his teammates more involved (until Nash comes back).

This work is licensed under a Creative Commons
Attribution 4.0 International License.