Kobe Bryant and the 2012 Lakers

Kobe Bryant and the Lakers (11-14) aren’t doing as well as I had expected given the team they acquired in the offseason. Everyone likes to point out that when he scores more than some number of points (e.g. 30), the Lakers have lost more often than they have won. So I took his stats for this season and had a look.

Using Random Forests™, we can assess the importance of each predictor variable (his stats) in predicting the outcome of the game (win/lost).

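# Install the package if necessary (uncomment; only needed once)
# install.packages("randomForest")
library(randomForest)
# Read the game log; stringsAsFactors=TRUE makes `result` a factor so that
# randomForest fits a classifier (the default changed to FALSE in R 4.0)
data <- read.table("kobe.csv", sep=",", header=TRUE, stringsAsFactors=TRUE)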
head(data)
  result   minute fg_made fg_attempt fg_percent trey_made trey_attempt trey_percent ft_made
1    win 40:02:00      12         21       57.1         2            6         33.3       8
2    win 43:36:00       9         29       31.0         1            8         12.5      11
3   lost 44:02:00      10         24       41.7         5           11         45.5       6
4   lost 40:43:00      16         28       57.1         3            9         33.3       7
5   lost 43:03:00       9         24       37.5         4            9         44.4      12
6   lost 43:30:00      11         24       45.8         4            5         80.0       9
  ft_attempt ft_percent offensive_rebound defensive_rebound rebound assist turnover steal block
1          9       88.9                 0                 4       4      6        5     1     0
2         13       84.6                 3                 4       7      7        3     2     1
3          6      100.0                 2                 8      10      6        5     1     1
4         10       70.0                 1                 4       5      2        5     1     1
5         14       85.7                 0                 0       0      3        2     1     0
6         10       90.0                 0                 3       3      7        5     3     2
  foul point
1    0    34
2    1    30
3    5    31
4    1    42
5    2    34
6    5    35
rf1 <- randomForest(result~., data=data, mtry=2, ntree=500, importance=TRUE)
importance(rf1, type=1)
                  MeanDecreaseAccuracy
minute                       0.0000000
fg_made                      5.3025854
fg_attempt                   0.6252638
fg_percent                   0.1072702
trey_made                    2.3203188
trey_attempt                 4.1491359
trey_percent                -1.9348577
ft_made                      1.7674257
ft_attempt                  -0.0889209
ft_percent                  -2.3835824
offensive_rebound           -0.4633745
defensive_rebound           -1.3406056
rebound                     -1.9043518
assist                       6.7749841
turnover                     3.0035312
steal                       -1.4873402
block                       -0.8102073
foul                         0.2128397
point                        5.1763946
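The importance table is easier to read once it is sorted. A minimal sketch, assuming the one-column matrix that importance(rf1, type=1) returns; here a small stand-in matrix (`imp`, a name made up for this example) is built by hand from a few of the values printed above, so the snippet runs without the fitted model:

```r
# Stand-in for importance(rf1, type = 1): a one-column matrix with
# predictor names as row names. Values are copied from the output above.
imp <- matrix(
  c(5.3025854, 4.1491359, 6.7749841, 3.0035312, 5.1763946, -2.3835824),
  ncol = 1,
  dimnames = list(
    c("fg_made", "trey_attempt", "assist", "turnover", "point", "ft_percent"),
    "MeanDecreaseAccuracy"
  )
)

# Order rows from most to least important; drop = FALSE keeps the matrix shape.
ord <- order(imp[, "MeanDecreaseAccuracy"], decreasing = TRUE)
imp_sorted <- imp[ord, , drop = FALSE]
print(imp_sorted)
```

With the real model, `imp` would simply be importance(rf1, type=1), and varImpPlot(rf1, type=1) from the same package draws the ranking as a dot chart. Values near zero or below it mean that shuffling the predictor did not hurt accuracy, and with only 25 games the exact numbers will move around from run to run.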

The number of assists he gets per game is more important in predicting the outcome than his points (or the closely related fg_made). Let's plot these and do some t-tests:


library(ggplot2)
qplot(result, assist, data = data, geom="boxplot")
qplot(result, point, data = data, geom="boxplot")

t.test(subset(data$assist, data$result == 'lost'), subset(data$assist, data$result == 'win'))

        Welch Two Sample t-test

data:  subset(data$assist, data$result == "lost") and subset(data$assist, data$result == "win") 
t = -3.6023, df = 22.801, p-value = 0.001517
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval:
 -4.713380 -1.273633 
sample estimates:
mean of x mean of y 
 3.642857  6.636364

t.test(subset(data$point, data$result == 'lost'), subset(data$point, data$result == 'win'))

        Welch Two Sample t-test

data:  subset(data$point, data$result == "lost") and subset(data$point, data$result == "win") 
t = 3.7755, df = 19.76, p-value = 0.001209
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval:
  4.261713 14.803222 
sample estimates:
mean of x mean of y 
 33.71429  24.18182
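The fractional degrees of freedom in both outputs (22.801 and 19.76) come from the Welch correction, which does not assume equal variances in wins and losses. A rough sketch of the calculation behind t.test(), using made-up vectors rather than the real game log (the names `x`, `y`, `t_manual` and `df_manual` are only for illustration):

```r
# Made-up numbers for illustration only; these are NOT Kobe's game logs.
x <- c(2, 4, 3, 5, 4, 3)   # e.g. assists in losses (hypothetical)
y <- c(6, 7, 5, 8, 6)      # e.g. assists in wins (hypothetical)

# Per-group variance of the mean, without pooling the two groups.
vx <- var(x) / length(x)
vy <- var(y) / length(y)

# Welch t statistic: difference in means over its standard error.
t_manual <- (mean(x) - mean(y)) / sqrt(vx + vy)

# Welch-Satterthwaite approximation to the degrees of freedom.
df_manual <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))

# t.test() defaults to var.equal = FALSE, so it should agree with the above.
fit <- t.test(x, y)
print(c(manual = t_manual, builtin = unname(fit$statistic)))
print(c(manual = df_manual, builtin = unname(fit$parameter)))
```

The same test can be written with the formula interface, e.g. t.test(assist ~ result, data = data), which avoids the two subset() calls.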

[Image: kobe_win_lost]

And lastly, the correlation between assists and points:

cor(data$assist,data$point)
[1] -0.5332513
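cor() returns only the point estimate. With a sample this small it may be worth looking at the uncertainty as well; cor.test() from base R reports a confidence interval and a p-value for the same Pearson correlation. A small sketch on made-up numbers (not the contents of kobe.csv):

```r
# Made-up numbers for illustration only; not taken from kobe.csv.
assist <- c(2, 3, 7, 6, 4, 8, 5, 3)
point  <- c(40, 36, 22, 27, 31, 20, 30, 38)

r  <- cor(assist, point)        # Pearson correlation, as in the post
ct <- cor.test(assist, point)   # same estimate plus a t-based test

print(r)
print(ct$conf.int)   # 95% confidence interval for the correlation
print(ct$p.value)    # two-sided p-value for H0: correlation = 0
```

On the real data the call would be cor.test(data$assist, data$point).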

At least statistically speaking, he should get his teammates more involved (until Nash comes back).


This work is licensed under a Creative Commons Attribution 4.0 International License.