Learning R through a mini game

Just last night I found an educational mini game written in R, distributed as the proton package, and decided to have a go at it.

I completed it but, as I alluded to in my tweet, not in a very elegant manner. This post shows how the dplyr package in R can solve some of the problems more cleanly. It contains the solutions, so if you want to give the game a go first, stop reading now.

To get started, first install the game/package, load the library, and use the proton() function:

install.packages("proton")
# load library
library(proton)
# start game
proton()

Install the dplyr package if you don't already have it:

install.packages("dplyr")
# load library
library(dplyr)

The first problem


Pietraszko uses a password which is very difficult to guess.
At first, try to hack an account of a person which is not as cautious as Pietraszko.

But who is the weakest point? Initial investigation suggests that John Insecure doesn't care about security and has an account on the Proton server. He may use a password which is easy to crack.
Let's attack his account first!

Problem 1: Find the login of John Insecure.

Bit has scrapped 'employees' data (names and logins) from the www web page of Technical University of Warsaw. The data is in the data.frame employees.
Now, your task is to find John Insecure's login.
When you finally find out what John's login is, use proton(action = "login", login="XYZ") command, where XYZ is Insecure's login.

# first check out the data frame
head(employees)
    name  surname       login
1  Jorge  Patrick   j.patrick
2 Gerald     Long gerald.long
3 Javier  Mendoza   j.mendoza
4    Roy Johnston        rjoh
5  Annie    Keith annie.keith
6   Nora   Castro        ncas

# our task is to find John Insecure's login
# let's use the filter() function in dplyr
filter(employees, name == "John", surname == "Insecure")
  name  surname   login
1 John Insecure johnins

# our solution
proton(action = "login", login="johnins")
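
If only the login string is needed, rather than the whole row, the filter can be piped into pull(). This is a minimal sketch, assuming dplyr 0.7.0 or later (where pull() is available). The employees_toy data frame is a made-up stand-in built from rows shown above, so the example runs without the game installed:

```r
library(dplyr)

# toy stand-in for proton's employees data frame (an assumption for
# illustration; the real table is much larger)
employees_toy <- data.frame(
  name    = c("Jorge", "Gerald", "John"),
  surname = c("Patrick", "Long", "Insecure"),
  login   = c("j.patrick", "gerald.long", "johnins"),
  stringsAsFactors = FALSE
)

# filter to the matching row, then extract the login column as a vector
john_login <- employees_toy %>%
  filter(name == "John", surname == "Insecure") %>%
  pull(login)

john_login
```

With the real data, the result could then be passed straight to proton(action = "login", login = john_login). If the login column turns out to be a factor, wrapping it in as.character() first is probably the safer choice.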

The second problem


Congratulations! You have found out what John Insecure's login is!
It is highly likely that he uses some typical password.
Bit downloaded from the Internet a database with 1000 most commonly used passwords.
You can find this database in the top1000passwords vector.

Problem 2: Find John Insecure's password.

Use proton(action = "login", login="XYZ", password="ABC") command in order to log into the Proton server with the given credentials.
If the password is correct, you will get the following message:
Success! User is logged in!.
Otherwise you will get:
Password or login is incorrect!.

# check out the vector
head(top1000passwords)
[1] "123456"    "password"  "12345678"  "qwerty"    "123456789" "12345"

# loop through and try every typical password
# and add an if block to find out the password
for (i in top1000passwords){
  x <- proton(action = "login", login="johnins", password=i)
  if(length(grep(pattern = "Success", x = x)) == 1){
    print(i)
  }
}

[1] "q1w2e3r4t5"
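
The loop above keeps trying passwords even after it has found the right one. A small variation stops at the first hit. The sketch below uses fake_login(), a hypothetical stand-in for proton(action = "login", ...), and a shortened candidate list, so that it can be run on its own. It assumes the real function returns the message as a string, as the grep() call above implies:

```r
# hypothetical stand-in for proton(action = "login", ...); returns one of
# the two messages quoted in the problem text
fake_login <- function(login, password) {
  if (login == "johnins" && password == "q1w2e3r4t5") {
    "Success! User is logged in!"
  } else {
    "Password or login is incorrect!"
  }
}

# shortened, made-up candidate list in place of top1000passwords
candidates <- c("123456", "password", "q1w2e3r4t5", "qwerty")

found <- NULL
for (pw in candidates) {
  response <- fake_login(login = "johnins", password = pw)
  # grepl() returns TRUE/FALSE directly, so no length() check is needed
  if (grepl("Success", response, fixed = TRUE)) {
    found <- pw
    break  # stop at the first hit instead of trying the rest
  }
}

found
```

Keeping the match in a variable (found) rather than printing it makes it easy to reuse in the next call.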

The third problem


Well done! This is the right password!
Bit used John Insecure's account in order to log into the Proton server.
It turns out that John has access to server logs.
Now, Bit wants to check from which workstation Pietraszko is frequently logging into the Proton server. Bit hopes that there will be some useful data.

Logs are in the logs dataset.
Consecutive columns contain information such as: who, when and from which computer logged into Proton.

Problem 3: Check from which server Pietraszko logs into the Proton server most often.

Use proton(action = "server", host="XYZ") command in order to learn more about what can be found on the XYZ server.
The biggest chance to find something interesting is to find a server from which Pietraszko logs in the most often.

# This was the problem I brute-forced when it wasn't necessary
# I forgot that I could get Pietraszko's login from the employee data frame
# check out the logs
head(logs)
             login           host                data
1        r.spencer 193.0.96.13.15 2014-09-01 09:01:12
2     isaac.arnold  193.0.96.13.9 2014-09-01 09:01:51
3 warren.dickerson  194.29.178.32 2014-09-01 09:08:08
4          c.lopez   194.29.178.4 2014-09-01 09:09:02
5        l.russell 194.29.178.162 2014-09-01 09:22:22
6          t.silva 194.29.178.102 2014-09-01 09:25:40

# find the login of Pietraszko
filter(employees, surname == "Pietraszko")
      name    surname login
1 Slawomir Pietraszko  slap

head(filter(logs, login == "slap"))
  login           host                data
1  slap  194.29.178.16 2014-09-02 02:32:48
2  slap  194.29.178.16 2014-09-03 15:10:41
3  slap 194.29.178.108 2014-09-04 22:52:26
4  slap 194.29.178.108 2014-09-05 11:22:20
5  slap 194.29.178.108 2014-09-06 13:35:41
6  slap  194.29.178.16 2014-09-06 17:33:06

# use group_by() and summarise() to count logins per host
# and find the host Pietraszko logs in from most often
slap_log_group <- group_by(filter(logs, login == "slap"), host)
summarise(slap_log_group, count = n())
Source: local data frame [5 x 2]

            host count
          (fctr) (int)
1  194.29.178.16   112
2 193.0.96.13.20    33
3 194.29.178.155     6
4 193.0.96.13.38     1
5 194.29.178.108    74

# solution
proton(action = "server", host="194.29.178.16")
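
The group_by() and summarise() steps can also be written as one pipeline with count(), which can sort the hosts by frequency so the answer ends up in the first row. This is a sketch on logs_toy, a made-up miniature of the logs data frame (the real one also has a date column and far more rows):

```r
library(dplyr)

# toy stand-in for proton's logs data frame (an assumption for illustration)
logs_toy <- data.frame(
  login = c("slap", "slap", "slap", "r.spencer", "slap"),
  host  = c("194.29.178.16", "194.29.178.108", "194.29.178.16",
            "193.0.96.13.15", "194.29.178.16"),
  stringsAsFactors = FALSE
)

# keep Pietraszko's rows, count logins per host, most frequent first
host_counts <- logs_toy %>%
  filter(login == "slap") %>%
  count(host, sort = TRUE)

# the first row now holds the most frequently used host
top_host <- host_counts$host[1]
top_host
```

count() names its result column n by default, so the summary above would show n instead of count.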

The fourth and last problem


It turns out that Pietraszko often uses the public workstation 194.29.178.16.
What a carelessness.

Bit infiltrated this workstation easily. He downloaded bash_history file which contains a list of all commands that were entered into the server's console.
The chances are that some time ago Pietraszko typed a password into the console by mistake thinking that he was logging into the Proton server.

Problem 4: Find the Pietraszko's password.

In the bash_history dataset you will find all commands and parameters which have ever been entered.
Try to extract from this dataset only commands (only strings before space) and check whether one of them looks like a password.

# check out the history
head(bash_history)
[1] "mcedit /var/log/lighttpd/*" "pwd"                        "vim /var/log/mysql.*"      
[4] "rm /bin"                    "cat ~/.Xauthority"          "ls /srv"

# how many unique commands?
length(unique(bash_history))
[1] 489

# the password should only be a string without spaces
unique(grep(pattern = " ", x = bash_history, value = TRUE, invert = TRUE))
[1] "pwd"              "ps"               "whoiam"           "top"             
[5] "mc"               "DHbb7QXppuHnaXGN"
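
The hint in the problem text suggests a slightly different route: strip the parameters and look only at the command names. A base R sketch of that idea follows, run on history_toy, a short made-up vector in place of bash_history. Tabulating the commands is an addition for illustration, on the assumption that a mistyped password would appear rarely:

```r
# short stand-in for proton's bash_history vector (an assumption for
# illustration; entries are taken from the output shown above)
history_toy <- c("mcedit /var/log/lighttpd/*", "pwd", "vim /var/log/mysql.*",
                 "rm /bin", "DHbb7QXppuHnaXGN", "ls /srv", "pwd")

# drop everything from the first space onward, leaving the command name
commands <- sub(" .*", "", history_toy)

# count how often each command occurs, rarest first
command_counts <- sort(table(commands))
command_counts

# the distinct command names
unique(commands)
```

Unlike the grep() approach, this also keeps commands that were always typed with arguments, so the list of candidates to inspect is longer but nothing is missed.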

# now we can log in as Slawomir Pietraszko to complete the game
# remember from problem two how we can log on to the server?
# proton(action = "login", login="XYZ", password="ABC")
proton(action = "login", login="slap", password="DHbb7QXppuHnaXGN")
Congratulations!

You have cracked Pietraszko's password!
Secret plans of his lab are now in your hands.
What is in this mysterious lab?
You may read about it in the `Pietraszko's cave` story which is available at http://biecek.pl/BetaBit/Warsaw

Next adventure of Beta and Bit will be available soon.

            proton.login.pass 
"Success! User is logged in!"

Summary

I thought that was rather fun. I finally tested out dplyr and it is definitely much easier than typing:

employees[employees$name=="John" & employees$surname=="Insecure",]
    name  surname   login
217 John Insecure johnins

The first time I went through the game I wanted to finish it as quickly as possible, so for problem three I just tried all the unique hosts. I only realised later, during problem four, that I needed Pietraszko's login and that I could get it from the employees data frame.

For my regular readers, I have a bigger post on DNA sequencing that is still in the works. I'll try to finish that post in the coming week.




This work is licensed under a Creative Commons Attribution 4.0 International License.