Tidyverse
As of tidyverse 1.2.0, the following packages are included in the core tidyverse:
- ggplot2
- dplyr
- tidyr
- readr
- purrr
- tibble
- stringr
- forcats
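A quick way to see the core set in practice: attaching the tidyverse meta-package attaches each package listed above. This is a minimal sketch that assumes the tidyverse package is installed; the variable names core and attached are only illustrative.

```r
library(tidyverse)

# The core packages listed above.
core <- c("ggplot2", "dplyr", "tidyr", "readr",
          "purrr", "tibble", "stringr", "forcats")

# search() lists attached packages as "package:<name>".
attached <- paste0("package:", core) %in% search()
names(attached) <- core
print(attached)
```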
dplyr
https://bookdown.org/rdpeng/rprogdatascience/managing-data-frames-with-the-dplyr-package.html
- dplyr can work with other data frame "backends" such as SQL databases; for relational databases, the dbplyr package translates dplyr verbs into SQL and sends them over a DBI connection
- dplyr can be combined with the data.table package (for example through the dtplyr package) to work quickly with large tables
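As a minimal sketch of the database backend, the example below runs dplyr verbs against an in-memory SQLite table. It assumes the dbplyr and RSQLite packages are installed alongside DBI, and it uses the built-in mtcars data as a stand-in for a real database table; the names con, cars_db, query and result are only illustrative.

```r
library(dplyr)

# Assumption: DBI, RSQLite and dbplyr are installed; dbplyr does the SQL translation.
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(con, "mtcars", mtcars)

# tbl() gives a lazy reference to the remote table; no rows are read yet.
cars_db <- tbl(con, "mtcars")

# These verbs are translated to SQL and run inside the database.
query <- cars_db %>%
  filter(cyl == 4) %>%
  summarise(n_cars = n())

show_query(query)          # print the generated SQL
result <- collect(query)   # execute and pull the result into R as a tibble

DBI::dbDisconnect(con)
```

Until collect() is called, query is only a description of the work to do, so large tables stay in the database and only the summarised result is transferred to R.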
Useful tidbits:
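- case_when() # vectorised if / else if: the first condition that is TRUE decides the value
- filter(row_number() == n()) # keep the last row (the last row of each group on grouped data)
- count(), tally(), add_count(), add_tally() # count rows per group; the add_ variants keep every row and add a count column
- distinct() # keep unique rows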
- subset <- select(chicago, ends_with("2")) # keep columns whose names end in "2"
- subset <- select(chicago, starts_with("d")) # keep columns whose names start with "d"; see ?select for more information
- chicago <- rename(chicago, dewpoint = dptp, pm25 = pm25tmean2) # rename(data, new_name = old_name)
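- filter(rowSums(.[2:ncol(.)]) == 4) # inside a %>% pipe, "." is the piped data frame: keep rows whose columns 2 through the last sum to 4; see https://stackoverflow.com/questions/32618744/dplyr-how-to-reference-columns-by-column-index-rather-than-column-name-using-mu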
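The tidbits above can be tried out together on a small table. This is only a sketch: the chicago data set is not bundled with dplyr, so the tibbles df and flags below are made-up stand-ins that reuse the dptp and pm25tmean2 column names, and the threshold of 12 is arbitrary.

```r
library(dplyr)

# Hypothetical toy data standing in for the chicago data set.
df <- tibble(
  city       = c("chic", "chic", "nyc", "nyc", "nyc"),
  dptp       = c(31.5, 33.0, 28.1, 40.2, 35.7),
  pm25tmean2 = c(10.2, NA, 22.5, 8.9, 15.0)
)

# rename(): new_name = old_name
df <- rename(df, dewpoint = dptp, pm25 = pm25tmean2)

# select() helpers pick columns by name pattern.
d_cols <- select(df, starts_with("d"))   # dewpoint
five   <- select(df, ends_with("5"))     # pm25

# case_when(): conditions are checked in order, the first TRUE one wins.
df <- mutate(df, level = case_when(
  is.na(pm25) ~ "missing",
  pm25 > 12   ~ "high",
  TRUE        ~ "low"
))

# filter(row_number() == n()): last row of each group.
last_rows <- df %>%
  group_by(city) %>%
  filter(row_number() == n()) %>%
  ungroup()

# count() collapses to one row per group; add_count() keeps all rows.
per_city <- count(df, city)
with_n   <- add_count(df, city)

# distinct(): unique values of the chosen columns.
cities <- distinct(df, city)

# rowSums() over columns 2..last; "." is the data frame piped in by %>%.
flags <- tibble(id = 1:3, a = c(1, 1, 0), b = c(1, 1, 1),
                c = c(1, 0, 1), d = c(1, 1, 1))
all_four <- flags %>% filter(rowSums(.[2:ncol(.)]) == 4)
```

Note that the "." form relies on the magrittr pipe (%>%); it is not available with the native |> pipe.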
tidyr
The goal of tidyr is to help you create tidy data. Tidy data is data where:
- Each variable is in a column.
- Each observation is a row.
- Each value is a cell.
The principles of tidy data provide a standard way to organise data values within a dataset.
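To make the three rules concrete, the sketch below reshapes a small "wide" table, where the year variable is spread across column names, into tidy form with tidyr's pivot_longer(). The table and its values are made up for illustration.

```r
library(tidyr)

# Hypothetical wide table: the years are column names, not values.
wide <- tibble::tibble(
  country = c("A", "B"),
  `1999`  = c(10, 30),
  `2000`  = c(20, 40)
)

# After pivoting, each variable (country, year, cases) has its own column
# and each row is one observation (one country in one year).
long <- pivot_longer(
  wide,
  cols      = c(`1999`, `2000`),
  names_to  = "year",
  values_to = "cases"
)
print(long)
```

The year column comes out as character because it was built from column names; convert it with as.integer() if numeric years are needed.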