失眠网,内容丰富有趣,生活中的好帮手!
失眠网 > 0409-C · NFL Positional Salaries · ggplot2 折线图 散点图 分面图 · R 语言数据可视化 案例 源码

0409-C · NFL Positional Salaries · ggplot2 折线图 散点图 分面图 · R 语言数据可视化 案例 源码

时间:2018-06-27 02:54:54

相关推荐

0409-C · NFL Positional Salaries · ggplot2  折线图 散点图 分面图 · R 语言数据可视化 案例 源码

所有作品合集传送门: Tidy Tuesday

年合集传送门:

NFL Positional Salaries

NFL Positional Salaries

Tidy Tuesday 在 GitHub 上的传送地址:

Thomas Mock (). Tidy Tuesday: A weekly data project aimed at the R ecosystem./rfordatascience/tidytuesday

1. 一些环境设置

# 设置为国内镜像, 方便快速安装模块options("repos" = c(CRAN = "https://mirrors.tuna./CRAN/"))

2. 设置工作路径

wkdir <- '/home/user/R_workdir/TidyTuesday//-04-09_NFL_Positional_Salaries/src-c'setwd(wkdir)

3. 加载 R 包

library(readxl)library(ggrepel)library(tidyverse)library(showtext)

# 在 Ubuntu 系统上测试的, 不加这个我画出来的汉字会乱码 ~showtext_auto()

4. 加载数据

df_input <- readxl::read_excel("../data/nfl_salary.xlsx")# 简要查看数据内容glimpse(df_input)

## Rows: 800## Columns: 11## $ year<dbl> , , , , , , , , 20…## $ Cornerback<dbl> 11265916, 11000000, 10000000, 10000000, 10000000, …## $ `Defensive Lineman` <dbl> 17818000, 16200000, 12476000, 11904706, 11762782, …## $ Linebacker<dbl> 16420000, 15623000, 11825000, 10083333, 10020000, …## $ `Offensive Lineman` <dbl> 15960000, 12800000, 11767500, 10358200, 10000000, …## $ Quarterback <dbl> 17228125, 16000000, 14400000, 14100000, 13510000, …## $ `RunnALB ERcT`<dbl> 12955000, 10873833, 9479000, 7700000, 7500000, 703…## $ Safety <dbl> 8871428, 8787500, 8282500, 8000000, 7804333, 76527…## $ `Special Teamer` <dbl> 4300000, 3725000, 3556176, 3500000, 3250000, 32250…## $ `Tight End` <dbl> 8734375, 8591000, 8290000, 7723333, 6974666, 61333…## $ `ALB ReceEvRT`<dbl> 16250000, 14175000, 11424000, 11415000, 10800000, …

# 检查数据的列名colnames(df_input)

## [1] "year" "Cornerback" "Defensive Lineman"## [4] "Linebacker" "Offensive Lineman" "Quarterback"## [7] "Running Back""Safety" "Special Teamer" ## [10] "Tight End" "Wide Receiver"

5. 数据预处理

# 整理数据, 从宽数据透视到长数据转换df_tidy <- df_input %>% gather(`Cornerback`:`Wide Receiver`, key = "player_position", value = "salary", `Cornerback`:`Wide Receiver`)glimpse(df_tidy)

## Rows: 8,000## Columns: 3## $ year <dbl> , , , , , , , , , …## $ player_position <chr> "Cornalbert", "Cornerback", "Cornerback", "Cornerback"…## $ salary<dbl> 11265916, 11000000, 10000000, 10000000, 10000000, 9244…

# 选择前 20 名的薪资df_tidy <- df_tidy %>% group_by(year, player_position) %>% top_n(n = 16, wt = salary)# 将 player_position 变量转换成因子型df_tidy$player_position <- factor(df_tidy$player_position, levels = c("Running Back", "Quarterback","Offensive Lineman","Tight End","Wide Receiver", "Cornerback", "Defensive Lineman", "Linebacker","Safety", "Special Teamer"))# 将工资除以百万,以获得更具可读性的刻度df_tidy <- df_tidy %>% mutate(salary_mil = round(salary / 1000000))# 计算每个职位每年的总开支# 最好使用 dplyr::summarise() 形式调用函数, 不然容易和内置函数冲突df_position_year <- df_tidy %>% group_by(player_position, year) %>% dplyr::summarize(total_spent = sum(salary_mil), .groups = 'drop')# 计算所有职位每年的支出总额df_year <- df_tidy %>% group_by(year) %>% dplyr::summarize(spent_per_year = sum(salary_mil), .groups = 'drop')# 数据框合并df_plot <- df_tidy %>% left_join(df_position_year, by = c("player_position", "year"))df_plot <- df_plot %>% left_join(df_year, by = c("year"))# 计算每个职位每年的百分比# 建议使用 dplyr::mutate 形式调用函数, 不然容易与 plyr 中的函数冲突 (因为我自己就报错了...)df_plot <- df_plot %>% dplyr::mutate(pct_total = (total_spent / spent_per_year) * 100)# 创建 进攻 Offense 和 防守 Defense 类别df_plot <- df_plot %>% dplyr::mutate(off_def = ifelse(player_position %in% c("Quarterback", "Wide Receiver", "Offensive Lineman","Running Back", "Tight End"), "OFFENSE", "DEFENSE"))# 将 off_def 变量转换成因子型df_plot$off_def <- factor(df_plot$off_def, levels = c("OFFENSE", "DEFENSE"))# 删除缺失值的观测df_plot <- na.omit(df_plot)glimpse(df_plot)

## Rows: 1,288## Columns: 8## Groups: year, player_position [80]## $ year <dbl> , , , , , , , , , …## $ player_position <fct> Cornerback, Cornerback, Cornerback, Cornerback, Corner…## $ salary<dbl> 11265916, 11000000, 10000000, 10000000, 10000000, 9244…## $ salary_mil<dbl> 11, 11, 10, 10, 10, 9, 8, 8, 7, 7, 7, 7, 6, 6, 6, 6, 1…## $ totalalbert<dbl> 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129,…## $ spent_per_year <dbl> 1247, 1247, 1247, 1247, 1247, 1247, 1247, 1247, 1247, …## $ pct_total <dbl> 10.34483, 10.34483, 10.34483, 10.34483, 10.34483, 10.3…## $ off_def <fct> DEFENSE, DEFENSE, DEFENSE, DEFENSE, DEFENSE, DEFENSE, …

6. 折线图 (一)

6.1 利用 ggplot2 绘图

# PS: 方便讲解, 我这里进行了拆解, 具体使用时可以组合在一起gg <- ggplot(df_plot)# geom_point() 绘制散点图gg <- gg + geom_point(aes(x = year, y = salary_mil), alpha = 1/4, na.rm = TRUE)# geom_smooth() 添加拟合曲线gg <- gg + geom_smooth(aes(x = year, y = salary_mil), se = FALSE, color = "#FF5722", method = 'loess', formula = 'y~x', na.rm = TRUE)gg <- gg + ylim(0, 25)# facet_wrap() 可视化分面图gg <- gg + facet_wrap(~ player_position, nrow = 2, ncol = 5)gg <- gg + labs(title = "橄榄球队内每个职位前20名球员的薪酬水平",subtitle = "排名靠前的后卫的平均薪酬已经停滞不前",x = NULL,y = '平均上限值',caption = "NFL Quarterback Salaries\ngraph by 萤火之森")# theme_minimal() 去坐标轴边框的最小化主题gg <- gg + theme_minimal()# theme() 实现对非数据元素的调整, 对结果进行进一步渲染, 使之更加美观gg <- gg + theme(# panel.grid.major 主网格线, 这一步表示删除主要网格线panel.grid.major = element_line("grey", size = 0.2),# panel.grid.minor 次网格线, 这一步表示删除次要网格线panel.grid.minor = element_blank(),# axis.text 坐标轴刻度文本axis.text = element_text(color = "black", size = 8),# axis.title 坐标轴标题axis.title = element_text(color = "black", size = 8),# axis.ticks 坐标轴刻度线axis.ticks = element_blank(),# plot.title 主标题plot.title = element_text(hjust = 0.5, color = "black", size = 16, face = "bold"),# plot.subtitle 次要标题plot.subtitle = element_text(hjust = 0.5, color = "red", size = 10),# plot.background 图片背景plot.background = element_rect(fill = "white"),# strip.text 自定义分面图每个分面标题的文字strip.text = element_text(face = "bold", size = rel(0.7)),# strip.background 自定义分面图每个分面的背景颜色strip.background = element_rect("white"))

6.2 保存图片到 PDF 和 PNG

gg

filename = '0409-C-01'ggsave(filename = paste0(filename, ".pdf"), width = 8.6, height = 5, device = cairo_pdf)ggsave(filename = paste0(filename, ".png"), width = 8.6, height = 5, dpi = 100, device = "png")

7. 折线图 (二)

7.1 利用 ggplot2 绘图

# PS: 方便讲解, 我这里进行了拆解, 具体使用时可以组合在一起plot.labels <- data.frame(player_position = c("RB", "QB", "OL", "TE", "WR", "CB", "DL", "LB", "S", "ST"),x = c(rep(, 10)),y = c(5, 21, 10, 7.5, 12, 10, 13.5, 11, 7.5, 3))gg <- ggplot(df_plot, aes(x = year, y = pct_total, color = player_position))gg <- gg + geom_point()gg <- gg + geom_smooth(se = FALSE, method = 'loess', formula = 'y~x')gg <- gg + geom_text(data = plot.labels, aes(x = x, y = y, label = player_position), check_overlap = TRUE)gg <- gg + facet_wrap(~ off_def)gg <- gg + labs(title = "橄榄球队内每个职位前20名球员的花费百分比",subtitle = "团队在RB上花费更少",x = NULL,y = 'Percent spent on each position',caption = "NFL Quarterback Salaries\ngraph by 萤火之森")# theme_minimal() 去坐标轴边框的最小化主题gg <- gg + theme_minimal()# theme() 实现对非数据元素的调整, 对结果进行进一步渲染, 使之更加美观gg <- gg + theme(# panel.grid.major 主网格线, 这一步表示删除主要网格线panel.grid.major = element_line("grey", size = 0.2),# panel.grid.minor 次网格线, 这一步表示删除次要网格线panel.grid.minor = element_blank(),# axis.text 坐标轴刻度文本axis.text = element_text(color = "black", size = 8),# axis.title 坐标轴标题axis.title = element_text(color = "black", size = 8),# axis.ticks 坐标轴刻度线axis.ticks = element_blank(),# plot.title 主标题plot.title = element_text(hjust = 0.5, color = "black", size = 16, face = "bold"),# plot.subtitle 次要标题plot.subtitle = element_text(hjust = 0.5, color = "red", size = 10),# plot.background 图片背景plot.background = element_rect(fill = "white"),# legend.position 设置图例位置, "none" 表示不显示图例legend.position = "none",# strip.text 自定义分面图每个分面标题的文字strip.text = element_text(face = "bold", size = rel(0.7)),# strip.background 自定义分面图每个分面的背景颜色strip.background = element_rect("white"))

7.2 保存图片到 PDF 和 PNG

gg

filename = '0409-C-02'ggsave(filename = paste0(filename, ".pdf"), width = 9.4, height = 5.5, device = cairo_pdf)ggsave(filename = paste0(filename, ".png"), width = 9.4, height = 5.5, dpi = 100, device = "png")

8. session-info

sessionInfo()

## R version 4.2.1 (-06-23)## Platform: x86_64-pc-linux-gnu (64-bit)## Running under: Ubuntu 20.04.4 LTS## ## Matrix products: default## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3## ## locale:## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C ## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 ## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 ## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C ## [9] LC_ADDRESS=CLC_TELEPHONE=C ## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C ## ## attached base packages:## [1] statsgraphics grDevices utilsdatasets methods base## ## other attached packages:## [1] showtext_0.9-5 showtextdb_3.0 sysfonts_0.8.8 forcats_0.5.2 ## [5] stringr_1.4.1 dplyr_1.0.10 purrr_0.3.4readr_2.1.2 ## [9] tidyr_1.2.1tibble_3.1.8 tidyverse_1.3.2 ggrepel_0.9.1 ## [13] ggplot2_3.3.6 readxl_1.4.1 ## ## loaded via a namespace (and not attached):## [1] httr_1.4.4sass_0.4.2jsonlite_1.8.0## [4] splines_4.2.1 modelr_0.1.9 bslib_0.4.0 ## [7] assertthat_0.2.1 highr_0.9 googlesheets4_1.0.1## [10] cellranger_1.1.0 yaml_2.3.5pillar_1.8.1 ## [13] backports_1.4.1lattice_0.20-45glue_1.6.2 ## [16] digest_0.6.29 rvest_1.0.3 colorspace_2.0-3 ## [19] htmltools_0.5.3Matrix_1.4-1 pkgconfig_2.0.3 ## [22] broom_1.0.1 haven_2.5.1 scales_1.2.1 ## [25] tzdb_0.3.0googledrive_2.0.0 mgcv_1.8-40 ## [28] generics_0.1.3farver_2.1.1 ellipsis_0.3.2## [31] cachem_1.0.6 withr_2.5.0 cli_3.3.0## [34] magrittr_2.0.3crayon_1.5.1 evaluate_0.16## [37] fs_1.5.2 fansi_1.0.3 nlme_3.1-159 ## [40] xml2_1.3.3textshaping_0.3.6 tools_4.2.1 ## [43] hms_1.1.2 gargle_1.2.1 lifecycle_1.0.1 ## [46] munsell_0.5.0 reprex_2.0.2 compiler_4.2.1## [49] jquerylib_0.1.4systemfonts_1.0.4 rlang_1.0.5 ## [52] grid_4.2.1rstudioapi_0.14labeling_0.4.2## [55] rmarkdown_2.16gtable_0.3.1 DBI_1.1.3## [58] R6_2.5.1 lubridate_1.8.0knitr_1.40 ## [61] fastmap_1.1.0 utf8_1.2.2ragg_1.2.3 ## [64] stringi_1.7.8 Rcpp_1.0.9vctrs_0.4.1 ## [67] dbplyr_2.2.1 tidyselect_1.1.2 xfun_0.32

测试数据

配套数据下载:nfl_salary.xlsx

如果觉得《0409-C · NFL Positional Salaries · ggplot2 折线图 散点图 分面图 · R 语言数据可视化 案例 源码》对你有帮助,请点赞、收藏,并留下你的观点哦!

本内容不代表本网观点和政治立场,如有侵犯你的权益请联系我们处理。
网友评论
网友评论仅供其表达个人看法,并不表明网站立场。