Visualizing NBA Player Statistics

Date: 2024-01-03 08:08:58

I haven’t written a post in a while. I had a lot to do for university, and hobbies like recreational programming and blogging have to suffer during those times. But now I have found some time, and I’ll be adding smaller posts every now and then.

In the Machine Learning course I am taking at university, I could use matplotlib to plot my functions for the homework submissions, so I have become more familiar with coding plots and graphs in Python since my last post about matplotlib. I wanted to prepare some interactive plots for my blog and present what I have been able to create so far.

Web Scraping for Dummies

First I wanted to find some interesting data to display. I decided to collect my own data instead of taking an already publicly available data set, since this can easily be done in Python. For a step-by-step guide on how to scrape data from a web page that is generated on the server side, where you can find your data directly in the HTML code, I recommend chapter 11 of Automate the Boring Stuff with Python by Al Sweigart. The standard module for this type of web scraping is BeautifulSoup, which makes it easy to find certain tags in an HTML file.
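To give a rough idea of what "finding certain tags" means for a server-rendered page, here is a minimal sketch. It deliberately swaps in the standard library's html.parser instead of BeautifulSoup so that it runs without extra installs. The HTML snippet, the class names and the CellCollector helper are all made up for illustration.

```python
from html.parser import HTMLParser

# Made-up HTML standing in for a server-rendered page (not real site markup).
SAMPLE_HTML = """
<html><body>
  <table>
    <tr><td class="name">Player A</td><td class="pts">10.5</td></tr>
    <tr><td class="name">Player B</td><td class="pts">7.2</td></tr>
  </table>
</body></html>
"""


class CellCollector(HTMLParser):
    """Hypothetical helper: collect the text of every <td> with a given class."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.inside = False  # True while we are inside a matching <td>
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "td" and dict(attrs).get("class") == self.wanted_class:
            self.inside = True

    def handle_endtag(self, tag):
        if tag == "td":
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.cells.append(data.strip())


collector = CellCollector("name")
collector.feed(SAMPLE_HTML)
print(collector.cells)  # ['Player A', 'Player B']
```

With BeautifulSoup the same lookup would take a line or two, which is why it is the usual choice for this kind of page.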

So I decided that I wanted to collect the stats of all currently active NBA players. Luckily, there is a blog post from Greg Reda that explains exactly how this can be done in Python. This approach to web scraping is different, since a lot of newer sites build the web page on the client side. So you have to find the URL of the request to the server. The response you then get is often a JSON object, which you can parse for the information you want.

The web page is generated on the client side, so the latter approach was necessary. I first collected the person_ids of every NBA player in the database and then checked their roster status, which says whether the player is still actively playing in the NBA or not (here is the url for the player list). This is what my code for this task looks like:

    import requests
    import csv
    import sys

    # get me all active players (scheme and host of the URL are not given here)
    url_allPlayers = ("/stats/commonallplayers?IsOnlyCurrentSeason"
                      "=0&LeagueID=00&Season=-16")

    # request url and parse the JSON
    response = requests.get(url_allPlayers)
    response.raise_for_status()
    players = response.json()['resultSets'][0]['rowSet']

    # use roster status flag (column 2) to check if player is still actively playing
    active_players = [player for player in players if player[2] == 1]
    ids = [player[0] for player in active_players]
    print("Number of Active Players: " + str(len(ids)))
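Indexing rows by position (column 0 for the ID, column 2 for the roster status) works, but it is easy to lose track of which number means what. A small sketch of an alternative is to pair each row with column names. It assumes that every result set carries a "headers" list next to its "rowSet". The column names and the sample rows below are made up for illustration.

```python
import json

# Made-up response body in the shape the code above indexes into.
# The "headers" key and the column names are assumptions for this sketch.
SAMPLE_BODY = """
{"resultSets": [{
    "headers": ["PERSON_ID", "DISPLAY_NAME", "ROSTERSTATUS"],
    "rowSet": [[1, "Player A", 1], [2, "Player B", 0], [3, "Player C", 1]]
}]}
"""


def rows_as_dicts(result_set):
    """Turn each positional row into a dict keyed by column name."""
    headers = result_set["headers"]
    return [dict(zip(headers, row)) for row in result_set["rowSet"]]


players = rows_as_dicts(json.loads(SAMPLE_BODY)["resultSets"][0])
active_ids = [p["PERSON_ID"] for p in players if p["ROSTERSTATUS"] == 1]
print(active_ids)  # [1, 3]
```

The trade-off is a little more memory per row in exchange for code that still reads correctly if the column order changes.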

I can then use the IDs of the active players to open each player’s own web page and use the request that gives me more detailed stats, like their points average this season. I found the right request by following the approach explained in Greg Reda’s blog post. Here is the JSON object for the active NBA player Quincy Acy. When viewing these JSON objects in your browser, I suggest installing a plug-in like JSONView, since it gives a cleaner view of the content of the JSON object. It also has a great feature: it shows in the bottom left of your browser window how to access the value your mouse is currently hovering over, which made writing the Python code much easier.

JSON object without JSONView

Same JSON object using JSONView in Chrome

I then looped over the collected IDs and scraped together the individual data of the players. Here’s my code for this:

    name_height_pos = []
    for i in ids:
        url_onePlayer = ("/stats/commonplayerinfo?"
                         "LeagueID=00&PlayerID=" + str(i) + "&SeasonType=Regular+Season")
        # request url and parse the JSON (once per player)
        response = requests.get(url_onePlayer)
        response.raise_for_status()
        result_sets = response.json()['resultSets']
        one_player = result_sets[0]['rowSet']
        stats_player = result_sets[1]['rowSet']
        try:
            points = stats_player[0][3]
            assists = stats_player[0][4]
            rebounds = stats_player[0][5]
            PIE = stats_player[0][6]
        # handle the case where the player is active, but didn't play
        # in any game so far this season (-1 is just a placeholder value)
        except IndexError:
            points = -1
            assists = -1
            rebounds = -1
            PIE = -1
        name_height_pos.append([
            one_player[0][1] + " " + one_player[0][2],
            one_player[0][10],
            one_player[0][14],
            one_player[0][18],
            "http://i./nba/nba/.element/img/2.0/sect/statscube/players/large/"
            + one_player[0][1].lower() + "_" + one_player[0][2].lower() + ".png",
            points, assists, rebounds, PIE])
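The query string in the loop above is glued together by hand, including the "+" that stands for the space in "Regular Season". As a sketch of an alternative, the standard library's urllib.parse.urlencode can build the same string from a dictionary. The helper name player_info_query and the sample ID are made up for illustration.

```python
from urllib.parse import urlencode


def player_info_query(player_id):
    """Build the query string for one player (hypothetical helper)."""
    params = {
        "LeagueID": "00",
        "PlayerID": player_id,
        "SeasonType": "Regular Season",  # urlencode turns the space into "+"
    }
    return urlencode(params)


# 12345 is a placeholder ID, not a real player.
print(player_info_query(12345))
# LeagueID=00&PlayerID=12345&SeasonType=Regular+Season
```

requests can also do this step itself if the dictionary is passed as the params argument of requests.get, which keeps the base URL and the parameters separate.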

In case you are wondering what PIE is: PIE (Player Impact Estimate) is an advanced statistic that tries to describe a player’s contribution to the total statistics in an NBA game. PIE is calculated with a formula consisting of simple player and game stats like points or team assists. If you are interested in how PIE is calculated, check the glossary on the stats site. I decided to collect this information too, since the JSON objects didn’t offer any other advanced statistic like PER.
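To make the idea of "player stats divided by game stats" concrete, here is a small sketch of the formula as it is usually quoted from the NBA glossary. Treat it as an assumption and check it against the glossary before relying on it. The stat abbreviations used as dictionary keys and all sample numbers are made up for illustration.

```python
def pie_terms(s):
    """Weighted sum of box-score stats used in both numerator and denominator."""
    return (s["PTS"] + s["FGM"] + s["FTM"] - s["FGA"] - s["FTA"]
            + s["DREB"] + 0.5 * s["OREB"] + s["AST"] + s["STL"]
            + 0.5 * s["BLK"] - s["PF"] - s["TOV"])


def pie(player, game):
    """Player's share of the game totals (both teams combined)."""
    return pie_terms(player) / pie_terms(game)


# Made-up box score: one player and the totals of the whole game.
player = {"PTS": 20, "FGM": 8, "FTM": 4, "FGA": 15, "FTA": 5, "DREB": 5,
          "OREB": 2, "AST": 4, "STL": 1, "BLK": 2, "PF": 3, "TOV": 2}
game = {key: value * 10 for key, value in player.items()}

print(round(pie(player, game), 3))  # 0.1
```

Because the same weighted sum appears above and below the fraction line, the result reads as a share of everything that happened in the game.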

I also saved the link to the head shot of each player in case it could be useful for some kind of visualization. These links all follow a similar pattern; only the name of the player has to be inserted into the image file name. So, for everybody who would like to know, this is what Quincy Acy looks like, and this is what Will Barton looks like.
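The link pattern described above (lower-case first name, underscore, lower-case last name, then ".png") can be wrapped in a small function. This is only a sketch: headshot_url is a hypothetical helper, and the base URL below is a placeholder, not the real image host.

```python
def headshot_url(base_url, first_name, last_name):
    """Build an image link following the firstname_lastname.png pattern."""
    return base_url + first_name.lower() + "_" + last_name.lower() + ".png"


# Placeholder base URL for illustration only.
BASE = "https://example.invalid/players/large/"

print(headshot_url(BASE, "Quincy", "Acy"))
# https://example.invalid/players/large/quincy_acy.png
```

Names containing spaces, apostrophes or suffixes may not fit this simple pattern, so it is worth checking a few links before using them in a plot.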

If you want to keep your visualization code in a separate Python script, you should save the gathered info in some kind of table format like a csv file. Here’s a code snippet showing an easy way to save a list of lists in Python to a csv file:

    # newline="" lets the csv module control line endings itself
    with open("players.csv", "w", newline="") as csvfile:
        writer = csv.writer(csvfile, delimiter=",", lineterminator="\n")
        # header row
        writer.writerow(["Name", "Height", "Pos", "Team", "img_Link",
                         "PTS", "AST", "REB", "PIE"])
        for row in name_height_pos:
            writer.writerow(row)
    print("Saved as 'players.csv'")

I have added the complete file I used to scrape the NBA player data to GitHub for those who want to try this out at home.

Here is also a short preview of what the head of the csv file then looks like (I have left out the link here, since it is just a long string of letters that doesn’t contribute to understanding the csv file structure at all):

    Name,Height,Pos,Team,img_Link,PTS,AST,REB,PIE
    Quincy Acy,200.66,Forward,SAC,[link],2.3,0.5,1.8,0.046
    Jordan Adams,195.58,Guard,MEM,[link],3.5,1.5,1.0,0.092
    Steven Adams,213.36,Center,OKC,[link],6.0,0.8,5.8,0.074
    Arron Afflalo,195.58,Guard,NYK,[link],12.6,1.7,3.8,0.077
    ⋮
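Reading such a file back by column name rather than by position can be sketched with csv.DictReader. The example below uses the four preview rows shown above as an in-memory string instead of a file. The load_players helper is hypothetical, and the -1 check mirrors the placeholder value used for players without recorded stats.

```python
import csv
import io

# The preview rows from above, kept in memory instead of reading players.csv.
CSV_TEXT = """Name,Height,Pos,Team,img_Link,PTS,AST,REB,PIE
Quincy Acy,200.66,Forward,SAC,[link],2.3,0.5,1.8,0.046
Jordan Adams,195.58,Guard,MEM,[link],3.5,1.5,1.0,0.092
Steven Adams,213.36,Center,OKC,[link],6.0,0.8,5.8,0.074
Arron Afflalo,195.58,Guard,NYK,[link],12.6,1.7,3.8,0.077
"""

NUMERIC_COLUMNS = ("Height", "PTS", "AST", "REB", "PIE")


def load_players(text):
    """Parse csv text into dicts, convert numbers, drop placeholder rows."""
    players = []
    for row in csv.DictReader(io.StringIO(text)):
        for key in NUMERIC_COLUMNS:
            row[key] = float(row[key])
        if row["PTS"] != -1:  # -1 marks players without recorded stats
            players.append(row)
    return players


players = load_players(CSV_TEXT)
top_scorer = max(players, key=lambda p: p["PTS"])
print(top_scorer["Name"], top_scorer["PTS"])  # Arron Afflalo 12.6
```

For a real file, the io.StringIO object would simply be replaced by the file handle returned from open("players.csv", newline="").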

3D scatter plots in Plotly

I wanted to create some kind of visualization using three axes to describe the three most popular statistics recorded in a basketball game: points, assists and rebounds. Each plot point should represent one player. When hovering over a plot point, it should at least show the player’s name and the exact values of his average points, assists and rebounds per game, since these often can’t easily be read from the graph in 3D visualizations.
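As a rough static stand-in for this idea, the layout can be sketched with matplotlib's mplot3d toolkit. This is not how the plot in this post was made: there is no hover support here, so the player names are drawn as fixed text labels instead. The data are the four rows from the csv preview, and build_figure and the output file name are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no window needed
import matplotlib.pyplot as plt

# (name, points, assists, rebounds) taken from the csv preview rows
PLAYERS = [
    ("Quincy Acy", 2.3, 0.5, 1.8),
    ("Jordan Adams", 3.5, 1.5, 1.0),
    ("Steven Adams", 6.0, 0.8, 5.8),
    ("Arron Afflalo", 12.6, 1.7, 3.8),
]


def build_figure(players):
    """One point per player on points / assists / rebounds axes."""
    fig = plt.figure()
    ax = fig.add_subplot(projection="3d")
    points = [p[1] for p in players]
    assists = [p[2] for p in players]
    rebounds = [p[3] for p in players]
    ax.scatter(points, assists, rebounds)
    # static labels instead of hover tooltips
    for name, x, y, z in players:
        ax.text(x, y, z, name, fontsize=8)
    ax.set_xlabel("Points")
    ax.set_ylabel("Assists")
    ax.set_zlabel("Rebounds")
    return fig, ax


fig, ax = build_figure(PLAYERS)
fig.savefig("nba_3d_scatter.png")
```

With a few hundred players the fixed labels quickly overlap, which is exactly where hover tooltips become the better choice.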

I decided to use Plotly to create my first 3D scatter plot. Plotly is a web program that allows you to upload your collected data and easily create plots and graphs, which you can then share or even embed into your website. Plotly allows you to create very specific visualizations, and there are many variables that can be manipulated to create exactly the type of plot you want.

You have to sign up for Plotly and create your own user account, but this is free, and it also offers a portfolio view where other users can look at the plots, tables and graphs you have created (here is my account page for example).

So this is the scatter plot I was able to create in a matter of minutes with the data I have collected:

Interactive scatter plots with MPLD3

While the first example with Plotly didn’t require much additional coding or any knowledge of matplotlib, you can also create interactive plots for the web using the matplotlib module in combination with MPLD3. MPLD3 can create an HTML file containing a version of your matplotlib graph converted into JavaScript code using D3.js. MPLD3 can’t convert everything that can be made with matplotlib, 3D graphs for example, but it’s still a good solution if you want to keep some interactivity when presenting your matplotlib plots on the web.

I had some trouble getting the MPLD3 module to work on Python 3.4 with the release you get by simply installing the module with pip. It worked when I used pip to install the most recent version from GitHub (see my previous post on Legofy to see how this works). There you also have to follow a few additional installation steps, but these are well explained on the README.md page.

So here is what I created with matplotlib and the collected data. The code for this plot is given below the plot:

    # some plots made with the mined nba data
    import csv
    import matplotlib.pyplot as plt
    from matplotlib.font_manager import FontProperties
    import mpld3


    def main():
        with open("players_02.csv", "r") as nba_file:
            csv_reader = csv.reader(nba_file, delimiter=",")
            nba_list = list(csv_reader)[1:]  # skip header description

        # header: Name,Height,Pos,Team,img_Link,PTS,AST,REB,PIE [Total: 9]
        # convert data into right data types (Height, PTS, AST, REB, PIE)
        for row in nba_list:
            row[1] = float(row[1])
            row[5] = float(row[5])
            row[6] = float(row[6])
            row[7] = float(row[7])
            row[8] = float(row[8])

        # remove all active players without any recorded stats
        nba_list = [player for player in nba_list if player[5] != -1]

        heights = [i[1] for i in nba_list]
        points = [i[5] for i in nba_list]
        assists = [i[6] for i in nba_list]
        rebounds = [i[7] for i in nba_list]
        pie_ratings = [i[8] for i in nba_list]

        fig, axarr = plt.subplots(3, sharex=True)
        axarr[0].set_title("NBA Player Stats", fontsize=20, color="blue")

        # points / assists
        scatter1 = axarr[0].scatter(points, assists)
        axarr[0].set_xlabel("Points", fontsize=16)
        axarr[0].set_ylabel("Assists", fontsize=16)
        axarr[0].set_ylim([0, 15])
        axarr[0].grid(True)

        # points / rebounds
        scatter2 = axarr[1].scatter(points, rebounds)
        axarr[1].set_xlabel("Points", fontsize=16)
        axarr[1].set_ylabel("Rebounds", fontsize=16)
        axarr[1].set_ylim([0, 16])
        axarr[1].grid(True)

        # points / pie_ratings
        scatter3 = axarr[2].scatter(points, pie_ratings)
        axarr[2].set_xlabel("Points", fontsize=16)
        axarr[2].set_ylabel("PIE Rating", fontsize=16)
        axarr[2].grid(True)

        plt.xlim(0, 35)

        # show the player's name when hovering over a point
        labels = [i[0] for i in nba_list]
        tooltip1 = mpld3.plugins.PointLabelTooltip(scatter1, labels=labels)
        tooltip2 = mpld3.plugins.PointLabelTooltip(scatter2, labels=labels)
        tooltip3 = mpld3.plugins.PointLabelTooltip(scatter3, labels=labels)
        mpld3.plugins.connect(fig, tooltip1)
        mpld3.plugins.connect(fig, tooltip2)
        mpld3.plugins.connect(fig, tooltip3)

        mpld3.save_html(fig, "test_web_plots.html")
        mpld3.show()


    if __name__ == "__main__":
        main()

Matplotlib plots have the look of MATLAB plots from the 90s, and a lot of additional care is necessary to make them prettier. To make this easier for yourself, I would suggest using Seaborn, which already applies some nice visual defaults to your Matplotlib plots.

This turned out to be a longer post than expected, but I hope it gave some insight into how easily you can collect data using Python and how quickly you can create interesting plots without a lot of code.