1. Scraping Taobao product listings
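Scraping requires a valid cookie. In addition, the product data is hidden in the page source, so it has to be extracted with a regular expression.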
import json
import re

import requests

# Request headers copied from a browser session. The cookie value expires,
# so replace it with your own before running.
headers1 = {
    "authority": "authority",
    "cookie": "t=9112f19ggggUjn6IZNGOI_GrdT9tGz36F",
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.80 Safari/537.36",
    "upgrade-insecure-requests": "1",
}


def request1(url):
    """Fetch a search page and return the JSON embedded in its source."""
    # time.sleep(3)
    print(url)
    html = requests.get(url, headers=headers1)
    # print(html.text)  # uncomment to inspect the raw page
    # The listing data is assigned to g_page_config in the page source;
    # capture everything up to the ";" that precedes g_srp_loadCss.
    json_html = re.findall(r"g_page_config = (.*);.*?g_srp_loadCss", html.text, re.S)[0]
    # print(json_html)  # uncomment to inspect the extracted JSON text
    json1 = json.loads(json_html)
    return json1


# The host is omitted here, as in the original post; prepend the scheme and
# host of the search site before running. The query 笔记本 means "laptop".
taobao_url = "/search?q=笔记本"
taobao_html = request1(taobao_url)

auctions = taobao_html["mods"]["itemlist"]["data"]["auctions"]
print(len(auctions))
for auction in auctions:
    print(auction["nick"])   # seller nickname
    print(auction["title"])  # listing title
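The `[0]` index in the script above raises an `IndexError` whenever the pattern is absent, which may happen, for instance, if the cookie is rejected and a different page comes back. A minimal sketch of a more defensive extraction step follows. The helper name `extract_page_config` and the sample HTML string are hypothetical, made up for illustration and not taken from a real response.

```python
import json
import re

# Same pattern as the script above, compiled once for reuse.
PAGE_CONFIG_RE = re.compile(r"g_page_config = (.*);.*?g_srp_loadCss", re.S)


def extract_page_config(html_text):
    """Return the embedded g_page_config JSON as a dict, or None if absent."""
    match = PAGE_CONFIG_RE.search(html_text)
    if match is None:
        return None
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        return None


# Hypothetical, heavily shortened stand-in for a search page's source.
sample_html = """
<script>
    g_page_config = {"mods": {"itemlist": {"data": {"auctions": [
        {"nick": "shop-a", "title": "laptop 14 inch"}
    ]}}}};
    g_srp_loadCss();
</script>
"""

config = extract_page_config(sample_html)
auctions = config["mods"]["itemlist"]["data"]["auctions"] if config else []
for auction in auctions:
    print(auction["nick"], auction["title"])
```

Returning `None` instead of raising lets the caller decide whether to retry, refresh the cookie, or skip the page.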
2. Scraping Taobao reviews
Analyzing the page's JS yields the review endpoint: /feedRateList.htm?auctionNumId=542463533286&userNumId=10729&currentPageNum=1&pageSize=20
(This requires the product ID and the user ID, both of which can be found in the page source.)
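As a rough sketch of how the two IDs plug into that endpoint, the query string can be assembled with the standard library. The function name `build_review_url` and the `base_url` parameter are hypothetical; the post does not give the host, so it is left as an argument for the caller to supply. The query parameter names and sample values are taken from the URL above.

```python
from urllib.parse import urlencode


def build_review_url(base_url, auction_num_id, user_num_id, page=1, page_size=20):
    """Build the feedRateList.htm URL for one page of reviews."""
    query = urlencode({
        "auctionNumId": auction_num_id,  # product ID
        "userNumId": user_num_id,        # user ID
        "currentPageNum": page,
        "pageSize": page_size,
    })
    return f"{base_url}/feedRateList.htm?{query}"


# base_url is left empty here because the post does not give the host.
for page in range(1, 4):
    print(build_review_url("", 542463533286, 10729, page=page))
```

Stepping `currentPageNum` in a loop like this is presumably how successive review pages would be requested. The response format is not described in the post, so parsing is left out.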