Scraping movie information from 电影天堂 (Movie Heaven) with Python

Date: 2018-11-09 03:32:22

from lxml import etree
import requests

# url = '/html/gndy/dyzz/index.html'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, '
                  'like Gecko) Chrome/83.0.4103.61 Safari/537.36'
}


# Collect the movie links from the first eight list pages
def Film_urls():
    # Note: the site's domain is missing from these paths in the source
    page_url = '/html/gndy/dyzz/list_23_{}.html'
    Base_url = '/'
    urls = []
    for i in range(1, 9):
        print("Fetching movie links from list page...")
        # Format into a new variable so the template is not overwritten
        url = page_url.format(i)
        response = requests.get(url, headers=headers)
        html = etree.HTML(response.text)
        urls += html.xpath("//table[@class='tbspan']//a/@href")
    # Prefix the base URL once, after all pages have been collected
    urls = [Base_url + url for url in urls]
    return urls


# Scrape the details of a single movie
def Film_content(url):
    movie = {}
    response = requests.get(url, headers=headers)
    # Decode as GBK, ignoring bytes that cannot be decoded
    text = response.content.decode(encoding='gbk', errors='ignore')
    html = etree.HTML(text)
    content = html.xpath("//div[@id='Zoom']")[0]

    # Strip the field label and surrounding whitespace
    def parse_info(info, label):
        return info.replace(label, "").strip()

    infos = content.xpath(".//text()")
    for index, info in enumerate(infos):
        if info.startswith("◎片名"):
            movie['name'] = parse_info(info, "◎片名")
        elif info.startswith("◎年代"):
            movie['year'] = parse_info(info, "◎年代")
        elif info.startswith("◎产地"):
            movie['country'] = parse_info(info, "◎产地")
        elif info.startswith("◎类别"):
            movie['category'] = parse_info(info, "◎类别")
        elif info.startswith("◎豆瓣评分"):
            movie['douban_rating'] = parse_info(info, "◎豆瓣评分")
        elif info.startswith("◎片长"):
            movie['duration'] = parse_info(info, "◎片长")
        elif info.startswith("◎导演"):
            movie['director'] = parse_info(info, "◎导演")
        elif info.startswith("◎主演"):
            # The cast spans several text nodes, up to the next "◎" field
            actors = [parse_info(info, "◎主演")]
            for x in range(index + 1, len(infos)):
                actor = infos[x].strip()
                if actor.startswith("◎"):
                    break
                actors.append(actor)
            movie['actors'] = actors
        elif info.startswith("◎简介"):
            # The synopsis text is in the node after the label
            if index + 1 < len(infos):
                movie['profile'] = infos[index + 1].strip()

    imgs = content.xpath(".//img/@src")
    # Guard against pages without a poster image
    movie['img_url'] = imgs[0] if imgs else None
    download_url = html.xpath("//td[@bgcolor='#fdfddf']/a/@href")
    movie['download_url'] = download_url
    return movie


def spider():
    urls = Film_urls()
    movies = []
    # Only the first 10 movies are scraped
    for url in urls[:10]:
        print("Fetching movie details...")
        movie = Film_content(url)
        movies.append(movie)
    return movies


if __name__ == '__main__':
    movies = spider()
    print(movies)
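
The field extraction in the script can be tried offline, without touching the network. The following is a minimal sketch, not the author's code: the `FIELD_LABELS` table, the `parse_fields` function, and the sample strings are all hypothetical names and made-up data introduced here for illustration. It assumes the detail block's text nodes arrive as a list of strings, as `content.xpath(".//text()")` would return them, and it replaces the chain of `elif` branches with a lookup table.

```python
# Hypothetical mapping from the "◎" labels matched in the script to dict keys.
FIELD_LABELS = {
    "◎片名": "name",
    "◎年代": "year",
    "◎产地": "country",
    "◎类别": "category",
    "◎豆瓣评分": "douban_rating",
    "◎片长": "duration",
    "◎导演": "director",
}


def parse_fields(infos):
    """Build a movie dict from a list of text nodes (illustrative helper)."""
    movie = {}
    for index, raw in enumerate(infos):
        info = raw.strip()
        if info.startswith("◎主演"):
            # The cast continues on the following nodes until the next "◎" label.
            actors = [info.replace("◎主演", "").strip()]
            for follow in infos[index + 1:]:
                follow = follow.strip()
                if follow.startswith("◎"):
                    break
                if follow:  # unlike the script, skip empty nodes
                    actors.append(follow)
            movie["actors"] = actors
        elif info.startswith("◎简介"):
            # The synopsis text is taken from the node after the label.
            if index + 1 < len(infos):
                movie["profile"] = infos[index + 1].strip()
        else:
            for label, key in FIELD_LABELS.items():
                if info.startswith(label):
                    movie[key] = info[len(label):].strip()
                    break
    return movie


# Made-up sample data standing in for the scraped text nodes.
sample = [
    "◎片名 Example Film",
    "◎年代 2018",
    "◎主演 Actor One",
    "  Actor Two",
    "◎简介",
    "  A made-up synopsis.",
]
result = parse_fields(sample)
print(result)
```

Keeping the labels in one table means a new field only needs one more entry, and the parsing logic can be checked against fixed sample input before running the scraper against the live site.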
