
Python Web Crawler -- Project Walkthrough: Scraping Renrenche with Scrapy

Date: 2020-01-15 08:20:16


1. Objective

Crawl vehicle listings across multiple pages of Renrenche (a Chinese used-car marketplace).

2. Analysis

2.1 Site Analysis

The data shown on the page can be found directly in the raw HTML source, so the page is statically rendered on the server; the listings can be scraped with plain Scrapy requests, without executing JavaScript.
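This check can be expressed as a small helper, as a sketch (the helper name and the sample HTML fragment are mine, not from the post): if a datum visible in the rendered page also appears in the raw source, the content was not injected by JavaScript.

```python
def is_statically_rendered(html: str, sample_text: str) -> bool:
    """If text visible in the rendered page also appears in the raw HTML
    source, the content was rendered server-side rather than by JavaScript."""
    return sample_text in html


# With a live page you would fetch `html` via `scrapy shell` or requests;
# here a stub stands in for a fragment of the listing page source.
html = '<ul class="row-fluid list-row js-car-list"><li><a><h3>奥迪A4L</h3></a></li></ul>'
print(is_statically_rendered(html, '奥迪A4L'))  # → True
```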

3. Complete Code

renrenche.py

```python
import scrapy

from car.items import RrcItem


class RenrencheSpider(scrapy.Spider):
    name = 'renrenche'
    # Note: the absolute URLs were stripped when this post was scraped; the
    # domain below is restored from the site named in the title (renrenche.com).
    allowed_domains = ['www.renrenche.com']
    base_url = 'https://www.renrenche.com'
    start_urls = [base_url + '/bj/ershouche/?&plog_id=618ab1bbf616cab93022afa088592885']

    def parse(self, response):
        # Each listing is an <a> (without a rel attribute) inside the car-list <ul>
        selector = response.xpath(
            '//ul[contains(@class,"row-fluid list-row js-car-list")]/li/a[not(@rel)]')
        for car in selector:
            car_name = car.xpath('./h3/text()').extract_first()
            # Strip whitespace from the price text and append the unit "万" (10,000 RMB)
            total_price = car.xpath(
                './div[contains(@class,"tags-box")]/div/text()'
            ).extract_first().replace("\n", "").replace(" ", "") + "万"
            down_pay = car.xpath(
                './div[contains(@class,"tags-box")]/div/div/div/text()').extract_first()
            # Detail-page URL; extracted but not used further in this post
            car_detail = car.xpath('./@href').extract_first()

            car_item = RrcItem()
            car_item['car_name'] = car_name
            car_item['car_price'] = total_price
            car_item['down_pay'] = down_pay
            yield car_item

        # Keep following the next-page link until the last pagination <ul>
        # carries a class value, which marks the final page
        flag = response.xpath(
            '//ul[contains(@class,"pagination js-pagination")][last()]/@class').extract_first()
        if not flag:
            url = response.xpath(
                '//ul[contains(@class,"pagination js-pagination")]/li[last()]/a/@href'
            ).extract_first()
            yield scrapy.Request(url=self.base_url + url, callback=self.parse)
```

pipelines.py

```python
# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://docs.scrapy.org/en/latest/topics/item-pipeline.html
#
# useful for handling different item types with a single interface
import MySQLdb
from itemadapter import ItemAdapter

from car.spiders.renrenche import RenrencheSpider


class CarPipeline:
    def process_item(self, item, spider):
        return item


class RrcPipeline:
    def open_spider(self, spider):
        # charset='utf8' is required here; without it, inserting Chinese
        # text fails (see the "Pitfalls" section below)
        conn = MySQLdb.Connect(host='localhost', user='root', password='6666',
                               port=3306, database='maiche', charset='utf8')
        self.conn = conn
        self.cursor = conn.cursor()

    def process_item(self, item, spider):
        if isinstance(spider, RenrencheSpider):
            # Parameterized query; the driver handles quoting and escaping
            self.cursor.execute(
                "insert into car(carname, totalprice, downpay) values(%s, %s, %s);",
                (item.get('car_name'), item.get('car_price'), item.get('down_pay'))
            )
            self.conn.commit()  # truncated to `mit()` in the original post
        return item

    def close_spider(self, spider):
        self.conn.close()
```
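The post never shows the schema of the `car` table the pipeline inserts into. A plausible sketch, assuming the three text columns named in the INSERT (the id column and the column sizes are mine):

```sql
CREATE TABLE car (
    id INT PRIMARY KEY AUTO_INCREMENT,
    carname VARCHAR(100),
    totalprice VARCHAR(20),
    downpay VARCHAR(20)
) DEFAULT CHARSET = utf8;
```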

4. Pitfalls

1. Forgetting to set the character encoding (charset='utf8') when creating the database connection, which makes inserts of Chinese text fail.
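Why the missing charset breaks things, as a standalone sketch (not from the post): without an explicit charset the connection typically falls back to latin-1, which has no code points for Chinese characters, so car names like those scraped above cannot be encoded for the INSERT.

```python
# A car name like the ones scraped from Renrenche
name = '奥迪A4L'

# latin-1 cannot represent Chinese characters, so encoding fails ...
try:
    name.encode('latin-1')
    print('latin-1 encode succeeded')
except UnicodeEncodeError:
    print('latin-1 encode failed')

# ... while utf-8 handles it fine, which is why the connection
# needs charset='utf8'
encoded = name.encode('utf-8')
print(len(encoded))  # → 9 (3 bytes per Chinese character, plus 'A4L')
```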

