Curl scraping in PHP: garbled output and empty responses
The PHP script itself is saved in GB2312 (GBK) encoding:
<?php
$url = ""; // page encoded in GB2312
//$url = ""; // page encoded in GB2312
//$url = ""; // page encoded in GB2312
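$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the response as a string instead of printing it
curl_setopt($ch, CURLOPT_TIMEOUT, 1);           // 1-second limit; set before curl_exec() so it takes effect
$ret = curl_exec($ch);
curl_close($ch);
echo $ret;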
?>
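When a request comes back empty, it can help to ask curl what happened instead of echoing the result directly. The following is a minimal sketch, not the poster's code: the helper name `fetch_page` and the timeout values are assumptions chosen for illustration.

```php
<?php
// Hypothetical helper (not from the original post): fetch a URL and report
// what went wrong instead of silently producing an empty string.
function fetch_page(string $url, int $timeoutSeconds = 10): array
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,            // return the body as a string
        CURLOPT_CONNECTTIMEOUT => 5,               // seconds allowed for connecting (assumed value)
        CURLOPT_TIMEOUT        => $timeoutSeconds, // seconds allowed for the whole transfer
        CURLOPT_URL            => $url,
    ]);

    $body   = curl_exec($ch);                        // false when the transfer fails
    $errno  = curl_errno($ch);                       // 0 when the transfer succeeded
    $error  = curl_error($ch);                       // human-readable message, '' on success
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE); // 0 when no HTTP response arrived

    return [
        'body'   => $body === false ? null : $body,
        'errno'  => $errno,
        'error'  => $error,
        'status' => $status,
    ];
}
```

A non-zero `errno` points to a transport problem such as a timeout, while a 403 or an empty body with `errno` 0 suggests the server is rejecting the request itself.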
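Fetching one of the URLs works fine, but fetching another returns an empty result, and fetching a third returns garbled text.
What is going on, and how do I fix it? Does anyone know? Thanks in advance! Sorry, I don't have many points left to offer.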
Replies (solutions)
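NetEase restricts this kind of access, so the scrape returns nothing. Sohu may restrict it as well.
fopen or file_get_contents will work, but file_get_contents is prone to timing out, at which point the script stops executing.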
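The timeout concern with file_get_contents can be reduced by passing a stream context with an explicit read timeout. This is a sketch under assumptions: the helper name `fetch_with_timeout` and the 5-second default are made up for illustration, and the User-Agent string is the one quoted elsewhere in this thread.

```php
<?php
// Hypothetical helper: file_get_contents() with an explicit read timeout,
// so a slow server yields null instead of stalling the script.
function fetch_with_timeout(string $url, float $timeoutSeconds = 5.0): ?string
{
    $context = stream_context_create([
        'http' => [
            'timeout'    => $timeoutSeconds, // read timeout in seconds
            'user_agent' => 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322)',
        ],
    ]);

    // @ suppresses the warning PHP raises on failure; the false return is handled below.
    $body = @file_get_contents($url, false, $context);

    return $body === false ? null : $body;
}
```

Without a context, the wait is governed by the `default_socket_timeout` ini setting.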
Enough said, I'm just here for the points. OP, remember to award full points.
// First URL: a browser-like User-Agent is enough.
$curl = curl_init(""); // URL not preserved in the post
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322)');
$html = curl_exec($curl);
var_dump($html);

// PHP 5.4+ ships gzdecode() with the zlib extension; define this fallback only when it is missing.
if (!function_exists('gzdecode')) {
    function gzdecode($data) {
        $len = strlen($data);
        if ($len < 18 || strcmp(substr($data, 0, 2), "\x1f\x8b")) {
            return null; // Not GZIP format (See RFC 1952)
        }
        $method = ord(substr($data, 2, 1)); // Compression method
        $flags  = ord(substr($data, 3, 1)); // Flags
        if (($flags & 31) != $flags) {
            // Reserved bits are set -- NOT ALLOWED by RFC 1952
            return null;
        }
        // NOTE: $mtime may be negative (PHP integer limitations)
        $mtime = unpack("V", substr($data, 4, 4));
        $mtime = $mtime[1];
        $xfl = substr($data, 8, 1);
        $os  = substr($data, 9, 1);
        $headerlen = 10;
        $extralen = 0;
        $extra = "";
        if ($flags & 4) {
            // 2-byte length prefixed EXTRA data in header
            if ($len - $headerlen - 2 < 8) {
                return false; // Invalid format
            }
            $extralen = unpack("v", substr($data, $headerlen, 2));
            $extralen = $extralen[1];
            if ($len - $headerlen - 2 - $extralen < 8) {
                return false; // Invalid format
            }
            $extra = substr($data, $headerlen + 2, $extralen);
            $headerlen += 2 + $extralen;
        }
        $filenamelen = 0;
        $filename = "";
        if ($flags & 8) {
            // C-style string file NAME data in header
            if ($len - $headerlen - 1 < 8) {
                return false; // Invalid format
            }
            $filenamelen = strpos(substr($data, $headerlen), chr(0));
            if ($filenamelen === false || $len - $headerlen - $filenamelen - 1 < 8) {
                return false; // Invalid format
            }
            $filename = substr($data, $headerlen, $filenamelen);
            $headerlen += $filenamelen + 1;
        }
        $commentlen = 0;
        $comment = "";
        if ($flags & 16) {
            // C-style string COMMENT data in header
            if ($len - $headerlen - 1 < 8) {
                return false; // Invalid format
            }
            $commentlen = strpos(substr($data, $headerlen), chr(0));
            if ($commentlen === false || $len - $headerlen - $commentlen - 1 < 8) {
                return false; // Invalid header format
            }
            $comment = substr($data, $headerlen, $commentlen);
            $headerlen += $commentlen + 1;
        }
        $headercrc = "";
        if ($flags & 2) {
            // 2 bytes (lowest order) of CRC32 on header present (FHCRC is bit 1 in RFC 1952)
            if ($len - $headerlen - 2 < 8) {
                return false; // Invalid format
            }
            $calccrc = crc32(substr($data, 0, $headerlen)) & 0xffff;
            $headercrc = unpack("v", substr($data, $headerlen, 2));
            $headercrc = $headercrc[1];
            if ($headercrc != $calccrc) {
                return false; // Bad header CRC
            }
            $headerlen += 2;
        }
        // GZIP FOOTER - These may be negative due to PHP's limitations
        $datacrc = unpack("V", substr($data, -8, 4));
        $datacrc = $datacrc[1];
        $isize = unpack("V", substr($data, -4));
        $isize = $isize[1];
        // Perform the decompression:
        $bodylen = $len - $headerlen - 8;
        if ($bodylen < 1) {
            return null; // No compressed body present
        }
        $body = substr($data, $headerlen, $bodylen);
        $data = "";
        if ($bodylen > 0) {
            switch ($method) {
                case 8:
                    // Currently the only supported compression method:
                    $data = gzinflate($body);
                    break;
                default:
                    // Unknown compression method
                    return false;
            }
        } else {
            // I'm not sure if zero-byte body content is allowed.
            // Allow it for now... Do nothing...
        }
        // Verify decompressed size and CRC32:
        // NOTE: This may fail with large data sizes depending on how
        // PHP's integer limitations affect strlen() since $isize
        // may be negative for large sizes.
        if ($isize != strlen($data) || crc32($data) != $datacrc) {
            // Bad format! Length or CRC doesn't match!
            return false;
        }
        return $data;
    }
}

// Second URL: the body comes back gzip-compressed, so decode it.
$curl = curl_init(""); // URL not preserved in the post
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322)');
$html = curl_exec($curl);
// $html = strstr($html, '<');
$html = gzdecode($html);
var_dump($html);
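On PHP 5.4 and later, gzdecode() is provided by the zlib extension, so the hand-written decoder is usually unnecessary. The sketch below shows one way to decompress a body only when it actually looks like gzip; the helper name `maybe_gunzip` is hypothetical and not part of the reply above.

```php
<?php
// Hypothetical helper: decompress a response body only when it starts with
// the gzip magic bytes (0x1f 0x8b); otherwise return it unchanged.
function maybe_gunzip(string $body): string
{
    if (strncmp($body, "\x1f\x8b", 2) !== 0) {
        return $body; // not gzip, nothing to do
    }

    // Built-in gzdecode() (zlib extension); @ hides the warning on corrupt data.
    $decoded = @gzdecode($body);

    // Fall back to the raw bytes if decoding fails.
    return $decoded === false ? $body : $decoded;
}
```

Calling `maybe_gunzip(curl_exec($curl))` would then be safe for both compressed and plain responses, assuming curl was not already asked to decode the body itself.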
Many thanks, young5335. Full points awarded; unfortunately that's all the points I have, so I couldn't give more even if I wanted to.
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322)');
Out of that whole pile of code, this one line was the most useful, and it solved the problem.
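Pulling the thread together, a request could set the User-Agent, let curl handle compressed responses, and convert the GB2312/GBK body to UTF-8 where the output needs it. This is only a sketch: `fetch_html` and `to_utf8` are hypothetical names, the 10-second timeout is an assumed value, and the source charset should be confirmed from the page's Content-Type header or meta tag.

```php
<?php
// Hypothetical helper: convert a GBK/GB2312 page to UTF-8.
// GBK is a superset of GB2312, so it is used as the assumed default.
function to_utf8(string $body, string $fromCharset = 'GBK'): string
{
    if (function_exists('mb_convert_encoding')) {
        return mb_convert_encoding($body, 'UTF-8', $fromCharset);
    }
    // Fallback when mbstring is not installed.
    $converted = iconv($fromCharset, 'UTF-8', $body);
    return $converted === false ? $body : $converted;
}

// Hypothetical helper: fetch a page with a browser-like User-Agent and
// automatic decompression. Returns null when the transfer fails.
function fetch_html(string $url): ?string
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_USERAGENT      => 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322)',
        CURLOPT_ENCODING       => '',  // advertise all encodings curl supports and decode the reply
        CURLOPT_TIMEOUT        => 10,  // assumed value, in seconds
        CURLOPT_URL            => $url,
    ]);

    $body = curl_exec($ch);

    return $body === false ? null : $body;
}
```

If the script and its output are themselves GB2312, as in the original question, the `to_utf8` step is probably unnecessary and the fetched body can be echoed as is.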