A Spider Webbot Crawler Based on PHP/CURL/CodeIgniter [0]: Fetching an Entire Web Page with Native PHP fopen

Date: 2019-12-13



After seven days of learning PHP/CURL, I wrote an open-source crawler project.

I am now posting all of my notes on Segmentfault, as a keepsake.

https://github.com/hosinoruri/Omoikane

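$target = "http://www.WebbotsSpidersScreenScrapers.com/hello_world.html"; // the file to download
// $file_handle = fopen($target, "r"); // open a network connection to the target file; $file_handle is a stream handle, not a file name
$downloaded_page_array = file($target); // file() returns the page as an array, one line per element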

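// Display the contents of the file
for ($xx = 0; $xx < count($downloaded_page_array); $xx++)
    echo $downloaded_page_array[$xx]; // especially effective for CSV and Excel files; less useful for HTML
// file() stores the file downloaded from the target site as an array. The for loop uses $xx as an
// index that always stays below the array length, so echoing each element prints the complete page.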

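/*
// Fetch the file
while (!feof($file_handle)) {
    echo fgets($file_handle, 4096); // fgets() reads and displays the file one line at a time, at most 4095 bytes (length - 1) per call, until the download is complete
}
fclose($file_handle); // close the connection
// This approach prints the HTML tags as well
*/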
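
The commented-out fopen()/fgets() variant above can be turned into a small self-contained sketch. This is only an illustration under stated assumptions, not the code from the Omoikane project: the helper name fetch_lines() is hypothetical, a local temporary file stands in for the remote page so the example runs without network access, and the htmlspecialchars() call is my addition for showing the tags as text in a browser. Passing an http:// URL instead of a local path should work the same way when allow_url_fopen is enabled.

```php
<?php
// Hypothetical helper (not part of the original notes): read any target that
// fopen() can open and return its contents as an array of lines.
function fetch_lines(string $target, int $length = 4096): array
{
    $handle = @fopen($target, "r"); // a stream resource, or false on failure
    if ($handle === false) {
        throw new RuntimeException("Could not open $target");
    }

    $lines = [];
    // fgets() returns one line per call (at most $length - 1 bytes)
    // and false at end of file, which ends the loop.
    while (($line = fgets($handle, $length)) !== false) {
        $lines[] = $line;
    }

    fclose($handle); // release the stream
    return $lines;
}

// Usage: a temporary local file stands in for the downloaded page.
$tmp = tempnam(sys_get_temp_dir(), "page");
file_put_contents($tmp, "<html>\n<body>Hello</body>\n</html>\n");

foreach (fetch_lines($tmp) as $line) {
    echo htmlspecialchars($line); // escape so the markup is shown as text
}

unlink($tmp);
```

Checking the return value of fopen() before looping avoids the endless loop that `while (!feof(...))` can run into when the handle is invalid.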