Writing Your Own PHP Proxy IP Crawler Step by Step (Part 2)

Date: 2019-11-16

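In this chapter we formally start the crawler project. First we need to know which websites offer free proxy IPs. Popular ones at the time of writing include Xici Proxy (西刺代理) and Kuaidaili (快代理); here we use Xici Proxy as the example.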

The site lists free IP addresses, each with its port number. Our task is to crawl these IPs and ports, check whether they are usable, and store them.
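To make the crawling step concrete, here is a minimal sketch of pulling IP and port pairs out of a page. It assumes each IP and its port sit in adjacent `<td>` cells, which is a guess about the markup rather than the real page source, and `extract_proxies` is a hypothetical helper name that is not part of the project. Fetching the page and checking availability are left out.

```php
<?php
// Hypothetical helper: extract IP/port pairs from an HTML table.
// The expected markup (an IP cell directly followed by a port cell)
// is an assumption for illustration only.
function extract_proxies(string $html): array
{
    $proxies = [];
    // An IPv4-looking cell followed by a numeric port cell.
    $pattern = '#<td>\s*(\d{1,3}(?:\.\d{1,3}){3})\s*</td>\s*<td>\s*(\d{1,5})\s*</td>#';

    if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            $port = (int) $m[2];
            // Keep only valid IPv4 addresses and ports in the 1-65535 range.
            $validIp = filter_var($m[1], FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) !== false;
            if ($validIp && $port >= 1 && $port <= 65535) {
                $proxies[] = ['ip' => $m[1], 'port' => $port];
            }
        }
    }
    return $proxies;
}

// Made-up sample rows; the second one has an invalid address and is skipped.
$sample = '<table>
<tr><td>10.0.0.1</td><td>8080</td></tr>
<tr><td>999.1.1.1</td><td>80</td></tr>
<tr><td>192.168.1.20</td><td>3128</td></tr>
</table>';

foreach (extract_proxies($sample) as $p) {
    echo $p['ip'], ':', $p['port'], PHP_EOL;
}
```

With the sample above, the script prints `10.0.0.1:8080` and `192.168.1.20:3128`. A regular expression is enough for a sketch like this; a DOM parser would probably cope better with real-world markup.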

First we need an entry file, which we name run.php. Its content looks roughly like this:

use ProxyPool\core\ProxyPool;

$proxy = new ProxyPool();
$proxy->run();

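This instantiates ProxyPool and calls its run method. Because we put the class in a namespace and import it with use, we cannot avoid writing an autoloader (which loads the matching file automatically based on the class name).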

The code is as follows:

<?php
namespace AutoLoad;

class autoloader
{
    /**
     * Load a class file automatically based on its name.
     *
     * @param string $name  The fully qualified class name, e.g. ProxyPool\core\ProxyPool
     * @return boolean
     */
    public static function load_namespace($name)
    {
        // Use the platform's directory separator so this works on both Windows and Linux
        $class_path = str_replace('\\', DIRECTORY_SEPARATOR, $name);

        // Build the file path, dropping the leading "ProxyPool" from the class path
        $class_file = __DIR__ . substr($class_path, strlen('ProxyPool')) . '.php';

        // If the file does not exist, look in the parent directory
        if (!is_file($class_file))
        {
            $class_file = __DIR__ . DIRECTORY_SEPARATOR . '..' . DIRECTORY_SEPARATOR . "$class_path.php";
        }

        if (is_file($class_file))
        {
            require_once($class_file);
            if (class_exists($name, false))
            {
                return true;
            }
        }
        return false;
    }
}
// Register the autoloader with SPL
spl_autoload_register('\AutoLoad\autoloader::load_namespace');
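To see which files the autoloader above would look for, the following sketch reproduces only its path-building logic as a pure function. `candidate_paths` and the base directory `/srv/ProxyPool` are hypothetical names used for illustration, and nothing here touches the filesystem.

```php
<?php
// Hypothetical helper mirroring the lookup order of the autoloader:
// it returns the two candidate paths without checking that they exist.
function candidate_paths(string $name, string $baseDir, string $prefix = 'ProxyPool'): array
{
    $sep = DIRECTORY_SEPARATOR;
    $classPath = str_replace('\\', $sep, $name);

    return [
        // 1. Inside the autoloader's directory, with the leading prefix dropped.
        $baseDir . substr($classPath, strlen($prefix)) . '.php',
        // 2. Fallback: one level up, using the full namespace path.
        $baseDir . $sep . '..' . $sep . $classPath . '.php',
    ];
}

// '/srv/ProxyPool' is an assumed install location.
print_r(candidate_paths('ProxyPool\core\ProxyPool', '/srv/ProxyPool'));
```

On a system where `DIRECTORY_SEPARATOR` is `/`, the two candidates are `/srv/ProxyPool/core/ProxyPool.php` and `/srv/ProxyPool/../ProxyPool/core/ProxyPool.php`. Note that the prefix is dropped by length alone, so a class name that does not start with `ProxyPool` only resolves through the second path.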

Then we go back and modify run.php:

<?php
require_once __DIR__ . '/autoloader.php';

use ProxyPool\core\ProxyPool;

$proxy = new ProxyPool();
$proxy->run();

Now we can use the class files we write ourselves directly through their namespaces.