Analyzing Zhaopin's API Endpoints to Scrape Data

Date: 2019-11-19

1. Introduction

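Most websites today separate the front end from the back end. The data you see on a page is usually not written into the HTML itself; back-end code reads it from a database server, and it is then loaded dynamically into the page.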

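With the arrival of Ajax, front-end code could also request data held in the back-end database, and this naturally gave rise to the notion of an API endpoint. Put simply, you fetch data from a particular endpoint of a site by sending specific parameters to a specific address.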

The data returned by an API endpoint is usually JSON, and even when it is not JSON it is still regular, well-ordered data. Parsing data like this is very simple.
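As a minimal sketch of why JSON responses are easy to work with, the snippet below parses a small payload with Python's standard json module. The payload and its field names are made up for illustration and are not taken from any real Zhaopin response.

```python
import json

# Hypothetical payload, for illustration only (not a real API response)
raw = '{"code": 200, "data": {"name": "example", "tags": ["a", "b"]}}'

# json.loads turns the JSON text into nested Python dicts and lists
parsed = json.loads(raw)

# After parsing, fields are reached with ordinary dict and list access
name = parsed["data"]["name"]
tag_count = len(parsed["data"]["tags"])
print(name, tag_count)
```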

This is only my own modest view and simplified understanding.

2. Analysis

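Open the official Zhaopin website and press F12 to open the developer tools. By watching the requests that load data, you can easily find three API endpoints.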

First API endpoint

https://fe-api.zhaopin.com/c/i/city-page/user-city?ipCity=合肥

Parameter	Purpose
ipCity	Name of the city to query; the response includes that city's code (code)
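The lookup above can be sketched without touching the network. This is a rough illustration, not a definitive client: build_city_url and extract_city_code are hypothetical helper names, the response shape {"data": {"code": ...}} is the one the script later in this article reads, and the sample body (including the value 664, borrowed from the cityId in the search URL in this article) is made up.

```python
import json
from urllib.parse import urlencode

CITY_API = "https://fe-api.zhaopin.com/c/i/city-page/user-city"


def build_city_url(city_name):
    # urlencode percent-encodes the non-ASCII city name as UTF-8
    return CITY_API + "?" + urlencode({"ipCity": city_name})


def extract_city_code(response_text):
    # Assumed response shape: {"data": {"code": ...}}
    return json.loads(response_text)["data"]["code"]


city_url = build_city_url("合肥")

# Hypothetical response body; the real one may carry more fields
# and the code may be a number rather than a string
sample_body = '{"data": {"code": "664", "name": "合肥"}}'
city_code = extract_city_code(sample_body)
print(city_url, city_code)
```

Encoding the city name explicitly makes the request URL unambiguous, although HTTP libraries such as requests will normally do the same encoding for you.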

Second API endpoint

https://dict.zhaopin.cn/dict/dictOpenService/getDict?dictNames=region_relation,education,recruitment,education_specialty,industry_relation,careet_status,job_type_parent,job_type_relation

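dictNames value	Returned data (codes)
region_relation	Region information
education	Education level
recruitment	Recruitment information (whether from unified national enrollment)
education_specialty	Occupation category
industry_relation	Industry
careet_status	Availability status
job_type_parent	Job category
job_type_relation	Job title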
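Because dictNames is a comma-separated list, you can request only the dictionaries you need. The sketch below assumes the endpoint accepts any subset of the names in the table, which the article does not confirm; build_dict_url is a hypothetical helper name.

```python
DICT_API = "https://dict.zhaopin.cn/dict/dictOpenService/getDict"


def build_dict_url(dict_names):
    # dictNames takes the dictionary names joined by commas
    return DICT_API + "?dictNames=" + ",".join(dict_names)


# Ask only for the region and education dictionaries
dict_url = build_dict_url(["region_relation", "education"])
print(dict_url)
```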

Third API endpoint

https://fe-api.zhaopin.com/c/i/sou?pageSize=200&cityId=664&workExperience=-1&education=5&companyType=-1&employmentType=-1&jobWelfareTag=-1&kw=python&kt=3

The parameter values for this endpoint are the codes obtained from the two endpoints above.

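Parameter	Purpose
pageSize	Number of results to return
cityId	City code
workExperience	Work experience
education	Education level
companyType	Company type
employmentType	Employment type
jobWelfareTag	Job benefits
kw	Search keyword
kt	Value may vary; its purpose is unclear for now, but the parameter cannot be omitted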
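The parameters in the table can be assembled into the search URL from a dictionary instead of a hand-written query string. This is a sketch under assumptions: build_search_url and its default arguments are hypothetical, and the values simply mirror the example URL above.

```python
from urllib.parse import urlencode

SEARCH_API = "https://fe-api.zhaopin.com/c/i/sou"


def build_search_url(city_id, keyword="python", page_size=200, education=5):
    # Values mirror the example URL in this article; kt is kept because
    # the article notes the parameter cannot be omitted
    params = {
        "pageSize": page_size,
        "cityId": city_id,
        "workExperience": -1,
        "education": education,
        "companyType": -1,
        "employmentType": -1,
        "jobWelfareTag": -1,
        "kw": keyword,
        "kt": 3,
    }
    # urlencode joins the pairs with & and escapes values where needed
    return SEARCH_API + "?" + urlencode(params)


search_url = build_search_url(664)
print(search_url)
```

Keeping the parameters in a dictionary makes it easier to change one filter, such as the keyword or the city, without editing a long string.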

3. Scraping the Data

Now that the API endpoints have been found, what remains is fetching the data and storing it locally.

Scraping goal

Query and store data for a city entered by the user; this run searches only for python job postings.

Each job posting contains many fields. For convenience I extract only a few of them; the rest can be extracted the same way.
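Extracting a handful of fields from one posting might look like the sketch below. The field names (jobName, salary, timeState, workingExp, eduLevel) are the ones the full script reads; the sample record, its values, and the extra company field are hypothetical.

```python
def extract_fields(job):
    # Keep only the fields of interest; nested objects expose a "name" value
    return {
        "jobName": job["jobName"],
        "salary": job["salary"],
        "timeState": job["timeState"],
        "workingExp": job["workingExp"]["name"],
        "eduLevel": job["eduLevel"]["name"],
    }


# Hypothetical record, shaped like the entries the script iterates over
sample_job = {
    "jobName": "Python Developer",
    "salary": "10K-15K",
    "timeState": "Latest",
    "workingExp": {"name": "1-3 years"},
    "eduLevel": {"name": "Bachelor"},
    "company": {"name": "Example Co."},
}

fields = extract_fields(sample_job)
print(fields)
```

Any other field can be added the same way, by reading one more key from the record.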

Full code:

"""
This scraper takes only simple precautions against anti-scraping measures.
"""
import json
import os
import sys

import requests


class Spider(object):
    def __init__(self):
        # Browser-like headers. "br" is left out of Accept-Encoding because
        # requests can only decode Brotli when an optional package is installed.
        self.header = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36",
            "Origin": "https://sou.zhaopin.com",
            "Host": "fe-api.zhaopin.com",
            "Accept-Encoding": "gzip, deflate",
        }
        print("Job search program starting...")
        # Open the output file in the current working directory
        self.file = "result.json"
        pathfile = os.path.join(os.getcwd(), self.file)
        self.fp = open(pathfile, "w", encoding="utf-8")
        self.fp.write("[\n")

    def get_response(self, url, params=None):
        # A timeout keeps the program from hanging on a stalled connection
        response = requests.get(url=url, params=params, headers=self.header, timeout=10)
        response.raise_for_status()
        return response

    def get_citycode(self, city):
        # Look up the city code for a city name
        url = "https://fe-api.zhaopin.com/c/i/city-page/user-city"
        result = self.get_response(url, params={"ipCity": city}).json()
        return result['data']['code']

    def parse_data(self, url):
        # Keep only a few fields from each job posting
        result = self.get_response(url).json()['data']['results']
        items = []
        for i in result:
            item = {}
            item['job'] = i['jobName']
            item['salary'] = i['salary']
            item['status'] = i['timeState']
            item['experience'] = i['workingExp']['name']
            item['education'] = i['eduLevel']['name']
            items.append(item)
        return items

    def save_data(self, items):
        # Write one JSON object per line, comma-separated, inside the [ ] array
        for num, i in enumerate(items, start=1):
            self.fp.write(json.dumps(i, ensure_ascii=False))
            self.fp.write("\n" if num == len(items) else ",\n")
            print("%s--%s" % (num, i))

    def end(self):
        # Close the JSON array and the file
        self.fp.write("]")
        self.fp.close()
        print("Job search program finished...")
        print("Data written to {}".format(self.file))

    def main(self):
        try:
            cityname = input("Enter the name of the city to query (prefecture-level city, e.g. 合肥): ")
            city = self.get_citycode(cityname)
            url = "https://fe-api.zhaopin.com/c/i/sou?pageSize=200&cityId={}&workExperience=-1&education=5&companyType=-1&employmentType=-1&jobWelfareTag=-1&kw=python&kt=3".format(
                city)
            items = self.parse_data(url)
            self.save_data(items)
            self.end()
        except Exception as e:
            # An unrecognized city name is the most likely cause of failure
            print("Invalid city name!!! (forcing exit)")
            print(e)
            self.fp.close()
            sys.exit(1)


if __name__ == '__main__':
    spider = Spider()
    spider.main()

Execution result:

Result file:

4. Summary

The logic and code of this program are fairly simple and belong to the basics of web scraping; the harder part is finding the API endpoints.

The endpoints in this article return JSON, so parsing the data is straightforward: Python's json module handles it quickly.
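The same json module can read the saved results back for further analysis. The sketch below is an assumption-laden illustration: load_results is a hypothetical helper, and the two records with their title and salary keys are made up. It writes a small file in the bracketed, one-object-per-line layout that the script produces and loads it again.

```python
import json
import os
import tempfile


def load_results(path):
    # The file holds one JSON array, so json.load returns a list of dicts
    with open(path, encoding="utf-8") as fp:
        return json.load(fp)


# Made-up records in the same layout: "[", one object per line, "]"
sample_text = (
    '[\n'
    '{"title": "A", "salary": "10K-15K"},\n'
    '{"title": "B", "salary": "8K-12K"}\n'
    ']'
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "result.json")
    with open(path, "w", encoding="utf-8") as fp:
        fp.write(sample_text)
    results = load_results(path)

titles = [record["title"] for record in results]
print(len(results), titles)
```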

I am also a beginner in Python scraping and data analysis, and I hope we can learn and improve together.

The content of this article is for learning purposes only and is not used for commercial profit.