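First, use the browser's developer tools to find the URL to crawl, as shown in the figure.
Inspecting the response shows that the data is in JSON format (earlier blog posts by others also call it the JSON data type).
Use loads from the json library to convert the response text into a dict, take out each element, then open a file and write the data.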
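The conversion step can be tried offline before touching the network. The sketch below is only an illustration: the `sample_text` string is a made-up payload that assumes the response is an object with a `data` list, using the same field names as the crawler. The values are not real data.

```python
import json

# Hypothetical sample that mimics the assumed response shape; values are invented.
sample_text = (
    '{"data": [{"StudentNo": 1001, "RealName": "Zhang San", '
    '"DateAdded": "2018-12-06T10:00:00", "Title": "Homework 1", '
    '"Url": "https://example.com/1"}]}'
)

payload = json.loads(sample_text)  # JSON text -> Python dict
records = payload["data"]          # list of dicts, one per submission
first = records[0]

# The timestamp uses "T" between date and time; swap it for a space.
added = first["DateAdded"].replace("T", " ")
print(type(payload).__name__, type(records).__name__)
print(first["RealName"], added)
```

Each element of `records` is an ordinary dict, so fields are read by key exactly as in the crawler.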
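import json

import requests

# Endpoint found with the browser's developer tools
url = 'https://edu.cnblogs.com/Homework/GetAnswers?homeworkId=2420&_=1544072161608'

try:
    r = requests.get(url, timeout=10)
    r.raise_for_status()                 # raise on 4xx/5xx status codes
    r.encoding = r.apparent_encoding     # guess the encoding from the body
    datas = json.loads(r.text)['data']   # JSON text -> dict, then the record list
except (requests.RequestException, ValueError, KeyError):
    # Request failed, the body was not valid JSON, or 'data' was missing
    print("Network error or unexpected response")
else:
    crawl = ''
    for data in datas:
        # One comma-separated line per submission
        crawl += (str(data["StudentNo"]) + "," + data["RealName"] + ","
                  + data["DateAdded"].replace("T", " ") + ","
                  + data["Title"] + "," + data["Url"] + "\n")
    # Explicit encoding so the output does not depend on the platform default
    with open('hwlist.csv', 'w', encoding='utf-8') as f:
        f.write(crawl)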
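One caveat with joining fields by hand: a title that itself contains a comma would shift the columns. A possible alternative, sketched here with the standard csv module, lets the writer quote such fields. The helper `records_to_csv` and the `FIELDS` list are hypothetical names introduced for this illustration, not part of the original script.

```python
import csv
import io

# Column order used by the crawler; FIELDS is a name made up for this sketch.
FIELDS = ["StudentNo", "RealName", "DateAdded", "Title", "Url"]


def records_to_csv(records):
    """Return CSV text for a list of record dicts, quoting fields when needed."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for rec in records:
        row = [str(rec[name]) for name in FIELDS]
        row[2] = row[2].replace("T", " ")  # DateAdded: "T" separator -> space
        writer.writerow(row)
    return buffer.getvalue()
```

The returned text can then be written with `open('hwlist.csv', 'w', encoding='utf-8', newline='')`, passing `newline=''` so the line endings produced by the writer are left untouched.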
The source code is above; the result is shown below.