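Most of you probably know Geek Time (极客时间, time.geekbang.org/). It offers many courses worth studying, as shown in the figure below:
Let's look at the final result first
Column courses are generated as local HTML files
Video files from video courses are downloaded to local disk
You need to purchase a course before you can collect it
Login URL: time.geekbang.org/
The cookie stores the login credentials of the current account, and the collector needs this information when fetching data. In Chrome, press F12 to open the developer tools and read the cookie, as shown in the figure below.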
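To make the role of the cookie concrete, here is a minimal sketch of how a copied cookie string could be checked and attached to a request. It assumes Java 11+ and the standard java.net.http API. The class CookieRequestSketch and its methods are hypothetical names chosen for illustration; the project on gitee may use a different HTTP client. The sketch only builds the request and never sends it.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the original project.
final class CookieRequestSketch {

    // Split a "name=value; name2=value2" cookie string into ordered pairs,
    // which is a quick way to check that the value was copied completely.
    static Map<String, String> parseCookie(String cookieValue) {
        Map<String, String> pairs = new LinkedHashMap<>();
        for (String part : cookieValue.split(";")) {
            String trimmed = part.trim();
            int eq = trimmed.indexOf('=');
            if (eq > 0) {
                pairs.put(trimmed.substring(0, eq), trimmed.substring(eq + 1));
            }
        }
        return pairs;
    }

    // Build (but do not send) a GET request that carries the cookie header.
    static HttpRequest withCookie(String url, String cookieValue) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Cookie", cookieValue)
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // Placeholder value; paste the cookie copied from the browser instead.
        String cookie = "_ga=GA1.2.1259366273.1550461508; GCID=f412bb7-029";
        System.out.println(parseCookie(cookie).keySet());
        HttpRequest request = withCookie("https://time.geekbang.org/", cookie);
        System.out.println(request.headers().firstValue("Cookie").orElse("<none>"));
    }
}
```

If the parsed names do not include the ones visible in the browser's developer tools, the value was probably truncated while copying.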
There is a fair amount of collector code, so it has been uploaded to gitee: gitee.com/likun_557/j…
In the class com.ady01.demo4.jksj.util.CollectorUtil, replace the value of COOKIE_VALUE with your own cookie
public static final String COOKIE_VALUE = "_ga=GA1.2.1259366273.1550461508; _gid=GA1.2.556986769.1555908262; GCID=f412bb7-029";
In com.ady01.demo4.jksj.util.CollectorUtilTest, set cid to the ID of the column you want to collect
@Test
public void articleList() throws Exception {
    // ID of the column to collect
    long cid = 139L;
    ColumnDto columnDto = CollectorUtil.articleList(cid);
    ColumnCollectorResponse columnCollectorResponse = columnDto.getColumnCollectorResponse();
    List<ArticleCollectorResponse> articleCollectorResponseList = columnDto.getArticleCollectorResponseList();
    String articleCollectorResponseListJson = FrameUtil.json(articleCollectorResponseList, true);
    log.info("articleCollectorResponseList:{}", articleCollectorResponseListJson);
    // Render the "column" FreeMarker template with the collected data
    String s = FreemarkerUtil.getFtlToString("column",
            FrameUtil.newHashMap(
                    "articleCollectorResponseListJson", articleCollectorResponseListJson,
                    "columnCollectorResponse", columnCollectorResponse));
    // Save the generated HTML to a local file named after the column title
    FileUtils.write(new File("D:\\极客时间\\" + columnCollectorResponse.getColumn_title() + ".html"), s, "utf-8");
}
Run the articleList method in com.ady01.demo4.jksj.util.CollectorUtilTest; the collection succeeds
The generated file
Opened in a browser
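The articleList test writes the HTML to a file named after the column title, under a hard-coded Windows path. Titles can contain characters that Windows does not allow in file names, so the following is a minimal sketch of a more defensive write, using only the JDK. HtmlOutputSketch, safeFileName and writeHtml are hypothetical names, not part of the original project, and replacing reserved characters with an underscore is an assumption about what is acceptable.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the original project.
final class HtmlOutputSketch {

    // Replace characters that Windows reserves in file names with '_'.
    static String safeFileName(String title) {
        return title.replaceAll("[\\\\/:*?\"<>|]", "_").trim();
    }

    // Write the HTML as UTF-8 to <baseDir>/<sanitized title>.html,
    // creating the directory first if it does not exist.
    static Path writeHtml(Path baseDir, String columnTitle, String html) throws IOException {
        Files.createDirectories(baseDir);
        Path target = baseDir.resolve(safeFileName(columnTitle) + ".html");
        Files.write(target, html.getBytes(StandardCharsets.UTF_8));
        return target;
    }

    public static void main(String[] args) throws IOException {
        // A temporary directory stands in for the real output folder.
        Path dir = Files.createTempDirectory("geektime-demo");
        Path file = writeHtml(dir, "Java: core/advanced?", "<html><body>demo</body></html>");
        System.out.println(file.getFileName()); // prints Java_ core_advanced_.html
    }
}
```

Building the target with Path.resolve instead of string concatenation also keeps the code independent of the path separator.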
In the class com.ady01.demo4.jksjvideo.util.CollectorUtil, replace the value of COOKIE_VALUE with your own cookie
public static final String COOKIE_VALUE = "_ga=GA1.2.1259366273.1550461508; _gid=GA1.2.556986769.1555908262; GCID=f412bb7-029";
In com.ady01.demo4.jksjvideo.util.CollectorUtilTest, set cid to the ID of the video course you want to collect
@Test
public void saveCourseDto() throws IOException {
    // Directory where videos are saved; %s is replaced with the course title
    String saveDir = "D:\\极客时间\\%s";
    // ID of the video course
    Long cid = 160L;
    CourseDto courseDto = CollectorUtil.courseDto(cid);
    log.info("courseDto:{}", FrameUtil.json(courseDto, true));
    // Step 1: download each article's video into <course dir>\<article id>
    for (ArticleCollectorResponse articleCollectorResponse : courseDto.getArticleCollectorResponseList()) {
        try {
            String dir = String.format(saveDir + "\\%s", courseDto.getCourseCollectorResponse().getColumn_title(), articleCollectorResponse.getId());
            CollectorUtil.saveFile(articleCollectorResponse, dir);
        } catch (IOException e) {
            // Log the failure and continue with the next article
            log.error(e.getMessage(), e);
        }
    }
    // Step 2: copy each downloaded file into <course dir>\video under a numbered, readable name
    int i = 1;
    for (ArticleCollectorResponse articleCollectorResponse : courseDto.getArticleCollectorResponseList()) {
        // Source file. Note: "%s.%s" with ".ts" yields "<id>..ts", which has to match the name CollectorUtil.saveFile writes
        File file = new File(String.format(saveDir + "\\%s", courseDto.getCourseCollectorResponse().getColumn_title(), articleCollectorResponse.getId()), String.format("%s.%s", articleCollectorResponse.getId(), ".ts"));
        // Sequence number for ordering (presumably left-padded with "0" to 3 characters)
        String s = FrameUtil.generateCode(i + "", 3, "0", true);
        // Target name: "<sequence>、<title text after "| ">.ts", with "?" removed
        File newFile = new File(String.format(saveDir + "\\video", courseDto.getCourseCollectorResponse().getColumn_title()),
                String.format("%s、%s.%s", s, articleCollectorResponse.getArticle_title().substring(articleCollectorResponse.getArticle_title().indexOf("|") + 2), "ts").replaceAll("\\?", ""));
        FileUtils.copyFile(file, newFile);
        i++;
    }
}
Run the saveCourseDto method in com.ady01.demo4.jksjvideo.util.CollectorUtilTest; the collection succeeds
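The second loop of saveCourseDto derives each target file name from a sequence number and the part of the article title after the "|" separator. The sketch below reproduces that naming scheme with the JDK only, so it can be tried without the project. VideoFileNameSketch and its methods are hypothetical names, and it assumes that FrameUtil.generateCode(i + "", 3, "0", true) left-pads the number with zeros to three digits. Unlike the original, it keeps the whole title when no "|" is present, instead of dropping the first character.

```java
// Hypothetical helper, not part of the original project.
final class VideoFileNameSketch {

    // Zero-padded sequence number, e.g. 1 -> "001" (assumed equivalent of FrameUtil.generateCode).
    static String sequence(int index) {
        return String.format("%03d", index);
    }

    // Text after the first '|' (or the whole title if there is none), without '?'.
    static String titlePart(String articleTitle) {
        int bar = articleTitle.indexOf('|');
        String part = bar >= 0 ? articleTitle.substring(bar + 1) : articleTitle;
        return part.trim().replace("?", "");
    }

    // Final file name in the "<sequence>、<title>.ts" form used by the test.
    static String targetName(int index, String articleTitle) {
        return sequence(index) + "、" + titlePart(articleTitle) + ".ts";
    }

    public static void main(String[] args) {
        System.out.println(targetName(1, "01 | What is a cookie?")); // 001、What is a cookie.ts
        System.out.println(targetName(12, "Closing remarks"));       // 012、Closing remarks.ts
    }
}
```

The numeric prefix makes the copied files sort in course order in a file manager, which is the apparent purpose of the separate video directory.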
Follow the WeChat official account 路人甲Java and send "极客时间" to get the source code for the video collector