Converting HTML to Markdown in Python (with MathJax math formula support)

Date: 2019-11-06

I needed to convert HTML to Markdown, so I looked for a Python library. The one I found does the conversion mainly with regular expressions.
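To give a feel for the regex approach, here is a minimal sketch. The rule table and the html_to_markdown function are made up for illustration and are not tomd's actual code; a real converter needs many more rules and some care with nested tags.

```python
import re

# Hypothetical rule table (NOT tomd's real rules): each entry pairs an
# HTML pattern with a Markdown replacement template.
RULES = [
    (re.compile(r"<h1.*?>(.*?)</h1>", re.S), r"# \1\n"),
    (re.compile(r"<strong>(.*?)</strong>", re.S), r"**\1**"),
    (re.compile(r'<a\s[^>]*?href="(.*?)"[^>]*>(.*?)</a>', re.S), r"[\2](\1)"),
    (re.compile(r"<p(?:\s[^>]*)?>(.*?)</p>", re.S), r"\1\n"),
]


def html_to_markdown(html):
    """Apply each rule in order; markup that no rule matches is left as-is."""
    for pattern, template in RULES:
        html = pattern.sub(template, html)
    return html.strip()


sample = (
    '<h1 class="title">Hello</h1>'
    '<p>See <a href="https://example.com">this <strong>link</strong></a>.</p>'
)
print(html_to_markdown(sample))
```

The rules run in a fixed order, so inner tags such as strong are rewritten before the enclosing link and paragraph.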

Math formulas are not handled out of the box, so I had to modify the code myself to process them.

My fork of the project: https://github.com/fipped/tomd
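One possible way to handle formulas is sketched below. It is not the code from the fork. It assumes the saved page still contains the TeX source in MathJax v2 style script tags (type="math/tex" for inline math, type="math/tex; mode=display" for display math); the name mathjax_to_markdown is made up for illustration, and removing the rendered MathJax spans is left out.

```python
import re

# Assumption: formulas are stored as <script type="math/tex">...</script>
# (inline) or <script type="math/tex; mode=display">...</script> (display).
MATH_SCRIPT = re.compile(
    r'<script[^>]*?type="math/tex(; mode=display)?"[^>]*>([\s\S]*?)</script>'
)


def mathjax_to_markdown(html):
    """Replace MathJax script tags with $...$ or $$...$$ blocks."""

    def replace(match):
        tex = match.group(2).strip()
        if match.group(1):  # "; mode=display" was present
            return "\n$$\n" + tex + "\n$$\n"
        return "$" + tex + "$"

    # A function replacement keeps backslashes in the TeX source untouched.
    return MATH_SCRIPT.sub(replace, html)


sample = (
    '<p>Energy: <script type="math/tex" id="MathJax-Element-1">E = mc^2</script></p>'
    r'<script type="math/tex; mode=display">\int_0^1 x\,dx</script>'
)
print(mathjax_to_markdown(sample))
```

Running a step like this before the other conversion rules would keep them from treating the TeX source as ordinary text.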

Usage:

Clone the project into the current directory, then create a new Python file:

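# coding: utf-8
import os

from tomd import tomd

# All blog HTML files live in the "blog" directory
root = "blog"
for file in os.listdir(root):
    path = os.path.join(root, file)
    if os.path.isfile(path):
        # splitext returns (name, extension)
        ext = os.path.splitext(file)[1]
        if ext == '.html':
            # Read the whole file and close it right away
            # (assumes the HTML files are UTF-8 encoded)
            with open(path, encoding="utf-8") as f:
                html = f.read()
            tomd.Tomd(html, root, file).export()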

Once the script finishes, the blog directory contains a .md file for each HTML file.

Some notes on the regular expressions used:

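.*? : . matches any character except a newline, * repeats it zero or more times, and the trailing ? makes the repetition non-greedy. So in <h1.*?>(.*?)</h1>, once <h1.*?> has matched, the group stops at the first </h1> that follows.
[\s\S]*? : \s is whitespace (spaces, newlines, and so on) and \S is non-whitespace, so together they match any character at all, newlines included, repeated non-greedily.
((?!sometext).)*? : a non-greedy match of any run of characters that does not contain the string sometext. The negative lookahead (?!sometext) is checked before each character is consumed.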
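The three patterns above can be tried out with Python's re module. The sample strings below are made up for illustration.

```python
import re

# 1. Non-greedy vs greedy: .*? stops at the first </h1>, .* runs to the last.
one_line = "<h1>A</h1><h1>B</h1>"
non_greedy = re.findall(r"<h1.*?>(.*?)</h1>", one_line)
greedy = re.findall(r"<h1.*>(.*)</h1>", one_line)

# 2. "." does not cross newlines (without re.S), but [\s\S] does.
multi = "<p>line one\nline two</p>"
dot_only = re.findall(r"<p>(.*?)</p>", multi)
any_char = re.findall(r"<p>([\s\S]*?)</p>", multi)

# 3. Negative lookahead: the content between <b> and </b> may not contain
#    another "<b>". The inner group is non-capturing so that findall returns
#    the whole run rather than only its last character.
nested = "<b>x<b>inner</b>"
plain = re.findall(r"<b>(.*?)</b>", nested)
tempered = re.findall(r"<b>((?:(?!<b>).)*?)</b>", nested)

print(non_greedy, greedy)
print(dot_only, any_char)
print(plain, tempered)
```

In the third case the plain non-greedy pattern starts at the first <b> and swallows the second one, while the lookahead version only accepts the innermost pair.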