Java: Converting Word to pdf/html/htm with OpenOffice

Date: 2019-11-07

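My original idea was to implement the preview directly in the page, including the preview styling and so on, similar to the approach this blogger took:

http://blog.csdn.net/lbf5210/article/details/50519190 However, the FlowPaper component it relies on seems to be downloadable only as an exe, and the approach requires installing too many plugins, which is a hassle. So I decided to just generate the file and skip the preview plugin, which is much simpler. The OpenOffice service must still be installed for the conversion to work.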

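Approach: pass in the doc/docx file as a stream (conveniently, every extension goes through the same code path) -> open a connection with OpenOfficeConnection -> convert with a DocumentConverter object -> close the OpenOfficeConnection.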
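The first step of that flow, turning an incoming stream into a file on disk that the converter can read, could be sketched as follows. This is only an illustration using the standard java.nio.file API; the class name StreamStaging, the method stage, and the file name doc_sample.doc are hypothetical and are not part of the utility class in this post.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: stages an uploaded stream as a file for later conversion.
final class StreamStaging {

    // Copies the stream into dir/name, creating dir if needed, and closes the stream.
    static Path stage(InputStream in, Path dir, String name) throws IOException {
        Files.createDirectories(dir);
        Path target = dir.resolve(name);
        try (InputStream source = in) {
            // REPLACE_EXISTING overwrites a leftover file from an earlier run
            Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("office-staging");
        byte[] data = "placeholder bytes".getBytes(StandardCharsets.UTF_8);
        Path staged = stage(new ByteArrayInputStream(data), dir, "doc_sample.doc");
        System.out.println(staged + " (" + Files.size(staged) + " bytes)");
    }
}
```

Files.copy replaces the manual read/write loop and the separate delete-then-create steps, at the cost of requiring Java 7 or later.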

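The implementation is actually quite simple. The utility class I wrote here handles everything automatically, based on the extension of the incoming document and the output format you ask for.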
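The naming scheme behind that automatic handling can be sketched in isolation. The snippet below is a minimal illustration, not the post's class: OutputNames and targetName are hypothetical names, and it assumes the intent is to keep the whole base name and prefix it with the source extension, as the doc_/docx_/xls_/ppt_ prefixes in the full code suggest.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: derives the output file name from the source name and target type.
final class OutputNames {

    // Source extensions handled by the utility class in this post.
    private static final List<String> SUPPORTED = Arrays.asList("doc", "docx", "xls", "ppt");

    // Returns "<ext>_<baseName>.<type>", or null for a missing or unsupported extension.
    static String targetName(String fileName, String type) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return null; // no extension at all
        }
        String baseName = fileName.substring(0, dot);
        String ext = fileName.substring(dot + 1);
        if (!SUPPORTED.contains(ext)) {
            return null;
        }
        return ext + "_" + baseName + "." + type;
    }

    public static void main(String[] args) {
        System.out.println(targetName("contract.docx", "pdf")); // docx_contract.pdf
        System.out.println(targetName("notes.txt", "pdf"));     // null
    }
}
```

Using lastIndexOf rather than indexOf keeps names that contain extra dots intact.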

The key code is really just a few lines:

DocumentConverter converter = new OpenOfficeDocumentConverter(connection);
converter.convert(docInputFile, htmlOutputFile);
connection.disconnect();

Everything else is detail. Here is the complete demo code; the comments should explain it:

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.ConnectException;

import com.artofsolving.jodconverter.DocumentConverter;
import com.artofsolving.jodconverter.openoffice.connection.OpenOfficeConnection;
import com.artofsolving.jodconverter.openoffice.connection.SocketOpenOfficeConnection;
import com.artofsolving.jodconverter.openoffice.converter.OpenOfficeDocumentConverter;


public class OfficeParseUtil {
	
	private static OfficeParseUtil officeParseUtil;
	
	/**
	 * Description: returns the shared instance, creating it on first use.
	 * @return the OfficeParseUtil instance
	 */
	public static synchronized OfficeParseUtil getOfficeParseUtil(){
		 if (officeParseUtil == null) {
			 officeParseUtil = new OfficeParseUtil();
	        }
	        return officeParseUtil;
	}
	
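	/**
	 * Description: converts an Office file.
	 * @param fromFileInputStream input stream of the source file
	 * @param toFilePath  directory in which the output is saved
	 * @param fileName  full file name including the extension (e.g. doc, docx, xls, ppt)
	 * @param type target format, e.g. pdf, html, htm
	 * @return name of the generated file, or null if the extension is not supported
	 *         or the OpenOffice service could not be reached
	 * @throws IOException
	 */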
    public String parseOffice(InputStream fromFileInputStream, String toFilePath,String fileName,String type) throws IOException {
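        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return null; // no extension, so the source format is unknown
        }
        String timesuffix = fileName.substring(0, dot); // base file name without the extension
        String postfix = fileName.substring(dot + 1); // file extension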
        String docFileName = null;
        String htmFileName = null;
        if("doc".equals(postfix)){
            docFileName = "doc_" + timesuffix + ".doc";
            htmFileName = "doc_" + timesuffix + "."+type;
        }else if("docx".equals(postfix)){
            docFileName = "docx_" + timesuffix + ".docx";
            htmFileName = "docx_" + timesuffix + "."+type;
        }else if("xls".equals(postfix)){
            docFileName = "xls_" + timesuffix + ".xls";
            htmFileName = "xls_" + timesuffix + "."+type;
        }else if("ppt".equals(postfix)){
            docFileName = "ppt_" + timesuffix + ".ppt";
            htmFileName = "ppt_" + timesuffix + "."+type;
        }else{
            return null;
        }

        File htmlOutputFile = new File(toFilePath + File.separatorChar + htmFileName);
        File docInputFile = new File(toFilePath + File.separatorChar + docFileName);
        if (!new File(toFilePath).exists()) {
        	docInputFile.getParentFile().mkdirs();
		}
        if (htmlOutputFile.exists())
            htmlOutputFile.delete();
        htmlOutputFile.createNewFile();
        if (docInputFile.exists())
            docInputFile.delete();
        docInputFile.createNewFile();
        // Write fromFileInputStream to the temporary input file.
        // An IOException here propagates to the caller instead of being swallowed.
        try (OutputStream os = new FileOutputStream(docInputFile)) {
            int bytesRead;
            byte[] buffer = new byte[1024 * 8];
            while ((bytesRead = fromFileInputStream.read(buffer)) != -1) {
                os.write(buffer, 0, bytesRead);
            }
        } finally {
            fromFileInputStream.close();
        }

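        OpenOfficeConnection connection = new SocketOpenOfficeConnection(8100);
        try {
            connection.connect();
            // convert
            DocumentConverter converter = new OpenOfficeDocumentConverter(connection);
            converter.convert(docInputFile, htmlOutputFile);
        } catch (ConnectException e) {
            htmFileName = null;
            // Conversion failed: delete the empty output file
            htmlOutputFile.delete();
            System.err.println("File conversion failed; check that the OpenOffice service is running.");
        } finally {
            // Always release the connection, even if the conversion throws
            if (connection.isConnected()) {
                connection.disconnect();
            }
            // Delete the temporary copy of the source file once conversion is done
            docInputFile.delete();
        }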
        return htmFileName;
    }
    
    public static void main(String[] args) throws IOException  {
    	OfficeParseUtil officeParseUtil = new OfficeParseUtil();
        File file = null;
        FileInputStream fileInputStream = null;

        file = new File("F:/wordtest/xxxxxx.doc");
        fileInputStream = new FileInputStream(file);
//      coc2HtmlUtil.file2Html(fileInputStream, "D:/poi-test/openOffice/docx","docx");
        officeParseUtil.parseOffice(fileInputStream, "F:/wordtest/pdf/","白蚁防治协议.doc","pdf");

    }

}

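Notes:

1. Start the OpenOffice service first.

2. The conversion is not limited to pdf; it can also produce html and htm pages.

3. The converted html/htm does not look as good as the pdf. For example, the signature block at the end of an agreement document comes out with a garbled layout.

4. Tab indentation in the document is also handled poorly.

5. Headings that use Office's built-in "1、 2、" numbering style turn into "一、二、".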
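Regarding the first note: a failed conversion is often just a service that is not running. For OpenOffice 3.x the service is typically started with something like soffice -headless -accept="socket,host=127.0.0.1,port=8100;urp;" -nofirststartwizard (an assumption about your installation; adjust it to your version and path). A quick way to check from Java whether anything is listening on the port used above might look like the sketch below. OfficePortCheck and isListening are hypothetical names, and the check only proves that some process accepts TCP connections on the port, not that it is OpenOffice.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: probes a TCP port before attempting a conversion.
final class OfficePortCheck {

    // Returns true if a TCP connection to host:port succeeds within timeoutMillis.
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, unreachable, or timed out
            return false;
        }
    }

    public static void main(String[] args) {
        // 8100 is the port passed to SocketOpenOfficeConnection in the code above
        boolean up = isListening("127.0.0.1", 8100, 1000);
        System.out.println(up
                ? "Something is listening on port 8100."
                : "Nothing is listening on port 8100; start OpenOffice first.");
    }
}
```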

Thanks to http://blog.csdn.net/yjclsx/article/details/51445546