上次简述了使用poi读取大xls文件,这里说下读取xlsx格式的文件的方法html
先准备一个大的excel文件(xlsx大小5M),再将jvm的heap缩小到100m(JVM 参数 -Xmx100m)用于模拟OOM
并使用参数在OOM时dump内存 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=d://dump.hprofjava
在gradle中引入解析xlsx须要的jar包apache
compile 'org.apache.poi:poi:3.15' compile 'org.apache.poi:poi-ooxml:3.15' compile 'xerces:xercesImpl:2.11.0'
以后读取xlsx文件api
public static void main(String [] args) throws IOException { InputStream is = new FileInputStream("d://large.xlsx"); Workbook wb = new XSSFWorkbook(is); }
运行以后app
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space at java.util.Arrays.copyOf(Arrays.java:3236) at java.io.ByteArrayOutputStream.toByteArray(ByteArrayOutputStream.java:178) at org.apache.poi.openxml4j.util.ZipInputStreamZipEntrySource$FakeZipEntry.<init>(ZipInputStreamZipEntrySource.java:136) at org.apache.poi.openxml4j.util.ZipInputStreamZipEntrySource.<init>(ZipInputStreamZipEntrySource.java:56) at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:99) at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:342) at org.apache.poi.util.PackageHelper.open(PackageHelper.java:37) at org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:285) at blog.excel.Xlsx.main(Xlsx.java:17) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:483) at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
一样报出了OOM,缘由也是处理xlsx时会将数据彻底读入内存,致使内存溢出。jvm
POI也为xlsx提供了流式读取的方式,用于减少内存的使用xss
public class ExampleEventUserModel{ public void processOneSheet(String filename) throws Exception { OPCPackage pkg = OPCPackage.open(filename); XSSFReader r = new XSSFReader( pkg ); SharedStringsTable sst = r.getSharedStringsTable(); XMLReader parser = fetchSheetParser(sst); // 得到第一个sheet InputStream sheet2 = r.getSheet("rId1"); InputSource sheetSource = new InputSource(sheet2); parser.parse(sheetSource); sheet2.close(); } public XMLReader fetchSheetParser(SharedStringsTable sst) throws SAXException { XMLReader parser = XMLReaderFactory.createXMLReader( "org.apache.xerces.parsers.SAXParser" ); ContentHandler handler = new SheetHandler(sst); parser.setContentHandler(handler); return parser; } /** * 处理sax的handler */ private static class SheetHandler extends DefaultHandler { private SharedStringsTable sst; private String lastContents; private boolean nextIsString; private SheetHandler(SharedStringsTable sst) { this.sst = sst; } //元素开始时的handler public void startElement(String uri, String localName, String name, Attributes attributes) throws SAXException { // c => 单元格 if(name.equals("c")) { System.out.print(attributes.getValue("r") + " - "); // 获取单元格类型 String cellType = attributes.getValue("t"); if(cellType != null && cellType.equals("s")) { nextIsString = true; } else { nextIsString = false; } } lastContents = ""; } //元素结束时的handler public void endElement(String uri, String localName, String name) throws SAXException { if(nextIsString) { int idx = Integer.parseInt(lastContents); lastContents = new XSSFRichTextString(sst.getEntryAt(idx)).toString(); nextIsString = false; } // v => 单元格内容 if(name.equals("v")) { System.out.println(lastContents); } } //读取元素间内容时的handler public void characters(char[] ch, int start, int length) throws SAXException { lastContents += new String(ch, start, length); } } public static void main(String[] args) throws Exception { ExampleEventUserModel example = new ExampleEventUserModel(); example.processOneSheet("d://large.xlsx"); } }
一样的使用这种方法能够流式读取打的xlsx文件,可是只限于读取内部的数据,并且没法进行修改操做。以后会介绍写大文件的方法fetch