Summary
This section covers: ZooKeeper's persistence framework; the structure of the FileTxnLog transaction log; the FileTxnLog source; LogFormatter, which deserializes the transaction log; and an analysis of a transaction-log demo.
Overall persistence framework
The persistence classes live under the package org.apache.zookeeper.server.persistence and are organized as follows (original figure omitted):
TxnLog: interface for reading and writing the transactional log.
FileTxnLog: implements TxnLog, adding the API for accessing the transactional log.
SnapShot: interface for persistence-layer snapshots.
FileSnap: implements SnapShot; stores, serializes, deserializes, and accesses snapshots.
FileTxnSnapLog: wraps a TxnLog and a SnapShot.
Util: utility class providing the APIs persistence needs.
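A rough sketch of how these pieces fit together (simplified and illustrative, not the real source):

// FileTxnSnapLog composes one TxnLog and one SnapShot implementation.
public class FileTxnSnapLog {
    private TxnLog txnLog;    // in practice a FileTxnLog over dataLogDir
    private SnapShot snapLog; // in practice a FileSnap over dataDir
    // On restart, restore() deserializes the latest valid snapshot, then
    // replays the transaction log entries newer than the snapshot's zxid.
}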
Two kinds of files
ZooKeeper stores two main kinds of files:
snapshot (an in-memory snapshot) and log (the transaction log, similar to MySQL's binlog: every operation that modifies data is recorded in it).
For a definition of a transactional log, see the refer section below. In short, the ZooKeeper transaction log records every transactional operation (creating a node, deleting a node, and so on) as one entry, so that data can be recovered if ZooKeeper fails.
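For example, a data directory typically looks like this (the file names here are made up; the hex suffix of each file is a zxid, as explained later):

version-2/
  snapshot.100000000   <- in-memory snapshot
  log.100000001        <- transaction log, named after the first zxid it contains
  log.200000001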
The transaction log is described next.
Transaction log
During normal operation, for every update ZooKeeper makes sure the transaction log for that update has been written to disk before it returns "update succeeded" to the client; only then does the update take effect.
The TxnLog interface
public interface TxnLog {
    /**
     * roll the current
     * log being appended to
     * @throws IOException
     */
    // rolls from the current log file over to the next one -- NOT a rollback
    void rollLog() throws IOException;

    /**
     * Append a request to the transaction log
     * @param hdr the transaction header
     * @param r the transaction itself
     * returns true iff something appended, otw false
     * @throws IOException
     */
    // append one request to the transaction log
    boolean append(TxnHeader hdr, Record r) throws IOException;

    /**
     * Start reading the transaction logs
     * from a given zxid
     * @param zxid
     * @return returns an iterator to read the
     * next transaction in the logs.
     * @throws IOException
     */
    // read the transaction log starting from the given zxid
    TxnIterator read(long zxid) throws IOException;

    /**
     * the last zxid of the logged transactions.
     * @return the last zxid of the logged transactions.
     * @throws IOException
     */
    // latest zxid among the logged transactions
    long getLastLoggedZxid() throws IOException;

    /**
     * truncate the log to get in sync with the
     * leader.
     * @param zxid the zxid to truncate at.
     * @throws IOException
     */
    // drop everything in the log after the given zxid
    boolean truncate(long zxid) throws IOException;

    /**
     * the dbid for this transaction log.
     * @return the dbid for this transaction log.
     * @throws IOException
     */
    // get the database id
    long getDbId() throws IOException;

    /**
     * commmit the trasaction and make sure
     * they are persisted
     * @throws IOException
     */
    // commit the transactions and make sure they are persisted
    void commit() throws IOException;

    /**
     * close the transactions logs
     */
    // close the transaction log
    void close() throws IOException;

    /**
     * an iterating interface for reading
     * transaction logs.
     */
    // iterator interface for reading transaction logs
    public interface TxnIterator {
        /**
         * return the transaction header.
         * @return return the transaction header.
         */
        // get the transaction header
        TxnHeader getHeader();

        /**
         * return the transaction record.
         * @return return the transaction record.
         */
        // get the transaction record
        Record getTxn();

        /**
         * go to the next transaction record.
         * @throws IOException
         */
        // move to the next transaction
        boolean next() throws IOException;

        /**
         * close files and release the
         * resources
         * @throws IOException
         */
        // close files and release resources
        void close() throws IOException;
    }
}
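A minimal usage sketch of this interface, assuming a FileTxnLog over a scratch directory (the TxnHeader and SetDataTxn constructors shown are the jute-generated ones; details can vary slightly across ZooKeeper versions):

import java.io.File;
import org.apache.zookeeper.ZooDefs.OpCode;
import org.apache.zookeeper.server.persistence.FileTxnLog;
import org.apache.zookeeper.server.persistence.TxnLog;
import org.apache.zookeeper.txn.SetDataTxn;
import org.apache.zookeeper.txn.TxnHeader;

public class TxnLogDemo {
    public static void main(String[] args) throws Exception {
        File logDir = new File("/tmp/txnlog-demo");  // hypothetical directory
        logDir.mkdirs();
        TxnLog txnLog = new FileTxnLog(logDir);

        // Append one transaction: the header carries (sessionId, cxid, zxid, time, type).
        TxnHeader hdr = new TxnHeader(0x1L, 1, 1L, System.currentTimeMillis(), OpCode.setData);
        txnLog.append(hdr, new SetDataTxn("/demo", "hello".getBytes(), 1));
        txnLog.commit();  // flush + fsync; see commit() below

        // Iterate over the logged transactions from zxid 1 onwards.
        TxnLog.TxnIterator it = txnLog.read(1L);
        while (it.getHeader() != null) {
            System.out.println("zxid 0x" + Long.toHexString(it.getHeader().getZxid()));
            if (!it.next()) break;
        }
        it.close();
        txnLog.close();
    }
}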
Implementation class: FileTxnLog
File format
/**
 * The format of a Transactional log is as follows:
 * <blockquote><pre>
 * LogFile:
 *     FileHeader TxnList ZeroPad
 *
 * FileHeader: {
 *     magic 4bytes (ZKLG)
 *     version 4bytes
 *     dbid 8bytes
 * }
 *
 * TxnList:
 *     Txn || Txn TxnList
 *
 * Txn:
 *     checksum Txnlen TxnHeader Record 0x42
 *
 * checksum: 8bytes Adler32 is currently used
 *   calculated across payload -- Txnlen, TxnHeader, Record and 0x42
 *
 * Txnlen:
 *     len 4bytes
 *
 * TxnHeader: {
 *     sessionid 8bytes
 *     cxid 4bytes
 *     zxid 8bytes
 *     time 8bytes
 *     type 4bytes
 * }
 *
 * Record:
 *     See Jute definition file for details on the various record types
 *
 * ZeroPad:
 *     0 padded to EOF (filled during preallocation stage)
 * </pre></blockquote>
 */
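So the first 16 bytes of every log file are the FileHeader. As a sanity check, the header can be read back with plain java.io, since jute's binary archive writes ints and longs big-endian just like DataOutputStream (a sketch, not part of ZooKeeper):

import java.io.DataInputStream;
import java.io.FileInputStream;

public class ReadFileHeader {
    public static void main(String[] args) throws Exception {
        try (DataInputStream in = new DataInputStream(new FileInputStream(args[0]))) {
            int magic = in.readInt();    // 4 bytes; "ZKLG" == 0x5a4b4c47
            int version = in.readInt();  // 4 bytes; currently 2
            long dbid = in.readLong();   // 8 bytes
            System.out.printf("magic=%08x version=%d dbid=%d%n", magic, version, dbid);
        }
    }
}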
Key methods
append
// append one entry to the transaction log
public synchronized boolean append(TxnHeader hdr, Record txn)
    throws IOException
{
    if (hdr != null) { // transaction header is not null
        if (hdr.getZxid() <= lastZxidSeen) {
            LOG.warn("Current zxid " + hdr.getZxid()
                    + " is <= " + lastZxidSeen + " for "
                    + hdr.getType());
        }
        if (logStream == null) { // no log stream yet
            if (LOG.isInfoEnabled()) {
                LOG.info("Creating new log file: log." + Long.toHexString(hdr.getZxid()));
            }
            // create a new log file, named after the first zxid it will contain
            logFileWrite = new File(logDir, ("log." + Long.toHexString(hdr.getZxid())));
            fos = new FileOutputStream(logFileWrite);
            logStream = new BufferedOutputStream(fos);
            oa = BinaryOutputArchive.getArchive(logStream);
            // build the file header from TXNLOG_MAGIC, VERSION and dbId
            FileHeader fhdr = new FileHeader(TXNLOG_MAGIC, VERSION, dbId);
            fhdr.serialize(oa, "fileheader"); // serialize the header
            // Make sure that the magic number is written before padding.
            logStream.flush();
            currentSize = fos.getChannel().position();
            streamsToFlush.add(fos);
        }
        padFile(fos); // when less than 4KB remains, preallocate another 64MB
        byte[] buf = Util.marshallTxnEntry(hdr, txn);
        if (buf == null || buf.length == 0) {
            throw new IOException("Faulty serialization for header " + "and txn");
        }
        Checksum crc = makeChecksumAlgorithm(); // create the checksum algorithm
        crc.update(buf, 0, buf.length);
        oa.writeLong(crc.getValue(), "txnEntryCRC"); // write the checksum value as a long
        Util.writeTxnBytes(oa, buf); // write the serialized txn entry, terminated by 0x42 ('B')
        return true;
    }
    return false;
}
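The padFile call above is where preallocation happens. A simplified sketch of the logic inside Util.padLogFile (the real method's signature and constant handling differ slightly): whenever less than 4KB of preallocated space remains, the file is extended by another 64MB by writing a byte at the new end, which zero-fills the gap.

import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;

class PadSketch {
    static final long PRE_ALLOC_SIZE = 65536 * 1024; // 64MB, tunable via zookeeper.preAllocSize

    // Simplified sketch of Util.padLogFile's logic (not the exact source):
    static long padLogFile(FileOutputStream f, long currentSize) throws IOException {
        long position = f.getChannel().position();
        if (position + 4096 >= currentSize) {   // less than 4KB of headroom left
            currentSize += PRE_ALLOC_SIZE;      // grow by another 64MB
            // Writing a single byte at the new end extends the file, zero-filled.
            f.getChannel().write(ByteBuffer.wrap(new byte[1]), currentSize - 1);
        }
        return currentSize;                     // caller stores this back
    }
}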
getLogFiles
// find the log file with the largest zxid <= snapshotZxid, plus all later log files
public static File[] getLogFiles(File[] logDirList, long snapshotZxid) {
    // extract the zxid from each file suffix and sort ascending by zxid
    List<File> files = Util.sortDataDir(logDirList, "log", true);
    long logZxid = 0;
    // Find the log file that starts before or at the same time as the
    // zxid of the snapshot
    for (File f : files) {
        long fzxid = Util.getZxidFromName(f.getName(), "log");
        if (fzxid > snapshotZxid) {
            continue;
        }
        // the files
        // are sorted with zxid's
        if (fzxid > logZxid) {
            logZxid = fzxid;
        }
    }
    List<File> v = new ArrayList<File>(5);
    for (File f : files) {
        long fzxid = Util.getZxidFromName(f.getName(), "log");
        if (fzxid < logZxid) {
            continue;
        }
        v.add(f);
    }
    return v.toArray(new File[0]);
}
getLastLoggedZxid
// get the last zxid recorded in the log
public long getLastLoggedZxid() {
    File[] files = getLogFiles(logDir.listFiles(), 0);
    // the file with the largest starting zxid is the last one
    long maxLog = files.length > 0 ?
            Util.getZxidFromName(files[files.length - 1].getName(), "log") : -1;
    // if a log file is more recent we must scan it to find
    // the highest zxid
    long zxid = maxLog;
    TxnIterator itr = null;
    try {
        FileTxnLog txn = new FileTxnLog(logDir);
        itr = txn.read(maxLog);
        while (true) {
            if (!itr.next())
                break;
            TxnHeader hdr = itr.getHeader(); // walk the file to its last transaction entry
            zxid = hdr.getZxid();            // and take that entry's zxid
        }
    } catch (IOException e) {
        LOG.warn("Unexpected exception", e);
    } finally {
        close(itr);
    }
    return zxid;
}
commit
// commit the transaction log to disk
public synchronized void commit() throws IOException {
    if (logStream != null) {
        logStream.flush(); // push the buffered stream down to the OS
    }
    for (FileOutputStream log : streamsToFlush) {
        log.flush(); // flush the stream
        if (forceSync) {
            long startSyncNS = System.nanoTime();
            log.getChannel().force(false); // fsync: force the data onto disk
            long syncElapsedMS =
                TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startSyncNS);
            if (syncElapsedMS > fsyncWarningThresholdMS) {
                LOG.warn("fsync-ing the write ahead log in "
                        + Thread.currentThread().getName()
                        + " took " + syncElapsedMS
                        + "ms which will adversely effect operation latency. "
                        + "See the ZooKeeper troubleshooting guide");
            }
        }
    }
    while (streamsToFlush.size() > 1) {
        streamsToFlush.removeFirst().close(); // remove the stream and close it
    }
}
truncate
// drop all transaction log entries with zxid greater than the given zxid
public boolean truncate(long zxid) throws IOException {
    FileTxnIterator itr = null;
    try {
        itr = new FileTxnIterator(this.logDir, zxid); // position an iterator at the given zxid
        PositionInputStream input = itr.inputStream;
        if (input == null) {
            throw new IOException("No log files found to truncate! This could " +
                    "happen if you still have snapshots from an old setup or " +
                    "log files were deleted accidentally or dataLogDir was changed in zoo.cfg.");
        }
        long pos = input.getPosition();
        // now, truncate at the current position
        RandomAccessFile raf = new RandomAccessFile(itr.logFile, "rw");
        raf.setLength(pos); // cut off the rest of this file (the larger zxids)
        raf.close();
        while (itr.goToNextLog()) {
            if (!itr.logFile.delete()) { // delete all subsequent log files
                LOG.warn("Unable to truncate {}", itr.logFile);
            }
        }
    } finally {
        close(itr);
    }
    return true;
}
rollLog
Be sure to read the comment on this one: it does not roll back the log, it rolls over from the current log file to the next one.
/**
 * rollover the current log file to a new one.
 * @throws IOException
 */
public synchronized void rollLog() throws IOException {
    if (logStream != null) {
        this.logStream.flush();
        this.logStream = null;
        oa = null;
    }
}
Visualizing the transaction log: LogFormatter
This is best read alongside org.apache.zookeeper.server.persistence.FileTxnLog#append. Pass the path of a transaction log file as the only argument.
public static void main(String[] args) throws Exception {
    if (args.length != 1) {
        System.err.println("USAGE: LogFormatter log_file");
        System.exit(2);
    }
    FileInputStream fis = new FileInputStream(args[0]);
    BinaryInputArchive logStream = BinaryInputArchive.getArchive(fis);
    FileHeader fhdr = new FileHeader();
    fhdr.deserialize(logStream, "fileheader"); // deserialize the header and validate it
    if (fhdr.getMagic() != FileTxnLog.TXNLOG_MAGIC) {
        System.err.println("Invalid magic number for " + args[0]);
        System.exit(2);
    }
    System.out.println("ZooKeeper Transactional Log File with dbid "
            + fhdr.getDbid() + " txnlog format version "
            + fhdr.getVersion());

    int count = 0;
    while (true) {
        long crcValue;
        byte[] bytes;
        try {
            crcValue = logStream.readLong("crcvalue"); // read the stored checksum
            bytes = logStream.readBuffer("txnEntry");
        } catch (EOFException e) {
            System.out.println("EOF reached after " + count + " txns.");
            return;
        }
        if (bytes.length == 0) {
            // Since we preallocate, we define EOF to be an
            // empty transaction
            System.out.println("EOF reached after " + count + " txns.");
            return;
        }
        Checksum crc = new Adler32();
        crc.update(bytes, 0, bytes.length);
        if (crcValue != crc.getValue()) { // compare the recomputed checksum with the stored one
            throw new IOException("CRC doesn't match " + crcValue + " vs " + crc.getValue());
        }
        TxnHeader hdr = new TxnHeader();
        Record txn = SerializeUtils.deserializeTxn(bytes, hdr); // deserialize the transaction
        System.out.println(DateFormat.getDateTimeInstance(DateFormat.SHORT,
                DateFormat.LONG).format(new Date(hdr.getTime()))
                + " session 0x" + Long.toHexString(hdr.getClientId())
                + " cxid 0x" + Long.toHexString(hdr.getCxid())
                + " zxid 0x" + Long.toHexString(hdr.getZxid())
                + " " + TraceFormatter.op2String(hdr.getType()) + " " + txn);
        if (logStream.readByte("EOR") != 'B') {
            LOG.error("Last transaction was partial.");
            throw new EOFException("Last transaction was partial.");
        }
        count++;
    }
}
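To run it, put the ZooKeeper jar and its dependencies on the classpath and pass a log file; the paths and version below are illustrative:

java -cp zookeeper-3.4.x.jar:lib/* org.apache.zookeeper.server.LogFormatter version-2/log.100000001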
Sample output
Running LogFormatter against the demo from http://www.jianshu.com/p/d1f8b9d6ad57 (with the transaction log directory emptied beforehand) produces:
ZooKeeper Transactional Log File with dbid 0 txnlog format version 2
17-5-24 下午04时15分41秒 session 0x15c398687180000 cxid 0x0 zxid 0x1 createSession 20000
17-5-24 下午04时15分41秒 session 0x15c398687180000 cxid 0x2 zxid 0x2 create '/test1,#7a6e6f646531,v{s{31,s{'world,'anyone}}},T,1
17-5-24 下午04时15分41秒 session 0x15c398687180000 cxid 0x3 zxid 0x3 create '/test2,#7a6e6f646532,v{s{31,s{'world,'anyone}}},T,2
17-5-24 下午04时15分41秒 session 0x15c398687180000 cxid 0x4 zxid 0x4 create '/test3,#7a6e6f646533,v{s{31,s{'world,'anyone}}},T,3
17-5-24 下午04时15分43秒 session 0x15c398687180000 cxid 0x9 zxid 0x5 setData '/test2,#7a4e6f64653232,1
17-5-24 下午04时15分43秒 session 0x15c398687180000 cxid 0xb zxid 0x6 delete '/test2
17-5-24 下午04时15分43秒 session 0x15c398687180000 cxid 0xc zxid 0x7 delete '/test1
17-5-24 下午04时16分04秒 session 0x15c398687180000 cxid 0x0 zxid 0x8 closeSession null
EOF reached after 8 txns.
Read together with FileTxnLog#append, this output is easy to follow. Note that data payloads are printed as hex; a decoding sketch follows.
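For instance, #7a6e6f646531 is just the bytes of "znode1" in hex. A tiny throwaway decoder (a hypothetical helper, not part of ZooKeeper):

class HexDecode {
    // Decode the '#'-prefixed hex payload printed by LogFormatter.
    static String decode(String hex) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < hex.length(); i += 2) {
            sb.append((char) Integer.parseInt(hex.substring(i, i + 2), 16));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("7a6e6f646531")); // prints "znode1"
    }
}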
Gripes
Mismatched tags
When serializing, org.apache.zookeeper.server.persistence.FileTxnLog#append writes
oa.writeLong(crc.getValue(), "txnEntryCRC"); // write the checksum value as a long
while when deserializing, org.apache.zookeeper.server.LogFormatter#main reads
crcValue = logStream.readLong("crcvalue");
The two tags don't match — though since the binary archive ignores the tag, this has no effect at runtime!
FileTxnLog#getLogFiles is inefficient
The files are already sorted by zxid in ascending order, so a single pass should suffice — see the sketch below.
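A possible single-pass rewrite (a sketch within the same package, relying on the sorted-ascending order): remember the index of the last file whose zxid is <= snapshotZxid and return the tail of the list from there.

// Single-pass alternative (sketch): files is sorted ascending by zxid.
public static File[] getLogFiles(File[] logDirList, long snapshotZxid) {
    List<File> files = Util.sortDataDir(logDirList, "log", true);
    int firstNeeded = 0; // index of the log file covering snapshotZxid
    for (int i = 0; i < files.size(); i++) {
        long fzxid = Util.getZxidFromName(files.get(i).getName(), "log");
        if (fzxid > snapshotZxid) {
            break;       // sorted: everything after this is also > snapshotZxid
        }
        firstNeeded = i; // last file starting at or before snapshotZxid
    }
    List<File> v = files.subList(firstNeeded, files.size());
    return v.toArray(new File[0]);
}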
Thoughts
File suffixes are derived from the zxid
logFileWrite = new File(logDir, ("log." + Long.toHexString(hdr.getZxid())));
This makes it easy to go between files and zxids — see for example how getLastLoggedZxid uses it, and the parsing sketch below.
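For illustration, recovering the zxid from a file name is just hex-parsing the suffix; a simplified equivalent of Util.getZxidFromName (a sketch):

// Simplified equivalent of Util.getZxidFromName (sketch):
static long zxidFromName(String name, String prefix) {
    // e.g. name = "log.100000001", prefix = "log" -> 0x100000001
    String[] parts = name.split("\\.");
    if (parts.length == 2 && parts[0].equals(prefix)) {
        return Long.parseLong(parts[1], 16);
    }
    return -1; // no zxid found
}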
What rollLog means
The function takes no arguments. Note again: it rolls over from the current log file to the next one (e.g. when the current log has grown too large); it does not roll back records in the log — a real rollback could hardly work without being told which zxid to roll back to.
For comparison: rollLog sets logStream to null, so the next append creates a new file logFileWrite and a new stream logStream.
Both commit and rollLog call flush — what's the difference?
This comes down to FileChannel and NIO. The call chain that writes into the FileChannel is:
org.apache.zookeeper.server.persistence.FileTxnLog#append
→ org.apache.zookeeper.server.persistence.FileTxnLog#padFile
→ org.apache.zookeeper.server.persistence.Util#padLogFile
→ java.nio.channels.FileChannel#write(java.nio.ByteBuffer, long)
so writes go through FileChannel's write method.
commit additionally calls log.getChannel().force(false), i.e. java.nio.channels.FileChannel#force.
Resources such as https://java-nio.avenwu.net/java-nio-filechannel.html explain that
force writes all data not yet on disk out to disk. For performance, the operating system keeps written data in buffers, so there is no guarantee that data passed to write has actually reached the disk — unless force is called manually. force takes a boolean argument indicating whether file metadata should be forced out as well.
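The distinction is easy to demonstrate (a sketch; the durability effect of force(false) is of course not observable from inside the program):

import java.io.BufferedOutputStream;
import java.io.FileOutputStream;

public class FlushVsForce {
    public static void main(String[] args) throws Exception {
        FileOutputStream fos = new FileOutputStream("/tmp/flush-demo"); // hypothetical path
        BufferedOutputStream bos = new BufferedOutputStream(fos);

        bos.write("some txn bytes".getBytes());
        bos.flush();                   // user-space buffer -> OS page cache; survives a JVM
                                       // crash, but not necessarily a power failure
        fos.getChannel().force(false); // page cache -> disk (fsync); false = skip metadata
        bos.close();
    }
}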
In other words, only commit truly forces the data onto disk; rollLog does not.
When does the transaction log call truncate to drop part of the log?
In a cluster, when a learner syncs with the leader and the leader tells the learner it needs to roll back before syncing. The caller is Learner#syncWithLeader, which section 40 will cover.
Open questions
What does the flush call in rollLog accomplish?
The difference between commit and rollLog was covered above. rollLog calls flush — what is the net effect? The data has not been forced to disk (otherwise commit would not be needed afterwards), and no FileChannel method is invoked. Presumably the flush only moves the bytes from the BufferedOutputStream's user-space buffer into the OS, where they sit in the page cache.
refer
http://www.cnblogs.com/leesf456/p/6279956.html
如何查看事务日志 FileTxnLog (How to view the transaction log: FileTxnLog)
什么是事务性日志 (What a transactional log is)
ZooKeeper运维之数据文件和事务日志 (ZooKeeper operations: data files and transaction logs)