Elasticsearch index: the Translog

Date: 2019-12-19

Tags: elasticsearch, index, translog. Column: log analysis


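Like most distributed systems, ES protects data by first recording write operations in a log. During Lucene indexing, data is held in memory until it reaches a threshold (a document count or a memory size), and only then is it written to disk. That creates a risk: if the system crashes before the data reaches disk, the buffered data is lost. ES solves this with the translog. Every write operation is also written to a temporary file, the translog, so the data can be read back from it when recovery is needed. This post analyzes the structure of the translog and how it is written.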
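The write-ahead idea can be illustrated with a minimal Java sketch. It uses a deliberately simplified model. ToyIndex stands in for Lucene's in-memory buffer, which a crash wipes out. ToyTranslog stands in for the durable log. Both classes are hypothetical and are not Elasticsearch APIs. A real translog writes to a file rather than to an in-memory list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model, not Elasticsearch code: ToyTranslog is the durable
// log, ToyIndex holds volatile in-memory state that a crash wipes out.
class ToyTranslog {
    private final List<String[]> entries = new ArrayList<>(); // append-only

    void append(String op, String id, String doc) {
        entries.add(new String[] {op, id, doc});
    }

    List<String[]> readAll() {
        return new ArrayList<>(entries);
    }
}

class ToyIndex {
    private Map<String, String> docs = new HashMap<>(); // stands in for Lucene's buffer
    private final ToyTranslog translog;

    ToyIndex(ToyTranslog translog) {
        this.translog = translog;
    }

    void index(String id, String doc) {
        translog.append("index", id, doc); // 1. record the operation
        docs.put(id, doc);                 // 2. then apply it in memory
    }

    void delete(String id) {
        translog.append("delete", id, null);
        docs.remove(id);
    }

    void crash() {
        docs = new HashMap<>(); // everything that was only in memory is lost
    }

    void recover() {
        // Replay the log in write order to rebuild the lost state.
        for (String[] e : translog.readAll()) {
            if ("index".equals(e[0])) {
                docs.put(e[1], e[2]);
            } else {
                docs.remove(e[1]);
            }
        }
    }

    String get(String id) {
        return docs.get(id);
    }

    int size() {
        return docs.size();
    }
}
```

Here is a typical run. Index documents 1 and 2, delete 1, then call crash(), and the index is empty. Calling recover() replays the three logged operations in order and restores a state that holds only document 2. The key design point is the ordering inside index(): the operation is logged before it is applied.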

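This part has two main pieces, Translog and TranslogFile. Translog exposes the interfaces for operating on translog files. TranslogFile is the concrete file itself.

There are two TranslogFile implementations. As their names suggest, the main difference between them is whether writes are buffered. Both implement the FsTranslogFile interface.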

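Every translogFile has a unique id and two very important methods, add and write. add appends an operation. The operations themselves are defined in Translog; here they are written only as bytes, regardless of their type. All operations are written sequentially, so reading one back requires its location. The add method is as follows: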

    public Translog.Location add(BytesReference data) throws IOException {
        rwl.writeLock().lock(); // take the write lock: writes to each file are sequential
        try {
            operationCounter++;
            long position = lastPosition;
            if (data.length() >= buffer.length) {
                flushBuffer();
                // we use the channel to write, since on windows, writing to the RAF might not be reflected
                // when reading through the channel
                data.writeTo(raf.channel()); // too large for the buffer: write straight to the channel
                lastWrittenPosition += data.length();
                lastPosition += data.length(); // advance the logical end position
                return new Translog.Location(id, position, data.length()); // location = (file id, position, length)
            }
            if (data.length() > buffer.length - bufferCount) {
                flushBuffer(); // not enough room left: flush the buffer first
            }
            data.writeTo(bufferOs); // stage the data in the in-memory buffer
            lastPosition += data.length();
            return new Translog.Location(id, position, data.length());
        } finally {
            rwl.writeLock().unlock();
        }
    }
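The branching in add can be reproduced in a self-contained sketch. BufferedLogSketch and LogLocation are hypothetical names. A ByteArrayOutputStream plays the role of the file channel, and a synchronized method replaces the read-write lock for brevity. The sketch mirrors the three cases above:

- An entry at least as large as the buffer flushes it and is then written directly.
- An entry that does not fit in the remaining space triggers a flush first.
- Everything else is staged in memory.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch of the buffered add() path; not the Elasticsearch class.
class LogLocation {
    final long id;        // which translog file
    final long position;  // where the entry starts
    final int size;       // how many bytes it occupies

    LogLocation(long id, long position, int size) {
        this.id = id;
        this.position = position;
        this.size = size;
    }
}

class BufferedLogSketch {
    private final long id;
    private final byte[] buffer;
    private int bufferCount = 0;           // bytes staged in memory
    private long lastPosition = 0;         // logical end, including staged bytes
    private long lastWrittenPosition = 0;  // end of the bytes already in the "file"
    private int operationCounter = 0;
    final ByteArrayOutputStream file = new ByteArrayOutputStream(); // stands in for the channel

    BufferedLogSketch(long id, int bufferSize) {
        this.id = id;
        this.buffer = new byte[bufferSize];
    }

    synchronized LogLocation add(byte[] data) {
        operationCounter++;
        long position = lastPosition;
        if (data.length >= buffer.length) {
            flushBuffer();                    // keep the file in write order
            file.write(data, 0, data.length); // too large to stage: write directly
            lastWrittenPosition += data.length;
            lastPosition += data.length;
            return new LogLocation(id, position, data.length);
        }
        if (data.length > buffer.length - bufferCount) {
            flushBuffer(); // not enough room left in the buffer
        }
        System.arraycopy(data, 0, buffer, bufferCount, data.length);
        bufferCount += data.length;
        lastPosition += data.length;
        return new LogLocation(id, position, data.length);
    }

    synchronized void flushBuffer() {
        file.write(buffer, 0, bufferCount);
        lastWrittenPosition += bufferCount;
        bufferCount = 0;
    }

    synchronized long lastPosition() { return lastPosition; }

    synchronized long lastWrittenPosition() { return lastWrittenPosition; }

    synchronized int operationCounter() { return operationCounter; }
}
```

Take an 8-byte buffer. Adding 3 and then 4 bytes stages both entries, at positions 0 and 3, without touching the file. A further 2-byte entry no longer fits, so the 7 staged bytes are flushed and the new entry lands at position 7. lastPosition counts buffered bytes, while lastWrittenPosition counts only bytes that have reached the file. That is why the two fields are tracked separately.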

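The method above stages data in an in-memory buffer (bufferOs) and calls flushBuffer when the buffer is full, so it is the write path of the buffered implementation. Entries at least as large as the buffer bypass it and go straight to the file channel. The simple implementation follows essentially the same logic but writes every operation to the file immediately instead of buffering it first.

TranslogFile also provides a snapshot method that returns an FsChannelSnapshot. Calling its next method repeatedly reads back every operation in the translog file. The snapshot does not copy the file. It shares the same underlying raf, increments its reference count so the file stays open, and is bounded by the lastWrittenPosition at the moment it is created. The code is as follows: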

    public FsChannelSnapshot snapshot() throws TranslogException {
        if (raf.increaseRefCount()) {
            boolean success = false;
            try {
                rwl.writeLock().lock();
                try {
                    FsChannelSnapshot snapshot = new FsChannelSnapshot(this.id, raf, lastWrittenPosition, operationCounter);
                    snapshot.seekTo(this.headerSize);
                    success = true;
                    return snapshot;
                } finally {
                    rwl.writeLock().unlock();
                }
            } catch (FileNotFoundException e) {
                throw new TranslogException(shardId, "failed to create snapshot", e);
            } finally {
                if (!success) {
                    raf.decreaseRefCount(false);
                }
            }
        }
        return null;
    }
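The point-in-time behavior of a snapshot can be sketched with a shared java.nio.ByteBuffer. The record format (a 4-byte length followed by the payload) and the names RecordLog and RecordSnapshot are assumptions for illustration, not the actual translog format. duplicate() gives the snapshot its own position and limit over the same bytes, much as FsChannelSnapshot shares raf. The limit fixed at creation plays the role of lastWrittenPosition. Reference counting is omitted.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: each record is [4-byte length][UTF-8 payload].
class RecordLog {
    private final ByteBuffer data; // shared storage, stands in for the file

    RecordLog(int capacity) {
        this.data = ByteBuffer.allocate(capacity);
    }

    synchronized void append(String op) {
        byte[] payload = op.getBytes(StandardCharsets.UTF_8);
        data.putInt(payload.length);
        data.put(payload);
    }

    synchronized RecordSnapshot snapshot() {
        ByteBuffer view = data.duplicate(); // same bytes, independent position/limit
        view.flip();                        // limit = bytes written so far, position = 0
        return new RecordSnapshot(view);
    }
}

class RecordSnapshot {
    private final ByteBuffer view;

    RecordSnapshot(ByteBuffer view) {
        this.view = view;
    }

    // Returns the next operation, or null once the snapshot's fixed end is reached.
    String next() {
        if (view.remaining() < Integer.BYTES) {
            return null;
        }
        byte[] payload = new byte[view.getInt()];
        view.get(payload);
        return new String(payload, StandardCharsets.UTF_8);
    }
}
```

Operations appended after snapshot() are invisible to that snapshot, because its limit was fixed when it was created. A snapshot taken later sees them.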

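TranslogFile is the abstraction of a concrete file. It only writes and reads bytes and does not care what kind of operation they represent. The operation types, and the logic that manages TranslogFiles, are defined in Translog, whose main methods are described next.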

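Translog's add method takes a concrete operation and is the entry point called from outside. Every operation that modifies the index, such as index or delete, builds a corresponding Operation and calls add while it applies the change to the index.

The makeTransientCurrent method deserves particular attention. Operations are written continuously, but once a TranslogFile exceeds the configured limit, it has to be retired and a new file started. Translog therefore holds two TranslogFiles, current and transient. To switch, it takes the write lock so that only one thread performs the change, promotes the transient file to current, and then deletes the previous current file.

The excerpt below is the companion method revertTransient, which handles the opposite case. It detaches the transient file under the write lock and then closes it with close(true) instead of promoting it: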

    public void revertTransient() {
        FsTranslogFile tmpTransient;
        rwl.writeLock().lock();
        try {
            tmpTransient = trans; // detach the transient file under the write lock
            this.trans = null;
        } finally {
            rwl.writeLock().unlock();
        }
        logger.trace("revert transient {}", tmpTransient);
        // previous transient might be null because it was failed on its creation
        // for example
        if (tmpTransient != null) {
            tmpTransient.close(true);
        }
    }
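The current/transient hand-off can be sketched with java.util.concurrent.locks.ReentrantReadWriteLock. TranslogSwitchSketch and FakeTranslogFile are hypothetical, and each file is reduced to an id plus a deleted flag. The sketch shows makeTransientCurrent as the text describes it: promote under the write lock, then delete the old file. It sits next to the revertTransient pattern from the excerpt.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a translog file: an id plus a "deleted" flag.
class FakeTranslogFile {
    final long id;
    volatile boolean deleted = false;

    FakeTranslogFile(long id) {
        this.id = id;
    }

    void close(boolean delete) {
        if (delete) {
            deleted = true; // a real implementation would close and remove the file
        }
    }
}

// Hypothetical sketch of the current/transient hand-off; not Elasticsearch code.
class TranslogSwitchSketch {
    private final ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();
    private FakeTranslogFile current;
    private FakeTranslogFile trans;

    TranslogSwitchSketch(long firstId) {
        current = new FakeTranslogFile(firstId);
    }

    FakeTranslogFile newTransient(long id) {
        rwl.writeLock().lock();
        try {
            trans = new FakeTranslogFile(id);
            return trans;
        } finally {
            rwl.writeLock().unlock();
        }
    }

    void makeTransientCurrent() {
        FakeTranslogFile old = null;
        rwl.writeLock().lock(); // only one thread may swap files
        try {
            if (trans != null) {
                old = current;
                current = trans; // promote the transient file
                trans = null;
            }
        } finally {
            rwl.writeLock().unlock();
        }
        if (old != null) {
            old.close(true); // delete the previous current outside the lock
        }
    }

    void revertTransient() {
        FakeTranslogFile tmp;
        rwl.writeLock().lock();
        try {
            tmp = trans; // detach without promoting
            trans = null;
        } finally {
            rwl.writeLock().unlock();
        }
        if (tmp != null) {
            tmp.close(true);
        }
    }

    FakeTranslogFile currentFile() {
        rwl.readLock().lock();
        try {
            return current;
        } finally {
            rwl.readLock().unlock();
        }
    }

    boolean hasTransient() {
        rwl.readLock().lock();
        try {
            return trans != null;
        } finally {
            rwl.readLock().unlock();
        }
    }
}
```

As in revertTransient, the sketch only swaps references while holding the lock. It performs the close/delete after releasing the lock, which keeps the time other threads are blocked short.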

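Translog defines four operations, all of which implement Operation: index, create, delete and delete-by-query. These are exactly the four operations that can change index data. In the Type enum below, index corresponds to SAVE. The Operation interface is as follows: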

    static interface Operation extends Streamable {
        static enum Type {
            CREATE((byte) 1),
            SAVE((byte) 2),
            DELETE((byte) 3),
            DELETE_BY_QUERY((byte) 4);

            private final byte id;

            private Type(byte id) {
                this.id = id;
            }

            public byte id() {
                return this.id;
            }

            public static Type fromId(byte id) {
                switch (id) {
                    case 1:
                        return CREATE;
                    case 2:
                        return SAVE;
                    case 3:
                        return DELETE;
                    case 4:
                        return DELETE_BY_QUERY;
                    default:
                        throw new ElasticsearchIllegalArgumentException("No type mapped for [" + id + "]");
                }
            }
        }

        Type opType();

        long estimateSize();

        Source getSource();
    }
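The one-byte ids in Type suggest a simple framing. An operation is serialized as its type id followed by its body, and a reader dispatches on that leading byte via fromId. The sketch below assumes exactly that framing. OpType, OpCodec and DecodedOp are hypothetical, and the body is a plain string rather than a real Source, so this is not the actual on-disk translog format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical copy of the Type ids above; not the Elasticsearch enum.
enum OpType {
    CREATE((byte) 1), SAVE((byte) 2), DELETE((byte) 3), DELETE_BY_QUERY((byte) 4);

    private final byte id;

    OpType(byte id) {
        this.id = id;
    }

    byte id() {
        return id;
    }

    static OpType fromId(byte id) {
        for (OpType t : values()) {
            if (t.id == id) {
                return t;
            }
        }
        throw new IllegalArgumentException("No type mapped for [" + id + "]");
    }
}

final class DecodedOp {
    final OpType type;
    final String body;

    DecodedOp(OpType type, String body) {
        this.type = type;
        this.body = body;
    }
}

// Hypothetical framing: [type id byte][body written with writeUTF].
final class OpCodec {
    static byte[] encode(OpType type, String body) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeByte(type.id()); // the type id comes first
            out.writeUTF(body);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static DecodedOp decode(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            OpType type = OpType.fromId(in.readByte()); // dispatch on the leading id
            return new DecodedOp(type, in.readUTF());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Encoding a DELETE produces a byte array whose first byte is 3, and decoding it yields the same type and body. An unknown id such as 9 is rejected, like the default branch of fromId in the original.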

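The translog's job is simply to record every index-modifying operation as it happens so that no data is lost, so its implementation is not very complicated.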

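Summary: the translog records modifications to the index in real time, so that data is not lost if the system fails before the index is written to disk. Its main use is index recovery, which is rarely needed under normal conditions. Because it is written sequentially as a stream, it consumes few resources and does not become a performance bottleneck. In the implementation, Translog provides the external interface, while TranslogFile is the concrete file abstraction that performs the actual file operations.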