CMU Database Systems (15-445) Lab 1: BufferPoolManager

Date: 2021-01-22


0. For environment setup, see

http://www.javashuo.com/article/p-upeakuwf-oa.html

1. Task1 LRU REPLACEMENT POLICY

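0. Task description

This task asks us to implement the LRU (least recently used) replacement algorithm described in class.

You need to implement the following functions. Make sure all of them are thread-safe.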

Victim(T*) : Remove the object that was accessed the least recently compared to all the elements being tracked by the Replacer, store its contents in the output parameter and return True. If the Replacer is empty return False.
Pin(T) : This method should be called after a page is pinned to a frame in the BufferPoolManager. It should remove the frame containing the pinned page from the LRUReplacer.
Unpin(T) : This method should be called when the pin_count of a page becomes 0. This method should add the frame containing the unpinned page to the LRUReplacer.
Size() : This method returns the number of frames that are currently in the LRUReplacer.

For the difference between locks and latches, see the link below.

https://stackoverflow.com/questions/3111403/what-is-the-difference-between-a-lock-and-a-latch-in-the-context-of-concurrent-a/42464336#42464336

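1. Implementation

This task is actually fairly simple. You only need to understand what the least recently used algorithm is.

The design principle of LRU is: if a piece of data has not been accessed recently, it is unlikely to be accessed in the near future. In other words, when the limited space is full, the data that has gone unaccessed the longest should be evicted.

I know this problem well. It is an original LeetCode problem, which also requires the operations to run in O(1) time.

https://leetcode-cn.com/problems/lru-cache/

To look elements up in O(1) time, we can use a hash table. We also need some notion of a timestamp to record which block is the least recently used one; here we can use a list for that.

Whenever an element in the list is accessed, move it to the head of the list. The element at the tail is then guaranteed to be the least recently used one.

To make insertion and deletion both O(1), we implement the list as a doubly linked list.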
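The hash table plus doubly linked list idea can be sketched with standard containers. This is only an illustration, not the implementation used later in this post: it swaps the hand-written nodes for std::list, and the names SimpleLru, Add, Remove and Evict are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <unordered_map>

// Hypothetical stand-in for the lab's frame_id_t typedef.
using frame_id_t = int;

// Sketch of an LRU tracker: newly added frames go to the front of the list,
// so the back of the list is always the least recently used frame.
class SimpleLru {
 public:
  // Start tracking a frame; a frame that is already tracked keeps its position.
  void Add(frame_id_t id) {
    if (pos_.count(id) != 0) return;
    order_.push_front(id);
    pos_[id] = order_.begin();
  }

  // Stop tracking a frame. O(1) because the hash table stores the list iterator.
  bool Remove(frame_id_t id) {
    auto it = pos_.find(id);
    if (it == pos_.end()) return false;
    order_.erase(it->second);
    pos_.erase(it);
    return true;
  }

  // Pop the least recently used frame into *out; false if nothing is tracked.
  bool Evict(frame_id_t *out) {
    assert(out != nullptr);
    if (order_.empty()) return false;
    *out = order_.back();
    pos_.erase(*out);
    order_.pop_back();
    return true;
  }

  std::size_t Size() const { return order_.size(); }

 private:
  std::list<frame_id_t> order_;  // front = most recent, back = least recent
  std::unordered_map<frame_id_t, std::list<frame_id_t>::iterator> pos_;
};
```

Storing list iterators in the hash table is what makes removal constant time: erasing one element of a std::list does not invalidate iterators to the other elements.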

The pin and unpin operations here actually correspond to Task 2. Why do we need pin? The textbook gives the answer, and we analyze it below as well.

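1.1 Data structure design

struct Node{
  // value is the frame id stored in this list node
  explicit Node(frame_id_t v) :value(v) {}
  frame_id_t value;
  std::shared_ptr<Node> left;   // neighbor closer to head
  std::shared_ptr<Node> right;  // neighbor closer to tail
};

We use a doubly linked list here, mainly so that both deletion and insertion take O(1) time.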
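One thing worth checking in a node type like this: when both links are std::shared_ptr, two neighboring nodes own each other, so nodes that are still linked when the replacer is destroyed are not released automatically. The sketch below demonstrates the effect and one common remedy, a std::weak_ptr back link. CyclicNode, SafeNode and HeadSurvivesRelease are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <memory>

// Both links own their neighbor: head and tail keep each other alive.
struct CyclicNode {
  std::shared_ptr<CyclicNode> left;
  std::shared_ptr<CyclicNode> right;
};

// The back link only observes its neighbor, so no ownership cycle forms.
struct SafeNode {
  std::weak_ptr<SafeNode> left;
  std::shared_ptr<SafeNode> right;
};

// Links head <-> tail, drops both external handles, and reports whether the
// head node is still alive afterwards.
template <typename N>
bool HeadSurvivesRelease() {
  auto head = std::make_shared<N>();
  auto tail = std::make_shared<N>();
  std::weak_ptr<N> watch = head;
  head->right = tail;
  tail->left = head;
  head.reset();
  tail.reset();
  const bool survived = !watch.expired();
  if (auto h = watch.lock()) {
    h->right.reset();  // break the cycle so this demo itself does not leak
  }
  return survived;
}
```

Under these assumptions HeadSurvivesRelease<CyclicNode>() reports that the node outlives its handles, while HeadSurvivesRelease<SafeNode>() does not. An alternative that keeps the original Node type is to reset the links of head and tail in the destructor.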

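1.2 Helper functions

We need two helper functions here: remove and insert.

head and tail are sentinel head and tail nodes. Writing the list this way reduces the number of boundary-condition checks. We initialize them in the constructor.

LRUReplacer::LRUReplacer(size_t num_pages) {
  // Sentinels hold the dummy value -1 and are never stored in the hash table.
  head.reset(new Node(-1));
  tail.reset(new Node(-1));
  capacity = num_pages;
  head->right = tail;
  tail->left = head;
}

For the role of the head and tail sentinel nodes, see the following article.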

https://blog.csdn.net/qq_41809589/article/details/86550994

The insert function inserts a node at the head of the list.

void LRUReplacer::insert(std::shared_ptr<Node> node) {
  if (node == nullptr) {
    return;
  }
  node->right = head->right;
  node->left = head;
  head->right->left=node;
  head->right=node;
  hash[node->value] = node;
  size++;
}

The remove function removes a node from the list.

bool LRUReplacer::remove(const frame_id_t &value) {
  auto iter = hash.find(value);
  if (iter == hash.end()) {
    return false;
  }
  auto node = iter->second;
  // Unlink the node by connecting its two neighbors to each other.
  node->right->left = node->left;
  node->left->right = node->right;
  hash.erase(value);
  size--;
  return true;
}

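1.3 Victim implementation

Note that a lock must be taken here to prevent concurrency errors.

If there is no page that can be victimized, return false immediately.
Otherwise, choose the page at the tail of the list and remove it.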

bool LRUReplacer::Victim(frame_id_t *frame_id) {
  std::scoped_lock lru_clk{lru_mutex};
  if (hash.empty()) {
    return false;
  }
  auto id = tail->left;
  remove(id->value);
  *frame_id = id->value;
  return true;
}

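1.4 Pin implementation

Note that a lock must be taken here to prevent concurrency errors.

If the frame is present in the replacer, simply remove it (a frame is only in the replacer while its pin_count is 0).

void LRUReplacer::Pin(frame_id_t frame_id) {
  std::scoped_lock lru_clk{lru_mutex};
  if (hash.count(frame_id)) {
    remove(frame_id);
  }
}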

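1.5 Unpin implementation

Note that a lock must be taken here to prevent concurrency errors.

First check whether the frame is already in the replaceable list.
If it is not, check whether the list still has free space. If so, add the frame directly.
If not, remove nodes from the tail of the list until there is free space.

void LRUReplacer::Unpin(frame_id_t frame_id) {
  std::scoped_lock lru_clk{lru_mutex};
  // A frame that is already tracked keeps its current position.
  if (hash.find(frame_id) != hash.end()) {
    return;
  }
  // Remove nodes from the tail (least recently used end) until there is room.
  while (hash.size() >= capacity) {
    auto p = tail->left;
    remove(p->value);
  }
  auto newNode = std::make_shared<Node>(frame_id);
  insert(newNode);
}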

2. Testing

Just run the following commands:

cd build
make lru_replacer_test
./test/lru_replacer_test

You can see that the test passes.

2. Task2 BUFFER POOL MANAGER

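0. Task description

Next, you need to implement the buffer pool manager (BufferPoolManager) in your system. The BufferPoolManager is responsible for fetching database pages from the DiskManager and storing them in memory. The BufferPoolManager can also write dirty pages out to disk when it is explicitly asked to do so, or when it needs to evict a page to make room for a new one. To make sure your implementation works correctly with the rest of the system, we provide some functions that are already filled in. You also do not need to implement the code that actually reads and writes data on disk (called the DiskManager in our implementation). We provide that functionality for you.

All in-memory pages in the system are represented by Page objects. The BufferPoolManager does not need to understand the contents of these pages. As a system developer, however, it is important to understand that Page objects are just containers for memory in the buffer pool and are therefore not tied to one unique page. That is, each Page object contains a block of memory that the DiskManager uses as the location to copy the contents of a physical page read from disk. The BufferPoolManager reuses the same Page object to store data as it moves back and forth to disk. This means that the same Page object may contain different physical pages over the lifetime of the system. The Page object's identifier (page_id) keeps track of which physical page it contains. If a Page object does not contain a physical page, its page_id must be set to INVALID_PAGE_ID.

Each Page object also maintains a counter of the number of threads that have "pinned" the page. The BufferPoolManager is not allowed to free a pinned page. Each Page object also keeps track of whether it is dirty. Your job is to record whether a page was modified before it is unpinned (if it was, set the dirty flag to 1). The BufferPoolManager must write the contents of a dirty page back to disk before the object can be reused.

The BufferPoolManager implementation uses the LRUReplacer class created in the previous step of this assignment. It uses the LRUReplacer to keep track of when Page objects are accessed, so that it can decide which one to evict when it must free a frame to make room for copying a new physical page from disk.

You need to implement the following functions in src/buffer/buffer_pool_manager.cpp: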

FetchPageImpl(page_id)
NewPageImpl(page_id)
UnpinPageImpl(page_id, is_dirty)
FlushPageImpl(page_id)
DeletePageImpl(page_id)
FlushAllPagesImpl()

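1. Analysis

1.1 Why do we need pin

Roughly, the idea is as shown in the figure below.

Consider the following situation. A block is brought into the buffer, and a process reads the block's contents from buffer memory. However, if a concurrent process evicts the block while it is being read and replaces it with a different block, the reader of the old block will see incorrect data. If the block was being written when it was evicted, the writer will end up corrupting the contents of the replacement block.

Therefore, before a process reads data from a buffer block, it is important to ensure that the block will not be evicted. To do so, the process performs a pin operation on the block; the buffer manager never evicts a pinned block (one whose pin count is nonzero). When the process has finished reading the data, it should perform an unpin operation, allowing the block to be evicted when needed.

So we need a pin_count to record the number of pins. This is really just the idea of reference counting.

1.2 How pages are managed and accessed

In one sentence: base address + offset.

page (base address) + frame_id (offset) is really just array indexing.

A hash table is used here to implement page_table, which maps page_id to frame_id.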
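The lookup path described above (page_id to frame_id through the page table, then base address plus offset into the frame array) might look like the sketch below. MiniPage, MiniPool, Install, Lookup and the fixed size of four frames are assumptions made for this illustration, not the lab's actual declarations.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical stand-ins for the lab's typedefs.
using page_id_t = int;
using frame_id_t = int;

struct MiniPage {
  page_id_t page_id = -1;  // -1 plays the role of INVALID_PAGE_ID
};

class MiniPool {
 public:
  // Record that page_id now lives in frame frame_id.
  void Install(page_id_t page_id, frame_id_t frame_id) {
    assert(frame_id >= 0 && frame_id < kFrames);
    pages_[frame_id].page_id = page_id;
    page_table_[page_id] = frame_id;
  }

  // Resolve a page_id to its in-memory object; nullptr if it is not resident.
  MiniPage *Lookup(page_id_t page_id) {
    auto it = page_table_.find(page_id);
    if (it == page_table_.end()) return nullptr;
    return pages_ + it->second;  // base address + offset, i.e. &pages_[frame_id]
  }

 private:
  static constexpr int kFrames = 4;
  MiniPage pages_[kFrames];                               // the frame array
  std::unordered_map<page_id_t, frame_id_t> page_table_;  // page_id -> frame_id
};
```

The frame array never moves, so the pointer returned by Lookup stays valid while the page remains in that frame.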

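2. Implementation

2.1 FetchPageImpl implementation

Page *BufferPoolManager::FetchPageImpl(page_id_t page_id)

This function is how we access a page. It can be analyzed in three cases:

Case 1: the page is in the buffer pool, so access it directly.
Case 2: the page is not in the buffer pool, but the pool has a free frame. Read the page from disk, place it into the buffer pool, and then access it.
Case 3: the page is not in the buffer pool and the pool has no free frame.
- Find a victim page and evict it (check the dirty bit to decide whether it must be written back to disk).
- Then proceed as in case 2.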
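The three cases can be sketched with a toy pool. This is a simplified model, not the lab solution: the "disk" is an in-memory map, page contents are strings, there is no latching, and ToyPage, ToyPool, Fetch and Unpin are hypothetical names. The small Unpin helper exists only so that eviction can be exercised.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for the lab's typedefs.
using page_id_t = int;
using frame_id_t = int;

struct ToyPage {
  page_id_t id = -1;  // -1 plays the role of INVALID_PAGE_ID
  int pin_count = 0;
  bool dirty = false;
  std::string data;   // stands in for the fixed-size page buffer
};

class ToyPool {
 public:
  explicit ToyPool(std::size_t frames) : pages_(frames) {
    for (std::size_t i = 0; i < frames; ++i) {
      free_.push_back(static_cast<frame_id_t>(i));
    }
  }

  // Returns the pinned page, or nullptr when every frame is pinned.
  ToyPage *Fetch(page_id_t pid) {
    auto hit = table_.find(pid);
    if (hit != table_.end()) {  // Case 1: the page is already resident.
      ToyPage &page = pages_[hit->second];
      if (page.pin_count++ == 0) {
        lru_.remove(hit->second);  // O(n) in this toy; the real replacer is O(1)
      }
      return &page;
    }
    frame_id_t fid = -1;
    if (!free_.empty()) {  // Case 2: take a free frame.
      fid = free_.front();
      free_.pop_front();
    } else if (!lru_.empty()) {  // Case 3: evict the least recently used frame.
      fid = lru_.back();
      lru_.pop_back();
      ToyPage &victim = pages_[fid];
      if (victim.dirty) disk[victim.id] = victim.data;  // write back first
      table_.erase(victim.id);
    } else {
      return nullptr;
    }
    ToyPage &page = pages_[fid];
    page.id = pid;
    page.pin_count = 1;
    page.dirty = false;
    page.data = disk[pid];  // "read the page from disk"
    table_[pid] = fid;
    return &page;
  }

  // Minimal unpin: a page whose pin count drops to 0 becomes evictable.
  void Unpin(page_id_t pid, bool is_dirty) {
    auto it = table_.find(pid);
    if (it == table_.end()) return;
    ToyPage &page = pages_[it->second];
    assert(page.pin_count > 0);
    page.dirty = page.dirty || is_dirty;
    if (--page.pin_count == 0) lru_.push_front(it->second);
  }

  std::unordered_map<page_id_t, std::string> disk;  // fake disk: page_id -> bytes

 private:
  std::vector<ToyPage> pages_;
  std::list<frame_id_t> free_;  // frames that hold no page
  std::list<frame_id_t> lru_;   // unpinned frames, back = least recently used
  std::unordered_map<page_id_t, frame_id_t> table_;
};
```

With a single frame, fetching a second page fails while the first is still pinned; once the first page is unpinned as dirty, the second fetch evicts it, writes it back, and reuses the same ToyPage object.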

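2.2 UnpinPageImpl implementation

bool BufferPoolManager::UnpinPageImpl(page_id_t page_id, bool is_dirty)

The function signature is shown above. The is_dirty parameter mainly distinguishes two cases:

Case 1. For a read operation, is_dirty=false.
Case 2. For a write operation, is_dirty=true.

This function is called when the current thread has finished working with the page and needs to unpin it.

If the page's pin_count > 0, decrement it.
If the page's pin_count is then 0, add the page to the LRU replacer. Nobody references it any more, so it becomes a candidate for replacement.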
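A minimal sketch of these two steps follows. PageMeta, UnpinSketch and the std::set standing in for the replacer are hypothetical. Keeping the dirty flag set once any caller reports a write is an assumption of this sketch, chosen so that a later reader's is_dirty=false does not hide an earlier writer's change.

```cpp
#include <cassert>
#include <set>

// Hypothetical per-page bookkeeping.
struct PageMeta {
  int pin_count = 0;
  bool is_dirty = false;
};

// Drops one pin. Returns false if the page was not pinned at all.
// When the count reaches zero, the frame is recorded as a replacement candidate.
bool UnpinSketch(PageMeta *page, int frame_id, bool is_dirty,
                 std::set<int> *candidates) {
  assert(page != nullptr && candidates != nullptr);
  if (page->pin_count <= 0) return false;
  // Assumption: the flag is sticky, so it is OR-ed rather than overwritten.
  page->is_dirty = page->is_dirty || is_dirty;
  if (--page->pin_count == 0) {
    candidates->insert(frame_id);  // nobody references the page any more
  }
  return true;
}
```

With two pins on a page, the first unpin only decrements the counter; the frame becomes a candidate after the second one.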

2.3 FlushPageImpl implementation

bool BufferPoolManager::FlushPageImpl(page_id_t page_id)

This function writes a page to disk.

First find the page's location in the buffer pool.
Then write it to disk.

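2.4 NewPageImpl implementation

Page *BufferPoolManager::NewPageImpl(page_id_t *page_id)

Allocate a new page.

If the buffer pool has a free frame, place the new page there directly.
Otherwise, if some page can be victimized, evict it and put our new page in its place.
Otherwise, fail.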

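2.5 DeletePageImpl implementation

bool BufferPoolManager::DeletePageImpl(page_id_t page_id)

Here we need to remove a page from the buffer pool.

If the page is not in the buffer pool at all, return immediately.
If the page's reference count is greater than 0 (pin_count > 0), it is still in use and cannot be deleted.
If the page has been modified, write it back to disk first.
Then remove it as usual (erase it from the hash table).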
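The steps above might be sketched as follows. Slot and DeleteSketch are hypothetical, the disk is an in-memory map, and two details are assumptions rather than statements from this post: a page that is not resident is reported as success (true), and the freed frame is handed back to a free list. Removing the frame from the replacer is omitted to keep the sketch short.

```cpp
#include <cassert>
#include <list>
#include <string>
#include <unordered_map>

// Hypothetical frame contents.
struct Slot {
  int page_id = -1;
  int pin_count = 0;
  bool dirty = false;
  std::string data;
};

struct DeleteSketch {
  std::unordered_map<int, int> page_table;    // page_id -> frame_id
  std::unordered_map<int, Slot> frames;       // frame_id -> slot
  std::unordered_map<int, std::string> disk;  // fake disk
  std::list<int> free_list;                   // frames that hold no page

  bool DeletePage(int page_id) {
    auto it = page_table.find(page_id);
    if (it == page_table.end()) return true;  // assumption: nothing to do
    const int frame_id = it->second;
    Slot &slot = frames[frame_id];
    if (slot.pin_count > 0) return false;        // still in use
    if (slot.dirty) disk[page_id] = slot.data;   // write back before dropping
    slot = Slot{};                               // reset the frame's metadata
    page_table.erase(it);
    free_list.push_back(frame_id);
    return true;
  }
};
```

A pinned page is refused, and the same call succeeds once the pin count has dropped to 0.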

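3. Source code walkthrough

3.1 ResetMemory()

This one is very simple: it just clears (zero-fills) the memory region that backs our frame. It does not allocate new memory.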

3.2 ReadPage

void DiskManager::ReadPage(page_id_t page_id, char *page_data) {
  int offset = page_id * PAGE_SIZE;  // PAGE_SIZE = 4 KB; compute the offset first, then check bounds (the file size is finite)
  // check if read beyond file length
  if (offset > GetFileSize(file_name_)) {
    LOG_DEBUG("I/O error reading past end of file");
    // std::cerr << "I/O error while reading" << std::endl;
  } else {
    // set read cursor to offset
    db_io_.seekp(offset);  // move the stream position to the offset
    db_io_.read(page_data, PAGE_SIZE);  // read one page of data into page_data
    if (db_io_.bad()) {
      LOG_DEBUG("I/O error while reading");
      return;
    }
    // if file ends before reading PAGE_SIZE
    int read_count = db_io_.gcount();
    if (read_count < PAGE_SIZE) {
      LOG_DEBUG("Read less than a page");
      db_io_.clear();
      // std::cerr << "Read less than a page" << std::endl;
      memset(page_data + read_count, 0, PAGE_SIZE - read_count);  // if fewer than 4 KB were read, zero-fill the rest
    }
  }
}
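The offset computation and the zero-fill on a short read can be tried in isolation on an in-memory stream. This is only a sketch: ReadToyPage and the 8-byte kToyPageSize are hypothetical (the real page size is 4 KB), and it uses seekg because it reads from a plain std::istream rather than the fstream used above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <istream>
#include <sstream>

// Hypothetical tiny page size so the behavior is easy to inspect.
constexpr std::size_t kToyPageSize = 8;

// Reads page number page_id from `in` into `out` (kToyPageSize bytes).
// If the stream ends early, the rest of `out` is zero-filled.
// Returns the number of bytes that actually came from the stream.
std::size_t ReadToyPage(std::istream &in, int page_id, char *out) {
  assert(out != nullptr && page_id >= 0);
  in.clear();  // drop eof/fail flags left by an earlier short read
  const std::streamoff offset =
      static_cast<std::streamoff>(page_id) * static_cast<std::streamoff>(kToyPageSize);
  in.seekg(offset);
  in.read(out, static_cast<std::streamsize>(kToyPageSize));
  const std::size_t got = static_cast<std::size_t>(in.gcount());
  if (got < kToyPageSize) {
    in.clear();
    std::memset(out + got, 0, kToyPageSize - got);
  }
  return got;
}
```

For a 10-byte stream, page 0 is read in full, while page 1 yields only the last 2 bytes followed by 6 zero bytes.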

3.3 WritePage

void DiskManager::WritePage(page_id_t page_id, const char *page_data) {
  size_t offset = static_cast<size_t>(page_id) * PAGE_SIZE;  // compute the offset first
  // set write cursor to offset
  num_writes_ += 1;  // count the number of writes
  db_io_.seekp(offset);
  db_io_.write(page_data, PAGE_SIZE);  // write the data at offset
  // check for I/O error
  if (db_io_.bad()) {
    LOG_DEBUG("I/O error while writing");
    return;
  }
  // needs to flush to keep disk file in sync
  db_io_.flush();  // flush the stream buffer
}

3.4 DiskManager constructor

It simply opens the file streams (the log file and the database file).

DiskManager::DiskManager(const std::string &db_file)
    : file_name_(db_file), next_page_id_(0), num_flushes_(0), num_writes_(0), flush_log_(false), flush_log_f_(nullptr) {
  std::string::size_type n = file_name_.rfind('.');
  if (n == std::string::npos) {
    LOG_DEBUG("wrong file format");
    return;
  }
  log_name_ = file_name_.substr(0, n) + ".log";

  log_io_.open(log_name_, std::ios::binary | std::ios::in | std::ios::app | std::ios::out);
  // directory or file does not exist
  if (!log_io_.is_open()) {
    log_io_.clear();
    // create a new file
    log_io_.open(log_name_, std::ios::binary | std::ios::trunc | std::ios::app | std::ios::out);
    log_io_.close();
    // reopen with original mode
    log_io_.open(log_name_, std::ios::binary | std::ios::in | std::ios::app | std::ios::out);
    if (!log_io_.is_open()) {
      throw Exception("can't open dblog file");
    }
  }

  db_io_.open(db_file, std::ios::binary | std::ios::in | std::ios::out);  // open the database file for both input and output
  // directory or file does not exist
  if (!db_io_.is_open()) {
    db_io_.clear();
    // create a new file
    db_io_.open(db_file, std::ios::binary | std::ios::trunc | std::ios::out);
    db_io_.close();
    // reopen with original mode
    db_io_.open(db_file, std::ios::binary | std::ios::in | std::ios::out);
    if (!db_io_.is_open()) {
      throw Exception("can't open db file");
    }
  }
  buffer_used = nullptr;
}

4. Testing

cd build
make buffer_pool_manager_test
./test/buffer_pool_manager_test