Contest diary day17 - TiDB performance challenge - tikv/pd#2950
Changelog
- 2020/10/27 day17: moved the notes to Yuque; it works well as a pure technical notebook, with notes on the left and flowcharts on the right
- 2020/10/26 day16: kept translating the yugabytedb docs. Spreading the task over several days, instead of trying to finish the whole translation in one day, means each day's understanding goes a bit deeper than the previous day's
- 2020/10/25 day14: finished translating the yugabytedb section
- 2020/10/24 day13: interrupted
- 2020/10/23 day12: interrupted
- 2020/10/22 day11: got familiar with redis-py; I had never used Redis before
- 2020/10/21 day10: coasted
- 2020/10/20 day9: the evening should have gone to translating yugabytedb's Colocated tables section, but I wanted to translate the whole chapter in one go, kept stalling, and in the end translated nothing. So the reliable plan is to translate just the Colocated tables section first, and not rush into running the code. Once Colocated tables is done, I should be able to write documentation by following its pattern.
- 2020/10/19 day8: overdid it and stayed up late finishing the zoom walkthrough; the notes and flowchart are in the earlier notes. Drawing flowcharts does help.
Background
Read a Chinese introduction: YugabyteDB 介绍 - 知乎
Translating: Colocated tables | YugabyteDB Docs
Colocated tables: 共用表 / 同地办公 / 主机表? (unsure how best to render the term in Chinese)
In workloads that need lower throughput and have a small data set, the bottleneck shifts from CPU/disk/network to the number of tablets that should be hosted per node. Since each table by default requires at least one tablet per node, a YugabyteDB cluster with 5000 relations (which includes tables and indexes) will result in 5000 tablets per node. There are practical limitations to the number of tablets that YugabyteDB can handle per node since each tablet adds some CPU, disk, and network overhead. If most or all of the tables in YugabyteDB cluster are small tables, then having separate tablets for each table unnecessarily adds pressure on CPU, network and disk.
Why does this result in 5000 tablets on every node? Does each node really need to hold all 5000? For example, with 5 nodes, couldn't node 1 store the 5000 tablets while three other nodes keep the replicas, so that not every node has to hold all 5000 tablets?
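To make the paragraph's arithmetic concrete, here is a back-of-the-envelope sketch (my own illustration; only the "one tablet per table per node" default and the RF=3 default come from the docs):

```python
# Rough tablet arithmetic for the quoted paragraph.

def tablets_per_node(relations, tablets_per_table_per_node=1):
    """Tablets each node hosts when every relation gets its own tablet(s).

    By default each table needs at least one tablet on every node, so the
    count is independent of how many nodes the cluster has.
    """
    return relations * tablets_per_table_per_node

# Without colocation: 5000 relations -> ~5000 tablets on every node.
print(tablets_per_node(5000))  # 5000

# With colocation: the small relations share one colocation tablet,
# which is replicated on replication_factor nodes (3 by default).
replication_factor = 3
print(1 * replication_factor)  # 3 tablet replicas cluster-wide
```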
To help accommodate such relational tables and workloads, YugabyteDB supports colocating SQL tables. Colocating tables puts all of their data into a single tablet, called the colocation tablet. This can dramatically increase the number of relations (tables, indexes, etc) that can be supported per node while keeping the number of tablets per node low. Note that all the data in the colocation tablet is still replicated across three nodes (or whatever the replication factor is). Large tablets can be dynamically split at a future date if there is need to serve more throughput over a larger data set.
In other words, multiple small tables get packed into a single tablet.
Motivation This feature is desirable in a number of scenarios, some of which are described below.
Small datasets needing HA or geo-distribution Applications that have a smaller dataset may fall into the following pattern:
- They require a large number of tables, indexes and other relations created in a single database.
- The size of the entire dataset is small. Typically, this entire database is less than 500 GB in size.
- They need high availability and/or geographic data distribution.
- Scaling the dataset or the number of IOPS is not an immediate concern.
In this scenario, it is undesirable to have the small dataset spread across multiple nodes because this might affect performance of certain queries due to more network hops (for example, joins).
Example: User identity service for a global application. The user dataset size may not be too large, but is accessed in a relational manner, requires high availability and might need to be geo-distributed for low latency access.
Somewhat similar to a CDN.
Large datasets - a few large tables with many small tables Applications that have a large dataset may fall into the pattern where:
- They need a large number of tables and indexes.
- A handful of tables are expected to grow large, needing to be scaled out.
- The rest of the tables will continue to remain small.
In this scenario, only the few large tables would need to be sharded and scaled out. All other tables would benefit from colocation because queries involving all tables, except the larger ones, would not need network hops.
Example: An IoT use case, where one table records the data from the IoT devices while there are a number of other tables that store data pertaining to user identity, device profiles, privacy, etc.
Scaling the number of databases, each database with a small dataset There may be scenarios where the number of databases grows rapidly, while the dataset of each database is small. This is characteristic of a microservices-oriented architecture, where each microservice needs its own database. These microservices are hosted in dev, test, staging, production and other environments. The net result is a lot of small databases, and the need to be able to scale the number of databases hosted. Colocated tables allow for the entire dataset in each database to be hosted in one tablet, enabling scalability of the number of databases in a cluster by simply adding more nodes.
Example: Multi-tenant SaaS services where one database is created per customer. As new customers are rapidly on-boarded, it becomes necessary to add more databases quickly while maintaining high-availability and fault-tolerance of each database.
Tradeoffs Fundamentally, colocated tables have the following tradeoffs:
- **Higher performance - no network reads for joins.** All of the data across the various colocated tables is local, which means joins no longer have to read data over the network. This improves the speed of joins.
- **Support higher number of tables - using fewer tablets.** Because multiple tables and indexes can share one underlying tablet, a much higher number of tables can be supported using colocated tables.
- **Lower scalability - until removal from colocation tablet.** The assumption behind tables that are colocated is that their data need not be automatically sharded and distributed across nodes. If it is known a priori that a table will get large, it can be opted out of the colocation tablet at creation time. If a table already present in the colocation tablet gets too large, it can dynamically be removed from the colocation tablet to enable splitting it into multiple tablets, allowing it to scale across nodes.
Usage To learn more about using this feature, see Explore colocated tables.
What's next? For more information, see the architecture for colocated tables.
Translating: Explore colocated tables on Linux | YugabyteDB Docs
In workloads that do very little IOPS and have a small data set, the bottleneck shifts from CPU/disk/network to the number of tablets one can host per node. Since each table by default requires at least one tablet per node, a YugabyteDB cluster with 5000 relations (tables, indexes) will result in 5000 tablets per node. There are practical limitations to the number of tablets that YugabyteDB can handle per node since each tablet adds some CPU, disk and network overhead. If most or all of the tables in YugabyteDB cluster are small tables, then having separate tablets for each table unnecessarily adds pressure on CPU, network and disk.
To help accommodate such relational tables and workloads, you can colocate SQL tables. Colocating tables puts all of their data into a single tablet, called the colocation tablet. This can dramatically increase the number of relations (tables, indexes, etc.) that can be supported per node while keeping the number of tablets per node low. Note that all the data in the colocation tablet is still replicated across three nodes (or whatever the replication factor is).
This tutorial uses the yb-ctl local cluster management utility.
- Create a universe

```shell
./bin/yb-ctl create   # creates a local universe (cluster)
```

- Create a colocated database (why a colocated database rather than a colocated tablet?)

Connect to the cluster using ysqlsh (the YSQL shell):

```shell
./bin/ysqlsh -h 127.0.0.1
```

Create a database with the colocated = true option, i.e. add WITH colocated = true to the SQL:

```sql
yugabyte=# CREATE DATABASE northwind WITH colocated = true;
```

This creates a database northwind whose tables will all live in a single tablet.
- Create tables

Connect to the northwind database and create tables using standard CREATE TABLE statements. Because the database was created with the colocated = true option, these tables will be colocated on a single tablet.

```sql
\c northwind   -- switch to the northwind database

CREATE TABLE customers (
    customer_id   bpchar,
    company_name  character varying(40) NOT NULL,  -- character varying(n): variable-length string of up to n chars
    contact_title character varying(30),
    PRIMARY KEY(customer_id ASC)                   -- ASC: keep the primary-key index in ascending order
);

-- a table stores many objects of the same kind; the columns are their attributes
CREATE TABLE categories (
    category_id   smallint,
    category_name character varying(15) NOT NULL,
    description   text,
    PRIMARY KEY(category_id ASC)
);

-- another table; together these tables model the objects one business domain involves
CREATE TABLE suppliers (
    supplier_id   smallint,
    company_name  character varying(40) NOT NULL,
    contact_name  character varying(30),
    contact_title character varying(30),
    PRIMARY KEY(supplier_id ASC)
);

-- products interacts with several other objects at once, via foreign keys
CREATE TABLE products (
    product_id        smallint,
    product_name      character varying(40) NOT NULL,
    supplier_id       smallint,
    category_id       smallint,
    quantity_per_unit character varying(20),
    unit_price        real,
    PRIMARY KEY(product_id ASC),
    FOREIGN KEY (category_id) REFERENCES categories,
    FOREIGN KEY (supplier_id) REFERENCES suppliers
);
```
If you go to the tables view in the master UI, you will see that all these tables share the same tablet.
- Opt out table from colocation. This looks a lot like anti-affinity; one more reason to read the placement rules code in pd.

YugabyteDB has the flexibility to opt a table out of colocation. In that case the table gets its own set of tablets instead of sharing the colocated database's tablet. This is useful for scaling tables that may grow large. You do this with the colocated = false option at table creation time.
```sql
CREATE TABLE orders (
    order_id         smallint NOT NULL PRIMARY KEY,
    customer_id      bpchar,
    order_date       date,
    ship_address     character varying(60),
    ship_city        character varying(15),
    ship_postal_code character varying(10),
    FOREIGN KEY (customer_id) REFERENCES customers
) WITH (colocated = false);
```
If you go to the tables view in the master UI, you will see that the orders table has its own set of tablets.
Translating: yugabyte-db/ysql-colocated-tables.md at master · yugabyte/yugabyte-db
- Reading and writing data in colocated tables

You can read and write data in colocated tables using standard YSQL DML statements. The YSQL query planner and executor handle routing each statement to the correct tablet.
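The routing above is YugabyteDB's job; the application just issues ordinary SQL. As a quick local illustration of those read/write shapes, here is a sketch using sqlite3 as a stand-in (not YugabyteDB; the northwind column types are simplified):

```python
import sqlite3

# Local stand-in for the colocated northwind tables: standard DML works
# unchanged; only the storage layout differs between the two systems.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (category_id INTEGER PRIMARY KEY, category_name TEXT NOT NULL);
    CREATE TABLE products (
        product_id   INTEGER PRIMARY KEY,
        product_name TEXT NOT NULL,
        category_id  INTEGER REFERENCES categories(category_id)
    );
""")
conn.execute("INSERT INTO categories VALUES (1, 'Beverages')")
conn.execute("INSERT INTO products VALUES (10, 'Chai', 1)")

# The kind of join that colocation keeps local (no cross-node hop):
row = conn.execute("""
    SELECT p.product_name, c.category_name
    FROM products p JOIN categories c ON p.category_id = c.category_id
""").fetchone()
print(row)  # ('Chai', 'Beverages')
```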
What's next? For more information, see the architecture for colocated tables.
Concepts
Key-value record
Can be represented as (row:string, column:string, time:int64) -> string
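A minimal sketch of this versioned key-value model (my own toy code, not the actual storage format), where a read returns the newest version of a cell:

```python
# (row, column, time) -> value, as in the mapping above.
store = {}

def put(row, col, ts, value):
    """Write one version of a cell; older versions are kept."""
    store[(row, col, ts)] = value

def get_latest(row, col):
    """Return the value with the largest timestamp for (row, col), or None."""
    versions = [(ts, v) for (r, c, ts), v in store.items() if r == row and c == col]
    return max(versions)[1] if versions else None

put("user1", "name", 100, "Alice")
put("user1", "name", 200, "Alicia")   # a newer version shadows the old one
print(get_latest("user1", "name"))    # Alicia
```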
ACID
The four properties a database management system (DBMS) must provide so that transactions are correct and reliable when writing or updating data: atomicity (indivisibility), consistency, isolation (independence), and durability.
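Atomicity is the easiest of the four to demonstrate in isolation. A sketch using sqlite3 (a stand-in; the accounts table and amounts are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('a', 100), ('b', 0)")
conn.commit()

# Transfer 100 from a to b, but "crash" halfway. Atomicity means the
# debit must not survive on its own: the whole transaction rolls back.
try:
    with conn:  # connection as context manager: commit on success, rollback on error
        conn.execute("UPDATE accounts SET balance = balance - 100 WHERE name = 'a'")
        raise RuntimeError("crash before crediting b")
except RuntimeError:
    pass

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)  # {'a': 100, 'b': 0} - the partial update was rolled back
```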
tablet
region
tidb source-code annotations
To make later source reading easier, I extracted the contest's pinned commits into branches, then mirrored them to gitee, which is more convenient for adding annotations.
community/high-performance-tidb-challenge-cn.md at master · pingcap/community
There are 3 repositories in total. First clone them into my own account (with gitee you can also import without cloning), then reset to the pinned commit and create the new branches.
```shell
# Pinned commits for the challenge:
#   tidb: 1bfeff96c7439ed672f8362cf67573666a43f781
#   tikv: dcd2f8f4076d847151fdf58e9c0ba333f242d374
#   pd:   c05ef6f95773941db5c1060174f5a62e8f864e88

git clone https://github.com/eatcosmos/tidb.git && cd ~/git/tidb
git reset --hard 1bfeff96c7439ed672f8362cf67573666a43f781 && git checkout -b 1bfeff-dev && git push --set-upstream origin 1bfeff-dev
git reset --hard 1bfeff96c7439ed672f8362cf67573666a43f781 && git checkout -b 1bfeff-comment && git push --set-upstream origin 1bfeff-comment

git clone https://github.com/eatcosmos/tikv.git && cd ~/git/tikv
git reset --hard dcd2f8f4076d847151fdf58e9c0ba333f242d374 && git checkout -b dcd2f8-dev && git push --set-upstream origin dcd2f8-dev
git reset --hard dcd2f8f4076d847151fdf58e9c0ba333f242d374 && git checkout -b dcd2f8-comment && git push --set-upstream origin dcd2f8-comment

git clone https://github.com/eatcosmos/pd.git && cd ~/git/pd
git reset --hard c05ef6f95773941db5c1060174f5a62e8f864e88 && git checkout -b c05ef6-dev && git push --set-upstream origin c05ef6-dev
git reset --hard c05ef6f95773941db5c1060174f5a62e8f864e88 && git checkout -b c05ef6-comment && git push --set-upstream origin c05ef6-comment
```
```shell
# dev branches (github)
git clone --single-branch --branch 1bfeff-dev https://github.com/eatcosmos/tidb.git
git clone --single-branch --branch dcd2f8-dev https://github.com/eatcosmos/tikv.git
git clone --single-branch --branch c05ef6-dev https://github.com/eatcosmos/pd.git
```
Annotated branches (gitee):
- https://gitee.com/eatcosmos/tidb/tree/1bfeff-comment/
- https://gitee.com/eatcosmos/tikv/tree/dcd2f8-comment/
- https://gitee.com/eatcosmos/pd/tree/c05ef6-comment/
Learning methods
- Learning by comparison works best. My picture of tikv's structure stayed fuzzy no matter how much I read, but after reading the similar yugabytedb, the small differences between the two sharpened my understanding.
- Read in one continuous stretch, repeatedly, and keep the time from being fragmented by messages. The interrupted days mostly happened because I wandered off mid-read to look up other material. There is no need: skip what you do not understand instead of searching for more material, and fill in the gaps in one batch after finishing.
- A channel for discussion is best, somewhere to post questions and ideas; failing that, write them down for yourself first.