CRC

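Yesterday a colleague gave a tech talk on Redis and mentioned that a CRC algorithm is used to help map a key to its hash slot across Redis shards. To fill my own gap on CRC, I copied the background and definition straight from Wikipedia and restated each paragraph in my own words.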

Cyclic redundancy check
From Wikipedia, the free encyclopedia

------------


/* Purpose of the algorithm */
A cyclic redundancy check (CRC) is an error-detecting code commonly used in digital networks and storage devices to detect accidental changes to raw data. 
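A cyclic redundancy check is an error-detecting code, typically used in networks and storage devices to detect accidental changes to raw data (that is, the check code tells you whether the original data has been corrupted).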

------------


Blocks of data entering these systems get a short check value attached, based on the remainder of a polynomial division of their contents. 
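A data block entering the system gets a short check value attached. How is that check value obtained? -> It is the remainder of a polynomial division performed on the block's contents.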

------------


On retrieval, the calculation is repeated and, in the event the check values do not match, corrective action can be taken against data corruption. CRCs can be used for error correction (see bitfilters).[1]
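When the data is retrieved, the same polynomial division is repeated; if the check values do not match, corrective action can be taken against the corruption. Besides detecting whether the original data was damaged, CRCs can also be used for error correction.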

------------


CRCs are so called because the check (data verification) value is a redundancy (it expands the message without adding information) and the algorithm is based on cyclic codes. CRCs are popular because they are simple to implement in binary hardware, easy to analyze mathematically, and particularly good at detecting common errors caused by noise in transmission channels. Because the check value has a fixed length, the function that generates it is occasionally used as a hash function.
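They are called CRCs because the check value is redundant with respect to the original data block (it lengthens the message without adding information) and because the algorithm is based on cyclic codes. Why are CRCs so popular? -> 1. They are simple to implement in binary hardware and easy to analyze mathematically; 2. They are particularly good at detecting common errors caused by noise (what noise? -> noise in the transmission channel). Because the check value has a fixed length, the function that generates it is sometimes used as a hash function, for example when Redis determines which hash slot a key belongs to.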
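
For the Redis case mentioned above, a minimal sketch of the slot calculation might look like this. It assumes the mapping described in the Redis Cluster specification (CRC16 of the key, XMODEM variant, modulo 16384 slots) and ignores hash tags (the `{...}` syntax). `key_hash_slot` is a name made up for this note, and `binascii.crc_hqx` from the Python standard library stands in for Redis's own CRC16 routine.

```python
import binascii

REDIS_CLUSTER_SLOTS = 16384  # assumption: slot count from the Redis Cluster spec


def key_hash_slot(key: bytes) -> int:
    """Hypothetical helper: map a key to a slot as CRC16(key) mod 16384.

    binascii.crc_hqx computes a 16-bit CRC with polynomial 0x1021; starting
    it from 0 gives the XMODEM variant. Hash tags ({...}) are not handled.
    """
    return binascii.crc_hqx(key, 0) % REDIS_CLUSTER_SLOTS


# "123456789" is the usual check input; CRC-16/XMODEM gives 0x31C3 for it.
print(hex(binascii.crc_hqx(b"123456789", 0)))
print(key_hash_slot(b"user:1000"))
```

The point is only that a fixed-length check value can double as a cheap, well-distributed hash; nothing here depends on the error-detecting properties of the CRC.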

------------


The CRC was invented by W. Wesley Peterson in 1961; the 32-bit CRC function of Ethernet and many other standards is the work of several researchers and was published in 1975.

------------


/* Introduction to CRC */
Introduction
CRCs are based on the theory of cyclic error-correcting codes. The use of systematic cyclic codes, which encode messages by adding a fixed-length check value, for the purpose of error detection in communication networks, was first proposed by W. Wesley Peterson in 1961.[2] Cyclic codes are not only simple to implement but have the benefit of being particularly well suited for the detection of burst errors: contiguous sequences of erroneous data symbols in messages. This is important because burst errors are common transmission errors in many communication channels, including magnetic and optical storage devices. Typically an n-bit CRC applied to a data block of arbitrary length will detect any single error burst not longer than n bits and the fraction of all longer error bursts that it will detect is (1 − 2^−n).
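CRCs grew out of the theory of cyclic error-correcting codes. A systematic cyclic code adds a fixed-length check value to the original data block (the message) in order to detect errors in network communication. Cyclic codes are not only easy to implement but also well suited to detecting burst errors. What is a burst error? -> A contiguous run of erroneous data symbols in the message. Detecting bursts matters because they are a common kind of transmission error in many communication channels (including magnetic and optical storage devices). Typically, an n-bit CRC applied to a data block of arbitrary length detects every single error burst no longer than n bits, and it detects a fraction 1 − 2^−n of all longer bursts.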
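
The burst-error claim can be checked by brute force on a small case. The sketch below is only an illustration under my own assumptions: it uses the 16-bit CRC from `binascii.crc_hqx` (polynomial 0x1021, start value 0), a 3-byte message, and a helper name, `undetected_bursts`, invented for this note. It flips every possible burst of up to 16 bits at every position and counts how many leave the CRC unchanged.

```python
import binascii


def burst_patterns(length: int):
    """Yield every bit pattern of exactly `length` bits whose first and last bits are 1."""
    if length == 1:
        yield 1
        return
    for middle in range(1 << (length - 2)):
        yield (1 << (length - 1)) | (middle << 1) | 1


def undetected_bursts(message: bytes, max_len: int) -> int:
    """Count bursts of 1..max_len bits that leave the 16-bit CRC unchanged."""
    total_bits = len(message) * 8
    original = int.from_bytes(message, "big")
    good_crc = binascii.crc_hqx(message, 0)
    missed = 0
    for length in range(1, max_len + 1):
        for pattern in burst_patterns(length):
            for shift in range(total_bits - length + 1):
                corrupted = original ^ (pattern << shift)
                data = corrupted.to_bytes(len(message), "big")
                if binascii.crc_hqx(data, 0) == good_crc:
                    missed += 1
    return missed


# With n = 16, no burst of 16 bits or fewer should slip through.
print(undetected_bursts(b"CRC", max_len=16))
```

Longer bursts are not guaranteed to be caught: a 17-bit burst whose pattern equals the generator itself (0x11021) divides evenly and leaves the check value unchanged.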

------------


Specification of a CRC code requires definition of a so-called generator polynomial. This polynomial becomes the divisor in a polynomial long division, which takes the message as the dividend and in which the quotient is discarded and the remainder becomes the result. The important caveat is that the polynomial coefficients are calculated according to the arithmetic of a finite field, so the addition operation can always be performed bitwise-parallel (there is no carry between digits).
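Specifying a CRC requires defining a so-called generator polynomial. This polynomial is the divisor in a polynomial long division in which the message is the dividend, the quotient is discarded, and the remainder is the result (that is, <message, or data block> mod <generator polynomial> gives the redundancy check value). Note that the polynomial coefficients are computed with finite-field arithmetic, so addition can always be done bitwise in parallel (no carry between bit positions).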
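
The long division described above can be sketched directly on bit strings. This is a teaching sketch, not how production CRCs are written (those use shift registers or lookup tables); `crc_remainder` is a name chosen for this note, and the message is padded with n zero bits before dividing, as is conventional for an n-bit CRC.

```python
def crc_remainder(message_bits: str, generator_bits: str) -> str:
    """Remainder of (message padded with n zeros) divided by the generator over GF(2)."""
    n = len(generator_bits) - 1            # degree of the generator polynomial
    work = [int(b) for b in message_bits] + [0] * n
    gen = [int(b) for b in generator_bits]
    for i in range(len(message_bits)):
        if work[i]:                        # leading bit set: "subtract" the divisor
            for j, g in enumerate(gen):
                work[i + j] ^= g           # subtraction in GF(2) is XOR, no borrow
    return "".join(str(b) for b in work[-n:])


# Generator x^3 + x + 1 is written as the bit string 1011.
print(crc_remainder("11010011101100", "1011"))  # -> 100
```

The quotient is never stored; only the last n bits of the working buffer survive, and they are the check value.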

------------


In practice, all commonly used CRCs employ the Galois field of two elements, GF(2). The two elements are usually called 0 and 1, comfortably matching computer architecture.
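In practice, all commonly used CRCs work over the Galois field of two elements, GF(2). The two elements are usually called 0 and 1, which matches computer architecture nicely.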
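
To make "arithmetic in GF(2)" concrete: if a polynomial is stored as an integer whose bit k is the coefficient of x^k, addition is a plain XOR and multiplication is a shift-and-XOR loop with no carries. The helper names below are my own, for illustration only.

```python
def gf2_add(a: int, b: int) -> int:
    """Add two GF(2) polynomials: coefficients add modulo 2, which is XOR."""
    return a ^ b


def gf2_mul(a: int, b: int) -> int:
    """Multiply two GF(2) polynomials (carry-less multiplication)."""
    result = 0
    while b:
        if b & 1:
            result ^= a    # add the current shifted copy of a
        a <<= 1            # multiply a by x
        b >>= 1
    return result


# (x^3 + x + 1) + (x^2 + x) = x^3 + x^2 + 1: the two x terms cancel.
print(bin(gf2_add(0b1011, 0b0110)))  # -> 0b1101
# (x + 1) * (x + 1) = x^2 + 1: the middle term 2x vanishes modulo 2.
print(bin(gf2_mul(0b11, 0b11)))      # -> 0b101
```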

------------


A CRC is called an n-bit CRC when its check value is n bits long. For a given n, multiple CRCs are possible, each with a different polynomial. Such a polynomial has highest degree n, which means it has n + 1 terms. In other words, the polynomial has a length of n + 1; its encoding requires n + 1 bits. Note that most polynomial specifications either drop the MSB or LSB, since they are always 1. The CRC and associated polynomial typically have a name of the form CRC-n-XXX as in the table below.
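A CRC is called an n-bit CRC when its check value is n bits long. For a given n there can be several CRCs, each with its own polynomial. Such a polynomial has degree n, so it has n + 1 coefficients (terms). In other words, the polynomial has length n + 1 and takes n + 1 bits to encode. Note that most polynomial specifications drop either the MSB (most significant bit) or the LSB (least significant bit), because those bits are always 1. A CRC and its polynomial usually get a name of the form CRC-n-XXX.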
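
As a worked example of the n + 1 bit encoding, take the 16-bit CCITT generator x^16 + x^12 + x^5 + 1 (my choice of example; it is not taken from the text above). The sketch builds the full 17-bit value and the two shortened forms obtained by dropping the always-1 top or bottom bit; `poly_from_exponents` is a helper name invented here.

```python
def poly_from_exponents(exponents) -> int:
    """Build the integer whose bit k is set for every term x^k."""
    value = 0
    for k in exponents:
        value |= 1 << k
    return value


full = poly_from_exponents([16, 12, 5, 0])   # x^16 + x^12 + x^5 + 1
n = full.bit_length() - 1                    # degree of the polynomial
msb_dropped = full & ((1 << n) - 1)          # leading x^16 term left implicit
lsb_dropped = full >> 1                      # trailing +1 term left implicit

print(n, hex(full), hex(msb_dropped), hex(lsb_dropped))
# -> 16 0x11021 0x1021 0x8810
```

This is why the same CRC can appear in different documents with different-looking hexadecimal constants.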

------------


The simplest error-detection system, the parity bit, is in fact a 1-bit CRC: it uses the generator polynomial x + 1 (two terms), and has the name CRC-1.
The simplest error-detection system, the parity bit, is actually a 1-bit CRC: it uses the generator polynomial x + 1 (two terms) and is named CRC-1.
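
The parity claim is easy to check numerically. The sketch below divides every 8-bit value by x + 1 (bit pattern 11) over GF(2) and compares the 1-bit remainder with ordinary even parity; `mod2_remainder` is a name made up for this note, and polynomials are stored as integers with bit k holding the coefficient of x^k.

```python
def mod2_remainder(value: int, nbits: int, generator: int) -> int:
    """Remainder of value * x^n divided by generator over GF(2), n = degree of generator."""
    n = generator.bit_length() - 1
    work = value << n                       # append n zero bits
    for shift in range(nbits - 1, -1, -1):
        if work & (1 << (shift + n)):       # leading bit still set at this position
            work ^= generator << shift      # cancel it with the divisor
    return work


def parity(value: int) -> int:
    """Even parity bit: 1 if the number of 1 bits is odd."""
    return bin(value).count("1") % 2


print(all(mod2_remainder(v, 8, 0b11) == parity(v) for v in range(256)))
```

The reason is that x is congruent to 1 modulo x + 1, so reducing a polynomial modulo x + 1 just sums its coefficients modulo 2.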

------------

/* Applications of CRC */
Application
A CRC-enabled device calculates a short, fixed-length binary sequence, known as the check value or CRC, for each block of data to be sent or stored and appends it to the data, forming a codeword.
A CRC-enabled device computes a short, fixed-length binary sequence, called the check value or CRC, for each data block to be sent or stored, and appends it to the block to form a codeword -> (block of data) + (check value) = codeword

------------


When a codeword is received or read, the device either compares its check value with one freshly calculated from the data block, or equivalently, performs a CRC on the whole codeword and compares the resulting check value with an expected residue constant.
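When a codeword is received or read, the device compares the check value in the codeword with one freshly computed from the data block. Equivalently, it runs the CRC over the whole codeword and compares the result with an expected residue constant.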
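
Both checking styles can be sketched with the Python standard library. The function names are my own, and the choice of CRCs is an assumption for illustration: `binascii.crc_hqx` started from 0 is a plain polynomial remainder, so its residue constant is 0, while `zlib.crc32` uses a nonzero start value and a final XOR, so its residue is the nonzero constant 0x2144DF1C when the check value is appended least significant byte first.

```python
import binascii
import zlib


def make_codeword(block: bytes) -> bytes:
    """(block of data) + (check value) = codeword, using a 16-bit CRC."""
    return block + binascii.crc_hqx(block, 0).to_bytes(2, "big")


def check_by_compare(codeword: bytes) -> bool:
    """Recompute the CRC over the data part and compare with the stored value."""
    block, stored = codeword[:-2], int.from_bytes(codeword[-2:], "big")
    return binascii.crc_hqx(block, 0) == stored


def check_by_residue(codeword: bytes) -> bool:
    """Run the CRC over the whole codeword; a clean codeword leaves residue 0."""
    return binascii.crc_hqx(codeword, 0) == 0


codeword = make_codeword(b"hello, crc")
damaged = bytes([codeword[0] ^ 0x01]) + codeword[1:]   # flip one bit
print(check_by_compare(codeword), check_by_residue(codeword))
print(check_by_compare(damaged), check_by_residue(damaged))

# CRC-32 as computed by zlib: the residue is a fixed nonzero constant.
block = b"hello, crc"
crc32_codeword = block + zlib.crc32(block).to_bytes(4, "little")
print(hex(zlib.crc32(crc32_codeword)))
```

The residue form is convenient in hardware, because the receiver can stream the whole frame through the same circuit without knowing where the data ends and the check value begins.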

------------


If the CRC values do not match, then the block contains a data error.
If the CRC values do not match, the data block contains an error.

------------


The device may take corrective action, such as rereading the block or requesting that it be sent again. Otherwise, the data is assumed to be error-free (though, with some small probability, it may contain undetected errors; this is inherent in the nature of error-checking).[3]
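When the check values do not match, the device may take corrective action, such as rereading the block or asking the sender to retransmit it. When they match, the data is assumed to be error-free (although with some small probability it may still contain undetected errors; that is inherent in error checking).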

------------

/* Data integrity */
Data integrity
CRCs are specifically designed to protect against common types of errors on communication channels, where they can provide quick and reasonable assurance of the integrity of messages delivered. However, they are not suitable for protecting against intentional alteration of data.
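CRCs are designed to guard against common types of errors on communication channels, where they give quick and reasonable assurance of the integrity of delivered messages. However, they are not suitable for protecting against intentional alteration of data. What does that mean? -> A CRC can give good assurance that data is correct (uncorrupted), but it cannot make the data secure.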

------------


Firstly, as there is no authentication, an attacker can edit a message and recompute the CRC without the substitution being detected. When stored alongside the data, CRCs and cryptographic hash functions by themselves do not protect against intentional modification of data. Any application that requires protection against such attacks must use cryptographic authentication mechanisms, such as message authentication codes or digital signatures (which are commonly based on cryptographic hash functions).
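First, because there is no authentication, an attacker can edit a message and recompute the CRC, and the substitution goes undetected. When stored alongside the data, neither CRCs nor cryptographic hash functions by themselves protect against intentional modification. Any application that needs protection against such attacks must use a cryptographic authentication mechanism, such as a message authentication code or a digital signature (digital signatures are usually based on cryptographic hash functions).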
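
A small sketch of the difference, using only the Python standard library. The packet layout, the function names, and the key are all made up for this note; HMAC-SHA256 is used as one example of a message authentication code, not as the only option.

```python
import hashlib
import hmac
import zlib


def crc_protect(message: bytes) -> bytes:
    """Append a CRC-32 check value (no secret involved)."""
    return message + zlib.crc32(message).to_bytes(4, "big")


def crc_verify(packet: bytes) -> bool:
    return zlib.crc32(packet[:-4]) == int.from_bytes(packet[-4:], "big")


def mac_protect(message: bytes, key: bytes) -> bytes:
    """Append an HMAC-SHA256 tag computed with a shared secret key."""
    return message + hmac.new(key, message, hashlib.sha256).digest()


def mac_verify(packet: bytes, key: bytes) -> bool:
    message, tag = packet[:-32], packet[-32:]
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)   # constant-time comparison


secret = b"shared-secret-key"        # hypothetical key known to sender and receiver
attacker_guess = b"not-the-real-key"

# Anyone can recompute a CRC, so a forged message verifies.
print(crc_verify(crc_protect(b"pay mallory 9999 USD")))
# Without the key, the attacker cannot produce a tag that verifies.
print(mac_verify(mac_protect(b"pay mallory 9999 USD", attacker_guess), secret))
print(mac_verify(mac_protect(b"pay alice 10 USD", secret), secret))
```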

------------


Secondly, unlike cryptographic hash functions, CRC is an easily reversible function, which makes it unsuitable for use in digital signatures.[4]
Second, unlike a cryptographic hash function, a CRC is easily reversible, which makes it unsuitable for digital signatures.

------------


Thirdly, CRC is a linear function with a property that
![CRC](https://oscimg.oschina.net/oscnet/e2a8377c003a01f7649c6f6406a7726cf90.jpg "CRC")
as a result, even if the CRC is encrypted with a stream cipher that uses XOR as its combining operation (or mode of block cipher which effectively turns it into a stream cipher, such as OFB or CFB), both the message and the associated CRC can be manipulated without knowledge of the encryption key; this was one of the well-known design flaws of the Wired Equivalent Privacy (WEP) protocol.[5]
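Third, CRC is a linear function with the property shown above.
As a result, even if the CRC is encrypted with a stream cipher that combines with XOR, both the message and its associated CRC can be manipulated without knowing the encryption key; this was one of the well-known design flaws of the Wired Equivalent Privacy (WEP) protocol.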
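
The following sketch shows how linearity defeats "CRC plus XOR encryption". It is a toy model, not WEP itself: a random keystream from `os.urandom` stands in for the stream cipher output, and the message format is invented. One assumption to flag: `zlib.crc32` uses a nonzero start value and a final XOR, so for equal-length inputs the relation used here is crc(a XOR b) = crc(a) XOR crc(b) XOR crc(all-zero block of the same length).

```python
import os
import zlib


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def crc_bytes(data: bytes) -> bytes:
    return zlib.crc32(data).to_bytes(4, "big")


# Sender: append CRC-32, then encrypt everything with an XOR keystream.
plaintext = b"pay alice 0010 USD"
keystream = os.urandom(len(plaintext) + 4)    # stand-in for a stream cipher
ciphertext = xor_bytes(plaintext + crc_bytes(plaintext), keystream)

# Attacker: never sees the keystream, only decides which bits to flip.
delta = xor_bytes(b"pay alice 0010 USD", b"pay alice 9910 USD")
zeros = bytes(len(delta))
crc_delta = (zlib.crc32(delta) ^ zlib.crc32(zeros)).to_bytes(4, "big")
forged = xor_bytes(ciphertext, delta + crc_delta)

# Receiver: decrypts and checks the CRC as usual.
decrypted = xor_bytes(forged, keystream)
message, check = decrypted[:-4], decrypted[-4:]
print(message, crc_bytes(message) == check)
```

The receiver accepts the altered amount because the attacker's change to the encrypted check value tracks the change to the encrypted message exactly.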