Problem link: https://leetcode.com/problems...
The key to this problem is understanding what the statement means.
1 byte: characters from 0 to 127 == ASCII
2 bytes: characters from 128 to 2047
3 bytes: characters from 2048 to 65535
4 bytes: characters from 65536 to 1114111 (U+10FFFF)
The leading bits of the first byte tell the length of the character in bytes:
1 byte: the 1st bit is 0
2 bytes:
  1st byte: starts with "110"
  2nd byte: starts with "10"
3 bytes:
  1st byte: starts with "1110"
  2nd byte: starts with "10"
  3rd byte: starts with "10"
4 bytes:
  1st byte: starts with "11110"
  2nd byte: starts with "10"
  3rd byte: starts with "10"
  4th byte: starts with "10"
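To make the ranges and leading-bit patterns concrete, here is a small sketch that encodes a few code points with the JDK's own UTF-8 encoder and prints each byte in binary. The class name Utf8LayoutDemo and the helpers expectedLength, toBits, and layout are hypothetical names chosen for illustration; they are not part of the problem.

```java
import java.nio.charset.StandardCharsets;

class Utf8LayoutDemo {
    // Number of bytes UTF-8 needs for a code point, following the ranges listed above.
    static int expectedLength(int codePoint) {
        if (codePoint <= 0x7F) return 1;     // 0 to 127
        if (codePoint <= 0x7FF) return 2;    // 128 to 2047
        if (codePoint <= 0xFFFF) return 3;   // 2048 to 65535
        return 4;                            // 65536 to 1114111
    }

    // Render one byte as 8 binary digits so the leading bits are visible.
    static String toBits(byte b) {
        String s = Integer.toBinaryString(b & 0xFF);
        return String.format("%8s", s).replace(' ', '0');
    }

    // Encode a single (non-surrogate) code point and show its bytes in binary.
    static String layout(int codePoint) {
        String text = new String(Character.toChars(codePoint));
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(toBits(b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+0041 'A', U+00E9, U+4E2D (a Chinese character), U+1F600 (an emoji)
        int[] samples = {0x41, 0xE9, 0x4E2D, 0x1F600};
        for (int cp : samples) {
            System.out.println(expectedLength(cp) + " byte(s): " + layout(cp));
        }
    }
}
```

For U+4E2D this prints three bytes, 11100100 10111000 10101101: the first starts with "1110" and the other two start with "10", matching the 3-byte rule.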
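Once the meaning is clear, the problem is simple.
Use a single loop that does three steps per iteration. The loop invariant is that data[i] is always the first byte of a new character.
1. Count how many consecutive 1s the low 8 bits of data[i] start with, and store the count in the variable ones. The only valid values of ones are 0, 2, 3, 4.
2. Starting from data[i+1], check whether the top two of the low 8 bits are '10'. Check ones - 1 bytes in total.
3. Update i to i + ones (or i + 1 when ones is 0).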
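public class Solution {
    public boolean validUtf8(int[] data) {
        /* 1. count the leading '1' bits of data[i] = ones
         * 2. check data[i + 1 .. i + ones - 1] for the prefix '10'
         * 3. update i = i + ones
         * valid ones: 0, 2, 3, 4
         */
        int i = 0;
        while (i < data.length) {
            // 1. count the leading ones in the low 8 bits
            //    (the ones < 8 guard avoids a negative shift when the byte is 0xFF)
            int ones = 0;
            while (ones < 8 && ((data[i] >> (7 - ones)) & 1) == 1) {
                ones++;
            }
            // a first byte cannot start with '10' or with five or more 1s
            if (ones == 1 || ones > 4) return false;
            // 2. check the ones - 1 continuation bytes
            i++;
            while (ones-- > 1) {
                if (i >= data.length || ((data[i] >> 6) & 3) != 2) return false;
                // 3. advance i past this continuation byte
                i++;
            }
        }
        return true;
    }
}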
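As a cross-check on the solution above, the same three steps can also be written with explicit bit masks instead of a counting loop. This is only a sketch: the class name Utf8MaskCheck is hypothetical, and like the solution above it checks only the bit patterns, not overlong encodings or the code point range. The two inputs in main are small hand-made examples.

```java
class Utf8MaskCheck {
    static boolean validUtf8(int[] data) {
        int i = 0;
        while (i < data.length) {
            int b = data[i] & 0xFF;                      // only the low 8 bits matter
            int length;
            if ((b & 0x80) == 0x00) length = 1;          // 0xxxxxxx
            else if ((b & 0xE0) == 0xC0) length = 2;     // 110xxxxx
            else if ((b & 0xF0) == 0xE0) length = 3;     // 1110xxxx
            else if ((b & 0xF8) == 0xF0) length = 4;     // 11110xxx
            else return false;                           // 10xxxxxx or 11111xxx

            if (i + length > data.length) return false;  // character is cut off
            for (int k = 1; k < length; k++) {
                // every continuation byte must look like 10xxxxxx
                if ((data[i + k] & 0xC0) != 0x80) return false;
            }
            i += length;                                 // next first byte
        }
        return true;
    }

    public static void main(String[] args) {
        // 11000101 10000010 00000001: a 2-byte character followed by a 1-byte character
        System.out.println(validUtf8(new int[] {197, 130, 1}));  // true
        // 11101011 10001100 00000100: the third byte does not start with "10"
        System.out.println(validUtf8(new int[] {235, 140, 4}));  // false
    }
}
```

The mask version trades the inner counting loop for four comparisons, which some readers find easier to match against the bit patterns listed earlier.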
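Advantages of UTF-8
implements Unicode: can encode symbols from different scripts (Chinese, ...)
widely used: web pages, XML, and JSON are commonly encoded in UTF-8
uses only a binary representation: 0 and 1
endianness independent: the encoding is a sequence of single bytes, so byte order is not an issue
Disadvantages of UTF-8
space: characters outside ASCII take 2 to 4 bytes, so text can be larger (e.g., most Chinese characters take 3 bytes in UTF-8 versus 2 in UTF-16)
time: the variable length means extra computation to decode a character or to find the n-th one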
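The space trade-off can be seen by encoding the same strings in UTF-8 and UTF-16 and comparing byte counts. The sketch below assumes only the standard JDK charsets; the class name Utf8SizeDemo and the helper names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class Utf8SizeDemo {
    // Size of the string when encoded as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Size of the string when encoded as UTF-16 (little-endian, no byte order mark).
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16LE).length;
    }

    public static void main(String[] args) {
        String ascii = "hello";
        String chinese = "\u4F60\u597D"; // two Chinese characters, written as escapes
        System.out.println("ascii:   utf8=" + utf8Bytes(ascii) + " utf16=" + utf16Bytes(ascii));
        System.out.println("chinese: utf8=" + utf8Bytes(chinese) + " utf16=" + utf16Bytes(chinese));
    }
}
```

ASCII text is half the size in UTF-8 (5 bytes versus 10), while the two Chinese characters take 6 bytes in UTF-8 versus 4 in UTF-16, so which encoding is smaller depends on the text.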