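As the summary describes, we need to fix data requests that fail with an error when the user enters Emoji.
Cause of the problem:
In UTF-8, a single Emoji takes as few as 2 bytes; a small number take 3 and most take 4. Worse, some input methods (a certain "dog"-branded one, for example) produce Emoji that occupy 11 bytes. Those are sequences of several code points, since a single code point never needs more than 4 bytes in UTF-8. Our backend database is MySQL with its default `utf8` charset, which stores at most three bytes per character. Any Emoji longer than three bytes therefore fails on insert with `Incorrect string value '\xx\xx\xx\xx' for column 'field' at row 1`. Put simply, it does not fit. Switching the MySQL charset to `utf8mb4` would solve this, but changing a database's encoding is a sensitive operation, and for various reasons the fix had to land on the client.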
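To make the "does not fit" condition concrete, here is a minimal sketch in plain C (which also compiles unchanged inside an Objective-C `.m` file). It only looks for four-byte lead bytes, the case a three-byte `utf8` column rejects. The function name `has_four_byte_char` is hypothetical and not part of the original project, and the input is assumed to be well-formed UTF-8.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true when the buffer contains a lead byte of the form 11110xxx,
 * i.e. a character encoded in four bytes. Such a character cannot be stored
 * in a column whose charset keeps at most three bytes per character.
 * Assumption: the input is well-formed UTF-8. */
bool has_four_byte_char(const unsigned char *bytes, size_t len) {
    assert(bytes != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if ((bytes[i] & 0xF8) == 0xF0) {
            return true;
        }
    }
    return false;
}
```

A check like this only predicts the database error. It does not catch three-byte Emoji, which is why the filter described later goes further.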
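Requirements analysis:
Our app is a utility app for cars. The data users submit is precise and meaningful, and Emoji add nothing to it. Until now we neither filtered nor restricted input (except in fields with format requirements). Now that this has caused a problem, we simply filter out all Emoji input.
Approach analysis:
With the requirement settled, two questions remain: how to filter UITextField input globally (we do not use UITextView or similar), and how to detect Emoji. We discuss them separately.
Digging into the problem: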
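1. Intercepting UITextField input globally
Option 1: everything typed into a UITextField is eventually sent to the server through the networking interface, so we could filter all params in the network layer and guarantee that no Emoji reaches the server. That only fixes the server error, though, and the client experience is poor (the user can type Emoji, but they are gone once the data is fetched back). The networking package is also a shared area that serves the whole app and even multiple apps, and we cannot afford the testing cost of changing it. Option 1 is rejected.
Option 2: because the app's UI is written entirely in code, we could derive an EmojiTextField that proxies the system delegate internally and blocks Emoji as they are typed. This gives the best input experience and keeps what the user sees consistent with what the database stores. It requires replacing every UITextField in the project, however. Xcode makes that easy, but the approach is rather crude, so we keep it as a fallback.
Option 3: the Runtime. I have always believed that at least half of Objective-C's power comes from the Runtime: it can modify method tables at run time, and categories let you extend and override methods. Together these make option 3 feasible.
Implementation: import a global category, here called UITextField+Emoji, and implement +load in it (the runtime sends this message when the class or category is loaded into memory). Inside +load, use the Runtime to swap the initializers and the delegate-related methods. In the swapped initializers we install a proxy in front of the delegate set by business code. When the user types, our proxy delegate is called first, and after the Emoji check it forwards the call to the business code's delegate.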
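Our project uses a .pch file for global header imports, so that is where we import the UITextField+Emoji category
    /// Filter Emoji characters
    #import "UITextField+EmojiText.h"
In the .m file, we swap the relevant method implementations inside +load so that the original selectors point to our IMPs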
    #import <objc/runtime.h>

    // Swap the implementations behind oSEL and nSEL on the given class.
    void exchangeMethod(Class class, SEL oSEL, SEL nSEL) {
        Method oMethod = class_getInstanceMethod(class, oSEL);
        Method nMethod = class_getInstanceMethod(class, nSEL);
        // Check whether this class itself implements oSEL. If it only inherits it,
        // add it here first so that the superclass's method is left untouched.
        BOOL ok = class_addMethod(class, oSEL, method_getImplementation(nMethod), method_getTypeEncoding(nMethod));
        if (ok) {
            class_replaceMethod(class, nSEL, method_getImplementation(oMethod), method_getTypeEncoding(oMethod));
        } else {
            method_exchangeImplementations(oMethod, nMethod);
        }
    }

    + (void) load {
        // setDelegate: intercept delegate assignment so Emoji filtering is always in place
        exchangeMethod([self class], @selector(setDelegate:), @selector(emoji_setDelegate:));
        // delegate getter: return the delegate set by business code so that set and get stay consistent
        exchangeMethod([self class], @selector(delegate), @selector(emoji_delegate));
        // The various initializers
        exchangeMethod([self class], @selector(init), @selector(emoji_init));
        exchangeMethod([self class], @selector(initWithFrame:), @selector(emoji_initWithFrame:));
        exchangeMethod([self class], @selector(initWithCoder:), @selector(emoji_initWithCoder:));
        // Release internally held resources
        exchangeMethod([self class], @selector(dealloc), @selector(emoji_dealloc));
    }
Implementations of the replacement methods
    - (id) emoji_init {
        id ret = [self emoji_init];
        // The implementations are swapped, so setDelegate: now runs emoji_setDelegate:.
        // Calling it here makes sure that fields whose business code never sets a delegate are filtered too.
        self.delegate = nil;
        return ret;
    }

    - (id) emoji_initWithFrame:(CGRect)frame {
        id ret = [self emoji_initWithFrame:frame];
        self.delegate = nil;
        return ret;
    }

    - (id) emoji_initWithCoder:(NSCoder *)aDecoder {
        id ret = [self emoji_initWithCoder:aDecoder];
        self.delegate = nil;
        return ret;
    }

    - (void) emoji_setDelegate:(id<UITextFieldDelegate>)delegate {
        // If no delegate has been installed yet, install the internal proxy;
        // otherwise only replace the proxy's originalDelegate.
        id<UITextFieldDelegate> del = [self emoji_delegate];
        if (!del) {
            EmojiDelegate *emojiDelegate = [[EmojiDelegate alloc] initWithTextField:self];
            emojiDelegate.originalDelegate = delegate;
            [self emoji_setDelegate:emojiDelegate];
        } else {
            EmojiDelegate *emojiDelegate = (EmojiDelegate *) del;
            emojiDelegate.originalDelegate = delegate;
        }
    }

    - (id<UITextFieldDelegate>) emoji_delegate {
        return ((EmojiDelegate *)[self emoji_delegate]).originalDelegate;
    }

    - (void) emoji_dealloc {
        // The EmojiDelegate was alloc'ed in emoji_setDelegate: and is still retained, so release it once by hand
        [[self emoji_delegate] release];
        [self emoji_setDelegate:nil];
        [self emoji_dealloc];
    }
The EmojiDelegate implementation is a plain, simple forwarding proxy
    @interface EmojiDelegate : NSObject<UITextFieldDelegate>
    @property(nonatomic, weak) UITextField *textField;
    @property(nonatomic, weak) id<UITextFieldDelegate> originalDelegate;
    @property(nonatomic, strong) NSString *prevText; // last accepted text
    - (id) initWithTextField:(UITextField *)textField;
    @end

    @implementation EmojiDelegate

    - (id) initWithTextField:(UITextField *)textField {
        self = [super init];
        self.textField = textField;
        [textField addTarget:self action:@selector(textFieldDidChange:) forControlEvents:UIControlEventEditingChanged];
        return self;
    }

    - (void) dealloc {
        [_textField removeTarget:self action:@selector(textFieldDidChange:) forControlEvents:UIControlEventEditingChanged];
        self.originalDelegate = nil;
        self.prevText = nil;
        [super dealloc];
    }

    - (BOOL)textFieldShouldBeginEditing:(UITextField *)textField {
        if ([self.originalDelegate respondsToSelector:@selector(textFieldShouldBeginEditing:)]) {
            return [self.originalDelegate textFieldShouldBeginEditing:textField];
        }
        return YES;
    }

    - (void)textFieldDidBeginEditing:(UITextField *)textField {
        if ([self.originalDelegate respondsToSelector:@selector(textFieldDidBeginEditing:)]) {
            [self.originalDelegate textFieldDidBeginEditing:textField];
        }
    }

    - (BOOL)textFieldShouldEndEditing:(UITextField *)textField {
        if ([self.originalDelegate respondsToSelector:@selector(textFieldShouldEndEditing:)]) {
            return [self.originalDelegate textFieldShouldEndEditing:textField];
        }
        return YES;
    }

    - (void)textFieldDidEndEditing:(UITextField *)textField {
        if ([self.originalDelegate respondsToSelector:@selector(textFieldDidEndEditing:)]) {
            [self.originalDelegate textFieldDidEndEditing:textField];
        }
    }

    - (BOOL)textField:(UITextField *)textField shouldChangeCharactersInRange:(NSRange)range replacementString:(NSString *)string {
        if (string.length == 0) {
            return YES;
        }
        /// Filter Emoji
        // Reject input coming from the system Emoji keyboard
        if ([[[textField textInputMode] primaryLanguage] isEqualToString:@"emoji"]) {
            return NO;
        }
        // Check the replacement string for Emoji characters
        if ([string containEmoji]) {
            return NO;
        }
        if ([self.originalDelegate respondsToSelector:@selector(textField:shouldChangeCharactersInRange:replacementString:)]) {
            return [self.originalDelegate textField:textField shouldChangeCharactersInRange:range replacementString:string];
        }
        return YES;
    }

    - (BOOL)textFieldShouldClear:(UITextField *)textField {
        if ([self.originalDelegate respondsToSelector:@selector(textFieldShouldClear:)]) {
            return [self.originalDelegate textFieldShouldClear:textField];
        }
        // UIKit clears the field when the delegate does not implement this method, so the proxy must default to YES
        return YES;
    }

    - (BOOL)textFieldShouldReturn:(UITextField *)textField {
        if ([self.originalDelegate respondsToSelector:@selector(textFieldShouldReturn:)]) {
            return [self.originalDelegate textFieldShouldReturn:textField];
        }
        // Same reasoning: YES keeps the default Return-key behavior
        return YES;
    }

    /**
     * Observe text changes to catch Emoji inserted through the Chinese input method's suggestions
     */
    - (void) textFieldDidChange:(UITextField *)textField {
        if ([textField markedTextRange] == nil) {
            NSString *text = textField.text;
            if ([text containEmoji]) {
                // selectedRange / setSelectedRange: are assumed to come from a project-level UITextField category
                NSUInteger location = [textField selectedRange].location - 2;
                textField.text = _prevText;
                if (location > _prevText.length) {
                    location = _prevText.length;
                }
                [textField setSelectedRange:NSMakeRange(location, 0)];
            } else {
                self.prevText = text;
            }
        }
    }

    @end
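The key piece here is containEmoji, which checks whether the input string contains any Emoji.
That settles global interception. The Emoji detection itself is discussed next.
2. Filtering Emoji from strings
Like most people facing this problem, I searched Baidu, Google, and Bing and found plenty of solutions, but none of them worked satisfactorily in practice. Emoji turn out to be far more troublesome than expected.
A little background first. I will skip the history of Emoji. It is enough to know that Emoji were added to Unicode in several versions and were not placed in one block, so their code points follow no simple pattern and can only be matched explicitly. There are hundreds to over a thousand of them, though, and matching each one individually would be silly, so we need to narrow the range to match.
Presumably everyone uses UTF-8 these days. It is a variable-length encoding, and like any variable-length format it has a header that describes the length followed by a body; UTF-8 works the same way.
Within a byte, if the first bit is 0, the byte is a single-byte character, and the 7 bits after the 0 are the data, which give the character's index in Unicode.
Conversely, if the first bit is 1, the byte belongs to a multi-byte character. If the second bit is 0, this is a data byte of a multi-byte character and follows a lead byte. If the byte starts with several 1s, the number of 1s is the number of bytes in the character (including the current byte), for example: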
    110xxxxx // two bytes in total; must be followed by one data byte starting with 10
    >> 110xxxxx 10xxxxxx
    1110xxxx // three bytes in total; followed by two data bytes starting with 10
    >> 1110xxxx 10xxxxxx 10xxxxxx
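The lead-byte rule above can be sketched in plain C, which also compiles unchanged inside an Objective-C `.m` file. The names `utf8_sequence_length` and `utf8_count_chars` are hypothetical helpers for illustration, not part of the original category. The sketch stops at four bytes because standard UTF-8 does not use longer sequences.

```c
#include <assert.h>
#include <stddef.h>

/* Number of bytes in the character that starts with this lead byte,
 * or 0 when the byte is a continuation byte (10xxxxxx) or not a valid lead. */
int utf8_sequence_length(unsigned char lead) {
    if ((lead & 0x80) == 0x00) return 1; /* 0xxxxxxx */
    if ((lead & 0xE0) == 0xC0) return 2; /* 110xxxxx */
    if ((lead & 0xF0) == 0xE0) return 3; /* 1110xxxx */
    if ((lead & 0xF8) == 0xF0) return 4; /* 11110xxx */
    return 0;
}

/* Counts characters by hopping from lead byte to lead byte.
 * A stray byte that is not a valid lead is skipped one byte at a time
 * and still counted as one character. */
size_t utf8_count_chars(const unsigned char *bytes, size_t len) {
    assert(bytes != NULL || len == 0);
    size_t count = 0;
    size_t i = 0;
    while (i < len) {
        int n = utf8_sequence_length(bytes[i]);
        i += (n > 0) ? (size_t)n : 1;
        count++;
    }
    return count;
}
```

For example, the six bytes `61 C3 A9 E4 B8 AD` hold three characters: one of one byte, one of two bytes, and one of three bytes.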
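Extrapolating the pattern suggests even longer sequences, but standard UTF-8 (RFC 3629) stops at four bytes per code point, and anything that occupies more than four bytes is a sequence of several code points. Emoji appear at lengths of 2, 3, 4, and more than 4 bytes. The 2-byte ones are mostly text-like characters (such as © and ®), and we can let them through. Everything at 4 bytes or more can be rejected outright. Ordinary visible text, on the other hand, lives mostly in the 3-byte range, so the 3-byte Emoji are what we really need to filter (3-byte Emoji can already be stored in the database, but we filter them too for a consistent experience). Fortunately there are not many 3-byte Emoji, so matching them explicitly is acceptable.
Based on material found on the Unicode website, match the three-byte code points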
    - (BOOL) emojiInUnicode:(short)code {
        if (code == 0x0023 || code == 0x002A || (code >= 0x0030 && code <= 0x0039)
            || code == 0x00A9 || code == 0x00AE || code == 0x203C || code == 0x2049
            || code == 0x2122 || code == 0x2139 || (code >= 0x2194 && code <= 0x2199)
            || code == 0x21A9 || code == 0x21AA || code == 0x231A || code == 0x231B
            || code == 0x2328 || code == 0x23CF || (code >= 0x23E9 && code <= 0x23F3)
            || (code >= 0x23F8 && code <= 0x23FA) || code == 0x24C2 || code == 0x25AA
            || code == 0x25AB || code == 0x25B6 || code == 0x25C0
            || (code >= 0x25FB && code <= 0x25FE) || (code >= 0x2600 && code <= 0x2604)
            || code == 0x260E || code == 0x2611 || code == 0x2614 || code == 0x2615
            || code == 0x2618 || code == 0x261D || code == 0x2620 || code == 0x2622
            || code == 0x2623 || code == 0x2626 || code == 0x262A || code == 0x262E
            || code == 0x262F || (code >= 0x2638 && code <= 0x263A)
            || (code >= 0x2648 && code <= 0x2653) || code == 0x2660 || code == 0x2663
            || code == 0x2665 || code == 0x2666 || code == 0x2668 || code == 0x267B
            || code == 0x267F || (code >= 0x2692 && code <= 0x2694) || code == 0x2696
            || code == 0x2697 || code == 0x2699 || code == 0x269B || code == 0x269C
            || code == 0x26A0 || code == 0x26A1 || code == 0x26AA || code == 0x26AB
            || code == 0x26B0 || code == 0x26B1 || code == 0x26BD || code == 0x26BE
            || code == 0x26C4 || code == 0x26C5 || code == 0x26C8 || code == 0x26CE
            || code == 0x26CF || code == 0x26D1 || code == 0x26D3 || code == 0x26D4
            || code == 0x26E9 || code == 0x26EA || (code >= 0x26F0 && code <= 0x26F5)
            || (code >= 0x26F7 && code <= 0x26FA) || code == 0x26FD || code == 0x2702
            || code == 0x2705 || (code >= 0x2708 && code <= 0x270D) || code == 0x270F
            || code == 0x2712 || code == 0x2714 || code == 0x2716 || code == 0x271D
            || code == 0x2721 || code == 0x2728 || code == 0x2733 || code == 0x2734
            || code == 0x2744 || code == 0x2747 || code == 0x274C || code == 0x274E
            || (code >= 0x2753 && code <= 0x2755) || code == 0x2757 || code == 0x2763
            || code == 0x2764 || (code >= 0x2795 && code <= 0x2797) || code == 0x27A1
            || code == 0x27B0 || code == 0x27BF || code == 0x2934 || code == 0x2935
            || (code >= 0x2B05 && code <= 0x2B07) || code == 0x2B1B || code == 0x2B1C
            || code == 0x2B50 || code == 0x2B55 || code == 0x3030 || code == 0x303D
            || code == 0x3297 || code == 0x3299
            // Second section
            || code == 0x23F0) {
            return YES;
        }
        return NO;
    }
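As an alternative layout for the same kind of lookup, the long `||` chain could be held in a range table and scanned. The following is only a sketch in plain C under stated assumptions: `CodeRange`, `kEmojiRangesExcerpt`, and `in_ranges` are hypothetical names, and the table is a short excerpt of the ranges listed above, not the full set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive range of code points. */
typedef struct {
    uint16_t first;
    uint16_t last;
} CodeRange;

/* Excerpt only: a few of the entries from the list above. */
const CodeRange kEmojiRangesExcerpt[] = {
    {0x203C, 0x203C}, {0x2049, 0x2049}, {0x2194, 0x2199}, {0x23E9, 0x23F3},
    {0x2600, 0x2604}, {0x2648, 0x2653}, {0x3297, 0x3297}, {0x3299, 0x3299},
};
const size_t kEmojiRangesExcerptCount =
    sizeof kEmojiRangesExcerpt / sizeof kEmojiRangesExcerpt[0];

/* Linear scan: true when code falls inside any of the given ranges. */
bool in_ranges(const CodeRange *ranges, size_t count, uint16_t code) {
    assert(ranges != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (code >= ranges[i].first && code <= ranges[i].last) {
            return true;
        }
    }
    return false;
}
```

A table keeps the data separate from the logic, which may make it easier to extend when new Emoji are added. Whether that is worth the change is a judgment call, since the explicit chain works as it is.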
There is also a very old Emoji set that uses the Unicode private-use area. It is rarely seen today, but we filter it anyway
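    /**
     * An unofficial set that uses the Unicode private-use area
     * e0 - e5 01 - 59
     */
    - (BOOL) emojiInSoftBankUnicode:(unichar)code {
        // code must be unsigned: as a signed short, values from 0xE000 upward are negative,
        // and the high-byte comparison below would never match
        return ((code >> 8) >= 0xE0 && (code >> 8) <= 0xE5 && (Byte)(code & 0xFF) < 0x60);
    }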
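The high-byte and low-byte test is easy to get wrong when the value is held in a signed 16-bit type, because code points from 0xE000 upward then become negative. A minimal sketch in plain C with an explicitly unsigned type follows. The name `in_softbank_range` is hypothetical, and the bounds simply mirror the comparison used in the method above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when the high byte is 0xE0..0xE5 and the low byte is below 0x60.
 * uint16_t keeps the value non-negative, so the shift yields the plain high byte. */
bool in_softbank_range(uint16_t code) {
    uint8_t high = (uint8_t)(code >> 8);
    uint8_t low = (uint8_t)(code & 0xFF);
    return high >= 0xE0 && high <= 0xE5 && low < 0x60;
}

/* A few spot checks around the edges of the range. */
void softbank_spot_checks(void) {
    assert(in_softbank_range(0xE001));
    assert(!in_softbank_range(0xE560)); /* low byte just past the limit */
    assert(!in_softbank_range(0x2600)); /* ordinary three-byte symbol */
}
```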
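The last piece is filtering the input string: characters of one or two bytes pass, every character longer than three bytes is rejected, and the code point of each three-byte character is checked against the lists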
    - (BOOL) containEmoji {
        NSUInteger len = [self lengthOfBytesUsingEncoding:NSUTF8StringEncoding];
        if (len < 3) { // Only strings of three bytes or more need checking (some Emoji are just three bytes)
            return NO;
        }
        // Only three-byte characters are looked up; anything longer is treated as Emoji
        NSData *data = [self dataUsingEncoding:NSUTF8StringEncoding];
        const Byte *bts = (const Byte *)[data bytes];
        Byte bt;
        unichar v; // unsigned, so code points from 0x8000 upward do not turn negative
        for (NSUInteger i = 0; i < len; i++) {
            bt = bts[i];
            if ((bt | 0x7F) == 0x7F) { // 0xxxxxxx ASCII
                continue;
            }
            if ((bt | 0x1F) == 0xDF) { // 110xxxxx two-byte character
                i += 1;
                continue;
            }
            if ((bt | 0x0F) == 0xEF) { // 1110xxxx three-byte character (the main target)
                // Rebuild the Unicode code point
                v = bt & 0x0F;
                v = v << 6;
                v |= bts[i + 1] & 0x3F;
                v = v << 6;
                v |= bts[i + 2] & 0x3F;
                // NSLog(@"%02X%02X", (Byte)(v >> 8), (Byte)(v & 0xFF));
                if ([self emojiInSoftBankUnicode:v] || [self emojiInUnicode:v]) {
                    return YES;
                }
                i += 2;
                continue;
            }
            if ((bt | 0x3F) == 0xBF) { // 10xxxxxx data byte, skip it
                continue;
            }
            return YES; // Anything else is longer than three bytes and is treated as Emoji
        }
        return NO;
    }
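The bit arithmetic that rebuilds a code point from a three-byte sequence can be checked in isolation. Here is a minimal sketch in plain C. The name `decode_three_byte` is hypothetical, and the caller is assumed to have already verified that the pointer refers to a well-formed three-byte character.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decodes 1110xxxx 10xxxxxx 10xxxxxx into a 16-bit code point:
 * 4 data bits from the lead byte, then 6 bits from each data byte.
 * Assumption: p points at three bytes that form one well-formed character. */
uint16_t decode_three_byte(const unsigned char *p) {
    assert(p != NULL);
    return (uint16_t)(((p[0] & 0x0F) << 12) | ((p[1] & 0x3F) << 6) | (p[2] & 0x3F));
}
```

For instance, the bytes `E2 98 80` decode to 0x2600, which falls in the 0x2600 to 0x2604 range of the list, while `E4 B8 AD` decodes to 0x4E2D, an ordinary Chinese character that passes the filter.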
Then wrap these methods in an NSString (Emoji) category.
Done.
References: