Filtering out emoji for a MySQL database that uses the utf8 charset
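```java
import java.util.regex.Pattern;
import org.junit.Assert;
import org.junit.Test;

public class EmojiFilterTest {
    // Matches any code point outside U+0000..U+D7FF and U+E000..U+FFFF: supplementary
    // characters (most emoji), which need 4 bytes in UTF-8 and cannot be stored in
    // MySQL's 3-byte utf8 charset, plus unpaired surrogates.
    private static final Pattern NON_UTF8_CHAR_PATTERN =
            Pattern.compile("[^\\u0000-\\uD7FF\\uE000-\\uFFFF]");

    @Test
    public void testFilterEmoji() {
        // "可以吃猪" followed by U+1F437 PIG FACE (assumed to be the pig emoji meant here),
        // written as its UTF-16 surrogate pair so the source file itself contains no emoji.
        String input = "可以吃猪\uD83D\uDC37";
        String out = NON_UTF8_CHAR_PATTERN.matcher(input).replaceAll("??");
        System.out.println(out);
        Assert.assertFalse(out.contains("\uD83D\uDC37"));
    }
}
```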
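The pattern works because UTF-8 needs 4 bytes exactly for code points above U+FFFF, which Java strings store as surrogate pairs; everything in U+0000 to U+D7FF and U+E000 to U+FFFF fits in 1 to 3 bytes. If you would rather not use a regex, the same filter can be written by walking the code points directly. This is only a minimal sketch: the class `Utf8Mb3Filter` and the method `strip` are hypothetical names of my own, and like the test above it replaces every rejected code point with a caller-supplied replacement string.

```java
public class Utf8Mb3Filter {
    // Hypothetical helper: keeps code points that fit in at most 3 UTF-8 bytes
    // (U+0000..U+D7FF and U+E000..U+FFFF) and replaces every other code point
    // (supplementary characters such as most emoji, or unpaired surrogates).
    public static String strip(String input, String replacement) {
        StringBuilder sb = new StringBuilder(input.length());
        // codePoints() joins valid surrogate pairs into one code point and
        // yields unpaired surrogates as their own value (0xD800..0xDFFF).
        input.codePoints().forEach(cp -> {
            boolean fitsIn3Bytes = cp <= 0xD7FF || (cp >= 0xE000 && cp <= 0xFFFF);
            if (fitsIn3Bytes) {
                sb.appendCodePoint(cp);
            } else {
                sb.append(replacement);
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // "pig: " followed by U+1F437 PIG FACE
        System.out.println(strip("pig: \uD83D\uDC37", "??"));
    }
}
```

Note that this only removes supplementary characters. Emoji that live in the Basic Multilingual Plane (for example U+2764 HEAVY BLACK HEART) pass through unchanged, which is fine because MySQL's utf8 can store them.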
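Note: my blog does not support utf8mb4, so this post cannot store emoji either. The pig emoji in the example therefore cannot appear literally; the original post showed it in an image instead.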
References:
- https://stackoverflow.com/questions/56800767/how-to-find-a-character-which-can-t-be-stored-in-a-mysql-utf8-column-in-java
- https://stackoverflow.com/questions/3220031/how-to-filter-or-replace-unicode-characters-that-would-take-more-than-3-bytes
- How many bytes does a Chinese character take in UTF-8?
- Character encoding notes: ASCII, Unicode and UTF-8
- A brief look at UTF-8 encoding
- emoji Unicode character table
- unicode (CSDN)
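The UTF-8 articles listed above boil down to one rule: code points U+0000 to U+007F take 1 byte, U+0080 to U+07FF take 2, U+0800 to U+FFFF take 3 (this covers common Chinese characters), and U+10000 and above take 4. MySQL's utf8 charset stores at most 3 bytes per character, so only that last group is rejected. The small sketch below checks the rule with the JDK's own encoder; the class name `Utf8ByteLengths` and the sample characters are my own choices, and U+1F437 is again assumed as the pig emoji.

```java
import java.nio.charset.StandardCharsets;

public class Utf8ByteLengths {
    // Number of bytes one code point occupies when encoded as UTF-8.
    public static int utf8Length(int codePoint) {
        return new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // 'A', 'é' (U+00E9), '中' (U+4E2D), PIG FACE emoji (U+1F437)
        int[] samples = {'A', 0x00E9, 0x4E2D, 0x1F437};
        for (int cp : samples) {
            System.out.printf("U+%04X -> %d byte(s)%n", cp, utf8Length(cp));
        }
        // Prints:
        // U+0041 -> 1 byte(s)
        // U+00E9 -> 2 byte(s)
        // U+4E2D -> 3 byte(s)
        // U+1F437 -> 4 byte(s)
    }
}
```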