Filtering emoji for a MySQL database that uses the utf8 encoding


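import java.util.regex.Pattern;

import org.junit.Assert;
import org.junit.Test;

public class EmojiFilterTest {

    // Matches anything outside the BMP (and unpaired surrogates): characters that
    // need 4 bytes in UTF-8 and so cannot be stored in a 3-byte MySQL utf8 column.
    private static final String NON_UTF8_CHAR_PATTERN = "[^\\u0000-\\uD7FF\\uE000-\\uFFFF]";

    @Test
    public void testFilterEmoji() {
        // U+1F437 (pig face), written as a surrogate-pair escape.
        String input = "可以吃猪\uD83D\uDC37";
        Pattern filterPattern = Pattern.compile(NON_UTF8_CHAR_PATTERN);
        final String out = filterPattern.matcher(input).replaceAll("??");
        System.out.println(out);
        Assert.assertFalse(out.contains("\uD83D\uDC37"));
    }
}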

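Note: my blog does not support utf8mb4, so this post cannot store a literal emoji either. The pig emoji that the test above refers to is the one shown in the image below: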

image.png
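
The pattern in the test above keeps only characters in the Basic Multilingual Plane (U+0000 to U+FFFF, minus the surrogate range). These are the characters that fit in at most 3 bytes of UTF-8, which is the limit of MySQL's utf8 charset. Here is a minimal sketch of the same idea without a regex, assuming Java 8 or later. The class name `SupplementaryFilter` and its method names are hypothetical, chosen for illustration, and are not part of the original post:

```java
public final class SupplementaryFilter {

    private SupplementaryFilter() {
    }

    /** Returns true if any code point in s lies above U+FFFF (4 bytes in UTF-8). */
    public static boolean hasSupplementary(String s) {
        return s.codePoints().anyMatch(Character::isSupplementaryCodePoint);
    }

    /**
     * Replaces every supplementary code point with the given replacement.
     * Unpaired surrogates are passed through unchanged.
     */
    public static String replaceSupplementary(String s, String replacement) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints().forEach(cp -> {
            if (Character.isSupplementaryCodePoint(cp)) {
                sb.append(replacement);
            } else {
                sb.appendCodePoint(cp);
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Pig face (U+1F437) as a surrogate pair, preceded by BMP-only Chinese text.
        String input = "可以吃猪\uD83D\uDC37";
        System.out.println(hasSupplementary(input));           // prints "true"
        System.out.println(replaceSupplementary(input, "??")); // prints "可以吃猪??"
    }
}
```

Calling `hasSupplementary` first lets you reject or log offending input rather than silently rewriting it. Which behavior is preferable depends on the application.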

References:

  1. https://stackoverflow.com/questions/56800767/how-to-find-a-character-which-can-t-be-stored-in-a-mysql-utf8-column-in-java
  2. https://stackoverflow.com/questions/3220031/how-to-filter-or-replace-unicode-characters-that-would-take-more-than-3-bytes
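  3. How many bytes a Chinese character takes in UTF-8 (utf-8的中文是一个字符占几个字节)
  4. Character encoding notes: ASCII, Unicode, and UTF-8 (字符编码笔记:ASCII,Unicode 和 UTF-8)
  5. A brief look at UTF-8 encoding (浅谈 UTF-8 编码)
  6. Emoji Unicode character table (emoji Unicode字符表)
  7. Unicode (CSDN)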
