Tomcat 性能调优过程记录系列 - 00 - 背景介绍: 关于图片上传Multipart性能和Thrift流式传输 ^置顶！

2023-01-18/2023-03-16 | 1 评论 | 3,188 浏览

Tomcat 性能调优过程记录系列 - 00 - 关于图片上传Multipart性能和Thrift流式传输

背景

这里是第一篇, 简单交待一下背景: 我们有一个拍照检查的服务(solar-thor). 其大致流程如下:

用户提交拍照检查请求. 使用 multipart/form-data 方式上传图片.
服务端使用Spring的经典代码进行接收图片. 把图片放入到内存数组中.
把图片传输给研究组进行识别.
- 同步阻塞调用
返回结果后.把识别结果存储DB.
把image数组异步存储到阿里云OSS.

服务端接收图片代码

服务端用于接收图片的代码平淡无奇. 99%的Java工程师第一次写此业务需求写出来的代码肯定与此无异.

@PostMapping(value = "check")
    public APIResponse<CheckResult> check(
        @RequestParam(value = "image", required = false) MultipartFile fileIn,
        @RequestParam(value = "otherParam", defaultValue = "0") int otherParam,
        ...){
    ImageFile file = adaptFrom(fileIn);

    RpcCheckParam checkParam = adaptFrom(file);

    // 同步阻塞调用RPC.耗时较长,可以理解为秒级时长. 一个特别耗时的同步阻塞调用.
    resp = algorithmRpcClient.check(checkParam);
    saveResult2DB(resp);
  
    asyncSaveImage2OSS(file);
    return APIResponse.reuturn(resp);
}

// 从Spring的MultiPartFile读取图片字节数组
ImageFile adaptFrom(MultipartFile file) throws IOException {

        if (file == null) {
            return null;
        }

        ImageFile imageFile = new ImageFile();

        imageFile.setName(file.getName());
        imageFile.setOriginalFilename(file.getOriginalFilename());
        imageFile.setContentType(file.getContentType());
        imageFile.setSize(file.getSize());
        imageFile.setBytes(file.getBytes());

        return imageFile;
    }

代码框架的缺陷

这里说几点坑, 但是不会过多展开和说明了.简单说一下

Thrift RPC的缺陷

不支持Java简单的数组.

thrift这个跨语言的RPC框架. 真的是天生难用. 对于Java来说是相当不好用.(容我吐槽一下.) 比如上面的image byte array. 在这个场景下我们就不能直接传递给RPC接口作为参数. 而在thrift中如果要把数组传递给Server端. 那只能转换为List<Byte> , 由于Java的容器又不能够存储原生对象. 因此必须转换为Byte包装类进行存储. 这就导致了服务没办法直接把字节数组传递给RPC作为参数. 需要拷贝一份,且不是简单的拷贝. 而是要用比较重的容器对象来存储Byte进行参数传递. 于是我们需要用到如下的代码:

public static List<Byte> toThrift(byte[] bytes) {
        if (bytes == null) {
            return new ArrayList<Byte>();
        }
        List<Byte> list = new ArrayList<Byte>();
        for (byte b : bytes) {
            list.add(b);
        }
        return list;
    }

如果拷贝一张200k的图片,那需要for循环20万次才能完成拷贝.

测试代码:

@Test
    public void testListCopy() {
        // 申请一个200k的数组 , 模拟一个普通图片的大小
        byte[] source = new byte[1024 * 200];

        for (int i = 0; i < 100000; i++) {
            source = new byte[1024 * 200];
            source[new Random().nextInt(source.length)] = (byte) i;
            long timeStart = System.nanoTime();
            List<Byte> target = ThriftUtils.toThrift(source);
            final long end = System.nanoTime();
            if (i == 0 || i % 10000 == 0 || i< 10) {
                System.out.println("[ " + i + " ]ThriftUtils.toThrift(source): " + (end - timeStart)/1000000.0+ " ms");
            }
            target = null;
        }
    }

这个是-server 参数进行运行下的测试数据

[ 0 ]ThriftUtils.toThrift(source): 1101.437922 ms
[ 1 ]ThriftUtils.toThrift(source): 2.928576 ms
[ 2 ]ThriftUtils.toThrift(source): 3.262643 ms
[ 3 ]ThriftUtils.toThrift(source): 3.165539 ms
[ 4 ]ThriftUtils.toThrift(source): 0.873871 ms
[ 5 ]ThriftUtils.toThrift(source): 0.829455 ms
[ 6 ]ThriftUtils.toThrift(source): 0.838755 ms
[ 7 ]ThriftUtils.toThrift(source): 0.839013 ms
[ 8 ]ThriftUtils.toThrift(source): 0.838507 ms
[ 9 ]ThriftUtils.toThrift(source): 0.837551 ms
[ 10000 ]ThriftUtils.toThrift(source): 0.774455 ms
[ 20000 ]ThriftUtils.toThrift(source): 0.899857 ms
[ 30000 ]ThriftUtils.toThrift(source): 1.623953 ms
[ 40000 ]ThriftUtils.toThrift(source): 0.770835 ms
[ 50000 ]ThriftUtils.toThrift(source): 0.839494 ms
[ 60000 ]ThriftUtils.toThrift(source): 0.762577 ms
[ 70000 ]ThriftUtils.toThrift(source): 0.803329 ms
[ 80000 ]ThriftUtils.toThrift(source): 0.781306 ms
[ 90000 ]ThriftUtils.toThrift(source): 1.00646 ms

如果支持数组这个拷贝就可以用System.arrayCopy来完成. 或者不用拷贝,直接用不香么.

@Test
    public void testSystemCopy() {
        // 申请一个200k的数组 , 模拟一个普通图片的大小
        byte[] source = new byte[1024 * 200];

        for (int i = 0; i < 100000; i++) {

            // 每次初始化一次源数组
            source = new byte[1024 * 200];
            // 随机修改一个字节
            source[new Random().nextInt(source.length)] = (byte) i;

            long timeStart = System.nanoTime();
            byte[] targetArray = new byte[source.length];
            System.arraycopy(source, 0, targetArray, 0, source.length);
            final long end = System.nanoTime();
            if (i == 0 || i % 10000 == 0 || i< 10) {
                System.out.println("[ " + i + " ]System.arraycopy(source): " + (end - timeStart)/1000000.0+ " ms");
            }
        }
    }

下面是测试得到的数据, 大概0.02ms就可以完成拷贝. 但是由于Thrift本身不支持数组. 只能别扭的把数组转换为List<Byte>. 拷贝效率差了40倍左右吧.

[ 0 ]System.arraycopy(source): 0.123009 ms
[ 1 ]System.arraycopy(source): 0.243032 ms
[ 2 ]System.arraycopy(source): 0.215267 ms
[ 3 ]System.arraycopy(source): 0.213393 ms
[ 4 ]System.arraycopy(source): 0.216643 ms
[ 5 ]System.arraycopy(source): 0.309075 ms
[ 6 ]System.arraycopy(source): 0.425692 ms
[ 7 ]System.arraycopy(source): 0.200957 ms
[ 8 ]System.arraycopy(source): 0.242191 ms
[ 9 ]System.arraycopy(source): 0.700115 ms
[ 10000 ]System.arraycopy(source): 0.034585 ms
[ 20000 ]System.arraycopy(source): 0.024358 ms
[ 30000 ]System.arraycopy(source): 0.025498 ms
[ 40000 ]System.arraycopy(source): 0.025441 ms
[ 50000 ]System.arraycopy(source): 0.04112 ms
[ 60000 ]System.arraycopy(source): 0.024462 ms
[ 70000 ]System.arraycopy(source): 0.025688 ms
[ 80000 ]System.arraycopy(source): 0.02046 ms
[ 90000 ]System.arraycopy(source): 0.020455 ms

不支持流

Thrift的一个不支持流的官方Issue: Add a stream type - issues.apache.org/jira/b...

thrift本身也不支持流传输. 这个似乎很多RPC都不支持此功能. 但是如果可以的话, 我们使用流式传输会很好的减轻solar-thor服务端的压力. 不用把积压到的数据一次性的攒够再发给算法组. 如果能够支持流, 可以在HTTP层接收到数据后就可以慢慢的把数据传输给RPC了. 这样我们可以源源不断的把数据传输给需要的服务. solar-thor就只做一个有类似于管道的服务即可. 负载可以大大的降低.

‍

上面在Thrift支持流后, 想要直接把流串起来进行转发. 直接使用上面的 multipart/form-data 的方式实际是不可以的. 这个是后话, 后面不再说. 简单说是因为Severt的一些规范和multipart类型的数据的具体处理实现方式.导致我们无法再以流的方式直接接收HTTP的输入流.

‍

Spring form/Multipart的缺陷

不支持流‍

使用了此接口来接收文件的话. 实际上也是没有办法进行流式传输的. 具体的分析后面的文章中再详细说明. 这里先简单记住这个结论.

以下内容引用自tomcat-performance-with-spring-boot-api-for-file-upload - stackoverflow

There is no other way to do it with multipart. The problem with multipart that to properly segement parts from the requst they need sometimes skipped or be repeatable. That is impossible within memory w/o having memory to explode. Therefore, Commons FileUpload caches them on disk after a certain threshold is reached. Multipart requests are the worst way for that. I highly recommend to use either PUT or POST with content type application/octet-stream. You can take the bare request input stream and pass to HttpClient to stream to your backend server. I did this already 5 years ago and it works for gigabytes. I have posted the solution in the Apache HttpClient mailing list.

There is one possibility how this could work under specific conditions:

All parts are in the correct physical order you want to read
Your write to a backend is fast enough to sustain the read from the front

Consume the root part and then go over to the next physical one, process the request body lazily. JAX-WS RI (Metro) has a very nice handling of multipart requests for XOP/MTOM. Learn from that because you won't be able to make it any better.

这里说明了几个事情.

一个是form/multipart形式的数据接收在处理时. 会进行内存缓存. 如果文件过大的话. 还会对文件进行缓存.(以临时文件的形式存在于Java的临时文件目录)
Spring似乎是在没有完全接收到数据的时候, 不会触发调用Controller层代码. (这个先说结论, 看起来的确是这样的. 一般情况下不会有问题.但是在网络环境比较恶劣的高并发场景下时, 会存在大量请求阻塞在处理之前. 这也是这个优化系列要查明并解决的核心问题)

现在的问题

上面介绍了服务的基本信息. 在一次服务在线压测中, 发现我们的服务无法达到我们预期的QPS容量. 并且, 服务的资源并没有完全打满. CPU 以及内存资源都还有余量. 但是QPS就是上不去. 这也是这个系列要解决的问题: 找到原因, 解决掉它!

‍

总结:

以上是我们服务的现状, 后面就是我们怎样做性能压测及优化的过程记录. 有可能后面我们会解决这个问题. 也有可能直接无疾而终. 不管怎样,希望此系列文章对你有所帮助. 目前一些我们已知的结论:

thrift为了不同语言的最大兼容性,居然不支持字节数组. 只能使用List的强类型的容器.
List 中只能保存Java的包装类型. 在进行拷贝时效率十分低下. 却没有其它办法可以优化.
Thrift 不支持流传输. 这个是一个缺憾. 我们只能在数据准备好的时候.一次性的把数据传输给Server.
使用了Spring的 form/multipart 方式,不支持流式处理上传数据.

‍

any 2023-08-10 15:57 回复»

hello world

RBA的技术分享