Optimize your dataflow

To maximize throughput, monitor the queue sizes across your dataflow. If a large queue builds up in front of a processor while the queues for the downstream processors stay empty, that processor is a bottleneck. This is not unusual, because some tasks are expected to take longer than others. To improve throughput, you can configure the bottleneck processor to run more concurrent tasks, so that it processes multiple FlowFiles at the same time.
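
The queue check described above can be sketched in Python. This is only an illustration of the reasoning, not NiFi code: the `find_bottlenecks` function, the `threshold` parameter, the stage names, and the sample queue counts are all hypothetical. In practice you would read the queue sizes from the NiFi UI or from your monitoring tooling.

```python
from typing import List, Tuple


def find_bottlenecks(
    queued: List[Tuple[str, int]], threshold: int = 1000
) -> List[str]:
    """Return processors with a large incoming queue and a near-empty next queue.

    `queued` lists (processor name, FlowFiles queued in front of it) in
    pipeline order. `threshold` is an assumed cutoff for a "large" queue;
    pick a value that suits your own flow.
    """
    bottlenecks = []
    for i, (name, incoming) in enumerate(queued):
        # Size of the queue feeding the next processor (0 for the last one).
        downstream = queued[i + 1][1] if i + 1 < len(queued) else 0
        if incoming >= threshold and downstream < threshold // 10:
            bottlenecks.append(name)
    return bottlenecks


# Made-up sample: a large queue in front of "transform", almost nothing after it.
sample = [("ingest", 0), ("transform", 8500), ("store", 3)]
print(find_bottlenecks(sample))
```

With these sample numbers, only the `transform` stage is reported, because its incoming queue is large while the queue that follows it is almost empty.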

Sometimes, giving a processor more resources only moves the bottleneck to a processor further down the pipeline. You might need to repeat this process several times, monitoring the queue sizes after each change, until the available threads are spread across the processors so that no single queue keeps growing.
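
The repeated tuning described above can be illustrated with a toy model. The sketch below assumes that a stage's throughput scales linearly with its number of concurrent tasks, which real processors rarely do exactly. The `allocate_threads` function, the stage names, and the per-task rates are hypothetical and serve only to show how the bottleneck moves from one stage to another as tasks are added.

```python
from typing import Dict


def allocate_threads(
    rate_per_task: Dict[str, float], total_threads: int
) -> Dict[str, int]:
    """Greedy toy model of iterative tuning.

    Every stage starts with one concurrent task. Each remaining thread is
    given to whichever stage currently has the lowest total rate, that is,
    the current bottleneck under the linear-scaling assumption.
    """
    tasks = {name: 1 for name in rate_per_task}
    for _ in range(total_threads - len(tasks)):
        # Find the stage with the lowest combined rate (FlowFiles per second).
        slowest = min(tasks, key=lambda n: rate_per_task[n] * tasks[n])
        tasks[slowest] += 1
        print(f"added a task to {slowest}: {tasks}")
    return tasks


# Made-up per-task rates in FlowFiles per second.
rates = {"ingest": 400.0, "transform": 100.0, "store": 150.0}
result = allocate_threads(rates, total_threads=8)
```

In this run the extra tasks alternate between `transform` and `store`: relieving one stage exposes the other as the next bottleneck, which is why the queue sizes need to be checked again after every change.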