Input data fed to multiple processing pipeline instances must be synchronized so that the pipeline knows which instance should process a specific input.
When the input frequency is very high and the processing time is very short, this synchronization overhead previously grew with the number of instances: for example, all 10 pipeline instances had to ask whether they could process the same input id. In high-frequency cases this could take longer than the processing itself, negating the benefit of scaling out.
This issue is now solved for processing pipelines scaled to 4 or more instances. The watch streams opened to the database now filter for input data that is relevant to the instance's own processing group, avoiding unnecessary synchronization overhead.
Each processing pipeline instance is now assigned to a processing group of ideally 2 members, so only 2 instances ever see a given input. This means that when a pipeline is scaled to 10 instances, only 2 of them have to ask to process the same input id. For reliability, each group has 2 to 3 members rather than a single one.
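The group-based filtering described above can be sketched roughly as follows. This is a minimal illustration with hypothetical names (`assign_group`, `group_for_input`, `is_relevant` are not part of the product); it assumes groups of 2 consecutive instances and a hash-based mapping from input id to group, which the note does not specify in detail:

```python
import hashlib

GROUP_SIZE = 2  # ideal group size; real groups have 2 to 3 members for reliability


def assign_group(instance_index: int) -> int:
    """Map a pipeline instance to its processing group (assumption: consecutive pairs)."""
    return instance_index // GROUP_SIZE


def group_for_input(input_id: str, num_groups: int) -> int:
    """Deterministically map an input id to the group responsible for it."""
    digest = hashlib.sha256(input_id.encode()).hexdigest()
    return int(digest, 16) % num_groups


def is_relevant(instance_index: int, input_id: str, num_instances: int) -> bool:
    """Watch-stream filter: an instance only sees inputs owned by its own group."""
    num_groups = max(1, num_instances // GROUP_SIZE)
    return assign_group(instance_index) == group_for_input(input_id, num_groups)


# With 10 instances (5 groups of 2), only the 2 members of one group
# receive a given input id on their watch streams:
watchers = [i for i in range(10) if is_relevant(i, "input-42", 10)]
```

In this sketch the synchronization question "may I process this input id?" is only ever asked by the 2 instances in `watchers`, regardless of how far the pipeline is scaled out.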
With this improvement, scaling to 4 or more instances actually increases throughput for high-frequency input processing. Note that this scaling mechanism is only active for pipelines using the input data trigger.
For more general information about pipelines, refer to the Pipelines user guide.
