Lew Dawson

Apache NiFi: ListS3 Processor Is Stateful

TodayILearned, Java, ApacheNifi, Nifi, Programming, NotObvious

The ListS3 processor in Apache NiFi has a quirk that's not obvious at first glance. While playing with it recently, I was confounded by its behavior. When first started, the processor saw all objects in the bucket. When the task triggered a second time, the processor appeared to see nothing, as no new FlowFiles were emitted. After reading the manual, I found this somewhat ambiguous language about it:

This Processor is designed to run on Primary Node only in a cluster. If the primary node changes, the new Primary Node will pick up where the previous node left off without duplicating all of the data.

If you look at the code, you can also see that it keeps a map of the keys it has seen and the last-modified timestamp for each. It turns out the processor is stateful.

The only scenarios in which the processor will emit object metadata are when:

  1. The object is new to the bucket

  2. The object has been modified since the last listing (i.e., its timestamp changed)

  3. The processor is a new instance

  4. The processor's state has been cleared

Regarding the final point, you can clear the processor's state by right-clicking the stopped processor, selecting View state, and then clicking Clear state.
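To make the behavior above concrete, here is a minimal sketch of the idea in Java. It is a simplified model based only on the description in this post, not NiFi's actual ListS3 code: the class `StatefulLister`, its methods, and the use of a plain in-memory map in place of S3 and NiFi's state manager are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of a stateful lister; not NiFi's real implementation.
class StatefulLister {
    // Remembered state: object key -> last-modified timestamp (epoch millis) seen for it.
    private final Map<String, Long> seen = new HashMap<>();

    // Returns only the keys that are new or whose timestamp changed since the previous call.
    public List<String> list(Map<String, Long> bucket) {
        List<String> emitted = new ArrayList<>();
        for (Map.Entry<String, Long> entry : bucket.entrySet()) {
            // put() returns the previously stored timestamp, or null for an unseen key.
            Long previous = seen.put(entry.getKey(), entry.getValue());
            if (previous == null || !previous.equals(entry.getValue())) {
                emitted.add(entry.getKey());
            }
        }
        return emitted;
    }

    // Stands in for "Clear state": forget everything, so the next listing emits every object.
    public void clearState() {
        seen.clear();
    }

    public static void main(String[] args) {
        // The "bucket" is just a map of made-up keys and timestamps.
        Map<String, Long> bucket = new HashMap<>();
        bucket.put("a.csv", 1000L);
        bucket.put("b.csv", 2000L);

        StatefulLister lister = new StatefulLister();
        System.out.println(lister.list(bucket).size()); // 2: first run sees everything
        System.out.println(lister.list(bucket).size()); // 0: nothing new or modified

        bucket.put("a.csv", 3000L);                     // simulate modifying one object
        System.out.println(lister.list(bucket));        // [a.csv]

        lister.clearState();
        System.out.println(lister.list(bucket).size()); // 2: state cleared, all emitted again
    }
}
```

A fresh `StatefulLister` starts with an empty map, which is why a new processor instance and a cleared state behave the same way in this model.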

© 2024 by Lew Dawson. All rights reserved.