Re: [PATCH v2 00/35] bitops: add atomic find_bit() operations

6 Dec 2023


On Mon, Dec 04, 2023 at 07:51:01PM +0100, Jan Kara wrote:
...
Hello Yury!
On Sun 03-12-23 11:23:47, Yury Norov wrote:
...
Add helpers around test_and_{set,clear}_bit() that allow searching for
clear or set bits and flipping them atomically.
The target patterns may look like this:
	for (idx = 0; idx < nbits; idx++)
		if (test_and_clear_bit(idx, bitmap))
			do_something(idx);
Or like this:
	do {
		bit = find_first_bit(bitmap, nbits);
		if (bit >= nbits)
			return nbits;
	} while (!test_and_clear_bit(bit, bitmap));
	return bit;
In both cases, the opencoded loop may be converted to a single function
or iterator call. Correspondingly:
	for_each_test_and_clear_bit(idx, bitmap, nbits)
		do_something(idx);
Or:
	return find_and_clear_bit(bitmap, nbits);
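For readers who want to experiment outside the kernel, the conversion described above can be modelled in userspace. This is a minimal sketch, not the code from the series: every model_* name is hypothetical, C11 atomics stand in for the kernel's atomic bitops, __builtin_ctzl() (a GCC/Clang builtin) stands in for __ffs(), and the iterator restarts from bit 0 on each step, which is a simplification.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical userspace stand-in for the kernel's set_bit(). */
static void model_set_bit(unsigned long nr, _Atomic unsigned long *map)
{
	atomic_fetch_or(&map[nr / BITS_PER_WORD], 1UL << (nr % BITS_PER_WORD));
}

/* Hypothetical stand-in for test_and_clear_bit(): returns the old bit value. */
static int model_test_and_clear_bit(unsigned long nr, _Atomic unsigned long *map)
{
	unsigned long mask = 1UL << (nr % BITS_PER_WORD);

	return (atomic_fetch_and(&map[nr / BITS_PER_WORD], ~mask) & mask) != 0;
}

/* Returns the index of a bit this caller cleared, or nbits if none was claimed. */
static unsigned long model_find_and_clear_bit(_Atomic unsigned long *map,
					      unsigned long nbits)
{
	for (unsigned long base = 0; base < nbits; base += BITS_PER_WORD) {
		/* Read the word once; every decision below uses this snapshot. */
		unsigned long word = atomic_load(&map[base / BITS_PER_WORD]);

		while (word) {
			unsigned long bit = base + (unsigned long)__builtin_ctzl(word);

			if (bit >= nbits)
				break;
			if (model_test_and_clear_bit(bit, map))
				return bit;
			/* Another thread cleared it first; try the next candidate. */
			word &= word - 1;
		}
	}
	return nbits;
}

/* Simplified iterator: restarts the search from bit 0 on every step. */
#define model_for_each_test_and_clear_bit(bit, map, nbits)		\
	for ((bit) = model_find_and_clear_bit((map), (nbits));		\
	     (bit) < (nbits);						\
	     (bit) = model_find_and_clear_bit((map), (nbits)))

/* Usage: drain a bitmap and return the sum of the claimed bit indices. */
static unsigned long example_drain_sum(_Atomic unsigned long *map,
				       unsigned long nbits)
{
	unsigned long bit, sum = 0;

	model_for_each_test_and_clear_bit(bit, map, nbits)
		sum += bit;
	assert(model_find_and_clear_bit(map, nbits) == nbits);
	return sum;
}
```

The point of the model is that the atomic fetch-and decides ownership of a bit, so a caller never acts on a bit that somebody else cleared in the meantime.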
These are fine cleanups but they actually don't address the case that has
triggered all these changes - namely the xarray use of find_next_bit() in
xas_find_chunk().
...
This series is a result of discussion [1]. All find_bit() functions imply
exclusive access to the bitmaps. However, KCSAN reports quite a number
of warnings related to find_bit() API. Some of them are not pointing
to real bugs because in many situations people intentionally allow
concurrent bitmap operations.
If so, find_bit() can be annotated such that KCSAN will ignore it:
    bit = data_race(find_first_bit(bitmap, nbits));

No, this is not a correct thing to do. If concurrent bitmap changes can
happen, find_first_bit() as it is currently implemented isn't ever a safe
choice because it can call __ffs(0) which is dangerous as you properly note
above. I proposed adding READ_ONCE() into find_first_bit() / find_next_bit()
implementation to fix this issue but you disliked that. So the other option we
have is adding find_first_bit() and find_next_bit() variants that take
volatile 'addr' and we have to use these in code like xas_find_chunk()
which cannot be converted to your new helpers.
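To make the "volatile addr" option concrete, here is a minimal userspace sketch. It is an illustration under assumptions, not the kernel implementation or a proposed patch: snapshot_find_first_bit() is a hypothetical name, and __builtin_ctzl() (a GCC/Clang builtin) stands in for __ffs(). The idea is that each word is loaded exactly once, so the value tested for non-zero is the same value handed to the bit scan.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical find_first_bit() variant taking a volatile bitmap.
 * Returns the index of a set bit seen in a single load of its word,
 * or nbits if no set bit was observed.
 */
static unsigned long snapshot_find_first_bit(const volatile unsigned long *addr,
					     unsigned long nbits)
{
	for (unsigned long idx = 0; idx * BITS_PER_WORD < nbits; idx++) {
		/* One volatile load; the compiler may not re-read addr[idx]. */
		unsigned long val = addr[idx];

		if (val) {
			/* val is known non-zero here, so the scan is well defined. */
			unsigned long bit = idx * BITS_PER_WORD +
					    (unsigned long)__builtin_ctzl(val);

			return bit < nbits ? bit : nbits;
		}
	}
	return nbits;
}
```

With a concurrent writer the result can still be stale by the time the caller uses it, but the search itself never scans a zero word.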
Here are some examples where concurrent operations with plain find_bit()
are acceptable:
 - two threads running find_*_bit(): safe wrt ffs(0) and returns the correct
   value, because the underlying bitmap is unchanged;
 - find_next_bit() in parallel with set or clear_bit(), when modifying
   a bit prior to the start bit to search: safe and correct;
 - find_first_bit() in parallel with set_bit(): safe, but may return the wrong
   bit number;
 - find_first_zero_bit() in parallel with clear_bit(): same as above.
In the last two cases find_bit() may not return the correct bit number, but
that may be OK if the caller requires any (not necessarily the first) set or
clear bit, respectively.
In such cases, KCSAN may be safely silenced.
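The "safe, but may return wrong bit number" case can be replayed deterministically in a single thread. The sketch below is only an illustration: replay_find_first_bit() and the hook are hypothetical, the hook stands in for a concurrent set_bit() that lands right after the searcher has moved past a word, and __builtin_ctzl() (a GCC/Clang builtin) stands in for __ffs(). It is a replay of one interleaving, not a real race.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical hook: runs after word idx was scanned, modelling a writer. */
typedef void (*after_word_fn)(unsigned long *map, unsigned long idx);

static unsigned long replay_find_first_bit(unsigned long *map,
					   unsigned long nbits,
					   after_word_fn hook)
{
	for (unsigned long idx = 0; idx * BITS_PER_WORD < nbits; idx++) {
		unsigned long val = map[idx];

		if (val) {
			unsigned long bit = idx * BITS_PER_WORD +
					    (unsigned long)__builtin_ctzl(val);

			return bit < nbits ? bit : nbits;
		}
		/* The "other thread" gets to run between two word scans. */
		if (hook)
			hook(map, idx);
	}
	return nbits;
}

/* Writer: sets bit 1 just after the searcher has finished with word 0. */
static void set_bit1_after_word0(unsigned long *map, unsigned long idx)
{
	if (idx == 0)
		map[0] |= 1UL << 1;
}
```

In this replay the search returns a bit that really is set, just not the lowest one present in the bitmap when the function returns. That is acceptable only for callers that need any set bit.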
...
This series addresses the other important case where people really need
atomic find ops. As the following patches show, the resulting code
looks safer and more verbose compared to opencoded loops followed by
atomic bit flips.
In [1] Mirsad reported a 2% slowdown in a single-thread search test when
switching the find_bit() functions to treat bitmaps as volatile arrays. On
the other hand, the kernel robot in the same thread reported +3.7% to the
performance of the will-it-scale.per_thread_ops test.
It was actually me who reported the regression here [2] but whatever :)
[2] https://lore.kernel.org/all/20231011150252.32737-1-jack@suse.cz
My apologies.
...
Assuming that our compilers are sane and generate better code against
properly annotated data, the above discrepancy doesn't look weird. When
running on non-volatile bitmaps, plain find_bit() outperforms atomic
find_and_bit(), and vice-versa.
So, all users of find_bit() API, where heavy concurrency is expected,
are encouraged to switch to atomic find_and_bit() as appropriate.
Well, all users where any concurrency can happen should switch. Otherwise
they are prone to the (admittedly mostly theoretical) data race issue.
						Honza

-- 
Jan Kara jack@suse.com
SUSE Labs, CR