The C++20 standard introduced a specialization of std::atomic for shared pointers: std::atomic<std::shared_ptr<T>>. How does it work?
Recall that a normal shared_ptr consists of two pointers: A stored pointer that the shared_ptr returns when you call get(), and a pointer to a control block which holds the strong reference count, the weak reference count, and a pointer to the managed object.
The atomic version of the shared_ptr has the same layout, with one change: The bottom two bits of the pointer to the control block are used as flags.
Exercise: Why use the control block pointer instead of the stored pointer to store the flags?
Both the gcc libstdc++ and msvc implementations use the bottom bit of the control block pointer as a lock bit: Before performing an operation on the atomic shared pointer, the implementation atomically sets the lock bit to indicate that an atomic operation is in progress. If anybody tries to set the lock bit and finds that it’s already set, they wait for the bit to clear. When the owner of the lock bit completes the atomic operation, it clears the lock bit, allowing any waiting threads to proceed.
The difference between libstdc++ and msvc is how they wait for the lock bit to clear.
The libstdc++ implementation treats the lock bit as a spinlock. If the bit is set, it just goes into a loop checking the bit until it finally clears.
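The spinlock approach can be sketched in portable C++ like so. This is a simplified sketch: the function names and the exact memory orders are mine, not libstdc++’s.

```cpp
#include <atomic>
#include <cstdint>

constexpr std::uintptr_t kLockBit = 1;

// Acquire the lock bit and return the pointer bits that were stored.
std::uintptr_t lock_and_load(std::atomic<std::uintptr_t>& word) {
    for (;;) {
        // Atomically set the lock bit; the old value tells us whether we won.
        std::uintptr_t old = word.fetch_or(kLockBit, std::memory_order_acquire);
        if (!(old & kLockBit)) return old;  // bit was clear before: we own the lock
        // Somebody else holds it: spin until the bit clears, then try again.
        while (word.load(std::memory_order_relaxed) & kLockBit) { /* spin */ }
    }
}

void unlock(std::atomic<std::uintptr_t>& word) {
    word.fetch_and(~kLockBit, std::memory_order_release);
}
```

Note that the spin loop only reads; the retry of fetch_or happens after the bit is observed clear, which avoids hammering the cache line with writes while waiting.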
The msvc implementation uses the second-from-bottom bit of the pointer as an unlock-notify bit. If the lock bit is set, msvc sets the unlock-notify bit and then calls wait() to wait for a notification. When the lock bit is cleared, msvc also clears the unlock-notify bit, and if the unlock-notify bit was previously set, it calls notify_all() to wake up all waiters. This wakes up the locking thread so it can try to lock the now-unlocked shared pointer. (This also wakes up any app threads which called wait(), but wait() will internally re-check the condition and go back to sleep if the wake was spurious.)
For wait() and notify_one()/notify_all(), both libstdc++ and msvc use the technique of waiting for a value to change. The msvc implementation uses WaitOnAddress if available; otherwise it falls back to a manually-managed version built out of condition variables. (Condition variables are available starting in Windows Vista. The last version of msvc to support Windows XP was Visual Studio 2017.) The libstdc++ implementation also uses a manually-managed version, built out of futexes if available, else condition variables.
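The condition-variable fallback amounts to something like this. This is a simplified sketch of the "wait for a value to change" pattern, not the actual library code; the class and member names are mine.

```cpp
#include <condition_variable>
#include <mutex>

template <typename T>
class WaitableValue {
    T value_;
    std::mutex m_;
    std::condition_variable cv_;
public:
    explicit WaitableValue(T v) : value_(v) {}

    // Block until the stored value differs from 'old'.
    void wait(T old) {
        std::unique_lock lk(m_);
        // The predicate is re-checked on every wake, so spurious
        // wakeups simply go back to sleep.
        cv_.wait(lk, [&] { return value_ != old; });
    }

    void store_and_notify(T v) {
        { std::lock_guard lk(m_); value_ = v; }
        cv_.notify_all();
    }

    T load() { std::lock_guard lk(m_); return value_; }
};
```

A futex or WaitOnAddress collapses all of this state into the watched memory location itself, which is why the libraries prefer those primitives when the OS provides them.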
So atomic shared pointers are basically the same as normal shared pointers, just with a lock hiding inside the control block pointer.
Bonus reading: What it means when you convert between different shared_ptrs. Phantom and indulgent shared pointers.
Bonus viewing: A lock-free std::atomic<std::shared_ptr> (video). The presentation of the lock-free implementation begins at 27:50.
Bonus chatter: Since the atomic shared pointer is locked for all operations, you can think of it as having a std::mutex built in. You therefore get full serialization on both read and write operations.
But if your use of the shared_ptr is mostly-read, rarely-write, then you will probably get better performance with a shared_mutex, because a shared_mutex allows multiple owners in read (shared) mode, which allows multiple threads to copy the shared_ptr simultaneously, rather than making them wait for each other.
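The read-mostly alternative can be sketched as a small wrapper. RwSharedPtr is my name for a hypothetical class, not a standard type; it simply guards a plain shared_ptr with a std::shared_mutex.

```cpp
#include <memory>
#include <mutex>
#include <shared_mutex>

template <typename T>
class RwSharedPtr {
    std::shared_ptr<T> ptr_;
    mutable std::shared_mutex m_;
public:
    std::shared_ptr<T> load() const {
        std::shared_lock lk(m_);  // shared mode: many readers concurrently
        return ptr_;              // the copy bumps the strong count
    }
    void store(std::shared_ptr<T> p) {
        std::unique_lock lk(m_);  // exclusive mode: writers serialize
        ptr_ = std::move(p);
    }
};
```

Readers never block each other here; only a store() forces serialization, which matches the mostly-read, rarely-write access pattern.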
Bonus bonus chatter: The presence of an internal lock means that if one thread gets unscheduled while it holds the lock, all the other threads are unable to make progress. And gcc’s use of a spinlock rather than a blocking wait makes it vulnerable to priority inversion deadlocks: If the thread that owns the spinlock is running at a lower priority than the thread that is spinning waiting for the lock, the higher priority spinning thread will consume all the CPU waiting for the lower priority thread to release the lock. But the lower priority thread can’t release the lock because it’s getting starved of CPU by the higher priority spinning thread.
Bonus bonus bonus chatter: Wait, what about clang libcxx? Oh, as of this writing, clang libcxx hasn’t implemented atomic<shared_ptr<T>> yet.
Answer to exercise: The library controls the allocation of the control block, so it can ensure that the pointer is 4-byte aligned, thereby leaving two free bits for flags. On the other hand, the caller controls the stored pointer, and it might not be 4-byte aligned. (For example, it might be a pointer to a char.)
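The alignment argument can be demonstrated directly. The helper below is mine, not a library function; it relies on the guarantee that memory from operator new is aligned to at least alignof(std::max_align_t), which is 4 or more on mainstream platforms.

```cpp
#include <cstddef>
#include <cstdint>

// True if the bottom two bits of the pointer are zero, i.e. available
// for stealing as flag bits.
bool low_two_bits_free(const void* p) {
    return (reinterpret_cast<std::uintptr_t>(p) & 0b11) == 0;
}
```

A library-allocated control block always passes this check; a caller-supplied stored pointer, such as a pointer one byte into a char array, need not.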
While priority inversion is a real concern, from an atomic I’d expect operations to be as quick as possible. And a mutex-based implementation can be done manually.
The notify_all line in MSVC STL has a comment that says “Why can’t notify_one be used if we don’t count waiters?” I don’t know. Maybe you can read the original PR for clues.
Re where to steal bits from: even if the managed type has the correct alignment, stealing those bits would depend on being handed a correct pointer. That would prevent the user from stealing bits themselves or from using pointer types to store “handles” that don’t actually point to anything (I don’t know if that’s valid, but 100% someone does it and would complain if you broke them).
std::shared_ptr itself doesn’t seem to make any guarantees on alignment. As long as the user passes in a custom deleter, this would be perfectly valid.
I would guess that the main reason alignment was pointed out here is that pointers to char (character type, signed and unsigned), short (signed and unsigned) and wchar_t are a lot more common than bit stealing.