Not entirely sure what you mean. You might need to clarify that.
If you're worried about async compute or multiple dispatches causing issues, there's nothing magical about it: you have to deliberately use async compute before you need to worry.
As for synchronizing between all the threads and thread groups:
You can't do this within a single dispatch. You can only emulate it by queueing up a chain of dispatches, with intermediate kernels doing whatever synchronization you need based on the results of the previous ones.
If you aren't familiar with it, you should probably look into how prefix sums are used to compact buffers (stream compaction). It runs into exactly these sorts of problems, and it's a building block that gets used all the time.
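To make the "chain of dispatches" idea concrete, here is a minimal CPU-side sketch of prefix-sum stream compaction written in Python rather than shader code. Each `dispatch_*` function stands in for one compute dispatch, and the only global sync point is the boundary between calls, just like the end of a dispatch on the GPU. The three-pass split (count per group, scan the counts, scatter) is one common way to structure it, not the only one; `GROUP_SIZE` and all function names are hypothetical.

```python
"""CPU simulation of stream compaction as a chain of 'dispatches'.

Each dispatch_* function stands in for one compute-shader dispatch. The only
global synchronization point is the boundary between function calls,
mirroring the end of a dispatch on the GPU.
"""

GROUP_SIZE = 4  # hypothetical thread-group size


def dispatch_count(data, keep):
    """Dispatch 1: each group counts how many of its elements survive."""
    num_groups = (len(data) + GROUP_SIZE - 1) // GROUP_SIZE
    counts = [0] * num_groups
    for g in range(num_groups):
        chunk = data[g * GROUP_SIZE:(g + 1) * GROUP_SIZE]
        counts[g] = sum(1 for x in chunk if keep(x))
    return counts


def dispatch_scan(counts):
    """Dispatch 2: exclusive prefix sum over the per-group counts."""
    offsets = []
    running = 0
    for c in counts:
        offsets.append(running)
        running += c
    return offsets, running  # per-group write offsets, total survivors


def dispatch_scatter(data, keep, offsets, total):
    """Dispatch 3: each group writes its survivors starting at its offset."""
    out = [None] * total
    for g, base in enumerate(offsets):
        chunk = data[g * GROUP_SIZE:(g + 1) * GROUP_SIZE]
        # Sequential stand-in for the in-group prefix sum that gives
        # each surviving element its output slot.
        slot = base
        for x in chunk:
            if keep(x):
                out[slot] = x
                slot += 1
    return out


def compact(data, keep):
    counts = dispatch_count(data, keep)     # global sync: dispatch 1 ends
    offsets, total = dispatch_scan(counts)  # global sync: dispatch 2 ends
    return dispatch_scatter(data, keep, offsets, total)


if __name__ == "__main__":
    values = [3, -1, 7, 0, -5, 2, 9, -8, 4, 6]
    print(compact(values, lambda v: v > 0))  # [3, 7, 2, 9, 4, 6]
```

Dispatch 3 cannot start until every group has finished dispatch 1, because each group's write offset depends on the counts of all the groups before it. That cross-group dependency is exactly what a single dispatch cannot resolve on its own.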
All synchronization is group-based; there is no global sync other than when the compute shader returns (the dispatch finishes). Even then, all you can really synchronize is memory access.
If you have something like a coalesce that needs to happen for each group before you can move on to the next compute shader (or continue in the current one), it's fairly common to block and do it on the first thread of the group:
__appropriate_barrier__ (based on prior accesses)
if this_thread_ID == first_group_thread_ID then
    perform coalesce work
end if
__appropriate_barrier__ (based on accesses in the if; only needed if more work is to be done in the same kernel, and it must sit outside the if so every thread in the group reaches it)
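The pattern above can be mimicked on the CPU to show the control flow. In this Python sketch, threads stand in for the invocations of one thread group, a plain list stands in for groupshared memory, and `threading.Barrier` stands in for a group barrier such as HLSL's `GroupMemoryBarrierWithGroupSync()` or GLSL's `barrier()`. It is only an analogy that does not model GPU memory ordering. The "coalesce" here is assumed to be a simple sum of squares, and `GROUP_SIZE` and `run_group` are hypothetical names.

```python
"""CPU analogy of the 'first thread coalesces' pattern for one thread group.

Python threads stand in for the invocations of a single thread group, and
threading.Barrier stands in for a group barrier. Illustration only: this
does not model GPU memory ordering or scheduling.
"""
import threading

GROUP_SIZE = 8  # hypothetical number of threads in the group


def run_group(inputs):
    shared = [0] * GROUP_SIZE          # stands in for groupshared memory
    result = {}                        # coalesced value, written by thread 0 only
    seen_by_thread = [None] * GROUP_SIZE
    barrier = threading.Barrier(GROUP_SIZE)

    def thread_main(tid):
        shared[tid] = inputs[tid] * inputs[tid]  # each thread fills its own slot
        barrier.wait()                  # barrier: all writes to `shared` are done
        if tid == 0:                    # only the first thread does the coalesce
            result["total"] = sum(shared)
        barrier.wait()                  # outside the if, so every thread reaches it
        seen_by_thread[tid] = result["total"]  # now safe for all threads to read

    threads = [threading.Thread(target=thread_main, args=(t,))
               for t in range(GROUP_SIZE)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["total"], seen_by_thread


if __name__ == "__main__":
    total, seen = run_group(list(range(GROUP_SIZE)))
    print(total, seen)  # 140 [140, 140, 140, 140, 140, 140, 140, 140]
```

Keeping the second barrier outside the `if` matters: if only thread 0 waited on it, the other threads would never arrive and the group would deadlock (on a GPU, a barrier in divergent control flow is undefined behavior).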