Revision tags: v7.1.1, v7.0.4, v7.1.0, v6.6.5-MP1, v6.6.5, v6.6.4, v7.0.2, v7.0.1, v6.6.3, v7.0.0, v6.6.2, v6.5.2, v6.0.5, v6.6.1, v6.5.1-MP5, v6.6.0, v6.5.1-MP3, v6.5.1, v6.0.4 |
|
#
34688619 |
| 15-Jan-2020 |
James Harrison <james.harrison@couchbase.com> |
MB-37529: Permit locking of multiple ranges of seqList
Backfilling from the seqList requires a range of items to be "protected" from being updated and moved during the lifetime of the backfill. This
MB-37529: Permit locking of multiple ranges of seqList
Backfilling from the seqList requires a range of items to be "protected" from being updated and moved during the lifetime of the backfill. This is to ensure the resulting set of items is consistent at the end of the backfill.
To do this, backfills take a range read marking seqnos X to Y, and front end ops will not relocate items from inside this range. Instead, a new item will be appended, and the old one marked as stale, to be cleaned up later after the backfill has completed.
However, previously only a single range read was allowed at a given time. This means that an ephemeral vbucket can only serve a single backfill at a time, all others will be delayed. This can have a significant impact, given that a vbucket might need to backfill for 3 replicas, indexer nodes, the projector for XCDR etc. all at the same time. This is made more likely under high memory usage, if cursor dropping is triggered (as in-memory streams will be forced back to backfill).
This patch allows concurrent read-only access to ranges of the seqList, allowing concurrent backfills.
There is also specific support for exclusive write access to ranges of the seqList; this is required for tombstone purging. Exclusive range locks block shared range locks that would overlap the exclusive range. Exclusive locks will not succeed if a shared range lock exists which would overlap it.
As described above, range locks still do not block front end ops.
The "locking" is read-preferring; as a consequence the tombstone purger will be prevented from locking the needed seqno range as long as a single backfill holds a lock covering any of those seqnos. As constant backfills are not normally expected read-preference is acceptable for now. However, given the importance of purging tombstones/stale items in terms of managing memory usage (as noted briefly in MB-25218) write-preference in this situation may be worth assessing.
Change-Id: I17bd544f091415bc6ae336551d92ddb2104f51ca Reviewed-on: http://review.couchbase.org/121952 Well-Formed: Build Bot <build@couchbase.com> Reviewed-by: Dave Rigby <daver@couchbase.com> Tested-by: Build Bot <build@couchbase.com>
show more ...
|
Revision tags: v6.5.0, v6.0.3, v5.5.6 |
|
#
f6ed4896 |
| 09-Aug-2019 |
Dave Rigby <daver@couchbase.com> |
MB-35458 [SR]: Move SyncWrite completion to bg DurabilityCompletionTask
[[Re-apply after fixing error in DurabilityCompletionTask::run (skipping last vBucket).]]
Change how SyncWrites which are
MB-35458 [SR]: Move SyncWrite completion to bg DurabilityCompletionTask
[[Re-apply after fixing error in DurabilityCompletionTask::run (skipping last vBucket).]]
Change how SyncWrites which are Resolved and awaiting Completion are handled, by moving the final VBucket::commit() / abort() into a background task - DurabilityCompletionTask.
+Background+
There are two reasons for making this change:
a) Performance - specifically latency of front-end worker threads.
By moving completion into a background task, we reduce the amount of work done on the thread which actually detected the SyncWrite was resolved - typically the front-end DCP threads when a DCP_SEQNO_ACK is processed. Given that we SEQNO_ACK at the end of Snapshot, A single SEQNO_ACK could result in committing multiple SyncWrites. Committing one SyncWrite is similar to a normal front-end Set operation, so there is potentially a non-trivial amount of work needed to be done when completing SyncWrites, which could tie up the front-end thread (causing other Connections to have to wait) for a noticable amount of time.
b) Simplification of lock management.
Doing completion in a background task simplifies lock management, for example we avoid lock inversions with earlier locks acquired during dcpSeqnoAck when attemping to later call notifySeqnoAvailable when this was done on the original thread.
+Problem+
While (a) was the first reason identified for making this change (see MB-33092), (b) is the reason this change is being made now. During testing the following lock-order-inversion was seen:
WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) Cycle in lock order graph:
Stream::streamMutex => StreamContainer::rwlock => Stream::streamMutex
The crux of the issue is the processing of DCP_SEQNO_ACKNOWLEDGED messages by the DcpProducer - this acquires the Stream::streamMutex before calling VBucket::seqnoAcknowledged(), however that function currently results in VBucket::commit() being called to synchronously complete the SyncWrite; which in turn must nodify all connected replica that a new seqno is available, requiring StreamContainer::rwlock to be acquired:
Mutex StreamContainer::rwlock acquired here while holding mutex Stream::streamMutex in thread T15: ... #6 StreamContainer<std::shared_ptr<Stream> >::rlock() #7 DcpProducer::notifySeqnoAvailable(Vbid, unsigned long) ... #13 VBucket::commit(...) #14 ActiveDurabilityMonitor::commit(...) #15 ActiveDurabilityMonitor::processCompletedSyncWriteQueue() #16 ActiveDurabilityMonitor::seqnoAckReceived(...) #17 VBucket::seqnoAcknowledged(...) #18 ActiveStream::seqnoAck(...) #19 DcpProducer::seqno_acknowledged(...) ...
Mutex Stream::streamMutex previously acquired by the same thread here: ... #3 std::lock_guard<std::mutex>::lock_guard(std::mutex&) #4 ActiveStream::seqnoAck(...) #5 DcpProducer::seqno_acknowledged(...) ...
This conflicts with the ordering seen when sending items out on the DCP connection - inside DcpProducer::step() where the StreamContainer::rwlock is acquired first, then ActiveStream::mutex acquired later:
Mutex Stream::streamMutex acquired here while holding mutex StreamContainer::rwlock in thread T15: ... #3 std::lock_guard<std::mutex>::lock_guard(std::mutex&) #4 ActiveStream::next() #5 DcpProducer::getNextItem() #6 DcpProducer::step(dcp_message_producers*) ...
Mutex StreamContainer::rwlock previously acquired by the same thread here: #0 pthread_rwlock_rdlock <null> (libtsan.so.0+0x00000002c98b) ... #4 std::shared_lock<cb::RWLock>::shared_lock(cb::RWLock&) #5 StreamContainer<>::ResumableIterationHandle::ResumableIterationHandle() #6 StreamContainer<>::startResumable() #7 DcpProducer::getNextItem() #8 DcpProducer::step(dcp_message_producers*) ...
+Solution+
The processing of resolved SyncWrites moved into a new background task. Instead of immediately processing them within ActiveDM::seqnoAckReceived(), that function notifies the new NonIO DurabilityCompletionTask that there are SyncWrites waiting for completion.
DurabilityCompletionTask maintains a bool per vBucket indicating if there are SyncWrites for that vBucket pending completion. When the task is run, for each flag which is true it calls VBucket::processResolvedSyncWrites() for the associated VBucket.
+Implementation Notes+
Currently there is just a single DurabilityCompletionTask (per Bucket), this was chosen as 1 task per vBucket (i.e. 1024 per Bucket) would be inefficient for our current background task scheduler (both in terms of latency to schedule each task for only one vBucket's worth of work, and in terms of managing that many tasks in the future queue).
However, that does _potentially_ mean there's fewer resources (threads) available to complete SyncWrites on - previously that work could be done concurrently on all frontend threads (~O(num_cpus). Now the same work only has 1 thread available to run on (there's only a single DurabilityCompletionTask).
_If_ this becomes a bottleneck we could look at increasing the number of DurabilityCompletionTask - e.g. sharding all vBuckets across multiple tasks like flusher / bgfetcher.
Change-Id: I33ecfa78b03b4d2120b5d05f54984b24ce038fd8 Reviewed-on: http://review.couchbase.org/113749 Reviewed-by: Ben Huddleston <ben.huddleston@couchbase.com> Tested-by: Build Bot <build@couchbase.com>
show more ...
|
#
e39b54f7 |
| 21-Aug-2019 |
Dave Rigby <daver@couchbase.com> |
MB-35629: Revert "MB-35458 [SR]: Move SyncWrite completion to bg DurabilityCompletionTask"
The introduction of the background commit has introduced a (severe) performance regression - SyncWrites are
MB-35629: Revert "MB-35458 [SR]: Move SyncWrite completion to bg DurabilityCompletionTask"
The introduction of the background commit has introduced a (severe) performance regression - SyncWrites are getting stuck and timing out. Suspect an issue with the wakeup / notificaiotn of the DurabilityCompletionTask; reverting change to restore performance while investigating.
This reverts commit 7fdb98a995f6298678e3ac7b443f9454e21ffae1.
Change-Id: I8d08044d3f6ad3084f7724ead961b8d7530006f1 Reviewed-on: http://review.couchbase.org/113637 Reviewed-by: Ben Huddleston <ben.huddleston@couchbase.com> Tested-by: Build Bot <build@couchbase.com>
show more ...
|
#
7fdb98a9 |
| 09-Aug-2019 |
Dave Rigby <daver@couchbase.com> |
MB-35458 [SR]: Move SyncWrite completion to bg DurabilityCompletionTask
Change how SyncWrites which are Resolved and awaiting Completion are handled, by moving the final VBucket::commit() / abort()
MB-35458 [SR]: Move SyncWrite completion to bg DurabilityCompletionTask
Change how SyncWrites which are Resolved and awaiting Completion are handled, by moving the final VBucket::commit() / abort() into a background task - DurabilityCompletionTask.
+Background+
There are two reasons for making this change:
a) Performance - specifically latency of front-end worker threads.
By moving completion into a background task, we reduce the amount of work done on the thread which actually detected the SyncWrite was resolved - typically the front-end DCP threads when a DCP_SEQNO_ACK is processed. Given that we SEQNO_ACK at the end of Snapshot, A single SEQNO_ACK could result in committing multiple SyncWrites. Committing one SyncWrite is similar to a normal front-end Set operation, so there is potentially a non-trivial amount of work needed to be done when completing SyncWrites, which could tie up the front-end thread (causing other Connections to have to wait) for a noticable amount of time.
b) Simplification of lock management.
Doing completion in a background task simplifies lock management, for example we avoid lock inversions with earlier locks acquired during dcpSeqnoAck when attemping to later call notifySeqnoAvailable when this was done on the original thread.
+Problem+
While (a) was the first reason identified for making this change (see MB-33092), (b) is the reason this change is being made now. During testing the following lock-order-inversion was seen:
WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) Cycle in lock order graph:
Stream::streamMutex => StreamContainer::rwlock => Stream::streamMutex
The crux of the issue is the processing of DCP_SEQNO_ACKNOWLEDGED messages by the DcpProducer - this acquires the Stream::streamMutex before calling VBucket::seqnoAcknowledged(), however that function currently results in VBucket::commit() being called to synchronously complete the SyncWrite; which in turn must nodify all connected replica that a new seqno is available, requiring StreamContainer::rwlock to be acquired:
Mutex StreamContainer::rwlock acquired here while holding mutex Stream::streamMutex in thread T15: ... #6 StreamContainer<std::shared_ptr<Stream> >::rlock() #7 DcpProducer::notifySeqnoAvailable(Vbid, unsigned long) ... #13 VBucket::commit(...) #14 ActiveDurabilityMonitor::commit(...) #15 ActiveDurabilityMonitor::processCompletedSyncWriteQueue() #16 ActiveDurabilityMonitor::seqnoAckReceived(...) #17 VBucket::seqnoAcknowledged(...) #18 ActiveStream::seqnoAck(...) #19 DcpProducer::seqno_acknowledged(...) ...
Mutex Stream::streamMutex previously acquired by the same thread here: ... #3 std::lock_guard<std::mutex>::lock_guard(std::mutex&) #4 ActiveStream::seqnoAck(...) #5 DcpProducer::seqno_acknowledged(...) ...
This conflicts with the ordering seen when sending items out on the DCP connection - inside DcpProducer::step() where the StreamContainer::rwlock is acquired first, then ActiveStream::mutex acquired later:
Mutex Stream::streamMutex acquired here while holding mutex StreamContainer::rwlock in thread T15: ... #3 std::lock_guard<std::mutex>::lock_guard(std::mutex&) #4 ActiveStream::next() #5 DcpProducer::getNextItem() #6 DcpProducer::step(dcp_message_producers*) ...
Mutex StreamContainer::rwlock previously acquired by the same thread here: #0 pthread_rwlock_rdlock <null> (libtsan.so.0+0x00000002c98b) ... #4 std::shared_lock<cb::RWLock>::shared_lock(cb::RWLock&) #5 StreamContainer<>::ResumableIterationHandle::ResumableIterationHandle() #6 StreamContainer<>::startResumable() #7 DcpProducer::getNextItem() #8 DcpProducer::step(dcp_message_producers*) ...
+Solution+
The processing of resolved SyncWrites moved into a new background task. Instead of immediately processing them within ActiveDM::seqnoAckReceived(), that function notifies the new NonIO DurabilityCompletionTask that there are SyncWrites waiting for completion.
DurabilityCompletionTask maintains a bool per vBucket indicating if there are SyncWrites for that vBucket pending completion. When the task is run, for each flag which is true it calls VBucket::processResolvedSyncWrites() for the associated VBucket.
+Implementaiton Notes+
Currently there is just a single DurabilityCompletionTask (per Bucket), this was chosen as 1 task per vBucket (i.e. 1024 per Bucket) would be inefficient for our current background task scheduler (both in terms of latency to schedule each task for only one vBucket's worth of work, and in terms of managing that many tasks in the future queue).
However, that does _potentially_ mean there's fewer resources (threads) available to complete SyncWrites on - previously that work could be done concurrently on all frontend threads (~O(num_cpus). Now the same work only has 1 thread available to run on (there's only a single DurabilityCompletionTask).
_If_ this becomes a bottleneck we could look at increasing the number of DurabilityCompletionTask - e.g. sharding all vBuckets across multiple tasks like flusher / bgfetcher.
Change-Id: I87897af1e3fd0a57e5abc2cb1ba9f795a9d3c63e Reviewed-on: http://review.couchbase.org/113141 Tested-by: Build Bot <build@couchbase.com> Reviewed-by: Ben Huddleston <ben.huddleston@couchbase.com>
show more ...
|
Revision tags: v6.0.2, v5.5.5 |
|
#
5954243e |
| 15-May-2019 |
Ben Huddleston <ben.huddleston@couchbase.com> |
Use MockEphemeralVBucket in VBucketDurabilityTest
To add some commit and SyncDelete tests in the future, we need to be able to manually check seqList stats and correct the collections stats for Ephe
Use MockEphemeralVBucket in VBucketDurabilityTest
To add some commit and SyncDelete tests in the future, we need to be able to manually check seqList stats and correct the collections stats for Ephemeral to prevent counter underflow assertions. The underflow occurs because we don't hit the public functions in VBucketDurabilityTest (we would need an engine to do so) we miss a call to update the collections stats on a set.
Change-Id: If72a0b97c9b56415d68caec1fadb22e143ac63aa Reviewed-on: http://review.couchbase.org/109199 Tested-by: Build Bot <build@couchbase.com> Reviewed-by: Dave Rigby <daver@couchbase.com>
show more ...
|
#
bfdd494d |
| 30-Apr-2019 |
Dave Rigby <daver@couchbase.com> |
Convert item_eviction_policy_t to enum class
Change-Id: Id5aaf7db03703fb9ca1b2df733bcb2d44a1f3dab Reviewed-on: http://review.couchbase.org/108449 Tested-by: Build Bot <build@couchbase.com> Reviewed-
Convert item_eviction_policy_t to enum class
Change-Id: Id5aaf7db03703fb9ca1b2df733bcb2d44a1f3dab Reviewed-on: http://review.couchbase.org/108449 Tested-by: Build Bot <build@couchbase.com> Reviewed-by: Daniel Owen <owend@couchbase.com>
show more ...
|
#
bfe28015 |
| 23-Apr-2019 |
Paolo Cocchi <paolo.cocchi@couchbase.com> |
MB-33860 [SR]: Introduce the SeqnoAckCallback
This is a pre-requirement for the PassiveDurabilityMonitor to call back to VBucket for sending a SeqnoAck message on the PassiveStream. Note that new ca
MB-33860 [SR]: Introduce the SeqnoAckCallback
This is a pre-requirement for the PassiveDurabilityMonitor to call back to VBucket for sending a SeqnoAck message on the PassiveStream. Note that new callback is not used yet, so there is no test in this patch. The callback will be used and tested in follow-up patches where the PassiveDM will actually trigger seqno-acks at high_prepared_seqno updates.
Change-Id: I03b1030257e1a92d96cc173e9bd260a6df0b346b Reviewed-on: http://review.couchbase.org/108149 Reviewed-by: Dave Rigby <daver@couchbase.com> Tested-by: Dave Rigby <daver@couchbase.com>
show more ...
|
Revision tags: v5.5.4 |
|
#
04c1c8ec |
| 20-Mar-2019 |
Trond Norbye <trond.norbye@gmail.com> |
Remove config.h
Change-Id: I79eb8c762971255db9d36a5f6461a8a6d0f29249 Reviewed-on: http://review.couchbase.org/106517 Tested-by: Build Bot <build@couchbase.com> Reviewed-by: Dave Rigby <daver@couchba
Remove config.h
Change-Id: I79eb8c762971255db9d36a5f6461a8a6d0f29249 Reviewed-on: http://review.couchbase.org/106517 Tested-by: Build Bot <build@couchbase.com> Reviewed-by: Dave Rigby <daver@couchbase.com>
show more ...
|
Revision tags: v6.0.1 |
|
#
ec5434bf |
| 08-Jan-2019 |
Dave Rigby <daver@couchbase.com> |
MB-32535 [SR]: Notify SyncWrite client when request complete
When a SyncWrite request is complete, notify the client which make the request so the response can be sent back.
To do this we use notif
MB-32535 [SR]: Notify SyncWrite client when request complete
When a SyncWrite request is complete, notify the client which make the request so the response can be sent back.
To do this we use notifyIOComplete() method on the engine; however unlike other existing instances where the notification is done by a background task; we are calling it from the DurabilityMonitor which doesn't have direct access to the engine instance. Additionally unlike existing EWOULDBLOCK notification for stores this isn't a idempotent operation - if we performed the store() again then that would cause /another/ mutation to occur. To solve this, we:
1. Define a new SyncWriteCompleted callback object, an instance of which is passed into VBucket when it is created. This is typically instantiated with a function / lambda from KVBucket which calls engine.notifyIOComplete().
2. From VBucket::notifyClientOfCommit() this callback object is invoked.
3. Inside EPEngine::storeIfInner() if this is a SyncWrite operation, use engineSpecific to record that this is a SyncWrite which has successfully been prepared, and that the next call to store() should simply return the completed status & CAS of the now-committed item.
Change-Id: I38a6dae3aeb98dcedd1ae1ec321db054ff1b7cd7 Reviewed-on: http://review.couchbase.org/103446 Reviewed-by: Paolo Cocchi <paolo.cocchi@couchbase.com> Tested-by: Build Bot <build@couchbase.com>
show more ...
|
Revision tags: v5.5.3, v5.1.3, v6.0.0 |
|
#
6cfd6a6c |
| 14-Sep-2018 |
Dave Rigby <daver@couchbase.com> |
Ensure headers 'include what they use'
As a follow-up from the previous commit; now we have the ability to compile each header standalone, fix any missing #includes where header didn't fully define
Ensure headers 'include what they use'
As a follow-up from the previous commit; now we have the ability to compile each header standalone, fix any missing #includes where header didn't fully define all headers it requires.
Change-Id: Ib3d7e8274b0e736a52c83e8332891bd70f23c729 Reviewed-on: http://review.couchbase.org/99723 Reviewed-by: Daniel Owen <owend@couchbase.com> Reviewed-by: Trond Norbye <trond.norbye@gmail.com> Tested-by: Build Bot <build@couchbase.com>
show more ...
|
Revision tags: v5.5.2 |
|
#
12561b64 |
| 03-Sep-2018 |
Christopher Farman <christopher.farman@couchbase.com> |
MB-30552: Replace VBucket(Map)::id_type with Vbid type
Seeming as VBucket::id_type was changed from uint16_t to Vbid, the use of VBucket::id_type can be clarified throughout the codebase by changing
MB-30552: Replace VBucket(Map)::id_type with Vbid type
Seeming as VBucket::id_type was changed from uint16_t to Vbid, the use of VBucket::id_type can be clarified throughout the codebase by changing to the shorter, consistent type 'Vbid'. Following suit, VBucketMap::id_type has been done in the same patch to avoid potential confusion. Further, the addtion of Vbid::id_type can be used if the implemented type of Vbid is needed.
Change-Id: I2d41f75fd9e6684fb425b1fe7d17ae0e6a64e2e3 Reviewed-on: http://review.couchbase.org/99143 Reviewed-by: Dave Rigby <daver@couchbase.com> Tested-by: Build Bot <build@couchbase.com>
show more ...
|
Revision tags: v5.5.1, v5.1.2, v5.5.0, v5.1.1, v5.1.0, v5.0.1, v5.0.0 |
|
#
2aeac7ff |
| 14-Jun-2017 |
Dave Rigby <daver@couchbase.com> |
MB-24765: Make EphTombstoneHTCleaner visit HashTable incrementally
Change the EphTombstoneHTCleaner task to visit the HashTable incrementally, yielding back to the scheduler every ephemeral_metadata
MB-24765: Make EphTombstoneHTCleaner visit HashTable incrementally
Change the EphTombstoneHTCleaner task to visit the HashTable incrementally, yielding back to the scheduler every ephemeral_metadata_purge_chunk_duration (default: 20ms). This significantly reduces the impact this background task may have on the scheduling of other NonIO tasks, given that prevously it was seen to take a number of seconds to run.
This is implemented in a similar way to the DefragmenterTask - use HashTable::pauseResumeVisit() instead of ::visit() to perform the visitation of each HashTable (which allows the visitor to yield and pause execution).
One difference with the DefragmenterTask is that EphTombstoneHTCleaner does *not* snooze between runs if it hasn't yet completed - it simply returns true, which will cause the scheduler to re-schedule it as soon as possible. The reason for this is that tombstone purging can make a material difference on the size of data stored - and hence we want to complete a visitation of the whole bucket as quickly as possible.
Change-Id: If07eb6a0845dffa6e49670731fab01479f266c06 Reviewed-on: http://review.couchbase.org/79555 Reviewed-by: Manu Dhundi <manu@couchbase.com> Tested-by: Build Bot <build@couchbase.com>
show more ...
|
#
467c4742 |
| 31-May-2017 |
Manu Dhundi <manu@couchbase.com> |
MB-24470: Do not add temp items to the sequence list
In certain cases of delete_with_meta() we add a short lived temp item in the hashtable and then immediately generate a valid seqno to it, hence m
MB-24470: Do not add temp items to the sequence list
In certain cases of delete_with_meta() we add a short lived temp item in the hashtable and then immediately generate a valid seqno to it, hence making it a non-temp item. In Ephemeral buckets, this item while temp is put onto the sequence list with a negative seqno (temp items have negative seqno in ep-engine). Having a negative seqno item in the sequence list can cause problems for range (sequential) readers like DCP, tombstone purge.
This commit ensures that we do not add to temp item on the sequence list. Temp item is still added on the hash table of legacy reasons. When the temp item is updated, that is made "non-temp", we put it onto the sequence list under the "list write lock" and generate a valid sequence number to it.
The commit also adds some module tests for the same.
Change-Id: I8a98cee401af11f59858d0a2f39cff3cbd540cbf Reviewed-on: http://review.couchbase.org/78689 Reviewed-by: Dave Rigby <daver@couchbase.com> Tested-by: Build Bot <build@couchbase.com>
show more ...
|
#
ef22f9b0 |
| 25-May-2017 |
Dave Rigby <daver@couchbase.com> |
Move ep-engine to engines/ep
|