History log of /6.0.3/kv_engine/engines/ep/src/taskqueue.cc (Results 1 - 25 of 56)
Revision Date Author Comments
Revision tags: v7.0.2, v6.6.3, v7.0.1, v7.0.0, v6.6.2, v6.5.2, v6.6.1, v6.0.5, v6.6.0, v6.5.1, v6.0.4, v6.5.0, v6.0.3
# e7ed2862 27-Aug-2019 Dave Rigby <daver@couchbase.com>

MB-35649: Remove ExecutorThread::waketime

Since the fix for MB-35649, this member variable is no longer used for
thread scheduling (wakeup); sleep time is now directly calculated from
the futureQueue contents.

As such remove it.

Change-Id: Iaff9f9e7d19f12416000dd3a9b31bb41d5e9e12d
Reviewed-on: http://review.couchbase.org/114231
Well-Formed: Build Bot <build@couchbase.com>
Tested-by: Build Bot <build@couchbase.com>
Reviewed-by: James Harrison <james.harrison@couchbase.com>


# e629ac43 23-Aug-2019 Dave Rigby <daver@couchbase.com>

MB-35649: Fix lost (delayed) wakeups in ep background tasks

+Summary+

The ExecutorPool scheduler which manages all the background tasks
suffers from a race condition in task wakeup which results in lost
task wakeups - i.e. tasks which should be woken immediately are not
woken until the next ExecutorThread polling interval (default 2s).

Amongst the tasks which suffer from this bug are
ActiveStreamCheckpointProcessorTask - recall this is the task
responsible for preparing outbound DCP messages to send to
replicas. As such, this bug results in intermittent 2s "pauses" in
SyncWrites completing (for #replicas > 0), as the DCP_PREPARE messages
are delayed in being sent out by 2s.

By fixing this bug we *significantly* improve the tail scheduler
wakeup latencies (i.e. how 'late' tasks are run compared to when the
task wanted to run), by up to 1000x (3 orders of magnitude).

+Problem+

The bug is related to how an ExecutorThread's sleep time is
calculated, in the presence of an external caller waking up the task
after it has completed.

Consider a task like ActiveStreamCheckpointProcessorTask (others
behave the same).

* In an idle bucket it will be set to sleep forever as there will be
no DCP mutations to prepare - the Task will be in the AuxIO
futureQueue with a waketime of <FOREVER>.

* When new mutations occur in the frontend, the Task will be woken up
(via TaskQueue::_wake()) which:

a) Updates the Task's waketime to <NOW>.

b) Re-sorts the futureQueue if necessary, ensuring the soonest task
(either ActiveStreamCheckpointProcessorTask or another task just
woken) is at the front of the queue.

c) If any threads of the given type (AuxIO) are currently sleeping,
then wakes up threads equal to the number of tasks ready to
execute.

d) Those sleeping thread(s) will then check the futureQueue and pick
the next task to run.

This works correctly assuming that at least 1 AuxIO thread is sleeping
(in fact for simplicity it's sufficient to assume there's only 1 AuxIO
thread) - at (d) the sleeping thread is woken, and the Task is run().

However, consider what happens if the Task has /just/ finished a
previous run() iteration just before the wake() call occurs - there is
a race which results in a lost (delayed) wakeup:

<<<AuxIO Thread>>>

1) Assume Task::run() has just returned to ExecutorThread::run().

2) Task is reschedule()'d back in the futureQueue with a waketime of
<FOREVER>. Assume no other tasks are in the futureQueue.

3) setWaketime() is called with futureQueue min waketime - <FOREVER>.

4) TaskQueue::_nextTask() called. There are no tasks ready to run, so
it is about to call TaskQueue::_sleepThenFetchNextTask()...

<<<Waking thread>>>

5) TaskQueue::_wake() called, which acquires
TaskQueue::mutex, and performs operations
described above, crucially moving Task to the
front of the futureQueue. FutureQueue min waketime
= <NOW>. Releases TaskQueue::mutex.

<<<AuxIO Thread>>>

6) ... calls TaskQueue::_sleepThenFetchNextTask(), acquires
TaskQueue::mutex and calls _doSleep. This reads the thread's
waketime (as set at 3) which is <FOREVER>, then calculates the
Thread's sleep time as min(2, minWaketime) - i.e. min(2, FOREVER)
or 2 seconds.

7) Thread goes to sleep for 2 seconds, even though there's a task in
futureQueue expecting to be run <NOW>.

<END>

While this is a complex scheduling system, arguably the crux of the
problem is the use of an independent member variable
(ExecutorThread::waketime) to determine when to wake the thread, which
is set independently of when the futureQueue is modified (the
futureQueue contents ultimately dictate when a thread should be
woken).

+Solution+

Instead of pre-calculating a Thread's waketime and then applying it
later on, simply calculate the waketime directly from the contents of
the futureQueue just before we sleep - and crucially with the
TaskQueue::mutex held (which guards the futureQueue), preventing any
other threads racing to update it.

(A follow-up patch will remove ExecutorThread::waketime as it is now
unused).
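
The approach described above can be sketched roughly as follows. This is a simplified illustration and not the actual kv_engine code: SketchTaskQueue, addWaketime, sleepTimeLocked, sleepUntilDue and kMaxSleep are hypothetical names, and the futureQueue is reduced to a min-heap of wake times. The point it shows is that the sleep duration is derived from the queue contents while the queue's mutex is held, so a concurrent wake cannot slip in between calculating the sleep time and going to sleep.

```cpp
#include <algorithm>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <vector>

using Clock = std::chrono::steady_clock;

// The 2s upper bound on a single sleep mentioned in the commit message.
constexpr std::chrono::seconds kMaxSleep{2};

// Hypothetical, heavily simplified stand-in for TaskQueue.
class SketchTaskQueue {
public:
    // Record a wake time and notify a sleeping thread (simplified wake path).
    void addWaketime(Clock::time_point waketime) {
        std::lock_guard<std::mutex> lh(mutex);
        futureQueue.push(waketime);
        cv.notify_one();
    }

    // Sleep duration derived from the queue contents.
    // The caller must already hold 'mutex'.
    Clock::duration sleepTimeLocked(Clock::time_point now) const {
        const Clock::duration cap = kMaxSleep;
        if (futureQueue.empty()) {
            return cap;
        }
        const Clock::duration untilNext = futureQueue.top() - now;
        if (untilNext <= Clock::duration::zero()) {
            return Clock::duration::zero(); // a task is already due
        }
        return std::min(untilNext, cap);
    }

    // Convenience wrapper which takes the lock itself.
    Clock::duration sleepTime(Clock::time_point now) const {
        std::lock_guard<std::mutex> lh(mutex);
        return sleepTimeLocked(now);
    }

    // Calculate the sleep time and sleep without releasing the mutex in
    // between; wait_for() releases it atomically while blocked.
    void sleepUntilDue() {
        std::unique_lock<std::mutex> lh(mutex);
        cv.wait_for(lh, sleepTimeLocked(Clock::now()));
    }

private:
    mutable std::mutex mutex;
    std::condition_variable cv;
    // Min-heap: the soonest wake time is at top().
    std::priority_queue<Clock::time_point,
                        std::vector<Clock::time_point>,
                        std::greater<Clock::time_point>>
            futureQueue;
};
```

In this sketch a wake time of Clock::time_point::max() plays the role of <FOREVER>: it yields the 2s cap, whereas a wake time of "now" yields a sleep of zero.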

+Alternative Solutions+

- We _could_ possibly extend the scope of the TaskQueue::mutex to also
guard ExecutorThread::waketime, however this feels complex and
breaks encapsulation - one object's mutex is guarding state from
another. As such this approach wasn't seriously considered.

+Results+

Scheduler wait time histograms (i.e. how long a task waited to run after
it expected to start), ~100,000 samples for each:

Before:

ActiveStreamCheckpointProcessorTask[auxIO] (101277 total)
...
12us - 14us : ( 52.2389%) 7852 ███████████████████▏
...
82us - 86us : ( 99.0166%) 142 ▎
...
494us - 510us : ( 99.9832%) 1
510us - 2031ms : (100.0000%) 17
Avg : ( 315us)

* p50 = 14us, p99 = 86us, p99.99 = 2,031,000us, p100= 2,031,000us

After:

ActiveStreamCheckpointProcessorTask[auxIO] (100820 total)
...
17us - 20us : ( 54.7897%) 13725 █████████████████████████████████▊
...
110us - 118us : ( 99.1232%) 218 ▌
...
830us - 862us : ( 99.9901%) 1
...
1854us - 2302us : (100.0000%) 1
Avg : ( 26us)

* p50 = 20us, p99 = 118us, p99.99 = 862us, p100= 2302us

Note that p99.99 is ~1000x better (862 microseconds vs 2031
/milli/seconds).

Additionally the op/s observed with SyncWrites(level=majority) are
much smoother - we no longer see the op/s rate drop to zero for ~2s at
a time; they are ~constant.

+Further Work+

Arguably the only reason this problem has gone unnoticed for so long
is the 2s "fallback" upper bound applied to an ExecutorThread's
sleep time - while the wakeup is lost, the problem will self-correct
after 2s. If we hadn't had that "feature" then we would have
spotted this problem long ago (would have manifested as DCP streams
stopping forever and never recovering).

In the interests of creating a targeted, minimal fix, and given we've
had this fallback in place for a long time I'm not removing it in
this patch, however it _should_ be done in a follow-up.

Change-Id: Ie8569589f521c326b4f357bed1ad27c0790e7e88
Reviewed-on: http://review.couchbase.org/113769
Reviewed-by: James Harrison <james.harrison@couchbase.com>
Tested-by: Build Bot <build@couchbase.com>
Reviewed-by: Trond Norbye <trond.norbye@couchbase.com>
Reviewed-by: Jim Walker <jim@couchbase.com>
(cherry picked from commit 32c0ae9f18a36552761eb307c6fe8ecf349b6f7c)
Reviewed-on: http://review.couchbase.org/114121
Well-Formed: Build Bot <build@couchbase.com>


Revision tags: v5.5.4, v5.5.5, v5.5.6, v6.0.1, v5.5.3, v6.0.0, v5.1.3, v5.5.2, v5.5.1, v5.1.2, v5.1.1, v5.0.1, v5.1.0, v5.0.0
# fe6b9e03 12-Jul-2017 Dave Rigby <daver@couchbase.com>

MB-23797: Revert "Introduce testing exception for rescheduled dead tasks"

In http://review.couchbase.org/#/c/76394/ the ability to reschedule a
cancelled (and now in state TASK_DEAD) GlobalTask was fixed.

It did not appear that any tasks other than the ItemPager for
ephemeral buckets are rescheduled once dead, but to avoid changing
existing behaviour a "paranoia" check was introduced (6e6429b) to
throw an exception if any task other than the ItemPager is
rescheduled, to bring them to our attention.

This patch has had 3 months of soak time, and hasn't exposed any
unexpected wakeups, but to ensure we don't fire such exceptions in
production the check is being reverted.

This reverts commit 6e6429b0b1e20aa44211452198a7b5c80b9ae835.

Change-Id: I82f8d4068e03c451493ef7ac17bd3d21c37f4166
Reviewed-on: http://review.couchbase.org/80594
Tested-by: Build Bot <build@couchbase.com>
Reviewed-by: Daniel Owen <owend@couchbase.com>


# ef22f9b0 25-May-2017 Dave Rigby <daver@couchbase.com>

Move ep-engine to engines/ep


# 6e6429b0 07-Apr-2017 James Harrison <00jamesh@gmail.com>

Introduce testing exception for rescheduled dead tasks

In http://review.couchbase.org/#/c/76394/ the ability to reschedule a
cancelled (and now in state TASK_DEAD) GlobalTask was fixed.

It does not appear that any tasks other than the ItemPager for ephemeral
buckets are rescheduled once dead, but to avoid changing existing
behaviour this introduces an exception if any task other than the
ItemPager is rescheduled, to bring them to our attention.

NB: This patch will be reverted to remove this exception for Spock.
(MB-23797)

Change-Id: If44b7cf8a71c3dc4d85fba98d65c4f608d449460
Reviewed-on: http://review.couchbase.org/76551
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: Build Bot <build@couchbase.com>


# 66b2fdae 06-Apr-2017 James Harrison <00jamesh@gmail.com>

MB-23750: Fix toggling ephemeral ejection policy

If going

auto_delete -> fail_new_data -> auto_delete

the itemPager task was rescheduled, but as it had previously been
cancelled the state remained TASK_DEAD, and so ExecutorThread::run
behaves as if the task has just been cancelled and needs cleaning up.
By resetting scheduled tasks back to TASK_RUNNING if they are dead, this
can be avoided.
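
A minimal sketch of the state reset described above, under stated assumptions: only the state names TASK_RUNNING and TASK_DEAD come from the commit message, while SketchGlobalTask, cancel and scheduleSketch are hypothetical names, and the real scheduling path does considerably more than this.

```cpp
#include <atomic>

// Only these two states are needed for the illustration.
enum SketchTaskState { TASK_RUNNING, TASK_DEAD };

// Hypothetical stand-in for GlobalTask.
struct SketchGlobalTask {
    std::atomic<SketchTaskState> state{TASK_RUNNING};

    void cancel() {
        state.store(TASK_DEAD);
    }
};

// Hypothetical schedule path: a task which was cancelled earlier is put
// back into the running state before it is queued again, so the executor
// does not treat it as freshly cancelled.
inline void scheduleSketch(SketchGlobalTask& task) {
    SketchTaskState expected = TASK_DEAD;
    // Only changes the state if it is currently TASK_DEAD.
    task.state.compare_exchange_strong(expected, TASK_RUNNING);
    // ... the task would be added to the futureQueue here ...
}
```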

Change-Id: Id007a15fdaeb80a79828da0cade031a424e653cf
Reviewed-on: http://review.couchbase.org/76394
Tested-by: Build Bot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>


Revision tags: v4.6.2_ep
# d26ea7d1 20-Mar-2017 Dave Rigby <daver@couchbase.com>

Change GlobalTask::getDescription to return cb::const_char_buffer

The pure virtual method GlobalTask::getDescription() returns a
std::string object. This method is actually called very frequently by
the ExecutorThreads (to log task execution / slow tasks), and hence
passing a concrete object is potentially costly, requiring an
allocation (and then deletion) for each call.

Given that the description is mostly constant for the lifetime of a
Task, change the method to return cb::const_char_buffer - i.e. a
read-only non-owning view. This reduces to passing essentially two
words, removing all memory allocation.
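
To illustrate the idea, here is a small sketch which uses std::string_view (C++17) as a stand-in for cb::const_char_buffer; both are non-owning pointer-plus-length views. DescribedTask and SketchItemPager are hypothetical classes, not the real GlobalTask hierarchy.

```cpp
#include <string>
#include <string_view>

// Hypothetical task interface returning a non-owning view.
class DescribedTask {
public:
    virtual ~DescribedTask() = default;
    virtual std::string_view getDescription() const = 0;
};

class SketchItemPager : public DescribedTask {
public:
    std::string_view getDescription() const override {
        // Returns a view onto the member: no allocation, no copy.
        return description;
    }

private:
    const std::string description{"Paging out items."};
};
```

The trade-off is lifetime: the returned view is only valid while the task (and its description member) is alive and unchanged, so callers must not hold onto it beyond that.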

Change-Id: I3cd68798d97401a3d5b50c9720c87dade2a1b32e
Reviewed-on: http://review.couchbase.org/75435
Reviewed-by: Jim Walker <jim@couchbase.com>
Tested-by: Build Bot <build@couchbase.com>


Revision tags: v4.6.2_mc, v4.6.1_ep
# e7ce1c4f 08-Feb-2017 James Harrison <00jamesh@gmail.com>

MB-22041 [3/N]: Allow dynamic changes to number of threads

Reader, Writer, AuxIO and NonIO threads can now be dynamically stopped
and started using the interface previously used to set the maximum
number of these threads - i.e., setMaxReaders.

The behaviour is outwardly the same - the max controls how many tasks of
a given type can be run concurrently, but this can now be dynamically
tuned both upwards and downwards.

Change-Id: I8e10487e5b57466fbe299e7e043a90dd6c8d5aa8
Reviewed-on: http://review.couchbase.org/73304
Tested-by: Build Bot <build@couchbase.com>
Reviewed-by: Manu Dhundi <manu@couchbase.com>


Revision tags: v4.6.0_ep
# 3e85e50a 25-Nov-2016 Will Gardner <willg@rdner.io>

Remove LockHolder implementation

Replaces LockHolder with an alias to std::lock_guard. Replaces
instantiations of LockHolder which require `unlock` or ability
to move the lock with std::unique_lock and refactors some uses
so they can use a regular std::lock_guard.
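
A sketch of what this could look like; the exact form of the alias in the source tree is an assumption, and SketchCounter is a hypothetical class used only to show the two locking styles.

```cpp
#include <mutex>

// Assumed shape of the alias described above.
using LockHolder = std::lock_guard<std::mutex>;

struct SketchCounter {
    std::mutex mutex;
    int value = 0;

    // Purely scope-bound locking: std::lock_guard is sufficient.
    void increment() {
        LockHolder lh(mutex);
        ++value;
    }

    // The lock has to be released before the end of the scope, which
    // std::lock_guard cannot do, so std::unique_lock is used instead.
    int incrementThenRead() {
        std::unique_lock<std::mutex> lh(mutex);
        ++value;
        const int snapshot = value;
        lh.unlock();
        // ... work which must not hold the lock would go here ...
        return snapshot;
    }
};
```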

Change-Id: Ic2d4e500d807ca7c22af591bdaad417f37718bbd
Reviewed-on: http://review.couchbase.org/70363
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>


# d8577c54 15-Nov-2016 Daniel Owen <owend@couchbase.com>

MB-20079: Use std::chrono::steady_clock (ProcessClock)

Change task scheduling to use ProcessClock which is not
affected by changes to wall clock time.
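
As a rough sketch of why a monotonic clock suits scheduling: wake times computed from std::chrono::steady_clock never move backwards, even if the system's wall clock is adjusted. SketchProcessClock, wakeAfter and isDue are hypothetical names, and treating ProcessClock as a plain alias for steady_clock is an assumption based on the commit title.

```cpp
#include <chrono>

// Assumption: ProcessClock behaves like an alias for steady_clock.
using SketchProcessClock = std::chrono::steady_clock;

static_assert(SketchProcessClock::is_steady,
              "scheduling relies on a monotonic clock");

// Wake time 'delay' after 'now' on the monotonic clock.
inline SketchProcessClock::time_point wakeAfter(
        std::chrono::milliseconds delay,
        SketchProcessClock::time_point now) {
    return now + delay;
}

// A task is due once the monotonic clock has reached its wake time.
inline bool isDue(SketchProcessClock::time_point waketime,
                  SketchProcessClock::time_point now) {
    return waketime <= now;
}
```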

Change-Id: I2fc9688abb782fe2c9e80efb6da840be3643d4a5
Reviewed-on: http://review.couchbase.org/69899
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
Reviewed-by: Jim Walker <jim@couchbase.com>


Revision tags: v4.5.1-MP1_mc, v4.6.0-DP_mc, v4.6.0-DP_ep, v4.5.1-MP1_ep, v4.1.2-MP2_mc, v4.5.1_mc
# 63e4ba7a 27-Aug-2016 Sundar Sridharan <sundar.sridharan@gmail.com>

MB-20735: Fix incorrect waketime for executor threads

When a thread is done with a task from a particular queue,
it calls doneWork() which was resetting its current Queue type
resulting in subsequent reschedule to pick infinity as
earliest thread waketime and end up sleeping longer when the
readyQueue is empty.
Simplify the reschedule logic as part of this fix.
Add unit test for validation.

Change-Id: Ifdad1d21e624982460046393d4c377247da90735
Reviewed-on: http://review.couchbase.org/67112
Reviewed-by: Jim Walker <jim@couchbase.com>
Tested-by: buildbot <build@couchbase.com>


Revision tags: v4.6.0_mc, v4.1.2-MP1_ep
# 50838e8a 25-Jul-2016 Jim Walker <jim@couchbase.com>

MB-20235: Fix wake's incorrect work counting

MB-20061 introduced a better futureQueue object but has
introduced an ExecutorPool work accounting bug. The work
counter should only be increased for tasks in the readyQueue,
MB-20061 would sometimes add 1 for items in the futureQueue.
The counter would then never reach 0 and the fetchNextTask loop
would never yield (causing high CPU).

Note that when MB-18453 is merged up from sherlock, this code
will be tweaked again to remove any readyQueue push code
from the wake path.

Change-Id: I010de48666fc5327bd2f5912f10dad7177ec5a37
Reviewed-on: http://review.couchbase.org/66147
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Will Gardner <will.gardner@couchbase.com>
Reviewed-by: Manu Dhundi <manu@couchbase.com>


Revision tags: v3.1.6_ep
# 0009ef75 30-Jun-2016 Jim Walker <jim@couchbase.com>

MB-18453: Make task scheduling fairer

The MB identified that we can starve tasks by scheduling
a higher priority task via ExecutorPool::wake().

This occurs because ExecutorPool::wake() pushes tasks
into the readyQueue enabling frequent wakes to trigger
the starvation bug.

The fix is to remove readyQueue.push from wake, so that we only
push to the readyQueue. The fetch side of scheduling only looks at
the futureQueue once the readyQueue is empty, thus the identified
starvation won't happen.

A unit-test demonstrates the fix using the single-threaded harness and
expects that two tasks of differing priorities get executed, rather
than the wake() starving the low-priority task.

This test drives:
- ExecutorPool::schedule
- ExecutorPool::reschedule
- ExecutorPool::wake

These are all the methods which can add tasks into the scheduler
queue.

The fetch side is also covered:
- ExecutorPool::fetchNextTask

This commit is an update to a previous commit that was reverted due
to performance issues. The original commit was reverted to minimise
disruption.

- original commit is e22c9ebeda1aac
- revert is 27cb1120e3e37

Change-Id: I70a4dcf7cd1c3a6f04548e9bbc3f95e24cdf50ad
Reviewed-on: http://review.couchbase.org/65929
Well-Formed: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>


# 24c408e6 30-Jun-2016 Jim Walker <jim@couchbase.com>

MB-20061: Improved futureQueue heap property fix

MB-9986 found that we break the heap property of the underlying
container of the std::priority_queue used in scheduling. This is
because we use the wakeTime to sort the queue, but then later
change the value of a task which is still in the queue, thus
order is broken. MB-9986 emptied the futureQueue and rebuilt it.

In this fix we utilise the C++ standard which shows that we can
access the container of the priority_queue. Thus we can sub-class
it and create a method that allows us to update a task's time and
maintain the heap order.
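
The technique can be sketched as follows. std::priority_queue exposes its underlying container and comparator as the protected members 'c' and 'comp', so a subclass can modify an element and then restore the heap property. TimedTask, LaterWaketime, SketchFutureQueue and updateWaketime are hypothetical names, wake times are plain integers for brevity, and the real implementation may restore the heap differently from the std::make_heap call used here.

```cpp
#include <algorithm>
#include <queue>
#include <vector>

// Hypothetical stand-in for a task: an id plus a wake time.
struct TimedTask {
    int id;
    long waketime;
};

// Orders the heap so that the smallest waketime is at top().
struct LaterWaketime {
    bool operator()(const TimedTask& a, const TimedTask& b) const {
        return a.waketime > b.waketime;
    }
};

class SketchFutureQueue
    : public std::priority_queue<TimedTask,
                                 std::vector<TimedTask>,
                                 LaterWaketime> {
public:
    // Change the waketime of a queued task, then rebuild the heap so the
    // ordering invariant holds again. Returns false if the id is not queued.
    bool updateWaketime(int id, long newWaketime) {
        auto it = std::find_if(
                c.begin(), c.end(),
                [id](const TimedTask& t) { return t.id == id; });
        if (it == c.end()) {
            return false;
        }
        it->waketime = newWaketime;
        std::make_heap(c.begin(), c.end(), comp);
        return true;
    }
};
```

Modifying the element without the subsequent re-heapify is exactly the situation the commit message describes as breaking the heap property.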

Change-Id: I238d36ea684d59ef06326183fa1f16c04f8d29ad
Reviewed-on: http://review.couchbase.org/65394
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Manu Dhundi <manu@couchbase.com>
Reviewed-by: Sundararaman Sridharan <sundar@couchbase.com>


# 27cb1120 14-Jul-2016 Jim Walker <jim@couchbase.com>

Revert "MB-18453: Make task scheduling fairer"

When running in a >1 node cluster memcached CPU is running
very high. The original fix has introduced a problem which
needs further investigation (fetchTask is very very cpu hot).

This reverts commit e22c9ebeda1aac2fc8f4325cc39a93c3bcefffab.

Change-Id: If53a3a60692fbaaef4e54462f99284a8044cd899
Reviewed-on: http://review.couchbase.org/65780
Well-Formed: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>


# e22c9ebe 30-Jun-2016 Jim Walker <jim@couchbase.com>

MB-18453: Make task scheduling fairer

The MB identified that we can starve tasks by scheduling
a higher priority task via ExecutorPool::wake().

This occurs because ExecutorPool::wake() pushes tasks
into the readyQueue enabling frequent wakes to trigger
the starvation bug.

The fix is to remove readyQueue.push from wake, so that we only
push to the readyQueue. The fetch side of scheduling only looks at
the futureQueue once the readyQueue is empty, thus the identified
starvation won't happen.

A unit-test demonstrates the fix using the single-threaded harness and
expects that two tasks of differing priorities get executed, rather
than the wake() starving the low-priority task.

This test drives:
- ExecutorPool::schedule
- ExecutorPool::reschedule
- ExecutorPool::wake

These are all the methods which can add tasks into the scheduler
queue.

The fetch side is also covered:
- ExecutorPool::fetchNextTask

Change-Id: Ie797a637ce4e7066e3155751ff467bc65d083646
Reviewed-on: http://review.couchbase.org/65385
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>


Revision tags: v4.5.0_mc, v4.5.0_ep
# 1b4f629d 25-May-2016 Jim Walker <jim@couchbase.com>

MB-19428: Don't schedule backfills on dead vbuckets

This is an updated version of the patch which uses
a virtual function to achieve the same outcome as:

b0032858bdf750a595270be84fc422c3611fdc38

Only ActiveStream implements setActive, all other stream
types use the base class version which is a no-op.

Change-Id: If339be926e099838d8d574013a72d8ea1c364da6
Reviewed-on: http://review.couchbase.org/64391
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Manu Dhundi <manu@couchbase.com>


Revision tags: v4.1.1_ep, v3.1.5_ep, v4.1.1_mc, v3.1.4_ep, v3.1.4_mc, v3.1.5_mc, v3.1.3_ep, v4.1.0_ep, v3.1.2_ep, v4.1.0_mc
# 644c09df 06-Oct-2015 Dave Rigby <daver@couchbase.com>

MB-19253: Fix race in void ExecutorPool::doWorkerStat

As reported by ThreadSanitizer (see below), there is a race between
setting the current task associated with an ExecutorThread and reading
the name of that thread.

Unfortunately there doesn't seem to be a straightforward way to solve
this without adding a new mutex; currentTask (the variable the race is
on) is a SingleThreadedRCPtr, which is non-trivial to make thread-safe
(i.e. atomic). I did consider changing currentTask (and all other
ExTask variables) to be a std::shared_ptr as in C++11 this has support
for updating atomically, however the support in mainstream compilers
apparently isn't quite there yet.

Therefore I've just added a mutex to guard currentTask.
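
A minimal sketch of guarding the current task with a dedicated mutex. SketchExecutorThread, NamedTask, setCurrentTask and getTaskName are hypothetical names, and std::shared_ptr stands in for SingleThreadedRCPtr purely for illustration; the pointer itself is only ever touched with the mutex held.

```cpp
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for a task with a name.
struct NamedTask {
    std::string name;
};

class SketchExecutorThread {
public:
    // Writer side: called by the executor thread when it picks up a task.
    void setCurrentTask(std::shared_ptr<NamedTask> task) {
        std::lock_guard<std::mutex> lh(currentTaskMutex);
        currentTask = std::move(task);
    }

    // Reader side: called from another thread, e.g. when gathering stats.
    // Returns a copy of the name so nothing is read after the lock is gone.
    std::string getTaskName() const {
        std::lock_guard<std::mutex> lh(currentTaskMutex);
        return currentTask ? currentTask->name : std::string("idle");
    }

private:
    mutable std::mutex currentTaskMutex;
    std::shared_ptr<NamedTask> currentTask;
};
```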

ThreadSanitizer report:

WARNING: ThreadSanitizer: data race (pid=27332)
Read of size 8 at 0x7d340000c8f0 by main thread (mutexes: write M19366):
#0 ExecutorThread::getTaskableName() const /home/couchbase/couchbase/ep-engine/src/atomic.h:309 (ep.so+0x0000000e6178)
#1 EventuallyPersistentEngine::getStats(void const*, char const*, int, void (*)(char const*, unsigned short, char const*, unsigned int, void const*)) /home/couchbase/couchbase/ep-engine/src/ep_engine.cc:4346 (ep.so+0x0000000bc4dd)
#2 EvpGetStats(engine_interface*, void const*, char const*, int, void (*)(char const*, unsigned short, char const*, unsigned int, void const*)) /home/couchbase/couchbase/ep-engine/src/ep_engine.cc:213 (ep.so+0x0000000ab49e)
#3 mock_get_stats(engine_interface*, void const*, char const*, int, void (*)(char const*, unsigned short, char const*, unsigned int, void const*)) /home/couchbase/couchbase/memcached/programs/engine_testapp/engine_testapp.cc:239 (engine_testapp+0x0000004c54ad)
#4 test_worker_stats(engine_interface*, engine_interface_v1*) /home/couchbase/couchbase/ep-engine/tests/ep_testsuite.cc:8901 (ep_testsuite.so+0x000000039768)
#5 execute_test(test, char const*, char const*) /home/couchbase/couchbase/memcached/programs/engine_testapp/engine_testapp.cc:1090 (engine_testapp+0x0000004c40b2)
#6 __libc_start_main /build/buildd/eglibc-2.15/csu/libc-start.c:226 (libc.so.6+0x00000002176c)

Previous write of size 8 at 0x7d340000c8f0 by thread T5:
#0 ExecutorThread::run() /home/couchbase/couchbase/ep-engine/src/atomic.h:322 (ep.so+0x0000000e9906)
#1 launch_executor_thread(void*) /home/couchbase/couchbase/ep-engine/src/executorthread.cc:33 (ep.so+0x0000000e9795)
#2 platform_thread_wrap /home/couchbase/.ccache/tmp/cb_pthread.tmp.00b591814417.18511.i (libplatform.so.0.1.0+0x000000003d91)

Change-Id: Id02f7a98b40b952a415cf9027a8f2243af38fc4d
Reviewed-on: http://review.couchbase.org/62973
Well-Formed: buildbot <build@couchbase.com>
Reviewed-by: Will Gardner <will.gardner@couchbase.com>
Tested-by: buildbot <build@couchbase.com>


Revision tags: v3.1.2_mc, v3.1.1_mc, v3.1.1_ep, v4.0.0_ep, v4.0.0_mc, v3.1.0_ep, v3.1.0_mc, v3.1.6_mc, v3.0.2-MP2_mc, v3.0.2_ep, v3.0.2_mc
# c8b050ee 12-Nov-2014 Dave Rigby <daver@couchbase.com>

MB-19248: Fix race in TaskQueue.{ready,future,pending}Queue access

Fix race as identified by ThreadSanitizer:

WARNING: ThreadSanitizer: data race (pid=4243)
Read of size 8 at 0x7d04000fde60 by main thread (mutexes: write M1367):
#0 std::_List_const_iterator<SingleThreadedRCPtr<GlobalTask> >::operator++() /usr/include/c++/4.8/bits/stl_list.h:235 (ep.so+0x00000013a129)
#1 std::iterator_traits<std::_List_const_iterator<SingleThreadedRCPtr<GlobalTask> > >::difference_type std::__distance<std::_List_const_iterator<SingleThreadedRCPtr<GlobalTask> > >(std::_List_const_iterator<SingleThreadedRCPtr<GlobalTask> >, std::_List_const_iterator<SingleThreadedRCPtr<GlobalTask> >, std::input_iterator_tag) /usr/include/c++/4.8/bits/stl_iterator_base_funcs.h:82 (ep.so+0x000000138b67)
#2 std::iterator_traits<std::_List_const_iterator<SingleThreadedRCPtr<GlobalTask> > >::difference_type std::distance<std::_List_const_iterator<SingleThreadedRCPtr<GlobalTask> > >(std::_List_const_iterator<SingleThreadedRCPtr<GlobalTask> >, std::_List_const_iterator<SingleThreadedRCPtr<GlobalTask> >) /usr/include/c++/4.8/bits/stl_iterator_base_funcs.h:118 (ep.so+0x000000136b63)
#3 std::list<SingleThreadedRCPtr<GlobalTask>, std::allocator<SingleThreadedRCPtr<GlobalTask> > >::size() const /usr/include/c++/4.8/bits/stl_list.h:874 (ep.so+0x000000135538)
#4 ExecutorPool::doTaskQStat(EventuallyPersistentEngine*, void const*, void (*)(char const*, unsigned short, char const*, unsigned int, void const*)) ep-engine/src/executorpool.cc:654 (ep.so+0x0000001331b6)
#5 EventuallyPersistentEngine::doWorkloadStats(void const*, void (*)(char const*, unsigned short, char const*, unsigned int, void const*)) ep-engine/src/ep_engine.cc:4198 (ep.so+0x000000112ed3)
#6 EventuallyPersistentEngine::getStats(void const*, char const*, int, void (*)(char const*, unsigned short, char const*, unsigned int, void const*)) ep-engine/src/ep_engine.cc:4454 (ep.so+0x000000114cb4)
#7 EvpGetStats ep-engine/src/ep_engine.cc:217 (ep.so+0x000000102b14)
#8 mock_get_stats memcached/programs/engine_testapp/engine_testapp.c:195 (exe+0x0000000026de)
#9 test_workload_stats ep-engine/tests/ep_testsuite.cc:7094 (ep_testsuite.so+0x00000004e931)
#10 execute_test memcached/programs/engine_testapp/engine_testapp.c:1055 (exe+0x0000000059fc)
#11 main memcached/programs/engine_testapp/engine_testapp.c:1313 (exe+0x000000006606)

Previous write of size 8 at 0x7d04000fde60 by thread T5 (mutexes: write M45):
#0 RCValue::_rc_decref() const ep-engine/src/atomic.h:293 (ep.so+0x000000096797)
#1 __gnu_cxx::new_allocator<std::_List_node<SingleThreadedRCPtr<GlobalTask> > >::allocate(unsigned long, void const*) /usr/include/c++/4.8/ext/new_allocator.h:104 (ep.so+0x00000017b42b)
#2 std::_List_base<SingleThreadedRCPtr<GlobalTask>, std::allocator<SingleThreadedRCPtr<GlobalTask> > >::_M_get_node() /usr/include/c++/4.8/bits/stl_list.h:334 (ep.so+0x00000017b0dc)
#3 std::_List_node<SingleThreadedRCPtr<GlobalTask> >* std::list<SingleThreadedRCPtr<GlobalTask>, std::allocator<SingleThreadedRCPtr<GlobalTask> > >::_M_create_node<SingleThreadedRCPtr<GlobalTask> const&>(SingleThreadedRCPtr<GlobalTask> const&) /usr/include/c++/4.8/bits/stl_list.h:502 (ep.so+0x00000017a25a)
#4 void std::list<SingleThreadedRCPtr<GlobalTask>, std::allocator<SingleThreadedRCPtr<GlobalTask> > >::_M_insert<SingleThreadedRCPtr<GlobalTask> const&>(std::_List_iterator<SingleThreadedRCPtr<GlobalTask> >, SingleThreadedRCPtr<GlobalTask> const&) /usr/include/c++/4.8/bits/stl_list.h:1561 (ep.so+0x000000178e7b)
#5 std::list<SingleThreadedRCPtr<GlobalTask>, std::allocator<SingleThreadedRCPtr<GlobalTask> > >::push_back(SingleThreadedRCPtr<GlobalTask> const&) /usr/include/c++/4.8/bits/stl_list.h:1016 (ep.so+0x00000017817b)
#6 TaskQueue::_fetchNextTask(ExecutorThread&, bool) ep-engine/src/taskqueue.cc:126 (ep.so+0x000000176932)
#7 TaskQueue::fetchNextTask(ExecutorThread&, bool) ep-engine/src/taskqueue.cc:142 (ep.so+0x000000176acf)
#8 ExecutorPool::_nextTask(ExecutorThread&, unsigned char) ep-engine/src/executorpool.cc:214 (ep.so+0x000000130707)
#9 ExecutorPool::nextTask(ExecutorThread&, unsigned char) ep-engine/src/executorpool.cc:229 (ep.so+0x0000001307a5)
#10 ExecutorThread::run() ep-engine/src/executorthread.cc:78 (ep.so+0x000000149da0)
#11 launch_executor_thread ep-engine/src/executorthread.cc:34 (ep.so+0x00000014990a)
#12 platform_thread_wrap platform/src/cb_pthreads.c:19 (libplatform.so.0.1.0+0x000000002d8b)
#13 __tsan_write_range ??:0 (libtsan.so.0+0x00000001b1c9)

Fix by adding new helper methods which will return the size of the
various queues (while holding the TaskQueue's mutex).
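
Such a helper might look roughly like this; SketchQueue, pushReady and getReadyQueueSize are hypothetical names, and task ids stand in for the task pointers held by the real queues. The stats path calls the accessor instead of reading the std::list directly, so the size is never read while another thread is modifying the list.

```cpp
#include <cstddef>
#include <list>
#include <mutex>

class SketchQueue {
public:
    // Scheduler side: modifies the list under the mutex.
    void pushReady(int taskId) {
        std::lock_guard<std::mutex> lh(mutex);
        readyQueue.push_back(taskId);
    }

    // Stats side: reads the size under the same mutex.
    std::size_t getReadyQueueSize() const {
        std::lock_guard<std::mutex> lh(mutex);
        return readyQueue.size();
    }

private:
    mutable std::mutex mutex;
    std::list<int> readyQueue;
};
```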

Change-Id: If5e8d357e45803d78c4ba6ed1475e6e1a90e1c89
Reviewed-on: http://review.couchbase.org/62969
Well-Formed: buildbot <build@couchbase.com>
Reviewed-by: Will Gardner <will.gardner@couchbase.com>
Tested-by: buildbot <build@couchbase.com>


# 23ba840b 05-Oct-2015 abhinavdangeti <abhinav@couchbase.com>

MB-19224: Address possible data race with global task's waketime

WARNING: ThreadSanitizer: data race (pid=7728)

Write of size 8 at 0x7d140000e8a8 by main thread (mutexes: write M11519, write M11535):
0 TaskQueue::_wake(SingleThreadedRCPtr<GlobalTask>&) /home/abhinav/couchbase/ep-engine/src/taskqueue.cc:272 (ep.so+0x000000142b59)
1 TaskQueue::wake(SingleThreadedRCPtr<GlobalTask>&) /home/abhinav/couchbase/ep-engine/src/taskqueue.cc:299 (ep.so+0x00000014382e)
2 ExecutorPool::_stopTaskGroup(unsigned long, task_type_t, bool) /home/abhinav/couchbase/ep-engine/src/executorpool.cc:568 (ep.so+0x0000000f2f46)
3 ExecutorPool::stopTaskGroup(unsigned long, task_type_t, bool) /home/abhinav/couchbase/ep-engine/src/executorpool.cc:585 (ep.so+0x0000000f31ee)
4 ~EventuallyPersistentStore /home/abhinav/couchbase/ep-engine/src/ep.cc:468 (ep.so+0x0000000830f6)
5 ~EventuallyPersistentEngine /home/abhinav/couchbase/ep-engine/src/ep_engine.cc:6326 (ep.so+0x0000000d42ba)
6 EvpDestroy(engine_interface*, bool) /home/abhinav/couchbase/ep-engine/src/ep_engine.cc:141 (ep.so+0x0000000b3fbc)
7 mock_destroy(engine_interface*, bool) /home/abhinav/couchbase/memcached/programs/engine_testapp/engine_testapp.cc:98 (engine_testapp+0x0000000ba027)
8 destroy_bucket(engine_interface*, engine_interface_v1*, bool) /home/abhinav/couchbase/memcached/programs/engine_testapp/engine_testapp.cc:995 (engine_testapp+0x0000000b952e)
9 __libc_start_main /build/buildd/eglibc-2.19/csu/libc-start.c:287 (libc.so.6+0x000000021ec4)

Previous write of size 8 at 0x7d140000e8a8 by thread T10:
0 GlobalTask::snooze(double) /home/abhinav/couchbase/ep-engine/src/tasks.cc:56 (ep.so+0x00000013b6fa)
1 ConnManager::run() /home/abhinav/couchbase/ep-engine/src/connmap.cc:151 (ep.so+0x00000005032e)
2 ExecutorThread::run() /home/abhinav/couchbase/ep-engine/src/executorthread.cc:112 (ep.so+0x0000000f86da)
3 launch_executor_thread(void*) /home/abhinav/couchbase/ep-engine/src/executorthread.cc:33 (ep.so+0x0000000f82e5)
4 platform_thread_wrap /home/abhinav/couchbase/platform/src/cb_pthreads.c:23 (libplatform.so.0.1.0+0x000000003d31)

Change-Id: Ib11f9b3cd6919e292f84cc08260eabd8e1381aa6
Reviewed-on: http://review.couchbase.org/62913
Well-Formed: buildbot <build@couchbase.com>
Reviewed-by: Will Gardner <will.gardner@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>


# 5dbbb7c2 23-Jul-2015 Sundar Sridharan <sundar.sridharan@gmail.com>

MB-19223: Switch to hrtime from timeval in Global Thread Pool

This has small improvements in memory and cpu usage.
Also fixes several ThreadSanitizer races from unit tests - for example:

WARNING: ThreadSanitizer: data race (pid=21672)
Write of size 8 at 0x7d140000e7b8 by main thread (mutexes: write M14972, write M14985):
#0 memcpy <null> (engine_testapp+0x000000453040)
#1 TaskQueue::_wake(SingleThreadedRCPtr<GlobalTask>&) /home/daver/repos/couchbase/server/ep-engine/src/taskqueue.cc:255 (ep.so+0x0000002577f6)
#2 TaskQueue::wake(SingleThreadedRCPtr<GlobalTask>&) /home/daver/repos/couchbase/server/ep-engine/src/taskqueue.cc:282 (ep.so+0x000000257c73)
#3 ExecutorPool::_wake(unsigned long) /home/daver/repos/couchbase/server/ep-engine/src/executorpool.cc:320 (ep.so+0x0000001acc76)
#4 ExecutorPool::wake(unsigned long) /home/daver/repos/couchbase/server/ep-engine/src/executorpool.cc:328 (ep.so+0x0000001ace13)
#5 Flusher::wait() /home/daver/repos/couchbase/server/ep-engine/src/flusher.cc:41 (ep.so+0x0000001cc4ff)
#6 EventuallyPersistentStore::stopFlusher() /home/daver/repos/couchbase/server/ep-engine/src/ep.cc:402 (ep.so+0x0000000d54d5)
#7 ~EventuallyPersistentStore /home/daver/repos/couchbase/server/ep-engine/src/ep.cc:364 (ep.so+0x0000000d49cb)
#8 ~EventuallyPersistentEngine /home/daver/repos/couchbase/server/ep-engine/src/ep_engine.cc:5778 (ep.so+0x000000161043)
#9 EvpDestroy(engine_interface*, bool) /home/daver/repos/couchbase/server/ep-engine/src/ep_engine.cc:143 (ep.so+0x000000135efa)
#10 mock_destroy /home/daver/repos/couchbase/server/memcached/programs/engine_testapp/engine_testapp.c:61 (engine_testapp+0x0000004bb9d6)
#11 destroy_engine /home/daver/repos/couchbase/server/memcached/programs/engine_testapp/engine_testapp.c:998 (engine_testapp+0x0000004bb646)
#12 execute_test /home/daver/repos/couchbase/server/memcached/programs/engine_testapp/engine_testapp.c:1048 (engine_testapp+0x0000004baa11)
#13 main /home/daver/repos/couchbase/server/memcached/programs/engine_testapp/engine_testapp.c:1296 (engine_testapp+0x0000004b8861)

Previous read of size 8 at 0x7d140000e7b8 by thread T14:
#0 ExecutorThread::run() /home/daver/repos/couchbase/server/ep-engine/src/executorthread.cc:106 (ep.so+0x0000001e3488)
#1 launch_executor_thread(void*) /home/daver/repos/couchbase/server/ep-engine/src/executorthread.cc:34 (ep.so+0x0000001e2a5a)
#2 platform_thread_wrap /home/daver/repos/couchbase/server/platform/src/cb_pthreads.c:19 (libplatform.so.0.1.0+0x0000000035dc)

Change-Id: I78fdddb832251fc062058c04f75f8d22c4c2f68d
Reviewed-on: http://review.couchbase.org/62912
Well-Formed: buildbot <build@couchbase.com>
Reviewed-by: Will Gardner <will.gardner@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>


# 8cbe913f 12-Nov-2014 Dave Rigby <daver@couchbase.com>

MB-19222: Fix race condition in TaskQueue shutdown

There is a bug in the use of ExecutorThread.state when sleeping a
TaskQueue - TaskQueue::_doSleep() doesn't atomically transition the
state from RUNNING -> SLEEPING. This can cause a deadlock when
shutting down an ExecutorThread:

Thread A:                             Thread B:
--------------------------------      ------------------------------
if (t.state == RUNNING) { // true
                                      t.state = SHUTDOWN
    t.state = SLEEPING                cb_join_thread(Thread A)
                                      // wait forever
    ...
if (t.state == SHUTDOWN) { // FALSE
    exit(0) // NEVER REACHED
}

Fix by changing ExecutorThread.state to be an AtomicValue, and use
compare-and-exchange to move from RUNNING -> SLEEPING (and SLEEPING ->
RUNNING).
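
The compare-and-exchange transition described above can be sketched as
follows. This is an illustration only: it uses std::atomic in place of the
AtomicValue wrapper, and ThreadState, ThreadSketch, trySleep and tryWake
are hypothetical names rather than the actual ExecutorThread code:

```cpp
#include <atomic>

// Hypothetical subset of the executor thread states.
enum class ThreadState { Running, Sleeping, Shutdown };

struct ThreadSketch {
    std::atomic<ThreadState> state{ThreadState::Running};
};

// Attempt RUNNING -> SLEEPING as one atomic step. If another thread has
// already moved the state elsewhere (e.g. to Shutdown), the exchange fails
// and the state is left untouched, so the shutdown request is not lost.
inline bool trySleep(ThreadSketch& t) {
    ThreadState expected = ThreadState::Running;
    return t.state.compare_exchange_strong(expected, ThreadState::Sleeping);
}

// Attempt SLEEPING -> RUNNING with the same guarantee.
inline bool tryWake(ThreadSketch& t) {
    ThreadState expected = ThreadState::Sleeping;
    return t.state.compare_exchange_strong(expected, ThreadState::Running);
}
```

With this shape, the interleaving in the table above can no longer
overwrite SHUTDOWN with SLEEPING: Thread A's exchange fails once Thread B
has stored SHUTDOWN, and Thread A then sees the shutdown state on its next
check.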

Change-Id: I9fab90a83978ae2aa6a0dcdd3b079a1c2f369402
Reviewed-on: http://review.couchbase.org/62911
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Will Gardner <will.gardner@couchbase.com>


# b9334207 30-Nov-2015 Dave Rigby <daver@couchbase.com>

Mutex modernization [2/2]: Use std::condition_variable for SyncObject

Change Mutex to a typedef to std::mutex, and use
std::condition_variable for the implementation of SyncObject.

Note these two changes are mutually dependent hence being done in the
same patch.
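
As a rough illustration of the resulting shape (not the actual ep-engine
headers), a SyncObject built on the standard library might look like the
sketch below. SyncObjectSketch and its method names are hypothetical, and
deriving from the mutex type is an assumption made for brevity:

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Mutex becomes a plain alias for the standard mutex.
using Mutex = std::mutex;

// Hypothetical SyncObject: a mutex plus a condition variable bound to it.
class SyncObjectSketch : public Mutex {
public:
    // Wait until pred() holds. The caller must own `lock` on *this.
    template <class Pred>
    void wait(std::unique_lock<Mutex>& lock, Pred pred) {
        cond.wait(lock, pred);
    }

    // As wait(), but give up after `timeout`; returns the final pred().
    template <class Pred>
    bool waitFor(std::unique_lock<Mutex>& lock,
                 std::chrono::milliseconds timeout,
                 Pred pred) {
        return cond.wait_for(lock, timeout, pred);
    }

    void notifyOne() { cond.notify_one(); }
    void notifyAll() { cond.notify_all(); }

private:
    std::condition_variable cond;
};
```

std::condition_variable only works with std::unique_lock<std::mutex>, which
is presumably why the Mutex and SyncObject changes have to land together.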

Change-Id: Ife74822e9c4e04efefff446e964a952b672a1f91
Reviewed-on: http://review.couchbase.org/57328
Reviewed-by: Trond Norbye <trond.norbye@gmail.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>


# df3730be 30-Nov-2015 Dave Rigby <daver@couchbase.com>

Mutex modernization [1/2]: Replace with std::mutex

Update the API of Mutex to match that of std::mutex, and SyncObject to
match that of std::condition_variable in preparation for replacing
Mutex and SyncObject with the C++11 standard library equivalents.

Change-Id: I5625d980b11144f681f7e717df87c8b5f323dc7c
Reviewed-on: http://review.couchbase.org/57327
Reviewed-by: Trond Norbye <trond.norbye@gmail.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: buildbot <build@couchbase.com>


# 052a2cb2 06-Oct-2015 Dave Rigby <daver@couchbase.com>

Fix race in void ExecutorPool::doWorkerStat

As reported by ThreadSanitizer (see below), there is a race between
setting the current task associated with an ExecutorThread and reading
the name of that thread.

Unfortunately there doesn't seem to be a straightforward way to solve
this without adding a new mutex; currentTask (the variable the race is
on) is a SingleThreadedRCPtr, which is non-trivial to make thread-safe
(i.e. atomic). I did consider changing currentTask (and all other
ExTask variables) to be a std::shared_ptr as in C++11 this has support
for updating atomically, however the support in mainstream compilers
apparently isn't quite there yet.

Therefore I've just added a mutex to guard currentTask.
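
A minimal sketch of guarding such a pointer with a dedicated mutex. It
uses std::shared_ptr purely as a stand-in for SingleThreadedRCPtr, and
TaskSketch, ExecutorThreadSketch, setCurrentTask and getTaskName are
hypothetical names, not the real ExecutorThread members:

```cpp
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical task type carrying only a name.
struct TaskSketch {
    std::string name;
};

class ExecutorThreadSketch {
public:
    // Called by the worker thread when it picks up (or drops) a task.
    void setCurrentTask(std::shared_ptr<TaskSketch> task) {
        std::lock_guard<std::mutex> lh(currentTaskMutex);
        currentTask = std::move(task);
    }

    // Called by a stats thread. Returns a copy of the name so nothing
    // refers to the task after the lock is released.
    std::string getTaskName() const {
        std::lock_guard<std::mutex> lh(currentTaskMutex);
        return currentTask ? currentTask->name : std::string();
    }

private:
    mutable std::mutex currentTaskMutex; // guards currentTask only
    std::shared_ptr<TaskSketch> currentTask;
};
```

Both the writer and the reader take the same lock, so the read and write
reported by ThreadSanitizer below can no longer overlap.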

ThreadSanitizer report:

WARNING: ThreadSanitizer: data race (pid=27332)
Read of size 8 at 0x7d340000c8f0 by main thread (mutexes: write M19366):
#0 ExecutorThread::getTaskableName() const /home/couchbase/couchbase/ep-engine/src/atomic.h:309 (ep.so+0x0000000e6178)
#1 EventuallyPersistentEngine::getStats(void const*, char const*, int, void (*)(char const*, unsigned short, char const*, unsigned int, void const*)) /home/couchbase/couchbase/ep-engine/src/ep_engine.cc:4346 (ep.so+0x0000000bc4dd)
#2 EvpGetStats(engine_interface*, void const*, char const*, int, void (*)(char const*, unsigned short, char const*, unsigned int, void const*)) /home/couchbase/couchbase/ep-engine/src/ep_engine.cc:213 (ep.so+0x0000000ab49e)
#3 mock_get_stats(engine_interface*, void const*, char const*, int, void (*)(char const*, unsigned short, char const*, unsigned int, void const*)) /home/couchbase/couchbase/memcached/programs/engine_testapp/engine_testapp.cc:239 (engine_testapp+0x0000004c54ad)
#4 test_worker_stats(engine_interface*, engine_interface_v1*) /home/couchbase/couchbase/ep-engine/tests/ep_testsuite.cc:8901 (ep_testsuite.so+0x000000039768)
#5 execute_test(test, char const*, char const*) /home/couchbase/couchbase/memcached/programs/engine_testapp/engine_testapp.cc:1090 (engine_testapp+0x0000004c40b2)
#6 __libc_start_main /build/buildd/eglibc-2.15/csu/libc-start.c:226 (libc.so.6+0x00000002176c)

Previous write of size 8 at 0x7d340000c8f0 by thread T5:
#0 ExecutorThread::run() /home/couchbase/couchbase/ep-engine/src/atomic.h:322 (ep.so+0x0000000e9906)
#1 launch_executor_thread(void*) /home/couchbase/couchbase/ep-engine/src/executorthread.cc:33 (ep.so+0x0000000e9795)
#2 platform_thread_wrap /home/couchbase/.ccache/tmp/cb_pthread.tmp.00b591814417.18511.i (libplatform.so.0.1.0+0x000000003d91)

Change-Id: Id02f7a98b40b952a415cf9027a8f2243af38fc4d
Reviewed-on: http://review.couchbase.org/55829
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
