# Eventually Persistent Engine
## Threads
Code in ep-engine executes in a multithreaded environment. Two classes of
thread exist:

1. memcached's threads, which service a client and call in via the
[engine API](https://github.com/couchbase/memcached/blob/master/include/memcached/engine.h).
2. ep-engine's threads, which run tasks such as the document expiry pager
(see subclasses of `GlobalTask`).

## Synchronisation Primitives

There are three mutual-exclusion primitives available in ep-engine.

1. `Mutex` exclusive lock - [mutex.h](./src/mutex.h)
2. `RWLock` shared, reader/writer lock - [rwlock.h](./src/rwlock.h)
3. `SpinLock` 1-byte exclusive lock - [atomic.h](./src/atomic.h)

A condition variable is also available, called `SyncObject`
[syncobject.h](./src/syncobject.h). `SyncObject` glues a `Mutex` and a
condition variable together in one object.

These primitives are managed via RAII wrappers - [locks.h](./src/locks.h).

1. `LockHolder` - for acquiring a `Mutex` or `SyncObject`.
2. `MultiLockHolder` - for acquiring an array of `Mutex` or `SyncObject`.
3. `WriterLockHolder` - for acquiring write access to a `RWLock`.
4. `ReaderLockHolder` - for acquiring read access to a `RWLock`.
5. `SpinLockHolder` - for acquiring a `SpinLock`.

## Mutex
The general style is to create a `LockHolder` when you need to acquire a
`Mutex`. The constructor acquires the `Mutex` and, when the `LockHolder` goes
out of scope, the destructor releases it. For certain use-cases the
caller can explicitly lock/unlock a `Mutex` via the `LockHolder` class.

```c++
Mutex mutex;
void example1() {
    LockHolder lockHolder(&mutex); // acquires mutex
    ...
    return; // lockHolder goes out of scope and releases mutex
}

void example2() {
    LockHolder lockHolder(&mutex);
    ...
    lockHolder.unlock(); // explicit release
    ...
    lockHolder.lock(); // explicit re-acquire
    ...
    return;
}
```
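
To make the acquire/release behaviour concrete, the following is a minimal sketch of an RAII holder with manual `lock()`/`unlock()`, written against `std::mutex` rather than ep-engine's `Mutex`. `SketchLockHolder`, `sketchMutex` and `lockedIncrement` are hypothetical names used only for illustration; the real `LockHolder` in locks.h may differ in detail.

```c++
#include <cassert>
#include <mutex>

// Hypothetical stand-in for LockHolder, built on std::mutex (an assumption).
class SketchLockHolder {
public:
    explicit SketchLockHolder(std::mutex* m) : mutex(m), locked(false) {
        lock(); // acquire in the constructor
    }
    ~SketchLockHolder() {
        if (locked) {
            unlock(); // release only if the lock is still held
        }
    }
    SketchLockHolder(const SketchLockHolder&) = delete;
    SketchLockHolder& operator=(const SketchLockHolder&) = delete;

    void lock() {
        assert(!locked); // locking twice would deadlock a non-recursive mutex
        mutex->lock();
        locked = true;
    }
    void unlock() {
        assert(locked);
        mutex->unlock();
        locked = false;
    }
    bool isLocked() const { return locked; }

private:
    std::mutex* mutex;
    bool locked;
};

std::mutex sketchMutex;
int sketchCounter = 0;

// Increments the shared counter under the lock and returns the new value.
int lockedIncrement() {
    SketchLockHolder holder(&sketchMutex);
    return ++sketchCounter; // holder is destroyed after the value is computed
}
```

Tracking the `locked` flag is what lets the destructor stay safe after an explicit `unlock()`, as in `example2` above.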

A `MultiLockHolder` allows an array of locks to be conveniently acquired and
released. As with `LockHolder`, the caller can choose to manually
lock/unlock at any time (with all locks locked/unlocked via one call).

```c++
Mutex mutexes[10];
Object objects[10];
void foo() {
    // Pass the array (which decays to a pointer to its first element) and its length.
    MultiLockHolder lockHolder(mutexes, 10);
    for (int ii = 0; ii < 10; ii++) {
        objects[ii].doStuff();
    }
    return;
}
```

## RWLock

`RWLock` allows many readers to hold it at once, or exclusive access for a single writer.
`ReaderLockHolder` acquires the lock for a reader and `WriterLockHolder` acquires
the lock for a writer. Neither class enables manual lock/unlock; all
acquisitions and releases are performed via the constructor and destructor.

```c++
RWLock rwLock;
Object thing;

void foo1() {
    ReaderLockHolder rlh(&rwLock);
    if (thing.getData()) {
        ...
    }
}

void foo2() {
    WriterLockHolder wlh(&rwLock);
    thing.setData(...);
}
```
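
For readers more familiar with the standard library, the same shared/exclusive pattern can be sketched with `std::shared_mutex` (C++17). This is an analogue under that assumption, not ep-engine's `RWLock`; `sketchConfigLock`, `readConfig` and `writeConfig` are hypothetical names.

```c++
#include <mutex>
#include <shared_mutex>
#include <string>

// Standard-library analogue of an RWLock (an assumption, not ep-engine code).
std::shared_mutex sketchConfigLock;
std::string sketchConfigValue = "initial";

// Many threads may hold the shared (reader) lock at the same time.
std::string readConfig() {
    std::shared_lock<std::shared_mutex> readLock(sketchConfigLock);
    return sketchConfigValue;
}

// The exclusive (writer) lock excludes all readers and other writers.
void writeConfig(const std::string& value) {
    std::unique_lock<std::shared_mutex> writeLock(sketchConfigLock);
    sketchConfigValue = value;
}
```

As with `ReaderLockHolder` and `WriterLockHolder`, both locks are released by the destructor when the holder goes out of scope.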

## SyncObject

`SyncObject` inherits from `Mutex` and is thus managed via a `LockHolder` or
`MultiLockHolder`. The `SyncObject` provides the condition variable
synchronisation primitive, enabling threads to block and be woken.

The `wait`, `wakeOne` and `wake` methods are provided by the `SyncObject`.

Note that `wakeOne` will wake up a single blocked thread, whereas `wake` will wake up
every thread that is blocked on the `SyncObject`.

```c++
SyncObject syncObject;
bool sleeping = false;
void foo1() {
    LockHolder lockHolder(&syncObject);
    sleeping = true;
    syncObject.wait(); // the mutex is released and the thread put to sleep
    // when wait returns the mutex is reacquired
    sleeping = false;
}

void foo2() {
    LockHolder lockHolder(&syncObject);
    if (sleeping) {
        syncObject.notifyOne();
    }
}
```
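
Condition variables in general can wake spuriously, and a notification sent before the waiter blocks is otherwise lost, so a common defensive pattern is to wait on a predicate in a loop. The sketch below shows that pattern with `std::mutex` and `std::condition_variable` as stand-ins for `SyncObject`; all names in it are hypothetical and it makes no claim about how `SyncObject` is implemented.

```c++
#include <condition_variable>
#include <mutex>
#include <thread>

// Standard-library stand-ins for a SyncObject: a mutex plus a condition variable.
std::mutex sketchWorkMutex;
std::condition_variable sketchWorkCond;
bool sketchWorkReady = false; // the predicate, only accessed under sketchWorkMutex

// Blocks until sketchWorkReady is set, then consumes it.
void waitForWork() {
    std::unique_lock<std::mutex> lock(sketchWorkMutex);
    while (!sketchWorkReady) {
        // wait() releases the mutex while asleep and reacquires it before returning.
        sketchWorkCond.wait(lock);
    }
    sketchWorkReady = false;
}

// Sets the predicate under the mutex, then wakes one waiter.
void signalWork() {
    {
        std::lock_guard<std::mutex> lock(sketchWorkMutex);
        sketchWorkReady = true;
    }
    sketchWorkCond.notify_one();
}

// Starts one waiter thread, signals it, and reports whether the waiter finished.
bool runHandoff() {
    bool consumed = false;
    std::thread waiter([&consumed] {
        waitForWork();
        consumed = true;
    });
    signalWork();
    waiter.join(); // join() makes the write to consumed visible here
    return consumed;
}
```

Because the waiter checks the predicate before sleeping, `runHandoff()` completes whether the signal arrives before or after the waiter reaches `wait()`.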

## SpinLock

A `SpinLock` uses a single byte for the lock and our own code to spin until the
lock is acquired. It is intended for locks with low contention.

The RAII pattern is the same as for a `Mutex`.

```c++
SpinLock spinLock;
void example1() {
    SpinLockHolder lockHolder(&spinLock);
    ...
    return;
}
```
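
The spinning itself can be sketched with a single atomic flag. The code below is an illustration under the assumption that a `std::atomic<bool>` is an acceptable stand-in; `SketchSpinLock`, `SketchSpinLockHolder` and `bumpStats` are hypothetical names and this is not the implementation in atomic.h.

```c++
#include <atomic>

// Hypothetical spin lock built on one atomic flag (not ep-engine's SpinLock).
class SketchSpinLock {
public:
    void acquire() {
        // exchange() returns the previous value: keep spinning while it was already true.
        while (flag.exchange(true, std::memory_order_acquire)) {
        }
    }
    // Single attempt: returns true if the lock was free and is now held.
    bool tryAcquire() {
        return !flag.exchange(true, std::memory_order_acquire);
    }
    void release() {
        flag.store(false, std::memory_order_release);
    }

private:
    std::atomic<bool> flag{false};
};

// RAII holder: acquire in the constructor, release in the destructor.
class SketchSpinLockHolder {
public:
    explicit SketchSpinLockHolder(SketchSpinLock* l) : lock(l) {
        lock->acquire();
    }
    ~SketchSpinLockHolder() {
        lock->release();
    }
    SketchSpinLockHolder(const SketchSpinLockHolder&) = delete;
    SketchSpinLockHolder& operator=(const SketchSpinLockHolder&) = delete;

private:
    SketchSpinLock* lock;
};

SketchSpinLock sketchStatsLock;
int sketchStatsCounter = 0;

// Short critical section, the kind of use a spin lock suits.
int bumpStats() {
    SketchSpinLockHolder holder(&sketchStatsLock);
    return ++sketchStatsCounter;
}
```

A waiting thread burns CPU instead of sleeping, which is why this style of lock only pays off when the lock is held briefly and rarely contended.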

## _UNLOCKED convention

ep-engine has a function naming convention that indicates the function should
be called with a lock already acquired.

For example, the name of the following `doStuff_UNLOCKED` method indicates that it expects a
lock to be held before the function is called. Which lock should be acquired
before calling is not defined by the convention.

```c++
// Caller must already hold the relevant lock.
void Object::doStuff_UNLOCKED() {
}

void Object::run() {
    LockHolder lockHolder(&mutex);
    doStuff_UNLOCKED();
    return;
}
```
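
One reason such a convention is useful is that a public method can take a non-recursive lock once and then compose several helpers without any of them trying to lock again. The sketch below illustrates this with `std::mutex`; `SketchQueue` and its methods are hypothetical and are not part of ep-engine.

```c++
#include <mutex>
#include <vector>

// Hypothetical class showing public locking methods and _UNLOCKED helpers.
class SketchQueue {
public:
    void push(int value) {
        std::lock_guard<std::mutex> guard(mutex);
        items.push_back(value);
    }

    // Takes the lock once, then composes two helpers that assume it is held.
    int drainSum() {
        std::lock_guard<std::mutex> guard(mutex);
        int total = sum_UNLOCKED();
        clear_UNLOCKED();
        return total;
    }

private:
    // Caller must hold `mutex`.
    int sum_UNLOCKED() const {
        int total = 0;
        for (int item : items) {
            total += item;
        }
        return total;
    }

    // Caller must hold `mutex`.
    void clear_UNLOCKED() {
        items.clear();
    }

    std::mutex mutex;
    std::vector<int> items;
};
```

If `sum_UNLOCKED` locked `mutex` itself, calling it from `drainSum` would attempt to lock a `std::mutex` the thread already owns, which is undefined behaviour.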

## Thread Local Storage (ObjectRegistry)

Threads in ep-engine service buckets. When a thread is dispatched to
serve a bucket, the pointer to the `EventuallyPersistentEngine` representing
the bucket is placed into thread local storage. This avoids the need for the
pointer to be passed along the chain of execution as a formal parameter.

Both threads servicing frontend operations (memcached's threads) and ep-engine's
own task threads will save the bucket's engine pointer before calling down into
engine code.

Calling `ObjectRegistry::onSwitchThread(enginePtr)` will save the `enginePtr`
in thread-local-storage so that subsequent task code can retrieve the pointer
with `ObjectRegistry::getCurrentEngine()`.
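
The mechanism can be sketched with a C++11 `thread_local` pointer. `SketchEngine` and `SketchRegistry` below are hypothetical stand-ins for `EventuallyPersistentEngine` and `ObjectRegistry`; the method names mirror the ones above, but the bodies are an assumption about the general technique, not the actual implementation.

```c++
#include <thread>

// Hypothetical stand-in for EventuallyPersistentEngine.
struct SketchEngine {
    int bucketId;
};

// Hypothetical stand-in for ObjectRegistry: one engine pointer per thread.
class SketchRegistry {
public:
    static void onSwitchThread(SketchEngine* engine) {
        current = engine;
    }
    static SketchEngine* getCurrentEngine() {
        return current;
    }

private:
    static thread_local SketchEngine* current;
};

thread_local SketchEngine* SketchRegistry::current = nullptr;

// Code deep in the call chain needs no engine parameter.
int currentBucketId() {
    SketchEngine* engine = SketchRegistry::getCurrentEngine();
    return engine ? engine->bucketId : -1;
}

// Reports what a freshly started thread sees before it switches to any engine.
int bucketIdOnNewThread() {
    int seen = 0;
    std::thread worker([&seen] { seen = currentBucketId(); });
    worker.join();
    return seen;
}
```

Each thread has its own copy of the pointer, so a thread that has not called `onSwitchThread` sees no engine even while another thread is serving a bucket.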

## Tasks

A task is created by sub-classing `GlobalTask` (the `run()` method is the entry point
of the task) and scheduling the task onto one of 4 task
queue types. Each task should be declared in `src/tasks.defs.h` using the TASK
macro. Using this macro ensures correct generation of a task-type ID, priority and
task name, and ultimately ensures each task gets its own scheduling statistics.

The recipe is simple.

### Add your task's class name with its priority into `src/tasks.defs.h`
* A numerically lower priority value means a 'higher' priority.
```c++
TASK(MyNewTask, 1) // MyNewTask has priority 1.
```
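
A single list of `TASK(name, priority)` entries can generate IDs, names and priorities that always stay in step if the list is expanded several times with different macro definitions (the "X-macro" technique). The sketch below illustrates that technique only; whether `src/tasks.defs.h` is consumed exactly this way is an assumption, and `SKETCH_TASKS`, `ExampleTask`, `SketchTaskId` and `describeTask` are hypothetical names.

```c++
#include <string>

// Hypothetical task list: each entry is (class name, priority).
#define SKETCH_TASKS(X) \
    X(ExampleTask, 5)   \
    X(MyNewTask, 1)

// Expansion 1: one enumerator per task gives every task a unique type ID.
#define SKETCH_AS_ENUM(name, prio) name,
enum class SketchTaskId : int { SKETCH_TASKS(SKETCH_AS_ENUM) TaskCount };

// Expansion 2: the task name as a string, indexed by type ID.
#define SKETCH_AS_NAME(name, prio) #name,
const char* const sketchTaskNames[] = {SKETCH_TASKS(SKETCH_AS_NAME)};

// Expansion 3: the task priority, indexed by type ID.
#define SKETCH_AS_PRIORITY(name, prio) prio,
const int sketchTaskPriorities[] = {SKETCH_TASKS(SKETCH_AS_PRIORITY)};

// Looks up the generated name and priority for a task ID.
std::string describeTask(SketchTaskId id) {
    const int index = static_cast<int>(id);
    return std::string(sketchTaskNames[index]) +
           " priority=" + std::to_string(sketchTaskPriorities[index]);
}
```

Adding a task then means adding one line to the list; the enum and both tables pick it up automatically.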

### Create your class and set its ID using `MY_TASK_ID`.

```c++
class MyNewTask : public GlobalTask {
public:
    MyNewTask(EventuallyPersistentEngine* e)
        : GlobalTask(e /*engine*/,
                     MY_TASK_ID(MyNewTask),
                     0.0 /*snooze*/) {}
...
```

### Define pure-virtual methods in `MyNewTask`
* Define the `run` method.

The `run` method is invoked when the task is executed. The method should return
true if the task should be scheduled again. If false is returned, the instance of the
task is never re-scheduled and will be deleted once all references to the instance are
gone.

```c++
bool run() {
    // Task code here
    bool scheduleAgain = true; // false means the task is never re-scheduled
    return scheduleAgain;
}
```

* Define the `getDescription` method to aid debugging and statistics.
```c++
std::string getDescription() {
    return "A brief description of what MyNewTask does";
}
```

### Schedule your task to the desired queue.
```c++
ExTask myNewTask = new MyNewTask(&engine);
myNewTaskId = ExecutorPool::get()->schedule(myNewTask, NONIO_TASK_IDX);
```

The 4 task queue types are:
* Readers - `READER_TASK_IDX`
  * Tasks that should primarily only read from 'disk'. They generally read from
    the vbucket database files, for example background fetch of a non-resident document.
* Writers (they are allowed to read too) - `WRITER_TASK_IDX`
  * Tasks that should primarily only write to 'disk'. They generally write to
    the vbucket database files, for example when flushing the write queue.
* Auxiliary IO - `AUXIO_TASK_IDX`
  * Tasks that read and write 'disk', but not necessarily the vbucket data files.
* Non IO - `NONIO_TASK_IDX`
  * Tasks that do not perform 'disk' I/O.

### Utilise `snooze`

The snooze value of a task sets when the task should next be executed. The initial snooze
value is set when constructing `GlobalTask`. A value of 0.0 means attempt to execute
the task as soon as it is scheduled, and 5.0 means 5 seconds after being scheduled
(scheduled meaning when `ExecutorPool::get()->schedule(...)` is called).

The `run()` function can also call `snooze(double snoozeAmount)` to set how long
to wait before the task is rescheduled.

It is **best practice** for most tasks to sleep forever from their `run` function:

```c++
  snooze(INT_MAX);
```

Using `INT_MAX` means sleep forever. Tasks should always sleep until they have
real work to do. Tasks **should not periodically poll for work** with a snooze of
n seconds.

### Utilise `wake()`
When a task has work to do, some other function should wake the task using the `wake` method.

```c++
ExecutorPool::get()->wake(myNewTaskId);
```
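
The interaction between sleeping forever and being woken can be illustrated with a toy model driven by a logical clock. This sketch does not use the real `GlobalTask` or `ExecutorPool`; `SketchDrainTask` and `countRuns` are hypothetical, and the wake time is simply a number of seconds on a simulated clock.

```c++
#include <climits>

// Toy model of a task that sleeps forever until a producer wakes it.
class SketchDrainTask {
public:
    // Called by a producer: record the work, then wake the task immediately.
    void addWork(int items, double now) {
        pending += items;
        wakeTime = now;
    }

    bool isDue(double now) const {
        return wakeTime <= now;
    }

    // Processes all pending work, then sleeps "forever" instead of polling.
    bool run() {
        processed += pending;
        pending = 0;
        wakeTime = INT_MAX; // models snooze(INT_MAX)
        return true;        // stay schedulable so a later wake can run the task
    }

    int processedCount() const {
        return processed;
    }

private:
    double wakeTime = 0.0; // models an initial snooze of 0.0: due as soon as scheduled
    int pending = 0;
    int processed = 0;
};

// Drives the task for `ticks` seconds, adding 3 work items at `workAtTick`,
// and returns how many times run() was executed.
int countRuns(SketchDrainTask& task, int ticks, int workAtTick) {
    int runs = 0;
    for (int now = 0; now < ticks; ++now) {
        if (now == workAtTick) {
            task.addWork(3, now);
        }
        if (task.isDue(now)) {
            task.run();
            ++runs;
        }
    }
    return runs;
}
```

Over ten simulated seconds with work arriving once, the task runs only twice (once when first scheduled and once when woken), whereas a task polling every second would run ten times.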