moss
----

moss provides a simple, fast, persistable, ordered key-val collection
implementation as a 100% golang library.

moss stands for "memory-oriented sorted segments".

[Build Status](https://travis-ci.org/couchbase/moss) [Coverage Status](https://coveralls.io/github/couchbase/moss?branch=master) [GoDoc](https://godoc.org/github.com/couchbase/moss) [Go Report Card](https://goreportcard.com/report/github.com/couchbase/moss)

Features
========

* ordered key-val collection API
* 100% go implementation
* key range iterators
* snapshots provide for isolated reads
* atomic mutations via a batch API
* merge operations allow for read-compute-write optimizations
  for write-heavy use cases (e.g., updating counters)
* concurrent readers and writers don't block each other
* child collections allow multiple related collections to be atomically grouped
* optional, advanced APIs to avoid extra memory copying
* optional lower-level storage implementation, called "mossStore",
  that uses an append-only design for writes and mmap() for reads,
  with configurable compaction policy; see: OpenStoreCollection()
* mossStore supports navigating back through previous commit points in
  read-only fashion, and supports reverting to previous commit points.
* optional persistence hooks to allow write-back caching to a
  lower-level storage implementation that advanced users may wish to
  provide (e.g., you can hook moss up to leveldb, sqlite, etc)
* event callbacks allow the monitoring of asynchronous tasks
* unit tests
* fuzz tests via go-fuzz & smat (github.com/mschoch/smat);
  see README-smat.md
* moss store's diagnostic tool: [mossScope](https://github.com/couchbase/mossScope)

License
=======

Apache 2.0

Example
=======

    import "github.com/couchbase/moss"

    c, err := moss.NewCollection(moss.CollectionOptions{})
    c.Start()
    defer c.Close()

    batch, err := c.NewBatch(0, 0)
    defer batch.Close()

    batch.Set([]byte("car-0"), []byte("tesla"))
    batch.Set([]byte("car-1"), []byte("honda"))

    err = c.ExecuteBatch(batch, moss.WriteOptions{})

    ss, err := c.Snapshot()
    defer ss.Close()

    ropts := moss.ReadOptions{}

    val0, err := ss.Get([]byte("car-0"), ropts) // val0 == []byte("tesla").
    valX, err := ss.Get([]byte("car-not-there"), ropts) // valX == nil.

    // A Get can also be issued directly against the collection.
    val1, err := c.Get([]byte("car-1"), ropts) // val1 == []byte("honda").

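Snapshots also support key range iteration (see the "key range
iterators" feature above). Here is a minimal sketch that continues
from the snapshot ss in the example; the exact iterator method names
(StartIterator / Current / Next) and the ErrIteratorDone sentinel
should be double-checked against the godoc...

    // Iterate the snapshot's entire key range (nil start/end keys).
    iter, err := ss.StartIterator(nil, nil, moss.IteratorOptions{})
    defer iter.Close()

    for {
        k, v, err := iter.Current()
        if err == moss.ErrIteratorDone {
            break
        }
        fmt.Printf("%s => %s\n", k, v) // e.g., car-0 => tesla, then car-1 => honda.

        if err = iter.Next(); err == moss.ErrIteratorDone {
            break
        }
    }
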
For persistence, you can use...

    store, collection, err := moss.OpenStoreCollection(directoryPath,
        moss.StoreOptions{}, moss.StorePersistOptions{})

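The merge operations mentioned in the features list let you fold
read-compute-write updates (e.g., counters) into a batch. Below is a
rough sketch, assuming a RocksDB-style MergeOperator registered via
CollectionOptions.MergeOperator and a Batch.Merge method, where
toInt / fromInt are hypothetical byte-slice/integer helpers; please
check the godoc for the exact interface...

    // CounterMergeOperator is an illustrative operator that sums
    // integer operands into the existing value.
    type CounterMergeOperator struct{}

    func (m *CounterMergeOperator) Name() string { return "counter" }

    func (m *CounterMergeOperator) FullMerge(key, existingValue []byte,
        operands [][]byte) ([]byte, bool) {
        total := toInt(existingValue) // toInt / fromInt: hypothetical helpers.
        for _, op := range operands {
            total += toInt(op)
        }
        return fromInt(total), true
    }

    func (m *CounterMergeOperator) PartialMerge(key,
        leftOperand, rightOperand []byte) ([]byte, bool) {
        return fromInt(toInt(leftOperand) + toInt(rightOperand)), true
    }

    // Register the operator, then issue Merge mutations in batches.
    c, err := moss.NewCollection(moss.CollectionOptions{
        MergeOperator: &CounterMergeOperator{},
    })
    c.Start()

    batch, err := c.NewBatch(0, 0)
    batch.Merge([]byte("page-hits"), fromInt(1)) // no separate Get needed.
    err = c.ExecuteBatch(batch, moss.WriteOptions{})
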
Design
======

The design is similar to a (much) simplified LSM tree, with a stack of
sorted, immutable key-val arrays or "segments".

To incorporate the next Batch of key-val mutations, the incoming
key-val entries are first sorted into an immutable "segment", which is
then atomically pushed onto the top of the stack of segments.

For readers, a higher segment in the stack will shadow entries of the
same key from lower segments.

Separately, an asynchronous goroutine (the "merger") will continuously
merge N sorted segments to keep stack height low.

In the best case, a remaining, single, large sorted segment will be
efficient in memory usage and efficient for binary search and range
iteration.

Iterations when the stack height is > 1 are implemented using an N-way
heap merge.

In this design, the stack of segments is treated as immutable via a
copy-on-write approach whenever the stack needs to be "modified". So,
multiple readers and writers won't block each other, and taking a
Snapshot is similarly cheap, as it merely clones the stack.

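For example, a writer can keep executing batches while a reader holds
a Snapshot, and the snapshot continues to present the view it was
taken with. A small sketch, using only the collection API shown in the
example above...

    // Writer: keeps landing new batches on collection c.
    go func() {
        for i := 0; i < 1000; i++ {
            b, _ := c.NewBatch(0, 0)
            b.Set([]byte(fmt.Sprintf("k-%d", i)), []byte("v"))
            c.ExecuteBatch(b, moss.WriteOptions{})
            b.Close()
        }
    }()

    // Reader: takes a snapshot, which clones the current segment stack.
    ss, _ := c.Snapshot()
    defer ss.Close()

    // Reads against ss keep seeing the stack as it was at Snapshot()
    // time, regardless of the writer's later batches.
    v, _ := ss.Get([]byte("k-0"), moss.ReadOptions{})
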
See also the DESIGN.md writeup.

Limitations and considerations
==============================

NOTE: Keys in a Batch must be unique. That is, myBatch.Set("x",
"foo"); myBatch.Set("x", "bar") is not supported. Applications that
do not naturally meet this requirement might maintain their own
map[key]val data structures to ensure this uniqueness constraint.

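For example, a simple way to satisfy the uniqueness constraint is to
coalesce mutations into a map first and then build the Batch from that
map (last write wins within the group)...

    // Coalesce duplicate keys before building the batch.
    pending := map[string][]byte{}
    pending["x"] = []byte("foo")
    pending["x"] = []byte("bar") // overwrites the earlier value for "x".

    batch, err := c.NewBatch(0, 0)
    for k, v := range pending {
        batch.Set([]byte(k), v)
    }
    err = c.ExecuteBatch(batch, moss.WriteOptions{})
    batch.Close()
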
Max key length is 2^24 bytes (24 bits are used to track key length).

Max val length is 2^28 bytes (28 bits are used to track val length).

Metadata overhead for each key-val operation is 16 bytes.

Read performance characterization is roughly O(log N) for key-val
retrieval.

Write performance characterization is roughly O(M log M), where M is
the number of mutations in a batch when invoking ExecuteBatch().

Those performance characterizations, however, don't account for
background, asynchronous processing for the merging of segments and
data structure maintenance.

For example, a background merger task that is too slow can eventually
stall the ingest of new batches. (See the CollectionOptions settings
that limit segment stack height.)

As another example, one slow reader that holds onto a Snapshot or onto
an Iterator for a long time can hold onto a lot of resources. In the
worst case, the reader's Snapshot or Iterator may delay the
reclamation of large, old segments, where incoming mutations have
obsoleted the immutable segments that the reader is still holding
onto.

Error handling
==============

Please note that the background goroutines of moss may run into
errors, for example during optional persistence operations. To be
notified of these cases, your application can provide an optional
CollectionOptions.OnError callback func (highly recommended), which
moss will invoke when such errors occur.

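For example (a sketch assuming the callback receives a single error
argument; please see the godoc for the exact signature)...

    co := moss.CollectionOptions{
        OnError: func(err error) {
            // Invoked from moss's background goroutines, e.g., when an
            // asynchronous persistence operation fails.
            log.Printf("moss async error: %v", err)
        },
    }

    c, err := moss.NewCollection(co)
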
Logging
=======

Please see the optional CollectionOptions.Log callback func and the
CollectionOptions.Debug flag.

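For example (a sketch assuming Log is a printf-style callback and
Debug is an integer verbosity level; please see the godoc for the
exact types)...

    co := moss.CollectionOptions{
        Debug: 2, // Higher values emit more verbose diagnostics.
        Log: func(format string, a ...interface{}) {
            log.Printf(format, a...)
        },
    }
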
Performance
===========

Please try `go test -bench=.` for some basic performance tests.

Each performance test will emit output that generally looks like...

    ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    spec: {numItems:1000000 keySize:20 valSize:100 batchSize:100 randomLoad:false noCopyValue:false accesses:[]}
    open || time: 0 (ms) | 0 wop/s | 0 wkb/s | 0 rop/s | 0 rkb/s || cumulative: 0 wop/s | 0 wkb/s | 0 rop/s | 0 rkb/s
    load || time: 840 (ms) | 1190476 wop/s | 139508 wkb/s | 0 rop/s | 0 rkb/s || cumulative: 1190476 wop/s | 139508 wkb/s | 0 rop/s | 0 rkb/s
    drain || time: 609 (ms) | 0 wop/s | 0 wkb/s | 0 rop/s | 0 rkb/s || cumulative: 690131 wop/s | 80874 wkb/s | 0 rop/s | 0 rkb/s
    close || time: 0 (ms) | 0 wop/s | 0 wkb/s | 0 rop/s | 0 rkb/s || cumulative: 690131 wop/s | 80874 wkb/s | 0 rop/s | 0 rkb/s
    reopen || time: 0 (ms) | 0 wop/s | 0 wkb/s | 0 rop/s | 0 rkb/s || cumulative: 690131 wop/s | 80874 wkb/s | 0 rop/s | 0 rkb/s
    iter || time: 81 (ms) | 0 wop/s | 0 wkb/s | 12344456 rop/s | 1446616 rkb/s || cumulative: 690131 wop/s | 80874 wkb/s | 12344456 rop/s | 1446616 rkb/s
    close || time: 2 (ms) | 0 wop/s | 0 wkb/s | 0 rop/s | 0 rkb/s || cumulative: 690131 wop/s | 80874 wkb/s | 12344456 rop/s | 1446616 rkb/s
    total time: 1532 (ms)
    file size: 135 (MB), amplification: 1.133
    BenchmarkStore_numItems1M_keySize20_valSize100_batchSize100-8

There are various phases in each test...

* open - opening a brand new moss storage instance
* load - time to load N sequential keys
* drain - additional time after load for persistence to complete
* close - time to close the moss storage instance
* reopen - time to reopen the moss storage instance (OS/filesystem caches are still warm)
* iter - time to sequentially iterate through key-val items
* access - time to perform various access patterns, like random or sequential reads and writes

The file size measurement is taken after final compaction, and the
amplification figure is a naive calculation comparing that on-disk
overhead against the raw key-val size.

Contributing changes
====================

Please see the CONTRIBUTING.md document.
193