moss
----

moss provides a simple, fast, persistable, ordered key-val collection
implementation as a 100% golang library.

moss stands for "memory-oriented sorted segments".

[Build Status](https://travis-ci.org/couchbase/moss) [Coverage Status](https://coveralls.io/github/couchbase/moss?branch=master) [GoDoc](https://godoc.org/github.com/couchbase/moss) [Go Report Card](https://goreportcard.com/report/github.com/couchbase/moss)

Features
========

* ordered key-val collection API
* 100% go implementation
* key range iterators
* snapshots provide for isolated reads
* atomic mutations via a batch API
* merge operations allow for read-compute-write optimizations
  for write-heavy use cases (e.g., updating counters);
  see the sketch after this list
* concurrent readers and writers don't block each other
* child collections allow multiple related collections to be atomically grouped
* optional, advanced APIs to avoid extra memory copying
* optional lower-level storage implementation, called "mossStore",
  that uses an append-only design for writes and mmap() for reads,
  with configurable compaction policy; see: OpenStoreCollection()
* mossStore supports navigating back through previous commit points in
  read-only fashion, and supports reverting to previous commit points
* optional persistence hooks to allow write-back caching to a
  lower-level storage implementation that advanced users may wish to
  provide (e.g., you can hook moss up to leveldb, sqlite, etc)
* event callbacks allow the monitoring of asynchronous tasks
* unit tests
* fuzz tests via go-fuzz & smat (github.com/mschoch/smat);
  see README-smat.md
* moss store's diagnostic tool: [mossScope](https://github.com/couchbase/mossScope)

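For the merge-operations feature, a rough sketch follows. It assumes
moss exposes a RocksDB-style MergeOperator interface (Name /
FullMerge / PartialMerge), a CollectionOptions.MergeOperator field,
and a Batch.Merge() method; the counter type itself is only a
hypothetical example, not part of moss.

    // CounterMergeOperator is a hypothetical example (uses strconv);
    // it treats values as decimal counters and adds merge operands.
    type CounterMergeOperator struct{}

    func (m *CounterMergeOperator) Name() string { return "counter-add" }

    func (m *CounterMergeOperator) FullMerge(key, existingValue []byte,
        operands [][]byte) ([]byte, bool) {
        total, _ := strconv.Atoi(string(existingValue)) // 0 when missing
        for _, op := range operands {
            n, _ := strconv.Atoi(string(op))
            total += n
        }
        return []byte(strconv.Itoa(total)), true
    }

    func (m *CounterMergeOperator) PartialMerge(key,
        leftOperand, rightOperand []byte) ([]byte, bool) {
        l, _ := strconv.Atoi(string(leftOperand))
        r, _ := strconv.Atoi(string(rightOperand))
        return []byte(strconv.Itoa(l + r)), true
    }

    // Usage sketch: register the operator, then Merge() instead of Get+Set,
    // letting moss fold the operands in the background.
    c, _ := moss.NewCollection(moss.CollectionOptions{
        MergeOperator: &CounterMergeOperator{},
    })
    batch, _ := c.NewBatch(0, 0)
    batch.Merge([]byte("counter-pageviews"), []byte("1"))
    _ = c.ExecuteBatch(batch, moss.WriteOptions{})
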
License
=======

Apache 2.0

Example
=======

    import "github.com/couchbase/moss"

    c, err := moss.NewCollection(moss.CollectionOptions{})
    defer c.Close()

    batch, err := c.NewBatch(0, 0)
    defer batch.Close()

    batch.Set([]byte("car-0"), []byte("tesla"))
    batch.Set([]byte("car-1"), []byte("honda"))

    err = c.ExecuteBatch(batch, moss.WriteOptions{})

    ss, err := c.Snapshot()
    defer ss.Close()

    ropts := moss.ReadOptions{}

    val0, err := ss.Get([]byte("car-0"), ropts) // val0 == []byte("tesla").
    valX, err := ss.Get([]byte("car-not-there"), ropts) // valX == nil.

    // A Get can also be issued directly against the collection
    val1, err := c.Get([]byte("car-1"), ropts) // val1 == []byte("honda").

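A key range iteration over the same snapshot might look like the
sketch below, which assumes the Snapshot.StartIterator() API, an
IteratorOptions struct, and the moss.ErrIteratorDone sentinel:

    iter, err := ss.StartIterator(nil, nil, moss.IteratorOptions{}) // nil, nil == full key range
    defer iter.Close()

    for err == nil {
        k, v, errCurr := iter.Current()
        if errCurr != nil {
            break // moss.ErrIteratorDone signals the end of the range
        }
        fmt.Printf("%s => %s\n", k, v) // e.g., "car-0 => tesla"
        err = iter.Next() // Next() also returns moss.ErrIteratorDone when exhausted
    }
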
For persistence, you can use...

    store, collection, err := moss.OpenStoreCollection(directoryPath,
        moss.StoreOptions{}, moss.StorePersistOptions{})

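The returned collection is used through the same Batch / Snapshot API
shown above, while writes are persisted to directoryPath in the
background. A minimal shutdown sketch, assuming both the store and the
collection expose Close():

    defer store.Close()      // runs last: release the underlying store
    defer collection.Close() // runs first: stop the collection's background work

    // Reopening with the same directoryPath later recovers the
    // previously persisted key-val data.
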
Design
======

The design is similar to a (much) simplified LSM tree, with a stack of
sorted, immutable key-val arrays or "segments".

To incorporate the next Batch of key-val mutations, the incoming
key-val entries are first sorted into an immutable "segment", which is
then atomically pushed onto the top of the stack of segments.

For readers, a higher segment in the stack will shadow entries of the
same key from lower segments.

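As a conceptual illustration of that shadowing rule (these are not
moss's actual internal types), a read walks the stack from the newest
segment downwards and stops at the first segment containing the key:

    // Conceptual sketch only -- not moss's real data structures.
    // Each segment is sorted by key (uses the sort and bytes packages).
    type segment []struct{ key, val []byte }

    func lookup(stack []segment, key []byte) []byte {
        for i := len(stack) - 1; i >= 0; i-- { // newest (top) segment first
            seg := stack[i]
            j := sort.Search(len(seg), func(n int) bool {
                return bytes.Compare(seg[n].key, key) >= 0
            })
            if j < len(seg) && bytes.Equal(seg[j].key, key) {
                return seg[j].val // a higher segment shadows lower ones
            }
        }
        return nil // key not present in any segment
    }
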
Separately, an asynchronous goroutine (the "merger") will continuously
merge N sorted segments to keep stack height low.

In the best case, a remaining, single, large sorted segment will be
efficient in memory usage and efficient for binary search and range
iteration.

Iterations when the stack height is > 1 are implemented using an N-way
heap merge.

In this design, the stack of segments is treated as immutable via a
copy-on-write approach whenever the stack needs to be "modified". So,
multiple readers and writers won't block each other, and taking a
Snapshot is also a similarly cheap operation by cloning the stack.

See also the DESIGN.md writeup.

Limitations and considerations
==============================

NOTE: Keys in a Batch must be unique. That is, myBatch.Set("x",
"foo"); myBatch.Set("x", "bar") is not supported. Applications that
do not naturally meet this requirement might maintain their own
map[key]val data structures to ensure this uniqueness constraint.

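For example, an application could collapse repeated writes to the same
key in an ordinary map before building the Batch; this is just a
sketch on top of the API shown earlier, not a moss feature:

    pending := map[string][]byte{}   // last write wins before batching
    pending["x"] = []byte("foo")
    pending["x"] = []byte("bar")     // overwrites "foo" rather than duplicating the key

    batch, _ := c.NewBatch(len(pending), 0)
    for k, v := range pending {
        batch.Set([]byte(k), v)      // each key appears at most once in the Batch
    }
    _ = c.ExecuteBatch(batch, moss.WriteOptions{})
    batch.Close()
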
Max key length is 2^24 (24 bits used to track key length).

Max val length is 2^28 (28 bits used to track val length).

Metadata overhead for each key-val operation is 16 bytes.

Read performance characterization is roughly O(log N) for key-val
retrieval.

Write performance characterization is roughly O(M log M), where M is
the number of mutations in a batch when invoking ExecuteBatch().

Those performance characterizations, however, don't account for
background, asynchronous processing for the merging of segments and
data structure maintenance.

For example, a background merger task that is too slow can eventually
stall the ingest of new batches. (See the CollectionOptions settings
that limit segment stack height.)

As another example, one slow reader that holds onto a Snapshot or an
Iterator for a long time can hold onto a lot of resources. In the
worst case, the reader's Snapshot or Iterator may delay the
reclamation of large, old segments, where incoming mutations have
obsoleted the immutable segments that the reader is still holding
onto.

Error handling
==============

Please note that the background goroutines of moss may run into
errors, for example during optional persistence operations. To be
notified of these cases, your application can provide an optional
(and highly recommended) CollectionOptions.OnError callback func,
which moss will invoke.

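A sketch of wiring up such a callback, assuming OnError has the
signature func(error) and using the standard log package:

    c, err := moss.NewCollection(moss.CollectionOptions{
        OnError: func(err error) {
            // invoked from moss's background goroutines,
            // e.g. when an asynchronous persistence operation fails
            log.Printf("moss async error: %v", err)
        },
    })
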
Logging
=======

Please see the optional CollectionOptions.Log callback func and the
CollectionOptions.Debug flag.

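For example, assuming Log takes a Printf-style signature and Debug is
an integer verbosity level:

    c, err := moss.NewCollection(moss.CollectionOptions{
        Debug: 1, // assumption: higher values mean more verbose logging
        Log: func(format string, a ...interface{}) {
            log.Printf(format, a...) // route moss's output to the standard log package
        },
    })
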
Performance
===========

Please try `go test -bench=.` for some basic performance tests.

Each performance test will emit output that generally looks like...

    ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    spec: {numItems:1000000 keySize:20 valSize:100 batchSize:100 randomLoad:false noCopyValue:false accesses:[]}
    open || time: 0 (ms) | 0 wop/s | 0 wkb/s | 0 rop/s | 0 rkb/s || cumulative: 0 wop/s | 0 wkb/s | 0 rop/s | 0 rkb/s
    load || time: 840 (ms) | 1190476 wop/s | 139508 wkb/s | 0 rop/s | 0 rkb/s || cumulative: 1190476 wop/s | 139508 wkb/s | 0 rop/s | 0 rkb/s
    drain || time: 609 (ms) | 0 wop/s | 0 wkb/s | 0 rop/s | 0 rkb/s || cumulative: 690131 wop/s | 80874 wkb/s | 0 rop/s | 0 rkb/s
    close || time: 0 (ms) | 0 wop/s | 0 wkb/s | 0 rop/s | 0 rkb/s || cumulative: 690131 wop/s | 80874 wkb/s | 0 rop/s | 0 rkb/s
    reopen || time: 0 (ms) | 0 wop/s | 0 wkb/s | 0 rop/s | 0 rkb/s || cumulative: 690131 wop/s | 80874 wkb/s | 0 rop/s | 0 rkb/s
    iter || time: 81 (ms) | 0 wop/s | 0 wkb/s | 12344456 rop/s | 1446616 rkb/s || cumulative: 690131 wop/s | 80874 wkb/s | 12344456 rop/s | 1446616 rkb/s
    close || time: 2 (ms) | 0 wop/s | 0 wkb/s | 0 rop/s | 0 rkb/s || cumulative: 690131 wop/s | 80874 wkb/s | 12344456 rop/s | 1446616 rkb/s
    total time: 1532 (ms)
    file size: 135 (MB), amplification: 1.133
    BenchmarkStore_numItems1M_keySize20_valSize100_batchSize100-8

There are various phases in each test...

* open - opening a brand new moss storage instance
* load - time to load N sequential keys
* drain - additional time after load for persistence to complete
* close - time to close the moss storage instance
* reopen - time to reopen the moss storage instance (OS/filesystem caches are still warm)
* iter - time to sequentially iterate through key-val items
* access - time to perform various access patterns, like random or sequential reads and writes

The file size measurement is taken after final compaction, with
amplification as a naive calculation comparing the on-disk overhead
against the raw key-val size.

Contributing changes
====================

Please see the CONTRIBUTING.md document.