• Home
  • History
  • Annotate
  • only in this directory
Name Date Size

..11-Jan-20224 KiB

alloctrack.goH A D11-Jan-2022631

bintree.goH A D11-Jan-20229.5 KiB

bintree_test.goH A D11-Jan-20223.8 KiB

constants.goH A D11-Jan-20226.5 KiB

expression.goH A D11-Jan-20224.5 KiB

expression_compact.goH A D11-Jan-20221.7 KiB

expression_json.goH A D11-Jan-20227.5 KiB

expression_utils.goH A D11-Jan-20223.3 KiB

expressionstats.goH A D11-Jan-20222.6 KiB

fastlitescape.goH A D11-Jan-20225.3 KiB

fastlitparse.goH A D11-Jan-20221.7 KiB

fastlitparse_bench_test.goH A D11-Jan-20221.7 KiB

fastMatcher.goH A D11-Jan-202221.4 KiB

fastMatcher_bench_test.goH A D11-Jan-20222.1 KiB

fastMatcher_test.goH A D11-Jan-20227.1 KiB

fastMatcherDef.goH A D11-Jan-20224.3 KiB

fastval.goH A D11-Jan-202227.8 KiB

fastval_date.goH A D11-Jan-20222.8 KiB

fastval_math.goH A D11-Jan-202213.6 KiB

filterExprParser.goH A D11-Jan-202240.6 KiB

filterExprParser_test.goH A D11-Jan-202243.1 KiB

filterExprParserPcre_test.goH A D11-Jan-20222 KiB

jsontokenizer.goH A D11-Jan-20229.8 KiB

jsontokenizer_test.goH A D11-Jan-20225.2 KiB

matcher.goH A D11-Jan-2022246

pcre_wrapper.goH A D11-Jan-2022784

pcre_wrapper_disabled.goH A D11-Jan-2022411

pcre_wrapper_interface.goH A D11-Jan-2022136

randdata.goH A D11-Jan-20222 KiB

README.mdH A D11-Jan-20225.8 KiB

simpleParser.goH A D11-Jan-202252.4 KiB

simpleParser_test.goH A D11-Jan-202262.3 KiB

simpleParserPcre_test.goH A D11-Jan-20225.9 KiB

slowmatcher.goH A D11-Jan-20226.2 KiB

testdata/H11-Jan-20224 KiB

testdata_test.goH A D11-Jan-2022561

transform.goH A D11-Jan-202214.9 KiB

utils.goH A D11-Jan-2022806

README.md

1JSONSM is a high-speed JSON matching library.  Used for performing complex
2expression matching against JSON data.
3
4# Goals
5- Enable extremely high speed expression matching against arbitrary JSON.
6
7# What We've Accomplished
8- Precompile expressions to the most efficient execution-time format.
9- Single-Pass complex expression matching
10- Zero allocations at match-time (this is really related to high-speed matching)
11- N1QL-like parser to allow simplified expression entry.
12
13# Match Tree
14The match tree represents the implied schema of a document, as described by
15the set of expressions from the top-level expression being compiled.  For
16instance, in the case of an expression such as 
17`$doc.age > 14 AND $doc.birthday.year != 2018`, it is clear that the document
18should have an `age` field, and also a field called `birthday` which is an
19object itself, containing a `year` field.  This leads to an implied
20structure of something like the following:
21
22- $doc
23  - age
24  - birthday
25    - year
26
27This structure is used to allow us to scan through the JSON, quickly
28determining if there is any specific processing that needs to be performed at
29any particular node within it.
30
31# Match Tree Operations
32Each node within the match tree nodes contain op nodes which represent the
33actual actions that need to be taken when that node is reached inside of the
34JSON being matched.  These operations represent things such as 'less than',
35'greater than', etc...  These operations are assigned a bucket index which
36matches up with a slot within the binary tree array.  This allows for the
37result of an operation being performed to be quickly applied to the binary
38tree for the expression.
39
40# Binary Tree
41The binary tree portion of our compiled expressions represents the binary
42operators of the source expression.  Each entry in the tree contains a
43left and right child node, a link to the entries parent as well as an
44operation type expressing how these children are combined together to
45produce the node state.  In the case of a NOT expression, the right node
46is ignored and the node result is simply the negated value.  Note that
47the binary tree consists of two components, the binary tree itself which
48is tied to a Match Tree, and then a matching execution state object
49which represents the values for a particular instance of a matcher using
50it.
51
52# Variables
53In addition to the normal operations being attached to exec nodes, there
54is an ability to specify a VariableId for a particular exec node.  This
55causes a reference to the byte-slice for the value of that node to be
56stored. This enables later stages of processing to perform computations
57against the stored values, allowing expressions such as
58`$doc.x + $doc.y > 100`.
59
60# "After"'s
61After's provide a method to perform operations against a set of variables
62once we can be sure all the neccessary variables have been parsed.  In
63the case of an expression such as `$doc.a.x + $doc.a.y > 0`, you would
64see variables being used for the two pieces, and then an `After` node
65placed on the `$doc.a` section of our Exec tree telling the Matcher
66to perform the match, since after all of that node has been processed,
67we can be certain both variables will have been populated if they were
68in the document.
69
70# Loops
71JSONSM allows various looping conditions to be executed as well.  For
72instance, the `ANY IN`, `EVERY IN` or `ANY AND EVERY IN` expression
73types.  These kinds of expressions are implemented by scanning through
74the JSON array, performing normal binary tree matching with a special
75'loop root` in the binary tree.  Within the looping, the results written
76to the binary tree only cause result propagation up until the loop root.
77Once it reaches here, depeding on which kind of loop is being performed
78and the specific result from the loop, the looping is either continued
79or cancelled, with a final result for the loop being applied once the
80entire loop has completed.
81
82# JSON Parsing
83JSON parsing is performed in two separate modes.  The initial mode is
84essentially a normal parser.  It looks for the beginning of a JSON
85object and then reads the field names according to the standard JSON
86specification.  Once a field name has been read, the name is matched
87against the Match Tree that was generated to determine if that
88particular field contributes to the overall expression.  If it is
89determined that the field has no impact on the expression, the parser
90switches to a different mode which effectively parses through the
91bytes without performing any storage except the very basic state
92needed for JSON scanning.  Once the state of the scanner returns
93to the object which was initial being read, the next field name is
94read and the logic is continued.
95
96# Binary Tree Optimization
97The binary tree is implemented as a flat array where the children of
98a particular node are guarenteed to come immediately after the parent.
99This ensures that a segment of the tree can be quickly reset after a
100loop iteration by simply zero'ing a contiguous section of the array.
101
102# String Matching Optimization
103In order to improve the performance of string matching, and to avoid
104the need to perform allocations during matching, all constant
105strings used in the original expression are pre-escaped and stored in
106their escaped format.  This allows byte-wise matching of the string
107against the bytes in the JSON without the need to perform additional
108work.  In the case of more complex string comparisons, special rules
109can still be used to perform the comparison on the escaped bytes.
110
111# Matcher State Optimization
112The matcher state size is fully calculated during compilation of the
113expression.  This allows the matcher state object to be preallocated
114once per thread which is executing.  Thanks to the various other
115optimizations being performed, all variable-length objects which need
116to be stored can be kept as a slice of bytes from the source JSON
117rather than needing to allocate any space.
118
119# License
120Copyright 2018 Couchbase, Inc. All rights reserved.
121