
<p align="center">
 <a href="http://www.couchbase.com">
  <img src="resources/couchbase_logo.png" width="600" height="150"/>
 </a>
 <a href="http://www.couchbase.com/n1ql">
  <img src="resources/n1ql_logo.png" width="150" height="150"/>
 </a>
</p>

<a href="http://www.couchbase.com/nosql-databases/downloads">
  <img src="resources/download.png" width="150" height="50"/>
</a>

* Latest: [Query README](https://github.com/couchbase/query/blob/master/README.md)
* Modified: 2015-04-25

## Introduction

This README describes the source code and implementation of the N1QL
query engine and its components. This source code targets N1QL
Developer Preview 4, Beta, and GA.

## Goals

The goals of this implementation are:

* Language completeness for GA

* GA code base

* Source code aesthetics
    + Design, object orientation
    + Data structures, algorithms
    + Modularity, readability

## Features

This N1QL implementation provides the following features:

* __Read__
    + __SELECT__
    + __EXPLAIN__

* __DDL__
    + __CREATE / DROP INDEX__
    + __CREATE PRIMARY INDEX__

* __DML__
    + __UPDATE__
    + __DELETE__
    + __INSERT__
    + __UPSERT__
    + __MERGE__

    The ACID semantics of the DML statements have not yet been decided
    or implemented, nor has the underlying support in Couchbase
    Server. At this time, only the DML syntax and query engine
    processing have been provided.

## Deployment architecture

The query engine is a multi-threaded server that runs on a single
node. When deployed on a cluster, multiple instances are deployed on
separate nodes. This is only for load-balancing and availability. In
particular, the query engine does __not__ perform distributed query
processing, and separate instances do not communicate or interact.

In production, users will have the option of colocating query engines
on KV and index nodes, or deploying query engines on dedicated query
nodes. Because the query engine is highly data-parallel, we have a
goal of achieving good speedup on dedicated query nodes with high
numbers of cores.

The remainder of this document refers to a single instance of the
query engine. At this time, load balancing, availability, and liveness
are external concerns that will be handled later by complementary
components.

## Processing sequence

* __Parse__: Text to algebra. In future, we could also add JSON to
  algebra (e.g. if we add something like JSONiq or the Mongo query
  API).

* __Prepare__: Algebra to plan. This includes index selection.

* __Execute__: Plan to results. When we add prepared statements, this
  phase can be invoked directly on a prepared statement.

## Packages

### Value

The value package implements JSON and non-JSON values, including
delayed parsing. This implementation has measured a 2.5x speedup over
dparval.

Primitive JSON values (boolean, number, string, null) are implemented
as golang primitives and incur no memory or garbage-collection
overhead.

This package also provides collation, sorting, and sets
(de-duplication) over Values.

* __Value__: Base interface.

* __AnnotatedValue__: Can carry attachments and metadata.

* __CorrelatedValue__: Refers and escalates to a parent Value. Used to
  implement subqueries and name scoping.

* __ParsedValue__: Delayed evaluation of parsed values, including
  non-JSON values.

* __MissingValue__: Explicit representation of MISSING values. These
  are useful for internal processing, and can be skipped during final
  projection of results.

* __BooleanValue__, __NumberValue__, __StringValue__, __NullValue__,
  __ArrayValue__, __ObjectValue__: JSON values.

### Errors

The errors package provides a dictionary of error codes and
messages. When fully implemented, the error codes will mirror SQL, and
the error messages will be localizable.

All user-visible errors and warnings should come from this package.

### Expression

The expression package defines the interfaces for all expressions, and
provides the implementation of scalar expressions.

This package is usable by both query and indexing (for computed
indexes).

Expressions are evaluated within a context; this package provides a
default context that can be used by indexing. The context includes a
statement-level timestamp.

Expressions also provide support for query planning and processing;
this includes equivalence testing, constant folding, etc.

The following types of scalar expressions are included:

* Arithmetic operators
* CASE
* Collection expressions (ANY / EVERY / ARRAY / FIRST)
* Comparison operators (including IS operators)
* String concatenation
* Constants (including literals)
* Functions
* Identifiers
* Navigation (fields, array indexing, array slicing)

### Algebra

The algebra package defines the full algebra and AST (abstract syntax
tree) for all N1QL statements (using the expression package for scalar
expressions).

It includes aggregate functions, subquery expressions, parameter
expressions, bucket references, and all the N1QL statements and
clauses.

#### Aggregate functions

* __ARRAY\_AGG(expr)__

* __ARRAY\_AGG(DISTINCT expr)__

* __AVG(expr)__

* __AVG(DISTINCT expr)__

* __COUNT(*)__

* __COUNT(expr)__

* __COUNT(DISTINCT expr)__

* __MAX(expr)__

* __MIN(expr)__

* __SUM(expr)__

* __SUM(DISTINCT expr)__

### Plan

The plan package implements executable representations of
queries. This includes both SELECTs and DML statements.

When we implement prepared statements, they will be represented as
plans and stored as JSON documents or in-memory plan objects.

Plans are built from algebras using a visitor pattern. A separate
planner / optimizer will be implemented for index selection.

Plans include the following operators:

* __Scans__

    * __PrimaryScan__: Scans a primary index.

    * __IndexScan__: Scans a secondary index.

    * __KeyScan__: Does not perform a scan. Directly treats the
      provided keys as a scan.

    * __ParentScan__: Used for UNNEST. Treats the parent object as the
      result of a scan.

    * __ValueScan__: Used for the VALUES clause of INSERT and UPSERT
      statements. Treats the provided values as the result of a scan.

    * __DummyScan__: Used for SELECTs with no FROM clause. Provides a
      single empty object as the result of a scan.

    * __CountScan__: Used for SELECT COUNT(*) FROM bucket-name. Treats
      the bucket size as the result of a scan, without actually
      performing a full scan of the bucket.

    * __IntersectScan__: A container that scans its child scanners and
      intersects the results. Used for scanning multiple secondary
      indexes concurrently for a single query.

* __Fetch__

* __Joins__

    * __Join__

    * __Nest__

    * __Unnest__

* __Filter__

* __Group__: To enable data-parallelism, grouping is divided into
  three phases. The first two phases can each be executed in a
  data-parallel fashion, and the final phase merges the results.

    * __InitialGroup__: Initial phase.

    * __IntermediateGroup__: Accumulate intermediate results. This
      phase can be chained.

    * __FinalGroup__: Compute final aggregate results.

* __Other SELECT operators__

    * __Project__

    * __Distinct__

    * __Order__

    * __Offset__

    * __Limit__

    * __Let__

    * __UnionAll__: Combine the results of two queries. For UNION, we
      perform UNION ALL followed by DISTINCT.

* __Framework operators__

    * __Collect__: Collect results into an array. Used for subqueries.

    * __Discard__: Discard results.

    * __Stream__: Stream results out. Used for returning results.

    * __Parallel__: A container that executes multiple copies of its
      child operator in parallel. Used for all data-parallelism.

    * __Sequence__: A container that chains its children into a
      sequence. Used for all execution pipelining.

* __DML operators__

    * __SendDelete__

    * __SendInsert__

    * __Set__: Used for UPDATE.

    * __Unset__: Used for UPDATE.

    * __Clone__: Used for UPDATE. Clones data values so that UPDATEs
      read original values and mutate a clone.

    * __SendUpdate__

    * __Merge__

### Execution

The execution package implements query execution. The objects in this
package mirror those in the plan package, except that these are the
running instances.

Golang channels are used extensively to implement concurrency and
signaling.

#### Subquery execution

The __Context__ object supports subquery execution. It performs
planning, execution, and collection of subquery results. It also
performs plan and result caching for uncorrelated subqueries.

### Datastore

The datastore package defines the interface to the underlying database
server.

Some key differences from the previous datastore API (previously
called the catalog API):

* DML support

* Use of channels for error handling and stop signaling

* Generalized index interface that supports any combination of hash
  and range indexing

### Parser

This package will contain the parser and lexer.

### Server

This package will contain the main engine executable and listener.

### Clustering

This package defines the interface to the underlying cluster management
system.

It provides a common abstraction for cluster management, including
configuration and the lifecycle of a cluster.

### Accounting

This package will contain the interface to workload tracking and
monitoring. Accounting data can cover metrics, statistics, events,
and potentially log data.

It provides a common abstraction for recording accounting data and
services over accounting data.

### Shell

This package will contain the client command-line shell.

### Sort

This package provides a parallel sort. It was copied from the Golang
source and basic parallelism was added, but it has not been
fine-tuned.

### cbq

This package provides a client library that will be used by the
command-line shell to encapsulate cluster-awareness and other
connectivity concerns.

The library will implement the standard golang database APIs at
[database/sql](http://golang.org/pkg/database/sql/) and
[database/sql/driver](http://golang.org/pkg/database/sql/driver/).

The library will connect using the [Query REST
API](http://goo.gl/ezpmVx) and the [Query Clustering
API](http://goo.gl/yKZ6v5).

## Data parallelism

The query engine is designed to be highly data-parallel. By
data-parallel, we mean that individual stages of the execution
pipeline are parallelized over their input data. This is in addition
to the parallelism achieved by giving each stage its own goroutine.

Below, N1QL statement execution pipelines are listed, along with the
data-parallelization and serialization points.

### SELECT

1. Scan
1. __Parallelize__
1. Fetch
1. Join / Nest / Unnest
1. Let (common subexpressions)
1. Where (filter)
1. GroupBy: Initial
1. GroupBy: Intermediate
1. __Serialize__
1. GroupBy: Final
1. __Parallelize__
1. Letting (common aggregate subexpressions)
1. Having (aggregate filtering)
1. __Serialize__
1. Order By (sort)
1. __Parallelize__
1. Select (projection)
1. __Serialize__
1. Distinct (de-duplication)
1. Offset (skipping)
1. Limit

### INSERT

1. Scan
1. __Parallelize__
1. SendInsert
1. Returning (projection)

### DELETE

1. Scan
1. __Parallelize__
1. Fetch
1. Let (common subexpressions)
1. Where (filter)
1. __Serialize__
1. Limit
1. __Parallelize__
1. SendDelete
1. Returning (projection)

### UPDATE

1. Scan
1. __Parallelize__
1. Fetch
1. Let (common subexpressions)
1. Where (filter)
1. __Serialize__
1. Limit
1. __Parallelize__
1. Clone
1. Set / Unset
1. SendUpdate
1. Returning (projection)

## Steps to create a build

### Get a working repository

     $ export GOPATH=$HOME/query/
     $ mkdir -p $GOPATH/src/github.com/couchbase/
     $ cd ~/query
     $ mkdir bin pkg
     $ cd $GOPATH/src/github.com/couchbase/

Clone the git repo into the current working directory to get the
source and be able to make a build. This clones it into query:

     $ git clone https://github.com/couchbase/query query
     $ cd query
     $ ./build.sh

By default, this builds the community edition of query. If you want
the enterprise version (which includes schema inferencing), use:

     $ ./build.sh -tags "enterprise"

All the builds exist in their respective directories. You can find the
cbq and cbq-engine binaries in the shell and server directories.

### Creating a local build using local JSON files

#### Prerequisites

* cbq-engine binary
* cbq binary
* Data sample set zip file (a sample set of JSON documents)

#### Steps to run

1. Create a directory:

        $ mkdir -p ~/sample_build/tutorial/data

2. Copy the cbq and cbq-engine binaries into the ~/sample_build/ directory.

3. Copy the data sample into the ~/sample_build/tutorial/data/ directory.

4. Unzip the sample:

        $ unzip sampledb.zip

5. Go back to the directory containing the binaries:

        $ cd ~/sample_build/

6. Run the cbq-engine executable with the -datastore "<directory path>"
   and -namespace <name of the subdirectory the data is in> flags. (An
   ampersand at the end can be used to run the process in the
   background and get the prompt back.)

        $ ./cbq-engine -datastore "$HOME/sample_build/tutorial" -namespace data

7. Run the cbq executable in a new terminal. This should give you the
   N1QL command-line shell.

        $ ./cbq
        cbq> select * from tutorial;

8. Time to experiment!

### Using the Admin UI

1. Download the Couchbase server and install it (on a Mac, add it to
   the Applications folder).

2. Open localhost:8091 and follow the setup instructions.

3. Create your own buckets and fill in data.

4. To connect N1QL with the Couchbase server, run the following
   commands in two terminals, one after the other:

        $ ./cbq-engine -datastore "http://127.0.0.1:8091/"
        $ ./cbq

5. Run the following command on the created buckets before querying
   them:

        cbq> create primary index on [bucket_name]

6. Run N1QL queries on the CLI.

NOTE: Ctrl + D should allow you to exit the running cbq and cbq-engine
processes.