<p align="center">
  <a href="http://www.couchbase.com">
    <img src="resources/couchbase_logo.png" width="600" height="150"/>
  </a>
  <a href="http://www.couchbase.com/n1ql">
    <img src="resources/n1ql_logo.png" width="150" height="150"/>
  </a>
</p>

<a href="http://www.couchbase.com/nosql-databases/downloads">
  <img src="resources/download.png" width="150" height="50"/>
</a>

* Latest: [Query README](https://github.com/couchbase/query/blob/master/README.md)
* Modified: 2015-04-25

## Introduction

This README describes the source code and implementation of the N1QL
query engine and components.

## Goals

The goals of this implementation are:

* Language completeness

* GA code base

* Source code aesthetics
    + Design, object orientation
    + Data structures, algorithms
    + Modularity, readability

## Features

This N1QL implementation provides the following features:

* __Read__
    + __SELECT__
    + __EXPLAIN__

* __DDL__
    + __CREATE / DROP INDEX__
    + __CREATE PRIMARY INDEX__

* __DML__
    + __UPDATE__
    + __DELETE__
    + __INSERT__
    + __UPSERT__
    + __MERGE__

    The ACID semantics of the DML statements have not yet been decided
    or implemented, nor has the underlying support in Couchbase
    Server. At this time, only the DML syntax and query engine
    processing have been provided.

## Deployment architecture

The query engine is a multi-threaded server that runs on a single
node. When deployed on a cluster, multiple instances are deployed on
separate nodes. This is only for load balancing and availability. In
particular, the query engine does __not__ perform distributed query
processing, and separate instances do not communicate or interact.

In production, users will have the option of colocating query engines
on KV and index nodes, or deploying query engines on dedicated query
nodes. Because the query engine is highly data-parallel, we have a
goal of achieving good speedup on dedicated query nodes with high
numbers of cores.

The remainder of this document refers to a single instance of the
query engine. At this time, load balancing, availability, and liveness
are external concerns that will be handled later by complementary
components.

## Processing sequence

* __Parse__: Text to algebra. In the future, we could also add JSON to
  algebra (e.g. if we add something like JSONiq or the Mongo query
  API).

* __Prepare__: Algebra to plan. This includes index selection.

* __Execute__: Plan to results. When we add prepared statements, this
  phase can be invoked directly on a prepared statement.
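The three phases above can be sketched as a minimal pipeline. The `Parse`, `Prepare`, and `Execute` names and types below are illustrative placeholders, not the engine's actual API:

```go
package main

import "fmt"

// Hypothetical pipeline types: Algebra is the parsed AST, Plan the
// executable form, and Results the final output.
type Algebra struct{ stmt string }
type Plan struct{ ops []string }
type Results []string

// Parse: text to algebra.
func Parse(text string) Algebra { return Algebra{stmt: text} }

// Prepare: algebra to plan (index selection would happen here).
func Prepare(a Algebra) Plan {
	return Plan{ops: []string{"PrimaryScan", "Fetch", "Project"}}
}

// Execute: plan to results. With prepared statements, this phase can
// be invoked directly on a stored Plan, skipping Parse and Prepare.
func Execute(p Plan) Results {
	return Results{fmt.Sprintf("ran %d operators", len(p.ops))}
}

func main() {
	plan := Prepare(Parse("SELECT * FROM tutorial"))
	fmt.Println(Execute(plan)[0]) // the same Plan could be cached and re-executed
}
```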

## Packages

### Value

The value package implements JSON and non-JSON values, including
delayed parsing. This implementation has measured a 2.5x speedup over
dparval.

Primitive JSON values (boolean, number, string, null) are implemented
as golang primitives and incur no memory or garbage-collection
overhead.

This package also provides collation, sorting, and sets
(de-duplication) over Values.

* __Value__: Base interface.

* __AnnotatedValue__: Can carry attachments and metadata.

* __CorrelatedValue__: Refers and escalates to a parent Value. Used to
  implement subqueries and name scoping.

* __ParsedValue__: Delayed evaluation of parsed values, including
  non-JSON values.

* __MissingValue__: Explicit representation of MISSING values. These
  are useful for internal processing, and can be skipped during final
  projection of results.

* __BooleanValue__, __NumberValue__, __StringValue__, __NullValue__,
  __ArrayValue__, __ObjectValue__: JSON values.
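A simplified sketch of the idea behind this design follows; the interface and type names are hypothetical, not the package's real API. The point is that primitive JSON values wrap golang primitives directly, so they carry no extra allocation, and MISSING gets an explicit representation distinct from null:

```go
package main

import "fmt"

// Hypothetical Value-style interface for illustration only.
type Value interface {
	Actual() interface{} // the underlying golang value
	Type() string
}

// Primitive values are thin wrappers over golang primitives.
type stringValue string
type numberValue float64

func (s stringValue) Actual() interface{} { return string(s) }
func (s stringValue) Type() string        { return "string" }
func (n numberValue) Actual() interface{} { return float64(n) }
func (n numberValue) Type() string        { return "number" }

// missingValue makes MISSING explicit, distinct from JSON null, so
// internal operators can treat it specially and final projection can
// skip it.
type missingValue struct{}

func (missingValue) Actual() interface{} { return nil }
func (missingValue) Type() string        { return "missing" }

func main() {
	vals := []Value{stringValue("hello"), numberValue(3.14), missingValue{}}
	for _, v := range vals {
		fmt.Println(v.Type(), v.Actual())
	}
}
```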

### Errors

The errors package provides a dictionary of error codes and
messages. When fully implemented, the error codes will mirror SQL, and
the error messages will be localizable.

All user-visible errors and warnings should come from this package.

### Expression

The expression package defines the interfaces for all expressions, and
provides the implementation of scalar expressions.

This package is usable by both query and indexing (for computed
indexes).

Expressions are evaluated within a context; this package provides a
default context that can be used by indexing. The context includes a
statement-level timestamp.

Expressions also provide support for query planning and processing;
this includes equivalence testing, constant folding, etc.

The following types of scalar expressions are included:

* Arithmetic operators
* CASE
* Collection expressions (ANY / EVERY / ARRAY / FIRST)
* Comparison operators (including IS operators)
* String concatenation
* Constants (including literals)
* Functions
* Identifiers
* Navigation (fields, array indexing, array slicing)

### Algebra

The algebra package defines the full algebra and AST (abstract syntax
tree) for all N1QL statements (using the expression package for scalar
expressions).

It includes aggregate functions, subquery expressions, parameter
expressions, bucket references, and all the N1QL statements and
clauses.

#### Aggregate functions

* __ARRAY\_AGG(expr)__

* __ARRAY\_AGG(DISTINCT expr)__

* __AVG(expr)__

* __AVG(DISTINCT expr)__

* __COUNT(*)__

* __COUNT(expr)__

* __COUNT(DISTINCT expr)__

* __MAX(expr)__

* __MIN(expr)__

* __SUM(expr)__

* __SUM(DISTINCT expr)__

### Plan

The plan package implements executable representations of
queries. This includes both SELECTs and DML statements.

When we implement prepared statements, they will be represented as
plans and stored as JSON documents or in-memory plan objects.

Plans are built from algebras using a visitor pattern. A separate
planner / optimizer will be implemented for index selection.

Plans include the following operators:

* __Scans__

    * __PrimaryScan__: Scans a primary index.

    * __IndexScan__: Scans a secondary index.

    * __KeyScan__: Does not perform a scan. Directly treats the
      provided keys as a scan.

    * __ParentScan__: Used for UNNEST. Treats the parent object as the
      result of a scan.

    * __ValueScan__: Used for the VALUES clause of INSERT and UPSERT
      statements. Treats the provided values as the result of a scan.

    * __DummyScan__: Used for SELECTs with no FROM clause. Provides a
      single empty object as the result of a scan.

    * __CountScan__: Used for SELECT COUNT(*) FROM bucket-name. Treats
      the bucket size as the result of a scan, without actually
      performing a full scan of the bucket.

    * __IntersectScan__: A container that scans its child scanners and
      intersects the results. Used for scanning multiple secondary
      indexes concurrently for a single query.

* __Fetch__

* __Joins__

    * __Join__

    * __Nest__

    * __Unnest__

* __Filter__

* __Group__: To enable data-parallelism, grouping is divided into
  three phases. The first two phases can each be executed in a
  data-parallel fashion, and the final phase merges the results.

    * __InitialGroup__: Initial phase.

    * __IntermediateGroup__: Accumulate intermediate results. This
      phase can be chained.

    * __FinalGroup__: Compute final aggregate results.

* __Other SELECT operators__

    * __Project__

    * __Distinct__

    * __Order__

    * __Offset__

    * __Limit__

    * __Let__

    * __UnionAll__: Combine the results of two queries. For UNION, we
      perform UNION ALL followed by DISTINCT.

* __Framework operators__

    * __Collect__: Collect results into an array. Used for subqueries.

    * __Discard__: Discard results.

    * __Stream__: Stream results out. Used for returning results.

    * __Parallel__: A container that executes multiple copies of its
      child operator in parallel. Used for all data-parallelism.

    * __Sequence__: A container that chains its children into a
      sequence. Used for all execution pipelining.

* __DML operators__

    * __SendDelete__

    * __SendInsert__

    * __Set__: Used for UPDATE.

    * __Unset__: Used for UPDATE.

    * __Clone__: Used for UPDATE. Clones data values so that UPDATEs
      read original values and mutate a clone.

    * __SendUpdate__

    * __Merge__
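The three-phase grouping described under __Group__ can be sketched as follows, shown here for AVG. The `partial`, `initialGroup`, `intermediateGroup`, and `finalGroup` names are hypothetical stand-ins for the corresponding operators:

```go
package main

import "fmt"

// A partial aggregate for AVG: enough state to merge later.
type partial struct {
	sum   float64
	count int
}

// InitialGroup phase: fold raw input values into a partial aggregate.
// Runs data-parallel, once per partition of the input.
func initialGroup(values []float64) partial {
	p := partial{}
	for _, v := range values {
		p.sum += v
		p.count++
	}
	return p
}

// IntermediateGroup phase: merge partials. Also data-parallel, and
// this phase can be chained.
func intermediateGroup(parts ...partial) partial {
	m := partial{}
	for _, p := range parts {
		m.sum += p.sum
		m.count += p.count
	}
	return m
}

// FinalGroup phase: compute the final aggregate from the merged partial.
func finalGroup(p partial) float64 { return p.sum / float64(p.count) }

func main() {
	// Two data-parallel partitions of the same group.
	a := initialGroup([]float64{1, 2, 3})
	b := initialGroup([]float64{4, 5})
	fmt.Println(finalGroup(intermediateGroup(a, b))) // AVG of 1..5 = 3
}
```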

### Execution

The execution package implements query execution. The objects in this
package mirror those in the plan package, except that these are the
running instances.

Golang channels are used extensively to implement concurrency and
signaling.
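A minimal sketch of this channel style: each operator runs in its own goroutine, reading items from an input channel and writing to an output channel, with a stop channel for early termination (e.g. LIMIT). The `scan` and `filter` functions are illustrative, not the engine's operators:

```go
package main

import "fmt"

// scan produces items until asked to stop via the stop channel.
func scan(stop <-chan struct{}) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for i := 1; ; i++ {
			select {
			case out <- i:
			case <-stop:
				return // downstream signaled early termination
			}
		}
	}()
	return out
}

// filter passes through items satisfying pred, pipelined via channels.
func filter(in <-chan int, pred func(int) bool) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for v := range in {
			if pred(v) {
				out <- v
			}
		}
	}()
	return out
}

func main() {
	stop := make(chan struct{})
	evens := filter(scan(stop), func(v int) bool { return v%2 == 0 })
	// LIMIT 3: read three results, then signal stop upstream.
	for i := 0; i < 3; i++ {
		fmt.Println(<-evens)
	}
	close(stop)
}
```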

#### Subquery execution

The __Context__ object supports subquery execution. It performs
planning, execution, and collection of subquery results. It also
performs plan and result caching for uncorrelated subqueries.
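The result-caching idea can be sketched as a memo keyed by subquery text: an uncorrelated subquery references nothing from the parent row, so it yields the same result for every input item and can be executed once. All names and types here are illustrative assumptions, not the Context's real API:

```go
package main

import "fmt"

// Hypothetical context with a result cache for uncorrelated subqueries.
type Context struct {
	cache map[string][]int // subquery text -> cached results
	runs  int              // how many real executions happened
}

func (c *Context) EvaluateSubquery(text string, correlated bool) []int {
	if !correlated {
		if r, ok := c.cache[text]; ok {
			return r // cache hit: no re-execution
		}
	}
	c.runs++
	r := []int{1, 2, 3} // stand-in for real planning + execution + collection
	if !correlated {
		c.cache[text] = r
	}
	return r
}

func main() {
	ctx := &Context{cache: map[string][]int{}}
	for i := 0; i < 1000; i++ { // once per parent item
		ctx.EvaluateSubquery("SELECT x FROM t", false)
	}
	fmt.Println(ctx.runs) // executed once despite 1000 evaluations
}
```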

### Datastore

The datastore package defines the interface to the underlying database
server.

Some key differences from the previous datastore API (previously
catalog API):

* DML support

* Use of channels for error handling and stop signaling

* Generalized index interface that supports any combination of hash
  and range indexing

### Parser

This package will contain the parser and lexer.

### Server

This package will contain the main engine executable and listener.

### Clustering

This package defines the interface to the underlying cluster management
system.

It provides a common abstraction for cluster management, including the
configuration and lifecycle of a cluster.

### Accounting

This package will contain the interface to workload tracking and
monitoring. Accounting data can cover metrics, statistics, events,
and potentially log data.

It provides a common abstraction for recording accounting data and
services over accounting data.

### Shell

This package will contain the client command-line shell.

### Sort

This package provides a parallel sort. It was copied from the Golang
source and basic parallelism was added, but it has not been
fine-tuned.
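The basic parallelism described above can be illustrated as sorting two halves concurrently and then merging. This is only a sketch of the idea, not the package's actual implementation, which adapts the standard library sort:

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// parallelSort sorts the two halves of a concurrently, then merges.
func parallelSort(a []int) []int {
	mid := len(a) / 2
	left := append([]int(nil), a[:mid]...)
	right := append([]int(nil), a[mid:]...)

	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); sort.Ints(left) }()
	go func() { defer wg.Done(); sort.Ints(right) }()
	wg.Wait()

	// Merge the two sorted halves.
	out := make([]int, 0, len(a))
	for len(left) > 0 && len(right) > 0 {
		if left[0] <= right[0] {
			out, left = append(out, left[0]), left[1:]
		} else {
			out, right = append(out, right[0]), right[1:]
		}
	}
	return append(append(out, left...), right...)
}

func main() {
	fmt.Println(parallelSort([]int{5, 2, 9, 1, 7, 3}))
}
```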

### cbq

This package provides a client library that will be used by the
command-line shell to encapsulate cluster-awareness and other
connectivity concerns.

The library will implement the standard golang database APIs at
[database/sql](http://golang.org/pkg/database/sql/) and
[database/sql/driver](http://golang.org/pkg/database/sql/driver/).

The library will connect using the [Query REST
API](http://goo.gl/ezpmVx) and the [Query Clustering
API](http://goo.gl/yKZ6v5).

## Data parallelism

The query engine is designed to be highly data-parallel. By
data-parallel, we mean that individual stages of the execution
pipeline are parallelized over their input data. This is in addition
to the parallelism achieved by giving each stage its own goroutine.

Below, N1QL statement execution pipelines are listed, along with the
data-parallelization and serialization points.

### SELECT

1. Scan
1. __Parallelize__
1. Fetch
1. Join / Nest / Unnest
1. Let (Common subexpressions)
1. Where (Filter)
1. GroupBy: Initial
1. GroupBy: Intermediate
1. __Serialize__
1. GroupBy: Final
1. __Parallelize__
1. Letting (common aggregate subexpressions)
1. Having (aggregate filtering)
1. __Serialize__
1. Order By (Sort)
1. __Parallelize__
1. Select (Projection)
1. __Serialize__
1. Distinct (De-duplication)
1. Offset (Skipping)
1. Limit

### INSERT

1. Scan
1. __Parallelize__
1. SendInsert
1. Returning (Projection)

### DELETE

1. Scan
1. __Parallelize__
1. Fetch
1. Let (Common subexpressions)
1. Where (Filter)
1. __Serialize__
1. Limit
1. __Parallelize__
1. SendDelete
1. Returning (Projection)

### UPDATE

1. Scan
1. __Parallelize__
1. Fetch
1. Let (Common subexpressions)
1. Where (Filter)
1. __Serialize__
1. Limit
1. __Parallelize__
1. Clone
1. Set / Unset
1. SendUpdate
1. Returning (Projection)

## Steps to create a build

### Get a working repository

     $ export GOPATH=$HOME/query/
     $ mkdir -p $GOPATH/src/github.com/couchbase/
     $ cd ~/query
     $ mkdir bin pkg

Install the required goyacc tool and update the PATH to see it:

     $ cd $GOPATH/src/golang.org/x
     $ git clone https://github.com/golang/tools.git
     $ cd tools/cmd/goyacc
     $ go build
     $ go install
     $ export PATH=$PATH:$GOPATH/bin/

Clone the query repo and build it:

     $ cd $GOPATH/src/github.com/couchbase/
     $ git clone https://github.com/couchbase/query query
     $ cd query
     $ ./build.sh

By default, this builds the community edition of query. If you want
the enterprise version (which includes schema inferencing), use:

     $ ./build.sh -tags "enterprise"

All the builds exist in their respective directories. You can find the
cbq and cbq-engine binaries in the shell and server directories.

### Creating a local build using local JSON files

#### Pre-requisites

* cbq-engine binary
* cbq binary
* Data sample set zip file (a sample set of JSON documents)

#### Steps to run

1. Create a directory:

        $ mkdir -p ~/sample_build/tutorial/data

2. Copy the binaries cbq and cbq-engine into the ~/sample_build/ directory.
3. Copy the data sample into the ~/sample_build/tutorial/data/ directory.
4. Unzip the sample using the command:

        $ unzip sampledb.zip

5. Go back to the directory containing the binaries:

        $ cd ~/sample_build/

6. Run the cbq-engine executable with the -datastore "<directory path>" and
   -namespace <name of the subdirectory the data is in> flags. (An ampersand
   at the end of the command can be used to run the process in the background
   and get the prompt back.)

        $ ./cbq-engine -datastore "$HOME/sample_build/tutorial" -namespace data

7. Then run the cbq executable in a new terminal. This should give you the
   N1QL command-line shell.

        $ ./cbq
        cbq> select * from tutorial;

8. Time to experiment ☺

### Using the Admin UI

1. Download the Couchbase server and install it (for the Mac, add it to the
   Applications folder).
2. Open up localhost:8091 and follow the setup instructions.
3. Create your own buckets and fill in data.
4. To connect N1QL with the Couchbase server, run the following commands in
   two terminals, one after the other:

        $ ./cbq-engine -datastore "http://127.0.0.1:8091/"
        $ ./cbq -u=<username> -p=<password> localhost:8091

5. Run the following command on the created buckets before querying them:

        cbq> create primary index on [bucket_name]

6. Run N1QL queries on the CLI.

NOTE: Ctrl + D should allow you to exit the running cbq and cbq-engine
processes.