# Query README

* Latest: [Query README](https://github.com/couchbase/query/blob/master/README.md)
* Modified: 2015-04-25

## Introduction

This README describes the source code and implementation of the N1QL
query engine and components. This source code is targeted for N1QL
Developer Preview 4, Beta and GA.

## Goals

The goals of this implementation are:

* Language completeness for GA

* GA code base

* Source code aesthetics
    + Design, object orientation
    + Data structures, algorithms
    + Modularity, readability

## Features

This N1QL implementation provides the following features:

* __Read__
    + __SELECT__
    + __EXPLAIN__

* __DDL__
    + __CREATE / DROP INDEX__
    + __CREATE PRIMARY INDEX__

* __DML__
    + __UPDATE__
    + __DELETE__
    + __INSERT__
    + __UPSERT__
    + __MERGE__

    The ACID semantics of the DML statements have not yet been decided
    or implemented. Nor has the underlying support in Couchbase
    Server. At this time, only the DML syntax and query engine
    processing have been provided.

## Deployment architecture

The query engine is a multi-threaded server that runs on a single
node. When deployed on a cluster, multiple instances are deployed on
separate nodes. This is only for load-balancing and availability. In
particular, the query engine does __not__ perform distributed query
processing, and separate instances do not communicate or interact.

In production, users will have the option of colocating query engines
on KV and index nodes, or deploying query engines on dedicated query
nodes. Because the query engine is highly data-parallel, we have a
goal of achieving good speedup on dedicated query nodes with high
numbers of cores.

The remainder of this document refers to a single instance of the
query engine. At this time, load balancing, availability, and liveness
are external concerns that will be handled later by complementary
components.

## Processing sequence

* __Parse__: Text to algebra. In future, we could also add JSON to
  algebra (e.g. if we add something like JSONiq or the Mongo query
  API).

* __Prepare__: Algebra to plan. This includes index selection.

* __Execute__: Plan to results. When we add prepared statements, this
  phase can be invoked directly on a prepared statement.

## Packages

### Value

The value package implements JSON and non-JSON values, including
delayed parsing. This implementation has measured a 2.5x speedup over
dparval.

Primitive JSON values (boolean, number, string, null) are implemented
as golang primitives and incur no memory or garbage-collection
overhead.

This package also provides collation, sorting, and sets
(de-duplication) over Values.

* __Value__: Base interface.

* __AnnotatedValue__: Can carry attachments and metadata.

* __CorrelatedValue__: Refers and escalates to a parent Value. Used to
  implement subqueries and name scoping.

* __ParsedValue__: Delayed evaluation of parsed values, including
  non-JSON values.

* __MissingValue__: Explicit representation of MISSING values. These
  are useful for internal processing, and can be skipped during final
  projection of results.

* __BooleanValue__, __NumberValue__, __StringValue__, __NullValue__,
  __ArrayValue__, __ObjectValue__: JSON values.

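As a rough illustration of this design (the type names follow the list above, but every method signature here is a hypothetical sketch, not the package's actual API), primitive values can be defined directly over Go primitives:

```go
package main

import "fmt"

// Value is a sketch of a generic value interface; the real package's
// interface is richer. The point illustrated is that primitive JSON
// values are backed by Go primitives with no wrapper allocation.
type Value interface {
	Actual() interface{} // unwrap to a native Go value
	Type() string
}

type NumberValue float64

func (n NumberValue) Actual() interface{} { return float64(n) }
func (n NumberValue) Type() string        { return "number" }

type StringValue string

func (s StringValue) Actual() interface{} { return string(s) }
func (s StringValue) Type() string        { return "string" }

// missingValue explicitly represents MISSING, distinct from JSON null.
type missingValue struct{}

func (missingValue) Actual() interface{} { return nil }
func (missingValue) Type() string        { return "missing" }

var MISSING Value = missingValue{}

func main() {
	vals := []Value{NumberValue(2.5), StringValue("n1ql"), MISSING}
	for _, v := range vals {
		fmt.Println(v.Type(), v.Actual())
	}
}
```

Because `NumberValue` and `StringValue` are plain Go primitives under the hood, constructing them adds no heap wrapper, which is the property that avoids memory and garbage-collection overhead.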
### Errors

The errors package provides a dictionary of error codes and
messages. When fully implemented, the error codes will mirror SQL, and
the error messages will be localizable.

All user-visible errors and warnings should come from this package.

### Expression

The expression package defines the interfaces for all expressions, and
provides the implementation of scalar expressions.

This package is usable by both query and indexing (for computed
indexes).

Expressions are evaluated within a context; this package provides a
default context that can be used by indexing. The context includes a
statement-level timestamp.

Expressions also provide support for query planning and processing;
this includes equivalence testing, constant folding, etc.

The following types of scalar expressions are included:

* Arithmetic operators
* CASE
* Collection expressions (ANY / EVERY / ARRAY / FIRST)
* Comparison operators (including IS operators)
* String concat
* Constants (including literals)
* Functions
* Identifiers
* Navigation (fields, array indexing, array slicing)

### Algebra

The algebra package defines the full algebra and AST (abstract syntax
tree) for all N1QL statements (using the expression package for scalar
expressions).

It includes aggregate functions, subquery expressions, parameter
expressions, bucket references, and all the N1QL statements and
clauses.

#### Aggregate functions

* __ARRAY\_AGG(expr)__

* __ARRAY\_AGG(DISTINCT expr)__

* __AVG(expr)__

* __AVG(DISTINCT expr)__

* __COUNT(*)__

* __COUNT(expr)__

* __COUNT(DISTINCT expr)__

* __MAX(expr)__

* __MIN(expr)__

* __SUM(expr)__

* __SUM(DISTINCT expr)__

### Plan

The plan package implements executable representations of
queries. This includes both SELECTs and DML statements.

When we implement prepared statements, they will be represented as
plans and stored as JSON documents or in-memory plan objects.

Plans are built from algebras using a visitor pattern. A separate
planner / optimizer will be implemented for index selection.

Plans include the following operators:

* __Scans__

    * __PrimaryScan__: Scans a primary index.

    * __IndexScan__: Scans a secondary index.

    * __KeyScan__: Does not perform a scan. Directly treats the
      provided keys as a scan.

    * __ParentScan__: Used for UNNEST. Treats the parent object as the
      result of a scan.

    * __ValueScan__: Used for the VALUES clause of INSERT and UPSERT
      statements. Treats the provided values as the result of a scan.

    * __DummyScan__: Used for SELECTs with no FROM clause. Provides a
      single empty object as the result of a scan.

    * __CountScan__: Used for SELECT COUNT(*) FROM bucket-name. Treats
      the bucket size as the result of a scan, without actually
      performing a full scan of the bucket.

    * __IntersectScan__: A container that scans its child scanners and
      intersects the results. Used for scanning multiple secondary
      indexes concurrently for a single query.

* __Fetch__

* __Joins__

    * __Join__

    * __Nest__

    * __Unnest__

* __Filter__

* __Group__: To enable data-parallelism, grouping is divided into
  three phases. The first two phases can each be executed in a
  data-parallel fashion, and the final phase merges the results.

    * __InitialGroup__: Initial phase.

    * __IntermediateGroup__: Accumulate intermediate results. This
      phase can be chained.

    * __FinalGroup__: Compute final aggregate results.

* __Other SELECT operators__

    * __Project__

    * __Distinct__

    * __Order__

    * __Offset__

    * __Limit__

    * __Let__

    * __UnionAll__: Combine the results of two queries. For UNION, we
      perform UNION ALL followed by DISTINCT.

* __Framework operators__

    * __Collect__: Collect results into an array. Used for subqueries.

    * __Discard__: Discard results.

    * __Stream__: Stream results out. Used for returning results.

    * __Parallel__: A container that executes multiple copies of its
      child operator in parallel. Used for all data-parallelism.

    * __Sequence__: A container that chains its children into a
      sequence. Used for all execution pipelining.

* __DML operators__

    * __SendDelete__

    * __SendInsert__

    * __Set__: Used for UPDATE.

    * __Unset__: Used for UPDATE.

    * __Clone__: Used for UPDATE. Clones data values so that UPDATEs
      read original values and mutate a clone.

    * __SendUpdate__

    * __Merge__

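The three-phase grouping described under __Group__ can be sketched for a simple SUM aggregate (the function names mirror the operators above, but the shapes are hypothetical, not the plan package's code):

```go
package main

import "fmt"

// Item is a (group key, value) pair flowing into the aggregation.
type Item struct {
	Key string
	Val float64
}

// Partial is an intermediate aggregate state: per-key running sums.
type Partial map[string]float64

// initialGroup consumes one partition of the input and produces a
// partial result; partitions can be processed in parallel.
func initialGroup(items []Item) Partial {
	p := Partial{}
	for _, it := range items {
		p[it.Key] += it.Val
	}
	return p
}

// intermediateGroup merges two partials; this phase can be chained.
func intermediateGroup(a, b Partial) Partial {
	for k, v := range b {
		a[k] += v
	}
	return a
}

// finalGroup computes the final aggregate results from the merged
// partial (for SUM this is the identity; AVG would divide here).
func finalGroup(p Partial) map[string]float64 { return p }

func main() {
	// Two partitions processed independently, then merged.
	p1 := initialGroup([]Item{{"a", 1}, {"b", 2}})
	p2 := initialGroup([]Item{{"a", 3}})
	fmt.Println(finalGroup(intermediateGroup(p1, p2))) // map[a:4 b:2]
}
```

The reason the split works is that SUM's intermediate state merges associatively, so only the final phase must run serially.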
### Execution

The execution package implements query execution. The objects in this
package mirror those in the plan package, except that these are the
running instances.

Golang channels are used extensively to implement concurrency and
signaling.

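A minimal sketch of this channel style (the stage functions below are hypothetical, not the engine's actual operators): each stage runs in its own goroutine, streams values downstream, and honors a stop channel for early termination.

```go
package main

import "fmt"

// produce streams items downstream until done or stopped.
func produce(items []int, stop <-chan struct{}) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for _, it := range items {
			select {
			case out <- it:
			case <-stop:
				return // stop signal: abandon remaining work
			}
		}
	}()
	return out
}

// filter is a pipeline stage in its own goroutine; closing its input
// channel propagates shutdown through the pipeline.
func filter(in <-chan int, keep func(int) bool) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for v := range in {
			if keep(v) {
				out <- v
			}
		}
	}()
	return out
}

func main() {
	stop := make(chan struct{})
	defer close(stop)
	evens := filter(produce([]int{1, 2, 3, 4, 5}, stop), func(v int) bool { return v%2 == 0 })
	for v := range evens {
		fmt.Println(v)
	}
}
```

Closing channels downstream-to-upstream and selecting on a stop channel gives both normal completion and cancellation without locks.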
#### Subquery execution

The __Context__ object supports subquery execution. It performs
planning, execution, and collection of subquery results. It also
performs plan and result caching for uncorrelated subqueries.

### Datastore

The datastore package defines the interface to the underlying database
server.

Some key differences from the previous datastore API (previously
catalog API):

* DML support

* Use of channels for error handling and stop signaling

* Generalized index interface that supports any combination of hash
  and range indexing

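A hypothetical sketch of what a channel-based scan interface can look like (none of these names are the package's actual API): results, errors, and the stop signal each travel over their own channel.

```go
package main

import "fmt"

// ScanConn is a sketch of a scan connection: the datastore streams
// entries and errors back over channels, and the caller can cancel
// via the stop channel.
type ScanConn struct {
	Entries chan string
	Errors  chan error
	Stop    chan struct{}
}

// rangeScan emits keys within [low, high] from a sorted key list,
// honoring the stop channel between sends.
func rangeScan(keys []string, low, high string, conn *ScanConn) {
	defer close(conn.Entries)
	for _, k := range keys {
		if k < low || k > high {
			continue
		}
		select {
		case conn.Entries <- k:
		case <-conn.Stop:
			return
		}
	}
}

func main() {
	conn := &ScanConn{
		Entries: make(chan string),
		Errors:  make(chan error, 1),
		Stop:    make(chan struct{}),
	}
	go rangeScan([]string{"a", "b", "c", "d"}, "b", "c", conn)
	for k := range conn.Entries {
		fmt.Println(k)
	}
}
```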
### Parser

This package will contain the parser and lexer.

### Server

This package will contain the main engine executable and listener.

### Clustering

This package defines the interface to the underlying cluster management
system.

It provides a common abstraction for cluster management, including
configuration and lifecycle of a cluster.

### Accounting

This package will contain the interface to workload tracking and
monitoring. Accounting data can cover metrics, statistics, events,
and potentially log data.

It provides a common abstraction for recording accounting data and
services over accounting data.

### Shell

This package will contain the client command-line shell.

### Sort

This package provides a parallel sort. It was copied from the Golang
source and basic parallelism was added, but it has not been
fine-tuned.

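The idea can be sketched as follows (a simplified sort-halves-and-merge illustration, not the package's actual algorithm, which derives from the standard library's sort):

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// parallelSort sorts the two halves of the input concurrently, then
// merges them: minimal parallelism layered on the standard sort.
func parallelSort(a []int) []int {
	if len(a) < 2 {
		return a
	}
	mid := len(a) / 2
	left := append([]int(nil), a[:mid]...)
	right := append([]int(nil), a[mid:]...)

	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); sort.Ints(left) }()
	go func() { defer wg.Done(); sort.Ints(right) }()
	wg.Wait()

	// Merge the two sorted halves.
	out := make([]int, 0, len(a))
	i, j := 0, 0
	for i < len(left) && j < len(right) {
		if left[i] <= right[j] {
			out = append(out, left[i])
			i++
		} else {
			out = append(out, right[j])
			j++
		}
	}
	out = append(out, left[i:]...)
	return append(out, right[j:]...)
}

func main() {
	fmt.Println(parallelSort([]int{5, 1, 4, 2, 3})) // [1 2 3 4 5]
}
```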
### Client/go_cbq

This package provides a client library that will be used by the
command-line shell to encapsulate cluster-awareness and other
connectivity concerns.

The library will implement the standard golang database APIs at
[database/sql](http://golang.org/pkg/database/sql/) and
[database/sql/driver](http://golang.org/pkg/database/sql/driver/).

The library will connect using the [Query REST
API](http://goo.gl/ezpmVx) and the [Query Clustering
API](http://goo.gl/yKZ6v5).

## Data parallelism

The query engine is designed to be highly data-parallel. By
data-parallel, we mean that individual stages of the execution
pipeline are parallelized over their input data. This is in addition
to the parallelism achieved by giving each stage its own goroutine.

Below, N1QL statement execution pipelines are listed, along with the
data-parallelization and serialization points.

### SELECT

1. Scan
1. __Parallelize__
1. Fetch
1. Join / Nest / Unnest
1. Let (common subexpressions)
1. Where (filter)
1. GroupBy: Initial
1. GroupBy: Intermediate
1. __Serialize__
1. GroupBy: Final
1. __Parallelize__
1. Letting (common aggregate subexpressions)
1. Having (aggregate filtering)
1. __Serialize__
1. Order By (sort)
1. __Parallelize__
1. Select (projection)
1. __Serialize__
1. Distinct (de-duplication)
1. Offset (skipping)
1. Limit

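The __Parallelize__ / __Serialize__ alternation above corresponds to fanning a stage out over several goroutines and merging their outputs again, in the spirit of the plan package's __Parallel__ container (the `parallel` helper below is a simplified sketch, not engine code):

```go
package main

import (
	"fmt"
	"sync"
)

// parallel runs n copies of a stage over a shared input channel and
// merges their outputs, echoing the plan's Parallel container.
func parallel(n int, in <-chan int, stage func(int) int) <-chan int {
	out := make(chan int)
	var wg sync.WaitGroup
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done()
			for v := range in {
				out <- stage(v)
			}
		}()
	}
	// Close the merged output once every copy has drained the input.
	go func() { wg.Wait(); close(out) }()
	return out
}

func main() {
	in := make(chan int)
	go func() {
		for i := 1; i <= 4; i++ {
			in <- i
		}
		close(in)
	}()
	sum := 0
	for v := range parallel(3, in, func(v int) int { return v * v }) {
		sum += v
	}
	fmt.Println(sum) // 1+4+9+16 = 30
}
```

A serialization point is simply a stage run with a single copy, so order-sensitive steps like the final grouping or sorting see the whole stream.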
### INSERT

1. Scan
1. __Parallelize__
1. SendInsert
1. Returning (projection)

### DELETE

1. Scan
1. __Parallelize__
1. Fetch
1. Let (common subexpressions)
1. Where (filter)
1. __Serialize__
1. Limit
1. __Parallelize__
1. SendDelete
1. Returning (projection)

### UPDATE

1. Scan
1. __Parallelize__
1. Fetch
1. Let (common subexpressions)
1. Where (filter)
1. __Serialize__
1. Limit
1. __Parallelize__
1. Clone
1. Set / Unset
1. SendUpdate
1. Returning (projection)

## Steps to create a build

### Get a working repository

    $ export GOPATH=$HOME/query/
    $ mkdir -p $GOPATH/src/github.com/couchbase/
    $ cd ~/query
    $ mkdir bin pkg
    $ cd $GOPATH/src/github.com/couchbase/

Clone the git repo into the current working directory to get the
source and make a build. This clones it into `query`:

    $ git clone https://github.com/couchbase/query query
    $ cd query
    $ ./build.sh

The builds are placed in their respective directories; you can find
the cbq and cbq-engine binaries in the shell and server directories.

### Creating a local build using local JSON files

#### Prerequisites

* cbq-engine binary
* cbq binary
* Data sample set zip file (a sample set of JSON documents)

#### Steps to run

1.  Create a directory:

        $ mkdir -p ~/sample_build/tutorial/data

2.  Copy the cbq and cbq-engine binaries into the ~/sample_build/ directory.
3.  Copy the data sample into the ~/sample_build/tutorial/data/ directory.
4.  Unzip the sample:

        $ unzip sampledb.zip

5.  Go back to the directory containing the binaries:

        $ cd ~/sample_build/

6.  Run the cbq-engine executable with the `-datastore "<directory path>"` and `-namespace <subdirectory name>` flags. (An ampersand can be appended to run the process in the background and get the prompt back.)

        $ ./cbq-engine -datastore "$HOME/sample_build/tutorial" -namespace data

7.  Run the cbq executable in a new terminal. This gives you the N1QL command-line shell:

        $ ./cbq
        cbq> select * from tutorial;

8.  Time to experiment ☺

### Using the Admin UI

1.  Download the Couchbase server and install it (on the Mac, add it to the Applications folder).
2.  Open up localhost:8091 and follow the setup instructions.
3.  Create your own buckets and fill in data.
4.  To connect N1QL with the Couchbase server, run the following commands in two terminals, one after the other:

        $ ./cbq-engine -datastore "http://127.0.0.1:8091/"
        $ ./cbq

5.  Run the following command on the created buckets before querying them:

        cbq> create primary index on [bucket_name]

6.  Run N1QL queries on the CLI.

NOTE: Ctrl + D should allow you to exit the running cbq and cbq-engine processes.

508