class: center, middle

# MongoDB

---

### Why MongoDB

* Their original rationale: "A hu**mongo**us database"
  - handling large quantities of data (so-called "web scale")
  - easy-to-set-up scalability (sharding)
* Personal view:
  - It's a document database, not a relational database
  - Can get away without having a DBA (database administrator) until later

---

### Relational databases

* Store tables with a fixed set of columns for every row

**USERS**

ID | Name
---|-----
1 | Algernon

**USER_IDENTITIES**

User | Service | Username
-----|---------|---------
1 | github | algernon123497
1 | twitter | algie_tweets40

---

### Relational databases (cont'd)

**USER_SESSIONS**

User | IP | SessionKey | Since
-----|----|------------|------
1 | 127.0.0.1 | 2341345135 | 10349582345

---

### Document databases

* Document databases store *collections* of documents. These are usually close to *JSON* documents

```json
{
  "_id": ObjectId("135423462345"),
  "name": "Algernon Moncrieff",
  "identities": [
    { "service": "github", "username": "algernon123497" },
    { "service": "twitter", "username": "algie_tweets40" }
  ],
  "activeSessions" : [
    { "ip": "127.0.0.1", "key": ObjectId("2341345135"), "since": Long("10349582345") }
  ]
}
```

---

### When are document databases useful

1. If your collection is going to contain *heterogeneous* documents

   Documents in a collection do not have to have the same set of fields

2. If you don't know what columns you need in advance.

   eg, using the above property to experiment and iterate

3.
   If your data is going to be "document-shaped" anyway

   eg, web system with a JSON API

Generally, if it's reason (1), you'll possibly stick with MongoDB longer term. Otherwise, you might want to use MongoDB to get something up quickly, and then migrate to a relational database later

---

### IDs

* In *relational databases*, typically the database allocates IDs
  - this means that objects don't always have an ID until they have been saved
  - IDs are usually sequential
  - though some people use a client-issued UUID instead

---

### IDs

* With MongoDB, the *database driver* can allocate IDs.
  - IDs are disposable, larger, not sequential
  - designed so that different machines can issue IDs with very low probability of collision, while sorting by ID still gives roughly chronological order
  - 12 bytes containing: `{time} {machine} {process id} {counter}`
  - Not quite a UUID (not universally unique, just application unique)
  - `ObjectId("507f1f77bcf86cd799439011")`

---

### Storage

* Uses BSON ("Binary JSON")
* Some special types
  - ObjectId
  - Date
  - NumberLong

---

### Date

* I just tend to use `Long`, as almost every library can convert to and from Unix epoch in a long
  - Convert at *display time* using JavaScript's Date type
* And it matches what BSON stores internally anyway

> BSON Date is a 64-bit integer that represents the number of milliseconds since the Unix epoch (Jan 1, 1970). This results in a representable date range of about 290 million years into the past and future.

---

### Connecting to a database

* `mongodb://[username:password@]host1[:port1][,host2[:port2],...[,hostN[:portN]]][/[database][?options]]`
* Usual port is `27017`
* No username / password on turing. (Please don't tread on each other's data).
  Use `cosc360_` followed by your username as your database name,
  eg: `mongodb://127.0.0.1:27017/comp391_amoncrieff`
* With Java driver, can re-use connection (it's thread-safe)

---

### From mongo shell

```
$ mongo
MongoDB shell version: 3.0.2
connecting to: test
Server has startup warnings:
2015-09-08T11:13:08.291+1000 I CONTROL  [initandlisten]
2015-09-08T11:13:08.291+1000 I CONTROL  [initandlisten] ** WARNING: soft rlimits too low. Number of files is 256, should be at least 1000
> use comp391_amoncrieff
switched to db comp391_amoncrieff
```

---

### Creating a collection

* Just use it. If it doesn't exist, Mongo will create it.

---

### Saving and retrieving a document

```
> db.chitterUser.save({ "name" : "algernonMoncrieff" })
WriteResult({ "nInserted" : 1 })
> db.chitterUser.find()
{ "_id" : ObjectId("55ee7ff5d876c542f372dc61"), "name" : "algernonMoncrieff" }
>
```

---

### Querying

* Can just use a partial document

```
> db.chitterUser.find({ "name" : "algernonMoncrieff" })
```

---

### Updating a whole document

* db.*collection*.update(*query*, *updatedDoc*, *options*)
* options include
  - multi: update all documents matching the query, or just the first?
  - upsert: if the document doesn't exist, then create it?

---

### Operators

* Query operators are done as special fields, eg

```
db.chitterUser.find(
  { "name" : { "$nin" : [ "alice", "bob" ] } }
)
```

---

### Operators

* So are update operators, eg

```
db.chitterUser.update(
  { "_id" : ObjectId("55ee7ff5d876c542f372dc61") },
  { "$push" : {
      "identities" : { "service": "github", "username": "algernon_example" }
  } }
)
```

---

### Simple subdocument queries

* Use dot notation. eg, all users with a github identity:

```
db.chitterUser.find({ "identities.service": "github" })
```

---

### Querying sub-documents in arrays

* Use `$elemMatch` operator

```
db.chitterUser.find({
  "identities": {
    "$elemMatch" : { "username": "algernon_example", "service": "github" }
  }
})
```
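
---

### Dot notation vs `$elemMatch`

* Why bother with `$elemMatch`? With dot notation on two fields, each condition can be satisfied by a *different* array element; `$elemMatch` requires one element to satisfy them all.
* A rough sketch of the difference, as plain JavaScript that emulates the two matching rules rather than calling MongoDB. The helpers `matchesDotNotation` and `matchesElemMatch` and the sample `user` document are illustrative assumptions, not part of the shell or any driver.

```javascript
// Sample document, shaped like the chitterUser examples (made up for illustration).
const user = {
  name: "algernonMoncrieff",
  identities: [
    { service: "github", username: "algernon123497" },
    { service: "twitter", username: "algie_tweets40" }
  ]
};

// Emulates { "identities.service": ..., "identities.username": ... }:
// each condition only needs SOME element to match, not the same one.
function matchesDotNotation(doc, conditions) {
  return Object.entries(conditions).every(([field, value]) =>
    doc.identities.some(el => el[field] === value));
}

// Emulates { "identities": { "$elemMatch": { ... } } }:
// a SINGLE element must satisfy every condition.
function matchesElemMatch(doc, conditions) {
  return doc.identities.some(el =>
    Object.entries(conditions).every(([field, value]) => el[field] === value));
}

// A github service paired with the twitter username: no single identity has both.
const query = { service: "github", username: "algie_tweets40" };

console.log(matchesDotNotation(user, query)); // true
console.log(matchesElemMatch(user, query));   // false
```

* So for "a user whose github username is X", reach for `$elemMatch`; dot notation is enough when only one field of the sub-document is being tested.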