MongoDB as in huMONGOus, not retarded

Unlike the rest of the known world, I won’t write anything on the iPad. Yet. Instead, I’m still in the process of writing the upcoming parts of my Popurls Clone series and, as per usual, it takes a lot more time than expected. In combination with a massive amount of bills, life is good! As it turns out, you actually need to “work” in order to get “money”. How, when and why did we agree upon that? That’s just as stupid as, say, doing a fully animated Star Wars prequel/sequel set during the Clone Wars. Oh, wait…

While pondering life’s great mysteries, such as how you can evade paying your bills, I felt it was time to write something about databases. You see the connection, right? Good, because I don’t. A quick Google search for MySQL rendered roughly 117 million results, so I figured that another article about it would be kinda superfluous. A search for MongoDB, on the other hand, resulted in “only” 1.2 million results. Much better. That means that if and when this article gets indexed by the Google, it will constitute no less than ~8.33 × 10⁻⁵ % of the relevant search results, in stark contrast to the ~8.55 × 10⁻⁷ % had I written some stuff about MySQL. Goodie!

In short, this article will cover MongoDB and how you can use it with PHP. Good times ahead, so if you’re a killjoy, you’re not welcome.


A brief introduction to MongoDB

First of all, MongoDB is not a relational database management system (RDBMS) such as MySQL. A relational DBMS is based on the relational model, which is grounded in first-order predicate logic and was proposed by Edgar Frank Codd in 1969. Data and its internal relationships are, more often than not, stored as tables, and these tables are what constitute the relations. The tables are organized as vertical columns and rows, where there is a specified number of columns but the number of rows is, in theory, unlimited. The data is fetched via queries written in SQL.

But enough about that – we’re here to talk about MongoDB, which is part of the NoSQL movement. In NoSQL there are basically three kinds of databases: column oriented, key-value stores and document oriented. MongoDB employs the latter kind. A document in this context is a structure of data with a given number of properties. These properties can be strings, numbers, arrays, objects and so on. If you’ve dabbled with associative arrays or objects in PHP, you’ll know what this means. You can also group documents in collections. And there is something called sub-documents, which are pretty much what you’d think they’d be: documents nested inside other documents.
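To make “document” and “sub-document” a little less abstract, here’s a minimal sketch in plain PHP (no database involved, and the movie data is made up): a document is just an associative array, and a sub-document is an associative array nested inside it. Encoding it with json_encode() shows the JSON-like shape that MongoDB deals in.

```php
<?php
// A plain PHP array standing in for a MongoDB document (no database needed).
// The movie data is made up for illustration.
$movie = array(
    'title'    => 'Alien',
    'year'     => 1979,
    'starring' => array('Sigourney Weaver', 'Tom Skerritt'), // an array property
    'director' => array(                                     // a sub-document
        'first' => 'Ridley',
        'last'  => 'Scott',
    ),
);

// json_encode() turns associative arrays into JSON objects and
// list-like arrays (keys 0, 1, 2, ...) into JSON arrays.
$json = json_encode($movie);
echo $json, "\n";
```

MongoDB itself stores the binary BSON form rather than this text, but the shape is the same.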



In the case of MongoDB, we have no schema to play with, but the power lies in the fact that we get to play with JSON-like structures. And JSON is good, right? MongoDB is written in C++, it just hit version 1.4.0 and it stores data using BSON, which is a “binary-encoded serialization of JSON-like documents”. Thankfully, there’s a native PHP extension, so the first thing you’d want to do is install it. Download the appropriate package and perhaps read the Quickstart. Assuming you use PECL, which you should, you can install it easily with the following command:

pecl install mongo

or you can do it manually by downloading the source and running the following in its directory:

phpize
./configure --enable-mongo
make install

The install should have created an extension for you and put it in the extension directory. You may need to edit your php.ini and add

extension=mongo.so

where appropriate (hint: maybe where the other extensions are defined). If you get an error along the lines of “phpize: command not found”, this page is for you. If you’re on Linux, a quick fix is to install php5-dev with apt-get or php-devel with yum. Restart your web server. Do it. If you’re interested in MongoDB but loathe PHP (god knows you have your reasons), there are drivers for other languages, such as C# and .NET (why are you reading this?), Clojure (you pervert), ColdFusion (oh no, you dideeent), Python (you’re all right) or Ruby (pff).


I feel the vibe

Let’s do this. Connect to your brand spanking new Mongo database at localhost (the default port is 27017):

$the_mongo_connection = new Mongo();

Or, if you’d like to connect to a remote host:

$the_mongo_connection = new Mongo("10.0.0.2:27017"); // the ":27017" port part is optional

Pro tip: use a persistent connection. Opening new connections to the database can be very slow. Let this quote borrowed from php.net serve as a cautionary, um, well, example:

for ($i=0; $i<1000; $i++) {
    $m = new Mongo();
}

This will take roughly 18 seconds to execute. Yes, 18. That’s like an eternity on the Internets. If we use a persistent connection, like so:

for ($i=0; $i<1000; $i++) {
    $m = new Mongo("localhost:27017", array("persist" => "x"));
}

…we’re down to 0.02 seconds. So, yeah, persistent may be the way to go. And by “may” I mean “do it, ass hat”. If you’re wondering about the “x” in the last piece of code, it’s because persistent connections need a unique identifier string.

The name of a database can use almost any character in the ASCII range, except a blank space or a period, and it can’t be an empty string. You can actually name a database “null” if you’d like. Might be fun, just to piss people off. Selecting a database can be done in two ways and isn’t harder than

$db = $the_mongo_connection->name_of_database;
$db = $the_mongo_connection->selectDB('name_of_database');

The latter is to be preferred if you’re using wonky characters in your database name, such as a comma, a slash or something like that. They’re valid, but they’ll break your code into millions and millions of pieces if you don’t use the quoted selectDB(), since the plain $connection->name shortcut only works for names that are valid PHP identifiers. Now we need to select a collection to work with. You might say that this is similar to selecting a table in an RDBMS. And you’d be correct. Gold star for you.

$sweet_collection = $db->name_of_collection;
$sweet_collection = $db->selectCollection('name_of_collection');
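Back to those database naming rules for a second. Here’s a hypothetical helper, is_valid_db_name(), that checks a name against the rules as described above (ASCII only, no blank space, no period, not empty). It’s a sketch of those rules, not the driver’s or the server’s own validation, which may well be stricter.

```php
<?php
// Hypothetical helper: checks a database name against the rules described
// above. A sketch of those rules only, not MongoDB's own validation.
function is_valid_db_name($name)
{
    if (!is_string($name) || $name === '') {
        return false;                 // empty strings are out
    }
    if (preg_match('/[^\x00-\x7F]/', $name)) {
        return false;                 // stick to the ASCII range
    }
    // no blank spaces and no periods
    return strpos($name, ' ') === false && strpos($name, '.') === false;
}

foreach (array('dvds', 'null', 'dvds,old', 'my dvds', 'my.dvds', '') as $name) {
    echo '"', $name, '": ', is_valid_db_name($name) ? 'ok' : 'nope', "\n";
}
```

Note that "dvds,old" passes, which is exactly the kind of name you’d reach for selectDB() with.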

Sweet. Let’s do some tangible, hands-on exercises, mm’kay? Mm’kay.


The insert [note to self: penis joke here]

For this part, I’ll go with something stupid, like pretending that we’re creating a database of DVDs. Cheesy, but it’ll do. First, let’s set up some dummy data.

$dvd = array(
    'title' => 'A Nightmare On Elm Street',
    'slug' => 'anoes',
    'year' => 1983, // a number, not a string, so $inc and range queries work on it
    'format' => 'Blu-Ray',
    'screen' => '16:10',
    'media' => array(
        'audio' => array(
            'English - DTS 5.2 HD',
            'Master Audio',
            'English - DD 1.0',
            'French - DD 1.0'
        ),
        'subtitles' => array(
            'english',
            'french',
            'spanish',
            'spanglish'
        ),
        'extras' => array(
            'Commentary Tracks',
            'Featurettes',
            'Alternate Endings',
            'Trivia Track'
        )
    ),
    'starring' => array(
        'Robert Englund',
        'John Saxon',
        'Heather Langenkamp',
        'Johnny Depp'
    )
);


Yes, Nightmare on Elm Street. Johnny Depp’s debut movie. As you can see, we have a pretty basic multidimensional associative array. The actual code to insert the data I want, you say with a Yoda type of grammar. Sure.

$sweet_collection->insert($dvd);

Boom. There is an optional second argument that can be used if you want to check whether the insert succeeded. If set to true, the driver waits for the database to acknowledge the insert, returns an array with the status of the insert and throws a MongoCursorException if it fails. Without it, insert() just returns a boolean telling you whether the array you provided was non-empty, since empty arrays will not be inserted. Either way, the _id assigned by Mongo is added to the array you passed in (the key is “_id” – try to echo out your last insert id with echo $dvd['_id'];). Think about the lack of need to call insert_id() or similar. Sweet, huh? Let’s try that out. Assuming we just executed the insert above, $dvd now carries its unique id, so inserting it again should fail. Fire this little baby:

try {
    $sweet_collection->insert($dvd, true);
}
catch(MongoCursorException $e) {
    echo "Why you wanna play me like dat? That movie ID already exists in the DB!";
}

Pretty nice, huh? One more thing about the id – MongoDB creates a unique id if none is supplied manually. The MongoId is an object and not a string. It’s the MongoDB equivalent of an autoincrement in an RDBMS. Each id is 12 bytes: 4+3+2+3. The first four bytes are a timestamp, the following three are a hash of the client machine’s hostname, the next two are “the least significant bytes of the process id running the script” and the last three bytes are an incrementing value. Food for thought.
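If you want to poke at those four parts yourself, here’s a sketch that splits the 24-character hex form of an id (what you get from echo $dvd['_id']) according to the 4+3+2+3 layout above. The describe_mongo_id() helper and the sample id are made up for illustration, and newer MongoDB versions fill the middle five bytes differently, so don’t build anything serious on the machine and pid parts.

```php
<?php
// Hypothetical helper: splits a hex id into the four parts of the
// 4+3+2+3 byte layout described above (timestamp, machine, pid, counter).
function describe_mongo_id($hex)
{
    if (!preg_match('/^[0-9a-f]{24}$/i', $hex)) {
        throw new InvalidArgumentException('Expected 24 hex characters');
    }
    return array(
        'timestamp' => hexdec(substr($hex, 0, 8)),  // bytes 0-3: seconds since the Unix epoch
        'machine'   => substr($hex, 8, 6),          // bytes 4-6: hostname hash
        'pid'       => substr($hex, 14, 4),         // bytes 7-8: process id bytes
        'counter'   => hexdec(substr($hex, 18, 6)), // bytes 9-11: incrementing value
    );
}

$parts = describe_mongo_id('4bc5a3a07f8b9a2b3c000001'); // a made-up id
echo gmdate('Y-m-d H:i:s', $parts['timestamp']), " UTC\n"; // 2010-04-14 11:14:40 UTC
echo $parts['counter'], "\n";                                // 1
```

Because the timestamp comes first, ids created later sort after ids created earlier, give or take ids minted within the same second.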


The update

As you may or may not have noticed, there are some minor mistakes in the data above. Oops. Guess we’ll have to update. Funny how these things work out! There are a couple of handy dandy modifier operations to make our Mongo life a little bit easier. The syntax is as follows:

update( [criteria], [objNew], [upsert], [multi] );
  • criteria – query which selects the record to update
  • objNew – updated object or $ operators (e.g., $inc) which manipulate the object
  • upsert – if this should be an “upsert”; that is, if the record does not exist, insert it
  • multi – if all documents matching criteria should be updated (the default is to only update the first document found)

Our criteria would, in this example, be a value we can search for. Let’s search for the slug, “anoes”. We need to update a couple of fields here: the year should be 1984, the screen is 16:9, not 16:10, the English audio is 5.1 surround, not 5.2, obviously, and there is no ‘spanglish’ subtitle. Before we do the actual update, we should discuss the modifier operations that are available. Using these modifiers, we’re able to do atomic updates! Yes, that last sentence deserved an exclamation mark. Shut up.

  • $inc – increments the value by the amount you provide. If the key is not present, it is set to that amount.
  • $set – sets the value for a specified key.
  • $unset – deletes the given key and its value.
  • $push – appends the specified value to the key if it holds an array. If the key is not present, an array containing the value is set. If the key is present but not an array, an error condition is raised.
  • $pushAll – appends each value in the value array to the key. Works similarly to $push.
  • $addToSet – adds the value to the array only if it’s not in the array already.
  • $pop – removes the first or last element of the array. Use “1” to remove the last and “-1” to remove the first.
  • $pull – removes all occurrences of the specified value from the key if the key is an array. If the key’s there but is not an array, an error is raised.
  • $pullAll – removes all occurrences of each value in the given array from the key. If the key is there but isn’t an array – you guessed it – an error is raised.
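If the list reads a bit dry, here’s a plain-PHP sketch of what a few of those modifiers do to a document. The mod_*() functions are hypothetical and only mirror the rules as listed above for top-level keys; the real thing happens atomically on the server.

```php
<?php
// Hypothetical illustrations of the modifier rules listed above.
function mod_inc(array $doc, $key, $amount)
{
    // $inc: add to the current value, or set it if the key is missing
    $doc[$key] = isset($doc[$key]) ? $doc[$key] + $amount : $amount;
    return $doc;
}

function mod_push(array $doc, $key, $value)
{
    // $push: append, creating the array if the key is missing
    if (!isset($doc[$key])) {
        $doc[$key] = array();
    } elseif (!is_array($doc[$key])) {
        throw new UnexpectedValueException("'$key' is not an array");
    }
    $doc[$key][] = $value;
    return $doc;
}

function mod_add_to_set(array $doc, $key, $value)
{
    // $addToSet: like $push, but only if the value isn't there already
    if (isset($doc[$key]) && is_array($doc[$key]) && in_array($value, $doc[$key], true)) {
        return $doc;
    }
    return mod_push($doc, $key, $value);
}

function mod_pop(array $doc, $key, $direction)
{
    // $pop: 1 removes the last element, -1 removes the first
    if (!isset($doc[$key])) {
        return $doc;
    }
    if (!is_array($doc[$key])) {
        throw new UnexpectedValueException("'$key' is not an array");
    }
    if ($direction == 1) {
        array_pop($doc[$key]);
    } else {
        array_shift($doc[$key]);
    }
    return $doc;
}

function mod_pull(array $doc, $key, $value)
{
    // $pull: remove every occurrence of the value
    if (!isset($doc[$key])) {
        return $doc;
    }
    if (!is_array($doc[$key])) {
        throw new UnexpectedValueException("'$key' is not an array");
    }
    $kept = array();
    foreach ($doc[$key] as $item) {
        if ($item !== $value) {
            $kept[] = $item;
        }
    }
    $doc[$key] = $kept;
    return $doc;
}

$doc = array('views' => 1, 'tags' => array('horror', 'slasher', 'horror'));
$doc = mod_inc($doc, 'views', 2);               // views: 3
$doc = mod_add_to_set($doc, 'tags', 'slasher'); // already there, no change
$doc = mod_pull($doc, 'tags', 'horror');        // tags: slasher
$doc = mod_push($doc, 'tags', '80s');           // tags: slasher, 80s
$doc = mod_pop($doc, 'tags', -1);               // tags: 80s
print_r($doc);
```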

Still here? No? I’ll wait some more, then.

Okay, so how do we update our movie info? Like so:

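$criteria = array(
    'slug' => 'anoes'
);

$objNew = array(
    '$inc' => array(
        'year' => 1                                 // 1983 + 1 = 1984
    ),
    '$set' => array(
        'screen' => '16:9',
        'media.audio.0' => 'English - DTS 5.1 HD'   // index 0 of media.audio
    ),
    '$unset' => array(
        // note: $unset on an array element leaves a null in its place;
        // '$pull' => array('media.subtitles' => 'spanglish') removes it outright
        'media.subtitles.3' => 1
    )
);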

$sweet_collection->update(
    $criteria,
    $objNew
);

The last two arguments are not set since we’re not updating multiple records, nor do we want it to upsert. As you may have noticed, you can target a certain index in the array – see media.audio.0 and media.subtitles.3. Pretty nice. But what if you don’t know the index? Then you’re screwed. Go home.



No, not really. There is something called the positional operator and it’s represented by the $ sign. Use it to find an array entry and manipulate it. For example, let’s update the audio this way instead.

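$sweet_collection->update(
    array(
        'slug' => 'anoes',
        'media.audio' => 'English - DTS 5.2 HD'        // match the array element...
    ),
    array(
        '$set' => array(
            'media.audio.$' => 'English - DTS 5.1 HD'  // ...and "$" stands for its index
        )
    )
);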

Note that the $ only applies to the first matched array element. This may change in the future, but as of now, that’s how it is.
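Both tricks boil down to walking into the nested document. Here’s a plain-PHP sketch: get_by_path() follows a dotted path such as media.audio.0, and positional_set() mimics what $ does by replacing only the first matching element. Both helpers are hypothetical, and the duplicated audio entry is made up to show the first-match behaviour.

```php
<?php
// Hypothetical helpers, for illustration only: the server does this work.

// Dotted paths walk into the document: each segment is a sub-document key
// or a numeric array index.
function get_by_path(array $doc, $path)
{
    $current = $doc;
    foreach (explode('.', $path) as $segment) {
        if (!is_array($current) || !array_key_exists($segment, $current)) {
            return null; // the path doesn't exist
        }
        $current = $current[$segment];
    }
    return $current;
}

// The positional operator: the query finds the first matching element,
// and '$' in the update stands for that element's index.
function positional_set(array $list, $match, $replacement)
{
    $index = array_search($match, $list, true); // first match only
    if ($index !== false) {
        $list[$index] = $replacement;
    }
    return $list;
}

$dvd = array(
    'slug'  => 'anoes',
    'media' => array(
        'audio' => array('English - DTS 5.2 HD', 'Master Audio', 'English - DTS 5.2 HD'),
    ),
);

echo get_by_path($dvd, 'media.audio.0'), "\n"; // English - DTS 5.2 HD
$dvd['media']['audio'] = positional_set(
    $dvd['media']['audio'], 'English - DTS 5.2 HD', 'English - DTS 5.1 HD'
);
print_r($dvd['media']['audio']); // only index 0 changed; index 2 still says 5.2
```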


Fetch, Bobo, fetch! Good boy.

Okay, then. How do we actually fetch our data? In many ways, as it turns out. Queries in MongoDB support conditional operators (less than, greater than, not equal, in (much like SQL’s “IN”), mod and much more) as well as regular expressions, and there are methods that work much like their SQL counterparts. Here’s a quick reference, with the MongoDB method on the left and its MySQL equivalent on the right:

  • sort() - ORDER BY
  • limit() – LIMIT (shocking, I know)
  • group() – GROUP BY

Sorry, I thought that list would be longer. You also have the following to toy around with:

  • skip() – allows you to specify at which object the database should begin returning results. Cool. Think paging results.
  • count() – returns the number of objects. It counts at server level instead of client level.
  • snapshot() – makes sure that no duplicates are returned and that no objects are missed that were present at both the start and the end of the query’s execution, even if they were updated. Short query responses (less than 1 MB) are always snapshotted.


Having said that, now’s the time to get to know find() and findOne(). These methods let you query a collection: you call them on the collection and pass a query and, optionally, the fields to return. Say we have a number of entries in our DVD database and we want to find titles released from 1980 up to (but not including) 1990; we could write something like:

$years = array('year' => array( '$gte' => 1980, '$lt' => 1990 ));
$best_movies_evar = $sweet_collection->find( $years );
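Note that $lt means 1990 itself doesn’t make the cut. To see which years such a condition lets through, here’s a plain-PHP sketch with a hypothetical matches_condition() helper that evaluates the comparison operators $gt, $gte, $lt and $lte against a single value. It only mimics the comparison; the real filtering happens on the server.

```php
<?php
// Hypothetical helper: evaluates a condition array such as
// array('$gte' => 1980, '$lt' => 1990) against one value.
// Only four comparison operators are covered in this sketch.
function matches_condition($value, array $condition)
{
    foreach ($condition as $op => $operand) {
        switch ($op) {
            case '$gt':  $ok = $value >  $operand; break;
            case '$gte': $ok = $value >= $operand; break;
            case '$lt':  $ok = $value <  $operand; break;
            case '$lte': $ok = $value <= $operand; break;
            default: throw new InvalidArgumentException("Unsupported operator $op");
        }
        if (!$ok) {
            return false; // every operator has to hold
        }
    }
    return true;
}

$range = array('$gte' => 1980, '$lt' => 1990);
foreach (array(1979, 1980, 1984, 1990) as $year) {
    echo $year, ': ', matches_condition($year, $range) ? 'match' : 'no match', "\n";
}
```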

Conditional operators FTW. findOne() works the same way, but it returns a single document instead: the first match, or null if nothing matches. Let’s say you need to paginate your results because the last query returned a massive amount of titles. Right now, you need ten results, skipping the first 30 titles. No biggie.

$num_movies = $best_movies_evar->count(); // total matches, ignoring limit/skip
$best_movies_evar->limit(10)->skip(30);
foreach ( $best_movies_evar as $sweet_movie ) {
    var_dump( $sweet_movie );
}

Note that queries are actually not executed until the result is requested. That means you can refine your results before actually fetching them, like we did above. Got it? Nice. Niiccceeeeee.
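The paging arithmetic is easy to get off by one, so here’s a sketch with a hypothetical page_window() helper that turns a page number into the values for limit() and skip(), plus the number of pages given a total such as the one count() returns.

```php
<?php
// Hypothetical helper for the paging maths behind limit() and skip():
// page 1 skips nothing, page 4 with 10 per page skips the first 30.
function page_window($page, $per_page, $total)
{
    $page = max(1, (int) $page); // treat anything below 1 as page 1
    return array(
        'skip'  => ($page - 1) * $per_page,
        'limit' => $per_page,
        'pages' => (int) ceil($total / $per_page),
    );
}

$window = page_window(4, 10, 137);
print_r($window); // skip 30, limit 10, pages 14
// With the driver, you'd then do something along the lines of:
// $cursor->limit($window['limit'])->skip($window['skip']);
```

Keep in mind that without a sort() the order of results isn’t guaranteed, so pages may shift around between requests.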


Aww, that’s nice. C-click. Deleted!

Sooner or later, you’ll want to delete a record or two hundred. Easy enough.

$field_to_remove = array(
    'year' => 1984
);
$sweet_collection->remove( $field_to_remove );

This would remove any document with the year 1984. If you want to remove only the first matching document, pass true as the second parameter and you’re good to go, sparky.


Now you can create an index, too!

This should be filed under optimisation, but whatever. You can create indexes in MongoDB, which is just gravy. Let’s create an index based on year of release in our sweet sweet DVD collection database. It’s as simple as calling ensureIndex(). Pass an array with the keys you want to have in the index, each set to either 1 for ascending or -1 for descending order.


$sweet_collection->ensureIndex(
    array(
        'year' => 1
    )
);

… and that’s it. There are, however, some options you may pass in an associative array as the second parameter:

  • unique – creates a unique index.
  • dropDups – if duplicate values exist while creating the index, keep one document per value and drop the rest.
  • background – creates the index in the background while other operations are taking place. Index creation is synchronous by default, so set this to true to make it asynchronous.
  • safe – checks whether the creation of the index succeeded. If it failed, a MongoCursorException is thrown.
  • name – specifies an index name, should Mongo complain about the index name being too long when you’re indexing many keys at once.

So let’s say we want to create a compound index on two keys instead of just one, make it unique, have it built in the background and check whether it worked.

try {
    $sweet_collection->ensureIndex(
        array(
            'slug' => 1,
            'title' => -1
        ),
        array(
            'unique' => 1,
            'background' => 1,
            'safe' => 1
        )
    );
}
catch(MongoCursorException $e) {
    echo "Oh, hells noes!";
}
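If you’re wondering what 'slug' => 1, 'title' => -1 actually means for the order of things, here’s a plain-PHP sketch: entries are ordered by slug ascending and, when slugs are equal, by title descending. The compare_by_spec() helper is hypothetical and uses simple byte-wise string comparison, which is a simplification of how MongoDB compares values.

```php
<?php
// Hypothetical helper: builds a usort() comparator from an index-style
// spec, where 1 means ascending and -1 means descending.
function compare_by_spec(array $spec)
{
    return function (array $a, array $b) use ($spec) {
        foreach ($spec as $key => $direction) {
            $cmp = strcmp((string) $a[$key], (string) $b[$key]);
            if ($cmp !== 0) {
                return $direction == 1 ? $cmp : -$cmp;
            }
        }
        return 0; // equal on every key in the spec
    };
}

$dvds = array(
    array('slug' => 'halloween', 'title' => 'Halloween'),
    array('slug' => 'anoes',     'title' => 'A Nightmare On Elm Street'),
    array('slug' => 'halloween', 'title' => 'Halloween II'),
);
usort($dvds, compare_by_spec(array('slug' => 1, 'title' => -1)));
foreach ($dvds as $dvd) {
    echo $dvd['slug'], ' / ', $dvd['title'], "\n";
}
```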

I think you get the general idea here. You can’t create a unique index on non-unique values, of course. Should two or more documents share the same slug and title, the unique index will fail, which actually is quite probable in this example. Fun, huh? That’ll teach you to copy-paste code without checking what it does.
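One way to avoid the surprise is to look for offending values before creating the index. Here’s a hypothetical find_duplicates() helper that, given some documents as PHP arrays, returns every key combination that occurs more than once, which is exactly what would make a unique index on those keys fail. It treats a missing key as null, and the documents are made up; in real life you’d run this kind of check against your actual data.

```php
<?php
// Hypothetical helper: returns each combination of values for $keys that
// appears in more than one document. Missing keys count as null here.
function find_duplicates(array $docs, array $keys)
{
    $seen  = array();
    $dupes = array();
    foreach ($docs as $doc) {
        $combo = array();
        foreach ($keys as $key) {
            $combo[] = isset($doc[$key]) ? $doc[$key] : null;
        }
        $id = json_encode($combo); // a string key for the combination
        if (isset($seen[$id]) && !in_array($combo, $dupes, true)) {
            $dupes[] = $combo;     // report each duplicate combination once
        }
        $seen[$id] = true;
    }
    return $dupes;
}

$docs = array(
    array('slug' => 'anoes', 'title' => 'A Nightmare On Elm Street'),
    array('slug' => 'anoes', 'title' => 'A Nightmare On Elm Street'),
    array('slug' => 'anoes', 'title' => 'A Nightmare On Elm Street 2'),
);
print_r(find_duplicates($docs, array('slug', 'title')));
```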

I think that’ll suffice for now. As a closing hint, I’d like to point you to a sweet package on phpclasses.org by Cesar D. Rodas. It’s called MongoFS and allows you to store and retrieve data in MongoDB’s GridFS as if it were files. See, GridFS stores large objects as chunks of data along with their metadata. It stores them in chunks because BSON objects in MongoDB are limited to 4 MB in size. The chunk approach also allows you to fetch a range of data, which could come in handy. Of course, there are articles covering GridFS already, so I’ll just point you to them instead (by “them” I mean “this specific article” – deal with it). Definitely worth checking out.
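To see why the chunk approach helps with ranges, here’s a sketch: given a chunk size, work out which chunks overlap the bytes you want and fetch only those. The chunks_for_range() and read_range() helpers are hypothetical, and the silly 4-byte chunk size is only there to keep the demo readable; real GridFS chunks are far larger.

```php
<?php
// Hypothetical helpers: which chunks hold a byte range, and reading that
// range back out of the chunks without touching the others.
function chunks_for_range($offset, $length, $chunk_size)
{
    if ($length <= 0) {
        return array();
    }
    $first = (int) floor($offset / $chunk_size);
    $last  = (int) floor(($offset + $length - 1) / $chunk_size);
    return range($first, $last); // chunk numbers, starting at 0
}

function read_range(array $chunks, $offset, $length, $chunk_size)
{
    $needed = chunks_for_range($offset, $length, $chunk_size);
    if (!$needed) {
        return '';
    }
    $data = '';
    foreach ($needed as $n) {
        if (isset($chunks[$n])) { // stop quietly past the end of the file
            $data .= $chunks[$n];
        }
    }
    // drop the bytes before $offset that came along with the first chunk
    return (string) substr($data, $offset - $needed[0] * $chunk_size, $length);
}

$chunks = str_split('abcdefghij', 4);    // 'abcd', 'efgh', 'ij'
print_r(chunks_for_range(3, 4, 4));      // chunks 0 and 1
echo read_range($chunks, 3, 4, 4), "\n"; // defg
```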

  • What is this?

    My name is William and I'm a 30 year old developer/designer from Stockholm, Sweden. I have a love/hate relationship with PHP, I'm slightly aroused by jQuery and if I had the Adobe Flash IDE as a friend on Facebook, I'd label it as "it's complicated". This is my twelfth year as a freelance monkey. I prefer the term mercenary, but someone said it had a negative ring to it. Whatever. Oh, and I'm a Mac guy who loves his MacBook Pro in a somewhat unhealthy way.

