Why a new framework?
If you've been following along with this article series, you've been introduced to MongoDB's mapreduce command, which until MongoDB 2.1 was the go-to aggregation tool for MongoDB. (There's also the group() command, but it's really no more than a less capable, un-shardable version of mapreduce(), so we'll ignore it here.)
So if you already have mapreduce() in your toolbox, why would you ever want something else?
Mapreduce is hard; let's go shopping
The first motivation behind the new framework is that, while mapreduce() is a flexible and powerful abstraction for aggregation, it's really overkill in many situations, as it requires you to re-frame your problem into a form that's amenable to calculation using mapreduce().
For instance, when I want to calculate the mean value of a property in a series of documents, breaking that down into appropriate map, reduce, and finalize steps imposes extra cognitive overhead that I'd like to avoid. So the new aggregation framework is (IMO) simpler.
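To see that overhead concretely, here is a plain-JavaScript simulation of the three phases mapreduce() would need just to compute a mean. This runs without a MongoDB server; the document collection, the price field, and the driver loop at the bottom are all illustrative assumptions, not a real mapreduce() invocation.

```javascript
// Sample "collection" of documents (illustrative data)
const docs = [{ price: 10 }, { price: 20 }, { price: 60 }];

// map: emit a partial {sum, count} pair for every document.
// A mean can't be computed per-document, so we carry both pieces.
function map(doc, emit) {
  emit("mean", { sum: doc.price, count: 1 });
}

// reduce: merge partial pairs. Must be associative and commutative,
// because it may be called repeatedly on partial results.
function reduce(key, values) {
  return values.reduce(
    (acc, v) => ({ sum: acc.sum + v.sum, count: acc.count + v.count }),
    { sum: 0, count: 0 }
  );
}

// finalize: turn the merged {sum, count} pair into the actual mean.
function finalize(key, value) {
  return value.sum / value.count;
}

// Drive the three phases by hand, the way the server would.
const emitted = [];
docs.forEach((d) => map(d, (k, v) => emitted.push(v)));
const mean = finalize("mean", reduce("mean", emitted));
console.log(mean); // 30
```

Compare that with the aggregation framework, where the same result is a single pipeline stage: `db.items.aggregate([{ $group: { _id: null, mean: { $avg: "$price" } } }])` (collection name `items` assumed here).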
The MapReduce algorithm, the basis of MongoDB's mapreduce() command, is a great approach to solving Embarrassingly Parallel problems.
Each invocation of map, reduce, and finalize is completely independent of the others (though the map, reduce, and finalize phases must run in order), so we should be able to dispatch these jobs to run in parallel without any problems. Unfortunately, MongoDB's JavaScript engine is single-threaded per mongod process, so those jobs all run serially on any one server. That means the only way to get any parallelism out of a MongoDB mapreduce() is to run it on a sharded cluster, and on a cluster with N shards you're limited to N-way parallelism.