Over the past five years, the authors and many others at Google have implemented hundreds of special-purpose computations that process large amounts of raw data, such as crawled documents and web request logs, to compute various kinds of derived data, such as inverted indices, various representations of the graph structure of web documents, summaries of the number of pages crawled per host, and the set of most frequent queries in a given day. Most such computations are conceptually straightforward. However, the input data is usually large, and the computations have to be distributed across hundreds or thousands of machines in order to finish in a reasonable amount of time. The issues of how to parallelize the computation, distribute the data, and handle failures conspire to obscure the original simple computation with large amounts of complex code. In response to this complexity, Google designed MapReduce, an abstraction that lets programmers express the simple computation while the details of parallelization, fault tolerance, and data distribution stay hidden.

In 2004, Google published the paper that introduced MapReduce to the world. Early in 2005, the Nutch developers had a working MapReduce implementation in Nutch, and by the middle of that year, all the major Nutch algorithms had been ported to run using MapReduce and NDFS. Work on extending and improving the technique has continued ever since.

MapReduce is a programming model for data processing. Hadoop runs MapReduce programs written in various languages, such as Java, Ruby, Python, and C++. Most importantly, MapReduce programs are inherently parallel, thus putting very large-scale data analysis into the hands of anyone with enough machines at their disposal. MapReduce has been used for applications such as generating search indexes, document clustering, access log analysis, and the other kinds of data analysis mentioned above. A MapReduce job splits the input dataset, typically stored in the Hadoop Distributed File System (HDFS), into independent chunks (blocks) that are processed in a completely parallel manner. The job has two main tasks, Map and Reduce. The input phase feeds the mapper <key, value> pairs, and the mapper transforms them into intermediate <key, value> pairs, which are partitioned among the reducers by a user-defined partitioning function. The intermediate data is then shuffled and sorted by key and sent to the reducers, each of which merges the values for a key into a smaller result and generates the output.
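The map, partition, shuffle-and-sort, and reduce steps described above can be sketched in a few lines of plain Python. This is only an illustration that runs in a single process, not Hadoop's API: the word-count task and the names `map_words`, `reduce_counts`, `partition`, and `run_job` are all assumptions made for this example.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def map_words(offset, line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    """Reduce: collapse all values for one key into a single total."""
    return word, sum(counts)


def partition(key, num_reducers):
    """Hypothetical partitioner: a stable function of the key, so the
    same key is always routed to the same reducer."""
    return sum(key.encode("utf-8")) % num_reducers


def run_job(lines, num_reducers=2):
    # Map phase: each input record is an (offset, line) pair. Every
    # intermediate pair is routed to a bucket chosen by the partitioner.
    buckets = defaultdict(list)
    for offset, line in enumerate(lines):
        for key, value in map_words(offset, line):
            buckets[partition(key, num_reducers)].append((key, value))

    # Shuffle and sort: within each partition, order the pairs by key so
    # that all values for one key sit next to each other. Then reduce.
    output = {}
    for reducer_id in range(num_reducers):
        pairs = sorted(buckets[reducer_id], key=itemgetter(0))
        for key, group in groupby(pairs, key=itemgetter(0)):
            word, total = reduce_counts(key, (v for _, v in group))
            output[word] = total
    return output


if __name__ == "__main__":
    print(run_job(["the quick brown fox", "the lazy dog", "the fox"]))
```

In a real cluster each bucket would live on a different machine and the reducers would run concurrently. The sketch keeps everything in one loop so the data flow is easy to follow.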

Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The runtime system takes care of the details of partitioning the input data, scheduling the program’s execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system.
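To make the runtime's role more concrete, here is a rough sketch of the three jobs it takes on: partitioning the input, scheduling tasks across workers, and re-running a task that fails. It assumes a thread pool standing in for a cluster of machines, and the helper names (`split_input`, `count_words`, `run_with_retries`, `run_parallel`) are made up for this example. A real runtime also has to move data between machines, which is not shown here.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_input(records, num_splits):
    """Partition the input into independent chunks, one per worker."""
    return [records[i::num_splits] for i in range(num_splits)]


def count_words(chunk):
    """One task: count the words in a single chunk of lines."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def run_with_retries(task, chunk, max_attempts=3):
    """Re-run a failed task. This is safe only because the task has no
    side effects and depends solely on its own chunk."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task(chunk)
        except Exception:
            if attempt == max_attempts:
                raise


def run_parallel(records, task=count_words, num_workers=4):
    chunks = split_input(records, num_workers)
    # Schedule one task per chunk; the pool stands in for the cluster.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(lambda c: run_with_retries(task, c), chunks))
    # Merge the partial results from every worker into one total.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total
```

The caller only supplies the task function. Splitting, scheduling, and retrying all happen inside `run_parallel`, which is the division of labor the paragraph above describes.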

The MapReduce programming model has been successfully used at Google for many different purposes. Its success can be attributed to several reasons. First, the model is easy to use, even for programmers without experience with parallel and distributed systems, since it hides the details of parallelization, fault tolerance, locality optimization, and load balancing. Second, a large variety of problems are easily expressible as MapReduce computations. For example, MapReduce is used for the generation of data for Google’s production web search service, for sorting, for data mining, for machine learning, and for many other systems. Third, an implementation of MapReduce has been developed that scales to large clusters comprising thousands of machines. The implementation makes efficient use of these machine resources and is therefore suitable for many of the large computational problems encountered at Google.

