Jeremy Zawodny mentions a very large-scale Hadoop deployment over at Yahoo! Search.

It makes sense given their previous commitment to and investment in the project, but it’s also cool to start seeing some significant migrations to the framework.

Over on the Yahoo! Hadoop blog, you can read about how the webmap team in Yahoo! Search is using the Apache Hadoop distributed computing framework. They’re using over 10,000 CPU cores to build the map and processing a ton of data to do so. They end up using over 5 petabytes of raw disk storage, eventually outputting over 300 terabytes of compressed data that’s used to power every single search.

An interesting quote from Eric Baldeschwieler (Senior Director, Grid Computing):

This process is not new (see the AltaVista connectivity server). What is new is the use of Hadoop. Hadoop has allowed us to run the identical processing we ran pre-Hadoop on the same cluster in 66% of the time our previous system took. It does that while simplifying administration. Further we believe that as we continue to scale up Hadoop, we will be able to scale up our production jobs as needed to larger cluster sizes.

Pretty impressive.

As part of this announcement, Jeremy has posted an interview he did with a couple of people from the webmap and grid computing teams. The video feed seems quite slow right now, so you’ll have to be patient.

Update: The video feed seems much better now. Check it out.





