So, I've been looking at Hadoop with keen interest, and to be honest I'm fascinated; things don't get much cooler.
My only minor issue is that I'm a C# developer and it's in Java.
It's not that I don't understand Java so much as that I'm looking for a Hadoop.net or NHadoop, or whatever .NET project embraces the Google MapReduce approach. Does anyone know of one?
Have a look at:
http://www.windowsazure.com/en-us/services/hdinsight/
It is an implementation of Hadoop for Azure, and you can access it from .NET.
There's a pretty cute MapReduce implementation for .NET at: http://mapsharp.codeplex.com/
I would say that DryadLINQ is the closest thing we .NET folk have to Hadoop, but it depends on what you want to use Hadoop for. If you are looking for an optimized, self-maintaining distributed file system (DFS), then DryadLINQ isn't what you are looking for. It has an analog to the DFS, but you have to build the partitions and distribute each partition manually.
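To give a feel for what "build the partitions manually" involves, here is a small Python sketch (not C#, and not DryadLINQ's actual API) of one common way to split data: hash-partitioning key/value records into a fixed number of buckets, each of which you would then copy to a node yourself. The function name hash_partition and the choice of crc32 as the hash are assumptions for illustration only.

```python
import zlib
from typing import Iterable, List, Tuple

Record = Tuple[str, int]


def hash_partition(records: Iterable[Record], num_partitions: int) -> List[List[Record]]:
    """Hypothetical helper: split (key, value) records into buckets by key.

    zlib.crc32 is used instead of Python's built-in hash(), because hash()
    of a str is randomized per process and would send the same key to
    different partitions on different machines.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    partitions: List[List[Record]] = [[] for _ in range(num_partitions)]
    for key, value in records:
        # Same key -> same crc32 -> same partition index, on every machine.
        index = zlib.crc32(key.encode("utf-8")) % num_partitions
        partitions[index].append((key, value))
    return partitions


if __name__ == "__main__":
    parts = hash_partition([("apple", 1), ("pear", 2), ("apple", 3)], 4)
    for i, part in enumerate(parts):
        print(i, part)
```

The useful property is that every record with a given key lands in the same partition, so later per-key work (grouping, aggregation) can run on each partition independently; keeping that layout up to date and replicated is exactly what a self-maintaining DFS would otherwise do for you.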
That being said, if it's the distributed execution aspect of Hadoop that you are looking for, then DryadLINQ is truly wonderful (and no, I'm not affiliated with MS). As long as you have a Microsoft HPC cluster set up, getting going with DryadLINQ is really easy.
The code you write is really just straight LINQ code, except that instead of executing the LINQ on IEnumerable<T> you execute it on PartitionedTable<T> (the self-built distributed data structure).
What has been really cool about DryadLINQ is the fast turnaround time (try, test, adjust, repeat) when developing algorithms. You just write LINQ code to do your calculations, and DryadLINQ takes care of the whole distributed execution part. It's the most natural analog I've come across for making code for distributed processing feel just like writing code for a single process.
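The reason plain LINQ maps so well onto MapReduce is that a MapReduce job has the shape of a query: flatten (SelectMany), group (GroupBy), then aggregate each group (Select). As a single-process illustration only, here is that word-count shape in Python, with comments marking the rough LINQ operator each step corresponds to. The function name word_count is hypothetical, and this is not DryadLINQ code; in DryadLINQ the equivalent C# query would run over a PartitionedTable<T> instead of an in-memory list.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Hypothetical word count written in map / group / reduce form."""
    # Map step (roughly LINQ SelectMany): split every line into lowercase words.
    words = (word.lower() for line in lines for word in line.split())

    # Shuffle/group step (roughly LINQ GroupBy): collect a 1 per occurrence.
    groups: Dict[str, List[int]] = defaultdict(list)
    for word in words:
        groups[word].append(1)

    # Reduce step (roughly LINQ Select over each group): sum the ones.
    return {word: sum(ones) for word, ones in groups.items()}


if __name__ == "__main__":
    print(word_count(["the cat", "The dog"]))  # {'the': 2, 'cat': 1, 'dog': 1}
```

Because each step only looks at one line or one group at a time, a framework is free to run the map step on every partition in parallel and the reduce step on every key in parallel, which is what lets the same query text scale out without you writing any distribution code.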
As others have mentioned, DryadLINQ is a programming framework that allows developers to write LINQ queries and execute them on a cluster, in a similar manner to MapReduce. The DryadLINQ project has recently been released under the Apache license on GitHub, and the release includes support for running on YARN clusters (including Azure HDInsight clusters).
Microsoft is in the process of rolling out HDInsight, which is billed as their "100% Apache compatible Hadoop distribution."
It is available both on Windows Server and as a Windows Azure service.
Recently, MySpace released their .NET MapReduce framework, Qizmt, as open source, so it is also a potential contender in this space.