After re-thinking and re-tooling some of the work I've been doing to take advantage of Gearman, I've started to wish for a big file system in the sky. I guess it's no surprise that Google uses GFS with their Map/Reduce jobs and that Hadoop has HDFS as a major piece of its infrastructure.
The Wikipedia page List of file systems has a section on Distributed parallel fault tolerant file systems that appears to be a good list of what's out there. The problem, of course, is that it's little more than a list.
Do you have any experience with one or more of those? Recommendations?
I should say that I'm only interested in something that's Open Source and have a minor bias against big Java things as well as stuff that appear as though it would cease to exist if a single company went out of business.
I'm not too worried about POSIX compliance. The main use would be for writing large files that other machines or processes would then read all or part of. I don't need updates. The ability to append would probably be nice, but that's easy to work around.
More specifically, these three have my eye at the moment:
- CloudStore (was KFS) by Kosmix, a C++ clone of GFS
- MogileFS from Danga, what can I say--I'm a Perl guy
- HDFS the Hadoop file system
It's interesting that some solutions deal with blocks (often large) while others deal with files. I'm not sure I have a preference for either at the moment.
But I'm open to hearing about everything, so speak up! :-)
(comments)