NOTE: forqlift is written as a standalone JAR. In most cases, you'll need only Java Runtime Environment (JRE) 1.6 to run it.
NOTE FOR WINDOWS: If you're running forqlift under Windows, please make sure you have installed Cygwin (specifically, the chmod command). forqlift is otherwise a standalone product.
Notes: forqlift 0.9.0 adds the following features:
- inspect file: determine the types of data and number of records in the SequenceFile
- support for external data types: out of the box, forqlift supports the base Hadoop Writable types, such as BytesWritable. You can now add JARs to the forqlift install to support reading external Writable implementations, such as those from Hive or Mahout. (NOTE: forqlift will read and extract these data types, but it will not write them.)
- clean up file names on extract: if your record keys are, say, URLs or file paths, forqlift will replace characters unsuitable for filesystems (such as /) with underscores (_).
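Taken together, the new features might look like this in practice. This is only a sketch: the lib-directory path and the exact subcommand spellings are assumptions on my part, so check EXAMPLES.txt for the authoritative invocations.

```shell
# Summarize the key/value types and record count of a SequenceFile
# without extracting anything (new "inspect" feature).
forqlift inspect --file=data.seq

# Extract records whose keys are URLs or paths; characters such as "/"
# in the keys become underscores in the resulting file names.
forqlift extract --file=crawl.seq

# Drop a JAR containing external Writable implementations (for example,
# Hive's) into the forqlift install so inspect/extract can read those
# types. The lib/ location shown here is an assumption.
cp hive-serde.jar /opt/forqlift/lib/
```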
Please review the EXAMPLES.txt in the distribution for more details.
Warning: This release is considered alpha-quality. Please use it only if you wish to try out experimental features and provide feedback. The previous release appears more stable.
Notes: forqlift 0.8.0 includes an experimental new feature to interact directly with HDFS. This means that, instead of writing a SequenceFile to a local disk and then pushing it out to your Hadoop cluster, you can write that file to (and read from) HDFS without that intermediate step.
If you're using forqlift on a local Hadoop cluster, this will save you some time and disk space. (If you're shuttling data to and from a remote Hadoop cluster, such as something on Amazon's EC2 or Elastic MapReduce, this feature is likely of little interest. Your best bet is to build the SequenceFile locally and upload it as usual.)
How it works
To enable forqlift's HDFS access, pass the --hadoopconfig flag and point it to a file that defines the fs.default.name property, typically core-site.xml. For example:

forqlift --hadoopconfig=conf/core-site.xml --file=hdfs:///tmp/foo.seq
forqlift --hadoopconfig=conf/core-site.xml hdfs:///tmp/foo.seq
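For comparison, here is the two-step workflow this feature replaces next to the direct version. This is a sketch under assumptions: the `create` subcommand name and its argument order are my guesses, not confirmed syntax.

```shell
# Before 0.8.0: build the SequenceFile on local disk, then push it
# to the cluster as a separate step.
forqlift create --file=/tmp/foo.seq *.txt
hadoop fs -put /tmp/foo.seq /tmp/foo.seq

# With 0.8.0's experimental HDFS access: write straight to HDFS,
# skipping the local intermediate file entirely.
forqlift create --hadoopconfig=conf/core-site.xml --file=hdfs:///tmp/foo.seq *.txt
```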
I admit, this is still a little raw. Over time I hope to polish this up and make it easier to use. Before I do that, though, I'd like to confirm that the core functionality of reading/writing directly to HDFS works. Please drop me a line to say whether it works for you:
forqlift-questions at this domain.
Notes: This release includes some mild UI enhancements, as well as some backend tweaks.
Notes: This release includes a significant performance improvement in the toarchive command and its counterpart, which convert a SequenceFile to and from a more common archive format (tar, tar+gz, tar+bz2, zip). If these file conversions were very slow for you in the previous release, please try this one and let me know what you think.
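A conversion might look like the following. This is a sketch: the argument order and archive-format detection are assumptions, so consult forqlift's help output for the exact syntax.

```shell
# Convert a SequenceFile into a gzipped tar archive; the counterpart
# command performs the reverse, turning an archive into a SequenceFile.
forqlift toarchive --file=data.seq data.tar.gz
```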
There's also a new --version flag that shows, among other things, the version of Hadoop used to build forqlift.
Notes: This release addresses several small code issues. If you ran into a problem using the previous version of forqlift, please give this one a try and let me know how it works out for you.
There are also several adjustments that will be invisible to the end user, but will pave the way for future plans. Finally, the EXAMPLES.txt file didn't make it into this release. Please refer to the forqlift examples page on the website, instead.