Sunday, April 15, 2012

Using Hive ...

This weekend I started working on my pilot and weighed MapReduce against Hive. Since my background includes extensive SQL use, I opted to give Hive a try. I copied 100 log files from our ETL server into HDFS, loaded them into a Hive table, and began running queries against the data, looking for the longest-running operation in our transformations. While I'll probably need MapReduce for heavier text parsing, Hive is very cool for high-level processing.
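To give a feel for the kind of question I was asking, here is a minimal Python sketch of "find the longest operation" over a handful of log lines. It is only an illustration, not what Hive runs: the tab-delimited layout, the field order, the operation names and the sample values are all assumptions made up for this example. In SQL terms it is roughly a MAX of the duration grouped by operation, sorted longest first.

```python
from collections import defaultdict

# Hypothetical log layout (an assumption, not the real ETL log format):
#   <timestamp> TAB <operation> TAB <duration_seconds>
SAMPLE_LOG = """\
2012-04-14 01:00:00\textract_orders\t42.5
2012-04-14 01:05:00\ttransform_orders\t310.0
2012-04-14 01:20:00\tload_orders\t95.2
2012-04-14 02:00:00\ttransform_orders\t287.4
"""


def parse_line(line):
    """Split one tab-delimited line into (operation, duration).

    Returns None for lines that do not match the assumed layout.
    """
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 3:
        return None
    _, operation, duration = parts
    try:
        return operation, float(duration)
    except ValueError:
        return None


def longest_operations(lines):
    """Return (operation, max_duration) pairs, longest first."""
    longest = defaultdict(float)  # missing operations start at 0.0
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue  # skip malformed lines instead of failing
        operation, duration = record
        longest[operation] = max(longest[operation], duration)
    return sorted(longest.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for operation, duration in longest_operations(SAMPLE_LOG.splitlines()):
        print(f"{operation}\t{duration}")
```

Writing it out by hand like this is also a reminder of why the SQL-style approach appealed to me: the grouping and sorting are a single query there, while the parsing step is where hand-written code earns its keep.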

Next week, I'll discuss installing Hadoop on a Linux server for a pilot.