Sunday, May 20, 2012

KeyValueTextInputFormat

After a bit of a layoff, I found some data to try in an MR job.  My training used TextInputFormat, so I wanted to try a key/value pair format for this dataset.  I copied my existing driver, map, and reduce programs, made changes to the driver and map code, then compiled and ran.  BOOM!  First, I didn't have the new input format class imported, so I added that:

import org.apache.hadoop.mapred.KeyValueTextInputFormat;

I'm using a pipe ("|") for a separator, so I have this code:

conf.setInputFormat(KeyValueTextInputFormat.class);

conf.set("key.value.separator.in.input.line", "|");


Next, I fought through various type mismatch errors in the mapper.  Eventually, I came up with this code in the driver:
 
conf.setMapOutputKeyClass(Text.class);
conf.setMapOutputValueClass(IntWritable.class);
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(IntWritable.class);
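Putting the driver pieces together, the setup looks roughly like this.  This is a sketch against the old 0.20 mapred API; the class name, job name, and the MyMapper/MyReducer references are placeholders, and the input/output paths come from the command line:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCountDriver.class);
        conf.setJobName("kv-wordcount");

        // Read each line as key|value, splitting on the first pipe
        conf.setInputFormat(KeyValueTextInputFormat.class);
        conf.set("key.value.separator.in.input.line", "|");

        conf.setMapperClass(MyMapper.class);
        conf.setReducerClass(MyReducer.class);

        conf.setMapOutputKeyClass(Text.class);
        conf.setMapOutputValueClass(IntWritable.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}
```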

And this code in the mapper:
 
public class MyMapper extends MapReduceBase implements Mapper<Text, Text, Text, IntWritable>
public void map(Text key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter)
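Filling in the body, a complete word-count mapper under the old mapred API might look like the sketch below.  The tokenizing logic is my own simple assumption (whitespace split, no punctuation handling), not necessarily what the original class did:

```java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class MyMapper extends MapReduceBase
        implements Mapper<Text, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    public void map(Text key, Text value,
                    OutputCollector<Text, IntWritable> output,
                    Reporter reporter) throws IOException {
        // The key is the text before the "|"; for a word count
        // we only care about the value portion of each record.
        StringTokenizer st = new StringTokenizer(value.toString());
        while (st.hasMoreTokens()) {
            word.set(st.nextToken());
            output.collect(word, ONE);
        }
    }
}
```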

My generic mapper used a LongWritable for the key, but with this input format the key is a Text, which threw me.  The errors were a bit cryptic the first time, but after a few iterations I caught on to the issue.  I loaded a small test file that looked like:

1|this is the text
2|this is more text
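KeyValueTextInputFormat splits each line at the first occurrence of the separator: everything before it becomes the key, everything after it the value, and a line with no separator becomes all key with an empty value.  A plain-Java sketch of that splitting behavior (my own illustration, not Hadoop code):

```java
public class KvSplit {
    // Mimics KeyValueTextInputFormat's line splitting: break at the
    // first separator; a line with no separator is all key, empty value.
    static String[] split(String line, char sep) {
        int i = line.indexOf(sep);
        if (i < 0) {
            return new String[] { line, "" };
        }
        return new String[] { line.substring(0, i), line.substring(i + 1) };
    }

    public static void main(String[] args) {
        String[] kv = split("1|this is the text", '|');
        System.out.println(kv[0]); // prints "1"
        System.out.println(kv[1]); // prints "this is the text"
    }
}
```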

The output looked like I wanted, words and counts, cool.

Env: Hadoop 0.20.2 pseudo-distributed mode, CentOS on VM

Saturday, May 19, 2012

Ouch...

Where we're at:

DJIA  12,369.38
S&P 500  1,295.22
10-Yr T-Note  1.70%
S&P P/E: 20.80
VIX: 25.10

equity: 30%
bond: 34%
cash: 36%

1yr: 1.9%

It looks like Europe is going to bring the world economy down again for the 3rd year; the S&P fell 4.3% this week. Oh well... I am going to go on a little shopping spree.  I'll add to our int'ls, high-yield bonds, and REITs, a little here and there as we step down from these levels.  My job is like a 'small cap stock', so I can't go crazy.  TXT came back down to earth as well.  I'll wait for some support and then sell to my desired allocation.  It looks like the "big boys" exercised some options in February.  I think I'll keep an eye on their activity more in the future.

Our group is cutting loose 4 of the off-shore team to trim our 2012 budget.  I'm sad to see them go, but they'll be fine; I'm sure the firm has work for them.  But this may lead to more interesting work for my peer and me.  We'll have to see.  I'm continuing to gain experience in Hadoop, good for current and future opportunities, I think.  No changes to vacation plans in August.

Friday, May 11, 2012

Finally...

Where we're at:

dow: 12,820.60
s&p: 1,353.39
10yr: 1.84%
s&p P/E: 22.56
VIX: 19.89

equity: 31%
bond: 33%
cash: 36%

1yr: 3.5%

Wow, I finally made it, vested. It feels good, all 401(k) match is mine, plus a pension benefit. The first quarter was better than 2011, but CSNA still lost money. The call sounded better than last year, but who knows. For the third year, Europe decides to rain on the world's parade in April/May. Hiring is still sparse, but stable (so far). This summer, we'll celebrate the fact that we both have jobs by taking a real vacation. Our bathroom remodel is done, so no more home upgrades for awhile.

Spouse is doing well, the LOGM acquisition seems to suit her and her work style. The market seems to find a floor around 12,900 (dow). While EUR is down, I'm taking this opportunity to pick up some more int'l stocks to get back to our target allocation. I'd like to exchange some corp bonds for some high yield, but they seem a bit pricey, so I'll wait.

More Hive fun...

I have been spending time learning Hive using the VM (CentOS) on my laptop. I have loaded a few hundred job logs and begun querying for analysis. I am continuing to meet with the Linux server team to install the Hadoop framework on RHEL. I think they'll have it for me by the end of May. Next, I'm going to demo Hive to my EDW team and write an M/R wordcount app processing long text we extract from SAP. I'm still trying to build some momentum around the tool.