Author Archives: joecrow

using jarjar to solve hive and pig antlr conflicts

Pig 0.9+ and Hive 0.7+ (and maybe older versions, too) both use ANTLR. Unfortunately, they use incompatible versions, which causes problems if you try to pull in both Pig and Hive via Ivy or Maven. Oozie has come up with … Continue reading

Posted in Uncategorized | Leave a comment

Workflow Engines for Hadoop

Over the past 2 years, I’ve had the opportunity to work with two open-source workflow engines for Hadoop. I used and contributed to Azkaban, written and open-sourced by LinkedIn, for over a year while I worked at Adconion. Recently, I’ve … Continue reading

Posted in hadoop | 4 Comments

Getting Started with Apache Hadoop 0.23.0

Hadoop 0.23.0 was released November 11, 2011. Being the future of the Hadoop platform, it’s worth checking out even though it is an alpha release. Note: Many of the instructions in this article came from trial and error, and there are … Continue reading

Posted in hadoop | 15 Comments

Recap: Apache Flume (incubating) User Meetup, Hadoop World 2011 NYC Edition

The Apache Flume (incubating) User Meetup, Hadoop World 2011 NYC Edition was Wednesday, November 9. It was collocated with the Hive Meetup at Palantir’s awesome office space in the meatpacking district in Manhattan. The following are my notes from the two … Continue reading

Posted in flume, hadoop | 1 Comment

Recap: April Puppet NYC Meetup

Last week, I attended my first Puppet NYC meetup, which was hosted at Gilt Groupe. As a fairly recent user of Puppet, it was great to meet some folks from the community in NYC who are using it on a … Continue reading

Posted in Devops, Linux, Puppet | 1 Comment

Silently broken Gmail

At work, we have Google Apps, which comes with several gigs of Gmail storage. For email, though, we use an Outlook server with a low quota. Rather than deleting email, I “archive” to Gmail via IMAP. One day, though, Gmail IMAP … Continue reading

Posted in Apple, Programming | 1 Comment

two puppet tricks: combining arrays and local tests

Joining Arrays: I found myself wanting to join a bunch of arrays in my Puppet manifests. I had 3 lists of IP addresses, but wanted to join all 3 lists together into a single list to provide all IPs to … Continue reading

Posted in Linux | 5 Comments

Python setup.py bdist_rpm on CentOS 5.5

I recently learned that python setup.py can be used to build an rpm using the bdist_rpm command. Since we’re using Puppet to manage installed software, this makes it really easy to add Python modules to a bunch of servers. During the … Continue reading

Posted in Linux | Leave a comment

Moving wordpress blog to lighttpd

I’ve moved my WordPress blog from a hosted account on godaddy.com to a server that’s running lighttpd on Ubuntu. The move was more complex than I expected, so I thought I’d share some details for others… I already had lighttpd … Continue reading

Posted in Linux | 3 Comments

JAVA_HOME on Mac OS X

I was working on configuring HBase to run on my Mac OS X machine, and I ran into a hiccup setting up the JAVA_HOME environment variable. Eventually, I determined that there’s a “Home” directory inside of each Java Framework. So, … Continue reading

Posted in Apple | Leave a comment