The big question is how the log file needs to be parsed and formatted. I'd be inclined to write a UDF that takes a line of text and returns a tuple of the values you'd be storing in HBase.
Then you could do other operations on the bag of tuples that get passed back.
Alternatively, you could write a regular expression and use a built-in Pig function like REGEX_EXTRACT or REGEX_EXTRACT_ALL.
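To make the built-in route concrete, here's a minimal Pig Latin sketch. The file name, log format, regex, and field names are all assumptions for illustration (a space-delimited line of date, time, level, and message):

```pig
-- Load raw lines, then split each into typed columns with one regex.
raw = LOAD 'server.log' AS (line:chararray);

-- REGEX_EXTRACT_ALL returns a tuple of all capture groups
-- (null if the line doesn't match); FLATTEN turns it into columns.
parsed = FOREACH raw GENERATE FLATTEN(
    REGEX_EXTRACT_ALL(line,
        '^(\\d{4}-\\d{2}-\\d{2}) (\\d{2}:\\d{2}:\\d{2}) (\\w+) (.*)$'))
    AS (date:chararray, time:chararray, level:chararray, message:chararray);
```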
I like the UDF approach in this case because then I can more easily write unit tests around my log parser and get that testing out of the way before actually spawning any jobs. On Wed, Feb 26, 2014 at 12:22 AM, Chhaya Vishwakarma < [EMAIL PROTECTED]> wrote:
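To illustrate that testability point, here's a minimal sketch of the parsing core such a UDF might wrap, assuming the same hypothetical space-delimited log format (the regex and field names are illustrative). In a real Pig UDF this logic would sit inside an `EvalFunc<Tuple>` and return a `Tuple` built with `TupleFactory`, but keeping the parsing in a plain method like this is what makes it easy to unit test before any job runs:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for lines like:
//   "2014-02-26 00:22:13 INFO region server started"
public class LogLineParser {
    private static final Pattern LINE = Pattern.compile(
        "^(\\d{4}-\\d{2}-\\d{2}) (\\d{2}:\\d{2}:\\d{2}) (\\w+) (.*)$");

    // Returns {date, time, level, message}, or null if the line doesn't match.
    public static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        String[] fields = parse("2014-02-26 00:22:13 INFO region server started");
        System.out.println(fields[2]); // prints "INFO"
    }
}
```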
If you want to load logs into HBase, why not write MapReduce jobs directly? In Pig, you would need to write a custom load function, whereas in a MapReduce job you can use the HBase API directly. On Wed, Feb 26, 2014 at 2:15 PM, Mohammad Tariq <[EMAIL PROTECTED]> wrote: