I have a log file in HDFS that needs to be parsed and loaded into an HBase table.
I want to do this using Pig.
How should I go about it? The Pig script should parse the logs and then store the results in HBase.

Regards,
Chhaya Vishwakarma
The big question is how the log file is formatted and how it needs to be parsed. I'd be inclined to write a UDF that takes a line of text and returns a tuple of the values you'd be storing in HBase.
Then you could do other operations on the bag of tuples that gets passed back.
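To make the UDF idea concrete, here is a minimal sketch of the parsing step as a plain Python function. Pig can register Python functions as Jython UDFs, so the code avoids newer Python syntax. The log layout, the `LOG_LINE` pattern, and the `parse_log_line` name are all assumptions made for illustration; the real pattern depends on what your log lines actually look like.

```python
import re

# Hypothetical log layout, assumed for illustration only:
#   2014-02-26 00:22:13,042 ERROR com.example.Worker - Connection refused
LOG_LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3})\s+"  # date, time, millis
    r"([A-Z]+)\s+"                                        # log level
    r"(\S+)\s+-\s+"                                       # source (e.g. class name)
    r"(.*)$"                                              # free-text message
)


def parse_log_line(line):
    """Return (timestamp, level, source, message), or None if the line does not match."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        # Lines that do not fit the assumed layout (e.g. stack trace
        # continuation lines) are rejected rather than guessed at.
        return None
    date_time, millis, level, source, message = match.groups()
    return (date_time + "." + millis, level, source, message)
```

Because the function has no dependency on Pig or Hadoop, it can be unit tested on a handful of sample lines before any job is launched. Returning None for unmatched lines lets the script filter them out before storing anything.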
Alternatively, you could write a regular expression and use a built-in Pig function such as REGEX_EXTRACT or REGEX_EXTRACT_ALL.
I like the UDF approach in this case because I can more easily write unit tests around my log parser and get that testing out of the way before actually spawning any jobs.

On Wed, Feb 26, 2014 at 12:22 AM, Chhaya Vishwakarma <[EMAIL PROTECTED]> wrote:
If you want to load the logs into HBase, why not write a MapReduce job directly? In Pig, you need to write your own custom load function, whereas in a MapReduce job you can use the HBase API directly.

On Wed, Feb 26, 2014 at 2:15 PM, Mohammad Tariq <[EMAIL PROTECTED]> wrote: