-
Force number of records per map task
John Omernik 2012-08-31, 12:13
This is going to sound very odd, but I am hoping to use a transform script in such a way that I pass a file path to the script, which then reads the file and produces a bunch of rows in Hive. In this case the data is pcaps. I have a location accessible to all nodes, and I want my transform script to read in a file location and then spit out, for example, the IP addresses that were seen in the packet capture (using a script I've already written). Can I load my file locations into a table in Hive (one file per row), read that table into a transform script, and have only one map task per source row? I don't want a single invocation of my script to parse several files, since that may make for poor parallelization, but I am having trouble forcing such a small record count per map task.
Thoughts?
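To make the setup concrete, here is a minimal sketch of the kind of transform script I have in mind. Hive's TRANSFORM feeds the script one row per line on stdin with tab-separated columns and reads tab-separated rows back from stdout. The names `transform` and `placeholder_extract_ips` are made up for illustration, and the placeholder only reads one IP per line from a text file; it stands in for the real pcap parser, which is not shown here. I am also assuming the file path is the first column of each row.

```python
import sys


def placeholder_extract_ips(path):
    """Hypothetical stand-in for the existing pcap-parsing script.

    It reads one IP address per line from a plain-text file so the
    sketch runs without a pcap library; swap in the real parser.
    """
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def transform(rows, out, extract_ips=placeholder_extract_ips):
    """Read one file path per input row and emit (path, ip) rows."""
    for row in rows:
        # Each row arrives as tab-separated columns ending in a newline;
        # the file path is assumed to be the first column.
        path = row.rstrip("\n").split("\t")[0]
        if not path:
            continue  # skip blank rows
        for ip in extract_ips(path):
            out.write(f"{path}\t{ip}\n")


if __name__ == "__main__":
    transform(sys.stdin, sys.stdout)
```

The script itself handles any number of rows per invocation, so how many files each map task ends up parsing depends entirely on how Hive splits the table of file locations, which is the part I can't control yet.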
-
RE: Force number of records per map task
Elango, Vikram 2012-08-31, 12:52
Thanks buddy !!
Thanks and regards, Vikram Elango The Home Depot, Nortel no: 0441-3806
Mobile: +91-8939662345