Hive >> mail # user >> Force number of records per map task


Force number of records per map task
This is going to sound very odd, but I am hoping to use a transform script
in the following way: I pass a file path to the script, and the script
reads that file and produces a bunch of rows in Hive.  In this case the data
is pcaps.  I have a location accessible to all nodes, and I want my
transform script to read in a file location and then spit out, for example,
the IP addresses that were seen in the packet capture (using a script I've
already written).  Can I load my file locations into a table in Hive (one
file per row), read that table into a transform script, and have only one
map task per source row?  I don't want my script to parse several files, as
that may make for poor parallelization, but I am having trouble forcing
such a small record count per map task.

Thoughts?
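
To make the setup concrete, here is a minimal sketch of the kind of
transform script I mean.  It reads one file path per input row on stdin and
writes one tab-separated (path, ip) row per address found.  The
extract_ips function is a hypothetical stand-in for my existing pcap
script: it only scans the raw bytes for dotted-quad text, which is not how
a real pcap parser works.  The sketch does not by itself force one map task
per row; that is still the open question.

```python
#!/usr/bin/env python3
"""Sketch of a Hive TRANSFORM script: each input row is one file path."""
import re
import sys

# Hypothetical stand-in for the real pcap parser: matches dotted-quad text
# in the raw bytes. A real parser would decode the packet headers instead.
IPV4_TEXT = re.compile(rb"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def extract_ips(path):
    """Return the sorted, de-duplicated IPv4-looking strings found in a file."""
    with open(path, "rb") as handle:
        data = handle.read()
    return sorted({match.decode("ascii") for match in IPV4_TEXT.findall(data)})


def transform(lines, extractor=extract_ips):
    """Yield one 'path<TAB>ip' row per address for each input path."""
    for line in lines:
        # Hive passes columns tab-separated, one row per line; the path is
        # assumed to be the first (and only) column.
        path = line.rstrip("\n").split("\t")[0]
        if not path:
            continue
        for ip in extractor(path):
            yield f"{path}\t{ip}"


if __name__ == "__main__":
    for row in transform(sys.stdin):
        print(row)
```

Assuming the script is saved as pcap_ips.py and shipped with ADD FILE, and
that the paths live in a table named pcap_files with a column named path
(all of these names are made up for illustration), the query would look
roughly like:
SELECT TRANSFORM(path) USING 'python3 pcap_ips.py' AS (path, ip) FROM pcap_files;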