Re: MapReduce job with mixed data sources: HBase table and HDFS files
Ted & Azurry, after some investigation of the log files, I figured out why it happens. In the code, I used the same path "inputPath1" for both inputs (see below), since I thought the input path had no effect for the HBase table. But it turns out that the input path given for the HBase table can clobber the input path of the HDFS file. I changed the input path for the HBase table to a different value, and then it worked!
Ted & Azurry, I really appreciate your help!
MultipleInputs.addInputPath(job1, inputPath1, TextInputFormat.class, Map.class);
MultipleInputs.addInputPath(job1, inputPath1, TableInputFormat.class, TableMap.class); // same inputPath1 for both inputs: this was the bug
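
A minimal sketch of the fix described above, assuming a hypothetical placeholder path named "dummy-hbase-input" (any path distinct from inputPath1 should work, since the path registered for the HBase source serves only as a lookup key and is never actually read):

// The HBase entry now uses its own (dummy) path, so the two
// MultipleInputs registrations no longer share the same key.
// "dummy-hbase-input" is a hypothetical placeholder path.
Path hbaseDummyPath = new Path("dummy-hbase-input");
MultipleInputs.addInputPath(job1, inputPath1, TextInputFormat.class, Map.class);
MultipleInputs.addInputPath(job1, hbaseDummyPath, TableInputFormat.class, TableMap.class);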
________________________________
 From: S. Zhou <[EMAIL PROTECTED]>
To: Ted Yu <[EMAIL PROTECTED]>
Cc: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Sent: Thursday, July 11, 2013 10:19 PM
Subject: Re: MapReduce job with mixed data sources: HBase table and HDFS files
 

I use org.apache.hadoop.mapreduce.lib.input.MultipleInputs.

I run on pseudo-distributed Hadoop (1.2.0) and pseudo-distributed HBase (0.95.1-hadoop1).
________________________________
From: Ted Yu <[EMAIL PROTECTED]>
To: S. Zhou <[EMAIL PROTECTED]>
Cc: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Sent: Thursday, July 11, 2013 9:54 PM
Subject: Re: MapReduce job with mixed data sources: HBase table and HDFS files

Did you use org.apache.hadoop.mapreduce.lib.input.MultipleInputs or the one from org.apache.hadoop.mapred.lib?

Which Hadoop version do you use?

Cheers
On Thu, Jul 11, 2013 at 9:49 PM, S. Zhou <[EMAIL PROTECTED]> wrote:

Thanks, Ted & Azurry. Your hint helped me solve that particular issue.
>
>But now I run into a new problem with MultipleInputs. This time I add an HBase table and an HDFS file as inputs (see the new code below). The problem is: whichever data source is added later overrides the one added before. For example, if I add the HBase table to MultipleInputs first and then the HDFS file, the final result from the reducer contains only the HDFS file's records. Conversely, if I add the HDFS file first and then the HBase table, the final result contains only the HBase table's records.
>
>public class MixMR {
>    public static class Map extends Mapper<Object, Text, Text, Text> {
>
>        public void map(Object key, Text value, Context context)
>                throws IOException, InterruptedException {
>            String s = value.toString();
>            String[] sa = s.split(",");
>            if (sa.length == 2) {
>                context.write(new Text(sa[0]), new Text(sa[1]));
>            }
>        }
>    }
>
>    public static class TableMap extends TableMapper<Text, Text> {
>        public static final byte[] CF = "cf".getBytes();
>        public static final byte[] ATTR1 = "c1".getBytes();
>
>        public void map(ImmutableBytesWritable row, Result value, Context context)
>                throws IOException, InterruptedException {
>            String key = Bytes.toString(row.get());
>            String val = new String(value.getValue(CF, ATTR1));
>            context.write(new Text(key), new Text(val));
>        }
>    }
>
>
>    public static class Reduce extends Reducer<Object, Text, Object, Text> {
>        public void reduce(Object key, Iterable<Text> values, Context context)
>                throws IOException, InterruptedException {
>            String ks = key.toString();
>            for (Text val : values) {
>                context.write(new Text(ks), val);
>            }
>        }
>    }
>
>    public static void main(String[] args) throws Exception {
>        Path inputPath1 = new Path(args[0]);
>        Path outputPath = new Path(args[1]);
>
>        String tableName1 = "sezhou-test";
>
>        Configuration config1 = HBaseConfiguration.create();
>        config1.set(TableInputFormat.INPUT_TABLE, tableName1); // table for TableInputFormat to scan
>        Job job1 = new Job(config1, "ExampleRead");
>        job1.setJarByClass(com.shopping.test.dealstore.MixMR.class); // class that contains mapper
>        job1.setReducerClass(Reduce.class);
>        job1.setOutputKeyClass(Text.class);
>        job1.setOutputValueClass(Text.class);
>
>        MultipleInputs.addInputPath(job1, inputPath1, TextInputFormat.class, Map.class);
>        MultipleInputs.addInputPath(job1, inputPath1, TableInputFormat.class, TableMap.class); // inputPath1 here has no effect for HBase table
>
>        FileOutputFormat.setOutputPath(job1, outputPath);
>        job1.waitForCompletion(true);
>    }
>}
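
The override behavior reported above is consistent with MultipleInputs keeping one registration per input path, so a second addInputPath() call with the same path replaces the first. Below is a simplified, self-contained sketch of that effect; it is not the actual Hadoop source, and DuplicatePathDemo and the sample path are illustrative only:

import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.fs.Path;

// Simplified sketch (not the actual Hadoop source): MultipleInputs
// effectively keeps one mapper registration per input path, so adding
// two sources under the same path leaves only the one added last.
public class DuplicatePathDemo {
    public static void main(String[] args) {
        Path inputPath1 = new Path("/user/demo/input1"); // illustrative path

        Map<Path, String> mapperForPath = new HashMap<Path, String>();
        mapperForPath.put(inputPath1, "Map (HDFS file mapper)");
        mapperForPath.put(inputPath1, "TableMap (HBase mapper)"); // same key: overwrites

        // Prints {/user/demo/input1=TableMap (HBase mapper)}: the source
        // added later "wins", matching the behavior described above.
        System.out.println(mapperForPath);
    }
}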