Re: Hive + mongoDB
Thanks, all.
I am trying to import data with this program, but when I compiled the
code I got errors.

Here is the code:

import java.io.*;

import org.apache.commons.logging.*;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.output.*;
import org.bson.*;

import com.mongodb.hadoop.*;
import com.mongodb.hadoop.util.*;

// Map-only job that reads BSON documents from MongoDB and writes
// tab-separated records to HDFS.
public class ImportWeblogsFromMongo {

    private static final Log log =
            LogFactory.getLog(ImportWeblogsFromMongo.class);

    // MongoInputFormat hands each document to the mapper as a BSONObject.
    public static class ReadWeblogsFromMongo
            extends Mapper<Object, BSONObject, Text, Text> {

        @Override
        public void map(Object key, BSONObject value, Context context)
                throws IOException, InterruptedException {

            System.out.println("Key: " + key);
            System.out.println("Value: " + value);

            String md5 = value.get("md5").toString();
            String url = value.get("url").toString();
            String date = value.get("date").toString();
            String time = value.get("time").toString();
            String ip = value.get("ip").toString();
            // TextOutputFormat already separates key and value with a tab,
            // so the value itself should not start with one.
            String output = url + "\t" + date + "\t" + time + "\t" + ip;

            context.write(new Text(md5), new Text(output));
        }
    }

    public static void main(String[] args) throws Exception {

        final Configuration conf = new Configuration();
        // Read from the "example" collection of the "mongo_hadoop" database.
        MongoConfigUtil.setInputURI(conf,
                "mongodb://localhost:27017/mongo_hadoop.example");
        MongoConfigUtil.setCreateInputSplits(conf, false);
        System.out.println("Configuration: " + conf);

        final Job job = new Job(conf, "Mongo Import");
        Path out = new Path("/user/mongo_data");
        FileOutputFormat.setOutputPath(job, out);
        job.setJarByClass(ImportWeblogsFromMongo.class);
        job.setMapperClass(ReadWeblogsFromMongo.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setInputFormatClass(MongoInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        // Map-only job: no reducers needed.
        job.setNumReduceTasks(0);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
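
Compile errors with this code usually mean the mongo-hadoop core jar and
the MongoDB Java driver jar are missing from the classpath. A rough
sketch of compiling and running (the jar file names below are
placeholders; use the ones you actually built or downloaded):

# Sketch only: jar names are placeholders.
javac -cp $(hadoop classpath):mongo-hadoop-core.jar:mongo-java-driver.jar \
  ImportWeblogsFromMongo.java
jar cf import-weblogs.jar ImportWeblogsFromMongo*.class
# The same two jars must also be visible at runtime, e.g. via HADOOP_CLASSPATH.
hadoop jar import-weblogs.jar ImportWeblogsFromMongo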

On Wed, Sep 11, 2013 at 11:50 PM, Russell Jurney
<[EMAIL PROTECTED]> wrote:

> The docs are at https://github.com/mongodb/mongo-hadoop/tree/master/hive
>
> You need to build mongo-hadoop, and then use the documented syntax to
> create BSON tables in Hive.
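
For reference, the documented BSON-table syntax looks roughly like the
sketch below. The table name and columns are made-up examples; the SerDe
and format class names are taken from the mongo-hadoop hive README:

-- Sketch only: table name and columns are hypothetical examples.
CREATE TABLE weblogs (
  md5 STRING,
  url STRING
)
ROW FORMAT SERDE 'com.mongodb.hadoop.hive.BSONSerDe'
STORED AS INPUTFORMAT 'com.mongodb.hadoop.mapred.BSONFileInputFormat'
OUTPUTFORMAT 'com.mongodb.hadoop.hive.output.HiveBSONFileOutputFormat'
LOCATION '/user/mongo_data';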
>
>
> On Wed, Sep 11, 2013 at 11:11 AM, Jitendra Yadav <[EMAIL PROTECTED]> wrote:
>
>> Hi,
>>
>> 1. You can use the Hadoop-MongoDB connector and write a MapReduce
>> program to move your data from MongoDB into Hive.
>>
>> https://github.com/mongodb/mongo-hadoop
>>
>>
>> 2. Alternatively, you can use the Pig-MongoDB combination: read the
>> data from MongoDB through Pig, and then create a Hive table that
>> points to the Pig output files on HDFS (see the sketch after the link).
>>
>> https://github.com/mongodb/mongo-hadoop/blob/master/pig/README.md
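
A sketch of this second option's Hive side, assuming Pig has already
written tab-delimited output to HDFS (the location and columns below are
hypothetical examples):

-- Sketch only: location and columns are hypothetical examples.
CREATE EXTERNAL TABLE weblogs_from_pig (
  md5 STRING,
  url STRING,
  dt  STRING,
  tm  STRING,
  ip  STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/user/pig_output/weblogs';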
>>
>> Regards
>> Jitendra
>> On 9/11/13, Jérôme Verdier <[EMAIL PROTECTED]> wrote:
>> > Hi,
>> >
>> > You can use Talend to import data from MongoDB into Hive.
>> >
>> > More information here: http://www.talend.com/products/big-data
>> >
>> >
>> > 2013/9/11 Sandeep Nemuri <[EMAIL PROTECTED]>
>> >
>> >> Hi everyone,
>> >> I am trying to import data from MongoDB into Hive.
>> >> I got some jar files to connect Mongo and Hive.
>> >> Now, how do I import the data from MongoDB into Hive?
>> >>
>> >> Thanks in advance.
>> >>
>> >> --
>> >> --Regards
>> >>   Sandeep Nemuri
>> >>
>> >
>> >
>> >
>> > --
>> > *Jérôme VERDIER*
>> > 06.72.19.17.31
>> > [EMAIL PROTECTED]
>> >
>>
>
>
>
> --
> Russell Jurney twitter.com/rjurney [EMAIL PROTECTED] datasyndrome.com
>

--
--Regards
  Sandeep Nemuri