Re: Hive + mongoDB
Hi Nitin,

I have used the following:

add jar /usr/lib/hive/lib/mongo-2.7.3.jar;
add jar /usr/lib/hive/lib/hive-mongo-0.0.3.jar;

create external table mongo_users2 (id int ,name string ,age int)
stored by "org.yong3.hive.mongo.MongoStorageHandler"
with serdeproperties ( "mongo.column.mapping" = "_id,name,age" )
tblproperties ( "mongo.host" = "192.168.0.199", "mongo.port"="27017",
"mongo.db" ="test", "mongo.user" = "test", "mongo.passwd" = "password",
"mongo.collection" = "users" );

This worked for me, and I am now able to extract data from MongoDB.

I also have nested data like this:

{
  "DocId": "ABC",
  "User": {
    "Id": 1234,
    "Username": "sam1234",
    "Name": "Sam",
    "ShippingAddress": {
      "Address1": "123 Main St.",
      "Address2": null,
      "City": "Durham",
      "State": "NC"
    },
    "Orders": [
      {
        "ItemId": 6789,
        "OrderDate": "11/11/2012"
      },
      {
        "ItemId": 4352,
        "OrderDate": "12/12/2012"
      }
    ]
  }
}

To extract this collection I have used:

CREATE EXTERNAL TABLE complex_json3 (
  DocId string,
  User struct<Id:int,
              Username:string,
              Name:string,
              ShippingAddress:struct<Address1:string,
                                     Address2:string,
                                     City:string,
                                     State:string>,
              Orders:array<struct<ItemId:int,
                                  OrderDate:string>>>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
stored by "org.yong3.hive.mongo.MongoStorageHandler"
with serdeproperties ( "mongo.column.mapping" =
"DocId,User.Id,User.Username,User.Name,User.ShippingAddress.Address1,User.ShippingAddress.Address2,User.ShippingAddress.City,User.ShippingAddress.State,User.Orders.ItemId,User.Orders.OrderDate"
)
tblproperties ( "mongo.host" = "192.168.0.199", "mongo.port" = "27017",
"mongo.db" = "mongo_hadoop", "mongo.user" = "mongo_hadoop", "mongo.passwd" =
"password", "mongo.collection" = "complex" );

I am not sure whether the mongo.column.mapping syntax is correct.
I have not been able to get it to work because the data is nested.
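
For reference, here is a minimal sketch in plain Python (no Hive or MongoDB involved) that lists the dotted leaf paths of the sample document in this message. It only illustrates why the nested case differs from the flat one: the two Orders paths occur more than once within a single document, so they do not correspond to one value per row. Whether hive-mongo 0.0.3 accepts dotted names in mongo.column.mapping at all is not established here, and leaf_paths is a hypothetical helper, not part of any library.

```python
import json

# Sample document copied from this message.
DOC = json.loads("""
{
  "DocId": "ABC",
  "User": {
    "Id": 1234,
    "Username": "sam1234",
    "Name": "Sam",
    "ShippingAddress": {
      "Address1": "123 Main St.",
      "Address2": null,
      "City": "Durham",
      "State": "NC"
    },
    "Orders": [
      {"ItemId": 6789, "OrderDate": "11/11/2012"},
      {"ItemId": 4352, "OrderDate": "12/12/2012"}
    ]
  }
}
""")


def leaf_paths(doc):
    """Return {dotted_path: is_repeated} for every scalar leaf of doc.

    is_repeated is True when the path passes through a list, meaning a
    single document can hold several values for that path.
    (Hypothetical helper for illustration only.)
    """
    found = {}

    def walk(node, path, repeated):
        if isinstance(node, dict):
            for key, child in node.items():
                walk(child, f"{path}.{key}" if path else key, repeated)
        elif isinstance(node, list):
            for child in node:
                walk(child, path, True)
        else:
            # Scalars (including null) are leaves; keep first-seen order.
            found[path] = found.get(path, False) or repeated

    walk(doc, "", False)
    return found


paths = leaf_paths(DOC)
mapping = ",".join(paths)  # same string as the mapping in the DDL above
repeated_paths = [p for p, rep in paths.items() if rep]

for path, rep in paths.items():
    print(path + ("  (repeated: inside an array)" if rep else ""))
```

The joined string reproduces the mapping used in the DDL above, and the only paths flagged as repeated are User.Orders.ItemId and User.Orders.OrderDate. Those are the fields a flat one-column-per-name mapping cannot represent directly.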

On Fri, Sep 13, 2013 at 9:34 PM, Nitin Pawar <[EMAIL PROTECTED]> wrote:

> Can you share the CREATE TABLE DDL for the table named docs?
>
> A SELECT statement does not need all those details. They belong in the
> CREATE TABLE DDL only.
>
>
> On Fri, Sep 13, 2013 at 4:24 PM, Sandeep Nemuri <[EMAIL PROTECTED]> wrote:
>
>> Hi Nitin,
>>
>> Thanks for your help.
>> I used this query in Hive to retrieve the data from MongoDB:
>>
>> add jar /usr/lib/hadoop/lib/mongo-2.8.0.jar;
>> add jar /usr/lib/hive/lib/hive-mongo-0.0.3-jar-with-dependencies.jar;
>>
>> select * from docs
>> input format "org.yong3.hive.mongo.MongoStorageHandler"
>> with serdeproperties ( "mongo.column.mapping" =
>> "_id,dayOfWeek,bc3Year,bc5Year,bc10Year,bc20Year,bc1Month,bc2Year,bc3Year,bc30Year,bc1Year,bc7Year,bc6Year"
>> )
>> tblproperties ( "mongo.host" = "127.0.0.1", "mongo.port" = "27017",
>> "mongo.db" = "sample", "mongo.user" = "sample", "mongo.passwd" =
>> "password", "mongo.collection" = "docs" );
>>
>>
>> I got an error:
>>
>> FAILED: Parse Error: line 2:6 mismatched input 'format' expecting EOF
>> near 'input'
>>
>>
>>
>> On Thu, Sep 12, 2013 at 6:23 PM, Nitin Pawar <[EMAIL PROTECTED]> wrote:
>>
>>> Try creating a table on your existing MongoDB database and collection,
>>> and see whether the data can be read by the user.
>>> The Mongo collection fields need to be mapped, with exactly the same
>>> names, into the Hive column definitions.
>>>
>>> If you cannot see the Mongo data from a Hive query, let me know what
>>> errors you see.
>>>
>>>
>>> On Thu, Sep 12, 2013 at 5:28 PM, Sandeep Nemuri <[EMAIL PROTECTED]> wrote:
>>>
>>>> How will we get the Mongo data into the Mongo table?
>>>>
>>>> Using this, we can only create the table:
>>>>
>>>> create external table mongo_users(id int, name string, age int)
>>>> stored by "org.yong3.hive.mongo.MongoStorageHandler"
>>>> with serdeproperties ( "mongo.column.mapping" = "_id,name,age" )
>>>> tblproperties ( "mongo.host" = "192.168.0.5", "mongo.port" = "11211",

  Sandeep Nemuri