Sqoop >> mail # user >> Re: sqoop job


[Including [EMAIL PROTECTED]]
On Tue, Apr 23, 2013 at 12:21 PM, Kathleen Ting <[EMAIL PROTECTED]> wrote:

> Hi Jay, this use-case seems to be beyond the scope of Sqoop, which is
> meant to just transfer data between a structured datastore and
> Hadoop. Including [EMAIL PROTECTED] to solicit more opinions.
>
> Regards, Kate
>
>
> On Mon, Apr 22, 2013 at 11:04 PM, jaikumar krishna <[EMAIL PROTECTED]> wrote:
>
>> Thanks Kate,
>>
>> My use case, and what I am trying to do:
>> I have two input tables, Table1 and Table2. Table1 (the master) has
>> *25 lakh* (2.5 million) records with the fields *company name, address,
>> city, state, zip, phone number, fax, mail ID, company website URL*.
>>
>> Table2 has *5 lakh* (500,000) records with the same fields as Table1
>> (*company name, address, city, state, zip, phone number, fax, mail ID,
>> company website URL*). I want to check whether the Table2 records match
>> Table1, to verify whether they are correct.
>>
>> Before matching, I have to apply normalizations like the ones below:
>>
>> *Company name                 =>   Normalized company name*
>> Century Tool & Gage          =>   Century Tool and Gage
>> News-Gazette Printing Co     =>   News Gazette Printing
>> Punch Networks Inc           =>   Punch Networks
>> Omni Print Inc               =>   Omni Print
>>
>> For the Address_1 column:
>> *Address_1                    =>   Address_1 normalized*
>> 15 Sproat St                 =>   15 Sproat Street
>> 1 Preble Rd                  =>   1 Preble Road
>> 90 Everett Ave               =>   90 Everett Avenue
>>
>> Kindly check the attached Excel sheet for the *normalization of the
>> remaining fields*. (Both tables are normalized before verifying.)
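
A minimal sketch of the normalization step quoted above, assuming only the rules visible in the examples in this message. The rule tables and the function names `normalize_company` and `normalize_address` are hypothetical; the attached spreadsheet presumably lists more rules than are shown here.

```python
# Rule tables built only from the examples quoted in this thread (assumption:
# the real rule set in the attached spreadsheet is larger).
COMPANY_SUFFIXES = {"inc", "co"}
STREET_ABBREVIATIONS = {"st": "Street", "rd": "Road", "ave": "Avenue"}


def normalize_company(name: str) -> str:
    """Spell out '&', split hyphenated words, drop trailing 'Inc'/'Co'."""
    words = name.replace("&", " and ").replace("-", " ").split()
    # Remove trailing legal suffixes, tolerating a final period ("Inc.").
    while words and words[-1].rstrip(".").lower() in COMPANY_SUFFIXES:
        words.pop()
    return " ".join(words)


def normalize_address(address: str) -> str:
    """Expand a trailing street-type abbreviation (St, Rd, Ave)."""
    words = address.split()
    if words:
        key = words[-1].rstrip(".").lower()
        if key in STREET_ABBREVIATIONS:
            words[-1] = STREET_ABBREVIATIONS[key]
    return " ".join(words)


if __name__ == "__main__":
    print(normalize_company("News-Gazette Printing Co"))
    print(normalize_address("90 Everett Ave"))
```

Only the last word of the address is expanded, so that a leading "St" (as in "St Louis") is not turned into "Street"; that choice is an assumption, not something stated in the thread.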
>>
>> Then I have some conditions for result accuracy, based on scoring the
>> entities by how well they match:
>>
>> *1. company name == 100 and (address == 100 or phone number == 100)*
>> *2. company name >= 75 and address >= 75 and city == 100 and state == 100*
>>
>> If either condition is satisfied, I can mark the record as verified.
>>
>> In another case, *if the company name and phone number do not match
>> anything in Table1, I can add the record as a new entity (meaning it is
>> not in Table1).*
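
The two verification rules and the new-entity case quoted above could be written down roughly as follows. This is a sketch under assumptions: `classify` and `match_threshold` are hypothetical names, the thread does not say what score counts as "did not match" (75 is assumed here), and it does not say what happens to records that are neither verified nor new (they are returned as "review").

```python
def classify(scores: dict, match_threshold: int = 75) -> str:
    """Classify one Table2 record from its scores against a Table1 candidate.

    `scores` maps field name -> similarity score on a 0-100 scale.
    """
    company, address = scores["company"], scores["address"]
    phone, city, state = scores["phone"], scores["city"], scores["state"]

    # Rule 1 from the thread: exact company name plus exact address or phone.
    rule_1 = company == 100 and (address == 100 or phone == 100)
    # Rule 2 from the thread: close company and address, exact city and state.
    rule_2 = company >= 75 and address >= 75 and city == 100 and state == 100
    if rule_1 or rule_2:
        return "verified"

    # Assumption: "did not match" means the score is below match_threshold.
    if company < match_threshold and phone < match_threshold:
        return "new"

    # Assumption: everything else is left for manual review.
    return "review"


if __name__ == "__main__":
    example = {"company": 100, "address": 40, "phone": 100, "city": 0, "state": 0}
    print(classify(example))
```

For the new-entity case the scores passed in should come from the best-scoring Table1 candidate, since a record is only new if nothing in Table1 matches it.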
>>
>> I have attached sample records of Table1 and Table2, and my current
>> output (which includes the scores from my current process; without
>> Hadoop it takes more and more time).
>>
>>
>> I hope you understand my use case.
>>
>> The main problem is: how can I compare each row, which has 6 fields
>> (company name, city, street, state, phone, mail ID), with another table,
>> get a score, and finally get the max? I am totally frustrated.
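
One way to read the question above is: score every field of a Table2 row against each Table1 row, then keep the Table1 row with the highest score. The sketch below does that with `difflib.SequenceMatcher` from the Python standard library. The similarity measure, the dictionary keys in `FIELDS`, and ranking candidates by the sum of field scores are all assumptions; the thread does not say how the existing process computes its scores or its max.

```python
from difflib import SequenceMatcher

# Hypothetical keys for the six fields named in the message.
FIELDS = ("company", "street", "city", "state", "phone", "mailid")


def field_score(a: str, b: str) -> int:
    """Similarity of two strings on a 0-100 scale, ignoring case."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


def score_pair(row: dict, candidate: dict) -> dict:
    """Per-field scores for one Table2 row against one Table1 row."""
    return {f: field_score(row[f], candidate[f]) for f in FIELDS}


def best_match(row: dict, master: list) -> tuple:
    """Return (candidate, scores) for the master row with the highest total.

    Assumption: "max" means the largest sum of the six field scores.
    """
    best = max(master, key=lambda cand: sum(score_pair(row, cand).values()))
    return best, score_pair(row, best)
```

This compares one row against every master row. With 25 lakh master rows and 5 lakh input rows, doing that for every pair means 2,500,000 x 500,000 = 1.25 x 10^12 row comparisons, which is likely why the run without Hadoop takes so long.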
>>
>> Thanks,
>> Jay'
>>
>>
>> On Tue, Apr 23, 2013 at 4:49 AM, Kathleen Ting <[EMAIL PROTECTED]> wrote:
>>
>>> Hi Jay, can you share your use-case behind verifying the table in
>>> Sqoop rather than in HDFS? Generally speaking, you can verify if the
>>> table transferred successfully by inspecting the file's contents via
>>> issuing $ hadoop fs -cat <tablename>/part-m-00000
>>>
>>> You can also verify the return value from the Sqoop command ($ echo
>>> $?), which should be 0.
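
The exit-status check quoted above can also be scripted rather than read off with `echo $?`. A minimal Python sketch using `subprocess.run` from the standard library; `command_succeeded` is a hypothetical helper name, and the actual Sqoop command line is left to the caller because it is not given in this thread.

```python
import subprocess
import sys


def command_succeeded(argv: list) -> bool:
    """Run a command and report whether it exited with status 0."""
    return subprocess.run(argv).returncode == 0


if __name__ == "__main__":
    # Stand-in commands so the sketch runs anywhere; replace them with the
    # real Sqoop invocation as a list of arguments.
    print(command_succeeded([sys.executable, "-c", "pass"]))
    print(command_succeeded([sys.executable, "-c", "raise SystemExit(3)"]))
```

A zero exit status only says the command itself reported success; inspecting the output files in HDFS, as described above, is still the more direct check of what was transferred.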
>>>
>>> Regards, Kate
>>>
>>>
>>> On Monday, April 22, 2013 10:19:20 AM UTC-7, jaikumar krishna wrote:
>>>>
>>>> Hi,
>>>>     how can I find out whether the table was moved successfully or
>>>> not in Sqoop (not in HDFS)?
>>>>
>>>> Thanks,
>>>> Jay'
>>>>