Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Sqoop, mail # dev - Review Request: SQOOP-931 - Integration of Sqoop and HCatalog


Copy link to this message
-
Re: Review Request: SQOOP-931 - Integration of Sqoop and HCatalog
Venkat Ranganathan 2013-05-21, 00:35


> On May 20, 2013, 1:02 p.m., Jarek Cecho wrote:
> > build.xml, line 54
> > <https://reviews.apache.org/r/10688/diff/5/?file=288026#file288026line54>
> >
> >     I'm not feeling entirely comfortable about depending on SNAPSHOTS. Is there a particular feature that we're taking advantage of in 0.6.0 that is not in 0.5.0?

No, the functionality (from the contract point of view) is even compatible with 0.4.0 I think.   I could not successfully resolve the maven repos for the earlier versions and hence I had to switch to it.   I think now I tried to build and found that only 0.11.0 is available readily at repos.maven.org.   That was the reason.  I will update and switch to 0.5.0 if that version is available in the repos.   But given that we want to have readily available Hadoop 2 and Hadoop 1 artifacts, we may have to set to 0.11.0 assuming that is the version the HCatalog team decides to publish the repositories for.
> On May 20, 2013, 1:02 p.m., Jarek Cecho wrote:
> > src/java/org/apache/sqoop/SqoopOptions.java, line 160
> > <https://reviews.apache.org/r/10688/diff/5/?file=288029#file288029line160>
> >
> >     Out of curiosity what the "stanza" stands for?

Stanza means paragraph :)   We used this a lot earlier in my work to describe the SQL snippets when ]we write essays to describe what we want from the database.   May be clause is a more general DB term.
> On May 20, 2013, 1:02 p.m., Jarek Cecho wrote:
> > src/java/org/apache/sqoop/config/ConfigurationConstants.java, lines 66-67
> > <https://reviews.apache.org/r/10688/diff/5/?file=288030#file288030line66>
> >
> >     Does the new property make sense when it's valid only on Hadoop2 that actually do not have any JobTracker address at all? We already had issues with that on Sqoop2 side in SQOOP-1002.

I just wanted to consolidate all constants.  If we can remove it, that would be OK
> On May 20, 2013, 1:02 p.m., Jarek Cecho wrote:
> > src/java/org/apache/sqoop/config/ConfigurationConstants.java, lines 90-91
> > <https://reviews.apache.org/r/10688/diff/5/?file=288030#file288030line90>
> >
> >     Nit: Please put extra empty line between the property name and private constructor.

Thanks Will fix
> On May 20, 2013, 1:02 p.m., Jarek Cecho wrote:
> > src/test/org/apache/sqoop/hcat/HCatalogTestUtils.java, lines 377-379
> > <https://reviews.apache.org/r/10688/diff/5/?file=288052#file288052line377>
> >
> >     Nit: Incorrect indentation.

Thanks will fix
> On May 20, 2013, 1:02 p.m., Jarek Cecho wrote:
> > src/test/org/apache/sqoop/hcat/HCatalogTestUtils.java, lines 368-370
> > <https://reviews.apache.org/r/10688/diff/5/?file=288052#file288052line368>
> >
> >     Nit: Incorrect indentation.

Thanks  will fix
> On May 20, 2013, 1:02 p.m., Jarek Cecho wrote:
> > src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java, line 849
> > <https://reviews.apache.org/r/10688/diff/5/?file=288043#file288043line849>
> >
> >     I think that we also want to skip invoking the output committer in case of hadoop 2.

Good catch.  I will modify the version check to just say if it is Hadoop2

Thanks
> On May 20, 2013, 1:02 p.m., Jarek Cecho wrote:
> > src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatExportMapper.java, lines 76-77
> > <https://reviews.apache.org/r/10688/diff/5/?file=288039#file288039line76>
> >
> >     Nit: Those extra lines seems unnecessary here because the next row also have type constant.

Will remove the extra lines - thanks
> On May 20, 2013, 1:02 p.m., Jarek Cecho wrote:
> > src/java/org/apache/sqoop/mapreduce/ExportJobBase.java, line 389
> > <https://reviews.apache.org/r/10688/diff/5/?file=288034#file288034line389>
> >
> >     This comment seems to be artifact from development. I would suggest to improve the message and move it into "debug" state in case that we would like to have it around.

Will remove
> On May 20, 2013, 1:02 p.m., Jarek Cecho wrote:
> > src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java, lines 68-74
> > <https://reviews.apache.org/r/10688/diff/5/?file=288033#file288033line68>

Yes,   Good point.   It makes sense (and that is what I started with), but since I had to change the ExportJob in place, I used a similar scheme for consistency and followed what we did for other storage formats like Avro.   Also, this will make sure that irrespective of the changes to import job (on what it gets its feed from), we will be able to send it to HCat.

Good point Jarek.  Actually I had that implementation first - but then we will not be able to support update/upsert and call by procedure would  need to be modified to handle the HCat format.   Since we were using HCat more as storage format like Avro, I decided to implement in place.  And followed similar logic for Imports as well

Timestamp is currently not a supported datatype in HCat (even though Hive supports it).   I will create a JIRA issue on HCat to support that now that HCatalog is a sub project of Hive.

I had an issue building without the explicit dependency listed - may be because the repos were not having all the artifacts and the data nucleus was only available from datanucleus repository.   I will try to remove the dependency and retry.  
Thanks
- Venkat
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10688/#review20756
On May 4, 2013, 11:46 p.m., Venkat Ranganathan wrote: