Flume >> mail # user >> SolrCell help!


Flavio Pompermaier 2013-07-22, 16:18
Wolfgang Hoschek 2013-07-22, 18:12
Flavio Pompermaier 2013-07-22, 18:43
Wolfgang Hoschek 2013-07-22, 19:21
Flavio Pompermaier 2013-07-22, 20:41
Wolfgang Hoschek 2013-07-22, 21:02
Flavio Pompermaier 2013-07-22, 21:14
Flavio Pompermaier 2013-07-23, 07:51
Wolfgang Hoschek 2013-07-23, 08:22
Re: SolrCell help!
I still get this error:

 Failed to read artifact descriptor for
commons-daemon:commons-daemon:jar:1.0.3: Could not transfer artifact
commons-daemon:commons-daemon:pom:1.0.3 from/to repo (
http://dev.okkam.it/artifactory/repo): Failed to transfer file:
http://dev.okkam.it/artifactory/repo/commons-daemon/commons-daemon/1.0.3/commons-daemon-1.0.3.pom.
Return code is: 409 -> [Help 1]
On Tue, Jul 23, 2013 at 10:22 AM, Wolfgang Hoschek <[EMAIL PROTECTED]> wrote:

> Tests pass on java 6 but fail on java 7. Correspondingly, I have filed
> https://issues.cloudera.org/browse/CDK-80. We'll fix it. Meanwhile,
> please try java 6.
>
> Wolfgang.
>
> On Jul 23, 2013, at 12:51 AM, Flavio Pompermaier wrote:
>
> > I tried to download the current trunk but it doesn't compile. For
> > example, it hangs on
> > https://repository.cloudera.com/artifactory/cloudera-repos/com/twitter/parquet-avro/1.0.0-SNAPSHOT/maven-metadata.xml
> > which doesn't exist anymore.
> >
> >
> > On Mon, Jul 22, 2013 at 11:14 PM, Flavio Pompermaier <[EMAIL PROTECTED]> wrote:
> > You couldn't be more precise ;)
> >
> > Thanks,
> > Flavio
> >
> > On Mon, Jul 22, 2013 at 11:02 PM, Wolfgang Hoschek <[EMAIL PROTECTED]> wrote:
> > Docs for the xquery and xslt morphline commands are here (look for
> > "xquery"):
> > https://github.com/cloudera/cdk/blob/master/cdk-morphlines/src/site/confluence/morphlinesReferenceGuide.confluence
> >
> > Example morphlines for the new xquery and xslt commands are here:
> > https://github.com/cloudera/cdk/tree/master/cdk-morphlines/cdk-morphlines-saxon/src/test/resources/test-morphlines
> >
> > Sample input data is here:
> > https://github.com/cloudera/cdk/tree/master/cdk-morphlines/cdk-morphlines-saxon/src/test/resources/test-documents
> >
> > Unit tests are here:
> > https://github.com/cloudera/cdk/blob/master/cdk-morphlines/cdk-morphlines-saxon/src/test/java/com/cloudera/cdk/morphline/saxon/SaxonMorphlineTest.java
> >
> > Wolfgang.
> >
> > On Jul 22, 2013, at 1:41 PM, Flavio Pompermaier wrote:
> >
> > > OK, I'll try to follow the code! Just one last thing: for
> > > morphine-neon I managed to find the tests (in the cdk repository), but
> > > for the new xslt and xquery commands I can't find the test code. Could
> > > you give me a pointer?
> > >
> > > On Mon, Jul 22, 2013 at 9:21 PM, Wolfgang Hoschek <[EMAIL PROTECTED]> wrote:
> > > There are many tests for this in the morphlines repo.
> > >
> > > Wolfgang.
> > >
> > > On Jul 22, 2013, at 11:43 AM, Flavio Pompermaier wrote:
> > >
> > > >
> > > > Thank you for the great support Wolfgang!
> > > > Flume + Morphlines is undoubtedly an exciting road, but it's taking
> > > > me too much time :(
> > > > Do you think you could add some more tests, including readJson and
> > > > the new xquery and xslt commands, in trunk?
> > > >
> > > > Best,
> > > > Flavio
> > > > On Mon, Jul 22, 2013 at 8:12 PM, Wolfgang Hoschek <[EMAIL PROTECTED]> wrote:
> > > > Looks like the DcXMLParser emits a metadata field called "title"
> > > > and another title as part of the Tika XML stream. That metadata
> > > > field is then added to the Solr document by SolrCell. If you add
> > > > "title" to the captures, the title from the XML stream gets added as
> > > > well by SolrCell.
> > > >
> > > > JSON support has been released in morphlines-0.4.1 (which flume
> > > > trunk now depends on):
> > > > http://cloudera.github.io/cdk/docs/0.4.1/cdk-morphlines/morphlinesReferenceGuide.html#readJson
> > > >
> > > > Note that Tika XML doesn't really support/capture XPath extraction
> > > > with SolrCell. We have added proper support for reading, extracting
> > > > and transforming XML and HTML with XPath, XQuery and XSLT on the
> > > > current morphlines trunk (not yet released), similar to the way we
> > > > already support JSON and Avro. This should make XML handling a lot
> > > > more straightforward, and make the very limited XML SolrCell approach
> > > > obsolete. Look for the new "xquery" and "xslt" commands in
> > > > https://github.com/cloudera/cdk/blob/master/cdk-morphlines/src/site/confluence/morphlinesReferenceGuide.confluence
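
For readers following the readJson pointer above, a minimal morphline configuration using that command might look roughly like the sketch below. This is an illustration only, not a config taken from this thread: the morphline id "jsonExample" is a made-up name, and the importCommands line follows the style of later CDK/Kite documentation and may not be needed (or recognized) in morphlines-0.4.1. Check the reference guide linked above for the exact syntax of your version.

```
# Hypothetical morphline sketch; names and layout are assumptions.
morphlines : [
  {
    # "jsonExample" is an illustrative id, not one used in the thread.
    id : jsonExample

    # Assumption: command import as shown in later CDK/Kite docs.
    importCommands : ["com.cloudera.**"]

    commands : [
      # Parse the attachment body of each incoming record as JSON.
      { readJson {} }

      # Log the resulting record so the parsed output can be inspected.
      { logDebug { format : "record: {}", args : ["@{}"] } }
    ]
  }
]
```

The xquery and xslt commands mentioned above are configured in the same commands list; their options are described in the reference guide and the test morphlines linked earlier in the thread.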
Flavio Pompermaier 2013-07-23, 08:33
Flavio Pompermaier 2013-07-23, 08:36
Wolfgang Hoschek 2013-07-23, 17:48
Flavio Pompermaier 2013-07-23, 22:20
Flavio Pompermaier 2013-07-24, 12:34