Flume, mail # user - SolrCell help!


Re: SolrCell help!
Flavio Pompermaier 2013-07-23, 08:33
Sorry, this was caused by our mirror. I'll remove it and retry.
On Tue, Jul 23, 2013 at 10:31 AM, Flavio Pompermaier <[EMAIL PROTECTED]> wrote:

>
> I still get this error:
>
>  Failed to read artifact descriptor for
> commons-daemon:commons-daemon:jar:1.0.3: Could not transfer artifact
> commons-daemon:commons-daemon:pom:1.0.3 from/to repo (
> http://dev.okkam.it/artifactory/repo): Failed to transfer file:
> http://dev.okkam.it/artifactory/repo/commons-daemon/commons-daemon/1.0.3/commons-daemon-1.0.3.pom.
> Return code is: 409 -> [Help 1]
>
>
> On Tue, Jul 23, 2013 at 10:22 AM, Wolfgang Hoschek <[EMAIL PROTECTED]> wrote:
>
>> Tests pass on java 6 but fail on java 7. Correspondingly, I have filed
>> https://issues.cloudera.org/browse/CDK-80. We'll fix it. Meanwhile,
>> please try java 6.
>>
>> Wolfgang.
>>
>> On Jul 23, 2013, at 12:51 AM, Flavio Pompermaier wrote:
>>
>> > I tried to download the current trunk but it doesn't compile. For
>> example, it hangs on
>> >
>> https://repository.cloudera.com/artifactory/cloudera-repos/com/twitter/parquet-avro/1.0.0-SNAPSHOT/maven-metadata.xml
>> > which doesn't exist anymore.
>> >
>> >
>> > On Mon, Jul 22, 2013 at 11:14 PM, Flavio Pompermaier <
>> [EMAIL PROTECTED]> wrote:
>> > You couldn't be more precise ;)
>> >
>> > Thanks,
>> > Flavio
>> >
>> > On Mon, Jul 22, 2013 at 11:02 PM, Wolfgang Hoschek <
>> [EMAIL PROTECTED]> wrote:
>> > Docs for the xquery and xslt morphline commands are here (look for
>> "xquery"):
>> https://github.com/cloudera/cdk/blob/master/cdk-morphlines/src/site/confluence/morphlinesReferenceGuide.confluence
>> >
>> > Example morphlines for the new xquery and xslt commands are here:
>> https://github.com/cloudera/cdk/tree/master/cdk-morphlines/cdk-morphlines-saxon/src/test/resources/test-morphlines
>> >
>> > Sample input data is here:
>> https://github.com/cloudera/cdk/tree/master/cdk-morphlines/cdk-morphlines-saxon/src/test/resources/test-documents
>> >
>> > Unit tests are here:
>> https://github.com/cloudera/cdk/blob/master/cdk-morphlines/cdk-morphlines-saxon/src/test/java/com/cloudera/cdk/morphline/saxon/SaxonMorphlineTest.java
>> >
>> > Wolfgang.
>> >
>> > On Jul 22, 2013, at 1:41 PM, Flavio Pompermaier wrote:
>> >
>> > > Ok, I'll try to follow the code! Just one last thing: for
>> morphine-neon I managed to find the tests (in the cdk repository), but for the new
>> xslt and xquery commands I can't find the test code. Could you give me a
>> pointer?
>> > >
>> > > On Mon, Jul 22, 2013 at 9:21 PM, Wolfgang Hoschek <
>> [EMAIL PROTECTED]> wrote:
>> > > There are many tests for this in the morphlines repo.
>> > >
>> > > Wolfgang.
>> > >
>> > > On Jul 22, 2013, at 11:43 AM, Flavio Pompermaier wrote:
>> > >
>> > > >
>> > > > Thank you for the great support Wolfgang!
>> > > > Flume + Morphlines is undoubtedly an exciting road but it's taking
>> me too much time :(
>> > > > Do you think you could add some more tests including readJson and
>> the new xquery and xslt in trunk?
>> > > >
>> > > > Best,
>> > > > Flavio
>> > > > On Mon, Jul 22, 2013 at 8:12 PM, Wolfgang Hoschek <
>> [EMAIL PROTECTED]> wrote:
>> > > > Looks like the DcXMLParser spits out a metadata field called
>> "title" and another title as part of the Tika XML stream. That metadata
>> field is then added to the solr document by solrcell. If you add "title" to
>> the captures, the title from the XML stream gets added as well by solrcell.
>> > > >
>> > > > JSON support has been released in morphlines-0.4.1 (which flume
>> trunk is now depending on):
>> http://cloudera.github.io/cdk/docs/0.4.1/cdk-morphlines/morphlinesReferenceGuide.html#readJson
>> > > >
>> > > > Note that Tika XML doesn't really support/capture XPath extraction
>> with SolrCell. We have added proper support for reading, extracting and
>> transforming XML and HTML with XPath, XQuery and XSLT on the current
>> morphlines trunk (not yet released), similar to the way we already support
>> JSON and Avro. This should make XML handling a lot more straightforward,
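
As an illustration of the "captures" advice in the quoted reply from Wolfgang, a solrCell morphline command might be configured roughly as below. This is a minimal sketch, not the configuration used in the thread: it assumes the `capture` and `parsers` parameters described for the `solrCell` command in the morphlines reference guide, and both `${SOLR_LOCATOR}` and the captured field list are placeholders.

```
# Hypothetical fragment of a morphline "commands" list (HOCON syntax).
{
  solrCell {
    # Placeholder: assumed to be defined elsewhere in the morphline file.
    solrLocator : ${SOLR_LOCATOR}

    # Listing "title" here asks solrCell to also capture the title
    # element from the Tika XML stream, in addition to the "title"
    # metadata field emitted by the parser.
    capture : [title]

    # Restrict parsing to the Dublin Core XML parser named in the thread.
    parsers : [
      { parser : org.apache.tika.parser.xml.DcXMLParser }
    ]
  }
}
```

With both sources enabled, the resulting Solr document may carry two title values, so the target field would likely need to be multi-valued, or "title" left out of the capture list.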