Subject: [jira] [Commented] (NUTCH-2405) jsoup-extractor structure correction, typo fixed




[ https://issues.apache.org/jira/browse/NUTCH-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16120339#comment-16120339 ]

Hudson commented on NUTCH-2405:
-------------------------------

SUCCESS: Integrated in Jenkins build Nutch-nutchgora #1590 (See
[https://builds.apache.org/job/Nutch-nutchgora/1590/])
NUTCH-2405 1. Missed root tag <extractor> added in jsoup-extractor.xml
(kaidulislam90:
[https://github.com/apache/nutch/commit/49ff77e83cc1e62cf10c377027c122e6a7d83128])
* (edit) conf/jsoup-extractor.xml
* (edit) conf/jsoup-extractor-example.xml
* (edit) src/plugin/jsoup-extractor/src/java/org/apache/nutch/parse/jsoup/extractor/JsoupHtmlParser.java


> jsoup-extractor structure correction, typo fixed
> ------------------------------------------------
>
> Key: NUTCH-2405
> URL: https://issues.apache.org/jira/browse/NUTCH-2405
> Project: Nutch
> Issue Type: Bug
> Components: plugin
> Affects Versions: 2.4
> Reporter: Kaidul Islam
> Assignee: Kaidul Islam
> Priority: Minor
> Fix For: 2.4
>
>
> Several bugs encountered while testing with my project have been fixed:
> 1. Missing root tag <extractor> added to jsoup-extractor.xml, as in
> jsoup-extractor-example.xml
> 2. jsoup API text() used instead of ownText() to get the full contents under a CSS
> selector
> 3. <default> => <default-value> typo fixed
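
To illustrate item 2 above: in jsoup, ownText() returns only the text held
directly by an element, while text() also includes the text of its
descendants. The sketch below is a toy model of that distinction written
without the jsoup library; the Node class and its fields are hypothetical
stand-ins and are not taken from JsoupHtmlParser.java or from jsoup itself.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for an HTML element. This is NOT the jsoup API; it only
// models the difference between "own text" and "full text".
public class TextVsOwnText {

    static final class Node {
        final String directText;                    // text sitting directly inside this element
        final List<Node> children = new ArrayList<>();

        Node(String directText) { this.directText = directText; }

        Node add(Node child) { children.add(child); return this; }

        // Idea behind ownText(): only this element's direct text.
        String ownText() { return directText; }

        // Idea behind text(): this element's text plus all descendants' text.
        // Simplification: direct text is emitted first, then children in order.
        String text() {
            StringBuilder sb = new StringBuilder(directText);
            for (Node c : children) {
                String t = c.text();
                if (!t.isEmpty()) {
                    if (sb.length() > 0) sb.append(' ');
                    sb.append(t);
                }
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) {
        // Roughly models: <div>Price: <span>42</span></div>
        Node div = new Node("Price:").add(new Node("42"));
        System.out.println(div.ownText()); // text held by the div only
        System.out.println(div.text());    // text of the div and its descendants
    }
}
```

With a selector that matches a wrapper element, the own-text variant would
drop whatever sits in nested tags, which is presumably why the fix switched
to text().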



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


