Package org.htmlparser.beans
Class FilterBean
java.lang.Object
    org.htmlparser.beans.FilterBean
All Implemented Interfaces: java.io.Serializable
public class FilterBean extends java.lang.Object implements java.io.Serializable
Extract nodes from a URL using a filter.
    FilterBean fb = new FilterBean ();
    fb.setFilters (new NodeFilter[] { new TagNameFilter ("META") });
    fb.setURL ("http://cbc.ca");
    System.out.println (fb.getNodes ().toHtml ());
See Also: Serialized Form
-
Field Summary
Fields (Modifier and Type, Field, Description)
protected NodeFilter[] mFilters
    The filter set.
protected NodeList mNodes
    The nodes extracted from the URL.
protected Parser mParser
    The parser used to filter.
protected java.beans.PropertyChangeSupport mPropertySupport
    Bound property support.
protected boolean mRecursive
    The recursion behaviour for elements of the filter array.
static java.lang.String PROP_CONNECTION_PROPERTY
    Property name in event where the connection changes.
static java.lang.String PROP_NODES_PROPERTY
    Property name in event where the URL contents change.
static java.lang.String PROP_TEXT_PROPERTY
    Property name in event where the URL contents change.
static java.lang.String PROP_URL_PROPERTY
    Property name in event where the URL changes.
-
Constructor Summary
Constructors (Constructor, Description)
FilterBean()
    Create a FilterBean object.
-
Method Summary
Methods (Modifier and Type, Method, Description)
void addPropertyChangeListener(java.beans.PropertyChangeListener listener)
    Add a PropertyChangeListener to the listener list.
protected NodeList applyFilters()
    Apply each of the filters.
java.net.URLConnection getConnection()
    Get the current connection.
NodeFilter[] getFilters()
    Get the current filter set.
NodeList getNodes()
    Return the nodes of the URL matching the filter.
Parser getParser()
    Get the parser used to fetch nodes.
boolean getRecursive()
    Get the current recursion behaviour.
java.lang.String getText()
    Convenience method to apply a StringBean to the filter results.
java.lang.String getURL()
    Get the current URL.
static void main(java.lang.String[] args)
    Unit test.
void removePropertyChangeListener(java.beans.PropertyChangeListener listener)
    Remove a PropertyChangeListener from the listener list.
void setConnection(java.net.URLConnection connection)
    Set the parser's connection.
void setFilters(NodeFilter[] filters)
    Set the filters for the bean.
protected void setNodes()
    Fetch the URL contents and filter it.
void setParser(Parser parser)
    Set the parser for the bean.
void setRecursive(boolean recursive)
    Set the recursion behaviour.
void setURL(java.lang.String url)
    Set the URL to extract strings from.
protected void updateNodes(NodeList nodes)
    Assign the Nodes property, firing the property change.
-
-
-
Field Detail
-
PROP_NODES_PROPERTY
public static final java.lang.String PROP_NODES_PROPERTY
Property name in event where the URL contents change.
See Also: Constant Field Values
-
PROP_TEXT_PROPERTY
public static final java.lang.String PROP_TEXT_PROPERTY
Property name in event where the URL contents change.
See Also: Constant Field Values
-
PROP_URL_PROPERTY
public static final java.lang.String PROP_URL_PROPERTY
Property name in event where the URL changes.
See Also: Constant Field Values
-
PROP_CONNECTION_PROPERTY
public static final java.lang.String PROP_CONNECTION_PROPERTY
Property name in event where the connection changes.
See Also: Constant Field Values
-
mPropertySupport
protected java.beans.PropertyChangeSupport mPropertySupport
Bound property support.
-
mParser
protected Parser mParser
The parser used to filter.
-
mFilters
protected NodeFilter[] mFilters
The filter set.
-
mNodes
protected NodeList mNodes
The nodes extracted from the URL.
-
mRecursive
protected boolean mRecursive
The recursion behaviour for elements of the filter array. If true, the filters are applied recursively.
-
-
Method Detail
-
updateNodes
protected void updateNodes(NodeList nodes)
Assign the Nodes property, firing the property change.
Parameters:
nodes - The new value of the Nodes property.
-
applyFilters
protected NodeList applyFilters() throws ParserException
Apply each of the filters. The first filter is applied to the output of the parser. Subsequent filters are applied to the output of the prior filter.
Returns:
A list of nodes passed through all filters. If there are no filters, returns the entire page.
Throws:
ParserException - If an encoding change occurs or there is some other problem.
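The chaining rule described for applyFilters can be sketched without the library. The following is only an illustration under stated assumptions: plain strings stand in for nodes, java.util.function.Predicate stands in for NodeFilter, and the class name FilterChainSketch is hypothetical. It is not the bean's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in: strings play the role of nodes, predicates the role of filters.
class FilterChainSketch {

    // Applies each filter to the output of the previous one.
    // With no filters, the whole input ("the entire page") is returned.
    static List<String> applyFilters(List<String> page, List<Predicate<String>> filters) {
        List<String> current = new ArrayList<>(page);
        for (Predicate<String> filter : filters) {
            List<String> next = new ArrayList<>();
            for (String node : current) {
                if (filter.test(node)) {
                    next.add(node);
                }
            }
            current = next;
        }
        return current;
    }

    public static void main(String[] args) {
        List<String> page = List.of("META:author", "META:keywords", "TITLE:home");
        List<Predicate<String>> filters = List.of(
                node -> node.startsWith("META"),
                node -> node.endsWith("keywords"));
        System.out.println(applyFilters(page, filters));          // prints [META:keywords]
        System.out.println(applyFilters(page, List.of()).size()); // prints 3
    }
}
```

Because each filter only sees what the previous one let through, the order of the filter array can change the result when filters are not independent of each other.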
-
setNodes
protected void setNodes()
Fetch the URL contents and filter it. Only do work if there is a valid parser with its URL set.
-
addPropertyChangeListener
public void addPropertyChangeListener(java.beans.PropertyChangeListener listener)
Add a PropertyChangeListener to the listener list. The listener is registered for all properties.
Parameters:
listener - The PropertyChangeListener to be added.
-
removePropertyChangeListener
public void removePropertyChangeListener(java.beans.PropertyChangeListener listener)
Remove a PropertyChangeListener from the listener list. This removes a registered PropertyChangeListener.
Parameters:
listener - The PropertyChangeListener to be removed.
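The add/remove listener pair follows the usual JavaBeans bound-property pattern built on java.beans.PropertyChangeSupport. The sketch below shows that pattern on a hypothetical stand-in class (BoundBeanSketch, with an illustrative property name "URL"); it does not use FilterBean itself, and the actual values of the PROP_* constants are not given on this page.

```java
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in bean; it only mirrors the listener pattern, it is not FilterBean.
class BoundBeanSketch {
    // Illustrative property name (an assumption, not taken from the library).
    static final String PROP_URL = "URL";

    private final PropertyChangeSupport support = new PropertyChangeSupport(this);
    private String url;

    void addPropertyChangeListener(PropertyChangeListener listener) {
        support.addPropertyChangeListener(listener);
    }

    void removePropertyChangeListener(PropertyChangeListener listener) {
        support.removePropertyChangeListener(listener);
    }

    void setURL(String newUrl) {
        String old = url;
        url = newUrl;
        // PropertyChangeSupport skips the event when old and new are equal and non-null.
        support.firePropertyChange(PROP_URL, old, newUrl);
    }

    public static void main(String[] args) {
        BoundBeanSketch bean = new BoundBeanSketch();
        List<String> seen = new ArrayList<>();
        PropertyChangeListener listener =
                event -> seen.add(event.getPropertyName() + "=" + event.getNewValue());
        bean.addPropertyChangeListener(listener);
        bean.setURL("http://cbc.ca");
        bean.removePropertyChangeListener(listener);
        bean.setURL("http://example.com"); // not recorded: the listener was removed
        System.out.println(seen);          // prints [URL=http://cbc.ca]
    }
}
```

A listener registered this way receives events for every bound property, so it typically checks the event's property name before reacting.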
-
getNodes
public NodeList getNodes()
Return the nodes of the URL matching the filter. This is the primary output of the bean.
Returns:
The nodes from the URL matching the current filter.
-
getURL
public java.lang.String getURL()
Get the current URL.
Returns:
The URL from which text has been extracted, or null if this property has not been set yet.
-
setURL
public void setURL(java.lang.String url)
Set the URL to extract strings from. The text from the URL will be fetched, which may be expensive, so this property should be set last.
Parameters:
url - The URL that text should be fetched from.
-
getConnection
public java.net.URLConnection getConnection()
Get the current connection.
Returns:
The connection that the parser has, or null if it hasn't been set or the parser hasn't been constructed yet.
-
setConnection
public void setConnection(java.net.URLConnection connection)
Set the parser's connection. The text from the URL will be fetched, which may be expensive, so this property should be set last.
Parameters:
connection - New value of property Connection.
-
getFilters
public NodeFilter[] getFilters()
Get the current filter set.
Returns:
The current filters.
-
setFilters
public void setFilters(NodeFilter[] filters)
Set the filters for the bean. If the parser has been set, it is reset and the nodes are refetched with the new filters.
Parameters:
filters - The filter set to use.
-
getParser
public Parser getParser()
Get the parser used to fetch nodes.
Returns:
The parser used by the bean.
-
setParser
public void setParser(Parser parser)
Set the parser for the bean. The parser is used immediately to fetch the nodes, which for a null filter means all the nodes.
Parameters:
parser - The parser to use.
-
getText
public java.lang.String getText()
Convenience method to apply a StringBean to the filter results. This may yield duplicate or multiple text elements if the node list contains nodes from two or more levels in the same nested tag hierarchy, but if the node list contains only one tag, it provides access to the text within the node.
Returns:
The textual contents of the nodes that pass through the filter set, as collected by the StringBean.
-
getRecursive
public boolean getRecursive()
Get the current recursion behaviour.
Returns:
The recursion behaviour currently in use (whether filters also apply to children, children's children, etc.).
-
setRecursive
public void setRecursive(boolean recursive)
Set the recursion behaviour.
Parameters:
recursive - If true, the extractAllNodesThatMatch() call is performed recursively.
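The effect of the recursive flag can be illustrated on a small tree. This is a sketch under assumptions, not the library's code: the Node class, the extract helper, and the names used are hypothetical stand-ins, and it assumes that a recursive pass also descends into the children of nodes that already matched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for a node tree; not the library's NodeList.
class RecursionSketch {

    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();

        Node(String name, Node... kids) {
            this.name = name;
            for (Node kid : kids) {
                children.add(kid);
            }
        }
    }

    // Collects the names of matching nodes from the given list.
    // Children are visited only when recursive is true.
    static List<String> extract(List<Node> nodes, Predicate<Node> filter, boolean recursive) {
        List<String> matches = new ArrayList<>();
        for (Node node : nodes) {
            if (filter.test(node)) {
                matches.add(node.name);
            }
            if (recursive) {
                matches.addAll(extract(node.children, filter, true));
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        Node inner = new Node("DIV#inner");
        Node outer = new Node("DIV#outer", inner);
        List<Node> top = List.of(outer, new Node("P#intro"));
        Predicate<Node> isDiv = node -> node.name.startsWith("DIV");
        System.out.println(extract(top, isDiv, false)); // prints [DIV#outer]
        System.out.println(extract(top, isDiv, true));  // prints [DIV#outer, DIV#inner]
    }
}
```

In the recursive case both the outer and the nested match are returned, which is the situation in which getText may report the same text more than once.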
-
main
public static void main(java.lang.String[] args)
Unit test.
Parameters:
args - Pass args[0] as the URL to process, and optionally a node name for filtering.
-
-