Package org.htmlparser.nodes
Class TagNode
- java.lang.Object
  - org.htmlparser.nodes.AbstractNode
    - org.htmlparser.nodes.TagNode
- Direct Known Subclasses:
  BaseHrefTag, CompositeTag, DoctypeTag, FrameTag, ImageTag, InputTag, JspTag, MetaTag, ProcessingInstructionTag
public class TagNode extends AbstractNode implements Tag
TagNode represents a generic tag. If no scanner is registered for a given tag name, this is what you get. This is also the base class for all tags created by the parser.
- See Also:
  Serialized Form
-
Field Summary
Modifier and Type | Field | Description
protected static java.util.Hashtable | breakTags | Set of tags that breaks the flow.
protected java.util.Vector | mAttributes | The tag attributes.
protected static Scanner | mDefaultScanner | The default scanner for non-composite tags.
-
Method Summary
Modifier and Type | Method | Description
void | accept(NodeVisitor visitor) | Default tag visiting code.
boolean | breaksFlow() | Determines if the given tag breaks the flow of text.
java.lang.String | getAttribute(java.lang.String name) | Returns the value of an attribute.
Attribute | getAttributeEx(java.lang.String name) | Returns the attribute with the given name.
java.util.Vector | getAttributesEx() | Gets the attributes in the tag.
java.lang.String[] | getEnders() | Return the set of tag names that cause this tag to finish.
int | getEndingLineNumber() | Get the line number where this tag ends.
Tag | getEndTag() | Get the end tag for this (composite) tag.
java.lang.String[] | getEndTagEnders() | Return the set of end tag names that cause this tag to finish.
java.lang.String[] | getIds() | Return the set of names handled by this tag.
java.lang.String | getRawTagName() | Return the name of this tag.
int | getStartingLineNumber() | Get the line number where this tag starts.
int | getTagBegin() | Gets the nodeBegin.
int | getTagEnd() | Gets the nodeEnd.
java.lang.String | getTagName() | Return the name of this tag.
java.lang.String | getText() | Return the text contained in this tag.
Scanner | getThisScanner() | Return the scanner associated with this tag.
boolean | isEmptyXmlTag() | Is this an empty xml tag of the form <tag/>.
boolean | isEndTag() | Predicate to determine if this tag is an end tag (i.e. </HTML>).
void | removeAttribute(java.lang.String key) | Remove the attribute with the given key, if it exists.
void | setAttribute(java.lang.String key, java.lang.String value) | Set attribute with given key, value pair.
void | setAttribute(java.lang.String key, java.lang.String value, char quote) | Set attribute with given key, value pair where the value is quoted by quote.
void | setAttribute(Attribute attribute) | Set an attribute.
void | setAttributeEx(Attribute attribute) | Set an attribute.
void | setAttributesEx(java.util.Vector attribs) | Sets the attributes.
void | setEmptyXmlTag(boolean emptyXmlTag) | Set this tag to be an empty xml node, or not.
void | setEndTag(Tag end) | Set the end tag for this (composite) tag.
void | setTagBegin(int tagBegin) | Sets the nodeBegin.
void | setTagEnd(int tagEnd) | Sets the nodeEnd.
void | setTagName(java.lang.String name) | Set the name of this tag.
void | setText(java.lang.String text) | Parses the given text to create the tag contents.
void | setThisScanner(Scanner scanner) | Set the scanner associated with this tag.
java.lang.String | toHtml(boolean verbatim) | Render the tag as HTML.
java.lang.String | toPlainTextString() | Get the plain text from this node.
java.lang.String | toString() | Print the contents of the tag.
-
Methods inherited from class org.htmlparser.nodes.AbstractNode
clone, collectInto, doSemanticAction, getChildren, getEndPosition, getFirstChild, getLastChild, getNextSibling, getPage, getParent, getPreviousSibling, getStartPosition, setChildren, setEndPosition, setPage, setParent, setStartPosition, toHtml
-
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface org.htmlparser.Node
clone, collectInto, doSemanticAction, getChildren, getEndPosition, getFirstChild, getLastChild, getNextSibling, getPage, getParent, getPreviousSibling, getStartPosition, setChildren, setEndPosition, setPage, setParent, setStartPosition, toHtml
-
-
-
-
Field Detail
-
mDefaultScanner
protected static final Scanner mDefaultScanner
The default scanner for non-composite tags.
-
mAttributes
protected java.util.Vector mAttributes
The tag attributes. Objects of type Attribute. The first element is the tag name, subsequent elements being either whitespace or real attributes.
-
breakTags
protected static java.util.Hashtable breakTags
Set of tags that breaks the flow.
-
-
Constructor Detail
-
TagNode
public TagNode()
Create an empty tag.
-
TagNode
public TagNode(Page page, int start, int end, java.util.Vector attributes)
Create a tag with the location and attributes provided.
- Parameters:
  page - The page this tag was read from.
  start - The starting offset of this node within the page.
  end - The ending offset of this node within the page.
  attributes - The list of attributes that were parsed in this tag.
- See Also:
  Attribute
-
TagNode
public TagNode(TagNode tag, TagScanner scanner)
Create a tag like the one provided.
- Parameters:
  tag - The tag to emulate.
  scanner - The scanner for this tag.
-
-
Method Detail
-
getAttribute
public java.lang.String getAttribute(java.lang.String name)
Returns the value of an attribute.
- Specified by:
  getAttribute in interface Tag
- Parameters:
  name - Name of attribute, case insensitive.
- Returns:
  The value associated with the attribute or null if it does not exist, or is a stand-alone or
- See Also:
  Tag.setAttribute(java.lang.String, java.lang.String)
-
setAttribute
public void setAttribute(java.lang.String key, java.lang.String value)
Set attribute with given key, value pair. Figures out a quote character to use if necessary.
- Specified by:
  setAttribute in interface Tag
- Parameters:
  key - The name of the attribute.
  value - The value of the attribute.
- See Also:
  Tag.getAttribute(java.lang.String), Tag.setAttribute(String,String,char)
-
removeAttribute
public void removeAttribute(java.lang.String key)
Remove the attribute with the given key, if it exists.
- Specified by:
  removeAttribute in interface Tag
- Parameters:
  key - The name of the attribute.
-
setAttribute
public void setAttribute(java.lang.String key, java.lang.String value, char quote)
Set attribute with given key, value pair where the value is quoted by quote.
- Specified by:
  setAttribute in interface Tag
- Parameters:
  key - The name of the attribute.
  value - The value of the attribute.
  quote - The quote character to be used around value. If zero, it is an unquoted value.
- See Also:
  Tag.getAttribute(java.lang.String)
-
getAttributeEx
public Attribute getAttributeEx(java.lang.String name)
Returns the attribute with the given name.
- Specified by:
  getAttributeEx in interface Tag
- Parameters:
  name - Name of attribute, case insensitive.
- Returns:
  The attribute or null if it does not exist.
- See Also:
  Tag.setAttributeEx(org.htmlparser.Attribute)
-
setAttributeEx
public void setAttributeEx(Attribute attribute)
Set an attribute.
- Specified by:
  setAttributeEx in interface Tag
- Parameters:
  attribute - The attribute to set.
- See Also:
  setAttribute(Attribute)
-
setAttribute
public void setAttribute(Attribute attribute)
Set an attribute. This replaces an attribute of the same name. To set the zeroth attribute (the tag name), use setTagName().
- Parameters:
  attribute - The attribute to set.
-
getAttributesEx
public java.util.Vector getAttributesEx()
Gets the attributes in the tag.
- Specified by:
  getAttributesEx in interface Tag
- Returns:
  Returns the list of Attributes in the tag. The first element is the tag name, subsequent elements being either whitespace or real attributes.
- See Also:
  Tag.setAttributesEx(java.util.Vector)
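The layout described above (tag name at index 0, then whitespace and real attributes) can be illustrated with a small stand-alone model. This is only a sketch: `Attr`, `sampleLayout`, and `lookup` are hypothetical names invented for illustration, and representing a whitespace entry by a null name is an assumption of this model, not a statement about the library's `Attribute` class.

```java
import java.util.ArrayList;
import java.util.List;

public class AttributeLayoutSketch {
    /** Hypothetical stand-in for an attribute; a null name marks a whitespace entry (assumption). */
    public static final class Attr {
        public final String name;
        public final String value;

        public Attr(String name, String value) {
            this.name = name;
            this.value = value;
        }

        public boolean isWhitespace() {
            return name == null;
        }
    }

    /** Builds the list for a tag such as <a href="x" id="y">, laid out as the text describes. */
    public static List<Attr> sampleLayout() {
        List<Attr> attrs = new ArrayList<>();
        attrs.add(new Attr("a", null));   // element 0: the tag name
        attrs.add(new Attr(null, " "));   // whitespace entry
        attrs.add(new Attr("href", "x")); // real attribute
        attrs.add(new Attr(null, " "));   // whitespace entry
        attrs.add(new Attr("id", "y"));   // real attribute
        return attrs;
    }

    /** Case-insensitive lookup that skips the tag name at index 0 and any whitespace entries. */
    public static String lookup(List<Attr> attrs, String name) {
        for (int i = 1; i < attrs.size(); i++) {
            Attr a = attrs.get(i);
            if (!a.isWhitespace() && a.name.equalsIgnoreCase(name)) {
                return a.value;
            }
        }
        return null; // not found
    }

    public static void main(String[] args) {
        List<Attr> attrs = sampleLayout();
        System.out.println(attrs.get(0).name);      // prints "a"
        System.out.println(lookup(attrs, "HREF"));  // prints "x"
        System.out.println(lookup(attrs, "class")); // prints "null"
    }
}
```

Starting the search at index 1 is the point of the sketch: code that walks the attribute list directly needs to account for the tag name occupying the first slot.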
-
getTagName
public java.lang.String getTagName()
Return the name of this tag.
Note: This value is converted to uppercase and does not begin with "/" if it is an end tag. Nor does it end with a slash in the case of an XML type tag. To get at the original text of the tag name use getRawTagName(). The conversion to uppercase is performed with an ENGLISH locale.
- Specified by:
  getTagName in interface Tag
- Returns:
  The tag name.
- See Also:
  Tag.setTagName(java.lang.String)
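The normalization rules in the note can be sketched in plain Java. This is an illustrative model of the documented behavior, not the library's implementation; the class `TagNameSketch` and the method `normalize` are hypothetical names.

```java
import java.util.Locale;

public class TagNameSketch {
    /**
     * Models the documented rules: drop a leading "/" (end tag), drop a
     * trailing "/" (XML-style tag), and uppercase with the ENGLISH locale.
     */
    public static String normalize(String raw) {
        if (raw == null) {
            return null;
        }
        String name = raw;
        if (name.startsWith("/")) {
            name = name.substring(1);
        }
        if (name.endsWith("/")) {
            name = name.substring(0, name.length() - 1);
        }
        // A fixed locale keeps the result independent of the JVM default locale.
        return name.toUpperCase(Locale.ENGLISH);
    }

    public static void main(String[] args) {
        System.out.println(normalize("/html")); // prints "HTML"
        System.out.println(normalize("br/"));   // prints "BR"
        System.out.println(normalize("title")); // prints "TITLE"
    }
}
```

Fixing the locale matters because default-locale uppercasing can differ; under a Turkish default locale, for instance, a lowercase "i" uppercases to a dotted capital rather than "I".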
-
getRawTagName
public java.lang.String getRawTagName()
Return the name of this tag.
- Specified by:
  getRawTagName in interface Tag
- Returns:
  The tag name or null if this tag contains nothing or only whitespace.
-
setTagName
public void setTagName(java.lang.String name)
Set the name of this tag. This creates or replaces the first attribute of the tag (the zeroth element of the attribute vector).
- Specified by:
  setTagName in interface Tag
- Parameters:
  name - The tag name.
- See Also:
  Tag.getTagName()
-
getText
public java.lang.String getText()
Return the text contained in this tag.
- Specified by:
  getText in interface Node
- Overrides:
  getText in class AbstractNode
- Returns:
  The complete contents of the tag (within the angle brackets).
- See Also:
  Node.setText(java.lang.String)
-
setAttributesEx
public void setAttributesEx(java.util.Vector attribs)
Sets the attributes. NOTE: Values of the extended hashtable are two element arrays of String, with the first element being the original name (not uppercased), and the second element being the value.
- Specified by:
  setAttributesEx in interface Tag
- Parameters:
  attribs - The attribute collection to set.
- See Also:
  Tag.getAttributesEx()
-
setTagBegin
public void setTagBegin(int tagBegin)
Sets the nodeBegin.
- Parameters:
  tagBegin - The nodeBegin to set.
-
getTagBegin
public int getTagBegin()
Gets the nodeBegin.
- Returns:
  The nodeBegin value.
-
setTagEnd
public void setTagEnd(int tagEnd)
Sets the nodeEnd.
- Parameters:
  tagEnd - The nodeEnd to set.
-
getTagEnd
public int getTagEnd()
Gets the nodeEnd.
- Returns:
  The nodeEnd value.
-
setText
public void setText(java.lang.String text)
Parses the given text to create the tag contents.
- Specified by:
  setText in interface Node
- Overrides:
  setText in class AbstractNode
- Parameters:
  text - A string of the form <TAGNAME xx="yy">.
- See Also:
  Node.getText()
-
toPlainTextString
public java.lang.String toPlainTextString()
Get the plain text from this node.
- Specified by:
  toPlainTextString in interface Node
- Specified by:
  toPlainTextString in class AbstractNode
- Returns:
  An empty string (tag contents do not display in a browser). If you want this tag's HTML equivalent, use toHtml().
-
toHtml
public java.lang.String toHtml(boolean verbatim)
Render the tag as HTML. A call to a tag's toHtml() method will render it in HTML.
- Specified by:
  toHtml in interface Node
- Specified by:
  toHtml in class AbstractNode
- Parameters:
  verbatim - If true, return as close to the original page text as possible.
- Returns:
  The tag as an HTML fragment.
- See Also:
  Node.toHtml()
-
toString
public java.lang.String toString()
Print the contents of the tag.
- Specified by:
  toString in interface Node
- Specified by:
  toString in class AbstractNode
- Returns:
  A string describing the tag. For text that looks like HTML use toHtml().
-
breaksFlow
public boolean breaksFlow()
Determines if the given tag breaks the flow of text.
- Specified by:
  breaksFlow in interface Tag
- Returns:
  true if following text would start on a new line, false otherwise.
-
accept
public void accept(NodeVisitor visitor)
Default tag visiting code. Based on isEndTag(), calls either visitTag() or visitEndTag().
- Specified by:
  accept in interface Node
- Specified by:
  accept in class AbstractNode
- Parameters:
  visitor - The visitor that is visiting this node.
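The dispatch described above can be sketched with a stand-alone model of the visitor pattern. The types here (`Visitor`, `SimpleTag`) and the `trace` helper are hypothetical stand-ins, not the library's `NodeVisitor` or `TagNode`; the model also assumes that an end tag is one whose raw name starts with "/".

```java
import java.util.ArrayList;
import java.util.List;

public class AcceptSketch {
    /** Hypothetical visitor with one callback per kind of tag. */
    public interface Visitor {
        void visitTag(SimpleTag tag);
        void visitEndTag(SimpleTag tag);
    }

    /** Hypothetical minimal tag holding only its raw name. */
    public static final class SimpleTag {
        public final String rawName;

        public SimpleTag(String rawName) {
            this.rawName = rawName;
        }

        /** Assumption of this model: end tags have a raw name starting with "/". */
        public boolean isEndTag() {
            return rawName.startsWith("/");
        }

        /** Routes to visitEndTag or visitTag depending on isEndTag(). */
        public void accept(Visitor visitor) {
            if (isEndTag()) {
                visitor.visitEndTag(this);
            } else {
                visitor.visitTag(this);
            }
        }
    }

    /** Visits each raw name in order and records which callback fired. */
    public static List<String> trace(String... rawNames) {
        final List<String> log = new ArrayList<>();
        Visitor recorder = new Visitor() {
            @Override
            public void visitTag(SimpleTag tag) {
                log.add("start " + tag.rawName);
            }

            @Override
            public void visitEndTag(SimpleTag tag) {
                log.add("end " + tag.rawName);
            }
        };
        for (String rawName : rawNames) {
            new SimpleTag(rawName).accept(recorder);
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(trace("p", "/p")); // prints "[start p, end /p]"
    }
}
```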
-
isEmptyXmlTag
public boolean isEmptyXmlTag()
Is this an empty xml tag of the form <tag/>.
- Specified by:
  isEmptyXmlTag in interface Tag
- Returns:
  true if the last character of the last attribute is a '/'.
-
setEmptyXmlTag
public void setEmptyXmlTag(boolean emptyXmlTag)
Set this tag to be an empty xml node, or not. Adds or removes an ending slash on the tag.
- Specified by:
  setEmptyXmlTag in interface Tag
- Parameters:
  emptyXmlTag - If true, ensures there is an ending slash in the node, i.e. <tag/>, otherwise removes it.
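The trailing-slash rule shared by isEmptyXmlTag() and setEmptyXmlTag() can be modelled on a plain list of strings. This is a sketch under stated assumptions: the tag's pieces are represented as raw text fragments in a `List<String>`, and `EmptyXmlTagSketch`, `render`, and the exact way the slash is appended are illustrative choices rather than the library's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EmptyXmlTagSketch {
    /** True if the last character of the last fragment is '/'. */
    public static boolean isEmptyXmlTag(List<String> parts) {
        if (parts.isEmpty()) {
            return false;
        }
        String last = parts.get(parts.size() - 1);
        return !last.isEmpty() && last.charAt(last.length() - 1) == '/';
    }

    /** Adds or removes the ending slash so the state matches the flag. */
    public static void setEmptyXmlTag(List<String> parts, boolean emptyXmlTag) {
        if (emptyXmlTag == isEmptyXmlTag(parts)) {
            return; // already in the requested state
        }
        if (emptyXmlTag) {
            parts.add("/"); // assumption: the slash becomes its own trailing fragment
        } else {
            int lastIndex = parts.size() - 1;
            String last = parts.get(lastIndex);
            String trimmed = last.substring(0, last.length() - 1);
            if (trimmed.isEmpty()) {
                parts.remove(lastIndex); // the fragment was only the slash
            } else {
                parts.set(lastIndex, trimmed);
            }
        }
    }

    /** Joins the fragments between angle brackets. */
    public static String render(List<String> parts) {
        return "<" + String.join("", parts) + ">";
    }

    public static void main(String[] args) {
        List<String> parts = new ArrayList<>(Arrays.asList("br"));
        setEmptyXmlTag(parts, true);
        System.out.println(render(parts)); // prints "<br/>"
        setEmptyXmlTag(parts, false);
        System.out.println(render(parts)); // prints "<br>"
    }
}
```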
-
isEndTag
public boolean isEndTag()
Predicate to determine if this tag is an end tag (i.e. </HTML>).
-
getStartingLineNumber
public int getStartingLineNumber()
Get the line number where this tag starts.
- Specified by:
  getStartingLineNumber in interface Tag
- Returns:
  The (zero based) line number in the page where this tag starts.
-
getEndingLineNumber
public int getEndingLineNumber()
Get the line number where this tag ends.
- Specified by:
  getEndingLineNumber in interface Tag
- Returns:
  The (zero based) line number in the page where this tag ends.
-
getIds
public java.lang.String[] getIds()
Return the set of names handled by this tag. Since this is a generic tag, it has no ids.
-
getEnders
public java.lang.String[] getEnders()
Return the set of tag names that cause this tag to finish. These are the normal (non-end) tags that, if encountered while scanning (a composite tag), will cause the generation of a virtual tag. Since this is a non-composite tag, the default is no enders.
-
getEndTagEnders
public java.lang.String[] getEndTagEnders()
Return the set of end tag names that cause this tag to finish. These are the end tags that, if encountered while scanning (a composite tag), will cause the generation of a virtual tag. Since this is a non-composite tag, it has no end tag enders.
- Specified by:
  getEndTagEnders in interface Tag
- Returns:
  The names of following end tags that stop further scanning.
-
getThisScanner
public Scanner getThisScanner()
Return the scanner associated with this tag.
- Specified by:
  getThisScanner in interface Tag
- Returns:
  The scanner associated with this tag.
- See Also:
  Tag.setThisScanner(org.htmlparser.scanners.Scanner)
-
setThisScanner
public void setThisScanner(Scanner scanner)
Set the scanner associated with this tag.
- Specified by:
  setThisScanner in interface Tag
- Parameters:
  scanner - The scanner for this tag.
- See Also:
  Tag.getThisScanner()
-
getEndTag
public Tag getEndTag()
Get the end tag for this (composite) tag. For a non-composite tag this always returns null.
- Specified by:
  getEndTag in interface Tag
- Returns:
  The tag that terminates this composite tag, i.e. </HTML>.
- See Also:
  Tag.setEndTag(org.htmlparser.Tag)
-
setEndTag
public void setEndTag(Tag end)
Set the end tag for this (composite) tag. For a non-composite tag this is a no-op.
- Specified by:
  setEndTag in interface Tag
- Parameters:
  end - The tag that terminates this composite tag, i.e. </HTML>.
- See Also:
  Tag.getEndTag()
-
-