Package org.htmlparser.lexer
Class InputStreamSource
java.lang.Object
  java.io.Reader
    org.htmlparser.lexer.Source
      org.htmlparser.lexer.InputStreamSource
All Implemented Interfaces:
java.io.Closeable, java.io.Serializable, java.lang.AutoCloseable, java.lang.Readable
public class InputStreamSource extends Source
A source of characters based on an InputStream, such as one from a URLConnection.
See Also:
Serialized Form
-
-
Field Summary
Fields (modifier and type, field, description):
static int BUFFER_SIZE - An initial buffer size.
protected char[] mBuffer - The characters read so far.
protected java.lang.String mEncoding - The character set in use.
protected int mLevel - The number of valid characters in the buffer.
protected int mMark - The bookmark.
protected int mOffset - The offset of the next character returned by read().
protected java.io.InputStreamReader mReader - The converter from bytes to characters.
protected java.io.InputStream mStream - The stream of bytes.
-
Constructor Summary
Constructors (constructor, description):
InputStreamSource(java.io.InputStream stream) - Create a source of characters using the default character set.
InputStreamSource(java.io.InputStream stream, java.lang.String charset) - Create a source of characters.
InputStreamSource(java.io.InputStream stream, java.lang.String charset, int size) - Create a source of characters.
-
Method Summary
All methods are concrete instance methods (modifier and type, method, description):
int available() - Get the number of available characters.
void close() - Does nothing.
void destroy() - Close the source.
protected void fill(int min) - Fetch more characters from the underlying reader.
char getCharacter(int offset) - Retrieve a character again.
void getCharacters(char[] array, int offset, int start, int end) - Retrieve characters again.
void getCharacters(java.lang.StringBuffer buffer, int offset, int length) - Append characters already read into a StringBuffer.
java.lang.String getEncoding() - Get the encoding being used to convert characters.
java.io.InputStream getStream() - Get the input stream being used.
java.lang.String getString(int offset, int length) - Retrieve a string.
void mark(int readAheadLimit) - Mark the present position in the source.
boolean markSupported() - Tell whether this source supports the mark() operation.
int offset() - Get the position (in characters).
int read() - Read a single character.
int read(char[] cbuf) - Read characters into an array.
int read(char[] cbuf, int off, int len) - Read characters into a portion of an array.
boolean ready() - Tell whether this source is ready to be read.
void reset() - Reset the source.
void setEncoding(java.lang.String character_set) - Begins reading from the source with the given character set.
long skip(long n) - Skip characters.
void unread() - Undo the read of a single character.
-
-
-
Field Detail
-
BUFFER_SIZE
public static int BUFFER_SIZE
An initial buffer size. Has a default value of 16384.
-
mStream
protected transient java.io.InputStream mStream
The stream of bytes. Set to null when the source is closed.
-
mEncoding
protected java.lang.String mEncoding
The character set in use.
-
mReader
protected transient java.io.InputStreamReader mReader
The converter from bytes to characters.
-
mBuffer
protected char[] mBuffer
The characters read so far.
-
mLevel
protected int mLevel
The number of valid characters in the buffer.
-
mOffset
protected int mOffset
The offset of the next character returned by read().
-
mMark
protected int mMark
The bookmark.
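
The following is a minimal sketch of how a buffer, a fill level, a read offset and a bookmark can work together to make already-read characters replayable. It is not the library's implementation: the class `ReplaySketch`, its method bodies and the `demo()` helper are hypothetical, and the choices that `reset()` returns to the mark when one is set (and to zero otherwise) and that `getCharacter` rejects indexes at or beyond the read position are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical illustration class; not part of org.htmlparser.
class ReplaySketch {
    private final Reader reader; // plays the role of mReader
    private char[] buffer;       // like mBuffer: every character read so far
    private int level = 0;       // like mLevel: number of valid characters in buffer
    private int offset = 0;      // like mOffset: index of the next character read() returns
    private int mark = -1;       // like mMark: bookmark, -1 when unset (assumption)

    ReplaySketch(Reader reader, int size) {
        this.reader = reader;
        this.buffer = new char[Math.max(1, size)];
    }

    int read() throws IOException {
        if (offset >= level) {
            // Nothing buffered ahead of the read point: fetch more, growing if full.
            if (level == buffer.length) {
                buffer = Arrays.copyOf(buffer, buffer.length * 2);
            }
            int n = reader.read(buffer, level, buffer.length - level);
            if (n <= 0) {
                return -1; // underlying reader is drained
            }
            level += n;
        }
        return buffer[offset++];
    }

    void unread() throws IOException {
        if (offset == 0) {
            throw new IOException("nothing to unread");
        }
        offset--;
    }

    void mark() {
        mark = offset;
    }

    void reset() {
        // Assumption: go back to the bookmark if one was set, else to zero.
        offset = (mark == -1) ? 0 : mark;
    }

    char getCharacter(int index) throws IOException {
        if (index < 0 || index >= offset) {
            throw new IOException("index is not before the read position");
        }
        return buffer[index];
    }

    // Small walk-through that records what each call returns.
    static String demo() {
        try {
            ReplaySketch s = new ReplaySketch(new StringReader("abc"), 2);
            StringBuilder out = new StringBuilder();
            out.append((char) s.read());   // first character
            s.mark();                      // bookmark after the first character
            out.append((char) s.read());
            out.append((char) s.read());   // forces the buffer to grow
            s.reset();                     // back to the bookmark
            out.append((char) s.read());
            s.unread();                    // step back one character
            out.append((char) s.read());
            out.append(s.getCharacter(0)); // replay from the retained buffer
            return out.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the buffer is never discarded in this sketch, any position before the read point stays retrievable, which is the property the "retrieve again" methods of this class rely on.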
-
-
Constructor Detail
-
InputStreamSource
public InputStreamSource(java.io.InputStream stream) throws java.io.UnsupportedEncodingException
Create a source of characters using the default character set.
Parameters:
stream - The stream of bytes to use.
Throws:
java.io.UnsupportedEncodingException - If the default character set is unsupported.
-
InputStreamSource
public InputStreamSource(java.io.InputStream stream, java.lang.String charset) throws java.io.UnsupportedEncodingException
Create a source of characters.
Parameters:
stream - The stream of bytes to use.
charset - The character set used in encoding the stream.
Throws:
java.io.UnsupportedEncodingException - If the character set is unsupported.
-
InputStreamSource
public InputStreamSource(java.io.InputStream stream, java.lang.String charset, int size) throws java.io.UnsupportedEncodingException
Create a source of characters.
Parameters:
stream - The stream of bytes to use.
charset - The character set used in encoding the stream.
size - The initial character buffer size.
Throws:
java.io.UnsupportedEncodingException - If the character set is unsupported.
-
-
Method Detail
-
getStream
public java.io.InputStream getStream()
Get the input stream being used.
Returns:
The current input stream.
-
getEncoding
public java.lang.String getEncoding()
Get the encoding being used to convert characters.
Specified by:
getEncoding in class Source
Returns:
The current encoding.
-
setEncoding
public void setEncoding(java.lang.String character_set) throws ParserException
Begins reading from the source with the given character set. If the current encoding is the same as the requested encoding, this method is a no-op. Otherwise any subsequent characters read from this page will have been decoded using the given character set.
Some magic happens here to obtain this result if characters have already been consumed from this source. Since a Reader cannot be dynamically altered to use a different character set, the underlying stream is reset, a new Source is constructed and a comparison made of the characters read so far with the newly read characters up to the current position. If a difference is encountered, or some other problem occurs, an exception is thrown.
Specified by:
setEncoding in class Source
Parameters:
character_set - The character set to use to convert bytes into characters.
Throws:
ParserException - If a character mismatch occurs between characters already provided and those that would have been returned had the new character set been in effect from the beginning. An exception is also thrown if the underlying stream won't put up with these shenanigans.
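
The reset-and-compare approach described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code of `setEncoding`: the class `ReencodeSketch`, the helper `switchCharset` and the `demo()` method are hypothetical, the sketch signals a mismatch with a plain `IOException` rather than `ParserException`, and it uses a `ByteArrayInputStream` because that stream can always be reset to its start.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of org.htmlparser.
class ReencodeSketch {

    // Re-decode the stream from its start with newCharset and verify that the
    // first consumed.length characters equal what was already handed out.
    // Returns a reader positioned just after the verified prefix.
    static InputStreamReader switchCharset(InputStream stream, char[] consumed,
                                           String newCharset) throws IOException {
        stream.reset(); // fails for streams that cannot be repositioned
        InputStreamReader reader = new InputStreamReader(stream, newCharset);
        for (int i = 0; i < consumed.length; i++) {
            int c = reader.read();
            if (c != consumed[i]) {
                throw new IOException("character mismatch at position " + i);
            }
        }
        return reader;
    }

    static String demo() {
        // "<p>" followed by e-acute, encoded as UTF-8 (the accent takes two bytes).
        byte[] bytes = "<p>\u00e9".getBytes(StandardCharsets.UTF_8);
        try {
            InputStream in = new ByteArrayInputStream(bytes);

            // Case 1: only the ASCII prefix was consumed, so both decodings agree.
            InputStreamReader utf8 = switchCharset(in, "<p>".toCharArray(), "UTF-8");
            String first = (utf8.read() == '\u00e9') ? "ok" : "bad";

            // Case 2: a fourth character was consumed under ISO-8859-1, where the
            // byte 0xC3 decodes to U+00C3; UTF-8 yields U+00E9 at that position.
            try {
                switchCharset(in, "<p>\u00c3".toCharArray(), "UTF-8");
                return first + ",no-mismatch";
            } catch (IOException mismatch) {
                return first + ",mismatch";
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The second case shows why switching late can fail: once a non-ASCII character has been delivered under the wrong character set, the re-decoded prefix no longer matches.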
-
fill
protected void fill(int min) throws java.io.IOException
Fetch more characters from the underlying reader. Has no effect if the underlying reader has been drained.
Parameters:
min - The minimum to read.
Throws:
java.io.IOException - If the underlying reader read() throws one.
-
close
public void close() throws java.io.IOException
Does nothing. It's supposed to close the source, but use destroy() instead.
-
read
public int read() throws java.io.IOException
Read a single character. This method will block until a character is available, an I/O error occurs, or the end of the stream is reached.
-
read
public int read(char[] cbuf, int off, int len) throws java.io.IOException
Read characters into a portion of an array. This method will block until some input is available, an I/O error occurs, or the end of the stream is reached.
Specified by:
read in class Source
Parameters:
cbuf - Destination buffer.
off - Offset at which to start storing characters.
len - Maximum number of characters to read.
Returns:
The number of characters read, or EOF if the end of the stream has been reached.
Throws:
java.io.IOException - If an I/O error occurs.
-
read
public int read(char[] cbuf) throws java.io.IOException
Read characters into an array. This method will block until some input is available, an I/O error occurs, or the end of the stream is reached.
-
reset
public void reset() throws java.lang.IllegalStateException
Reset the source. Repositions the read point to begin at zero.
-
markSupported
public boolean markSupported()
Tell whether this source supports the mark() operation.
Specified by:
markSupported in class Source
Returns:
true.
-
mark
public void mark(int readAheadLimit) throws java.io.IOException
Mark the present position in the source. Subsequent calls to reset() will attempt to reposition the source to this point.
-
ready
public boolean ready() throws java.io.IOException
Tell whether this source is ready to be read.
-
skip
public long skip(long n) throws java.io.IOException, java.lang.IllegalArgumentException
Skip characters. This method will block until some characters are available, an I/O error occurs, or the end of the stream is reached. Note: n is treated as an int.
-
unread
public void unread() throws java.io.IOException
Undo the read of a single character.
-
getCharacter
public char getCharacter(int offset) throws java.io.IOException
Retrieve a character again.
Specified by:
getCharacter in class Source
Parameters:
offset - The offset of the character.
Returns:
The character at offset.
Throws:
java.io.IOException - If the offset is beyond offset() or the source is closed.
-
getCharacters
public void getCharacters(char[] array, int offset, int start, int end) throws java.io.IOException
Retrieve characters again.
Specified by:
getCharacters in class Source
Parameters:
array - The array of characters.
offset - The starting position in the array where characters are to be placed.
start - The starting position, zero based.
end - The ending position (exclusive, i.e. the character at the ending position is not included), zero based.
Throws:
java.io.IOException - If the start or end is beyond offset() or the source is closed.
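
To make the half-open range concrete, here is a small sketch of the same parameter convention applied to plain arrays. The class `RangeCopySketch` and its methods are hypothetical stand-ins; the `source` array merely plays the role of the characters already read, and nothing here is taken from the library's code.

```java
// Hypothetical illustration class; not part of org.htmlparser.
class RangeCopySketch {

    // Copy source[start] .. source[end - 1] into array, beginning at array[offset].
    // The character at index end is excluded, so end - start characters are copied.
    static char[] copyRange(char[] source, char[] array, int offset, int start, int end) {
        System.arraycopy(source, start, array, offset, end - start);
        return array;
    }

    static String demo() {
        char[] alreadyRead = "<html>".toCharArray();
        char[] destination = "______".toCharArray();
        // start = 1 selects 'h'; end = 5 stops before '>'.
        copyRange(alreadyRead, destination, 1, 1, 5);
        return new String(destination);
    }
}
```

With `start = 1` and `end = 5`, four characters are copied, which matches the "end is exclusive" wording of the parameter description.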
-
getString
public java.lang.String getString(int offset, int length) throws java.io.IOException
Retrieve a string.
Specified by:
getString in class Source
Parameters:
offset - The offset of the first character.
length - The number of characters to retrieve.
Returns:
A string containing the length characters at offset.
Throws:
java.io.IOException - If the offset or (offset + length) is beyond offset() or the source is closed.
-
getCharacters
public void getCharacters(java.lang.StringBuffer buffer, int offset, int length) throws java.io.IOException
Append characters already read into a StringBuffer.
Specified by:
getCharacters in class Source
Parameters:
buffer - The buffer to append to.
offset - The offset of the first character.
length - The number of characters to retrieve.
Throws:
java.io.IOException - If the offset or (offset + length) is beyond offset() or the source is closed.
-
destroy
public void destroy() throws java.io.IOException
-
offset
public int offset()
Get the position (in characters).
-
-