Agent System POND 1.2 (28.2.2002)

FIM.Util.WWW
Class HTMLBreakFilterReader

java.lang.Object
  |
  +--java.io.Reader
        |
        +--java.io.FilterReader
              |
              +--FIM.Util.WWW.HTMLBreakFilterReader

public class HTMLBreakFilterReader
extends FilterReader

Converts a stream of HTML code to a canonical line-break format. All linebreaks (\r and \n) are removed and each tab is replaced by a single space. Afterwards, a "\n" is inserted in front of every <p>, </p> and <br>. The same happens in front of <table> and <tr>, but only if this does not create a double linebreak.

Version:
1.0, 1.7.2000
Author:
Michael Sonntag
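
The rules above can be illustrated on a whole string. The following is a minimal sketch, not the POND implementation: the class BreakRulesSketch and its methods are hypothetical, and matching tag names case-insensitively (and ignoring attributes) is an assumption. Only the unconditional rules are covered; the conditional handling of <table> and <tr> is left out, because the description does not pin down what counts as a double linebreak.

```java
import java.util.Locale;

class BreakRulesSketch {

    // Hypothetical helper (not part of POND): applies the unconditional rules to a string.
    static String canonicalize(String html) {
        // Rule 1: drop \r and \n, turn each tab into a single space.
        String flat = html.replace("\r", "").replace("\n", "").replace('\t', ' ');

        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < flat.length()) {
            char c = flat.charAt(i);
            if (c != '<') {
                out.append(c);
                i++;
                continue;
            }
            // Read the whole tag up to '>' (or to the end if it is unterminated).
            int end = flat.indexOf('>', i);
            String tag = (end < 0) ? flat.substring(i) : flat.substring(i, end + 1);
            String name = tagName(tag);
            // Rule 2: a "\n" goes in front of <p>, </p> and <br>.
            if (name.equals("p") || name.equals("/p") || name.equals("br")) {
                out.append('\n');
            }
            out.append(tag);
            i += tag.length();
        }
        return out.toString();
    }

    // Assumption: names are compared in lower case and end at the first
    // character that is not a letter or digit (space, '/', '>').
    static String tagName(String tag) {
        int start = 1; // skip '<'
        int pos = start;
        if (pos < tag.length() && tag.charAt(pos) == '/') {
            pos++; // keep the leading '/' of an end tag
        }
        while (pos < tag.length() && Character.isLetterOrDigit(tag.charAt(pos))) {
            pos++;
        }
        return tag.substring(start, pos).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(canonicalize("<p>Hello\r\n\tworld<br>next</p>"));
    }
}
```

Working on a flattened string keeps the sketch short; the real class is a FilterReader and applies its rules while the stream is being read.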

Field Summary
protected  String buffer
          A buffer for reading a whole tag.
protected  int lastChar
          The last character we read from the input stream.
protected  boolean lastWhitespace
          Marks whether the last character returned was a linebreak (whitespace following it is ignored for this).
 
Fields inherited from class java.io.FilterReader
in
 
Fields inherited from class java.io.Reader
lock
 
Constructor Summary
HTMLBreakFilterReader(Reader in)
          Creates a new reader.
 
Method Summary
protected  int getNextChar()
          Retrieves the next character.
 boolean markSupported()
          Tell whether this stream supports the mark() operation (this stream does not).
 int read()
          Read a single character.
 int read(char[] cbuf, int off, int len)
          Read characters into an array.
 long skip(long n)
          Skip characters.
 
Methods inherited from class java.io.FilterReader
close, mark, ready, reset
 
Methods inherited from class java.io.Reader
read
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

lastChar

protected int lastChar
The last character we read from the input stream.

lastWhitespace

protected boolean lastWhitespace
Marks whether the last character returned was a linebreak (whitespace following it is ignored for this).

buffer

protected String buffer
A buffer for reading a whole tag.
Constructor Detail

HTMLBreakFilterReader

public HTMLBreakFilterReader(Reader in)
Creates a new reader.
Parameters:
in - the reader to read from
Method Detail

markSupported

public boolean markSupported()
Tell whether this stream supports the mark() operation (this stream does not).
Overrides:
markSupported in class FilterReader
Returns:
always false
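
Because markSupported() always returns false, a caller that needs mark() and reset() can wrap the reader in a java.io.BufferedReader. The following is a minimal sketch of that pattern; MarkSketch, withMark and noMarkReader are hypothetical names, and the anonymous FilterReader is only a stand-in for a reader without mark support.

```java
import java.io.BufferedReader;
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class MarkSketch {

    // Returns the reader itself if it supports mark(), otherwise a BufferedReader around it.
    static Reader withMark(Reader r) {
        return r.markSupported() ? r : new BufferedReader(r);
    }

    // Stand-in for a filter whose markSupported() always returns false.
    static Reader noMarkReader(String text) {
        return new FilterReader(new StringReader(text)) {
            @Override
            public boolean markSupported() {
                return false;
            }
        };
    }

    public static void main(String[] args) throws IOException {
        Reader r = withMark(noMarkReader("<br>text"));
        r.mark(16);              // remember the current position
        int first = r.read();
        r.reset();               // go back to the marked position
        System.out.println((char) first + " == " + (char) r.read());
    }
}
```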

read

public int read(char[] cbuf,
                int off,
                int len)
         throws IOException
Read characters into an array. This method will block until some input is available, an I/O error occurs, or the end of the stream is reached.
Overrides:
read in class FilterReader
Parameters:
cbuf - destination buffer
off - the offset in the buffer the first character will be written to
len - the maximum number of characters to read
Returns:
the number of characters read, or -1 if the end of the stream has been reached
Throws:
IOException - if an I/O error occurs
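
The return value contract above leads to the usual drain loop: keep calling read(cbuf, off, len) until it returns -1, and use only the characters it reports. The following is a minimal sketch; BulkReadSketch and drain are hypothetical names, and a StringReader stands in for the filter so that the example is self-contained.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class BulkReadSketch {

    // Drains any Reader through read(char[], int, int) and returns everything it delivered.
    static String drain(Reader reader) throws IOException {
        char[] cbuf = new char[8]; // deliberately small to force several calls
        StringBuilder sb = new StringBuilder();
        int n;
        while ((n = reader.read(cbuf, 0, cbuf.length)) != -1) {
            sb.append(cbuf, 0, n); // only the first n entries are valid
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // StringReader is a stand-in; an HTMLBreakFilterReader would be passed the same way.
        try (Reader r = new StringReader("<p>first</p><p>second</p>")) {
            System.out.println(drain(r));
        }
    }
}
```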

read

public int read()
         throws IOException
Read a single character. This method will block until a character is available, an I/O error occurs, or the end of the stream is reached.
Overrides:
read in class FilterReader
Returns:
the character read, as an integer in the range 0 to 65535 (0x00-0xffff), or -1 if the end of the stream has been reached
Throws:
IOException - if an I/O error occurs

getNextChar

protected int getNextChar()
                   throws IOException
Retrieves the next character. Skips \r, \n and converts tabs to spaces.
Returns:
the next character or -1 if end of stream reached
Throws:
IOException - if an I/O error occurs
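
The skipping and conversion described for getNextChar can be sketched as follows. This is an illustration under assumptions, not the POND source: NextCharSketch is a hypothetical stand-in class that models only this one rule, and only read() is routed through it (read(char[], int, int) is not overridden here).

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical stand-in; models only the \r, \n and tab handling.
class NextCharSketch extends FilterReader {

    NextCharSketch(Reader in) {
        super(in);
    }

    // Returns the next character that is not \r or \n; a tab comes back as a space.
    protected int getNextChar() throws IOException {
        int c;
        do {
            c = in.read();
        } while (c == '\r' || c == '\n');
        return (c == '\t') ? ' ' : c; // -1 (end of stream) passes through unchanged
    }

    @Override
    public int read() throws IOException {
        return getNextChar();
    }

    public static void main(String[] args) throws IOException {
        try (Reader r = new NextCharSketch(new StringReader("a\r\n\tb"))) {
            StringBuilder sb = new StringBuilder();
            for (int c = r.read(); c != -1; c = r.read()) {
                sb.append((char) c);
            }
            System.out.println("[" + sb + "]");
        }
    }
}
```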

skip

public long skip(long n)
          throws IOException
Skip characters. This method will block until some characters are available, an I/O error occurs, or the end of the stream is reached.
Overrides:
skip in class FilterReader
Parameters:
n - the number of characters to skip
Returns:
the number of characters actually skipped
Throws:
IOException - if an I/O error occurs


Copyright 2001,2002 Michael Sonntag & Institute for Information Processing and Microprocessor Technology (FIM), Johannes-Kepler-University Linz, Altenbergerstr. 69, A-4040 Linz, Austria.