Agent System POND 1.2 (28.2.2002)

FIM.Util.WWW
Class WWWPageUtils

java.lang.Object
  |
  +--FIM.Util.WWW.WWWPageUtils

public class WWWPageUtils
extends Object

Utility functions for working with web pages: retrieving pages and submitting forms, with optional cookie support.

Version:
1.0, 1.7.2000
Author:
Michael Sonntag

Field Summary
static int MAX_REDIRECT_DEPTH
          Maximum number of redirects to follow (the page reached after MAX_REDIRECT_DEPTH redirects, i.e. page MAX_REDIRECT_DEPTH+1 in the chain, is treated as the final page).
 
Constructor Summary
WWWPageUtils()
           
 
Method Summary
static String canonicalize(String str)
          Takes a string of HTML as input and canonicalizes it.
static String fetchForm(String page, CookieStore cookies, String request, URL referer, boolean post)
          Send a form and retrieve the response.
static String fetchForm(URL page, CookieStore cookies, String request, URL referer, boolean post)
          Send a form and retrieve the response.
static String fetchPage(String page, CookieStore cookies)
          Retrieve a page and return it as a single string.
static String fetchPage(URL page, CookieStore cookies)
          Retrieve a page and return it as a single string.
static String stripTags(String str)
          Strips all tags from a string.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

MAX_REDIRECT_DEPTH

public static final int MAX_REDIRECT_DEPTH
Maximum number of redirects to follow (the page reached after MAX_REDIRECT_DEPTH redirects, i.e. page MAX_REDIRECT_DEPTH+1 in the chain, is treated as the final page).
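
The depth limit can be pictured with a small simulation. This is only a sketch under stated assumptions, not the class's implementation: RedirectSketch, finalPage and the nextLocation lookup are hypothetical names, and the lookup stands in for reading a redirect target from a real response.

```java
import java.util.function.Function;

final class RedirectSketch {
    /**
     * Follows at most maxDepth redirects starting from start.
     * nextLocation returns the redirect target of a page, or null if the
     * page does not redirect (hypothetical stand-in for an HTTP response).
     */
    static String finalPage(String start, Function<String, String> nextLocation, int maxDepth) {
        String current = start;
        for (int followed = 0; followed < maxDepth; followed++) {
            String next = nextLocation.apply(current);
            if (next == null) {
                break; // no further redirect: current is the final page
            }
            current = next;
        }
        // After maxDepth redirects the page reached is returned as final,
        // even if it would redirect again.
        return current;
    }
}
```

With a chain a -> b -> c -> d and a limit of 2, the sketch stops at c, the third page in the chain.
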
Constructor Detail

WWWPageUtils

public WWWPageUtils()
Method Detail

fetchPage

public static String fetchPage(String page,
                               CookieStore cookies)
                        throws MalformedURLException,
                               IOException
Retrieve a page and return it as a single string. Lines are separated by '\n'.
Parameters:
page - the page to fetch
cookies - the CookieStore to use (if null, received cookies are ignored and none are sent)
Returns:
the page as a single string
Throws:
IOException - if an error occurred while fetching the page
MalformedURLException - if the string could not be converted to a URL
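
What "a single string with lines separated by '\n'" means can be sketched as follows. This is an illustration only: PageTextSketch and readAsSingleString are hypothetical names, the Reader stands in for the page's response stream, and whether the real method appends a trailing '\n' is not stated here (this sketch does not).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

final class PageTextSketch {
    /** Reads all lines from source and joins them with '\n'. */
    static String readAsSingleString(Reader source) throws IOException {
        StringBuilder page = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(source)) {
            boolean first = true;
            String line;
            // readLine() strips "\n", "\r" and "\r\n" terminators alike,
            // so every line ending is normalized to a single '\n'.
            while ((line = reader.readLine()) != null) {
                if (!first) {
                    page.append('\n');
                }
                page.append(line);
                first = false;
            }
        }
        return page.toString();
    }
}
```
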

fetchPage

public static String fetchPage(URL page,
                               CookieStore cookies)
                        throws IOException
Retrieve a page and return it as a single string. Lines are separated by '\n'. Must be synchronized because fetchPage and fetchForm both change the setting made by HttpURLConnection.setFollowRedirects(), which is a static method and therefore applies to all connections.
Parameters:
page - the page to fetch
cookies - the CookieStore to use (if null, received cookies are ignored and none are sent)
Returns:
the page as a single string
Throws:
IOException - if an error occurred while fetching the page
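
Why a process-wide setting calls for synchronization can be sketched like this. It is not the class's code: FollowRedirectsGuard, withRedirectsDisabled and LOCK are hypothetical names, and the sketch assumes the goal is to switch automatic redirects off for the duration of one request and restore the previous value afterwards.

```java
import java.net.HttpURLConnection;
import java.util.function.Supplier;

final class FollowRedirectsGuard {
    // One shared lock, because setFollowRedirects() changes a static flag
    // that applies to every HttpURLConnection in the JVM.
    private static final Object LOCK = new Object();

    /** Runs action with automatic redirects disabled, then restores the old setting. */
    static <T> T withRedirectsDisabled(Supplier<T> action) {
        synchronized (LOCK) {
            boolean previous = HttpURLConnection.getFollowRedirects();
            HttpURLConnection.setFollowRedirects(false);
            try {
                return action.get();
            } finally {
                HttpURLConnection.setFollowRedirects(previous);
            }
        }
    }
}
```

Without the lock, two threads could interleave their set and restore calls and leave the flag in the wrong state for each other's requests.
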

fetchForm

public static String fetchForm(String page,
                               CookieStore cookies,
                               String request,
                               URL referer,
                               boolean post)
                        throws MalformedURLException,
                               IOException
Send a form and retrieve the response.
Parameters:
page - the URL to submit the form to. Must include the full action (e.g. "http://www.acme.com/doIt"), but not the parameters (e.g. "?do=action&param=now+and+then")
cookies - the CookieStore to use (if null, received cookies are ignored and none are sent)
request - the form data to send (i.e. the content of the inputs; e.g. "?do=action&param=now+and+then")
referer - the URL of the page containing the form (request property "Referer") (not set if null)
post - if true, submit method is POST, otherwise GET
Returns:
the page as a single string
Throws:
IOException - if an error occurred while fetching the page
MalformedURLException - if the string could not be converted to a URL
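
A request string in the form shown above can be built from name/value pairs with the standard URLEncoder. This is a sketch only: FormRequestSketch and encode are hypothetical helpers, not part of WWWPageUtils. The documented example starts with "?"; whether that prefix is required is not stated here, so the sketch leaves it off and the caller can prepend it if needed.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Map;

final class FormRequestSketch {
    /** Joins the inputs as name=value pairs separated by '&', URL-encoding both parts. */
    static String encode(Map<String, String> inputs) {
        StringBuilder request = new StringBuilder();
        for (Map.Entry<String, String> input : inputs.entrySet()) {
            if (request.length() > 0) {
                request.append('&');
            }
            request.append(encodePart(input.getKey()))
                   .append('=')
                   .append(encodePart(input.getValue()));
        }
        return request.toString();
    }

    private static String encodePart(String text) {
        try {
            // URLEncoder uses form encoding, so a space becomes '+'.
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }
}
```

Given a LinkedHashMap with do=action and param="now and then", the result could then be passed as the request argument, for example fetchForm("http://www.acme.com/doIt", null, request, null, true) for a POST without cookies or referer.
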

fetchForm

public static String fetchForm(URL page,
                               CookieStore cookies,
                               String request,
                               URL referer,
                               boolean post)
                        throws IOException
Send a form and retrieve the response.
Parameters:
page - the URL to submit the form to. Must include the full action (e.g. "http://www.acme.com/doIt"), but not the parameters (e.g. "?do=action&param=now+and+then")
cookies - the CookieStore to use (if null, received cookies are ignored and none are sent)
request - the form data to send (i.e. the content of the inputs; e.g. "?do=action&param=now+and+then")
referer - the URL of the page containing the form (request property "Referer") (not set if null)
post - if true, submit method is POST, otherwise GET
Returns:
the page as a single string
Throws:
IOException - if an error occurred while fetching the page
MalformedURLException - if the string could not be converted to a URL

stripTags

public static String stripTags(String str)
Strips all tags from a string. This removes everything between '<' and '>'. Converts <br>, <p> and </p> to "\n" and removes all other linebreaks and tabs. Also converts character entities (&nbsp; = " ", &auml; = "ä", ...) and numeric character references such as "&#160;".
Returns:
the input string stripped of all HTML tags
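
The behaviour described above can be approximated with regular expressions. This is a minimal sketch, not the library's implementation: StripTagsSketch is a hypothetical class, its entity table covers only a handful of names, and it assumes that no '>' appears inside attribute values.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class StripTagsSketch {
    private static final Pattern BREAK_TAG = Pattern.compile("(?i)<(br|p|/p)(\\s[^>]*)?/?>");
    private static final Pattern ANY_TAG = Pattern.compile("<[^>]*>");
    private static final Pattern ENTITY = Pattern.compile("&(#\\d{1,7}|[A-Za-z]+);");
    private static final Map<String, String> NAMED = new HashMap<>();
    static {
        // Small illustrative subset; a full table would be much longer.
        NAMED.put("nbsp", " ");
        NAMED.put("auml", "\u00e4");
        NAMED.put("amp", "&");
        NAMED.put("lt", "<");
        NAMED.put("gt", ">");
        NAMED.put("quot", "\"");
    }

    static String stripTags(String html) {
        // 1. Drop existing linebreaks and tabs so that only tags produce '\n'.
        String text = html.replaceAll("[\\r\\n\\t]", "");
        // 2. <br>, <p> and </p> become linebreaks.
        text = BREAK_TAG.matcher(text).replaceAll("\n");
        // 3. Every remaining tag is removed.
        text = ANY_TAG.matcher(text).replaceAll("");
        // 4. Decode entities in one pass to avoid double decoding.
        Matcher m = ENTITY.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(out, Matcher.quoteReplacement(decode(m.group(1), m.group())));
        }
        m.appendTail(out);
        return out.toString();
    }

    private static String decode(String ref, String whole) {
        if (ref.startsWith("#")) {
            int codePoint = Integer.parseInt(ref.substring(1));
            return Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint)) : whole;
        }
        String replacement = NAMED.get(ref);
        return replacement != null ? replacement : whole; // unknown names stay as-is
    }
}
```

Note that this sketch decodes "&#160;" to the character U+00A0 rather than a plain space; which of the two the real method produces is not specified here.
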

canonicalize

public static String canonicalize(String str)
Takes a string of HTML as input and canonicalizes it. All linebreaks and multiple spaces within tags are removed. E.g. "<\n / p>" will be changed to "</p>".
Parameters:
str - the input string
Returns:
the canonicalized string
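
One way to picture the canonicalization is the sketch below. The exact rules of the real method are not documented here, so this is an assumption-laden illustration: CanonicalizeSketch is a hypothetical class that collapses whitespace runs inside each tag to one space, trims the tag's content, and removes whitespace after a leading '/'. It also assumes that no '<' or '>' appears inside attribute values.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CanonicalizeSketch {
    private static final Pattern TAG = Pattern.compile("<([^<>]*)>");

    static String canonicalize(String html) {
        Matcher m = TAG.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Collapse linebreaks and repeated whitespace inside the tag.
            String inner = m.group(1).replaceAll("\\s+", " ").trim();
            if (inner.startsWith("/")) {
                // Closing tag: drop whitespace between '/' and the name.
                inner = "/" + inner.substring(1).trim();
            }
            m.appendReplacement(out, Matcher.quoteReplacement("<" + inner + ">"));
        }
        m.appendTail(out); // text outside tags is left untouched
        return out.toString();
    }
}
```

For the documented example, canonicalize("<\n / p>") yields "</p>" in this sketch.
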


Copyright 2001,2002 Michael Sonntag & Institute for Information Processing and Microprocessor Technology (FIM), Johannes-Kepler-University Linz, Altenbergerstr. 69, A-4040 Linz, Austria.