org.das2.util.filesystem.HtmlUtil

HTML utilities, such as getting a directory listing (where a "file" is a link below the directory being listed) and reading a URL into a String.

HtmlUtil( )


checkRedirect

checkRedirect( java.net.URLConnection urlConnection ) → java.net.URLConnection

Check for 301, 302, or 303 redirects, and return a new connection when one is found. This should be called immediately before the urlConnection.connect call, because it must connect to get the response code.

Parameters

urlConnection - if an HttpURLConnection, check for 301, 302, or 303; any other connection is returned unchanged.

Returns:

a connection, typically the same one as passed in.

See Also:

HttpUtil#checkRedirect(java.net.URLConnection)
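
The redirect check described above can be sketched roughly as follows. This is a minimal illustration using only the JDK, not the HtmlUtil source; the class name RedirectSketch and the helper isRedirect are hypothetical, and the handling of a missing Location header is an assumption.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical sketch; not the das2 implementation.
final class RedirectSketch {

    /** True for the three status codes named in the description. */
    static boolean isRedirect(int responseCode) {
        return responseCode == HttpURLConnection.HTTP_MOVED_PERM   // 301
            || responseCode == HttpURLConnection.HTTP_MOVED_TEMP   // 302
            || responseCode == HttpURLConnection.HTTP_SEE_OTHER;   // 303
    }

    /**
     * Return a connection to the redirect target when the server answers
     * with 301, 302, or 303; otherwise return the connection passed in.
     */
    static URLConnection checkRedirect(URLConnection connection) throws IOException {
        if (!(connection instanceof HttpURLConnection)) {
            return connection; // file:, ftp:, etc. carry no HTTP status code
        }
        HttpURLConnection http = (HttpURLConnection) connection;
        http.setInstanceFollowRedirects(false); // must be set before connecting
        int code = http.getResponseCode();      // this call connects
        if (!isRedirect(code)) {
            return connection;
        }
        String location = http.getHeaderField("Location");
        if (location == null) {
            return connection; // assumption: leave a malformed redirect to the caller
        }
        // Resolve a relative Location against the URL that was requested.
        URL target = URI.create(http.getURL().toString()).resolve(location).toURL();
        http.disconnect();
        return target.openConnection();
    }
}
```

Because getResponseCode connects, any request properties must be set on the connection before this check is made.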




consumeStream

consumeStream( java.io.InputStream err ) → void

Nice clients consume both the stderr and stdout coming from websites, that is, both the error stream and the normal response stream. This reads everything off of the stream and closes it. http://docs.oracle.com/javase/1.5.0/docs/guide/net/http-keepalive.html advises "do not abandon connection".

Parameters

err - the input stream

Returns:

void (returns nothing)

See Also:

HttpUtil#consumeStream(java.io.InputStream)
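
Reading a stream to its end and closing it might look like the sketch below. The class name ConsumeStreamSketch is hypothetical, and unlike the method documented here, this version returns the number of bytes discarded so that its behavior can be checked.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch; not the das2 implementation.
final class ConsumeStreamSketch {

    /**
     * Read and discard everything on the stream, then close it.
     * Returns the number of bytes discarded (an addition for illustration;
     * the documented method returns void).
     */
    static long consume(InputStream in) throws IOException {
        if (in == null) {
            return 0; // e.g. HttpURLConnection.getErrorStream() may return null
        }
        long total = 0;
        byte[] buffer = new byte[2048];
        try (InputStream stream = in) { // closed even if a read fails
            int n;
            while ((n = stream.read(buffer)) != -1) {
                total += n;
            }
        }
        return total;
    }
}
```

A typical call site would pass the result of HttpURLConnection.getErrorStream() after a failed request, so the underlying connection can be reused.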




getDirectoryListing

getDirectoryListing( java.net.URL url, java.io.InputStream urlStream ) → java.net.URL[]

Get the listing of the web directory, returning links that are "under" the given URL. Note this does not handle off-line modes where we need to log into a website first, as is often the case on hotel networks. This was refactored to support caching of listings by simply writing the content to disk.

Parameters

url - the address.
urlStream - stream containing the URL content, which must be UTF-8 (or US-ASCII)

Returns:

list of URLs referred to in the page.


getDirectoryListing( java.net.URL url, java.io.InputStream urlStream, boolean childCheck ) → java.net.URL[]
getDirectoryListing( java.net.URL url ) → java.net.URL[]
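
The idea of keeping only links that are "under" the directory can be illustrated with a simple prefix test. This is a sketch under stated assumptions: the names ChildLinkSketch, isUnder, and children are hypothetical, it works on java.net.URI rather than java.net.URL, and skipping links that carry a query string (such as the column-sort links some servers emit) is a choice made for this example, not something the documentation above states.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the das2 implementation.
final class ChildLinkSketch {

    /** True when link is strictly below directory, judged by string prefix. */
    static boolean isUnder(URI directory, URI link) {
        String base = directory.toString();
        if (!base.endsWith("/")) {
            base = base + "/";
        }
        if (link.getQuery() != null) {
            return false; // assumption: sort/filter links are not files
        }
        String s = link.toString();
        return s.startsWith(base) && s.length() > base.length();
    }

    /** Keep the links under directory, in page order, without duplicates. */
    static List<URI> children(URI directory, List<URI> links) {
        List<URI> result = new ArrayList<>();
        for (URI link : links) {
            if (isUnder(directory, link) && !result.contains(link)) {
                result.add(link);
            }
        }
        return result;
    }
}
```

With this test, a link to the parent directory or to another host is dropped, while both files and subdirectories below the listed directory are kept.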

getInputStream

getInputStream( java.net.URL url ) → java.io.InputStream

Get the input stream, following redirects if a 301 or 302 is encountered. The scientist may be prompted for a password, but only if "user@" is in the URL. Note this does not explicitly close the connection to the server, and Java may not know to release the resources. TODO: fix this by wrapping the input stream and closing the connection when the stream is closed, as was done in Autoplot's DataSetURI.downloadResourceAsTempFile.

Parameters

url - a URL

Returns:

input stream

See Also:

org.autoplot.datasource.DataSetURI#downloadResourceAsTempFile
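
The fix proposed in the TODO above, wrapping the stream so that closing it also releases the connection, could be sketched as follows. The class DisconnectingInputStream and its open helper are hypothetical names introduced for illustration; this is not the code in DataSetURI.downloadResourceAsTempFile.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;

// Hypothetical sketch; not the das2 or Autoplot implementation.
final class DisconnectingInputStream extends FilterInputStream {

    private final Runnable onClose;
    private boolean closed = false;

    DisconnectingInputStream(InputStream in, Runnable onClose) {
        super(in);
        this.onClose = onClose;
    }

    /** Wrap the connection's stream so that close() also disconnects. */
    static InputStream open(HttpURLConnection http) throws IOException {
        return new DisconnectingInputStream(http.getInputStream(), http::disconnect);
    }

    @Override
    public void close() throws IOException {
        if (closed) {
            return; // run the callback only once
        }
        closed = true;
        try {
            super.close();
        } finally {
            onClose.run(); // release the connection even if close() fails
        }
    }
}
```

Taking a Runnable rather than the connection itself keeps the wrapper independent of HTTP, so the same class could release other resources as well.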




getLinks

getLinks( java.net.URL url, String content ) → java.util.List

Return the links found in the content, using url as the context for resolving relative links.

Parameters

url - null, or the URL to use as the context.
content - the html content.

Returns:

a list of URLs.
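
One way to pull links out of HTML content and resolve them against a context is sketched below. It is an illustration only: the class name LinkExtractorSketch is hypothetical, it returns java.net.URI values rather than URLs, and the regular expression handles only quoted href attributes on anchor tags, which is a simplification and not necessarily how HtmlUtil parses the page.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not the das2 implementation.
final class LinkExtractorSketch {

    // Matches <a ... href="..."> or <a ... href='...'>, ignoring case.
    private static final Pattern HREF = Pattern.compile(
        "<a\\s[^>]*?href\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')",
        Pattern.CASE_INSENSITIVE);

    /** Return the links in content; relative links are resolved against context when it is not null. */
    static List<URI> getLinks(URI context, String content) {
        List<URI> links = new ArrayList<>();
        Matcher m = HREF.matcher(content);
        while (m.find()) {
            String href = m.group(1) != null ? m.group(1) : m.group(2);
            try {
                URI link = URI.create(href.trim());
                links.add(context == null ? link : context.resolve(link));
            } catch (IllegalArgumentException ex) {
                // Skip hrefs that are not valid URIs (for example, unescaped spaces).
            }
        }
        return links;
    }
}
```

When the context is null, relative links are returned as they appear in the page.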



getMetadata

getMetadata( java.net.URL url, java.util.Map props ) → java.util.Map

Return the metadata about a URL. This will support http, https, and ftp, and will check for redirects. This will allow caching of HEAD requests.

Parameters

url - ftp, https, or http URL
props - a java.util.Map

Returns:

the metadata

See Also:

HttpUtil#getMetadata(java.net.URL, java.util.Map)
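
For http and https, metadata of this kind is usually obtained with a HEAD request. The sketch below shows the general technique only; the class name HeadMetadataSketch, the flatten helper, and the "responseCode" key are hypothetical, the keys actually returned by getMetadata are not documented here, and ftp and caching are not covered.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not the das2 implementation.
final class HeadMetadataSketch {

    /** Collapse multi-valued headers into one comma-separated string per name. */
    static Map<String, String> flatten(Map<String, List<String>> headers) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : headers.entrySet()) {
            if (e.getKey() == null || e.getValue() == null || e.getValue().isEmpty()) {
                continue; // HttpURLConnection stores the status line under a null key
            }
            result.put(e.getKey(), String.join(", ", e.getValue()));
        }
        return result;
    }

    /** Issue a HEAD request and return the response headers; http and https only. */
    static Map<String, String> head(URL url) throws IOException {
        HttpURLConnection http = (HttpURLConnection) url.openConnection();
        http.setRequestMethod("HEAD"); // headers only, no body
        try {
            int code = http.getResponseCode(); // this call connects
            Map<String, String> meta = flatten(http.getHeaderFields());
            meta.put("responseCode", String.valueOf(code)); // hypothetical key
            return meta;
        } finally {
            http.disconnect();
        }
    }
}
```

A map like this is small enough to cache per URL, which is presumably what makes caching of HEAD requests practical.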




isDirectory

isDirectory( java.net.URL url ) → boolean

Parameters

url - a URL

Returns:

boolean



readToString

readToString( java.net.URL url ) → String

Read the contents of the URL into a string, assuming UTF-8 encoding.

Parameters

url - a URL

Returns:

a String
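
Reading a URL into a String as UTF-8 can be sketched with the JDK alone, as below. The class name ReadToStringSketch and the InputStream overload are hypothetical additions for illustration; the sketch does not handle redirects or authentication.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; not the das2 implementation.
final class ReadToStringSketch {

    /** Read the whole stream and decode it as UTF-8. The stream is not closed here. */
    static String readToString(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        // Decode once at the end so multi-byte characters are never split across reads.
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Open the URL, read its content as UTF-8, and close the stream. */
    static String readToString(URL url) throws IOException {
        try (InputStream in = url.openStream()) {
            return readToString(in);
        }
    }
}
```

Collecting the bytes first and decoding them in one step avoids corrupting a character whose bytes straddle two buffer reads.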
