org.das2.qds.util.AsciiParser
Class for reading ASCII tables into a QDataSet. This parses a file by breaking
it up into records, and passing the record off to a delegate record parser.
The record parser then breaks up the record into fields, and each field is
parsed by a delegate field parser. Each column of the table has a Unit, field name,
and field label associated with it.
Examples of record parsers include
DelimParser, which splits the record by a delimiter such as a tab or comma,
RegexParser, which processes each record with a regular expression to get the fields,
and FixedColumnsParser, which splits the record by character positions.
Examples of field parsers include DOUBLE_PARSER, which parses the value
as a double, and UNITS_PARSER, which uses the Unit attached to the column
to interpret the value.
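As a rough illustration of the record-parser/field-parser split described above, the sketch below splits one record with a delimiter regex and hands each field to a per-column parser. It is not the das2 implementation. The class RecordFieldSketch, the method parseRecord, and the use of java.util.function.ToDoubleFunction as the field-parser type are hypothetical stand-ins. Only the idea of DOUBLE_PARSER wrapping Double.parseDouble comes from this page.

```java
import java.util.List;
import java.util.function.ToDoubleFunction;

// Hypothetical sketch of the record-parser / field-parser split; not the das2 API.
class RecordFieldSketch {

    // Stand-in for DOUBLE_PARSER: parse the field text with Double.parseDouble.
    static final ToDoubleFunction<String> DOUBLE_PARSER = Double::parseDouble;

    // "Record parser": split one line on a delimiter regex, then delegate each
    // field to the field parser registered for that column.
    static double[] parseRecord(String line, String delimRegex,
                                List<ToDoubleFunction<String>> fieldParsers) {
        String[] fields = line.trim().split(delimRegex);
        if (fields.length != fieldParsers.size()) {
            throw new IllegalArgumentException("expected " + fieldParsers.size()
                    + " fields but found " + fields.length);
        }
        double[] values = new double[fields.length];
        for (int i = 0; i < fields.length; i++) {
            values[i] = fieldParsers.get(i).applyAsDouble(fields[i].trim());
        }
        return values;
    }
}
```

A UNITS_PARSER-like field parser could be slotted in the same way, as a function that asks the column's Unit to interpret the string.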
When the first record with the correct number of fields is found but is not
parseable, we look for field labels and units.
The skipLines property tells the parser to skip a given number of header lines
before attempting to parse the record. Also, commentPrefix identifies lines to be
ignored. In both the header and the comments, we look for propertyPattern, and
when a property is matched, the corresponding property is set on the builder.
Two Patterns, NAME_COLON_VALUE_PATTERN and NAME_EQUAL_VALUE_PATTERN, are
provided for convenience.
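A minimal sketch of how header or comment lines might be scanned with a two-group property pattern. The two Pattern constants here are hypothetical stand-ins for NAME_COLON_VALUE_PATTERN and NAME_EQUAL_VALUE_PATTERN, whose actual regular expressions are not given on this page. As with setPropertyPattern, group 1 is the name, group 2 is the value, and values are kept as Strings. PropertyPatternSketch and scan are illustrative names only.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of property scanning; the real NAME_COLON_VALUE_PATTERN
// and NAME_EQUAL_VALUE_PATTERN may use different regular expressions.
class PropertyPatternSketch {

    // group(1) is the name, group(2) the value; surrounding spaces are dropped.
    static final Pattern NAME_COLON_VALUE = Pattern.compile("\\s*([^:]+?)\\s*:\\s*(.+?)\\s*");
    static final Pattern NAME_EQUAL_VALUE = Pattern.compile("\\s*([^=]+?)\\s*=\\s*(.+?)\\s*");

    // lines: header or comment lines only. The comment prefix (if present) is
    // stripped, and every line matching the pattern contributes one property.
    static Map<String, String> scan(List<String> lines, String commentPrefix, Pattern propertyPattern) {
        Map<String, String> props = new LinkedHashMap<>();
        for (String line : lines) {
            String s = line;
            if (commentPrefix != null && s.startsWith(commentPrefix)) {
                s = s.substring(commentPrefix.length());
            }
            Matcher m = propertyPattern.matcher(s);
            if (m.matches()) {
                props.put(m.group(1), m.group(2)); // values are not parsed
            }
        }
        return props;
    }
}
```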
Adapted to QDataSet model, Jeremy, May 2007.
AsciiParser( )
Creates a new instance. The instance must be configured
before any files can be parsed.
NAME_COLON_VALUE_PATTERN
pattern for name:value.
NAME_EQUAL_VALUE_PATTERN
pattern for name=value.
PROPERTY_FIELD_NAMES
PROPERTY_FILE_HEADER
PROPERTY_FIRST_RECORD
PROPERTY_FIELD_PARSER
DELIM_COMMA
DELIM_TAB
DELIM_WHITESPACE
UNIT_UTC
Convenient unit for parsing UTC times.
PROP_HEADERDELIMITER
DOUBLE_PARSER
parses the field using Double.parseDouble, Java's double parser.
UNITS_PARSER
delegates to the unit object set for this field to parse the data.
ENUMERATION_PARSER
uses the EnumerationUnits for the field to create a Datum.
PROP_VALIDMIN
PROP_VALIDMAX
addPropertyChangeListener
addPropertyChangeListener( java.beans.PropertyChangeListener l ) → void
Adds a PropertyChangeListener to the listener list.
Parameters
l - The listener to add.
Returns:
void (returns nothing)
getDelimParser
getDelimParser( int fieldCount, String delim ) → org.das2.qds.util.AsciiParser.DelimParser
provides more control to external code by offering a way to assert that
an N-column DelimParser should be used.
Parameters
fieldCount - the number of columns, N
delim - the delimiter pattern, such as "," or "\s+"
Returns:
the DelimParser.
getFieldCount
getFieldCount( ) → int
return the number of fields in each record. Note the RecordParsers
also have a fieldCount, which should be equal to this. This allows them
to be independent of the parser.
Returns:
an int
getFieldIndex
getFieldIndex( String string ) → int
returns the index of the field. The field may be identified by its name, by
"field0", or by "0", etc. Returns -1 when the column is not identified.
Parameters
string - the label for the field, such as "field2" or "time"
Returns:
-1 or the index of the field.
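The lookup described above could be sketched as follows, assuming names are checked first, then "fieldN", then a bare index. FieldIndexSketch is a hypothetical helper operating on a plain String[] of names, not the parser's own method.

```java
// Hypothetical sketch of resolving a column reference to an index.
class FieldIndexSketch {

    // Returns the index for "time", "field2" or "2", or -1 if not identified.
    static int getFieldIndex(String[] fieldNames, String s) {
        for (int i = 0; i < fieldNames.length; i++) {
            if (fieldNames[i].equals(s)) return i;          // exact name match
        }
        String digits = s.startsWith("field") ? s.substring(5) : s;
        if (!digits.isEmpty() && digits.chars().allMatch(Character::isDigit)) {
            try {
                int i = Integer.parseInt(digits);
                if (i < fieldNames.length) return i;        // "fieldN" or "N"
            } catch (NumberFormatException ex) {
                // too many digits to be a column index
            }
        }
        return -1;
    }
}
```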
getFieldLabels
getFieldLabels( ) → String[]
return the labels found for each field. If a label wasn't found,
then the name is returned.
Returns:
a String[]
getFieldNames
getFieldNames( ) → String[]
return the name of each field. field0, field1, ... are the default names when
names are not discovered in the table. Changing the array will not affect
internal representation.
Returns:
a String[]
getFieldUnits
getFieldUnits( ) → String[]
return the units that were associated with the field. This might also be
the channel label for spectrograms.
In "field0(str)" or "field0[str]", this is str.
Elements may be null if no units were found.
Returns:
a String[]
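A sketch of pulling str out of a label such as "field0(str)" or "field0[str]". The regular expression and the class FieldUnitsSketch are assumptions for illustration; the parser's own header handling may accept other forms.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: a name followed by units in (...) or [...], e.g. "Bx(nT)".
class FieldUnitsSketch {

    private static final Pattern NAME_UNITS =
            Pattern.compile("\\s*([^(\\[]+?)\\s*(?:\\(([^)]*)\\)|\\[([^\\]]*)\\])\\s*");

    // Returns the text inside the parentheses or brackets, or null if absent.
    static String unitsOf(String label) {
        Matcher m = NAME_UNITS.matcher(label);
        if (!m.matches()) return null;
        return m.group(2) != null ? m.group(2) : m.group(3);
    }
}
```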
getFillValue
getFillValue( ) → double
return the fillValue. Numbers that parse to this value are considered
to be fill. Note that validMin and validMax may be used as well.
Returns:
Value of property fillValue.
getHeaderDelimiter
getHeaderDelimiter( ) → String
get the header delimiter
Returns:
the header delimiter.
getRecordParser
getRecordParser( ) → org.das2.qds.util.AsciiParser.RecordParser
Getter for property recordParser.
Returns:
Value of property recordParser.
getRegexForFormat
getRegexForFormat( String format ) → String
Convert FORTRAN (F77) style format to C-style format specifiers.
Parameters
format - for example "%5d%5d%9f%s"
Returns:
for example "d5,d5,f9,a"
See Also:
org.autoplot.metatree.MetadataUtil#normalizeFormatSpecifier
getRegexParser
getRegexParser( String regex ) → org.das2.qds.util.AsciiParser.RegexParser
return a regex parser for the given regular expression. Groups are used
for the fields, for example getRegexParser( 'X (\d+) (\d+)' ) would
parse lines like "X 00005 00006".
Parameters
regex - the regular expression, with one capturing group per field
Returns:
the regex parser
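The group-per-field idea can be sketched with java.util.regex directly. This assumes the whole record must match (Matcher.matches()); whether the real RegexParser matches the full line or searches within it is not stated here, and RegexRecordSketch is a hypothetical name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: each capturing group of the regex becomes one field.
class RegexRecordSketch {

    // Returns one String per group, or null if the record does not match.
    static String[] fields(Pattern regex, String record) {
        Matcher m = regex.matcher(record);
        if (!m.matches()) return null;
        String[] result = new String[m.groupCount()];
        for (int i = 0; i < result.length; i++) {
            result[i] = m.group(i + 1);   // group 0 is the whole match
        }
        return result;
    }
}
```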
getRegexParserForFormat
getRegexParserForFormat( String format ) → org.das2.qds.util.AsciiParser.RegexParser
returns a RegexParser for the given format. See the private constructor
TimeParser(String formatString, Map fieldHandlers), which is very similar.
Example formats:
- "%5d%5d%9f%s"
- "d5,d5,f9,a"
Parameters
format - a String
Returns:
an org.das2.qds.util.AsciiParser.RegexParser
See Also:
org.das2.datum.TimeParser
getRichFields
getRichFields( ) → java.util.Map
returns the high rank rich fields in a map from NAME to LABEL.
NAME:>fieldX< or NAME:>fieldX-fieldY<
Returns:
the high rank rich fields in a map from NAME to LABEL.
getUnits
getUnits( int index ) → Units
Indexed getter for property units.
Parameters
index - Index of the property.
Returns:
Value of the property at index.
getValidMax
getValidMax( ) → double
get the maximum valid value for any field.
Returns:
the validMax
getValidMin
getValidMin( ) → double
get the minimum valid value for any field.
Returns:
validMin
guessDelimParser
guessDelimParser( String line ) → org.das2.qds.util.AsciiParser.DelimParser
Parameters
line - a String
Returns:
org.das2.qds.util.AsciiParser.DelimParser
guessDelimParser( String line, int lineNumber ) → org.das2.qds.util.AsciiParser.DelimParser
guessFieldCount
guessFieldCount( String filename ) → int
return the field count that would result in the largest number of records parsed. The
entire file is scanned, and for each line the number of decimal fields is counted. At the end
of the scan, the fieldCount with the highest record count is returned.
Parameters
filename - the file name, a local file opened with a FileReader
Returns:
the apparent field count.
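A sketch of the scan described above, working on lines already read into a List rather than on a file. What counts as a decimal field is an assumption here: tokens separated by whitespace or commas that Double.parseDouble accepts. Lines with no decimal fields are ignored, and ties are broken arbitrarily. FieldCountSketch is hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of choosing the most common decimal-field count.
class FieldCountSketch {

    // Counts whitespace- or comma-separated tokens that parse as doubles.
    static int decimalFieldCount(String line) {
        int n = 0;
        for (String tok : line.trim().split("[\\s,]+")) {
            try {
                Double.parseDouble(tok);
                n++;
            } catch (NumberFormatException ex) {
                // not a decimal field
            }
        }
        return n;
    }

    // Returns the field count shared by the largest number of lines.
    static int guessFieldCount(List<String> lines) {
        Map<Integer, Integer> recordsPerCount = new HashMap<>();
        for (String line : lines) {
            int n = decimalFieldCount(line);
            if (n > 0) recordsPerCount.merge(n, 1, Integer::sum);
        }
        int best = 0;
        int bestRecords = 0;
        for (Map.Entry<Integer, Integer> e : recordsPerCount.entrySet()) {
            if (e.getValue() > bestRecords) {
                best = e.getKey();
                bestRecords = e.getValue();
            }
        }
        return best;
    }
}
```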
guessLengthForFormat
guessLengthForFormat( String format ) → int
return the total width of the format specifier, for example %30d -> 30 and %30d%5f -> 35.
TODO: consider String.format(format,1) or String.format(format,1.0).
Parameters
format - a String
Returns:
an int
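A minimal sketch of the width arithmetic, assuming C-style specifiers of the form %[flags][width][.precision]conversion and summing only the width. A specifier without a width, such as %s, contributes 0. FormatLengthSketch is a hypothetical helper, not the parser's code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: sum the widths of the %-specifiers in a format string.
class FormatLengthSketch {

    // %[flags][width][.precision]conversion; group(1) captures the width.
    private static final Pattern SPEC = Pattern.compile("%[-+ 0#]*(\\d*)(?:\\.\\d+)?[a-zA-Z]");

    static int guessLengthForFormat(String format) {
        Matcher m = SPEC.matcher(format);
        int total = 0;
        while (m.find()) {
            if (!m.group(1).isEmpty()) total += Integer.parseInt(m.group(1));
        }
        return total;
    }
}
```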
guessSkipAndDelimParser
guessSkipAndDelimParser( String filename ) → org.das2.qds.util.AsciiParser.DelimParser
read in records, allowing for a header of non-records before
guessing the delim parser. This will return a reference to the
DelimParser and set skipLines. DelimParser header field is set as well.
One must set the record parser explicitly.
Parameters
filename - a String
Returns:
the record parser to use, or null if no records are found.
guessSkipLines
guessSkipLines( String filename, org.das2.qds.util.AsciiParser.RecordParser recParser ) → int
try to figure out how many lines to skip by looking for the line where
the number of fields becomes stable.
Parameters
filename - a String
recParser - an AsciiParser.RecordParser
Returns:
an int
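One way to make "the number of fields becomes stable" concrete is sketched below: return the first line whose field count is positive and repeats on the following lines. The run length, the delimiter, and the class SkipLinesSketch are assumptions; the actual heuristic may differ.

```java
import java.util.List;

// Hypothetical sketch of locating the first line of a stable field count.
class SkipLinesSketch {

    // Index of the first line whose positive field count repeats on the
    // next (run - 1) lines, or -1 if no such line exists.
    static int guessSkipLines(List<String> lines, String delimRegex, int run) {
        int[] counts = new int[lines.size()];
        for (int i = 0; i < counts.length; i++) {
            String s = lines.get(i).trim();
            counts[i] = s.isEmpty() ? 0 : s.split(delimRegex).length;
        }
        for (int i = 0; i + run <= counts.length; i++) {
            boolean stable = counts[i] > 0;
            for (int j = i + 1; stable && j < i + run; j++) {
                stable = counts[j] == counts[i];
            }
            if (stable) return i;
        }
        return -1;
    }
}
```

Note that a column-name line with the same field count as the data would be reported as the start; the overview above describes looking for labels in such a first, unparseable record.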
isHeader
isHeader( int iline, String lastLine, String thisLine, int recCount ) → boolean
returns true if the line is a header or comment.
Parameters
iline - the line number in the file, starting with 0.
lastLine - the last line read.
thisLine - the line we are testing.
recCount - the number of records successfully read.
Returns:
true if the line is a header line.
isIso8601Time
isIso8601Time( String s ) → boolean
quick-n-dirty check to see if a string appears to be an ISO8601 time.
minimally 2000-002T00:00, but also 2000-01-01T00:00:00Z etc.
Note that external code may explicitly indicate that the field is a time;
this check is just to catch things that are obviously times.
Parameters
s - a String
Returns:
true if this is clearly an ISO time.
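A quick-n-dirty check in the same spirit, sketched with a single regular expression that accepts the two forms named above (day-of-year and year-month-day), optionally followed by seconds, a fraction and a trailing Z. The pattern and the class IsoTimeSketch are assumptions; this sketch deliberately rejects date-only strings and is not the parser's actual test.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of an "obviously an ISO8601 time" test.
class IsoTimeSketch {

    // yyyy-mm-dd or yyyy-ddd, then 'T' and hh:mm, with optional :ss[.fff] and 'Z'.
    private static final Pattern ISO = Pattern.compile(
            "\\d{4}-(?:\\d{2}-\\d{2}|\\d{3})T\\d{2}:\\d{2}(?::\\d{2}(?:\\.\\d+)?)?Z?");

    static boolean isIso8601Time(String s) {
        return ISO.matcher(s.trim()).matches();
    }
}
```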
isKeepFileHeader
isKeepFileHeader( ) → boolean
Getter for property keepHeader.
Returns:
Value of property keepHeader.
isRichHeader
isRichHeader( String header ) → boolean
return true if the header appears to contain JSON code which could be
interpreted as a "Rich Header" (a.k.a. JSONHeadedASCII). This is
a very simple test, which looks for "#{" and "#}" with a colon
between them.
Parameters
header - string containing the commented header.
Returns:
true if parsing as a Rich Header should be attempted.
See Also:
https://github.com/JSONheadedASCII/examples
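A sketch of the simple test described above, assuming the colon must fall between the first "#{" and the next "#}". RichHeaderSketch is hypothetical, and it does not validate the JSON.

```java
// Hypothetical sketch of the "#{ ... : ... #}" check; no JSON parsing is done.
class RichHeaderSketch {

    static boolean isRichHeader(String header) {
        int open = header.indexOf("#{");
        if (open < 0) return false;
        int close = header.indexOf("#}", open + 2);   // closing marker after the opening one
        if (close < 0) return false;
        return header.substring(open + 2, close).indexOf(':') >= 0;
    }
}
```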
isRichHeader( ) → boolean
newParser
newParser( int fieldCount ) → org.das2.qds.util.AsciiParser
creates a parser with fieldCount fields, named field0, field1, ..., field(fieldCount-1).
Parameters
fieldCount - the number of fields
Returns:
the file parser
newParser( String[] fieldNames ) → org.das2.qds.util.AsciiParser
readFile
readFile( String filename, ProgressMonitor mon ) → org.das2.qds.WritableDataSet
Parse the file using the current settings.
Parameters
filename - the file to read
mon - a monitor
Returns:
a rank 2 dataset.
readFirstParseableRecord
readFirstParseableRecord( String filename ) → String
returns the first record that the record parser parses successfully. The
recordParser should be set and configured enough to identify the fields.
If no records can be parsed, then null is returned.
The first record should be in the first 1000 lines.
Parameters
filename - a String
Returns:
the first parseable line, or null if no such line exists.
readFirstRecord
readFirstRecord( String filename ) → String
return the first record that the parser would parse. If skipLines is
more than the total number of lines, or all lines are comments, then null
is returned.
Parameters
filename - a String
Returns:
the first line after skip lines and comment lines.
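A sketch of the skip-then-filter logic, assuming a comment is any line that starts with the comment prefix (as described for setCommentPrefix) and that skipped lines are counted before comment filtering. FirstRecordSketch and its parameters are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch of finding the first candidate record.
class FirstRecordSketch {

    // Skips skipLines lines, then returns the first non-comment line, or null.
    static String readFirstRecord(BufferedReader reader, int skipLines, String commentPrefix) {
        try {
            String line;
            int iline = 0;
            while ((line = reader.readLine()) != null) {
                if (iline++ < skipLines) continue;        // header lines skipped outright
                if (commentPrefix != null && line.startsWith(commentPrefix)) continue;
                return line;                              // first candidate record
            }
            return null;                                  // ran out of lines
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }
}
```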
readFirstRecord( java.io.BufferedReader reader ) → String
readStream
readStream( java.io.Reader in, ProgressMonitor mon ) → org.das2.qds.WritableDataSet
Parse the stream using the current settings.
Parameters
in - the input stream
mon - a ProgressMonitor
Returns:
an org.das2.qds.WritableDataSet
readStream( java.io.Reader in, String firstRecord, ProgressMonitor mon ) → org.das2.qds.WritableDataSet
readString
readString( String str, ProgressMonitor mon ) → org.das2.qds.WritableDataSet
Parameters
str - the data, encoded in a UTF-8 string
mon - null or a progress monitor
Returns:
the data
removePropertyChangeListener
removePropertyChangeListener( java.beans.PropertyChangeListener l ) → void
Removes a PropertyChangeListener from the listener list.
Parameters
l - The listener to remove.
Returns:
void (returns nothing)
setCommentPrefix
setCommentPrefix( String comment ) → void
Records starting with this are not processed as data, for example "#".
This is initially "#". Setting this to null disables this check.
Parameters
comment - the prefix
Returns:
void (returns nothing)
setDelimParser
setDelimParser( String filename, String delimRegex ) → org.das2.qds.util.AsciiParser.DelimParser
The DelimParser splits each record into fields using a delimiter like ","
or "\\s+".
Parameters
filename - filename to read in.
delimRegex - the delimiter, such as "," or "\t" or "\s+"
Returns:
the record parser that will split each line into fields
setDelimParser( String line, String delimRegex, int expectedColumnCount ) → org.das2.qds.util.AsciiParser.DelimParser
setDelimParser( java.io.Reader in, String delimRegex ) → org.das2.qds.util.AsciiParser.DelimParser
setFieldParser
setFieldParser( int field, org.das2.qds.util.AsciiParser.FieldParser fp ) → void
set the special parser for a field.
Parameters
field - the field number, 0 is the first column.
fp - the parser
Returns:
void (returns nothing)
setFillValue
setFillValue( double fillValue ) → void
numbers that parse to this value are considered to be fill.
Parameters
fillValue - New value of property fillValue.
Returns:
void (returns nothing)
setFixedColumnsParser
setFixedColumnsParser( String filename, String delim ) → org.das2.qds.util.AsciiParser.FixedColumnsParser
looks at the first line after skipping, and splits it to calculate where
the columns are. The FixedColumnsParser is the fastest of the three parsers.
Parameters
filename - filename to read in.
delim - regex to split the initial line into the fixed columns.
Returns:
the record parser that will split each line.
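The column-finding step might look roughly like this: find the delimiter matches in the sample line, and let each column run from the end of the previous column to the end of its own text, which suits right-aligned numbers. After that, each record is cut at fixed positions with substring, with no regular-expression work per record, which is one plausible reason this parser is fast. FixedColumnsSketch and this particular boundary rule are assumptions, not the library's algorithm.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of deriving fixed column positions from one sample line.
class FixedColumnsSketch {

    // Returns {offset, width} pairs; each column extends left to the previous column's end.
    static List<int[]> columns(String sampleLine, String delimRegex) {
        List<int[]> cols = new ArrayList<>();
        Matcher m = Pattern.compile(delimRegex).matcher(sampleLine);
        int fieldStart = 0;   // start of the current non-delimiter text
        int colStart = 0;     // start of the current column
        while (m.find()) {
            if (m.start() > fieldStart) {
                cols.add(new int[] { colStart, m.start() - colStart });
                colStart = m.start();
            }
            fieldStart = m.end();
        }
        if (fieldStart < sampleLine.length()) {
            cols.add(new int[] { colStart, sampleLine.length() - colStart });
        }
        return cols;
    }

    // Cuts a record at the fixed positions and trims the padding.
    static String[] split(String record, List<int[]> cols) {
        String[] fields = new String[cols.size()];
        for (int i = 0; i < fields.length; i++) {
            int[] c = cols.get(i);
            int start = Math.min(c[0], record.length());
            int end = Math.min(c[0] + c[1], record.length());
            fields[i] = record.substring(start, end).trim();
        }
        return fields;
    }
}
```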
setFixedColumnsParser( java.io.Reader in, String delim ) → org.das2.qds.util.AsciiParser.FixedColumnsParser
setFixedColumnsParser( int[] columnOffsets, int[] columnWidths, org.das2.qds.util.AsciiParser.FieldParser[] parsers ) → org.das2.qds.util.AsciiParser.FixedColumnsParser
setHeaderDelimiter
setHeaderDelimiter( String headerDelimiter ) → void
set the delimiter which explicitly separates header from the data.
For example "-------" could be used. Normally the parser just looks at
the number of fields and this is sufficient.
Parameters
headerDelimiter - a String
Returns:
void (returns nothing)
setKeepFileHeader
setKeepFileHeader( boolean keepHeader ) → void
Setter for property keepHeader. This is false by default; if true, the file header
skipped by skipLines is stored in the property PROPERTY_FILE_HEADER.
Parameters
keepHeader - New value of property keepHeader.
Returns:
void (returns nothing)
setPropertyPattern
setPropertyPattern( java.util.regex.Pattern propertyPattern ) → void
specify the Pattern used to recognize properties. Note property
values are not parsed, they are provided as Strings. This is a regular
expression with two groups for the property name and value.
For example, (.+)=(.+)
Parameters
propertyPattern - regular expression Pattern with two groups.
Returns:
void (returns nothing)
setRecordCountLimit
setRecordCountLimit( int recordCountLimit ) → void
limit the number of records read. Parsing will stop once this number of
records is read into the result. This is Integer.MAX_VALUE by default.
Parameters
recordCountLimit - an int
Returns:
void (returns nothing)
setRecordParser
setRecordParser( org.das2.qds.util.AsciiParser.RecordParser recordParser ) → void
Setter for property recordParser.
Parameters
recordParser - New value of property recordParser.
Returns:
void (returns nothing)
setRecordStart
setRecordStart( int recordStart ) → void
set the number of records to skip before accumulating the result.
Parameters
recordStart - an int
Returns:
void (returns nothing)
setRegexParser
setRegexParser( String[] fieldNames ) → org.das2.qds.util.AsciiParser.RecordParser
The regex parser is a slow parser, but gives precise control.
Parameters
fieldNames - a java.lang.String[]
Returns:
the parser for each record.
setSkipLines
setSkipLines( int skipLines ) → void
skip a number of lines before trying to parse anything. This can be
set to point at the first valid line, and the RecordParser will be
configured using that line.
Parameters
skipLines - an int
Returns:
void (returns nothing)
setUnits
setUnits( int index, Units units ) → void
Indexed setter for property units. This now sets the field parser for
the field to be a UNITS_PARSER if it is the default DOUBLE_PARSER.
Parameters
index - Index of the property.
units - New value of the property at index.
Returns:
void (returns nothing)
setUnits( Units[] u ) → void
setValidMax
setValidMax( double validMax ) → void
set the maximum value for any field. Values above this are to be
considered invalid.
Parameters
validMax - a double
Returns:
void (returns nothing)
setValidMin
setValidMin( double validMin ) → void
set the minimum valid value for any field. Values less than
this are to be considered invalid.
Parameters
validMin - a double
Returns:
void (returns nothing)
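Taken together, fillValue, validMin and validMax suggest a per-value test like the sketch below. Treating NaN as invalid and letting a NaN fill value disable the fill comparison are assumptions of this sketch, and ValidRangeSketch is a hypothetical helper.

```java
// Hypothetical sketch of the fill / validMin / validMax test for one value.
class ValidRangeSketch {

    // False for NaN, for values equal to fill, and for values outside [validMin, validMax].
    static boolean isValid(double v, double fill, double validMin, double validMax) {
        if (Double.isNaN(v)) return false;
        if (!Double.isNaN(fill) && v == fill) return false;   // NaN fill disables this check
        return v >= validMin && v <= validMax;                 // below min or above max is invalid
    }
}
```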
setWhereConstraint
setWhereConstraint( String sparm, String op, String sval ) → void
add a constraint so that only records where the condition is true are accepted.
For "eq", the data need not be interpreted, since string equality is checked
for nominal data. Note sval is compared after trimming leading and trailing spaces.
Parameters
sparm - column name, such as "field4"
op - constraint, one of eq gt ge lt le ne
sval - String value. For nominal columns, String equality is used.
Returns:
void (returns nothing)
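A sketch of evaluating one constraint for a single field value. Here a value counts as nominal when it does not parse as a double, which is an assumption; the parser may decide this from the column's units instead. Only eq and ne are allowed for nominal values in this sketch, and WhereSketch is hypothetical.

```java
// Hypothetical sketch of a where-constraint test on one field value.
class WhereSketch {

    static boolean accept(String field, String op, String sval) {
        String a = field.trim();
        String b = sval.trim();                  // sval is compared after trimming
        Double x = toDouble(a);
        Double y = toDouble(b);
        if (x == null || y == null) {            // nominal: plain string comparison
            switch (op) {
                case "eq": return a.equals(b);
                case "ne": return !a.equals(b);
                default: throw new IllegalArgumentException(op + " needs numeric values");
            }
        }
        switch (op) {                            // numeric comparison
            case "eq": return x.doubleValue() == y.doubleValue();
            case "ne": return x.doubleValue() != y.doubleValue();
            case "gt": return x > y;
            case "ge": return x >= y;
            case "lt": return x < y;
            case "le": return x <= y;
            default: throw new IllegalArgumentException("unknown op: " + op);
        }
    }

    // Returns the parsed value, or null if s is not a number.
    private static Double toDouble(String s) {
        try {
            return Double.valueOf(s);
        } catch (NumberFormatException ex) {
            return null;
        }
    }
}
```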