public class AsciiParser
extends java.lang.Object
Modifier and Type | Class and Description |
---|---|
class | AsciiParser.DelimParser: splits the line on a regex (like "," or "\\s+") to create the fields. |
static interface | AsciiParser.FieldParser: takes character data and returns a number representing the data. |
class | AsciiParser.FixedColumnsParser: record parser that looks at fixed column positions for each record. |
static interface | AsciiParser.RecordParser |
static class | AsciiParser.RegexParser: parser that uses a regular expression to match each record. |
Modifier and Type | Field and Description |
---|---|
static java.lang.String | DELIM_COMMA |
static java.lang.String | DELIM_TAB |
static java.lang.String | DELIM_WHITESPACE |
static AsciiParser.FieldParser | DOUBLE_PARSER: parses the field using Double.parseDouble, Java's double parser. |
AsciiParser.FieldParser | ENUMERATION_PARSER: uses the EnumerationUnits for the field to create a Datum. |
protected java.lang.String | headerDelimiter |
static java.util.regex.Pattern | NAME_COLON_VALUE_PATTERN: pattern for name:value. |
static java.util.regex.Pattern | NAME_EQUAL_VALUE_PATTERN: pattern for name=value. |
static java.lang.String | PROP_HEADERDELIMITER |
static java.lang.String | PROP_VALIDMAX |
static java.lang.String | PROP_VALIDMIN |
static java.lang.String | PROPERTY_FIELD_NAMES |
static java.lang.String | PROPERTY_FIELD_PARSER |
static java.lang.String | PROPERTY_FILE_HEADER |
static java.lang.String | PROPERTY_FIRST_RECORD |
static Units | UNIT_UTC: convenient unit for parsing UTC times. |
AsciiParser.FieldParser | UNITS_PARSER: delegates to the unit object set for this field to parse the data. |
protected double | validMax |
protected double | validMin |
Constructor and Description |
---|
AsciiParser(): creates a new instance. |
Modifier and Type | Method and Description |
---|---|
void | addPropertyChangeListener(java.beans.PropertyChangeListener l): adds a PropertyChangeListener to the listener list. |
AsciiParser.DelimParser | getDelimParser(int fieldCount, java.lang.String delim): gives external code more control by providing a way to assert that an N-column delim parser should be used. |
int | getFieldCount(): return the number of fields in each record. |
int | getFieldIndex(java.lang.String string): returns the index of the field. |
java.lang.String[] | getFieldLabels(): return the labels found for each field. |
java.lang.String[] | getFieldNames(): return the name of each field. |
java.lang.String[] | getFieldUnits(): return the units that were associated with the field. |
double | getFillValue(): return the fillValue. |
java.lang.String | getHeaderDelimiter(): get the header delimiter. |
AsciiParser.RecordParser | getRecordParser(): getter for property recordParser. |
static java.lang.String | getRegexForFormat(java.lang.String format): convert FORTRAN (F77) style format to C-style format specifiers. |
AsciiParser.RegexParser | getRegexParser(java.lang.String regex): return a regex parser for the given regular expression. |
AsciiParser.RegexParser | getRegexParserForFormat(java.lang.String format): see private TimeParser(String formatString, Map<String,FieldHandler> fieldHandlers), which is very similar. Example formats: "%5d%5d%9f%s" and "d5,d5,f9,a". |
java.util.Map<java.lang.String,java.lang.String> | getRichFields(): returns the high rank rich fields in a map from NAME to LABEL. |
Units | getUnits(int index): indexed getter for property units. |
double | getValidMax(): get the maximum valid value for any field. |
double | getValidMin(): get the minimum valid value for any field. |
AsciiParser.DelimParser | guessDelimParser(java.lang.String line) |
AsciiParser.DelimParser | guessDelimParser(java.lang.String line, int lineNumber): read in the first record, then guess the delimiter and possibly the column headers. |
static int | guessFieldCount(java.lang.String filename): return the field count that would result in the largest number of records parsed. |
static int | guessLengthForFormat(java.lang.String format): return the length of the format specifier. |
AsciiParser.DelimParser | guessSkipAndDelimParser(java.lang.String filename): read in records, allowing for a header of non-records before guessing the delim parser. |
int | guessSkipLines(java.lang.String filename, AsciiParser.RecordParser recParser): try to figure out how many lines to skip by looking for the line where the number of fields becomes stable. |
boolean | isHeader(int iline, java.lang.String lastLine, java.lang.String thisLine, int recCount): returns true if the line is a header or comment. |
boolean | isIso8601Time(java.lang.String s): quick-n-dirty check to see if a string appears to be an ISO8601 time. |
boolean | isKeepFileHeader(): getter for property keepHeader. |
boolean | isRichHeader(): return true if the parsed file provided a rich ascii header. |
static boolean | isRichHeader(java.lang.String header): return true if the header appears to contain JSON code which could be interpreted as a "Rich Header" (a.k.a. …). |
static AsciiParser | newParser(int fieldCount): creates a parser with fieldCount fields, named "field0,...,fieldN". |
static AsciiParser | newParser(java.lang.String[] fieldNames): creates a parser with the named fields. |
WritableDataSet | readFile(java.lang.String filename, ProgressMonitor mon): parse the file using the current settings. |
java.lang.String | readFirstParseableRecord(java.lang.String filename): returns the first record that the record parser parses successfully. |
java.lang.String | readFirstRecord(java.io.BufferedReader reader): return the first line of the freshly opened file. |
java.lang.String | readFirstRecord(java.lang.String filename): return the first record that the parser would parse. |
WritableDataSet | readStream(java.io.Reader in, ProgressMonitor mon): parse the stream using the current settings. |
WritableDataSet | readStream(java.io.Reader in, java.lang.String firstRecord, ProgressMonitor mon): read in the stream, including the first record if non-null. |
WritableDataSet | readString(java.lang.String str, ProgressMonitor mon) |
void | removePropertyChangeListener(java.beans.PropertyChangeListener l): removes a PropertyChangeListener from the listener list. |
void | setCommentPrefix(java.lang.String comment): records starting with this are not processed as data, for example "#". |
AsciiParser.DelimParser | setDelimParser(java.io.Reader in, java.lang.String delimRegex): the DelimParser splits each record into fields using a delimiter like "," or "\\s+". |
AsciiParser.DelimParser | setDelimParser(java.lang.String filename, java.lang.String delimRegex): the DelimParser splits each record into fields using a delimiter like "," or "\\s+". |
AsciiParser.DelimParser | setDelimParser(java.lang.String line, java.lang.String delimRegex, int expectedColumnCount): the DelimParser splits each record into fields using a delimiter like "," or "\\s+". |
void | setFieldParser(int field, AsciiParser.FieldParser fp): set the special parser for a field. |
void | setFillValue(double fillValue): numbers that parse to this value are considered to be fill. |
AsciiParser.FixedColumnsParser | setFixedColumnsParser(int[] columnOffsets, int[] columnWidths, AsciiParser.FieldParser[] parsers): set the record parser to be a fixed columns parser. |
AsciiParser.FixedColumnsParser | setFixedColumnsParser(java.io.Reader in, java.lang.String delim): looks at the first line after skipping, and splits it to calculate where the columns are. |
AsciiParser.FixedColumnsParser | setFixedColumnsParser(java.lang.String filename, java.lang.String delim): looks at the first line after skipping, and splits it to calculate where the columns are. |
void | setHeaderDelimiter(java.lang.String headerDelimiter): set the delimiter which explicitly separates header from the data. |
void | setKeepFileHeader(boolean keepHeader): setter for property keepHeader. |
void | setPropertyPattern(java.util.regex.Pattern propertyPattern): specify the Pattern used to recognize properties. |
void | setRecordCountLimit(int recordCountLimit): limit the number of records read. |
void | setRecordParser(AsciiParser.RecordParser recordParser): setter for property recordParser. |
void | setRecordStart(int recordStart): set the number of records to skip before accumulating the result. |
AsciiParser.RecordParser | setRegexParser(java.lang.String[] fieldNames): the regex parser is a slow parser, but gives precise control. |
void | setSkipLines(int skipLines): skip a number of lines before trying to parse anything. |
void | setUnits(int index, Units units): indexed setter for property units. |
void | setUnits(Units... u): set all the units at once. |
void | setValidMax(double validMax): set the maximum valid value for any field. |
void | setValidMin(double validMin): set the minimum valid value for any field. |
void | setWhereConstraint(java.lang.String sparm, java.lang.String op, java.lang.String sval): add a constraint so that only records for which the where condition is true are kept. |
public static final java.util.regex.Pattern NAME_COLON_VALUE_PATTERN
public static final java.util.regex.Pattern NAME_EQUAL_VALUE_PATTERN
public static final java.lang.String PROPERTY_FIELD_NAMES
public static final java.lang.String PROPERTY_FILE_HEADER
public static final java.lang.String PROPERTY_FIRST_RECORD
public static final java.lang.String PROPERTY_FIELD_PARSER
public static final java.lang.String DELIM_COMMA
public static final java.lang.String DELIM_TAB
public static final java.lang.String DELIM_WHITESPACE
public static final Units UNIT_UTC
protected java.lang.String headerDelimiter
public static final java.lang.String PROP_HEADERDELIMITER
public static final AsciiParser.FieldParser DOUBLE_PARSER
public final AsciiParser.FieldParser UNITS_PARSER
public final AsciiParser.FieldParser ENUMERATION_PARSER
protected double validMin
public static final java.lang.String PROP_VALIDMIN
protected double validMax
public static final java.lang.String PROP_VALIDMAX
public AsciiParser()
public final boolean isHeader(int iline, java.lang.String lastLine, java.lang.String thisLine, int recCount)
Parameters:
iline - the line number in the file, starting with 0.
lastLine - the last line read.
thisLine - the line we are testing.
recCount - the number of records successfully read.
public final boolean isIso8601Time(java.lang.String s)
Parameters:
s
public java.lang.String readFirstRecord(java.lang.String filename) throws java.io.IOException
Parameters:
filename
Throws:
java.io.IOException
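The isIso8601Time summary calls the check "quick-n-dirty" and does not say which forms it accepts. As an illustration only, here is a minimal regex-based sketch; the class name Iso8601Sketch and the accepted forms (calendar yyyy-mm-dd or ordinal yyyy-ddd dates, optionally followed by Thh:mm[:ss[.fff]] and Z) are assumptions, not AsciiParser's rules.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical "quick-n-dirty" ISO8601 check; not AsciiParser's code.
 * Accepts yyyy-mm-dd or yyyy-ddd, optionally followed by Thh:mm[:ss[.fff]][Z].
 */
class Iso8601Sketch {
    private static final Pattern ISO8601 = Pattern.compile(
            "\\d{4}-(\\d{2}-\\d{2}|\\d{3})"               // calendar or ordinal date
            + "(T\\d{2}:\\d{2}(:\\d{2}(\\.\\d+)?)?Z?)?"); // optional time of day

    static boolean looksLikeIso8601(String s) {
        // only the shape is checked; month/day ranges are not validated
        return s != null && ISO8601.matcher(s.trim()).matches();
    }
}
```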
public java.lang.String readFirstRecord(java.io.BufferedReader reader) throws java.io.IOException
Parameters:
reader
Throws:
java.io.IOException
public java.lang.String readFirstParseableRecord(java.lang.String filename) throws java.io.IOException
Parameters:
filename
Throws:
java.io.IOException
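readFirstParseableRecord returns the first record that the record parser parses successfully. A sketch of that loop, assuming a java.util.function.Predicate in place of AsciiParser.RecordParser (whose methods are not listed on this page); the class name is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.function.Predicate;

/** Hypothetical sketch: return the first line that a record parser accepts. */
class FirstParseableSketch {
    static String firstParseableRecord(BufferedReader reader, Predicate<String> parses)
            throws IOException {
        String line;
        while ((line = reader.readLine()) != null) {
            if (parses.test(line)) {
                return line;   // first record the parser accepts
            }
        }
        return null;           // nothing in the stream parsed
    }
}
```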
public int guessSkipLines(java.lang.String filename, AsciiParser.RecordParser recParser) throws java.io.IOException
Parameters:
filename
recParser
Throws:
java.io.IOException
public AsciiParser.DelimParser guessSkipAndDelimParser(java.lang.String filename) throws java.io.IOException
Parameters:
filename
Throws:
java.io.IOException
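guessSkipLines looks for the line where the number of fields becomes stable. One way to express that heuristic, assuming a delimiter regex, a hypothetical window of consecutive lines that must agree, and fields counted with String.split; this is an illustration, not AsciiParser's implementation.

```java
import java.util.List;

/**
 * Hypothetical sketch: return the index of the first line from which the next
 * `window` lines all split into the same number (more than one) of fields.
 */
class SkipLinesSketch {
    static int guessSkipLines(List<String> lines, String delimRegex, int window) {
        for (int i = 0; i + window <= lines.size(); i++) {
            int n = fieldCount(lines.get(i), delimRegex);
            boolean stable = n > 1;
            for (int j = i + 1; stable && j < i + window; j++) {
                stable = fieldCount(lines.get(j), delimRegex) == n;
            }
            if (stable) {
                return i;      // number of lines to skip
            }
        }
        return -1;             // no stable region found
    }

    private static int fieldCount(String line, String delimRegex) {
        String t = line.trim();
        return t.isEmpty() ? 0 : t.split(delimRegex).length;
    }
}
```

A column-header line such as "time value flag" already has the stable field count, so in this sketch it is the first line not skipped; deciding whether it holds labels is a separate step.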
public AsciiParser.DelimParser guessDelimParser(java.lang.String line) throws java.io.IOException
Throws:
java.io.IOException
public AsciiParser.DelimParser guessDelimParser(java.lang.String line, int lineNumber) throws java.io.IOException
Parameters:
line - a single record to attempt parsing.
lineNumber - useful for debugging.
Throws:
java.io.IOException
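guessDelimParser guesses the delimiter from a single record. A simple version of that guess, assuming the candidate list below (the actual values of DELIM_COMMA, DELIM_TAB and DELIM_WHITESPACE are not given on this page) and a "most fields wins" rule; the class name is hypothetical.

```java
/** Hypothetical delimiter guess: keep the candidate regex that yields the most fields. */
class DelimGuessSketch {
    // assumed candidates, tried in this order; ties keep the earlier one
    static final String[] CANDIDATES = { ",", "\t", ";", "\\s+" };

    static String guessDelimiter(String line) {
        String best = "\\s+";      // fall back to whitespace
        int bestCount = 1;
        for (String delim : CANDIDATES) {
            int n = line.trim().split(delim, -1).length;  // -1 keeps empty trailing fields
            if (n > bestCount) {
                best = delim;
                bestCount = n;
            }
        }
        return best;
    }
}
```

Because ties keep the earlier candidate, a tab-separated line picks "\t" even though "\\s+" splits it into the same number of fields.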
public AsciiParser.DelimParser setDelimParser(java.lang.String filename, java.lang.String delimRegex) throws java.io.IOException
Parameters:
filename - filename to read in.
delimRegex - the delimiter, such as "," or "\t" or "\s+".
Throws:
java.io.IOException
public AsciiParser.DelimParser setDelimParser(java.lang.String line, java.lang.String delimRegex, int expectedColumnCount) throws java.io.IOException
Parameters:
line - a single record to read.
delimRegex - the delimiter, such as "," or "\t" or "\s+".
expectedColumnCount - -1 or the number of columns we expect to see.
Throws:
java.io.IOException
java.lang.IllegalArgumentException - if expectedColumnCount is positive and does not match the number of columns found.
public AsciiParser.DelimParser setDelimParser(java.io.Reader in, java.lang.String delimRegex) throws java.io.IOException
Parameters:
in
delimRegex - the delimiter, such as "," or "\t" or "\s+".
Throws:
java.io.IOException
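The three-argument setDelimParser documents that a positive expectedColumnCount which does not match the result raises IllegalArgumentException, while -1 accepts any count. A standalone sketch of that contract for one record; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch: split one record on a delimiter regex and check the column count. */
class DelimSplitSketch {
    static String[] split(String line, String delimRegex, int expectedColumnCount) {
        String[] fields = Pattern.compile(delimRegex).split(line.trim(), -1);
        // -1 (or any non-positive value) means "accept whatever count is found"
        if (expectedColumnCount > 0 && fields.length != expectedColumnCount) {
            throw new IllegalArgumentException("expected " + expectedColumnCount
                    + " columns but found " + fields.length);
        }
        return fields;
    }
}
```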
public final AsciiParser.RecordParser setRegexParser(java.lang.String[] fieldNames)
Parameters:
fieldNames
public AsciiParser.FixedColumnsParser setFixedColumnsParser(java.lang.String filename, java.lang.String delim) throws java.io.IOException
Parameters:
filename - filename to read in.
delim - regex to split the initial line into the fixed columns.
Throws:
java.io.IOException
public AsciiParser.FixedColumnsParser setFixedColumnsParser(java.io.Reader in, java.lang.String delim) throws java.io.IOException
Parameters:
in - the Reader to get lines from.
delim - regex to split the initial line into the fixed columns.
Throws:
java.io.IOException
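The Reader and filename variants of setFixedColumnsParser split the first line after skipping to work out where the columns are. A hedged sketch of that calculation, recording each field's start offset and width in a sample line; the names are hypothetical, and a real parser might widen each column to cover the gap before the next one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: derive column offsets and widths from one sample line. */
class FixedColumnsSketch {
    /** Returns {offsets, widths} for the non-empty fields of the sample line. */
    static int[][] columnsFromLine(String line, String delimRegex) {
        List<Integer> offsets = new ArrayList<>();
        List<Integer> widths = new ArrayList<>();
        Matcher m = Pattern.compile(delimRegex).matcher(line);
        int start = 0;
        while (true) {
            boolean found = m.find();
            int end = found ? m.start() : line.length();
            if (end > start) {               // skip empty fields such as leading blanks
                offsets.add(start);
                widths.add(end - start);
            }
            if (!found) break;
            start = m.end();                 // next field begins after the delimiter
        }
        int[][] result = new int[2][offsets.size()];
        for (int i = 0; i < offsets.size(); i++) {
            result[0][i] = offsets.get(i);
            result[1][i] = widths.get(i);
        }
        return result;
    }
}
```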
public static int guessFieldCount(java.lang.String filename) throws java.io.FileNotFoundException, java.io.IOException
Parameters:
filename - the file name, a local file opened with a FileReader.
Throws:
java.io.FileNotFoundException
java.io.IOException
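guessFieldCount returns the field count that would result in the largest number of records parsed. Assuming whitespace-delimited records already read into a list of lines, that amounts to taking the most common field count; the class name and the tie-breaking rule are assumptions.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: the field count shared by the most non-blank lines. */
class FieldCountSketch {
    static int guessFieldCount(List<String> lines) {
        Map<Integer, Integer> histogram = new HashMap<>();
        for (String line : lines) {
            String t = line.trim();
            if (t.isEmpty()) continue;
            histogram.merge(t.split("\\s+").length, 1, Integer::sum);
        }
        int best = 0, bestLines = 0;
        for (Map.Entry<Integer, Integer> e : histogram.entrySet()) {
            // prefer the count seen on more lines; break ties toward more fields
            if (e.getValue() > bestLines || (e.getValue() == bestLines && e.getKey() > best)) {
                best = e.getKey();
                bestLines = e.getValue();
            }
        }
        return best;   // 0 when there are no non-blank lines
    }
}
```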
public void setFieldParser(int field, AsciiParser.FieldParser fp)
Parameters:
field - the field number, 0 is the first column.
fp - the parser.
public static AsciiParser newParser(int fieldCount)
Parameters:
fieldCount - the number of fields.
public static AsciiParser newParser(java.lang.String[] fieldNames)
Parameters:
fieldNames - the names for each field.
public void setSkipLines(int skipLines)
Parameters:
skipLines
public void setRecordCountLimit(int recordCountLimit)
Parameters:
recordCountLimit
public void setRecordStart(int recordStart)
Parameters:
recordStart
public void setPropertyPattern(java.util.regex.Pattern propertyPattern)
Parameters:
propertyPattern - regular expression Pattern with two groups.
public void setCommentPrefix(java.lang.String comment)
Parameters:
comment - the prefix.
public java.lang.String getHeaderDelimiter()
public void setHeaderDelimiter(java.lang.String headerDelimiter)
Parameters:
headerDelimiter
public WritableDataSet readStream(java.io.Reader in, ProgressMonitor mon) throws java.io.IOException
Parameters:
in - the input stream.
mon
Throws:
java.io.IOException
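setPropertyPattern takes a Pattern with two groups, and NAME_COLON_VALUE_PATTERN is described as a pattern for name:value. The sketch below shows how such a pattern might be applied to commented header lines; the regex, the class name, and the choice of group 1 as the name and group 2 as the value are assumptions, since the actual pattern source is not shown on this page.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: collect name/value properties from commented header lines. */
class HeaderPropertiesSketch {
    // assumed name:value pattern; group 1 is the name, group 2 the value
    static final Pattern NAME_COLON_VALUE =
            Pattern.compile("\\s*([a-zA-Z_][\\w.]*)\\s*:\\s*(.+?)\\s*");

    static Map<String, String> properties(String header, String commentPrefix,
                                          Pattern propertyPattern) {
        Map<String, String> props = new LinkedHashMap<>();
        for (String line : header.split("\\R")) {
            if (!line.startsWith(commentPrefix)) continue;   // only commented lines
            Matcher m = propertyPattern.matcher(line.substring(commentPrefix.length()));
            if (m.matches()) {
                props.put(m.group(1), m.group(2));
            }
        }
        return props;
    }
}
```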
public WritableDataSet readString(java.lang.String str, ProgressMonitor mon) throws java.io.IOException
Parameters:
str - the data, encoded in a UTF-8 string.
mon - null or a progress monitor.
Throws:
java.io.IOException
public WritableDataSet readStream(java.io.Reader in, java.lang.String firstRecord, ProgressMonitor mon) throws java.io.IOException
Parameters:
in - the stream, which is not closed.
firstRecord - if non-null, parse this record first. This allows information to be extracted about the records, then fed into this loop.
mon - null or a progress monitor.
Throws:
java.io.IOException
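Several settings above narrow which records are read: setSkipLines skips raw lines, setCommentPrefix excludes comment lines, setRecordStart skips records before accumulating the result, and setRecordCountLimit caps the count. The sketch below combines them in one plausible order; that order, the treatment of blank lines, and the class name are assumptions, not a description of readStream.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of skipLines, commentPrefix, recordStart and recordCountLimit together. */
class RecordWindowSketch {
    static List<String> select(List<String> lines, int skipLines, String commentPrefix,
                               int recordStart, int recordCountLimit) {
        List<String> out = new ArrayList<>();
        int recordIndex = 0;                                  // counts data records only
        for (int i = skipLines; i < lines.size(); i++) {      // skipLines counts raw lines
            String line = lines.get(i);
            if (line.trim().isEmpty()) continue;
            if (commentPrefix != null && line.startsWith(commentPrefix)) continue;
            if (recordIndex++ < recordStart) continue;        // recordStart counts records
            if (out.size() >= recordCountLimit) break;
            out.add(line);
        }
        return out;
    }
}
```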
public static boolean isRichHeader(java.lang.String header)
The check looks for #{ and #} with a colon contained within.
Parameters:
header - string containing the commented header.
See Also:
https://github.com/JSONheadedASCII/examples
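The isRichHeader(String) check looks for #{ and #} with a colon between them. A literal reading of that description as code, with a hypothetical class name; it does not validate the JSON itself.

```java
/** Hypothetical sketch: "#{" ... "#}" with a colon between them marks a rich header. */
class RichHeaderSketch {
    static boolean looksRich(String header) {
        int open = header.indexOf("#{");
        if (open < 0) return false;
        int close = header.indexOf("#}", open + 2);   // first closing marker after the opening one
        if (close < 0) return false;
        return header.substring(open + 2, close).contains(":");
    }
}
```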
public boolean isRichHeader()
public java.util.Map<java.lang.String,java.lang.String> getRichFields()
public void setWhereConstraint(java.lang.String sparm, java.lang.String op, java.lang.String sval)
Parameters:
sparm - column name, such as "field4".
op - constraint, one of eq gt ge lt le ne.
sval - String value. For nominal columns, String equality is used.
public AsciiParser.DelimParser getDelimParser(int fieldCount, java.lang.String delim)
Parameters:
fieldCount
delim - the delimiter pattern, such as "," or "\s+".
public static int guessLengthForFormat(java.lang.String format)
Parameters:
format
public static java.lang.String getRegexForFormat(java.lang.String format)
Parameters:
format - for example "%5d%5d%9f%s".
See Also:
MetadataUtil.normalizeFormatSpecifier(java.lang.String)
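setWhereConstraint takes a column name, one of the operators eq gt ge lt le ne, and a string value, using String equality for nominal columns. A hedged sketch of evaluating one such constraint on a single field value; deciding "nominal" by whether both values parse as numbers, and rejecting ordering operators for nominal data, are assumptions.

```java
/** Hypothetical sketch of one where-constraint test on a field value. */
class WhereSketch {
    static boolean accept(String fieldValue, String op, String sval) {
        Double a = toDouble(fieldValue), b = toDouble(sval);
        if (a == null || b == null) {                  // nominal data: string equality only
            switch (op) {
                case "eq": return fieldValue.equals(sval);
                case "ne": return !fieldValue.equals(sval);
                default: throw new IllegalArgumentException("op not supported for nominal data: " + op);
            }
        }
        int c = Double.compare(a, b);                  // numeric comparison
        switch (op) {
            case "eq": return c == 0;
            case "ne": return c != 0;
            case "gt": return c > 0;
            case "ge": return c >= 0;
            case "lt": return c < 0;
            case "le": return c <= 0;
            default: throw new IllegalArgumentException("unknown op: " + op);
        }
    }

    private static Double toDouble(String s) {
        try {
            return Double.parseDouble(s.trim());
        } catch (NumberFormatException e) {
            return null;                               // not a number
        }
    }
}
```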
public AsciiParser.RegexParser getRegexParserForFormat(java.lang.String format)
See private TimeParser(String formatString, Map<String,FieldHandler> fieldHandlers), which is very similar.
Parameters:
format
See Also:
TimeParser
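Judging by its name and return type, getRegexParserForFormat builds a RegexParser from a fixed-width format such as "%5d%5d%9f%s". The sketch below shows one way to turn such a format into a regex with one capture group per field, and to total the widths in the spirit of guessLengthForFormat; the exact regex AsciiParser produces is not shown on this page, literal text between specifiers is ignored, and the class name is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: C-style fixed-width format to regex, plus total record width. */
class FormatRegexSketch {
    // %[width][.precision]conversion, e.g. %5d, %9.3f, %s
    private static final Pattern SPEC = Pattern.compile("%(\\d*)(?:\\.\\d+)?[a-zA-Z]");

    static String regexForFormat(String format) {
        StringBuilder regex = new StringBuilder();
        Matcher m = SPEC.matcher(format);
        while (m.find()) {
            String width = m.group(1);
            // a width means exactly that many characters; no width means the rest of the line
            regex.append(width.isEmpty() ? "(.*)" : "(.{" + width + "})");
        }
        return regex.toString();
    }

    /** Sum of the explicit widths, or -1 if any field has no width. */
    static int lengthForFormat(String format) {
        int total = 0;
        Matcher m = SPEC.matcher(format);
        while (m.find()) {
            if (m.group(1).isEmpty()) return -1;
            total += Integer.parseInt(m.group(1));
        }
        return total;
    }
}
```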
public AsciiParser.RegexParser getRegexParser(java.lang.String regex)
Parameters:
regex
public AsciiParser.FixedColumnsParser setFixedColumnsParser(int[] columnOffsets, int[] columnWidths, AsciiParser.FieldParser[] parsers)
Parameters:
columnOffsets - the start of each column.
columnWidths - the width of each column.
parsers - the parser for each column.
public int getFieldCount()
public java.lang.String[] getFieldNames()
public java.lang.String[] getFieldLabels()
public java.lang.String[] getFieldUnits()
public WritableDataSet readFile(java.lang.String filename, ProgressMonitor mon) throws java.io.IOException
Parameters:
filename - the file to read.
mon - a monitor.
Throws:
java.io.IOException
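setFixedColumnsParser(columnOffsets, columnWidths, parsers) pairs each column with a FieldParser, which takes character data and returns a number. A sketch of one record passing through such a parser, using java.util.function.ToDoubleFunction as a stand-in for AsciiParser.FieldParser (whose method is not listed here); returning NaN for blank or missing columns and the class name are assumptions.

```java
import java.util.List;
import java.util.function.ToDoubleFunction;

/** Hypothetical sketch: cut fixed columns out of a record and parse each with its own parser. */
class FixedColumnsRecordSketch {
    static double[] parse(String line, int[] offsets, int[] widths,
                          List<ToDoubleFunction<String>> parsers) {
        double[] values = new double[offsets.length];
        for (int i = 0; i < offsets.length; i++) {
            int start = Math.min(offsets[i], line.length());
            int end = Math.min(offsets[i] + widths[i], line.length());  // tolerate short lines
            String field = line.substring(start, end).trim();
            values[i] = field.isEmpty() ? Double.NaN : parsers.get(i).applyAsDouble(field);
        }
        return values;
    }
}
```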
public void addPropertyChangeListener(java.beans.PropertyChangeListener l)
Parameters:
l - The listener to add.
public void removePropertyChangeListener(java.beans.PropertyChangeListener l)
Parameters:
l - The listener to remove.
public boolean isKeepFileHeader()
public void setKeepFileHeader(boolean keepHeader)
Parameters:
keepHeader - New value of property keepHeader.
public AsciiParser.RecordParser getRecordParser()
public void setRecordParser(AsciiParser.RecordParser recordParser)
Parameters:
recordParser - New value of property recordParser.
public Units getUnits(int index)
Parameters:
index - Index of the property.
Returns:
Value of the property at index.
public void setUnits(int index, Units units)
Parameters:
index - Index of the property.
units - New value of the property at index.
public void setUnits(Units... u)
Parameters:
u - array (or varargs) of units to be applied to the 0th, 1st, 2nd, ... fields.
public int getFieldIndex(java.lang.String string)
Parameters:
string - the label for the field, such as "field2" or "time".
public double getFillValue()
public void setFillValue(double fillValue)
Parameters:
fillValue - New value of property fillValue.
public double getValidMin()
public void setValidMin(double validMin)
Parameters:
validMin
public double getValidMax()
public void setValidMax(double validMax)
Parameters:
validMax
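setFillValue marks numbers that parse to the fill value as fill, and setValidMin and setValidMax bound the valid values for any field. A small sketch of that rule; replacing rejected values with NaN and the class name are assumptions made for illustration.

```java
/** Hypothetical sketch: a value is kept only if it is not fill and lies in [validMin, validMax]. */
class ValidRangeSketch {
    final double fillValue;
    final double validMin;
    final double validMax;

    ValidRangeSketch(double fillValue, double validMin, double validMax) {
        this.fillValue = fillValue;
        this.validMin = validMin;
        this.validMax = validMax;
    }

    /** NaN fails both range comparisons, so it is never valid. */
    boolean isValid(double v) {
        return v != fillValue && v >= validMin && v <= validMax;
    }

    /** Returns v unchanged if valid, otherwise NaN (an assumed "missing" marker). */
    double clean(double v) {
        return isValid(v) ? v : Double.NaN;
    }
}
```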