QDataSets are the data model used within Das2 and Autoplot. They were preceded by a more specific data model, developed to deliver spectrogram time series datasets whose structure could change over time, with an interface highly optimized for that environment. Many datasets are difficult to express in those terms, so the simpler "quick" QDataSet was introduced.
The QDataSet can be thought of as a fast Java array with name-value metadata attached to it. These arrays can have any number of indices (the rank), although the interface currently limits rank to 0, 1, 2, 3, and 4. As with Java arrays, the length along each index can vary; datasets whose dimensions do not vary in length are colloquially called "Qubes."
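To make the "Qube" distinction concrete, here is a minimal Python sketch. It models a dataset as nested Python lists, which is an assumption for illustration only (real QDataSets are Java objects); the helper names `shape` and `is_qube` are hypothetical.

```python
def shape(data):
    """Return the lengths along each index, following the first element."""
    dims = []
    while isinstance(data, list):
        dims.append(len(data))
        if not data:
            break
        data = data[0]
    return dims


def is_qube(data):
    """Return True if every index has a fixed length (no ragged records).

    Rank 0 values (plain numbers) and empty lists are trivially qubes.
    """
    if not isinstance(data, list) or not data:
        return True
    first = shape(data[0])
    # Every element must have the same shape, and each must itself be a qube.
    return all(shape(d) == first for d in data) and all(is_qube(d) for d in data)


print(is_qube([[1, 2, 3], [4, 5, 6]]))  # True
print(is_qube([[1, 2, 3], [4, 5]]))     # False: second record is shorter
```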
QDataSets can have other QDataSets as property values. For example, the property QDataSet.DEPEND_0 indicates that the values are dependent parameters of the "tags" QDataSet found there. This is how QDataSets reach the same level of abstraction as the legacy Das2 dataset.
This is inspired by the CDF data model and PaPCo's dataset model.
type QDataSet; the dataset holding this property is a dependent parameter of the independent parameter represented by this DataSet, whose values are the tags for the DataSet's 0th index.
type QDataSet; the dataset holding this property is a dependent parameter of the independent parameter represented by this DataSet, whose values are the tags for the DataSet's 1st index. When DEPEND_1 is rank 2, its first dimension goes with DEPEND_0 and its second dimension holds the tags for index 1.
type QDataSet; the dataset holding this property is a dependent parameter of the independent parameter represented by this DataSet, whose values are the tags for the DataSet's 2nd index. When DEPEND_2 is rank 2, its first dimension goes with DEPEND_0 and its second dimension holds the tags for index 2.
type QDataSet; the dataset holding this property is a dependent parameter of the independent parameter represented by this DataSet, whose values are the tags for the DataSet's 3rd index. When DEPEND_3 is rank 2, its first dimension goes with DEPEND_0 and its second dimension holds the tags for index 3.
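A small Python sketch of how a rank 1 versus a rank 2 DEPEND_1 is used to find the tag of element [i,j] in a flux(Time,Energy) dataset. Datasets are modeled as dicts with a "values" list plus property keys; this model and the helpers `rank` and `energy_tag` are hypothetical stand-ins for the Java interface.

```python
def rank(values):
    """Number of indices needed to reach a number (0 for a scalar)."""
    r = 0
    while isinstance(values, list):
        r += 1
        if not values:
            break
        values = values[0]
    return r


def energy_tag(ds, i, j):
    """Return the DEPEND_1 tag for element ds[i][j].

    Rank 1 DEPEND_1: the same energies apply to every record.
    Rank 2 DEPEND_1: its first index goes with DEPEND_0 (time),
    so each record i can have its own energy table.
    """
    dep1 = ds["DEPEND_1"]["values"]
    if rank(dep1) == 1:
        return dep1[j]
    return dep1[i][j]


flux = {
    "values": [[1.0, 2.0], [3.0, 4.0]],
    "DEPEND_0": {"values": [0.0, 60.0], "UNITS": "seconds"},
    "DEPEND_1": {"values": [[10.0, 20.0], [15.0, 30.0]], "UNITS": "eV"},
}
print(energy_tag(flux, 1, 1))  # 30.0: the energy table changed for the second record
```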
type QDataSet describing each of the bundled datasets (Bundle Descriptor). This dataset describes how the columns should be split up into separate parameters. This rank 2 dataset has a length that is equal to the number of bundled datasets. The values(i,*) are the qube dimensions of the dataset, except for the first dimension. When all the bundled datasets are rank 1, then length(*) will be equal to zero. property(*,UNITS) will yield the unit for each dataset. Bundle dimensions generally add one physical dimension for each bundled dataset. property(*,DEPEND_0) is special, because it will return a string rather than a QDataSet. This string should refer to one of the bundled datasets by its NAME property. (Any property that returns a QDataSet should return a string referring to another dataset in the bundle.) Also the dataset is necessarily a QUBE.
type QDataSet describing each position of the rank 1 dataset (Bundle Descriptor). This dataset describes how the columns should be split up into separate parameters. See BUNDLE_1. Note that slicing a dataset on the zeroth dimension moves BUNDLE_1 to BUNDLE_0. Properties defined in the dataset itself are overridden by the bundle descriptor's properties. For example, if the dataset has property( UNITS, 0 ) defined as "Hz" but the bundle has property( UNITS, 0 ) as "Hertz", then "Hertz" is used.
type QDataSet, Bundle Descriptor. When multiple BUNDLEs are present, they must be simple bundles, bundling only rank 1 datasets.
type QDataSet, Bundle Descriptor. When multiple BUNDLEs are present, they must be simple bundles, bundling only rank 1 datasets.
type Integer, only found in a bundle descriptor (BUNDLE_0 or BUNDLE_1); this is the integer index of the column where the current dataset starts. If this is null, then the index used to access the value may be used instead (e.g. in a bundle of rank 1 datasets).
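The following Python sketch shows how a bundle descriptor might be used to split the columns of a rank 2 bundle back into separate parameters, falling back to the descriptor index when START_INDEX is missing. The descriptor is modeled as a list of dicts with NAME, UNITS, an optional START_INDEX, and the element's qube dims ("dims", empty for rank 1 datasets); this layout and the function `unbundle` are assumptions made for illustration.

```python
from math import prod


def unbundle(rows, descriptor, name):
    """Extract the columns belonging to the bundled dataset called `name`.

    rows: rank 2 data, one list of columns per record.
    descriptor: one entry per bundled dataset (hypothetical dict layout).
    """
    for i, entry in enumerate(descriptor):
        if entry["NAME"] != name:
            continue
        # Without START_INDEX, use the descriptor index; this is only
        # right when every earlier entry is a rank 1 dataset.
        start = entry.get("START_INDEX", i)
        if not entry["dims"]:
            # A rank 1 bundled dataset yields one number per record.
            return [row[start] for row in rows]
        width = prod(entry["dims"])  # number of columns this dataset occupies
        return [row[start:start + width] for row in rows]
    raise KeyError(name)


descriptor = [
    {"NAME": "density", "UNITS": "cm**-3", "dims": []},
    {"NAME": "B_GSM", "UNITS": "nT", "dims": [3], "START_INDEX": 1},
    {"NAME": "temperature", "UNITS": "eV", "dims": [], "START_INDEX": 4},
]
rows = [[5.0, 1.0, 2.0, 3.0, 10.0],
        [6.0, 1.5, 2.5, 3.5, 12.0]]
print(unbundle(rows, descriptor, "B_GSM"))        # [[1.0, 2.0, 3.0], [1.5, 2.5, 3.5]]
print(unbundle(rows, descriptor, "temperature"))  # [10.0, 12.0]
```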
type String, a comma-delimited list of keywords that describe the boundary type for each column. For example, "min,max", "min,maxInclusive", or "c95min,mean,c95max". A bins dimension doesn't add a physical dimension. Autoplot uses only "min,max" and "min,maxInclusive".
type String, a comma-delimited list of keywords that describe the boundary type for each column. For example, "min,max", "min,maxInclusive", or "c95min,mean,c95max". A bins dimension doesn't add a physical dimension. Autoplot uses only "min,max" and "min,maxInclusive".
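As an illustration of the two bin types Autoplot uses, here is a Python sketch of a containment test for one bin [min, max]. Only the keywords come from the text; the function `in_bin` is hypothetical, and it treats "min,max" as min inclusive and max exclusive, as that bin type is described elsewhere in this interface.

```python
def in_bin(value, bounds, bins_type="min,max"):
    """Test whether value falls in the bin bounds = [lo, hi].

    "min,max": lo is inclusive and hi is exclusive.
    "min,maxInclusive": both lo and hi are inclusive.
    """
    lo, hi = bounds
    if bins_type == "min,max":
        return lo <= value < hi
    if bins_type == "min,maxInclusive":
        return lo <= value <= hi
    raise ValueError("unsupported bins type: " + bins_type)


print(in_bin(10.0, [0.0, 10.0]))                      # False: max is exclusive
print(in_bin(10.0, [0.0, 10.0], "min,maxInclusive"))  # True
```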
type String; a non-null value indicates that the elements in this dimension are instances of data with the same dimensions. For example, ds[2,20] with JOIN_0="DEPEND_1" should be equivalent to ds[40]. It is not yet clear whether the string should carry any meaning; for now it just names the next dimension.
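The JOIN_0 equivalence can be pictured with nested Python lists: the 2 records of 20 values in ds[2,20] hold the same 40 values as ds[40]. The helper `flatten_join` below is hypothetical and only illustrates that relationship; it is not a library call.

```python
def flatten_join(joined):
    """Concatenate the records of a joined dataset into one rank 1 list."""
    flat = []
    for record in joined:
        flat.extend(record)
    return flat


joined = [[float(i * 20 + j) for j in range(20)] for i in range(2)]  # ds[2,20]
flat = flatten_join(joined)                                          # ds[40]
print(len(flat))  # 40
```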
type QDataSet, a correlated plane of data. An additional dependent DataSet that is correlated by the first index. Note "0" is just a count, and does not refer to the 0th index. All correlated datasets must be correlated by the first index. TODO: what about two rank 2 datasets? Note that if PLANE_i==null then PLANE_(i+1) must also be null.
type QDataSet that stores the position of a slice or range in a collapsed dimension. In "Flux(Energy) @ Time=2009-03-16T11:19 UT", the Time=... comes from a context property. Note "0" is just a count, and does not refer to the 0th index. A dataset can have any number of contexts, for example Temperature @ ( Time, Long, Lat ): 37 deg F @ ( 2009-03-16T11:19 UT, 91.5331 deg West, 41.6579 deg North ). Typically this will be a rank 0 dataset, but it may also be a rank 1 dataset with a bins dimension.
type QDataSet that stores the position of a slice or range in a collapsed dimension. In "Flux(Energy) @ Time=2009-03-16T11:19 UT", the Time=... comes from a context property. Note "1" is just a count, and does not refer to index 1. A dataset can have any number of contexts, for example Temperature @ ( Time, Long, Lat ): 37 deg F @ ( 2009-03-16T11:19 UT, 91.5331 deg West, 41.6579 deg North ). Typically this will be a rank 0 dataset, but it may also be a rank 1 dataset with a bins dimension.
the maximum number of allowed planes. This should be used to enumerate all the planes.
maximum number of same-unit bundled dimensions (e.g. B_GSM[time,Bundle]). This was introduced when the CDF dataset fa_k0_tms_20040224_v01.cdf?O+_en, which has 48 energy channels, was marked as time_series but wouldn't render because the view code was limited to 12.
the highest rank supported by the library. Arbitrarily high-rank datasets are supported through RankNDataSet, but must be sliced to be accessed.
the highest rank supported by the library, without direct access to datums. Some codes may choose to use this when supporting high rank data is trivial.
reference value for the size of a dataset at which we would expect to start seeing performance degradation. For example, a linear algorithm totalling a dataset within this limit should complete within one second. This is of course a somewhat arbitrary limit, but it shows what the expectations are.
type Units indicating the units of the dataset, from the enumeration of org.das2.datum.Units, as in org.das2.datum.Units.km. New unit types can be introduced with Units.lookupUnits. For example, in an Autoplot Jython script:

    from org.das2.datum import Units
    u= Units.lookupUnits('seconds since 2015-001T00:00')
    ds= findgen(3600)
    ds= putProperty( ds, QDataSet.UNITS, u )
    plot( ds )  # plots a line from 00:00 to 01:00.
type String, Java/C format string for formatting the values. This should imply precision, and codes that serialize data can use this to correctly format the data. Examples include:
type Number, value to be considered fill (invalid) data. Note that because all data is accessed as doubles, noise may inadvertently affect these numbers.
type Number, minimum measurement value to be considered valid. Lower and upper bounds are inclusive; FILL_VALUE should be used to make the lower or upper bound exclusive. Note that the contains logic of DatumRange is exclusive on the upper bound.
type Number, maximum measurement value to be considered valid. Lower and upper bounds are inclusive; FILL_VALUE should be used to make the lower or upper bound exclusive. Note that the contains logic of DatumRange is exclusive on the upper bound.
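Taken together, FILL_VALUE, VALID_MIN, and VALID_MAX decide which values are valid. A minimal Python sketch, assuming inclusive bounds as stated above; the tolerance used when comparing against the fill value and the treatment of NaN as invalid are our assumptions, not documented behavior, and `is_valid` is a hypothetical helper.

```python
import math


def is_valid(value, fill=None, valid_min=None, valid_max=None, rel_tol=1e-6):
    """Return True if value is not fill and lies within [valid_min, valid_max]."""
    if math.isnan(value):
        return False  # assumption: NaN is treated as invalid
    # Compare against fill with a tolerance, since values pass through
    # doubles and may not match the fill value exactly (assumed tolerance).
    if fill is not None and math.isclose(value, fill, rel_tol=rel_tol):
        return False
    if valid_min is not None and value < valid_min:
        return False
    if valid_max is not None and value > valid_max:
        return False
    return True


data = [-1e31, 0.0, 50.0, 100.0, 101.0]
print([is_valid(v, fill=-1e31, valid_min=0.0, valid_max=100.0) for v in data])
# [False, True, True, True, False]
```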
type Number, the typical minimum, used when discovering datasets. This should be a reasonable representation of the expected dynamic range of the dataset.
type Number, the typical maximum, used when discovering datasets. This should be a reasonable representation of the expected dynamic range of the dataset.
String, "linear" or "log", hinting at a preference for a linear or log axis.
String, indicates how numbers should be combined in this space. Possible values are linear, geometric, mod24, mod360, modpi, modtau, and none. The value "none" indicates that no averaging is allowed (for example with nominal data) and only nearest-neighbor selection can be done. Note this is similar to SCALE_TYPE; a geometric AVERAGE_TYPE will often have a log SCALE_TYPE. When AVERAGE_TYPE is missing, linear should be assumed. See also https://spdf.gsfc.nasa.gov/istp_guide/vattributes.html#AVG_TYPE and https://sourceforge.net/p/autoplot/feature-requests/593/.
String, concise, human-consumable label suitable for a plot label (~10 chars).
String, Human-consumable string suitable for a plot title (~100 chars).
String, Human-consumable string suitable for describing the data more fully. This should be html text, or just a link to other documentation (one URL, or two sentences to one page of text).
QDataSet of the same geometry that indicates the weight of each point. Weights are often computed during processing, and this is where they should be stored for use by other routines. When the weights plane is present, routines can safely ignore the FILL_VALUE, VALID_MIN, and VALID_MAX properties, and use a non-zero weight to indicate valid data. Further, averages of averages can then be computed accurately.
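A short Python sketch of why weights make averages of averages accurate: when each partial average carries its total weight (here the count of valid points), combining the partial averages by weight reproduces the average of all the valid data. The data and the helper `weighted_mean` are hypothetical.

```python
def weighted_mean(values, weights):
    """Return (mean, total_weight); zero-weight (invalid) points are ignored."""
    total = sum(weights)
    if total == 0:
        return float("nan"), 0.0
    return sum(v * w for v, w in zip(values, weights)) / total, total


# Two chunks of raw data; the last point of chunk b is fill (weight 0).
a_vals, a_wts = [1.0, 2.0, 3.0], [1.0, 1.0, 1.0]
b_vals, b_wts = [10.0, -1e31], [1.0, 0.0]

mean_a, w_a = weighted_mean(a_vals, a_wts)  # 2.0 with weight 3
mean_b, w_b = weighted_mean(b_vals, b_wts)  # 10.0 with weight 1

# Weighting the partial averages gives (1+2+3+10)/4; a plain mean of means would give 6.0.
combined, _ = weighted_mean([mean_a, mean_b], [w_a, w_b])
print(combined)  # 4.0
```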
Boolean, Boolean.TRUE if dataset is monotonically increasing, and the data is rank 1. Data may only contain invalid values at the beginning or end, and may contain repeated values. Generally this will be used with tags datasets.
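The MONOTONIC conditions can be checked with a few lines of Python. This sketch assumes invalid values are represented as None (a stand-in for fill); it accepts invalid values only at the ends and allows repeated values. The function `is_monotonic` is hypothetical.

```python
def is_monotonic(values):
    """Return True if values satisfy the MONOTONIC conditions described above."""
    start, end = 0, len(values)
    # Skip invalid values at the beginning and the end.
    while start < end and values[start] is None:
        start += 1
    while end > start and values[end - 1] is None:
        end -= 1
    interior = values[start:end]
    if any(v is None for v in interior):
        return False  # an invalid value in the middle breaks the property
    # Repeated values are allowed, so the test is non-decreasing.
    return all(a <= b for a, b in zip(interior, interior[1:]))


print(is_monotonic([None, 1.0, 2.0, 2.0, 3.0, None]))  # True
print(is_monotonic([1.0, None, 2.0]))                   # False
print(is_monotonic([1.0, 3.0, 2.0]))                    # False
```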
QDataSet of rank 0, which is the expected distance between successive measurements where it is valid to make inferences about the data. For example, interpolation is disallowed between points 1.5*CADENCE apart. This property only makes sense with a tags dataset. Note this may be a "ratiometric" datum, like 110 percentIncrease, for logarithmically spaced data. CADENCE must be positive.
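A minimal Python sketch of using CADENCE to find gaps across which interpolation should not be done, with the 1.5*CADENCE factor from the text. It handles only a linear cadence (not a ratiometric one), and treating a step of exactly 1.5*CADENCE as still connected is our assumption; the function `split_at_gaps` is hypothetical.

```python
def split_at_gaps(tags, cadence, factor=1.5):
    """Split monotonic tags into runs, starting a new run after each gap.

    A gap is a step strictly larger than factor*cadence (linear cadence only).
    """
    if cadence <= 0:
        raise ValueError("CADENCE must be positive")
    runs = []
    current = [tags[0]] if tags else []
    for prev, t in zip(tags, tags[1:]):
        if t - prev > factor * cadence:
            runs.append(current)
            current = []
        current.append(t)
    if current:
        runs.append(current)
    return runs


# One-second cadence with a data gap between t=3 and t=10.
print(split_at_gaps([0, 1, 2, 3, 10, 11], cadence=1.0))  # [[0, 1, 2, 3], [10, 11]]
```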
QDataSet of rank 0, or correlated QDataSet that limits accuracy. This should be interpreted as the one standard deviation confidence level, and must be positive.
QDataSet of rank 0, or correlated QDataSet that limits accuracy. This should be interpreted as the one standard deviation confidence level, and must be positive.
QDataSet of rank 0, or correlated QDataSet, identifying a boundary. This is added to the measurements and should be interpreted as the upper limit of the 100% confidence interval where a measurement was collected. Note that if both DELTA_PLUS and BIN_PLUS are found, then BIN_PLUS must be greater than or equal to DELTA_PLUS. This is intended for cases where a rank 0 dataset can be used; where the boundary varies, BIN_MAX and BIN_MIN are preferred.
QDataSet of rank 0, or correlated QDataSet, identifying a boundary. This is subtracted from the measurements and should be interpreted as the lower limit of the 100% confidence interval where a measurement was collected.
QDataSet of rank 1 identifying the boundary, in the same units as the dataset. This should be interpreted as the upper limit of the 100% confidence interval where a measurement was collected. When this is found, BIN_PLUS and BIN_MINUS should be ignored.
QDataSet of rank 1 identifying the boundary, in the same units as the dataset. This should be interpreted as the lower limit of the 100% confidence interval where a measurement was collected. When this is found, BIN_PLUS and BIN_MINUS should be ignored.
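The precedence between these boundary properties can be summarized in Python. In this sketch (a hypothetical helper, with properties modeled as a dict), BIN_MIN and BIN_MAX win when present; otherwise BIN_MINUS and BIN_PLUS, either a single rank 0 number or one value per measurement, are subtracted from and added to each measurement. It assumes both members of whichever pair is used are present.

```python
def bin_bounds(values, props):
    """Return (lower, upper) lists of bin boundaries for each value."""
    if "BIN_MIN" in props and "BIN_MAX" in props:
        # Absolute boundaries, in the same units as the data, take precedence.
        return list(props["BIN_MIN"]), list(props["BIN_MAX"])

    def per_point(x):
        # A rank 0 (scalar) offset applies to every measurement.
        return list(x) if isinstance(x, (list, tuple)) else [x] * len(values)

    minus = per_point(props["BIN_MINUS"])
    plus = per_point(props["BIN_PLUS"])
    lower = [v - m for v, m in zip(values, minus)]
    upper = [v + p for v, p in zip(values, plus)]
    return lower, upper


energies = [10.0, 20.0, 40.0]
print(bin_bounds(energies, {"BIN_MINUS": 1.0, "BIN_PLUS": 2.0}))
# ([9.0, 19.0, 39.0], [12.0, 22.0, 42.0])
print(bin_bounds(energies, {"BIN_MIN": [8.0, 15.0, 30.0],
                            "BIN_MAX": [15.0, 30.0, 50.0],
                            "BIN_PLUS": 2.0}))
# ([8.0, 15.0, 30.0], [15.0, 30.0, 50.0]): BIN_PLUS is ignored
```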
name of the dataset in a bundle to be connected to BIN_MIN.
name of the dataset in a bundle to be connected to BIN_MAX.
name of the dataset in a bundle to be connected to BIN_MINUS.
name of the dataset in a bundle to be connected to BIN_PLUS.
name of the dataset in a bundle to be connected to DELTA_PLUS.
name of the dataset in a bundle to be connected to DELTA_MINUS.
CacheTag, indicating the coverage and resolution of a dimension. For example, in Autoplot the TimeSeriesBrowse uses this to keep track of what has already been read.
String, hint as to the preferred rendering method. Examples include "spectrogram", "time_series", "stack_plot", "nnSpectrogram", "hugeScatter", "series", "scatter", "colorScatter", "stairSteps", "fillToZero", "digital", "image", "pitchAngleDistribution", and "eventsBar". Note these are just suggestions and are not interpreted in the library.
combining numbers is not allowed, and often nearest neighbor is a suitable result.
typical average, sum(ds)/len(ds).
geometric mean where result is exp(sum(log(ds))/len(ds))
mod24 mean where avg([23,1]) is 0.
mod360 mean where avg([359,1]) is 0.
modpi mean where avg( [5*PI/6,7*PI/6] ) is 0.
modtau mean where avg( [5*TAU/6,7*TAU/6] ) is 0.
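The averaging types above can be sketched in Python. The modular types are computed here as a circular (vector) mean over the given period, one standard way to get results like avg([23,1]) = 0 for mod24; this is our assumption and not necessarily how the library computes them. The function names are hypothetical.

```python
import math


def linear_mean(ds):
    """Typical average: sum(ds)/len(ds)."""
    return sum(ds) / len(ds)


def geometric_mean(ds):
    """exp(sum(log(ds))/len(ds)); every value must be positive."""
    return math.exp(sum(math.log(x) for x in ds) / len(ds))


def modular_mean(ds, period):
    """Circular mean of values that wrap around every `period` units."""
    s = sum(math.sin(2 * math.pi * x / period) for x in ds)
    c = sum(math.cos(2 * math.pi * x / period) for x in ds)
    m = (math.atan2(s, c) * period / (2 * math.pi)) % period
    # Tiny negative round-off would otherwise fold to (almost) `period`.
    return 0.0 if math.isclose(m, period) else m


# Period for each modular type; "none" is absent because it forbids averaging.
PERIODS = {"mod24": 24.0, "mod360": 360.0, "modpi": math.pi, "modtau": 2 * math.pi}


def average(ds, average_type="linear"):
    if average_type == "linear":
        return linear_mean(ds)
    if average_type == "geometric":
        return geometric_mean(ds)
    if average_type in PERIODS:
        return modular_mean(ds, PERIODS[average_type])
    raise ValueError("averaging is not allowed for AVERAGE_TYPE " + repr(average_type))


print(average([1.0, 100.0], "geometric"))  # close to 10.0
print(average([23.0, 1.0], "mod24"))       # close to 0.0
print(average([359.0, 1.0], "mod360"))     # close to 0.0
```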
full-fidelity rendering of buckshot and connect-a-dot plots
use blocks to draw each point, so data extents can be seen.
draw events bars
values are drawn.
values are an RGB image, a rank 3 dataset [w,h,3] or [w,h,4]. With "3", the channels should be R, G, and B; when "4" is used, ARGB is the default. There can be a DEPEND_2 that is a QDataSet with ordinal data specifying the channels, like so: Ops.labelsDataset(['a','b','g','r']). Only bgr and rgb models are supported in the RGBImageRenderer, but future versions could support other color models.
triangle mesh type
String, a Java identifier that can be used when an identifier is needed. This was originally introduced for debugging purposes, so that datasets can have a concise, meaningful name that is decoupled from the label. When NAMEs are used, properties with the same name should only refer to the named dataset.
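Because NAME should be a Java identifier, code that builds names from labels may want to validate them first. This Python sketch checks only the ASCII subset of Java identifier syntax (Java also allows other Unicode letters, which this simplified pattern rejects) and excludes Java's reserved keywords and literals; `is_java_identifier` is a hypothetical helper.

```python
import re

# Java keywords and literals that cannot be used as identifiers.
JAVA_RESERVED = frozenset("""
abstract assert boolean break byte case catch char class const continue
default do double else enum extends final finally float for goto if
implements import instanceof int interface long native new package private
protected public return short static strictfp super switch synchronized this
throw throws transient try void volatile while true false null _
""".split())

# ASCII-only approximation of Java identifier syntax.
IDENTIFIER = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*\Z")


def is_java_identifier(name):
    return bool(IDENTIFIER.match(name)) and name not in JAVA_RESERVED


print(is_java_identifier("B_GSM"))  # True
print(is_java_identifier("2flux"))  # False: starts with a digit
print(is_java_identifier("class"))  # False: reserved word
```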
Boolean.TRUE indicates that the dataset is a "qube," meaning that all dimensions have fixed length and certain optimizations and operators are allowed. Note that when DEPEND_1 is a rank 1 dataset, this implies QUBE. Likewise BUNDLE_1 is a qube. Note the result of any slice must be a qube.
String, representing the coordinate frame of the vector index. The units of the dataset should be EnumerationUnits, which convert the data in this dimension to dimension labels that are understood in the context of the coordinate frame label (e.g. X, Y, Z in GSM). (Note this was introduced before BUNDLE dimensions were formalized, and is not used.)
Map<String,Object> representing additional properties used by client codes. No interpretation is done of these properties, but they are passed around as much as possible. Note Object can be String, Double, or Map<String,Object>. METADATA_MODEL is a string identifying the type of metadata, a scheme for the metadata tree, such as ISTP-CDF or SPASE.
String, a scheme for the metadata tree, such as ISTP or SPASE. This should identify a node's type when the node is present, but should not require that the node be present. When a required node is missing, this should be treated as if none of the metadata is available. This logic is to support aggregating metadata.
the metadata is ISTP-CDF metadata
the metadata is SPASE (Space Physics Archive Search Extract)
the value is a complex number with two elements: the first is the real part and the second is the imaginary part.
String, human-consumable string identifying the data version. Presently this is intended for human consumption, but eventually it may be made usable by software as well. Note that if multiple versions go into making a product (e.g. aggregation), the version string should contain space-delimited version ids, so individual versions must not contain spaces for other purposes. Also, two version strings containing the same value can be coalesced. If this is prefixed with "<scheme>:", then it is to be interpreted as such:
String, Human-consumable string identifying the source of a dataset, such as the file or URI from which it was read. Clearly this is easily lost as processes are applied to the data, but when no other source is involved in a process (excluding library code itself), then the source should be preserved.
QDataSet of events scheme, containing a list of messages encountered during processing that annotate the data. For example, the AggregatingDataSource in Autoplot would add an event to the dataset when a file could not be used. This is a rank 2 dataset with BUNDLE_1=startTime,stopTime,message for now, but may soon allow for bounding qubes: BUNDLE_1=startTime,stopTime,startEn,stopEn,message, and this should be visualized via the EventsRenderer.
String, the name of another dataset in the bundle descriptor. Before this was introduced, a BundleDescriptor could have DEPEND_0 be a string.
String, the name of another dataset in the bundle descriptor. Before this was introduced, a BundleDescriptor could have DEPEND_1 be a string. Note this should only be used if DEPEND_1 is rank 2, otherwise the dataset should be a property of DEPEND_1.
String, the name of the rank 2 or more dataset in a bundle descriptor.
String, the label of the rank 2 or more dataset in a bundle descriptor.
int array, the dimensions of the element. A rank 0 is implicitly [], a rank 1, n by 1, would be [1]. This is similar to the size command, for one record of the data.
Map<String,Object> representing additional properties used by client codes. No interpretation is done of these properties, but they are passed around as much as possible. The object values should be but don't have to be limited to: double, double array, datum, QDataSet, String, String array.
typical bin is min,max with min inclusive and max exclusive.
bin is min,max with both min and max inclusive.
scale type to suggest log axes and bins.
scale type to suggest linear axes and bins.
the minimum length of each of the waveform packets in a rank 2 waveform dataset.
the fill value commonly used in codes.
return null or an object implementing the capability for the given interface. For example:

    DDataSet ds= DDataSet.createRank1(100);
    WriteCapability write= ds.capability( WriteCapability.class );
    write.putValue( 99, -1e31 );

This allows operations to be performed efficiently. Note that there is no WriteCapability class; this is just an example.
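The capability lookup follows the capability (or extension object) pattern: callers ask for an interface and must handle a null answer. Here is a Python analogue; `WriteCapability` is as hypothetical here as in the Java example, and `ArrayDataSet` is an illustrative class, not a library type.

```python
class WriteCapability:
    """Hypothetical capability granting write access to a dataset."""

    def __init__(self, backing):
        self._backing = backing

    def put_value(self, i, value):
        self._backing[i] = value


class ArrayDataSet:
    """Minimal rank 1 dataset that offers write access as a capability."""

    def __init__(self, n):
        self._data = [0.0] * n

    def length(self):
        return len(self._data)

    def value(self, i):
        return self._data[i]

    def capability(self, cls):
        # Return an object implementing the requested capability, or None
        # when this dataset does not support it.
        if cls is WriteCapability:
            return WriteCapability(self._data)
        return None


ds = ArrayDataSet(100)
write = ds.capability(WriteCapability)
if write is not None:  # callers must handle datasets that decline
    write.put_value(99, -1e31)
print(ds.value(99))  # -1e+31
```

Returning None rather than raising lets read-only implementations decline cheaply, while the dataset's own read interface stays small.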
return the length of the first dimension
accessor for properties attached to the dataset. See final static members for example properties.
returns the rank of the dataset, which is the number of indices used to access data. Only rank 0, 1, 2, 3, and 4 datasets are supported in the interface. When a dataset's rank is 5 or greater, it should implement the HighRankDataSet interface, which affords a slice operation to reduce rank.
return a dataset that is a slice of this dataset, slicing on the zeroth dimension. A slice contains the elements at this index; for example, if this dataset is a rank 2 dataset flux(Time,Energy), then the slice is a rank 1 dataset flux(Energy). The result of any slice will be a qube. Note the index must be non-negative. Negative indices, referenced from the end of the dataset, are not supported.
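A Python sketch of what a zeroth-index slice does to DEPEND and CONTEXT properties, using dict-based datasets (a hypothetical model, not the library's classes). Slicing flux(Time,Energy) at record i gives flux(Energy): the energy tags become the new DEPEND_0, and the time of the record is kept as a rank 0 CONTEXT_0, so a label like "Flux(Energy) @ Time=..." can still be made. Other properties are not carried over in this sketch.

```python
def slice0(ds, i):
    """Slice a rank 2 dict-based dataset on its zeroth index."""
    if i < 0:
        raise IndexError("negative indices are not supported")
    result = {"values": ds["values"][i]}
    dep1 = ds.get("DEPEND_1")
    if dep1 is not None:
        tags = dep1["values"]
        # A rank 2 DEPEND_1 is sliced along with the data; a rank 1 one carries over.
        new_tags = tags[i] if isinstance(tags[0], list) else tags
        result["DEPEND_0"] = {"values": new_tags, "UNITS": dep1.get("UNITS")}
    dep0 = ds.get("DEPEND_0")
    if dep0 is not None:
        # The collapsed dimension survives as a rank 0 context.
        result["CONTEXT_0"] = {"values": dep0["values"][i], "UNITS": dep0.get("UNITS")}
    return result


flux = {
    "values": [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    "DEPEND_0": {"values": [0.0, 60.0], "UNITS": "seconds"},
    "DEPEND_1": {"values": [10.0, 20.0, 40.0], "UNITS": "eV"},
}
s = slice0(flux, 1)
print(s["values"], s["DEPEND_0"]["values"], s["CONTEXT_0"]["values"])
# [4.0, 5.0, 6.0] [10.0, 20.0, 40.0] 60.0
```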
rank 0 accessor which provides the string value
return a dataset that is a subset of this dataset. For example:

    DDataSet ds= DDataSet.createRank1(100);
    QDataSet trim= ds.trim(50,60);
    assert( trim.length()==10 );

Note that start and end must be non-negative. Negative indices, referenced from the end of the dataset, are not supported here.
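The trim example implies the usual half-open convention: trim(50,60) keeps indices 50 through 59, giving 10 elements. A Python sketch on a dict-based rank 1 dataset (a hypothetical model) that trims DEPEND_0 along with the values so the tags stay aligned:

```python
def trim(ds, start, end):
    """Return records start (inclusive) to end (exclusive) of a dict dataset."""
    if start < 0 or end < 0:
        raise IndexError("negative indices are not supported")
    if end < start:
        raise ValueError("end must not be less than start")
    result = dict(ds)  # shallow copy keeps properties such as UNITS
    result["values"] = ds["values"][start:end]
    if "DEPEND_0" in ds:
        dep0 = dict(ds["DEPEND_0"])
        dep0["values"] = dep0["values"][start:end]  # keep tags aligned with values
        result["DEPEND_0"] = dep0
    return result


ds = {"values": [float(i) for i in range(100)],
      "DEPEND_0": {"values": list(range(100)), "UNITS": "seconds"}}
t = trim(ds, 50, 60)
print(len(t["values"]), t["DEPEND_0"]["values"][0])  # 10 50
```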
rank 0 accessor.