Notes on a Parameter Model for Data Analysis
Introduction
A parameter set or "pset" is a named collection of parameter objects. Psets
are often used for component (e.g., "task") input and output parameters, but
can also be independent data entities not associated with any single task.
Essentially any data model can be represented as a set of parameters.
Parameters come in a variety of predefined or user defined types, and a
parameter value may be of arbitrary size. While we don't suggest that
parameter sets should replace other mechanisms for general data storage
(FITS, VOTable, DBMS, etc.) they provide a simple but powerful mechanism
for controlling tasks from within a scripting language, and for passing
modest amounts of data about within a high level scripted application.
The most common use for a pset is to pass parameters into and out of
a "task" type of component. The use of psets is not limited to task
components, but to simplify the discussion we will mostly refer to task
components here. The terms keyword and parameter mean much the same thing;
both refer to parameters, but keyword normally refers to an instance of
a parameter.
Concepts
Parameter Sets
- A component (e.g., a task) may make use of multiple parameter sets.
- Psets may correspond to task parameters or may exist separately
from any task as independent data entities.
- Typical parameter sets are a task's input or output parameters, or a
data entity (data structure) such as a WCS or parameter fit.
- Parameter sets are either input at task initiation, or output upon
successful task termination.
- During execution a task has exclusive access to a pset instance
loaded into memory in the task's address space.
- Pset instances may, and often do, persist after task completion.
- A data entity returned as an output pset from one task may be an
input pset to another task.
Parameter Definition and Runtime
- A pset definition (similar to a schema) fully defines each
parameter, including the parameter name, type, semantic type, units,
description, legal values, and so forth.
- A pset instance includes only the parameter name and the attributes
of a parameter which vary at runtime (basically this is the name
and value). Multiple pset instances may reference the same pset
definition.
- A keyword table consists of several parameter set instances all
collected together in the same table. Each parameter is represented
by at least the pset name, parameter name, and parameter value fields.
For example, if the input to a task consists of its input parameters plus
a WCS object represented as a pset, the task would be passed an input
keyword table consisting of all the parameters from these two psets.
A parameter in the main input pset would identify the WCS pset to be used.
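A minimal Python sketch of the keyword table described above may help make
this concrete. The names used here (KeywordRow, build_keyword_table, the
"imstat_in" and "m51_wcs" psets, and the "pset:" reference syntax) are
assumptions made for illustration only; they are not part of any existing
implementation.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class KeywordRow:
    """One row of the keyword table: a single parameter instance."""
    pset: str    # pset instance name
    name: str    # parameter name
    value: Any   # parameter value


def build_keyword_table(psets: Dict[str, Dict[str, Any]]) -> List[KeywordRow]:
    """Flatten {pset_name: {param_name: value}} into one list of rows."""
    rows = []
    for pset_name, params in psets.items():
        for param_name, value in params.items():
            rows.append(KeywordRow(pset_name, param_name, value))
    return rows


# Hypothetical task input: the main input pset plus a WCS pset.
# The "wcs" parameter in the main pset identifies the WCS pset to use.
table = build_keyword_table({
    "imstat_in": {"image": "m51.fits", "wcs": "pset:m51_wcs"},
    "m51_wcs": {"ctype": ["RA---TAN", "DEC--TAN"], "crval": [202.47, 47.19]},
})

for row in table:
    print(row.pset, row.name, row.value)
```

Because the pset name is carried on every row, any number of psets can share
the one table and still be separated again by the receiving task.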
Parameters
- Parameters may be either scalars or 1D sequences of scalar values.
- The base type of a parameter is one of the primitive types: int,
float, complex, string, bool, byte (for binary blobs), and so forth.
- Parameters may also have a semantic type denoted by a UCD or UTYPE.
- More complex data entities than these are represented as psets.
- Parameter values may be indirect, referencing another pset,
parameter, or other data entity.
- A scalar parameter value may optionally be followed by an error
estimate (statistical error, symmetrical, same units as the value).
- More complex errors should be modeled and represented using
multiple parameters.
As an example of the mechanism for representing simple errors suppose
we have a quantity 8453.23 with an associated error of 0.394. The value
and its error would be represented as "8453.23 0.394", with both values
in the same units.
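The "value error" convention could be parsed along the following lines.
This is only a sketch: the Measured type and the parse_measured function are
hypothetical names, and we assume whitespace separates the value from its
optional error.

```python
from typing import NamedTuple, Optional


class Measured(NamedTuple):
    value: float
    error: Optional[float]   # symmetric error in the same units, or None


def parse_measured(text: str) -> Measured:
    """Parse 'value' or 'value error' into a Measured tuple."""
    fields = text.split()
    if len(fields) == 1:
        return Measured(float(fields[0]), None)
    if len(fields) == 2:
        return Measured(float(fields[0]), float(fields[1]))
    raise ValueError("expected 'value' or 'value error', got %r" % text)


m = parse_measured("8453.23 0.394")
print(m.value, m.error)
```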
Some Details
Parameter Set Instance
Logically a parameter set instance consists of:
- Some system-defined pset attributes describing the pset itself;
this is known as the pset descriptor.
- Any number of data keywords (parameters) comprising the user-defined
part of the pset.
The system-defined pset attributes are something like the following:
| Attribute | Description |
| name | pset instance name |
| type | type of pset (task i/o, data model, etc.) |
| psetDef | URI of pset definition |
| ctime | creation date and time |
| mtime | modify date and time |
| description | short description of what this pset is |
The pset attributes are included in every pset instance, hence it is
possible, given the psetDef URI, to retrieve the pset definition schema
to fully understand the pset.
Keyword Attributes and Values
A keyword instance should include anything which can vary with the
instance; everything else is in the parameter definition. The minimum
possible keyword attributes are the pset name, the parameter name, and
the parameter value.
Other possible runtime keyword fields might be the unit or UCD if these
are allowed to vary at runtime.
In addition to conventional valid numerical or string values, parameter
values can be null, unset, or indefinite:
- A null value means that the parameter has been assigned a value,
but the value is null or empty, whatever this means for the particular
type of parameter involved. For example, a null pointer, or an
empty string. A null value can also be used by an application to
indicate that the client does not want to specify a parameter value
and prefers that the application apply some built-in default (the
alternative would be to have the application use some reserved value).
- An unset value means that no value has been assigned. For example,
a CLI might be configured to generate an interactive prompt if a
parameter is used for which no value has been set. In this case,
if the value has been set to null no query would result and the
null indicator would be silently passed on to the application.
- An indef value is numerically indefinite in the sense of IEEE
arithmetic. That is, a valid numerical value has been set but this
value is indefinite. Any numerical computation involving an indef
value produces an indefinite value as the result.
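One way the three special values might be distinguished in Python is sketched
below. The choices here are assumptions, not part of the model: unset is a
dedicated sentinel object, null maps to None, and indef maps to an IEEE NaN.
The resolve function is a hypothetical stand-in for the CLI behavior
described above.

```python
import math


class _Sentinel:
    """A unique marker object with a readable name."""
    def __init__(self, label):
        self.label = label

    def __repr__(self):
        return self.label


UNSET = _Sentinel("UNSET")   # no value has been assigned
NULL = None                  # assigned, but null or empty
INDEF = float("nan")         # assigned, but numerically indefinite


def resolve(value, prompt):
    """Prompt only when the value is unset; pass null through untouched."""
    if value is UNSET:
        return prompt()
    return value


def is_indef(value):
    """True if the value is numerically indefinite (NaN)."""
    return isinstance(value, float) and math.isnan(value)


print(resolve(UNSET, lambda: 5))    # the prompt callback supplies a value
print(resolve(NULL, lambda: 5))     # null is passed on, no prompt
print(is_indef(INDEF * 2.0 + 1.0))  # indef propagates through arithmetic
```

Keeping unset distinct from null is what lets a CLI prompt for one and
silently forward the other.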
A parameter value can also be indirect; for example, it can refer to a
pset, parameter, file, etc., via a URI. A "pset" URI refers to another
pset or an individual parameter within a pset. A "file" URI refers to
a local file. An Internet URI ("http" etc.) points to a data object
accessed via the Internet. Indirection is a property of the parameter
value (rather than the parameter itself) so that a given parameter can
have either a direct or indirect value in a given instance.
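Since indirection is carried by the value, a reader of the keyword table has
to inspect each value to decide how to treat it. The sketch below classifies
a value by its URI scheme using the standard urllib.parse module. The
classify_value function and the exact "pset:" syntax are assumptions; the
notes do not fix a concrete URI form.

```python
from urllib.parse import urlsplit

INTERNET_SCHEMES = ("http", "https", "ftp")


def classify_value(value):
    """Return 'pset', 'file', 'internet', or 'direct' for a parameter value."""
    if not isinstance(value, str):
        return "direct"              # numbers, sequences, etc. are direct
    scheme = urlsplit(value).scheme.lower()
    if scheme == "pset":
        return "pset"                # another pset, or a parameter within one
    if scheme == "file":
        return "file"                # a local file
    if scheme in INTERNET_SCHEMES:
        return "internet"            # a data object accessed via the Internet
    return "direct"                  # plain string value


for v in ("pset:m51_wcs", "file:///tmp/m51.fits",
          "http://example.org/m51.fits", "m51.fits", 3.5):
    print(v, "->", classify_value(v))
```

A real implementation would also need some way to pass a literal string that
happens to look like a URI; that question is left open here.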
Parameter Set Definition
The full definition of a parameter set is specified once and applies to all
instantiated psets of that type. For a component the pset definition is
part of the metadata and interface definition for the component. For data
psets the pset is defined elsewhere, e.g., in the metadata for a package.
The pset definition is required to fully understand a pset instance since
not all parameter attributes are stored in a pset instance file.
As with a parameter set instance, the parameter set definition consists of
both global attributes describing the entire pset, as well as definitions
of each individual user-defined parameter. The system-defined pset
attributes required are something like the following:
| Attribute | Description |
| Type | Pset class name |
| Version | Pset definition version |
| Identifier | Pset definition URI (global identifier) |
| Description | Brief description of pset |
For each parameter something like the following attributes are required
to fully define the parameter:
| Attribute | Description |
| DefaultValue | Default value if any (Null, Indef, or value) |
| EnumValue | Enumerated list of acceptable values |
| MaxValue | Maximum value if any |
| MinValue | Minimum value if any |
| Description | Brief description (for popup help) |
| Length | 1=scalar, 0=variable, N=fixedarray |
| Mode | Usage mode (query, hidden, learn) |
| Name | Parameter name |
| Prompt | Short prompt string |
| Type | Data type (int, float, string, blob, etc.) |
| UCD | Type of physical quantity |
| Unit | Physical unit |
| UType | User type (DM utype or object type for blob) |
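To illustrate how a parameter definition might be used to check a runtime
value, here is a rough Python sketch covering only a few of the attributes
in the table (Name, Type, Length, MinValue, MaxValue, EnumValue). The
ParamDef class and its check method are hypothetical, and the reading of
Length follows the table entry (1=scalar, 0=variable, N=fixed array).

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Sequence


@dataclass
class ParamDef:
    name: str
    type: str
    length: int = 1                           # 1=scalar, 0=variable, N=fixed
    min_value: Optional[float] = None
    max_value: Optional[float] = None
    enum_value: Optional[Sequence[Any]] = None

    def check(self, value) -> List[str]:
        """Return a list of problems; an empty list means the value is legal."""
        problems = []
        items = [value] if self.length == 1 else list(value)
        if self.length > 1 and len(items) != self.length:
            problems.append("%s: expected %d values, got %d"
                            % (self.name, self.length, len(items)))
        for item in items:
            if self.enum_value is not None and item not in self.enum_value:
                problems.append("%s: %r not in enumerated values"
                                % (self.name, item))
            if self.min_value is not None and item < self.min_value:
                problems.append("%s: %r below minimum" % (self.name, item))
            if self.max_value is not None and item > self.max_value:
                problems.append("%s: %r above maximum" % (self.name, item))
        return problems


nsigma = ParamDef("nsigma", "float", min_value=0.0, max_value=10.0)
print(nsigma.check(3.0))
print(nsigma.check(12.0))
```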
Pset Representation in the Keyword Table
To keep things simple at runtime we would like to pass multiple pset
instances encoded within a single keyword table. Each row of the keyword
table represents a single parameter instance. This works fine for multiple
psets since the pset name is a table column. But what do we do about pset
descriptors, i.e., the global attributes which apply to the entire pset?
Logically this is most naturally represented as a hierarchical structure,
but for simplicity of implementation of the parameter mechanism we would
like to represent the logical structure in a simple fashion within a
single keyword table.
It appears that this problem can be resolved fairly easily by representing
the pset descriptor as a data entity and encoding it as a special case of
a pset, a sort of meta-pset of class "pset descriptor". Hence, a pset
instance is represented in the keyword table as two psets, the pset
descriptor encoded as a set of parameters in a "pset descriptor" pset,
and a data pset consisting of only the data keywords of the actual pset.
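The two-pset encoding might look something like the following in Python.
This is a sketch under assumptions: the encode_pset function is hypothetical,
and naming the descriptor pset by appending ".desc" to the instance name is
just one possible convention, not something defined in these notes.

```python
from typing import Any, Dict, List, Tuple

Row = Tuple[str, str, Any]   # (pset name, parameter name, parameter value)


def encode_pset(descriptor: Dict[str, Any],
                keywords: Dict[str, Any]) -> List[Row]:
    """Encode one pset instance as a descriptor pset plus a data pset."""
    name = descriptor["name"]
    # The descriptor attributes become ordinary parameters of a meta-pset.
    rows = [(name + ".desc", key, val) for key, val in descriptor.items()]
    # The data keywords form the data pset proper.
    rows += [(name, key, val) for key, val in keywords.items()]
    return rows


rows = encode_pset(
    {"name": "m51_wcs", "type": "data model", "psetDef": "pset:wcs_def"},
    {"crval": [202.47, 47.19], "ctype": ["RA---TAN", "DEC--TAN"]},
)
for row in rows:
    print(row)
```

The keyword table stays flat; the hierarchy is recovered from the pset name
column alone.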
Pset Representation
Physically a parameter set is defined as an XML document. The schema
of this document is "parameter set definition" and is the same for all
parameter sets.
The keyword table is also normally represented in XML, however this does
not have to be the case. If a binary protocol is used it may be possible,
as an optimization, to pass binary representations of parameter data
through from the client to the component via the container and execution
framework.
(NRAO/ALMA already have simplified prototypes of both the parameter set
definition and the keyword table as XML representations.)
--
DougTody - 22 Jun 2005