Monitoring and the SamplingPoint object - Part 3

Abstract



Monitoring and the SamplingPoint object - Part 2 left the Monitoring section of DIGGSML in a workable state.

There is, however, a problem with this method: its verbosity. Adding a large amount of data rapidly produces a prohibitively large file.

This article details how this problem arises and two possible solutions: using space separated lists of values, and using a tabular system proposed by John Bobbitt of the Petrotechnical Open Standards Consortium (POSC). It also looks at the OGC's standard for this type of data, SensorML.


Current Structure

The following example shows how wind speed and direction readings from an anemometer would be reported in DIGGSML.

Example 1 - Multiple ReadingGroups in an Anemometer

<instruments>
  <SamplingInstrument>
    <name codeSpace="http://www.diggsml.com">Anemometer 1</name>
    <readingGroups>
      <PlaneAngleReadingGroup>
        <planeAngleReadings>
          <PlaneAngleReading>
            <dateTime>2007-01-01T00:01:00</dateTime>
            <planeAngle uom="dega">22</planeAngle>
          </PlaneAngleReading>
          <PlaneAngleReading>
            <dateTime>2007-01-02T00:01:00</dateTime>
            <planeAngle uom="dega">30</planeAngle>
          </PlaneAngleReading>
          <PlaneAngleReading>
            <dateTime>2007-01-03T00:01:00</dateTime>
            <planeAngle uom="dega">27</planeAngle>
          </PlaneAngleReading>
        </planeAngleReadings>
      </PlaneAngleReadingGroup>
 
      <VelocityReadingGroup>
        <velocityReadings>
          <VelocityReading>
            <dateTime>2007-01-01T00:01:00</dateTime>
            <velocity uom="m/s">1.34</velocity>
          </VelocityReading>
          <VelocityReading>
            <dateTime>2007-01-02T00:01:00</dateTime>
            <velocity uom="m/s">0.23</velocity>
          </VelocityReading>
          <VelocityReading>
            <dateTime>2007-01-03T00:01:00</dateTime>
            <velocity uom="m/s">2.9</velocity>
          </VelocityReading>
        </velocityReadings>
      </VelocityReadingGroup>
    </readingGroups>
  </SamplingInstrument>
</instruments>

As this example shows, adding a single reading requires four additional elements and some 166 characters of markup before any data is included!

Possible Solution


There is currently a proposal to add a supplemental way of defining the data, using lists of values rather than repeating elements, as illustrated in Example 2 below.

Example 2 - A More Compact Anemometer

<instruments>
  <SamplingInstrument>
    <name codeSpace="http://www.diggsml.com">Anemometer 1</name>
    <readingGroups>
      <PlaneAngleReadingGroup>
        <planeAngleReadings>
          <dateTimeList>2007-01-01T00:01:00 2007-01-02T00:01:00 2007-01-03T00:01:00</dateTimeList>
          <planeAngleList uom="dega">22 30 27</planeAngleList>
        </planeAngleReadings>
      </PlaneAngleReadingGroup>
 
      <VelocityReadingGroup>
        <velocityReadings>
          <dateTimeList>2007-01-01T00:01:00 2007-01-02T00:01:00 2007-01-03T00:01:00</dateTimeList>
          <velocityList uom="m/s">1.34 0.23 2.9</velocityList>
        </velocityReadings>
      </VelocityReadingGroup>
    </readingGroups>
  </SamplingInstrument>
</instruments>

This example uses lists (space separated lists of values) for dateTime, planeAngle and velocity. Adding a reading is simply a matter of adding one entry to the dateTimeList and one to the matching planeAngleList or velocityList.
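
To get a feel for the saving, the two encodings can be generated for the same readings and their lengths compared. The following is only a rough sketch: the function names and the generated angle values are my own inventions for illustration, and the element names are taken from Examples 1 and 2 with all whitespace between elements left out.

```python
from datetime import datetime, timedelta


def verbose_encoding(readings):
    """Encode (timestamp, angle) pairs as repeated PlaneAngleReading elements (Example 1 style)."""
    parts = ["<planeAngleReadings>"]
    for stamp, angle in readings:
        parts.append(
            "<PlaneAngleReading>"
            f"<dateTime>{stamp}</dateTime>"
            f'<planeAngle uom="dega">{angle}</planeAngle>'
            "</PlaneAngleReading>"
        )
    parts.append("</planeAngleReadings>")
    return "".join(parts)


def list_encoding(readings):
    """Encode the same pairs as two space separated lists (Example 2 style)."""
    stamps = " ".join(stamp for stamp, _ in readings)
    angles = " ".join(str(angle) for _, angle in readings)
    return (
        "<planeAngleReadings>"
        f"<dateTimeList>{stamps}</dateTimeList>"
        f'<planeAngleList uom="dega">{angles}</planeAngleList>'
        "</planeAngleReadings>"
    )


def sample_readings(count):
    """Generate hypothetical daily readings; the angle values are made up."""
    start = datetime(2007, 1, 1, 0, 1, 0)
    return [
        ((start + timedelta(days=i)).isoformat(), 20 + i % 15)
        for i in range(count)
    ]


if __name__ == "__main__":
    # Print the reading count followed by the size of each encoding in characters.
    for count in (3, 100, 10000):
        readings = sample_readings(count)
        print(count, len(verbose_encoding(readings)), len(list_encoding(readings)))
```

The list encoding pays a fixed cost for its two wrapper elements, after which each reading adds only its own characters plus two separating spaces.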

Drawbacks

As you can see from the planeAngleList and velocityList elements, the unit of measure is defined at the list level, not for each measurement, so all measurements in a list must be reported in the same units.

Another, more serious, hazard of the list method is that it is much less "strict". For example, there is no easy way to check that each list contains the same number of entries using XML Schema alone; this logic has to be placed in the Schematron file used for extended validation. If an entry is missed from either list there is nothing to show where it was lost, so the data is "out of sync" with no way of detecting this short of the extended validation.
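
The kind of check that would have to live outside the XML Schema can be sketched in a few lines. This is an illustration in Python rather than Schematron, and the function names are hypothetical; the fragment is Example 2 with one planeAngle value deliberately removed.

```python
import xml.etree.ElementTree as ET

# Example 2 fragment with the last planeAngle value deliberately left out.
FRAGMENT = """
<planeAngleReadings>
  <dateTimeList>2007-01-01T00:01:00 2007-01-02T00:01:00 2007-01-03T00:01:00</dateTimeList>
  <planeAngleList uom="dega">22 30</planeAngleList>
</planeAngleReadings>
"""


def list_lengths(readings):
    """Return {child tag: number of whitespace separated entries} for each list child."""
    return {child.tag: len((child.text or "").split()) for child in readings}


def lists_in_sync(readings):
    """True when every list child holds the same number of entries."""
    return len(set(list_lengths(readings).values())) <= 1


if __name__ == "__main__":
    element = ET.fromstring(FRAGMENT)
    print(list_lengths(element))
    print(lists_in_sync(element))
```

Note that a check like this can only report that the lists disagree in length. It cannot say which reading went missing, which is exactly the weakness described above.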

Even More Generic

I've recently been looking at how other people solve this problem and came across a solution proposed by John Bobbitt, who helped me with the integration of WITSML at the start of my involvement with DIGGSML.

Of the solutions Bobbitt (2004) describes, my preferred choice is Method 4: a generic Table object containing a Component object for each column of data (describing its name, description, datatype, uom and so on) and a Row object for each record, holding the data itself as a list of delimited values. Cleverly, the delimiter is specified in the Table definition, although I would extend this to allow a delimiter to be specified on a per-Row basis.

Example 3 - A Tabular Approach

<ResultsTable>
  <delimiter>|</delimiter>
  <!-- define the three columns -->
  <column>
    <index>1</index>
    <name>Date Time</name>
    <type>dateTime</type>
  </column>
  <column>
    <index>2</index>
    <name>Wind Angle</name>
    <type>witsmlPlaneAngleMeasure</type>
    <uom>dega</uom>
  </column>
  <column>
    <index>3</index>
    <name>Wind Speed</name>
    <type>witsmlVelocityMeasure</type>
    <uom>m/s</uom>
  </column>
  <!-- actually include the data -->
  <row index="1">2007-01-01T00:01:00|22|1.34</row>
  <row index="2">2007-01-02T00:01:00|30|0.23</row>
  <row index="3">2007-01-03T00:01:00|27|2.9</row>
</ResultsTable>

I think this is an excellent way of describing large amounts of tabular data. Although the metadata overhead is higher, appending more values is far more succinct than with lists of values. It does share the drawback that it cannot be fully validated without the advanced features of Schematron. The other drawback is that the XML Schema does not check the values for type. Since the type is specified as a property of the table, some very sophisticated Schematron might be able to validate this as well, but that would require further research.
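
To show what such validation involves, here is a minimal sketch of a reader for the table in Example 3, written in Python rather than Schematron. The read_table function and the PARSERS mapping are my own assumptions, not part of Bobbitt's proposal or of DIGGSML, and both measure types are simply treated as floating point numbers. I have left the uom off the date column, since a timestamp has no unit of measure.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Condensed copy of Example 3.
TABLE = """
<ResultsTable>
  <delimiter>|</delimiter>
  <column><index>1</index><name>Date Time</name><type>dateTime</type></column>
  <column><index>2</index><name>Wind Angle</name><type>witsmlPlaneAngleMeasure</type><uom>dega</uom></column>
  <column><index>3</index><name>Wind Speed</name><type>witsmlVelocityMeasure</type><uom>m/s</uom></column>
  <row index="1">2007-01-01T00:01:00|22|1.34</row>
  <row index="2">2007-01-02T00:01:00|30|0.23</row>
  <row index="3">2007-01-03T00:01:00|27|2.9</row>
</ResultsTable>
"""

# Assumed mapping from the type names used in Example 3 to Python parsers.
PARSERS = {
    "dateTime": datetime.fromisoformat,
    "witsmlPlaneAngleMeasure": float,
    "witsmlVelocityMeasure": float,
}


def read_table(xml_text):
    """Return (column names, typed rows); raise ValueError on a malformed row."""
    table = ET.fromstring(xml_text)
    delimiter = table.findtext("delimiter")
    columns = sorted(table.findall("column"), key=lambda c: int(c.findtext("index")))
    names = [c.findtext("name") for c in columns]
    parsers = [PARSERS[c.findtext("type")] for c in columns]
    rows = []
    for row in table.findall("row"):
        cells = row.text.split(delimiter)
        if len(cells) != len(columns):
            raise ValueError(
                f"row {row.get('index')}: expected {len(columns)} values, got {len(cells)}"
            )
        # Each parser raises ValueError if a cell does not match its column type.
        rows.append([parse(cell) for parse, cell in zip(parsers, cells)])
    return names, rows


if __name__ == "__main__":
    names, rows = read_table(TABLE)
    print(names)
    for row in rows:
        print(row)
```

Unlike the list approach, a missing value here is caught on the row where it occurs, because each row carries its own complete set of columns.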

SensorML

There are other possible ways of recording monitoring data in XML, including the Open Geospatial Consortium (OGC) standards for Observations and Measurements (2006) and the particularly promising SensorML (2007). SensorML uses DataComponent types for storing values; these can be Quantity, Count, Boolean, Text, Time and so on. DataComponents store the values that are assigned to properties. This is similar to the way DIGGSML works, but without the level of granularity that DIGGSML enforces. For example, SensorML may specify that an element must be of type Quantity, an arbitrary quantity of something, whereas DIGGSML will specify that a property must be of type witsmlPressureMeasure. That is still a quantity, but by specifying the more restrictive pressure type DIGGSML enforces proper units of measure for the value and therefore reduces errors in the data.

SensorML does, however, offer another method of transmitting large amounts of data (or raw instrument data): encoding it as an ASCII block. This is very similar to the tabular approach of Bobbitt (2004), as shown by this example from the OGC's SensorML specification (OGC 2007, p74).

Example 4 - ASCII Encoded Data in SensorML

<swe:DataArray>
  <swe:elementCount>
    <swe:Count definition="urn:ogc:def:property:OGC:timeSteps">4</swe:Count>
  </swe:elementCount>
  <swe:elementType>
    <swe:DataRecord definition="urn:ogc:def:property:OGC:atmosphericConditions">
      <swe:field name="Time">
        <swe:Time definition="urn:ogc:def:property:OGC:Time">
          <swe:uom xlink:href="urn:ogc:def:unit:ISO:8601"/>
        </swe:Time>
      </swe:field>
      <swe:field name="AirTemperature">
        <swe:Quantity definition="urn:ogc:def:property:OGC:AirTemperature">
          <swe:uom code="Cel"/>
        </swe:Quantity>
      </swe:field>
      <swe:field name="AtmosphericPressure">
        <swe:Quantity definition="urn:ogc:def:property:OGC:AtmosphericPressure">
          <swe:uom code="hPa"/>
        </swe:Quantity>
      </swe:field>
      <swe:field name="RelativeHumidity">
        <swe:Quantity definition="urn:ogc:def:property:OGC:RelativeHumidity">
          <swe:uom code="%"/>
        </swe:Quantity>
      </swe:field>
      <swe:field name="Visibility">
        <swe:Category definition="urn:ogc:def:property:OGC:SkyCondition"/>
      </swe:field>
    </swe:DataRecord>
  </swe:elementType>
  <swe:encoding>
    <swe:TextBlock tokenSeparator="&#x20;" blockSeparator="," decimalSeparator="."/>
  </swe:encoding>
  <swe:values>
    2006-10-05T12:30:00Z 35.1 950.0 32.0 clear,
    2006-10-05T13:00:00Z 35.8 940.0 331 clear,
    2006-10-05T13:30:00Z 36.5 938.0 35.8 hazy,
    2006-10-05T14:00:00Z 38.0 935.0 37.0 cloudy
  </swe:values>
</swe:DataArray>

This shows SensorML's method of encoding reading data in an ASCII block (the swe:values element), with columns separated by spaces and rows separated by commas, similar to the table structure illustrated above.
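
Decoding such a block is straightforward, as the following sketch suggests. The function names are hypothetical, and the field names, row count and values are copied by hand from Example 4 rather than read from the swe:elementType and swe:elementCount elements as a real reader would. Leading and trailing whitespace around each row is stripped, which is a simplification that suits the space separated tokens used here.

```python
# Field names and row count copied from Example 4.
FIELDS = ["Time", "AirTemperature", "AtmosphericPressure", "RelativeHumidity", "Visibility"]
ELEMENT_COUNT = 4

# Content of the swe:values element, copied verbatim from Example 4.
VALUES = """
    2006-10-05T12:30:00Z 35.1 950.0 32.0 clear,
    2006-10-05T13:00:00Z 35.8 940.0 331 clear,
    2006-10-05T13:30:00Z 36.5 938.0 35.8 hazy,
    2006-10-05T14:00:00Z 38.0 935.0 37.0 cloudy
"""


def decode_text_block(values, token_separator=" ", block_separator=","):
    """Split a text block into rows (blocks) of string tokens."""
    rows = []
    for block in values.split(block_separator):
        block = block.strip()
        if block:
            rows.append(block.split(token_separator))
    return rows


def to_records(rows, field_names, element_count):
    """Pair each token with its field name, checking row and column counts."""
    if len(rows) != element_count:
        raise ValueError(f"expected {element_count} rows, got {len(rows)}")
    records = []
    for number, row in enumerate(rows, start=1):
        if len(row) != len(field_names):
            raise ValueError(
                f"row {number}: expected {len(field_names)} tokens, got {len(row)}"
            )
        records.append(dict(zip(field_names, row)))
    return records


if __name__ == "__main__":
    for record in to_records(decode_text_block(VALUES), FIELDS, ELEMENT_COUNT):
        print(record)
```

The tokens are kept as strings; converting them according to the Quantity, Time and Category definitions would be a further step.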

Binary Data

SensorML also allows the user to encode binary data in its swe:DataArray blocks, specifying the encoding as well as the format of the data itself. This goes against the grain of DIGGSML, which has been oriented around sending semi human readable data. SensorML also allows binary data to be attached to these swe:DataArray blocks (by way of an xlink:href attribute on the swe:values element). This too differs from DIGGSML, which prefers to include attached data in its own AttachedFiles elements; allowing files to be attached in multiple ways could confuse users and developers.

Conclusion

Although DIGGSML is a great framework for transferring Geotechnical, Geoenvironmental, Piling and Monitoring data, it can become very verbose when large amounts of data need to be transferred. This verbosity leads to large, perhaps prohibitively large, files. This article has presented two possible ways of reducing the volume of descriptive metadata required to enclose the actual data values.

Each method has advantages and disadvantages relative to the others, and none is a "silver bullet" for the large data problem. Implementing a tabular format such as Bobbitt (2004) suggests looks reasonably simple for both the framework and software developers. The list based approach stores the data more formally with respect to its type and values, but does not allow quite the level of flexibility shown in Bobbitt's (2004) tabular approach.

Importing the swe:DataArray initially looks like a very quick and easy way of leveraging the power of SensorML and of allowing ASCII encoded blocks of large amounts of data in DIGGSML. It is not without drawbacks, however. It uses a different set of units and measurements from DIGGSML, so implementing applications would need a considerable amount of logic to understand both OGC style units of measure and the WITSML style used elsewhere in DIGGSML. It would also allow the user to add binary data blocks, violating DIGGSML's aim of keeping the data values as human readable as possible; it would be difficult to import an ASCII type swe:DataArray and prohibit a binary type swe:DataArray.

Owing to the drawbacks of importing the swe:DataArray from SensorML, it may be more prudent to adopt a 'SensorML-like' approach: implement the tabular approach illustrated by Bobbitt (2004) using DIGGSML semantics for units of measure and data types, whilst studying the SensorML approach for any improvements it may offer. One such improvement, evident even from the short example above, is the swe:Count element showing how many rows of data there are. This makes it easier for software applications to import the data and adds a check that data has not been inadvertently added or removed in transit. It may also be possible to add a checksum or hash to the data element to ensure integrity across transmission. Many algorithms for calculating these are publicly available, from the traditional CRC32 to the more cryptographically targeted MD5 and SHA. The possible implementation of these check functions will be covered at a later time.
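
As a taste of what such a check might involve, the sketch below computes a CRC32 and a SHA-256 digest over the text of a data block using Python's standard zlib and hashlib modules. Everything about how DIGGSML would use this is an assumption on my part: the digests function is hypothetical, SHA-256 is simply one member of the SHA family, and collapsing runs of whitespace before hashing is just one possible way of stopping harmless re-indentation from changing the result.

```python
import hashlib
import zlib


def digests(values_text):
    """Return CRC32 and SHA-256 digests of a data block as hexadecimal strings."""
    # Collapse all runs of whitespace so that re-indenting the XML
    # does not alter the digest (an assumed canonical form).
    canonical = " ".join(values_text.split())
    data = canonical.encode("utf-8")
    return {
        "crc32": format(zlib.crc32(data) & 0xFFFFFFFF, "08x"),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


if __name__ == "__main__":
    sent = "2007-01-01T00:01:00|22|1.34"
    received = "2007-01-01T00:01:00|22|1.35"  # one digit altered in transit
    print(digests(sent))
    print(digests(sent) == digests(received))
```

A sender would store the digest alongside the data, and a receiver would recompute it and compare. CRC32 is enough to catch accidental damage, whereas the cryptographic hashes are aimed at deliberate tampering.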

References

J. Bobbitt (2004), "XML Tables". Available online from http://www.posc.org/ebiz/Guidelines/XMLTables.html

Open Geospatial Consortium (OGC) (2006), "Observations and Measurements". Available online from http://portal.opengeospatial.org/files/?artifact_id=17038

Open Geospatial Consortium (OGC) (2007), "OpenGIS Sensor Model Language (SensorML) Implementation Specification". Available online from http://portal.opengeospatial.org/files/?artifact_id=21273