Duncan Temple Lang

University of California at Davis

Department of Statistics


This document discusses how the different types in an XML schema are mapped to R data types.

We focus initially on the XMCDA-2.0.0 schema.

library(XMLSchema)
x = readSchema("inst/samples/XMCDA-2.0.0.xsd")
types = sapply(x, class)

Currently, the methodMessages complexType has a single choice element and an attributeGroup reference. The choice has minOccurs and maxOccurs of 0 and Inf (unbounded), so we can have a list of these elements. If they are compatible atomic types, e.g. strings or integers, we could use a vector to hold them; otherwise, we can use a list. We can have slots for the attributes. We want the attributes to be handled separately so that we can convert non-string values (e.g. integers, dates, dates and times) and maintain them in their natural types.
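
As a rough illustration of the two ideas above, the following R sketch collapses repeated elements into a vector when they share a basic type, and converts attribute strings to their natural R types. The helpers collapseIfAtomic() and convertAttribute() are hypothetical names introduced here, not functions from the package, and only a handful of XSD types are handled.

```r
# Hypothetical helper (not part of the package): collapse a list of parsed
# element values into an atomic vector when every value is a single value
# of the same basic type; otherwise keep the list.
collapseIfAtomic = function(values) {
  if (length(values) == 0)
    return(values)
  kinds = unique(vapply(values, function(v) class(v)[1], character(1)))
  scalar = all(vapply(values, length, integer(1)) == 1L)
  if (scalar && length(kinds) == 1L &&
      kinds %in% c("character", "integer", "numeric", "logical"))
    unlist(values)
  else
    values
}

# Hypothetical helper: convert an attribute string to a natural R type
# based on the declared XSD type. Unknown types are left as strings.
convertAttribute = function(value, xsdType) {
  switch(xsdType,
         "xs:integer"  = as.integer(value),
         "xs:date"     = as.Date(value),
         "xs:dateTime" = as.POSIXct(value, format = "%Y-%m-%dT%H:%M:%S",
                                    tz = "UTC"),
         value)
}

collapseIfAtomic(list("a", "b"))      # compatible: becomes a character vector
collapseIfAtomic(list("a", 1L))       # mixed types: stays a list
convertAttribute("3", "xs:integer")   # integer rather than string
```

Dates are deliberately left out of the collapsing step, since unlist() would drop their class.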

readSchema() converts this to a UnionDefinition. (Not certain why at this point.)

The bibliography type is very similar, as are all of the UnionDefinition objects for this schema. They have an annotation node, a choice, and some have an attributeGroup.

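library(XML)
doc = xmlParse("inst/samples/XMCDA-2.0.0.xsd")
# Find the complexType node for each UnionDefinition; "xs" is the namespace prefix for the XPath query.
nodes = sapply(names(x)[types == "UnionDefinition"], function(id) getNodeSet(doc, sprintf("//xs:complexType[@name='%s']", id), "xs"))
# Names of the child nodes of each complexType.
sapply(nodes, names)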

What is the count for each of these?

sapply(x[types == "UnionDefinition"], function(x) x@slotTypes[[1]]@count)

ArrayClassDefinition

Both message and rankedLabel are represented as ArrayClassDefinition. message has an all and an attributeGroup. rankedLabel has just an all. This maps directly to a regular ClassDefinition, with possibly omitted values for some slots.

character

preferenceDirection, alternativeType, valuationType and status are all of type character. These are enumerated string constants, e.g. active and inactive for status, and standard and bipolar for valuationType. They are restrictions of xs:string.

The definition does need to include the possible values, counts, etc. So we need a StringEnum type.
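
One way such a type might look, as a minimal sketch using S4 classes from the methods package: a class that extends character and whose validity method rejects values outside the enumeration. The generator makeStringEnum() is a hypothetical name, not the package's own definition, and the sketch covers only the permitted values, not counts.

```r
library(methods)

# Hypothetical generator (an assumption, not the package's own StringEnum):
# define a class that extends "character" and only accepts the given values.
makeStringEnum = function(className, values) {
  setClass(className, contains = "character",
           validity = function(object) {
             bad = setdiff(object@.Data, values)
             if (length(bad))
               paste("invalid value(s):", paste(bad, collapse = ", "))
             else
               TRUE
           })
}

# The status type from the schema, with its two enumerated values.
Status = makeStringEnum("status", c("active", "inactive"))
ok = Status("active")                          # passes validation
rejected = try(Status("pending"), silent = TRUE)   # fails validation
```

Keeping the permitted values in the validity method means a bad value is caught when the object is created rather than when the XML is written out.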

ExtendedClassDefinition

There is but one of these: projectReference. This is a complexType and has a single node which is a complexContent.

    <xs:complexContent>
      <xs:extension base='xmcda:description'>
        <xs:attributeGroup ref="xmcda:defaultAttributes"/>
      </xs:extension>
    </xs:complexContent>

What does this actually mean in terms of what can appear? Where is the base xmcda:description defined? It appears to be just adding the attributes to the xmcda:description element.
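
If that reading is right, the extension corresponds naturally to class inheritance in R: the derived class contains the base class and adds slots for the attributes. The following is only a sketch under that assumption. The content slot of description is a placeholder, since the base type's real content is not examined here, and the slot names id, name and mcdaConcept are assumed to be the attributes in xmcda:defaultAttributes.

```r
library(methods)

# Placeholder for the base type: the real content of xmcda:description is
# not examined here, so a single illustrative slot stands in for it.
setClass("description", representation(content = "list"))

# The extension keeps everything from the base and adds the attributes.
# The slot names are assumed to be those in xmcda:defaultAttributes.
setClass("projectReference", contains = "description",
         representation(id = "character", name = "character",
                        mcdaConcept = "character"))

ref = new("projectReference", id = "ref1", content = list())
is(ref, "description")   # the derived object is also a description
```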

SimpleSequenceType, Element

SimpleElement

From PMML

  <xs:element name="MatCell">
    <xs:complexType >
      <xs:simpleContent>
        <xs:extension base="xs:string">
          <xs:attribute name="row" type="INT-NUMBER" use="required" />
          <xs:attribute name="col" type="INT-NUMBER" use="required" />
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>

So this means we have a MatCell element with string content and 2 attributes - row and col.
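
A possible R counterpart, sketched with an S4 class: the string content becomes the data part of a class that extends character, and the two attributes become integer slots. This assumes INT-NUMBER maps to R's integer type, and it is an illustration rather than the class the package actually generates.

```r
library(methods)

# String content is the data part; the attributes row and col are slots.
# Mapping INT-NUMBER to "integer" is an assumption made for this sketch.
setClass("MatCell", contains = "character",
         representation(row = "integer", col = "integer"))

cell = new("MatCell", "0.5", row = 1L, col = 2L)
cell@.Data   # the element's string content
cell@row     # the row attribute, kept as an integer
```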

ParameterField has no simpleContent and just adds attributes.

  <xs:element name="ParameterField">
    <xs:complexType>
      <xs:attribute name="name" type="xs:string" use="required" />
      <xs:attribute name="optype" type="OPTYPE" />
      <xs:attribute name="dataType" type="DATATYPE" />
    </xs:complexType>
  </xs:element>

Level just adds attributes. Trend adds attributes but puts a restriction on the type to be an NMTOKEN with an enumerated value.

ClusteringModel is different.

  <xs:element name="ClusteringModel">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
        <xs:element ref="MiningSchema"/>
        <xs:element ref="Output" minOccurs="0" />
        <xs:element ref="ModelStats" minOccurs="0"/>
        <xs:element ref="ModelExplanation" minOccurs="0"/>
        <xs:element ref="LocalTransformations" minOccurs="0" />
        <xs:element ref="ComparisonMeasure"/>
        <xs:element ref="ClusteringField" minOccurs="0" maxOccurs="unbounded"/>
        <xs:element ref="MissingValueWeights" minOccurs="0"/>
        <xs:element ref="Cluster" maxOccurs="unbounded"/>
        <xs:element ref="ModelVerification" minOccurs="0"/>
        <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
      <xs:attribute name="modelName" type="xs:string" use="optional"/>
      <xs:attribute name="functionName" type="MINING-FUNCTION" use="required" />
      <xs:attribute name="algorithmName" type="xs:string" use="optional"/>
      <xs:attribute name="modelClass" use="required">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:enumeration value="centerBased"/>
            <xs:enumeration value="distributionBased"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
      <xs:attribute name="numberOfClusters" type="INT-NUMBER" use="required"/>
    </xs:complexType>
  </xs:element>