This document discusses how the different types in an XML schema are mapped to R data types.
We'll focus on the XMCDA-2.0.0 schema initially.
x = readSchema("inst/samples/XMCDA-2.0.0.xsd")
types = sapply(x, class)
Currently, the methodMessages complexType has a single choice element and an attributeGroup reference. The choice has minOccurs and maxOccurs of 0 and unbounded (Inf in R), so we can have a list of these elements. If they are compatible atomic types, e.g. strings or integers, we could use a vector to hold them; otherwise, we can use a list. We can have slots for the attributes. We want the attributes to be handled separately so that we can convert non-string values (e.g. integers, dates, dates and times) and maintain them in their natural types.
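A minimal sketch of the vector-or-list decision and the attribute conversion described above, using only base R. The helper names collapseChoice and convertAttribute are hypothetical and are not part of the package, and the handful of XSD type names handled here is an assumption chosen for illustration.

```r
# Hypothetical helper: hold repeated choice elements in an atomic vector
# when they all share one basic atomic type, otherwise fall back to a list.
collapseChoice = function(values) {
    if (length(values) == 0)
        return(list())
    basic = c("character", "integer", "numeric", "logical")
    isScalarAtomic = vapply(values, function(v) is.atomic(v) && length(v) == 1, logical(1))
    classes = unique(vapply(values, function(v) class(v)[1], character(1)))
    if (all(isScalarAtomic) && length(classes) == 1 && classes %in% basic)
        unlist(values)
    else
        values
}

# Hypothetical helper: convert an attribute's string value to its natural R type.
convertAttribute = function(value, xsdType) {
    switch(xsdType,
           "xs:integer"  = as.integer(value),
           "xs:boolean"  = value %in% c("true", "1"),
           "xs:date"     = as.Date(value),
           "xs:dateTime" = as.POSIXct(value, format = "%Y-%m-%dT%H:%M:%S", tz = "UTC"),
           value)   # anything else stays a string
}
```

Values with extra structure, such as dates, are deliberately left in a list here because unlist() would drop their class.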
readSchema() converts this to a UnionDefinition. (Not certain why at this point.)
The bibliography type is very similar, as are all of the other UnionDefinition objects for this schema. Each has an annotation node and a choice, and some have an attributeGroup.
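doc = xmlParse("inst/samples/XMCDA-2.0.0.xsd")
# Find the complexType node for each UnionDefinition; "xs" is the namespace prefix passed to getNodeSet(), not to sprintf()
nodes = sapply(names(x)[types == "UnionDefinition"], function(x) getNodeSet(doc, sprintf("//xs:complexType[@name='%s']", x), "xs"))
# Names of the child nodes of each complexType
sapply(nodes, names)
What is the count for each of these?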
sapply(x[types == "UnionDefinition"], function(x) x@slotTypes[[1]]@count)
Both message and rankedLabel are represented as ArrayClassDefinition objects. message has an all and an attributeGroup. rankedLabel has just an all. Since each child of an all can appear at most once, in any order, this maps directly to a regular ClassDefinition, with possibly omitted values for some slots.
preferenceDirection, alternativeType, valuationType and status are all of type character. These are enumerated string constants, e.g. active and inactive for status; standard and bipolar for valuationType. These are restrictions of xs:string.
The definition does need to include the possible values, counts, etc. So we need a StringEnum type.
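One way such a StringEnum type might look is sketched below with S4 classes from the methods package. This is an assumption about a possible design, not the package's actual definition: the slot name allowed and the validity check are hypothetical, and the status class simply reuses the active/inactive values mentioned above.

```r
library(methods)

# Hypothetical sketch: a string that records the values its schema type permits.
setClass("StringEnum",
         contains = "character",
         representation(allowed = "character"),
         validity = function(object) {
             bad = setdiff(object@.Data, object@allowed)
             if (length(bad))
                 paste("invalid value(s):", paste(bad, collapse = ", "))
             else
                 TRUE
         })

# One subclass per enumerated simpleType, with the permitted values fixed in the prototype.
setClass("status", contains = "StringEnum",
         prototype = prototype(allowed = c("active", "inactive")))

s = new("status", "active")

# A value outside the enumeration fails validation when the object is created.
ok = tryCatch(is(new("status", "retired"), "status"), error = function(e) FALSE)
```

Keeping the permitted values in the class prototype means each instance carries its enumeration with it, so the check needs no access to the schema at run time.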
There is but one of these: projectReference. This is a complexType and has a single node which is a complexContent.
<xs:complexContent>
  <xs:extension base='xmcda:description'>
    <xs:attributeGroup ref="xmcda:defaultAttributes"/>
  </xs:extension>
</xs:complexContent>
What does this actually mean in terms of what can appear? Where is the base xmcda:description defined? It appears to just add the attributes to the xmcda:description element.
From PMML:
<xs:element name="MatCell">
  <xs:complexType>
    <xs:simpleContent>
      <xs:extension base="xs:string">
        <xs:attribute name="row" type="INT-NUMBER" use="required" />
        <xs:attribute name="col" type="INT-NUMBER" use="required" />
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
</xs:element>
So this means we have a MatCell element with string content and 2 attributes - row and col.
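A MatCell of this shape could be represented in R roughly as follows. This is only a sketch under stated assumptions: the class definition and the makeMatCell helper are hypothetical, and INT-NUMBER is assumed to map to an R integer.

```r
library(methods)

# Hypothetical S4 class for MatCell: string content plus two required integer attributes.
setClass("MatCell",
         contains = "character",
         representation(row = "integer", col = "integer"),
         validity = function(object) {
             if (length(object@row) != 1 || length(object@col) != 1)
                 "row and col are required and must each be a single integer"
             else
                 TRUE
         })

# XML attribute values arrive as strings, so convert them on construction.
makeMatCell = function(content, attrs)
    new("MatCell", content,
        row = as.integer(attrs[["row"]]),
        col = as.integer(attrs[["col"]]))

cell = makeMatCell("0.5", c(row = "1", col = "2"))
```

Extending character keeps the element's text as the object's data, while the attributes live in typed slots.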
ParameterField has no simpleContent and just adds attributes.
<xs:element name="ParameterField">
  <xs:complexType>
    <xs:attribute name="name" type="xs:string" use="required" />
    <xs:attribute name="optype" type="OPTYPE" />
    <xs:attribute name="dataType" type="DATATYPE" />
  </xs:complexType>
</xs:element>
Level just adds attributes. Trend adds attributes but restricts the type to an NMTOKEN with an enumerated value.
ClusteringModel is different.
<xs:element name="ClusteringModel">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="MiningSchema"/>
      <xs:element ref="Output" minOccurs="0" />
      <xs:element ref="ModelStats" minOccurs="0"/>
      <xs:element ref="ModelExplanation" minOccurs="0"/>
      <xs:element ref="LocalTransformations" minOccurs="0" />
      <xs:element ref="ComparisonMeasure"/>
      <xs:element ref="ClusteringField" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="MissingValueWeights" minOccurs="0"/>
      <xs:element ref="Cluster" maxOccurs="unbounded"/>
      <xs:element ref="ModelVerification" minOccurs="0"/>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="modelName" type="xs:string" use="optional"/>
    <xs:attribute name="functionName" type="MINING-FUNCTION" use="required" />
    <xs:attribute name="algorithmName" type="xs:string" use="optional"/>
    <xs:attribute name="modelClass" use="required">
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:enumeration value="centerBased"/>
          <xs:enumeration value="distributionBased"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:attribute>
    <xs:attribute name="numberOfClusters" type="INT-NUMBER" use="required"/>
  </xs:complexType>
</xs:element>