This article defines exactly what is meant by a Sample and a Specimen, how they differ in meaning within DIGGSML, and how they relate to each other. It also explains some of the methods of transferring Sample and Specimen data that were considered for DIGGSML, and details the method finally chosen and the reasons behind that choice.
I will also include some best practices for companies implementing DIGGSML in the real world to help make the use of DIGGSML as productive as possible.
What exactly are Samples and Specimens? How do they differ? How are they related to each other?
At its most basic level, in DIGGSML a Sample is defined as:
This Sample may later be turned into one or more Specimens. Each of these Specimens is defined as:
Applied to the real world, the Sample is what you take from the ground; the Specimen is the piece of the Sample (or Samples, in the case of an amalgamated Specimen) that the laboratory uses to conduct a single specific test.
Over the lifetime of DIGGSML the problem of how to locate Sample and Specimen objects within the file has been tackled in many ways, each of which has in turn been seen to show problems in one way or another. After a prolonged consultation, the most recent method has been produced, which seems to solve all of the referencing problems without any of the drawbacks of the previous implementations.
The proposal in DIGGSML v0.8 is to locate Specimens in the project directly and keep Samples as children of their associated Hole, like so:
<project>
  <subsurface>
    <Hole>
      <!-- hole information here -->
      <samples>
        <Sample>
          <id>1296548</id>
          <!-- sample information here-->
        </Sample>
      </samples>
    </Hole>
  </subsurface>
</project>
<specimens>
  <specimen>
    <!-- specimen information here-->
    <origin>
      <Ref xlink:href="//*[./id='1296548']" amount="10%" />
      <Ref xlink:href="//*[./id='1296549']" amount="90%" />
    </origin>
    <laboratoryTesting>
      <!-- test data here -->
    </laboratoryTesting>
  </specimen>
</specimens>
This is an important change to note, as it goes against DIGGSML's general practice of following the "down the hierarchy" model: the Specimens are placed directly inside the Project, independently of their associated Samples.
This change came about to aid DIGGSML's integration between the various links in the data chain. As Roger pointed out in his earlier article (The Importance of AGS Key Fields for Sample Data), laboratories often prefer to use only one field to identify a Sample. This method allows the Specimen results reported by laboratories to be reported on their own, without the need for the whole Sample information, just its identifier. This does not fit the conventional hierarchy method, since there would be no link from Sample to Specimen unless the laboratory were reporting Sample data (which may not even be known when the Sample is sent to the laboratory!).
Making this change allows laboratories to create Specimen objects including their test results without having to report anything other than the identifier of the Sample that Specimen was created from, as illustrated in the following example.
<Diggs>
  <specimens>
    <specimen>
      <!-- specimen information here-->
      <origin>
        <Ref xlink:href="//*[./id='1296548']" amount="10%" />
        <Ref xlink:href="//*[./id='1296549']" amount="90%" />
      </origin>
      <laboratoryTesting>
        <!-- test data here -->
      </laboratoryTesting>
    </specimen>
  </specimens>
</Diggs>
It was felt that this was the most robust and simple way to solve the sample referencing (and orphan or phantom generation) problem, as far as a transfer format can. The previous implementations all had flaws, some obvious and others less so.
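As a rough illustration of how a receiving system might read a laboratory-only file like the one shown above, the following Python sketch collects the Sample identifiers referenced by each Specimen, together with the stated proportions. It rests on several assumptions that are not part of the original examples: the xlink prefix is bound to the standard XLink namespace (the fragment above does not declare it), every amount is a percentage string, and the function name specimen_origins is purely hypothetical.

```python
import re
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"

# Laboratory-only file modelled on the example in the article. The
# xmlns:xlink declaration is an assumption added so the fragment parses.
LAB_FILE = """
<Diggs xmlns:xlink="http://www.w3.org/1999/xlink">
  <specimens>
    <specimen>
      <origin>
        <Ref xlink:href="//*[./id='1296548']" amount="10%" />
        <Ref xlink:href="//*[./id='1296549']" amount="90%" />
      </origin>
      <laboratoryTesting />
    </specimen>
  </specimens>
</Diggs>
"""

# Pulls the quoted identifier out of an href such as //*[./id='1296548'].
ID_IN_HREF = re.compile(r"id='([^']+)'")


def specimen_origins(xml_text):
    """Return one {sample_id: percent} dict per specimen in the file."""
    root = ET.fromstring(xml_text)
    result = []
    for specimen in root.iter("specimen"):
        origins = {}
        for ref in specimen.findall("./origin/Ref"):
            href = ref.get(f"{{{XLINK_NS}}}href", "")
            match = ID_IN_HREF.search(href)
            if match is None:
                continue  # not an id-style reference; skipped in this sketch
            amount = ref.get("amount", "0").rstrip("%")
            origins[match.group(1)] = float(amount)
        result.append(origins)
    return result


origins = specimen_origins(LAB_FILE)
print(origins)  # [{'1296548': 10.0, '1296549': 90.0}]
```

Only the identifiers are needed to interpret the laboratory file; no Hole or Sample detail has to be present, which is the point of the v0.8 arrangement.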
At the time I first saw the DIGGSML schemas (v0.5), the Sample element sat inside the Hole element and Specimens sat inside Sample. Since one Hole contained many Samples and one Sample was split into many Specimens, this was acceptable.
<project>
  <subsurface>
    <Hole>
      <!-- hole information here -->
      <samples>
        <Sample>
          <!-- sample information here-->
          <specimens>
            <Specimen>
              <!-- specimen information here-->
              <laboratoryTesting>
                <!-- lab test results here -->
              </laboratoryTesting>
            </Specimen>
          </specimens>
        </Sample>
      </samples>
    </Hole>
  </subsurface>
</project>
One problem with the v0.5 implementation was that there was no way of illustrating that a Specimen was made up from more than one Sample.
The DIGGSML v0.6 implementation built on the DIGGSML v0.5 implementation, adding an amalgamationOf element that enables you to link more than one Sample together into an amalgamated sample.
<project>
  <subsurface>
    <Hole>
      <!-- hole information here -->
      <samples>
        <Sample>
          <!-- sample information here-->
          <specimens>
            <Specimen>
              <!-- specimen information here-->
              <amalgamationOf>
                <Ref xlink:href="#1296548" quantity="80%" />
                <Ref xlink:href="#1296549" quantity="20%" />
              </amalgamationOf>
              <laboratoryTesting>
                <!-- lab test results here -->
              </laboratoryTesting>
            </Specimen>
          </specimens>
        </Sample>
      </samples>
    </Hole>
  </subsurface>
</project>
This implementation still has problems. When a laboratory performs testing on a Sample, it creates a number of Specimens, but in order to report that testing information (and the associated Specimens) back to the source it needs to keep track of the Holes and all their associated data. There is no facility to report just the Samples.
The current DIGGSML v0.7 schemas build on the v0.6 implementation, but also allow a Project to contain Samples without their Hole information. This enables laboratories to report testing without having to know the entire structure of the exploratory hole. The Hole then contains references to its Samples, as required to maintain the structural integrity of the file.
<project>
  <subsurface>
    <Hole>
      <!-- hole information here -->
      <samples>
        <Ref xlink:href="sample1" />
      </samples>
    </Hole>
  </subsurface>
  <samples>
    <Sample>
      <!-- sample information here-->
      <specimens>
        <Specimen>
          <!-- specimen information here-->
          <amalgamationOf>
            <Ref xlink:href="#1296548" quantity="80%" />
            <Ref xlink:href="#1296549" quantity="20%" />
          </amalgamationOf>
          <laboratoryTesting>
            <!-- lab test results here -->
          </laboratoryTesting>
        </Specimen>
      </specimens>
    </Sample>
  </samples>
</project>
However, this implementation is not without problems either. If the laboratory is reporting Sample and Specimen data back to the source, it still needs to track the Sample information (type, in the form of the object name, top, etc.) in order to add the Specimens to the Sample itself.
The big problem with nesting Sample inside Hole (or independently at Project) is the assumption that this Sample data is actually known!
There may be situations where the Sample is taken, an identifier is assigned (in the form of a uniquely numbered label) on site and the Sample is sent straight to the laboratory for immediate testing, before the electronic data is even produced. In that case the laboratory cannot report Sample data along with any Specimen data they produce, since that Sample data is not electronically available even to the engineer, let alone the laboratory!
It is this exact situation that causes the common scenario where "Phantom" samples are created.
How best to implement Sample data transfer in DIGGSML depends upon whether you are the sample producer or the sample receiver; each role will be addressed in turn.
It is this identifier that never changes for the Sample's life (in the examples above 1296548 and 1296549 are used to identify two samples). It is this identifier that the Sample receiver must export with the Specimen information after conducting testing.
However, if Sample Producers and Sample Receivers both stick to the best practices outlined above, errors associated with Orphan Samples or Phantom Hole records will be reduced as far as is possible for any data transfer format.
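To make the role of the fixed identifier concrete, here is a minimal Python sketch of the check a Sample producer might run when laboratory results come back: each Specimen is accepted only if every Sample identifier it references is already known. The function name match_specimens, the specimen labels and the identifier 1296550 are hypothetical and chosen for illustration only; the sketch assumes the identifiers have already been extracted from the file.

```python
def match_specimens(known_sample_ids, specimen_origin_ids):
    """Split specimens into those whose Samples are all known and the rest.

    known_sample_ids: identifiers held by the Sample producer.
    specimen_origin_ids: {specimen_label: [sample_id, ...]} from the lab.
    Returns (matched, unmatched); unmatched maps each specimen label to
    the identifiers that could not be found.
    """
    known = set(known_sample_ids)
    matched, unmatched = {}, {}
    for label, sample_ids in specimen_origin_ids.items():
        missing = [s for s in sample_ids if s not in known]
        if missing:
            unmatched[label] = missing
        else:
            matched[label] = list(sample_ids)
    return matched, unmatched


producer_ids = ["1296548", "1296549"]
lab_results = {
    "specimen-A": ["1296548", "1296549"],
    "specimen-B": ["1296550"],  # hypothetical id unknown to the producer
}
matched, unmatched = match_specimens(producer_ids, lab_results)
print(matched)    # {'specimen-A': ['1296548', '1296549']}
print(unmatched)  # {'specimen-B': ['1296550']}
```

Flagging unmatched identifiers for review, rather than silently creating placeholder records for them, is one way an importing system could avoid generating the orphan and phantom records discussed in this article.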