The introduction of the Monitoring groups in the AGS data interchange format was a big step forward. Instead of creating a separate data group for each type of time-related monitoring instrument (piezometers, slope inclinometers, settlement gages, etc.), one set of data groups was created to accommodate all such data. The format can therefore be expanded simply by adding new instrument types to the pick list, without creating new groups or issuing an updated format.
The DIGGS format has currently taken the opposite approach to that of the AGS Monitoring groups. There are multiple sample and hole construction tables, and there are plans to extend this multiple-table idea to other types of data that are currently stored in one table. The tables for each class of data share common fields (roles, specifications, etc.) but differ in their type-specific fields. For example, driven or pushed samples have length and diameter fields, whereas block samples have height, width, and length fields. This structure allows validation rules to be constructed easily and enforced automatically by the schema.
There are a number of problems with this approach. The first is that of perception. The DIGGS format already has a large number of tables. With this approach the number of sample tables went from one to five. Selling the DIGGS format to the user community is difficult enough without the bloat that this methodology will create.
The second problem is that if new sampling methods, hole construction types, etc. are required, the schema will have to change to add new tables. Maintainability is crucial to the design of any data interchange standard, and this methodology could create a maintenance problem. Granted, the currently established tables probably cover all possible types, but there is no guarantee that this is the case.
The third problem is one of non-validating data. We have worked with thousands of clients, and I would say the vast majority do not, for example, record the sample diameter. Is this good database design? No. Should these data be deemed invalid? Also no. Even if recording the diameter were made a requirement for all new projects transmitted with DIGGS, what about legacy data? This is just one example of the kinds of problems that strict enforcement of these types of validation rules can cause with legacy information.
The fourth problem is the added difficulty in performing other validation. For example, is it reasonable that a hole was advanced using an excavation from 0 to 3 meters and then augered from 2 to 8 meters? Having that information in two tables makes validation more difficult.
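The check described above is straightforward once all construction intervals for a hole sit in one collection. The following is a minimal Python sketch, not part of DIGGS or AGS; the `ConstructionInterval` record and its field names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ConstructionInterval:
    # Hypothetical record: one row of a single hole construction table.
    method: str
    top_m: float
    base_m: float


def find_overlaps(
    intervals: List[ConstructionInterval],
) -> List[Tuple[ConstructionInterval, ConstructionInterval]]:
    """Return (earlier, later) pairs where the later interval starts
    above the base of the deepest-reaching interval seen before it."""
    ordered = sorted(intervals, key=lambda i: (i.top_m, i.base_m))
    overlaps = []
    deepest = None
    for current in ordered:
        if deepest is not None and current.top_m < deepest.base_m:
            overlaps.append((deepest, current))
        if deepest is None or current.base_m > deepest.base_m:
            deepest = current
    return overlaps


# The example from the text: excavated 0 to 3 m, then augered 2 to 8 m.
hole = [
    ConstructionInterval("excavation", 0.0, 3.0),
    ConstructionInterval("auger", 2.0, 8.0),
]
overlaps = find_overlaps(hole)
for earlier, later in overlaps:
    print(f"{later.method} starts at {later.top_m} m, "
          f"above the base of {earlier.method} at {earlier.base_m} m")
```

If the excavation and auger intervals lived in separate tables, the same check would first have to gather and merge rows from each table.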
In summary, this approach increases complexity, leads to possible maintenance issues, may prevent the transmittal of some legacy information, and makes some types of validation difficult, with only the one minor advantage of being able to easily set up a few validation rules.
Our proposed alternative is to revert to the structure of one table for one class of data, that is, one sample table, one hole construction table, etc. Each table would contain all the commonly used fields that are currently contained in the multiple tables. For example, pushed samples would use the length and diameter fields and not the width and height fields used by the block samples.
For maintenance, each of these tables would also contain a complex type field that is made up of value triplets: Data type, value, units. This allows easy expansion of the types of information that can be transmitted without changing the schema.
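As a rough illustration of the single-table idea with value triplets, the Python sketch below models one sample record type. It is not the DIGGS schema; every class name, field name, and example value here is a hypothetical stand-in.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ExtraValue:
    # Hypothetical triplet: data type, value, units.
    data_type: str
    value: str
    units: str


@dataclass
class Sample:
    # Hypothetical single sample table: common fields plus all
    # commonly used dimension fields, each optional.
    sample_id: str
    sample_type: str
    length_m: Optional[float] = None
    diameter_m: Optional[float] = None
    width_m: Optional[float] = None
    height_m: Optional[float] = None
    # Open-ended triplets carry anything the fixed fields do not cover.
    extras: List[ExtraValue] = field(default_factory=list)


# A pushed sample uses length and diameter; width and height stay empty.
pushed = Sample("S-1", "pushed", length_m=0.6, diameter_m=0.075)

# A block sample uses height, width, and length.
block = Sample("S-2", "block", length_m=0.3, width_m=0.3, height_m=0.3)

# A new kind of information is added as a triplet, with no schema change.
pushed.extras.append(ExtraValue("wall thickness", "1.6", "mm"))
```

A legacy pushed sample with no recorded diameter would simply leave `diameter_m` empty rather than fail to fit the structure.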
Finally, conditional validation rules can be written in Schematron. Therefore, the same rules that would be used by the multiple table approach could be enforced. This has the disadvantage of requiring code external to the schema, but such code will have to be written in many other places in the format. Also, this code writing does not impact the user. Whether rules are written in Schematron or handled automatically by the schema is irrelevant to the user. Without sacrificing data integrity, the format needs to insulate the end user from unnecessary complexity.
Making the format more complex and less maintainable just to avoid writing some one-time external rules is not justified.
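To show what a conditional rule of this kind looks like, here is a small Python stand-in. It is not Schematron, and the rule table, field names, and sample types are assumptions made for illustration; an actual implementation would express the same condition as Schematron assertions against the XML.

```python
# Hypothetical rule table: which dimension fields each sample type may carry.
ALLOWED_DIMENSIONS = {
    "pushed": {"length", "diameter"},
    "driven": {"length", "diameter"},
    "block": {"length", "width", "height"},
}
DIMENSION_FIELDS = {"length", "diameter", "width", "height"}


def validate_sample(record):
    """Return a list of error messages for one sample record (a dict).

    The rule is conditional on sample_type: a field is reported only if it
    is present but not allowed for that type. Missing fields are not errors,
    so legacy records without a diameter still pass.
    """
    errors = []
    sample_type = record.get("sample_type")
    allowed = ALLOWED_DIMENSIONS.get(sample_type)
    if allowed is None:
        return errors  # unknown type: no conditional rule applies
    for name in sorted(DIMENSION_FIELDS - allowed):
        if record.get(name) is not None:
            errors.append(f"{sample_type} sample should not carry '{name}'")
    return errors


legacy_pushed = {"sample_type": "pushed", "length": 0.6}
bad_block = {"sample_type": "block", "length": 0.3, "width": 0.3,
             "height": 0.3, "diameter": 0.1}

legacy_errors = validate_sample(legacy_pushed)
block_errors = validate_sample(bad_block)
```

Supporting a new sample type in this sketch means adding one entry to the rule table; the record structure itself does not change.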