I wouldn't call myself a DBA, but I have done enough late night
babysitting of production databases to have an idea how my design and
architectural decisions impact the database the least scalable of any
part of the architecture. So in my opinion XML - as a bloated,
misunderstood, misinterpreted, over-hyped and often badly implemented
technology should be nowhere near the database. End of discussion no
budging. Holding such a view won't get me fired and will win the
welcome support of the production DBAs who could always point fingers
at my bad SQL syntax and data structures as a reason for poor
I confess that I have used XML in the database in my latest project.
There, I have said it and in the company of expert DBAs who have had enough of people like me putting all sorts of crap into the database. Amongst such company I had better come up with an explanation and fast. Allow me to do so.
There are two situations where XML in SQL 2005 comes in quite handy. They are when implementing classifications and temporal data designs.
One thing that is missing in SQL databases is a construct to handle
classifications. It is a problem that designers always encounter and the
mechanisms for the physical implementations vary extensively.
What do I mean by a classification? Consider a system that has an order and a customer table, where the key from the customer is put into the order table simple normalization. Now what if your customer can be a person, with a first name, surname and date of birth or it can be an organization, with a name and company registration number? The problem is that it doesn't feel right to create a single table with all of the attributes (breaking some relational model rules in the process) and it is difficult to implement separate [Person] and [Organization] tables.
There are two basic approaches, either a roll-up or a roll-down approach.
In the roll-up approach a single table is created with all of the fields and discriminator or classification attributes to distinguish between the classes. The individual classes can then be implemented as views as follows:
CREATE TABLE Stakeholder(
CREATE VIEW Person
SELECT StakeholderId, Firstname, Surname, DateOfBirth
CREATE VIEW Organization
SELECT StakeholderId, Name, RegistrationNo
In the roll-down approach multiple tables are created with their correct
attributes and the 'leaf' tables are unioned together for an index table as
CREATE TABLE Person(
CREATE TABLE Organization(
CREATE VIEW Stakeholder
SELECT StakeholderId, Firstname+' '+Surname AS Name, CAST(1 AS bit) AS isPerson, CAST(0 AS bit) AS isOrganization
SELECT StakeholderId, Name, CAST(0 AS bit) AS isPerson, CAST(1 AS bit) AS isOrganization
Combinations of the two approaches also exist where some fields are created on
an index table and other fields exist only on the leaf tables. Things get a bit
complicated when implementing complex classification structures. What if some
persons are employees (add an employee number) and what if some employees are
contractors (add contract period)? Not only do complex structures become
difficult to implement but CRUD across index tables and views becomes a pain.
I have implemented such mechanisms on large databases quite successfully - such as a bank that had 14 million customers. But it can become quite complex and I was looking for a simple classification mechanism that would be reasonably easy to implement, not contain excessive tables or attributes and would be able to be extended or replaced. Enter the XML temptress
I create a simple index table with a column to store all the class-specific attributes as follows:
CREATE TABLE Stakeholder(
StakeholderId int IDENTITY(1,1),
Since my application data access layer uses only sprocs to access the database, some insert and update sprocs for the person and the organization need to be written. For example with person:
CREATE PROCEDURE InsertPerson
DECLARE @Extended xml
IF (@DateOfBirth IS NOT NULL)
DECLARE @Dob nvarchar(10)
INSERT INTO Stakeholder(Name,ClassAttributes,isPerson)
Executing the above sproc like this:
EXEC InsertPerson 'Joe', 'Soap', '1 Jan 1980'
Results in a record with the basic information and an XML structure that neatly contains all of the other bits of information and would store the following XML:
Notice the use of the XQuery insert that only adds the attribute if it is not
null, resulting in neater looking XML data.
A similar sproc for organization would store XML something like this:
My individual [Person] and [Organization] tables are implemented as views like this:
CREATE VIEW Person
SELECT StakeholderId, ClassAttributes.value('(/person/firstName)', 'varchar(50)') AS FirstName,
ClassAttributes.value('(/person/surname)', 'varchar(100)') AS Surname,
ClassAttributes.value('(/person/dateOfBirth)', 'datetime') AS DateOfBirth,
ClassAttributes.value('(/person/title)', 'varchar(10)') AS Title
WHERE (isPerson = 1)
CREATE VIEW Organization
SELECT StakeholderId, Name,
ClassAttributes.value('(/organization/organizationTypeId)', 'int') AS OrganizationTypeId,
ClassAttributes.value('(/organization/registrationNo)', 'varchar(20)') AS RegistrationNo
WHERE (isOrganization = 1)
The views are an interesting implementation in that from a relational model point of view they are valid relations and the syntax to use them will be standard SQL. Consider the query where we want to search on the name of a stakeholder, but with people we need to query the surname and on organizations we need to query the name. The following query, even though it has XML innards is a perfectly valid and understandable query.
SELECT StakeholderId, Name
WHERE Name LIKE 'S%'
SELECT StakeholderId, Surname
WHERE Surname LIKE 'S%'
There are other ways to query the XML directly using XQuery but I want to
stay as close to ANSI 92 syntax as possible.
Even though we are storing non-relational data in our SQL database we don't really break that many relational rules. Technically the relational model states that the storage of data is independent of the model so, the argument that the use of views is non-relational is invalid (sort of) - if [Person] and [Organization] are implemented as views, which are valid relations, then we are not breaking any rules.
By now, any real DBA would be thinking This guy is frikkin insane, it will perform like a dog! This is both a correct and incorrect thought no, I am not frikkin insane and yes, performance can be an issue. I would not recommend this approach if you have a structure with millions of records, which would be the case with the stakeholder structure in large enterprises. But what about structures with fewer rows, even in the tens or hundreds of thousands? Maybe a product classification or retail outlet classification would perform adequately? You may also notice in this example that Name is redundant, since it is contained in the XML anyway this has been done on purpose for performance reasons since most of the queries only want the name which is a common attribute, so there is no point in mucking about with the XML.
Another key aspect with regard to performance is understanding the interfaces. In my particular implementation, if I wanted to create proper fields for the attributes there would be no far-reaching impact. The interfaces to the sprocs wouldn't change and the fact that I may have replaced the Person view with a table would make no difference to existing SQL.
SELECT o.OrderId, o.PartId, o.Quantity, o.Cost, p.Name, p.StockAvailable
FROM OrderItem o INNER JOIN Part p ON p.PartId=o.PartId
As with most requirements it is not specific enough and should read "For
ordered items, display the part name and the stock on hand when the order was
The problem with the above query is that part information, particularly the available stock, changes continuously and the query doesn't take this into account. Many similar problems exist with data that is date dependant (temporal data). Again, there are many ways to resolve this problem you could have [Part] movement records and join based on the order date to the movement tables, or you could denormalize your structures and create an [OrderItem] table with the [Part].[Name] and [Part].[StockAvailable] values in fields on [OrderItem]. The most common approach by far is to do neither and land up with all sorts of issues relating to the temporality of the data which puts the integrity of the entire database in question by the users.
Generally, unless I need to create a specific structure to handle temporality where it may be important I tend to take the easy route and denormalize my structures. The problem is figuring out how many of the fields from [Part] should be stored on [OrderItem] to handle all the various combinations of queries that the users may think up in future. Also, it looks ugly when [Part] fields are reproduced on the [OrderItem] table apart from breaking a few relational model rules along the way.
In a recent system there was a need to store some 'snapshot' temporal data, but since other parts of the system were not specified, never mind developed, we were unsure which fields to store the solution was to store most of them in an XML field and worry about it later.
So in the above example I would create a table something like this:
CREATE TABLE OrderItem(
With sprocs to handle the inserts and updates,
CREATE PROCEDURE InsertOrderItem
DECLARE @PartStore xml
INSERT INTO OrderItem(OrderId,PartId,Quantity,Cost,PartStore)
This would create a store of temporal data something like this
When querying data I simply look in the XML for the part data that I need,
SELECT OrderId, PartId, Quantity, Cost,
PartStore.value('(/part/name)', 'varchar(50)') AS Name,
PartStore.value('(/part/stockAvailable)', 'int') AS StockAvailable
And as per the classification examples above, I can wrap the queries into nicely
named views if I prefer.
The plan is that over time, provided my interfaces remain the same, I can add Part attributes directly to the OrderItem table if needed. This can be done on a production database and I would just need to alter the table,
ALTER TABLE OrderItem ADD PartName varchar(50)
SET PartName=PartStore.value('(/part/name)', 'varchar(50)')
and change any sprocs or views that reference the table all very backwards compatible.
I would still be in line violently opposing persisting objects as XML in the database as object oriented bigots would be tempted to do, after all, they say that the database is just a persistence mechanism for their objects. I can picture object orientation bigots advocating a database of one table with two attributes, ObjectId (GUID) and ObjectData (XML) such an image is concerning.
However, I hope that I have demonstrated that if carefully thought through that
XML in the database can be useful and elegant provided that it is done within
the context of the overall architecture. The biggest issue with XML in the
database is understanding when performance becomes the overriding factor as to
where and how data is stored after all it is mostly the performance of
databases that your DBA has to deal with anyway.
Simon Munro is currently pursuing the Microsoft Certified Architect (MCA) certification and maintains a blog at http://www.delphi.co.za/ that covers many interesting ideas on using Microsoft technologies.
来自 “ ITPUB博客 ” ，链接：http://blog.itpub.net/1697933/viewspace-906947/，如需转载，请注明出处，否则将追究法律责任。