In a recent post, Joe Shepley examined the conflict between structured and unstructured approaches to data management. This is something of a recurring theme for me, primarily because I hate the term "unstructured data." There are a number of reasons why, but it boils down to the fact that there is no real line between the two. There are practical distinctions, but more often than not the terms are used by one group or another to limit its own scope and toss the problems it doesn’t want to deal with to another box on the org chart.
No content is unstructured. Some content simply lacks the uniformity of rectilinear (boxlike) structures. Design decisions on how best to manage content should be made on pragmatic grounds, but too often they are based on habit. All content management systems have access to, and manage, structured data about and within content. It is simply more efficient to treat some data as too complex for the system to manage, so you entrust that unit of work to a consuming application or a human mind.
I much prefer the terms structured and complex, as I think this pair better represents the technical challenge.
One might argue that a raster image (TIFF) of a document, as a simple one-dimensional representation of a higher form, lacks structure. Yet the file formats for this and similar data have a great deal of structure, as do the rendered images themselves. It is true that the document loses the ability to communicate its own purpose to a machine by simple means once its data fields become ink on a page, and that ink is transformed into pixels in a really long string embedded in a file.
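To make the point about file formats concrete, here is a minimal sketch, using only Python's standard `struct` module, of the structure that sits inside even a "flat" TIFF. A TIFF file opens with a fixed 8-byte header (a byte-order mark, the magic number 42, and the offset of the first image file directory), and that directory is a list of 12-byte tagged entries. The helper names are hypothetical, and the sample bytes are a stripped-down header with just two tags (ImageWidth, 256, and ImageLength, 257), not a complete viewable image.

```python
import struct


def build_minimal_tiff_header(width: int, height: int) -> bytes:
    """Hypothetical helper: a little-endian TIFF header plus a two-entry IFD (no pixel data)."""
    header = b"II" + struct.pack("<HI", 42, 8)  # byte order, magic 42, first IFD at offset 8
    ifd = struct.pack("<H", 2)  # number of directory entries
    ifd += struct.pack("<HHIH2x", 256, 3, 1, width)  # ImageWidth, SHORT, count 1, padded value
    ifd += struct.pack("<HHII", 257, 4, 1, height)  # ImageLength, LONG, count 1
    ifd += struct.pack("<I", 0)  # offset of next IFD: 0 means none
    return header + ifd


def parse_tiff_header(data: bytes) -> dict:
    """Read the TIFF header and the tagged entries of the first image file directory."""
    byte_order = data[:2]
    if byte_order == b"II":
        endian = "<"  # little-endian ("Intel" order)
    elif byte_order == b"MM":
        endian = ">"  # big-endian ("Motorola" order)
    else:
        raise ValueError("not a TIFF file: bad byte-order mark")

    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a TIFF file: bad magic number")

    (entry_count,) = struct.unpack(endian + "H", data[ifd_offset:ifd_offset + 2])
    tags = {}
    for i in range(entry_count):
        start = ifd_offset + 2 + 12 * i  # each directory entry is 12 bytes
        tag, field_type, count = struct.unpack(endian + "HHI", data[start:start + 8])
        raw = data[start + 8:start + 12]
        if field_type == 3 and count == 1:
            # A single SHORT is stored left-justified in the 4-byte value field.
            (value,) = struct.unpack(endian + "H", raw[:2])
        else:
            # Simplification: read the field as a LONG value or as an offset to data elsewhere.
            (value,) = struct.unpack(endian + "I", raw)
        tags[tag] = (field_type, count, value)

    return {"endian": endian, "ifd_offset": ifd_offset, "tags": tags}


if __name__ == "__main__":
    info = parse_tiff_header(build_minimal_tiff_header(100, 50))
    print(info["tags"])
```

Even in this pared-down form, a machine can recover the image's dimensions from named, typed fields before it ever looks at a pixel.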
Content often has a highly complex structure that is beyond the capabilities of mere databases. As a term, however, "unstructured data" is a myth, and content is anything but simple. Believing this myth can lead us to ignore the complexity of what we call content, and this denial of structure is often the root cause of the difficulties we face in implementing systems to manage it.