I’ve been in a lot of meetings recently where I’ve found myself in a conundrum. As you’ve seen from a lot of my posts, I proclaim that content management is an easy thing to understand. I really do believe that it’s true. But then I’m confronted with developers who want to build their own content management functionality. (You won’t see me calling it a “platform”.) What’s up with that? Content management is EASY to understand, but not that easy to build.
Let me explain it this way. If you wanted to travel from the East Coast to the Pacific Ocean, you have several ways to do it. One of those is following in the footsteps of Lewis and Clark and blazing your own trail. But is it the first choice? Why not follow an existing path?
Surveying the Landscape
We’re in an interesting position today. Content management, honestly, at its root is simple management of an index of files, with metadata and pointers to those files’ locations. Plain and simple, that is it.
That’s what I built in the ’90s to manage the documents associated with processing loans for used cars for banks. I had a simple dBase (later FoxBase) application that maintained a record on each loan and then had a file reference number associated with it. It worked well too, that is until we started running loans from multiple locations. Well yes, my problem was based on physical realities, but the issues are the same. What looks simple in prototypes or at small scale becomes complex at the enterprise.
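To make that definition concrete, here is a minimal sketch of such an index. Every name in it (IndexEntry, FileIndex, the sample path and metadata fields) is made up for illustration and is not taken from any real product. It only shows the shape of the idea: a record per file, some metadata, and a pointer to where the file lives.

```python
from dataclasses import dataclass, field


@dataclass
class IndexEntry:
    """One record in the index: an id, a pointer to the file, and its metadata."""
    doc_id: str
    location: str                      # pointer to where the file actually lives
    metadata: dict = field(default_factory=dict)


class FileIndex:
    """Hypothetical in-memory index; a real system would persist this."""

    def __init__(self):
        self._entries = {}

    def add(self, doc_id, location, **metadata):
        # Register a file by id, recording its location and metadata.
        self._entries[doc_id] = IndexEntry(doc_id, location, dict(metadata))

    def locate(self, doc_id):
        # Follow the pointer from the index record to the file location.
        return self._entries[doc_id].location

    def find(self, **criteria):
        # Return ids of all entries whose metadata matches every criterion.
        return [
            entry.doc_id
            for entry in self._entries.values()
            if all(entry.metadata.get(k) == v for k, v in criteria.items())
        ]


index = FileIndex()
index.add("loan-0001", "/archive/loans/0001.tif", borrower="Smith", branch="Main")
index.add("loan-0002", "/archive/loans/0002.tif", borrower="Jones", branch="Main")
print(index.locate("loan-0001"))
print(index.find(branch="Main"))
```

At this size it really is easy. Nothing here deals with multiple locations, concurrent writers, or millions of entries, which is where the trouble starts.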
I’ve been told, “in its simplest form an index of files is easy.” I’ve heard, “you have two choices: index and blobs.” And of course my favorite, “there really is no value in existing platforms.” Oh really? OK, I’ll agree, but not at the scale I’m talking about and, if you’re reading this blog, the scale that you’re interested in. We’re talking about hundred plus documents, are we not?
Most developers rarely look at what they see as extremes. You see, to many developers a thousand documents is “a lot.” Those of us who have been in ECM know that it’s just the tip of the iceberg. Just do the math and you’ll often find that what you think are real numbers are really LOW estimates.
Blazing a Trail
Some looking down the path of building ECM have already recognized the fact that maintaining relationships between content and metadata is complex. Even Oracle recognized this.
Simply maintaining relationships between content and metadata can be complex. Changing metadata and content files may require two separate indexes, one for metadata and one for files. (I’m not giving up the recipe to the secret sauce.) These relationships can be complex to handle.
Even if you ignore this, you run into the issue of performance and what can be done with the number of files in a single directory. In some cases you even run into issues with the number of files in a system. If you think about a company with 40,000 employees where each has 25 documents, that’s 1,000,000 objects. Oracle thought they resolved that issue with blobs.
Blobs are a great resolution to managing files. Just simply add them as rows in a table. But while an average file size may have been 120k a while back, these days it’s closer to 500k. That means that, using the numbers above, 500 gigs of a single database file is now associated with binary file data. Any Oracle DBA willing to step up and state what happens in an RDBMS that big?
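If you want to do the math yourself, the numbers above work out as in the short sketch below. The function name and the use of decimal units (1 GB = 1,000,000 KB) are my own assumptions for illustration; the inputs are the figures quoted above.

```python
def estimate_repository(employees, docs_per_employee, avg_file_kb):
    """Return (object count, total size in GB) for a simple sizing estimate."""
    objects = employees * docs_per_employee
    # Decimal units assumed: 1 GB = 1,000,000 KB.
    total_gb = objects * avg_file_kb / 1_000_000
    return objects, total_gb


# Figures from the text: 40,000 employees, 25 documents each.
for avg_kb in (120, 500):
    objects, total_gb = estimate_repository(40_000, 25, avg_kb)
    print(f"{objects:,} objects at {avg_kb} KB each is about {total_gb:,.0f} GB")
```

Swap in your own headcount and document counts and you will likely find that 25 documents per employee is a low estimate.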
Follow the Maintained Trail
Today there is no reason not to build on an existing platform. We were in a similar situation a few years back with ODMA, but few followed it, and it still required joining a vendor’s partner program. But now, existing content management platforms are making it easier and easier to build on them by giving access to developer software for free and focusing on standards.
Most ECM vendors have recognized that their growth is tied to other software vendors building their solutions on top of them. Content management vendors recognize that they are a better fit as a platform. They are now looking for other software vendors to build the last mile to the end user. Alfresco, Documentum, and Oracle are leading the way.
Alfresco has become the open source standard. Being open source, their code is made available to all. Instead of writing code from scratch, why not simply “borrow” the Alfresco code line? As they say, good programmers don’t create, they borrow. Add to this a large development community continuing to build new functionality and there are even more reasons to consider them.
Then there’s Documentum, a solid vendor in the space with a huge customer base. Of course their partner program is expensive, right? Not the case, exactly. Sure, to become a partner there’s a cost, but if you want to try out the code line, it’s FREE. As of July, anyone is able to download a fully functioning developer environment of Documentum for free. Not only does this mean that you have access to the full breadth of technology, but once you’ve developed your solution, you have access to an existing user community. If you’re interested in understanding what’s possible with the platform, Pie has just posted his review.
Oracle also makes its content management platform available for download. This of course was an obvious strategy from Oracle. They changed the relational database market with their focus on ODBC which completely changed the dynamics on how software vendors worked with their RDBMS. This same change is what I believe is underway in the content management market.
But which platform to use: Alfresco, Documentum, Oracle (Content Manager, FileNet, etc.)? If you don’t want to make a choice between vendors, you don’t have to. CMIS, Content Management Interoperability Services, is a universal API set being standardized through OASIS to allow customers to write code once to access several content management platforms. Lee looked at CMIS last year.
Personally, I think this is much more valuable to vendors looking to develop solutions themselves than to the content management customer base. CMIS means that vendors can write code once to the CMIS standard and it will allow CM functionality within any platform that supports the standard. And CMIS has already been shown to work with Alfresco and Documentum, thanks again Pie, at the AIIM conference. In addition to Alfresco and Documentum, both IBM and Microsoft have also stated they will support the standard.
Why Build an ECM System? Because I Can.
I’m sure several of you out there still want to build your own CM functionality. Sure it’s fun to prove you can do it but let’s get down to business.
CM functionality will require two developers, one primary and one backup. That’s two salaries, though my gut says more like three or four developers. You’ll also need to make sure that you can tie the functionality into the rest of your solution. Let’s say that across all the integration points this takes the equivalent of one half of a person over the year. Of course developers can’t provide support. I’ll be a little more forgiving this time and say one and a half resources for support. This means you now have four people supporting ECM functionality. And to make it easy, let’s say one resource (fully loaded) is $100,000. That’s $400,000 each year to support and maintain CM functionality.
Going down the integrated path, you need the support around the integration points. This is the same whether you build it yourself or integrate. But to be less optimistic, let’s say one full resource. Support of the ECM is handled by the vendor, so no resources should be needed. So from a resource perspective we’re talking about $100,000. In my opinion, this will be less in year two as you’re not maintaining the code line. You will have the partnership fees, but what’s interesting is that this is often balanced by access to an existing customer base.
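The two scenarios above can be put side by side in a few lines. This is only a sketch of the back-of-the-envelope numbers from the text. The dictionary keys and function name are made up for illustration, and partnership fees are left out because no figure is given for them.

```python
COST_PER_RESOURCE = 100_000  # fully loaded annual cost per person, from the text

# Headcount (in full-time equivalents) for each path, as estimated above.
build_it_yourself = {"developers": 2.0, "integration": 0.5, "support": 1.5}
build_on_platform = {"integration": 1.0, "support": 0.0}  # vendor handles ECM support


def annual_cost(headcount, cost_per_resource=COST_PER_RESOURCE):
    """Total yearly staffing cost for a given headcount breakdown."""
    return sum(headcount.values()) * cost_per_resource


build_cost = annual_cost(build_it_yourself)
platform_cost = annual_cost(build_on_platform)
print(f"Build it yourself: ${build_cost:,.0f} per year")
print(f"Build on a platform: ${platform_cost:,.0f} per year (before partnership fees)")
print(f"Difference: ${build_cost - platform_cost:,.0f} per year")
```

Change the headcount to the three or four developers my gut suggests and the gap only widens.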
Learning Lessons the Hard Way
Five years ago I got asked to join an interesting project to do, of all things, QA testing. I had no desire to do the work, but the job was interesting on two fronts. For one, I got to work in a major metropolitan area (protecting the name), but the really interesting part was that I was there to help validate a new custom ECM solution. My friend knew I had been in ECM for years and he knew what was being built would not scale. My job? To help prove it.
It turned out that being in QA was the right place. So in between my test scenarios for “this button does that,” I added a few of my own. “What happens if I load a bunch of documents?” (Emulating real world capacity.) “What happens if a lot of people try to write their files at once?” (Like the 5:00 peak.) And as would be expected, most failed. Some were fixed, except one.
I had discovered one hole in the system that I could not even replicate myself. In the end I think it happened because someone else was testing the same area I was and we overlapped. It was a serious problem, and it was ignored. It was only rediscovered one month into production (and I was long gone). Interestingly enough, this failure ended in the ultimate removal of the application. The costs? Three years of development hours and an ultimate delay of the release by one year.
It Just Makes Cents (and Dollars)
Not only will building on an existing platform save you development dollars, but it can also get you to the finish line first. CMS has its lessons to be learned, so why learn them the hard way? Also, it is INEVITABLE that at some point you will be asked to integrate with an existing platform, so why not do it first? You can choose to embed an open source platform, partner with an existing tier one vendor, or follow an existing standard. In the end you’ll be saving time and money.
Still need more reasons? If one of your developers recommended building an RDBMS from scratch, what would your response be?