The Text Analytics / Predictive Analytics World conferences this month were eye-opening. I felt that I was standing on the edge of an abyss, trying hard to look down without falling in. While predictive analytics is huge, especially in social media, text analytics is still looking for its opportunity. At the vendor expo, the most common answer to “Do you support text analytics?” was a finger pointing to a room in a distant corner: “You belong there.” Of the vendors open to conversation, only a few understood the challenge. Most said to put copies of the documents in Hadoop. Information Governance is why that doesn’t work.
A guiding principle of Information Governance is adherence to retention and disposition policies with support for legal holds. This is done by bringing all of the content under the control of a system or systems that support these policies. Too often, content in an organization is stored in disparate locations: network drives, local drives, flash drives, cloud solutions, email, business solutions that store content, and enterprise content management platforms. The more collections that need to be addressed, the more difficult Information Governance becomes.
Maintaining the appropriate destruction schedules while still allowing those schedules to be overridden for open legal action is critical. This difficulty can end up costing an organization tens of thousands of dollars during litigation. Because of this, companies are looking to store content in as few controlled environments as possible.
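To make that rule concrete, here is a minimal Python sketch of a disposition check in which a legal hold overrides the destruction schedule. The `Record` fields and the `eligible_for_disposition` function are hypothetical names invented for this illustration; they do not come from any particular records management product, and real retention schedules are usually tied to events and record classes rather than a simple day count.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Record:
    """Hypothetical record under a retention schedule (illustrative only)."""
    record_id: str
    created: date
    retention_days: int
    on_legal_hold: bool = False


def eligible_for_disposition(record: Record, today: date) -> bool:
    """Return True only if the retention period has passed and no hold applies."""
    # An open legal hold always overrides the destruction schedule.
    if record.on_legal_hold:
        return False
    expires = record.created + timedelta(days=record.retention_days)
    return today >= expires


# Example records: one past retention, one past retention but on hold,
# and one still inside its retention period.
contract = Record("contract-001", date(2010, 1, 1), retention_days=7 * 365)
held_email = Record("email-042", date(2010, 1, 1), retention_days=365, on_legal_hold=True)
recent_memo = Record("memo-007", date(2019, 6, 1), retention_days=3 * 365)

today = date(2020, 1, 1)
for rec in (contract, held_email, recent_memo):
    print(rec.record_id, eligible_for_disposition(rec, today))
```

The point of the sketch is that the check has to run against every copy of a record. A copy sitting outside the system that enforces this rule is a copy the rule never sees.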
Putting a copy of any content into a Hadoop repository would require that Information Governance controls be built in there as well. A copy of a document in a Hadoop repository cannot be ignored in eDiscovery, because eDiscovery does not ignore physical copies or even backup copies. A stray copy of a document can put a company at risk.
Data analytics is the polar opposite of this. Its premise is that all content should be kept forever because there is value in that content. This runs contrary to retention and disposition policies. There are arguments that an organization that “does no wrong” should be willing to keep every document forever. Most organizations look at the potential risk and delete content at their first possible opportunity.
The trend at AIIM16 was that many of the people asking how their company could take advantage of text analytics were records managers. There are plenty of use cases for text analytics for these retired documents. Information Professionals need to ensure that analytics does not contradict their organization’s requirements for Information Governance.
The intersections are increasingly complex… and it is not so simple even to articulate an understandable value proposition.
You’re right that Text Analytics is very complex. Much of the work being done is in its early stages. I think this time, as opposed to 6 or 16 years ago, we will start seeing more come of it. It will start with auto-categorization but very quickly move on from there. As Information Professionals, we need to be ready to make sure that Data Lakes don’t become a source of unnecessary risk to our organizations.
I posted some use cases earlier: