Coming out of the April 2016 AIIM conference, I’ve had a renewed interest in text analytics. There were a few sessions on the topic, and interest was clearly growing. It seemed that Information Professionals were being asked by their managers to find value in their content repositories. Unfortunately, there were no use cases to be found. So I went to Text Analytics World to find some.
First, Text Analytics World (TAW) was co-located with three other conferences: Predictive Analytics World (PAW), PAW Business and PAW Manufacturing. TAW was really more of a summit. Its audience was the smallest, but the thirty attendees actively participated in the presentations. The discussions focused on real uses of text analytics, not just theory. Four vendors even presented: Kaypok, Expert Systems, Revealed Context, and InterSystems. Most interesting of all, I was not the only attendee from the ECM community.
So here’s what I learned:
A Use Case for Predictive Corrective Actions
Text analytics is being used to read the narratives in corrective action and maintenance reports to identify and predict problems. It surfaces terms and trends that might indicate a larger problem. By reviewing a collection of mechanical reports, analysts can identify new problems and send them for review, catching issues that could become far more serious if not discovered early. The approach is being applied to both mechanical and human issues.
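To make “identifying terms and trends” a little more concrete, here is a minimal sketch of one simple way to do it: compare how often each word appears in recent reports versus an earlier baseline, and flag words whose share has jumped. This is not the method any presenter described; the function names, the `min_ratio` and `min_count` thresholds, and the sample reports are all hypothetical, and real systems would add stemming, stop-word removal, and multi-word phrases.

```python
import re
from collections import Counter


def count_terms(reports):
    """Count lowercase word tokens across a list of report narratives."""
    counts = Counter()
    for text in reports:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts


def emerging_terms(baseline_reports, recent_reports, min_ratio=2.0, min_count=2):
    """Flag terms whose share of recent text is at least min_ratio times their
    share of baseline text. Both thresholds are illustrative assumptions."""
    base = count_terms(baseline_reports)
    recent = count_terms(recent_reports)
    base_total = sum(base.values()) or 1
    recent_total = sum(recent.values()) or 1
    flagged = []
    for term, n in recent.items():
        if n < min_count:
            continue  # ignore one-off mentions
        recent_share = n / recent_total
        # Add one to the baseline count so terms never seen before get a finite ratio.
        base_share = (base[term] + 1) / (base_total + 1)
        ratio = recent_share / base_share
        if ratio >= min_ratio:
            flagged.append((term, round(ratio, 2)))
    # Highest jump first; ties keep their first-seen order.
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


# Hypothetical maintenance narratives.
baseline = [
    "pump inspected no issues",
    "valve replaced routine maintenance",
    "pump pressure normal",
]
recent = [
    "hydraulic leak near pump seal",
    "hydraulic leak found again",
    "seal worn hydraulic leak",
]

print(emerging_terms(baseline, recent))  # → [('hydraulic', 2.77), ('leak', 2.77)]
```

Flagged terms like these would then go to a person for review, which matches the workflow described above: the analytics points at candidates, and people decide whether there is a real problem.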
A Use Case for Fraud Detection outside Finance
The use of text analytics for financial fraud detection is well documented, and similar solutions can be developed outside finance, for example to identify fraud in insurance or any other claims process. By comparing documents across an entire collection, similarity measures can flag claims reports that were copied from one another. The same approach can find similar narratives in reports that should be reviewed.
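As a rough illustration of how document similarity can flag copied claims, the sketch below turns each claim narrative into word counts and scores every pair with cosine similarity. It is a minimal sketch under stated assumptions: plain bag-of-words counts rather than the TF-IDF weighting or document embeddings (such as doc2vec) a production system would more likely use, and a hypothetical 0.8 threshold and made-up claim IDs.

```python
import math
import re
from collections import Counter
from itertools import combinations


def bag_of_words(text):
    """Lowercase word counts for one claim narrative."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 to 1.0)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def flag_similar_claims(claims, threshold=0.8):
    """Return (id, id, score) for every pair of claims at or above the
    (hypothetical) threshold. Compares all pairs, so O(n^2) in claim count."""
    vectors = {cid: bag_of_words(text) for cid, text in claims.items()}
    flagged = []
    for (id_a, vec_a), (id_b, vec_b) in combinations(vectors.items(), 2):
        score = cosine_similarity(vec_a, vec_b)
        if score >= threshold:
            flagged.append((id_a, id_b, round(score, 2)))
    return flagged


# Hypothetical claim narratives; C-102 is a lightly edited copy of C-101.
claims = {
    "C-101": "rear ended at stop light neck injury whiplash",
    "C-102": "rear ended at a stop light neck injury and whiplash",
    "C-103": "kitchen pipe burst water damage to floor",
}

print(flag_similar_claims(claims))  # → [('C-101', 'C-102', 0.89)]
```

Comparing every pair is fine for a sketch but grows quickly with collection size; larger collections typically rely on indexing or approximate nearest-neighbor search to keep this tractable.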
A Use Case for Health and Human Services
The most interesting use cases involved using text analytics to identify common causes in healthcare or social services. Text analytics is being used to review case files and identify similar themes across individual situations. For instance, it is used to identify possible sources of students’ disruptive behavior, or common symptoms that might indicate a larger health problem.
Text Analytics is Language Agnostic
Today’s text analytics is less about the meaning of words and more about the relationships between words. Solutions using approaches like word2vec and doc2vec (more on these in a future article) model relationships among words. Because those relationships are learned from the text itself, solutions originally developed with English in mind have been deployed against Chinese content. This led to discussions about using text analytics with jargon, slang, and “gang” language. There’s even work on using text analytics for author identification.
Predictive and Data Analytic Vendors Don’t Get Content
At the joint expo, there were a dozen vendors, and only one could speak to content. (The vendor that understood text analytics was Statistica.) Most vendors proposed one of two ways to handle content: store it in the database or copy it into Hadoop. Every Information Professional knows neither of those is the answer. I should have kept count of which response I gave more often: 1. storing content in a database row slows down the database and inflates its size, or 2. copying documents into another silo creates another digital dumpster that needs to be controlled. My longest debate was with a vendor who insisted that every document be copied into Hadoop. Usually, the vendors just said they didn’t know and sent me to the little room where TAW was located for answers.
I wish I had been able to attend both days of Text Analytics World. There was a lot more to learn, but I also had a lot of information to digest. We’re right at the tipping point for text analytics. Today, only a handful of people really understand its potential. The use cases are emerging. Vendors are right there. Text analytics is not common today, but it will be very soon. For Information Professionals, the race hasn’t begun, but it’s time to put on our running shoes.