Opinions and discussion on content management and document management by two of the biggest guys in the business. *Measured by combined weight

ECM Take Away from Text Analytics World

Disclaimer

The opinions shared here represent those of the contributor themselves and not those of their employers nor that of Big Men On Content as a whole.

Coming out of the April 2016 AIIM conference, I’ve had a renewed interest in text analytics.  There were a few sessions on the topic, and interest in it was clearly growing.  It seemed that Information Professionals were being asked by their managers to find value in their content repositories, and they wanted examples to point to.  Unfortunately, there were no use cases to be found.  So I went to Text Analytics World to find some.

First, Text Analytics World (TAW) was co-located with three other conferences: Predictive Analytics World (PAW), PAW Business, and PAW Manufacturing.  TAW was really more of a summit.  The TAW audience was the smallest, but the thirty attendees participated actively in the presentations.  The discussions were on the real uses of text analytics, not just theory.  There were even four vendors who presented: Kaypok, Expert Systems, Revealed Context, and InterSystems.  What was most interesting was that I was not the only one from the ECM community in attendance.

So here’s what I learned:

A Use Case for Predictive Corrective Actions
Text analytics is being used to read the narratives in corrective action and maintenance reports in order to identify and predict problems.  The analysis looks for terms and trends that might indicate a larger problem.  By reviewing a whole collection of mechanical reports, new problems can be spotted and sent for review before they become far more serious.  This approach is being used with both mechanical and human issues.
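To make the "terms and trends" idea concrete, here is a minimal sketch, not anything shown at the conference. It compares two batches of report narratives and lists the terms that appear in a noticeably larger share of the later batch. The report text, the stopword list, the function names, and the 0.5 threshold are all assumptions invented for illustration.

```python
from collections import Counter
import re

# Small, hand-picked stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "an", "and", "of", "on", "in", "to",
             "was", "is", "at", "no", "after", "during"}


def term_doc_share(reports):
    """Map each term to the fraction of reports that mention it at least once."""
    counts = Counter()
    for text in reports:
        terms = set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS
        counts.update(terms)
    total = len(reports)
    return {term: n / total for term, n in counts.items()} if total else {}


def rising_terms(earlier, later, min_increase=0.5):
    """Return (term, increase) pairs whose report share grew by min_increase or more."""
    before = term_doc_share(earlier)
    after = term_doc_share(later)
    rising = []
    for term, share in after.items():
        increase = share - before.get(term, 0.0)
        if increase >= min_increase:
            rising.append((term, increase))
    # Largest increase first, ties broken alphabetically.
    return sorted(rising, key=lambda item: (-item[1], item[0]))


# Hypothetical maintenance narratives, made up for this example.
earlier_reports = [
    "Replaced worn belt on conveyor",
    "Lubricated bearing during routine inspection",
    "Replaced filter on pump",
]
later_reports = [
    "Pump vibration noted after startup",
    "Replaced seal, pump vibration persists",
    "Vibration at bearing housing, pump shut down",
]

flagged_terms = rising_terms(earlier_reports, later_reports)
for term, increase in flagged_terms:
    print(f"{term}: +{increase:.2f}")
```

A real deployment would need far more text and a better statistical test than a fixed threshold, but the shape is the same: count, compare across time, and send whatever stands out to a person for review.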

A Use Case for Fraud Detection outside Finance
The use of text analytics for financial fraud detection is well documented.  Similar solutions can be developed outside finance, for example to identify fraud in insurance or any other claims process.  By reviewing an entire collection, similarity between documents can reveal copied claims reports.  The same technique can surface similar narratives in reports that should be reviewed.
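One simple way to look for copied narratives is to break each document into overlapping word sequences ("shingles") and compare the sets. The sketch below uses Jaccard similarity for that comparison. It is an illustration under stated assumptions, not the method any presenter described: the claim texts are invented, and the shingle size and the 0.5 threshold are arbitrary choices.

```python
from itertools import combinations
import re


def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similar_pairs(docs, threshold=0.5, k=3):
    """Yield (id_a, id_b, score) for every document pair at or above threshold."""
    sets = {doc_id: shingles(text, k) for doc_id, text in docs.items()}
    for a, b in combinations(sorted(sets), 2):
        score = jaccard(sets[a], sets[b])
        if score >= threshold:
            yield a, b, score


# Hypothetical claim narratives, invented for illustration.
claims = {
    "claim-1": "The vehicle was parked overnight when a tree branch fell and broke the windshield",
    "claim-2": "The vehicle was parked overnight when a tree branch fell and cracked the windshield",
    "claim-3": "Water from a burst pipe damaged the kitchen floor and lower cabinets",
}

flagged = list(similar_pairs(claims))
for a, b, score in flagged:
    print(f"{a} ~ {b}: {score:.2f}")
```

Comparing every pair is fine for a few thousand documents. For a full repository you would want an indexing technique such as MinHash so that not every pair has to be compared directly.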

A Use Case for Health and Human Services
The most interesting use cases were around using text analytics to identify common causes in healthcare or social services.  Text analytics is being used to review case files and find similar themes across individual situations.  For instance, it is used to identify possible sources of a student’s disruptive behavior, or to identify common symptoms that might indicate a larger health problem.

Text Analytics is Language Agnostic
Today’s text analytics are less about the meaning of words and more about the relationships between words.  Solutions using approaches like word2vec and doc2vec (more on these in a future article) learn those relationships from how words are used together.  Because of that, solutions that were developed originally with English in mind have been deployed against Chinese content.  This led to discussions about using text analytics with jargon, slang, and “gang” language.  There’s even work on using text analytics for author identification.
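To show what "relationships of words" can mean without a dictionary, here is a rough sketch. It is not word2vec. It is a much simpler count-based stand-in that builds a vector for each token from the tokens that appear near it, then compares vectors with cosine similarity. The tiny corpus, the function names, and the window size are assumptions made up for this example.

```python
from collections import Counter, defaultdict
import math


def cooccurrence_vectors(sentences, window=2):
    """Build a sparse context-count vector for every token."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical pre-tokenized sentences, invented for illustration.
corpus = [
    ["the", "pump", "failed", "after", "startup"],
    ["the", "motor", "failed", "after", "startup"],
    ["the", "pump", "was", "replaced", "today"],
    ["the", "motor", "was", "replaced", "today"],
    ["the", "invoice", "was", "paid", "late"],
]

vectors = cooccurrence_vectors(corpus)
pump_motor = cosine(vectors["pump"], vectors["motor"])
pump_invoice = cosine(vectors["pump"], vectors["invoice"])
print(f"pump vs motor:   {pump_motor:.2f}")
print(f"pump vs invoice: {pump_invoice:.2f}")
```

Nothing in the code knows what a pump or a motor is. It only sees that the two tokens keep the same company, which is why the same approach can be pointed at another language, or at jargon and slang, as long as the text can be split into tokens.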

Predictive and Data Analytic Vendors Don’t Get Content
At the joint expo, there were a dozen vendors and only one spoke content.  (The vendor that understood text analytics was Statistica.)  The two ways most vendors looked to address content were to store it in the database or copy it into Hadoop.  Every Information Professional knows neither of those is the answer.  I should have kept count of which response I gave more often: (1) content in a database row slows down the database and inflates the file size, or (2) copying documents into another silo creates another digital dumpster that needs to be controlled.  My longest debate was with a vendor that insisted every document be copied into Hadoop.  Usually the vendors just said they didn’t know and sent me to the little room where TAW was located for answers.

Summary
I wish I had been able to attend both days of Text Analytics World.  There was a lot more to learn, though I already had plenty of information to digest.  We’re right at the tipping point for text analytics.  Today, there are a handful of people who really understand its potential.  The use cases are emerging.  Vendors are right there.  Text analytics is not common today, but it will be very soon.  For us as Information Professionals, the race hasn’t begun, but it’s time to put on our running shoes.

Categorised in: aiim16, Enterprise Content Management, TAW, Text Analytics
