Use of Apache Solr, Apache Spark, and OCR for Text Mining and Search to Improve Business Processes and Advanced Analytics

This session shows how to combine OCR (Optical Character Recognition) technology with Apache Solr search and Apache Spark for text mining. A very common scenario is the need to index and search text in image files that were scanned in, for example patient charts or legal documents. We will demonstrate how to use OCR to convert scanned documents (JPG, GIF, TIFF, etc.) into text. The converted text can then be stored in Hive, HBase, or Solr and used for further data analysis and exploration. We will also demonstrate how to use Apache Spark to text mine the data.
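
To make the flow concrete, here is a minimal Python sketch of the pipeline described above: OCR each scanned image, wrap the text in a search-ready document, then count terms. It is only an illustration, not the code from the talk. The function names, the stopword list, and the `id`/`text` field names are hypothetical, and the OCR step assumes the `pytesseract` and Pillow packages are installed. The term count runs in plain Python here as a stand-in for what a Spark job would do across a cluster.

```python
import re
from collections import Counter
from typing import Callable, Dict, Iterable, List

# Hypothetical stopword list, kept tiny for the example.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "with"}


def ocr_with_tesseract(image_path: str) -> str:
    """OCR one scanned image (assumes pytesseract and Pillow are installed)."""
    from PIL import Image
    import pytesseract

    return pytesseract.image_to_string(Image.open(image_path))


def build_docs(
    image_paths: Iterable[str],
    ocr: Callable[[str], str] = ocr_with_tesseract,
) -> List[Dict[str, str]]:
    """Turn each image into a dict that could be sent to a search index.

    The field names "id" and "text" are assumptions; a real Solr schema
    would define its own fields.
    """
    return [{"id": path, "text": ocr(path)} for path in image_paths]


def tokenize(text: str) -> List[str]:
    """Lowercase, split on non-alphanumerics, and drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def term_counts(docs: Iterable[Dict[str, str]]) -> Counter:
    """Count terms across all documents (single-machine stand-in for Spark)."""
    counts: Counter = Counter()
    for doc in docs:
        counts.update(tokenize(doc["text"]))
    return counts
```

Because the OCR function is passed in as a parameter, the rest of the pipeline can be tried without any scanned files by supplying a function that returns canned text.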

https://hadoopsummit.uservoice.com/forums/344964-application-development/suggestions/11663055-use-of-apache-solr-apache-spark-and-ocr-for-text

Alex's Blog

Wednesday, March 16, 2016

I am presenting at NJ Data Science - Meetup/User Group in Princeton - Thursday, March 17, 2016

Thursday, March 17, 2016

How to build Recommendation Engine using Apache Spark, Apache Zeppelin on Hortonworks HDP Platform

Tuesday, February 2, 2016

Please vote for my Hadoop Summit Talk