Tuesday, February 2, 2016

Please vote for my Hadoop Summit Talk

Use of Apache Solr, Apache Spark and OCR for Text Mining and Search capability for business process improvement and Advanced Analytics

This session showcases how to use Optical Character Recognition (OCR) technology together with Apache Solr search and Apache Spark for text mining. A very common scenario is the need to index and search text in image files that were scanned in, for example patient charts and legal documents. In this session we will demonstrate how to use OCR technology to convert scanned documents (JPG, GIF, TIFF, etc.) to text documents. The converted text can then be stored in Hive, HBase, or Solr and used further for data analysis and exploration. We will also demonstrate how to use Apache Spark to text mine the data.
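To give a rough feel for the flow described in the abstract (scanned image, then OCR text, then a searchable document, then term counts in Spark), here is a minimal Python sketch. It is not the code from the talk. It assumes the `pytesseract` and `Pillow` packages for OCR and `pyspark` for the Spark step, and the helper names (`ocr_image`, `to_solr_doc`, `top_terms_spark`) and the Solr field name `text_t` are hypothetical choices made for illustration. The OCR and Spark imports sit inside the functions that need them, so the plain-Python helpers work even without those packages installed.

```python
import re
from collections import Counter


def ocr_image(path):
    """OCR one scanned image (jpg, gif, tiff, ...) into plain text.

    Assumption: pytesseract and Pillow are installed, along with the
    Tesseract binary that pytesseract wraps.
    """
    from PIL import Image
    import pytesseract
    return pytesseract.image_to_string(Image.open(path))


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def to_solr_doc(doc_id, text):
    """Shape OCR output as a dict that could be sent to a Solr index.

    The field name 'text_t' is a hypothetical choice; use whatever
    fields your own Solr schema defines.
    """
    return {"id": doc_id, "text_t": text}


def top_terms_local(texts, n=10):
    """Count the most frequent terms in plain Python (no cluster needed)."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts.most_common(n)


def top_terms_spark(texts, n=10):
    """The same term count expressed as a Spark job.

    Assumption: pyspark is installed and a local or cluster Spark
    runtime is available.
    """
    from pyspark.sql import SparkSession
    spark = SparkSession.builder.appName("ocr-text-mining").getOrCreate()
    rdd = spark.sparkContext.parallelize(texts)
    counts = (rdd.flatMap(tokenize)
                 .map(lambda word: (word, 1))
                 .reduceByKey(lambda a, b: a + b))
    # Sort by descending count and return the top n (word, count) pairs.
    return counts.takeOrdered(n, key=lambda kv: -kv[1])
```

In practice you would call `ocr_image` on each scanned file, send the `to_solr_doc` results to Solr with a client of your choice, and pass the collected text to `top_terms_spark`. The `top_terms_local` helper is there only to check the logic on a small sample before running it on Spark.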