Using Apache Solr, Apache Spark, and OCR for Text Mining and Search to Support Business Process Improvement and Advanced Analytics
This session showcases how to combine OCR (Optical Character Recognition) technology with Apache Solr search and Apache Spark for text mining. A very common scenario is the need to index and search text in image files that were scanned in, for example patient charts, legal documents, etc. In this session we will demonstrate how to use OCR technology to convert scanned documents (jpg, gif, tiff, etc.) to text documents. The converted text can then be stored in Hive, HBase, or Solr and used for further data analysis and exploration. We will also demonstrate how to use Apache Spark to text mine the data.
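To make the index, search, and text-mining steps concrete, here is a minimal sketch in plain Python. It is not the session's implementation and it does not call Solr, Spark, or an OCR engine: the OCR output is hard-coded as made-up sample strings, a small in-memory inverted index stands in for what a search engine such as Solr would provide, and a simple word count stands in for the kind of job one might run in Spark. The file names and the functions `tokenize`, `build_index`, `search`, and `term_counts` are all hypothetical, chosen only for illustration.

```python
import re
from collections import Counter, defaultdict

# Hypothetical OCR output, keyed by scanned file name (made-up sample text).
# In a real pipeline this text would come from an OCR engine.
ocr_text = {
    "chart_001.tiff": "Patient reports chest pain. Chest X-ray ordered.",
    "contract_17.jpg": "The tenant shall pay rent on the first day of each month.",
    "chart_002.gif": "Patient denies pain. Follow-up in two weeks.",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return sorted ids of documents containing every query token."""
    tokens = tokenize(query)
    if not tokens:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in tokens))
    return sorted(hits)


def term_counts(docs):
    """Count token occurrences across all documents (a basic word count)."""
    counts = Counter()
    for text in docs.values():
        counts.update(tokenize(text))
    return counts


index = build_index(ocr_text)
print(search(index, "patient pain"))   # documents mentioning both words
print(term_counts(ocr_text)["chest"])  # how often "chest" appears overall
```

The search uses AND semantics, so a document matches only if it contains every word in the query. A real deployment would rely on the search engine's own analyzers, ranking, and storage rather than this toy index.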