
Wednesday, March 16, 2016

I am presenting at NJ Data Science - Meetup/User Group in Princeton - Thursday, March 17, 2016

I am presenting at the NJ Data Science - Apache Spark meetup group in Princeton on Thursday, March 17, 2016.




How to Build a Recommendation Engine Using Apache Spark and Apache Zeppelin on the Hortonworks HDP Platform


Hope to see you there!


Tuesday, February 2, 2016

Please vote for my Hadoop Summit Talk


Use of Apache SOLR, Apache Spark and OCR for Text Mining and Search Capability for Business Process Improvement and Advanced Analytics

This talk showcases how to use OCR (Optical Character Recognition) technology along with Apache SOLR search and Apache Spark to add text-mining capabilities. A very common scenario is the need to index and search text in scanned image files, for example patient charts or legal documents. In this session we will demonstrate how to use OCR technology to convert scanned documents (JPG, GIF, TIFF, etc.) into text documents. The resulting text data can then be stored in Hive, HBase or SOLR and used for further data analysis and exploration. We will also demonstrate how to use Apache Spark to text mine the data.
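To give a feel for the text-mining step, here is a minimal sketch in plain Python. It is not the code from the talk, and it does not use Spark itself. It assumes OCR has already produced one text string per scanned document; the file names and sample text are made up for illustration. The counting follows the flatMap, map, reduceByKey word-count pattern that Spark applies to the same problem at cluster scale.

```python
import re
from collections import Counter
from functools import reduce

# Hypothetical OCR output: one string of extracted text per scanned document.
ocr_docs = {
    "chart_001.tiff": "Patient reports chest pain. Chest X-ray ordered.",
    "contract_17.jpg": "This agreement is entered into by the parties.",
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def term_counts(docs):
    """Count term frequencies across all documents.

    Mirrors the flatMap -> map -> reduceByKey word-count pattern in plain Python.
    """
    # flatMap: every document becomes a stream of tokens
    tokens = (tok for text in docs.values() for tok in tokenize(text))
    # map: pair each token with a count of 1
    pairs = ((tok, 1) for tok in tokens)

    # reduceByKey: sum the counts for each token
    def add(acc, pair):
        acc[pair[0]] += pair[1]
        return acc

    return reduce(add, pairs, Counter())

counts = term_counts(ocr_docs)
print(counts.most_common(3))
```

In a real pipeline the same counts would feed a SOLR index or a Spark job over text stored in Hive or HBase. The plain-Python version only shows the shape of the computation.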