
Tuesday, December 1, 2015

SparkSql Federated

Spark SQL comes with a nice feature called "JDBC to other Databases", but in practice it is a JDBC federation feature.

The example below uses the HDP 2.3.2 sandbox and the Spark 1.5.1 Technical Preview (https://hortonworks.com/hadoop-tutorial/apache-spark-1-5-1-technical-preview-with-hdp-2-3/):

1- Run the Spark SQL Thrift Server with the MySQL JDBC driver:

[root@sandbox incubator-zeppelin]# /root/dev/spark-1.5.1-bin-hadoop2.6/sbin/start-thriftserver.sh --hiveconf hive.server2.thrift.port=10010 --jars "/usr/share/java/mysql-connector-java.jar"

2- Open beeline and connect to the Spark SQL Thrift Server:

beeline -u "jdbc:hive2://localhost:10010/default" -n admin

3- Create a JDBC federated table pointing to an existing MySQL database, using beeline:

CREATE TABLE mysql_federated_sample
USING org.apache.spark.sql.jdbc
OPTIONS (
  driver "com.mysql.jdbc.Driver",
  url "jdbc:mysql://localhost/hive?user=hive&password=hive",
  dbtable "TBLS"
);
describe mysql_federated_sample;
select * from mysql_federated_sample;
select count(1) from mysql_federated_sample;
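Because mysql_federated_sample behaves like any other Spark SQL table, it can be joined with native tables in the same query, which is where federation pays off. A minimal sketch (the local Hive table my_hive_table and its tbl_id column are illustrative assumptions, not from this setup; TBL_ID and TBL_NAME are real columns of the metastore's TBLS table):

```sql
-- hypothetical: join the MySQL-backed table with a local Hive table
SELECT m.TBL_NAME, h.some_column
FROM mysql_federated_sample m
JOIN my_hive_table h ON h.tbl_id = m.TBL_ID;
```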

The same thing can be done from spark-shell, using Scala and DataFrames.

1- Open spark-shell with the MySQL JDBC driver:

/root/dev/spark-1.5.1-bin-hadoop2.6/bin/spark-shell  --jars "/usr/share/java/mysql-connector-java.jar"

2- Create a DataFrame pointing to a MySQL table:

val jdbcDF = sqlContext.read.format("jdbc").options(
  Map(
    "driver" -> "com.mysql.jdbc.Driver",
    "url" -> "jdbc:mysql://localhost/hive?user=hive&password=hive",
    "dbtable" -> "TBLS"
  )
).load()

jdbcDF.show
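The DataFrame can also be exposed to SQL by registering it as a temporary table. The sketch below continues the spark-shell session above; the temp table name and the filter on TBL_TYPE are illustrative assumptions (TBL_TYPE is a real column of the metastore's TBLS table):

```scala
// Register the JDBC-backed DataFrame so it can be queried with SQL.
// In Spark 1.5.x the API is registerTempTable (renamed createOrReplaceTempView in Spark 2.x).
jdbcDF.registerTempTable("mysql_tbls")

// Simple filters like this are pushed down to MySQL where possible,
// so only matching rows cross the wire.
val managed = sqlContext.sql(
  "SELECT TBL_NAME FROM mysql_tbls WHERE TBL_TYPE = 'MANAGED_TABLE'")
managed.show()
```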

Written with StackEdit.