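Preface
This article was written up by a colleague on our team and is recorded here with minor changes. It describes how to modify the Spark source code so that Spark Thrift Server registers itself with ZooKeeper (ZK), letting clients connect through ZK for load balancing. HA can also be achieved with Kyuubi, which is not covered in detail here.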
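Background
- Spark ThriftServer does not support connections through ZooKeeper, so it cannot do load balancing.
- Solution: modify the Spark ThriftServer source code so that it supports connections through ZooKeeper.
- Versions: Spark 3.1.2; Hive 2.3.7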
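Here is how the load balancing works. In HiveServer2's ZooKeeper service discovery, each server instance registers a znode under a shared namespace, and the client picks one of them. As far as we know, the Hive JDBC driver lists the children of the namespace znode and chooses one at random. The sketch below assumes the patched Spark ThriftServer follows the same scheme and mimics only the selection step. It also assumes HiveServer2-style znode names (`serverUri=<host>:<port>;version=<ver>;sequence=<n>`). `pick_server` is a hypothetical helper, the hosts and ports are made up, and the real driver may read the server address from the znode data rather than its name.

```shell
# Hypothetical helper: choose one registered ThriftServer instance at random,
# the way a ZooKeeper-aware JDBC client spreads connections across servers.
# Each argument is a znode name in the assumed HiveServer2 format
# "serverUri=<host>:<port>;version=<ver>;sequence=<n>".
pick_server() {
  [ "$#" -gt 0 ] || { echo "no ThriftServer registered" >&2; return 1; }
  # Random index in 1..$# (awk seeds from the current time).
  idx=$(awk -v n="$#" 'BEGIN { srand(); print int(rand() * n) + 1 }')
  shift $((idx - 1))
  # Keep the text after "serverUri=" and before the first ";".
  chosen=${1#serverUri=}
  echo "${chosen%%;*}"
}

# Example with two made-up instances; prints host-a:20003 or host-b:20004.
pick_server \
  "serverUri=host-a:20003;version=3.1.2;sequence=0000000000" \
  "serverUri=host-b:20004;version=3.1.2;sequence=0000000001"
```

When one instance goes down, its znode disappears, so new connections are spread over the instances that remain registered.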
Spark ThriftServer startup command
```shell
nohup spark-submit --master yarn --deploy-mode client \
  --num-executors 2 --executor-cores 1 --executor-memory 1G \
  --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 \
  --name SparkThriftServer_1 \
  spark-internal \
  --hiveconf hive.server2.thrift.http.port=20003 \
  >> /var/log/spark2/20003.log 2>&1 < /dev/null &
```
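Load balancing only helps when more than one instance is running. The sketch below assumes the patched build registers each instance in ZooKeeper on startup. It also assumes HTTP transport mode is configured, as in the connection examples. The hypothetical helper `print_launch_cmds` reuses the startup command and prints one version of it per port, giving each instance its own application name and log file. The port list is an assumption. The commands are only printed so they can be reviewed first.

```shell
# Hypothetical helper: print one ThriftServer launch command per HTTP port.
# Runs in a subshell "( ... )" so the loop variables do not leak out.
print_launch_cmds() (
  i=1
  for port in "$@"; do
    echo "nohup spark-submit --master yarn --deploy-mode client" \
      "--num-executors 2 --executor-cores 1 --executor-memory 1G" \
      "--class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2" \
      "--name SparkThriftServer_${i} spark-internal" \
      "--hiveconf hive.server2.thrift.http.port=${port}" \
      ">> /var/log/spark2/${port}.log 2>&1 < /dev/null &"
    i=$((i + 1))
  done
)

# Review the output first; append "| sh" to actually start the instances.
print_launch_cmds 20003 20004
```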
Debugging the Spark source code
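```shell
# spark-submit options must precede the primary resource (spark-internal); later arguments go to HiveThriftServer2.
# suspend=y makes the driver JVM wait until a debugger attaches on port 5005.
./bin/spark-submit --master yarn --deploy-mode client \
  --num-executors 2 --executor-cores 1 --executor-memory 1G \
  --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 \
  --name SparkThriftServer_1 \
  --driver-java-options "-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=5005" \
  spark-internal \
  --hiveconf hive.server2.thrift.http.port=20003
```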
Connecting to Spark ThriftServer with Beeline
Single node
```shell
!connect jdbc:hive2://indata-1192-168-44-128:20003/default;principal=HTTP/indata-1192-168-44-128@INDATA.COM?hive.server2.transport.mode=http;hive.server2.thrift.http.path=cliservice;
```
ZooKeeper
```shell
!connect jdbc:hive2://192-168-44-129.indata.com:2181,indata-1192-168-44-128:2181,indata-192-168-44-130.indata.com:2181/default;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2_zk
```
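Unlike the single-node URL, the ZooKeeper URL lists the ZooKeeper quorum rather than a ThriftServer address. The driver then discovers a server registered under `zooKeeperNamespace`. The hypothetical helper `zk_jdbc_url` below builds such a URL from a database name, a namespace and a list of `host:port` quorum members. The result can be passed to `beeline -u` instead of typing `!connect` interactively.

```shell
# Hypothetical helper: build a ZooKeeper service-discovery JDBC URL.
# Usage: zk_jdbc_url <database> <namespace> <zk-host:port>...
# Runs in a subshell "( ... )" so changing IFS does not leak to the caller.
zk_jdbc_url() (
  db=$1 ns=$2
  shift 2
  IFS=,   # "$*" below joins the remaining arguments with commas
  echo "jdbc:hive2://$*/${db};serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=${ns}"
)

url=$(zk_jdbc_url default hiveserver2_zk \
  192-168-44-129.indata.com:2181 \
  indata-1192-168-44-128:2181 \
  indata-192-168-44-130.indata.com:2181)
echo "$url"
# beeline -u "$url"   # connect directly instead of typing !connect
```

With these arguments, the helper reproduces the ZooKeeper connection string used with `!connect` in this section.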