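Preface
This article was put together by a teammate; I have made minor edits and am recording it here. It modifies the Spark source so that Spark Thrift Server registers itself with ZooKeeper, which lets clients connect through ZooKeeper and get load balancing. HA can also be achieved with Kyuubi, which is not covered in detail here.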
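Background
- Spark ThriftServer does not support connecting through ZooKeeper, so it cannot provide load balancing.
- Solution: modify the Spark ThriftServer source so that it supports ZooKeeper connections.
- Versions: Spark 3.1.2; Hive 2.3.7
Spark ThriftServer startup command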
nohup spark-submit --master yarn --deploy-mode client --num-executors 2 --executor-cores 1 --executor-memory 1G --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 --name SparkThriftServer_1 spark-internal --hiveconf hive.server2.thrift.http.port=20003 >> /var/log/spark2/20003.log 2>&1 < /dev/null &
Debugging the Spark source
./bin/spark-submit --master yarn --deploy-mode client --num-executors 2 --executor-cores 1 --executor-memory 1G --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 --name SparkThriftServer_1 spark-internal --hiveconf hive.server2.thrift.http.port=20003 --driver-java-options "-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=5005"
Connecting to Spark
Single node
!connect jdbc:hive2://indata-1192-168-44-128:20003/default;principal=HTTP/indata-1192-168-44-128@INDATA.COM?hive.server2.transport.mode=http;hive.server2.thrift.http.path=cliservice;
Zookeeper
!connect jdbc:hive2://192-168-44-129.indata.com:2181,indata-1192-168-44-128:2181,indata-192-168-44-130.indata.com:2181/default;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2_zk
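The ZooKeeper connection string has a fixed shape: the quorum members joined by commas, the database, then the `serviceDiscoveryMode` and `zooKeeperNamespace` session variables. A minimal Java sketch of assembling it follows; the class `ZkJdbcUrl` and its `build` method are hypothetical helpers for illustration and are not part of Hive or Spark.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not a Hive/Spark API): assembles a JDBC URL that uses
// ZooKeeper service discovery instead of a fixed host:port.
final class ZkJdbcUrl {
    static String build(List<String> quorum, String database, String namespace) {
        return "jdbc:hive2://" + String.join(",", quorum) + "/" + database
                + ";serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=" + namespace;
    }

    public static void main(String[] args) {
        // Quorum members and namespace are the ones used in this article.
        String url = build(Arrays.asList(
                "192-168-44-129.indata.com:2181",
                "indata-1192-168-44-128:2181",
                "indata-192-168-44-130.indata.com:2181"),
                "default", "hiveserver2_zk");
        System.out.println(url);
    }
}
```

The namespace passed here has to match the `hive.server2.zookeeper.namespace` value the server registers under, otherwise the client finds no instances.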
Using ZooKeeper
Connect:
Modifying the Spark source
pom
Add the sql/hive-thriftserver module so that it gets compiled, which makes it easier to modify the code
<module>sql/hive-thriftserver</module>
HiveThriftServer2
Path of the file to modify:
sql\hive-thriftserver\src\main\scala\org\apache\spark\sql\hive\thriftserver\HiveThriftServer2.scala
Source
Add ZooKeeper support and a hiveConf; when ZooKeeper is enabled, call the addServerInstanceToZooKeeper and removeServerInstanceFromZooKeeper methods via reflection
private[hive] class HiveThriftServer2(sqlContext: SQLContext)
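The reflective call described above can be sketched in plain Java. `FakeServer` is a hypothetical stand-in for the patched HiveServer2 and only records which hook ran. The no-argument method signatures are a simplification assumed for this sketch; the real methods may take parameters such as the HiveConf.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the patched HiveServer2; it only records calls.
class FakeServer {
    final List<String> calls = new ArrayList<>();
    private void addServerInstanceToZooKeeper() { calls.add("add"); }
    private void removeServerInstanceFromZooKeeper() { calls.add("remove"); }
}

final class ReflectiveZkHooks {
    // Looks up a no-argument method declared on the target's class and invokes it,
    // even when it is private. Reflection failures are rethrown unchecked.
    static void invoke(Object target, String methodName) {
        try {
            Method m = target.getClass().getDeclaredMethod(methodName);
            m.setAccessible(true);
            m.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot call " + methodName, e);
        }
    }

    public static void main(String[] args) {
        FakeServer server = new FakeServer();
        // Assumption: in the real server this flag would be read from
        // hive.server2.support.dynamic.service.discovery.
        boolean discoveryEnabled = true;
        if (discoveryEnabled) {
            invoke(server, "addServerInstanceToZooKeeper");      // on start
            invoke(server, "removeServerInstanceFromZooKeeper"); // on stop
        }
        System.out.println(server.calls); // prints [add, remove]
    }
}
```

Reflection is useful here because the Scala class can reach methods on the Java class without widening their visibility.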
HiveServer2
File path:
spark\sql\hive-thriftserver\src\main\java\org\apache\hive\service\server\HiveServer2.java
Because Spark's copy of HiveServer2 does not support ZooKeeper, the addServerInstanceToZooKeeper and removeServerInstanceFromZooKeeper methods need to be added
Building Spark
- apache-maven-3.6.3
- scala-2.12.15
Full build
./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.4 -Phive -Phive-thriftserver -Dscala-2.12 -DskipTests clean package
Building a specific module
./build/mvn -Pyarn -Phive -Phive-thriftserver -DskipTests -pl sql/hive-thriftserver -am clean package
Problems encountered during the build
[ERROR] Failed to execute goal net.alchim31.maven:scala-maven-plugin:4.3.0:compile (scala-compile-first) on project spark-hive-thriftserver_2.12: Execution scala-compile-first of goal net.alchim31.maven:scala-maven-plugin:4.3.0:compile failed: java.lang.AssertionError: assertion failed: Expected protocol to be 'file' or empty in URI jar:file:/D:/Repositories/Maven/org/apache/zookeeper/zookeeper/3.4.14/zookeeper-3.4.14.jar!/org/apache/zookeeper/ZooDefs$Ids.class -> [Help 1]
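Verification
Replace spark-hive-thriftserver_2.12-3.1.2.jar
Copy the newly built spark-hive-thriftserver_2.12-3.1.2.jar into jars/ under the Spark installation path, replacing the existing file, then start Spark
Modify hive-site under the Spark conf directory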
Add:
<property>
<name>hive.server2.support.dynamic.service.discovery</name>
<value>true</value>
</property>
<property>
<name>hive.server2.zookeeper.namespace</name>
<value>hiveserver2_zk</value>
</property>
<property>
<name>hive.zookeeper.quorum</name>
<value>192-168-44-129.indata.com:2181,indata-1192-168-44-128:2181,indata-192-168-44-130.indata.com:2181</value>
</property>
<property>
<name>hive.zookeeper.client.port</name>
<value>2181</value>
</property>
Start Spark ThriftServer
spark-submit --master yarn --deploy-mode client --num-executors 2 --executor-cores 1 --executor-memory 1G --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 --name SparkThriftServer_1 spark-internal --hiveconf hive.server2.thrift.http.port=20004
Connection verification
[root@indata-192-168-44-128 spark-3.1.2]./bin/beeline
Problems encountered during verification
The server address obtained from ZooKeeper has no host
22/02/11 09:08:46 WARN HiveConnection: Failed to connect to :20004
Solution
Debugging the Hive JDBC connector code against the ZooKeeper address showed that the value of hive.server2.thrift.bind.host was empty.
The znode value can also be inspected with the ZooKeeper CLI:
[zk: indata-1192-168-44-128:2181(CONNECTED) 5] get /hiveserver2_zk/serverUri=indata-1192-168-44-128:20004;version=2.3.7;sequence=0000000023
The hive.server2.thrift.bind.host value is empty there as well, which points to the step where ThriftServer registers itself with ZooKeeper
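To illustrate why an empty bind host leads to the `:20004` address in the warning, here is a small Java sketch that parses `key=value;key=value` data and joins host and port. The sample string is invented for illustration and is not the actual znode content. `PublishedConfs` is a hypothetical helper, and it is an assumption that the client derives the endpoint from these two keys.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: models how published "key=value;..." data could be
// turned into a host:port endpoint by a client.
final class PublishedConfs {
    // Parses "k1=v1;k2=v2" into an ordered map; "k=" maps k to the empty string.
    static Map<String, String> parse(String data) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String pair : data.split(";")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                out.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return out;
    }

    // Joins the published host and HTTP port into "host:port".
    static String endpoint(Map<String, String> confs) {
        return confs.getOrDefault("hive.server2.thrift.bind.host", "")
                + ":" + confs.getOrDefault("hive.server2.thrift.http.port", "");
    }

    public static void main(String[] args) {
        // Illustrative data only: the bind host was published with an empty value.
        String data = "hive.server2.thrift.bind.host=;hive.server2.thrift.http.port=20004";
        System.out.println(endpoint(parse(data))); // prints :20004
    }
}
```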
ZooKeeper registration code:
private void addConfsToPublish(HiveConf hiveConf, Map<String, String> confsToPublish) {
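The fix itself is not shown at this point. One plausible approach, offered purely as an assumption, is to fall back to the machine's hostname when the configured bind host is empty, before the value is published. `BindHostFallback` and `effectiveHost` are hypothetical names, and the linked repository may solve this differently.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper: picks the host to publish to ZooKeeper.
final class BindHostFallback {
    // Returns the configured host, or the fallback when it is null or blank.
    static String effectiveHost(String configured, String fallback) {
        if (configured == null || configured.trim().isEmpty()) {
            return fallback;
        }
        return configured.trim();
    }

    public static void main(String[] args) {
        String fallback;
        try {
            // Assumption: the local canonical hostname is reachable by clients.
            fallback = InetAddress.getLocalHost().getCanonicalHostName();
        } catch (UnknownHostException e) {
            fallback = "localhost";
        }
        // An empty configured value resolves to the fallback.
        System.out.println(effectiveHost("", fallback));
    }
}
```

Keeping the decision in a pure function makes it easy to check without a running ThriftServer or ZooKeeper.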
Code
The complete code has been committed to: https://gitee.com/dongkelun/spark/tree/3.1.2-STS-HA/