[wyh@hadoop1002 spark]$
ERROR : FAILED: Execution Error, return code 30041 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create Spark client for Spark session 2df0eb9a-15b4-4d81-aea1-24b12094bf44)'
FAILED: Execution Error, return code 30041 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Failed to create Spark client for Spark session 2df0eb9a-15b4-4d81-aea1-24b12094bf44
[wyh@hadoop1002 spark]$ ./sbin/start-all.sh
Insufficient memory resources caused Hive's connection to the Spark client to time out.
One option is to increase the executor memory, or reduce the number of task threads (cores) per executor, in the configuration file; an illustrative config is sketched below.
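For example, a minimal sketch: these spark.* properties are standard Spark settings and can be placed in hive-site.xml for Hive on Spark (or in spark-defaults.conf under the Hive conf dir); the values here are placeholders, not tuned recommendations.

    <!-- Illustrative values only; tune to your cluster's capacity. -->
    <property>
        <name>spark.executor.memory</name>
        <value>2g</value>
    </property>
    <property>
        <name>spark.executor.cores</name>
        <value>2</value>
    </property>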
Check the Hive log for the time window when the failure occurred.
The default path is /tmp/${user.name}/hive.log (adjust the path to your own setup).
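For instance, to pull recent timeout-related entries (assuming the default log location; $USER stands in for ${user.name}, so substitute your own path if it differs):

    tail -n 500 /tmp/$USER/hive.log | grep -iE 'timed out|SparkTask|HiveException'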
If it reports "timed out waiting for client connection", in detail:

    Caused by: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.util.concurrent.TimeoutException: Timed out waiting for client connection.

this means the connection between Hive and Spark timed out.
1). Rename the spark-env.sh.template file under /opt/module/spark/conf/ to spark-env.sh, then add the following line (a shell sketch of this step follows):

    export SPARK_DIST_CLASSPATH=$(hadoop classpath)
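A minimal way to do step 1 from the shell (paths as above; adjust to your install):

    cd /opt/module/spark/conf
    mv spark-env.sh.template spark-env.sh
    # Single quotes keep $(hadoop classpath) literal, so it is evaluated
    # each time spark-env.sh is sourced:
    echo 'export SPARK_DIST_CLASSPATH=$(hadoop classpath)' >> spark-env.sh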
2). In hive-site.xml under /opt/module/hive/conf, make sure the Spark jars and the execution engine are configured:

    <property>
        <name>spark.yarn.jars</name>
        <value>hdfs://hadoop102:8020/spark-jars/*</value>
    </property>
    <property>
        <name>hive.execution.engine</name>
        <value>spark</value>
    </property>

3). Also in hive/conf/hive-site.xml, append a longer Hive-to-Spark connection timeout:

    <property>
        <name>hive.spark.client.connect.timeout</name>
        <value>100000ms</value>
    </property>

Then restart the Hive server (a sketch of one way to restart follows) and run the insert again; my console output is shown after that.
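How to restart depends on how you run Hive. Assuming metastore and hiveserver2 run as background services started with nohup (a common tutorial-style setup; the log paths below are placeholders), something like:

    # Find and stop the running Hive services; the grep patterns assume the
    # service main-class names appear in the process command lines.
    ps -ef | grep -E 'HiveMetaStore|HiveServer2' | grep -v grep | awk '{print $2}' | xargs -r kill
    # Start them again in the background:
    nohup hive --service metastore   > /tmp/metastore.log   2>&1 &
    nohup hive --service hiveserver2 > /tmp/hiveserver2.log 2>&1 &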
hive (default)> insert into table student values(1,'abc');
Query ID = hadoop_20220728201636_11b37058-89dc-4050-a4bf-1dcf404bd579
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Running with YARN Application = application_1659005322171_0009
Kill Command = /datafs/module/hadoop-3.1.3/bin/yarn application -kill application_1659005322171_0009
Hive on Spark Session Web UI URL: http://hadoop104:38030
Query Hive on Spark job[0] stages: [0, 1]
Spark job[0] status = RUNNING
--------------------------------------------------------------------------------------
          STAGES   ATTEMPT        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED
--------------------------------------------------------------------------------------
Stage-0 ........         0      FINISHED      1          1        0        0       0
Stage-1 ........         0      FINISHED      1          1        0        0       0
--------------------------------------------------------------------------------------
STAGES: 02/02    [==========================>>] 100%  ELAPSED TIME: 40.06 s
--------------------------------------------------------------------------------------
Spark job[0] finished successfully in 40.06 second(s)
WARNING: Spark Job[0] Spent 16% (3986 ms / 25006 ms) of task time in GC
Loading data to table default.student
OK
col1    col2
Time taken: 127.46 seconds
hive (default)>

No error this time; problem solved.
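To double-check that the row actually landed, you can run a trivial verification query (assumes the default database, as in the session above):

    hive (default)> select * from student;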
Note: the insert statement runs quite slowly, so be patient and wait a while. If it fails on the first attempt, retry a few times; in my own testing, re-running it a few times really did succeed. It looks probabilistic, which presumably comes down to whether YARN happens to have enough free resources to launch the Spark client before the timeout expires.
This post records a bug I ran into, so that I don't forget how I fixed it if I hit it again; consider it a log of my journey through the pitfalls.