While working on a data warehouse project, I hit the error in the title when running a Sqoop script to load data into HDFS. Here is the error output:

The two screenshots above show two main problems:

1. Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-572947236 (HDFS cannot find the block; most answers online attribute this to a corrupted block)

So I checked the block status with fsck:

 hdfs fsck /tmp/hadoop-yarn/staging/root/.staging/job_1659322766804_0001/libjars/opencsv-2.3.jar

The result shows the block is healthy, so block corruption is not the cause:

The filesystem under path '/tmp/hadoop-yarn/staging/root/.staging/job_1659322766804_0001/libjars/opencsv-2.3.jar' is HEALTHY
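If you want to script this check, for example across every jar in the job's staging `libjars` directory, you can key off the fsck summary line shown above. Below is a minimal sketch. It assumes the `hdfs` CLI is on `PATH` and that the report is written to stdout. The helper names `fsck_is_healthy` and `check_path` are hypothetical.

```python
import subprocess


def fsck_is_healthy(fsck_output: str) -> bool:
    """Return True if the fsck summary line reports the path as HEALTHY."""
    for line in fsck_output.splitlines():
        line = line.strip()
        # Summary line format: The filesystem under path '<path>' is HEALTHY
        if line.startswith("The filesystem under path") and line.endswith("is HEALTHY"):
            return True
    return False


def check_path(path: str) -> bool:
    """Run `hdfs fsck <path>` and parse its report (needs a Hadoop client)."""
    result = subprocess.run(["hdfs", "fsck", path], capture_output=True, text=True)
    return fsck_is_healthy(result.stdout)
```

A CORRUPT summary, or no summary at all, makes `fsck_is_healthy` return `False`.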

So the root cause is most likely the problem in the first screenshot, which then triggered the chain of errors in the second one.

2. Application application_xxx failed 2 times due to AM Container for attempt_xxx exited with exitCode: (searching online suggests a configuration problem, usually in mapred-site.xml and yarn-site.xml)

In my case, the problem was mainly in mapred-site.xml.

yarn-site.xml

<?xml version="1.0"?>
<configuration>
    <!-- Auxiliary service used for the MapReduce shuffle -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <!-- Hostname of the ResourceManager -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop01</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <!-- Internal RPC address of the ResourceManager -->
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>hadoop01:8032</value>
    </property>
    <!-- Internal address of the ResourceManager scheduler -->
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>hadoop01:8030</value>
    </property>
    <!-- ResourceManager resource tracker address (NodeManagers report here) -->
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>hadoop01:8031</value>
    </property>
    <!-- ResourceManager admin address -->
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>hadoop01:8033</value>
    </property>
    <!-- ResourceManager web UI address -->
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>hadoop01:8088</value>
    </property>
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <!-- Maximum time to keep aggregated logs on the file system, in seconds -->
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>640800</value>
    </property>
    <property>
        <name>yarn.application.classpath</name>
        <value>/usr/local/hadoop-3.1.3/etc/hadoop:/usr/local/hadoop-3.1.3/share/hadoop/common/lib/*:/usr/local/hadoop-3.1.3/share/hadoop/common/*:/usr/local/hadoop-3.1.3/share/hadoop/hdfs:/usr/local/hadoop-3.1.3/share/hadoop/hdfs/lib/*:/usr/local/hadoop-3.1.3/share/hadoop/hdfs/*:/usr/local/hadoop-3.1.3/share/hadoop/mapreduce/lib/*:/usr/local/hadoop-3.1.3/share/hadoop/mapreduce/*:/usr/local/hadoop-3.1.3/share/hadoop/yarn:/usr/local/hadoop-3.1.3/share/hadoop/yarn/lib/*:/usr/local/hadoop-3.1.3/share/hadoop/yarn/*</value>
    </property>
    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
    </property>
</configuration>
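The long `yarn.application.classpath` value above follows a fixed layout under the Hadoop install directory. The sketch below rebuilds it from a Hadoop home path, which makes it harder to mistype when you copy it into another file or move to another version. The entry list simply mirrors the value above. It is an assumption for Hadoop 3.1.3 and may differ in other versions. On a real cluster, the output of `hadoop classpath` is the more reliable source. The names `RELATIVE_ENTRIES` and `build_classpath` are hypothetical.

```python
# Relative entries copied from the yarn.application.classpath value above
# (assumed layout for Hadoop 3.1.3; check against `hadoop classpath`).
RELATIVE_ENTRIES = (
    "etc/hadoop",
    "share/hadoop/common/lib/*",
    "share/hadoop/common/*",
    "share/hadoop/hdfs",
    "share/hadoop/hdfs/lib/*",
    "share/hadoop/hdfs/*",
    "share/hadoop/mapreduce/lib/*",
    "share/hadoop/mapreduce/*",
    "share/hadoop/yarn",
    "share/hadoop/yarn/lib/*",
    "share/hadoop/yarn/*",
)


def build_classpath(hadoop_home: str) -> str:
    """Join the entries under hadoop_home with ':' as Hadoop expects."""
    home = hadoop_home.rstrip("/")
    return ":".join(f"{home}/{entry}" for entry in RELATIVE_ENTRIES)
```

Calling `build_classpath("/usr/local/hadoop-3.1.3")` yields the exact string used in both config files.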

The key property is yarn.application.classpath. Because Sqoop runs its jobs as MapReduce, the same yarn.application.classpath also has to be configured on the MapReduce side, that is, in mapred-site.xml.
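To confirm that a given *-site.xml file actually sets this property, you can parse the file with Python's standard `xml.etree.ElementTree`. Below is a minimal sketch. The helper name `get_property` is hypothetical. Returning the last definition when a name appears twice is an assumption meant to mirror a later definition overriding an earlier one.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def get_property(xml_text: str, name: str) -> Optional[str]:
    """Return the value of `name` in a Hadoop *-site.xml string, or None."""
    root = ET.fromstring(xml_text)
    value = None
    for prop in root.findall("property"):
        if prop.findtext("name", default="").strip() == name:
            # Keep overwriting so the last definition wins (assumption).
            value = prop.findtext("value", default="").strip()
    return value
```

For example, `get_property(Path("/usr/local/hadoop-3.1.3/etc/hadoop/mapred-site.xml").read_text(), "yarn.application.classpath")` uses `pathlib.Path`. It assumes the config directory matches the classpath above. The call returns `None` if the property is missing.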

Below is Sqoop's basic workflow. As the diagram shows, the command received by the Sqoop client passes through the Task Translator and is converted into MapReduce tasks.

 mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- Legacy MRv1 (JobTracker) setting; not used when mapreduce.framework.name is yarn -->
    <property>
        <name>mapred.job.tracker</name>
        <value>hadoop01:9001</value>
    </property>
    <!-- Run MapReduce on the YARN resource manager -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <!-- JobHistory server address -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>hadoop01:10020</value>
    </property>
    <!-- JobHistory server web UI address -->
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hadoop01:19888</value>
    </property>
    <!-- Classpath for application containers; adding this fixed the AM container failure -->
    <property>
        <name>yarn.application.classpath</name>
        <value>/usr/local/hadoop-3.1.3/etc/hadoop:/usr/local/hadoop-3.1.3/share/hadoop/common/lib/*:/usr/local/hadoop-3.1.3/share/hadoop/common/*:/usr/local/hadoop-3.1.3/share/hadoop/hdfs:/usr/local/hadoop-3.1.3/share/hadoop/hdfs/lib/*:/usr/local/hadoop-3.1.3/share/hadoop/hdfs/*:/usr/local/hadoop-3.1.3/share/hadoop/mapreduce/lib/*:/usr/local/hadoop-3.1.3/share/hadoop/mapreduce/*:/usr/local/hadoop-3.1.3/share/hadoop/yarn:/usr/local/hadoop-3.1.3/share/hadoop/yarn/lib/*:/usr/local/hadoop-3.1.3/share/hadoop/yarn/*</value>
    </property>
</configuration>
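Hand-edited config files easily pick up redundant blocks, such as a `mapreduce.framework.name` property defined twice. Below is a small sketch that lists property names appearing more than once in a *-site.xml string. The helper name `find_duplicate_properties` is hypothetical, and the sketch only checks names, not whether the values conflict.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import List


def find_duplicate_properties(xml_text: str) -> List[str]:
    """Return property names defined more than once, in first-seen order."""
    root = ET.fromstring(xml_text)
    names = [p.findtext("name", default="").strip() for p in root.findall("property")]
    counts = Counter(names)
    duplicates: List[str] = []
    for name in names:
        if counts[name] > 1 and name not in duplicates:
            duplicates.append(name)
    return duplicates
```

The `<?xml-stylesheet?>` processing instruction at the top of mapred-site.xml does not affect parsing, because ElementTree ignores it when building the tree.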

The fix is simply adding yarn.application.classpath to mapred-site.xml.

As the final screenshot shows, the problem is solved!

 
