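A write-up of how I analyzed the vendor.boot-hal-1-1 service failing to start at boot on Android R. I'm summarizing it here as a reference for anyone who runs into the same problem.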

Problem:
During boot, init keeps reporting errors and vendor.boot-hal-1-1 never starts properly.

[   18.037464]  {1}[1:init]init: starting service 'vendor.boot-hal-1-1'...
[   18.040238]  {1}[1:init]init: Control message: Processed ctl.interface_start for 'android.hardware.boot@1.0::IBootControl/default' from pid: 2387 (/system/bin/hwservicemanager)
[   18.040622]  {1}[1:init]init: Control message: Processed ctl.interface_start for 'android.hardware.boot@1.0::IBootControl/default' from pid: 2387 (/system/bin/hwservicemanager)
[   18.071197]  {1}[1:init]init: Service 'vendor.boot-hal-1-1' (pid 2507) 333 exited with status 1
[   18.071225]  {1}[1:init]init: Sending signal 9 to service 'vendor.boot-hal-1-1' (pid 2507) process group...
[   18.071506]  {1}[1:init]libprocessgroup: Successfully killed process cgroup uid 0 pid 2507 in 0ms
[   18.096360]  {1}[1:init]init: Service 'bpfloader' (pid 2502) 333 exited with status 0 oneshot service took 0.149000 seconds in background
[   18.096385]  {1}[1:init]init: Sending signal 9 to service 'bpfloader' (pid 2502) process group...
[   18.096593]  {1}[1:init]libprocessgroup: Successfully killed process cgroup uid 0 pid 2502 in 0ms
[   19.159220]  {2}[2503:update_verifier]HidlServiceManagement: Waited one second for android.hardware.boot@1.0::IBootControl/default
[   19.173529]  {2}[2503:update_verifier]HidlServiceManagement: getService: Trying again for android.hardware.boot@1.0::IBootControl/default...
[   19.174937]  {3}[1:init]init: starting service 'vendor.boot-hal-1-1'...
[   19.180231]  {3}[1:init]init: Control message: Processed ctl.interface_start for 'android.hardware.boot@1.0::IBootControl/default' from pid: 2387 (/system/bin/hwservicemanager)
[   19.214810]  {3}[1:init]init: Service 'vendor.boot-hal-1-1' (pid 2509) 333 exited with status 1
[   19.224638]  {3}[1:init]init: Sending signal 9 to service 'vendor.boot-hal-1-1' (pid 2509) process group...
[   19.235949]  {3}[1:init]libprocessgroup: Successfully killed process cgroup uid 0 pid 2509 in 0ms
[   20.213341]  {2}[2503:update_verifier]HidlServiceManagement: Waited one second for android.hardware.boot@1.0::IBootControl/default
[   20.227637]  {2}[2503:update_verifier]HidlServiceManagement: getService: Trying again for android.hardware.boot@1.0::IBootControl/default...

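It looks like the vendor.boot-hal-1-1 service starts, exits abnormally almost immediately (status 1), and init then kills its process group. Because update_verifier keeps waiting for android.hardware.boot@1.0::IBootControl/default, the cycle repeats roughly once a second.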

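How to analyze and locate the cause?
This log floods the serial console nonstop, which gets annoying, so console output of kernel messages can be turned off with the following command: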

echo 0 > /proc/sys/kernel/printk

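Now that the serial console is no longer flooded with logs, strace can be used to track the problem down. Either run strace followed by the path of the executable, or attach to an already running process with strace -p <pid>. Since this service exits within a few tens of milliseconds, launching it directly under strace is the practical choice here: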

strace /vendor/bin/hw/android.hardware.boot@1.1-service

Part of the strace output is shown below:

openat(AT_FDCWD, "/vendor/etc/fstab.xxx", O_RDONLY|O_CLOEXEC) = 6
writev(5, [{iov_base="\0\241\nNb\370`\266\236)#", iov_len=11}, {iov_base="\4", iov_len=1}, {iov_base="android.hardware.boot@1.1-servic"..., iov_len=34}, 
{iov_base="[libfs_mgr]ReadDefaultFstab  "..., iov_len=81}], 4) = 127
writev(5, [{iov_base="\0\241\nNb\370`eU8#", iov_len=11}, {iov_base="\6", iov_len=1}, {iov_base="android.hardware.boot@1.1-servic"..., iov_len=34}, 
{iov_base="Could not find bootloader messag"..., iov_len=79}], 4) = 125
openat(AT_FDCWD, "/dev/pmsg0", O_WRONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
writev(5, [{iov_base="\0\241\nNb\370`/\310E#", iov_len=11}, {iov_base="\6", iov_len=1}, {iov_base="android.hardware.boot@1.1-impl\0", iov_len=31}, 
{iov_base="Could not initialize BootControl"..., iov_len=40}], 4) = 83
close(6)                                = 0
exit_group(1)                           = ?
+++ exited with 1 +++
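When reading these writev() calls, it helps to know that the one-byte second element is the Android log priority: "\4" is INFO and "\6" is ERROR, so the two "Could not ..." lines are the errors to chase. The mapping can be sketched as a small helper. The function name LogPriorityName is hypothetical; the numeric values are those of android_LogPriority in android/log.h.

```cpp
#include <string>

// Hypothetical helper: translate the priority byte seen in a logd writev()
// (the second iovec in the strace output) into its android/log.h name.
std::string LogPriorityName(unsigned char prio) {
  switch (prio) {
    case 2: return "VERBOSE";
    case 3: return "DEBUG";
    case 4: return "INFO";
    case 5: return "WARN";
    case 6: return "ERROR";
    case 7: return "FATAL";
    default: return "UNKNOWN";  // 0, 1, 8 and anything else are not shown here
  }
}
```

For the trace above, LogPriorityName(4) names the level of the ReadDefaultFstab line, and LogPriorityName(6) names the level of the two failure messages.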

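Take the message "Could not initialize BootControl" from this output and search the source code for it. Since this is a HIDL process, search directly under android/hardware/interfaces. The search eventually leads to android/hardware/interfaces/boot/1.1/default/boot_control/libboot_control.cpp, which is also where the preceding error in the trace ("Could not find bootloader messag...") is logged: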

bool BootControl::Init() {
  ...
  std::string err;  // filled in by the helper below when the lookup fails
  std::string device = get_bootloader_message_blk_device(&err);
  if (device.empty()) {
    LOG(ERROR) << "Could not find bootloader message block device: " << err;
    return false;
  }
  ...
}

Next, follow the implementation of get_bootloader_message_blk_device:

std::string get_bootloader_message_blk_device(std::string* err) {
  std::string misc_blk_device = get_misc_blk_device(err);
  if (misc_blk_device.empty()) return "";
  if (!wait_for_device(misc_blk_device, err)) return "";
  return misc_blk_device;
}
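The wait_for_device helper is not shown in this post. As a rough idea of what such a helper might do, here is a minimal sketch that assumes it simply polls stat() on the device path a limited number of times. The name WaitForPath, its parameters, and the error text are all hypothetical and are not taken from the AOSP source.

```cpp
#include <sys/stat.h>

#include <chrono>
#include <string>
#include <thread>

// Hypothetical stand-in for wait_for_device(): poll until `path` exists,
// giving up after `max_tries` attempts. On failure, `err` describes why.
bool WaitForPath(const std::string& path, int max_tries,
                 std::chrono::milliseconds delay, std::string* err) {
  err->clear();
  for (int attempt = 1; attempt <= max_tries; ++attempt) {
    struct stat st;
    if (stat(path.c_str(), &st) == 0) {
      err->clear();  // an earlier failed attempt no longer matters
      return true;
    }
    *err = "failed to stat " + path + " after " +
           std::to_string(attempt) + " tries";
    if (attempt < max_tries) std::this_thread::sleep_for(delay);
  }
  return false;
}
```

In the failure analyzed here this step is never reached: get_misc_blk_device already returns an empty string, so the function bails out one line earlier.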

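Continue into get_misc_blk_device. From the code below, it is fairly clear that it looks up the block device node configured for the misc partition in the fstab.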

std::string get_misc_blk_device(std::string* err) {
  if (g_misc_device_for_test.has_value() && !g_misc_device_for_test->empty()) {
    return *g_misc_device_for_test;
  }
  Fstab fstab;
  if (!ReadDefaultFstab(&fstab)) {
    *err = "failed to read default fstab";
    return "";
  }
  for (const auto& entry : fstab) {
    if (entry.mount_point == "/misc") {
      return entry.blk_device;
    }
  }

  *err = "failed to find /misc partition";
  return "";
}
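The lookup above can be reproduced outside Android to check an fstab file by hand. The following is a minimal sketch, assuming a whitespace-separated fstab where the first field is the block device and the second is the mount point. FindBlkDevice is a hypothetical name; the code does not use libfs_mgr and ignores everything ReadDefaultFstab does beyond this simple match.

```cpp
#include <optional>
#include <sstream>
#include <string>

// Hypothetical helper: scan fstab text line by line and return the block
// device of the first entry whose mount point equals `mount_point`.
std::optional<std::string> FindBlkDevice(const std::string& fstab_text,
                                         const std::string& mount_point) {
  std::istringstream lines(fstab_text);
  std::string line;
  while (std::getline(lines, line)) {
    std::istringstream fields(line);
    std::string blk_device, mnt;
    if (!(fields >> blk_device >> mnt)) continue;  // blank or too-short line
    if (blk_device[0] == '#') continue;            // comment line
    if (mnt == mount_point) return blk_device;
  }
  return std::nullopt;  // corresponds to "failed to find /misc partition"
}
```

Feeding it the contents of an fstab without a /misc line yields std::nullopt, which is the situation that made BootControl::Init() fail here.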

Looking back at fstab.xxx, sure enough the misc partition was not configured. Add it:

/dev/block/by-name/misc  /misc            emmc    defaults            defaults

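After rebuilding, the error is indeed gone.
Now, what does vendor.boot-hal-1-1 actually start? Judging from the related .bp files and code, once ENABLE_AB = true is set, this process and its interfaces are used to update the slot information when an A/B OTA update completes, so that the system knows whether to boot the slot A or the slot B image.
That's it for this analysis.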
