安全计算模式(secure computing mode,seccomp)是 Linux 内核功能,可以使用它来限制容器内可用的操作。

Docker 的默认 seccomp 配置文件是一个白名单,它指定了允许的调用。

下表列出了由于不在白名单而被有效阻止的重要(但不是全部)系统调用。该表包含每个系统调用被阻止的原因。

SyscallDescription
acctAccounting syscall which could let containers disable their own resource limits or process accounting. Also gated by CAP_SYS_PACCT.
add_keyPrevent containers from using the kernel keyring, which is not namespaced.
adjtimexSimilar to clock_settime and settimeofday, time/date is not namespaced. Also gated by CAP_SYS_TIME.
bpfDeny loading potentially persistent bpf programs into kernel, already gated by CAP_SYS_ADMIN.
clock_adjtimeTime/date is not namespaced. Also gated by CAP_SYS_TIME.
clock_settimeTime/date is not namespaced. Also gated by CAP_SYS_TIME.
cloneDeny cloning new namespaces. Also gated by CAP_SYS_ADMIN for CLONE_* flags, except CLONE_USERNS.
create_moduleDeny manipulation and functions on kernel modules. Obsolete. Also gated by CAP_SYS_MODULE.
delete_moduleDeny manipulation and functions on kernel modules. Also gated by CAP_SYS_MODULE.
finit_moduleDeny manipulation and functions on kernel modules. Also gated by CAP_SYS_MODULE.
get_kernel_symsDeny retrieval of exported kernel and module symbols. Obsolete.
get_mempolicySyscall that modifies kernel memory and NUMA settings. Already gated by CAP_SYS_NICE.
init_moduleDeny manipulation and functions on kernel modules. Also gated by CAP_SYS_MODULE.
iopermPrevent containers from modifying kernel I/O privilege levels. Already gated by CAP_SYS_RAWIO.
ioplPrevent containers from modifying kernel I/O privilege levels. Already gated by CAP_SYS_RAWIO.
kcmpRestrict process inspection capabilities, already blocked by dropping CAP_PTRACE.
kexec_file_loadSister syscall of kexec_load that does the same thing, slightly different arguments. Also gated by CAP_SYS_BOOT.
kexec_loadDeny loading a new kernel for later execution. Also gated by CAP_SYS_BOOT.
keyctlPrevent containers from using the kernel keyring, which is not namespaced.
lookup_dcookieTracing/profiling syscall, which could leak a lot of information on the host. Also gated by CAP_SYS_ADMIN.
mbindSyscall that modifies kernel memory and NUMA settings. Already gated by CAP_SYS_NICE.
mountDeny mounting, already gated by CAP_SYS_ADMIN.
move_pagesSyscall that modifies kernel memory and NUMA settings.
name_to_handle_atSister syscall to open_by_handle_at. Already gated by CAP_SYS_NICE.
nfsservctlDeny interaction with the kernel nfs daemon. Obsolete since Linux 3.1.
open_by_handle_atCause of an old container breakout. Also gated by CAP_DAC_READ_SEARCH.
perf_event_openTracing/profiling syscall, which could leak a lot of information on the host.
personalityPrevent container from enabling BSD emulation. Not inherently dangerous, but poorly tested, potential for a lot of kernel vulns.
pivot_rootDeny pivot_root, should be privileged operation.
process_vm_readvRestrict process inspection capabilities, already blocked by dropping CAP_PTRACE.
process_vm_writevRestrict process inspection capabilities, already blocked by dropping CAP_PTRACE.
ptraceTracing/profiling syscall, which could leak a lot of information on the host. Already blocked by dropping CAP_PTRACE.
query_moduleDeny manipulation and functions on kernel modules. Obsolete.
quotactlQuota syscall which could let containers disable their own resource limits or process accounting. Also gated by CAP_SYS_ADMIN.
rebootDon’t let containers reboot the host. Also gated by CAP_SYS_BOOT.
request_keyPrevent containers from using the kernel keyring, which is not namespaced.
set_mempolicySyscall that modifies kernel memory and NUMA settings. Already gated by CAP_SYS_NICE.
setnsDeny associating a thread with a namespace. Also gated by CAP_SYS_ADMIN.
settimeofdayTime/date is not namespaced. Also gated by CAP_SYS_TIME.
socket, socketcallUsed to send or receive packets and for other socket operations. All socket and socketcall calls are blocked except communication domains AF_UNIX, AF_INET, AF_INET6, AF_NETLINK, and AF_PACKET.
stimeTime/date is not namespaced. Also gated by CAP_SYS_TIME.
swaponDeny start/stop swapping to file/device. Also gated by CAP_SYS_ADMIN.
swapoffDeny start/stop swapping to file/device. Also gated by CAP_SYS_ADMIN.
sysfsObsolete syscall.
_sysctlObsolete, replaced by /proc/sys.
umountShould be a privileged operation. Also gated by CAP_SYS_ADMIN.
umount2Should be a privileged operation. Also gated by CAP_SYS_ADMIN.
unshareDeny cloning new namespaces for processes. Also gated by CAP_SYS_ADMIN, with the exception of unshare –user.
uselibOlder syscall related to shared libraries, unused for a long time.
userfaultfdUserspace page fault handling, largely needed for process migration.
ustatObsolete syscall.
vm86In kernel x86 real mode virtual machine. Also gated by CAP_SYS_ADMIN.
vm86oldIn kernel x86 real mode virtual machine. Also gated by CAP_SYS_ADMIN.

其中gdb在进行进程debug时,会报错:

(gdb) attach 30721
Attaching to process 30721

ptrace: Operation not permitted.

原因就是因为ptrace被Docker默认禁止的问题。考虑到应用分析的需要,可以有以下几种方法解决:

1、关闭seccomp

docker run --security-opt seccomp=unconfined

2、采用超级权限模式

docker run --privileged

3、仅开放ptrace限制

docker run --cap-add sys_ptrace

当然从安全角度考虑,如只是想使用gdb进行debug的话,建议使用第三种。

小提示:

如果使用marathon进行docker的管理,应用JSON的修改在这里:

"parameters": [
        {
          "key": "cap-add",
          "value": "SYS_PTRACE"
        }
      ],


Logo

权威|前沿|技术|干货|国内首个API全生命周期开发者社区

更多推荐