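Troubleshoot instances with failed status checks
If your instance fails a status check, the following information can help you troubleshoot the issue. First determine whether your applications are exhibiting any problems. If you verify that the instance is not running your applications as expected, review the status check information and the system logs.
For examples of problems that can cause status checks to fail, see Status checks for your instances.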
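Contents
- View status check information
- Retrieve the system logs
- Diagnose system log errors for Linux-based instances
- Out of memory: kill process
- ERROR: mmu_update failed (Memory management update failed)
- I/O error (block device failure)
- I/O ERROR: neither local nor remote disk (Broken distributed block device)
- request_module: runaway loop modprobe (Looping legacy kernel modprobe on older Linux versions)
- "FATAL: kernel too old" and "fsck: No such file or directory while trying to open /dev" (Kernel and AMI mismatch)
- "FATAL: Could not load /lib/modules" or "BusyBox" (Missing kernel modules)
- ERROR Invalid kernel (EC2 incompatible kernel)
- fsck: No such file or directory while trying to open... (File system not found)
- General error mounting filesystems (failed mount)
- VFS: Unable to mount root fs on unknown-block (Root filesystem mismatch)
- Error: Unable to determine major/minor number of root device... (Root file system/device mismatch)
- XENBUS: Device with no driver...
- ... days without being checked, check forced (File system check required)
- fsck died with exit status... (Missing device)
- GRUB prompt (grubdom>)
- Bringing up interface eth0: Device eth0 has different MAC address than expected, ignoring. (Hard-coded MAC address)
- Unable to load SELinux Policy. Machine is in enforcing mode. Halting now. (SELinux misconfiguration)
- XENBUS: Timeout connecting to devices (Xenbus timeout)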
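View status check information
To investigate impaired instances using the Amazon EC2 console
- Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/.
- In the navigation pane, choose Instances, and then select your instance.
- In the details pane, choose Status and alarms to see the individual results for all System status checks and Instance status checks.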
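If a system status check has failed, you can try one of the following options:
- Create an instance recovery alarm. For more information, see Create alarms that stop, terminate, reboot, or recover an instance.
- If you changed the instance type to an instance built on the Nitro System, status checks fail if you migrated from an instance that does not have the required ENA and NVMe drivers. For more information, see Compatibility for changing the instance type.
- For an instance using an Amazon EBS-backed AMI, stop and restart the instance.
- For an instance using an instance store-backed AMI, terminate the instance and launch a replacement.
- Wait for Amazon EC2 to resolve the issue.
- Post your issue to AWS re:Post.
- If your instance is in an Auto Scaling group, the Amazon EC2 Auto Scaling service automatically launches a replacement instance. For more information, see Health checks for Auto Scaling instances in the Amazon EC2 Auto Scaling User Guide.
- Retrieve the system log and look for errors.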
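The decision above (which kind of check failed, and what to try next) can be sketched in a few lines. This is a minimal illustration, not an official tool: it assumes a dictionary shaped like one entry of the `InstanceStatuses` list returned by the EC2 DescribeInstanceStatus API, the function name `triage_status` is hypothetical, and the instance ID is a placeholder.

```python
from typing import Dict, List


def triage_status(status: Dict) -> List[str]:
    """Map one instance-status entry to the next steps described above.

    `status` is assumed to look like one element of `InstanceStatuses`
    from DescribeInstanceStatus (an assumption; adapt to your data source).
    """
    steps: List[str] = []
    system = status.get("SystemStatus", {}).get("Status", "unknown")
    instance = status.get("InstanceStatus", {}).get("Status", "unknown")

    if system == "impaired":
        # A failed system status check: see the list of options above.
        steps.append(
            "system status check failed: create a recovery alarm, "
            "stop and restart (EBS-backed), replace (instance store-backed), "
            "or wait for Amazon EC2 to resolve the issue"
        )
    if instance == "impaired":
        # A failed instance status check: the system log is the next place to look.
        steps.append(
            "instance status check failed: reboot the instance and "
            "retrieve the system log"
        )
    if not steps:
        steps.append("no failed status checks reported")
    return steps


# Hypothetical sample entry; the instance ID is a placeholder.
sample = {
    "InstanceId": "i-0123456789abcdef0",
    "SystemStatus": {"Status": "ok"},
    "InstanceStatus": {"Status": "impaired"},
}

for step in triage_status(sample):
    print(step)
```

Keeping the function free of network calls makes it easy to try against saved API responses before wiring it to a live client.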
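Retrieve the system logs
If an instance status check has failed, you can reboot the instance and retrieve the system logs. The logs may reveal an error that helps you troubleshoot the issue. Rebooting clears unnecessary information from the logs.
To reboot an instance and retrieve the system log
- Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/.
- In the navigation pane, choose Instances, and then select your instance.
- Choose Instance state, Reboot instance. It might take a few minutes for your instance to reboot.
- Verify that the problem still exists; in some cases, rebooting resolves the problem.
- When the instance is in the running state, select the instance and choose Actions, Monitor and troubleshoot, Get system log.
- Review the log that appears on the screen, and use the list of known system log error statements below to troubleshoot your issue.
- If your issue is not resolved, you can post your issue to AWS re:Post.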
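The console procedure above can also be scripted. The following is a rough sketch under stated assumptions: `ec2` is expected to be a boto3 EC2 client (for example `boto3.client("ec2")`), whose `reboot_instances`, `describe_instances`, and `get_console_output` methods are used here; the function name `reboot_and_get_log` and its retry parameters are hypothetical. The state check only mirrors the "running" condition from the steps and may not prove that the reboot has finished.

```python
import time
from typing import Any


def reboot_and_get_log(ec2: Any, instance_id: str,
                       attempts: int = 30, delay: float = 10.0) -> str:
    """Reboot an instance, wait for the 'running' state, return the system log.

    `ec2` is assumed to behave like a boto3 EC2 client.
    """
    ec2.reboot_instances(InstanceIds=[instance_id])

    for _ in range(attempts):
        resp = ec2.describe_instances(InstanceIds=[instance_id])
        state = resp["Reservations"][0]["Instances"][0]["State"]["Name"]
        if state == "running":
            break
        time.sleep(delay)
    else:
        # The loop ended without seeing the 'running' state.
        raise TimeoutError(f"{instance_id} did not reach the running state")

    # boto3 is expected to return the log as text in the 'Output' field;
    # treat that as an assumption and verify against your SDK version.
    result = ec2.get_console_output(InstanceId=instance_id)
    return result.get("Output", "")


# Example usage (requires boto3 and AWS credentials, so it is left commented out):
# import boto3
# log_text = reboot_and_get_log(boto3.client("ec2"), "i-0123456789abcdef0")
# print(log_text)
```

Passing the client in as a parameter keeps the function testable with a stub object and avoids hard-coding a region or credentials.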
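Diagnose system log errors for Linux-based instances
For Linux-based instances that have failed an instance status check, such as the instance reachability check, verify that you followed the steps above to retrieve the system log. The following list contains some common system log errors and suggested actions you can take to resolve each one.
- request_module: runaway loop modprobe (Looping legacy kernel modprobe on older Linux versions)
- "FATAL: kernel too old" and "fsck: No such file or directory while trying to open /dev" (Kernel and AMI mismatch)
- "FATAL: Could not load /lib/modules" or "BusyBox" (Missing kernel modules)
- File system errors
- Bringing up interface eth0: Device eth0 has different MAC address than expected, ignoring. (Hard-coded MAC address)
- Unable to load SELinux Policy. Machine is in enforcing mode. Halting now. (SELinux misconfiguration)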
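Matching a retrieved system log against these known error statements can be automated with a simple substring scan. The sketch below is illustrative only: the signature strings are taken from the log samples on this page, while the names `SIGNATURES` and `diagnose` and the short labels are hypothetical. A log can match more than one signature, so all matches are returned.

```python
from typing import List, Tuple

# (substring to look for, short label for the matching section)
SIGNATURES: List[Tuple[str, str]] = [
    ("Out of memory: kill process", "Out of memory"),
    ("mmu_update failed", "Memory management update failed"),
    ("neither local nor remote disk", "Broken distributed block device"),
    ("I/O error", "Block device failure"),
    ("runaway loop modprobe", "Looping legacy kernel modprobe"),
    ("FATAL: kernel too old", "Kernel and AMI mismatch"),
    ("Could not load /lib/modules", "Missing kernel modules"),
    ("ERROR Invalid kernel", "EC2 incompatible kernel"),
    ("General error mounting filesystems", "Failed mount"),
    ("VFS: Unable to mount root fs", "Root filesystem mismatch"),
    ("Unable to determine major/minor number of root device",
     "Root file system/device mismatch"),
    ("XENBUS: Device with no driver", "Device with no driver"),
    ("days without being checked, check forced", "File system check required"),
    ("fsck died with exit status", "Missing device"),
    ("grubdom>", "GRUB prompt"),
    ("has different MAC address than expected", "Hard-coded MAC address"),
    ("Unable to load SELinux Policy", "SELinux misconfiguration"),
    ("XENBUS: Timeout connecting to devices", "Xenbus timeout"),
]


def diagnose(log_text: str) -> List[str]:
    """Return the labels of every known signature found in the log text."""
    found: List[str] = []
    for needle, label in SIGNATURES:
        if needle in log_text and label not in found:
            found.append(label)
    return found


sample_log = (
    "FATAL: kernel too old\n"
    "Kernel panic - not syncing: Attempted to kill init!\n"
)
print(diagnose(sample_log))
```

Plain substring matching is used instead of regular expressions so that characters such as `>` or `/` in the signatures need no escaping.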
Out of memory: kill process
An out-of-memory error is indicated by a system log entry similar to the one shown below.
[115879.769795] Out of memory: kill process 20273 (httpd) score 1285879 or a child
[115879.769795] Killed process 1917 (php-cgi) vsz:467184kB, anon-rss:101196kB, file-rss:204kB
Suggested actions
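Before choosing how to respond, it can help to see which processes the kernel actually killed. The following is a small sketch, not part of any AWS tooling: it assumes log lines in the "Killed process PID (name)" form shown in the sample above, and the names `KILLED` and `killed_processes` are hypothetical.

```python
import re
from typing import List, Tuple

# Matches lines such as: "Killed process 1917 (php-cgi) vsz:467184kB, ..."
KILLED = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def killed_processes(log_text: str) -> List[Tuple[int, str]]:
    """Return (pid, process name) for every 'Killed process' log entry."""
    return [(int(pid), name) for pid, name in KILLED.findall(log_text)]


sample = (
    "[115879.769795] Out of memory: kill process 20273 (httpd) "
    "score 1285879 or a child\n"
    "[115879.769795] Killed process 1917 (php-cgi) vsz:467184kB, "
    "anon-rss:101196kB, file-rss:204kB\n"
)

print(killed_processes(sample))  # [(1917, 'php-cgi')]
```

Only the "Killed process" line is matched; the "kill process" line names the candidate the kernel scored, which is not necessarily the process that was terminated.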
ERROR: mmu_update failed (Memory management update failed)
A memory management update failure is indicated by a system log entry similar to the following example:
Press `ESC' to enter the menu... 0 [H[J Booting 'Amazon Linux 2011.09 (2.6.35.14-95.38.amzn1.i686)'
root (hd0)
Filesystem type is ext2fs, using whole disk
kernel /boot/vmlinuz-2.6.35.14-95.38.amzn1.i686 root=LABEL=/ console=hvc0 LANG=en_US.UTF-8 KEYTABLE=us
initrd /boot/initramfs-2.6.35.14-95.38.amzn1.i686.img
ERROR: mmu_update failed with rc=-22
Potential cause: Issue with Amazon Linux
Suggested actions
Post your issue to the Developer Forums.
I/O error (block device failure)
An input/output error is indicated by a system log entry similar to the following example:
[9943662.053217] end_request: I/O error, dev sde, sector 52428288
[9943664.191262] end_request: I/O error, dev sde, sector 52428168
[9943664.191285] Buffer I/O error on device md0, logical block 209713024
[9943664.191297] Buffer I/O error on device md0, logical block 209713025
[9943664.191304] Buffer I/O error on device md0, logical block 209713026
[9943664.191310] Buffer I/O error on device md0, logical block 209713027
[9943664.191317] Buffer I/O error on device md0, logical block 209713028
[9943664.191324] Buffer I/O error on device md0, logical block 209713029
[9943664.191332] Buffer I/O error on device md0, logical block 209713030
[9943664.191339] Buffer I/O error on device md0, logical block 209713031
[9943664.191581] end_request: I/O error, dev sde, sector 52428280
[9943664.191590] Buffer I/O error on device md0, logical block 209713136
[9943664.191597] Buffer I/O error on device md0, logical block 209713137
[9943664.191767] end_request: I/O error, dev sde, sector 52428288
[9943664.191970] end_request: I/O error, dev sde, sector 52428288
[9943664.192143] end_request: I/O error, dev sde, sector 52428288
[9943664.192949] end_request: I/O error, dev sde, sector 52428288
[9943664.193112] end_request: I/O error, dev sde, sector 52428288
[9943664.193266] end_request: I/O error, dev sde, sector 52428288
I/O ERROR: neither local nor remote disk (Broken distributed block device)
An input/output error on the device is indicated by a system log entry similar to the following example:
block drbd1: Local IO failed in request_timer_fn. Detaching...
Aborting journal on device drbd1-8.
block drbd1: IO ERROR: neither local nor remote disk
Buffer I/O error on device drbd1, logical block 557056
lost page write due to I/O error on drbd1
JBD2: I/O error detected when updating journal superblock for drbd1-8.
request_module: runaway loop modprobe (Looping legacy kernel modprobe on older Linux versions)
This condition is indicated by a system log similar to the one shown below. Using an unstable or old Linux kernel (for example, 2.6.16-xenU) can cause an interminable loop condition at startup.
Linux version 2.6.16-xenU ([email protected]) (gcc version 4.0.1 20050727 (Red Hat 4.0.1-5)) #1 SMP Mon May 28 03:41:49 SAST 2007
BIOS-provided physical RAM map:
Xen: 0000000000000000 - 0000000026700000 (usable)
0MB HIGHMEM available.
request_module: runaway loop modprobe binfmt-464c
request_module: runaway loop modprobe binfmt-464c
request_module: runaway loop modprobe binfmt-464c
request_module: runaway loop modprobe binfmt-464c
request_module: runaway loop modprobe binfmt-464c
Suggested actions
"FATAL: kernel too old" and "fsck: No such file or directory while trying to open /dev" (Kernel and AMI mismatch)
This condition is indicated by a system log similar to the one shown below.
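Linux version 2.6.16.33-xenU ([email protected]) (gcc version 4.1.1 20070105 (Red Hat 4.1.1-52)) #2 SMP Wed Aug 15 17:27:36 SAST 2007
FATAL: kernel too old
Kernel panic - not syncing: Attempted to kill init!
Potential cause: Incompatible kernel and userland
Suggested actions
"FATAL: Could not load /lib/modules" or "BusyBox" (Missing kernel modules)
This condition is indicated by a system log similar to the one shown below.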
[ 0.370415] Freeing unused kernel memory: 1716k freed
Loading, please wait...
WARNING: Couldn't open directory /lib/modules/2.6.34-4-virtual: No such file or directory
FATAL: Could not open /lib/modules/2.6.34-4-virtual/modules.dep.temp for writing: No such file or directory
FATAL: Could not load /lib/modules/2.6.34-4-virtual/modules.dep: No such file or directory
Couldn't get a file descriptor referring to the console
Begin: Loading essential drivers... ...
FATAL: Could not load /lib/modules/2.6.34-4-virtual/modules.dep: No such file or directory
FATAL: Could not load /lib/modules/2.6.34-4-virtual/modules.dep: No such file or directory
Done.
Begin: Running /scripts/init-premount ...
Done.
Begin: Mounting root file system... ...
Begin: Running /scripts/local-top ...
Done.
Begin: Waiting for root file system... ...
Done.
Gave up waiting for root device. Common problems:
 - Boot args (cat /proc/cmdline)
   - Check rootdelay= (did the system wait long enough?)
   - Check root= (did the system wait for the right device?)
 - Missing modules (cat /proc/modules; ls /dev)
FATAL: Could not load /lib/modules/2.6.34-4-virtual/modules.dep: No such file or directory
FATAL: Could not load /lib/modules/2.6.34-4-virtual/modules.dep: No such file or directory
ALERT! /dev/sda1 does not exist. Dropping to a shell!
BusyBox v1.13.3 (Ubuntu 1:1.13.3-1ubuntu5) built-in shell (ash)
Enter 'help' for a list of built-in commands.
(initramfs)
One or more of the following conditions can cause this problem:
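ERROR Invalid kernel (EC2 incompatible kernel)
This condition is indicated by a system log similar to the one shown below.
ERROR Invalid kernel: elf_xen_note_check: ERROR: Will only load images built for the generic loader or Linux images
xc_dom_parse_image returned -1
Error 9: Unknown boot failure
Booting 'Fallback'
root (hd0)
Filesystem type is ext2fs, using whole disk
kernel /vmlinuz.old root=/dev/sda1 ro
Error 15: File not found
One or both of the following conditions can cause this problem: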
fsck: No such file or directory while trying to open... (File system not found)
This condition is indicated by a system log similar to the one shown below.
Welcome to Fedora
Press 'I' to enter interactive startup.
Setting clock : Wed Oct 26 05:52:05 EDT 2011 [ OK ]
Starting udev: [ OK ]
Setting hostname localhost: [ OK ]
No devices found
Setting up Logical Volume Management: File descriptor 7 left open
No volume groups found
[ OK ]
Checking filesystems
Checking all file systems.
[/sbin/fsck.ext3 (1) -- /] fsck.ext3 -a /dev/sda1
/dev/sda1: clean, 82081/1310720 files, 2141116/2621440 blocks
[/sbin/fsck.ext3 (1) -- /mnt/dbbackups] fsck.ext3 -a /dev/sdh
fsck.ext3: No such file or directory while trying to open /dev/sdh
/dev/sdh:
The superblock could not be read or does not describe a correct ext2 filesystem. If the device is valid and it really contains an ext2 filesystem (and not swap or ufs or something else), then the superblock is corrupt, and you might try running e2fsck with an alternate superblock:
e2fsck -b 8193 <device>
[FAILED]
*** An error occurred during the file system check.
*** Dropping you to a shell; the system will reboot
*** when you leave the shell.
Give root password for maintenance
(or type Control-D to continue):
General error mounting filesystems (failed mount)
This condition is indicated by a system log similar to the one shown below.
Loading ohci-hcd.ko module
Loading uhci-hcd.ko module
USB Universal Host Controller Interface driver v3.0
Loading mbcache.ko module
Loading jbd.ko module
Loading ext3.ko module
Creating root device.
Mounting root filesystem.
kjournald starting. Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
Setting up other filesystems.
Setting up new root fs
no fstab.sys, mounting internal defaults
Switching to new root and running init.
unmounting old /dev
unmounting old /proc
unmounting old /sys
mountall:/proc: unable to mount: Device or resource busy
mountall:/proc/self/mountinfo: No such file or directory
mountall: root filesystem isn't mounted
init: mountall main process (221) terminated with status 1
General error mounting filesystems.
A maintenance shell will now be started.
CONTROL-D will terminate this shell and re-try.
Press enter for maintenance
(or type Control-D to continue):
VFS: Unable to mount root fs on unknown-block (Root filesystem mismatch)
This condition is indicated by a system log similar to the one shown below.
Linux version 2.6.16-xenU ([email protected]) (gcc version 4.0.1 20050727 (Red Hat 4.0.1-5)) #1 SMP Mon May 28 03:41:49 SAST 2007
Kernel command line: root=/dev/sda1 ro 4
Registering block device major 8
Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(8,1)
Error: Unable to determine major/minor number of root device... (Root file system/device mismatch)
This condition is indicated by a system log similar to the one shown below.
XENBUS: Device with no driver: device/vif/0
XENBUS: Device with no driver: device/vbd/2048
drivers/rtc/hctosys.c: unable to open rtc device (rtc0)
Initializing network drop monitor service
Freeing unused kernel memory: 508k freed
:: Starting udevd...
done.
:: Running Hook [udev]
:: Triggering uevents...<30>udevd[65]: starting version 173
done.
Waiting 10 seconds for device /dev/xvda1 ...
Root device '/dev/xvda1' doesn't exist. Attempting to create it.
ERROR: Unable to determine major/minor number of root device '/dev/xvda1'.
You are being dropped to a recovery shell
Type 'exit' to try and continue booting
sh: can't access tty; job control turned off
[ramfs /]#
XENBUS: Device with no driver...
This condition is indicated by a system log similar to the one shown below.
XENBUS: Device with no driver: device/vbd/2048
drivers/rtc/hctosys.c: unable to open rtc device (rtc0)
Initializing network drop monitor service
Freeing unused kernel memory: 508k freed
:: Starting udevd...
done.
:: Running Hook [udev]
:: Triggering uevents...<30>udevd[65]: starting version 173
done.
Waiting 10 seconds for device /dev/xvda1 ...
Root device '/dev/xvda1' doesn't exist. Attempting to create it.
ERROR: Unable to determine major/minor number of root device '/dev/xvda1'.
You are being dropped to a recovery shell
Type 'exit' to try and continue booting
sh: can't access tty; job control turned off
[ramfs /]#
... days without being checked, check forced (File system check required)
This condition is indicated by a system log similar to the one shown below.
Checking filesystems
Checking all file systems.
[/sbin/fsck.ext3 (1) -- /] fsck.ext3 -a /dev/sda1
/dev/sda1 has gone 361 days without being checked, check forced
Potential cause: File system check time passed; a file system check is being forced.
Suggested actions
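fsck died with exit status... (Missing device)
This condition is indicated by a system log similar to the one shown below.
Activating lvm and md swap...done.
Checking file systems...fsck from util-linux-ng 2.16.2
/sbin/fsck.xfs: /dev/sdh does not exist
fsck died with exit status 8
[31mfailed (code 8).[39;49m
GRUB prompt (grubdom>)
This condition is indicated by a system log similar to the one shown below.
GNU GRUB version 0.97 (629760K lower / 0K upper memory)
[ Minimal BASH-like line editing is supported. For the first word, TAB lists possible command completions. Anywhere else TAB lists the possible completions of a device/filename. ]
grubdom>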
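Create a new AMI with a GRUB configuration file at the standard location (/boot/grub/menu.lst).
Pick the appropriate GRUB image (hd0 for the first drive, or hd00 for the first drive, first partition).
Verify that your version of GRUB supports the underlying file system type, and upgrade GRUB if necessary.
Terminate the instance and launch a new one using the AMI that you created.
Option 2: Terminate this instance and launch a new instance, specifying the correct kernel.
Note
To recover data from the existing instance, contact AWS Support.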
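Bringing up interface eth0: Device eth0 has different MAC address than expected, ignoring. (Hard-coded MAC address)
This condition is indicated by a system log similar to the one shown below.
Bringing up interface eth0: Device eth0 has different MAC address than expected, ignoring. [FAILED]
Starting auditd: [ OK ]
Potential cause: There is a hardcoded interface MAC in the AMI configuration
Suggested actions
Unable to load SELinux Policy. Machine is in enforcing mode. Halting now. (SELinux misconfiguration)
This condition is indicated by a system log similar to the one shown below.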
audit(1313445102.626:2): enforcing=1 old_enforcing=0 auid=4294967295
Unable to load SELinux Policy. Machine is in enforcing mode. Halting now.
Kernel panic - not syncing: Attempted to kill init!
Potential cause: SELinux has been enabled in error.
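Disable SELinux on the mounted root volume. This process varies across Linux distributions; for more information, consult your OS-specific documentation.
Note
On some systems, you disable SELinux by setting SELINUX=disabled in the /mount_point/etc/sysconfig/selinux file, where mount_point is the location where you mounted the volume on your recovery instance.
Unmount and detach the root volume from the recovery instance and reattach it to the original instance.
Start the instance.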
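The configuration change mentioned in the note can be sketched as a small text transformation. This is an illustration under assumptions, not a distribution-specific procedure: it assumes the file uses a plain `SELINUX=value` line, as described in the note, and the function name `disable_selinux` is hypothetical. Back up the original file before writing any change to a real volume.

```python
import re

# Matches an uncommented "SELINUX=..." line; "SELINUXTYPE=..." is left alone
# because the "=" must follow "SELINUX" directly.
_SELINUX_LINE = re.compile(r"^[ \t]*SELINUX=.*$", re.MULTILINE)


def disable_selinux(config_text: str) -> str:
    """Return the config text with the SELINUX setting forced to 'disabled'."""
    if _SELINUX_LINE.search(config_text):
        return _SELINUX_LINE.sub("SELINUX=disabled", config_text)
    # No existing setting: append one, keeping the file newline-terminated.
    separator = "" if (not config_text or config_text.endswith("\n")) else "\n"
    return config_text + separator + "SELINUX=disabled\n"


sample = "# comment\nSELINUX=enforcing\nSELINUXTYPE=targeted\n"
print(disable_selinux(sample))

# Example usage on a recovery instance (left commented out; replace
# /mount_point with the location where the root volume is mounted):
# from pathlib import Path
# path = Path("/mount_point/etc/sysconfig/selinux")
# path.write_text(disable_selinux(path.read_text()))
```

Working on the text rather than the file directly makes the change easy to review before it is written to the mounted volume.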
XENBUS: Timeout connecting to devices (Xenbus timeout)
This condition is indicated by a system log similar to the one shown below.
Linux version 2.6.16-xenU ([email protected]) (gcc version 4.0.1 20050727 (Red Hat 4.0.1-5)) #1 SMP Mon May 28 03:41:49 SAST 2007
XENBUS: Timeout connecting to devices!
Kernel panic - not syncing: No init found. Try passing init= option to kernel.