Alexandros Papadopoulos apapadop at
Tue Jan 4 11:39:31 EET 2005

On Tuesday 04 January 2005 11:06, Antonis Valakas wrote:
> >I restored it, and it is identical to the binary I brought over from
> >the other machine. That is, it has a different MD5 from the binary
> >that was in its place (and was segfaulting), but the same byte count
> >and strings...
> >helios:~# ls -l /usr/bin/perl*
> >-rwxr-xr-x  1 root root 1057324 Jan  3 19:07 /usr/bin/perl
> >-rwxr-xr-x  2 root root 1057324 Oct 24 19:37 /usr/bin/perl5.8.4
> >-rwxr-xr-x  2 root root 1057324 Oct 24 19:37 /usr/bin/perl_SUSPECT
> ><snip>
> >
> >-A
> Try:
> # cmp -l <suspect binary> <vanilla binary>
> and see whether some contiguous block of bytes differs. It may be
> filesystem corruption or disk failure (ATA disks do not have data
> checksumming the way SCSI disks do).
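
For anyone who wants the "contiguous block" check spelled out, here is a minimal Python sketch of the same idea. It is not what cmp does internally: it reads both files into memory, compares only their common length, and groups differing offsets into runs. The function name and the two throwaway demo files are made up for illustration.

```python
import os
import tempfile

def differing_runs(path_a, path_b):
    """Return [start, end] ranges (1-based, inclusive, numbered like cmp)
    covering each contiguous run of differing bytes."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        data_a, data_b = fa.read(), fb.read()
    runs = []
    for index, (a, b) in enumerate(zip(data_a, data_b), start=1):
        if a == b:
            continue
        if runs and runs[-1][1] == index - 1:
            runs[-1][1] = index          # extend the current run
        else:
            runs.append([index, index])  # start a new run
    return runs

# Demo on two throwaway files: one flipped bit and one zeroed 4-byte block.
original = bytes(range(256)) * 4
damaged = bytearray(original)
damaged[10] ^= 0x20
damaged[500:504] = b"\x00\x00\x00\x00"

with tempfile.TemporaryDirectory() as tmp:
    path_a = os.path.join(tmp, "vanilla")
    path_b = os.path.join(tmp, "suspect")
    with open(path_a, "wb") as f:
        f.write(original)
    with open(path_b, "wb") as f:
        f.write(damaged)
    runs = differing_runs(path_a, path_b)

print(runs)
```

A single one-byte run would point towards an isolated byte error, while long runs would look more like a damaged block.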


helios:~# cmp -l /usr/bin/perl /usr/bin/perl_SUSPECT
1055193 377 337
helios:~# cmp -l /usr/bin/find /usr/bin/find_SUSPECT
49561 377 337
helios:~# cmp -l /bin/tar /bin/tar_SUSPECT
163993 377 337

What does this tell us?
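
One way to read the numbers: cmp -l prints the 1-based byte offset in decimal followed by the two differing byte values in octal. The small Python sketch below (helper names are made up for illustration) decodes the three lines above and XORs the two values to see which bits differ.

```python
# Output of `cmp -l` copied from the three runs above:
# 1-based byte offset (decimal), then the byte in each file (octal).
CMP_OUTPUT = """\
1055193 377 337
49561 377 337
163993 377 337
"""

def decode_cmp_line(line):
    """Return (offset, first_byte, second_byte, xor_mask) for one cmp -l line."""
    offset_text, first_text, second_text = line.split()
    first = int(first_text, 8)
    second = int(second_text, 8)
    return int(offset_text), first, second, first ^ second

def flipped_bits(mask):
    """List the bit positions (0 = least significant) that are set in mask."""
    return [bit for bit in range(8) if mask & (1 << bit)]

results = [decode_cmp_line(line) for line in CMP_OUTPUT.splitlines()]
for offset, first, second, mask in results:
    print(f"offset {offset}: {first:#04x} vs {second:#04x}, "
          f"xor {mask:#04x}, differing bits {flipped_bits(mask)}")
```

In all three files the only difference is one byte, 0xff in the restored copy versus 0xdf in the suspect one, i.e. a single cleared bit (0x20). That looks more like an isolated bit error than a corrupted block, although the output alone does not say where the bit was lost.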

The filesystem is ext3 on top of software RAID-1, which in turn sits on
two identical Serial ATA disks. I ran fsck.ext3 -f -V on the filesystem
and it reported no problems. I also checked the disks with
smartctl -t long; although they report some (non-critical) errors, the
overall health is reported as OK. I am attaching the output of
smartctl -a /dev/hd[e|g] in case anyone knows to what extent what I am
seeing is explained by the errors reported at the level of


helios:~# smartctl -a /dev/hde
smartctl version 5.32 Copyright (C) 2002-4 Bruce Allen
Home page is

Device Model:     Maxtor 6Y080M0
Serial Number:    Y3MEEM2E
Firmware Version: YAR51EW0
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 0
Local Time is:    Tue Jan  4 11:35:06 2005 EET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 182) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        No General Purpose Logging support.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  40) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  3 Spin_Up_Time            0x0027   194   193   063    Pre-fail  Always       -       12908
  4 Start_Stop_Count        0x0032   253   253   000    Old_age   Always       -       103
  5 Reallocated_Sector_Ct   0x0033   253   253   063    Pre-fail  Always       -       0
  6 Read_Channel_Margin     0x0001   253   253   100    Pre-fail  Offline      -       0
  7 Seek_Error_Rate         0x000a   253   252   000    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0027   251   244   187    Pre-fail  Always       -       58202
  9 Power_On_Minutes        0x0032   251   251   000    Old_age   Always       -       809h+18m
 10 Spin_Retry_Count        0x002b   253   252   157    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x002b   253   252   223    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   253   253   000    Old_age   Always       -       128
192 Power-Off_Retract_Count 0x0032   253   253   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   253   253   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0032   253   253   000    Old_age   Always       -       30
195 Hardware_ECC_Recovered  0x000a   253   252   000    Old_age   Always       -       5001
196 Reallocated_Event_Count 0x0008   253   253   000    Old_age   Offline      -       0
197 Current_Pending_Sector  0x0008   253   253   000    Old_age   Offline      -       0
198 Offline_Uncorrectable   0x0008   253   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0008   199   197   000    Old_age   Offline      -       2
200 Multi_Zone_Error_Rate   0x000a   253   252   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   253   252   000    Old_age   Always       -       21
202 TA_Increase_Count       0x000a   253   252   000    Old_age   Always       -       0
203 Run_Out_Cancel          0x000b   253   252   180    Pre-fail  Always       -       0
204 Shock_Count_Write_Opern 0x000a   253   252   000    Old_age   Always       -       0
205 Shock_Rate_Write_Opern  0x000a   253   252   000    Old_age   Always       -       0
207 Spin_High_Current       0x002a   253   252   000    Old_age   Always       -       0
208 Spin_Buzz               0x002a   253   252   000    Old_age   Always       -       0
209 Offline_Seek_Performnce 0x0024   202   196   000    Old_age   Offline      -       0
 99 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
100 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
101 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0

SMART Error Log Version: 1
ATA Error Count: 1
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 1 occurred at disk power-on lifetime: 164 hours (6 days + 20 hours)
  When the command that caused the error occurred, the device was in an unknown state.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 40 40 a4 b4 e6  Error: ICRC, ABRT at LBA = 0x06b4a440 = 112501824

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ca 03 80 00 a4 b4 e6 00      00:09:00.928  WRITE DMA
  ca 03 80 80 a3 b4 e6 00      00:09:00.912  WRITE DMA
  ca 03 80 00 a3 b4 e6 00      00:09:00.912  WRITE DMA
  ca 03 80 80 a2 b4 e6 00      00:09:00.912  WRITE DMA
  ca 03 80 00 a2 b4 e6 00      00:09:00.912  WRITE DMA

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%       741         -
# 2  Extended offline    Interrupted (host reset)      10%       740         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
helios:~# smartctl -a /dev/hdg
smartctl version 5.32 Copyright (C) 2002-4 Bruce Allen
Home page is

Device Model:     Maxtor 6Y080M0
Serial Number:    Y3MEEKQE
Firmware Version: YAR51EW0
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 0
Local Time is:    Tue Jan  4 11:38:14 2005 EET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 182) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        No General Purpose Logging support.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  40) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  3 Spin_Up_Time            0x0027   203   202   063    Pre-fail  Always       -       10754
  4 Start_Stop_Count        0x0032   253   253   000    Old_age   Always       -       102
  5 Reallocated_Sector_Ct   0x0033   253   253   063    Pre-fail  Always       -       0
  6 Read_Channel_Margin     0x0001   253   253   100    Pre-fail  Offline      -       0
  7 Seek_Error_Rate         0x000a   253   252   000    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0027   250   245   187    Pre-fail  Always       -       43817
  9 Power_On_Minutes        0x0032   251   251   000    Old_age   Always       -       816h+29m
 10 Spin_Retry_Count        0x002b   253   252   157    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x002b   253   252   223    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   253   253   000    Old_age   Always       -       131
192 Power-Off_Retract_Count 0x0032   253   253   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   253   253   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0032   253   253   000    Old_age   Always       -       46
195 Hardware_ECC_Recovered  0x000a   253   252   000    Old_age   Always       -       10349
196 Reallocated_Event_Count 0x0008   253   253   000    Old_age   Offline      -       0
197 Current_Pending_Sector  0x0008   253   253   000    Old_age   Offline      -       0
198 Offline_Uncorrectable   0x0008   253   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0008   197   189   000    Old_age   Offline      -       10
200 Multi_Zone_Error_Rate   0x000a   253   252   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   253   252   000    Old_age   Always       -       49
202 TA_Increase_Count       0x000a   253   252   000    Old_age   Always       -       0
203 Run_Out_Cancel          0x000b   253   252   180    Pre-fail  Always       -       1
204 Shock_Count_Write_Opern 0x000a   253   252   000    Old_age   Always       -       0
205 Shock_Rate_Write_Opern  0x000a   253   252   000    Old_age   Always       -       0
207 Spin_High_Current       0x002a   253   252   000    Old_age   Always       -       0
208 Spin_Buzz               0x002a   253   252   000    Old_age   Always       -       0
209 Offline_Seek_Performnce 0x0024   201   199   000    Old_age   Offline      -       0
 99 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
100 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
101 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0

SMART Error Log Version: 1
Warning: ATA error count 9 inconsistent with error log pointer 5

ATA Error Count: 9 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 9 occurred at disk power-on lifetime: 747 hours (31 days + 3 hours)
  When the command that caused the error occurred, the device was in an unknown state.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 c7 b9 8a e0  Error: ICRC, ABRT at LBA = 0x008ab9c7 = 9091527

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 45 c7 b9 8a e0 00      05:09:27.296  READ DMA
  ef 03 45 c7 b9 8a e0 00      05:08:01.312  SET FEATURES [Set transfer mode]
  ef 03 45 c7 b9 8a e0 00      05:08:01.312  SET FEATURES [Set transfer mode]
  c8 00 08 c0 b9 8a e9 00      05:09:02.112  READ DMA
  c8 00 01 ff b3 8a e9 00      05:09:02.080  READ DMA

Error 8 occurred at disk power-on lifetime: 596 hours (24 days + 20 hours)
  When the command that caused the error occurred, the device was in an unknown state.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 ff b5 8a e0  Error: ICRC, ABRT at LBA = 0x008ab5ff = 9090559

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 45 ff b5 8a e0 00      00:08:17.456  READ DMA
  ef 03 45 ff b5 8a e0 00      00:06:40.960  SET FEATURES [Set transfer mode]
  ef 03 45 ff b5 8a e0 00      00:06:40.960  SET FEATURES [Set transfer mode]
  c8 00 08 f8 b5 8a e9 00      00:06:36.224  READ DMA
  c8 00 08 c0 b9 8a e9 00      00:06:36.208  READ DMA

Error 7 occurred at disk power-on lifetime: 489 hours (20 days + 9 hours)
  When the command that caused the error occurred, the device was in an unknown state.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 00 a6 65 e1  Error: ICRC, ABRT at LBA = 0x0165a600 = 23438848

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 da 80 00 a6 65 e1 00      00:12:12.864  READ DMA
  c8 da 80 80 a5 65 e1 00      00:12:12.864  READ DMA
  c8 da 80 00 a5 65 e1 00      00:12:12.864  READ DMA
  c8 da 80 80 a4 65 e1 00      00:12:12.864  READ DMA
  c8 da 80 00 a4 65 e1 00      00:12:12.864  READ DMA

Error 6 occurred at disk power-on lifetime: 489 hours (20 days + 9 hours)
  When the command that caused the error occurred, the device was in an unknown state.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 00 4b 55 e1  Error: ICRC, ABRT at LBA = 0x01554b00 = 22366976

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 da 80 00 4b 55 e1 00      00:11:56.400  READ DMA
  c8 da 80 80 4a 55 e1 00      00:11:56.400  READ DMA
  c8 da 80 00 4a 55 e1 00      00:11:56.400  READ DMA
  c8 da 80 80 49 55 e1 00      00:11:56.400  READ DMA
  c8 da 80 00 49 55 e1 00      00:11:56.400  READ DMA

Error 5 occurred at disk power-on lifetime: 489 hours (20 days + 9 hours)
  When the command that caused the error occurred, the device was in an unknown state.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 00 7e c0 e0  Error: ICRC, ABRT at LBA = 0x00c07e00 = 12615168

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 da 80 00 7e c0 e0 00      00:09:02.704  READ DMA
  c8 da 80 80 7d c0 e0 00      00:09:02.704  READ DMA
  c8 da 80 00 7d c0 e0 00      00:09:02.704  READ DMA
  c8 da 80 80 7c c0 e0 00      00:09:02.704  READ DMA
  c8 da 80 00 7c c0 e0 00      00:09:02.688  READ DMA

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%       748         -
# 2  Extended offline    Interrupted (host reset)      10%       747         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
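
To make the error-log entries easier to read, here is a small Python sketch that rebuilds the LBA smartctl prints from the register dump. It assumes 28-bit LBA addressing (low nibble of DH, then CH, CL, SN) and the register order ER ST SC SN CL CH DH. The two error-register bit names are taken only from what the log itself prints for 0x84. The function and constant names are made up for illustration.

```python
def lba28_from_registers(sn, cl, ch, dh):
    """Assemble a 28-bit LBA: low nibble of DH is bits 27-24, then CH, CL, SN."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn

# Only the two error-register bits that appear in the logs above (assumption:
# 0x80 = ICRC, 0x04 = ABRT, which matches smartctl's decoding of 0x84).
ERROR_BITS = {0x80: "ICRC", 0x04: "ABRT"}

def decode_error_line(line):
    """Decode the first seven hex fields of a register line: ER ST SC SN CL CH DH."""
    er, st, sc, sn, cl, ch, dh = (int(field, 16) for field in line.split()[:7])
    flags = [name for bit, name in ERROR_BITS.items() if er & bit]
    return flags, lba28_from_registers(sn, cl, ch, dh)

# Register lines copied from the hde error and from errors 9 and 7 on hdg.
REGISTER_LINES = [
    "84 51 40 40 a4 b4 e6",
    "84 51 00 c7 b9 8a e0",
    "84 51 00 00 a6 65 e1",
]

decoded = [decode_error_line(line) for line in REGISTER_LINES]
for line, (flags, lba) in zip(REGISTER_LINES, decoded):
    print(f"{line}  ->  {', '.join(flags)} at LBA = {lba:#010x} = {lba}")
```

The decoded values match the LBAs smartctl reports (112501824, 9091527 and 23438848). Every logged error on both disks carries the ICRC flag, which goes together with the non-zero UDMA_CRC_Error_Count values in the attribute tables.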


