Hi, for the past few days I have been getting an email every day. Apparently one of my hard drives has an "Offline uncorrectable sector". I have already tried several things, but if the results are to be believed, everything is fine?
Email:
This message was generated by the smartd daemon running on:

   host name:  Ubuntu
   DNS domain: [Empty]

The following warning/error was logged by the smartd daemon:

Device: /dev/sdd [SAT], 1 Offline uncorrectable sectors

Device info:
WDC WD60EFRX-68MYMN1, S/N:WD-WX21D1526AR6, WWN:5-0014ee-20be1cec8, FW:82.00A82, 6.00 TB

For details see host's SYSLOG.

You can also use the smartctl utility for further investigation.
The original message about this issue was sent at Wed Dec 26 00:26:29 2018 CET
Another message will be sent in 24 hours if the problem persists.
Here are the SMART data for the drive:
dirk@Ubuntu:~$ sudo smartctl -A /dev/sdd
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.15.0-43-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   203   196   021    Pre-fail  Always       -       8825
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       76
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   058   058   000    Old_age   Always       -       30720
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       75
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       50
193 Load_Cycle_Count        0x0032   190   190   000    Old_age   Always       -       30294
194 Temperature_Celsius     0x0022   120   109   000    Old_age   Always       -       32
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       1
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
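To make the sector-related counters easier to compare between runs, the attribute table can be parsed with a few lines of Python. This is only a sketch: the `SAMPLE` text is copied from the output above, while the function names `parse_attributes` and `nonzero_sector_counters` and the choice of watched IDs (5, 196, 197, 198) are assumptions made for illustration.

```python
import re

# Three rows copied from the smartctl -A output above.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       1
"""

# ID, name, flag, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, leading digits of RAW_VALUE
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]{4}\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+(\S+)\s+(\S+)\s+(\S+)\s+(\d+)"
)


def parse_attributes(text):
    """Return {attribute_id: (name, raw_value)} for every attribute row found."""
    attrs = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            attrs[int(match.group(1))] = (match.group(2), int(match.group(9)))
    return attrs


def nonzero_sector_counters(attrs, watched=(5, 196, 197, 198)):
    """Return the watched sector counters whose raw value is not zero."""
    return {attrs[i][0]: attrs[i][1] for i in watched if i in attrs and attrs[i][1] != 0}


if __name__ == "__main__":
    print(nonzero_sector_counters(parse_attributes(SAMPLE)))
```

For the table above, only attribute 198 has a non-zero raw value; reallocated and pending sectors are both 0.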
The self-test apparently came back fine, though:
SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 3  Extended offline    Completed without error       00%     30627         -
# 4  Short offline       Completed without error       00%     30614         -
# 5  Conveyance offline  Completed without error       00%     30614         -
The complete error log for the problem:
Complete error log:

SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
Device Error Count: 2
    CR     = Command Register
    FEATR  = Features Register
    COUNT  = Count (was: Sector Count) Register
    LBA_48 = Upper bytes of LBA High/Mid/Low Registers ]  ATA-8
    LH     = LBA High (was: Cylinder High) Register    ]   LBA
    LM     = LBA Mid (was: Cylinder Low) Register      ] Register
    LL     = LBA Low (was: Sector Number) Register     ]
    DV     = Device (was: Device/Head) Register
    DC     = Device Control Register
    ER     = Error register
    ST     = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 2 [1] occurred at disk power-on lifetime: 30562 hours (1273 days + 10 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 02 7b 4d a8 40 00  Error: UNC at LBA = 0x027b4da8 = 41635240

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 08 00 b8 00 00 02 7b 4e 98 40 08  6d+09:30:55.484  READ FPDMA QUEUED
  60 00 08 00 c0 00 00 02 7b 4e 90 40 08  6d+09:30:55.484  READ FPDMA QUEUED
  60 00 08 00 88 00 00 02 7b 4e 88 40 08  6d+09:30:55.482  READ FPDMA QUEUED
  60 00 08 00 90 00 00 02 7b 4e 80 40 08  6d+09:30:55.478  READ FPDMA QUEUED
  60 00 08 00 c8 00 00 02 7b 4e 78 40 08  6d+09:30:55.475  READ FPDMA QUEUED

Error 1 [0] occurred at disk power-on lifetime: 30562 hours (1273 days + 10 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 02 7b 4d a8 40 00  Error: UNC at LBA = 0x027b4da8 = 41635240

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 04 00 00 30 00 00 02 14 12 30 40 08  6d+09:30:27.707  READ FPDMA QUEUED
  60 04 00 00 28 00 00 02 14 0e 30 40 08  6d+09:30:27.707  READ FPDMA QUEUED
  60 04 00 00 20 00 00 02 14 0a 30 40 08  6d+09:30:27.707  READ FPDMA QUEUED
  60 04 00 00 18 00 00 02 14 06 30 40 08  6d+09:30:27.707  READ FPDMA QUEUED
  60 04 00 00 10 00 00 02 14 02 30 40 08  6d+09:30:27.707  READ FPDMA QUEUED
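Both log entries point at the same sector. As a sanity check on the number passed to other tools, the register bytes can be converted by hand. The sketch below assumes 512-byte logical sectors (the size diskscan reports for /dev/sdd) and the 4096-byte block size used with `badblocks -b 4096`; the helper name `lba_from_registers` is made up for illustration.

```python
SECTOR_SIZE = 512        # logical sector size reported for /dev/sdd (assumption for the math)
BADBLOCKS_BLOCK = 4096   # block size passed to badblocks via -b


def lba_from_registers(hex_bytes):
    """Join the LBA register bytes (most significant first) into one integer."""
    return int("".join(hex_bytes), 16)


# LBA_48 = 00 00 02, LH = 7b, LM = 4d, LL = a8, taken from the error log above.
lba = lba_from_registers(["00", "00", "02", "7b", "4d", "a8"])

# Byte offset of the sector on the device.
byte_offset = lba * SECTOR_SIZE

# Index of the 4096-byte block that contains the sector, and the sector's position in it.
block_index, sector_in_block = divmod(lba, BADBLOCKS_BLOCK // SECTOR_SIZE)

if __name__ == "__main__":
    print(lba, byte_offset, block_index, sector_in_block)
```

The result matches the decimal value smartctl prints (41635240). With a 4096-byte block size, that sector is the first 512-byte sector of block 5204405.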
I removed the disk from the RAID array and tried to rewrite and then read back the sector named in the error log.
hdparm --repair-sector 41635240 --yes-i-know-what-i-am-doing /dev/sdd
hdparm --read-sector 41635240 /dev/sdd
I don't have the original output at hand right now, but both commands ran without errors...
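For a repeatable read check of a single sector, something like the following Python sketch could be used. It is not equivalent to `hdparm --read-sector`: hdparm issues a low-level read that bypasses the block layer, whereas this plain read may be answered from the kernel page cache. The function name `read_sector` and the 512-byte default are assumptions for illustration, and reading a raw device such as /dev/sdd needs root.

```python
import os


def read_sector(path, lba, sector_size=512):
    """Read one sector at the given LBA; raise OSError on a failed or short read."""
    fd = os.open(path, os.O_RDONLY)
    try:
        data = os.pread(fd, sector_size, lba * sector_size)
    finally:
        os.close(fd)
    if len(data) != sector_size:
        raise OSError(f"short read at LBA {lba}: got {len(data)} bytes")
    return data


if __name__ == "__main__":
    # Hypothetical invocation for the sector from the error log (run as root).
    sector = read_sector("/dev/sdd", 41635240)
    print(f"read {len(sector)} bytes without an I/O error")
```

An unreadable sector would normally surface here as an `OSError` (EIO) raised by `os.pread`.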
So I downloaded the tool diskscan:
root@Ubuntu:/home/dirk# diskscan -f /dev/sdd
diskscan version 0.19

I: Validating path /dev/sdd
I: Disk start temperature is 36
I: Opened disk /dev/sdd sector size 512 num bytes 6001175125504
I: Scanning disk /dev/sdd in 65536 byte steps
I: Scan started at: Sat Dec 29 21:13:10 2018

Disk scan |=== | ETA: 8h56m20s
I: Disk temperature changed from 36 to 37
Disk scan |==================== | ETA: 8h03m59s
I: Disk temperature changed from 37 to 36
Disk scan |========================== | ETA: 7h42m20s
I: Disk temperature changed from 36 to 35
Disk scan |============================ | ETA: 7h34m43s
I: Disk temperature changed from 35 to 34
Disk scan |==============================================================================================================================================| ETA: 9h40m50s

Access time histogram:
    Value  Percentile  TotalCount  1/(1-Percentile)

    0.156    0.000000           5              1.00
    0.333    0.100000     9242599              1.11
    0.335    0.200000    30323990              1.25
    0.335    0.300000    30323990              1.43
    0.336    0.400000    37694983              1.67
    0.352    0.500000    45815552              2.00
    0.375    0.550000    52790374              2.22
    0.376    0.600000    58683443              2.50
    0.377    0.650000    69922809              2.86
    0.377    0.700000    69922809              3.33
    0.377    0.750000    69922809              4.00
    0.378    0.775000    79500396              4.44
    0.378    0.800000    79500396              5.00
    0.378    0.825000    79500396              5.71
    0.378    0.850000    79500396              6.67
    0.379    0.875000    82437640              8.00
    0.379    0.887500    82437640              8.89
    0.379    0.900000    82437640             10.00
    0.381    0.912500    83891044             11.43
    0.389    0.925000    84884230             13.33
    0.410    0.937500    85855476             16.00
    0.418    0.943750    86724571             17.78
    0.419    0.950000    87441973             20.00
    0.420    0.956250    87830901             22.86
    0.429    0.962500    88141437             26.67
    1.007    0.968750    88726233             32.00
    1.024    0.971875    88997101             35.56
    1.047    0.975000    89357808             40.00
    1.049    0.978125    89899368             45.71
    1.049    0.981250    89899368             53.33
    1.050    0.984375    90321461             64.00
    1.050    0.985938    90321461             71.11
    1.051    0.987500    90541952             80.00
    1.052    0.989062    90641575             91.43
    1.054    0.990625    90720605            106.67
    1.068    0.992188    90860809            128.00
    1.087    0.992969    90934465            142.22
    1.090    0.993750    91075697            160.00
    1.090    0.994531    91075697            182.86
    1.091    0.995313    91217446            213.33
    1.091    0.996094    91217446            256.00
    1.092    0.996484    91331115            284.44
    1.092    0.996875    91331115            320.00
    1.092    0.997266    91331115            365.71
    1.093    0.997656    91387482            426.67
    1.094    0.998047    91420041            512.00
    1.094    0.998242    91420041            568.89
    1.095    0.998437    91438315            640.00
    1.096    0.998633    91447022            731.43
    1.102    0.998828    91464190            853.33
    1.108    0.999023    91482614           1024.00
    1.113    0.999121    91490674           1137.78
    1.130    0.999219    91499421           1280.00
    1.134    0.999316    91508663           1462.86
    1.173    0.999414    91517107           1706.67
    1.421    0.999512    91526003           2048.00
    2.359    0.999561    91530470           2275.56
    2.855    0.999609    91535727           2560.00
    2.857    0.999658    91540068           2925.71
    2.873    0.999707    91543978           3413.33
    2.897    0.999756    91549311           4096.00
    2.899    0.999780    91553571           4551.11
    2.899    0.999805    91553571           5120.00
    2.901    0.999829    91555336           5851.43
    2.913    0.999854    91557442           6826.67
    2.943    0.999878    91559666           8192.00
    3.011    0.999890    91560664           9102.22
    3.135    0.999902    91561780          10240.00
    3.709    0.999915    91562887          11702.86
    4.535    0.999927    91564190          13653.33
    4.539    0.999939    91565185          16384.00
    4.555    0.999945    91565705          18204.44
    4.583    0.999951    91566388          20480.00
    5.003    0.999957    91566801          23405.71
    5.599    0.999963    91567357          27306.67
    7.835    0.999969    91567916          32768.00
    9.135    0.999973    91568196          36408.89
   10.743    0.999976    91568475          40960.00
   11.311    0.999979    91568755          46811.43
   12.487    0.999982    91569038          54613.33
   14.055    0.999985    91569314          65536.00
   15.727    0.999986    91569454          72817.78
   17.247    0.999988    91569595          81920.00
   18.927    0.999989    91569733          93622.86
   20.447    0.999991    91569872         109226.67
   21.295    0.999992    91570021         131072.00
   21.903    0.999993    91570083         145635.56
   23.439    0.999994    91570153         163840.00
   25.535    0.999995    91570221         187245.71
   27.839    0.999995    91570291         218453.33
   31.679    0.999996    91570362         262144.00
   32.351    0.999997    91570406         291271.11
   32.575    0.999997    91570431         327680.00
   35.231    0.999997    91570466         374491.43
   38.911    0.999998    91570501         436906.67
   42.719    0.999998    91570537         524288.00
   42.783    0.999998    91570563         582542.22
   42.815    0.999998    91570573         655360.00
   44.959    0.999999    91570588         748982.86
   53.151    0.999999    91570608         873813.33
   60.639    0.999999    91570623        1048576.00
   63.903    0.999999    91570632        1165084.44
   65.311    0.999999    91570641        1310720.00
   66.687    0.999999    91570650        1497965.71
   67.839    0.999999    91570658        1747626.67
   70.527    1.000000    91570667        2097152.00
   71.615    1.000000    91570671        2330168.89
   71.999    1.000000    91570676        2621440.00
   72.639    1.000000    91570680        2995931.43
   74.687    1.000000    91570684        3495253.33
   78.399    1.000000    91570689        4194304.00
   82.175    1.000000    91570691        4660337.78
   83.007    1.000000    91570693        5242880.00
   88.639    1.000000    91570695        5991862.86
   94.207    1.000000    91570698        6990506.67
  101.055    1.000000    91570700        8388608.00
  103.935    1.000000    91570701        9320675.55
  104.319    1.000000    91570702       10485760.00
  104.383    1.000000    91570703       11983725.71
  106.559    1.000000    91570704       13981013.34
  113.279    1.000000    91570705       16777216.00
  113.599    1.000000    91570706       18641351.10
  113.599    1.000000    91570706       20971519.98
  114.879    1.000000    91570707       23967451.45
  114.879    1.000000    91570707       27962026.68
  121.023    1.000000    91570708       33554432.00
  121.023    1.000000    91570708       37282702.28
  121.023    1.000000    91570708       41943039.96
  139.519    1.000000    91570709       47934902.91
  139.519    1.000000    91570709       55924053.19
  139.519    1.000000    91570709       67108864.00
  139.519    1.000000    91570709       74565404.57
  139.519    1.000000    91570709       83886080.31
  299.007    1.000000    91570710       95869805.31
  299.007    1.000000    91570710               inf
#[Mean    =        0.380, StdDeviation   =        0.194]
#[Max     =      299.007, Total count    =     91570710]
#[Buckets =           16, SubBuckets     =         2048]

Latency graph:
[ASCII latency plot; y-axis ticks at 55, 110, 165, 220, 275 and 330; layout lost in extraction]

Conclusion: passed

I: Scan ended at: Sun Dec 30 06:54:00 2018
I: Scan took 34850 second
I: Closed disk /dev/sdd
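The timestamps and the disk size in the diskscan log are enough to cross-check the reported scan time and to estimate the average read rate. A small worked calculation, using only numbers from the log above (the variable names are illustrative):

```python
from datetime import datetime

DISK_BYTES = 6001175125504                    # "num bytes" from the diskscan log
start = datetime(2018, 12, 29, 21, 13, 10)    # "Scan started at"
end = datetime(2018, 12, 30, 6, 54, 0)        # "Scan ended at"

# Elapsed wall-clock time in seconds, then split into h/m/s.
elapsed = int((end - start).total_seconds())
hours, remainder = divmod(elapsed, 3600)
minutes, seconds = divmod(remainder, 60)

# Average sequential read rate over the whole disk, in decimal megabytes per second.
throughput_mb_s = DISK_BYTES / elapsed / 1e6

if __name__ == "__main__":
    print(f"{elapsed} s = {hours}h{minutes:02d}m{seconds:02d}s, about {throughput_mb_s:.1f} MB/s")
```

The elapsed time agrees with the "Scan took 34850 second" line, and the average works out to roughly 172 MB/s across the full surface.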
This test passed as well. So I made another attempt with the self-test:
smartctl -t long /dev/sdd

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   203   196   021    Pre-fail  Always       -       8825
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       76
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   058   058   000    Old_age   Always       -       30720
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       75
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       50
193 Load_Cycle_Count        0x0032   190   190   000    Old_age   Always       -       30294
194 Temperature_Celsius     0x0022   120   109   000    Old_age   Always       -       32
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       1
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     30707         -
# 2  Short offline       Completed without error       00%     30672         -
# 3  Extended offline    Completed without error       00%     30627         -
# 4  Short offline       Completed without error       00%     30614         -
# 5  Conveyance offline  Completed without error       00%     30614         -
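To put the self-test log on the same timeline as the error log, the power-on hours can be compared directly. The sketch below uses only values from the outputs in this post; the tuple layout and variable names are illustrative assumptions.

```python
ERROR_HOURS = 30562      # power-on hours of both UNC errors in the error log
CURRENT_HOURS = 30720    # raw value of attribute 9 (Power_On_Hours)

# (number, description, LifeTime(hours)) rows from the self-test log above.
SELF_TESTS = [
    (1, "Extended offline", 30707),
    (2, "Short offline", 30672),
    (3, "Extended offline", 30627),
    (4, "Short offline", 30614),
    (5, "Conveyance offline", 30614),
]

# Self-tests that started after the error was logged.
tests_after_error = [t for t in SELF_TESTS if t[2] > ERROR_HOURS]

# Extended tests among them, since these are expected to scan the whole surface.
extended_after_error = [t for t in tests_after_error if t[1].startswith("Extended")]

hours_since_error = CURRENT_HOURS - ERROR_HOURS

if __name__ == "__main__":
    print(len(tests_after_error), len(extended_after_error), hours_since_error)
```

All five logged self-tests, including both extended ones, ran after the error at 30562 hours, and each reports "Completed without error", yet attribute 198 still shows a raw value of 1.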
I currently have a badblocks run in progress. So far everything looks fine there too; at least there has been no error in the last 13 hours.
badblocks -svw -b 4096 /dev/sdd > /home/dirk/badsectors.txt
I am slowly running out of ideas. What else can I try? According to smartctl, the "Offline uncorrectable sector" is still there.