ubuntuusers.de

Hard disk defective?

Status: Unsolved | Ubuntu version: Server 12.04 (Precise Pangolin)
MeistaJack

Joined:
24 December 2010

Posts: 26

Hello dear Ubuntu community,

my server no longer boots properly: it hangs during boot and logs various I/O errors (end_request). I checked the hard disk with SMART, but since I do not have much experience yet I cannot say much about the error log. Even after reading up on SMART I find it hard to interpret the log (at least the part where the individual errors are recorded):

smartctl 5.41 2011-06-09 r3365 [i686-linux-3.0.0-26-generic] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     SAMSUNG SpinPoint F4 EG (AFT)
Device Model:     SAMSUNG HD204UI
Serial Number:    S2H7JD6ZB02178
LU WWN Device Id: 5 0024e9 00459c451
Firmware Version: 1AQ10001
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 6
Local Time is:    Sun Oct 14 12:16:03 2012 CEST

==> WARNING: Using smartmontools or hdparm with this
drive may result in data loss due to a firmware bug.
****** THIS DRIVE MAY OR MAY NOT BE AFFECTED! ******
Buggy and fixed firmware report same version number!
See the following web pages for details:
http://www.samsung.com/global/business/hdd/faqView.do?b2b_bbs_msg_id=386
http://sourceforge.net/apps/trac/smartmontools/wiki/SamsungF4EGBadBlocks

SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(19620) seconds.
Offline data collection
capabilities: 			 (0x5b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 ( 255) minutes.
SCT capabilities: 	       (0x003f)	SCT Status supported.
					SCT Error Recovery Control supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       7
  2 Throughput_Performance  0x0026   252   252   000    Old_age   Always       -       0
  3 Spin_Up_Time            0x0023   069   045   025    Pre-fail  Always       -       9682
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       765
  5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   252   252   051    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0024   252   252   015    Old_age   Offline      -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       3306
 10 Spin_Retry_Count        0x0032   252   252   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   252   252   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       789
181 Program_Fail_Cnt_Total  0x0022   100   100   000    Old_age   Always       -       102921
191 G-Sense_Error_Rate      0x0022   100   100   000    Old_age   Always       -       84
192 Power-Off_Retract_Count 0x0022   252   252   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0002   064   063   000    Old_age   Always       -       28 (Min/Max 13/37)
195 Hardware_ECC_Recovered  0x003a   100   100   000    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   252   252   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   252   252   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   252   252   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0036   100   100   000    Old_age   Always       -       188
200 Multi_Zone_Error_Rate   0x002a   100   100   000    Old_age   Always       -       11
223 Load_Retry_Count        0x0032   252   252   000    Old_age   Always       -       0
225 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       793

SMART Error Log Version: 1
ATA Error Count: 43 (device log contains only the most recent five errors)
	CR = Command Register [HEX]
	FR = Features Register [HEX]
	SC = Sector Count Register [HEX]
	SN = Sector Number Register [HEX]
	CL = Cylinder Low Register [HEX]
	CH = Cylinder High Register [HEX]
	DH = Device/Head Register [HEX]
	DC = Device Command Register [HEX]
	ER = Error register [HEX]
	ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 43 occurred at disk power-on lifetime: 3302 hours (137 days + 14 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 00 00 00 a0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ec 00 00 00 00 00 a0 00      00:00:00.667  IDENTIFY DEVICE
  00 00 60 98 f8 5d 40 00      00:00:00.666  NOP [Abort queued commands]
  60 00 10 d8 e1 5c 40 00      00:00:00.648  READ FPDMA QUEUED
  60 00 08 00 e0 9c 40 00      00:00:00.661  READ FPDMA QUEUED
  60 00 00 e8 e0 5c 40 00      00:00:00.661  READ FPDMA QUEUED

Error 42 occurred at disk power-on lifetime: 3302 hours (137 days + 14 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 00 00 00 a0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ec 00 00 00 00 00 a0 00      00:00:00.096  IDENTIFY DEVICE
  00 00 08 00 e1 1d 40 00      00:00:00.096  NOP [Abort queued commands]
  60 00 20 48 e1 5d 40 00      00:00:00.090  READ FPDMA QUEUED
  60 00 40 28 e1 5d 40 00      00:00:00.090  READ FPDMA QUEUED
  60 00 20 08 e1 5d 40 00      00:00:00.090  READ FPDMA QUEUED

Error 41 occurred at disk power-on lifetime: 3300 hours (137 days + 12 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 00 00 00 a0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ec 00 00 00 00 00 a0 00      00:00:04.116  IDENTIFY DEVICE
  00 00 00 00 58 1d 40 08      00:00:04.116  NOP [Abort queued commands]
  60 00 50 f0 51 22 40 08      00:00:04.098  READ FPDMA QUEUED
  60 00 00 40 6a 22 40 00      00:00:04.110  READ FPDMA QUEUED
  60 00 00 40 66 22 40 00      00:00:04.110  READ FPDMA QUEUED

Error 40 occurred at disk power-on lifetime: 3299 hours (137 days + 11 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 00 00 00 a0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ec 00 00 00 00 00 a0 00      00:00:01.120  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00      00:00:01.120  SET FEATURES [Set transfer mode]
  27 00 00 00 00 00 e0 00      00:00:01.120  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00      00:00:01.120  IDENTIFY DEVICE
  00 00 d8 58 dd 08 40 00      00:00:01.120  NOP [Abort queued commands]

Error 39 occurred at disk power-on lifetime: 3299 hours (137 days + 11 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 00 00 00 a0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ec 00 00 00 00 00 a0 00      00:00:00.536  IDENTIFY DEVICE
  00 00 01 01 00 00 00 00      00:00:00.536  NOP [Abort queued commands]
  ec 00 00 00 00 00 a0 00      00:00:00.531  IDENTIFY DEVICE
  00 00 18 48 fa 5b 40 00      00:00:00.531  NOP [Abort queued commands]
  60 00 80 80 3c 86 40 00      00:00:00.519  READ FPDMA QUEUED

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      3306         -

Note: selective self-test log revision number (0) not 1 implies that no selective self-test has ever been run
SMART Selective self-test log data structure revision number 0
Note: revision number not 1 implies that no selective self-test has ever been run
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Completed [00% left] (0-65535)
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Can these errors be fixed? Or should I get used to the idea of buying a new hard disk?
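A note on reading the error entries above: every one of them ends with the registers ER = 84 and ST = 51. As a rough reading aid, the sketch below decodes those two bytes using the historical ATA bit names. The lookup tables and the function names `decode_register` and `decode_error_line` are assumptions made for illustration, not part of smartctl, and the meaning of some bits depends on the command that failed.

```python
# Historical ATA bit names. Some bits are obsolete or command-dependent
# in newer standards, so treat the result as a reading aid only.
ERROR_BITS = {7: "ICRC", 6: "UNC", 5: "MC", 4: "IDNF",
              3: "MCR", 2: "ABRT", 1: "NM", 0: "BIT0"}
STATUS_BITS = {7: "BSY", 6: "DRDY", 5: "DF", 4: "DSC",
               3: "DRQ", 2: "CORR", 1: "IDX", 0: "ERR"}


def decode_register(value, names):
    """Return the names of all set bits, highest bit first."""
    return [names[bit] for bit in range(7, -1, -1) if value & (1 << bit)]


def decode_error_line(line):
    """Decode the 'ER ST SC SN CL CH DH' row of a smartctl error entry."""
    fields = [int(token, 16) for token in line.split()]
    er, st = fields[0], fields[1]
    return decode_register(er, ERROR_BITS), decode_register(st, STATUS_BITS)


if __name__ == "__main__":
    error_bits, status_bits = decode_error_line("84 51 00 00 00 00 a0")
    print("ER:", error_bits)   # ER: ['ICRC', 'ABRT']
    print("ST:", status_bits)  # ST: ['DRDY', 'DSC', 'ERR']
```

For the logged values this yields ICRC and ABRT in the error register. ICRC denotes an interface CRC error, which may be worth reading alongside the UDMA_CRC_Error_Count raw value of 188 in the attribute table. On its own it proves neither a cable fault nor a disk fault.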

ThermioN

Joined:
25 January 2012

Posts: 115

As far as I can see, the disk has already started reallocating defective sectors. It is best to mount the disk read-only and copy all important data, as far as that is still possible, to another medium.
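A minimal sketch of the read-only rescue idea, assuming a Linux system with `mount` and GNU `cp`. The device `/dev/sdX1` and both directories are placeholders, not values from this thread, and `rescue_commands` is a hypothetical helper that only builds the command lines so they can be reviewed before anything is run as root.

```python
import shlex


def rescue_commands(device, mountpoint, backup_dir):
    """Build (but do not run) the commands for a read-only mount and a copy."""
    return [
        # Mount the partition read-only so nothing is written to the weak disk.
        ["mount", "-o", "ro", device, mountpoint],
        # Copy the whole tree, preserving permissions and timestamps.
        ["cp", "-a", mountpoint + "/.", backup_dir],
    ]


if __name__ == "__main__":
    # Placeholder names: replace them with the real partition and paths.
    for cmd in rescue_commands("/dev/sdX1", "/mnt/rescue", "/mnt/backup"):
        print(" ".join(shlex.quote(part) for part in cmd))
```

Printing the commands instead of executing them is a deliberate choice here: on a failing disk it seems safer to check the device name once more before mounting anything.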

TomTobin

Joined:
24 August 2007

Posts: 3101

Hello,

ThermioN wrote:

As far as I can see, the disk has already started reallocating defective sectors.

What do you base that on? I do not see any value that would suggest it.

It is best to mount the disk read-only and copy all important data, as far as that is still possible, to another medium.

That is of course a good idea anyway ☺ The disk does have a problem when starting up.

Regardless of the disk's condition, I would

  • check the cable connections

  • check whether the power supply is OK, and if necessary check the voltages in the BIOS monitor

  • start the long SMART self-test (!) Make a backup first: such a test, like the backup itself, puts quite a lot of stress on a weakening disk.

Regards

Tom
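The long self-test mentioned in the list can be sketched as follows, assuming smartmontools is installed. `smartctl -t long` starts the extended test and `smartctl -l selftest` prints the result log. The helper names `selftest_commands` and `parse_selftest_line` are made up for this example, and `/dev/sdX` is a placeholder. Note that the smartctl output in the first post warns that using smartmontools with this drive model may cause data loss because of a firmware bug, so the backup should come first.

```python
import re


def selftest_commands(device):
    """Commands to start the extended self-test and to read the result log."""
    return [
        ["smartctl", "-t", "long", device],
        ["smartctl", "-l", "selftest", device],
    ]


def parse_selftest_line(line):
    """Split one row of the self-test log into named fields, or return None."""
    # Columns are separated by at least two spaces in smartctl's output.
    parts = re.split(r"\s{2,}", line.strip())
    if len(parts) != 6 or not parts[0].startswith("#"):
        return None
    keys = ("num", "description", "status", "remaining",
            "lifetime_hours", "lba_of_first_error")
    return dict(zip(keys, parts))


if __name__ == "__main__":
    row = "# 1  Short offline       Completed without error       00%      3306         -"
    result = parse_selftest_line(row)
    print(result["description"], "->", result["status"])
```

According to the output in the first post, the drive recommends a polling time of 255 minutes for the extended test, so the log is only worth reading after roughly that long.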

ThermioN

Joined:
25 January 2012

Posts: 115

TomTobin wrote:

What do you base that on? I do not see any value that would suggest it.

On the following value:

5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0

Provided, of course, that the values were read out and interpreted correctly.

TomTobin

Joined:
24 August 2007

Posts: 3101

ThermioN wrote:

On the following value:

5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0

Hmm, but the RAW value is 0, and the values for VALUE and WORST are identical, so no reallocation has taken place yet. THRESH is merely the threshold. So the disk can reallocate at most 242 sectors, since the count runs from the top down (see the wiki section Abfrage_der_SMART-Attribute). That is how I would read it. However, I am no SMART expert.

Regards

Tom
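The reading described above (raw value, VALUE versus WORST, THRESH as the limit) can be sketched in a few lines. This is an illustration only: `parse_attribute_row` and `is_failing` are hypothetical helpers, and the check follows the rule documented for smartctl that an attribute counts as failed once its normalized value is less than or equal to the threshold. The normalized scale is vendor-defined, so the gap between VALUE and THRESH does not necessarily translate into a number of sectors. The count itself usually sits in the raw value.

```python
def parse_attribute_row(line):
    """Parse one row of the smartctl attribute table into a dict, or None."""
    parts = line.split(None, 9)
    if len(parts) < 10 or not parts[0].isdigit():
        return None  # header line or unrelated text
    return {
        "id": int(parts[0]),
        "name": parts[1],
        "value": int(parts[3]),
        "worst": int(parts[4]),
        "thresh": int(parts[5]),
        "type": parts[6],
        "raw": parts[9].strip(),
    }


def is_failing(attr):
    """True if the normalized value has dropped to or below the threshold."""
    return attr["thresh"] > 0 and attr["value"] <= attr["thresh"]


if __name__ == "__main__":
    rows = [
        "  5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0",
        "199 UDMA_CRC_Error_Count    0x0036   100   100   000    Old_age   Always       -       188",
    ]
    for row in rows:
        attr = parse_attribute_row(row)
        print(attr["name"], "raw =", attr["raw"], "failing =", is_failing(attr))
```

With the rows from the first post, neither attribute is at or below its threshold. Reallocated_Sector_Ct has a raw value of 0, while UDMA_CRC_Error_Count has a raw value of 188.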
