Disk failures in the real world: What does an MTTF of 1,000,000 hours mean to you?


Facts about hard disk drive failure rates, ITpro



1) On the definition of mean time to failure (MTTF)

Drive manufacturers specify the reliability of their products in terms of two related metrics: the annualized failure rate (AFR), which is the percentage of disk drives in a population that fail in a test scaled to a per year estimation; and the mean time to failure (MTTF).  ... The MTTF is estimated as the number of power on hours per year divided by the AFR. ... The MTTFs specified for today's highest quality disks range from 1,000,000 hours to 1,500,000 hours, corresponding to AFRs of 0.58% to 0.88%.

2.2 Specifying disk reliability and failure frequency

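Dividing the number of hours in a year by the annualized failure rate (AFR) gives the mean time to failure (MTTF). For the MTTFs of 1,000,000 hours and 1,500,000 hours in the example above, this works out as follows.

  (24 hours * 365 days) / 0.0088 ≈ 1,000,000 hours

  (24 hours * 365 days) / 0.0058 ≈ 1,500,000 hours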
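
The same conversion can be sketched in a few lines of Python. This is only an illustration of the arithmetic above, assuming a drive that is powered on for all 8,760 hours of the year; the function names mttf_from_afr and afr_from_mttf are made up for this example and do not come from the paper.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 power-on hours, assuming the drive runs all year


def mttf_from_afr(afr: float) -> float:
    """Return the MTTF in hours for an AFR given as a fraction (0.0088 means 0.88%)."""
    if afr <= 0:
        raise ValueError("AFR must be positive")
    return HOURS_PER_YEAR / afr


def afr_from_mttf(mttf_hours: float) -> float:
    """Return the AFR as a fraction for an MTTF given in hours."""
    if mttf_hours <= 0:
        raise ValueError("MTTF must be positive")
    return HOURS_PER_YEAR / mttf_hours


if __name__ == "__main__":
    # The two AFR values quoted from the paper.
    for afr in (0.0088, 0.0058):
        print(f"AFR {afr:.2%} -> MTTF of roughly {mttf_from_afr(afr):,.0f} hours")
```

Going the other way, afr_from_mttf(1_000_000) returns about 0.0088, which is the 0.88% figure in the quotation.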


2) Annual replacement rate (ARR)

In contrast, in our data analysis we will report the annual replacement rate (ARR) to reflect the fact that, strictly speaking, disk replacements that are reported in the customer logs do not necessarily equal disk failures

2.2 Specifying disk reliability and failure frequency



Many sites follow a "better safe than sorry" mentality, and use even more rigorous testing. As a result, it cannot be ruled out that a customer may declare a disk faulty, while its manufacturer sees it as healthy. ... In fact, a disk vendor has reported that for 43% of all disks returned by customers they find no problem with the disk.

2.1 What is a disk failure?


3) Findings

Observation 1: Variance between datasheet MTTF and disk replacement rates in the field was larger than we expected. The weighted average ARR was 3.4 times larger than 0.88%, corresponding to a datasheet MTTF of 1,000,000 hours.

4.1 Disk replacements and MTTF
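
To put Observation 1 in concrete terms, the sketch below multiplies the datasheet AFR of 0.88% by the reported factor of 3.4 and converts the result back into hours. The fleet of 1,000 disks is a hypothetical figure chosen for illustration, not a number from the paper, and the helper names are likewise invented for this example.

```python
HOURS_PER_YEAR = 24 * 365   # 8,760 hours
DATASHEET_AFR = 0.0088      # 0.88%, i.e. a datasheet MTTF of 1,000,000 hours
FIELD_FACTOR = 3.4          # weighted average ARR relative to the datasheet AFR


def field_arr(datasheet_afr: float, factor: float) -> float:
    """Scale a datasheet AFR (as a fraction) by an observed field factor."""
    return datasheet_afr * factor


def expected_replacements(n_disks: int, annual_rate: float) -> float:
    """Expected number of disks replaced per year in a fleet of n_disks."""
    return n_disks * annual_rate


if __name__ == "__main__":
    arr = field_arr(DATASHEET_AFR, FIELD_FACTOR)
    fleet = 1000  # hypothetical fleet size, for illustration only
    print(f"Weighted average ARR: {arr:.2%}")
    print(f"Equivalent hours per replacement: {HOURS_PER_YEAR / arr:,.0f}")
    print(f"Datasheet expectation for {fleet} disks: "
          f"{expected_replacements(fleet, DATASHEET_AFR):.1f} per year")
    print(f"Field expectation for {fleet} disks: "
          f"{expected_replacements(fleet, arr):.1f} per year")
```

In other words, a weighted average ARR of about 3% corresponds to roughly 30 replacements per year per 1,000 disks, against about 9 if the datasheet figure held.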



Observation 2: For older systems (5-8 years of age), data sheet MTTFs underestimated replacement rates by as much as a factor of 30.

Observation 3: Even during the first few years of a system's lifetime (< 3 years), when wear-out is not expected to be a significant factor, the difference between datasheet MTTF and observed time to disk replacement was as large as a factor of 6.


Observation 4: In our data sets, the replacement rates of SATA disks are not worse than the replacement rates of SCSI or FC disks. This may indicate that disk-independent factors, such as operating conditions, usage and environmental factors, affect replacement rates more than component specific factors.  ...




