How likely is the last block of a drive to be a bad sector?

This is a follow-up to an earlier question of mine, in which zeroing the disk with dd gave an error towards the very end of the drive.

I then set about trying to determine whether this was a bad sector, as was suggested it might be, as opposed to some sort of issue with the hard drive or with the environment I was running in.

Running:

badblocks -o ~/.badblocks_in_X_full -vws /dev/sda 976762584 950000000

...gives the output:

Checking for bad blocks in read-write mode
From block 950000000 to 976762584
Testing with pattern 0xaa: Weird value (4294967295) in do_writeerrors)
done
Reading and comparing: done
Testing with pattern 0x55: Weird value (4294967295) in do_writeerrors)
done
Reading and comparing: done
Testing with pattern 0xff: Weird value (4294967295) in do_writeerrors)
done
Reading and comparing: done
Testing with pattern 0x00: Weird value (4294967295) in do_writeerrors)
done
Reading and comparing: done
Pass completed, 1 bad blocks found. (1/0/0 errors)

The resulting output file lists the bad block as block 976762584.

Running cat /proc/partitions shows that /dev/sda has 976762584 blocks.

In other words, badblocks is reporting that the very last block of the drive is bad.

I know it's possible that the very last block/sector of a drive could be the only bad sector on the drive, but it seemed to me such an unlikely occurrence that it would be more likely to be something else.

Is this bad sector reported by badblocks real, or more likely to be just a sign of hardware or environment issues?

Hashim

Posted 2019-11-29T21:44:29.623

Reputation: 6 967

It's as likely as any other block is to be bad – music2myear – 2019-11-29T22:11:47.753

It's reporting 976762584 blocks, but is the first block number 0 or number 1? If it is 0 then you need to subtract 1 from the total number of blocks to actually write the last block, otherwise you are writing past the end of the disk. The question becomes: can you erase block 0? – Mokubai – 2019-11-29T22:25:08.797

@music2myear Did you read the rest of the question, or just the title? – Hashim – 2019-11-30T02:07:11.213

@Mokubai Interesting, that hadn't occurred to me, but isn't the first block always 0, and therefore, is it possible not to erase block 0? If the entire block device is being zeroed then surely that would always include block 0? – Hashim – 2019-11-30T02:12:46.163

You cannot "erase" a HDD sector without actually writing it. HDDs do not accept an erase command. The erase can only be performed in conjunction with a write operation. Block numbers, e.g. LBA, start at zero. This guide confirms it. Sector numbers do start at 1. SMART should be able to confirm the maximum sector number or LBA. – sawdust – 2019-11-30T02:21:50.847

You said the drive was 1TB in your previous question. Is it the same drive that you are talking about in this one? LBA 976762584 shouldn't be the last block of a 1TB drive; it should be somewhere in the middle. – Tom Yan – 2019-11-30T05:37:27.033

I am not sure if badblocks refers to LBA or not, but blocks in /proc/partitions apparently refers to 1KiB blocks. – Tom Yan – 2019-11-30T05:41:38.653

@TomYan It is the same disk, but for what it's worth the disk previously had a filesystem on it with blocks of 1024, and 1024*976762584 = 1000.2GB. – Hashim – 2019-11-30T07:45:27.730

That doesn't matter. It's just that I'm not familiar with badblocks, so I'm not sure if its block numbers refer to LBA or 1KiB blocks. (They should refer to LBA, given its purpose; I have no idea why /proc/partitions is based on 1KiB blocks.) – Tom Yan – 2019-11-30T10:24:11.547

Oh, never mind. From the man page, for -b block-size: "The default is 1024." – Tom Yan – 2019-11-30T10:28:29.527

@Hashim the first block might technically be block "0" but that doesn't mean the program counts from 0. Conversely, having 976762584 blocks and counting 0 as the first block means that your final block is actually at 976762583. – Mokubai – 2019-11-30T11:11:14.477

Answers

3

Although this is not a direct answer to your question, it may give you some insight.

First of all, for some reason badblocks seems to report a bad block for "last block + 1" anyway (while for "last block + 2" or higher, it correctly returns a seek error):

[tom@alarm ~]$ sudo badblocks /dev/mmcblk0 -b 512 -v 31116287 31116287
Checking blocks 31116287 to 31116287
Checking for bad blocks (read-only test): done                                   

Pass completed, 0 bad blocks found. (0/0/0 errors)
[tom@alarm ~]$ sudo badblocks /dev/mmcblk0 -b 512 -v 31116288 31116288
Checking blocks 31116288 to 31116288
Checking for bad blocks (read-only test): 31116288
done
Pass completed, 1 bad blocks found. (1/0/0 errors)
[tom@alarm ~]$ sudo badblocks /dev/mmcblk0 -b 512 -v 31116289 31116289
Checking blocks 31116289 to 31116289
Checking for bad blocks (read-only test): badblocks: Invalid argument during seek
done
Pass completed, 0 bad blocks found. (0/0/0 errors)
[tom@alarm ~]$ sudo badblocks /dev/mmcblk0 -b 512 -v 31116290 31116290
Checking blocks 31116290 to 31116290
Checking for bad blocks (read-only test): badblocks: Invalid argument during seek
done
Pass completed, 0 bad blocks found. (0/0/0 errors)
[tom@alarm ~]$

But dd has no such problem:

[tom@alarm ~]$ sudo dd if=/dev/mmcblk0 of=/dev/null skip=31116287
1+0 records in
1+0 records out                                                                  
512 bytes copied, 0.0254859 s, 20.1 kB/s
[tom@alarm ~]$ sudo dd if=/dev/mmcblk0 of=/dev/null skip=31116288
0+0 records in
0+0 records out
0 bytes copied, 0.000534631 s, 0.0 kB/s
[tom@alarm ~]$ sudo dd if=/dev/mmcblk0 of=/dev/null skip=31116289
dd: /dev/mmcblk0: cannot skip: Invalid argument
0+0 records in
0+0 records out
0 bytes copied, 0.000737962 s, 0.0 kB/s
[tom@alarm ~]$ sudo dd if=/dev/mmcblk0 of=/dev/null skip=31116290
dd: /dev/mmcblk0: cannot skip: Invalid argument
0+0 records in
0+0 records out
0 bytes copied, 0.000694995 s, 0.0 kB/s
[tom@alarm ~]$ sudo dd if=/dev/mmcblk0 of=/dev/null iflag=count_bytes,skip_bytes count=1M skip=15192M
2048+0 records in
2048+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0612828 s, 17.1 MB/s
[tom@alarm ~]$ sudo dd if=/dev/mmcblk0 of=/dev/null iflag=count_bytes,skip_bytes count=1M skip=15193M
1024+0 records in
1024+0 records out
524288 bytes (524 kB, 512 KiB) copied, 0.029878 s, 17.5 MB/s
[tom@alarm ~]$ sudo dd if=/dev/mmcblk0 of=/dev/null iflag=count_bytes,skip_bytes count=1M skip=15194M
dd: /dev/mmcblk0: cannot skip: Invalid argument
0+0 records in
0+0 records out
0 bytes copied, 0.000814785 s, 0.0 kB/s
[tom@alarm ~]$ sudo dd if=/dev/mmcblk0 of=/dev/null iflag=count_bytes,skip_bytes count=1M skip=15195M
dd: /dev/mmcblk0: cannot skip: Invalid argument
0+0 records in
0+0 records out
0 bytes copied, 0.000700151 s, 0.0 kB/s
[tom@alarm ~]$

In any case, if you want to check whether a particular block is bad and/or compare the results of badblocks and dd, it's best to test with the logical block size (usually 512 bytes) and not to use values that exceed the actual size of the drive. Both pieces of information can be obtained with fdisk -l:

[tom@alarm ~]$ sudo fdisk -l /dev/mmcblk0
Disk /dev/mmcblk0: 14.86 GiB, 15931539456 bytes, 31116288 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
[tom@alarm ~]$

Also make sure you count from 0 when specifying a particular block, but do not subtract one when specifying a number of blocks (as for count/skip/seek in dd).
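To make the counting rule concrete, here is a minimal sketch of the arithmetic. The function name last_block is hypothetical and not part of badblocks or dd; it only assumes you already know the device size in bytes and the block size you intend to pass to the tool.

```shell
#!/bin/sh
# Hypothetical helper, not part of badblocks or dd.
# usage: last_block SIZE_IN_BYTES BLOCK_SIZE
last_block() {
    size_bytes=$1
    block_size=$2
    # Number of whole blocks that fit on the device.
    total=$(( size_bytes / block_size ))
    # Blocks are numbered from 0, so the last valid index is total - 1.
    echo $(( total - 1 ))
}

# Device from the fdisk output above: 15931539456 bytes, 512-byte sectors.
last_block 15931539456 512
```

For the drive in the question, 976762584 blocks of 1024 bytes is 1000204886016 bytes, so the same function gives 976762583 as the last 1024-byte block and 1953525167 as the last 512-byte sector. On a real device the two inputs could probably be read with blockdev --getsize64 and blockdev --getss, but that is an assumption about your setup rather than something tested here.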

EDIT: It's actually natural that "last block + 1" is a special case, as it is valid to seek across an entire drive (last block inclusive). It's just that there's no more space to read/write. But for "last block + 2" or higher, even the seek will go past the end of the drive, hence the seek error. It might also be worth noting that the current badblocks doesn't exactly check what the write error is (it just returns to its caller that the written length is 0): https://github.com/tytso/e2fsprogs/blob/v1.45.4/misc/badblocks.c#L443
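The three cases described in the edit can be summarised in a small sketch. The function classify_block and its labels are hypothetical; it simply mirrors the behaviour seen in the transcripts above and is not how badblocks itself is implemented.

```shell
#!/bin/sh
# Hypothetical helper: classify BLOCK against a device holding TOTAL blocks.
# usage: classify_block BLOCK TOTAL
classify_block() {
    block=$1
    total=$2
    if [ "$block" -lt "$total" ]; then
        # Indices 0 .. total-1 exist and can be read or written.
        echo "valid"
    elif [ "$block" -eq "$total" ]; then
        # "last block + 1": the seek lands exactly at the end of the device,
        # which is allowed, but there is nothing left to read or write.
        echo "one-past-end"
    else
        # "last block + 2" or higher: the seek itself goes past the end.
        echo "seek-error"
    fi
}

# Sector counts from the fdisk output above (31116288 sectors in total).
classify_block 31116287 31116288
classify_block 31116288 31116288
classify_block 31116289 31116288
```

The middle case is the one that badblocks reported as a bad block in the transcript, while dd simply copied 0 bytes.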

Tom Yan

Posted 2019-11-29T21:44:29.623

Reputation: 4 744

This is an excellent answer and I appreciate the detective work in it. However, what's confusing me is that I don't seem to be able to reproduce your results with regard to badblocks. Instead, when I use the value for total sectors given by fdisk -l as suggested, I get an error, which I've posted another question about and which might also be the cause of this problem. I only get no errors at all when using last block - 1. See here for my results: https://pastebin.com/fJThLxWs. – Hashim – 2019-12-01T19:19:36.250

For the record, 1953525168 is "last block + 1" in your case. The last block should be 1953525168 - 1 = 1953525167, as you count blocks from 0 (i.e. 0 is the first block). I have some idea about your other question and will write an answer for it. I am also making an edit to explain a bit why "last block + 1" is a special case. – Tom Yan – 2019-12-02T03:23:38.033

So to be sure I'm understanding you fully here - you're saying that "last block + 1" is considered a valid operation by badblocks, but is not necessary, and whenever running badblocks right to the end of a drive I need to get the total number of sectors with fdisk -l and count one less? – Hashim – 2019-12-02T17:39:19.747

In case you misunderstood, "last block + 1" is literally "last block + 1", which means it does not exist, so the write always fails (and the failure gets "misinterpreted" as a bad block). (I am not really sure what you mean by "valid operation" / "not necessary".) But the seek that needs to be made before the write attempt would be valid for that block (as the seek only covers the entire drive but no more). (Perhaps that is what you mean by "valid operation"?) And yes, (the "address" of) the last block is always the total number of blocks - 1. – Tom Yan – 2019-12-02T18:12:02.730

I thought that when you said special case you meant that badblocks considers it a special case. Never mind, I have my answer to this question, and based on it, I will read your other answers and try to resolve those too. – Hashim – 2019-12-02T18:14:18.540

I said special case because it is special in nature (the seek needed for it is valid, but the subsequent write is not). – Tom Yan – 2019-12-02T18:16:18.313