Can a massive MySQL data import on an SSD damage it?

28

5

I have to import quite a lot of data (~100 million rows, ~100 times) into a MySQL database. Currently, it is stored on my hard disk drive, and the bottleneck of my import seems to be the hard disk drive write speed.

I have heard that SSDs do not like massive continuous writes, and that it tends to damage them. What do you think? Is this really an issue on modern SSDs?

christophetd

Posted 2015-04-24T14:12:24.863

Reputation: 384

As long as you leave (say) 2-3GB outside the partitioned area for over-provisioning, I guess you are safe with it. I don't see that much problem with it. Most SSDs already have some part of the disk that isn't accessible to the operating system. That space is used for wear leveling and to overprovisioning, in case the hard-drive is too full. These extra GB will give more room for the SSD to distribute the data to avoid damages. If you are hard-core and want to go ahead with this, you can find out how many memory chips your ssd has and give 1GB by chip. 10 chips is 10 unpartitioned GB. – Ismael Miguel – 2015-04-24T15:38:03.323

5For what little it is worth, we routinely import far, far more data than this. A single one of our tables has much more data than you are importing, and we have a couple of hundred tables. We use SSDs. I expect you'll be fine. – ChrisInEdmonton – 2015-04-24T16:29:17.240

4Nowadays SSDs are smart enough to handle wear leveling themselves even without the OS support (even though the OS asks to rewrite the same block, the SSD's controller transparently writes to a different block each time) so it'll be just fine. – None – 2015-04-24T17:35:58.840

7Red herring. Failure rate of SSDs isn't a thing to worry about - it'll be long enough that they'll still last longer than equivalent spinning rust. – Sobrique – 2015-04-24T21:30:36.130

2People worry far too much about their SSDs. Basically you'll never manage to "destroy" your SSD by accident, and even doing it on purpose may require weeks or months of continuous writes. Even if you "destroy" it, it will still provide the data as read-only. Stop worrying and just use it. You might as well ask about how your HDD's read/write head gets worn down by the accelerations. – mic_e – 2015-04-26T18:39:04.533

Answers

27

It really is not a straightforward answer to this.

SSDs do not care about continuous writes as much as how many times any particular sector is overwritten. When SSDs first came out, something like SQL was a bad word as the operating system in general treated the drive like a traditional HDD and failures were very frequent.

Since then, drives have become bigger, cheaper, more reliable, meant for more read/writes and the operating systems have become smarter.

SSDs in SQL is not only common, but often encouraged. Feel free to peruse the DBA sister site.

My thoughts are to do it, assuming the SQL server is built properly with redundant disks. If not, then expect a failure eventually anyway.

Austin T French

Posted 2015-04-24T14:12:24.863

Reputation: 9 766

5"If not, then expect a failure eventually anyway." If the server does use redundant disks, still definitely expect a failure at some point, and plan for it. It's just that with redundancy in place, a single storage device failure has a much lower probability of leading to system downtime. – a CVn – 2015-04-25T22:54:01.920

@MichaelKjörling yes, precisely. In my mind "built properly" also assumes backups of the database in case of a failure... But sometimes even that which should be OK to be left unsaid needs to be said, thanks. – Austin T French – 2015-04-25T23:07:07.027

19

Reads are fine, and SSD's can have their bits read from without any detrimental effect.

Writes are another matter. Clearing a bit affects the integrity of the bit and after a lot of sequential writes, the bit will stop accepting new writes altogether. It can however still be read.

Let me just say that the write limits on new enterprise drives are huge. Take Samsung's new 845DC Pro. It is good for 10 drive writes per day for 5 years on warranty. I would imagine it will do twice that number. To put that into numbers, that's 14,600 TB written over 5 years on the 800 GB model.
Or 2920 TB per year,
Or 8 TB per day, for five years.

Show me a hard drive with a warranty that covers that much use. I'm not even sure you could write 8 TB to a HDD in a day:- (50 MB/s average throughput * 60 (seconds) * 60 (minutes) * 24 (hours) = 4,320,000 MB/day = 4.32 TB/day) It turns out you can't (on an average drive).

As long as you use a drive like this, based on V-NAND (or equally durable SLC), not one based on TLC or bad MLC flash, you should be fine. And anyway, RAID 10 and backups are your friend for a reason. And at least if the SSD write limit does become an issue, you can still read the data stored in the faulty bits.

SSDs are also cheaper to run, cooler, quieter and enterprise models are especially resistant to power issues. No more head crash fears and of course, a huge performance increase for your database access needs.

Ctrl-alt-dlt

Posted 2015-04-24T14:12:24.863

Reputation: 2 400

12Can I ask why the downvote? – Ctrl-alt-dlt – 2015-04-24T15:55:35.057

You can ask, but you shan't receive, apparently. – Fund Monica's Lawsuit – 2015-04-27T04:14:46.047

12

Writing to SSDs isn't necessarily bad. It's the writing and rewriting of a single block that's bad. Meaning if you write a file delete it then write it again, or make small amounts of changes to a file over and over again. This causes wear on the SSD's. Databases would definitely fit into this category.

However according to this article, petabytes of data have been written to SSDs and still been operable. This is probably due to advances to wear leveling:

Wear leveling attempts to work around these limitations by arranging data so that erasures and re-writes are distributed evenly across the medium. In this way, no single erase block prematurely fails due to a high concentration of write cycles.

In your particular situation I would have the databases reside on the SSD for speed, but backed up on a daily basis. You also might consider getting two SSDs in a RAID 1 array as well. The likelihood of two SSDs failing at the same time is low.

Note: RAID arrays are NOT backups!!!! No matter if you use a RAID array or not, have a backup. No matter if you use an SSD or not, have a backup.

James Mertz

Posted 2015-04-24T14:12:24.863

Reputation: 24 787

1RAID1 would do very little for the type of damage you are talking about. The wear level is likely to be deterministic, which means they will wear at exactly the same rate and way, causing errors to occur almost exactly in the same places. – Aron – 2015-04-24T16:18:49.077

from the linked article: "the electronics in the SSD are going to fail long before the NAND wears out" ... wait, what? – Michael – 2015-04-24T16:49:04.757

4

Let's assume your import involves no updates and no deletions. So you are doing all insertions. This should only be writing new data to the transaction log.

This means as data is added, it is always being written to a new sector. There might be some buffers/swap that gets churned/written to multiple times, but ignoring that, all of those inserts would theoretically result in no more than one write per sector. Depending on how MySQL is implemented, and what kind of bulk insert you are performing, you might generate a second set of writes later when the transaction log is integrated in to the main data file(I'm going off an understanding of different DB engines, and assuming MySQL is somewhat similar in how transaction logs are flushed).

Point being, you are not "churning" the SSD. That is, you are not doing a great deal of modifications/moves/deletions/etc. that would potentially rewrite over the same sectors many times. So you are essentially only going to generate a very small number of writes per sector and that's what really matters.

Assuming you are not completely filling up the SSD, there should be sufficient free space for those hot spots(such as buffers/swap) which are being churned to minimize wear through wear leveling algorithms.

(Indexes might be another matter. As clustered indexes in many DBs involve alot of modifications as data is inserted. Usually when doing large isnerts in a data warehouse environment, you turn off indexes during bulk import then update them after.)

AaronLS

Posted 2015-04-24T14:12:24.863

Reputation: 2 074

3

This is no issue.

First of all, SSDs have greatly improved during the last years. Overprovisioning and wear levelling (and to a small amount, the TRIM command, though not applicable in your case) have made them quite suitable as heavy-duty, general-purpose disks. I am not using anything but SSD on my development PC (which regularly does a lot of compiling) without even coming anywhere near the erase cycle count.

Further, this statement:

SSDs do not like massive continuous writes, and that it tends to damage them

is outright wrong. The opposite is the case, frequent small writes, if anything, may cause damage to SSDs.

Unlike traditional hard disks, SSDs (or rather the NAND-based flash inside) are physically organized in large blocks which logically contain several sectors. A typical block size is 512kB whereas sectors (which is the unit that the filesystem uses) are traditionally 1kB (different values are possible, two decades ago 512B was common).
Three things can be done with a 512kB-block. It can be read from, part of it or all can be programmed (= written to), and the whole of it can be erased. Erasing is what's problematic because there is a limited numbers of erase cycles, and you can only erase a complete block.

Therefore, large writes are very SSD-friendly whereas small writes are not.

In the case of small writes, the controller must read a block in, modify the copy, erase a different block, and program it. Without caching, in the very worst possible case, you would need to erase 512.000 blocks to write 512 kilobytes. In the best possible case (large, continuous write) you need to do exactly 1 erase.

Doing an import into a MySQL database is much different from doing many separate insert queries. The engine is able to collapse a lot of writes (both data and indices) together and needs not sync between each pair of inserts. This amounts to a much more SSD-friendly write pattern.

Damon

Posted 2015-04-24T14:12:24.863

Reputation: 4 002

2Sectors are traditionally 1 KiB? Citation, please. On rotational drives, two sector sizes are common: 512 bytes (traditional, like on my 4 TB HDDs, in IBM-compatibles dates back to around 1981 or so) and 4096 bytes ("Advanced Format"). File system level allocation units can vary in size, but that's a completely different matter and is purely a file system construct to keep the data structures tracking allocation to a reasonable size in file systems that don't grow them dynamically on an as-needed basis; besides, I doubt fixed 1 KiB block sizes are very common in practice. – a CVn – 2015-04-25T22:58:46.840

@MichaelKjörling: Thank you for your very valuable input. You did of course read and understand the answer, didn't you? The relevant fact is that SSDs have physical block sizes which are much larger than that, regardless of the logical sector size (which I've seen anywhere from 500 to 4096 bytes, even non-power-of-two sizes). No citation needed. – Damon – 2015-04-27T07:29:01.867

1

SSD's do not like it. If you keep max write speed up for 5-10 years (24 hours per day, 7 days per week) then you might end up with a broken SSD.

Ofc. After 5 years most servers have reached their economical end of life.


Disclaimer:
Do not try this with the very first generation of SSD. Those where less robust.

Hennes

Posted 2015-04-24T14:12:24.863

Reputation: 60 739

I'm well aware that using any disk at its maximal capacity 7/24 would end up damaging it... My question is if it's safe for a limited amount of time (let's say several times 2-3 hours) – christophetd – 2015-04-24T14:31:38.390

@christophetd - It depends. Update your question to estimate the amount of data. Its more about the percentage of the drive. Writing 20GB an hour on a 80GB SSD is worst then doing 20GB an hour on a 1TB SSD. – Ramhound – 2015-04-24T15:52:06.257

On the same note: Having a mostly empty drive means that many of the 'empty' flash cells get used in wear leveling. (and a bigger drive with the same amount of data is %-while emtier). – Hennes – 2015-04-24T16:00:01.307

1

If you are truly interested in figuring out the details then you will need the following question answered:

On average how many bytes are in each row?

If you can tell me that there are 10 columns, each column is varchar(100), and the encoding is UTF-8 then I can guess at worst case scenario that you have 4,000 bytes worth of data per row and add some more bytes for meta-data so lets say 4,200 bytes?

Your torture SQL calculates to 4,200 x 100 x 100,000,000 = 42,000,000,000,000 bytes of data written to the disk

42,000,000,000,000 / 1000 = 42,000,000,000 KB

42,000,000,000 / 1000 = 42,000,000 MB

42,000,000 / 1000 = 42,000 GB

42,000 / 1000 = 42 TB

At this theoretical worst-case scenario you will be writing 42 TB to the disk

According to this article, provided by @KronoS you should be good for about 25 more rounds of your torture SQL.

MonkeyZeus

Posted 2015-04-24T14:12:24.863

Reputation: 7 101

-2

As the poster of this writeup on SSDs said, what is really harmful is again and again writing small chunks of data.

  • bits are stored into {1,2,3}-bit cells. These have limited lifespan.
  • cells are grouped into [2-16]KB pages (smallest writeable unit)
  • pages are grouped into (128-256 page-)blocks (smallest eraseable unit)
  • for a page to be rewritten, it---and its whole block---needs to be erased first

That is why it is recommended to

  • never write less than a page at once,
  • buffer small writes, and
  • separate read and write requests
  • "A large single-threaded write is better than many small concurrent writes"

So, a really big amount at once seems way better.

serv-inc

Posted 2015-04-24T14:12:24.863

Reputation: 400

2This answer doesn't really provide any relevant information that hasn't been said, besides, its basically a comment with a link contained in it. – Ramhound – 2015-04-24T15:53:04.953

@Ramhound: would you give your ok for your comment (thank you, btw), and this, too, to be tagged obsolete? Or do you still consider the info already said/irrelevant? – serv-inc – 2015-04-26T11:06:27.783

While its no longer a link, honestly, the technical information itself, does not really apply to the user's question with regards to running a database on a SSD I – Ramhound – 2015-04-26T14:25:44.200

@Ramhound: to me it seemed to be about the import, not the running. Judging from the downvotes, it seems as though you're right – serv-inc – 2015-04-26T17:37:54.157