I've seen some similar questions, but they deal with files under 1GB and the answers generally recommend services such as Dropbox, S3 and Skydrive. These do not appear to be suitable for my needs.
I have a very large dataset (Dota2 public matchmaking history) which, in its raw form in MongoDB (without indexes), is around 800GB. Dumping it and compressing with 7Zip at the Ultra level brings it down to around 9-10% of the original size, or roughly 80GB for distribution. I am seeking a way to make these compressed files publicly available, but am unsure of the best way to distribute them. I can split the files into smaller pieces by dumping with a query; this has a negligible impact on the compression ratio.
My home internet has a very slow upload speed (1.3Mbps max, often throttled), so I would prefer not to seed a torrent from my home connection.
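To put numbers on this, here is a rough back-of-the-envelope sketch in Python. It assumes decimal units (1GB = 10^9 bytes, 1Mbps = 10^6 bits per second) and a connection that stays saturated with no protocol overhead, so real-world times would be longer. The function names are just illustrative.

```python
SECONDS_PER_DAY = 86_400


def compressed_size_gb(raw_gb: float, ratio: float) -> float:
    """Size after compression, where ratio is compressed/original (e.g. 0.10)."""
    return raw_gb * ratio


def upload_days(size_gb: float, upload_mbps: float) -> float:
    """Days needed to upload size_gb at a sustained upload_mbps.

    Assumes decimal units and ignores protocol overhead and throttling.
    """
    bits = size_gb * 1e9 * 8
    seconds = bits / (upload_mbps * 1e6)
    return seconds / SECONDS_PER_DAY


# Figures from the question: 800GB raw, 9-10% ratio, 1.3Mbps upload.
for ratio in (0.09, 0.10):
    size = compressed_size_gb(800, ratio)
    print(f"ratio {ratio:.2f}: {size:.0f}GB, {upload_days(size, 1.3):.1f} days")
```

At the 10% figure this works out to a little under 6 days of continuous uploading for a single copy, which is why seeding from home is not attractive.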
What is the best way of distributing this dataset? Could there be a way to further compress the dataset?
EDIT: Since this question has been marked as duplicate, I don't think I can answer it anymore. I'm not sure how anyone thinks that this is a duplicate of a question where the accepted answer is Dropbox, but for anyone who stumbles across this question, the best option seems to be as follows:
Use BitTorrent as the transfer protocol, but host the files with a "Seedbox" provider. These appear to be VPS providers focused on providing bandwidth and storage space for heavy users of the BitTorrent protocol. As an average price, enough space and bandwidth for my needs can be had for around $10 a month. To get the files onto the hosting provider, I will copy them to an external drive and then FTP them to the host from multiple locations where I have access to internet connections.
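The FTP step could be scripted along these lines. This is only a sketch using Python's standard `ftplib`: the `upload_parts` helper, the `*.7z` pattern and the skip-if-same-size check are my own assumptions, not anything a particular seedbox provider requires, and it presumes the server supports the `SIZE` command. Skipping files that already exist with the right size is what makes it practical to continue the upload from a different location.

```python
from ftplib import error_perm
from pathlib import Path


def upload_parts(ftp, directory, pattern="*.7z"):
    """Upload matching files over an open FTP connection.

    `ftp` is an already logged-in ftplib.FTP (or FTP_TLS) object.
    Files whose remote size already matches the local size are skipped.
    Returns the names of the files that were actually sent.
    """
    ftp.voidcmd("TYPE I")  # binary mode, so SIZE reports byte counts
    sent = []
    for path in sorted(Path(directory).glob(pattern)):
        try:
            remote_size = ftp.size(path.name)
        except error_perm:
            remote_size = None  # file not on the server yet
        if remote_size == path.stat().st_size:
            continue  # already uploaded in full
        with path.open("rb") as fh:
            ftp.storbinary(f"STOR {path.name}", fh, blocksize=1024 * 1024)
        sent.append(path.name)
    return sent
```

To use it, open a connection with `ftplib.FTP(host)`, call `login(user, password)` with the credentials from the provider, and pass the connection plus the path of the external drive to `upload_parts`. A partially uploaded file is re-sent from the start rather than resumed.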
BitTorrent is the best way. As more and more people get the file, fewer of them will actually need to get it from you. Eventually you can stop seeding the file yourself. – Ramhound – 2013-06-21T11:11:54.450
Transfering large files over internet – Sathyajith Bhat – 2013-06-21T11:13:52.063
@Ramhound - I understand how BitTorrent works, but I'm concerned that even if I were able to fully saturate my home connection 24/7, it would take almost 6 days to seed out the first copy of the dataset, and seeds who get a copy and then choose to leave the pool will further increase the amount of data I need to transfer. – Charles A – 2013-06-21T11:17:09.707
@Sathya - Appreciated, but that question is focused on the security of distributing files which are considerably smaller than this dataset. – Charles A – 2013-06-21T11:19:50.907
@CharlesA - There is no fast way to distribute 80GB of data. Any solution would result in the distribution of 80GB of data. It's very likely you have monthly data transfer limits. I would check with your internet provider to make sure you will be able to distribute 80GB worth of data in a single month. – Ramhound – 2013-06-21T11:30:53.967