
Is anyone aware of a sandbox where an environment is already set up for big data processing?

It could be Hadoop, Cassandra, Pig, etc.

I'm a SQL Server programmer trying to get into big data/NoSQL solutions, but I'm having a very difficult time setting up my own environment in Linux.

Are there any free or paid services that let you upload your big data, play with it, and set up clustering?

Alex Gordon

1 Answer

Amazon, Windows Azure.

You can easily get 100 or 1,000 virtual machines for a short time.

The problem with any such external service is that "big data" is hard to move. Getting lots of processing power is easy, but how do you get terabytes of data to Amazon for a test?
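To make that concrete, here is a back-of-the-envelope calculation of upload time. This is a hedged sketch: the link speeds are assumed examples, and real-world uploads usually sustain less than the nominal line rate.

```python
# Rough upload-time estimate: why "big data is hard to move".
# Assumes sustained throughput at the nominal link rate (optimistic).

def upload_hours(size_bytes: float, link_mbps: float) -> float:
    """Hours to transfer size_bytes over a link of link_mbps megabits/s."""
    bits = size_bytes * 8
    seconds = bits / (link_mbps * 1e6)
    return seconds / 3600

TB = 1e12  # 1 terabyte, decimal
for mbps in (100, 1000):
    print(f"1 TB over {mbps} Mbps: {upload_hours(TB, mbps):.1f} hours")
# 1 TB over a 100 Mbps link is roughly a full day of uploading,
# which is why shipping only "a few gigs" changes the exercise entirely.
```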

TomTom
  • Great point regarding moving it! I would like experience with the processing part, so I would only move maybe a few gigs! Can you point me to exactly where on Amazon this can be done? – Alex Gordon Dec 31 '12 at 17:16
  • The problem with that is that, depending on WHAT you do, you may not learn anything. I mean, if you need 100 Hadoop instances to handle large data, what sense is there in distributing 1 GB? – TomTom Dec 31 '12 at 17:16
  • Anyhow, any cloud (Amazon) does that. And how challenging is it for you to type "amazon hpc" into Google? – TomTom Dec 31 '12 at 17:16
  • Regarding your first response, I just want to learn. Is there any harm in doing what I'm doing? Would I not learn anything? – Alex Gordon Dec 31 '12 at 17:19
  • Likely you would not learn anything. That is the problem: big data ASSUMES big data, and many things make no sense if you don't have a lot of data. I run an HPC system in my basement; if it were "tested" with a handful of gigabytes, there would not be enough jobs ;) – TomTom Dec 31 '12 at 17:26
  • Oh, I see! Great point. Can you tell me what exactly you do with your HPC system? – Alex Gordon Dec 31 '12 at 17:27
  • Statistical simulations based on time-variable data, i.e. testing financial trading systems. We run jobs of "1 system, 1 parameter set, 1 week", and tests generate 10,000 combinations per run, over many years. If I had to cut down on the size, the time, etc., the system might not spin up to full speed, because agents check for new jobs every 30 seconds. – TomTom Dec 31 '12 at 17:29
  • Very cool! What is your goal with the simulations? – Alex Gordon Dec 31 '12 at 17:30
  • What sense does one have doing that? Validating and testing ideas, then pumping them into the markets and making money. – TomTom Dec 31 '12 at 17:32
  • Cool! Have you been successful with it? – Alex Gordon Dec 31 '12 at 17:33
  • Hahaha. You expect me to answer that? Seriously? But it is not "me": half my company's manpower works in that department. – TomTom Dec 31 '12 at 17:34
  • Blessings for lots of success!! I'm very interested in learning the Hadoop technologies, and I was wondering if you could give any guidance on how to get into it? – Alex Gordon Dec 31 '12 at 17:36
  • No, we are not using Hadoop here. As our data set is dynamic and we do not do so much analysis as processing... that is not a technology we use. – TomTom Dec 31 '12 at 17:53
  • Are you familiar at all with SimpleDB? – Alex Gordon Dec 31 '12 at 18:03
  • Not really. See, our sims are programs running "along" data files. We do not filter, but generate an output file that then gets loaded into a database server for analysis. Others take a set of data, create thousands of random combinations, then upload them again. Hadoop is more in the "filter" category: FIND data, not do complex processing. Complex because we are time-based; you HAVE to run a, b, c. This is why we go week by week, as weekends are "clean slate" moments. – TomTom Dec 31 '12 at 18:05
  • I was very curious how this project is going for you! – Alex Gordon Jan 27 '13 at 22:46
  • We activated the first 6 computers on Friday. Works like a charm. Generation 2 of the agent is planned already, with better control, and we are getting another 3 computers online in February. We are now about a factor of 40 faster than before. – TomTom Jan 28 '13 at 05:46
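The job-distribution pattern described in the comments (agents that poll for "1 system, 1 parameter set, 1 week" jobs every 30 seconds) can be sketched as a minimal polling worker. This is an illustrative toy, not the actual system from the comments; all names and the stop condition are invented for the example.

```python
import queue
import time

def run_agent(jobs: "queue.Queue", poll_interval: float = 30.0,
              max_idle_polls: int = 1):
    """Minimal polling agent: check the job queue, run whatever is there,
    sleep between checks. Stops after max_idle_polls consecutive empty checks
    (a real agent would poll forever)."""
    results = []
    idle = 0
    while idle < max_idle_polls:
        try:
            job = jobs.get_nowait()
        except queue.Empty:
            idle += 1
            time.sleep(poll_interval)  # the "every 30 seconds" check
            continue
        idle = 0
        results.append(job())  # run one simulation job
    return results

# Example: three tiny "parameter set" jobs; a short interval so it returns fast.
q = queue.Queue()
for params in (1, 2, 3):
    q.put(lambda p=params: p * 10)

print(run_agent(q, poll_interval=0.01))  # → [10, 20, 30]
```

The appeal of this design, and why adding more computers gave a near-linear speedup in the last comment, is that each agent pulls work independently: there is no central scheduler pushing jobs, so capacity scales by simply starting more agents.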