I am a VERY new sysadmin (Class of '16), and I've been asked to build a big data cluster on three bare-metal PowerEdge servers. I have been asked to put the following on the cluster:
* Hadoop 2
* YARN
* Java 7 & 8
* Spark
* SBT
* Maven
* Scala
* P7zip
* Pig
* Hive
* R (libraries for Spark and Hadoop)
* Zeppelin
* Cassandra
I would like to know whether these can all 'play well together'. I know very little about big data, and my searches mostly return "x VS y" pages rather than "x AND y". Also, is there a preferred industry standard?
Thank you in advance for your advice!