Questions tagged [hadoop]

Hadoop is an open-source project that provides a distributed, replicated file system and a production-grade MapReduce system, along with a series of complementary additions such as Hive, Pig, and HBase that get more out of a Hadoop-powered cluster.

Hadoop is an Apache Software Foundation project, with commercial support available from multiple vendors, including Cloudera, Hortonworks, and MapR. Apache documents a more complete set of commercial solutions.

Standard components of Hadoop and available complementary additions include:

  • Hadoop Distributed File System, HDFS (standard)
  • The MapReduce architecture (standard)
  • Hive, which provides a SQL-like interface to the MapReduce architecture
  • HBase, a distributed key-value service
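
To make the MapReduce item above more concrete, here is a minimal single-process sketch of the map, shuffle, and reduce phases, using a word count as the example. It is plain Python and does not use any Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration. On a real cluster the framework distributes these phases across nodes and reads its input from HDFS.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key.

    In Hadoop the framework does this between map and reduce;
    here it is simulated with an in-memory dictionary.
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["hadoop stores data", "hadoop processes data"]))
```

The point of the split is that each map call sees only one line and each reduce call sees only one key, which is what allows the work to be spread over many machines.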

Recommended reference sources:

262 questions
28
votes
3 answers

What is Hadoop and what is it used for?

I have been enjoying reading ServerFault for a while and I have come across quite a few topics on Hadoop. I have had a little trouble finding out what it does from a global point of view. So my question is quite simple: What is Hadoop? What does…
Antoine Benkemoun
  • 7,314
  • 3
  • 41
  • 60
13
votes
4 answers

In Hadoop, how to show current process of -copyFromLocal

I am still a newbie learner of Hadoop, and this time I was trying to process a 106GB file. I used -copyFromLocal to copy that big file to my Hadoop DFS, but since the file is big I have to wait for a long time without a clue about the current…
Bang Dao
  • 233
  • 2
  • 6
11
votes
2 answers

Moving the SecondaryName Node in a Cloudera HBase Cluster

I deployed the secondary namenode on the same machine as my main namenode. This is wrong for performance and durability reasons (the secondary namenode isn't a hot spare, but it does have a copy of needed metadata). I have found documentation on…
Kyle Brandt
  • 82,107
  • 71
  • 302
  • 444
9
votes
3 answers

Best choice for NTP client configuration

Let's see if someone can shed some light on this subject. I'm doing a server installation in the next few days. My client wants to deploy Hortonworks HDP with 2 master servers and 5 worker servers. One of the requirements for all of…
lgg
  • 141
  • 2
  • 11
9
votes
4 answers

DIY Hadoop Cluster - Heat & Dust issues?

Below are links to my DIY 6-node Hadoop cluster using i3 machines. What is the best way to protect my design from dust and provide better heat transfer? What should I use to cover the four sides of my rack to protect it from dust?
yogesh.panchal
  • 103
  • 1
  • 6
9
votes
4 answers

Hadoop JBOD disk configuration on HP Smart Array 410/i disk controller

I'm evaluating some hardware that could be used to set up a Hadoop cluster. The hardware is refurbished (HP G6 servers with a Smart Array 410/i controller) and we will probably have to use it, although we don't have it yet. I've read that the 410/i controller…
nysalsa
  • 91
  • 1
  • 1
  • 2
8
votes
1 answer

Could not start ZK at requested port of 2181, while export HBASE_MANAGES_ZK=false

Problem: The first aim was to run HBase standalone. Navigating to ip:60010/master-status is successful once HBase has been started. The second aim is to run a distinct ZooKeeper quorum. ZooKeeper has been downloaded and started: netstat…
030
  • 5,731
  • 12
  • 61
  • 107
8
votes
1 answer

Is it possible to manage 20 TB of data using MySQL?

I am working on a project, and my job is to build a database system to manage about 60,000,000,000 data entries. The background is that I have to store, in real time, a large number of messages that are read from about 30,000 RFID readers every…
lemuria
7
votes
1 answer

Set up a Windows 10 Client for a Linux KDC Realm

I set up a KDC Server and created a Realm EXAMPLE.COM. Here is my krb5.conf file: [libdefaults] renew_lifetime = 7d forwardable = true default_realm = EXAMPLE.COM ticket_lifetime = 24h dns_lookup_realm = false dns_lookup_kdc = false …
D. Müller
  • 251
  • 1
  • 2
  • 8
7
votes
2 answers

Hadoop HDFS Backup & DR Strategy

We are preparing to implement our first Hadoop cluster. As such, we are starting out small with a four-node setup (1 master node and 3 worker nodes). Each node will have 6TB of storage (6 x 1TB disks). We went with a SuperMicro 4-node chassis so…
Matt Keller
  • 221
  • 4
  • 7
7
votes
1 answer

Can a hadoop job be paused or suspended?

I'm using hadoop-0.20.2. Looking at hadoop fs, I am able to kill or fail an individual task. Is there any way to pause it so that the map slots are freed up for another task?
Dan R
  • 2,275
  • 1
  • 19
  • 27
6
votes
2 answers

Hadoop HDFS: set file block size from commandline?

I need to set the block size of a file when I load it into HDFS to some value lower than the cluster block size. For example, if HDFS is using 64 MB blocks, I may want a large file to be copied in with 32 MB blocks. I've done this before within a…
BigChief
  • 398
  • 1
  • 2
  • 12
6
votes
2 answers

Hadoop disk fail, what do you do?

I would like to know about your strategies for what to do when a disk in one of the Hadoop servers fails. Let's say I have multiple (>15) Hadoop servers and 1 namenode, and one of the 6 disks on a slave stops working; the disks are connected via SAS. I don't…
wlk
  • 1,643
  • 3
  • 14
  • 19
5
votes
0 answers

Spark Error: Failed to Send RPC to Datanode

We have quite a few issues with our Spark Thrift server. It is a new Ambari cluster and no Spark jobs are running now. From the log we can see an error message: Failed to send RPC 9053901149358924945 to /DATA NODE MACHINE:50149 Please advise why this…
shalom
  • 451
  • 12
  • 26
5
votes
1 answer

Hadoop - Name Node and Data Node on the same machine

We have 7 identical physical servers (2x 8-core CPUs, 128GB RAM, 8x 6TB disks) that will be used for Hadoop. All of the machines are connected to a 10G switch with dual 10G interfaces. Since we do not have many machines we want to use one of the…