Questions tagged [hadoop]

Hadoop is an open-source solution that provides a distributed, replicated file system and a production-grade MapReduce system. It also has a series of complementary additions, such as Hive, Pig, and HBase, that get more out of a Hadoop-powered cluster.

Hadoop is a project sponsored by the Apache Software Foundation, with commercial support provided by multiple vendors, including Cloudera, Hortonworks, and MapR. Apache documents a more complete list of commercial solutions.

Core components and complementary additions to Hadoop include:

Hadoop Distributed File System, HDFS (standard)
The MapReduce architecture (standard)
Hive, which provides a SQL-like interface to the MapReduce architecture
HBase, a distributed key-value service

Recommended reference sources:

Hive Language Reference

262 questions

3 answers

What is Hadoop and what is it used for?

I have been enjoying reading ServerFault for a while and I have come across quite a few topics on Hadoop. I have had a little trouble finding out what it does from a global point of view. So my question is quite simple: What is Hadoop? What does…

hadoop

asked Jun 18 '09 at 06:34

Antoine Benkemoun (reputation 7,314; badges: 3 gold, 41 silver, 60 bronze)

4 answers

In Hadoop, how do I show the current progress of -copyFromLocal?

I am still new to Hadoop, and this time I was trying to process a 106 GB file. I used -copyFromLocal to copy that big file to my Hadoop DFS, but since the file is big I have to wait for a long time without a clue about the current…

hadoop hdfs

asked Apr 11 '14 at 04:15

Bang Dao

2 answers

Moving the SecondaryNameNode in a Cloudera HBase Cluster

I deployed the secondary namenode on the same machine as my main namenode. This is wrong for performance and durability reasons (the secondary namenode isn't a hot spare, but it does have a copy of needed metadata). I have found documentation on…

hadoop hbase cloudera

asked Dec 10 '14 at 02:30

Kyle Brandt (reputation 82,107; badges: 71 gold, 302 silver, 444 bronze)

3 answers

Best choice for NTP client configuration

Let's see if someone can shed some light on this subject. I'm doing a server installation in the next few days. My client wants to deploy a Hortonworks HDP cluster with 2 master servers and 5 worker servers. One of the requirements for all of…

centos ntp hadoop

asked Sep 14 '15 at 08:12

lgg

4 answers

DIY Hadoop Cluster - Heat & Dust issues?

Following are links to my DIY 6-node Hadoop cluster built with i3 machines. What is the best way to protect my design from dust and provide better heat transfer? What should I use to cover the four sides of my rack to protect it from dust?

hardware cluster hadoop physical-environment

asked Jan 07 '13 at 13:20

yogesh.panchal

4 answers

Hadoop JBOD disk configuration on HP Smart Array 410/i disk controller

I'm in the evaluation phase for some hardware that could be used to set up a Hadoop cluster. This hardware is refurbished (HP G6 servers with Smart Array 410/i controllers) and we probably should/must use it... we don't have it yet. I've read that the 410/i controller…

hp hp-proliant hadoop hp-smart-array storage

asked May 09 '11 at 14:12

nysalsa

1 answer

Could not start ZK at requested port of 2181, while export HBASE_MANAGES_ZK=false

Problem: The first aim was to run HBase standalone. Navigating to ip:60010/master-status is successful once HBase has been started. The second aim is to run a distinct ZooKeeper quorum. ZooKeeper has been downloaded and has been started: netstat…

linux hadoop hbase zookeeper cloudera

asked May 30 '14 at 09:52

030 (reputation 5,731; badges: 12 gold, 61 silver, 107 bronze)

1 answer

Is it possible to manage 20 TB of data using MySQL?

I am working on a project and my job is to build a database system to manage about 60,000,000,000 data entries. The background is that I have to do real-time storage for a large number of messages read from about 30,000 RFID readers every…

mysql database hadoop hbase

asked Aug 25 '11 at 08:15

lemuria

1 answer

Set up a Windows 10 Client for a Linux KDC Realm

I set up a KDC Server and created a Realm EXAMPLE.COM. Here is my krb5.conf file:
[libdefaults]
  renew_lifetime = 7d
  forwardable = true
  default_realm = EXAMPLE.COM
  ticket_lifetime = 24h
  dns_lookup_realm = false
  dns_lookup_kdc = false
…

linux windows kerberos hadoop

asked Dec 01 '16 at 14:31

D. Müller

2 answers

Hadoop HDFS Backup & DR Strategy

We are preparing to implement our first Hadoop cluster. As such, we are starting out small with a four-node setup (1 master node and 3 worker nodes). Each node will have 6 TB of storage (6 x 1 TB disks). We went with a SuperMicro 4-node chassis so…

backup disaster-recovery hadoop hdfs

asked Aug 13 '13 at 23:32

Matt Keller

1 answer

Can a Hadoop job be paused or suspended?

I'm using hadoop-0.20.2. Looking at hadoop fs. I am able to kill or fail an individual task. Is there any way to pause it so that the map slots are freed up for another task?

hadoop

asked Dec 01 '10 at 16:57

Dan R (reputation 2,275; badges: 1 gold, 19 silver, 27 bronze)

2 answers

Hadoop HDFS: set file block size from commandline?

I need to set the block size of a file when I load it into HDFS to a value lower than the cluster block size. For example, if HDFS is using 64 MB blocks, I may want a large file to be copied in with 32 MB blocks. I've done this before within a…

hadoop block hdfs

asked Aug 11 '11 at 15:22

BigChief

2 answers

Hadoop disk failure: what do you do?

I would like to know your strategies for what to do when a disk in one of the Hadoop servers fails. Let's say I have multiple (>15) Hadoop servers and 1 namenode, and one of the 6 disks on a slave stops working; the disks are connected via SAS. I don't…

hardware hard-drive failover hadoop

asked Jun 25 '10 at 20:23

wlk (reputation 1,643; badges: 3 gold, 14 silver, 19 bronze)

0 answers

Spark Error: Failed to Send RPC to Datanode

We have quite a few issues with our Spark Thrift server. It is a new Ambari cluster and no Spark jobs are running now. From the log we can see an error message: Failed to send RPC 9053901149358924945 to /DATA NODE MACHINE:50149 Please advise why this…

hadoop apache-spark

asked Feb 07 '18 at 17:35

shalom

1 answer

Hadoop - Name Node and Data Node on the same machine

We have 7 identical physical servers (2x 8-core CPUs, 128 GB RAM, 8x 6 TB disks) that will be used for Hadoop. All of the machines are connected to a 10G switch with dual 10G interfaces. Since we do not have many machines, we want to use one of the…

hadoop

asked Mar 15 '16 at 08:32

Merve Aydınlılar
