Questions tagged [hadoop]

Hadoop is an open-source solution that provides a distributed, replicated file system and a production-grade MapReduce system, along with a series of complementary additions such as Hive, Pig, and HBase that get more out of a Hadoop-powered cluster.

Hadoop is an Apache Software Foundation project, with commercial support provided by multiple vendors, including Cloudera, Hortonworks, and MapR. The Apache project documents a more complete list of commercial solutions.

Standard components and available complementary additions include:

  • The Hadoop Distributed File System, HDFS (standard)
  • The MapReduce architecture (standard)
  • Hive, which provides a SQL-like interface to the MapReduce architecture
  • HBase, a distributed key-value service

Recommended reference sources:

262 questions
2
votes
1 answer

Management of available file descriptors within a Hadoop cluster

I'm currently in charge of a rapidly-growing Hadoop cluster for my employer, currently built upon release 0.21.0 with CentOS as the OS for each worker and master node. I've worked through most of the standard configuration issues (load-balancing, IO…
MrGomez
  • 163
  • 6
2
votes
2 answers

Pairing MySQL and NoSQL Solutions

We have some fairly large datasets (user events and server log information - >100 GB) that are becoming fairly unwieldy for data processing. I've seen lots of activity around NoSQL/Hadoop/etc and I was wondering what SV had to say about a paired…
aronchick
  • 685
  • 3
  • 7
  • 14
2
votes
1 answer

Is there any way to use arrays in a Puppet module (not in a template)?

I want to use Puppet to manage a Hadoop cluster. On the machines we have several directories which must be created and have their permissions set. But I'm unable to add array values for defined methods. define hdfs_site( $dirs ) { file { $dirs: …
KARASZI István
  • 207
  • 3
  • 13
2
votes
0 answers

Unable to start Apache Kylin

Good morning, I am trying to install Kylin-3.1.1 on a remote Linux server. I made sure it met all software requirements, and I have already installed the following programs: apache-hive-3.1.2-bin, kylin-3.1.1-bin-hadoop3, kafka_2.12-2.5.0, hadoop-3.3.0…
user617409
  • 21
  • 1
1
vote
1 answer

List all files in an HDFS directory

Due to an error in one component, files have accumulated in HDFS and the number is huge, i.e. 2123516. I want to list all the files and copy their names into one file, but when I run the following command, it gives a Java heap space error. hdfs dfs -ls…
innervoice
  • 21
  • 5
1
vote
0 answers

How to install Hadoop reusably on a local network?

For a multi-node image cluster at an institute, we have several laptops and machines, and we want to create a Hadoop cluster with HBase on top for indexing the data/images. I have tried some VMware and Docker solutions, but most tutorials are…
madik_atma
  • 111
  • 2
1
vote
1 answer

Hadoop: Failed to start backup node, bad state: DROP_UNTIL_NEXT_ROLL

I have created a small Hadoop cluster setup with 1 NameNode and 1 DataNode to get hands-on experience. Below are my configuration files: core-site.xml fs.defaultFS
Dipak
  • 111
  • 2
1
vote
1 answer

ZooKeeper best practice - using only SSD disks

We have HDP clusters, versions 2.6.0 / 2.6.1 (Hortonworks), and maybe version 3.0 in the future. I searched a lot in the Hortonworks documentation but did not find details about using SSD disks for ZooKeeper, but on Confluent we can see that…
shalom
  • 451
  • 12
  • 26
1
vote
2 answers

Can we mix MTU values in a cluster?

We have a Hadoop cluster (all machines are Red Hat Linux 7.x). On the VMs we set MTU=8900, and on all other machines we set MTU=9000. We set MTU=8900 on the VMs because we saw some network problems with MTU=9000. My question: does mix…
shalom
  • 451
  • 12
  • 26
1
vote
0 answers

Sample output of Rumen or Input to Gridmix

I want to see JobHistory logs that can be fed as input to Rumen. More specifically, I am interested in the input format for Gridmix. I tried the following two things: 1) I found these files: . What is this file exactly? Is this…
PHcoDer
  • 111
  • 2
1
vote
1 answer

Using POSIX attributes instead of normal LDAP?

Due to the way a piece of software we use interacts with Unix, when I am setting up a certain application to interact with LDAP I need to use POSIX attributes instead of normal LDAP attributes. So far all I have found is that for…
Josh
  • 111
  • 4
1
vote
0 answers

spark.dynamicAllocation + setting the Spark parameters according to the Ambari cluster

We want to find the values for the following Spark parameters according to inputs such as the memory on each DataNode machine, the CPU cores on each DataNode machine, the number of DataNode machines, etc.: spark.dynamicAllocation.initialExecutors =…
shalom
  • 451
  • 12
  • 26
1
vote
0 answers

Ambari cluster + when do we need to set block replication to 1?

We get the following in Spark logs: java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage DatanodeInfoWithStorage\ The current…
shalom
  • 451
  • 12
  • 26
1
vote
1 answer

Redirecting Ambari-Server backup file creation to a different location

I am taking a backup of my Ambari server using the command ambari-server backup. This creates the backup file in the location /var/lib/ambari-server/. I want the backup to go to a different location, and I cannot find a way to do it. The help…
Gautam Somani
  • 296
  • 3
  • 14
1
vote
0 answers

Does an Ambari cluster need SSH access from the ambari-server machine to all other hosts?

We installed an Ambari cluster with 3 master machines, with the Ambari server installed on the master02 Linux machine. The cluster also includes 25 DataNode machines and 5 Kafka machines. Does ambari-server need SSH access to all other machines in the…
shalom
  • 451
  • 12
  • 26