Questions tagged [hadoop]

Hadoop is an open-source project that provides a distributed, replicated file system and a production-grade MapReduce system, along with a series of complementary additions such as Hive, Pig, and HBase that get more out of a Hadoop-powered cluster.

Hadoop is a project sponsored by the Apache Software Foundation, with commercial support available from multiple vendors, including Cloudera, Hortonworks, and MapR. Apache documents a more complete list of commercial solutions.

Standard components and available complementary additions include:

Hadoop Distributed File System, HDFS (standard)
The MapReduce architecture (standard)
Hive, which provides a SQL-like interface to the MapReduce architecture
HBase, a distributed key-value store

Recommended reference sources:

Hive Language Reference

262 questions

votes

1 answer

Management of available file descriptors within a Hadoop cluster

I'm currently in charge of a rapidly-growing Hadoop cluster for my employer, currently built upon release 0.21.0 with CentOS as the OS for each worker and master node. I've worked through most of the standard configuration issues (load-balancing, IO…

asked Dec 02 '10 at 20:03

MrGomez

votes

2 answers

Pairing MySQL and NoSQL Solutions

We have some fairly large datasets (user events and server log information - >100 GB) that are becoming fairly unwieldy for data processing. I've seen lots of activity around NoSQL/Hadoop/etc and I was wondering what SV had to say about a paired…

mysql hadoop nosql

asked Jul 26 '10 at 02:31

aronchick

votes

1 answer

Is there any way to use arrays in a puppet module (not in template)?

I want to use Puppet to manage a Hadoop cluster. On the machines we have several directories which must be created and have their permissions set. But I'm unable to add array values for defined methods. define hdfs_site( $dirs ) { file { $dirs: …

configuration puppet hadoop

asked May 10 '10 at 15:09

KARASZI István

votes

0 answers

Unable to start Apache Kylin

Good morning, I am trying to install Kylin-3.1.1 on a remote Linux server. I made sure it met all software requirements, and I have already installed the following programs: apache-hive-3.1.2-bin , kylin-3.1.1-bin-hadoop3 , kafka_2.12-2.5.0 , hadoop-3.3.0…

hadoop hbase

asked Feb 15 '21 at 07:47

user617409

vote

1 answer

List all files in hdfs directory

Due to an error in one component, files have accumulated in HDFS and the number is huge, i.e. 2123516. I want to list all the files and copy their names into one file, but when I run the following command, it gives a Java heap space error. hdfs dfs -ls…

hadoop hdfs

asked Jan 21 '20 at 05:59

innervoice

vote

0 answers

How to install hadoop reusable on a local network?

For a multi-node image cluster at an institute, we have several laptops and machines and we want to create a Hadoop cluster with HBase on top for indexing the data/images. I have tried some VMware and Docker solutions, but most tutorials are…

installation best-practices hadoop

asked Apr 22 '19 at 14:31

madik_atma

vote

1 answer

Hadoop: Failed to start backup node, bad state: DROP_UNTIL_NEXT_ROLL

I have created a small Hadoop cluster setup with 1 NameNode and 1 DataNode to get hands-on experience. Below are my configuration files: Core-site.xml fs.defaultFS …

hadoop hdfs

asked Mar 06 '19 at 10:12

Dipak

vote

1 answer

Zookeeper best practice - using only SSD disks

We have HDP clusters, versions 2.6.0 / 2.6.1 (Hortonworks), and maybe version 3.0 in the future. I searched the Hortonworks documentation extensively but did not find details about using SSD disks for ZooKeeper, but on Confluent we can see that…

redhat hard-drive ssd hadoop zookeeper

asked Jan 09 '19 at 18:19

shalom

vote

2 answers

can we mix MTU values in cluster

We have a Hadoop cluster (all machines are Red Hat Linux machines, version 7.x). On the VMs we set MTU=8900, and on all other machines we set MTU=9000. We set MTU=8900 on the VMs because we saw some network problems with MTU=9000. My question: does mix…

linux networking hadoop mtu jumboframes

asked Jun 25 '18 at 12:11

shalom

vote

0 answers

Sample output of Rumen or Input to Gridmix

I want to see JobHistory logs, which can be fed as input to Rumen. More specifically, I am interested in knowing the input format for Gridmix. I tried the following two things for it: 1) I found this file: . What is this file exactly? Is this…

hadoop hdfs mapreduce

asked Apr 06 '18 at 20:24

PHcoDer

vote

1 answer

Using posix attributes instead of normal LDAP?

Due to the way a piece of software we use interacts with Unix, when I am setting up a certain application to interact with LDAP I need to use POSIX attributes instead of normal LDAP attributes. So far all I have found is that for…

ldap hadoop posix

asked Mar 23 '18 at 13:33

Josh

vote

0 answers

spark.dynamicAllocation + setting the spark parameters according to ambari cluster

We want to find values for the following Spark parameters based on inputs such as the memory on each DataNode machine, the CPU cores on each DataNode machine, the number of DataNode machines, etc.: spark.dynamicAllocation.initialExecutors =…

hadoop big-data apache-spark

asked Feb 08 '18 at 20:51

shalom

vote

0 answers

ambari cluster + when need to set Block replication to 1

We get the following in Spark logs: java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage DatanodeInfoWithStorage\ The current…

hadoop hdfs apache-spark

asked Jan 31 '18 at 09:20

shalom

vote

1 answer

Redirecting Ambari-Server backup file creation to a different location

I am taking a backup of my Ambari server using the command ambari-server backup. This creates the backup file in the location /var/lib/ambari-server/. I want the backup to go to a different location, and I cannot find a way to do it. The help…

backup hadoop

asked Jan 13 '18 at 13:27

Gautam Somani

vote

0 answers

Does an Ambari cluster need SSH access from the ambari-server machine to all other hosts?

We installed an Ambari cluster with 3 master machines, with the Ambari server installed on the master02 Linux machine. The cluster also includes 25 DataNode machines and 5 Kafka machines. Does ambari-server need SSH access to all other machines in the…

linux ssh hadoop

asked Dec 21 '17 at 16:14

shalom
