Highest Voted 'hdfs' Questions - Server Fault Stack Exchange

1

vote

1 answer

What version of HDFS is compatible with HBase stable?

HBase stable is currently hbase-0.90.4. What version(s) of HDFS is it compatible with?

hadoop hdfs hbase

asked Dec 21 '11 at 02:44

Aleksandr Levchuk

2,415
3
21
41

1

vote

1 answer

Processing pre-existing log files with Flume

I have a large set of log files that I need to extract data from. Is it possible to use Flume to read these files and dump them into HDFS (or Cassandra, or another data source) which I can then query? The documentation seems to suggest it's all…

cassandra hdfs apache-flume

asked Aug 23 '11 at 18:28

duckus

11
2

1

vote

0 answers

HDFS + how to disable the "du -sk" verification on data node disks

We are using an HDP cluster with 182 data node machines (HDP version 2.6.4, Ambari version 2.6.1). We see the following behavior on the data node machines (it happens on all data node machines and on all disks). When we perform the command as above…

redhat hdfs

asked Nov 28 '21 at 15:49

King David

433
4
17

0

votes

1 answer

AWS FSx for lustre with S3 vs EMR (with EMRFS) for spark jobs

We are currently using EMR for easy job submission for our spark jobs. Recently I came across the "FSx lustre + S3" solution that is being advertised as ideal for HPC situations. EMRFS however is also said to be optimized for this particular…

amazon-s3 hdfs lustre amazon-emr

asked Jan 12 '20 at 01:21

dimisjim

215
2
10

0

votes

1 answer

Is it possible to mix different RHEL OS versions in a Hadoop cluster?

We are using the following HDP cluster with Ambari. List of nodes and their RHEL versions: 3 master machines (with NameNode and ResourceManager), installed on RHEL 7.2; 312 data node machines, installed on RHEL 7.2; 5 Kafka machines, installed…

redhat rhel7 hadoop hdfs apache-spark

asked Nov 20 '19 at 19:48

shalom

451
12
26

0

votes

1 answer

HDFS block deletion speed - cause, expectance, tuning?

I have a small (testing) HDFS cluster which I use as snapshot backup space for Flink. Flink creates and deletes roughly 1000 (small) files per second. The namenode seems to handle this without problems at first, but over time the Number of Blocks…

hdfs

asked Nov 07 '19 at 07:01

Caesar

111
4

0

votes

0 answers

Any benefits of ZFS over EXT4 for data stream processing on top of HDFS?

I'm working on a data stream processing project in which I will be using Apache Flink and Apache Spark, and I want to use HDFS for storage. The development and testing will be done on a single node cluster with multiple physical disks. I have already…

zfs ext4 hadoop hdfs apache-spark

asked Oct 08 '19 at 15:36

HUSMEN

1
2

0

votes

1 answer

HDFS balancing: how to balance HDFS data?

We have Hadoop version 2.6.4. On the datanode machine we can see that HDFS data isn’t balanced. Some disks have different used sizes, such as sdb 11G and sdd 17G: /dev/sdd 20G 3.0G 17G 15% /grid/sdd /dev/sdb 20G 11G 9.3G 53% /grid/sdb <-- WHY…

linux hadoop hdfs big-data

asked Mar 07 '19 at 17:23

shalom

451
12
26

0

votes

0 answers

Datanode machine disk sizes

Is it important that the disks on worker (datanode) machines are all the same size? For example, we have an Ambari cluster with 3 worker machines (datanode machines); each datanode machine has 10 disks (7 disks with 50G and 3 disks with 48G…

linux redhat hadoop hdfs

asked Dec 23 '17 at 23:19

shalom

451
12
26

0

votes

1 answer

What is affected when running hadoop namenode -format?

We have an Ambari cluster (version 2.6) with 24 worker machines. We want to run the following commands only on the worker23 machine (because of a problem on worker23). Do these commands affect the file systems of all the workers, or only worker23? $…

linux hadoop hdfs

asked Nov 20 '17 at 17:48

jango

59
2
2
12

0

votes

1 answer

Copying files in HDFS stalls

Have a 35 node cluster with a high number of blocks in it: ≈450K blocks per data node. After configuration change (which contained rack reassignments and NameNode Xmx increase) HDFS became a problem. It's unable to perform copy operations on random…

hadoop hdfs

asked May 12 '17 at 04:44

inteloid

101
2

0

votes

1 answer

How to install Hadoop 2.4.1 on Windows with Spark 2.0.0

I want to set up a cluster using Hadoop in YARN mode. I want to use the Spark API for map-reduce and will use spark-submit to deploy my applications. I want to work on a cluster. Can anyone help me install Hadoop in a cluster on Windows?

windows-7 cluster hadoop hdfs apache-spark

asked Mar 13 '17 at 12:26

Sadim Nadeem

1

0

votes

1 answer

Why does DFSZKFailoverController kill the NameNode process in Hadoop?

I am trying to configure a Hadoop high availability cluster by following this tutorial: http://www.edureka.co/blog/how-to-set-up-hadoop-cluster-with-hdfs-high-availability/ When following that article, I face two main problems: 1. hdfs namenode…

high-availability hadoop hdfs zookeeper

asked Jul 17 '16 at 14:46

Oleksandr

703
2
10
17

0

votes

1 answer

Flume: error log while using FileChannel

I am using Flume flume-ng-1.5.0 (with CDH 5.4) to collect logs from many servers and sink them to HDFS. Here is my configuration: #Define Source , Sinks, Channel collector.sources = avro collector.sinks = HadoopOut collector.channels = fileChannel #…

hadoop hdfs cdh4 apache-flume

asked May 08 '15 at 11:39

Summer Nguyen

214
3
10

0

votes

1 answer

Hadoop: How to configure failover time for a datanode

I need to re-replicate blocks on my HDFS cluster in case a datanode fails. Actually, this appears to already happen after a period of maybe 10 min. However, I want to decrease this time and am wondering how to do so. I tried to set…

failover hadoop hdfs

asked Jan 21 '15 at 13:35

frlan

563
5
27

Questions tagged [hdfs]