Highest Voted 'big-data' Questions - Server Fault Stack Exchange

0

votes

1 answer

How to determine the yarn.scheduler.maximum-allocation-vcores value in an Ambari cluster

We have an Ambari cluster (version 2.6) with 3 worker machines. Each worker machine has 16 CPU cores (see picture below) and 32 GB of memory. According to: yarn.nodemanager.resource.cpu-vcores: Set to the appropriate number in…

linux hadoop big-data

asked Feb 11 '18 at 21:23

shalom

451
12
26

0

votes

1 answer

Why does the Ambari agent insist on creating another repository file?

We are installing the new Hadoop version 2.6.3.0 on Ambari 2.6.0. In the Ambari agent log we see the following: Writing File['/etc/yum.repos.d/ambari-hdp-51.repo'] because contents don't match. Why does Ambari create the file ambari-hdp-51.repo? is…

linux redhat hadoop big-data

asked Jan 14 '18 at 00:46

shalom

451
12
26

0

votes

1 answer

How to reconfigure Ambari service values with a blueprint.json file

We have many Ambari lab clusters (Apache Ambari version 2.5.0.3), with the Ambari agent installed on Red Hat Linux machines. My goal is to find a way to update the service values on all the Ambari clusters by automating the process. What we do…

linux hadoop json big-data

asked Aug 08 '17 at 14:04

shalom

451
12
26

0

votes

2 answers

At what point do you consider moving from the cloud to colocation?

I'm currently operating at a cost of about $25k - $40k per month on AWS. I have about 30TB of data indexed in Elasticsearch, running a 4 node production cluster, and another 4 node staging cluster. Each system in the cluster is an m4.2xlarge, with a…

amazon-web-services colocation big-data

asked Oct 28 '16 at 13:32

Franz Kafka

22
2
12

0

votes

0 answers

Quickest way to get large number of small files from remote FTP server

In Fintech, the following scenario seems fairly common: You've paid for access to a huge collection of data, but it is made available to you as thousands of little files, each with a footprint in the neighborhood of 300 kB, but altogether amounting…

ftp bandwidth compression big-data

asked Sep 11 '16 at 23:02

StudentsTea

165
9

0

votes

1 answer

MySQL Cluster ndb_restore fails without error

I have been working to migrate our current single instance database to a new clustered database running MySQL cluster. It is a large database (several billion records) and, while it seems to be working reasonably well, I am having difficulty…

mysql database-backup mysql-cluster big-data

asked Apr 07 '15 at 09:21

egmackenzie

101
3

0

votes

0 answers

Hadoop - On the Wire Performance Monitoring?

I have been tasked with implementing an 'on the wire' monitoring solution for a large Hadoop installation. The source of data will be a combination of taps and SPANs throughout the environment. My team's usual charter is one of packet analysis and…

performance-monitoring network-monitoring hadoop packet-analyzer big-data

asked Mar 15 '14 at 01:30

user212869

1

0

votes

1 answer

mhddfs does not support splitting a single file across multiple hard drives if the file size exceeds the limit of a single storage device

I'm using mhddfs to combine multiple drives that are mounted over the network using NFS. For example, there are three machines: Server 1 (/home, 10 GB of space), Server 2 (/home, 10 GB of space), and Server 3 (/home, 10 GB of space). Using NFS i…

linux ubuntu hard-drive virtual-directory big-data

asked Nov 03 '13 at 16:51

Imran

101
3

0

votes

0 answers

Suggestion for Non Analytical Distributed Processing Frameworks

Can someone please suggest a tool, framework or a service to perform the below task faster. Input : The input to the service is a CSV file which consists of an identifier and several image columns with over a million rows. Objective: To check if any…

google-cloud-platform distributed-computing big-data apache-spark

asked May 31 '22 at 04:15

Kishan M Mohan

1
1

0

votes

0 answers

HDFS: results from hdfs fsck / differ from hdfs dfsadmin -report

We have a Hadoop cluster (Ambari platform with HDP version 2.6.4), and we performed a verification step to understand whether we have under-replicated blocks. The first verification was with: su hdfs hdfs fsck / - --> it gives the results: …

linux hadoop hdfs big-data

asked Jan 11 '22 at 15:22

King David

433
4
17

0

votes

1 answer

Spark-Cassandra-Connector Issue Exception in thread "main" java.lang.NoClassDefFoundError: com/datastax/spark/connector/rdd/reader/RowReaderFactory

What is going wrong with the Spark Cassandra Connector? Could you please help solve this? Scala File: import com.datastax.spark.connector._ import org.apache.spark.sql.SparkSession import org.apache.spark.{SparkConf, SparkContext} object…

cluster cassandra datastax-enterprise big-data apache-spark

asked Dec 28 '20 at 18:22

Subhrangshu Adhikary

109
1

0

votes

1 answer

How do megasites like Youtube perform backups?

How do megasites like Youtube perform backups? According to https://www.quora.com/Where-does-YouTube-store-so-many-videos, in 2014 they stored 76 PB every year, a number that has almost certainly increased a lot since then. Is it even possible to back up…

backup big-data

asked Nov 14 '20 at 02:44

d-b

125
3

-2

votes

1 answer

0.5 TB SQL Server database: is it possible to store it on a standalone server?

We are facing the task of storing 0.5 TB of data on a SQL Server 2008 server. Is it possible to do it on a standalone server? Later we also want to query the data to generate statistics (a lot of GROUP BYs, inner joins, etc.) but the…

sql-server-2008 storage sql query big-data

asked Aug 05 '14 at 10:49

X.Otano

103
5
