Questions tagged [big-data]

28 questions
0
votes
1 answer

How to determine the yarn.scheduler.maximum-allocation-vcores value in an Ambari cluster

We have an Ambari cluster (version 2.6) with 3 worker machines. Each worker machine has 16 CPU cores (see the picture below) and 32 GB of memory. According to: yarn.nodemanager.resource.cpu-vcores: Set to the appropriate number in…
shalom
  • 451
  • 12
  • 26
0
votes
1 answer

Why does the Ambari agent insist on creating another repository file?

We are installing the new Hadoop version (2.6.3.0) on Ambari 2.6.0. In the Ambari agent log we see the following: Writing File['/etc/yum.repos.d/ambari-hdp-51.repo'] because contents don't match. Why does Ambari create the file ambari-hdp-51.repo? Is…
shalom
  • 451
  • 12
  • 26
0
votes
1 answer

How to reconfigure Ambari service values with a blueprint.json file

We have many Ambari lab clusters (Apache Ambari version 2.5.0.3), with the Ambari agent installed on Red Hat Linux machines. My goal is to find a way to update service values on all the Ambari clusters by automating the process. What we do…
shalom
  • 451
  • 12
  • 26
0
votes
2 answers

At what point do you consider moving from the cloud to colocation?

I'm currently operating at a cost of about $25k - $40k per month on AWS. I have about 30TB of data indexed in Elasticsearch, running a 4 node production cluster, and another 4 node staging cluster. Each system in the cluster is an m4.2xlarge, with a…
Franz Kafka
  • 22
  • 2
  • 12
0
votes
0 answers

Quickest way to get large number of small files from remote FTP server

In Fintech, the following scenario seems fairly common: You've paid for access to a huge collection of data, but it is made available to you as thousands of little files, each with a footprint in the neighborhood of 300 kB, but altogether amounting…
StudentsTea
  • 165
  • 9
0
votes
1 answer

MySQL Cluster ndb_restore fails without error

I have been working to migrate our current single instance database to a new clustered database running MySQL cluster. It is a large database (several billion records) and, while it seems to be working reasonably well, I am having difficulty…
egmackenzie
  • 101
  • 3
0
votes
0 answers

Hadoop - On the Wire Performance Monitoring?

I have been tasked with implementing an 'on the wire' monitoring solution for a large Hadoop installation. The source of data will be a combination of taps and SPANs throughout the environment. My team's usual charter is one of packet analysis and…
0
votes
1 answer

mhddfs does not support splitting a single file across multiple hard drives if the file size exceeds the limit of a single storage device

I'm using mhddfs to combine multiple drives that are mounted over the network using NFS. For example, there are three machines (Server Name, Dir, Space): Server 1, /home, 10 GB; Server 2, /home, 10 GB; Server 3, /home, 10 GB. Using NFS i…
Imran
  • 101
  • 3
0
votes
0 answers

Suggestion for Non Analytical Distributed Processing Frameworks

Can someone please suggest a tool, framework, or service to perform the task below faster? Input: The input to the service is a CSV file which consists of an identifier and several image columns with over a million rows. Objective: To check if any…
0
votes
0 answers

HDFS: results from hdfs fsck / differ from hdfs dfsadmin -report

We have a Hadoop cluster (Ambari platform with HDP version 2.6.4) and we performed a verification step to check whether we have under-replicated blocks. The first verification was with: su hdfs hdfs fsck / - --> it gives the results: …
King David
  • 433
  • 4
  • 17
0
votes
1 answer

Spark-Cassandra-Connector Issue Exception in thread "main" java.lang.NoClassDefFoundError: com/datastax/spark/connector/rdd/reader/RowReaderFactory

What is going wrong with the Spark Cassandra Connector? Could you please help solve this? Scala File: import com.datastax.spark.connector._ import org.apache.spark.sql.SparkSession import org.apache.spark.{SparkConf, SparkContext} object…
0
votes
1 answer

How do megasites like YouTube perform backups?

How do megasites like YouTube perform backups? According to https://www.quora.com/Where-does-YouTube-store-so-many-videos, in 2014 they stored 76 PB every year, a number that most certainly has increased a lot since then. Is it even possible to back up…
d-b
  • 125
  • 3
-2
votes
1 answer

0.5 TB SQL Server database: is it possible to store it on a standalone server?

We are facing the task of storing 0.5 TB of data in a SQL Server 2008 server. Is it possible to do it on a standalone server? Later we also want to query the data to generate statistics (a lot of GROUP BYs, inner joins, etc.) but the…
X.Otano
  • 103
  • 5