
I'm using OpsCenter 5.1.3 with Cassandra 2.1.7 on Ubuntu 14.04 with LVM disks.

OpsCenter shows all information except disk utilization and storage capacity, and it keeps saying that it has trouble connecting to the agents (although data for the other stats keeps coming in normally). I have reinstalled the agents using the OpsCenter option.

In /var/log/datastax-agent/agent.log I see:

ERROR [os-metrics-4] 2015-07-06 12:56:00,468 Short os-stats collector failed java.lang.NullPointerException
at clojure.lang.Numbers.ops(Numbers.java:942)
at clojure.lang.Numbers.lt(Numbers.java:219)
at clojure.lang.Numbers.min(Numbers.java:4007)
at opsagent.rollup$add_value.invoke(rollup.clj:156)
at opsagent.rollup$add_value.invoke(rollup.clj:156)
at opsagent.rollup$process_keypair$fn__1435.invoke(rollup.clj:235)
at opsagent.cache$update_cache_value_default$fn__1163$fn__1164.invoke(cache.clj:25)
at clojure.lang.AFn.applyToHelper(AFn.java:161)
at clojure.lang.AFn.applyTo(AFn.java:151)
at clojure.lang.Ref.alter(Ref.java:174)
at clojure.core$alter.doInvoke(core.clj:2244)
at clojure.lang.RestFn.invoke(RestFn.java:425)
at opsagent.cache$update_cache_value_default$fn__1163.invoke(cache.clj:25)
at clojure.lang.AFn.call(AFn.java:18)
at clojure.lang.LockingTransaction.run(LockingTransaction.java:263)
at clojure.lang.LockingTransaction.runInTransaction(LockingTransaction.java:231)
at opsagent.cache$update_cache_value_default.invoke(cache.clj:24)
at opsagent.rollup$process_keypair.invoke(rollup.clj:235)
at opsagent.rollup$process_metric_map.invoke(rollup.clj:241)
at opsagent.os.collection$start_os_stat_collection$send_metric__15899.invoke(collection.clj:80)
at opsagent.os.linux_metrics$sendmap.invoke(linux_metrics.clj:12)
at opsagent.os.linux_metrics$report_mem_stats.invoke(linux_metrics.clj:134)
at opsagent.os.linux_metrics$collectors$wrap_short_collector__9128$fn__9129.invoke(linux_metrics.clj:270)
at opsagent.os.collection$start_pool$fn__15870.invoke(collection.clj:39)
at clojure.lang.AFn.run(AFn.java:24)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
David Sedeño
  • More info would be useful here: Which user does the agent run as? What command is it running to fetch the stats? Can you run that command manually as that user and see what happens? – dawud Jul 06 '15 at 12:55

1 Answer

OpsCenter developer here. Your missing storage capacity stats are almost certainly related to this Ubuntu bug. It recently bit me, and I did some fix validation to help get the fix released promptly, but it's still awaiting release as of today:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1465322

That bug causes df to return a non-zero exit status, which in turn causes the OpsCenter agent to think the df command has failed and ignore its output.
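A minimal sketch of how to check for that condition by hand, written in Python. The helper names `run_and_check` and `output_usable` are hypothetical and this is not the agent's actual code; it only assumes the behavior described above, where any non-zero exit status means the output is discarded.

```python
import subprocess


def run_and_check(cmd):
    """Run a command and return (exit_code, stdout, stderr) as text."""
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return proc.returncode, proc.stdout, proc.stderr


def output_usable(exit_code):
    # Assumption: mirrors the behavior described above, where a
    # non-zero exit status is treated as a failed command and the
    # output is ignored even if it looks complete.
    return exit_code == 0
```

Calling `run_and_check(["df"])` on an affected node, as the same user the agent runs as, should show a non-zero exit code together with an error message on stderr, even though stdout still lists the filesystems.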

As a workaround, roll back your kernel to something earlier than 3.2.0-86 build 125, which introduced the problem.

I'm not sure if your OpsCenter agent connection issues are related or not, but I'd fix your kernel bug first and then test again.

Cheers, Mike Lococo

Mike Lococo
  • Hi Mike, thanks for the response. But I use Ubuntu 14.04 with kernel 3.13.0-55-generic, and the bug you point to seems to affect 12.04. – David Sedeño Jul 08 '15 at 08:29
  • Hrm... good point. I misremembered the affected version list. There is a very simple manual test to determine whether df is causing issues: log into the Cassandra/DataStax node and run df manually. If you see the error message mentioned in the description of that Ubuntu ticket, you have the problem. If that isn't it, I don't have an immediate intuition about your issue. – Mike Lococo Jul 08 '15 at 14:44
  • The df command works on all nodes, but what exactly are the df parameters the agent runs? – David Sedeño Jul 09 '15 at 16:21
  • Looks like 'df --print-type --no-sync' on Linux. I doubt that's relevant, though. Without doing some deep spelunking on this, I'm out of ideas. Sorry. At first it sounded exactly like an issue I recently hit myself, but you're clearly seeing something different. – Mike Lococo Jul 09 '15 at 17:23