How to monitor memory IO on Linux


There are many tools for monitoring disk IO, such as dstat.

Is there a tool that can monitor DRAM IO, for example how many MB of data are read from DRAM per second?

Eugene

Posted 2019-12-02T08:38:01.217

Reputation: 195

What CPU (exact model) do you have? – Daniel B – 2019-12-02T08:43:48.740

@DanielB I'm using Intel(R) Xeon(R) Gold 6240 – Eugene – 2019-12-02T09:02:30.907

Answers


Since you have an Intel CPU, you should be able to use Processor Counter Monitor (PCM), an Intel tool that is now open source. If I read the build instructions correctly, compiling it on Linux requires only g++ and make.

Before running it, make sure the msr kernel module is loaded (sudo modprobe msr) or built into the kernel.
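One way to check this before starting PCM is the small Python sketch below. It is a hypothetical helper, not part of PCM: it looks for msr in /proc/modules (which lists only loadable modules) and for the /dev/cpu/0/msr device node, which the msr driver creates whether it is a module or built in (assuming a normal udev/devtmpfs setup).

```python
import os


def module_listed(proc_modules_text: str, name: str = "msr") -> bool:
    """True if `name` is the first field of any line in /proc/modules content.

    Built-in drivers never appear in /proc/modules, so False here does not
    rule out a built-in msr driver.
    """
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False


def msr_device_present(cpu: int = 0) -> bool:
    """True if the msr driver's per-CPU device node exists (module or built-in)."""
    return os.path.exists(f"/dev/cpu/{cpu}/msr")


if __name__ == "__main__":
    try:
        with open("/proc/modules") as f:
            listed = module_listed(f.read())
    except OSError:  # e.g. not running on Linux
        listed = False
    present = msr_device_present()
    print("msr listed in /proc/modules:", listed)
    print("/dev/cpu/0/msr present:", present)
    if not present:
        print("msr driver not available; try: sudo modprobe msr")
```

If the device node is missing, loading the module with sudo modprobe msr and rerunning the check should make it appear.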

With your CPU, you should be able to use the pcm-memory.x utility. I can't use it myself, so I don't know what its output looks like.

Even if pcm-memory.x does not support your CPU, you can still get overall memory bandwidth statistics from pcm.x. Its output looks like this:

$ sudo ./pcm.x -i=1 -nc

 Processor Counter Monitor  ($Format:%ci ID=%h$)


IBRS and IBPB supported  : no
STIBP supported          : no
Spec arch caps supported : no
Number of physical cores: 4
Number of logical cores: 8
Number of online logical cores: 8
Threads (logical cores) per physical core: 2
Num sockets: 1
Physical cores per socket: 4
Core PMU (perfmon) version: 4
Number of core PMU generic (programmable) counters: 4
Width of generic (programmable) counters: 48 bits
Number of core PMU fixed counters: 3
Width of fixed counters: 48 bits
Nominal core frequency: 3600000000 Hz
Package thermal spec power: 65 Watt; Package minimum power: 0 Watt; Package maximum power: 0 Watt;
Trying to use Linux perf events...
Successfully programmed on-core PMU using Linux perf

Detected Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz "Intel(r) microarchitecture codename Kabylake" stepping 9 microcode level 0x5e

 EXEC  : instructions per nominal CPU cycle
 IPC   : instructions per CPU cycle
 FREQ  : relation to nominal CPU frequency='unhalted clock ticks'/'invariant timer ticks' (includes Intel Turbo Boost)
 AFREQ : relation to nominal CPU frequency while in active state (not in power-saving C state)='unhalted clock ticks'/'invariant timer ticks while in C0-state'  (includes Intel Turbo Boost)
 L3MISS: L3 (read) cache misses
 L2MISS: L2 (read) cache misses (including other core's L2 cache *hits*)
 L3HIT : L3 (read) cache hit ratio (0.00-1.00)
 L2HIT : L2 cache hit ratio (0.00-1.00)
 L3MPI : number of L3 (read) cache misses per instruction
 L2MPI : number of L2 (read) cache misses per instruction
 READ  : bytes read from main memory controller (in GBytes)
 WRITE : bytes written to main memory controller (in GBytes)
 IO    : bytes read/written due to IO requests to memory controller (in GBytes); this may be an over estimate due to same-cache-line partial requests
 TEMP  : Temperature reading in 1 degree Celsius relative to the TjMax temperature (thermal headroom): 0 corresponds to the max temperature
 energy: Energy in Joules


 Core (SKT) | EXEC | IPC  | FREQ  | AFREQ | L3MISS | L2MISS | L3HIT | L2HIT | L3MPI | L2MPI |  TEMP

---------------------------------------------------------------------------------------------------------------
 SKT    0     0.02   1.05   0.02    0.39     402 K   1770 K    0.76    0.53    0.00    0.00     67
---------------------------------------------------------------------------------------------------------------
 TOTAL  *     0.02   1.05   0.02    0.39     402 K   1770 K    0.76    0.53    0.00    0.00     N/A

 Instructions retired:  487 M ; Active cycles:  462 M ; Time (TSC): 3602 Mticks ; C0 (active,non-halted) core residency: 4.12 %

 C1 core residency: 9.26 %; C3 core residency: 0.59 %; C6 core residency: 2.14 %; C7 core residency: 83.89 %;
 C0 package residency: 36.94 %; C2 package residency: 63.06 %; C3 package residency: 0.00 %; C6 package residency: 0.00 %; C7 package residency: 0.00 %; C8 package residency: 0.00 %; C9 package residency: 0.00 %; C10 package residency: 0.00 %;
                             ┌───────────────────────────────────────────────────────────────────────────────┐
 Core    C-state distribution│0001111111667777777777777777777777777777777777777777777777777777777777777777777│
                             └───────────────────────────────────────────────────────────────────────────────┘
                             ┌────────────────────────────────────────────────────────────────────────────────┐
 Package C-state distribution│00000000000000000000000000000022222222222222222222222222222222222222222222222222│
                             └────────────────────────────────────────────────────────────────────────────────┘

 PHYSICAL CORE IPC                 : 2.11 => corresponds to 52.65 % utilization for cores in active state
 Instructions per nominal CPU cycle: 0.03 => corresponds to 0.85 % core utilization over time interval
 SMI count: 0
---------------------------------------------------------------------------------------------------------------
MEM (GB)->|  READ |  WRITE |   IO   | CPU energy |
---------------------------------------------------------------------------------------------------------------
 SKT   0     0.24     0.03     0.00       1.88
---------------------------------------------------------------------------------------------------------------
Cleaning up
 Zeroed uncore PMU registers

The -i=1 option limits pcm.x to a single measurement; without it, the output repeats periodically. If you leave out -nc, you also get per-core execution statistics instead of just the totals.

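The memory statistics are at the bottom: per socket, READ and WRITE show how many GB were read from and written to the memory controller during the measurement interval.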
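To turn that table into the MB/s figure asked about in the question, divide by the interval length. The Python sketch below does this for saved pcm.x text (for example captured with sudo ./pcm.x -i=1 -nc > pcm.txt). The function names are hypothetical, the regexes only match the layout shown above, and it assumes PCM's GB means 10^9 bytes and that the TSC ticks at the nominal core frequency. For the sample above, 3602 Mticks at 3.6 GHz is about 1.0006 s, so 0.24 GB read works out to roughly 240 MB/s.

```python
import re

# A per-socket row of the MEM table, e.g. " SKT   0     0.24     0.03     0.00       1.88"
ROW = re.compile(r"\s*SKT\s+(\d+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)")


def parse_mem_table(output: str) -> dict:
    """Map socket -> (read_gb, write_gb, io_gb, cpu_energy_j) from the MEM section."""
    rows = {}
    in_mem = False
    for line in output.splitlines():
        if line.startswith("MEM (GB)->"):
            in_mem = True  # only SKT rows after this header belong to the MEM table
            continue
        m = ROW.match(line) if in_mem else None
        if m:
            rows[int(m.group(1))] = tuple(float(v) for v in m.group(2, 3, 4, 5))
    return rows


def elapsed_seconds(output: str) -> float:
    """Interval length, assuming the TSC ticks at the nominal core frequency."""
    hz = float(re.search(r"Nominal core frequency:\s*(\d+)\s*Hz", output).group(1))
    mticks = float(re.search(r"Time \(TSC\):\s*([\d.]+)\s*Mticks", output).group(1))
    return mticks * 1e6 / hz


def gb_to_mb_per_s(gb: float, seconds: float) -> float:
    """Convert GB (assumed 10**9 bytes) per interval to MB/s (10**6 bytes per second)."""
    return gb * 1000.0 / seconds


# Relevant lines copied from the pcm.x run shown above.
SAMPLE = """\
Nominal core frequency: 3600000000 Hz
 Instructions retired:  487 M ; Active cycles:  462 M ; Time (TSC): 3602 Mticks ; C0 (active,non-halted) core residency: 4.12 %
MEM (GB)->|  READ |  WRITE |   IO   | CPU energy |
 SKT   0     0.24     0.03     0.00       1.88
"""

if __name__ == "__main__":
    secs = elapsed_seconds(SAMPLE)
    for sock, (read_gb, write_gb, _io, _energy) in sorted(parse_mem_table(SAMPLE).items()):
        print(f"socket {sock}: read {gb_to_mb_per_s(read_gb, secs):.1f} MB/s, "
              f"write {gb_to_mb_per_s(write_gb, secs):.1f} MB/s")
```

To use it on a real run, pass the captured pcm.x text instead of SAMPLE. Dividing by the measured interval rather than assuming exactly one second keeps the result correct if pcm.x sleeps slightly longer than requested.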

Daniel B

Posted 2019-12-02T08:38:01.217

Reputation: 40 502

Thanks for this detailed explanation. It reminded me of another Intel CLI tool for monitoring system performance, including memory bandwidth: https://software.intel.com/en-us/download/emon-user-guide

– Eugene – 2019-12-02T14:35:47.717