37

I was using df -h to print out human-readable disk usage. I would like to figure out what is taking up so much space. For instance, is there a way to pipe this command so that it prints out files that are larger than 1 GB in size? Any other ideas?

Thanks

syn4k

7 Answers

41

You may want to try the ncdu utility found at: http://dev.yorhel.nl/ncdu

It will quickly sum the contents of a filesystem or directory tree and print the results, sorted by size. It's a really nice way to drill down interactively and see what's consuming drive space.

Additionally, it can be faster than some du combinations.
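
For example, a minimal invocation might look like this (a sketch; -x keeps the scan on a single filesystem and -q lowers the refresh rate, which helps over slow ssh links):

ncdu -xq /data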

The typical output looks like:

ncdu 1.7 ~ Use the arrow keys to navigate, press ? for help                                                         
--- /data ----------------------------------------------------------------------------------------------------------
  163.3GiB [##########] /docimages                                                                                  
   84.4GiB [#####     ] /data
   82.0GiB [#####     ] /sldata
   56.2GiB [###       ] /prt
   40.1GiB [##        ] /slisam
   30.8GiB [#         ] /isam
   18.3GiB [#         ] /mail
   10.2GiB [          ] /export
    3.9GiB [          ] /edi   
    1.7GiB [          ] /io     
    1.2GiB [          ] /dmt
  896.7MiB [          ] /src
  821.5MiB [          ] /upload
  691.1MiB [          ] /client
  686.8MiB [          ] /cocoon
  542.5MiB [          ] /hist
  358.1MiB [          ] /savsrc
  228.9MiB [          ] /help
  108.1MiB [          ] /savbin
  101.2MiB [          ] /dm
   40.7MiB [          ] /download
ewwhite
  • +1 - I've been meaning to figure this out myself for a while. BTW, `ncdu` is also available via yum (CentOS, RHEL, Fedora) using the epel repo: http://fedoraproject.org/wiki/EPEL. `sudo ncdu -q /` works nicely over ssh. I really like the way I can drill down into folders using the arrow keys. – dunxd Aug 01 '12 at 10:46
  • 1
    +1 this tool is just the best! – TheSquad Jul 09 '13 at 08:50
  • Oh ncdu where have you been all my life. – ekerner Aug 20 '17 at 19:52
28

I use this one a lot.

du -kscx *

It can take a while to run, but it'll tell you where the disk space is being used.
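
As the comments below note, you can pipe it through sort to get the biggest consumers at the bottom (a sketch; -n sorts the kilobyte counts numerically):

du -kscx * | sort -n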

toppledwagon
  • 3
    I use a variant on this: the h flag instead of k, which gives you a similarly "human readable" format to df. – Mitch Kent Oct 28 '14 at 10:13
  • 2
    The advantage is that with the `-k` switch 20MB in kilobytes is a much smaller number than 2GB in kilobytes... easier to `| sort -n` – HBruijn Dec 17 '15 at 07:48
9

You can use the find command. Example:

find /home/ -size +1073700000c -print
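
A variation (a sketch, assuming GNU find, which accepts size suffixes such as M and G) that also lists the size of each match:

find /home/ -size +1G -exec ls -lh {} +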
ChriSxStyles
  • 2
    Newer versions of GNU find also support M (megabytes) as a modifier, e.g. `find /path -size +1024M` – Jodie C Aug 16 '11 at 00:19
7

I myself use

du -c --max-depth=4 /dir | sort -n

This returns the amount of space used by a directory and its subdirectories, up to four levels deep; sort -n puts the largest last.

Newer versions of sort can handle "human-readable" sizes, so you can use the much more readable:

du -hc --max-depth=4 /dir | sort -h
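
If the listing is too long, you can keep only the largest entries (a sketch; tail shows the last, i.e. biggest, lines of the sorted output):

du -hc --max-depth=4 /dir | sort -h | tail -n 20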
Hubert Kario
2

With human-readable sizes:

du -hscx *
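
Combined with a human-size-aware sort, as mentioned in the previous answer (a sketch, assuming GNU sort with -h):

du -hscx * | sort -h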
Jekis
2

Recursively search for big files in one directory

Since I often have to determine what is taking up so much space, I wrote this little script to search for large entries on a specific device (without an argument, it browses the current directory, looking for directory entries larger than 256 MB):

#!/bin/bash

humansize() {
    # Convert a raw count into a human-readable value; if a second
    # argument is given, store the result in that variable (via printf -v).
    local _c=$1 _i=0 _a=(b K M G T P)
    while [ ${#_c} -gt 3 ] ;do
        ((_i++))
        _c=$((_c>>10))
    done
    _c=$(( ( $1*1000 ) >> ( 10*_i ) ))
    printf ${2+-v} $2 "%.2f%s" ${_c:0:${#_c}-3}.${_c:${#_c}-3} ${_a[_i]}
}

export device=$(stat -c %d "${1:-.}")
export minsize=${2:-$((256*1024**2))}

rdu() {
    local _dir="$1" _spc="$2" _crt _siz _str
    while read _crt;do
        # Only consider entries that live on the same device.
        if [ $(stat -c %d "$_crt") -eq $device ];then
            _siz=($(du -xbs "$_crt"))
            if [ $_siz -gt $minsize ];then
                humansize $_siz _str
                printf "%s%12s%14s_%s\n" "$_spc" "$_str" \\ "${_crt##*/}"
                [ -d "$_crt" ] && rdu "$_crt" "  $_spc"
            fi
        fi
    done < <(
        find "$_dir" -mindepth 1 -maxdepth 1 -print
    )
}

rdu "${1:-.}"

Sample usage:

./rdu.sh /usr 100000000
       1.53G             \_lib
       143.52M             \_i386-linux-gnu
       348.16M             \_x86_64-linux-gnu
       107.80M             \_jvm
         100.20M             \_java-6-openjdk-amd64
           100.17M             \_jre
              99.65M             \_lib
       306.63M             \_libreoffice
         271.75M             \_program
       107.98M             \_chromium
      99.57M             \_lib32
     452.47M             \_bin
       2.50G             \_share
       139.63M             \_texlive
         129.74M             \_texmf-dist
       478.36M             \_locale
       124.49M             \_icons
       878.09M             \_doc
         364.02M             \_texlive-latex-extra-doc
           359.36M             \_latex

A quick check:

du -bs /usr/share/texlive/texmf-dist
136045774   /usr/share/texlive/texmf-dist
echo 136045774/1024^2 | bc -l
129.74336051940917968750

Note: using -b instead of -k tells du to count the actual bytes of the files rather than the space reserved for them on disk (in 512-byte blocks). To work with block sizes instead, change the du -xbs ... line to du -xks, drop the b from _a=(b K M G T P), and divide the argument size by 1024.

There is also a modified version (which I keep for myself) that uses block sizes by default, but accepts -b as the first argument for byte-based calculation.

Edit: New version

After some more work, here is a newer version that is a lot quicker and prints its output sorted in descending size order:

#!/bin/bash

if [ "$1" == "-b" ] ;then
    shift
    export units=(b K M G T P)
    export duargs="-xbs"
    export minsize=${2:-$((256*1024**2))}
else
    export units=(K M G T P)
    export duargs="-xks"
    export minsize=${2:-$((256*1024))}
fi

humansize() {
    local _c=$1 _i=0
    while [ ${#_c} -gt 3 ] ;do
        ((_i++))
        _c=$((_c>>10))
    done
    _c=$(( ( $1*1000 ) >> ( 10*_i ) ))
    printf ${2+-v} $2 "%.2f%s" ${_c:0:${#_c}-3}.${_c:${#_c}-3} ${units[_i]}
}

export device=$(stat -c %d "${1:-.}")

rdu() {
    local _dir="$1" _spc="$2" _crt _siz _str
    while read _siz _crt;do
        if [ $_siz -gt $minsize ];then
            humansize $_siz _str
            printf "%s%12s%14s_%s\n" "$_spc" "$_str" \\ "${_crt##*/}"
            [ -d "$_crt" ] &&
                [ $(stat -c %d "$_crt") -eq $device ] &&
                rdu "$_crt" "  $_spc"
        fi
    done < <(
        find "$_dir" -mindepth 1 -maxdepth 1 -xdev \
            \( -type f -o -type d \) -printf "%D;%p\n" |
            sed -ne "s/^${device};//p" |
            tr \\n \\0 |
            xargs -0 du $duargs |
            sort -nr
    )
}

rdu "${1:-.}"
1

To display the top 20 biggest files and directories under the current folder, use the following one-liner:

du -ah . | sort -rh | head -20

or:

du -a . | sort -rn | head -20

For the top-20 biggest files in the current directory (recursively):

ls -1Rs | sed -e "s/^ *//" | grep "^[0-9]" | sort -nr | head -n20

or with human readable sizes:

ls -1Rhs | sed -e "s/^ *//" | grep "^[0-9]" | sort -hr | head -n20

For the second command to work properly on OS X/BSD (where sort doesn't have -h), you need to install sort from coreutils, then add its bin folder to your PATH.
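
For example, with Homebrew (an assumption; any package that provides GNU coreutils works), the GNU tools are installed with a g prefix, so you can call gsort directly unless you also put Homebrew's gnubin directory on your PATH:

brew install coreutils
ls -1Rhs | sed -e "s/^ *//" | grep "^[0-9]" | gsort -hr | head -n20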

So these aliases are useful to have in your rc files, ready whenever you need them:

alias big='du -ah . | sort -rh | head -20'
alias big-files='ls -1Rhs | sed -e "s/^ *//" | grep "^[0-9]" | sort -hr | head -n20'
kenorb