Linux utility for finding the largest files/directories

134

111

I'm looking for a program to show me which files/directories occupy the most space, something like:

74% music
 \- 60% music1
 \- 14% music2
12% code
13% other

I know that it's possible in KDE3, but I'd rather not do that - KDE4 or command line are preferred.

Robert Munteanu

Posted 2009-07-21T06:54:48.307

Reputation: 4 240

Question was closed 2015-04-14T04:18:17.880

For Mac users, I just want to recommend the free program Disk Inventory X. Download it at http://www.derlien.com/ - it's simple to use on Mac OS X.

– Nimitack – 2018-01-06T23:05:59.863

Answers

131

To find the largest 10 files (linux/bash):

find . -type f -print0 | xargs -0 du | sort -n | tail -10 | cut -f2 | xargs -I{} du -sh {}

To find the largest 10 directories:

find . -type d -print0 | xargs -0 du | sort -n | tail -10 | cut -f2 | xargs -I{} du -sh {}

The only difference between them is the -type option (d for directories, f for files).

This handles files with spaces in their names and produces human-readable sizes in the output. The largest file is listed last, and the argument to tail is the number of results you see (here, the 10 largest).

Two techniques are used to handle spaces in file names: find -print0 | xargs -0 uses null delimiters instead of whitespace, and the second xargs -I{} treats each whole input line as a single argument.

example:

$ find . -type f -print0 | xargs -0 du | sort -n | tail -10 | cut -f2 | xargs -I{} du -sh {}

  76M    ./snapshots/projects/weekly.1/onthisday/onthisday.tar.gz
  76M    ./snapshots/projects/weekly.2/onthisday/onthisday.tar.gz
  76M    ./snapshots/projects/weekly.3/onthisday/onthisday.tar.gz
  76M    ./tmp/projects/onthisday/onthisday.tar.gz
  114M   ./Dropbox/snapshots/weekly.tgz
  114M   ./Dropbox/snapshots/daily.tgz
  114M   ./Dropbox/snapshots/monthly.tgz
  117M   ./Calibre Library/Robert Martin/cc.mobi
  159M   ./.local/share/Trash/files/funky chicken.mpg
  346M   ./Downloads/The Walking Dead S02E02 ... (dutch subs nl).avi
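The NUL-delimiter handling is easy to verify with a throwaway file whose name contains a space (the temporary directory and filename below are only illustrative):

```shell
# create a scratch directory with one awkwardly-named file
tmp=$(mktemp -d)
printf 'hello' > "$tmp/funky chicken.mpg"

# the NUL-delimited pipeline passes the name as a single argument,
# so du reports exactly one entry instead of choking on two halves
count=$(find "$tmp" -type f -print0 | xargs -0 du | wc -l)
echo "$count"

rm -rf "$tmp"
```

With a plain find | xargs (no -print0/-0), the same pipeline would split the name at the space and du would complain about two non-existent files.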

Sean

Posted 2009-07-21T06:54:48.307

Reputation: 1 546

Largest file listed first : find . -type f -print0 | xargs -0 du | sort -nr | head -10 | cut -f2 | xargs -I{} du -sh {} (i.e. use sort -nr | head -10 instead of sort -n | tail -10) – Sandra Rossi – 2019-03-13T20:47:51.713

200

I always use ncdu. It's interactive and very fast.

Daenyth

Posted 2009-07-21T06:54:48.307

Reputation: 5 742

FYI: ncdu stands for NCurses Disk Usage – hello_harry – 2017-05-31T13:34:29.033

Yeah and it's small! – Luke Stanley – 2011-07-18T00:40:11.730

I love ncdu. It's one of my favorite google finds. – Rob – 2012-07-24T18:33:02.253

Wow. how did I not know this existed. Thanks! – pixel – 2012-09-11T21:57:03.823

+1000 for ncdu --- it's like htop for disk space. Super useful! – Noah Sussman – 2013-06-26T01:27:22.450

since there do not seem to be flags nor a .config option, here's the key sequence you'll probably type every time you run it if you like seeing files and folders mingled and relative percentage stats: [t] [g] [g]. – rymo – 2013-11-10T21:36:21.010

You could make it more human readable with "du -h | sort -h". – rudimeier – 2014-01-21T19:55:45.890

37

For a quick view:

du | sort -n

lists all directories with the largest last.

du --max-depth=1 * | sort -n

or, again, avoiding the redundant *:

du --max-depth=1 | sort -n

lists all the directories in the current directory with the largest last.

(The -n parameter to sort is required so that the first field is sorted as a number rather than as text, but this precludes using du's -h parameter, since the sort needs a plain number to work with.)

Other parameters to du are available if you want to follow symbolic links (default is not to follow symbolic links) or just show size of directory contents excluding subdirectories, for example. du can even include in the list the date and time when any file in the directory was last changed.
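As a sketch of those options (GNU du assumed; the scratch paths are illustrative):

```shell
# build a tiny scratch tree
tmp=$(mktemp -d)
mkdir -p "$tmp/a/b"
printf 'x' > "$tmp/a/file"

# -s: one summary line per argument, sub-directories not listed separately
summary=$(du -s "$tmp/a")
echo "$summary"

# --time (GNU du): also print when any file inside was last changed
du -s --time "$tmp/a"

# -L would follow symbolic links instead of counting the links themselves
rm -rf "$tmp"
```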

mas

Posted 2009-07-21T06:54:48.307

Reputation: 2 431

du -h --max-depth=1 2>/dev/null | sort -nr | grep -v ^0 - a bit tidier – Stuart Cardall – 2016-12-24T17:00:22.827

This is fine but the results aren't very friendly. I usually turn to this: find {/path/to/directory} -type f -size +{file-size-in-kb}k -exec ls -lh {} \; | awk '{ print $8 ": " $5 }' – deed02392 – 2012-03-06T20:02:18.533

roman# du --max-depth=1 | sort -n du: illegal option -- - usage: du [-A] [-H | -L | -P] [-a | -s | -d depth] [-c] [-l] [-h | -k | -m | -B bsize] [-n] [-x] [-I mask] [file ...] – holms – 2012-04-06T16:00:48.143

Is the * really necessary? Doesn't it by default include all files in the current dir? – Josh Hunt – 2009-07-21T17:02:23.353

No, the * should be redundant. I'm not sure whether using it is the sign of a good habit or a bad one. Thanks for pointing it out. I've amended the answer to reflect it as optional. – mas – 2009-07-22T09:16:09.067

23

For most things, I prefer CLI tools, but for drive usage, I really like filelight. The presentation is more intuitive to me than any other space management tool I've seen.

Filelight screenshot

Anton Geraschenko

Posted 2009-07-21T06:54:48.307

Reputation: 453

Very good app. +1 – rpax – 2014-07-01T17:43:37.813

Visually, it's artistically intriguing, but intuitive?  Just from looking at it, I have no idea what it's representing.  Can somebody explain it?  I went to the site, and I didn't see any explanation. – G-Man Says 'Reinstate Monica' – 2015-08-07T22:59:52.350

A similar tool on Mac is DaisyDisk, available at http://daisydiskapp.com

– computingfreak – 2016-07-03T00:38:39.023

Filelight is my space-hog pruning tool of choice. – Ryan C. Thompson – 2009-09-11T08:07:00.607

20

Filelight is better for KDE users, but for completeness (the question title is general) I must mention Baobab, aka Disk Usage Analyzer, which is included in Ubuntu:

Baobab (Disk Usage Analyzer) screenshot

Nicolas Raoul

Posted 2009-07-21T06:54:48.307

Reputation: 7 766

If you're looking for an equivalent of this on the Mac platform, checkout DaisyDisk. – computingfreak – 2016-07-03T00:40:40.627

8

A GUI tool, KDirStat, shows the data both in table form and graphically. You can see really quickly where most of the space is used.

KDirStat screenshot

I'm not sure if this is exactly the KDE tool you didn't want, but I think it still should be mentioned in a question like this. It's good and many people probably don't know it - I only learned about it recently myself.

Jonik

Posted 2009-07-21T06:54:48.307

Reputation: 5 352

Kdirstat is sooooo slow. Use ncdu instead. – Daenyth – 2010-07-07T15:31:07.590

@Daenyth not true anymore! This tool was rebuilt as QDirStat and it is instantly fast. No idea how it does that, but out of the given answers here, it is probably the best one – phil294 – 2016-11-27T15:46:50.593

I just hit ctrl+f to find ncdu, and saw that I've already upvoted @Daenyth – Rob – 2013-02-18T05:06:07.950

Thanks for the answer. It's the exact same tool I had in KDE3, but I moved to KDE 4. – Robert Munteanu – 2009-07-22T11:20:35.790

Are you sure you can't get kdirstat for KDE4? – Jonik – 2009-07-22T11:44:17.807

On KDE, it's simply called k4dirstat. – phihag – 2014-02-26T14:07:33.690

5

A combination of tools is always the best trick on Unix.

du -sk $(find . -type d) | sort -n -k 1

This shows directory sizes in KB, sorted so the largest come last.
A tree view would, however, need some more fu... is it really required?

Note that this scan is nested across directories, so it counts sub-directories again for the higher directories, and the base directory . shows up at the end as the total utilization sum.

You can, however, use find's depth controls, -maxdepth and -mindepth, to restrict the scan to a specific sub-directory depth, and get a lot more involved with your scanning depending on what you want.
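For example, a sketch that sizes only the directories exactly one level down (the scratch names d1 and d2 are illustrative):

```shell
# scratch tree: two depth-1 directories, one depth-2 directory
tmp=$(mktemp -d)
mkdir -p "$tmp/d1/sub" "$tmp/d2"

# -mindepth 1 -maxdepth 1 keeps the scan at the first level only:
# d1 and d2 are listed, but neither d1/sub nor $tmp itself
depth1=$(find "$tmp" -mindepth 1 -maxdepth 1 -type d -exec du -sk {} \; | sort -n -k 1)
echo "$depth1"

rm -rf "$tmp"
```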


Here is a refined variation for your arg-too-long problem

find . -type d -exec du -sk {} \; |  sort -n -k 1

nik

Posted 2009-07-21T06:54:48.307

Reputation: 50 788

I tried that and i got lots of 'du: Task: No such file or directory' – Josh Hunt – 2009-07-21T07:12:31.587

Thanks for the answer. Unfortunately I get bash: /usr/bin/du: Argument list too long – Robert Munteanu – 2009-07-21T07:21:40.117

3

I like gt5. You can navigate the tree and open subdirectories to drill down for more detail. It uses a text-mode web browser, such as lynx, to display the results. Install elinks for best results.

gt5 screenshot

Paused until further notice.

Posted 2009-07-21T06:54:48.307

Reputation: 86 075

2

Although it does not give you nested output quite like that, try du:

du -h /path/to/dir/

Running that on my Documents folder spits out the following:

josh-hunts-macbook:Documents joshhunt$ du -h
  0B    ./Adobe Scripts
  0B    ./Colloquy Transcripts
 23M    ./Electronic Arts/The Sims 3/Custom Music
  0B    ./Electronic Arts/The Sims 3/InstalledWorlds
364K    ./Electronic Arts/The Sims 3/Library
 77M    ./Electronic Arts/The Sims 3/Recorded Videos
101M    ./Electronic Arts/The Sims 3/Saves
 40M    ./Electronic Arts/The Sims 3/Screenshots
1.6M    ./Electronic Arts/The Sims 3/Thumbnails
387M    ./Electronic Arts/The Sims 3
387M    ./Electronic Arts
984K    ./English Advanced/Documents
1.8M    ./English Advanced
  0B    ./English Extension/Documents
212K    ./English Extension
100K    ./English Tutoring
5.6M    ./IPT/Multimedia Assessment Task
720K    ./IPT/Transaction Processing Systems
8.6M    ./IPT
1.5M    ./Job
432K    ./Legal Studies/Crime
8.0K    ./Legal Studies/Documents
144K    ./Legal Studies/Family/PDFs
692K    ./Legal Studies/Family
1.1M    ./Legal Studies
380K    ./Maths/Assessment Task 1
388K    ./Maths
[...]

Then you can sort the output by piping it through sort:

du /path/to/dir | sort -n

Josh Hunt

Posted 2009-07-21T06:54:48.307

Reputation: 20 095

Thanks, but it does not properly show which directories are largest. If I start it in my home directory the output is unusable. – Robert Munteanu – 2009-07-21T07:02:07.353

1

Here is the script which does it for you automatically.

http://www.thegeekscope.com/linux-script-to-find-largest-files/

Following is the sample output of the script:

# sh get_largest_files.sh / 5

[SIZE (BYTES)]     [% OF DISK] [OWNER]         [LAST MODIFIED ON]        [FILE] 

56421808           0%           root           2012-08-02 14:58:51       /usr/lib/locale/locale-archive
32464076           0%           root           2008-09-18 18:06:28       /usr/lib/libgcj.so.7rh.0.0
29147136           0%           root           2012-08-02 15:17:40       /var/lib/rpm/Packages
20278904           0%           root           2008-12-09 13:57:01       /usr/lib/xulrunner-1.9/libxul.so
16001944           0%           root           2012-08-02 15:02:36       /etc/selinux/targeted/modules/active/base.linked

Total disk size: 23792652288 Bytes
Total size occupied by these files: 154313868 Bytes  [ 0% of Total Disc Space  ]

*** Note: 0% represents less than 1% ***

You may find this script very handy and useful!

Kam

Posted 2009-07-21T06:54:48.307

Reputation: 19

Link is broken? – Danijel – 2015-12-24T13:15:35.890

While the linked website does give instructions, it is preferred for you to paraphrase then reference the external site (which looks like a personal blog anyways). This will prevent link rot and help more people on this site – Canadian Luke – 2012-09-06T07:58:56.270

1

Although finding the percentage of disk usage of each file/directory is beneficial, most of the time just knowing the largest files/directories is sufficient.

So my favorite is this:

# du -a | sort -n -r | head -n 20

And output is like this:

28626644        .
28052128        ./www
28044812        ./www/vhosts
28017860        ./www/vhosts/example.com
23317776        ./www/vhosts/example.com/httpdocs
23295012        ./www/vhosts/example.com/httpdocs/myfolder
23271868        ./www/vhosts/example.com/httpdocs/myfolder/temp
11619576        ./www/vhosts/example.com/httpdocs/myfolder/temp/main
11590700        ./www/vhosts/example.com/httpdocs/myfolder/temp/main/user
11564748        ./www/vhosts/example.com/httpdocs/myfolder/temp/others
4699852         ./www/vhosts/example.com/stats
4479728         ./www/vhosts/example.com/stats/logs
4437900         ./www/vhosts/example.com/stats/logs/access_log.processed
401848          ./lib
323432          ./lib/mysql
246828          ./lib/mysql/mydatabase
215680          ./www/vhosts/example.com/stats/webstat
182364          ./www/vhosts/example.com/httpdocs/tmp/aaa.sql
181304          ./www/vhosts/example.com/httpdocs/tmp/bbb.sql
181144          ./www/vhosts/example.com/httpdocs/tmp/ccc.sql

trante

Posted 2009-07-21T06:54:48.307

Reputation: 539

1

To find the top 25 files in the current directory and its subdirectories:

find . -type f -exec ls -al {} \; | sort -nr -k5 | head -n 25

This will output the top 25 files, sorted by file size via the sort -nr -k5 stage of the pipe.
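Parsing ls output is fragile when file names contain spaces. With GNU find, a sketch that avoids ls entirely by printing size and path directly (-printf is a GNU extension; the scratch files are illustrative):

```shell
# scratch files of different sizes
tmp=$(mktemp -d)
printf 'aaaa' > "$tmp/big file"
printf 'a'    > "$tmp/small"

# %s = size in bytes, %p = path; sort numerically, biggest first
top=$(find "$tmp" -type f -printf '%s\t%p\n' | sort -nr | head -n 25)
echo "$top"

rm -rf "$tmp"
```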

xpros

Posted 2009-07-21T06:54:48.307

Reputation: 111

1

Another alternative is agedu, which breaks down disk space by last-access time, making it easier to locate space-wasting files.

It even works on a server without X by serving temporary web pages, so usage can be analysed remotely, with graphs. Assuming the IP address of the server is 192.168.1.101, you can type this on the command line of the server:

agedu -s / -w --address 192.168.1.101:60870 --auth basic -R

This prints the username, password and URL with which you can access the "GUI" and browse the results. When done, terminate agedu with Ctrl+D on the server.

Bastiaan

Posted 2009-07-21T06:54:48.307

Reputation: 11

0

du -chs /*

Will show you the size of each top-level entry in the root directory, with human-readable sizes (-h) and a grand total (-c).
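The same idea can be sorted smallest-to-largest with GNU sort's -h option, which understands human-readable sizes (the scratch directories below stand in for /*):

```shell
# scratch stand-in for the root directory
tmp=$(mktemp -d)
mkdir -p "$tmp/music" "$tmp/code"

# -c appends a grand-total line; sort -h orders human-readable sizes,
# so the total comes out last
listing=$(du -chs "$tmp"/* | sort -h)
echo "$listing"

rm -rf "$tmp"
```

On the real root directory you would typically add 2>/dev/null to hide permission-denied noise: du -chs /* 2>/dev/null | sort -h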

RusAlex

Posted 2009-07-21T06:54:48.307

Reputation: 236

0

To complete the list a little more, I'll add my favorite disk usage analyzer: xdiskusage.

The GUI reminds me of some other good ol' X utilities; it's fast and not bloated, but you can nevertheless navigate the hierarchy easily and have some display options:

$ xdiskusage /usr

xdiskusage screenshot

mpy

Posted 2009-07-21T06:54:48.307

Reputation: 20 866

0

Try the following one-liner (it displays the 20 biggest files in the current directory):

ls -1Rs | sed -e "s/^ *//" | grep "^[0-9]" | sort -nr | head -n20

or with human readable sizes:

ls -1Rhs | sed -e "s/^ *//" | grep "^[0-9]" | sort -hr | head -n20

For the second command to work properly on OSX/BSD (where sort lacks -h), you need to install sort from GNU coreutils.

So these aliases are useful to have in your rc files for whenever you need them:

alias big='du -ah . | sort -rh | head -20'
alias big-files='ls -1Rhs | sed -e "s/^ *//" | grep "^[0-9]" | sort -hr | head -n20'

kenorb

Posted 2009-07-21T06:54:48.307

Reputation: 16 795