104
38
I run

ln /a/A /b/B

and I would like to see, in the folder /a, where the file A points to, using ls.
182
You can find the inode number for your file with
ls -i
and
ls -l
shows the reference count (the number of hardlinks to a particular inode).
Once you have found the inode number, you can search for all files with the same inode:
find . -inum NUM
This will show the filenames for inode NUM in the current dir (.).
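For instance, the two commands combine like this (a hypothetical sandbox; the file names are made up):

```shell
# Create a file, hard-link it, then find every name sharing its inode.
cd "$(mktemp -d)"
echo data > A
ln A B                               # B is a second name for A's inode
inum=$(ls -i A | awk '{print $1}')   # first field of `ls -i` is the inode
find . -inum "$inum"                 # lists ./A and ./B
```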
1@BeowulfNode42 This command is great, but it at least needs a search root shared by all the linked files. – Itachi – 2016-09-26T04:41:10.853
2
This answer gives a pragmatic "do this", but I feel strongly that @LaurenceGonsalves answers the "how" and/or "why" questions. – Trevor Boyd Smith – 2016-11-16T18:12:26.800

50You could just run find . -samefile filename – BeowulfNode42 – 2013-11-25T00:02:12.303
66
There isn't really a well-defined answer to your question. Unlike symlinks, hardlinks are indistinguishable from the "original file".
Directory entries consist of a filename and a pointer to an inode. The inode in turn contains the file metadata and (pointers to) the actual file contents. Creating a hard link creates another filename + reference to the same inode. These references are unidirectional (in typical filesystems, at least) -- the inode only keeps a reference count. There is no intrinsic way to find out which is the "original" filename.
By the way, this is why the system call to "delete" a file is called unlink. It just removes a hardlink. The inode and attached data are deleted only if the inode's reference count drops to 0.
The only way to find the other references to a given inode is to exhaustively search over the file system, checking which files refer to the inode in question. You can use 'test A -ef B' from the shell to perform this check for two known files.
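A minimal sketch of that shell check (sandbox and file names invented):

```shell
# A and B name the same inode; C is a symlink that resolves to A.
cd "$(mktemp -d)"
echo data > A
ln A B
ln -s A C
[ A -ef B ] && echo "A and B are the same file"   # same device + inode
[ A -ef C ] && echo "-ef follows symlinks"        # C resolves to A
```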
35That means that there is no such thing as a hard link to another file, as the original file is also a hard link; hard links point to a location on disk. – jtbandes – 2009-07-26T00:03:58.710
12@jtbandes: Hard links point to an inode which points to the actual data. – dash17291 – 2013-06-13T19:34:10.677
33
UNIX has hard links and symbolic links (made with "ln"
and "ln -s"
respectively). A symbolic link is simply a file that contains the real path to another file, and it can cross filesystems.
Hard links have been around since the earliest days of UNIX (that I can remember, anyway, and that's going back quite a while). They are two directory entries that reference the exact same underlying data. The data in a file is specified by its inode.
Each file on a file system points to an inode, but there's no requirement that each file point to a unique inode - that's where hard links come from.
Since inodes are unique only for a given filesystem, there's a limitation that hard links must be on the same filesystem (unlike symbolic links). Note that, unlike symbolic links, there is no privileged file - they are all equal. The data area will only be released when all the files using that inode are deleted (and all processes close it as well, but that's a different issue).
You can use the "ls -i"
command to get the inode of a particular file. You can then use the "find <filesystemroot> -inum <inode>"
command to find all files on the filesystem with that given inode.
Here's a script which does exactly that. You invoke it with:
findhardlinks ~/jquery.js
and it will find all files on that filesystem which are hard links for that file:
pax@daemonspawn:~# ./findhardlinks /home/pax/jquery.js
Processing '/home/pax/jquery.js'
'/home/pax/jquery.js' has inode 5211995 on mount point '/'
/home/common/jquery-1.2.6.min.js
/home/pax/jquery.js
Here's the script.
#!/bin/bash
if [[ $# -lt 1 ]] ; then
    echo "Usage: findhardlinks <fileOrDirToFindFor> ..."
    exit 1
fi

while [[ $# -ge 1 ]] ; do
    echo "Processing '$1'"
    if [[ ! -r "$1" ]] ; then
        echo "   '$1' is not accessible"
    else
        numlinks=$(ls -ld "$1" | awk '{print $2}')
        inode=$(ls -id "$1" | awk '{print $1}' | head -n 1)
        device=$(df "$1" | tail -n 1 | awk '{print $6}')
        echo "   '$1' has inode ${inode} on mount point '${device}'"
        find "${device}" -inum "${inode}" 2>/dev/null | sed 's/^/   /'
    fi
    shift
done
@pax: There seems to be a bug in the script. I start it by . ./findhardlinks.bash
while being in OS X's Zsh. My current window in Screen closes. – None – 2009-07-25T16:31:37.153
4@Masi The issue is your initial . (same as the source command). That causes the exit 1 command to exit your shell. Use chmod a+x findhardlinks.bash then execute it with ./findhardlinks.bash or use bash findhardlinks.bash – njsf – 2009-07-25T23:08:15.487
Please, see my reply to your answer at http://superuser.com/questions/12972/to-see-hardlinks-by-ls/13233#13233 – Léo Léopold Hertz 준영 – 2009-07-26T16:42:00.727

Best answer, by far. Kudos. – MariusMatutiae – 2015-06-27T08:24:50.247
Yeah, great explanation with the correct answer :) – sMyles – 2017-05-26T02:49:54.577
@Joe do you have any suggestion as to what to use for device? When attempted on my Mac I had to replace the 6th position with the 9th position. Also stat has different flags on Mac: it should be -f instead of -c. – guyarad – 2019-01-16T17:57:59.577
Found a solution to the df output issues here. Simply add -P to the df command to get POSIX-compliant output.
@guyarad Good, glad you got it figured out. Because I have no idea what this is anymore. That was from 7 years ago – Joe – 2019-01-17T19:36:08.087
Shame on you @Joe ! not remembering a comment on a SE post you wrote 7 years ago :) – guyarad – 2019-01-20T05:30:27.637
3To do this programmatically, it's probably more resilient if you use this instead: INUM=$(stat -c %i $1). Also NUM_LINKS=$(stat -c %h $1). See man stat for more format variables you can use. – Joe – 2012-01-03T20:12:21.513
24
ls -l
The first column shows the permissions. The second column is the link count: for a regular file, the number of paths (hard links, including the original name) to the same data; for a directory, the number of entries linking to it. Eg:
-rw-r--r--@ 2 [username] [group] [timestamp] HardLink
-rw-r--r--@ 2 [username] [group] [timestamp] Original
^ Number of hard links to the data
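A quick way to watch that column change (a sandbox sketch; GNU stat's -c %h is assumed — on macOS the equivalent is stat -f %l):

```shell
cd "$(mktemp -d)"
echo data > Original
stat -c %h Original                 # prints 1: only one name so far
ln Original HardLink
stat -c %h Original                 # prints 2: both names count
ls -l Original | awk '{print $2}'   # the same count, from the ls -l column
```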
3Helpful in determining IF a given file has [other] hard links, but not WHERE they are. – mklement0 – 2015-02-11T03:48:48.263
Also, there's no technical distinction between a hard link and an original file. They are both identical in that they simply point to the inode, which in turn points to disc content. – guyarad – 2019-01-16T17:59:32.627
14
How about the following, simpler approach? (It might replace the long scripts above!)
If you have a specific file <THEFILENAME> and want to know all its hardlinks spread over the directory <TARGETDIR> (which can even be the entire filesystem, denoted by /):
find <TARGETDIR> -type f -samefile <THEFILENAME>
Extending the logic, if you want to know all the files in <SOURCEDIR> having multiple hard-links spread over <TARGETDIR>:
find <SOURCEDIR> -type f -links +1 \
-printf "\n\n %n HardLinks of file : %H/%f \n" \
-exec find <TARGETDIR> -type f -samefile {} \;
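A small sandbox run of the first form (directory and file names invented):

```shell
# One data blob with two names in different directories.
cd "$(mktemp -d)"
mkdir -p src dst
echo data > src/file
ln src/file dst/copy
find . -type f -samefile src/file   # lists ./src/file and ./dst/copy
```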
3@silvio: You can only create hard links to files, not directories. – mklement0 – 2015-02-11T03:45:53.667
@mklement0: You are right! – silvio – 2015-02-11T10:38:10.860
The . and .. entries in directories are hardlinks. You can tell how many subdirs are in a directory from the link count of the . entry. This is moot anyway, since find -samefile . still won't print any subdir/.. output. find (at least the GNU version) seems to be hardcoded to ignore .., even with -noleaf. – Peter Cordes – 2015-04-21T18:53:59.940
also, that find-all-links idea is O(n^2), and runs find once for each member of a set of hardlinked files. find ... -printf '%16i %p\n' | sort -n | uniq -w 16 --all-repeated=separate would work (16 isn't wide enough for a decimal representation of 2^63-1, so when your XFS filesystem is big enough to have inode numbers that high, watch out). – Peter Cordes – 2015-04-21T19:06:37.773
turned that into an answer – Peter Cordes – 2015-04-21T19:33:28.253
This is for me the best answer! But I would not use -type f, because the file can be a directory too. – silvio – 2013-08-30T11:40:44.810
6
There are a lot of answers with scripts to find all hardlinks in a filesystem. Most of them do silly things like running find to scan the whole filesystem for -samefile
for EACH multiply-linked file. This is crazy; all you need is to sort on inode number and print duplicates.
With only one pass over the filesystem to find and group all sets of hardlinked files
find dirs -xdev \! -type d -links +1 -printf '%20D %20i %p\n' |
sort -n | uniq -w 42 --all-repeated=separate
This is much faster than the other answers for finding multiple sets of hardlinked files.
find /foo -samefile /bar
is excellent for just one file.
-xdev : limit to one filesystem. Not strictly needed, since we also print the FS-id to uniq on.
! -type d : reject directories: the . and .. entries mean they're always linked.
-links +1 : link count strictly > 1.
-printf ... : print FS-id, inode number, and path. (With padding to fixed column widths that we can tell uniq about.)
sort -n | uniq ... : numeric sort and uniquify on the first 42 columns, separating groups with a blank line.

Using ! -type d -links +1 means that sort's input is only as big as the final output of uniq, so we aren't doing a huge amount of string sorting. (Unless you run it on a subdirectory that only contains one of a set of hardlinks.) Anyway, this will use a LOT less CPU time re-traversing the filesystem than any other posted solution.
sample output:
...
2429 76732484 /home/peter/weird-filenames/test/.hiddendir/foo bar
2429 76732484 /home/peter/weird-filenames/test.orig/.hiddendir/foo bar
2430 17961006 /usr/bin/pkg-config.real
2430 17961006 /usr/bin/x86_64-pc-linux-gnu-pkg-config
2430 36646920 /usr/lib/i386-linux-gnu/dri/i915_dri.so
2430 36646920 /usr/lib/i386-linux-gnu/dri/i965_dri.so
2430 36646920 /usr/lib/i386-linux-gnu/dri/nouveau_vieux_dri.so
2430 36646920 /usr/lib/i386-linux-gnu/dri/r200_dri.so
2430 36646920 /usr/lib/i386-linux-gnu/dri/radeon_dri.so
...
TODO?: un-pad the output with awk or cut. uniq has very limited field-selection support, so I pad the find output and use fixed-width columns. 20 chars is wide enough for the maximum possible inode or device number (2^64-1 = 18446744073709551615). XFS chooses inode numbers based on where on disk they're allocated, not contiguously from 0, so large XFS filesystems can have >32-bit inode numbers even if they don't have billions of files. Other filesystems might have 20-digit inode numbers even if they aren't gigantic.
TODO: sort groups of duplicates by path. Having them sorted by mount point then inode number mixes things together, if you have a couple different subdirs that have lots of hardlinks. (i.e. groups of dup-groups go together, but the output mixes them up).
A final sort -k 3 would sort lines separately, not groups of lines as a single record. Preprocessing with something to transform a pair of newlines into a NUL byte, and using GNU sort --zero-terminated -k 3, might do the trick. tr only operates on single characters, though, not 2->1 or 1->2 patterns. perl would do it (or just parse and sort within perl or awk). sed might also work.
1%D is the filesystem identifier (it is unique for the current boot while no filesystems are umounted), so the following is even more generic: find directories.. -xdev ! -type d -links +1 -printf '%20i %20D %p\n' | sort -n | uniq -w 42 --all-repeated=separate. This works as long as no given directory contains another directory on the filesystem level; also it looks at everything which can be hardlinked (like devices or softlinks - yes, softlinks can have a link count greater than 1). Note that dev_t and ino_t are 64 bits long today. This likely will hold as long as we have 64 bit systems. – Tino – 2015-11-09T14:34:28.280
@Tino: great point about using ! -type d instead of -type f. I even have some hardlinked symlinks on my filesystem from organizing some collections of files. Updated my answer with your improved version (but I put the fs-id first, so the sort order at least groups by filesystem). – Peter Cordes – 2015-11-09T18:45:37.093
3
This is somewhat of a comment to Torocoro-Macho's own answer and script, but it obviously won't fit in the comment box.
I rewrote your script with more straightforward ways to find the info, and thus far fewer process invocations.
#!/bin/sh
xPATH=$(readlink -f -- "${1}")
for xFILE in "${xPATH}"/*; do
    [ -d "${xFILE}" ] && continue
    [ ! -r "${xFILE}" ] && printf '"%s" is not readable.\n' "${xFILE}" 1>&2 && continue
    nLINKS=$(stat -c%h "${xFILE}")
    if [ "${nLINKS}" -gt 1 ]; then
        iNODE=$(stat -c%i "${xFILE}")
        xDEVICE=$(stat -c%m "${xFILE}")
        printf '\nItem: %s[%d] = %s\n' "${xDEVICE}" "${iNODE}" "${xFILE}"
        find "${xDEVICE}" -inum "${iNODE}" -not -path "${xFILE}" -printf '    -> %p\n' 2>/dev/null
    fi
done
I tried to keep it as similar to yours as possible for easy comparison.
One should always avoid the $IFS magic if a glob suffices, since it is unnecessarily convoluted, and file names can actually contain newlines (but in practice, mostly for the first reason).
You should avoid manually parsing ls
and such output as much as possible, since it will sooner or later bite you. For example: in your first awk
line, you fail on all file names containing spaces.
printf
will often save troubles in the end since it is so robust with the %s
syntax. It also gives you full control over the output, and is consistent across all systems, unlike echo
.
stat
can save you a lot of logic in this case.
GNU find
is powerful.
Your head
and tail
invocations could have been handled directly in awk
with e.g. the exit
command and/or selecting on the NR
variable. This would save process invocations, which almost always improves performance considerably in hard-working scripts.
Your egrep
s could just as well be just grep
.
If you just want groups of hardlinks, rather than repeated with each member as the "master", use find ... -xdev -type f -links +1 -printf '%16i %p\n' | sort -n | uniq -w 16 --all-repeated=separate
. This is MUCH faster, as it only traverses the fs once. For multiple FSes at once, you'd need to prefix the inode numbers with a FS id. Maybe with find -exec stat... -printf ...
– Peter Cordes – 2015-04-21T19:14:06.927
turned that idea into an answer – Peter Cordes – 2015-04-21T19:33:40.440
xDEVICE=$(stat -c%m "${xFILE}") does not work on all systems (for example: stat (GNU coreutils) 6.12). If the script outputs "Item: ?" at the front of each line, then replace this offending line with a line more like the original script, but with xITEM renamed to xFILE: xDEVICE=$(df "${xFILE}" | tail -1l | awk '{print $6}') – kbulgrien – 2014-03-28T20:23:44.833
2
Based on the findhardlinks script (I renamed it to hard-links), this is what I refactored and made work.
Output:
# ./hard-links /root
Item: /[10145] = /root/.profile
-> /proc/907/sched
-> /<some-where>/.profile
Item: /[10144] = /root/.tested
-> /proc/907/limits
-> /<some-where else>/.bashrc
-> /root/.testlnk
Item: /[10144] = /root/.testlnk
-> /proc/907/limits
-> /<another-place else>/.bashrc
-> /root/.tested
# cat ./hard-links
#!/bin/bash
oIFS="${IFS}"; IFS=$'\n';
xPATH="${1}";
xFILES="`ls -al ${xPATH}|egrep "^-"|awk '{print $9}'`";
for xFILE in ${xFILES[@]}; do
xITEM="${xPATH}/${xFILE}";
if [[ ! -r "${xITEM}" ]] ; then
echo "Path: '${xITEM}' is not accessible! ";
else
nLINKS=$(ls -ld "${xITEM}" | awk '{print $2}')
if [ ${nLINKS} -gt 1 ]; then
iNODE=$(ls -id "${xITEM}" | awk '{print $1}' | head -1l)
xDEVICE=$(df "${xITEM}" | tail -1l | awk '{print $6}')
echo -e "\nItem: ${xDEVICE}[$iNODE] = ${xITEM}";
find ${xDEVICE} -inum ${iNODE} 2>/dev/null|egrep -v "${xITEM}"|sed 's/^/ -> /';
fi
fi
done
IFS="${oIFS}"; echo "";
I posted comments on this script as a separate answer. – Daniel Andersson – 2012-06-13T07:40:57.033
1
You can configure ls to highlight hardlinks using an alias, but as stated before there is no way to show the 'source' of a hardlink, which is why I append .hardlink to help with that.
Add the following somewhere in your .bashrc
alias ll='LC_COLLATE=C LS_COLORS="$LS_COLORS:mh=1;37" ls -lA --si --group-directories-first'
1
A GUI solution gets really close to your question:
You cannot list the actual hardlinked files from ls because, as previous commenters have pointed out, the file "names" are mere aliases to the same data. However, there is a GUI tool that gets really close to what you want: it displays a path listing of file names that point to the same data (as hardlinks) under Linux. It is called FSLint. The option you want is under "Name clashes" -> deselect "checkbox $PATH" in Search (XX) -> and select "Aliases" from the drop-down box after "for..." towards the top-middle.
FSLint is very poorly documented, but I found that with a limited directory tree under "Search path", the "Recurse?" checkbox selected, and the aforementioned options, the program produces a listing of hardlinked data with the paths and names that "point" to the same data.
1Hard links aren't pointers, symlinks are. They're multiple names for the same file (inode). After a link(2) system call, there's no sense in which one is the original and one is the link. This is why, as the answers point out, the only way to find all the links is find / -samefile /a/A. Because one directory entry for an inode doesn't "know about" other directory entries for the same inode. All they do is refcount the inode so it can be deleted when the last name for it is unlink(2)ed. (This is the "link count" in ls output.) – Peter Cordes – 2015-04-21T18:47:45.683

@PeterCordes: Is the refcount actually stored IN the hardlink entry? That's what your wording implies ("All they do is refcount the inode..."). But that wouldn't make sense if the links don't know anything about each other, since when one was updated, all the others would somehow have to be updated. Or is the refcount stored in the inode itself? (Forgive me if it's a dumb question; I consider myself a newbie and I'm still learning.) – loneboat – 2015-07-06T20:51:42.820
1The refcount is stored in the inode, as you eventually figured out must be the case, from the other facts. :) Directory entries are named pointers to inodes. We call it "hard linking" when you have multiple names pointing to the same inode. – Peter Cordes – 2015-07-06T20:58:11.033