95

I know how to retrieve the last modification date of a single file in a Git repository:

git log -1 --format="%ad" -- path/to/file

Is there a simple and efficient way to do the same for all the files currently present in the repository?

Peter Mortensen
Eric Bréchemier
  • It's 2022, ten years after this question was asked. Still no solution to make such simple tasks (in this case akin to `ls -l`) easier to type and to remember? Yes I know I could save the script in an executable, but that works only until I go on a different machine, which usually is when I need the feature the most! – Davide Aug 30 '22 at 11:24

7 Answers

102

A simple answer would be to iterate through each file and display its modification time, i.e.:

git ls-tree -r --name-only HEAD | while read filename; do
  echo "$(git log -1 --format="%ad" -- $filename) $filename"
done

This will yield output like so:

Fri Dec 23 19:01:01 2011 +0000 Config
Fri Dec 23 19:01:01 2011 +0000 Makefile

Since this is just a Bash script at this point, feel free to customize it to your heart's content!
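If a single run of git log is preferred over one run per file, the following is a minimal sketch under some assumptions: the function name `git_last_modified` is made up for illustration, filenames are assumed to contain no newlines, and merge commits are not inspected, so results may differ from the per-file loop for files last touched by a merge. It lists the tracked files once, walks the history once from newest to oldest, and prints the first (newest) author date seen for each file.

```shell
# Hypothetical helper (not a Git built-in): print "<author date> <path>" for
# every file in the given revision (default HEAD) using one pass over history.
git_last_modified() {
    {
        # 1) Files that exist in the revision.
        git -c core.quotePath=false ls-tree -r --name-only "${1:-HEAD}"
        # 2) A separator line containing only the control character \001.
        printf '\001\n'
        # 3) History, newest first: one \001-prefixed date line per commit,
        #    followed by the paths that commit touched.
        git -c core.quotePath=false log --format='%x01%ai' --name-only "${1:-HEAD}"
    } | awk '
        $0 == "\001"               { in_log = 1; next }                 # separator reached
        !in_log                    { want[$0] = 1; remaining++; next }  # tracked file
        substr($0, 1, 1) == "\001" { date = substr($0, 2); next }       # commit date line
        ($0 in want) {
            print date, $0          # first hit is the newest commit for this path
            delete want[$0]
            if (--remaining == 0) exit
        }
    '
}
```

Calling `git_last_modified | sort` inside a repository gives output in the same shape as the loop above; pass a tag, branch, or commit hash as the first argument to inspect another revision.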

Andrew M.
  • 3
    I was hoping that there was an option to get a combined output in a single run of git log, but your answer is better than the one I had in mind using find. I did not know git-ls-tree, which has the advantage of listing only the files stored in the repository, skipping the .git folder and ignored files. Thanks. – Eric Bréchemier Jun 23 '12 at 08:25
  • No problem, Eric; you are following the same route that I did--i.e., doing a find and ignoring the .git directory! :) There may be some options using the git plumbing commands, but quite frankly, this works pretty well. If you could find some way to get the information on a per file basis all at once, that would work best--but remember, git operates on the state of commits, not the state of individual files. – Andrew M. Jun 25 '12 at 16:40
  • Can this be adapted to work on a commit other than the current checkout HEAD? I think the log command is working relative to the HEAD here by default. – ThorSummoner May 30 '14 at 19:30
  • 15
    I recommend using the --format="%ai" if you want sortable time stamps instead of human readable dates. – ThorSummoner May 30 '14 at 19:45
  • 2
    Since "HEAD" is just a reference, you can use any reference you want, be it a tag, branch, commit hash, etc.. – Andrew M. May 30 '14 at 23:29
  • 3
    as @ThorSummoner said, use %ai format for date, and then just pipe to sort to sort the results: `git ls-tree -r --name-only HEAD | while read filename; do echo "$(git log -1 --format="%ai" -- $filename) $filename"; done | sort` – John Hunt Aug 18 '17 at 08:50
  • Is this still the best way to proceed in 2022? It's annoying as such simple tasks (in this case akin to `ls -l`) becomes such a huge pita that one has to script... – Davide Aug 30 '22 at 11:15
37

This approach also works with filenames that contain spaces:

git ls-files -z | xargs -0 -I{} -- git log -1 --format="%ai {}" -- {}

Example output:

2015-11-03 10:51:16 -0500 .gitignore
2016-03-30 11:50:05 -0400 .htaccess
2015-02-18 12:20:26 -0500 .travis.yml
2016-04-29 09:19:24 +0800 2016-01-13-Atlanta.md
2016-04-29 09:29:10 +0800 2016-03-03-Elmherst.md
2016-04-29 09:41:20 +0800 2016-03-03-Milford.md
2016-04-29 08:15:19 +0800 2016-03-06-Clayton.md
2016-04-29 01:20:01 +0800 2016-03-14-Richmond.md
2016-04-29 09:49:06 +0800 3/8/2016-Clayton.md
2015-08-26 16:19:56 -0400 404.htm
2016-03-31 11:54:19 -0400 _algorithms/acls-bradycardia-algorithm.htm
2015-12-23 17:03:51 -0500 _algorithms/acls-pulseless-arrest-algorithm-asystole.htm
2016-04-11 15:00:42 -0400 _algorithms/acls-pulseless-arrest-algorithm-pea.htm
2016-03-31 11:54:19 -0400 _algorithms/acls-secondary-survey.htm
2016-03-31 11:54:19 -0400 _algorithms/acls-suspected-stroke-algorithm.htm
2016-03-31 11:54:19 -0400 _algorithms/acls-tachycardia-algorithm-stable.htm
...

The output can be sorted by modification timestamp by adding | sort to the end:

git ls-files -z | xargs -0 -I{} -- git log -1 --format="%ai {}" -- {} | sort
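
One caveat when sorting `%ai` output: the dates are written in each author's own time zone, so a plain text sort can misorder commits made in different zones. Here is a minimal sketch that sidesteps this by sorting on `%at` (the author date as a Unix timestamp) instead. The function name `list_by_epoch` is made up for illustration.

```shell
# Hypothetical helper: print "<unix timestamp><TAB><path>" for every tracked
# file, sorted chronologically regardless of the authors' time zones.
list_by_epoch() {
    git ls-files -z |
        # One small shell per file so the path is passed as "$1" and never
        # becomes part of the --format string.
        xargs -0 -n1 sh -c 'printf "%s\t%s\n" "$(git log -1 --format=%at -- "$1")" "$1"' sh |
        sort -n
}
```

Starting a shell per file costs some speed, so this trades performance for correct ordering and safe handling of unusual characters in filenames.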
dotancohen
William Entriken
11

Here's another way:

git ls-tree -r --name-only HEAD -z | TZ=UTC xargs -0n1 -I_ git --no-pager log -1 --date=iso-local --format="%ad _" -- _

Changes to previously given answers:

  • Correctly handles spaces in filenames.
  • Uses ls-tree instead of ls-files and as such can be used with bare repositories.
  • Prints all times with zero offset (UTC) in an ISO 8601-like format. Appending | sort to the command then sorts correctly, even for times near daylight saving changes or commits from different timezones.
  • Doesn't require using subshells so the performance should be as good as possible.

Note that this doesn't correctly handle filenames with the % character. See below for a more elaborate command to correctly handle all characters in filenames.

Note that this command is still slow because Git does not store the information we are looking for. For each file, Git walks the project history, filters for the commits that changed that file, takes the latest one, and prints its author timestamp. As a result, the displayed times match the last commit that changed each file. If the file had a different timestamp on disk when the original commit was made, that timestamp was never stored in the Git repository, so it cannot be restored without an external data source.

The timestamps that this script emits are only an emulation based on commit times, not the real timestamps the files had, because Git doesn't treat file timestamps as data. This part of Git was designed by Linus Torvalds, who strongly believes that a file's timestamp on disk should reflect when it was modified on that disk, not the timestamp the file had on somebody else's disk when it was historically modified. Git stores only two timestamps per commit: the author date, for when the change was originally made, and the committer date, for when the commit was applied to the history (the DAG). These may differ when the commit author and the person who applied the commit are two different people, as often happens in Linux kernel development. (Also consider that you can commit only selected lines of a file using the index / staging area. In that case there isn't even a theoretical "file timestamp", because the committed version doesn't match any file on disk.)
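
To see the two stored timestamps side by side for one path, here is a minimal sketch. The function name `show_commit_dates` is made up for illustration; `%ad` and `%cd` are the author and committer dates, and `TZ=UTC` with `--date=iso-local` normalises both to UTC.

```shell
# Hypothetical helper: show both timestamps Git stores for the last commit
# that touched a path, normalised to UTC.
show_commit_dates() {
    TZ=UTC git --no-pager log -1 --date=iso-local \
        --format='author=%ad committer=%cd' -- "$1"
}
```

After a rebase or a cherry-pick the two values typically differ: the author date stays with the original change, while the committer date reflects when the commit was rewritten.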

If you want to set filesystem modification times to the last author commit time of each file, you can do something like this to deal with special characters in filenames (add | bash to automatically execute all emitted commands):

git ls-tree -r --name-only HEAD -z | TZ=UTC xargs -0n1 git --no-pager log -1 --date=iso-local --name-only -z --format="format:%ad" | perl -npe "INIT {\$/ = \"\\0\"} s@^(.*? .*?) .*?\n(.*)\$@\$date=\$1; \$name=\$2; \$name =~ s/'/'\"'\"'/sg; \"TZ=UTC touch -m --date '\$date' '\$name';\n\"@se"

Even though this is much more complex than the command above, its performance should be about the same, because the run time is dominated by finding the last modification time of each file rather than by setting it. Note that this converts times to UTC, uses null-separated filenames, and sets the timestamp of each file on the filesystem using the UTC timezone.

If the order of output is not strictly important, you can improve performance of this command by adding -P $(nproc) to xargs flags to scale Git to all CPUs making the command look like ...TZ=UTC xargs -0n1 -P $(nproc) git....

If you prefer the committer date instead of the author date, use %cd instead of %ad in the above command line.

Mikko Rantalainen
  • 1
    This is the best answer because it is sortable by date. – Jonathan Ben-Avraham Sep 12 '20 at 19:30
  • 1
    +1 for the best 1 line shell script I've seen in a long time! – Andrew Murphy Sep 30 '20 at 07:01
  • Surprised at the claim that xargs is more efficient than while read. Why is that the case? – ShadSterling Aug 12 '21 at 14:58
  • 3
    Bash "while read" is okay for some cases. For this specific use case xargs may actually have identical performance to while read but xargs allows handling filenames with embedded line feeds correctly. In addition, xargs allows running commands on multiple CPUs concurrently with -P flag. – Mikko Rantalainen Aug 13 '21 at 22:12
  • Also note that "while read" cannot handle some potential special characters in the filenames such as a line feed. – Mikko Rantalainen Jun 13 '22 at 14:41
  • 1
    In most cases one can avoid the `-r` and just list one directory, which is what one is often interested, and that should run much faster. – Davide Aug 30 '22 at 11:20
  • Is this still the best way to proceed in 2022? It's annoying as such simple tasks (in this case akin to `ls -lt`) becomes such a huge pita that one has to script... – Davide Aug 30 '22 at 11:21
  • @Davide This is not going to get easier because Git doesn't save file timestamps by design. It only saves author timestamp for commit and committer timestamp (will differ from author timestamp in case of e.g. rebase). See the answer for details. – Mikko Rantalainen Aug 30 '22 at 15:03
  • @MikkoRantalainen Finding author timestamp for commit would be good enough for most purposes, if one could avoid all this mess. Is there an obvious way to do that? I haven't found it. – Davide Aug 31 '22 at 16:12
  • If you want the author time for a single file, you could just run `git log -n 1 --format="%ad" -- path/to/file`. Add `--date=iso-local` if you want output in ISO 8601 style. That queries history of the file `path/to/file`, outputs latest commit formatted as the timestamp only and stops after emitting one commit. It only gets messy if you want to do similar queries for all files at the same time in a bare repository. And you don't really need those quote marks if you're using POSIX compatible shell. – Mikko Rantalainen Sep 01 '22 at 08:17
7

This is a small tweak of Andrew M.'s answer. (I was unable to comment on his answer.)

Wrap the first $filename in double quotes, in order to support filenames with embedded spaces.

git ls-tree -r --name-only HEAD | while read filename; do
    echo "$(git log -1 --format="%ad" -- "$filename") $filename"
done

Sample output:

Tue Jun 21 11:38:43 2016 -0600 subdir/this is a filename with spaces.txt

I appreciate that Andrew's solution (based on ls-tree) works with bare repositories! (This isn't true of solutions using ls-files.)

Peter Mortensen
Kevin G.
5

If you're trying to set the file modification times on a big repository, look at git-restore-mtime from Git Tools. It's already available as a package:

sudo apt install git-restore-mtime
cd repo
git restore-mtime

It uses git whatchanged rather than git log, which is much quicker on big repositories.

Peter Mortensen
Andrew Murphy
3

For those of us using Windows and PowerShell, here is Andrew M.'s answer with the machine-sortable %ai date format:

git ls-tree -r --name-only HEAD | ForEach-Object { "$(git log -1 --format="%ai" -- "$_")`t$_" }

Example output:

2019-05-07 12:00:37 -0500   .editorconfig
2016-07-13 14:03:49 -0500   .gitattributes
2019-05-07 12:00:37 -0500   .gitignore
2018-02-03 22:01:17 -0600   .mailmap
Peter Mortensen
James Skemp
  • With newer pwsh, you can throw in a `-Parallel` to the `ForEach-Object` to make this go a lot faster. – Daniel Mar 16 '21 at 19:24
  • Sorry, I missed this comment. Confirmed, but it does mess with the sort order so that it's not alphabetical (in case that matters). – James Skemp Aug 16 '21 at 20:50
1

Here is the Fish shell version of Andrew M.'s answer, for those who use Fish.

git ls-tree -r --name-only HEAD | while read -l filename
    printf '%s %s\n' (git log -1 --format="%ai" -- $filename) $filename
end

I store this as a Fish function for easy access.

Peter Mortensen