16

Given any GitHub repository URL string such as:

git://github.com/some-user/my-repo.git

or

git@github.com:some-user/my-repo.git

or

https://github.com/some-user/my-repo.git

What is the best way in bash to extract the repository name my-repo from any of the strings above? The solution MUST work for all of the URL types listed.

Thanks.

Justin

8 Answers

23
$ url=git://github.com/some-user/my-repo.git
$ basename=$(basename "$url")
$ echo $basename
my-repo.git
$ filename=${basename%.*}
$ echo $filename
my-repo
$ extension=${basename##*.}
$ echo $extension
git
quanta
19

I'd go with basename $URL .git.
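As a minimal sketch of this approach (the function name repo_name is made up for illustration), basename's optional second argument removes the suffix, and the three URL forms from the question can be checked in a loop:

```shell
#!/bin/bash

# repo_name is a hypothetical helper name, not part of any tool.
# basename drops everything up to the last "/" and, when given a
# second argument, also removes that suffix from the result.
repo_name() {
    basename "$1" .git
}

for url in \
    git://github.com/some-user/my-repo.git \
    git@github.com:some-user/my-repo.git \
    https://github.com/some-user/my-repo.git
do
    repo_name "$url"
done
```

Each iteration should print my-repo. A URL without the .git suffix passes through unchanged apart from the path being stripped.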

womble
13

Old post, but I faced the same problem recently.

The regex ^(https|git)(:\/\/|@)([^\/:]+)[\/:]([^\/:]+)\/(.+).git$ works for the three types of URL.

#!/bin/bash

# url="git://github.com/some-user/my-repo.git"
# url="https://github.com/some-user/my-repo.git"
url="git@github.com:some-user/my-repo.git"

re="^(https|git)(:\/\/|@)([^\/:]+)[\/:]([^\/:]+)\/(.+)(.git)*$"

if [[ $url =~ $re ]]; then    
    protocol=${BASH_REMATCH[1]}
    separator=${BASH_REMATCH[2]}
    hostname=${BASH_REMATCH[3]}
    user=${BASH_REMATCH[4]}
    # (.+) is greedy and also captures a trailing .git, so strip it here
    repo=${BASH_REMATCH[5]%.git}
fi

Explanation (see it in action on regex101):

  • ^ matches the start of a string
  • (https|git) matches and captures the characters https or git
  • (:\/\/|@) matches and captures the characters :// or @
  • ([^\/:]+) matches and captures one character or more that is not / nor :
  • [\/:] matches one character that is / or :
  • ([^\/:]+) matches and captures one character or more that is not / nor :, yet again
  • \/ matches the character /
  • (.+) matches and captures one character or more
  • (.git)* matches optional .git suffix at the end
  • $ matches the end of a string

This is far from perfect, as something like https@github.com:some-user/my-repo.git would also match, but I think it's good enough for extraction.
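If the false positive matters, one possible tightening is to match each scheme together with its own separator. The following is only a sketch under that assumption; the names strict_re and parse_repo are hypothetical, and only the three URL forms from the question are considered:

```shell
#!/bin/bash

# Hypothetical stricter pattern: scheme and separator are matched as one
# unit, so "https@..." no longer matches.
# Groups: 1 = scheme prefix, 2 = host, 3 = user, 4 = repo (maybe with .git)
strict_re='^(https://|git://|git@)([^/:]+)[/:]([^/:]+)/(.+)$'

parse_repo() {
    local url=$1
    if [[ $url =~ $strict_re ]]; then
        # Remove the optional .git suffix after matching.
        printf '%s\n' "${BASH_REMATCH[4]%.git}"
    else
        return 1
    fi
}

parse_repo "git@github.com:some-user/my-repo.git"
parse_repo "https@github.com:some-user/my-repo.git" || echo "rejected"
```

This is still loose (for example, it does not check which separator follows the host), so treat it as a starting point and not a validator.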

Hicham
  • this is gold! – Omri Jul 01 '18 at 14:24
  • 1
    some urls don't have `.git` at the end. – kenn Jan 02 '19 at 14:58
  • @kenn: then they'd not be a valid remote for git, however. See https://git-scm.com/docs/git-push#URLS. – Martijn Pieters Feb 09 '22 at 12:06
  • 1
    I'm using an expanded version (play with it on [regex101](https://regex101.com/r/liVozi/1)): `^((https?|ssh|git|ftps?):\/\/)?(([^\/@]+)@)?([^\/:]+)[\/:]([^\/:]+)\/(.+).git\/?$`, which better matches [the official spec for URLs](https://git-scm.com/docs/git-push#URLS). Group 2 is the scheme; if it is missing, the default is `ssh`. – Martijn Pieters Feb 09 '22 at 12:13
6

Summing up:

  • Get the URL without the (optional) .git suffix:

    url_without_suffix="${url%.git}"
    
  • Get the repository name:

    reponame="$(basename "${url_without_suffix}")"
    
  • Get the user (owner) name afterwards; the second step also handles the host:user form of SSH URLs:

    username="${url_without_suffix%/${reponame}}"
    username="${username##*[:/]}"
    
hypnoglow
1

Use a regular expression: /([^/]+)\.git$/
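In bash, this pattern could be applied with the [[ =~ ]] operator roughly as follows. This is a sketch that assumes the surrounding slashes are delimiters and not part of the pattern; the variable names are arbitrary:

```shell
#!/bin/bash

url="git@github.com:some-user/my-repo.git"

# The pattern from above without the /.../ delimiters, which bash
# does not use. It requires the URL to end in .git.
re='([^/]+)\.git$'

if [[ $url =~ $re ]]; then
    repo=${BASH_REMATCH[1]}
    echo "$repo"
fi
```

For the URL shown, this should print my-repo. A URL without the .git suffix does not match at all, so repo stays unset in that case.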

Aaron Shen
0

basename is my favorite, but you can also use sed:

url=git://github.com/some-user/my-repo.git
reponame="$(echo $url | sed -r 's/.+\/([^.]+)(\.git)?/\1/')"
# reponame = "my-repo"

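sed deletes everything up to the last / as well as the .git extension (if present), and keeps only group \1, which matches a run of characters other than a dot ([^.]+). Because [^.]+ stops at the first dot, this assumes the repository name itself contains no dots.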

Noam Manos
0

Hicham's awesome answer above allowed me to come up with this, using sed to output exactly what I needed: org/reponame.

output=$(echo "${git_url}" | sed -nr 's/^(https|git)(:\/\/|@)([^\/:]+)[\/:]([^\/:]+)\/(.+)\.git$/\4\/\5/p')

Works well on Ubuntu, but doesn't work with the sed available by default on macOS, most likely because -r is a GNU sed option; BSD sed uses -E for extended regular expressions.
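A variant that should run with both GNU and BSD sed could use -E and a different delimiter to avoid escaping the slashes. This is a sketch and not the original author's command; the sample URL and variable names are assumptions:

```shell
#!/bin/bash

git_url="git@github.com:some-user/my-repo.git"

# -E enables extended regular expressions in both GNU and BSD sed.
# "#" is used as the s-command delimiter so "/" needs no escaping.
# Groups: 1 = scheme, 2 = separator, 3 = host, 4 = org, 5 = repo
output=$(printf '%s\n' "$git_url" |
    sed -nE 's#^(https|git)(://|@)([^/:]+)[/:]([^/:]+)/(.+)\.git$#\4/\5#p')

echo "$output"
```

For the sample URL this should print some-user/my-repo; input that does not match the pattern produces empty output because of -n and the p flag.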

0
basename "$git_repo_url" | sed 's/\.git$//'
Michael Hampton
jit