Extract filename and path from URL in bash script

2019-03-09 02:40发布

In my bash script I need to extract just the path from the given URL. For example, from the variable containing string:

http://login:password@example.com/one/more/dir/file.exe?a=sth&b=sth

I want to extract to some other variable only the:

/one/more/dir/file.exe

part. Of course login, password, filename and parameters are optional.

Since I am new to sed and awk I ask you for help. Please, advice me how to do it. Thank you!

标签: bash url parsing
13条回答
狗以群分
2楼-- · 2019-03-09 02:59

How does this :?

echo 'http://login:password@example.com/one/more/dir/file.exe?a=sth&b=sth' | \
sed 's|.*://[^/]*/\([^?]*\)?.*|/\1|g'
查看更多
Anthone
3楼-- · 2019-03-09 03:01

If you have a gawk:

$ echo 'http://login:password@example.com/one/more/dir/file.exe?a=sth&b=sth' | \
  gawk '$0=gensub(/http:\/\/[^/]+(\/[^?]+)\?.*/,"\\1",1)'

or

$ echo 'http://login:password@example.com/one/more/dir/file.exe?a=sth&b=sth' | \
  gawk -F'(http://[^/]+|?)' '$0=$2'

Gnu awk can use regular expression as field separators(FS).

查看更多
三岁会撩人
4楼-- · 2019-03-09 03:01

I agree that "cut" is a wonderful tool on the command line. However, a more purely bash solution is to use a powerful feature of variable expansion in bash. For example:

pass_first_last='password,firstname,lastname'

pass=${pass_first_last%%,*}

first_last=${pass_first_last#*,}

first=${first_last%,*}

last=${first_last#*,}

or, alternatively,

last=${pass_first_last##*,}
查看更多
疯言疯语
5楼-- · 2019-03-09 03:04
url="http://login:password@example.com/one/more/dir/file.exe?a=sth&b=sth"

GNU grep

$ grep -Po '\w\K/\w+[^?]+' <<<$url
/one/more/dir/file.exe

BSD grep

$ grep -o '\w/\w\+[^?]\+' <<<$url | tail -c+2
/one/more/dir/file.exe

ripgrep

$ rg -o '\w(/\w+[^?]+)' -r '$1' <<<$url
/one/more/dir/file.exe

To get other parts of URL, check: Getting parts of a URL (Regex).

查看更多
三岁会撩人
6楼-- · 2019-03-09 03:05

There are built-in functions in bash to handle this, e.g., the string pattern-matching operators:

  1. '#' remove minimal matching prefixes
  2. '##' remove maximal matching prefixes
  3. '%' remove minimal matching suffixes
  4. '%%' remove maximal matching suffixes

For example:

FILE=/home/user/src/prog.c
echo ${FILE#/*/}  # ==> user/src/prog.c
echo ${FILE##/*/} # ==> prog.c
echo ${FILE%/*}   # ==> /home/user/src
echo ${FILE%%/*}  # ==> nil
echo ${FILE%.c}   # ==> /home/user/src/prog

All this from the excellent book: "A Practical Guide to Linux Commands, Editors, and Shell Programming by Mark G. Sobell (http://www.sobell.com/)

查看更多
甜甜的少女心
7楼-- · 2019-03-09 03:05

The Perl snippet is intriguing, and since Perl is present in most Linux distros, quite useful, but...It doesn't do the job completely. Specifically, there is a problem in translating the URL/URI format from UTF-8 into path Unicode. Let me give an example of the problem. The original URI may be:

file:///home/username/Music/Jean-Michel%20Jarre/M%C3%A9tamorphoses/01%20-%20Je%20me%20souviens.mp3

The corresponding path would be:

/home/username/Music/Jean-Michel Jarre/Métamorphoses/01 - Je me souviens.mp3

%20 became space, %C3%A9 became 'é'. Is there a Linux command, bash feature, or Perl script that can handle this transformation, or do I have to write a humongous series of sed substring substitutions? What about the reverse transformation, from path to URL/URI?

(Follow-up)

Looking at http://search.cpan.org/~gaas/URI-1.54/URI.pm, I first saw the as_iri method, but that was apparently missing from my Linux (or is not applicable, somehow). Turns out the solution is to replace the "->path" part with "->file". You can then break that further down using basename and dirname, etc. The solution is thus:

path=$( echo "$url" | perl -MURI -le 'chomp($url = <>); print URI->new($url)->file' )

Oddly, using "->dir" instead of "->file" does NOT extract the directory part: rather, it formats the URI so it can be used as an argument to mkdir and the like.

(Further follow-up)

Any reason why the line cannot be shortened to this?

path=$( echo "$url" | perl -MURI -le 'print URI->new(<>)->file' )
查看更多
登录 后发表回答