In my bash script I need to extract just the path from a given URL. For example, from a variable containing the string:
http://login:password@example.com/one/more/dir/file.exe?a=sth&b=sth
I want to extract into another variable only the
/one/more/dir/file.exe
part. Of course, the login, password, filename, and parameters are all optional.
Since I am new to sed and awk, I am asking for help. Please advise me how to do it. Thank you!
How about this?
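One possible pipeline, sketched here with cut. The variable names `url` and `path` are illustrative, and the field numbers assume the URL has the shape scheme://authority/path?query:

```shell
url='http://login:password@example.com/one/more/dir/file.exe?a=sth&b=sth'

# Split on "/": fields 1-3 are "http:", "" and "login:password@example.com",
# so everything from field 4 onward is the path plus the query string.
# The second cut drops the query string after the first "?".
path="/$(printf '%s\n' "$url" | cut -d/ -f4- | cut -d'?' -f1)"

printf '%s\n' "$path"
```

Because the leading slash is added back by hand, a URL with no path at all yields just "/".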
If you have gawk:
or
GNU awk can use a regular expression as the field separator (FS).
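As a rough sketch of the regex-separator idea, assuming gawk is installed as `awk`: treat the "://authority" prefix and any "?" or "#" as field separators, so the path lands in field 2. The separator pattern and the variable names `url` and `path` are my own choices for illustration.

```shell
url='http://login:password@example.com/one/more/dir/file.exe?a=sth&b=sth'

# FS is a regular expression with two alternatives:
#   ://[^/]*   the "://" plus everything up to the first "/" (login, password, host)
#   [?#]       the start of the query string or fragment
# Field 1 is then the scheme and field 2 is the path.
path=$(printf '%s\n' "$url" | awk -F'://[^/]*|[?#]' '{ print $2 }')

printf '%s\n' "$path"
```

A URL without a path gives an empty field 2, so the result is an empty string rather than "/".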
I agree that "cut" is a wonderful tool on the command line. However, a more purely bash solution is to use parameter expansion, a powerful feature of bash. For example:
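A minimal sketch of the parameter-expansion approach, using no external commands. The variable names `url`, `rest`, and `path` are illustrative, and the sketch assumes the login and password contain no "?", "#", or "/" characters:

```shell
url='http://login:password@example.com/one/more/dir/file.exe?a=sth&b=sth'

rest=${url#*://}       # drop the scheme: shortest prefix ending in "://"
rest=${rest%%[?#]*}    # drop the query/fragment: longest suffix starting at "?" or "#"

case $rest in
    */*) path=/${rest#*/} ;;   # drop the authority up to the first "/", keep the slash
    *)   path= ;;              # no "/" left, so the URL has no path
esac

printf '%s\n' "$path"
```

Stripping the query string before looking for the first "/" keeps a slash inside a parameter value from being mistaken for part of the path.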
GNU grep
BSD grep
ripgrep
To get other parts of the URL, see: Getting parts of a URL (Regex).
Bash has built-in features to handle this, e.g., the string pattern-matching operators:
For example:
All this is from the excellent book "A Practical Guide to Linux Commands, Editors, and Shell Programming" by Mark G. Sobell (http://www.sobell.com/).
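To illustrate the four pattern-matching operators (#, ##, %, %%), here is a small sketch that splits an already extracted path into its parts. The variable names are illustrative and the example is mine, not taken from the book:

```shell
path='/one/more/dir/file.exe'

file=${path##*/}    # "##" removes the longest prefix matching "*/"  -> file.exe
dir=${path%/*}      # "%"  removes the shortest suffix matching "/*" -> /one/more/dir
ext=${file##*.}     # longest prefix matching "*." is removed        -> exe
stem=${file%.*}     # shortest suffix matching ".*" is removed       -> file
tail=${path#/*/}    # "#"  removes the shortest prefix matching "/*/" -> more/dir/file.exe
base=${path%%.*}    # "%%" removes the longest suffix matching ".*"  -> /one/more/dir/file

printf '%s\n' "$dir" "$file" "$stem" "$ext"
```

The doubled forms (## and %%) are greedy, while the single forms (# and %) stop at the first possible match.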
The Perl snippet is intriguing and, since Perl is present in most Linux distros, quite useful, but it doesn't do the job completely. Specifically, there is a problem in translating the percent-encoded UTF-8 of the URL/URI format into a Unicode path. Let me give an example of the problem. The original URI may be:
The corresponding path would be:
%20 became a space, and %C3%A9 became 'é'. Is there a Linux command, bash feature, or Perl script that can handle this transformation, or do I have to write a humongous series of sed substring substitutions? What about the reverse transformation, from path to URL/URI?
(Follow-up)
Looking at http://search.cpan.org/~gaas/URI-1.54/URI.pm, I first saw the as_iri method, but that was apparently missing from my Linux installation (or is not applicable, somehow). It turns out the solution is to replace the "->path" part with "->file". You can then break the result down further using basename, dirname, etc. The solution is thus:
Oddly, using "->dir" instead of "->file" does NOT extract the directory part: rather, it formats the URI so it can be used as an argument to mkdir and the like.
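For the bash-only side of the percent-decoding question above, a rough sketch is possible with printf. This is bash-specific (it relies on `${var//pattern/replacement}` and on the builtin printf expanding `\xHH`), `urldecode` is a hypothetical helper name rather than a builtin, and the sample path is made up for illustration:

```shell
# urldecode: hypothetical helper, decodes %HH sequences in its first argument.
urldecode() {
    # Rewrite every "%" as "\x" so "%C3%A9" becomes "\xC3\xA9",
    # then let printf's %b expand the \xHH escapes into raw bytes.
    printf '%b' "${1//%/\\x}"
}

decoded=$(urldecode '/my%20dir/caf%C3%A9.txt')
printf '%s\n' "$decoded"
```

Note that %b also interprets any literal backslash sequences already present in the input, so this is only safe for strings that are known to be plain percent-encoded paths. It covers decoding only, not the reverse transformation.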
(Further follow-up)
Any reason why the line cannot be shortened to this?