regular expression attempt

(\\section\{|\\subsection\{|\\subsubsection\{|\\paragraph[^{]*\{)(\w)\w*([ |\}]*)

search text

\section{intro to installation of apps}
\subsection{another heading for \myformatting{special}}
\subsubsection{good morning, San Francisco}
\paragraph{installation of backend services}

desired output

The first character of every word is capitalized, except for articles, prepositions, conjunctions, and the other parts of speech that conventionally stay lower case in titles.

I suppose I should really narrow this down, so let me borrow from the U.S. Government Printing Office Style Manual:

The articles a, an, and the; the prepositions at, by, for, in, of, on, to, and up; the conjunctions and, as, but, if, or, and nor; and the second element of a compound numeral are not capitalized.

Page 41
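
As a rough illustration of the quoted rule applied to plain text (no LaTeX commands yet), here is a minimal Python sketch. The names `LOWERCASE_WORDS` and `title_case` are made up for this example, and the "second element of a compound numeral" clause is not handled.

```python
# Words the quoted rule leaves in lower case: articles, prepositions, conjunctions.
LOWERCASE_WORDS = {
    "a", "an", "the",
    "at", "by", "for", "in", "of", "on", "to", "up",
    "and", "as", "but", "if", "or", "nor",
}


def title_case(text):
    """Capitalize every word except the listed small words.

    The first word is always capitalized, even if it is a small word.
    """
    result = []
    for index, word in enumerate(text.split(" ")):
        if index > 0 and word.lower() in LOWERCASE_WORDS:
            result.append(word.lower())
        else:
            # Upper-case only the first character, so "server-side"
            # becomes "Server-side" and "San" stays "San".
            result.append(word[:1].upper() + word[1:])
    return " ".join(result)


print(title_case("installation guide for the server-side app"))
# prints: Installation Guide for the Server-side App
```

Only the first character of each word is changed, so existing capitals inside a word are preserved.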

\subsection{Installation guide for the server-side app \myapp{webgen}}

changes to

\subsection{Installation Guide for the Server-side App \myapp{Webgen}}

or

\subsection{Installation Guide for the Server-side App \myapp{webgen}}

How would you name this type of string modification?

Applying REGEX to a string between strings?
Applying REGEX to a part of a string when that part falls between two other strings of characters?
Applying REGEX to a substring that occurs between two other substrings within a string?
<something else>

problem

I match each LaTeX heading command, including the {. This means that my expression does not match more than the first word in the actual heading text. I cannot surround the whole heading command with an "OR space" alternative, because then I would match nearly every word in the document. I also have to be careful with formatting commands inside the headings themselves.
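
To make the problem concrete, this small Python check runs the attempted expression against the first sample heading. It is only a demonstration of where the match stops; the variable names are invented for the example.

```python
import re

# The expression from the attempt above, unchanged.
ATTEMPT = re.compile(
    r"(\\section\{|\\subsection\{|\\subsubsection\{|\\paragraph[^{]*\{)(\w)\w*([ |\}]*)"
)

line = r"\section{intro to installation of apps}"
match = ATTEMPT.search(line)

matched_text = match.group(0)   # the command, the first word, and a trailing space
untouched = line[match.end():]  # everything the expression never sees

print(repr(matched_text))
print(repr(untouched))
```

Because the heading command is part of the pattern, the expression can match only once per heading, so the remaining words are never visited.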

other helpful related questions

Tags: regex perl text awk sed

2 Answers

何必那么认真

#2 · 2019-07-14 06:12

Here is an example of how you could do it in Perl, using the module Lingua::EN::Titlecase and recursive regular expressions:

use strict;
use warnings;

use Lingua::EN::Titlecase;

my $tc = Lingua::EN::Titlecase->new();
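# Slurp the whole input (files named on the command line, or STDIN).
my $data = do {local $/; <> };

# Alternation of the heading commands to rewrite.
my ($kw_regex) = map { qr/$_/ }
  join '|', qw(section subsection subsubsection paragraph);
# Group 1 is the command; group 2 is its brace-balanced argument, where
# (?2) recurses into group 2 to match nested {...}.
$data =~ s/(\\(?: $kw_regex))(\{(?:[^{}]++|(?2))*\})/title_case($tc,$1,$2)/gex;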
print $data;

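# Title-case the text inside the outer braces of $p2. Names of nested
# commands are left alone; their brace arguments are handled recursively.
sub title_case {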
    my ($tc, $p1, $p2) = @_;

    $p2 =~ s/^\{//;
    $p2 =~ s/\}$//;
    if ($p2 =~ /\\/ ) {
        while ($p2 =~ /\G(.*?)(\\.*?)(\{(?:[^{}]++|(?3))*\})/ ) {
            my $next_pos = $+[0];
            substr($p2, $-[1], $+[1] -$-[1], $tc->title($1));
            substr($p2, $-[3], $+[3] -$-[3], title_case($tc,'',$3));
            pos($p2) = $next_pos;
        }
        $p2 =~ s/\G(.+)$/$tc->title($1)/e;
    }
    else {
        $p2 = $tc->title($p2);
    }
    return $p1 . '{' . $p2 . '}';
}
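
For comparison, Python's built-in `re` module has no recursive patterns like `(?2)`, so the brace-balanced argument has to be found another way. The sketch below uses a regex only to find the heading command and then counts braces by hand. `matching_brace` and `heading_arguments` are hypothetical helper names, and escaped braces such as `\{` are assumed not to occur.

```python
import re

# Finds a heading command together with its opening brace.
HEADING = re.compile(r"\\(?:section|subsection|subsubsection|paragraph)\{")


def matching_brace(text, open_index):
    """Return the index of the '}' that closes the '{' at open_index."""
    depth = 0
    for index in range(open_index, len(text)):
        if text[index] == "{":
            depth += 1
        elif text[index] == "}":
            depth -= 1
            if depth == 0:
                return index
    raise ValueError("unbalanced braces")


def heading_arguments(text):
    """Yield (start, end) spans of the text inside each heading's outer braces."""
    for match in HEADING.finditer(text):
        open_index = match.end() - 1  # position of the opening '{'
        yield open_index + 1, matching_brace(text, open_index)


sample = r"\subsection{another heading for \myformatting{special}}"
spans = [sample[start:end] for start, end in heading_arguments(sample)]
print(spans)
```

Each span can then be passed to whatever title-casing routine is in use, which keeps the "find the heading" step separate from the "change the words" step.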


爷的心禁止访问

#3 · 2019-07-14 06:17

So it seems to me as if you need to implement pseudo-code like this:

Are we on the first word? If yes, capitalize it and move on.
Is the current word "reserved"? If yes, lower it and move on.
Is the current word a numeral? If yes, lower it and move on.
Are we still in the list? If yes, print the line verbatim and move on.

One other helpful rule might be to leave fully upper-case words as they are, just in case they're acronyms.
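
A minimal sketch of that acronym rule in Python might look like this. The two-letter minimum is my own assumption, so that a lone "A" or a word made only of digits is not mistaken for an acronym; `is_acronym` and `capitalize_unless_acronym` are names made up for the example.

```python
def is_acronym(word):
    """Treat a word as an acronym if it has two or more letters, all upper case."""
    letters = [ch for ch in word if ch.isalpha()]
    return len(letters) >= 2 and all(ch.isupper() for ch in letters)


def capitalize_unless_acronym(word):
    """Leave acronyms alone; otherwise upper-case the first letter and lower the rest."""
    if is_acronym(word):
        return word
    return word[:1].upper() + word[1:].lower()


print(capitalize_unless_acronym("HTTP"))   # prints: HTTP
print(capitalize_unless_acronym("gUIDE"))  # prints: Guide
```

A plain comparison of the word with its upper-cased form would also be true for words without any letters, which is why the sketch counts letters explicitly.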

The following awk script might do what you need.

#!/usr/bin/awk -f

function toformal(subject) {
  return toupper(substr(subject,1,1)) tolower(substr(subject,2))
}

BEGIN {
  # Reserved word list (articles, prepositions, conjunctions) gets split
  # into an array for easy matching.
  reserved="a an the at by for in of on to up and as but if or nor";
  split(reserved,a_reserved," "); for(i in a_reserved) r[a_reserved[i]]=1;
  # Same with the list of compound numerals. If this isn't what you mean, say so.
  numerals="hundred thousand million billion";
  split(numerals,a_numerals," "); for(i in a_numerals) n[a_numerals[i]]=1;
}

# This awk condition matches the lines we're interested in modifying.
/^\\(section|subsection|subsubsection|paragraph)[{]/ {

  # Separate the command name and the text, then split text to an array.
  # Only the last closing brace on the line is stripped, so nested
  # commands such as \myformatting{special} keep their own braces.
  section=$0; sub(/\\/,"",section); sub(/[{].*/,"",section);
  text=$0; sub(/^[^{]*[{]/,"",text); sub(/[}][^}]*$/,"",text);
  size=split(text,atext,/[[:space:]]/);

  # First word...
  newtext=toformal(atext[1]);

  for(i=2; i<=size; i++) {
    # Formatting commands such as \myapp{webgen} are left as they are...
    if (atext[i] ~ /^\\/) { newtext=newtext " " atext[i]; continue; }
    # Reserved word...
    if (r[tolower(atext[i])]) { newtext=newtext " " tolower(atext[i]); continue; }
    # Compound numerals...
    if (n[tolower(atext[i])]) { newtext=newtext " " tolower(atext[i]); continue; }
#    # Acronyms maybe...
#    if (atext[i] == toupper(atext[i])) { newtext=newtext " " atext[i]; continue; }
    # Everything else...
    newtext=newtext " " toformal(atext[i]);
  }

  # Put the heading command and its braces back around the new text.
  print "\\" section "{" newtext "}";
  next;

}

# Print the line if we get this far. This is a non-condition with
# a print-only statement.
1

