QRegExp to extract string between a tag in html

Posted 2019-07-30 17:01

The situation is tricky: I do not have access to Qt's WebKit module, so I am forced to parse an HTML file using QRegExp.

The file contains the strings I need to extract, each placed between li tags.

If I write a QRegExp

QRegExp ("[^</li>]([a-zA-Z0-9_./]+)");

I can extract all the strings between the li tags, but all I need are:

Pg_1_qds_Bin_Indicator_2

Pg_1_qds_Bin_Indicator_3

Pg_1_qds_Ana_Indicator_1, and all similar names enclosed between li tags.

Other names that appear in the full file, though not in the excerpt below, include: TEMPLATE_LOGO

Pg_1_Command_By_Text

All the names start with Pg_, except for one, which is TEMPLATE_LOGO_

I think the other lines can be ruled out because they contain characters such as [ or another tag between the li tags.

An excerpt of the file is below. TL;DR: I need a QRegExp to extract the names mentioned above from between the li tags.

<ul>
  <li><a href="#symbols">Symbol report</a></li>
<ul>
  <li><a href="#symbolsConsistency">Consistency</a></li>
  <li><a href="#symbolCharacteristics">Symbol characteristics</a></li>
  <li><a href="#basicSymbols">Display of basic symbols</a></li>


    <ul>
      <li>Pg_1_qds_Bin_Indicator_2</li>
    <ul>
      <li>[QDSConsistency.report.field.logicIndicator.descriptionalignment] = (Right)</li>
      <li>[QDSConsistency.report.field.logicIndicator.descriptiontextcolor] = (Color {0, 0, 0, 255})</li>
      <li>[QDSConsistency.report.field.logicIndicator.isdescriptiondisplayed] = (true)</li>
      <li>[QDSConsistency.report.field.logicIndicator.descriptionfontfamily] = (Arial)</li>
      <li>[QDSConsistency.report.field.logicIndicator.descriptionfontsize] = (11 pt)</li>
      <li>[QDSConsistency.report.field.logicIndicator.descriptionfontstyle] = (Normal)</li>
      <li>[QDSConsistency.report.field.logicIndicator.descriptionfontposition] = (LEFT)</li>
      <li>[QDSConsistency.report.field.analogIndicator.descriptiontext] = (v1)</li>
      <li>[QDSConsistency.report.field.logicIndicator.backgroundcolor] = (Color {238, 238, 238, 255})</li>
      <li>[QDSConsistency.report.field.logicIndicator.digitnumber] = (8)</li>
      <li>[QDSConsistency.report.field.logicIndicator.widgetfont] = (FONT1)</li>
      <li>[QDSConsistency.report.field.logicIndicator.shortname] = (Pg_1_qds_Bin_Indicator_2)</li>
      <li>[QDSConsistency.report.field.logicIndicator.precision] = (2)</li>
      <li>[QDSConsistency.report.field.logicIndicator.widgetuserfontfamily] = (Arial)</li>
      <li>[QDSConsistency.report.field.logicIndicator.widgetuserfontsize] = (11 pt)</li>
      <li>[QDSConsistency.report.field.analogIndicator.widgetuserfontstyle] = (NORMAL)</li>
      <li>[QDSConsistency.report.field.analogIndicator.dynamicbackgroundcolor] = (Color {238, 238, 238, 255})</li>
      <li>[QDSConsistency.report.field.analogIndicator.longname] = (Pg_1_qds_Bin_Indicator_2_v1)</li>
      <li>[QDSConsistency.report.field.logicIndicator.heith] = (32)</li>
      <li>[QDSConsistency.report.field.logicIndicator.weigth] = (50)</li>
      <li>[QDSConsistency.report.field.logicIndicator.poxX] = (352)</li>
      <li>[QDSConsistency.report.field.logicIndicator.poxY] = (116)</li>
      <li>[QDSConsistency.report.field.logicIndicator.valuealignment] = (Left)</li>
      <li>[QDSConsistency.report.field.logicIndicator.value_0] = (Off)</li>
      <li>[QDSConsistency.report.field.logicIndicator.value_1] = (On)</li>
    </ul>
      <li>Pg_1_qds_Bin_Indicator_3</li>
    <ul>
      <li>[QDSConsistency.report.field.logicIndicator.descriptionalignment] = (Right)</li>
      <li>[QDSConsistency.report.field.logicIndicator.descriptiontextcolor] = (Color {0, 0, 0, 255})</li>
      <li>[QDSConsistency.report.field.logicIndicator.isdescriptiondisplayed] = (true)</li>
      <li>[QDSConsistency.report.field.logicIndicator.descriptionfontfamily] = (Arial)</li>
      <li>[QDSConsistency.report.field.logicIndicator.descriptionfontsize] = (11 pt)</li>
      <li>[QDSConsistency.report.field.logicIndicator.descriptionfontstyle] = (Normal)</li>
      <li>[QDSConsistency.report.field.logicIndicator.descriptionfontposition] = (LEFT)</li>
      <li>[QDSConsistency.report.field.analogIndicator.descriptiontext] = (v1)</li>
      <li>[QDSConsistency.report.field.logicIndicator.backgroundcolor] = (Color {238, 238, 238, 255})</li>
      <li>[QDSConsistency.report.field.logicIndicator.digitnumber] = (8)</li>
      <li>[QDSConsistency.report.field.logicIndicator.widgetfont] = (FONT1)</li>
      <li>[QDSConsistency.report.field.logicIndicator.shortname] = (Pg_1_qds_Bin_Indicator_3)</li>
      <li>[QDSConsistency.report.field.logicIndicator.precision] = (2)</li>
      <li>[QDSConsistency.report.field.logicIndicator.widgetuserfontfamily] = (Arial)</li>
      <li>[QDSConsistency.report.field.logicIndicator.widgetuserfontsize] = (11 pt)</li>
      <li>[QDSConsistency.report.field.analogIndicator.widgetuserfontstyle] = (NORMAL)</li>
      <li>[QDSConsistency.report.field.analogIndicator.dynamicbackgroundcolor] = (Color {238, 238, 238, 255})</li>
      <li>[QDSConsistency.report.field.analogIndicator.longname] = (Pg_1_qds_Bin_Indicator_3_v1)</li>
      <li>[QDSConsistency.report.field.logicIndicator.heith] = (32)</li>
      <li>[QDSConsistency.report.field.logicIndicator.weigth] = (50)</li>
      <li>[QDSConsistency.report.field.logicIndicator.poxX] = (446)</li>
      <li>[QDSConsistency.report.field.logicIndicator.poxY] = (187)</li>
      <li>[QDSConsistency.report.field.logicIndicator.valuealignment] = (Left)</li>
      <li>[QDSConsistency.report.field.logicIndicator.value_0] = (Off)</li>
      <li>[QDSConsistency.report.field.logicIndicator.value_1] = (On)</li>
    </ul>
    </ul>
    <p><em>Analog indicator :</em></p>
    <ul>
      <li>Pg_1_qds_Ana_Indicator_1</li>
    <ul>
      <li>[QDSConsistency.report.field.analogIndicator.descriptionalignment] = (Right)</li>
      <li>[QDSConsistency.report.field.analogIndicator.descriptiontextcolor] = (Color {0, 0, 0, 255})</li>
      <li>[QDSConsistency.report.field.analogIndicator.isdescriptiondisplayed] = (true)</li>
      <li>[QDSConsistency.report.field.analogIndicator.descriptionfontfamily] = (Arial)</li>
      <li>[QDSConsistency.report.field.analogIndicator.descriptionfontsize] = (11 pt)</li>
      <li>[QDSConsistency.report.field.analogIndicator.descriptionfontstyle] = (Normal)</li>
      <li>[QDSConsistency.report.field.analogIndicator.descriptionfontposition] = (LEFT)</li>
      <li>[QDSConsistency.report.field.analogIndicator.descriptiontext] = (v0)</li>
      <li>[QDSConsistency.report.field.analogIndicator.backgroundcolor] = (Color {238, 238, 238, 255})</li>
      <li>[QDSConsistency.report.field.analogIndicator.digitnumber] = (8)</li>
      <li>[QDSConsistency.report.field.analogIndicator.widgetfont] = (FONT1)</li>
      <li>[QDSConsistency.report.field.analogIndicator.shortname] = (Pg_1_qds_Ana_Indicator_1)</li>
      <li>[QDSConsistency.report.field.analogIndicator.precision] = (2)</li>
      <li>[QDSConsistency.report.field.analogIndicator.widgetuserfontfamily] = (Arial)</li>
      <li>[QDSConsistency.report.field.analogIndicator.widgetuserfontsize] = (11 pt)</li>
      <li>[QDSConsistency.report.field.analogIndicator.widgetuserfontstyle] = (NORMAL)</li>
      <li>[QDSConsistency.report.field.analogIndicator.dynamicbackgroundcolor] = (Color {238, 238, 238, 255})</li>
      <li>[QDSConsistency.report.field.analogIndicator.longname] = (Pg_1_qds_Ana_Indicator_1_v0)</li>
      <li>[QDSConsistency.report.field.analogIndicator.heith] = (32)</li>

4 answers
萌系小妹纸
#2 · 2019-07-30 17:14

Based on your comment, see if this works for you: <li>.*\((Pg_1[^)]*|TEMPLATE_LOGO).*?<\/li>

This should match any string beginning with "Pg_1", or specifically "TEMPLATE_LOGO", found between li tags.
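To see what this pattern actually captures, here is a minimal sketch. It is not the answerer's code: std::regex stands in for QRegExp so that it compiles without Qt, and the helper name extractParenthesizedNames is hypothetical. Because of the \( in the pattern, it picks up names that appear inside parentheses (the shortname and longname lines), not a bare <li>Pg_...</li> entry. Also note that QRegExp is documented as not supporting a lazy quantifier such as .*? on an individual quantifier; setMinimal(true) is its global equivalent.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: applies the pattern from this answer and returns
// capture group 1 of every match.
// Assumption: std::regex (ECMAScript syntax) is used in place of QRegExp.
std::vector<std::string> extractParenthesizedNames(const std::string& html) {
    // In ECMAScript syntax '.' does not match a newline, so each match
    // stays on a single line of the input.
    static const std::regex pattern(
        R"re(<li>.*\((Pg_1[^)]*|TEMPLATE_LOGO).*?</li>)re");
    std::vector<std::string> names;
    for (std::sregex_iterator it(html.begin(), html.end(), pattern), end;
         it != end; ++it) {
        names.push_back((*it)[1].str());
    }
    return names;
}
```

Fed the excerpt from the question, this should return values such as Pg_1_qds_Bin_Indicator_2 from the shortname line and Pg_1_qds_Bin_Indicator_2_v1 from the longname line, so the result would still need de-duplicating or filtering.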

#3 · 2019-07-30 17:28

regex101.com/r/yG9aZ8/2 Your solution led to the final solution. If you could post this, I would close this post, but I need to improve it for TEMPLATE_LOGO.

Then just add a second alternative, TEMPLATE_LOGO.*:

QRegExp exp1("<li>(Pg_.*|TEMPLATE_LOGO_.*)<\\/li>");

Credit goes to Mr. Trey and Mr. Wiktor Stribiżew; their answers led to the desired solution.
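One caveat with the greedy .* in this pattern: as far as I understand QRegExp, its . also matches newlines, so if the expression is run over the whole file at once rather than line by line, a single match can stretch from the first <li>Pg_ to the last </li>. A minimal sketch of a tighter variant follows, using [^<]* so a match cannot run past its own closing tag. The assumptions here: std::regex stands in for QRegExp so the example compiles without Qt, the helper name extractNames is hypothetical, and the names themselves never contain a < character.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the text of every <li> element whose content
// starts with "Pg_" or "TEMPLATE_LOGO" and contains no nested tag.
// Assumption: std::regex (ECMAScript syntax) is used in place of QRegExp.
std::vector<std::string> extractNames(const std::string& html) {
    // [^<]* stops at the next '<', so the capture ends at this item's </li>.
    static const std::regex pattern(
        R"re(<li>((?:Pg_|TEMPLATE_LOGO)[^<]*)</li>)re");
    std::vector<std::string> names;
    for (std::sregex_iterator it(html.begin(), html.end(), pattern), end;
         it != end; ++it) {
        names.push_back((*it)[1].str());
    }
    return names;
}
```

Lines such as <li>[QDSConsistency...] = (...)</li> or <li><a href="#symbols">...</a></li> are skipped because their content does not start with one of the two prefixes. The same pattern string should also be usable with QRegExp in a loop over indexIn() and cap(1), though that is untested here.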

不美不萌又怎样
#4 · 2019-07-30 17:33

No, you're not forced to parse HTML using QRegExp. A regular expression matcher can only recognize regular languages, and HTML is not a regular language, so this will never work reliably. Use an HTML parser! I suggest Gumbo. It's a stand-alone C parser with an easy-to-use API.

干净又极端
#5 · 2019-07-30 17:36

Credit goes to Mr. Trey and Mr. Wiktor Stribiżew; their answers led to the desired solution.

QRegExp exp1("<li>(Pg_.*|TEMPLATE_LOGO_.*)<\\/li>");