Split html row into string array

I have data in an html file, in a table:

<table>
    <tr><td>001</td><td>MC Hammer</td><td>Can't Touch This</td></tr>
    <tr><td>002</td><td>Tone Loc</td><td>Funky Cold Medina</td></tr>
    <tr><td>003</td><td>Funkdoobiest</td><td>Bow Wow Wow</td></tr>
</table>

How do I split a single row into an array or list?

string row = streamReader.ReadLine();

List<string> data = row.Split //... how do I do this bit?

string artist = data[1];

Tags: c# regex string split toarray

4 Answers

Viruses.

#2 · 2019-09-08 02:15

If your HTML is well-formed you could use LINQ to XML:

// Requires: using System; using System.Linq; using System.Xml.Linq;
string input = @"<table>
    <tr><td>001</td><td>MC Hammer</td><td>Can't Touch This</td></tr>
    <tr><td>002</td><td>Tone Loc</td><td>Funky Cold Medina</td></tr>
    <tr><td>003</td><td>Funkdoobiest</td><td>Bow Wow Wow</td></tr>
</table>";

var xml = XElement.Parse(input);

// query each row
foreach (var row in xml.Elements("tr"))
{
    foreach (var item in row.Elements("td"))
    {
        Console.WriteLine(item.Value);
    }
    Console.WriteLine();
}

// if you really need a string array...
var query = xml.Elements("tr")
               .Select(row => row.Elements("td")
                                 .Select(item => item.Value)
                                 .ToArray());

foreach (var item in query)
{
    // foreach over item content
    // or access via item[0...n]
}
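
Since the question reads one row at a time, the same approach can be applied to a single line. The following is a minimal sketch, assuming each line holds exactly one well-formed <tr> element; the hard-coded row string stands in for the result of streamReader.ReadLine().

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Stand-in for: string row = streamReader.ReadLine();
string row = "    <tr><td>001</td><td>MC Hammer</td><td>Can't Touch This</td></tr>";

// A lone <tr> element is a well-formed XML fragment, so it can be parsed on its own.
// Trim() drops the leading indentation before parsing.
List<string> data = XElement.Parse(row.Trim())
                            .Elements("td")
                            .Select(td => td.Value)
                            .ToList();

string artist = data[1];
Console.WriteLine(artist); // MC Hammer
```

XElement.Parse throws an XmlException if the line is not well-formed XML, for example when a row spans several lines or contains an unclosed tag.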


我只想做你的唯一

#3 · 2019-09-08 02:22

When parsing HTML, I usually turn to the HTML Agility Pack.
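
As a rough illustration of what that can look like, here is a minimal sketch. It assumes the HtmlAgilityPack NuGet package is referenced in the project; the variable names are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

string html = @"<table>
    <tr><td>001</td><td>MC Hammer</td><td>Can't Touch This</td></tr>
    <tr><td>002</td><td>Tone Loc</td><td>Funky Cold Medina</td></tr>
    <tr><td>003</td><td>Funkdoobiest</td><td>Bow Wow Wow</td></tr>
</table>";

var doc = new HtmlDocument();
doc.LoadHtml(html);

// One string[] per <tr>, one entry per <td> directly inside that row.
List<string[]> rows = doc.DocumentNode
    .Descendants("tr")
    .Select(tr => tr.Elements("td")
                    // DeEntitize turns entities such as &amp; back into plain characters.
                    .Select(td => HtmlEntity.DeEntitize(td.InnerText).Trim())
                    .ToArray())
    .ToList();

foreach (string[] cells in rows)
{
    Console.WriteLine(string.Join(" | ", cells));
}
```

Unlike an XML parser, this does not require the markup to be well-formed, which is the main reason to prefer it for HTML you do not control.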


地球回转人心会变

#4 · 2019-09-08 02:25

Short answer: never try to parse HTML from the wild with regular expressions. It will most likely come back to haunt you.

Longer answer: As long as you can absolutely, positively guarantee that the HTML that you are parsing fits the given structure, you can use string.Split() as Jenni suggested.

string html = "<tr><td>001</td><td>MC Hammer</td><td>Can't Touch This</td></tr>";

string[] values = html.Split(new string[] { "<tr>","</tr>","<td>","</td>" }, StringSplitOptions.RemoveEmptyEntries);

List<string> list = new List<string>(values);

Listing the tags independently keeps this slightly more readable, and StringSplitOptions.RemoveEmptyEntries keeps empty strings out of the result where a closing tag is immediately followed by an opening tag.

If this HTML is coming from the wild, or from a tool that may change - in other words, if this is more than a one-off transaction - I strongly encourage you to use something like the HTML Agility Pack instead. It's pretty easy to integrate, and there are lots of examples on the Intarwebs.
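
To make the risk concrete, here is a small sketch of how the fixed-tag split behaves once the markup changes slightly. The class="artist" attribute is a made-up example of such a change, not something from the original data.

```csharp
using System;

string[] tags = { "<tr>", "</tr>", "<td>", "</td>" };

// The structure the split was written for.
string expected = "<tr><td>001</td><td>MC Hammer</td><td>Can't Touch This</td></tr>";

// Hypothetical change: the generating tool starts adding an attribute to one cell.
string changed = "<tr><td>001</td><td class=\"artist\">MC Hammer</td><td>Can't Touch This</td></tr>";

string[] good = expected.Split(tags, StringSplitOptions.RemoveEmptyEntries);
string[] bad = changed.Split(tags, StringSplitOptions.RemoveEmptyEntries);

Console.WriteLine(good[1]); // MC Hammer
Console.WriteLine(bad[1]);  // <td class="artist">MC Hammer
```

No exception is thrown in the second case; the leftover tag simply ends up in the data, which is why this kind of breakage tends to go unnoticed.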


▲ chillily

#5 · 2019-09-08 02:31

You could try:

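// Requires: using System.Text.RegularExpressions; the first and last entries are empty strings
string[] data = Regex.Split(row.Trim(), @"<tr><td>|</td><td>|</td></tr>");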

But it depends on how regular the HTML is. Is it programmatically generated, or does a human write it? You should only use a regular expression if you're sure the markup will always be generated the same way; otherwise, use a proper HTML parser.
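
In C#, that split could be written with Regex.Split. This is a minimal sketch, assuming each row has exactly the shape shown in the question; the hard-coded row string stands in for streamReader.ReadLine(). Because the pattern matches at the very start and end of the row, Regex.Split returns an empty string at each end, so those are filtered out to keep the artist at index 1.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Stand-in for: string row = streamReader.ReadLine();
string row = "    <tr><td>001</td><td>MC Hammer</td><td>Can't Touch This</td></tr>";

// Split on the three tag sequences that can surround a cell value.
string[] data = Regex.Split(row.Trim(), @"<tr><td>|</td><td>|</td></tr>")
                     .Where(s => s.Length > 0) // drop the empty leading and trailing entries
                     .ToArray();

string artist = data[1];
Console.WriteLine(artist); // MC Hammer
```

Note that the filter also drops genuinely empty cells (<td></td>), which shifts the indexes of the cells after them.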

