Convert tab/space delimited lines into nested array

Posted 2020-02-29 10:56

I would like to convert the text below into a nested array, something like you would get with an MPTT database structure.

I am getting the data from a shell script and need to display it on a website. Don't have any control over the format :/

There is a lot of information about going from array to list, but not much about going the other way.

Any input would be appreciated, thanks.

cat, true cat
       => domestic cat, house cat, Felis domesticus, Felis catus
           => kitty, kitty-cat, puss
           => mouser
           => alley cat
           => tom, tomcat
               => gib
           => Angora, Angora cat
           => Siamese cat, Siamese
               => blue point Siamese
       => wildcat
           => sand cat
           => European wildcat, catamountain, Felis silvestris
           => cougar, puma, catamount, mountain lion, painter, panther, Felis concolor
           => ocelot, panther cat, Felis pardalis
           => manul, Pallas's cat, Felis manul
           => lynx, catamount
               => common lynx, Lynx lynx
               => Canada lynx, Lynx canadensis

2 answers
爷、活的狠高调
#2 · 2020-02-29 11:08

You already have a sorted tree list here. Each line is either a child of the previous line, a sibling of it, or a sibling of one of its ancestors. So you can walk over the list, take the name of each item, derive its level from its indentation, and create an element out of it.

1 Line <=> 1 Element (level, name)

So every element has a name and zero or more children. From the input it can also be said which level it belongs to.

An element can be represented as an array whose first value is the name and whose second value is an array of its children.

Because the list is sorted, a simple map is enough: for each level it holds an alias (a reference) to the children array that elements of that level are currently appended to. Given the level of each element, we can add it to the stack:

    $self = array($element, array());
    $stack[$level][] = &$self;
    $stack[$level + 1] = &$self[1];

As this code example shows, $self is added as a child to the stack/map entry for the current level:

    $stack[$level][] = &$self;

The stack entry for the next level then gets a reference to the children of $self (index 1):

    $stack[$level + 1] = &$self[1];
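
To see the two reference assignments at work, here is a minimal, self-contained sketch. The $elements list of (level, name) pairs is made-up sample data standing in for what the parsing steps further down produce, and the unset() calls are my addition to keep references from leaking between loop iterations.

```php
<?php
// Hypothetical sample data: (level, name) pairs in sorted tree order.
$elements = array(
    array(0, 'cat'),
    array(1, 'domestic cat'),
    array(2, 'kitty'),
    array(2, 'mouser'),
    array(1, 'wildcat'),
);

$tree  = array();
$stack = array();
$stack[0] = &$tree;                   // level 0 is an alias of the root list

foreach ($elements as $pair) {
    list($level, $element) = $pair;
    $self = array($element, array());
    $stack[$level][] = &$self;        // append the node to its parent's children
    $stack[$level + 1] = &$self[1];   // deeper lines will land in this node's children
    unset($self);                     // drop the alias before the next iteration
}
unset($stack);

print_r($tree);
```

After the loop, $tree holds a single root node 'cat' with the two children 'domestic cat' and 'wildcat', and 'domestic cat' in turn holds 'kitty' and 'mouser'.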

So now, for each line, we need to find the level. As the stack shows, levels are numbered sequentially (0, 1, 2, ...), but in the input a level is just a number of spaces.

A little helper object can do the work of grouping indentation widths into levels: if no level exists yet for an indentation it adds one, but only if that indentation is deeper than all previous ones.

This solves the problem that in your input there is no 1:1 relation between the size of the indentation and its index. At least not an obvious one.

For this example the helper object is named Levels. It implements __invoke to return the level for an indent, transparently adding a new level if necessary:

$levels = new Levels();
echo $levels(''); # 0
echo $levels('    '); # 1
echo $levels('    '); # 1
echo $levels('      '); # 2
echo $levels(' '); # Throws Exception, this is smaller than the highest one
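
The class itself is not shown at this point, so here is one possible sketch of it. Its behaviour is inferred only from the usage lines above; the property name, the exception type and the message are my assumptions, not the original implementation.

```php
<?php
// Sketch of a Levels helper, inferred from the usage example only.
class Levels
{
    /** @var int[] indentation length => level number */
    private $levels = array();

    public function __invoke($indent)
    {
        $length = strlen($indent);
        if (isset($this->levels[$length])) {
            return $this->levels[$length];            // known indentation
        }
        if ($this->levels && $length < max(array_keys($this->levels))) {
            // Unknown indentation that is not deeper than all previous ones.
            throw new InvalidArgumentException(
                "Unknown indentation of length $length."
            );
        }
        // New, deeper indentation: it becomes the next level.
        return $this->levels[$length] = count($this->levels);
    }
}

$levels = new Levels();
echo $levels(''), "\n";        // 0
echo $levels('    '), "\n";    // 1
echo $levels('    '), "\n";    // 1
echo $levels('      '), "\n";  // 2
try {
    $levels(' ');
} catch (InvalidArgumentException $e) {
    echo 'Exception: ', $e->getMessage(), "\n";
}
```

Keying the map by the length of the indent rather than by the string itself treats tabs and spaces alike, which may or may not be what you want for mixed input.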

So now we can turn indentation into a level. The level lets us drive the stack, and the stack lets us build the tree. Fine.

The line-by-line parsing can easily be done with a regular expression. As I'm lazy, I just use preg_match_all and return, per line, the indentation and the name. Because I want more comfort, I wrap it in a function that always returns an array, so I can iterate over the result directly:

$matches = function($string, $pattern)
{
    return preg_match_all($pattern, $string, $matches, PREG_SET_ORDER)
        ? $matches : array();
};

Using it on the input with a pattern like

/^(?:(\s*)=> )?(.*)$/m

will give me one array per line, that is:

array(whole_line, indent, name)
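
A quick way to check what the pattern returns is to run it on two lines. The shortened $input below is sample data of my own in the question's format, and the printf output format is only for illustration.

```php
<?php
// Two sample lines in the question's format (hypothetical, shortened input).
$input = "cat, true cat\n       => domestic cat, house cat";

$matches = function ($string, $pattern) {
    return preg_match_all($pattern, $string, $matches, PREG_SET_ORDER)
        ? $matches : array();
};

foreach ($matches($input, '/^(?:(\s*)=> )?(.*)$/m') as $match) {
    list(, $indent, $name) = $match;
    printf("indent=%d name=%s\n", strlen($indent), $name);
}
// indent=0 name=cat, true cat
// indent=7 name=domestic cat, house cat
```

The root line has no "=> " prefix, so the optional group does not take part in the match and the indent comes back as an empty string.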

You see the pattern here? It's close to

1 Line <=> 1 Element (level, name)

With the help of the Levels object this can be mapped, so all it takes is a call to a mapping function:

function (array $match) use ($levels) {
    list(, $indent, $name) = $match;
    $level = $levels($indent);
    return array($level, $name);
};

This maps array(line, indent, name) to array(level, name). To make it accessible, it is returned by another function into which the Levels object can be injected:

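$map = function(Levels $levels) {
    // Returns the mapping function from above, bound to the given Levels.
    return function(array $match) use ($levels) {
        list(, $indent, $name) = $match;
        return array($levels($indent), $name);
    };
};
$map = $map(new Levels());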

So everything is in place to read all the lines. However, the elements still need to be placed into the tree. Recall how elements are added to the stack:

function($level, $element) use (&$stack) {
    $self = array($element, array());
    $stack[$level][] = &$self;
    $stack[$level + 1] = &$self[1];
};

($element is the name here.) This needs the stack, and the stack is in fact the tree. So let's create another function that returns this function and lets us push each line onto the stack to build the tree:

$tree = array();
$stack = function(array &$tree) {
    $stack[] = &$tree;
    return function($level, $element) use (&$stack) {
        $self = array($element, array());
        $stack[$level][] = &$self;
        $stack[$level + 1] = &$self[1];
    };
};
$push = $stack($tree);

So the last thing to do is just to process one element after the other:

foreach ($matches($input, '/^(?:(\s*)=> )?(.*)$/m') as $match) {
    list($level, $element) = $map($match);
    $push($level, $element);
}

So now, with the given $input, this creates an array that has only the (root) nodes on its first level, where each node is an array with two entries:

array(name, children)

Name is a string here, children an array. So technically the list has already been converted into an array / tree. But the structure is rather cumbersome to work with, because you want to be able to output it as well. You can do so with recursive function calls, or by implementing a recursive iterator.
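
For the recursive function call variant, a minimal sketch could look like this. The function name tree_lines and the small $tree literal are hypothetical; the literal just mimics the array(name, children) shape described above.

```php
<?php
// Hypothetical tree in the array(name, children) shape described above.
$tree = array(
    array('cat', array(
        array('domestic cat', array(
            array('kitty', array()),
        )),
        array('wildcat', array()),
    )),
);

// Returns one text line per node, indented by two spaces per level.
function tree_lines(array $nodes, $depth = 0)
{
    $lines = array();
    foreach ($nodes as $node) {
        list($name, $children) = $node;
        $lines[] = str_repeat('  ', $depth) . $name;
        // Recurse into the children, one level deeper.
        $lines = array_merge($lines, tree_lines($children, $depth + 1));
    }
    return $lines;
}

echo implode("\n", tree_lines($tree)), "\n";
// cat
//   domestic cat
//     kitty
//   wildcat
```

Returning the lines instead of echoing them keeps the traversal separate from the output, so the same function can feed plain text or a template.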

Let me give a recursive iterator example:

class TreeIterator extends ArrayIterator implements RecursiveIterator
{
    private $current;

    public function __construct($node)
    {
        parent::__construct($node);
    }

    public function current()
    {
        $this->current = parent::current();
        return $this->current[0];
    }

    public function hasChildren()
    {
        return !empty($this->current[1]);
    }

    public function getChildren()
    {
        return new self($this->current[1]);
    }
}

This is just an array iterator (as all nodes are an array, as well as all child nodes) and for the current node, it returns the name. If asked for children, it checks if there are some and offers them again as a TreeIterator. That makes using it simple, e.g. outputting as text:

$treeIterator = new RecursiveTreeIterator(
    new TreeIterator($tree));

foreach ($treeIterator as $val) echo $val, "\n";

Output:

\-cat, true cat
  |-domestic cat, house cat, Felis domesticus, Felis catus
  | |-kitty, kitty-cat, puss
  | |-mouser
  | |-alley cat
  | |-tom, tomcat
  | | \-gib
  | |-Angora, Angora cat
  | \-Siamese cat, Siamese
  |   \-blue point Siamese
  \-wildcat
    |-sand cat
    |-European wildcat, catamountain, Felis silvestris
    |-cougar, puma, catamount, mountain lion, painter, panther, Felis concolor
    |-ocelot, panther cat, Felis pardalis
    |-manul, Pallas's cat, Felis manul
    \-lynx, catamount
      |-common lynx, Lynx lynx
      \-Canada lynx, Lynx canadensis

If you're looking for more control over HTML output in conjunction with a recursive iterator, please see the following question, which has an example of <ul><li>-based HTML output:

So what does this look like all together? The complete code is available for review as a gist on GitHub.

够拽才男人
#3 · 2020-02-29 11:12

In contrast to my previous answer, which is rather long and explains all the steps, it's also possible to do the same thing in a more compressed form.

  • The line splitting can be done with strtok.
  • preg_match is then applied to each line directly, so the mapping happens in place.
  • The Levels object can be compressed into an array, assuming that the input is correct.
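
As a small illustration of the first point, this is how strtok walks over the lines of a string. The $input here is a made-up three-line sample, and the explicit comparison against false is my choice so that a line consisting of "0" does not end the loop.

```php
<?php
$input = "first\n  => second\n  => third";   // hypothetical sample input

$lines = array();
// strtok() returns false once no tokens are left.
for ($line = strtok($input, "\n"); $line !== false; $line = strtok("\n")) {
    $lines[] = $line;
}

print_r($lines);
```

Note that strtok skips empty tokens, so blank lines in the input are dropped silently.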

This time the output is done by a recursive function rather than an iterator; it prints a nested <ul> list. Example code (Demo):

// build tree
$tree = $levels = array();
$stack[1] = &$tree;
for ($line = strtok($input, $token = "\n"); $line !== false; $line = strtok($token)) {
    if (!preg_match('/^(?:(\s*)=> )?(.*)$/', $line, $self)) {
        continue;
    }
    array_shift($self);
    $indent = array_shift($self);
    $level = @$levels[$indent] ? : $levels[$indent] = count($levels) + 1;
    $stack[$level][] = &$self;
    $stack[$level + 1] = &$self[];
    unset($self);
}
unset($stack);

// output
tree_print($tree);
function tree_print(array $tree, $in = '') {
    echo "$in<ul>\n";
    $i = $in . '  ';
    foreach ($tree as $n) {
        echo "$i<li>$n[0]";              // open the item and print its name
        if (!empty($n[1])) {             // children: print them as a nested list
            echo "\n";
            tree_print($n[1], "$i  ");
            echo $i;
        }
        echo "</li>\n";
    }

    echo "$in</ul>\n";
}

Edit: The following goes one step further and drops the tree array completely, producing the output directly. This is a bit mad, because it mixes the reordering of the data with the output, which ties things together tightly and makes them hard to change. The previous example already looks cryptic; this one is beyond good and evil (Demo):

echo_list($input);

function echo_list($string) {
    foreach ($m = array_map(function($v) use (&$l) {
        return array(@$l[$i = &$v[1]] ? : $l[$i] = count($l) + 1, $v[2]);
    }, preg_match_all('/^(?:(\s*)=> )?(.*)$/m', $string, $m, PREG_SET_ORDER) ? $m : array()) as $i => $v) {
        $pi = str_repeat("    ", $pl = @$m[$i - 1][0]); # prev
        $ni = str_repeat("    ", $nl = @$m[$i + 1][0]); # next
        (($v[0] - $pl) > 0) && printf("$pi<ul>\n");     # is child of prev
        echo '  ' . str_repeat("    ", $v[0] - 1), "<li>$v[1]"; # output self
        if (!$close = (($nl - $v[0]) * -1)) echo "</li>\n"; # has sibling
        else if ($close < 0) echo "\n";                     # has children
        else for (printf("</li>\n$ni" . str_repeat("    ", $close - 1) . "</ul>\n"); --$close;) # is last child
                echo $ni, $nn = str_repeat("    ", $close - 1), "  </li>\n",
                     $ni, $nn, "</ul>\n";
    }
}

This drops strtok again and goes back to the idea of using preg_match_all. It also stores all parsed lines, so that it is possible to look behind and ahead to determine how many <ul> elements need to be opened or closed around the current element.
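
If the direct-output idea is appealing but the code above is too dense, here is a more readable sketch of a related approach. It is not the function above: instead of looking ahead it only tracks how many lists are currently open, it emits no whitespace, and it assumes the lines have already been mapped to (level, name) pairs with levels starting at 1 and never increasing by more than one. The name list_html is hypothetical.

```php
<?php
// Hypothetical helper: turns (level, name) pairs into nested <ul> markup.
// Assumes levels start at 1 and never increase by more than one per line.
function list_html(array $elements)
{
    $html  = '';
    $depth = 0;                       // how many <ul> are currently open
    foreach ($elements as $pair) {
        list($level, $name) = $pair;
        if ($level > $depth) {
            $html .= '<ul>';          // first child of the previous line
        } else {
            // Close the previous item, plus one list per level we move up.
            $html .= str_repeat('</li></ul>', $depth - $level) . '</li>';
        }
        $html .= '<li>' . htmlspecialchars($name);
        $depth = $level;
    }
    // Close everything that is still open at the end of the input.
    return $html . str_repeat('</li></ul>', $depth);
}

echo list_html(array(
    array(1, 'cat'),
    array(2, 'domestic cat'),
    array(3, 'kitty'),
    array(2, 'wildcat'),
)), "\n";
```

Because each item is closed only when the next line arrives, the function never needs to know in advance whether an item has children.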
