Split string to array of strings with 1-3 words de

Posted 2020-07-11 08:22

I have the following input string:

Lorem ipsum dolor sit amet consectetur adipiscing elit sed doeiusmod tempor incididunt ut Duis aute irure dolor in reprehenderit in esse cillum dolor eu fugia ...

Splitting rules by example

[
     "Lorem ipsum dolor",  // A: Three words <6 letters
     "sit amet",           // B: Two words <6 letters if next word >=6 letters
     "consectetur",        // C: One word >=6 letters if next word >=6 letters
     "adipiscing elit",    // D: Two words: first >=6, second <6 letters
     "sed doeiusmod",      // E: Two words: first <6, second >=6 letters
     "tempor",             // rule C
     "incididunt ut",      // rule D
     "Duis aute irure",    // rule A
     "dolor in",           // rule B
     "reprehenderit in",   // rule D
     "esse cillum",        // rule E
     "dolor eu fugia",     // rule A
     ...
]

As you can see, each string in the array can have a minimum of one and a maximum of three words. I tried to do it as follows, but it doesn't work. How can I do it?

let s="Lorem ipsum dolor sit amet consectetur adipiscing elit sed doeiusmod tempor incididunt ut Duis aute irure dolor in reprehenderit in esse cillum dolor eu fugia";

let a=[""];
s.split(' ').map(w=> {
  let line=a[a.length-1];
  let n= line=="" ? 0 : line.match(/ /g).length // num of words in line
  if(n<3) line+=w+' ';
  n++;
  if(n>=3) a[a.length-1]=line 
}); 

console.log(a);

UPDATE

Boundary conditions: if the last word or words do not match any rule, just add them as the last array element (but two long words can never be in one string).

SUMMARY AND INTERESTING CONCLUSIONS

We got 8 nice answers to this question, and some of them included discussion about self-describing (or self-explanatory) code. Self-describing code is code that a person who has not read the question can easily understand at first glance. Sadly, none of the answers presents such code, so this question is an example showing that self-describing code is probably a myth.

Tags: javascript
8 answers
欢心
#2 · 2020-07-11 08:51

If we define words with length <6 to have size 1 and >=6 to have size 2, we can rewrite the rules to "if the next word would make the total size of the current row >= 4, start next line".

function wordSize(word) {
  if (word.length < 6) 
    return 1;
  return 2;
}
let s = "Lorem ipsum dolor sit amet consectetur adipiscing elit sed doeiusd tempor incididunt ut Duis aute irure dolor in reprehenderit in esse cillum dolor eu fugia";
var result = [];
var words = s.split(" ");
var row = [];
for (var i = 0; i < words.length; ++i) {
  if (row.reduce((s, w) => s + wordSize(w), 0) + wordSize(words[i]) >= 4) {
    result.push(row);
    row = [];
  }
  row.push(words[i]);
}
result.push(row);
result = result.map(a => a.join(" "));
console.log(result);
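
To see why the size threshold reproduces rules A to E from the question, a small check can sum the sizes for each pattern of short and long words. This is only a sketch: `sizeOf` and `firstChunkLength` are hypothetical helper names that are not part of the answer above, and S/L stand for short (<6 letters) and long (>=6 letters) words.

```javascript
// Size of a word under the answer's scheme: 1 if short (S), 2 if long (L).
const sizeOf = kind => (kind === 'S' ? 1 : 2);

// Counts how many leading words of a pattern fit into one row
// before adding the next word would bring the total size to 4 or more.
function firstChunkLength(pattern) {
  let total = 0;
  let count = 0;
  for (const kind of pattern) {
    if (total + sizeOf(kind) >= 4) break;
    total += sizeOf(kind);
    count++;
  }
  return count;
}

console.log(firstChunkLength('SSSS')); // rule A: three short words
console.log(firstChunkLength('SSL'));  // rule B: two short words before a long one
console.log(firstChunkLength('LL'));   // rule C: one long word before a long one
console.log(firstChunkLength('LSS'));  // rule D: long word, then short word
console.log(firstChunkLength('SLL'));  // rule E: short word, then long word
```

Each pattern stops at the same word count as the corresponding rule in the question, which is why the single size comparison can replace the five separate rules.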

成全新的幸福
#3 · 2020-07-11 08:55

This sounds like a problem you would get during a job interview or on a test. The right way to approach this problem is to think about how to simplify the problem into something that we can understand and write legible code for.

We know that there are two conditions: fewer than six letters or not. We can represent each word in the string as a binary digit: 0 (fewer than 6 letters) or 1 (6 or more letters).

Turning the string of words into a string of binary digits will make it easier to process and understand:

const s = "Lorem ipsum dolor sit amet consectetur adipiscing elit sed doeiusmod tempor incididunt ut Duis aute irure dolor in reprehenderit in esse cillum dolor eu fugia";

const b = s.split(' ').reduce((array, word) => {
  return array + (word.length >= 6 ? "1" : "0");
}, "");

console.log(b);

Next we need to simplify the rules. Each rule can be thought of as a string of binary digits (a set of words). Since some rules depend on the word that follows, we treat that next word as part of the string:

  1. [0,0,0] -> 000
  2. [0,0,1] -> 001
  3. [1,1] -> 11
  4. [1,0] -> 10
  5. [0,1] -> 01

For the remaining string of digits, whichever rule fits at the beginning determines the next group of words. This is a pretty simple logical operation:

const s = "Lorem ipsum dolor sit amet consectetur adipiscing elit sed doeiusmod tempor incididunt ut Duis aute irure dolor in reprehenderit in esse cillum dolor eu fugia";

let b = s.split(' ').reduce((array, word) => {
  return array + (word.length >= 6 ? "1" : "0");
}, "");

//console.log(b);
let a = '';
while (b != "") {
  switch (0) {
    case b.indexOf('000'):
      b = b.substring(3);
      a += '3';
      break;
    case b.indexOf('10'):
      b = b.substring(2);
      a += '2';
      break;
    case b.indexOf('01'):
      b = b.substring(2);
      a += '2';
      break;
    case b.indexOf('001'):
      b = b.substring(2);
      a += '2';
      break;
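    case b.indexOf('11'):
      b = b.substring(1);
      a += '1';
      break;
    default:
      // No rule matches the remaining tail ("0", "1" or "00"):
      // take what is left as the last group to avoid an infinite loop.
      a += b.length;
      b = '';
  }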
}

console.log(a);
//Go through the string of multi-word lengths and turn the old string into separate strings. 

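const acc = [];
const words = s.split(' '); // declared explicitly instead of leaking a global
for (const count of a) {
  // each character of a is the number of words in the next group
  acc.push(words.splice(0, Number(count)).join(' '));
}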
console.log(acc);

YAY! We successfully converted a complex problem into something easy to understand. While this is not the shortest solution, it is very elegant, and there is still room for improvement without sacrificing readability (compared to some other solutions).

This way of conceptualizing the problem opens doors for more rules or even more complex states (0, 1, 2).
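
One way to make the rule set easier to extend is to keep the binary prefixes in a table instead of a switch. The following is a minimal sketch under that assumption, not the answer's own code: `PREFIX_RULES` and `splitByPrefixRules` are hypothetical names, and the fallback for an unmatched tail follows the boundary condition stated in the question.

```javascript
// Hypothetical rule table: [binary prefix, number of words to take].
// Order matters: the first prefix that matches wins.
const PREFIX_RULES = [
  ['000', 3],
  ['001', 2],
  ['11', 1],
  ['10', 2],
  ['01', 2],
];

function splitByPrefixRules(text, limit = 6) {
  const words = text.split(' ').filter(Boolean);
  // Encode each word: "1" for long (>= limit letters), "0" for short.
  const bits = words.map(w => (w.length >= limit ? '1' : '0')).join('');
  const out = [];
  let pos = 0;
  while (pos < words.length) {
    const rest = bits.slice(pos);
    const rule = PREFIX_RULES.find(([prefix]) => rest.startsWith(prefix));
    // No rule fits only at the very end of the input ("0", "1" or "00"),
    // so the remaining words become the last group.
    const count = rule ? rule[1] : words.length - pos;
    out.push(words.slice(pos, pos + count).join(' '));
    pos += count;
  }
  return out;
}

console.log(splitByPrefixRules(
  'Lorem ipsum dolor sit amet consectetur adipiscing elit sed doeiusmod tempor'
));
```

Adding a rule then means adding a row to the table, and a third word class would only change the encoding step.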

爷的心禁止访问
#4 · 2020-07-11 09:01

One option is to first create an array of rules, like:

const rules = [
  // [# of words to splice if all conditions met, condition for word1, condition for word2, condition for word3...]
  [3, 'less', 'less', 'less'],
  // the above means: splice 3 words if the next 3 words' lengths are <6, <6, <6
  [2, 'less', 'less', 'eqmore'],
  // the above means: splice 2 words if the next 3 words' lengths are <6, <6, >=6
  [1, 'eqmore', 'eqmore'],
  [2, 'eqmore', 'less'],
  [2, 'less', 'eqmore']
];

Then iterate through the array of rules, finding the rule that matches, extracting the appropriate number of words to splice from the matching rule, and push to the output array:

const rules = [
  [3, 'less', 'less', 'less'],
  [2, 'less', 'less', 'eqmore'],
  [1, 'eqmore', 'eqmore'],
  [2, 'eqmore', 'less'],
  [2, 'less', 'eqmore']
];
const s = "Lorem ipsum dolor sit amet consectetur adipiscing elit sed doeiusmod tempor incididunt ut Duis aute irure dolor in reprehenderit in esse cillum dolor eu fugia";

const words = s.split(' ');
const output = [];
const verify = (cond, word) => cond === 'less' ? word.length < 6 : word.length >= 6;
while (words.length) {
  const [wordCount] = rules.find(
    ([wordCount, ...conds]) => conds.every((cond, i) => verify(cond, words[i]))
  );
  output.push(words.splice(0, wordCount).join(' '));
}
console.log(output);

Of course, the .find assumes that every input string will always have a matching rule for each position spliced.
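
To illustrate that assumption, here is a small sketch of what happens when the input runs out of words before a rule can be fully checked. The `splitWords` wrapper and the `failure` variable are hypothetical additions for the demonstration; the rule table and `verify` are copied from the snippet above.

```javascript
// Same rule table and check as in the snippet above.
const rules = [
  [3, 'less', 'less', 'less'],
  [2, 'less', 'less', 'eqmore'],
  [1, 'eqmore', 'eqmore'],
  [2, 'eqmore', 'less'],
  [2, 'less', 'eqmore']
];
const verify = (cond, word) => cond === 'less' ? word.length < 6 : word.length >= 6;

// Hypothetical wrapper around the loop so it can be called with any input.
function splitWords(text) {
  const words = text.split(' ');
  const output = [];
  while (words.length) {
    const [wordCount] = rules.find(
      ([wordCount, ...conds]) => conds.every((cond, i) => verify(cond, words[i]))
    );
    output.push(words.splice(0, wordCount).join(' '));
  }
  return output;
}

let failure = null;
try {
  splitWords('sit amet'); // only two short words: there is no third word to test
} catch (err) {
  failure = err; // raised when verify reads .length of a missing word
}
console.log(failure instanceof TypeError);
```

With only two short words, the first rule asks for a third word that does not exist, so the loop throws instead of returning a result.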

For the additional rule that any words not matched by the previous rules just be added to the output, put [1] into the bottom of the rules array:

const rules = [
  [3, 'less', 'less', 'less'],
  [2, 'less', 'less', 'eqmore'],
  [1, 'eqmore', 'eqmore'],
  [2, 'eqmore', 'less'],
  [2, 'less', 'eqmore'],
  [1]
];
const s = "Lorem ipsum dolor sit amet consectetur adipiscing elit sed doeiusmod tempor incididunt ut Duis aute irure dolor in reprehenderit in esse cillum dolor eu fugia";

const words = s.split(' ');
const output = [];
const verify = (cond, word) => cond === 'less' ? word.length < 6 : word.length >= 6;
while (words.length) {
  const [wordCount] = rules.find(
    ([wordCount, ...conds]) => conds.every((cond, i) => words[i] && verify(cond, words[i]))
  );
  output.push(words.splice(0, wordCount).join(' '));
}
console.log(output);

一纸荒年 Trace。
#5 · 2020-07-11 09:01

No tricks needed. This code traverses the array of words and checks the rules for each sequence of three. The rules are applied with as few loops and intermediate objects as possible, which should give good performance and memory usage.

function apply_rules(stack, stack_i) {

    let small_word_cnt = 0;

    for(let i = 0; i<= 2; i++){

        //Not enough elements to trigger a rule
        if(!stack[stack_i+i]){
            return stack.slice(stack_i, stack.length);
        }

        //Increment the small word counter
        small_word_cnt += stack[stack_i+i].length < 6;

        //2 big words
        if(i== 1 && small_word_cnt == 0){
            return [stack[stack_i]];
        }

        //3 small words
        if(small_word_cnt == 3){
            return stack.slice(stack_i,stack_i+3);
        }
    }

    //mixed small and big words;
    return stack.slice(stack_i,stack_i+2);
}

function split_text(text) {
    const words = text.split(' '), results = [];
    let i = 0;

    while(i < words.length) {
        const chunk = apply_rules(words, i);
        i+= chunk.length;
        results.push(chunk.join(' '));
    }

    return results;
}

console.log(split_text("Lorem ipsum dolor sit amet consectetur adipiscing elit sed doeiusmod tempor incididunt ut Duis aute irure dolor in reprehenderit in esse cillum dolor eu fugia"));

Bombasti
#6 · 2020-07-11 09:04

This is a shorter and faster version of the idea proposed in BoltKey's answer (faster in terms of time complexity: it does not recompute the sum with reduce in each loop iteration). If you want to upvote, please do so on his answer.

Main idea

  • ws is the word size, which has only two values: 1 (short word) and 2 (long word)
  • s is the current line size in the loop (we iterate over each word)
  • if the current line size plus the next word size exceeds 3 (s+ws>3), the rules are broken (this is the MAIN IDEA discovered by BoltKey)
  • if the rules are NOT broken, add the word to line l and its size to line size s
  • if the rules are broken, first add line l to the output array r and reset l and s, then add the word to the new line
  • in the return statement we handle the termination case: add the last words from line l to the result if l is not empty

let s = "Lorem ipsum dolor sit amet consectetur adipiscing elit sed doeiusd tempor incididunt ut Duis aute irure dolor in reprehenderit in esse cillum dolor eu fugia";

function split(n,str) {
  let words= str.split(' '), s=0, l=[], r=[];
  
  words.forEach(w=>{ 
    let ws= w.length<n ? 1:2;
    if(s+ws>3) r.push(l.join(' ')), s=0, l=[];
    l.push(w), s+=ws;
  })

  return l.length ? r.concat(l.join(' ')) : r;
}

console.log( split(6,s) );

成全新的幸福
#7 · 2020-07-11 09:05

You can express your rules as abbreviated regular expressions, build a real regex from them and apply it to your input:

const text = "Lorem ipsum, dolor. sit amet? consectetur,   adipiscing,  elit! sed doeiusmod tempor incididunt ut Duis aute irure dolor in reprehenderit in esse cillum dolor eu fugia bla?";

const rules = ['(SSS)', '(SS(?=L))', '(L(?=L))', '(SL)', '(LS)', '(.+)']

const regex = new RegExp(
    rules
        .join('|')
        .replace(/S/g, '\\w{1,5}\\W+')
        .replace(/L/g, '\\w{6,}\\W+')
    , 'g')

console.log(text.match(regex))

If the rules don't change, the regex construction part is only needed once.

Note that this also handles punctuation in a reasonable way.
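
As a rough sketch of building the regex only once, the construction can be wrapped in a factory that returns a reusable function. `makeSplitter` and `splitter` are hypothetical names, and the `trim()` call is an added assumption to drop the trailing separators that the raw matches keep; the sample input ends with punctuation, as the answer's own input does.

```javascript
// Hypothetical factory: builds the regex once and returns a reusable splitter.
function makeSplitter(rules) {
  const regex = new RegExp(
    rules
      .join('|')
      .replace(/S/g, '\\w{1,5}\\W+')
      .replace(/L/g, '\\w{6,}\\W+'),
    'g'
  );
  // String.prototype.match with a global regex returns all matches, or null.
  return text => (text.match(regex) || []).map(chunk => chunk.trim());
}

const splitter = makeSplitter(['(SSS)', '(SS(?=L))', '(L(?=L))', '(SL)', '(LS)', '(.+)']);

// The same compiled regex is reused for every call.
console.log(splitter('aa bb cc longword another end.'));
console.log(splitter('aa bb cc longword another end.'));
```

Because `match` with a global regex starts from the beginning of the string each time, the returned function can be called repeatedly without resetting any state.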
