Split string to array of strings with 1-3 words de

Posted 2020-07-11 08:22

I have following input string

Lorem ipsum dolor sit amet consectetur adipiscing elit sed doeiusmod tempor incididunt ut Duis aute irure dolor in reprehenderit in esse cillum dolor eu fugia ...

Splitting rules by example

[
     "Lorem ipsum dolor",  // A: Three words <6 letters
     "sit amet",           // B: Two words <6 letters if next word >=6 letters
     "consectetur",        // C: One word >=6 letters if next word >=6 letters
     "adipiscing elit",    // D: Two words: first >=6, second <6 letters
     "sed doeiusmod",      // E: Two words: first <6, second >=6 letters
     "tempor",             // rule C
     "incididunt ut",      // rule D
     "Duis aute irure",    // rule A
     "dolor in",           // rule B
     "reprehenderit in",   // rule D
     "esse cillum",        // rule E
     "dolor eu fugia",     // rule A
     ...
]

So, as you can see, each string in the array can have a minimum of one and a maximum of three words. I tried to do it as follows, but it doesn't work. How can I do it?

let s="Lorem ipsum dolor sit amet consectetur adipiscing elit sed doeiusmod tempor incididunt ut Duis aute irure dolor in reprehenderit in esse cillum dolor eu fugia";

let a=[""];
s.split(' ').map(w=> {
  let line=a[a.length-1];
  let n= line=="" ? 0 : line.match(/ /g).length // num of words in line
  if(n<3) line+=w+' ';
  n++;
  if(n>=3) a[a.length-1]=line 
}); 

console.log(a);

UPDATE

Boundary conditions: if the last word or words do not match any rule, just add them as the last array element (but two long words can never be in one string).

SUMMARY AND INTERESTING CONCLUSIONS

We got 8 nice answers to this question, and some of them included a discussion about self-describing (or self-explanatory) code. Self-describing code is code that lets a person who has not read the question easily say exactly what it does at first glance. Sadly, none of the answers presents such code, so this question is an example showing that self-describing code is probably a myth.

Tags: javascript
8 answers
The star
#2 · 2020-07-11 09:05

(Updated to incorporate suggestion from user633183.)

I found this an interesting problem. I wanted to write a more generic version immediately, and I settled on one that accepted a list of rules, each of which described the number of words that it would gather and a test for each of those words. So with lt6 being essentially (str) => str.length < 6, the first rule (A) would look like this:

[3, lt6, lt6, lt6],

This turns out to be quite similar to the solution from CertainPerformance: that answer uses strings to represent two different behaviors, while this one uses actual functions. The implementation, though, is fairly different.

// A test that looks past the end of xs fails instead of throwing on undefined
const allMatch = (fns, xs) =>
  fns.every ( (fn, i) =>  i < xs.length && fn ( xs[i] ) )

const splitByRules = (rules) => {
  const run = 
    ( xs
    , res = []
    , [count] = rules .find 
        ( ([count, ...fns]) => 
          count <= xs .length 
          && allMatch (fns, xs)
        ) 
        || [1] // if no rules match, choose next word only
    ) => xs.length === 0
      ? res
      : run 
        ( xs .slice (count) 
        , res .concat ([xs .slice (0, count) ])
        )

  return (str) => 
    run (str .split (/\s+/) ) 
      .map (ss => ss .join (' '))
}

const shorterThan = (n) => (s) => 
  s .length < n

const atLeast = (n) => (s) =>
  s .length >= n

const lt6 = shorterThan (6)
const gte6 = atLeast (6)

const rules = [
// +------------- Number of words to select in next block 
// |        +--------- Functions to test against each word
// |   _____|_____
// V  /           \
  [3, lt6, lt6, lt6],   // A
  [2, lt6, lt6, gte6],  // B
  [1, gte6, gte6],      // C
  [2, gte6, lt6],       // D
  [2, lt6, gte6],       // E
]

const words  = 'Lorem ipsum dolor sit amet consectetur adipiscing elit sed doeiusmod tempor incididunt ut Duis aute irure dolor in reprehenderit in esse cillum dolor eu fugia ...';

console .log (
  splitByRules (rules) (words) 
)

This uses a recursive function that bottoms out when the remaining list of words is empty. Otherwise it searches for the first rule that matches (with, again like CertainPerformance, a default rule that simply takes the next word), selects the corresponding number of words, and recurs on the remaining words.

For simplicity, the recursive function accepts an array of words and returns an array of arrays of words. A wrapper function handles converting these to and from strings.

The only other function of substance in here is the helper function allMatch. It is essentially ([f1, f2, ... fn], [x1, x2, ..., xn, ...]) => f1(x1) && f2(x2) && ... && fn(xn).
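The same rule search can also be written as a loop. The following is a minimal sketch of that alternative, not this answer's code: `splitByRulesLoop` is a hypothetical name, and the bounds check inside `allMatch` is an added assumption so that a rule looking past the end of the input simply fails to match. A loop may be worth considering for very long inputs, since the recursive version uses one call frame per output line.

```javascript
// Sketch: iterative variant of the rule search (hypothetical name: splitByRulesLoop).
const lt6 = (s) => s.length < 6
const gte6 = (s) => s.length >= 6

// Assumption: a test that looks past the end of the words fails instead of throwing.
const allMatch = (fns, xs) =>
  fns.every((fn, i) => i < xs.length && fn(xs[i]))

const splitByRulesLoop = (rules) => (str) => {
  const words = str.split(/\s+/).filter(Boolean)
  const res = []
  let i = 0
  while (i < words.length) {
    const rest = words.slice(i)
    // First matching rule wins; fall back to taking a single word.
    const [count] = rules.find(
      ([count, ...fns]) => count <= rest.length && allMatch(fns, rest)
    ) || [1]
    res.push(rest.slice(0, count).join(' '))
    i += count
  }
  return res
}

const rules = [
  [3, lt6, lt6, lt6],  // A
  [2, lt6, lt6, gte6], // B
  [1, gte6, gte6],     // C
  [2, gte6, lt6],      // D
  [2, lt6, gte6],      // E
]

const lines = splitByRulesLoop(rules)(
  'Lorem ipsum dolor sit amet consectetur adipiscing elit'
)
console.log(lines)
// [ 'Lorem ipsum dolor', 'sit amet', 'consectetur', 'adipiscing elit' ]
```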

Of course the currying means that splitByRules (myRules) returns a function you can store and run against different strings.

The order of the rules might be important. If two rules could overlap, you need to put the preferred match ahead of the other.
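To illustrate the point about order, here is a small sketch. The two-word rule `[2, lt6, lt6]` is hypothetical (it is not among the question's rules) and overlaps with rule A, so whichever comes first in the list decides the split. `firstMatchCount` is likewise an illustrative helper, not part of the code above.

```javascript
const lt6 = (s) => s.length < 6

// Returns the word count of the first rule whose tests all pass (1 if none match).
const firstMatchCount = (rules, words) => {
  const [count] = rules.find(
    ([count, ...fns]) =>
      count <= words.length &&
      fns.every((fn, i) => i < words.length && fn(words[i]))
  ) || [1]
  return count
}

const ruleA = [3, lt6, lt6, lt6]
const twoShort = [2, lt6, lt6] // hypothetical overlapping rule

const words = ['Lorem', 'ipsum', 'dolor']

console.log(firstMatchCount([ruleA, twoShort], words)) // 3: rule A is tried first
console.log(firstMatchCount([twoShort, ruleA], words)) // 2: the two-word rule shadows rule A
```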


This added generality may or may not be of interest to you, but I think this technique has a significant advantage: it's much easier to modify if the rules ever change. Say you now also want to include four words, if they all are fewer than five characters long. Then we would just write const lt5 = shorterThan(5) and include the rule

[4, lt5, lt5, lt5, lt5]

at the beginning of the list.

To me that's a big win.
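As a sketch of that change, the snippet below builds two splitters from the same function, one with the original rules and one with the four-word rule prepended. The sample sentence is made up for illustration, and `splitByRules` is restated compactly (with a bounds check added in `allMatch`, which is an assumption of this sketch) so that the snippet runs on its own.

```javascript
const shorterThan = (n) => (s) => s.length < n
const atLeast = (n) => (s) => s.length >= n
const lt5 = shorterThan(5), lt6 = shorterThan(6), gte6 = atLeast(6)

// Assumption: tests that look past the end of the input fail rather than throw.
const allMatch = (fns, xs) =>
  fns.every((fn, i) => i < xs.length && fn(xs[i]))

const splitByRules = (rules) => {
  const run = (xs, res = []) => {
    if (xs.length === 0) return res
    const [count] = rules.find(
      ([count, ...fns]) => count <= xs.length && allMatch(fns, xs)
    ) || [1]
    return run(xs.slice(count), res.concat([xs.slice(0, count)]))
  }
  return (str) => run(str.split(/\s+/)).map((ss) => ss.join(' '))
}

const rules = [
  [3, lt6, lt6, lt6],  // A
  [2, lt6, lt6, gte6], // B
  [1, gte6, gte6],     // C
  [2, gte6, lt6],      // D
  [2, lt6, gte6],      // E
]
// The new rule goes first so it is preferred over rule A.
const extendedRules = [[4, lt5, lt5, lt5, lt5], ...rules]

const split = splitByRules(rules)
const splitExtended = splitByRules(extendedRules)

const sample = 'sit amet sed ut consectetur' // made-up input
console.log(split(sample))         // [ 'sit amet sed', 'ut consectetur' ]
console.log(splitExtended(sample)) // [ 'sit amet sed ut', 'consectetur' ]
```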

来,给爷笑一个
#3 · 2020-07-11 09:09

I also found this problem very interesting. This is a long-format answer which shows the process of how I arrived at the final program. There are several code blocks labeled sketch along the way. I hope this approach is helpful to beginners in functional style.

Using the data.maybe module, I started out with -

// sketch 1
const wordsToLines = (words = [], r = []) =>
  words.length === 0
    ? Just (r)
    : ruleA (words)
        .orElse (_ => ruleB (words))
        .orElse (_ => ruleC (words))
        .orElse (_ => ruleD (words))
        .orElse (_ => ruleE (words))
        .orElse (_ => defaultRule (words))
        .chain (({ line, next }) => 
          wordsToLines (next, [...r, line ])
        )

Then I started writing some of the rules ...

// sketch 2
const success = (line, next) =>
  Just ({ line, next })

const defaultRule = ([ line, ...next ]) =>
  success (line, next)

const ruleA = ([ a, b, c, ...more ]) =>
  small (a) && small (b) && small(c)
    ? success (line (a, b, c), more)
    : Nothing ()

const ruleB = ([ a, b, c, ...more ]) =>
  small (a) && small (b) && large (c)
    ? success (line (a, b), [c, ...more])
    : Nothing ()

// ...

Way too messy and repetitive, I thought. As the author of these functions, it's my job to make them work for me! So I restarted, this time designing the rules to do the hard work -

// sketch 3
const rule = (guards = [], take = 0) =>
  // TODO: implement me...

const ruleA =
  rule
    ( [ small, small, small ] // pattern to match
    , 3                       // words to consume
    )

const ruleB =
  rule ([ small, small, large ], 2)

// ruleC, ruleD, ruleE, ...

const defaultRule =
  rule ([ always (true) ], 1)

These rules are much simpler. Next, I wanted to clean up wordsToLines a bit -

// sketch 4
const wordsToLines = (words = [], r = []) =>
  words.length === 0
    ? Just (r)
    : oneOf (ruleA, ruleB, ruleC, ruleD, ruleE, defaultRule)
        (words)
        .chain (({ line, next }) => 
          wordsToLines (next, [...r, line ])
        )

In our initial sketch, the rules constructed a {line, next} object, but a higher-order rule means we can hide even more complexity away. And the oneOf helper makes it easy to move our rules inline -

// final revision
const wordsToLines = (words = [], r = []) =>
  words.length === 0
    ? Just (r)
    : oneOf
        ( rule ([ small, small, small ], 3) // A
        , rule ([ small, small, large ], 2) // B
        , rule ([ large, large ], 1)        // C
        , rule ([ large, small ], 2)        // D
        , rule ([ small, large ], 2)        // E
        , rule ([ always (true) ], 1) // default
        )
        ([ words, r ])
        .chain (apply (wordsToLines))

Finally, we can write our main function, formatSentence -

const formatSentence = (sentence = "") =>
  wordsToLines (sentence .split (" "))
    .getOrElse ([])

The wires are mostly untangled now. We just have to supply the remaining dependencies -

const { Just, Nothing } =
  require ("data.maybe")

const [ small, large ] =
  dual ((word = "") => word.length < 6)

const oneOf = (init, ...more) => x =>
  more.reduce((r, f) => r .orElse (_ => f(x)), init (x))

const rule = (guards = [], take = 0) =>
  ([ words = [], r = [] ]) =>
    guards .every ((g, i) => g (words[i]))
      ? Just
          ( [ words .slice (take)
            , [ ...r, words .slice (0, take) .join (" ") ]
            ]
          )
      : Nothing ()

And some functional primitives -

const identity = x =>
  x

const always = x =>
  _ => x

const apply = (f = identity) =>
  (args = []) => f (...args)

const dual = f =>
  [ x => Boolean (f (x))
  , x => ! Boolean (f (x))
  ]
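For readers new to these helpers, here is a quick sketch of what `dual` produces: a predicate and its negation, which is how `small` and `large` stay in sync. The sample words are only for illustration.

```javascript
// dual turns one predicate into a [predicate, negated predicate] pair.
const dual = f =>
  [ x => Boolean (f (x))
  , x => ! Boolean (f (x))
  ]

const [ small, large ] =
  dual ((word = "") => word.length < 6)

console.log (small ("amet"), large ("amet"))               // true false
console.log (small ("consectetur"), large ("consectetur")) // false true

// The default parameter means a missing word counts as small:
console.log (small (undefined), large (undefined))         // true false
```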

Let's run the program -

formatSentence ("Lorem ipsum dolor sit amet consectetur adipiscing elit sed doeiusmod tempor incididunt ut Duis aute irure dolor in reprehenderit in esse cillum dolor eu fugia ...")

// [ 'Lorem ipsum dolor'
// , 'sit amet'
// , 'consectetur'
// , 'adipiscing elit'
// , 'sed doeiusmod'
// , 'tempor'
// , 'incididunt ut'
// , 'Duis aute irure'
// , 'dolor in'
// , 'reprehenderit in'
// , 'esse cillum'
// , 'dolor eu fugia'
// , '...'
// ]
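If the data.maybe package is not at hand, the program can still be tried with a small stand-in. The `Just` and `Nothing` below are a hypothetical minimal substitute that implements only the three methods this answer uses (`orElse`, `chain`, `getOrElse`); they are not the library's implementation. The rest is the program from above, condensed so the snippet runs on its own.

```javascript
// Hypothetical stand-in for data.maybe: only orElse, chain and getOrElse.
const Just = value => ({
  orElse: _ => Just (value),       // keep the existing value
  chain: f => f (value),           // continue with the wrapped value
  getOrElse: _ => value,
})

const Nothing = () => ({
  orElse: f => f (),               // try the alternative
  chain: _ => Nothing (),          // nothing to continue with
  getOrElse: fallback => fallback,
})

const always = x => _ => x
const apply = f => (args = []) => f (...args)
const dual = f => [ x => Boolean (f (x)), x => ! Boolean (f (x)) ]
const [ small, large ] = dual ((word = "") => word.length < 6)

const oneOf = (init, ...more) => x =>
  more.reduce ((r, f) => r .orElse (_ => f (x)), init (x))

const rule = (guards = [], take = 0) =>
  ([ words = [], r = [] ]) =>
    guards .every ((g, i) => g (words[i]))
      ? Just ([ words .slice (take), [ ...r, words .slice (0, take) .join (" ") ] ])
      : Nothing ()

const wordsToLines = (words = [], r = []) =>
  words.length === 0
    ? Just (r)
    : oneOf
        ( rule ([ small, small, small ], 3) // A
        , rule ([ small, small, large ], 2) // B
        , rule ([ large, large ], 1)        // C
        , rule ([ large, small ], 2)        // D
        , rule ([ small, large ], 2)        // E
        , rule ([ always (true) ], 1)       // default
        )
        ([ words, r ])
        .chain (apply (wordsToLines))

const formatSentence = (sentence = "") =>
  wordsToLines (sentence .split (" "))
    .getOrElse ([])

console.log (formatSentence ("Lorem ipsum dolor sit amet consectetur adipiscing elit"))
// [ 'Lorem ipsum dolor', 'sit amet', 'consectetur', 'adipiscing elit' ]
```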

View the complete program on repl.it and run it to see the results -
