String truncate on length, but no chopping up of w

2020-02-09 07:05发布

问题:

I want to limit a string field length in MYSQL on a certain length, but I don't want any chopping up of words to occur.

When I do:

SELECT SUBSTRING('Business Analist met focus op wet- en regelgeving', 1, 28)

I get this as output:

Business Analist met focus o

But I would like

Business Analist met focus

How can I enforce a limit of 28 chars, but prevent chopping up words? Off course it's easy in [insert programming language of choice here] ;-), but I want to know if it's possible in MYSQL in a simple statement.

回答1:

Let @str be your string and @len the initial position to cut at. Then the necessary steps could be:

  1. Take the leftmost @len characters of @str.

  2. Reverse the substring.

  3. Find the position of the first space in the reversed substring.

  4. Subtract 1 from the position. But if no space was found, let the position remain 0.

  5. Subtract the found position from @len and call it cutpos.

  6. Take the first (leftmost) cutpos characters of @str as str1, take all the other characters (starting from cutpos+1) as str2.

SELECT
  LEFT(str, cutpos) AS str1,
  SUBSTRING(str, cutpos + 1) AS str2
FROM (
  SELECT
    @str AS str,
    @len - IFNULL(NULLIF(LOCATE(' ', REVERSE(LEFT(@str, @len))), 0) - 1, 0) AS cutpos
) s


回答2:

What about splitting on spaces :

SELECT SUBSTRING_INDEX('Business Analist met focus op wet- en regelgeving',' ',4)

will return

Business Analist met focus


回答3:

Very interesting problem. Here's how I did it:

//gets initial string - use 29 instead of 28 to see if the 29th  character is a space
SELECT SUBSTRING('Business Analist met focus op wet- en regelgeving', 1, 29) 

//inverts the string, so we can get the first 
SELECT REVERSE( SUBSTRING('Business Analist met focus op wet- en regelgeving', 1, 29))

// find the charindex of the first space (last space in the string not reversed)
SELECT CHARINDEX(' ', REVERSE( SUBSTRING('Business Analist met focus op wet- en regelgeving', 1, 29)))

// get the substring from the first (last) space
SELECT  SUBSTRING(REVERSE( SUBSTRING('Business Analist met focus op wet- en regelgeving', 1, 29)), CHARINDEX(' ', REVERSE( SUBSTRING('Business Analist met focus op wet- en regelgeving', 1, 29))), 29)

// reverse the string again to unfold it.
SELECT REVERSE(SUBSTRING(REVERSE( SUBSTRING('Business Analist met focus op wet- en regelgeving', 1, 29)), CHARINDEX(' ', REVERSE( SUBSTRING('Business Analist met focus op wet- en regelgeving', 1, 29))), 29))


// to try different lengths...
DECLARE  @size  int
select @size = 24
SELECT REVERSE(SUBSTRING(REVERSE( SUBSTRING('Business Analist met focus op wet- en regelgeving', 1, @size)), 
CHARINDEX(' ', REVERSE( SUBSTRING('Business Analist met focus op wet- en regelgeving', 1, @size))), @size))


回答4:

In SQL it would be...

select Substring('Business Analist met focus op wet- en regelgeving', 0 , 28 + 2 - CharIndex(' ',  REVERSE(SUBSTRING('Business Analist met focus op wet- en regelgeving', 0, 28 + 1 )),0))

I dont know if all these functions are available in MYSQL

EDIT: I think for MYSQL substitute "Locate" for "CharIndex"



回答5:

Build on Narnian's answer, here's one that works with two fields (a.product,a.descr) and where "..." is added when the string is truncated. a.descr can be empty as well.

  IF (
CHARACTER_LENGTH(
  IF(
    a.descr = '',
    a.product,
    CONCAT_WS(' - ',a.product,a.descr)
  )
)>35,
IF(
  a.descr = '',
  CONCAT(
    REVERSE(SUBSTRING(REVERSE( SUBSTRING(a.product, 1, 35)), locate(' ', REVERSE( SUBSTRING(a.product, 1, 35))), 35)),
    '...'
  ),
  CONCAT(
    REVERSE(SUBSTRING(REVERSE( SUBSTRING(CONCAT_WS(' - ',a.product,a.descr), 1, 35)), locate(' ', REVERSE( SUBSTRING(CONCAT_WS(' - ',a.product,a.descr), 1, 35))), 35)),
    '...'
  )
),
CONCAT_WS(' - ',a.product,a.descr)
)

I needed something like this so that's why i added it. Might help someone else.



回答6:

@Andriy M. I liked very much your answer :) Anyway I find on my db it works better if you change lines 2 and 3 like this:

SELECT
  IF(LENGTH(str)<=@len,str,LEFT(str, cutpos)) AS str1,
  IF(LENGTH(str)<=@len,'',SUBSTRING(str, cutpos + 1)) AS str2
FROM (
  SELECT
    @str AS str,
    @len - IFNULL(NULLIF(LOCATE(' ', REVERSE(LEFT(@str, @len))), 0) - 1, 0) AS cutpos
  FROM @table
) s

Don't know if it's my fault or what, but the other way it sometimes truncated strings on the first letter when their length was <@len, i.e. "First s" - "tring" instead of "First" "string" when @len=13

I post you a working example:

CREATE TABLE `test` (
  `sometext` varchar(65)
);

INSERT INTO `test` (`sometext`) VALUES
('Firs strin'),
('Alll right'),
('third string'),
('fourth string'),
('a longer example string'),
('Supercalifragilisticexpialidocious');

SELECT
  IF(LENGTH(str)<=12,str,LEFT(str, cutpos)) AS str1,
  IF(LENGTH(str)<=12,'',SUBSTRING(str, cutpos + 1)) AS str2
FROM (
  SELECT
    sometext AS str,
    12 - IFNULL(NULLIF(LOCATE(' ', REVERSE(LEFT(sometext, 12))), 0) - 1, 0) AS cutpos
  FROM test
) s

And here is a not-working example using your original code:

SELECT
  LEFT(str, cutpos) AS str1,
  SUBSTRING(str, cutpos + 1) AS str2
FROM (
  SELECT
    sometext AS str,
    12 - IFNULL(NULLIF(LOCATE(' ', REVERSE(LEFT(sometext,12))), 0) - 1, 0) AS cutpos
  FROM test
) s

I'm not sure if it's a utf8 issue, or I just misinterpreted your code, or what else...



回答7:

Simple function:

DROP FUNCTION IF EXISTS fn_maxlen;
delimiter //
CREATE FUNCTION fn_maxlen(s TEXT, maxlen INT) RETURNS VARCHAR(255)
BEGIN

 RETURN LEFT(s, maxlen - LOCATE(' ', REVERSE(LEFT(s, maxlen))));

END//
delimiter ;

Use:

SELECT fn_maxlen('Business Analist met focus op wet- en regelgeving', 28);


回答8:

Seems like people don't read the mysql manual:

Original: SELECT SUBSTRING('Business Analist met focus op wet- en regelgeving', 1, 28) gives broken words.

Modified: SELECT SUBSTRING_INDEX('Business Analist met focus op wet- en regelgeving', ' ' , 4) gives unbroken words

SUBSTRING_INDEX(string, delimiter, number) will truncate a string by the number of times delimiter is found. Make your delimiter a space and you will get whole words only. so:

SUBSTRING_INDEX( LEFT('Business Analist met focus op wet- en regelgeving',28), ' ' , 4) should do it.