I am building a real estate feed for a portal, and it tells me the maximum length of a string should be 20,000 bytes (20 KB), but I have never run across this before.
How can I measure the byte size of a varchar string, so that I can then use a while loop to trim it down?
You have to figure out whether the string is ASCII-encoded or encoded in a multi-byte format.
In the former case, you can just use strlen().
In the latter case, you need to find the number of bytes per character.
The strlen documentation gives an example of how to do it: http://www.php.net/manual/en/function.strlen.php#72274
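A minimal sketch of the "bytes per character" idea, assuming the string is UTF-8 (and the PHP file itself is saved as UTF-8): the lead byte of each UTF-8 sequence tells you how many bytes that character occupies. `utf8_char_widths` is a hypothetical helper name, and the sketch does not detect malformed input.

```php
<?php
// Hypothetical helper: returns the byte width of each character in a UTF-8 string.
// Assumes valid UTF-8; malformed sequences are not detected.
function utf8_char_widths(string $s): array
{
    $widths = [];
    $i = 0;
    $len = strlen($s); // strlen() counts bytes
    while ($i < $len) {
        $lead = ord($s[$i]);
        if ($lead < 0x80) {
            $w = 1;            // 0xxxxxxx: ASCII
        } elseif (($lead & 0xE0) === 0xC0) {
            $w = 2;            // 110xxxxx: 2-byte sequence
        } elseif (($lead & 0xF0) === 0xE0) {
            $w = 3;            // 1110xxxx: 3-byte sequence
        } else {
            $w = 4;            // 11110xxx: 4-byte sequence
        }
        $widths[] = $w;
        $i += $w;              // jump to the next character's lead byte
    }
    return $widths;
}

$widths = utf8_char_widths('boršč');
echo count($widths), " characters, ", array_sum($widths), " bytes\n"; // 5 characters, 7 bytes
```

Summing the widths gives the same number as strlen(); the per-character list is only useful if you need to know where each character starts.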
You can use mb_strlen() with an encoding in which every character is a single byte to get the byte length, without worrying about whether the string is multi-byte or single-byte.
For example, as drake127 says in a comment on the mb_strlen documentation, you can use the '8bit' encoding:
<?php
$string = 'Cién cañones por banda';
echo mb_strlen($string, '8bit'); // 24 if the file is saved as UTF-8 (22 characters; é and ñ take 2 bytes each)
?>
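You can run into problems with strlen() because PHP has an option, mbstring.func_overload, that makes strlen() actually call mb_strlen(). See more about it at http://php.net/manual/en/mbstring.overload.php (the option was deprecated in PHP 7.2 and removed in PHP 8.0, so on PHP 8 strlen() always counts bytes).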
To trim the string to a byte length without splitting a multi-byte character in the middle, you can use:
mb_strcut(string $str, int $start [, int $length [, string $encoding ]] )
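If the mbstring extension is not available, the same idea can be sketched with core string functions, for UTF-8 input only: cut at the byte limit, then step back while the first dropped byte is a UTF-8 continuation byte (10xxxxxx), so the cut never lands inside a character. `truncate_utf8_bytes` is a hypothetical name; with mbstring installed, `mb_strcut($str, 0, $maxBytes, 'UTF-8')` should give the same result for valid UTF-8.

```php
<?php
// Hypothetical helper: trims a UTF-8 string to at most $maxBytes bytes
// without splitting a multi-byte character. Assumes valid UTF-8 input.
function truncate_utf8_bytes(string $s, int $maxBytes): string
{
    if ($maxBytes <= 0) {
        return '';
    }
    if (strlen($s) <= $maxBytes) {
        return $s; // already short enough
    }
    $cut = $maxBytes;
    // $s[$cut] is the first byte that would be dropped. If it is a
    // continuation byte (10xxxxxx), its character started earlier,
    // so move the cut back to that character's lead byte.
    while ($cut > 0 && (ord($s[$cut]) & 0xC0) === 0x80) {
        $cut--;
    }
    return substr($s, 0, $cut);
}

$feedText = str_repeat('ñ', 15000);          // hypothetical input: 30,000 bytes of UTF-8
$trimmed = truncate_utf8_bytes($feedText, 20000);
echo strlen($trimmed), "\n";                 // 20000
```

With an odd limit such as 19999, the cut would fall between the two bytes of an "ñ", so the helper backs off to 19998 bytes instead.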
Do you mean byte size or string length?
Byte size is measured with strlen(), whereas string length (in characters) is queried using mb_strlen(). You can use substr() to trim a string to X bytes (note that this will break the string if it has a multi-byte encoding, as pointed out by Darhazer in the comments) and mb_substr() to trim it to X characters in the encoding of the string.
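A quick check of the point about substr() breaking multi-byte strings, assuming the source file is saved as UTF-8. It uses `preg_match('//u', ...)` as a UTF-8 validity test, since a pattern with the /u modifier fails on malformed UTF-8; the mb_substr() call only runs if mbstring is installed.

```php
<?php
$word = 'boršč';              // 5 characters, 7 bytes in UTF-8

$bytes = substr($word, 0, 4); // "bor" plus only the first byte of "š"
// preg_match() with the /u modifier returns false for invalid UTF-8
var_dump(preg_match('//u', $bytes));              // bool(false)
var_dump(preg_match('//u', substr($word, 0, 5))); // int(1): "borš" is intact

if (function_exists('mb_substr')) {
    // Character-based trimming always keeps whole characters
    echo mb_substr($word, 0, 4, 'UTF-8'), "\n";   // borš
}
```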
PHP's strlen() function returns the number of bytes, not the number of characters:
strlen('borsc'); // 5 (bytes)
strlen('boršč'); // 7 (bytes in UTF-8: š and č take 2 bytes each)
$limit_in_bytes = 20000; // the portal's limit: 20,000 bytes
$pointer = 0;
// while more than one full chunk remains, take the next $limit_in_bytes bytes
while (strlen($your_string) > (($pointer + 1) * $limit_in_bytes)) {
    $str_to_handle = substr($your_string, ($pointer * $limit_in_bytes), $limit_in_bytes);
    // here you can handle parts (0 - n) of the string
    $pointer++;
}
$str_to_handle = substr($your_string, ($pointer * $limit_in_bytes), $limit_in_bytes);
// here you can handle the last part of the string
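PHP also has str_split(), which splits a string into chunks of a fixed number of bytes, so the loop above can be sketched more compactly. `$your_string` is filled with placeholder data here. Like the substr() loop, this counts bytes, so a chunk boundary can still fall inside a multi-byte character. Note that str_split('') returns `['']` before PHP 8.2 and an empty array from PHP 8.2 on.

```php
<?php
$limit_in_bytes = 20000;
$your_string = str_repeat('a', 45000); // placeholder: 45,000 bytes of feed text

// str_split() chunks by bytes, just like the substr() loop
$parts = str_split($your_string, $limit_in_bytes);

foreach ($parts as $i => $part) {
    echo "part $i: ", strlen($part), " bytes\n";
}
// part 0: 20000 bytes
// part 1: 20000 bytes
// part 2: 5000 bytes
```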
Or you can use a function like this:
function parseStrToArr($string, $limit_in_bytes) {
    $ret = array();
    $pointer = 0;
    // collect every full chunk except the last one
    while (strlen($string) > (($pointer + 1) * $limit_in_bytes)) {
        $ret[] = substr($string, ($pointer * $limit_in_bytes), $limit_in_bytes);
        $pointer++;
    }
    // the last (possibly shorter) chunk
    $ret[] = substr($string, ($pointer * $limit_in_bytes), $limit_in_bytes);
    return $ret;
}
$arr = parseStrToArr($your_string, 20000); // 20,000-byte chunks
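Both versions above split on raw byte offsets, so a multi-byte UTF-8 character can end up cut across two parts. A minimal UTF-8-aware variant, assuming valid UTF-8 input, moves each split point back to the start of a character. `parseUtf8StrToArr` is a hypothetical name, not a PHP built-in.

```php
<?php
// Hypothetical helper: splits a UTF-8 string into parts of at most
// $limit_in_bytes bytes without cutting a multi-byte character in half.
// Assumes valid UTF-8 input.
function parseUtf8StrToArr(string $string, int $limit_in_bytes): array
{
    if ($limit_in_bytes < 4) {
        // a UTF-8 character can take up to 4 bytes
        throw new InvalidArgumentException('limit must be at least 4 bytes');
    }
    $ret = [];
    $offset = 0;
    $len = strlen($string);
    while ($len - $offset > $limit_in_bytes) {
        $end = $offset + $limit_in_bytes;
        // step back while the first byte of the next part is a continuation byte
        while ($end > $offset && (ord($string[$end]) & 0xC0) === 0x80) {
            $end--;
        }
        $ret[] = substr($string, $offset, $end - $offset);
        $offset = $end;
    }
    $ret[] = substr($string, $offset); // last (or only) part
    return $ret;
}

$parts = parseUtf8StrToArr(str_repeat('ñ', 15000), 19999); // 30,000 bytes in
echo implode(', ', array_map('strlen', $parts)), "\n";     // 19998, 10002
```

The first part comes out one byte short of the limit because byte 19999 is the second half of an "ñ".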
Further to PhoneixS's answer on getting the correct length of a string in bytes: since mb_strlen() is slower than strlen(), for the best performance you can check the "mbstring.func_overload" ini setting so that mb_strlen() is used only when it is really required:
$content_length = ini_get('mbstring.func_overload') ? mb_strlen($content, '8bit') : strlen($content);
On PHP 8.0 and later the setting no longer exists, so ini_get() returns false and this line always falls back to strlen().