I have an if-else statement that checks a string to see whether it contains an ISBN-10 or an ISBN-13 (book ID).
The problem is that the ISBN-10 check runs before the ISBN-13 check. The ISBN-10 pattern matches any run of 10 or more digits, so it can mistake an ISBN-13 for an ISBN-10.
Here is the code:
$str = "ISBN:9780113411436";
if(preg_match("/\d{9}(?:\d|X)/", $str, $matches)){
echo "ISBN-10 FOUND\n";
//isbn returned will be 9780113411
return 0;
}
else if(preg_match("/\d{12}(?:\d|X)/", $str, $matches)){
echo "ISBN-13 FOUND\n";
//isbn returned will be 9780113411436
return 1;
}
How do I make sure I avoid this problem?
You really only need one regex for this. Then do a more efficient strlen() check to see which one was matched. The following will match ISBN-10 and ISBN-13 values within a string, with or without hyphens, and optionally preceded by the string ISBN:, ISBN:(space) or ISBN(space).
Finding ISBNs:
function findIsbn($str)
{
$regex = '/\b(?:ISBN(?:: ?| ))?((?:97[89])?\d{9}[\dx])\b/i';
if (preg_match($regex, str_replace('-', '', $str), $matches)) {
return (10 === strlen($matches[1]))
? 1 // ISBN-10
: 2; // ISBN-13
}
return false; // No valid ISBN found
}
var_dump(findIsbn('ISBN:0-306-40615-2')); // return 1
var_dump(findIsbn('0-306-40615-2')); // return 1
var_dump(findIsbn('ISBN:0306406152')); // return 1
var_dump(findIsbn('0306406152')); // return 1
var_dump(findIsbn('ISBN:979-1-090-63607-1')); // return 2
var_dump(findIsbn('979-1-090-63607-1')); // return 2
var_dump(findIsbn('ISBN:9791090636071')); // return 2
var_dump(findIsbn('9791090636071')); // return 2
var_dump(findIsbn('ISBN:97811')); // return false
This will search the provided string to see if it contains a possible ISBN-10 value (returns 1) or an ISBN-13 value (returns 2). If it contains neither, it returns false.
Validating ISBNs:
For strict validation, the Wikipedia article for ISBN has some PHP validation functions for ISBN-10 and ISBN-13. Below are those examples copied, tidied up, and modified to work with a slightly modified version of the function above.
Change the return block to this:
return (10 === strlen($matches[1]))
? isValidIsbn10($matches[1]) // ISBN-10
: isValidIsbn13($matches[1]); // ISBN-13
Validate ISBN-10:
function isValidIsbn10($isbn)
{
$check = 0;
for ($i = 0; $i < 10; $i++) {
if ('x' === strtolower($isbn[$i])) {
$check += 10 * (10 - $i);
} elseif (is_numeric($isbn[$i])) {
$check += (int)$isbn[$i] * (10 - $i);
} else {
return false;
}
}
return (0 === ($check % 11)) ? 1 : false;
}
Validate ISBN-13:
function isValidIsbn13($isbn)
{
$check = 0;
for ($i = 0; $i < 13; $i += 2) {
$check += (int)$isbn[$i];
}
for ($i = 1; $i < 12; $i += 2) {
$check += 3 * (int)$isbn[$i];
}
return (0 === ($check % 10)) ? 2 : false;
}
Use ^ and $ to match the beginning and end of the string. With these anchors in place, the order in which you test for the 10- or 13-digit codes no longer matters.
10 digits
/^ISBN:(\d{9}(?:\d|X))$/
13 digits
/^ISBN:(\d{12}(?:\d|X))$/
Note: According to http://en.wikipedia.org/wiki/International_Standard_Book_Number, ISBNs can also contain a - (hyphen). But based on the $str you're using, it looks like you've removed the hyphens before checking for 10 or 13 digits.
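Additional note: Because the last digit of the ISBN is a check digit computed from the preceding digits, a regular expression alone cannot confirm that an ISBN is valid. It can only check for the 10- or 13-character format. Also note that only an ISBN-10 can end in X; an ISBN-13 check digit is always 0-9, so the 13-digit pattern above is more permissive than the standard.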
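To make the check-digit point concrete, here is a minimal sketch of how the final character can be derived from the digits before it. The function names isbn10CheckChar and isbn13CheckChar are hypothetical helpers invented for this illustration, and both assume the input has already been reduced to plain digits.

```php
<?php
// Hypothetical helpers (not part of any library): derive the expected
// check character from the leading digits of an ISBN.

function isbn10CheckChar(string $first9): string
{
    $sum = 0;
    for ($i = 0; $i < 9; $i++) {
        // Weights run 10, 9, ..., 2 across the first nine digits.
        $sum += (10 - $i) * (int) $first9[$i];
    }
    // The check value makes the full weighted sum divisible by 11.
    $check = (11 - ($sum % 11)) % 11;
    return $check === 10 ? 'X' : (string) $check;
}

function isbn13CheckChar(string $first12): string
{
    $sum = 0;
    for ($i = 0; $i < 12; $i++) {
        // Weights alternate 1, 3, 1, 3, ... across the first twelve digits.
        $sum += (($i % 2 === 0) ? 1 : 3) * (int) $first12[$i];
    }
    // The check digit makes the full weighted sum divisible by 10.
    return (string) ((10 - ($sum % 10)) % 10);
}

echo isbn10CheckChar('030640615'), "\n";    // 2, as in 0-306-40615-2
echo isbn13CheckChar('979109063607'), "\n"; // 1, as in 979-1-090-63607-1
```

Comparing the computed character with the last character of the candidate is what turns a format check into a real validity check.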
$isbns = array(
'ISBN:1234567890', // 10-digit
'ISBN:123456789X', // 10-digit ending in X
'ISBN:1234567890123', // 13-digit
'ISBN:123456789012X', // 13-digit ending in X
'ISBN:1234' // invalid
);
function get_isbn($str) {
if (preg_match('/^ISBN:(\d{9}(?:\d|X))$/', $str, $matches)) {
echo "found 10-digit ISBN\n";
return $matches[1];
}
elseif (preg_match('/^ISBN:(\d{12}(?:\d|X))$/', $str, $matches)) {
echo "found 13-digit ISBN\n";
return $matches[1];
}
else {
echo "invalid ISBN\n";
return null;
}
}
foreach ($isbns as $str) {
$isbn = get_isbn($str);
echo $isbn."\n\n";
}
Output
found 10-digit ISBN
1234567890
found 10-digit ISBN
123456789X
found 13-digit ISBN
1234567890123
found 13-digit ISBN
123456789012X
invalid ISBN
Put the ISBN-13 check before the ISBN-10 check? This assumes that you want to match them as part of any string (your example has an extra "ISBN:" at the start, so matching anywhere in a string seems to be a requirement of some sort).
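A minimal sketch of that reordering, for matching anywhere in a string. The function name classifyIsbn is hypothetical, and the lookarounds are an addition beyond the suggestion above: they are assumed here so that neither pattern can match a slice of a longer run of digits.

```php
<?php
// Hypothetical helper: report which ISBN format appears somewhere in $str.
function classifyIsbn(string $str): ?string
{
    // Test the longer format first. (?<!\d) and (?!\d) stop a match from
    // starting or ending in the middle of a longer digit run.
    if (preg_match('/(?<!\d)\d{13}(?!\d)/', $str)) {
        return 'ISBN-13';
    }
    // An ISBN-10 may end in X, so the last character allows it.
    if (preg_match('/(?<!\d)\d{9}[\dX](?![\dX])/i', $str)) {
        return 'ISBN-10';
    }
    return null; // no ISBN-shaped run of digits found
}

var_dump(classifyIsbn('ISBN:9780113411436')); // string(7) "ISBN-13"
var_dump(classifyIsbn('ISBN:0306406152'));    // string(7) "ISBN-10"
var_dump(classifyIsbn('ISBN:1234'));          // NULL
```

With the lookarounds in place the 10-character pattern can no longer match the first ten digits of an ISBN-13, so the order of the two tests becomes a matter of readability rather than correctness.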
Switch the order of the if else block, and also strip all whitespace, colons, and hyphens from your ISBN:
//Replace all the fluff that some companies add to ISBNs
$str = preg_replace('/(\s+|:|-)/', '', $str);
if(preg_match("/^ISBN\d{12}(?:\d|X)$/", $str, $matches)){
echo "ISBN-13 FOUND\n";
//isbn returned will be 9780113411436
return 1;
}
else if(preg_match("/^ISBN\d{9}(?:\d|X)$/", $str, $matches)){
echo "ISBN-10 FOUND\n";
//matches only when the whole string is a 10-character ISBN
return 0;
}
ISBN10_REGEX = /^(?:\d[\ -]?){9}[\dX]$/i
ISBN13_REGEX = /^(?:\d[\ -]?){13}$/i
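As a rough illustration of how separator-tolerant patterns like these could be used from PHP, here is a small sketch. The constant names ISBN10_PATTERN and ISBN13_PATTERN and the function isbnFormat are hypothetical, and the character classes are written without a | because inside [...] that character would be matched literally.

```php
<?php
// Hypothetical constants: each digit may be followed by one space or hyphen.
const ISBN10_PATTERN = '/^(?:\d[ -]?){9}[\dX]$/i';
const ISBN13_PATTERN = '/^(?:\d[ -]?){13}$/';

// Hypothetical helper: classify a candidate that is expected to be
// nothing but an ISBN, with or without separators.
function isbnFormat(string $candidate): ?string
{
    if (preg_match(ISBN13_PATTERN, $candidate)) {
        return 'ISBN-13';
    }
    if (preg_match(ISBN10_PATTERN, $candidate)) {
        return 'ISBN-10';
    }
    return null; // neither format
}

var_dump(isbnFormat('0-306-40615-2'));     // string(7) "ISBN-10"
var_dump(isbnFormat('978-0-306-40615-7')); // string(7) "ISBN-13"
var_dump(isbnFormat('12345'));             // NULL
```

These patterns only check the shape of the input. Note that the 13-digit pattern as written also tolerates a single trailing separator, so stripping separators first and checking the check digit afterwards is still advisable.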