I'm stuck writing a Lisp function that counts how many times one string occurs in another.
I tried this function, but it signals an error:
*** - +: "abc" is not a number
(defun string-contain (string1 string2)
  (cond
    ((not (length string1)) nil) ; string1 is empty (no need to test it every time)
    ((> (length string1) (length string2)) nil) ; string1 is longer than string2
    ((string= string1 (subseq string2 0 (length string1))) string1)
    (t (+ 1 (string-include string1 (subseq string2 1))))))
Thank you
In general, when you're doing string processing, you should try to avoid calling subseq, since it creates a new string each time, and you don't want all that string allocation. Many of the sequence functions in Common Lisp take start and end parameters, so you can specify which part of the sequence to work on. The function search looks for an occurrence of one sequence within another sequence and returns the index of the first occurrence, or NIL if there is none. You can call search repeatedly with new :start2 values to search farther and farther into the string. For instance:
(defun search-all (needle haystack &key key (test 'eql)
                                        (start1 0)
                                        (end1 (length needle))
                                        (start2 0)
                                        (end2 nil)
                                        (overlaps nil))
  "Counts the number of times that NEEDLE appears in HAYSTACK. START1
and END1, and START2 and END2, are bounding index designators of
NEEDLE and HAYSTACK, respectively. If OVERLAPS is true, then
overlapping occurrences will be counted separately."
  (do* ((len1 (- end1 start1))           ; length of needle (constant)
        (upd (if overlaps 1 len1))       ; how much to increment pos
        (occurrences 0 (1+ occurrences)) ; occurrences, increments by 1
        (start2 start2 (+ pos upd))      ; start2, updated to pos+upd
        (pos #1=(search needle haystack  ; pos. of needle, or NIL
                        :start1 start1 :end1 end1
                        :start2 start2 :end2 end2
                        :test test :key key)
             #1#))
       ((null pos) occurrences)))        ; when pos is NIL, return occurrences
There's one bit in there that may be a bit confusing. The variable bindings in do and do* loops have the form (variable [init-form [update-form]]), and we want the init-form and update-form for pos to be the same, namely a call to search. In Common Lisp code, you can use #n=form and then use #n# to refer to the same form again later. That's why I've used the #1=(search …) as the init-form, and then #1# as the update-form.
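To see what the reader does with those labels, here is a small sketch that reads a list containing #1= and #1# from a string. The variable name *forms* is made up for this example, and the form inside the string is only read as data, never evaluated. Both elements come back as the very same object, which is why the init-form and update-form in the loop above are identical.

```lisp
;; Read a two-element list whose second element is a back-reference
;; to the first. READ-FROM-STRING only reads; SEARCH is never called.
;; *FORMS* is a hypothetical name used only for this illustration.
(defparameter *forms*
  (read-from-string "(#1=(search needle haystack) #1#)"))

(equal (first *forms*) (second *forms*)) ; true: same structure
(eq (first *forms*) (second *forms*))    ; true: the same object, not a copy
```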
Here are some examples:
;; Find 'ab' within 'abcdabcd'
(SEARCH-ALL "ab" "abcdabcd")
;;=> 2
;; Find 'cat' within 'one cat two cat three cat'
(SEARCH-ALL "concatenate" "one cat two cat three cat" :START1 3 :END1 6)
;;=> 3
;; Find 'cat' within 'one cat two cat'
(SEARCH-ALL "concatenate" "one cat two cat three cat" :START1 3 :END1 6 :START2 0 :END2 15)
;;=> 2
;; Fail to find 'cat' in 'Cat'
(SEARCH-ALL "cat" "Cat")
;;=> 0
;; Find 'cat' in 'Cat'
(SEARCH-ALL "cat" "Cat" :TEST 'CHAR-EQUAL)
;;=> 1
;; Find 2 'aaa' in 'baaaaaab' (no overlaps)
(SEARCH-ALL "aaa" "baaaaaab" :OVERLAPS NIL)
;;=> 2
;; Find 4 'aaa' in 'baaaaaab' (with overlaps)
(SEARCH-ALL "aaa" "baaaaaab" :OVERLAPS T)
;;=> 4
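If the #1= notation is distracting, the same repeated-search idea can probably be written with loop instead. The following is only a sketch: the name count-occurrences is made up, it drops the start/end, test and key parameters for brevity, and it assumes an empty needle should simply count as zero occurrences.

```lisp
;; Hypothetical simplified variant of SEARCH-ALL using LOOP.
(defun count-occurrences (needle haystack &key overlaps)
  "Counts occurrences of NEEDLE in HAYSTACK by calling SEARCH repeatedly."
  (let ((step (if overlaps 1 (length needle))))
    (if (zerop (length needle))
        0                                   ; assumption: empty needle counts as 0
        (loop for pos = (search needle haystack)
                then (search needle haystack :start2 (+ pos step))
              while pos                     ; stop once SEARCH returns NIL
              count t))))                   ; one count per match found

(count-occurrences "aaa" "baaaaaab")             ; non-overlapping matches
(count-occurrences "aaa" "baaaaaab" :overlaps t) ; overlapping matches
```

The for/then clause plays the role of the init-form and update-form in the do* version, at the cost of writing the search call twice.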
Looking at the code, this looks like the source of the error:
((string= string1 (subseq string2 0 (length string1))) string1)
This line returns a string when the comparison succeeds. It should probably return 1 plus the result of checking the rest of string2, starting one character ahead.
You probably also want to skip the (+ 1 ...) in the default case (no match), and you definitely want to return 0 rather than nil in the base cases.
(not (length string)) will always either be false or signal a type error. You probably want to compare to 0, with zerop.
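Putting those suggestions together, the question's function might look like the sketch below. It keeps the original recursive structure and subseq calls, and it assumes that overlapping occurrences should be counted, since it always advances by one character.

```lisp
;; Sketch of the question's function with the suggested fixes applied.
(defun string-contain (string1 string2)
  (cond
    ((zerop (length string1)) 0)                 ; empty string1: nothing to count
    ((> (length string1) (length string2)) 0)    ; string1 no longer fits
    ((string= string1 (subseq string2 0 (length string1)))
     ;; match at the head: count it, then look one character ahead
     (1+ (string-contain string1 (subseq string2 1))))
    ;; no match at the head: just look one character ahead
    (t (string-contain string1 (subseq string2 1)))))

(string-contain "ab" "abcdabcd")
(string-contain "abc" "ab")
```

Every branch now returns a number, so the + error from the question can no longer occur.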
Your function has three problems that are visible to the naked eye:
- (not (length string1)) will always be nil, as Svante pointed out.
- Your function returns nil in two branches and a number in the last branch. This inconsistency may cause problems in the future.
- There is no function string-include.
Here is how I would approach this problem. We want to calculate the number of times a given string is included in another string. This can be split into the following cases:
- If the first string ("substring") is longer than the second, the answer must be 0.
- If the length of the first string equals the length of the second string and these strings are equal, the answer must be 1.
- If the first string is shorter than the second, but matches it from the beginning, we have found 1 inclusion, and we need to check whether the same substring is contained in the rest (tail) of the second string.
- Anything else must result in 0.
Here is the code that implements it:
(defun substring-times (substr string)
  (cond ((> (length substr) (length string)) 0)
        ((and (= (length substr) (length string))
              (string= substr string))
         1)
        ((string= substr (subseq string 0 (length substr)))
         (1+ (substring-times substr (subseq string (length substr)))))
        (t 0)))
We can test it:
> (substring-times "ab" "abababababc")
5
This function does not cover the case of "ab" being contained in "cabxabyab". But the change is trivial (and as they like to say in books, left as an exercise).
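One possible solution to that exercise, as a sketch: when the substring does not match at the head, drop one character and try again instead of giving up. The name substring-times-anywhere is made up for this example, and the sketch assumes an empty substring should count as 0.

```lisp
;; Hypothetical recursive variant that also finds matches in the middle.
(defun substring-times-anywhere (substr string)
  (let ((sublen (length substr)))
    (cond ((or (zerop sublen) (> sublen (length string))) 0)
          ((string= substr string :end2 sublen)
           ;; match at the head: count it and skip past the whole match
           (1+ (substring-times-anywhere substr (subseq string sublen))))
          ;; no match at the head: skip a single character
          (t (substring-times-anywhere substr (subseq string 1))))))

(substring-times-anywhere "ab" "cabxabyab")
```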
More interesting is that this kind of function is inefficient (it uses recursion where iteration would do) and is not idiomatic Common Lisp. It would be nicer to rewrite it using iteration:
(defun substring-times (substr string)
  (let ((sublen (length substr))
        (len (length string))
        (result 0)
        (i 0))
    (loop
      while (<= i (- len sublen))
      if (string= substr string :start2 i :end2 (+ i sublen))
        do (progn
             (incf result)
             (incf i sublen))
      else
        do (incf i)
      end
      finally (return result))))
This function will also be able to deal with the case of "cabxabyab":
> (substring-times "ab" "cabxabyab")
3
EDIT: I have replaced subseq with keywords for string=, as Rainer Joswig suggested.