I have a list shaped like this:
'(("Alpha" . 1538)
("Beta" . 8036)
("Gamma" . 8990)
("Beta" . 10052)
("Alpha" . 12837)
("Beta" . 13634)
("Beta" . 14977)
("Beta" . 15719)
("Alpha" . 17075)
("Rho" . 18949)
("Gamma" . 21118)
("Gamma" . 26923)
("Alpha" . 31609))
How can I count the total number of occurrences of the terms in the car of each element in the list? Basically I want:
(("Alpha" . 4)
("Beta" . 5)
("Gamma" . 3)
("Rho" . 1))
No, this is not homework. I just don't have the "thinking in Lisp" thing quite yet.
In C#, I would use LINQ to do this. I can do it in lisp, too, using while loops and such but the way I am thinking of doing it seems overly complicated.
EDIT
This is what I have:
(defun count-uniq (list)
"Returns an alist, each item is a cons cell where the car is
a unique element of LIST, and the cdr is the number of occurrences of that
unique element in the list. "
(flet ((helper (list new)
(if (null list)
new
(let ((elt (assoc (car list) new)))
(helper (cdr list)
(if elt
(progn (incf (cdr elt)) new)
(cons (cons (car list) 1) new)))))))
(nreverse (helper list nil))))
I dunno that this is the most elegant, but it seems reasonable:
(defun add-for-cheeso (data)
(let (result)
(dolist (elt data result)
(let ((sofar (assoc (car elt) result)))
(if sofar
(setcdr sofar (1+ (cdr sofar)))
(push (cons (car elt) 1) result))))))
(defun freqs (list &optional test key)
(let ((h (make-hash-table :test test)))
(dolist (x list)
(let ((key (if key (funcall key x) x)))
(puthash key (1+ (gethash key h 0)) h)))
(let ((r nil))
(maphash #'(lambda (k v) (push (cons k v) r)) h)
(sort r #'(lambda (x y) (< (cdr x) (cdr y)))))))
(freqs '(("Alpha" . 1538)
("Beta" . 8036)
("Gamma" . 8990)
("Beta" . 10052)
("Alpha" . 12837)
("Beta" . 13634)
("Beta" . 14977)
("Beta" . 15719)
("Alpha" . 17075)
("Rho" . 18949)
("Gamma" . 21118)
("Gamma" . 26923)
("Alpha" . 31609))
#'equal #'car)
Combining higher level Common Lisp functions:
(defun count-unique (alist)
(mapcar
(lambda (item)
(cons (car item)
(count (car item) alist :test #'equal :key #'car)))
(remove-duplicates alist :test #'equal :key #'car)))
It doesn't scale to large lists though. If you need O(n) performance use a hash table based solution instead, such as the less elegant:
(defun count-unique (alist)
(loop
with hash = (make-hash-table :test #'equal)
for (key . nil) in alist
do (incf (gethash key hash 0))
finally (return
(loop for key being each hash-key of hash
using (hash-value value)
collect (cons key value)))))
Using Common Lisp extensions:
(require 'cl)
(loop with result = nil
for (key . dummy) in original-list
do (incf (cdr (or (assoc key result)
(first (push (cons key 0) result)))))
finally return (sort result
(lambda (a b) (string< (car a) (car b)))))
You can just say finally return result
if you don't care about sorting the final result.
(require 'cl)
(defun count-uniq (list)
(let ((k 1) (list (sort (mapcar #'car list) #'string<)))
(loop for (i . j) on list
when (string= i (car j)) do (incf k)
else collect (cons i k) and do (setf k 1))))
Using high-order functions sort and reduce.
First sorting (using string<) then reducing (counting consecutive string= values in cons cells):
(reduce (lambda (r e)
(if (and r (string= (caar r) e))
(cons
(cons (caar r) (1+ (cdar r)))
(cdr r))
(cons (cons e 1) r)))
(sort (mapcar 'car alist) 'string<)
:initial-value nil)
Every time you want to traverse a list and return some value afterwards, be it a new list or some aggregate result, you are thinking of a fold, also called "reduce" in Python and Lisps. Fold is a great abstraction, as it allows to write generic code, applicable for many use-cases just by tweaking some elements. What is similar between finding a sum of several numbers, finding a product, finding a minimum integer? They are all folds, because you run through the list and then return some result based on its content. In Emacs Lisp they would look like this:
(reduce '+ '(1 2 3 4 5)) ; 15
(reduce '* '(1 2 3 4 5)) ; 120
(reduce 'min '(1 2 3 4 5)) ; 1
But folds are even more general than this. What is similar between finding a sum, counting a number of even numbers in a list, removing every odd number, and building a list with every number increased by 5? Every such function can be implemented by taking some base value, successively transform it, until you get the result. You take this base value, metaphorical blob of clay, call it "accumulator", then take one element from a list and based on this element do something to this blob of clay, make it a draft of a magnificent sculpture. Then you take the next element from a list and do something new to your sculpture. You repeat that until the list is empty and you end up with a masterpiece. It's as if every element of a list is a single instruction in a large recipe. Just bear in mind, that you are completely free to do anything with the clay, you don't have to use the list elements in the result directly—technically, this means that the accumulator (and, thus, the result) may be of different type.
(reduce '+ '(1 2 3 4 5) :initial-value 0) ; 15
(reduce (lambda (acc x) (if (evenp x) (1+ acc) acc)) '(1 2 3 4 5) :initial-value 0) ; 2
(reduce (lambda (x acc) (if (oddp x) acc (cons x acc))) '(1 2 3 4 5) :initial-value '() :from-end t) ; (2 4)
(reduce (lambda (x acc) (cons (+ x 5) acc)) '(1 2 3 4 5) :initial-value '() :from-end t) ; (6 7 8 9 10)
Note about reducing from end: lists in Lisps are not smart arrays like in Python or Java, they are linked lists, therefore accessing or changing an element somewhere in a list is an O(n) operation, while "consing" to the beginning of a list is O(1). In other words, appending an element to the end of a list is expensive, therefore Lispers usually add elements to the beginning of a list and then finally reverse the list, which is called push/nreverse idiom. If we did the ordinary reduce in the last 2 functions, we would cons 1 to the accumulator and get (1), then cons 2 to accumulator and get (2 1), until we get the correct result but upside-down. We could use reverse
function afterwards, but luckily Emacs's reduce
supports :from-end
keyword argument, so it conses 5, then 4, then 3, and so on.
It's clear now, that your operation is a fold, traverse the original alist and count occurrences of each key. Before writing our fold, let's talk about alists first. Alist in Lisp is a poor man's hash-table. You don't usually tinker with a programming language's hash-table implementation details, do you? You work with an API. In Python this API looks like square bracket syntax (d['a'] = 1
) and dict methods (d.keys()
). For alists API contains function assoc
, which returns an item provided a key.
(assoc 'a '((a . 1) (b . 2))) ; (a . 1)
Why do I talk about implementation details? Because you work via assoc
and you don't care how exactly this alist looks like, you abstract that away. Another piece of API is that if you want to add a new element or change an existing one, you simply cons a dotted pair to the alist. It's how you supposed to work with alists, regardless of their internal structure. Why does that work? For example, if I want to change value for key a
to 10, I would simply run (cons '(a . 10) my-alist)
, and my-alist
would end up being '((a . 10) (a . 1) (b . 2))
. But it's no problem, because assoc
returns only the first dotted pair and ignores the rest, so you can treat alist just like any other key-value data structure. With that in mind let's write our first serious fold.
(reduce (lambda (acc x)
(let* ((key (car x))
(pair (assoc key acc))
(count (cdr pair)))
(if pair
(cons (cons key (1+ count)) acc)
(cons (cons key 1) acc))))
my-alist
:initial-value '())
What happens here? We take your data and an empty list, which will soon become our desired result. At each step we take a pair from data and ask: does our result contain info about this pair? If not, then we add it to the result and put 1—we met this key for the first time. However, if we do find info about this pair in our result, then we must again add it to our result, but this time with a number increased by 1. Repeat that process for every item in your data, and you get:
(("Alpha" . 4) ("Gamma" . 3) ("Gamma" . 2) ("Rho" . 1) ("Alpha" . 3)
("Beta" . 5) ("Beta" . 4) ("Beta" . 3) ("Alpha" . 2) ("Beta" . 2)
("Gamma" . 1) ("Beta" . 1) ("Alpha" . 1))
Remember that assoc
only cares about the first occurrence of a key? This alist would behave the same as if it was just (("Alpha" . 4) ("Gamma" . 3) ("Rho" . 1) ("Beta" . 5))
, so we're good here. Still, could we change our fold as to get the latter, shorter result instead? Hold on, what's the need to over-complicate our fold, if we could just tweak the result afterwards? After all, what is computer programming, if not series of data transformations? There is no reason why you couldn't just remove all the "obsolete" pairs from your alist, just use cl-remove-duplicates
with correct arguments, and you're done.
So we're proud of ourselves, we wrote a fold, a pillar of functional programming, yet careful examination exposes an inefficiency: we traverse the accumulator with assoc
to find a pair and its value to increment. assoc
takes O(n), reduce
itself takes O(n), therefore our algorithm is O(n²) (read about order of growth, if you don't understand Big-O notation). It's clear that we should better work with a proper optimized hash-table instead, and convert it to an alist when we need. Rewrite our fold:
(reduce (lambda (acc x)
(cl-incf (gethash (car x) acc 0))
acc)
my-alist
:initial-value (make-hash-table :test 'equal))
(gethash k d 0)
is equivalent to Python's d.get('k', 0)
, where the last argument is default. cl-incf
(Common Lisp equivalent incf
) is a smart macro that increments its argument in-place (read about setf
to understand smart assignments). make-hash-table
requires custom test function, because strings can't be compared with default eql
function. To get an alist, just convert the result hash-table of our fold with ht->alist
function, that we either take from Wilfred's ht.el
library, or write ourselves:
(defun ht->alist (table)
(let (alist)
(maphash (lambda (k v)
(push (cons k v) alist))
table)
alist))
Here's what I think is an elegant functional solution using Emacs' alist functions, yielding a reusable frequencies
function similar to Eli's answer:
(defun frequencies (vals)
(reduce
(lambda (freqs key)
(cons (cons key (+ 1 (or (cdr (assoc key freqs)) 0)))
(assq-delete-all-with-test key freqs 'equal)))
vals
:initial-value nil)))
(frequencies (mapcar 'car
'(("Alpha" . 1538)
("Beta" . 8036)
("Gamma" . 8990)
("Beta" . 10052)
("Alpha" . 12837)
("Beta" . 13634)
("Beta" . 14977)
("Beta" . 15719)
("Alpha" . 17075)
("Rho" . 18949)
("Gamma" . 21118)
("Gamma" . 26923)
("Alpha" . 31609))))
=> (("Alpha" . 4) ("Gamma" . 3) ("Rho" . 1) ("Beta" . 5))
This is pretty easy and very straightforward using the dash library:
(require 'dash)
(defun frequencies (values)
"Return an alist indicating the frequency of values in VALUES list."
(mapcar (-lambda ((value . items))
(cons value (length items)))
(-group-by #'identity
values)))
(frequencies (mapcar #'car my-list))