I'm both new and old to programming -- mostly I just write a lot of small Perl scripts at work. Clojure came out just when I wanted to learn Lisp, so I'm trying to learn Clojure without knowing Java either. It's tough, but it's been fun so far.
I've seen several examples of similar problems to mine, but nothing that quite maps to my problem space. Is there a canonical way to extract lists of values for each line of a CSV file in Clojure?
Here's some actual working Perl code; comments included for non-Perlers:
# convert_survey_to_cartography.pl
open INFILE, "< coords.csv"; # Input format "Northing,Easting,Elevation,PointID"
open OUTFILE, "> coords.txt"; # Output format "PointID X Y Z".
while (<INFILE>) { # Read line by line; line bound to $_ as a string.
    chomp $_; # Strips out each line's <CR><LF> chars.
    @fields = split /,/, $_; # Extract the line's field values into a list.
    $y = $fields[0]; # y = Northing
    $x = $fields[1]; # x = Easting
    $z = $fields[2]; # z = Elevation
    $p = $fields[3]; # p = PointID
    print OUTFILE "$p $x $y $z\n" # New file, changed field order, different delimiter.
}
I've puzzled out a little bit in Clojure and tried to cobble it together in an imperative style:
; convert-survey-to-cartography.clj
(use 'clojure.contrib.duck-streams)
(let
  [infile "coords.csv" outfile "coords.txt"]
  (with-open [rdr (reader infile)]
    (def coord (line-seq rdr))
    ( ...then a miracle occurs... )
    (write-lines outfile ":x :y :z :p")))
I don't expect the last line to actually work, but it gets the point across. I'm looking for something along the lines of:
(def values (interleave (:p :y :x :z) (re-split #"," coord)))
Thanks, Bill
Here's one way:
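A minimal sketch of this approach, assuming a Clojure version where clojure.contrib.duck-streams is not available: it uses clojure.java.io and clojure.string instead, and rebinds *out* by hand with binding. The function name convert-coords is made up for illustration.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

(defn convert-coords
  "Reads 'Northing,Easting,Elevation,PointID' lines from in-path and
  writes 'PointID X Y Z' lines to out-path. (Hypothetical helper.)"
  [in-path out-path]
  (with-open [rdr (io/reader in-path)
              wtr (io/writer out-path)]
    ;; Rebind *out* so that println writes to the output file.
    (binding [*out* wtr]
      ;; doseq is eager, which suits side effects such as file output.
      (doseq [line (line-seq rdr)]
        ;; Destructure the four comma-separated fields into let-bound names.
        (let [[y x z p] (str/split line #",")]
          (println p x y z))))))
```

It would be called as (convert-coords "coords.csv" "coords.txt").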
with-out-writer binds *out* such that everything you print will go to the filename or stream you specify, rather than to standard output.

Using def as you're using it isn't idiomatic. A better way is to use let. I'm using destructuring to assign the 4 fields of each line to 4 let-bound names; then you can do what you want with those.

If you're iterating over something for the purpose of side effects (e.g. I/O), you should usually go for doseq. If you wanted to collect up each line into a hash-map and do something with them later, you could use for:
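A sketch of the for variant, under the assumption that the lines are already available as a seq of strings. The name parse-coords and the keys :y :x :z :p are chosen for illustration. Because for is lazy, the result is realized with doall so that it remains usable after a reader supplying the lines has been closed.

```clojure
(require '[clojure.string :as str])

(defn parse-coords
  "Turns 'Northing,Easting,Elevation,PointID' lines into a seq of
  hash-maps. (Hypothetical helper.)"
  [lines]
  ;; for builds a lazy seq; doall forces it while the input is still open.
  (doall
    (for [line lines
          :let [[y x z p] (str/split line #",")]]
      {:p p :x x :y y :z z})))
```

For example, (:p (first (parse-coords ["100.0,200.0,30.5,P1"]))) returns "P1".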
Please don't use nested defs. They don't do what you think they do: def is always global! For locals, use let instead. While the library functions are nice to know, here is a version that brings together some features of functional programming in general and Clojure in particular.
Docstrings can be queried in the REPL via (doc translate-coords). This works, e.g., for all core functions, so supplying one is a good idea.
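As a small illustration of the docstring point, here is a sketch with a made-up function swap-xy. It assumes doc is referred from clojure.repl, which is already the case in a default REPL session.

```clojure
(require '[clojure.repl :refer [doc]])

(defn swap-xy
  "Returns the pair [a b] as [b a]."
  [[a b]]
  [b a])

;; Prints the name, argument list and docstring.
(doc swap-xy)

;; The docstring is also stored as metadata on the var.
(:doc (meta #'swap-xy))
```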
translator is a (possibly anonymous) function that factors the translation out of the surrounding boilerplate, so we can reuse this function with different transformation rules. The type hints here avoid reflection for the constructors.
Open the files. with-open takes care that the files are closed when its body is left, be it via a normal "drop off the bottom" or via a thrown exception.
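The closing behaviour can be checked with a small sketch. The resource here is a stand-in built with reify rather than a real file, and closes-on-exception? is a name invented for this example.

```clojure
(defn closes-on-exception?
  "Returns true if with-open closed the resource even though the body
  threw. (Hypothetical helper for demonstration.)"
  []
  (let [closed?  (atom false)
        ;; A stand-in resource that records when close is called.
        resource (reify java.io.Closeable
                   (close [_] (reset! closed? true)))]
    (try
      (with-open [r resource]
        (throw (ex-info "simulated failure" {})))
      (catch clojure.lang.ExceptionInfo _
        nil))
    @closed?))
```

Calling (closes-on-exception?) returns true, since with-open expands to a try/finally that calls close.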
We bind the *out* stream temporarily to the output file, so any print inside the binding will print to the file.

The map means: take the seq, apply the given function to every element, and return the seq of the results. The #() is shorthand notation for an anonymous function. It takes one argument, which is filled in at the %. The doseq is basically a loop over the input. Since we do that for the side effects (namely printing to a file), doseq is the right construct. Rule of thumb: map: lazy => for result, doseq: eager => for side effects.

println takes care of the \n at the end of the line. interpose takes the seq and adds the first argument (in our case " ") between its elements. (apply str [1 2 3]) is equivalent to (str 1 2 3) and is useful for constructing function calls dynamically. The ->> is a relatively new macro in Clojure, which helps a bit with readability. It means "take the first argument and add it as the last item to the function call". The given ->> is equivalent to: (println (apply str (interpose " " (translator coords)))). (Edit: Another note: since the separator is \space, we could just as well write (apply println (translator coords)) here, but the interpose version also allows us to parametrize the separator, as we did with the translator function, while the short version would hardwire \space.)

Here we use destructuring (note the double [[]]). It means the argument to the function is something which can be turned into a seq, e.g. a vector or a list. Bind the first element to y, the second to x and so on.

Here again less choppy:
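A sketch consistent with the walkthrough above, not necessarily the exact code it describes. The name translate-coords comes from the text; survey->cartography and the use of clojure.string/split are assumptions made for this example.

```clojure
(require '[clojure.string :as str])
(import '(java.io BufferedReader FileReader FileWriter))

(defn translate-coords
  "Reads comma-separated coordinates from infile, reorders each line
  with translator, and writes the result space-separated to outfile."
  [translator ^String infile ^String outfile]
  ;; The ^String hints let the compiler pick the constructors without reflection.
  (with-open [in  (BufferedReader. (FileReader. infile))
              out (FileWriter. outfile)]
    ;; Everything printed inside this binding goes to the output file.
    (binding [*out* out]
      (doseq [coords (map #(str/split % #",") (line-seq in))]
        (->> (translator coords)
             (interpose " ")
             (apply str)
             println)))))

(defn survey->cartography
  "Northing,Easting,Elevation,PointID -> PointID X Y Z.
  (Name assumed for this sketch.)"
  [[y x z p]]
  [p x y z])
```

With the file names from the question, the call would be (translate-coords survey->cartography "coords.csv" "coords.txt"). A different field order only needs a different translator function.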
Hope this helps.
Edit: For CSV reading you probably want something like OpenCSV.