(def evil-code (str "(" (slurp "/mnt/src/git/clj/clojure/src/clj/clojure/core.clj") ")" ))
(def r (read-string evil-code ))
Works, but unsafe
(def r (clojure.edn/read-string evil-code))
RuntimeException Map literal must contain an even number of forms clojure.lang.Util.runtimeException (Util.java:219)
Does not work...
How to read Clojure code (presering all '#'s as themselves is desirable) into a tree safely? Imagine a Clojure antivirus that want to scan the code for threats and wants to work with data structure, not with plain text.
According to the current documentation you should never use
read
norread-string
to read from untrusted data sources.You should use
read-edn
orclojure.edn/read
which were designed with that purpose in mind.There was a long discussion in the mailing list regarding the use of read and read-eval and best practices regarding those.
I wanted to point out an old library (used in LightTable) that uses
read-string
with a techniques to propose a client/server communicationFetch : A ClojureScript library for Client/Server interaction.
You can see in particular the
safe-read
method :You can see the use of binding
*read-eval*
tofalse
. I think the rest of the code is worth watching at for the kind of abstractions it proposes.In a PR, it is suggested that there is a security problem that can be fixed by using
edn
instead (...aaand back to your question) :First of all you should never read clojure code directly from untrusted data sources. You should use EDN or another serialization format instead.
That being said since Clojure 1.5 there is a kind of safe way to read strings without evaling them. You should bind the read-eval var to false before using read-string. In Clojure 1.4 and earlier this potentially resulted in side effects caused by java constructors being invoked. Those problems have since been fixed.
Here is some example code:
Regarding reader macro's
The dispatch macro (#) and tagged literals are invoked at read time. There is no representation for them in Clojure data since by that time these constructs all have been processed. As far as I know there is no build in way to generate a syntax tree of Clojure code.
You will have to use an external parser to retain that information. Either you roll your own custom parser or you can use a parser generator like Instaparse and ANTLR. A complete Clojure grammar for either of those libraries might be hard to find but you could extend one of the EDN grammars to include the additional Clojure forms. A quick google revealed an ANTLR grammar for Clojure syntax, you could alter it to support the constructs that are missing if needed.
There is also Sjacket a library made for Clojure tools that need to retain information about the source code itself. It seems like a good fit for what you are trying to do but I don't have any experience with it personally. Judging from the tests it does have support for reader macro's in its parser.