What are the caveats of using source versus parse?

Posted 2019-01-22 17:19

Short version

Can I replace

source(filename, local = TRUE, encoding = 'UTF-8')

with

eval(parse(filename, encoding = 'UTF-8'))

without any risk of breakage, to make UTF-8 source files work on Windows?

Long version

I am currently loading specific source files via

source(filename, local = TRUE, encoding = 'UTF-8')

However, it is well known that this does not work on Windows, full stop.

As a workaround, Joe Cheng suggested using instead

eval(parse(filename, encoding = 'UTF-8'))

This seems to work quite well [1], but even after consulting the source code of source, I don’t understand how they differ in one crucial detail:

Neither source nor sys.source simply parses the file content and then evals the result. Instead, they parse the file content and then iterate manually over the parsed expressions, evaluating them one by one. I do not understand why this would be necessary in sys.source (source at least uses it to show verbose diagnostics, if so instructed; but sys.source does nothing of the kind):

for (i in seq_along(exprs)) eval(exprs[i], envir)

What is the purpose of evaluating statements separately? And why does it iterate over indices instead of directly over the sub-expressions? What other caveats are there?

To clarify: I am not concerned about the additional parameters of source and parse, some of which may be set via options.
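For reference, the two evaluation styles discussed above can be compared on a small in-memory snippet. This is only a minimal sketch: the snippet text and the names txt, env_whole and env_loop are made up for illustration. It relies on eval() evaluating each element of an expression vector in turn and returning the last value, and it says nothing about the diagnostics or source references that source additionally handles.

```r
# Hypothetical three-statement "file", parsed from text to stay self-contained
txt <- "a <- 1\nb <- a + 1\nb * 10"
exprs <- parse(text = txt, keep.source = TRUE)

# Style 1: evaluate the whole expression vector in a single call
env_whole <- new.env()
res_whole <- eval(exprs, env_whole)

# Style 2: evaluate one-element sub-expressions by index, as sys.source does
env_loop <- new.env()
res_loop <- NULL
for (i in seq_along(exprs)) res_loop <- eval(exprs[i], env_loop)

# Both styles leave the same final value and the same variables behind
same_value <- identical(res_whole, res_loop)
same_vars <- identical(mget(c("a", "b"), envir = env_whole),
                       mget(c("a", "b"), envir = env_loop))
print(c(same_value = same_value, same_vars = same_vars))
```

For plain assignments and arithmetic like this, the two styles are indistinguishable; any difference would have to come from what is attached to the expressions rather than from their values.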


[1] The reason that source is tripped up by the encoding but parse isn’t boils down to the fact that source attempts to convert the input text. parse does no such thing: it reads the file’s byte content as-is and simply marks its Encoding as UTF-8 in memory.
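The workaround can be tried end to end with a throwaway file. The following is a minimal sketch under stated assumptions: the temporary file, the variable s and its value are invented for illustration, and the behaviour on Windows has not been checked here. It only shows that the bytes written to the file come back unchanged through eval(parse(...)).

```r
# Write a one-line script containing a non-ASCII character as raw UTF-8 bytes
path <- tempfile(fileext = ".R")
writeLines(enc2utf8("s <- 'h\u00e9llo'"), path, useBytes = TRUE)

# Parse the file with the declared encoding and evaluate it in a fresh environment
env <- new.env()
eval(parse(path, encoding = "UTF-8"), env)

# Compare at the byte level so the check does not depend on the session locale
bytes_match <- identical(charToRaw(env$s), charToRaw("h\u00e9llo"))
print(bytes_match)
print(Encoding(env$s))

unlink(path)
```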

Tags: r eval
1 answer
ら.Afraid
#2 · 2019-01-22 17:30

This is not a full answer, as it primarily addresses the seq_along part of the question, but it is too long to fit in a comment.

One key difference between using seq_along followed by [ and simply using for (i in x) (which I believe is similar to seq_along followed by [[ instead of [) is that the former preserves the expression. Here is an example to illustrate the difference:

> txt <- "x <- 1 + 1
+ # abnormal expression
+   2 *
+     3
+ "
> x <- parse(text=txt, keep.source=TRUE)
> 
> for(i in x) print(i)
x <- 1 + 1
2 * 3
> for(i in seq_along(x)) print(x[i])
expression(x <- 1 + 1)
expression(2 *
    3)

Alternatively:

> attributes(x[[2]])
NULL
> attributes(x[2])
$srcref
$srcref[[1]]
2 *
    3

As for whether this has any practical impact compared to eval(parse(..., keep.source=TRUE)): it could, but I can't imagine a situation where it does.

Note that subsetting expression separately also leads to the srcref business getting subset, which could conceivably be useful (...maybe?).
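The srcref subsetting mentioned above can be checked directly. This is a small sketch reusing the same kind of two-statement input as the transcript; the names n_all, n_one and stripped are introduced here only for illustration.

```r
# Two statements, the second one spread over two lines
txt <- "x <- 1 + 1\n2 *\n  3"
x <- parse(text = txt, keep.source = TRUE)

# The full expression vector carries one srcref per statement
n_all <- length(attr(x, "srcref"))

# Subsetting with [ keeps only the srcref of the selected statement
n_one <- length(attr(x[2], "srcref"))

# Extracting with [[ yields a bare call with no srcref attached
stripped <- is.null(attr(x[[2]], "srcref"))

print(c(n_all = n_all, n_one = n_one))
print(stripped)

# The retained srcref still holds the original text, line breaks included
print(as.character(attr(x[2], "srcref")[[1]]))
```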
