I'm running into an encoding problem with the SPARQL package for R. I'm running the following code:
library(SPARQL)
rights_query <- '
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX edm: <http://www.europeana.eu/schemas/edm/>
PREFIX ore: <http://www.openarchives.org/ore/terms/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?edmrights ?provider (COUNT(*) as ?count)
WHERE {
?agg rdf:type ore:Aggregation .
?agg edm:rights ?edmrights .
#?agg dc:rights ?dcrights .
?agg edm:dataProvider ?provider .
?proxy ore:proxyIn ?agg .
?proxy edm:type "IMAGE" .
}
GROUP BY ?edmrights ?provider
ORDER BY ?provider DESC(?count)'
eur <- "http://europeana.ontotext.com/sparql"
eur_data <- SPARQL(eur, rights_query)$results
write.csv(eur_data, "results.csv")
The code runs without any errors or warnings. However, the resulting data frame (as viewed in RStudio) and the CSV both clearly have encoding problems.
For example, the last row ought to be partly Cyrillic: Чувашский государственный художественный музей / Chouvashia State Art Museum
However it comes out looking like this: ЧÑваÑÑкий гоÑÑдаÑÑÑвеннÑй ÑÑдожеÑÑвеннÑй мÑзей / Chouvashia State Art Museum
I've inspected the XML returned by the SPARQL query. It passes XML validation, and contains the proper "UTF-8" encoding declaration. The R XML package (which is what the R SPARQL package uses to parse XML output into a data frame) ought to recognize this, right?
You can inspect the entire XML output, as well as the CSV file. I am running R 3.1.0 via RStudio, on OS X Mavericks. I have set RStudio's default character encoding to UTF-8.
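One check that might narrow this down is sketched below. It assumes (unconfirmed) that the bytes in the data frame are already valid UTF-8 and that only R's per-string encoding mark is missing or wrong. `mark_utf8` is a hypothetical helper name, not part of the SPARQL or XML packages; it uses only base R's `Encoding()`, which changes the declared encoding without touching the underlying bytes.

```r
# Hypothetical helper: declare every character column of a data frame
# to be UTF-8. This only sets the encoding mark; the bytes stay the same.
mark_utf8 <- function(df) {
  for (col in names(df)) {
    x <- df[[col]]
    if (is.character(x)) {
      Encoding(x) <- "UTF-8"
      df[[col]] <- x
    }
  }
  df
}

# Small stand-in for the query result: the UTF-8 bytes of the Cyrillic
# letter U+0443 are built explicitly so the example does not depend on
# the encoding of this script.
sample_df <- data.frame(
  provider = rawToChar(as.raw(c(0xd1, 0x83))),
  count = 1L,
  stringsAsFactors = FALSE
)

Encoding(sample_df$provider)   # encoding mark before
fixed <- mark_utf8(sample_df)
Encoding(fixed$provider)       # encoding mark after
charToRaw(fixed$provider)      # bytes are unchanged
```

If the real `eur_data` displays correctly after the same treatment, the XML was probably parsed into the right bytes and only the mark was lost. If it still looks garbled, the bytes themselves were likely altered earlier, during parsing.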