From the Spark official document, it says:
Spark SQL can cache tables using an in-memory columnar format by calling sqlContext.cacheTable("tableName") or dataFrame.cache(). Then Spark SQL will scan only required columns and will automatically tune compression to minimize memory usage and GC pressure. You can call sqlContext.uncacheTable("tableName") to remove the table from memory.
What does caching tables using a in-memory columnar format really mean? Put the whole table into the memory? As we know that cache is also lazy, the table is cached after the first action on the query. Does it make any difference to the cached table if choosing different actions or queries? I've googled this cache topic several times but failed to find some detailed articles. I would really appreciate it if anyone can provides some links or articles for this topic.
http://spark.apache.org/docs/latest/sql-programming-guide.html#caching-data-in-memory