I have a dataframe with as many as 10 million records. How can I get a count quickly? df.count
is taking a very long time.
相关问题
- How to maintain order of key-value in DataFrame sa
- Unusual use of the new keyword
- Get Runtime Type picked by implicit evidence
- Spark on Yarn Container Failure
- What's the point of nonfinal singleton objects
相关文章
- Gatling拓展插件开发,check(bodyString.saveAs("key"))怎么实现
- Livy Server: return a dataframe as JSON?
- RDF libraries for Scala [closed]
- Why is my Dispatching on Actors scaled down in Akk
- How do you run cucumber with Scala 2.11 and sbt 0.
- GRPC: make high-throughput client in Java/Scala
- Setting up multiple test folders in a SBT project
- SQL query Frequency Distribution matrix for produc
It's going to take so much time anyway. At least the first time.
One way is to cache the dataframe, so you will be able to more with it, other than count.
E.g
Subsequent operations don't take much time.