I am able to connect tableau with my database but the table size is really large here. Everytime I try to load the table into tableau, it is crashing and I am not able to find any work around. The table size varies from 10 million - 400 million rows. How should I approach this issue any suggestion ?
问题:
回答1:
I found a simple solution for optimising Tableau to work with very large datasets (1 billion+ rows): Google BigQuery, which is essentially a managed data warehouse.
- Upload data to BigQuery (you can append multiple files into a single table).
- Link that table to Tableau as an external data source
Tableau then sends SQL-like commands to BigQuery whenever a new 'view' is requested. The queries are processed quickly on Google's computing hardware, which then sends a small amount of information back to Tableau.
This method allowed me visualise a 100gb mobile call record dataset with ~1 billion rows on a MacBook.
回答2:
You don't "load" data into Tableau, you point Tableau at an external data source. Then Tableau sends a query to the external data source requesting only the summary info (aka query results) needed to create the visualization you designed.
So, for an extreme example, if you place CNT(Number of Records) on the Columns shelf, Tableau will send a simple short query to the external database asking it to report the number of records. Something along the lines of "select count(*) from xxx".
So even if there are billions of rows in the external database, Tableau will send a small amount of information to the database (a query) and receive back a small amount of information (the query results) to display. This allows Tableau to be very fast on its end, and performance depends on how fast the external database can respond to the query. Tuning your database depends on all kinds of factors: type and amount of memory and disk, how indices are set up, etc.
So the first step is to make sure that the database can perform as needed, regardless of Tableau.
That's the purist response. Now for a few messy details. It is possible to design a very complex visualization in Tableau that will send a complex query asking for a very large result set. For instance, you can design a dashboard that draws a dot on the map for every row in the database, and then refreshes a large volume of data everytime you wave the mouse over the marks on the map.
If you have millions or billions of data rows, and you want high performance, then don't do that. No user can read 60 million dots anyway, and they certainly don't want to wait for them to be sent over the wire. Instead first plot aggregate values, min, max, sum, avg etc and then drill down into more detail on demand.
As others suggest, you can use a Tableau extract to offload workload and cache data in a form for fast use by Tableau. An extract is similar to an optimized materialized view stored in Tableau. Extracts are very helpful with speeding up Tableau, but if you want high performance, filter and aggregate your extracts to contain only the data and level of detail needed to support your views. If you blindly make an extract of your entire database, you are simply copying all your data from one form of database to another.
回答3:
There are two ways to interpret this question:
The data source (which might be a single table, a view, etc.) has 10M to 400M rows and Tableau is crashing at some point in the load process. In that case, I suggest you to contact Tableau tech support. They really like to hear about situations like that and help people through them.
You are trying to create a visualization (such as a text table or crosstab) that has N records resulting in 10M to 400M displayed rows. In that case, you're into a territory that Tableau isn't designed for. A text table with 10M rows is not going to be useful for much of anything than exporting to something else, and in that case, there are better tools than Tableau for doing that (such as export/import tools built into most databases).
回答4:
Not really sure what your use case is, but I find it unlikely that you need all that data for one Tableau view.
You can parse down / aggregate the data using a view in the database or custom SQL from your Tableau connection. Also, try to use extracts rather than live database connections, as they will preform faster.
I like to use views in the database and then use those views to refresh my Tableau extracts on Tableau Server.