How do I filter through data in Cassandra?

Posted 2019-04-09 01:52

I've been using MySQL for an app for some time, and the more data I collect, the slower it gets, so I have been looking into NoSQL options. One of the things I have in MySQL is a view created from a bunch of joins. The app shows all the important info in a grid, and the user can select ranges, do searches, etc. on this data set. Standard query stuff.

Looking at Cassandra, everything is already sorted based on the parameters I provide in my storage-conf.xml. So I would have a certain string as my key in the SuperColumn, and keep a bunch of the data in Columns below that. But I can only sort by one Column, and I can't do any real searching within the columns without pulling all the SuperColumns and looping through the data, right?

I don't want to duplicate data across different ColumnFamilies, so I want to make sure Cassandra is appropriate for me. In Facebook, Digg, Twitter, they have plenty of searching functions, so maybe I am just not seeing the solution.

Is there a way with Cassandra for me to search for or filter specific data values in a SuperColumn, or its associated Column(s)? If not, is there another NoSQL option?

In the example below, it seems I can only query for phatduckk, friend1, John, etc. But what if I wanted to find anyone in the ColumnFamily who lived in city == "Beverley Hills"? Can it be done without returning all records? If so, could I do a search for city == "Beverley Hills" AND state == "CA"? It doesn't seem like I can do either, but I want to make sure and see what my options are.

AddressBook = { // this is a ColumnFamily of type Super
  phatduckk: {    // this is the key to this row inside the Super CF
    friend1: {street: "8th street", zip: "90210", city: "Beverley Hills", state: "CA"},
    John: {street: "Howard street", zip: "94404", city: "FC", state: "CA"},
    Kim: {street: "X street", zip: "87876", city: "Balls", state: "VA"},
    Tod: {street: "Jerry street", zip: "54556", city: "Cartoon", state: "CO"},
    Bob: {street: "Q Blvd", zip: "24252", city: "Nowhere", state: "MN"},
  }, // end row
  ieure: {     
    joey: {street: "A ave", zip: "55485", city: "Hell", state: "NV"},
    William: {street: "Armpit Dr", zip: "93301", city: "Bakersfield", state: "CA"},
  },

}

4 answers
混吃等死
#2 · 2019-04-09 02:05

Super column families don't support secondary indexes, but regular column families do. With a secondary index in place, you can use the getWhere method.

Here is one example taken from one of my PHP projects:

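// Returns the key of the first unused row whose indexed 'value' column
// matches $_value, or null if there is no such row.
public function GetCodeWithValue( $_value )
{
    // getWhere() filters on indexed column values (secondary index required).
    $result = $this->getDbFamily()->getWhere(array('value' => $_value, 'used' => 0));

    if ( $this->IsValid( $result )) {
        return $result->key();
    }

    return null;
}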

This code uses this Cassandra API: https://github.com/kallaspriit/Cassandra-PHP-Client-Library

叼着烟拽天下
#3 · 2019-04-09 02:13

Note that since the question was asked, Cassandra has added support for secondary indexes that are managed automatically by Cassandra itself (since 0.7, I believe). For some people, that answers the question without the need to manage your own index.

http://www.datastax.com/docs/1.1/dml/using_cli#indexing-a-column

This being said, I also wanted to mention that when an SQL database creates an index, it duplicates a lot of your data to build that index. Maintaining your own index is still really cheap in Cassandra, especially because you can optimize it heavily for your access pattern. The main problem is that you have to keep the index consistent manually, which SQL does for you transparently. But both mechanisms rest on exactly the same theoretical concept.

This is a bit like re-programming your own std::string with specializations that pertain to your application... (think of QString and CString for example!)
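To make the "maintain coherency manually" point concrete, here is a minimal sketch in Python. It is only an in-memory model, not Cassandra client code: the two dictionaries stand in for two column families, and the names `address_book`, `city_index`, `put_address` and `find_by_city` are made up for illustration.

```python
from collections import defaultdict

# In-memory stand-ins for two column families (hypothetical names).
address_book = {}              # row key -> {friend name -> address dict}
city_index = defaultdict(set)  # city -> {(row key, friend name), ...}


def put_address(row_key, friend, address):
    """Write an address and keep the city index in step with it."""
    old = address_book.get(row_key, {}).get(friend)
    if old is not None:
        # Drop the stale index entry first; nothing does this for us.
        city_index[old["city"]].discard((row_key, friend))
    address_book.setdefault(row_key, {})[friend] = address
    city_index[address["city"]].add((row_key, friend))


def find_by_city(city):
    """Look up matching addresses through the index, without scanning every row."""
    return {(row, friend): address_book[row][friend]
            for row, friend in city_index.get(city, set())}


put_address("phatduckk", "friend1",
            {"street": "8th street", "zip": "90210", "city": "Beverley Hills", "state": "CA"})
put_address("ieure", "William",
            {"street": "Armpit Dr", "zip": "93301", "city": "Bakersfield", "state": "CA"})
# friend1 moves: the old "Beverley Hills" index entry has to be removed by hand.
put_address("phatduckk", "friend1",
            {"street": "A ave", "zip": "55485", "city": "Hell", "state": "NV"})
```

The second write for friend1 is where the manual work shows up: if `put_address` skipped the `discard` step, the index would keep pointing at an address that no longer matches.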

神经病院院长
#4 · 2019-04-09 02:19

You cannot perform that kind of operation in Cassandra. There are certain kinds of selection predicates that can be set on column keys, but nothing on the values that they hold. Look at the API and check the get_slice/get_superslice and get_range query types. Again, all of this concerns the keys in the ColumnFamily or SuperColumnFamily, not the values.
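The difference between selecting on keys and selecting on values can be illustrated with a small Python model. This is not the Cassandra API: `get_slice` here is a hypothetical in-memory function that merely mimics a range over sorted column names, and the row holds only a state per friend to keep the sketch short.

```python
from bisect import bisect_left, bisect_right

# One row, modelled as columns whose names are kept in sorted order.
row = {"Bob": "MN", "John": "CA", "Kim": "VA", "Tod": "CO", "friend1": "CA"}
sorted_names = sorted(row)


def get_slice(start, finish):
    """Return columns whose *names* fall in [start, finish]; cheap because names are sorted."""
    lo = bisect_left(sorted_names, start)
    hi = bisect_right(sorted_names, finish)
    return {name: row[name] for name in sorted_names[lo:hi]}


def scan_by_value(value):
    """Filtering on a *value* has no ordering to exploit, so every column is visited."""
    return [name for name in sorted_names if row[name] == value]
```

A name range such as `get_slice("J", "L")` only touches the matching part of the sorted names, while `scan_by_value("CA")` has to look at every column in the row.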

If you want the kind of functionality that you have described, then your best bet is a SQL database. Build proper indexes on your tables, especially on the columns that are queried most, and you will see a big difference in query performance. Hope this helps.

老娘就宠你
#5 · 2019-04-09 02:24

You "don't want to duplicate data across different ColumnFamilies," but that is how you do this kind of query in Cassandra. See http://maxgrinev.com/2010/07/12/do-you-really-need-sql-to-do-it-all-in-cassandra/
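As a rough illustration of what that duplication could look like for the city/state query in the question, here is a Python sketch. It is an in-memory model only; the second "column family" `by_location`, its "state:city" row key, and the "owner/friend" column names are assumptions made for this example, not something taken from the linked article.

```python
# Source data, shaped like the AddressBook super column family in the question.
address_book = {
    "phatduckk": {
        "friend1": {"street": "8th street", "zip": "90210", "city": "Beverley Hills", "state": "CA"},
        "John": {"street": "Howard street", "zip": "94404", "city": "FC", "state": "CA"},
    },
    "ieure": {
        "William": {"street": "Armpit Dr", "zip": "93301", "city": "Bakersfield", "state": "CA"},
    },
}


def build_by_location(book):
    """Copy each address into a second 'column family' keyed by "state:city"."""
    by_location = {}
    for owner, friends in book.items():
        for friend, address in friends.items():
            key = f'{address["state"]}:{address["city"]}'
            # The column name combines owner and friend so entries never collide.
            by_location.setdefault(key, {})[f"{owner}/{friend}"] = dict(address)
    return by_location


by_location = build_by_location(address_book)
# city == "Beverley Hills" AND state == "CA" becomes a single key lookup.
matches = by_location.get("CA:Beverley Hills", {})
```

The trade-off is the one the question wanted to avoid: every address is stored twice, and each write to AddressBook has to be mirrored into the second structure.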
