Pentaho Frame size (17727647) larger than max leng

In Pentaho, when I run a Cassandra Input step that returns around 50,000 rows, I get this exception:

Is there a way to control the query result size in Pentaho, or to stream the query result instead of fetching it all in bulk?

2014/10/09 15:14:09 - Cassandra Input.0 - ERROR (version 5.1.0.0, build 1 from 2014-06-19_19-02-57 by buildguy) : Unexpected error
2014/10/09 15:14:09 - Cassandra Input.0 - ERROR (version 5.1.0.0, build 1 from 2014-06-19_19-02-57 by buildguy) : org.pentaho.di.core.exception.KettleException: 
2014/10/09 15:14:09 - Cassandra Input.0 - Frame size (17727647) larger than max length (16384000)!
2014/10/09 15:14:09 - Cassandra Input.0 - Frame size (17727647) larger than max length (16384000)!
2014/10/09 15:14:09 - Cassandra Input.0 - 
2014/10/09 15:14:09 - Cassandra Input.0 -   at org.pentaho.di.trans.steps.cassandrainput.CassandraInput.initQuery(CassandraInput.java:355)
2014/10/09 15:14:09 - Cassandra Input.0 -   at org.pentaho.di.trans.steps.cassandrainput.CassandraInput.processRow(CassandraInput.java:234)
2014/10/09 15:14:09 - Cassandra Input.0 -   at org.pentaho.di.trans.step.RunThread.run(RunThread.java:62)
2014/10/09 15:14:09 - Cassandra Input.0 -   at java.lang.Thread.run(Unknown Source)
2014/10/09 15:14:09 - Cassandra Input.0 - Caused by: org.apache.thrift.transport.TTransportException: Frame size (17727647) larger than max length (16384000)!
2014/10/09 15:14:09 - Cassandra Input.0 -   at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:137)
2014/10/09 15:14:09 - Cassandra Input.0 -   at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
2014/10/09 15:14:09 - Cassandra Input.0 -   at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
2014/10/09 15:14:09 - Cassandra Input.0 -   at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:362)
2014/10/09 15:14:09 - Cassandra Input.0 -   at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:284)
2014/10/09 15:14:09 - Cassandra Input.0 -   at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:191)
2014/10/09 15:14:09 - Cassandra Input.0 -   at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
2014/10/09 15:14:09 - Cassandra Input.0 -   at org.apache.cassandra.thrift.Cassandra$Client.recv_execute_cql_query(Cassandra.java:1656)
2014/10/09 15:14:09 - Cassandra Input.0 -   at org.apache.cassandra.thrift.Cassandra$Client.execute_cql_query(Cassandra.java:1642)
2014/10/09 15:14:09 - Cassandra Input.0 -   at org.pentaho.cassandra.legacy.LegacyCQLRowHandler.newRowQuery(LegacyCQLRowHandler.java:289)
2014/10/09 15:14:09 - Cassandra Input.0 -   at org.pentaho.di.trans.steps.cassandrainput.CassandraInput.initQuery(CassandraInput.java:333)
2014/10/09 15:14:09 - Cassandra Input.0 -   ... 3 more
2014/10/09 15:14:09 - Cassandra Input.0 - Finished processing (I=0, O=0, R=0, W=0, U=0, E=1)
2014/10/09 15:14:09 - all customer data - Transformation detected one or more steps with errors.
2014/10/09 15:14:09 - all customer data - Transformation is killing the other steps!

Tags: database cassandra bigdata pentaho kettle

4 Answers

一纸荒年 Trace。

#2 · 2019-08-02 05:04

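You can try the following on the server side, if the Thrift server is one whose code you control. It raises the maximum frame length of the framed transport (UcsInterfaceThrift and ucsInterfaceHandler are this answer's own service classes):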

TNonblockingServerSocket tnbSocketTransport = new TNonblockingServerSocket(listenPort);
TNonblockingServer.Args tnbArgs = new TNonblockingServer.Args(tnbSocketTransport);

// maxLength is set to 1 GB, while the default is about 16 MB (16384000 bytes)
tnbArgs.transportFactory(new TFramedTransport.Factory(1024 * 1024 * 1024));

tnbArgs.protocolFactory(new TCompactProtocol.Factory());
TProcessor processor = new UcsInterfaceThrift.Processor<UcsInterfaceHandler>(ucsInterfaceHandler);
tnbArgs.processor(processor);
TServer server = new TNonblockingServer(tnbArgs);
server.serve();


#3 · 2019-08-02 05:09

org.apache.thrift.transport.TTransportException: 
  Frame size (17727647) larger than max length (16384000)!

A limit is enforced on how large frames (Thrift messages) can be, to avoid performance degradation. You can raise it by modifying some settings. The important thing to note is that you need to change the settings on both the client side and the server side.
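To make the limit concrete, here is a minimal sketch of how a length-prefixed (framed) reader can reject an oversized frame before buffering it. It is a simplified stand-in written against the Java standard library only, not Thrift's actual code; the class name FramedReaderSketch and its methods are hypothetical, and the wire layout (a 4-byte big-endian length followed by the payload) is an assumption made for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;

/** Simplified stand-in for a framed reader; hypothetical, not Thrift's code. */
final class FramedReaderSketch {
    /** Same value as the "max length" reported in the error message. */
    static final int DEFAULT_MAX_LENGTH = 16384000;

    /** Reads one frame: a 4-byte big-endian length, then that many payload bytes. */
    static byte[] readFrame(DataInputStream in, int maxLength) throws IOException {
        int size = in.readInt();
        if (size < 0) {
            throw new IOException("Read a negative frame size (" + size + ")!");
        }
        if (size > maxLength) {
            // The frame is refused before any of its payload is buffered.
            throw new IOException(
                "Frame size (" + size + ") larger than max length (" + maxLength + ")!");
        }
        byte[] payload = new byte[size];
        in.readFully(payload);
        return payload;
    }

    /** Demo helper: prefixes a payload with its length. */
    static byte[] frame(byte[] payload) {
        return ByteBuffer.allocate(4 + payload.length)
                .putInt(payload.length)
                .put(payload)
                .array();
    }

    public static void main(String[] args) throws IOException {
        byte[] framed = frame(new byte[17727647]); // size taken from the error
        try {
            readFrame(new DataInputStream(new ByteArrayInputStream(framed)),
                      DEFAULT_MAX_LENGTH);
        } catch (IOException e) {
            System.out.println(e.getMessage()); // rejected under the default limit
        }
        // With an 18 MB limit the same frame is accepted.
        byte[] ok = readFrame(new DataInputStream(new ByteArrayInputStream(framed)),
                              18 * 1024 * 1024);
        System.out.println(ok.length);
    }
}
```

The point of the sketch is that the reader checks the declared size on its own side, which is why raising the limit on the server alone does not help a client that still uses the smaller default.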

Server side in cassandra.yaml

# Frame size for thrift (maximum field length).
# default is 15 MB; you'll have to increase this to at least 18.
thrift_framed_transport_size_in_mb: 18 

# The max length of a thrift message, including all fields and
# internal thrift overhead.
# default is 16, try to keep it to thrift_framed_transport_size_in_mb + 1
thrift_max_message_length_in_mb: 19
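As a rough check on the value 18, the arithmetic can be sketched as follows. This assumes the setting counts one MB as 1024 * 1024 bytes, which is an assumption of this example; the class ThriftFrameSizing and its method are hypothetical helpers, not part of Cassandra or Pentaho.

```java
/** Hypothetical helper for sizing the frame limit; not a Cassandra or Pentaho API. */
final class ThriftFrameSizing {
    /** Assumption: the *_in_mb settings use 1 MB = 1024 * 1024 bytes. */
    static final long BYTES_PER_MB = 1024L * 1024L;

    /** Smallest whole-MB limit that can hold a frame of the given size (ceiling division). */
    static long minLimitInMb(long frameBytes) {
        return (frameBytes + BYTES_PER_MB - 1) / BYTES_PER_MB;
    }

    public static void main(String[] args) {
        long frameBytes = 17727647L; // frame size reported in the error
        System.out.println(minLimitInMb(frameBytes));        // prints 17
        System.out.println(18 * BYTES_PER_MB >= frameBytes); // prints true
    }
}
```

Under that assumption 17 MB is the bare minimum for this particular frame, so 18 leaves a little headroom. A larger result set would need a correspondingly larger value.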

How you set the client-side limit depends on which driver you're using.


等我变得足够好

#4 · 2019-08-02 05:18

Well, it did work for me.

Cassandra Version: [cqlsh 5.0.1 | Cassandra 2.2.1 | CQL spec 3.3.0 | Native protocol v4]

Pentaho PDI Version: pdi-ce-5.4.0.1-130

Changed Settings in cassandra.yaml:

# Whether to start the thrift rpc server.
start_rpc: true

# Frame size for thrift (maximum message length).
thrift_framed_transport_size_in_mb: 35

Cassandra Output Step Settings Changed to:

Port: 9160
"Use CQL Version 3": checked


放荡不羁爱自由

#5 · 2019-08-02 05:23

I resolved this problem by using PDI 5.2, which has a property in the Cassandra Input step called max_length. Setting this property to a higher value, such as 1 GB, solves the problem.
