I need to compute the number of rows in a hive table, for that
I am using the query:
ANALYZE TABLE p_7 COMPUTE STATISTICS noscan
I want to fetch the results through java, I am trying with the below
code and have no luck. the error I get is :
Exception in thread "main" java.sql.SQLException: The query did not generate a result set!
at org.apache.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:393)
at HiveJdbcClient.main(HiveJdbcClient.java:22)
code I am using is :
import java.sql.SQLException;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.sql.DriverManager;
public class HiveJdbcClient {
private static String driverName = "org.apache.hive.jdbc.HiveDriver";
public static void main(String[] args) throws SQLException {
try {
Class.forName(driverName);
} catch (ClassNotFoundException e) {
e.printStackTrace();
System.exit(1);
}
Connection con = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "hive", "");
System.out.println("connected");
Statement statement = con.createStatement();
String query = "ANALYZE TABLE p_7 COMPUTE STATISTICS noscan";
ResultSet res = statement.executeQuery(query);
}
}
I dont know how to execute a query such as:
ANALYZE TABLE p_7 COMPUTE STATISTICS noscan
through java. Any help on this would be of great help to me. Thanks.
Use the ANALYZE TABLE statement without 'NOSCAN' to compute the number of rows.
Note: This statement does not produce resultSet object.
To fetch the stored stats, use the following statement.
DESCRIBE FORMATTED tableName
In the output, the number of rows is listed in parameters array. Use regex to extract it.
Here is the sample code:
String analyzeQuery = "ANALYZE TABLE p_7 COMPUTE STATISTICS";
String describeQuery = "DESCRIBE FORMATTED p_7";
stmt.execute(analyzeQuery);
StringBuilder sb = new StringBuilder();
try (ResultSet rs = stmt.executeQuery(describeQuery)) {
while (rs.next()) {
int count = rs.getMetaData().getColumnCount();
for (int j = 1; j <= count; j++) {
sb.append(rs.getString(j));
}
}
}
System.out.println("Output: "+ sb.toString());
Refer https://cwiki.apache.org/confluence/display/Hive/StatsDev for details on Table and Partition statistics.
Try the below code for getting number of rows of a table:
public static Connection createConnection(String hive_ip)
{
String hive_url="jdbc:hive2://"+hive_ip;
Connection con=null;
try {
Class.forName("org.apache.hive.jdbc.HiveDriver");
System.out.println(hive_url+"/");
con = DriverManager.getConnection(
hive_url+"/",
hive_username,hive_password);
} catch (ClassNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (SQLException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
return con;
}
public static int getHiveColumnRowCount(String tablename,String db_name)
{
int count=0;
Connection con=createConnection();
try {
Statement st=con.createStatement();
int i=0;
String count_query="show tblproperties "+db_name+"."+tablename;
ResultSet rs=st.executeQuery(count_query);
while(rs.next())
{
i++;
if(i==3)
{
count=Integer.parseInt(rs.getString(2));
}
}
System.out.println("COUNT:"+count);
rs.close();
st.close();
con.close();
} catch (SQLException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
return count;
}
Hope it helps :)