I created a database in mysql by setting collation of database and charset of each table as utf16
The reason I going for utf16 is , I wanted to store all kind of math equations like ΦB =B.A=BAcosθ β ( in fact in ΦB, B is superscript )
not only the above formula but database might have any kind of greek letters as shown link
I tried to insert data by reading microsoft excel file before executing query everything seems correct in java code but after inserting in database it shows as ?B =B.A=BAcos? ?
As per utf16 supported charset Link above equation should look fine for me but it is not.
Any idea ?
Database Table schema
mysql> SHOW CREATE TABLE formulae;
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| formulae | CREATE TABLE `formulae` (
`formulaeid` int(11) NOT NULL AUTO_INCREMENT,
`formulae` text CHARACTER SET utf16,
`concept_conceptid` int(11) NOT NULL,
PRIMARY KEY (`formulaeid`),
KEY `fk_formulae_concept_idx` (`concept_conceptid`),
CONSTRAINT `fk_formulae_concept` FOREIGN KEY (`concept_conceptid`) REFERENCES `concept` (`conceptid`) ON DELETE CASCADE ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=15 DEFAULT CHARSET=utf16 COLLATE=utf16_unicode_ci |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
Code :
final static int CONCEPT_SHEET_NUMBER = 0;
final static int FORMULAE_SHEET_NUMBER = 1;
//Reads xl file
public void readProductArgExcel(File inputFile) throws IOException {
FileInputStream file = new FileInputStream(inputFile);
XSSFWorkbook workBook = new XSSFWorkbook(file);
boolean isError = false;
for (int sheetNumber = 0; sheetNumber < workBook.getNumberOfSheets(); sheetNumber++) {
XSSFSheet sheet = workBook.getSheetAt(sheetNumber);
for (int i = 1; i <= sheet.getLastRowNum(); i++) {
isError = false;
Row row = sheet.getRow(i);
List<Object> columnValues = new ArrayList<Object>();
for (int j = 0; j < row.getPhysicalNumberOfCells(); j++) {
Cell cell = row.getCell(j);
if (!(cell == null)) {
if (cell.getCellType() == 0) {
Integer conceptId = (int) cell
.getNumericCellValue();
columnValues.add(conceptId);
}
if (cell.getCellType() == 1) {
String rowString = cell
.getStringCellValue();
columnValues.add(rowString);
}
switch (sheetNumber) {
case CONCEPT_SHEET_NUMBER:
break;
//XL file has two sheets, second sheet has few formulae
case FORMULAE_SHEET_NUMBER:{
if(j == 1){
if(columnValues.size() >= 2)
//this xl sheet has column 1 - formula, column2 - integer
insertFormulae((String)columnValues.get(0),(Integer)columnValues.get(1));
}
break;
}
default:
break;
}
}
}
}
System.out.println(sheet.getSheetName());
}
}
public static Connection getConnection() throws InstantiationException,
IllegalAccessException, ClassNotFoundException, SQLException {
Connection connection = null;
/*Tried with but didn't work
jdbc:mysql://localhost:3306/formulaeDB?useUnicode=true&characterEncoding=utf16
but it perfectly works with
jdbc:mysql://localhost:3306/formulaeDB?useUnicode=true&characterEncoding=utf8
*/
String connectionURL = "jdbc:mysql://localhost:3306/formulaeDB";
Class.forName("com.mysql.jdbc.Driver").newInstance();
connection = DriverManager.getConnection(connectionURL, "root","root");
/*The below code throws an exception
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Variable 'character_set_client' can't be set to the value of 'utf16'
*/
String queryString = "SET NAMES utf16";
PreparedStatement preparedStatement = connection.prepareStatement(
queryString, PreparedStatement.RETURN_GENERATED_KEYS);
preparedStatement.executeUpdate();
return connection;
}
//To insert into table
public static void insertFormulae(String formula,Integer conceptId){
String queryString = "INSERT INTO formulae(formulae,concept_conceptid) VALUES(?,?)";
Connection connection;
try {
connection = getConnection();
PreparedStatement preparedStatement = connection.prepareStatement(
queryString, PreparedStatement.RETURN_GENERATED_KEYS);
preparedStatement.setString(1, formula);
preparedStatement.setInt(2, conceptId);
preparedStatement.executeUpdate();
} catch (InstantiationException | IllegalAccessException
| ClassNotFoundException | SQLException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
public static void main(String st[]) throws IOException {
ExcelParser r = new ExcelParser();
r.readProductArgExcel(new File(
“c://data/Formulae.xlsx"));
}
I have heard of no product that exposes utf16. Utf8 is the usual format for interchange. MySQL's utf8mb4 is even better, because it allows for 4-byte utf8 characters (that its utf8 does not).
Change all settings to utf8 to see if they interop better.
If you already have datafiles or databases with utf16 characters in them, then the problem is dicier.
You need to make sure that the data you read from excel is correct, that your table has been created properly, that your connection has been setup properly, and that your code which reads data from excel and writes them to the database is working properly.
If there seems to be no problem in all that, then the next thing to suspect is that perhaps the data in the database is just fine, and your means of observing the data is wrong.
Specifically, if you use mysql console commands, then you need to be sure that the console is capable of properly displaying unicode fields fetched from your database.
Graphical database management tools usually do not suffer from such drawbacks, so why not use the "MySQL Workbench" tool that ships together with MySQL?
This should take care of your unicode issues.
You will still have another problem: formatting.
Superscripts and subscripts are something that unicode has very little support for, (you can have a few superscript numbers and that's it,) so excel uses special formatting within the cell text to represent superscript and subscript.
Unfortunately, excel does not give this formatting to you when you read cells the way you do. It seems to be a hard problem to solve, but there appears to be some sort of solution here: Reading rich text from an Excel cell