How can I identify different encodings without the use of a BOM?

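I have a file watcher that grabs content from a growing file encoded in UTF-16LE. The first chunk of data written to it has the BOM available, and I was using that to tell the encoding apart from UTF-8 (which most of my incoming files are encoded in). I catch the BOM and re-encode to UTF-8 so my parser doesn't freak out. The problem is that, since it's a growing file, not every chunk of data has the BOM in it.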
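
To illustrate the situation, here is a minimal sketch. The class name `ChunkDemo`, the helper `startsWithUtf16LeBom`, and the sample strings are made up for illustration; it assumes the writer emits UTF-16LE and that only the first chunk starts with `FF FE`. Java's `UTF-16` charset falls back to big-endian when no BOM is present, so a later chunk only decodes correctly when `UTF-16LE` is named explicitly.

```java
import java.nio.charset.StandardCharsets;

public class ChunkDemo {
    // Returns true if the chunk starts with the UTF-16LE BOM (FF FE).
    public static boolean startsWithUtf16LeBom(byte[] chunk) {
        return chunk.length >= 2
            && chunk[0] == (byte) 0xFF
            && chunk[1] == (byte) 0xFE;
    }

    public static void main(String[] args) {
        // First chunk as assumed here: BOM followed by UTF-16LE text.
        byte[] text = "first".getBytes(StandardCharsets.UTF_16LE);
        byte[] first = new byte[text.length + 2];
        first[0] = (byte) 0xFF;
        first[1] = (byte) 0xFE;
        System.arraycopy(text, 0, first, 2, text.length);

        // Later chunk: same encoding, but no BOM.
        byte[] later = "later".getBytes(StandardCharsets.UTF_16LE);

        System.out.println(startsWithUtf16LeBom(first)); // true
        System.out.println(startsWithUtf16LeBom(later)); // false

        // UTF_16 consumes the BOM and decodes the rest as little-endian.
        System.out.println(new String(first, StandardCharsets.UTF_16)); // first

        // Without a BOM, UTF_16 assumes big-endian and garbles the text;
        // UTF_16LE decodes it as intended.
        System.out.println(new String(later, StandardCharsets.UTF_16).equals("later")); // false
        System.out.println(new String(later, StandardCharsets.UTF_16LE)); // later
    }
}
```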

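Here is my question: without prepending the BOM bytes to each set of data I have (because I don't have control over the source), can I just look for the null bytes (`\000`) that are inherent in UTF-16 and use those as my identifier instead of the BOM? Will this cause me headaches down the road?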
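
A minimal sketch of the null-byte idea, to make the trade-off concrete. The class `Utf16Guess`, the method `looksLikeUtf16Le`, and the `window`/`threshold` parameters are hypothetical names chosen for this example. It relies on the fact that in UTF-16LE the high byte of each code unit sits at an odd offset and is `0x00` for characters below U+0100. Characters such as the euro sign (U+20AC) or the em dash (U+2014) contain no zero byte, so text made up only of such characters slips past the check.

```java
import java.nio.charset.StandardCharsets;

public class Utf16Guess {
    // Hypothetical heuristic: treat data as UTF-16LE when at least `threshold`
    // of the odd-offset bytes within the first `window` bytes are 0x00.
    public static boolean looksLikeUtf16Le(byte[] data, int window, int threshold) {
        int zeros = 0;
        int limit = Math.min(data.length, window);
        for (int i = 1; i < limit; i += 2) {
            if (data[i] == 0x00) {
                zeros++;
            }
        }
        return zeros >= threshold;
    }

    public static void main(String[] args) {
        byte[] ascii16 = "hello".getBytes(StandardCharsets.UTF_16LE);
        byte[] ascii8 = "hello".getBytes(StandardCharsets.UTF_8);
        // Euro signs and em dashes only: no zero bytes in UTF-16LE.
        byte[] symbols16 = "\u20AC\u2014\u20AC\u2014\u20AC".getBytes(StandardCharsets.UTF_16LE);

        System.out.println(looksLikeUtf16Le(ascii16, 10, 2));   // true
        System.out.println(looksLikeUtf16Le(ascii8, 10, 2));    // false
        System.out.println(looksLikeUtf16Le(symbols16, 10, 2)); // false (missed)
    }
}
```

The last case shows why the heuristic is only a guess: it works for mostly Latin text, but a chunk that begins with a run of non-Latin characters would be misclassified.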

My architecture involves a Ruby web application that logs the received data to a temporary file, which my parser, written in Java, then picks up.

Right now my identification/re-encoding code looks like this:

  // Guess the encoding: if the file starts with the UTF-16LE BOM (FF FE),
  // convert it to UTF-8 in place before parsing.
  // Needs: java.io.IOException, java.nio.charset.StandardCharsets,
  //        java.nio.file.Files, java.nio.file.Path, java.nio.file.Paths
  try {
    Path path = Paths.get(args[args.length-1]);
    // readAllBytes reads the whole file; available() plus a single read()
    // is not guaranteed to do that
    byte[] contents = Files.readAllBytes(path);

    if ( contents.length >= 2
        && (contents[0] == (byte)0xFF) && (contents[1] == (byte)0xFE) ) {
      // UTF_16 honors the BOM and drops it from the decoded string
      String asString = new String(contents, StandardCharsets.UTF_16);
      Files.write(path, asString.getBytes(StandardCharsets.UTF_8));
    }
  } catch(IOException e) {
    e.printStackTrace();
  }

UPDATE

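I want to support things like the euro sign, em dashes, and other such characters. I modified the code above to look like this, and it seems to pass all my tests for those characters: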

  // Guess the encoding: if it looks like UTF-16LE, convert to UTF-8 first.
  // Needs: java.io.IOException, java.nio.charset.StandardCharsets,
  //        java.nio.file.Files, java.nio.file.Path, java.nio.file.Paths
  try {
    Path path = Paths.get(args[args.length-1]);
    byte[] contents = Files.readAllBytes(path);

    // if found a BOM then we just need to convert it
    boolean hasBom = contents.length >= 2
        && (contents[0] == (byte)0xFF) && (contents[1] == (byte)0xFE);

    int found = 0;

    // no BOM detected but still could be UTF-16:
    // count the null bytes among the first 10 bytes
    if (!hasBom) {
      for(int cnt=0; cnt < Math.min(10, contents.length); cnt++) {
        if(contents[cnt] == (byte)0x00) { found++; }
      }
    }

    if(hasBom || found >= 2) {
      // With a BOM, UTF_16 consumes it; without one, decode as UTF_16LE
      // directly instead of tacking a BOM onto a copy of the array
      String asString = hasBom
          ? new String(contents, StandardCharsets.UTF_16)
          : new String(contents, StandardCharsets.UTF_16LE);
      Files.write(path, asString.getBytes(StandardCharsets.UTF_8));
    }
  } catch(IOException e) {
    e.printStackTrace();
  }

What do you all think?