Convert unicoded string to corresponding string in

Posted 2019-08-16 22:03

Question:

I need to convert an escaped Unicode string to the text it represents. I read a text file line by line, and a line may contain a UTF-8 byte sequence written as hex escapes, like this:

\xE6\xAC\xA2\xE8\xBF\x8E

These are the UTF-8 bytes of the Chinese text

欢迎

Now I need to remove this line (\xE6\xAC\xA2\xE8\xBF\x8E) from the text file, convert it to the Chinese text, and append that text to the end of the file.

Below is the content of my data.txt file:

testing
programming
\xE6\xAC\xA2\xE8\xBF\x8E
development

I would like to get the file content as:

testing
programming
development
欢迎

Below is what I have done so far:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>


#define MAX 256

  int main() 
  {
        int ctr = 0;
        int ch;   // int rather than char, so EOF can be told apart from a valid byte
        FILE *fptr1, *fptr2;
        char fname[MAX] = "data.txt";
        char str[MAX], temp[] = "temp.txt";
        char str2[256];

        fptr1 = fopen(fname, "r");
        if (!fptr1) 
        {
                printf(" File not found or unable to open the input file!!\n");
                return 0;
        }
        fptr2 = fopen(temp, "w"); // open the temporary file in write mode 
        if (!fptr2) 
        {
                printf("Unable to open a temporary file to write!!\n");
                fclose(fptr1);
                return 0;
        }

        // copy all contents to the temporary file except the specific line with unicode characters
        while (!feof(fptr1)) 
        {
            strcpy(str, "\0");
            fgets(str, MAX, fptr1);
            if (!feof(fptr1)) 
            {
                ctr++;
                if(strstr(str,"\\")!=NULL)
                {
                    memset(str2,'\0',sizeof(str2));
                    printf("Input String Contains Unicode Character\n");                    
                    str[strlen(str)-1]='\0';

                    sprintf(str2,"echo %s >> data.txt",str);
                    printf("Final String: %s\nUnicode String Size: %ld\n",str2,strlen(str));
                    system(str2);
                }
                else
                {

                    fprintf(fptr2, "%s", str);                  
                }
            }
        }
        fclose(fptr1);
        fclose(fptr2);
        remove(fname);          // remove the original file 
        rename(temp, fname);    // rename the temporary file to original name
        /*------ Read the file ----------------*/
        fptr1 = fopen(fname, "r");
        ch = fgetc(fptr1);
        printf(" Now the content of the file %s is : \n", fname);
        while (ch != EOF)
        {
            printf("%c", ch);
            ch = fgetc(fptr1);
        }
        fclose(fptr1);
        /*------- End of reading ---------------*/
        return 0;

  } 

When I compile and run this code, I see the following output:

Input String Contains Unicode Character
Final String: echo \xE6\xAC\xA2\xE8\xBF\x8E >> data.txt
Unicode String Size: 24
 Now the content of the file data.txt is : 
testing
programming
development
xE6xACxA2xE8xBFx8E

When I replace the line

 sprintf(str2,"echo %s >> data.txt",str); 

with a hard-coded string literal,

 sprintf(str2,"echo %s >> data.txt","\xE6\xAC\xA2\xE8\xBF\x8E");

the same code works as expected. It only fails when the value is read from the file.

Also, in that case the line below identifies the string as a Unicode string and reports the correct size of 6:

printf("Final String: %s\nUnicode String Size: %ld\n",str2,strlen(str));
The String Size: 6

Can someone please tell me how to convert the value to Chinese text when it is read from a text file?

Answer 1:

You'd have to find each \x in your line and let a pointer p point at the character that follows it. Then

char hex[3] = { p[0], p[1], 0 }; 
unsigned char val = (unsigned char)strtoul(hex, 0, 16);
p += 2;

stores in val the byte value of the two hex digits that follow \x and advances p past them.
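
The steps above can be put together into one small function. The following is only a sketch: the name decode_hex_escapes and its calling convention are made up for illustration, and it assumes that every escape has exactly the form \xHH with two hex digits. Anything else, including a malformed escape, is copied through unchanged.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical helper: decode every "\xHH" escape in src into the raw byte
 * it names and copy all other characters unchanged. dst must have room for
 * strlen(src) + 1 bytes, since the output is never longer than the input.
 * Returns the number of bytes written, not counting the terminating '\0'. */
size_t decode_hex_escapes(const char *src, char *dst)
{
    size_t n = 0;
    while (*src != '\0') {
        if (src[0] == '\\' && src[1] == 'x' &&
            isxdigit((unsigned char)src[2]) &&
            isxdigit((unsigned char)src[3])) {
            char hex[3] = { src[2], src[3], '\0' };
            dst[n++] = (char)strtoul(hex, NULL, 16);
            src += 4;   /* skip the backslash, the 'x' and both hex digits */
        } else {
            dst[n++] = *src++;
        }
    }
    dst[n] = '\0';
    return n;
}
```

Applied to the 24-character line from data.txt, this yields the 6 bytes E6 AC A2 E8 BF 8E, which a UTF-8 terminal or editor displays as 欢迎. Note that an escape such as \x00 would put a '\0' in the middle of the output, so the returned length is more reliable than strlen for such input.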



Answer 2:

I was able to get the conversion done. Below is my final code:

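                if(strstr(str,"\\")!=NULL)
                {
                    memset(str2,'\0',sizeof(str2));
                    printf("Input String Contains Unicode Character\n");                    
                    str[strlen(str)-1]='\0';   // strip the trailing newline

                    // The unquoted echo lets the shell drop the backslashes, sed removes
                    // the remaining 'x' characters, and xxd -r -p turns the plain hex
                    // digits back into raw bytes.
                    sprintf(str2,"echo %s | sed 's/[\\\\x]//g' | xxd -r -p >> data.txt",str);
                    printf("Final String: %s\nUnicode String Size: %zu\n",str2,strlen(str));
                    system(str2);
                }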

Thanks for all your responses, and thanks @chux for the pointer.
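
For comparison, the same result can probably be reached without system(), sed or xxd by decoding the escapes in C while the file is rewritten. The following is a rough sketch, not a drop-in replacement: the names write_decoded, move_escaped_lines_to_end and LINE_MAX_LEN are invented here, and it assumes that no line is longer than 255 characters and that every line ends with a newline.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define LINE_MAX_LEN 256   /* assumed upper bound on the length of a line */

/* Write line to out, turning each "\xHH" escape into the byte it names. */
static void write_decoded(FILE *out, const char *line)
{
    while (*line != '\0') {
        if (line[0] == '\\' && line[1] == 'x' &&
            isxdigit((unsigned char)line[2]) &&
            isxdigit((unsigned char)line[3])) {
            char hex[3] = { line[2], line[3], '\0' };
            fputc((int)strtoul(hex, NULL, 16), out);
            line += 4;
        } else {
            fputc(*line++, out);
        }
    }
}

/* Copy path to tmp_path with the escaped lines decoded and moved to the end,
 * then replace path with the result. Returns 0 on success, -1 on failure. */
int move_escaped_lines_to_end(const char *path, const char *tmp_path)
{
    char line[LINE_MAX_LEN];
    FILE *in = fopen(path, "r");
    if (in == NULL)
        return -1;
    FILE *out = fopen(tmp_path, "w");
    if (out == NULL) {
        fclose(in);
        return -1;
    }

    /* Pass 1: lines without an escape keep their original order. */
    while (fgets(line, sizeof line, in) != NULL)
        if (strstr(line, "\\x") == NULL)
            fputs(line, out);

    /* Pass 2: lines with an escape are decoded and appended. */
    rewind(in);
    while (fgets(line, sizeof line, in) != NULL)
        if (strstr(line, "\\x") != NULL)
            write_decoded(out, line);

    fclose(in);
    if (fclose(out) != 0)
        return -1;
    remove(path);   /* rename() does not replace an existing file on every platform */
    return rename(tmp_path, path) == 0 ? 0 : -1;
}
```

Calling move_escaped_lines_to_end("data.txt", "temp.txt") on the sample file should leave testing, programming and development in place and put 欢迎 on the last line. Reading the input twice keeps the sketch free of dynamic allocation; collecting the escaped lines in memory during a single pass would work as well.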