open() function in Linux with extended characters

Posted 2019-08-23 03:44

Question:

When I try to create a file in Linux using the open() function, it returns -1 for a filename that contains an extended character (e.g. Björk.txt). Here the filename contains the special character ö (148 in extended ASCII).

I am using the below code:

char* szUnixPath = "/home/user188/Output/Björk.txt";

open(szUnixPath, locStyle, S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH);

open() always returns -1, and no file is created.

It seems the OS throws an error as soon as it encounters character 148.

The same call works perfectly fine if I use a tilde ~ (ASCII 126, e.g. Bj~rk.txt) or any other character below 128.

Can somebody explain why I get the -1 error only for filenames containing special characters in the range 128-255?

Answer 1:

I recommend checking for yourself which bytes this name actually contains.

Create the file in a directory, then run the following simple C program from that directory:

#include <dirent.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    /* Open the current directory */
    DIR * currdir = opendir(".");
    if (currdir == NULL)
    {
        perror("opendir");
        return EXIT_FAILURE;
    }

    /* Iterate over entries and dump each name byte by byte */
    struct dirent * directory_entry = NULL;
    while (NULL != (directory_entry = readdir(currdir)))
    {
        const char * entry_name = directory_entry->d_name;
        size_t len = strlen(entry_name);
        printf("Directory entry: %s\n", entry_name);
        printf("Name bytes (len: %zu):\n", len);
        for (size_t i = 0; i < len; ++i)
        {
            /* Where char is signed (e.g. x86), bytes >= 128 print as negative values */
            printf("\tname[%zu] = %d\n", i, entry_name[i]);
        }
    }

    closedir(currdir);
    return 0;
}

The output shows that 'Björk' is 6 bytes long, not 5: ö is stored as the two bytes -61 and -74 (0xC3 0xB6 viewed as signed char), which is its UTF-8 encoding. These are the byte values:

Directory entry: Björk
Name bytes (len: 6):
    name[0] = 66
    name[1] = 106
    name[2] = -61
    name[3] = -74
    name[4] = 114
    name[5] = 107


Answer 2:

Filenames in Linux are byte strings that, by convention, are encoded in UTF-8, not CP437. The open() call fails because the filename bytes you pass don't match the name as the OS stores it.

Try opening this path instead: /home/user188/Output/Bj\xc3\xb6rk.txt. Here \xc3\xb6 is the special character ö encoded in UTF-8 as two bytes.



Tags: c++ c linux