flex+bison output in a glib's hash container

Posted 2019-09-05 23:36

Question:

I have made considerable progress in parsing the bib file, but the next step is beyond my present level of understanding. I have written bison and flex code that parses the bib file (a sample is shown further down) correctly:

%{
#include <stdio.h>
%}

// Symbols.
%union
{
    char    *sval;
};
%token <sval> VALUE
%token <sval> KEY
%token OBRACE
%token EBRACE
%token QUOTE
%token SEMICOLON 

%start Input
%%
Input: 
     /* empty */ 
     | Input Entry ;  /* input is zero or more entries */
Entry: 
     '@' KEY '{' KEY ','{ printf("===========\n%s : %s\n",$2, $4); } 
     KeyVals '}' 
     ;
KeyVals: 
       /* empty */ 
       | KeyVals KeyVal ; /* zero or more keyvals */
KeyVal: 
      KEY '=' VALUE ',' { printf("%s : %s\n",$1, $3); };

%%

int yyerror(char *s) {
  printf("yyerror : %s\n",s);
}

int main(void) {
  yyparse();
}

and

%{
#include <string.h>   /* strdup */
#include "bib.tab.h"
%}

%%
[A-Za-z][A-Za-z0-9]*      { yylval.sval = strdup(yytext); return KEY; }
\"([^\"]|\\.)*\"|\{([^\"]|\\.)*\}     { yylval.sval = strdup(yytext); return VALUE; }
[ \t\n]                   ; /* ignore whitespace */
[{}@=,]                   { return *yytext; }
.                         { fprintf(stderr, "Unrecognized character %c in input\n", *yytext); }
%%

I want to store those values in a container. Over the last few days I read through GLib's extensive documentation and concluded that a hash table is the most suitable container for my case. Below is a basic hash table example; it builds the table correctly once the values are placed in the arrays keys and vals.

#include <glib.h>
#include <glib/gprintf.h>   /* g_printf */
#define slen 1024

int main(gint argc, gchar** argv) 
{
  char *keys[] = {"id", "type", "author", "year",NULL};
  char *vals[] = {"one",  "Book",  "RB", "2013", NULL};
  gint i;
  GHashTable* table = g_hash_table_new(g_str_hash, g_str_equal);
  for (i= 0; i<=3; i++)
  {
    /* insert each pair, then read it back to confirm it was stored */
    g_hash_table_insert(table, keys[i],vals[i]);
    g_printf("%d=>%s:%s\n",i,keys[i],(char *)g_hash_table_lookup(table,keys[i]));
  }
  g_hash_table_destroy(table);
  return 0;
}

The problem is how to integrate these two pieces of code, i.e., how to use the parsed data in the C code. Any help is appreciated.

Edit, in response to @UncleO's comment:

Thanks for your comment. I don't know how to explain it better, but here is a try. The current state of my bison code is:

%{
#include <stdio.h>
#include <glib.h>
%}

// Symbols.
%union
{
    char    *sval;
};
%token <sval> VALUE
%token <sval> KEY
%token OBRACE
%token EBRACE
%token QUOTE
%token SEMICOLON 

%start Input
%%
Input: 
     /* empty */ 
     | Input Entry ;  /* input is zero or more entries */
Entry: 
     '@' KEY '{' KEY ','{ printf("===========\n%s : %s\n",$2, $4); } 
     KeyVals '}' 
     ;
KeyVals: 
       /* empty */ 
       | KeyVals KeyVal ; /* zero or more keyvals */
KeyVal: 
      KEY '=' VALUE ',' { printf("%s : %s\n",$1, $3); };

%%

int yyerror(char *s) {
  printf("yyerror : %s\n",s);
}

int main(void) {
 GHashTable* table = g_hash_table_new(g_str_hash, g_str_equal);
  char *keys[] = {"id", "type", "author", "year",NULL};
  char *vals[] = {"one",  "Book",  "RB", "2013", NULL};
  gint i;
  yyparse();
  GHashTableIter iter;
  g_hash_table_iter_init (&iter, table);
  for (i= 0; i<=3; i++)
  {
    g_hash_table_insert(table, keys[i],vals[i]);
    g_printf("%d=>%s:%s\n",i,keys[i],g_hash_table_lookup(table,keys[i]));
  }
}

with the lex file unchanged. The elements of the arrays keys and vals are for testing purposes only. An example input file is

@Booklet{ab19,
    Author="Rudra Banerjee and A. Mookerjee",
    Editor="sm1",
    Title="sm2",
    Publisher="sm3",
    Volume="sm4",
    Issue="sm5",
    Page="sm6",
    Month="sm8",
    Note="sm9",
    Key="sm10",
    Year="1980",
    Add="osm1",
    Edition="osm2",
}

So the code parses the values correctly. I want to insert the values from the parsed input into the hash table; they will be different for each input. My final goal is therefore to remove the arrays keys and vals from the code, and the line

g_hash_table_insert(table, keys[i],vals[i]);

should be replaced by something like:

g_hash_table_insert(table, <$1 from bison>,<$3 from bison>);

Does this make sense?

Edit:=====================================

@UncleO: Here is the updated code; my intention is probably clearer with this one. I have been trying hard to fix this, but while the print statements in the bison actions print things as expected, that is not the case when printing from the hash table (last line of the code).

%{
#include <stdio.h>
#include <glib.h>
#define slen 1024
GHashTable* table;
%}

// Symbols.
%union
{
    char    *sval;
};
%token <sval> VALUE
%token <sval> KEY
%token OBRACE
%token EBRACE
%token QUOTE
%token SEMICOLON 

%start Input
%%
Input: 
     /* empty */ 
     | Input Entry ;  /* input is zero or more entries */
Entry: 
     '@' KEY '{' KEY ','{ g_hash_table_insert(table, "TYPE", $2);
                  g_hash_table_insert(table, "ID", $4);
              g_printf("%s: %s\n", $2, $4);
              } 
     KeyVals '}' 
     ;
KeyVals: 
       /* empty */ 
       | KeyVals KeyVal ; /* zero or more keyvals */
KeyVal: 
      KEY '=' VALUE ',' { g_hash_table_insert(table, $1, $3);
                          g_printf("%s: %s\n", $1, $3); };

%%

int yyerror(char *s) {
  printf("yyerror : %s\n",s);
}

int main(void) {
  table = g_hash_table_new(g_str_hash, g_str_equal);
gint i;
do{
   g_hash_table_remove_all (table);
   yyparse();
   parse_entry (table);
//  g_printf("%s:%s\n","Author=>",g_hash_table_lookup(table,"Author"));
//  g_printf("%s:%s\n","KEY=>",g_hash_table_lookup(table,"KEY"));
  }
  while(!EOF);
}
void parse_entry (GHashTable *table)
{
  GHashTableIter iter;
  gchar *key, *val;
  char *keys[] = {"id", "type", "author", "year", "title", "publisher", "editor", 
    "volume", "number", "pages", "month", "note", "address", "edition", "journal",
    "series", "book", "chapter", "organization", NULL};
  char *vals[] = {NULL,  NULL,  NULL, NULL, NULL,
    NULL,  NULL,  NULL, NULL, NULL,
    NULL,  NULL,  NULL, NULL, NULL,
    NULL,    NULL,  NULL, NULL, NULL};

  gchar **kiter;
  gint i;
  g_hash_table_iter_init (&iter, table);
  while (g_hash_table_iter_next (&iter, (void **)&key, (void **)&val))
  {
    for (kiter = keys, i = 0; *kiter; kiter++, i++)
    {
      if (!g_ascii_strcasecmp(*kiter, key))
      {
    vals[i] = g_strndup(val,slen);
    break;
      }
    g_printf("%d=>%s:%s\n",i,keys[i],vals[i]);
    }
  }
}

Answer 1:

You haven't been clear on what you want to do with the input, but here is an explanation to get you started.

flex is going to take your file of regular expressions and produce a function called yylex().

bison is going to take your grammar file and produce a function called yyparse() that uses the yylex() function repeatedly to tokenize strings. The main() function will only call yyparse() once, and each time the yyparse() function matches a rule in the grammar, it will execute the code fragments you have specified. Right now, you are merely printing the values, but you can do other things like insert into the hash table or whatever you want.
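To illustrate the point that the code fragments are ordinary C: an action can call any function that is visible at file scope, and that function can fill a container that main() reads after yyparse() returns. Below is a minimal sketch of that pattern using only the standard library. The names store_field, find_field and MAX_FIELDS are hypothetical, and the fixed-size array is only a stand-in for the GHashTable, not a replacement for it.

```c
#include <stdio.h>
#include <string.h>

#define MAX_FIELDS 64   /* arbitrary capacity chosen for this sketch */

/* Hypothetical stand-in for the GHashTable: a file-scope array that both
   the grammar actions and main() can see. */
struct field { const char *key; const char *val; };
static struct field fields[MAX_FIELDS];
static int nfields = 0;

/* Meant to be called from a grammar action, for example
       KEY '=' VALUE ','  { store_field($1, $3); }
   Replaces the value if the key is already present, otherwise appends.
   Returns 0 on success, -1 if the array is full. */
int store_field(const char *key, const char *val)
{
    for (int i = 0; i < nfields; i++) {
        if (strcmp(fields[i].key, key) == 0) {
            fields[i].val = val;
            return 0;
        }
    }
    if (nfields == MAX_FIELDS)
        return -1;
    fields[nfields].key = key;
    fields[nfields].val = val;
    nfields++;
    return 0;
}

/* Lookup for use after yyparse() has returned; NULL if the key is absent. */
const char *find_field(const char *key)
{
    for (int i = 0; i < nfields; i++)
        if (strcmp(fields[i].key, key) == 0)
            return fields[i].val;
    return NULL;
}
```

With GLib the same shape applies: the table pointer lives at file scope, the action calls g_hash_table_insert(), and main() calls g_hash_table_lookup() once parsing is done. Note that, as in the sketch, only the pointers are stored, so the strdup'ed strings from the lexer must stay alive for as long as the table is in use.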

The grammar.y file has sections for code that comes before the definition of yyparse() and code that comes after. It is okay to put the main() function at the end of this file if you want to, but it is usually better to put it in another file and link the two. Usually, the main() function does things like open the input for reading, etc., then calls yyparse() to perform the bulk of the work. After yyparse() returns, main() can clean up.

EDIT: Hi Rudra,

I see you want to keep main() in the grammar file. That's okay.

All you need to do now is change the printf statements in the snippets to insert into the table. The table variable will have to be declared outside of main() for yyparse() to see it.

%{
#include <stdio.h>
#include <glib.h>

GHashTable* table;

%}

// Symbols.
%union
{
    char    *sval;
};
%token <sval> VALUE
%token <sval> KEY
%token OBRACE
%token EBRACE
%token QUOTE
%token SEMICOLON 

%start Input
%%
Input: 
     /* empty */ 
     | Input Entry ;  /* input is zero or more entries */
Entry: 
     '@' KEY '{' KEY ','{ printf("===========\n%s : %s\n",$2, $4); } 
     KeyVals '}' 
     ;
KeyVals: 
       /* empty */ 
       | KeyVals KeyVal ; /* zero or more keyvals */
KeyVal: 
      KEY '=' VALUE ',' { g_hash_table_insert(table, $1, $3); printf("%s : %s\n",$1, $3); };

%%

int yyerror(char *s) {
  printf("yyerror : %s\n",s);
}

int main(void) {
  table = g_hash_table_new(g_str_hash, g_str_equal);

  yyparse();
}
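One detail about the values stored by the KeyVal action: the lexer copies yytext verbatim, so each VALUE still carries its surrounding quotes or braces (for example "sm2" including the quote characters). If you would rather store the bare text, a small helper along these lines could be used. This is only a sketch; strip_delims is a hypothetical name, and it does not interpret backslash escapes inside the value.

```c
#include <stdlib.h>
#include <string.h>

/* Return a newly malloc'ed copy of tok without its surrounding "..." or
   {...}. If tok is not delimited that way, return a plain copy.
   The caller frees the result. Returns NULL if allocation fails. */
char *strip_delims(const char *tok)
{
    size_t len = strlen(tok);
    int delimited = len >= 2 &&
                    ((tok[0] == '"' && tok[len - 1] == '"') ||
                     (tok[0] == '{' && tok[len - 1] == '}'));
    size_t start = delimited ? 1 : 0;        /* skip the opening delimiter */
    size_t n = delimited ? len - 2 : len;    /* drop both delimiters */

    char *out = malloc(n + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, tok + start, n);
    out[n] = '\0';
    return out;
}
```

In the flex rule for VALUE this would take the place of the strdup() call, i.e. yylval.sval = strip_delims(yytext); the grammar and the hash table code stay as they are.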

Are you sure you don't want to do anything with the first terms in the data? It seems like you are not using them for anything.