I'm designing a linguistic analyzer for French text. I have a dictionary in XML format that looks like this:
<?xml version="1.0" encoding="utf-8"?>
<Dictionary>
<!--This is the base structure for every entry in the dictionary. Values on attributes are given
as explanations for the attributes. Though this is the structure of the finished product for each word, definition, context and context examples will be omitted as they don't have a real effect on the application at this moment. Defini-->
<Word word="The word in the dictionary (any word that would be defined)." aspirate="Whether or not the word starts with an aspirate h. Some adjectives that come before words that start with a non-aspirate h have an extra form (AdjectiveForms -> na [non-aspirate]).">
<GrammaticalForm form="The grammatical form of the word is the grammatical context in which it is used. Forms may consist of a word in noun, adjective, adverb, exclamatory or other form. Each form (generally) has its own definition, as the meaning of the word changes in the way it is used.">
<Definition definition=""></Definition>
</GrammaticalForm>
<ConjugationTables>
<NounForms ms="The masculine singular form of the noun." fs="The feminine singular form of the noun." mpl="The masculine plural form of the noun." fpl="The feminine plural form of the noun." gender="The gender of the noun. Determines"></NounForms>
<AdjectiveForms ms="The masculine singular form of the adjective." fs="The feminine singular form of the adjective." mpl="The masculine plural form of the adjective." fpl="The feminine plural form of the adjective." na="The non-aspirate form of the adjective, in the case where the adjective is followed by a non-aspirate word." location="Where the adjective is placed around the noun (before, after, or both)."></AdjectiveForms>
<VerbForms group="What group the verb belongs to (1st, 2nd, 3rd or exception)." auxillary="The auxiliary verb taken by the verb." prepositions="A CSV list of valid prepositions this verb uses; for grammatical analysis." transitive="Whether or not the verb is transitive." pronominal="The pronominal infinitive form of the verb, if the verb allows pronominal construction.">
<Indicative>
<Present fps="(Je) first person singular." sps="(Tu) second person singular." tps="(Il) third person singular." fpp="(Nous) first person plural." spp="(Vous) second person plural." tpp="(Ils) third person plural."></Present>
<SimplePast fps="(Je) first person singular." sps="(Tu) second person singular." tps="(Il) third person singular." fpp="(Nous) first person plural." spp="(Vous) second person plural." tpp="(Ils) third person plural."></SimplePast>
<PresentPerfect fps="(Je) first person singular." sps="(Tu) second person singular." tps="(Il) third person singular." fpp="(Nous) first person plural." spp="(Vous) second person plural." tpp="(Ils) third person plural."></PresentPerfect>
<PastPerfect fps="(Je) first person singular." sps="(Tu) second person singular." tps="(Il) third person singular." fpp="(Nous) first person plural." spp="(Vous) second person plural." tpp="(Ils) third person plural."></PastPerfect>
<Imperfect fps="(Je) first person singular." sps="(Tu) second person singular." tps="(Il) third person singular." fpp="(Nous) first person plural." spp="(Vous) second person plural." tpp="(Ils) third person plural."></Imperfect>
<Pluperfect fps="(Je) first person singular." sps="(Tu) second person singular." tps="(Il) third person singular." fpp="(Nous) first person plural." spp="(Vous) second person plural." tpp="(Ils) third person plural."></Pluperfect>
<Future fps="(Je) first person singular." sps="(Tu) second person singular." tps="(Il) third person singular." fpp="(Nous) first person plural." spp="(Vous) second person plural." tpp="(Ils) third person plural."></Future>
<PastFuture fps="(Je) first person singular." sps="(Tu) second person singular." tps="(Il) third person singular." fpp="(Nous) first person plural." spp="(Vous) second person plural." tpp="(Ils) third person plural."></PastFuture>
</Indicative>
<Subjunctive>
<Present fps="(Je) first person singular." sps="(Tu) second person singular." tps="(Il) third person singular." fpp="(Nous) first person plural." spp="(Vous) second person plural." tpp="(Ils) third person plural."></Present>
<Past fps="(Je) first person singular." sps="(Tu) second person singular." tps="(Il) third person singular." fpp="(Nous) first person plural." spp="(Vous) second person plural." tpp="(Ils) third person plural."></Past>
<Imperfect fps="(Je) first person singular." sps="(Tu) second person singular." tps="(Il) third person singular." fpp="(Nous) first person plural." spp="(Vous) second person plural." tpp="(Ils) third person plural."></Imperfect>
<Pluperfect fps="(Je) first person singular." sps="(Tu) second person singular." tps="(Il) third person singular." fpp="(Nous) first person plural." spp="(Vous) second person plural." tpp="(Ils) third person plural."></Pluperfect>
</Subjunctive>
<Conditional>
<Present fps="(Je) first person singular." sps="(Tu) second person singular." tps="(Il) third person singular." fpp="(Nous) first person plural." spp="(Vous) second person plural." tpp="(Ils) third person plural."></Present>
<FirstPast fps="(Je) first person singular." sps="(Tu) second person singular." tps="(Il) third person singular." fpp="(Nous) first person plural." spp="(Vous) second person plural." tpp="(Ils) third person plural."></FirstPast>
<SecondPast fps="(Je) first person singular." sps="(Tu) second person singular." tps="(Il) third person singular." fpp="(Nous) first person plural." spp="(Vous) second person plural." tpp="(Ils) third person plural."></SecondPast>
</Conditional>
<Imperative>
<Present sps="(Tu) second person singular." fpp="(Nous) first person plural." spp="(Vous) second person plural."></Present>
<Past sps="(Tu) second person singular." fpp="(Nous) first person plural." spp="(Vous) second person plural."></Past>
</Imperative>
<Infinitive present="The present infinitive form of the verb." past="The past infinitive form of the verb."></Infinitive>
<Participle present="The present participle of the verb." past="The past participle of the verb."></Participle>
</VerbForms>
</ConjugationTables>
</Word>
</Dictionary>
Sorry it's so long, but it's necessary to show exactly how the data is modeled (tree-node structure).
Currently I am using structs to model the conjugation tables (nested structs, to be more specific). Here is the class I created to model a single entry in the XML file:
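using System.Collections.Generic; // needed for List<T>

class Word
{
public string word { get; set; }
public bool aspirate { get; set; }
public List<GrammaticalForms> forms { get; set; }
// Must be public, because it is exposed through the public 'forms' property above.
public struct GrammaticalForms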
{
public string form { get; set; }
public string definition { get; set; }
}
struct NounForms
{
public string gender { get; set; }
public string masculinSingular { get; set; }
public string femininSingular { get; set; }
public string masculinPlural { get; set; }
public string femininPlural { get; set; }
}
struct AdjectiveForms
{
public string masculinSingular { get; set; }
public string femininSingular { get; set; }
public string masculinPlural { get; set; }
public string femininPlural { get; set; }
public string nonAspirate { get; set; }
public string location { get; set; }
}
struct VerbForms
{
public string group { get; set; }
public string auxillary { get; set; }
public string[] prepositions { get; set; }
public bool transitive { get; set; }
public string pronominalForm { get; set; }
struct IndicativePresent
{
public string firstPersonSingular { get; set; }
public string secondPersonSingular { get; set; }
public string thirdPersonSingular { get; set; }
public string firstPersonPlural { get; set; }
public string secondPersonPlural { get; set; }
public string thirdPersonPlural { get; set; }
}
struct IndicativeSimplePast
{
public string firstPersonSingular { get; set; }
public string secondPersonSingular { get; set; }
public string thirdPersonSingular { get; set; }
public string firstPersonPlural { get; set; }
public string secondPersonPlural { get; set; }
public string thirdPersonPlural { get; set; }
}
struct IndicativePresentPerfect
{
public string firstPersonSingular { get; set; }
public string secondPersonSingular { get; set; }
public string thirdPersonSingular { get; set; }
public string firstPersonPlural { get; set; }
public string secondPersonPlural { get; set; }
public string thirdPersonPlural { get; set; }
}
struct IndicativePastPerfect
{
public string firstPersonSingular { get; set; }
public string secondPersonSingular { get; set; }
public string thirdPersonSingular { get; set; }
public string firstPersonPlural { get; set; }
public string secondPersonPlural { get; set; }
public string thirdPersonPlural { get; set; }
}
struct IndicativeImperfect
{
public string firstPersonSingular { get; set; }
public string secondPersonSingular { get; set; }
public string thirdPersonSingular { get; set; }
public string firstPersonPlural { get; set; }
public string secondPersonPlural { get; set; }
public string thirdPersonPlural { get; set; }
}
struct IndicativePluperfect
{
public string firstPersonSingular { get; set; }
public string secondPersonSingular { get; set; }
public string thirdPersonSingular { get; set; }
public string firstPersonPlural { get; set; }
public string secondPersonPlural { get; set; }
public string thirdPersonPlural { get; set; }
}
struct IndicativeFuture
{
public string firstPersonSingular { get; set; }
public string secondPersonSingular { get; set; }
public string thirdPersonSingular { get; set; }
public string firstPersonPlural { get; set; }
public string secondPersonPlural { get; set; }
public string thirdPersonPlural { get; set; }
}
struct IndicativePastFuture
{
public string firstPersonSingular { get; set; }
public string secondPersonSingular { get; set; }
public string thirdPersonSingular { get; set; }
public string firstPersonPlural { get; set; }
public string secondPersonPlural { get; set; }
public string thirdPersonPlural { get; set; }
}
struct SubjunctivePresent
{
public string firstPersonSingular { get; set; }
public string secondPersonSingular { get; set; }
public string thirdPersonSingular { get; set; }
public string firstPersonPlural { get; set; }
public string secondPersonPlural { get; set; }
public string thirdPersonPlural { get; set; }
}
struct SubjunctivePast
{
public string firstPersonSingular { get; set; }
public string secondPersonSingular { get; set; }
public string thirdPersonSingular { get; set; }
public string firstPersonPlural { get; set; }
public string secondPersonPlural { get; set; }
public string thirdPersonPlural { get; set; }
}
struct SubjunctiveImperfect
{
public string firstPersonSingular { get; set; }
public string secondPersonSingular { get; set; }
public string thirdPersonSingular { get; set; }
public string firstPersonPlural { get; set; }
public string secondPersonPlural { get; set; }
public string thirdPersonPlural { get; set; }
}
struct SubjunctivePluperfect
{
public string firstPersonSingular { get; set; }
public string secondPersonSingular { get; set; }
public string thirdPersonSingular { get; set; }
public string firstPersonPlural { get; set; }
public string secondPersonPlural { get; set; }
public string thirdPersonPlural { get; set; }
}
struct ConditionalPresent
{
public string firstPersonSingular { get; set; }
public string secondPersonSingular { get; set; }
public string thirdPersonSingular { get; set; }
public string firstPersonPlural { get; set; }
public string secondPersonPlural { get; set; }
public string thirdPersonPlural { get; set; }
}
struct ConditionalFirstPast
{
public string firstPersonSingular { get; set; }
public string secondPersonSingular { get; set; }
public string thirdPersonSingular { get; set; }
public string firstPersonPlural { get; set; }
public string secondPersonPlural { get; set; }
public string thirdPersonPlural { get; set; }
}
struct ConditionalSecondPast
{
public string firstPersonSingular { get; set; }
public string secondPersonSingular { get; set; }
public string thirdPersonSingular { get; set; }
public string firstPersonPlural { get; set; }
public string secondPersonPlural { get; set; }
public string thirdPersonPlural { get; set; }
}
struct ImperativePresent
{
public string secondPersonSingular { get; set; }
public string firstPersonPlural { get; set; }
public string secondPersonPlural { get; set; }
}
struct ImperativePast
{
public string secondPersonSingular { get; set; }
public string firstPersonPlural { get; set; }
public string secondPersonPlural { get; set; }
}
struct Infinitive
{
public string present { get; set; }
public string past { get; set; }
}
struct Participle
{
public string present { get; set; }
public string past { get; set; }
}
}
}
I'm new to C#, and I'm not too familiar with its data structures. Based on my limited knowledge of C++, I know that structs are useful when you are modeling small, highly related pieces of data, which is why I am currently using them in this fashion.
All of these structs could realistically be made into a ConjugationTables class, and would have, to a high degree, the same structure. I'm unsure whether to make these into a class or to use a different data structure that would be better suited to the problem. To give some more information about the problem specifications, I'll say the following:
- Once these values have been loaded from the XML file, they will not be changed.
- These values will be read/fetched very often.
- The table-like structure must be maintained; that is to say, IndicativePresent must be nested under VerbForms, and the same applies to all other structs that are members of the VerbForms struct. These are conjugation tables after all!
- Perhaps the most important: I need the data to be organized so that if, for example, a Word in the XML file does not have a GrammaticalForm of verb, no VerbForms struct is actually created for that entry. This is in an effort to improve efficiency: why instantiate VerbForms if the word is not actually a verb? Avoiding unnecessary creation of these "forms" tables (which are currently represented as struct XXXXXForms) is absolutely imperative.
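To make the first three points concrete, here is a rough sketch of the kind of shape I imagine: one read-only type shared by every six-person tense, nested under a mood, nested under VerbForms. The names PersonTable, IndicativeMood and Demo are hypothetical and not part of my current code, only two tenses are shown, and the sample values for "parler" are there purely for illustration.

```csharp
using System;

// Hypothetical shared type: one class for every six-person tense,
// instead of one identical struct per tense.
sealed class PersonTable
{
    // Getter-only properties: values are set once in the constructor and never change.
    public string FirstPersonSingular { get; }
    public string SecondPersonSingular { get; }
    public string ThirdPersonSingular { get; }
    public string FirstPersonPlural { get; }
    public string SecondPersonPlural { get; }
    public string ThirdPersonPlural { get; }

    public PersonTable(string fps, string sps, string tps,
                       string fpp, string spp, string tpp)
    {
        FirstPersonSingular = fps;
        SecondPersonSingular = sps;
        ThirdPersonSingular = tps;
        FirstPersonPlural = fpp;
        SecondPersonPlural = spp;
        ThirdPersonPlural = tpp;
    }
}

// Hypothetical mood container; only two of the indicative tenses are shown.
sealed class IndicativeMood
{
    public PersonTable Present { get; }
    public PersonTable Imperfect { get; }

    public IndicativeMood(PersonTable present, PersonTable imperfect)
    {
        Present = present;
        Imperfect = imperfect;
    }
}

// The tenses stay nested under VerbForms, as in the XML.
sealed class VerbForms
{
    public IndicativeMood Indicative { get; }

    public VerbForms(IndicativeMood indicative)
    {
        Indicative = indicative;
    }
}

static class Demo
{
    // Builds a table by hand; in the real application the values would come from the XML file.
    public static VerbForms BuildParler()
    {
        var present = new PersonTable("parle", "parles", "parle",
                                      "parlons", "parlez", "parlent");
        var imperfect = new PersonTable("parlais", "parlais", "parlait",
                                        "parlions", "parliez", "parlaient");
        return new VerbForms(new IndicativeMood(present, imperfect));
    }

    static void Main()
    {
        VerbForms verb = BuildParler();
        Console.WriteLine(verb.Indicative.Present.FirstPersonPlural);   // parlons
        Console.WriteLine(verb.Indicative.Imperfect.ThirdPersonPlural); // parlaient
    }
}
```

Access still reads like a table lookup (verb, then mood, then tense, then person), but I don't know whether this is the idiomatic way to do it in C#.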
In accordance with (primarily) point #4 above, what kinds of data structures would be best suited to modeling conjugation tables (not database tables)? Do I need to change the format of my data in order to comply with #4? If I instantiate a new Word, will the structs, in their current state, be instantiated as well and take up a lot of space? Here's some math... after Googling around and eventually finding this question...
In all the conjugation tables (nouns, adjectives, verbs), there are a total of (coincidence?) 100 strings allocated that are empty. So 100 x 18 bytes = 1800 bytes for each Word, at minimum, if these data structures are created and remain empty (there will always be at least some overhead for the values that would actually be filled in). So assuming (just randomly, could be more or less) 50,000 Words that would need to be in memory, that's 90 million bytes, or approximately 85.8307 megabytes (counting 1 megabyte as 1024 x 1024 bytes).
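The same arithmetic written out as code, in case I slipped somewhere. All three inputs are the assumptions from the estimate above (100 strings per Word, 18 bytes per empty string, 50,000 Words), not measured values, and MemoryEstimate is just a hypothetical helper name.

```csharp
using System;
using System.Globalization;

static class MemoryEstimate
{
    // Assumed figures from the estimate in the text; none of them are measured.
    public const int StringsPerWord = 100; // string slots across all tables
    public const int BytesPerString = 18;  // assumed cost of one empty string
    public const int WordCount = 50000;    // assumed number of dictionary entries

    public static long BytesPerWord()
    {
        return (long)StringsPerWord * BytesPerString;
    }

    public static long TotalBytes()
    {
        return BytesPerWord() * WordCount;
    }

    // Converts bytes to megabytes, counting 1 MB as 1024 * 1024 bytes.
    public static double ToMegabytes(long bytes)
    {
        return bytes / (1024.0 * 1024.0);
    }

    static void Main()
    {
        Console.WriteLine(BytesPerWord()); // 1800
        Console.WriteLine(TotalBytes());   // 90000000
        Console.WriteLine(
            ToMegabytes(TotalBytes()).ToString("F4", CultureInfo.InvariantCulture)); // 85.8307
    }
}
```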
That's a lot of overhead just to have empty tables. So how can I put this data together so that I instantiate only certain tables (noun, adjective, verb), depending on which GrammaticalForms the Word actually has in the XML file?
I want these tables to be members of the Word class, but I only want to instantiate the tables that I need. I can't think of a way around it, and now that I've done the math on the structs I know that it's not a good solution. My first thought is to make a class for each of NounForms, AdjectiveForms, and VerbForms, and instantiate the class only if the form appears in the XML file. I'm not sure if that is correct though...
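Here is a minimal sketch of that first thought, so it's clear what I mean. It assumes the tables become classes held as references on Word, so a missing table is just null, and it assumes loading through System.Xml.Linq. Only a couple of attributes are read per table, and the Text, Noun and Verb property names and the Demo class are hypothetical.

```csharp
using System;
using System.Xml.Linq;

sealed class NounForms
{
    public string Gender { get; }
    public string MasculineSingular { get; }

    public NounForms(XElement element)
    {
        // Casting an XAttribute to string yields null when the attribute is absent.
        Gender = (string)element.Attribute("gender");
        MasculineSingular = (string)element.Attribute("ms");
    }
}

sealed class VerbForms
{
    public string Group { get; }

    public VerbForms(XElement element)
    {
        Group = (string)element.Attribute("group");
    }
}

sealed class Word
{
    public string Text { get; }
    public NounForms Noun { get; } // null when the entry has no <NounForms>
    public VerbForms Verb { get; } // null when the entry has no <VerbForms>

    public Word(XElement entry)
    {
        Text = (string)entry.Attribute("word");

        XElement tables = entry.Element("ConjugationTables");
        XElement noun = tables?.Element("NounForms");
        XElement verb = tables?.Element("VerbForms");

        // A table object is created only if its element exists in the XML.
        Noun = noun == null ? null : new NounForms(noun);
        Verb = verb == null ? null : new VerbForms(verb);
    }
}

static class Demo
{
    public static Word Parse(string xml)
    {
        return new Word(XElement.Parse(xml));
    }

    static void Main()
    {
        Word word = Parse(
            "<Word word=\"chat\"><ConjugationTables>" +
            "<NounForms gender=\"m\" ms=\"chat\" />" +
            "</ConjugationTables></Word>");

        Console.WriteLine(word.Noun != null); // True
        Console.WriteLine(word.Verb == null); // True
    }
}
```

If this is right, a Word that is only a noun would carry a single null reference for its verb table instead of a full set of empty tense tables, but I'd like confirmation that this is how it actually behaves.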
Any suggestions?