Looking for a C++ implementation of the C4.5 algorithm

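I've been looking for a C++ implementation of the C4.5 algorithm, but I haven't been able to find one yet. I found Quinlan's C4.5 Release 8, but it's written in C... Has anybody seen any open-source C++ implementations of the C4.5 algorithm?

I'm thinking about porting the J48 source code (or simply writing a wrapper around the C version) if I can't find an open-source C++ implementation out there, but I hope I don't have to! Please let me know if you have come across a C++ implementation of the algorithm.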

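Update

I've been considering the option of writing a thin C++ wrapper around the C implementation of the C5.0 algorithm (C5.0 is the improved version of C4.5). I downloaded and compiled the C implementation of C5.0, but it doesn't look like it's easily portable to C++. The C implementation uses a lot of global variables, and simply writing a thin C++ wrapper around the C functions will not result in an object-oriented design, because every class instance would be modifying the same global parameters. In other words: I would have no encapsulation, and that's a pretty basic thing that I need.

In order to get encapsulation I would need to do a full-blown port of the C code to C++, which is about the same amount of work as porting the Java version (J48) to C++.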
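
To make the encapsulation problem concrete, here is a minimal sketch. The names `g_tree_depth`, `c_train`, `c_get_depth`, `ThinWrapper` and `EncapsulatedTree` are hypothetical and are not taken from the C5.0 sources; they only mimic the pattern of a C library that keeps its model in a global variable.

```cpp
#include <cassert>

// Hypothetical stand-in for a C library that keeps its model in a global.
// None of these names come from the real C5.0 sources.
static int g_tree_depth = 0;

void c_train(int depth) { g_tree_depth = depth; }
int c_get_depth() { return g_tree_depth; }

// Thin wrapper: looks object-oriented, but every instance shares g_tree_depth.
class ThinWrapper {
public:
    void train(int depth) { c_train(depth); }
    int depth() const { return c_get_depth(); }
};

// Full port: the state lives inside the object.
class EncapsulatedTree {
public:
    void train(int depth) { depth_ = depth; }
    int depth() const { return depth_; }
private:
    int depth_ = 0;
};

// Returns true if training a second instance changes what the first reports.
template <class Tree>
bool second_instance_clobbers_first() {
    Tree a, b;
    a.train(3);
    b.train(7);
    return a.depth() != 3;
}
```

With the thin wrapper, `second_instance_clobbers_first<ThinWrapper>()` reports that the first instance was overwritten; with the encapsulated class it does not. The same reasoning applies, at a much larger scale, to the many globals in the C code.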

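Update 2.0

Here are some specific requirements:

Each classifier instance must encapsulate its own data (i.e. no global variables, aside from constant ones).
Support concurrent training of classifiers and concurrent evaluation of classifiers.

Here is a good scenario: suppose I'm doing 10-fold cross-validation and I want to concurrently train 10 decision trees, each on its own slice of the training set. If I just run the C program for each slice I would have to run 10 processes, which is not horrible. However, if I need to classify thousands of data samples in real time, then I would have to start a new process for each sample I want to classify, and that's not very efficient.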
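
The kind of usage I have in mind could look roughly like the sketch below. `StumpClassifier` is a hypothetical placeholder model (a single threshold on one feature, not C4.5), and `Sample` and `train_folds` are names invented for illustration. The point is only that each instance owns its state, so one classifier per fold can be trained on its own thread with `std::async` and then used for classification inside the same process.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

struct Sample { double x; int label; };  // label must be 0 or 1

// Hypothetical placeholder model (NOT C4.5): a threshold halfway between
// the two class means, assuming class 1 has the larger mean.
// All state is per instance; there are no globals.
class StumpClassifier {
public:
    void train(const std::vector<Sample>& data) {
        double sum[2] = {0.0, 0.0};
        std::size_t n[2] = {0, 0};
        for (const Sample& s : data) {
            assert(s.label == 0 || s.label == 1);
            sum[s.label] += s.x;
            ++n[s.label];
        }
        if (n[0] == 0 || n[1] == 0) { threshold_ = 0.0; return; }
        threshold_ = 0.5 * (sum[0] / n[0] + sum[1] / n[1]);
    }
    int classify(double x) const { return x > threshold_ ? 1 : 0; }
private:
    double threshold_ = 0.0;
};

// Trains one classifier per fold concurrently. Fold f holds out every
// sample whose index i satisfies i % k == f and trains on the rest.
std::vector<StumpClassifier> train_folds(const std::vector<Sample>& data,
                                         std::size_t k) {
    std::vector<std::future<StumpClassifier>> jobs;
    for (std::size_t fold = 0; fold < k; ++fold) {
        jobs.push_back(std::async(std::launch::async, [&data, k, fold] {
            std::vector<Sample> training;
            for (std::size_t i = 0; i < data.size(); ++i)
                if (i % k != fold) training.push_back(data[i]);
            StumpClassifier c;  // private to this task
            c.train(training);
            return c;
        }));
    }
    std::vector<StumpClassifier> models;
    for (auto& j : jobs) models.push_back(j.get());  // wait for every fold
    return models;
}
```

Calling `train_folds(data, 10)` gives ten independent models, and classifying thousands of samples is then just thousands of calls to `classify` with no process start-up cost. On some toolchains this needs to be linked with thread support (for example `-pthread` with GCC).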

Answer 1:

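I may have found a possible C++ "implementation" of C5.0 (See5.0), but I haven't been able to dig into the source code enough to determine whether it really works as advertised.

To reiterate my original concerns, the author of the port states the following about the C5.0 algorithm:

Another drawback with See5Sam [C5.0] is the impossibility to have more than one application tree at the same time. The application is read from files each time the executable is run, and is stored in global variables here and there.

I will update my answer as soon as I get some time to look into the source code.

Update

It's looking pretty good; here is the C++ interface: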

class CMee5
{
  public:

    /**
      Create a See 5 engine from tree/rules files.
      \param pcFileStem The stem of the See 5 file system. The engine
             initialisation will look for the following files:
              - pcFileStem.names Vanilla See 5 names file (mandatory)
              - pcFileStem.tree or pcFileStem.rules Vanilla See 5 tree or rules
                file (mandatory)
              - pcFileStem.costs Vanilla See 5 costs file (mandatory)
    */
    inline CMee5(const char* pcFileStem, bool bUseRules);

    /**
      Release allocated memory for this engine.
    */
    inline ~CMee5();

    /**
      General classification routine accepting a data record.
    */
    inline unsigned int classifyDataRec(DataRec Case, float* pOutConfidence);

    /**
      Show rules that were used to classify the last case.
      Classify() will have set RulesUsed[] to
      number of active rules for trial 0,
      first active rule, second active rule, ..., last active rule,
      number of active rules for trial 1,
      first active rule, second active rule, ..., last active rule,
      and so on.
    */
    inline void showRules(int Spaces);

    /**
      Open file with given extension for read/write with the actual file stem.
    */
    inline FILE* GetFile(String Extension, String RW);

    /**
      Read a raw case from file Df.

      For each attribute, read the attribute value from the file.
      If it is a discrete valued attribute, find the associated no.
      of this attribute value (if the value is unknown this is 0).

      Returns the array of attribute values.
    */
    inline DataRec GetDataRec(FILE *Df, Boolean Train);
    inline DataRec GetDataRecFromVec(float* pfVals, Boolean Train);
    inline float TranslateStringField(int Att, const char* Name);

    inline void Error(int ErrNo, String S1, String S2);

    inline int getMaxClass() const;
    inline int getClassAtt() const;
    inline int getLabelAtt() const;
    inline int getCWtAtt() const;
    inline unsigned int getMaxAtt() const;
    inline const char* getClassName(int nClassNo) const;
    inline char* getIgnoredVals();

    inline void FreeLastCase(void* DVec);
};

I would say that this is the best alternative I've found so far.

Answer 2:

A C++ implementation of C4.5 called YaDT is available here, in the "Decision Trees" section:
http://www.di.unipi.it/~ruggieri/software.html

This is the source code for the last version:
http://www.di.unipi.it/~ruggieri/YaDT/YaDT1.2.5.zip

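From the paper where the tool is described:

[...] In this paper, we describe a new from-scratch C++ implementation of a decision tree induction algorithm, which yields entropy-based decision trees in the style of C4.5. The implementation is called YaDT, an acronym for Yet another Decision Tree builder. The intended contribution of this paper is to present the design principles of the implementation that allowed for obtaining a highly efficient system. We discuss our choices on memory representation and modelling of data and metadata, on the algorithmic optimizations and their effect on memory and time performance, and on the trade-off between efficiency and accuracy of pruning heuristics. [...]

The paper is available here.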
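
For readers unfamiliar with what "entropy-based in the style of C4.5" means, here is a minimal sketch of the split criterion C4.5 is known for, the gain ratio (information gain divided by split information). This is not code from YaDT; the function names `entropy` and `gain_ratio` and the count-based input layout are assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Shannon entropy, in bits, of a histogram of counts.
double entropy(const std::vector<double>& counts) {
    double total = 0.0;
    for (double c : counts) total += c;
    if (total <= 0.0) return 0.0;
    double h = 0.0;
    for (double c : counts) {
        if (c > 0.0) {
            const double p = c / total;
            h -= p * std::log2(p);
        }
    }
    return h;
}

// Gain ratio of a candidate split. branches[i][c] is the number of
// training cases of class c that fall into branch i; every branch must
// list the same number of classes.
double gain_ratio(const std::vector<std::vector<double>>& branches) {
    if (branches.empty()) return 0.0;
    const std::size_t num_classes = branches[0].size();
    std::vector<double> parent(num_classes, 0.0);
    std::vector<double> branch_sizes;
    for (const auto& b : branches) {
        assert(b.size() == num_classes);
        double size = 0.0;
        for (std::size_t c = 0; c < num_classes; ++c) {
            parent[c] += b[c];
            size += b[c];
        }
        branch_sizes.push_back(size);
    }
    double total = 0.0;
    for (double s : branch_sizes) total += s;
    if (total <= 0.0) return 0.0;

    // Information gain: parent entropy minus weighted child entropy.
    double gain = entropy(parent);
    for (std::size_t i = 0; i < branches.size(); ++i)
        gain -= (branch_sizes[i] / total) * entropy(branches[i]);

    // Split information penalises splits with many small branches.
    const double split_info = entropy(branch_sizes);
    return split_info > 0.0 ? gain / split_info : 0.0;
}
```

A split that separates two balanced classes perfectly, such as `{{4, 0}, {0, 4}}`, has a gain of 1 bit and a split information of 1 bit, so its gain ratio is 1; a split that leaves both branches as mixed as the parent, such as `{{2, 2}, {2, 2}}`, scores 0.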