Modifying warnings that seem to come from nowhere

Published 2019-08-27 15:14

Question:

I forked a repository named rasa_nlu to work on a part of the code I want to modify: a call component.train(...) inside the function train(...) in the file model.py seems to trigger warnings without revealing their origin, and I want to find what triggers them.

Basically it applies this function to a list of components:

[<rasa_nlu.utils.spacy_utils.SpacyNLP object at 0x7f3abbfbd780>,
 <rasa_nlu.tokenizers.spacy_tokenizer.SpacyTokenizer object at 0x7f3abbfbd710>,
 <rasa_nlu.featurizers.spacy_featurizer.SpacyFeaturizer object at 0x7f3abbfbd748>,
 <rasa_nlu.featurizers.regex_featurizer.RegexFeaturizer object at 0x7f3abbd1a630>,
 <rasa_nlu.extractors.crf_entity_extractor.CRFEntityExtractor object at 0x7f3abbd1a748>,
 <rasa_nlu.extractors.entity_synonyms.EntitySynonymMapper object at 0x7f3abbd1a3c8>,
 <rasa_nlu.classifiers.sklearn_intent_classifier.SklearnIntentClassifier object at 0x7f3abbd1a240>]

And it seems that the last one triggers the warnings.

I tried modifying the function train() in the repository's components.py file, but it didn't change anything, so I suspect that is not the right one.
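One generic way to find where a warning comes from, independent of Rasa, is to promote it to an exception so that Python aborts with a full traceback at the offending call. A minimal sketch, assuming you can add a couple of lines near the training entry point:

import warnings
from sklearn.exceptions import UndefinedMetricWarning

# Turn this specific warning into an exception; the first occurrence
# will then raise and print the full call stack that produced it.
warnings.filterwarnings("error", category=UndefinedMetricWarning)

# ... then run the training as usual

(Running the script with python -W error has the same effect for all warnings at once.)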

Anyway, here is the code of train(...) in model.py:

...

import rasa_nlu
from rasa_nlu import components, utils, config
from rasa_nlu.components import Component, ComponentBuilder
from rasa_nlu.config import RasaNLUModelConfig, override_defaults
from rasa_nlu.persistor import Persistor
from rasa_nlu.training_data import TrainingData, Message
from rasa_nlu.utils import create_dir, write_json_to_file

...

class Trainer(object):
    """Trainer will load the data and train all components.

    Requires a pipeline specification and configuration to use for
    the training."""

    # Officially supported languages (others might be used, but might fail)
    SUPPORTED_LANGUAGES = ["de", "en"]

    def __init__(self,
                 cfg,  # type: RasaNLUModelConfig
                 component_builder=None,  # type: Optional[ComponentBuilder]
                 skip_validation=False  # type: bool
                 ):
        # type: (...) -> None

        self.config = cfg
        self.skip_validation = skip_validation
        self.training_data = None  # type: Optional[TrainingData]

        if component_builder is None:
            # If no builder is passed, every interpreter creation will result in
            # a new builder. hence, no components are reused.
            component_builder = components.ComponentBuilder()

        # Before instantiating the component classes, lets check if all
        # required packages are available
        if not self.skip_validation:
            components.validate_requirements(cfg.component_names)

        # build pipeline
        self.pipeline = self._build_pipeline(cfg, component_builder)

    ...

    def train(self, data, **kwargs):
        # type: (TrainingData) -> Interpreter
        """Trains the underlying pipeline using the provided training data."""
        self.training_data = data

        context = kwargs  # type: Dict[Text, Any]

        for component in self.pipeline:
            updates = component.provide_context()
            if updates:
                context.update(updates)

        # Before the training starts: check that all arguments are provided
        if not self.skip_validation:
            components.validate_arguments(self.pipeline, context)

        # data gets modified internally during the training - hence the copy
        working_data = copy.deepcopy(data)
        for i, component in enumerate(self.pipeline):
            logger.info("Starting to train component {}"
                        "".format(component.name))
            component.prepare_partial_processing(self.pipeline[:i], context)
            print("before train")
            updates = component.train(working_data, self.config,
                                      **context)
            logger.info("Finished training component.")
            print("before updates")
            if updates:
                context.update(updates)
        return Interpreter(self.pipeline, context)

And the output is:

before train
before updates
before train
before updates
before train
before updates
before train
before updates
before train
before updates
before train
before updates
before train
Fitting 2 folds for each of 6 candidates, totalling 12 fits
/home/mike/Programming/Rasa/myflaskapp/rasaenv/lib/python3.5/site-packages/sklearn/metrics/classification.py:1135: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples.
  'precision', 'predicted', average, warn_for)
/home/mike/Programming/Rasa/myflaskapp/rasaenv/lib/python3.5/site-packages/sklearn/metrics/classification.py:1135: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples.
  'precision', 'predicted', average, warn_for)
/home/mike/Programming/Rasa/myflaskapp/rasaenv/lib/python3.5/site-packages/sklearn/metrics/classification.py:1135: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples.
  'precision', 'predicted', average, warn_for)
/home/mike/Programming/Rasa/myflaskapp/rasaenv/lib/python3.5/site-packages/sklearn/metrics/classification.py:1135: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples.
  'precision', 'predicted', average, warn_for)
/home/mike/Programming/Rasa/myflaskapp/rasaenv/lib/python3.5/site-packages/sklearn/metrics/classification.py:1135: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples.
  'precision', 'predicted', average, warn_for)
/home/mike/Programming/Rasa/myflaskapp/rasaenv/lib/python3.5/site-packages/sklearn/metrics/classification.py:1135: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples.
  'precision', 'predicted', average, warn_for)
[Parallel(n_jobs=1)]: Done  12 out of  12 | elapsed:    0.1s finished
before updates
trainer.persist:

Here you can see the warnings I want to catch and modify in order to learn their origin: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples.

Therefore, can you see where these warnings come from? What calls into sklearn/metrics/classification.py?
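To confirm that the last component (the SklearnIntentClassifier) really is the culprit, one option would be to record warnings per component inside the loop shown above. A sketch of that instrumentation, using only the standard warnings module:

import warnings

for i, component in enumerate(self.pipeline):
    component.prepare_partial_processing(self.pipeline[:i], context)
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # do not suppress repeated warnings
        updates = component.train(working_data, self.config, **context)
    for w in caught:
        # each record carries the category, the message, and the
        # file/line where the warning was issued
        print("{} raised {} at {}:{}: {}".format(
            component.name, w.category.__name__,
            w.filename, w.lineno, w.message))
    if updates:
        context.update(updates)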

Answer 1:

This is a documented issue on the Rasa NLU repository. I would recommend following these issues or adding your comments there for resolution. One is marked as help wanted, meaning they are looking for a community contribution to address it.

  • Training on demo-rasa.json results in UndefinedMetricWarning
  • Show name of intent(s) that lack enough train examples

The tl;dr on why the warning occurs, from the first issue linked above:

so the warning is just a warning. It indicates that there are too few training examples for one / some of the intents. Adding more examples will fix this (that's why adding duplicates will remove this warning, but really you should be adding different examples).

If you want the warning to go away, add more training data. Use the evaluation.py script to find the intents that are lacking.
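For reference, the warning itself is easy to reproduce outside of Rasa: it fires whenever an F-score is computed and some true label never appears among the predictions, which is exactly what can happen inside the classifier's cross-validation folds when an intent has very few examples. A minimal standalone sketch:

from sklearn.metrics import f1_score

y_true = [0, 1, 2]  # label 2 exists in the ground truth ...
y_pred = [0, 1, 1]  # ... but is never predicted

# Emits: UndefinedMetricWarning: F-score is ill-defined and being set
# to 0.0 in labels with no predicted samples.
print(f1_score(y_true, y_pred, average="weighted"))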

From the warning message you can see it is produced by sklearn/metrics/classification.py in the scikit-learn source.
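If you also want the full call chain (i.e. which Rasa code ends up calling into that file), a non-fatal alternative to turning warnings into errors is to override warnings.showwarning so that every warning prints a stack trace. A sketch using only the standard library:

import traceback
import warnings

def warn_with_traceback(message, category, filename, lineno,
                        file=None, line=None):
    # Print the stack that led to the warning, then the warning itself.
    traceback.print_stack()
    print(warnings.formatwarning(message, category, filename, lineno, line))

warnings.showwarning = warn_with_traceback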