
Question:

As far as I know, the typical TDD workflow is based on black-box testing. First we define an interface, then we write one test or a set of tests, and then we implement code that passes all of them. Consider the example below:

from abc import ABCMeta, abstractmethod


class InterfaceCalculator(metaclass=ABCMeta):

    @abstractmethod
    def calculate_mean(self, values):
        pass

An example test case:

from unittest import TestCase


class TestInterfaceCalculator(TestCase):

    def test_should_correctly_calculate_mean(self):
        X = [1, 1]
        expected_mean = 1
        calculator = Calculator()
        self.assertAlmostEqual(calculator.calculate_mean(X), expected_mean)

I skip the implementation of class Calculator(InterfaceCalculator) because it is trivial.

This idea is easy to understand. But what about machine learning? Consider the following example: we would like to implement a classifier for cat and dog photos. We start with the interface.

from abc import ABCMeta, abstractmethod


class InterfaceClassifier(metaclass=ABCMeta):

    @abstractmethod
    def train_model(self, data):
        pass

    @abstractmethod
    def predict(self, data):
        pass

I prepared a very silly set of unit tests:

from unittest import TestCase


class TestInterfaceClassifier(TestCase):
    def setUp(self):
        # unittest calls setUp before every test, so each test gets a fresh model
        self.model = CatDogClassifier()

    def test_should_correctly_train_model(self):
        """
        How can this be implemented?
        """
        data = None  # placeholder: which training data belongs here is the open question
        self.model.train_model(data)

    def test_should_correctly_predict_cat(self):
        image_path = "cat.jpg"
        expected_result = "cat"
        self.assertEqual(self.model.predict(image_path), expected_result)

Is this the right way to use TDD when working on a machine learning model? Or is TDD useless in this case, so that it can only help us verify the correctness of the input data and add a very high-level test of the trained model? How can I create good automated tests?

Answer 1:

With TDD, you describe the expected behavior in the form of a test and then write the code to satisfy that test. While this can work well for some components of your machine learning model, it usually doesn't work well for the model's high-level behavior, because the expected behavior is not precisely known in advance. Developing a machine learning model often involves trying different approaches to see which one is most effective. The behavior is likely to be measured in percentages (e.g. recognition is 95% accurate) rather than in absolutes.
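
If you still want an automated check on the high-level behavior, one option is to assert a minimum accuracy on held-out data instead of an exact prediction. The sketch below only illustrates that idea: ThresholdClassifier, make_dataset, accuracy and the 0.85 threshold are all hypothetical stand-ins I made up, and the data is synthetic with fixed seeds so the test stays deterministic.

```python
import random
import unittest


class ThresholdClassifier:
    """Hypothetical stand-in for a real model: nearest class mean on one feature."""

    def train_model(self, data):
        # data is a list of (feature, label) pairs with labels "cat" or "dog"
        cats = [x for x, label in data if label == "cat"]
        dogs = [x for x, label in data if label == "dog"]
        self.cat_mean = sum(cats) / len(cats)
        self.dog_mean = sum(dogs) / len(dogs)

    def predict(self, x):
        # Pick the class whose training mean is closer to x
        if abs(x - self.cat_mean) <= abs(x - self.dog_mean):
            return "cat"
        return "dog"


def make_dataset(n, seed):
    """Synthetic data (an assumption): cats centred at 0.0, dogs at 3.0."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.choice(["cat", "dog"])
        centre = 0.0 if label == "cat" else 3.0
        data.append((rng.gauss(centre, 1.0), label))
    return data


def accuracy(model, data):
    """Fraction of samples for which the prediction matches the label."""
    correct = sum(1 for x, label in data if model.predict(x) == label)
    return correct / len(data)


class TestModelQuality(unittest.TestCase):
    MIN_ACCURACY = 0.85  # assumed threshold, chosen for this toy data only

    def test_accuracy_on_held_out_data(self):
        model = ThresholdClassifier()
        model.train_model(make_dataset(500, seed=0))
        held_out = make_dataset(500, seed=1)  # different seed, unseen samples
        self.assertGreaterEqual(accuracy(model, held_out), self.MIN_ACCURACY)
```

Such a test does not prove the model is correct. It only guards against regressions below a level you have already reached, so the threshold has to come from your own experiments rather than be known up front.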

Answer 2:

I think you might be talking about two distinct goals here:

1. How can I improve my algorithm's performance? For a classification problem, this covers the correctness of the labeling, for example. But it could also mean many other things, such as how many hyper-parameters the algorithm requires, what its runtime is, and so on. One particular problem in this category is tuning your model (let's say a logistic regression model), and that can be done with the standard mechanism of splitting the data into training, validation and test sets.
2. How can I catch bugs in my algorithm? This focuses on finding functional issues, in other words, issues that exist because the code was not written according to the design. Even though the design might be a bad one (which falls under point 1 above), the code should implement it correctly. This is where TDD has the most value. And yes, for this to be useful, the test code needs specific parameters to validate and assert on.
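
To make point 2 concrete, here is a minimal sketch of TDD-style tests for a deterministic part of a pipeline, the train/validation/test split mentioned in point 1. The split_dataset helper, its parameters and its default fractions are hypothetical, written only for this illustration. The point is that its expected behavior is known exactly in advance, so it can be asserted on.

```python
import random
import unittest


def split_dataset(samples, train_frac=0.6, val_frac=0.2, seed=0):
    """Hypothetical helper: shuffle, then cut into train/validation/test lists."""
    if not 0 < train_frac < 1 or not 0 <= val_frac < 1 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave room for a non-empty test set")
    shuffled = list(samples)  # copy, so the caller's list is not modified
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


class TestSplitDataset(unittest.TestCase):
    def setUp(self):
        self.samples = list(range(100))

    def test_split_sizes(self):
        train, val, test = split_dataset(self.samples)
        self.assertEqual((len(train), len(val), len(test)), (60, 20, 20))

    def test_no_sample_lost_or_duplicated(self):
        train, val, test = split_dataset(self.samples)
        self.assertEqual(sorted(train + val + test), self.samples)

    def test_same_seed_gives_same_split(self):
        self.assertEqual(split_dataset(self.samples, seed=42),
                         split_dataset(self.samples, seed=42))

    def test_invalid_fractions_are_rejected(self):
        with self.assertRaises(ValueError):
            split_dataset(self.samples, train_frac=0.9, val_frac=0.2)
```

Tests like these can be run with python -m unittest. They say nothing about how good the final model is, but they do catch the kind of functional bug (lost samples, overlapping sets, non-reproducible splits) that would silently distort any accuracy figure you measure later.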