There are models that recognize two classes of sound, class A and class B.
How can I recognize class-C sounds as abnormal?
I tried setting a threshold on the frame-by-frame predictions:
at least 70% of frames agree -> class A or B
otherwise -> abnormal
For example, if a sound has 10 frames and the results are:
frame  1 2 3 4 5 6 7 8 9 10
label  A B A B A A A B A A    (A=7, B=3)
-> class A
frame  1 2 3 4 5 6 7 8 9 10
label  B B A B A A A B A A    (A=6, B=4)
-> abnormal
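The voting rule can be written down as a short Python sketch. The function name and the `min_share` parameter are only illustrative, and the threshold is read as "at least 70%" so that 7 frames out of 10 still count as class A, as in the first example.

```python
from collections import Counter

def classify_by_frame_votes(frame_labels, min_share=0.7):
    """Return the majority label if its share of frames reaches min_share,
    otherwise return 'abnormal'."""
    if not frame_labels:
        raise ValueError("frame_labels must not be empty")
    # Count per-frame predictions and take the most frequent label.
    label, votes = Counter(frame_labels).most_common(1)[0]
    share = votes / len(frame_labels)
    return label if share >= min_share else "abnormal"

print(classify_by_frame_votes(list("ABABAAABAA")))  # A=7, B=3 -> A
print(classify_by_frame_votes(list("BBABAAABAA")))  # A=6, B=4 -> abnormal
```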
The performance is very bad.
What should I do?
There are two ways to look at this: as a classification problem, and as an outlier detection problem.
Classification
As a classification problem, you could bring in outside sounds that your system may encounter in its application and use them to create a third class. It is important for this third class to have a large variety of sounds, and potentially a large number of them.
With this, you may want to use cost-sensitive one-vs-all classification to adjust the precision / recall for picking out classes A and B.
The benefit of this method is that you do not have to set an arbitrary threshold for an outlier / anomaly model. Distance may be hard to measure in this context, so finding a proper threshold could be difficult.
Many people, including myself, used this technique in a Kaggle competition that is similar to your problem: https://www.kaggle.com/c/axa-driver-telematics-analysis
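To make the cost-sensitive idea concrete, here is a minimal sketch. It is not one-vs-all training itself; it only shows one simple way to apply costs at prediction time, assuming you already have a classifier that outputs probabilities for A, B, and the third ("other") class. The cost values and the function name are hypothetical and would need tuning on your own data.

```python
# Hypothetical misclassification costs, indexed as COST[true_label][predicted_label].
# Calling an "other" sound A or B is made 5x as costly as the other mistakes.
COST = {
    "A":     {"A": 0.0, "B": 1.0, "other": 1.0},
    "B":     {"A": 1.0, "B": 0.0, "other": 1.0},
    "other": {"A": 5.0, "B": 5.0, "other": 0.0},
}
LABELS = ["A", "B", "other"]

def min_expected_cost_label(probs, cost=COST):
    """Pick the prediction with the lowest expected cost.

    probs maps each true label to the classifier's estimated probability.
    """
    def expected_cost(predicted):
        return sum(probs[true] * cost[true][predicted] for true in LABELS)
    return min(LABELS, key=expected_cost)

# A is the most probable class, but the risk of a costly false alarm wins out.
print(min_expected_cost_label({"A": 0.6, "B": 0.1, "other": 0.3}))    # other
print(min_expected_cost_label({"A": 0.9, "B": 0.05, "other": 0.05}))  # A
```

Raising the cost in the "other" row makes A and B predictions more precise at the expense of recall; with equal costs everywhere, the first example would be labelled A.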
Outlier / Anomaly detection
Since you are using a neural network, you could build an autoencoder. It will learn a manifold that represents the sounds you are trying to detect, and you can use the reconstruction loss as your distance measure for anomaly detection. This will still require you to determine a threshold, and it is good to use some existing anomaly / outlier data to do this.
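As a rough sketch of the thresholding step, the code below assumes you already have a trained autoencoder and have computed reconstruction errors for some held-out normal sounds and some known abnormal sounds. The function names, the balanced-accuracy criterion, and the error values are all made up for illustration; other criteria (for example a fixed false-alarm rate) may suit your application better.

```python
def reconstruction_error(x, x_hat):
    """Mean squared error between an input vector and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def pick_threshold(normal_errors, anomaly_errors):
    """Return (threshold, balanced_accuracy) for the best candidate threshold.

    A sample is flagged as abnormal when its error is strictly greater
    than the threshold. Candidates are the observed error values.
    """
    candidates = sorted(set(normal_errors) | set(anomaly_errors))
    best_threshold, best_score = None, -1.0
    for t in candidates:
        # Share of normal sounds kept as normal.
        tnr = sum(e <= t for e in normal_errors) / len(normal_errors)
        # Share of abnormal sounds flagged as abnormal.
        tpr = sum(e > t for e in anomaly_errors) / len(anomaly_errors)
        score = (tnr + tpr) / 2
        if score > best_score:
            best_threshold, best_score = t, score
    return best_threshold, best_score

# Made-up reconstruction errors, for illustration only.
normal = [0.10, 0.12, 0.15, 0.20, 0.35]
anomalous = [0.30, 0.50, 0.60, 0.80]
threshold, score = pick_threshold(normal, anomalous)
print(threshold, score)

def is_abnormal(error, threshold=threshold):
    return error > threshold
```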