I am trying to understand the difference between NP-Complete and NP-Hard.
Below is my understanding:
An NP-Hard problem is one that is not solvable in polynomial time but can be verified in polynomial time.
An NP-Complete problem is one that is in NP and is also NP-Hard.
Is the above definition correct? If so, what about problems that are NP-Hard but not in NP? Wouldn't they be harder than NP-Complete problems, say if they can only be solved and verified in exponential time?
NP-Hard is a lower bound on the difficulty of a problem. Even unsolvable (undecidable) problems can be NP-Hard. NP-Complete means that a problem is NP-Hard and at the same time in NP.
Being verifiable in polynomial time is one of the definitions of membership in NP.
An NP problem (as opposed to an NP-hard problem) is a decision problem whose solutions can be verified in polynomial time. It may also be solvable in polynomial time, since all problems in P are also in NP.
An NP-complete problem is a decision problem in NP to which all NP problems can be reduced in polynomial time. These are the hardest problems in the class NP.
The NP-hard class is the class of problems that are at least as hard as the NP-complete problems. They are not necessarily decision problems. Given that we don't know whether NP = P or not, it is hard to say whether we can verify an NP-hard problem in polynomial time.
For example, the decision version of maximum clique (given a graph G and an integer K, decide whether G contains a complete subgraph with at least K vertices) is an NP problem. It is also NP-complete and NP-hard. However, the maximum clique problem itself (find the maximum clique in the given graph) is neither in NP nor NP-complete, since it is not a decision problem. We can say it is NP-hard, since it is at least as hard as the decision version of the maximum clique problem.
Your definition for NP-Hard is not correct; it looks more like the (not precisely correct) definition of the complexity class NP.
What is the complexity class NP?
A computational problem p is in the complexity class NP if it can be efficiently verified. In complexity theory, we deem computation that takes polynomial time to be efficient. So formally, p ∈ NP if p is polynomial-time verifiable.
In your definition, you mentioned the concept of being polynomial-time solvable, which corresponds to the complexity class P. An NP-Complete problem is polynomial-time solvable if and only if P = NP. Note that the famous P vs NP question is one of the biggest open problems in Computer Science, so currently no one knows whether P = NP or P ⊊ NP, and it is inappropriate to state that NP-Hard or NP-Complete problems are not polynomial-time solvable (though this is widely believed to be the case).
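To make "polynomial-time verifiable" concrete, here is a minimal sketch of a verifier for the clique decision problem. The function name `verify_clique`, the edge-list representation, and the sample graph are assumptions chosen for illustration. The verifier only checks a proposed certificate; it does not search for one.

```python
from itertools import combinations

def verify_clique(edges, k, certificate):
    """Return True if `certificate` is a set of at least k pairwise adjacent vertices.

    Runs in time polynomial in the size of the graph: at most
    len(certificate)^2 edge lookups.
    """
    edge_set = {frozenset(e) for e in edges}
    graph_vertices = {v for e in edges for v in e}
    chosen = set(certificate)
    # The certificate must name enough distinct vertices that exist in the graph.
    if len(chosen) < k or not chosen <= graph_vertices:
        return False
    # Every pair of chosen vertices must be joined by an edge.
    return all(frozenset(pair) in edge_set for pair in combinations(chosen, 2))

# Hypothetical sample graph: a triangle 1-2-3 with an extra vertex 4 attached to 3.
edges = [(1, 2), (1, 3), (2, 3), (3, 4)]
print(verify_clique(edges, 3, [1, 2, 3]))  # True
print(verify_clique(edges, 3, [2, 3, 4]))  # False, since 2 and 4 are not adjacent
```

Checking a given certificate is cheap; finding one is the part for which no polynomial-time algorithm is known.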
What are NP-Hard problems?
Intuitively, NP-Hard problems are computational problems that are at least as hard as the problems in NP. When we say a computational problem p is at least as hard as another problem q, we actually think about it in reverse: if we can solve p in time T, then we can also solve q in roughly the same time (say, differing by a polynomial factor).
More precisely, we say that p is at least as hard as another problem q if there is a polynomial-time reduction from q to p. Roughly speaking, a polynomial-time reduction means that, given an algorithm A that solves p, we can construct a polynomial-time algorithm B that solves q by using A as a black box (i.e. we treat the time complexity of A as O(1)).
In the case of NP-Hard problems, if any NP-Hard problem can be solved in polynomial time, then ALL NP problems can be solved in polynomial time (and hence P = NP!). So it is widely believed that NP-Hard problems are NOT polynomial-time solvable.
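As an illustration of using A as a black box, the sketch below reduces the independent set decision problem (playing the role of q) to the clique decision problem (playing the role of p) by complementing the graph. The names `has_clique` and `has_independent_set` are hypothetical, and the brute-force `has_clique` is only a stand-in for A; the reduction itself (building the complement graph and making one oracle call) is the polynomial-time part.

```python
from itertools import combinations

def has_clique(vertices, edges, k):
    """Stand-in for algorithm A: brute-force clique decision (exponential time)."""
    edge_set = {frozenset(e) for e in edges}
    return any(
        all(frozenset(pair) in edge_set for pair in combinations(group, 2))
        for group in combinations(vertices, k)
    )

def has_independent_set(vertices, edges, k, clique_oracle=has_clique):
    """Algorithm B: decide independent set with a single call to the clique oracle.

    A set of vertices is independent in G exactly when it is a clique in the
    complement of G, so we build the complement (O(n^2) work) and ask A.
    """
    edge_set = {frozenset(e) for e in edges}
    complement = [
        pair for pair in combinations(vertices, 2)
        if frozenset(pair) not in edge_set
    ]
    return clique_oracle(vertices, complement, k)

# Hypothetical sample graph: the path 1-2-3-4.
vertices = [1, 2, 3, 4]
edges = [(1, 2), (2, 3), (3, 4)]
print(has_independent_set(vertices, edges, 2))  # True, e.g. {1, 3}
print(has_independent_set(vertices, edges, 3))  # False
```

If `clique_oracle` were replaced by a polynomial-time algorithm, `has_independent_set` would become polynomial-time as well, which is the sense in which clique is at least as hard as independent set.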
What are NP-Complete problems?
As you have stated correctly in your question, a computational problem p is NP-Complete if it is NP-Hard and p ∈ NP.
NP-Hard problems that are not in NP?
NP-Hard problems that are not in NP do exist: the halting problem, for example, is NP-Hard but undecidable, so it cannot be in NP. Any such problem is harder than the NP-Complete problems, in the sense that no NP-Complete problem is at least as hard as it.
Proof: Suppose our claim is not true. Let p be an NP-Complete problem that is at least as hard as another problem q that is NP-Hard but not in NP. Since p is at least as hard as q, we have a polynomial-time reduction (say it runs in time P(n)) from q to p. Since p is in NP, it can be verified by some algorithm A in time T(n), where T is a polynomial.
Now, given any instance r of q, we can construct an algorithm B that first reduces r to an instance s of p and then invokes A to verify s, using a certificate for s as the certificate for r. Note that B verifies q in time O(P(n) + T(P(n))), which is a polynomial in n. It follows that q is in NP, which gives us a contradiction!
Your definition is only correct for NP-complete.
Starting from the bottom: P is the class of problems that can be solved by some deterministic Turing machine in polynomial time. NP is the class of problems that can be solved by some non-deterministic Turing machine in polynomial time (or whose solutions can be verified by deterministic Turing machines in polynomial time).
As for NP-hard, it refers to decision problems X with the following property: any instance of any problem in NP can be transformed in polynomial time into an instance of X (a polynomial-time reduction), so a Turing machine that solves X could be used to solve every problem in NP. Informally, this means that NP-hard problems are those that are "at least as hard as NP", or that a solution for X could be applied to every problem in NP. Note that the problem doesn't have to be verifiable in polynomial time, or actually verifiable at all. NP-hard includes undecidable and unrecognizable problems as well.
We don't know whether NP-hard includes problems that can be solved in polynomial time (the P ?= NP problem). Currently, not a single polynomial-time solution for an NP-hard problem has been found, but neither has it been proven that such a solution can't exist. If such a solution were found for some NP-hard problem X, that would mean P = NP, as any instance of any problem in NP could be converted to an instance of X in polynomial time (because of the reduction property of NP-hard problems) and then be solved in polynomial time by X's polynomial-time solution.
Let me make it simple.
A professor gives his students a problem, and asks them to provide an efficient algorithm.
The next day, some of his brightest students have worked out an algorithm to solve it. It has a complexity of O(2^n). Now all are happy that they have an algorithm that produces the solution. Everything looks good.
The professor appreciates them, but says, "The task is not yet over", and challenges them to solve it practically using a system.
So, they immediately try to run it on a system. A student says his system has a fantastic speed of 1 GIPS (1,000,000,000 instructions per second) and that it can solve the problem within fractions of a second. So, they code their algorithm and try to execute it.
They start with a data set of 100 inputs and run it. They are surprised to see that the program runs and runs and runs and doesn't come to a halt.
Then another student does the math and figures out that the system would take 2^100 / 10^9 seconds to solve it, which is roughly 4 × 10^13 years.
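The student's estimate can be reproduced with a few lines of arithmetic. This is a rough sketch that assumes the algorithm performs exactly 2^n basic steps and that the machine executes 10^9 steps per second; the function name `brute_force_years` is made up for this example.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # about 3.16e7 seconds

def brute_force_years(n, steps_per_second=10**9):
    """Years needed to execute 2^n steps at the given speed."""
    seconds = 2**n / steps_per_second
    return seconds / SECONDS_PER_YEAR

for n in (30, 50, 100):
    print(f"n = {n}: {brute_force_years(n):.3e} years")
```

For n = 100 this comes out to roughly 4 × 10^13 years, while n = 30 takes only about a second, which is why the trouble does not show up on small test inputs.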
Next day, while the program was still running, the professor said, "Very well. My dear students, this is what we call NP-Hard. The system might give the solution one day, but I'm afraid we won't be there to see it".
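Strictly speaking, a slow algorithm alone does not make a problem NP-Hard: the term means that every problem in NP can be reduced to it in polynomial time, and no polynomial-time algorithm is known for any such problem. If, in addition, a proposed solution to an NP-Hard problem can be verified in polynomial time (that is, the problem is itself in NP), then it is called NP-Complete. For example, Subset Sum is an NP-Hard problem. But once we get a candidate subset, we can check it easily in polynomial time. So it is NP-Complete.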
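The gap between finding and checking a solution for Subset Sum can be sketched as follows. Both function names and the sample numbers are assumptions for illustration; the solver is plain brute force, not an efficient algorithm.

```python
from itertools import combinations

def verify_subset_sum(numbers, target, indices):
    """Polynomial-time check that the chosen positions sum to `target`."""
    # Each position may be used at most once and must be in range.
    if len(set(indices)) != len(indices):
        return False
    if any(i < 0 or i >= len(numbers) for i in indices):
        return False
    return sum(numbers[i] for i in indices) == target

def solve_subset_sum(numbers, target):
    """Brute-force search: tries up to 2^n subsets before giving up."""
    for size in range(len(numbers) + 1):
        for indices in combinations(range(len(numbers)), size):
            if verify_subset_sum(numbers, target, indices):
                return list(indices)
    return None

numbers = [3, 34, 4, 12, 5, 2]
certificate = solve_subset_sum(numbers, 9)
print(certificate, verify_subset_sum(numbers, 9, certificate))
```

The verifier does one pass over the certificate, while the solver may examine every subset; that asymmetry is what the story is describing.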