I've read through a few Stack Overflow questions on this topic, as well as the Wikipedia article on A*, but I'm still a little confused. I think this post almost explained it completely to me: A* heuristic, overestimation/underestimation?
My only remaining confusion is: how does A* know the optimal solution? It seems that with an admissible heuristic you can throw out paths that exceed the known optimal solution, because the heuristic is guaranteed to be less than or equal to the true remaining cost. But how would A* know the optimal cost ahead of time?
Would this search work and guarantee an optimal solution if you didn't know the optimal path cost?
A* does not know the optimal solution; the heuristic only gives an educated guess that helps speed up the search. Since you have already read some theoretical explanations, let's try a different approach here, with an example:
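One way to make this concrete is a minimal Python sketch. The graph, edge weights, heuristic values, and the `a_star` function name are all made up for illustration. The point it tries to show is that the search never needs the optimal cost in advance: it only compares `f = g + h` values, and it returns a goal only when that goal is popped from the open list, not when it is first discovered.

```python
import heapq

def a_star(graph, h, start, goal):
    """Return (cost, path) for a cheapest path, or None if the goal is unreachable.

    graph: dict mapping node -> list of (neighbor, edge_cost)   (hypothetical format)
    h:     dict mapping node -> heuristic estimate of remaining cost
    """
    # Open list entries are (f, g, node, path) with f = g + h(node).
    open_heap = [(h[start], 0, start, [start])]
    best_g = {start: 0}  # cheapest known cost from start to each node

    while open_heap:
        f, g, node, path = heapq.heappop(open_heap)
        if g > best_g.get(node, float("inf")):
            continue  # stale entry: a cheaper route to this node was found later
        if node == goal:
            # The goal is accepted only when it is popped, i.e. when nothing
            # left in the open list has a smaller f value.
            return g, path
        for neighbor, weight in graph[node]:
            new_g = g + weight
            if new_g < best_g.get(neighbor, float("inf")):
                best_g[neighbor] = new_g
                heapq.heappush(
                    open_heap,
                    (new_g + h[neighbor], new_g, neighbor, path + [neighbor]),
                )
    return None

# Made-up example: S-A-G costs 6, S-B-G costs 5.
graph = {
    "S": [("A", 1), ("B", 4)],
    "A": [("G", 5)],
    "B": [("G", 1)],
    "G": [],
}
# Admissible heuristic: never larger than the true remaining cost.
h = {"S": 4, "A": 4, "B": 1, "G": 0}

cost, path = a_star(graph, h, "S", "G")
print(cost, path)
```

In this sketch the goal is first reached through A with cost 6, but that entry sits in the open list with `f = 6` while B still has `f = 5`. B is expanded first, which reveals the cheaper route of cost 5. The search only learns the optimal cost at the moment it returns.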
If you're looking for proofs, here is an informal one from Wikipedia for admissibility:
And for optimality:
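As a rough sketch of the usual argument (a paraphrase, not a quotation), assume $g(n)$ is the cost found so far to reach $n$, $h(n)$ is the heuristic, $h^*(n)$ is the true remaining cost, $f(n) = g(n) + h(n)$, and $C^*$ is the optimal cost. These symbols are notation chosen here. The sketch also assumes a variant of A* that keeps the cheapest known route to each node, so that some node $n$ on an optimal path is always waiting in the open list with its optimal $g(n)$.

```latex
\begin{align*}
&\text{Suppose A* returns a goal } G \text{ with cost } g(G) = C > C^*. \\
&\text{When } G \text{ is popped: } f(G) = g(G) + h(G) = C + 0 = C. \\
&\text{For the open node } n \text{ on an optimal path: } \\
&\qquad f(n) = g(n) + h(n) \le g(n) + h^*(n) = C^* \quad (\text{admissibility: } h(n) \le h^*(n)). \\
&\text{Hence } f(n) \le C^* < C = f(G), \text{ so } n \text{ would have been popped before } G. \\
&\text{This contradicts the assumption, so the returned cost must equal } C^*.
\end{align*}
```

Note that $C^*$ appears only in the argument about the algorithm; the algorithm itself only ever compares $f$ values.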
You may also want to check this video: A* optimality proof.
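It achieves this by working through the candidate paths in the order suggested by the heuristic, always expanding the most promising one next, until the goal is reached. By then, all the tiles/vertices/waypoints you need for the path will be in your closed list.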