I have a solver for ordinary symmetric TSP instances. Its solution is the shortest path through all the nodes, with no restriction on which nodes come first and last in the path.
Is there a way to transform the problem so that a specific node can be ensured as the start node, and another node as the end node?
One way would be to add a very large distance I to every distance between the start/end nodes and all the other nodes (adding I twice to the distance between the start and end node themselves), so that the solver is pushed to touch each of them with only one edge, which makes them the start and the end of the path.
Are there any big disadvantages of this approach, or is there a better way to do this?
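For reference, the penalty idea from the question can be sketched as follows. This is only an illustration under stated assumptions: `add_endpoint_penalty` and `shortest_open_path` are hypothetical names, the brute-force search merely stands in for the real solver (it returns an open path with free endpoints, as described in the question), and the 5-node matrix is made-up example data. The constant `big` is chosen larger than the sum of all distances, so one extra penalised edge always costs more than any saving elsewhere.

```python
from itertools import permutations

def add_endpoint_penalty(dist, start, end, big):
    """Return a copy of dist with `big` added to every edge touching start or end."""
    n = len(dist)
    out = [row[:] for row in dist]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # One penalty per fixed endpoint the edge touches,
            # so the start-end edge is penalised twice.
            out[i][j] += big * ((i in (start, end)) + (j in (start, end)))
    return out

def shortest_open_path(dist):
    """Brute-force stand-in for the solver: shortest path through all nodes, free endpoints."""
    n = len(dist)
    return min(
        permutations(range(n)),
        key=lambda p: sum(dist[a][b] for a, b in zip(p, p[1:])),
    )

# Made-up symmetric example instance.
dist = [
    [0, 2, 9, 10, 7],
    [2, 0, 6, 4, 3],
    [9, 6, 0, 8, 5],
    [10, 4, 8, 0, 6],
    [7, 3, 5, 6, 0],
]
start, end = 0, 4
big = 1 + sum(map(sum, dist))  # larger than any possible path length
path = shortest_open_path(add_endpoint_penalty(dist, start, end, big))
print(path)
```

A path in which both fixed nodes are endpoints pays the penalty twice, while any other path pays it at least three times, so the minimum is forced to begin and end at the chosen nodes. Note that this argument relies on the solver returning an open path; in a closed tour every node has two incident edges, so the penalty would be the same for every tour.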
You can add a dummy node that is connected only to the start node and the end node, by edges of weight 0. Assuming the solver returns a closed tour, the tour must contain the dummy node, so the final result must contain the sequence start - dummy node - end (there is no other way to reach the dummy node). Removing the dummy node from the tour therefore gives the shortest Hamiltonian path with the specified start and end node. This solution should work even if some edge weights in the graph are negative.
Below is a visualization of the "dummy node" concept. On the left is a normal TSP with the same start and end node, A, and the optimal solution [A, B, E, D, C, A]. On the right is the same TSP, but with start node A and end node E. Its optimal solution, [A, B, C, D, E], need not resemble the one in the normal case. We can find that solution by "hacking" the distance matrix of the TSP graph: the dummy node is inserted at the bottom of the distance matrix, its distances to nodes A and E are set to 0, and its distances to all other nodes are set to inf. When the solver then searches for the optimal sequence of nodes, A, DUMMY and E will stay together, e.g. [A, B, C, D, E, DUMMY, A], and this can then be cleaned up to give [A, B, C, D, E].
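A minimal sketch of the matrix hack and the clean-up step might look like this. The helper names (`add_dummy_node`, `shortest_tour`, `tour_to_path`) are hypothetical, the brute-force `shortest_tour` is only a stand-in for a real closed-tour solver, and the distances are made-up example data with node 0 playing the role of A and node 4 the role of E.

```python
from itertools import permutations
from math import inf

def add_dummy_node(dist, start, end):
    """Append a dummy node: distance 0 to start and end, inf to every other node."""
    n = len(dist)
    out = [row[:] + [inf] for row in dist]  # new last column
    out.append([inf] * n + [0])             # new last row
    dummy = n
    for v in (start, end):
        out[v][dummy] = out[dummy][v] = 0
    return out

def shortest_tour(dist):
    """Brute-force stand-in for a closed-tour solver; the tour starts at node 0."""
    n = len(dist)
    best = min(
        permutations(range(1, n)),
        key=lambda p: sum(dist[a][b] for a, b in zip((0,) + p, p + (0,))),
    )
    return [0, *best]

def tour_to_path(tour, start, end, dummy):
    """Cut the cyclic tour at the dummy node and orient it from start to end."""
    k = tour.index(dummy)
    path = tour[k + 1:] + tour[:k]  # the dummy's two neighbours become the ends
    if path[0] != start:
        path.reverse()
    if path[0] != start or path[-1] != end:
        raise ValueError("dummy node is not adjacent to both start and end")
    return path

# Made-up symmetric example instance.
dist = [
    [0, 2, 9, 10, 7],
    [2, 0, 6, 4, 3],
    [9, 6, 0, 8, 5],
    [10, 4, 8, 0, 6],
    [7, 3, 5, 6, 0],
]
start, end = 0, 4
dummy = len(dist)
tour = shortest_tour(add_dummy_node(dist, start, end))
path = tour_to_path(tour, start, end, dummy)
print(path)
```

Many real solvers do not accept `inf` as a distance; in that case a finite value larger than the sum of all distances is a common substitute, which is an assumption to check against the solver's input format.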
PS. Note that this type of hack can have a severe impact on an exact solver's performance. Exact TSP solvers are set up with various geometric heuristics, and putting in zero and inf distances clearly interferes with those. For example, I tried this with Concorde, and it sometimes needed much more time to find optimal solutions. I did not find any documentation on how it deals with this specific case, but maybe there are exact solvers that are specifically optimized for it. If you use a non-exact method instead (e.g. simulated annealing), this is not an issue.