1/0 Knapsack Variation with Weighted Edges

Posted 2019-06-06 00:53

Question:

I'm currently investigating a routing problem (finding a subset of places [each with a certain score] I want to visit while not exceeding a maximum travel time) and came up with a variation of the 1/0 knapsack problem that seems to solve my original problem.

According to Wikipedia, the 1/0 knapsack problem is described as:

Given a set of items, each with a mass and a value, determine the number of each item to include in a collection so that the total weight is less than or equal to a given limit and the total value is as large as possible.

So for each item there is a fixed weight (mass) that can easily be used when trying to solve the problem, for instance using Dynamic Programming.

But what if the weight of a specific item is dependent on the previous content of the bag? In other words (and in a more general way): Let's consider the following complete graph:

Each node (A,B,C,D,E) represents an item I might want to put into my knapsack. Let's assume each node also has a certain value assigned to it (left out in the graph). I still want an optimal knapsack, i.e. a subset of the nodes with maximum total value, but this time the weight (or the cost of adding a specific node to my current knapsack) is not assigned to the node itself but rather to the edge leading to it.

This means that the cost of adding a node depends on the previous content of the knapsack (for example, by using the cheapest edge from any of the already included nodes). For instance, if my current knapsack were {A,C}, the cost of adding B would be 2 (taking either A->B or C->B); if it were {D,E}, the cost of adding B would be 3 instead.
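The "cheapest edge from the current set" rule can be sketched in a few lines of Python. The edge weights below are hypothetical, chosen only to match the example above (the actual graph was omitted from the question):

```python
# Hypothetical edge weights consistent with the example
# (the real graph is in the omitted figure).
edges = {
    frozenset("AB"): 2, frozenset("CB"): 2,
    frozenset("DB"): 3, frozenset("EB"): 3,
}

def add_cost(knapsack, node):
    """Cost of adding `node`: the cheapest edge from any already-included node."""
    return min(edges[frozenset((u, node))] for u in knapsack)
```

With a knapsack of {A,C} this returns 2 for B; with {D,E} it returns 3, matching the example.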

Unfortunately I can't really come up with a good algorithm to solve this problem. The classical knapsack DP approach doesn't work here, because you can easily construct cases where it does not return the optimal solution. For example, if you start building your knapsack with the "wrong" nodes, the cost of adding a very good node (one that belongs in an optimal solution but is tried very late) might exceed the remaining capacity.

Am I taking a completely wrong approach to the problem? Do you think there is a better algorithm to tackle this problem?

Answer 1:

First of all, this problem is NP-hard. Here is a reduction from the shortest Hamiltonian path problem in a weighted complete graph to this one: given a graph, assign a value of 1 to every node and run a binary search over the travel-time limit. If there were a polynomial solution to this problem, it could tell whether there is a path that visits all vertices and is no longer than a given value, so we would be able to solve the shortest Hamiltonian path problem in polynomial time. In practice, this means that no one knows an efficient, provably correct polynomial algorithm for your problem.

Now there are two ways to go:

  1. If the number of vertices is rather small (around 20), you can use dynamic programming to get an exact solution. The state is (mask, last_vertex); the value is the shortest possible time needed to visit all vertices in the mask and stop at last_vertex. The time complexity of this solution is O(2^n * n^2).
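A minimal sketch of that DP in Python, assuming the input is a list of node values, a symmetric cost matrix `dist`, and a travel-time `budget` (the function and parameter names are mine, not from the answer):

```python
def best_subset(values, dist, budget):
    """Maximum total value of a subset of nodes reachable within `budget`,
    where moving from node u to node v costs dist[u][v]."""
    n = len(values)
    INF = float("inf")
    # cost[mask][last] = minimum total edge cost to collect `mask`, ending at `last`
    cost = [[INF] * n for _ in range(1 << n)]
    for v in range(n):
        cost[1 << v][v] = 0  # starting at any single node costs nothing
    for mask in range(1 << n):
        for last in range(n):
            c = cost[mask][last]
            if c == INF:
                continue
            for nxt in range(n):  # try extending the path by one node
                if mask >> nxt & 1:
                    continue
                nc = c + dist[last][nxt]
                nmask = mask | (1 << nxt)
                if nc < cost[nmask][nxt]:
                    cost[nmask][nxt] = nc
    best = 0
    for mask in range(1, 1 << n):
        if min(cost[mask]) <= budget:  # subset is reachable within the budget
            best = max(best, sum(values[v] for v in range(n) if mask >> v & 1))
    return best
```

The two nested loops over `mask` and the pair (`last`, `nxt`) give the stated O(2^n * n^2) running time.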

  2. If the number of vertices is much bigger, you can find an approximate solution. There are many ways to do this: heuristics, various greedy algorithms, random sampling with local optimization, and so on.
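As one illustration of the greedy family (a sketch only, with no optimality guarantee; the names are my own): repeatedly add the node with the best value-to-cost ratio, where the cost of a candidate is the cheapest edge from the nodes already chosen.

```python
def greedy_subset(values, dist, budget):
    """Greedy heuristic: seed with the most valuable node, then repeatedly add
    the affordable node with the best value-to-cost ratio."""
    n = len(values)
    chosen = {max(range(n), key=lambda v: values[v])}  # seed with the best node
    spent = 0
    while True:
        best_v, best_ratio, best_cost = None, -1.0, 0
        for v in range(n):
            if v in chosen:
                continue
            c = min(dist[u][v] for u in chosen)  # cheapest edge into the current set
            if spent + c <= budget:
                ratio = values[v] / (c + 1e-9)  # guard against zero-cost edges
                if ratio > best_ratio:
                    best_v, best_ratio, best_cost = v, ratio, c
        if best_v is None:  # nothing affordable is left
            return chosen
        chosen.add(best_v)
        spent += best_cost
```

A natural refinement is to restart this from several random seed nodes (or to apply local swaps afterwards) and keep the best result found.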