Why is translated Sudoku solver slower than origin

I transcribed my Java Sudoku solver into python. Everything works, however solving takes up to 2mins, while the identical puzzle takes only a few seconds in Java. Also the iterations needed amount to the exact same number. Am I missing something?

import numpy as np
def solve_recursive(puzzle, pos):
        if(pos == 81):
            print puzzle
            return True
        if(puzzle[pos] != 0):
            if (not solve_recursive(puzzle, pos+1)):
                return False
            else:
                return True

        row = np.copy(puzzle[pos//9*9:pos//9*9+9])
        col = np.copy(puzzle[pos%9::9])
        short = (pos%9)//3*3 + pos//27*27
        square = np.concatenate((puzzle[short:short+3],puzzle[short+9:short+12],puzzle[short+18:short+21]))

        for i in range(1,10):
            puzzle[pos] = i
            if(i not in row and i not in col and i not in square and solve_recursive(puzzle, pos+1)):
                return True

        puzzle[pos] = 0
        return False
puzzle = np.array([[0,0,0,0,0,0,0,8,3],
              [0,2,0,1,0,0,0,0,0],
              [0,0,0,0,0,0,0,4,0],
              [0,0,0,6,1,0,2,0,0],
              [8,0,0,0,0,0,9,0,0],
              [0,0,4,0,0,0,0,0,0],
              [0,6,0,3,0,0,5,0,0],
              [1,0,0,0,0,0,0,7,0],
              [0,0,0,0,0,8,0,0,0]])
solve_recursive(puzzle.ravel(), 0)

EDIT:

As suggested by @hpaulj I reworked my code to work with numpy´s 2D arrays:

import numpy as np
def solve_recursive(puzzle, pos):
        if pos == (0,9):
            print puzzle
            raise Exception("Solution")
        if(puzzle[pos] != 0):
            if(pos[0] == 8):
                solve_recursive(puzzle, (0,pos[1]+1))
                return
            elif pos[0] < 8:
                solve_recursive(puzzle, (pos[0]+1, pos[1]))
                return

        for i in range(1,10):
            if(i not in puzzle[pos[0]] and i not in puzzle[:,pos[1]] and i not in puzzle[pos[0]//3*3:pos[0]//3*3+3,pos[1]//3*3:pos[1]//3*3+3]):
                puzzle[pos] = i
                if(pos[0] == 8):
                    solve_recursive(puzzle, (0,pos[1]+1))
                elif pos[0] < 8:
                    solve_recursive(puzzle, (pos[0]+1, pos[1]))
        puzzle[pos] = 0
puzzle = np.array([[0,0,0,0,0,0,0,8,3],
          [0,2,0,1,0,0,0,0,0],
          [0,0,0,0,0,0,0,4,0],
          [0,0,0,6,1,0,2,0,0],
          [8,0,0,0,0,0,9,0,0],
          [0,0,4,0,0,0,0,0,0],
          [0,6,0,3,0,0,5,0,0],
          [1,0,0,0,0,0,0,7,0],
          [0,0,0,0,0,8,0,0,0]])
solve_recursive(puzzle, (0,0))

Ignoring the fact that throwing an exception at the bottom of the recursive calls is rather inelegant, this is just inconsiderably faster than my original solution. Would using dictionaries like the linked Norvig solver be a reasonable alternative?

I modified your function to print the pos and to maintain a running count of the times it's been called. And I stop it early.

Stopping at pos==46 results in 1190 calls, with just a slight visible delay. But for 47 the count is 416621, with a minute or more run.

Assuming it's doing some sort of recursive search, the problem had a quantum jump in difficulty between 46 and 47.

Yes, Python as an interpreted language will run slower than Java. Possible solutions include figuring out why there's this kind of jump in recursion calls. Or improving the speed of each call.

You set up 9x9 numpy array, but then immediately ravel it. The function itself then treats the board as a list of 81 values. That means selecting rows and columns and submatrices is much more complicated than if the array were still 2d. In effect the array is just a list.

I can imagine 2 ways of speeding up the calls. One is to recode it to use a list board. For small arrays and iterative actions lists have less overhead than arrays, so often are faster. An alternative is to code it to really take advantage of the 2d nature of the array. numpy solutions are good only if they let numpy use compiled code to perform most actions. Iteration over an array is slow.

==================

Changing your function so that it works with a flat list instead of the raveled array, runs much faster. For a max pos of 47, it runs in 15sec, versus 1m 15s for your original (same board and iteration count).

I'm cleaning up a 2d numpy array version, but not making it any faster.

A pure list version is also a good candidate for running faster on pypy.

A portion of the code that works with a 2d array

    r,c = np.unravel_index(pos, (9,9))            
    if(puzzle[r,c] != 0):
        return solve_numpy(puzzle, pos+1)
    row = puzzle[r,:].copy()
    col = puzzle[:,c].copy()
    r1, c1 = 3*(r//3), 3*(c//3)
    square = puzzle[r1:r1+3, c1:c1+3].flatten()
    for i in range(1,10):
        puzzle[r,c] = i
        if(i not in row and i not in col and i not in square):
            if solve_numpy(puzzle, pos+1):
                return True
    puzzle[r,c] = 0

Indexing is clearer, but there isn't a speed improvement. Other than simpler indexing, it doesn't make much use of whole array operations.

The list version doesn't look that much different from the original, but is much faster:

    row = puzzle[pos//9*9:pos//9*9+9]
    col = puzzle[pos%9::9]
    short = (pos%9)//3*3 + pos//27*27
    square = puzzle[short:short+3] + \
             puzzle[short+9:short+12] + \
             puzzle[short+18:short+21]

http://norvig.com/sudoku.html Discusses sudoku solution methods, with pythoN - by an AI expert.

With this Norvig solver, your grid solution takes 0.01 sec. Information is stored primarily in dictionaries. Your case is an easy one, than can be solved with his 2 basic assignment strategies. Without searching the solution is very fast.