I wrote a function to copy files recursively from directory A to directory B.
The code looks like this:
import os
import shutil
import sys
from os.path import join, exists

def copy_file(src, dest):
    for path, dirs, files in os.walk(src, topdown=True):
        if len(dirs) > 0:
            for di in dirs:
                copy_file(join(path, di), join(dest, di))
        if not exists(dest):
            os.makedirs(dest)
        for fi in files:
            shutil.copy(join(path, fi), dest)
In my test, the input arguments are:
src = d:/dev, which has one subdirectory named py; py in turn has a subdirectory named test
dest = d:/dev_bak
When I test the code, something strange happens: three subdirectories are created in my dest directory, d:/dev_bak.
They are d:/dev_bak/py, d:/dev_bak/py/test, and d:/dev_bak/test.
I intended the structure of dev_bak to be the same as that of dev. Why did this happen?
You can easily diagnose this by putting
print path, dirs, files
right below
for path, dirs, files in os.walk(src, topdown=True):
Essentially, you're recursing twice.
By itself, os.walk descends into subdirectories. You're double-descending by recursively calling your own function. Here is some example output from that print statement:
>>> copy_file("c:\Intel", "c:\Intel-Bak")
c:\Intel ['ExtremeGraphics', 'Logs'] []
c:\Intel\ExtremeGraphics ['CUI'] []
c:\Intel\ExtremeGraphics\CUI ['Resource'] []
c:\Intel\ExtremeGraphics\CUI\Resource [] ['Intel\xae Graphics and Media Control Panel.lnk', 'Intel\xae HD Graphics.lnk']
c:\Intel\ExtremeGraphics\CUI\Resource [] ['Intel\xae Graphics and Media Control Panel.lnk', 'Intel\xae HD Graphics.lnk']
c:\Intel\ExtremeGraphics\CUI ['Resource'] []
c:\Intel\ExtremeGraphics\CUI\Resource [] ['Intel\xae Graphics and Media Control Panel.lnk', 'Intel\xae HD Graphics.lnk']
c:\Intel\ExtremeGraphics\CUI\Resource [] ['Intel\xae Graphics and Media Control Panel.lnk', 'Intel\xae HD Graphics.lnk']
c:\Intel\Logs [] ['IntelChipset.log', 'IntelControlCenter.log', 'IntelGFX.log', 'IntelGFXCoin.log']
c:\Intel\ExtremeGraphics ['CUI'] []
c:\Intel\ExtremeGraphics\CUI ['Resource'] []
c:\Intel\ExtremeGraphics\CUI\Resource [] ['Intel\xae Graphics and Media Control Panel.lnk', 'Intel\xae HD Graphics.lnk']
c:\Intel\ExtremeGraphics\CUI\Resource [] ['Intel\xae Graphics and Media Control Panel.lnk', 'Intel\xae HD Graphics.lnk']
c:\Intel\ExtremeGraphics\CUI ['Resource'] []
c:\Intel\ExtremeGraphics\CUI\Resource [] ['Intel\xae Graphics and Media Control Panel.lnk', 'Intel\xae HD Graphics.lnk']
c:\Intel\ExtremeGraphics\CUI\Resource [] ['Intel\xae Graphics and Media Control Panel.lnk', 'Intel\xae HD Graphics.lnk']
c:\Intel\Logs [] ['IntelChipset.log', 'IntelControlCenter.log', 'IntelGFX.log', 'IntelGFXCoin.log']
As you can see, the directories get visited more than once.
You should fix the logic of your program so it visits each directory only once, but theoretically you could just ignore any directory you've already been to:
visited = []

def copy_file(src, dest):
    for path, dirs, files in os.walk(src, topdown=True):
        if path not in visited:
            for di in dirs:
                print dest, di
                copy_file(join(path, di), join(dest, di))
            if not exists(dest):
                os.makedirs(dest)
            for fi in files:
                shutil.copy(join(path, fi), dest)
            visited.append(path)
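For comparison, here is a minimal sketch of what "visiting each directory only once" could look like: drop the recursive call and let os.walk do all the descending, mapping each visited path onto the destination with os.path.relpath. The function name copy_tree_once is made up for this example, and the sketch assumes plain files and directories (no special handling of symlinks or permissions).

```python
import os
import shutil


def copy_tree_once(src, dest):
    """Copy everything under src into dest; os.walk does the descending."""
    for path, dirs, files in os.walk(src):
        # Work out where the directory being visited belongs under dest.
        rel = os.path.relpath(path, src)
        target = dest if rel == os.curdir else os.path.join(dest, rel)
        if not os.path.exists(target):
            os.makedirs(target)
        # Copy only the files of this directory; os.walk will reach the
        # subdirectories on its own in later iterations.
        for name in files:
            shutil.copy(os.path.join(path, name), target)
```

Called as copy_tree_once("d:/dev", "d:/dev_bak"), this should produce d:/dev_bak/py and d:/dev_bak/py/test, with no stray d:/dev_bak/test.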
The shutil module already has a copytree function which will copy directories recursively. You might want to use it instead of providing your own implementation.
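A minimal sketch of that approach follows. The wrapper name backup_tree is only for illustration. One thing to keep in mind: by default shutil.copytree creates the destination directory itself and raises an error if it already exists, so this assumes dest is not there yet.

```python
import shutil


def backup_tree(src, dest):
    """Recursively copy src to dest using the standard library."""
    # copytree creates dest (and any missing parents) itself; by default
    # it raises an error if dest already exists.
    shutil.copytree(src, dest)
```

With the directories from the question this would be called as backup_tree("d:/dev", "d:/dev_bak").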