How can I run a python script on many files to get

Posted 2019-08-15 08:45

Question:

I am new to programming and have written a script to extract text from a vcf file. I am running Ubuntu in a Linux virtual machine. I run the script from the command line by changing to the directory that contains the vcf file and then entering python script.py.

My script knows which file to process because the beginning of my script is:

my_file = open("inputfile1.vcf", "r+")
outputfile = open("outputfile.txt", "w")

The script puts the information I need into a list and then I write it to outputfile. However, I have many input files (all .vcf) and want to write them to different output files with a similar name to the input (such as input_processed.txt).

Do I need to run a shell script to iterate over the files in the folder? If so, how would I change the Python script to accommodate this, i.e. writing the list to a separate output file for each input?

Answer 1:

I would integrate it within the Python script, which will allow you to easily run it on other platforms too and doesn't add much code anyway.

import glob
import os

# Find all files ending in '.vcf' in the current directory
for vcf_filename in glob.glob('*.vcf'):
    # Similar name with a different extension
    output_filename = os.path.splitext(vcf_filename)[0] + '.txt'

    # 'with' closes both files, even if the processing raises an error
    with open(vcf_filename, 'r') as vcf_file, open(output_filename, 'w') as outputfile:
        # Process the data
        ...

To output the resulting files in a separate directory I would:

import glob
import os

output_dir = 'processed'
os.makedirs(output_dir, exist_ok=True)

# Find all files ending in '.vcf' in the current directory
for vcf_filename in glob.glob('*.vcf'):
    # Similar name with a different extension, placed in output_dir
    output_filename = os.path.splitext(vcf_filename)[0] + '.txt'
    output_path = os.path.join(output_dir, output_filename)

    # 'with' closes both files, even if the processing raises an error
    with open(vcf_filename, 'r') as vcf_file, open(output_path, 'w') as outputfile:
        # Process the data
        ...
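
The question asks for output names like input_processed.txt, so here is a rough sketch of how the loop above might be wrapped in a function that builds such names. The function names extract_records and process_all are hypothetical, and the extraction step (keeping every line that does not start with '#') is only a placeholder for whatever the real script collects into its list.

```python
import glob
import os


def extract_records(vcf_file):
    """Placeholder for the real extraction: keep lines that are not headers."""
    # Assumption: header and meta lines in a VCF file start with '#'
    return [line for line in vcf_file if not line.startswith("#")]


def process_all(input_dir=".", output_dir="processed"):
    """Write one <name>_processed.txt per .vcf file found in input_dir."""
    os.makedirs(output_dir, exist_ok=True)
    written = []
    for vcf_path in sorted(glob.glob(os.path.join(input_dir, "*.vcf"))):
        # 'sample.vcf' -> 'sample_processed.txt'
        base = os.path.splitext(os.path.basename(vcf_path))[0]
        output_path = os.path.join(output_dir, base + "_processed.txt")
        with open(vcf_path, "r") as vcf_file, open(output_path, "w") as outputfile:
            outputfile.writelines(extract_records(vcf_file))
        written.append(output_path)
    return written
```

Calling process_all() with no arguments would process the .vcf files in the current directory and return the list of paths it wrote.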


Answer 2:

You don't need to write a shell script. Maybe this question will help you:

How to list all files of a directory?



Answer 3:

It depends on where you want to implement the iteration logic.

  1. If you want to implement it in Python, do the looping inside the script itself;

  2. If you want to implement it in a shell script, change your Python script to accept the file name as a parameter, and then have the shell script call the Python script once per file with the appropriate parameter.
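
The second option could look roughly like the sketch below, which reads the file names from sys.argv. The function names process_file and main are hypothetical, and the filtering step (dropping lines that start with '#') is only a stand-in for the real extraction.

```python
import os
import sys


def process_file(vcf_filename):
    """Copy the non-header lines of one file to <name>_processed.txt.

    The filtering is a placeholder assumption; put the real extraction here.
    """
    output_filename = os.path.splitext(vcf_filename)[0] + "_processed.txt"
    with open(vcf_filename, "r") as vcf_file, open(output_filename, "w") as outputfile:
        for line in vcf_file:
            # Header and meta lines in a VCF file start with '#'
            if not line.startswith("#"):
                outputfile.write(line)
    return output_filename


def main(filenames):
    # Process every file name given on the command line
    for vcf_filename in filenames:
        print(vcf_filename, "->", process_file(vcf_filename))


if __name__ == "__main__":
    # sys.argv[0] is the script name; the rest are the arguments
    main(sys.argv[1:])
```

With this in place, entering python script.py *.vcf lets the shell expand the pattern and pass every matching file in one call, while a shell loop could instead call the script once per file.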



Answer 4:

I have a script I frequently use that uses PyQt5 to pop up a window prompting the user to select a file. It then lists the directory containing that file to find all of the files in it:

# Work out the directory by slicing first_fname up to and including the last '/'
pathname = first_fname[:(first_fname.rfind('/') + 1)]
# New files go in a separate directory (their names will be altered too)
new_pathname = pathname + 'for release/'

# Keep .xls files whose names lack the keywords that mark files I don't want
file_list = [f for f in os.listdir(pathname) if f.lower().endswith('.xls') and 'map' not in f.lower() and 'check' not in f.lower()]

You need to import os to use the os.listdir command.
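
Slicing on the last '/' assumes forward-slash paths. As a possible variation, the same steps can be sketched with os.path.dirname and os.path.join, which follow the separator rules of the current platform. The function name list_spreadsheets and its skip_words parameter are hypothetical, introduced only for this illustration.

```python
import os


def list_spreadsheets(first_fname, skip_words=("map", "check")):
    """Return (directory, output directory, matching .xls names) for a chosen file."""
    # Directory containing the selected file ('' means the current directory)
    pathname = os.path.dirname(first_fname) or "."
    # Sibling directory for the new files
    new_pathname = os.path.join(pathname, "for release")
    file_list = [
        f for f in sorted(os.listdir(pathname))
        if f.lower().endswith(".xls")
        and not any(word in f.lower() for word in skip_words)
    ]
    return pathname, new_pathname, file_list
```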



Answer 5:

You can use listdir (you need to write a condition to filter for the particular extension) or glob. I generally prefer glob. For example:

import glob
import os

for file in glob.glob('*.py'):
    output_name = os.path.splitext(file)[0]
    # 'with' closes both files once the copy is done
    with open(file, 'r') as data, open(output_name + '.txt', 'w') as output:
        output.write(data.read())

This code reads the content of each input file and writes it to an output file with the same base name and a .txt extension.
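
For comparison, the listdir alternative mentioned above might look like the following sketch, where the extension filter has to be written by hand. The function name find_by_extension is hypothetical.

```python
import os


def find_by_extension(directory, extension):
    """Return the names in directory that end with extension, e.g. '.vcf'."""
    return sorted(
        name for name in os.listdir(directory)
        if name.endswith(extension)
        # Skip sub-directories that happen to carry the same suffix
        and os.path.isfile(os.path.join(directory, name))
    )
```

One difference to keep in mind: os.listdir returns bare file names, so they need os.path.join(directory, name) before being opened, whereas glob.glob returns each match with the directory part of the pattern already attached.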