Python - Convert negative decimals from string to

Posted 2019-08-06 06:58

Question:

I need to read in a large number of .txt files, each of which contains a decimal (some are positive, some are negative), and append these to 2 arrays (genotypes and phenotypes). I then want to perform some mathematical operations on these arrays in scipy, but the negative sign ('-') seems to be causing problems. Specifically, I cannot convert the arrays to float, apparently because the '-' is being read as a string, which causes the following error:

ValueError: could not convert string to float:

Here is my code as it's currently written:

import linecache

gene_array=[]
phen_array=[]

for i in genotype:

   for j in phenotype:

      genotype='/path/g.txt'
      phenotype='/path/p.txt'

      g=linecache.getline(genotype,1)
      p=linecache.getline(phenotype,1)

      p=p.strip()
      g=g.strip()

      gene_array.append(g)
      phen_array.append(p)

gene_array=map(float,gene_array)
phen_array=map(float,phen_array)

I am fairly certain at this point that the negative sign is causing the problem, but it is not clear to me why. Is my use of linecache the problem here? Is there an alternative method that would be better?

The result of

print gene_array

is

['-0.0448022516321286', '-0.0236187263814157', '-0.150505384829925', '-0.00338459268479522', '0.0142429109897682', '0.0286253352284279', '-0.0462358095345649', '0.0286232317578776', '-0.00747425206137217', '0.0231790239373428', '-0.00266935581919541', '0.00825077426011094', '0.0272744527203547', '0.0394829854063242', '0.0233109171715023', '0.165841084392078', '0.00259693465334536', '-0.0342590874424289', '0.0124600520095644', '0.0713627590092807', '-0.0189374898081401', '-0.00112750710611284', '-0.0161387333242288', '0.0227226505624106', '0.0382173405035751', '0.0455518646388402', '-0.0453048799717046', '0.0168570746329513']

Answer 1:

The issue seems to be an empty or whitespace-only string, as the error message suggests:

ValueError: could not convert string to float:

To make it work, replace the map call with a list comprehension that skips empty strings:

gene_array=[float(e) for e in gene_array if e]
phen_array=[float(e) for e in phen_array if e]

To clarify what "empty string" means here: float(" ") and float("") both raise a ValueError, so if any item in gene_array or phen_array is empty or contains only whitespace, the conversion to float fails.

There could be several reasons for an empty string, such as:

  • an empty or blank line
  • a blank line at either the beginning or the end of the file
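A minimal sketch of this failure mode, written for Python 3. The helper name to_floats and the sample values are illustrative assumptions, not part of the asker's code. It shows that a negative string converts without trouble, that an empty string raises ValueError, and that stripping each entry before filtering also handles whitespace-only entries. In Python 2, which the question appears to use, the message should print the offending string without quotes, which would explain why nothing follows the colon in the reported error.

```python
def to_floats(values):
    """Convert strings to floats, skipping empty or whitespace-only entries."""
    result = []
    for value in values:
        text = value.strip()
        if text:  # '' is falsy, so blank entries are skipped
            result.append(float(text))
    return result


# Hypothetical sample: two valid numbers, one empty string, one blank line.
sample = ['-0.0448022516321286', '', '0.0142429109897682', ' \n']

# The negative sign is not a problem for float().
negative = float('-0.0448022516321286')

# An empty string is what actually raises the error.
try:
    float('')
    empty_error = None
except ValueError as exc:
    empty_error = str(exc)

converted = to_floats(sample)
print(converted)
print(empty_error)
```

Note that silently skipping entries shortens the list, so if gene_array and phen_array must stay aligned pair by pair, it may be safer to report the offending file than to drop its value.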


Answer 2:

The issue is definitely not the negative sign. Python converts strings with a negative sign to float without a problem. I suggest you check each of your entries against a regular expression for floats and see whether they all pass.
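One way to run such a check is sketched below in Python 3. The pattern FLOAT_RE and the helper find_bad_entries are assumptions made for illustration; the pattern covers plain signed decimals with an optional exponent and deliberately ignores other inputs that float() accepts, such as 'inf' or 'nan'.

```python
import re

# Illustrative pattern: optional sign, digits with an optional fractional
# part (or a leading-dot fraction), and an optional exponent.
FLOAT_RE = re.compile(r'[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?')


def find_bad_entries(values):
    """Return (index, value) pairs for entries that do not look like floats."""
    return [(i, v) for i, v in enumerate(values) if not FLOAT_RE.fullmatch(v)]


# Hypothetical sample with one empty string hidden among valid entries.
sample = ['-0.0448022516321286', '0.165841084392078', '', '-0.00112750710611284']
bad = find_bad_entries(sample)
print(bad)
```

Printing the index alongside the value makes it easier to trace a bad entry back to the file it came from.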



Answer 3:

There is nothing in the error message to suggest that '-' is the problem. The most likely reason is that gene_array and/or phen_array contains an empty string ('').

As stated in the documentation, linecache.getline() will return '' on errors (the terminating newline character will be included for lines that are found).
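The behaviour can be observed with a short Python 3 sketch. The temporary file and the missing path are hypothetical stand-ins for the asker's data files, created only for this demonstration.

```python
import linecache
import os
import tempfile

# Write a throwaway file with a single negative value on line 1.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as handle:
    handle.write('-0.0448022516321286\n')
    existing_path = handle.name

missing_path = existing_path + '.does_not_exist'  # hypothetical missing file

found = linecache.getline(existing_path, 1)     # the line, newline included
past_end = linecache.getline(existing_path, 2)  # '' because there is no line 2
not_found = linecache.getline(missing_path, 1)  # '' because the file is absent

print(repr(found), repr(past_end), repr(not_found))

# Clean up the temporary file and the linecache entries.
os.remove(existing_path)
linecache.clearcache()
```

Because a wrong path or an empty file yields '' rather than an exception, checking for an empty result before appending to gene_array or phen_array would point directly at the file that causes the failure.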