I have a small data set to clean. I have opened the text file in PyCharm. The data set looks like this:
Code-6667+
Name of xyz company+
Address +
Number+
Contact person+
Code-6668+
Name of abc company, Address, number, contact person+
Code-6669+
name of company, Address+
number, contact person +
I need to separate the code lines and concatenate (or paste) the remaining lines together until the next code line appears. That way I can split the data into two fields: the company code, and all the details combined into one field. The eventual output is a table, something like this:
Code6667 - Company details
Code6668 - Company details
Is there a way I could use a loop to do this? I tried this in R but am now attempting it in Python.
I don't know what the `+` signs mean in your example. If they are part of the file, you'll want to deal with them as well. Here is a way to extract the data (with regex) into a dictionary, with the code as key and the info as a list. Afterwards you can format it however you want.
This assumes that entries on the same line are separated by `,`, but it can be adapted for anything else. It also relies on the fact that, in your example, every code is on a new line and has no info after it.
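A minimal sketch of the dictionary approach described above, not necessarily the exact code this answer used. The names `CODE_RE`, `parse_records` and `sample` are hypothetical, and the sketch assumes the trailing `+` is not wanted and that a code line looks exactly like `Code-` followed by digits.

```python
import re

# Assumption: a code line is "Code-" followed only by digits.
CODE_RE = re.compile(r"^Code-\d+$")

def parse_records(lines):
    """Map each code line to the list of detail entries that follow it."""
    records = {}
    current = None
    for raw in lines:
        # Drop surrounding whitespace and the trailing "+" marker.
        line = raw.strip().rstrip("+").strip()
        if not line:
            continue
        if CODE_RE.match(line):
            current = line
            records[current] = []
        elif current is not None:
            # Entries sharing a line are assumed to be comma-separated.
            records[current].extend(
                part.strip() for part in line.split(",") if part.strip()
            )
    return records

sample = """Code-6667+
Name of xyz company+
Address +
Number+
Contact person+
Code-6668+
Name of abc company, Address, number, contact person+
Code-6669+
name of company, Address+
number, contact person +""".splitlines()

records = parse_records(sample)
for code, info in records.items():
    print(code, info)
```

Because the values are plain lists, each record can be formatted later with something like `", ".join(info)`.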
(Note: I'm not quite sure whether you want to keep the `+` sign. The following code assumes you do. Otherwise, it's easy to get rid of the `+` with a bit of string manipulation.)
Input file
Here is the input file, `dat1.txt`:
Code
Here is the code, `mycode.py` (comment/uncomment the `print` block for the Python 2.x/3.x version):
Output
Here is what you will see in the console (via `print()`) or in `dat2.txt` (via `f2.write()`).
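A rough sketch of what a script along these lines might look like, written for Python 3 only. The function name `merge_records` is hypothetical, and the tab between the code and the details is an assumption. Only the file names `dat1.txt` and `dat2.txt` and the handle name `f2` come from the answer. The `+` signs are kept, as the note above says.

```python
def merge_records(in_path, out_path):
    """Write one line per company: the code line, a tab, then its details."""
    rows = []
    with open(in_path) as f1:
        for raw in f1:
            line = raw.strip()
            if not line:
                continue
            if line.startswith("Code-"):
                rows.append([line, []])     # start a new record
            elif rows:
                rows[-1][1].append(line)    # attach to the latest code line
    with open(out_path, "w") as f2:
        for code, details in rows:
            # Assumption: a tab separates the two fields.
            out_line = code + "\t" + " ".join(details)
            print(out_line)
            f2.write(out_line + "\n")
    return rows
```

Calling `merge_records("dat1.txt", "dat2.txt")` would print each merged line and write the same lines to `dat2.txt`.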
Your question wasn't really clear, but the following snippet prints one line for each company, starting with "CodeXXXX - " and followed by the other details.
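A minimal sketch of such a loop, assuming the trailing `+` should be dropped and the detail lines joined with a comma. The names `format_companies` and `sample` are hypothetical, and the sample is a shortened version of the data in the question.

```python
def format_companies(lines):
    """Return one 'CodeXXXX - details' string per company."""
    output = []
    code, details = None, []
    for raw in lines:
        line = raw.strip().rstrip("+").strip()
        if not line:
            continue
        if line.startswith("Code-"):
            # A new code line closes the previous record, if there is one.
            if code is not None:
                output.append(code + " - " + ", ".join(details))
            code, details = line.replace("-", "", 1), []
        elif code is not None:
            details.append(line)
    # Flush the last record once the input is exhausted.
    if code is not None:
        output.append(code + " - " + ", ".join(details))
    return output

sample = [
    "Code-6667+",
    "Name of xyz company+",
    "Address +",
    "Code-6668+",
    "Name of abc company, Address+",
]
for row in format_companies(sample):
    print(row)
# Code6667 - Name of xyz company, Address
# Code6668 - Name of abc company, Address
```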
Output of your example code: