I have two dataframes that I read in via csv. Dataframe one consists of a phone number and some additional data. The second dataframe contains country codes and country names.
I want to take each phone number from the first dataframe and compare it to the country codes in the second. Country codes can be between one and four digits long, so I check from the longest country code to the shortest. If there is a match, I want to assign the country name to the phone number.
Input longlist:
phonenumber, add_info
34123425209, info1
92654321762, info2
12018883637, info3
6323450001, info4
496789521134, info5
Input country_list:
country;country_code;order_info
Spain;34;1
Pakistan;92;4
USA;1;2
Philippines;63;3
Germany;49;4
Poland;48;1
Norway;47;2
Output should be:
phonenumber, add_info, country, order_info
34123425209, info1, Spain, 1
92654321762, info2, Pakistan, 4
12018883637, info3, USA, 2
6323450001, info4, Philippines, 3
496789521134, info5, Germany, 4
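To make the matching rule concrete, here is a minimal sketch for a single phone number in plain Python. The function name `match_country` and the `codes` dictionary are hypothetical and only illustrate the longest-to-shortest prefix check described above, using some of the sample rows.

```python
def match_country(phonenumber, codes):
    """Return the country whose code is the longest matching prefix, or None."""
    # Try the longest possible prefix first (4 digits), then shorter ones.
    for length in range(4, 0, -1):
        prefix = phonenumber[:length]
        if prefix in codes:
            return codes[prefix]
    return None


# Hypothetical lookup table built from part of the sample country_list.
codes = {"34": "Spain", "92": "Pakistan", "1": "USA", "63": "Philippines", "49": "Germany"}

print(match_country("34123425209", codes))  # Spain
print(match_country("12018883637", codes))  # USA
```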
I have already solved it once with the csv module, like this:
#! /usr/bin/python
import csv
import pandas
with open ('longlist.csv','r') as lookuplist:
    with open ('country_list.csv','r') as inputlist:
        with open('Outputfile.csv', 'w') as outputlist:
            reader = csv.reader(lookuplist, delimiter=',')
            reader2 = csv.reader(inputlist, delimiter=';')
            writer = csv.writer(outputlist, dialect='excel')
            for i in reader2:
                for xl in reader:
                    if xl[0].startswith(i[1]):
                        zeile = [xl[0], xl[1], i[0], i[1], i[2]]
                        writer.writerow(zeile)
                lookuplist.seek(0)
But I would like to solve this problem using pandas. What I got to work:
- Read in the csv files
- Remove duplicates from "longlist"
- Sort the list of countries by country code
This is what I have working so far:
import pandas as pd, numpy as np
longlist = pd.read_csv('path/to/longlist.csv',
    usecols=[2,3], names=['PHONENUMBER','ADD_INFO'])
country_list = pd.read_csv('path/to/country_list.csv',
    sep=';', names=['COUNTRY','COUNTRY_CODE','ORDER_INFO'], skiprows=[0])
# remove duplicates and make phone number an index
longlist = longlist.drop_duplicates('PHONENUMBER')
longlist = longlist.set_index('PHONENUMBER')
# Sort country list, from high to low value and make country code an index
country_list = country_list.sort_values(by='COUNTRY_CODE', ascending=False)
country_list = country_list.set_index('COUNTRY_CODE')
(...)
longlist.to_csv('path/to/output.csv')
But every attempt to do the same matching with the dataframes fails. I cannot apply startswith: I cannot iterate through the objects, and I cannot apply it to them directly. I would really appreciate your help.
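One possible direction, as a minimal sketch and not a tested solution for the real files: instead of calling startswith row by row, slice the first four, three, two and one digits of every phone number with `.str[:n]`, keep the longest slice that is a known country code, and then merge on that code. The sketch assumes both columns are handled as strings and that the columns are ordinary columns, not the index. The dataframes are built inline from the sample data; the names `known_codes`, `matched`, `candidate` and `result` are made up for illustration.

```python
import pandas as pd

# Sample data copied from the inputs above; real code would use pd.read_csv.
longlist = pd.DataFrame({
    "PHONENUMBER": ["34123425209", "92654321762", "12018883637",
                    "6323450001", "496789521134"],
    "ADD_INFO": ["info1", "info2", "info3", "info4", "info5"],
})
country_list = pd.DataFrame({
    "COUNTRY": ["Spain", "Pakistan", "USA", "Philippines",
                "Germany", "Poland", "Norway"],
    "COUNTRY_CODE": ["34", "92", "1", "63", "49", "48", "47"],
    "ORDER_INFO": [1, 4, 2, 3, 4, 1, 2],
})

# Prefix comparison needs strings (read_csv would likely parse these as integers).
longlist["PHONENUMBER"] = longlist["PHONENUMBER"].astype(str)
country_list["COUNTRY_CODE"] = country_list["COUNTRY_CODE"].astype(str)

known_codes = set(country_list["COUNTRY_CODE"])

# Go from the longest prefix (4 digits) to the shortest (1 digit).
matched = None
for length in range(4, 0, -1):
    prefix = longlist["PHONENUMBER"].str[:length]
    # Keep the prefix only where it is a known country code, else NaN.
    candidate = prefix.where(prefix.isin(known_codes))
    if matched is None:
        matched = candidate
    else:
        # Longer matches found earlier win; only fill rows still unmatched.
        matched = matched.where(matched.notna(), candidate)

longlist["COUNTRY_CODE"] = matched

# A left merge keeps every phone number, even if no country code matched.
result = longlist.merge(country_list, on="COUNTRY_CODE", how="left")
print(result[["PHONENUMBER", "ADD_INFO", "COUNTRY", "ORDER_INFO"]])
```

Phone numbers without any matching code would end up with missing values in the country columns, which makes them easy to spot afterwards.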