I have combined two DataFrames that share some columns, but each also has columns the other lacks. I would like to apply Singular Value Decomposition (SVD) to the combined DataFrame. However, filling the NaN values will affect the results; even filling them with zeros would be wrong in my case, because some columns legitimately contain zero values. Here's an example. Is there any way to address this issue?
>>> df1 = pd.DataFrame(np.random.rand(6, 4), columns=['A', 'B', 'C', 'D'])
>>> df1
          A         B         C         D
0  0.763144  0.752176  0.601228  0.290276
1  0.632144  0.202513  0.111766  0.317838
2  0.494587  0.318276  0.951354  0.051253
3  0.184826  0.429469  0.280297  0.014895
4  0.236955  0.560095  0.357246  0.302688
5  0.729145  0.293810  0.525223  0.744513
>>> df2 = pd.DataFrame(np.random.rand(6, 4), columns=['A', 'B', 'C', 'E'])
>>> df2
          A         B         C         E
0  0.969758  0.650887  0.821926  0.884600
1  0.657851  0.158992  0.731678  0.841507
2  0.923716  0.524547  0.783581  0.268123
3  0.935014  0.219135  0.152794  0.433324
4  0.327104  0.581433  0.474131  0.521481
5  0.366469  0.709115  0.462106  0.416601
>>> df3 = pd.concat([df1,df2], axis=0)
>>> df3
          A         B         C         D         E
0  0.763144  0.752176  0.601228  0.290276       NaN
1  0.632144  0.202513  0.111766  0.317838       NaN
2  0.494587  0.318276  0.951354  0.051253       NaN
3  0.184826  0.429469  0.280297  0.014895       NaN
4  0.236955  0.560095  0.357246  0.302688       NaN
5  0.729145  0.293810  0.525223  0.744513       NaN
0  0.969758  0.650887  0.821926       NaN  0.884600
1  0.657851  0.158992  0.731678       NaN  0.841507
2  0.923716  0.524547  0.783581       NaN  0.268123
3  0.935014  0.219135  0.152794       NaN  0.433324
4  0.327104  0.581433  0.474131       NaN  0.521481
5  0.366469  0.709115  0.462106       NaN  0.416601
>>> U, s, V = np.linalg.svd(df3.values, full_matrices=True)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/numpy-1.11.0b3-py3.4-macosx-10.6-intel.egg/numpy/linalg/linalg.py", line 1359, in svd
    u, s, vt = gufunc(a, signature=signature, extobj=extobj)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/numpy-1.11.0b3-py3.4-macosx-10.6-intel.egg/numpy/linalg/linalg.py", line 99, in _raise_linalgerror_svd_nonconvergence
    raise LinAlgError("SVD did not converge")
numpy.linalg.linalg.LinAlgError: SVD did not converge
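The error is triggered by the NaN cells that `pd.concat` creates for columns present in only one frame (`D` and `E`); as the traceback shows, the decomposition fails on that input. A minimal sketch, assuming the same setup as above (regenerated with a fixed seed and `ignore_index=True` so the row index has no duplicates), that confirms where the NaNs are and shows that SVD runs fine on the shared columns `A`, `B`, `C`. This is only a baseline, since it discards `D` and `E` entirely.

```python
import numpy as np
import pandas as pd

# Recreate the two frames from the question (random data; seed fixed for repeatability).
rng = np.random.default_rng(0)
df1 = pd.DataFrame(rng.random((6, 4)), columns=['A', 'B', 'C', 'D'])
df2 = pd.DataFrame(rng.random((6, 4)), columns=['A', 'B', 'C', 'E'])
df3 = pd.concat([df1, df2], axis=0, ignore_index=True)

# Boolean mask of missing cells: every NaN comes from a column only one frame has.
missing = df3.isna()
print(missing.sum())  # per-column NaN counts

# Columns present in both frames contain no NaN, so SVD works on them directly.
common = df1.columns.intersection(df2.columns)
U, s, Vt = np.linalg.svd(df3[common].to_numpy(), full_matrices=False)
print(common.tolist(), s.shape)
```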
Note: I can't use interpolation, because I want to preserve the fact that some records lack information for certain columns while other records have it.
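If the goal is to keep every column while still tracking which cells were originally missing, one commonly used option is iterative SVD imputation (an EM-style low-rank completion): start the missing cells at their column means, then repeatedly replace only those cells with a rank-k SVD reconstruction, leaving every observed value, including genuine zeros, untouched. This still produces estimates for the missing cells, so it does not avoid imputation entirely, but the estimates come from the low-rank structure of the observed data rather than a fixed constant, and the returned mask records which cells were missing. This is a sketch under assumptions: the helper `iterative_svd`, its `rank=2` setting, the iteration limit and the tolerance are hypothetical choices, and every column must have at least one observed value.

```python
import numpy as np
import pandas as pd

def iterative_svd(X, rank=2, n_iter=100, tol=1e-6):
    """Hypothetical helper: SVD of a matrix with NaNs, fitted to the observed cells.

    Missing cells start at their column mean and are repeatedly replaced by the
    rank-`rank` reconstruction; observed cells (including real zeros) never change.
    Returns U, s, Vt of the completed matrix plus the boolean mask of missing cells.
    """
    X = np.asarray(X, dtype=float)
    mask = np.isnan(X)  # True where the value was originally missing
    # Starting guess: column means computed from observed values only.
    filled = np.where(mask, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Update only the missing cells; observed cells keep their original values.
        new_filled = np.where(mask, low_rank, X)
        change = np.linalg.norm(new_filled - filled) / max(np.linalg.norm(filled), 1e-12)
        filled = new_filled
        if change < tol:
            break
    U, s, Vt = np.linalg.svd(filled, full_matrices=False)
    return U, s, Vt, mask

# Same shape of data as in the question (random values, fixed seed).
rng = np.random.default_rng(0)
df1 = pd.DataFrame(rng.random((6, 4)), columns=['A', 'B', 'C', 'D'])
df2 = pd.DataFrame(rng.random((6, 4)), columns=['A', 'B', 'C', 'E'])
df3 = pd.concat([df1, df2], axis=0, ignore_index=True)

U, s, Vt, mask = iterative_svd(df3.to_numpy(), rank=2)
print(s)                 # singular values of the completed 12 x 5 matrix
print(mask.sum(axis=0))  # missing-cell counts per column stay recorded
```

`rank` must be smaller than the number of columns: at full rank the reconstruction equals the current matrix exactly, so the missing cells would simply stay at their column means.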