I usually use DataFrame.merge to combine dataframes in pandas. From my understanding, this only works for equality joins. What is the idiomatic way to join two dataframes using other kinds of conditions (e.g. inequality)?
merge() is fairly limited. You can accomplish more complex joins using pandasql.sqldf. You can write pretty much any SQL query and refer to your existing dataframes as table names in the SQL statements.
https://github.com/yhat/pandasql/
A known bug is the inability to select from multiple tables in product joins. However, explicit joins work without any problem, and a product-join statement of that kind can be translated into a join.
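As a rough illustration of the SQL route, here is a minimal sketch of an inequality (range) join written as an explicit JOIN ... ON. It deliberately uses the standard-library sqlite3 module together with pandas instead of pandasql itself, so it carries no extra dependency; pandasql runs on SQLite by default, so the same query string should be usable with sqldf, but that is an assumption you should verify. The frames events and bins and the columns ts, lo, hi and label are made up for the example.

```python
import sqlite3

import pandas as pd

# Hypothetical data: events with a timestamp, and half-open bins [lo, hi).
events = pd.DataFrame({"event_id": [1, 2, 3], "ts": [5, 12, 25]})
bins = pd.DataFrame({"label": ["a", "b"], "lo": [0, 10], "hi": [10, 20]})

# Copy both frames into an in-memory SQLite database as tables.
conn = sqlite3.connect(":memory:")
events.to_sql("events", conn, index=False)
bins.to_sql("bins", conn, index=False)

# Inequality join: match each event to the bin whose range contains ts.
# Written as an explicit JOIN rather than a comma-separated product.
query = """
SELECT e.event_id, e.ts, b.label
FROM events AS e
JOIN bins AS b
  ON e.ts >= b.lo AND e.ts < b.hi
ORDER BY e.event_id
"""
sql_result = pd.read_sql_query(query, conn)
conn.close()

print(sql_result)
```

Event 3 (ts = 25) falls into no bin and is dropped, as with any inner join. Changing JOIN to LEFT JOIN would keep it with a missing label.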
Pandas merge() allows for outer, left, and right joins (not just inner joins) between two data frames, so you can return unmatched records. Additionally, merge() can even be generalized to return a cross join (all combinations of rows between two data frames), and with filtering afterwards you can return unmatched records. Still more, there is the isin() pandas method.
Consider the following demonstration. Below are two data frames of something we have come to enjoy, computer languages. As seen, the first data frame is a subset of the second data frame. An outer join returns the records of both, with NaN in the columns of unmatched rows, which can later be filtered out. A cross join returns full, complete rows, which can then be filtered, and isin() searches for values within columns.
Admittedly, the cross join may be redundant and verbose here, but should your unmatched needs require permutations across data frames, it can be handy.
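The three approaches described above could be sketched roughly as follows. The data frames df1 and df2, their values, and every variable name are hypothetical stand-ins, not the original demonstration. The cross join is built with a constant helper column (key), which works on older pandas versions; on pandas 1.2 or later, merge(..., how="cross") should do the same thing. The last step filters the cross join with an inequality, which is one way to answer the original question using pandas alone.

```python
import pandas as pd

# Hypothetical data: df1 is a subset of df2.
df2 = pd.DataFrame({
    "language": ["Python", "R", "Java", "C++", "Go"],
    "year": [1991, 1993, 1995, 1985, 2009],
})
df1 = df2[df2["language"].isin(["Python", "R", "Java"])].reset_index(drop=True)

# 1) Outer join: rows found only in df2 get NaN in df1's columns.
outer = df1.merge(df2, on="language", how="outer", suffixes=("_1", "_2"))
unmatched_outer = outer[outer["year_1"].isna()]

# 2) Cross join via a constant key: every df1 row paired with every df2 row.
cross = (
    df1.assign(key=1)
    .merge(df2.assign(key=1), on="key", suffixes=("_1", "_2"))
    .drop(columns="key")
)
# A df2 language is unmatched if none of its pairings has an equal df1 language.
same = cross.assign(same=cross["language_1"] == cross["language_2"])
has_match = same.groupby("language_2")["same"].any()
unmatched_cross = df2[~df2["language"].map(has_match)]

# 3) isin(): direct membership test on one column.
unmatched_isin = df2[~df2["language"].isin(df1["language"])]

# Inequality join: pairs where the df2 language is older than the df1 language.
older_pairs = cross[cross["year_2"] < cross["year_1"]]

print(unmatched_isin)
print(older_pairs)
```

The cross join materializes len(df1) * len(df2) rows before filtering, so it is only practical when both frames are reasonably small.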