How to do intersection match between 2 DataFrames in Pandas?


Question:
Assume there are 2 DataFrames, A and B, like the following:

A:
a A
b B
c C

B:
1 2
3 4

How can I produce a DataFrame C like this:
a A 1 2
a A 3 4
b B 1 2
b B 3 4
c C 1 2
c C 3 4

Is there a function in Pandas that can do this operation?


Answer:
First, all values have to be unique in each DataFrame.

I think you need product:

import pandas as pd
from itertools import product

A = pd.DataFrame({'a': list('abc')})
B = pd.DataFrame({'a': [1, 2]})

# every pairing of a value from A with a value from B
C = pd.DataFrame(list(product(A['a'], B['a'])))
print(C)
   0  1
0  a  1
1  a  2
2  b  1
3  b  2
4  c  1
5  c  2

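Pure Pandas solutions with MultiIndex.from_product:

mux = pd.MultiIndex.from_product([A['a'], B['a']])

C = pd.DataFrame(mux.values.tolist())
print(C)
   0  1
0  a  1
1  a  2
2  b  1
3  b  2
4  c  1
5  c  2

# name= sets the column labels explicitly; in recent pandas versions both
# levels would otherwise inherit the name 'a' and to_frame would raise
C = mux.to_frame(index=False, name=['a', 'b'])
print(C)
   a  b
0  a  1
1  a  2
2  b  1
3  b  2
4  c  1
5  c  2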

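Solution with a cross join, using merge on a helper column filled with the same scalar by assign:

# drop(columns=...) replaces the positional axis argument, which pandas 2.0 no longer accepts
df = pd.merge(A.assign(tmp=1), B.assign(tmp=1), on='tmp').drop(columns='tmp')
df.columns = ['a', 'b']
print(df)
   a  b
0  a  1
1  a  2
2  b  1
3  b  2
4  c  1
5  c  2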

EDIT:

A = pd.DataFrame({'a': list('abc'), 'b': list('ABC')})
B = pd.DataFrame({'a': [1, 3], 'c': [2, 4]})

print(A)
   a  b
0  a  A
1  b  B
2  c  C

print(B)
   a  c
0  1  2
1  3  4

# drop(columns=...) replaces the positional axis argument, which pandas 2.0 no longer accepts
C = pd.merge(A.assign(tmp=1), B.assign(tmp=1), on='tmp').drop(columns='tmp')
C.columns = list('abcd')
print(C)
   a  b  c  d
0  a  A  1  2
1  a  A  3  4
2  b  B  1  2
3  b  B  3  4
4  c  C  1  2
5  c  C  3  4
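
In newer pandas versions (1.2 and later), merge also accepts how='cross', so the helper tmp column should not be needed. A minimal sketch, assuming such a version is installed; the column names 'c' and 'd' in B are my own choice here, picked so the two frames share no column names and merge adds no suffixes:

```python
import pandas as pd

A = pd.DataFrame({'a': list('abc'), 'b': list('ABC')})
# column names 'c' and 'd' are an assumption chosen to avoid overlap with A
B = pd.DataFrame({'c': [1, 3], 'd': [2, 4]})

# how='cross' pairs every row of A with every row of B (requires pandas >= 1.2)
C = A.merge(B, how='cross')
print(C)
```

The result should have len(A) * len(B) rows, with the rows of A in the outer loop and the rows of B in the inner loop, the same order as the tmp-column merge above.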