Join in R: How to join (merge) data frames (inner, outer, left, right) in R - DataScience Made Simple (2024)

We can merge two data frames in R by using the merge() function or by using family of join() function in dplyr package. The data frames must have same column names on which the merging happens. Merge() Function in R is similar to database join operation in SQL. The different arguments to merge() allow you to perform natural joins i.e. inner join, left join, right join,cross join, semi join, anti join and full outer join. We can perform Join in R using merge() Function or by using family of join() functions in dplyr package.

We will have look at an example of

  • Inner join using merge() function in R or inner_join() function of dplyr with example
  • Outer join using merge() function or full_join() function of dplyr with example
  • Left join using left_join() function of dplyr or merge() function
  • Right join using right_join() function of dplyr or merge() function.
  • Cross join with merge() function
  • semi join and anti join in R using semi_join() function and anti_join() function.

Syntax of merge() function in R

merge(x, y, by.x, by.y,all.x,all.y, sort = TRUE)

  • x:data frame1.
  • y:data frame2.
  • by,x,by.y:The names of the columns that are common to bothxandy. The default is to use the columns with common names between the two data frames.
  • all, all.x, all.y:Logical values that specify the type of merge. The default value isall=FALSE(meaning that only the matching rows are returned).

UNDERSTANDING THE DIFFERENT TYPES OF MERGE IN R:

  • Natural join or Inner Join:To keep only rows that match from the data frames, specify the argumentall=FALSE.
  • Full outer join or Outer Join:To keep all rows from both data frames, specifyall=TRUE.
  • Left outer join or Left Join:To include all the rows of your data framexand only those fromythat match, specifyx=TRUE.
  • Right outer join or Right Join:To include all the rows of your data frameyand only those fromxthat match, specifyy=TRUE.

Join in R: How to join (merge) data frames (inner, outer, left, right) in R - DataScience Made Simple (1)

Lets look at with some examples

# data frame 1df1 = data.frame(CustomerId = c(1:6), Product = c("Oven","Television","Mobile","WashingMachine","Lightings","Ipad"))df1 # data frame 2df2 = data.frame(CustomerId = c(2, 4, 6, 7, 8), State = c("California","Newyork","Santiago","Texas","Indiana")) df2 

so we will get following two data frames

df1 will be
Join in R: How to join (merge) data frames (inner, outer, left, right) in R - DataScience Made Simple (2)

df2 will be
Join in R: How to join (merge) data frames (inner, outer, left, right) in R - DataScience Made Simple (3)

INNER JOIN Explained

Inner Join in R is the simplest and most common type of join. It is also known as simple join or Natural Join. Inner join returns the rows when matching condition is met.

Join in R: How to join (merge) data frames (inner, outer, left, right) in R - DataScience Made Simple (4)

Inner join in R using merge() function: merge() function takes df1 and df2 as argument. merge() function by default performs inner join there by return only the rows in which the left table have matching keys in the right table.

#### Left Join using merge functiondf = merge(x=df1,y=df2,by="CustomerId")df

the resultant inner joined dataframe df will be

Join in R: How to join (merge) data frames (inner, outer, left, right) in R - DataScience Made Simple (5)

Inner join in R using inner_join() function of dplyr:

dplyr() package has inner_join() function which performs inner join of two dataframes by “CustomerId” as shown below.

#### Left Join using inner_join function library(dplyr)df= df1 %>% inner_join(df2,by="CustomerId")df

the resultant inner joined dataframe df will be

Join in R: How to join (merge) data frames (inner, outer, left, right) in R - DataScience Made Simple (6)

OUTER JOIN Explained

Outer Join in R combines the results of both left and right outer joins. The joined table will contain all records from both the tables

Join in R: How to join (merge) data frames (inner, outer, left, right) in R - DataScience Made Simple (7)

Outer join in R using merge() function: merge() function takes df1 and df2 as argument along with all=TRUE there by returns all rows from both tables, join records from the left which have matching keys in the right table.

###### outer join in R using merge() functiondf = merge(x=df1,y=df2,by="CustomerId",all=TRUE)df

the resultant data frame df will be

Join in R: How to join (merge) data frames (inner, outer, left, right) in R - DataScience Made Simple (8)

outer join in R using full_join() function of dplyr:

dplyr() package has full_join() function which performs outer join of two dataframes by “CustomerId” as shown below.

###### outer join in R using outer_join() function library(dplyr)df= df1 %>% full_join(df2,by="CustomerId")df

the resultant outer joined dataframe df will be

Join in R: How to join (merge) data frames (inner, outer, left, right) in R - DataScience Made Simple (9)

LEFT JOIN Explained:

TheLEFT JOIN in R returns all records from theleftdataframe (A), and the matched records from the right dataframe (B)

Join in R: How to join (merge) data frames (inner, outer, left, right) in R - DataScience Made Simple (10)

Left join in R: merge() function takes df1 and df2 as argument along with all.x=TRUE there by returns all rows from the left table, and any rows with matching keys from the right table.

###### left join in R using merge() function df = merge(x=df1,y=df2,by="CustomerId",all.x=TRUE)df

the resultant data frame df will be
Join in R: How to join (merge) data frames (inner, outer, left, right) in R - DataScience Made Simple (11)

Left join in R using left_join() function of dplyr:

dplyr() package has left_join() function which performs left join of two dataframes by “CustomerId” as shown below.

###### left join in R using left_join() function library(dplyr)df= df1 %>% left_join(df2,by="CustomerId")df

the resultant Left joined dataframe df will be

Join in R: How to join (merge) data frames (inner, outer, left, right) in R - DataScience Made Simple (12)

RIGHT JOIN Explained:

TheRIGHT JOIN in R returns all records from therightdataframe (B), and the matched records from the left dataframe (A)

Join in R: How to join (merge) data frames (inner, outer, left, right) in R - DataScience Made Simple (13)

Right join in R: merge() function takes df1 and df2 as argument along with all.y=TRUE and thereby returns all rows from the right table, and any rows with matching keys from the left table.

###### right join in R using merge() function df = merge(x=df1,y=df2,by="CustomerId",all.y=TRUE)df

the resultant data frame df will be

Join in R: How to join (merge) data frames (inner, outer, left, right) in R - DataScience Made Simple (14)

Right join in R using right_join() function of dplyr:

dplyr() package has right_join() function which performs outer join of two dataframes by “CustomerId” as shown below.

###### right join in R using merge() function library(dplyr)df= df1 %>% right_join(df2,by="CustomerId")df

the resultant right joined dataframe df will be
Join in R: How to join (merge) data frames (inner, outer, left, right) in R - DataScience Made Simple (15)

Cross join in R: A Cross Join (also sometimes known as a Cartesian Join) results in every row of one table being joined to every row of another table

##### cross join in Rdf = merge(x = df1, y = df2, by = NULL)df

the resultant data frame df will be

Join in R: How to join (merge) data frames (inner, outer, left, right) in R - DataScience Made Simple (16)

SEMI JOIN in R using dplyr:

This is like inner join, with only the left dataframe columns and values are selected

Join in R: How to join (merge) data frames (inner, outer, left, right) in R - DataScience Made Simple (17)

#### Semi join in Rlibrary(dplyr)df= df1 %>% semi_join(df2,by="CustomerId")df

the resultant data frame df will be

Join in R: How to join (merge) data frames (inner, outer, left, right) in R - DataScience Made Simple (18)

ANTI JOIN in R using dplyr:

This join is like df1-df2, as it selects all rows from df1 that are not present in df2.

#### anti join in Rlibrary(dplyr)df= df1 %>% anti_join(df2,by="CustomerId")df

the resultant data frame df will be

Join in R: How to join (merge) data frames (inner, outer, left, right) in R - DataScience Made Simple (19)

For further understanding of join() function in R using dplyr one can refer the dplyr documentation

Related Topics:

  • Cumulative sum of a column in R
  • Stratified Random Sampling in R – Dataframe
  • Simple Random Sampling in R – Dataframe , vector
  • Strip Leading, Trailing spaces of column in R (remove Space)
  • Concatenate two columns of dataframe in R
  • Get String length of the column in R dataframe
  • Delete or Drop rows in R with conditions
  • Exponential of the column in R
  • Get Sign of a column in R
  • Type cast to date in R – Text to Date in R , Factor to date in R
  • Get day of the week from date in R
  • Get year from date in R
  • Get month from date in R
  • Get month year from date in R

  • Join in R: How to join (merge) data frames (inner, outer, left, right) in R - DataScience Made Simple (22)

    Sridhar Venkatachalam

    With close to 10 years on Experience in data science and machine learning Have extensively worked on programming languages like R, Python (Pandas), SAS, Pyspark.

    View all posts

'); instant= new adsenseLoader( '#quads-ad2-place', { onLoad: function( ad ){ if (ad.classList.contains("quads-ll")) { ad.classList.remove("quads-ll"); } } }); }if ( quads_screen_width >= 1024 && quads_screen_width < 1140 ) {__ez.scxr.getDW(document).write('

'); instant= new adsenseLoader( '#quads-ad2-place', { onLoad: function( ad ){ if (ad.classList.contains("quads-ll")) { ad.classList.remove("quads-ll"); } } }); }if ( quads_screen_width >= 768 && quads_screen_width < 1024 ) {__ez.scxr.getDW(document).write('

Join in R: How to join (merge) data frames (inner, outer, left, right) in R - DataScience Made Simple (2024)

References

Top Articles
Latest Posts
Article information

Author: Kieth Sipes

Last Updated:

Views: 6203

Rating: 4.7 / 5 (47 voted)

Reviews: 86% of readers found this page helpful

Author information

Name: Kieth Sipes

Birthday: 2001-04-14

Address: Suite 492 62479 Champlin Loop, South Catrice, MS 57271

Phone: +9663362133320

Job: District Sales Analyst

Hobby: Digital arts, Dance, Ghost hunting, Worldbuilding, Kayaking, Table tennis, 3D printing

Introduction: My name is Kieth Sipes, I am a zany, rich, courageous, powerful, faithful, jolly, excited person who loves writing and wants to share my knowledge and understanding with you.