Merge multiple Excel files into one sheet in R

  • Combining one Excel Document into one data.frame
  • Combining multiple Excel Documents into one data.frame
  • Combining multiple Excel Documents in a directory into one data.frame
  • Combining multiple Excel Documents in a directory and sub-directories into one data.frame

Combining one Excel Document into one data.frame
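
A minimal sketch of this case, assuming the readxl package is installed and every sheet in the workbook has the same columns. The helper names `bind_sheets` and `read_workbook` are hypothetical and do not come from the original post.

```r
# Stack a named list of data.frames, recording each name in a "sheet" column.
# Assumes every element has the same column names.
bind_sheets <- function(sheets) {
  tagged <- Map(function(df, name) {
    df <- as.data.frame(df)
    df$sheet <- rep(name, nrow(df))
    df
  }, sheets, names(sheets))
  out <- do.call(rbind, unname(tagged))
  rownames(out) <- NULL
  out
}

# Read every sheet of one workbook into a single data.frame.
# readxl is only needed when this function is actually called.
read_workbook <- function(path) {
  sheet_names <- readxl::excel_sheets(path)
  sheets <- lapply(sheet_names, function(s) readxl::read_excel(path, sheet = s))
  names(sheets) <- sheet_names
  bind_sheets(sheets)
}
```

Calling `read_workbook("workbook.xlsx")` (a placeholder file name) would return one data.frame with an extra `sheet` column.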

Combining multiple Excel Documents into one data.frame
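
One possible sketch for this case, assuming readxl and dplyr are installed, the file paths are known up front, and only one sheet per file is wanted. The function name `combine_workbooks` and its `sheet` argument are hypothetical.

```r
# Hypothetical helper: read one sheet from each of several workbooks
# and stack the rows into a single data.frame.
combine_workbooks <- function(paths, sheet = 1) {
  frames <- lapply(paths, function(p) readxl::read_excel(p, sheet = sheet))
  names(frames) <- basename(paths)
  # .id adds a "filename" column holding the name of each list element
  dplyr::bind_rows(frames, .id = "filename")
}
```

`dplyr::bind_rows()` fills columns that are missing from some files with `NA`, which is more forgiving than base `rbind()` when the files are not perfectly aligned.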

Combining multiple Excel Documents in a directory into one data.frame

directory = "dataSheetsDir"
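#full.names = TRUE keeps the directory in each path so the files can be opened later
allFiles = list.files(directory, full.names = TRUE)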
print(allFiles)

#create function to check file extensions
endsWith <- function(var, match) {
  return (substr(var, pmax(1, nchar(var) - nchar(match) + 1), nchar(var)) == match)
}

possibleExcelSheetFile<-function(fileName){
  return (endsWith(fileName, ".xls") | endsWith(fileName, ".xlsx"))
}

#read every file that ends with .xls or .xlsx
data = data.frame()
for(file in allFiles){
  if(possibleExcelSheetFile(file)){
    #create empty data.frame that will hold data for this excel file
    fileData = data.frame()
    #get the sheet names for this file
    sheetNames = readxl::excel_sheets(file)
    #iterate over the sheet names, adding to the fileData frame as each sheet is read in
    for(sheet in sheetNames){
      sheetData = readxl::read_excel(file, sheet = sheet)
      #add the sheet name as a column to the data.frame
      sheetData["sheet"] = sheet
      #if this is the first sheet, and therefore fileData is empty just replace the empty data.frame with the sheet data.frame
      #otherwise add the rows of this sheet data.frame to file data.frame
      if(0 == nrow(fileData)){
        fileData = sheetData
      }else{
        fileData = dplyr::bind_rows(fileData, sheetData)
      }
    }
    #add a column to hold the filename from which the data came from
    fileData["filename"] = file
    #if the master data data.frame for all files is empty, just replace with the current file data.frame
    #otherwise just add the current file's data to the already growing data.frame
    if(0 == nrow(data)){
      data = fileData
    }else{
      data = dplyr::bind_rows(data, fileData)
    }
  }
}

Combining multiple Excel Documents in a directory and sub-directories into one data.frame
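
A sketch of how the sub-directory case could be handled, assuming readxl and dplyr are installed and only the first sheet of each file is needed. The function names `find_excel_files` and `combine_directory_tree` are hypothetical.

```r
# List Excel workbooks under a directory, including sub-directories.
find_excel_files <- function(directory) {
  list.files(
    directory,
    pattern = "\\.xlsx?$",   # matches .xls and .xlsx
    recursive = TRUE,        # descend into sub-directories
    full.names = TRUE,       # keep the path so the files can be opened
    ignore.case = TRUE
  )
}

# Read the first sheet of each workbook found and stack the rows.
combine_directory_tree <- function(directory) {
  files <- find_excel_files(directory)
  frames <- lapply(files, function(f) {
    df <- readxl::read_excel(f)
    df$filename <- f   # remember which file each row came from
    df
  })
  dplyr::bind_rows(frames)
}
```

The only change from the single-directory case is `recursive = TRUE` in `list.files()`.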

Niharika Goel

Senior Software Engineer | Ex Accenture, IBM

Published Jun 12, 2018

Consider a case where you have multiple CSV or Excel files in a folder and you have to merge them into one single file. Different files can hold data for different years, e.g. the sales of a retail store for 2016, 2017 and 2018.

You can do the merging task using R very easily.

library(openxlsx)

path <- "sample-data/merge-files/xlsx"
merge_file_name <- "sample-data/merge-files/merged_file.xlsx"

filenames_list <- list.files(path= path, full.names=TRUE)

All <- lapply(filenames_list,function(filename){
    print(paste("Merging",filename,sep = " "))
    read.xlsx(filename)
})

df <- do.call(rbind.data.frame, All)
write.xlsx(df,merge_file_name)

You can find the sample data files and R code for both Excel & CSV format on GitHub: https://github.com/NiharikaGoel12/R-Playground.

R can be used for many powerful and complex data analysis tasks such as filtering, data cleaning, aggregation, and grouping based on various parameters. Microsoft Excel can also perform these functions, but it becomes slow when dealing with large datasets.

Cross-post from https://medium.com/@niharika.goel/merge-multiple-csv-excel-files-in-a-folder-using-r-e385d962a90a

If you ask people who work with data, you will get to know that combining Excel files or merging workbooks is a part of their daily work.

Agree?

A simple example: Let’s say you want to create a sales report and you have data for four different zones in four different files.

Now:

The very first thing you need to do is to combine those files into one single workbook, and only then can you go on to create your report.

The point is: You have to have a method which you can use for merging these files. Say “YES” in the comment section if you want to know the best method for this.

Today in this post, I’m going to share with you the best way to merge data from multiple Excel files into a SINGLE workbook.

But, here's the kicker.

This post will teach you a technique you can apply to real-world data problems, so make sure to read the entire post.

The Best Possible Way for Combining Excel Files by Merging data into ONE Workbook - POWER QUERY

Power Query is the best way to merge or combine data from multiple Excel files in a single file. You need to store all the files in a single folder and then use that folder to load data from those files into the power query editor. It also allows you to transform that data along with combining.

It works something like this:

  1. Saving All the Files into a Single Folder
  2. Combining them using Power Query
  3. Merging Data into a Single Table

Make sure to download these sample files from here to follow along and check out this tutorial to learn power query.

Note: For combining data from different Excel files, your data should be structured in the same way. That means the number of columns and their order should be the same.

To merge files, you can use the following steps:

  1. First of all, extract all the files from the sample folder and save that folder on the desktop (or wherever you want to keep it).
  2. Next, open a new Excel workbook and open “POWER Query”.
  3. For this, go to Data Tab ➜ Get & Transform Data ➜ Get Data ➜ From File ➜ From Folder.
  4. Here you need to locate the folder where you have the files.
  5. Click OK, and you’ll get a window listing all the files from the folder.
  6. Now, to combine the data from these files, click on “Combine & Edit”.
  7. From here, select the table that holds the data in all the workbooks; you’ll get a preview of it at the side of the window.
  8. Once you select the table, click OK. At this point, you have merged data from all the files into your power query editor and, if you look closely, you can see a new column with the name of the workbook from which each row was extracted.
  9. Right-click on that column’s header and select “Replace Values”.
  10. In “Value to Replace”, enter the text “.xlsx” and leave “Replace With” blank (the idea is to remove the file extension from the name of the workbook).
  11. After that, double-click on the header and select “Rename” to enter a name for the column, i.e. Zone.
  12. At this point, your merged data is ready and all you need is to load it into your new workbook. So, go to the Home Tab and click on “Close & Load”.

Now you have your combined data (from all the workbooks) in a single workbook.
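
Since this page is about doing the merge in R, here is a rough R counterpart of the steps above, offered as a sketch rather than an exact translation of the Power Query behaviour. It assumes the readxl package is installed and that every workbook has the same columns on its first sheet. The function names `zone_from_path` and `combine_zones` are hypothetical.

```r
# Derive a "Zone" label from a workbook path by dropping the folder and the
# extension, mirroring the "Replace Values" and "Rename" steps.
zone_from_path <- function(path) {
  tools::file_path_sans_ext(basename(path))
}

# Read each workbook in a folder and tag its rows with the zone.
combine_zones <- function(folder) {
  files <- list.files(folder, pattern = "\\.xlsx$", full.names = TRUE)
  frames <- lapply(files, function(f) {
    df <- as.data.frame(readxl::read_excel(f))
    df$Zone <- rep(zone_from_path(f), nrow(df))
    df
  })
  do.call(rbind, frames)
}

zone_from_path("sales/north.xlsx")
```

The last line shows the label that would be attached to rows read from a file named north.xlsx in a placeholder folder called sales.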

This is the moment of JOY, write “Joy” in the comment section if you love to use “Power Query for combining data from multiple files”.

Important Point

In the above steps, we used the table name to combine data from all the files and add all of it to a single workbook. But you will not always have the same table name in all the Excel files; in that case, you can use the worksheet name as the key for combining the data.

One more thing:

As I said, you can use a worksheet name to combine data with power query, but there are a few more things you need to take care of. Power Query is case sensitive, so when combining files make sure the worksheet names in all the workbooks use the same letter case.

The next thing is to have the same names for the column headers, but here’s the kicker: the order of the columns doesn’t matter. If column1 in north.xlsx is column2 in west.xlsx, Power Query will match it, but the column names have to be the same.
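
For comparison, base R behaves in a similar way when stacking data frames. The sketch below uses made-up values and column names (`Date`, `Amount`, `Amt`) purely for illustration.

```r
north <- data.frame(Date = "2018-01-05", Amount = 120)
west  <- data.frame(Amount = 95, Date = "2018-01-07")  # same names, different order

# rbind() on data frames aligns columns by name, so the order does not matter
combined <- rbind(north, west)

# A column name that differs (here "Amt") cannot be matched and raises an error
mismatched <- data.frame(Date = "2018-01-09", Amt = 80)
result <- tryCatch(
  rbind(north, mismatched),
  error = function(e) "names do not match"
)
```

`combined` keeps the column order of the first data frame, while `result` holds the fallback string because the second `rbind()` call fails.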

So now, while combining files using power query you can use the worksheet name instead of the table name, and here you have "SalesData" as the worksheet name in all the files.

Select it, click on "Combine & Edit", and follow all the steps mentioned in the above method.

Why Power Query is the Best Way to Merge Data into a Single File?

Merge Data from Multiple Workbooks When you don’t have the Same Name for Worksheets and data in Tables

This is the hard truth…

…that in some situations, the worksheets won’t share the same name and the data won’t always be in tables.

What should you do in that case?

Well…

…in this case, you need to know how to combine data from all the files, and I don’t want to miss sharing this with you.

...so without any further ado, let's get started.

  • First of all, open the “From Folder” dialog box to locate the folder where you have all the files.
  • Now in this dialog box, locate the folder and click OK.
  • After that, click on “Edit” to edit the table.
  • At this point, you will have a table in your power query editor.
  • Next, select the first two columns of the table and click on “Remove Other Columns” from the right-click menu.

From here, we need to add a custom column to fetch data from the worksheets of the workbooks.

  • For this, go to Add Column Tab and click on the “Custom Column” button. This will open the “Custom Column” dialog box.
  • In the dialog box, enter =Excel.Workbook([Content]) and click OK.

…at this time you have a new column in the table but next, you need to extract data from it.

  • Now, open the filter from that newly added custom column and click OK to expand all the data into the table.
  • Here you have the newly expanded table with some new columns.
  • Now from this new table, delete all the columns except the third and fourth.
  • Next, open the filter for the column “Custom.Data” to expand it and click OK.

The moment you click OK, you’ll get all the data from all the files into a single table, but you need to make some changes to it to make it PERFECT.

If you notice, the column headings are sitting inside the data itself, so you need to add the column headings.

  • To do this, double-click on each header and type a name, or right-click and select “Rename”.

Next, you need to exclude the headings that are still in the data table.

  • Now open any column’s filter option and unselect the heading name which you have in the column data and click OK after that.
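
The same clean-up (naming the columns, then filtering out the heading rows left in the data) could look like this in R. This is a sketch on made-up data; the column names `Date` and `Amount` and all values are illustrative assumptions.

```r
# Made-up example: two files stacked without headers, so each file's
# heading row ("Date", "Amount") appears as data.
raw <- data.frame(
  Column1 = c("Date", "2018-01-05", "Date", "2018-01-07"),
  Column2 = c("Amount", "120", "Amount", "95"),
  stringsAsFactors = FALSE
)

# Use the first row as the column names
names(raw) <- unlist(raw[1, ], use.names = FALSE)

# Drop every row that merely repeats the heading in the first column
clean <- raw[raw[[1]] != names(raw)[1], , drop = FALSE]
rownames(clean) <- NULL

# Values were read as text, so convert the numeric column
clean$Amount <- as.numeric(clean$Amount)
```

Filtering on a single column is enough here because a heading row repeats the heading in every column.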

Now the data is ready to load into the worksheet, so go to the Home Tab and click on “Close & Load”.

Congratulations! You have just combined data from different workbooks (with different worksheet names and without any tables).

This is also important:

At this point, you have merged the data into one table.

But there’s one thing you need to do…

…and that’s applying some formatting to it and making sure that it won’t go away when you update your data.

Here’s what you need to do…

  • First of all, select the column where you have dates (as it is formatted as a number right now) and format it as dates.
  • After that, make all the columns wide as per the data you have in them.
  • Here you can also format amount and price as “Currency”. 
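
If the merged data is later handled in R, the same two fixes can be sketched as below. The serial numbers and the amount are made-up values, the `$` symbol is an assumption, and the origin date applies to Excel's default (1900) date system on Windows.

```r
# Excel (1900 date system) stores dates as days counted from 1899-12-30
serial <- c(43101, 43102)
dates <- as.Date(serial, origin = "1899-12-30")

# Format an amount as currency text for display; the "$" symbol is an assumption
amount <- 1234.5
label <- paste0("$", formatC(amount, format = "f", digits = 2, big.mark = ","))
```

Workbooks saved with the 1904 date system would need a different origin, so it is worth checking a known date before converting a whole column.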

But the next thing is to make this formatting permanent.

  • For this, go to “Design Tab”, and open properties.
  • Untick “Adjust Column Width” and tick “Preserve Cell Formatting”.
  • Yes, that’s it.

Now you have a query in your workbook which can combine data from multiple files...

...and merge it into a single workbook...

...even if the worksheet name is not the same or if you don’t have tables.

And yes, you have also made the formatting permanent.

In the end,

As I said, POWER QUERY is the real deal, and if you frequently combine data from multiple files then you should use this method…

…as it’s a ONE-TIME setup.

The most important thing is that when you use power query, you can also clean the data from those files.

I hope this tutorial will help you to Get Better at Excel. But now, you need to tell me one thing.

Which method do you use to combine data from multiple files?

Make sure to share your views with me in the comment section, I'd love to hear from you. And please, don’t forget to share this post with your friends, I am sure they will appreciate it.

About the Author

Puneet has been using Excel since his college days. He has helped thousands of people understand the power of spreadsheets and learn Microsoft Excel. You can find him online, tweeting about Excel, on a running track, or sometimes hiking up a mountain.

How do I merge Excel sheets into one in R?

The code for doing this step is:
library(openxlsx)
# Read workbook.
wb <- loadWorkbook("Three Sheets.xlsx") ...
# Read each sheet as a list.
wb$sheet_names |> lapply(function(x) read. ...
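
Because the snippet above is truncated, here is a separate, self-contained sketch of the same idea. It assumes the openxlsx package is installed and that all sheets share the same column names. The function names `read_sheets` and `merge_sheets` are hypothetical, and this is not a reconstruction of the elided code.

```r
# Read every sheet of a workbook into a named list of data frames.
# openxlsx is only needed when this function is actually called.
read_sheets <- function(path) {
  sheet_names <- openxlsx::getSheetNames(path)
  sheets <- lapply(sheet_names, function(s) openxlsx::read.xlsx(path, sheet = s))
  names(sheets) <- sheet_names
  sheets
}

# Stack the sheets into one data frame (assumes identical column names).
merge_sheets <- function(sheets) {
  do.call(rbind, unname(sheets))
}
```

With these helpers, `merge_sheets(read_sheets("Three Sheets.xlsx"))` would return the combined rows of every sheet.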

How do you combine multiple Excel files into sheets?

Open the Excel file where you want to merge sheets from other workbooks and do the following:
Press Alt + F8 to open the Macro dialog.
Under Macro name, select MergeExcelFiles and click Run.
The standard explorer window will open; select one or more workbooks you want to combine, and click Open.

How do I merge multiple Excel files in one workbook but in different sheets?

In the Combine Worksheets wizard, select the Combine multiple worksheets from workbooks into one workbook option, and then click the Next button. In the Combine Worksheets - Step 2 of 3 dialog box, click Add > File or Folder to add the Excel files you want to merge into one.

How do I combine multiple Excel files quickly?

Click the Tools menu and then select Merge Workbooks…. If prompted, save the workbook. In the file navigation dialog box, click the copy of the workbook that contains the changes you want to merge, then click OK.