I have a bunch of excel files that are formatted in a pretty weird way that I would like to make an automated import script for, so I can easily get this data into a sheet with the correct formatting.
It has hourly values for each month in the first 12 columns, then the date along with hours in the next 12 columns.
What I would like to do is be able to get this data in to a table where the first column has the date and hour (in excel format) and the second one containing the data. My thought was to record a macro during the process of adjusting the data with power query and then repeating the macro with multiple files. However, I can't seem to find a good way to make "Column 2"'s data moved to the end of "Column 1" and repeating that for both values and the dates using Power Query. Any pointers?
Also notice, column 1s length differs from column 2 since January has more values than February. However, column 1s lenght is the same as column 13, column 2s length same as 14 and so on.
I have uploaded a sample file here
Create a blank query from scratch (on my machine I do this in Excel via: Data > Get Data > From Other Sources > Blank Query
).
Click Home > Advanced Editor
, copy-paste the code below and change this line folderPath = "C:\Users\user\",
to the path of the parent folder that contains the Excel files. Then click Close & Load
.
As the data is being imported from multiple workbooks (and possibly multiple sheets), the first two columns of the loaded table should be the workbook and worksheet that that row of data came from. (If you want to get rid of the first two columns, edit the query.)
let
folderPath = "C:\Users\user\",
getDataFromSheet = (sheetData as table) =>
let
promoteHeaders = Table.PromoteHeaders(sheetData, [PromoteAllScalars=true]),
standardiseHeaders =
let
headers = Table.ColumnNames(promoteHeaders),
zipWithLowercase = List.Zip({headers, List.Transform(headers, Text.Lower)}),
renameAsLowercase = Table.RenameColumns(promoteHeaders, zipWithLowercase)
in
renameAsLowercase,
emptyTable = Table.FromColumns({{},{}}, {"Date", "Value"}),
monthsToLoopOver = {"januari", "februari", "mars", "april", "maj", "juni", "juli", "augusti", "september", "oktober", "november", "december"},
appendEachMonth = List.Accumulate(monthsToLoopOver, emptyTable, (tableState, currentMonth) =>
let
selectColumns = Table.SelectColumns(standardiseHeaders, {currentMonth & "tim", currentMonth}, MissingField.UseNull),
renameColumns = Table.RenameColumns(selectColumns, {{currentMonth & "tim", "Date"}, {currentMonth, "Value"}}),
appendToTable = Table.Combine({tableState, renameColumns})
in
appendToTable
),
tableOrNull = if List.Contains(Table.ColumnNames(standardiseHeaders), "januari") then appendEachMonth else null
in
tableOrNull,
getDataFromWorkbook = (filePath as text) =>
let
workbookContents = Excel.Workbook(File.Contents(filePath)),
sheetsOnly = Table.SelectRows(workbookContents, each [Kind] = "Sheet"),
invokeFunction = Table.AddColumn(sheetsOnly, "f", each getDataFromSheet([Data]), type table),
appendAndExpand =
let
selectColumnsAndRows = Table.SelectColumns(Table.SelectRows(invokeFunction, each not ([f] is null)), {"Name", "f"}),
renameColumns = Table.RenameColumns(selectColumnsAndRows, {{"Name", "Sheet"}}),
expandColumn = Table.ExpandTableColumn(renameColumns, "f", {"Date", "Value"})
in
expandColumn
in
appendAndExpand,
filesInFolder = Folder.Files(folderPath),
validFilesOnly = Table.SelectRows(filesInFolder, each [Extension] = ".xlsx"),
invokeFunction = Table.AddColumn(validFilesOnly, "f", each getDataFromWorkbook([Folder Path] & [Name])),
appendAndExpand =
let
selectRowsAndColumns = Table.SelectColumns(Table.SelectRows(invokeFunction, each not ([f] is null)), {"Name", "f"}),
renameColumns = Table.RenameColumns(selectRowsAndColumns, {{"Name", "Workbook"}}),
expandColumn = Table.ExpandTableColumn(renameColumns, "f", {"Sheet", "Date", "Value"})
in
expandColumn,
excludeBlankDates = Table.SelectRows(appendAndExpand, each not (Text.StartsWith([Date], " "))),
transformTypes =
let
dateAndHour = Table.TransformColumns(excludeBlankDates, {{"Date", each Text.Split(_, " ")}}),
changeTypes = Table.TransformColumns(dateAndHour, {{"Workbook", Text.From, type text}, {"Sheet", Text.From, type text}, {"Date", each DateTime.From(_{0}) + #duration(0, Number.From(_{1}), 0, 0), type datetime}, {"Value", Number.From, type number}})
in
changeTypes
in
transformTypes
For reliability and robustness, it would be good if you create a folder and put all Excel files (that need restructuring) into that folder -- and ensure nothing else goes into that folder (not even the file that will be doing the importing/restructuring).
If you can't do this for whatever reason, then click the validFilesOnly
step whilst in the Query Editor and amend the filter criteria, such that the table only includes files you want restructured.
You can do the formatting pretty quickly by writing your own macro in VBA. Then you can create another macro to run the formatting macro on multiple files within a folder.
Run same excel macro on multiple excel files
Here is an example of a macro that will reformat your data closer to what you are looking for.
Sub FormatBlad()
' create a new sheet and rename it
Sheets.Add After:=ActiveSheet
Sheets(Sheets.Count).Name = "Formatted"
' set helper variables
Dim blad As Worksheet
Dim format As Worksheet
Set blad = Sheets("Blad1")
Set ft = Sheets("Formatted")
Dim blad_row_num As Integer
Dim ft_row_num As Integer
Dim month_offset As Integer
blad_row_num = 2
ft_row_num = 2
month_offset = 13 ' column N - 1
' set column headers in formatted sheet
ft.Range("A1").Value = "Date"
ft.Range("B1").Value = "Value"
' loop through months
For i = 1 To 12
blad_row_num = 2
While blad.Cells(blad_row_num, i).Value <> ""
ft.Cells(ft_row_num, 1).Value = blad.Cells(blad_row_num, month_offset + i).Value
ft.Cells(ft_row_num, 2).Value = blad.Cells(blad_row_num, i).Value
blad_row_num = blad_row_num + 1
ft_row_num = ft_row_num + 1
Wend
Next i
End Sub