How to merge all contents in two csv files where r

2019-07-04 21:31发布

问题:

I have two csv files. They both have SamAccountName in common. User records may or may not have a match found for every record between both files (THIS IS VERY IMPORTANT TO NOTE).

I am trying to basically just merge all columns (and their values) into one file (based from the SamAccountNames found in the first file...).

If the SamAccountName is not found in the 2nd file, it should add all null values for that user record in the merged file (since the record was found in the first file).

If the SamAccountName is found in the 2nd file, but not in the first, it should ignore merging that record.

Number of columns in each file may vary (5, 10, 2, so forth...).

Function MergeTwoCsvFiles
{
    Param ([String]$baseFile, [String]$fileToBeMerged, [String]$columnTitleLineInFileToBeMerged)

    $baseFileCsvContents = Import-Csv $baseFile
    $fileToBeMergedCsvContents = Import-Csv $fileToBeMerged

    $baseFileContents = Get-Content $baseFile

    $baseFileContents[0] += "," + $columnTitleLineInFileToBeMerged

    $baseFileCsvContents | ForEach-Object {
        $matchFound = $False
        $baseSameAccountName = $_.SamAccountName
        [String]$mergedLineInFile = $_

        [String]$lineMatchFound = $fileToBeMergedCsvContents | Where-Object {$_.SamAccountName -eq $baseSameAccountName}
        Write-Host '$mergedLineInFile =' $mergedLineInFile
        Write-Host '$lineMatchFound =' $lineMatchFound
        Exit
    }
}

The problem is, the record in the file is being written as a hash table instead of a string like line (if you were to view it as .txt). So I'm not really sure how to do this...

Adding results csv example files...

First CSV File

"SamAccountName","sn","GivenName"
"PBrain","Pinky","Brain"
"JSteward","John","Steward"
"JDoe","John","Doe"
"SDoo","Scooby","Doo"

Second CSV File

"SamAccountName","employeeNumber","userAccountControl","mail"
"KYasunori","678213","546","KYasunori@mystuff.com"
"JSteward","43518790","512","JSteward@mystuff.com"
"JKibogabi","24356","546","JKibogabi@mystuff.com"
"JDoe","902187u4","1114624","JDoe@mystuff.com"
"CStrife","54627","512","CStrife@mystuff.com"

Expected Merged CSV File

"SamAccountName","sn","GivenName","employeeNumber","userAccountControl","mail"
"PBrain","Pinky","Brain","","",""
"JSteward","John","Steward","43518790","512","JSteward@mystuff.com"
"JDoe","John","Doe","902187u4","1114624","JDoe@mystuff.com"
"SDoo","Scooby","Doo","","",""

Note: This will be part of a loop process in merging multiple files, so I would like to avoid hardcoding the title names (with $_.SamAccountName as an exception)

Trying suggestion from "restless 1987" (Not Working)

$baseFileCsvContents = Import-Csv 'D:\Scripts\Powershell\Tests\base.csv'
$fileToBeMergedCsvContents = Import-Csv 'D:\Scripts\Powershell\Tests\lookup.csv'
$resultsFile = 'D:\Scripts\Powershell\Tests\MergedResults.csv'
$resultsFileContents = @()

$baseFileContents = Get-Content 'D:\Scripts\Powershell\Tests\base.csv'

$recordsMatched = compare-object $baseFileCsvContents $fileToBeMergedCsvContents -Property SamAccountName

switch ($recordsMatched)
{
    '<=' {}
    '=>' {}
    '==' {$resultsFileContents += $_}
}

$resultsFileCsv = $resultsFileContents | ConvertTo-Csv
$resultsFileCsv | Export-Csv $resultsFile -NoTypeInformation -Force

Output gives a blank file :(

回答1:

The code below outputs the desired results based on the inputs you provided.

function CombineSkip1($s1, $s2){
    $s3 = $s1 -split ',' 
    $s2 -split ',' | select -Skip 1 | % {$s3 += $_}
    $s4 = $s3 -join ', '

    $s4
}

Write-Output "------Combine files------"

# content
$c1 = Get-Content D:\junk\test1.csv
$c2 = Get-Content D:\junk\test2.csv

# users in both files, could be a better way to do this
$t1 = $c1 | ConvertFrom-Csv
$t2 = $c2 | ConvertFrom-Csv
$users = $t1 | Select SamAccountName

# generate final, combined output
$combined = @()
$combined += CombineSkip1 $c1[0] $c2[0]

$c2PropCount = ($c2[0] -split ',').Count - 1
$filler = (', ""' * $c2PropCount)

for ($i = 1; $i -lt $c1.Count; $i++){
    $user = $c1[$i].Split(',')[0]
    $u2 = $c2 | where {([string]$_).StartsWith($user)}
    if ($u2)
    {
        $combined += CombineSkip1 $c1[$i] $u2
    }
    else
    {
        $combined += ($c1[$i] + $filler)
    }
}

# write to output and file
Write-Output $combined
$combined | Set-Content -Path D:\junk\test3.csv -Force


回答2:

You can use compare-object for that purpose. Use -property samaccountname with it. For example:

$a = 1,2,3,4,5
$b = 4,5,6,7
$side = compare-object $a $b
switch ($side){
'<=' {is not in $a}
'=>' {is not in $b}
'==' { is on both sides}
}

When you have all the data in your output-variable, trow it at convertto-csv and write it in a file



回答3:

After an entire day, I finally came up with something that works...

...

Edit

Reason: breaking the inner loop and removing the found element from the array will be much faster when merging files with thousands of records...

Function GetTitlesFromFileToBeMerged
{
    Param ($csvFile)

    [String]$fileToBeMergedTitles = Get-Content $fileToBeMerged -TotalCount 1

    [String[]]$fileToBeMergedTitles = ($fileToBeMergedTitles -replace "`",`"", "|").Trim()
    [String[]]$fileToBeMergedTitles = ($fileToBeMergedTitles -replace "`"", "").Trim()
    [String[]]$fileToBeMergedTitles = ($fileToBeMergedTitles -replace "SamAccountName", "").Trim()

    [String[]]$listOfColumnTitles = $fileToBeMergedTitles.Split('|',[System.StringSplitOptions]::RemoveEmptyEntries)

    Write-Output $listOfColumnTitles
}

$baseFile = 'D:\Scripts\Powershell\Tests\base.csv'
$fileToBeMerged = 'D:\Scripts\Powershell\Tests\lookup.csv'
$baseFileCsvContents = Import-Csv $baseFile
$baseFileContents = Get-Content $baseFile
$fileToBeMergedCsvContents = Import-Csv $fileToBeMerged
[System.Collections.Generic.List[System.Object]]$fileToBeMergedContents = Get-Content $fileToBeMerged
$resultsFile = 'D:\Scripts\Powershell\Tests\MergedResults.csv'
$resultsFileContents = @()

[String]$baseFileTitles = $baseFileContents[0]
[String]$fileToBeMergedTitles = (Get-Content $fileToBeMerged -TotalCount 1) -replace "`"SamAccountName`",", ""
$resultsFileContents += $baseFileTitles + "," + $fileToBeMergedTitles

[String]$lineMatchNotFound = ""
$arrayFileToBeMergedTitles = GetTitlesFromFileToBeMerged $fileToBeMerged
For ($valueNum = 0; $valueNum -lt $arrayFileToBeMergedTitles.Length; $valueNum++)
{
    $lineMatchNotFound += ",`"`""
}

$baseLineCounter = 1
$baseFileCsvContents | ForEach-Object {
    $baseSameAccountName = $_.SamAccountName
    [String]$baseLineInFile = $baseFileContents[$baseLineCounter]

    $lineMatchCounter = 1
    $lineMatchFound = ""
    :inner
    ForEach ($line in $fileToBeMergedContents) {
        If ($line -like "*$baseSameAccountName*") {
            [String]$lineMatchFound = "," + ($line -replace '^"[^"]*",', "")
            $fileToBeMergedContents.RemoveAt($lineMatchCounter)
            break inner
        }; $lineMatchCounter++
    }

    If (!($lineMatchFound))
    {
        [String]$lineMatchFound = $lineMatchNotFound
    }

    $mergedLine = $baseLineInFile + $lineMatchFound
    $resultsFileContents += $mergedLine
    $baseLineCounter++
}

ForEach ($line in $resultsFileContents)
{
    Write-Host $line
}

$resultsFileContents | Set-Content $resultsFile -Force

I'm very sure this is not the best approach and there is something better that would handle this much faster. If anyone has any ideas, I'm open to them. Thanks.