This guide shows how to check whether a TXT file contains repeated lines, create a cleaned copy without duplicates while preserving the original order, avoid UTF-8 encoding problems, and use PowerShell or Command Prompt for manual checks.
The easiest reusable method is the BAT script in Method 1. Drag a TXT file onto it to create both a duplicate report and a cleaned copy without repeated lines. The original file is not modified. For a quick console-only check, use the following PowerShell command:
```powershell
Get-Content -LiteralPath ".\input.txt" -Encoding UTF8 |
    Group-Object -CaseSensitive |
    Where-Object Count -gt 1 |
    Sort-Object Count -Descending |
    Select-Object Count, Name
```
Replace input.txt with the actual file name. This command only reports duplicates; it does not create a cleaned file. The -Encoding UTF8 option is important in Windows PowerShell 5.1 because a UTF-8 file without a byte order mark may otherwise be decoded using the Windows ANSI code page.

- The -CaseSensitive option treats Server and server as different lines. Remove this option when letter case should be ignored.
- If the output contains sequences such as ะ ัะก..., the file was read with the wrong encoding. Keep -Encoding UTF8 for UTF-8 files. In Windows PowerShell 5.1, use -Encoding Default for a legacy ANSI file or -Encoding Unicode for UTF-16 little-endian text.
- find_duplicate_lines.bat (find and clean): creates a duplicate report and a cleaned UTF-8 file while preserving the order of first occurrences.
- PowerShell Group-Object (quick report): shows every repeated line and the number of occurrences, but can be slow on very large files.
- sort.exe (small files): places equal lines next to each other, but does not automatically report or remove duplicates.

A duplicate line is a line whose comparison value appears more than once in the same file. However, the result depends on the comparison rules you choose.
| Lines | Exact comparison | Case-insensitive comparison |
|---|---|---|
| "Windows" and "Windows" | Duplicate | Duplicate |
| "Windows" and "windows" | Different | Duplicate |
| "example" and "example " (with a trailing space) | Different because of the trailing space | Still different unless spaces are trimmed |
| Two empty lines | Duplicate blank line | Duplicate blank line |
For a strict check, use a case-sensitive comparison and do not trim whitespace. For lists of names, URLs, domains, or identifiers, you may prefer to ignore letter case and accidental spaces at the beginning or end of a line.
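The effect of these rules can be tried in a PowerShell session before processing a real file. The following is only a minimal sketch: the helper name Get-ComparisonKey and its switches are hypothetical and are not part of the script in this guide. It returns the value that would be compared for one line under the chosen rules.

```powershell
# Hypothetical helper: returns the value that is actually compared for one line.
function Get-ComparisonKey {
    param(
        [string]$Line,
        [switch]$IgnoreCase,   # treat Windows and windows as equal
        [switch]$TrimSpaces    # ignore spaces at the beginning or end
    )

    if ($TrimSpaces) { $Line = $Line.Trim() }
    if ($IgnoreCase) { $Line = $Line.ToUpperInvariant() }
    return $Line
}

# Strict rules: different case or a trailing space means a different line.
(Get-ComparisonKey 'Windows') -ceq (Get-ComparisonKey 'windows')      # False
(Get-ComparisonKey 'example') -ceq (Get-ComparisonKey 'example ')     # False

# Relaxed rules: the same pairs now count as duplicates.
(Get-ComparisonKey 'Windows' -IgnoreCase) -ceq (Get-ComparisonKey 'windows' -IgnoreCase)    # True
(Get-ComparisonKey 'example' -TrimSpaces) -ceq (Get-ComparisonKey 'example ' -TrimSpaces)   # True
```

Deciding on the comparison value first makes it easier to predict which lines a later command will report as duplicates.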
For repeated checks, use the following find_duplicate_lines.bat script. Drag a text file onto it or enter the full path. The script creates two UTF-8 files in the same folder:
- filename_duplicates.txt: a report containing every repeated line and its total occurrence count.
- filename_without_duplicates.txt: a cleaned copy in which only the first occurrence of each line is kept.

To create the script, paste the following code into a text editor such as Notepad and save it, entering find_duplicate_lines.bat in the file name field.

```bat
@echo off
setlocal EnableExtensions DisableDelayedExpansion
title Find and Remove Duplicate Lines in a Text File

rem Paths are echoed in quotation marks so that ampersands or parentheses in a path do not break the commands.

rem Take the file from drag and drop or the first argument, otherwise ask for a path.
set "INPUT_FILE=%~1"

if not defined INPUT_FILE (
    echo Drag a text file onto this BAT file, or enter its full path below.
    echo.
    set /p "INPUT_FILE=Text file path: "
)

if not defined INPUT_FILE (
    echo.
    echo No file was selected.
    pause
    exit /b 1
)

rem Remove quotation marks from a pasted path, then expand it to a full path.
set "INPUT_FILE=%INPUT_FILE:"=%"
for %%I in ("%INPUT_FILE%") do set "INPUT_FILE=%%~fI"

if not exist "%INPUT_FILE%" (
    echo.
    echo File not found:
    echo "%INPUT_FILE%"
    pause
    exit /b 1
)

rem Build both output names next to the input file.
for %%I in ("%INPUT_FILE%") do (
    set "DUPLICATE_FILE=%%~dpnI_duplicates.txt"
    set "CLEAN_FILE=%%~dpnI_without_duplicates.txt"
)

echo.
echo Processing:
echo "%INPUT_FILE%"
echo.

rem Pass the paths to PowerShell through environment variables.
set "FD_SCRIPT_FILE=%~f0"
set "FD_INPUT_FILE=%INPUT_FILE%"
set "FD_DUPLICATE_FILE=%DUPLICATE_FILE%"
set "FD_CLEAN_FILE=%CLEAN_FILE%"

rem Run the PowerShell code stored after the marker label at the end of this file.
powershell.exe -NoProfile -ExecutionPolicy Bypass -Command "$content=[IO.File]::ReadAllText($env:FD_SCRIPT_FILE); $marker=':'+'POWERSHELL'; $code=$content.Substring($content.IndexOf($marker)+$marker.Length); & ([ScriptBlock]::Create($code)) -Path $env:FD_INPUT_FILE -DuplicatePath $env:FD_DUPLICATE_FILE -CleanPath $env:FD_CLEAN_FILE"
set "RESULT=%ERRORLEVEL%"

echo.
if "%RESULT%"=="0" (
    echo Finished. Two files were created:
    echo Duplicate report:
    echo "%DUPLICATE_FILE%"
    echo.
    echo File without duplicate lines:
    echo "%CLEAN_FILE%"
) else if "%RESULT%"=="2" (
    echo Finished. No exact duplicate lines were found.
    echo The cleaned copy was still created:
    echo "%CLEAN_FILE%"
) else (
    echo The file could not be processed.
)

echo.
pause
exit /b %RESULT%

:POWERSHELL
param(
    [Parameter(Mandatory = $true)]
    [string]$Path,

    [Parameter(Mandatory = $true)]
    [string]$DuplicatePath,

    [Parameter(Mandatory = $true)]
    [string]$CleanPath
)

$ErrorActionPreference = 'Stop'

# Detects the encoding from the byte order mark (BOM) at the start of the file.
# UTF-32 is checked before UTF-16 because the UTF-32 LE BOM starts with the UTF-16 LE BOM.
function Get-TextEncodingInfo {
    param([string]$FilePath)

    $stream = [System.IO.File]::OpenRead($FilePath)
    try {
        $bom = New-Object byte[] 4
        $read = $stream.Read($bom, 0, 4)
    }
    finally {
        $stream.Dispose()
    }

    # UTF-32 big-endian
    if ($read -ge 4 -and
        $bom[0] -eq 0x00 -and $bom[1] -eq 0x00 -and
        $bom[2] -eq 0xFE -and $bom[3] -eq 0xFF) {
        return [PSCustomObject]@{
            Encoding          = [System.Text.Encoding]::GetEncoding(12001)
            AllowAnsiFallback = $false
        }
    }

    # UTF-32 little-endian
    if ($read -ge 4 -and
        $bom[0] -eq 0xFF -and $bom[1] -eq 0xFE -and
        $bom[2] -eq 0x00 -and $bom[3] -eq 0x00) {
        return [PSCustomObject]@{
            Encoding          = [System.Text.Encoding]::UTF32
            AllowAnsiFallback = $false
        }
    }

    # UTF-8 with BOM
    if ($read -ge 3 -and
        $bom[0] -eq 0xEF -and $bom[1] -eq 0xBB -and
        $bom[2] -eq 0xBF) {
        return [PSCustomObject]@{
            Encoding          = New-Object System.Text.UTF8Encoding($true)
            AllowAnsiFallback = $false
        }
    }

    # UTF-16 little-endian
    if ($read -ge 2 -and $bom[0] -eq 0xFF -and $bom[1] -eq 0xFE) {
        return [PSCustomObject]@{
            Encoding          = [System.Text.Encoding]::Unicode
            AllowAnsiFallback = $false
        }
    }

    # UTF-16 big-endian
    if ($read -ge 2 -and $bom[0] -eq 0xFE -and $bom[1] -eq 0xFF) {
        return [PSCustomObject]@{
            Encoding          = [System.Text.Encoding]::BigEndianUnicode
            AllowAnsiFallback = $false
        }
    }

    # No BOM: try strict UTF-8 first and allow a fallback to the ANSI code page.
    return [PSCustomObject]@{
        Encoding          = New-Object System.Text.UTF8Encoding($false, $true)
        AllowAnsiFallback = $true
    }
}

# Reads the file line by line, writes first occurrences to the temporary file,
# and counts how often each exact line occurs.
function Read-And-CleanTextFile {
    param(
        [string]$FilePath,
        [System.Text.Encoding]$Encoding,
        [string]$TemporaryCleanPath
    )

    $comparer = [System.StringComparer]::Ordinal
    $counts = New-Object 'System.Collections.Generic.Dictionary[string,int]' ($comparer)
    $seen = New-Object 'System.Collections.Generic.HashSet[string]' ($comparer)
    $utf8WithBom = New-Object System.Text.UTF8Encoding($true)
    $reader = New-Object System.IO.StreamReader($FilePath, $Encoding, $true)
    $writer = New-Object System.IO.StreamWriter($TemporaryCleanPath, $false, $utf8WithBom)
    $totalLines = 0

    try {
        while (($line = $reader.ReadLine()) -ne $null) {
            $totalLines++

            # Add returns $true only the first time a line is seen.
            if ($seen.Add($line)) {
                $writer.WriteLine($line)
            }

            $count = 0
            if ($counts.TryGetValue($line, [ref]$count)) {
                $counts[$line] = $count + 1
            }
            else {
                $counts.Add($line, 1)
            }
        }
    }
    finally {
        $reader.Dispose()
        $writer.Dispose()
    }

    return [PSCustomObject]@{
        Counts      = $counts
        TotalLines  = $totalLines
        UniqueLines = $seen.Count
    }
}

$tempCleanPath = Join-Path ([System.IO.Path]::GetDirectoryName($CleanPath)) ([System.IO.Path]::GetRandomFileName())

try {
    $encodingInfo = Get-TextEncodingInfo -FilePath $Path

    try {
        $data = Read-And-CleanTextFile -FilePath $Path -Encoding $encodingInfo.Encoding -TemporaryCleanPath $tempCleanPath
    }
    catch [System.Text.DecoderFallbackException] {
        # The bytes are not valid UTF-8: retry with the ANSI code page if no BOM was found.
        if (-not $encodingInfo.AllowAnsiFallback) {
            throw
        }
        if (Test-Path -LiteralPath $tempCleanPath) {
            Remove-Item -LiteralPath $tempCleanPath -Force
        }
        $data = Read-And-CleanTextFile -FilePath $Path -Encoding ([System.Text.Encoding]::Default) -TemporaryCleanPath $tempCleanPath
    }

    # Replace any previous cleaned copy with the new one.
    if (Test-Path -LiteralPath $CleanPath) {
        Remove-Item -LiteralPath $CleanPath -Force
    }
    Move-Item -LiteralPath $tempCleanPath -Destination $CleanPath

    $duplicates = @(
        $data.Counts.GetEnumerator() |
            Where-Object { $_.Value -gt 1 } |
            Sort-Object Key
    )

    if ($duplicates.Count -eq 0) {
        # No duplicates: remove a stale report and signal exit code 2.
        if (Test-Path -LiteralPath $DuplicatePath) {
            Remove-Item -LiteralPath $DuplicatePath -Force
        }
        Write-Host ("Lines read: {0}" -f $data.TotalLines)
        Write-Host ("Unique lines written: {0}" -f $data.UniqueLines)
        exit 2
    }

    # Tab-separated report: total count, then the duplicated line.
    $report = New-Object System.Collections.Generic.List[string]
    $report.Add("Count`tLine")
    foreach ($duplicate in $duplicates) {
        $report.Add(("{0}`t{1}" -f $duplicate.Value, $duplicate.Key))
    }

    $utf8WithBom = New-Object System.Text.UTF8Encoding($true)
    [System.IO.File]::WriteAllLines($DuplicatePath, $report, $utf8WithBom)

    Write-Host ("Lines read: {0}" -f $data.TotalLines)
    Write-Host ("Unique lines written: {0}" -f $data.UniqueLines)
    Write-Host ("Duplicate groups found: {0}" -f $duplicates.Count)
    exit 0
}
catch {
    if (Test-Path -LiteralPath $tempCleanPath) {
        Remove-Item -LiteralPath $tempCleanPath -Force
    }
    Write-Error $_.Exception.Message
    exit 1
}
```
1. Drag the text file onto find_duplicate_lines.bat. Alternatively, double-click the BAT file and paste the full path to the text file.
2. Open filename_duplicates.txt to see repeated lines and their occurrence counts.
3. Open filename_without_duplicates.txt to get the cleaned list with duplicate occurrences removed.

| File | Sorting behavior | Contents |
|---|---|---|
| filename_duplicates.txt | Sorted alphabetically by the duplicate line | One entry per duplicated value, together with its total count |
| filename_without_duplicates.txt | Not sorted; original order is preserved | Only the first occurrence of each exact line |
| Original TXT file | Not sorted or modified | Remains unchanged |
```text
Count   Line
4       example.com
2       server-01
```
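Because the report is tab-separated with a Count and Line header, it can be loaded back into PowerShell for filtering. The sketch below is an illustration under assumptions: the function name Read-DuplicateReport is hypothetical, and the approach assumes the duplicated lines contain no tab characters and do not start with a double quotation mark, which Import-Csv would interpret as CSV quoting.

```powershell
# Hypothetical helper: loads the tab-separated report and converts Count to a number.
function Read-DuplicateReport {
    param(
        [Parameter(Mandatory = $true)]
        [string]$Path
    )

    Import-Csv -LiteralPath $Path -Delimiter "`t" -Encoding UTF8 |
        ForEach-Object {
            [PSCustomObject]@{
                Count = [int]$_.Count   # stored as text in the file
                Line  = $_.Line
            }
        }
}

# Example: show only lines that occur at least three times.
# Read-DuplicateReport -Path '.\filename_duplicates.txt' | Where-Object Count -ge 3
```

Converting Count to an integer matters for sorting and filtering, because values read from a text file are compared as strings otherwise.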
The script uses an exact, case-sensitive comparison, so Windows and windows are treated as different lines.

PowerShell reads the file as a collection of lines. -Encoding UTF8 tells Windows PowerShell 5.1 to decode a UTF-8 file correctly, including UTF-8 files without a byte order mark. Group-Object groups equal lines, and Where-Object keeps only groups that contain more than one item.
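1. Open the Start menu, type powershell, and press Enter.
2. In the folder that contains the file, run the following command, replacing input.txt with your file name.

```powershell
Get-Content -LiteralPath ".\input.txt" -Encoding UTF8 |
    Group-Object -CaseSensitive |
    Where-Object Count -gt 1 |
    Sort-Object Count -Descending |
    Select-Object Count, Name
```

Example output:

```text
Count Name
----- ----
    4 example.com
    3 192.168.1.10
    2 Windows 11
```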
This result means that example.com appears four times, 192.168.1.10 appears three times, and Windows 11 appears twice.
You can run the same check from any folder by specifying the full path:
```powershell
Get-Content -LiteralPath "C:\Users\User\Desktop\input.txt" -Encoding UTF8 |
    Group-Object -CaseSensitive |
    Where-Object Count -gt 1 |
    Select-Object Count, Name
```
-LiteralPath is preferable when a file name contains characters such as square brackets because PowerShell treats the path exactly as written. The examples use -Encoding UTF8; replace it with the correct encoding parameter when the source file is not UTF-8.
Use this version when you only need to know whether at least one repeated line exists:
```powershell
$duplicate = Get-Content -LiteralPath ".\input.txt" -Encoding UTF8 |
    Group-Object -CaseSensitive |
    Where-Object Count -gt 1 |
    Select-Object -First 1

if ($null -ne $duplicate) {
    Write-Host "Duplicate lines found."
} else {
    Write-Host "No duplicate lines found."
}
```
The command returns only the first duplicate group, although PowerShell still has to read and group the file before it can produce that result. It is useful in a script, scheduled task, or repeatable validation workflow.
The same check can be written on one line:

```powershell
if (Get-Content -LiteralPath ".\input.txt" -Encoding UTF8 | Group-Object -CaseSensitive | Where-Object Count -gt 1 | Select-Object -First 1) { "Duplicate lines found" } else { "No duplicate lines found" }
```
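Since Group-Object has to group the whole file before it returns anything, a check that stops at the first repeated line needs a different technique. The following is a minimal sketch rather than one of the original commands: it remembers lines in a .NET HashSet with an exact, case-sensitive comparer and stops reading as soon as a line repeats. The function name Test-DuplicateLine is hypothetical, and the sketch assumes a UTF-8 file or a file with a byte order mark.

```powershell
# Hypothetical helper: returns $true as soon as one line occurs a second time.
function Test-DuplicateLine {
    param(
        [Parameter(Mandatory = $true)]
        [string]$Path
    )

    # Ordinal comparer: exact, case-sensitive comparison.
    $seen = New-Object 'System.Collections.Generic.HashSet[string]' ([System.StringComparer]::Ordinal)

    # .NET does not use the PowerShell current location, so resolve to a full path.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    $utf8 = New-Object System.Text.UTF8Encoding($false)
    $reader = New-Object System.IO.StreamReader($fullPath, $utf8)

    try {
        while ($null -ne ($line = $reader.ReadLine())) {
            # Add returns $false when the line is already in the set.
            if (-not $seen.Add($line)) {
                return $true
            }
        }
        return $false
    }
    finally {
        $reader.Dispose()
    }
}

# Example:
# if (Test-DuplicateLine -Path '.\input.txt') { 'Duplicate lines found' } else { 'No duplicate lines found' }
```

When a duplicate appears early in a large file, this avoids reading the rest of the file. When there is no duplicate, the whole file is still read once.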
To save the duplicate report instead of displaying it only in the console, export the grouped results to a CSV file:
```powershell
Get-Content -LiteralPath ".\input.txt" -Encoding UTF8 |
    Group-Object -CaseSensitive |
    Where-Object Count -gt 1 |
    Sort-Object Count -Descending |
    Select-Object Count, @{Name="Line"; Expression={$_.Name}} |
    Export-Csv -LiteralPath ".\duplicate-report.csv" -NoTypeInformation -Encoding UTF8
```
The output file duplicate-report.csv can be opened in Excel, LibreOffice Calc, Notepad, or another text editor.
Use the following command when you need a plain TXT file containing one copy of each line that was duplicated:
```powershell
Get-Content -LiteralPath ".\input.txt" -Encoding UTF8 |
    Group-Object -CaseSensitive |
    Where-Object Count -gt 1 |
    ForEach-Object Name |
    Set-Content -LiteralPath ".\duplicate-lines.txt" -Encoding UTF8
```

Write the result to a new file. Do not overwrite input.txt with Set-Content while the same pipeline is still reading that file.
Group-Object is case-insensitive by default. Remove -CaseSensitive to treat PC, Pc, and pc as the same value:
```powershell
Get-Content -LiteralPath ".\input.txt" -Encoding UTF8 |
    Group-Object |
    Where-Object Count -gt 1 |
    Select-Object Count, Name
```
To ignore spaces at the beginning and end of each line, trim the lines before grouping:

```powershell
Get-Content -LiteralPath ".\input.txt" -Encoding UTF8 |
    ForEach-Object { $_.Trim() } |
    Group-Object -CaseSensitive |
    Where-Object Count -gt 1 |
    Select-Object Count, Name
```

This makes " example", "example ", and "example" equivalent. It does not remove spaces inside a line.
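To skip empty and whitespace-only lines, filter them out before grouping:

```powershell
Get-Content -LiteralPath ".\input.txt" -Encoding UTF8 |
    Where-Object { $_.Trim().Length -gt 0 } |
    Group-Object -CaseSensitive |
    Where-Object Count -gt 1 |
    Select-Object Count, Name
```

To ignore letter case, surrounding spaces, and blank lines at the same time, combine these steps:

```powershell
Get-Content -LiteralPath ".\input.txt" -Encoding UTF8 |
    ForEach-Object { $_.Trim() } |
    Where-Object { $_.Length -gt 0 } |
    Group-Object |
    Where-Object Count -gt 1 |
    Sort-Object Count -Descending |
    Select-Object Count, Name
```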
The built-in Windows sort.exe command can arrange identical lines next to each other. This makes duplicates easier to see manually, but SORT does not automatically identify, count, or remove repeated lines.
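1. Open the Start menu, type cmd, and press Enter.
2. In the folder that contains the file, run the following command:

```bat
sort "input.txt" /o "sorted.txt"
```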
Open sorted.txt in Notepad. Repeated values will appear next to each other, making a visual check possible.
For an automatic check from Command Prompt, call Windows PowerShell from CMD. This example also specifies UTF-8 explicitly so Cyrillic and accented characters are not decoded as ANSI in Windows PowerShell 5.1:
```bat
powershell.exe -NoProfile -Command "Get-Content -LiteralPath '.\input.txt' -Encoding UTF8 | Group-Object -CaseSensitive | Where-Object Count -gt 1 | Select-Object Count, Name"
```
The BAT script in Method 1 is more efficient than the short Group-Object examples because it does not read the complete file into an array and does not alphabetically sort every source line. It reads sequentially, writes first occurrences directly to the cleaned output, and stores only the unique keys and their counters in memory.
The direct PowerShell commands remain convenient, but Group-Object can be noticeably slower and may consume substantial RAM on a large file.
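The sequential approach can also be used for a console-only report. The sketch below borrows the idea described above, a dictionary of counters filled while reading line by line, but it is an illustration and not the script itself. The function name Get-DuplicateLineCount is hypothetical, and the sketch assumes a UTF-8 file or a file with a byte order mark.

```powershell
# Hypothetical helper: counts lines sequentially and returns only repeated ones.
function Get-DuplicateLineCount {
    param(
        [Parameter(Mandatory = $true)]
        [string]$Path
    )

    # Only unique lines and their counters are kept in memory.
    $counts = New-Object 'System.Collections.Generic.Dictionary[string,int]' ([System.StringComparer]::Ordinal)
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    $reader = New-Object System.IO.StreamReader($fullPath, [System.Text.Encoding]::UTF8)

    try {
        while ($null -ne ($line = $reader.ReadLine())) {
            $count = 0
            # TryGetValue leaves $count at 0 when the line has not been seen yet.
            [void]$counts.TryGetValue($line, [ref]$count)
            $counts[$line] = $count + 1
        }
    }
    finally {
        $reader.Dispose()
    }

    foreach ($entry in $counts.GetEnumerator()) {
        if ($entry.Value -gt 1) {
            [PSCustomObject]@{ Count = $entry.Value; Line = $entry.Key }
        }
    }
}

# Example:
# Get-DuplicateLineCount -Path '.\input.txt' | Sort-Object Count -Descending
```

Memory use grows with the number of distinct lines rather than with the total number of lines, which helps most when a file contains many repeats.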
For a large file, prefer the BAT script in Method 1 over the short commands based on Group-Object.

If PowerShell displays Cyrillic, accented, or other non-English text as sequences such as ะ ัะก..., the source file is usually being decoded with the wrong character encoding. The text itself may still be intact; the problem is how Get-Content interprets the file bytes.

For a UTF-8 file, read it with -Encoding UTF8:

```powershell
Get-Content -LiteralPath ".\input.txt" -Encoding UTF8 |
    Group-Object -CaseSensitive |
    Where-Object Count -gt 1 |
    Sort-Object Count -Descending |
    Select-Object Count, Name
```
This works for UTF-8 text with or without a byte order mark in Windows PowerShell 5.1. In newer PowerShell versions, UTF-8 is normally the default for many text operations, but specifying it explicitly makes the command unambiguous.
| Source file encoding | Get-Content parameter | When to use it |
|---|---|---|
| UTF-8 | -Encoding UTF8 | Recommended for modern TXT files, including UTF-8 without BOM. |
| Windows ANSI code page | -Encoding Default | Use for older files saved in the current Windows system code page. This applies to Windows PowerShell 5.1; in PowerShell 6 and later, Default means UTF-8 without BOM. |
| UTF-16 little-endian | -Encoding Unicode | Use for UTF-16 LE files, often identified by an FF FE BOM. |
| UTF-16 big-endian | -Encoding BigEndianUnicode | Use for UTF-16 BE files, often identified by an FE FF BOM. |
For a legacy ANSI file in Windows PowerShell 5.1:

```powershell
Get-Content -LiteralPath ".\input.txt" -Encoding Default |
    Group-Object -CaseSensitive |
    Where-Object Count -gt 1 |
    Select-Object Count, Name
```

For a UTF-16 little-endian file:

```powershell
Get-Content -LiteralPath ".\input.txt" -Encoding Unicode |
    Group-Object -CaseSensitive |
    Where-Object Count -gt 1 |
    Select-Object Count, Name
```
chcp 65001 changes the console code page, but it does not tell Get-Content how to decode the source file. Specify -Encoding UTF8 on Get-Content instead.
The reusable find_duplicate_lines.bat script included in this guide recognizes UTF-8 with or without BOM, supports BOM-marked UTF-16 and UTF-32, and falls back to the Windows ANSI code page when a BOM-less byte sequence is not valid UTF-8. It writes both the duplicate report and the cleaned copy as UTF-8 with BOM.
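To find out which -Encoding value a file needs before running a manual command, the first bytes of the file can be inspected for a byte order mark. The sketch below follows the same order of checks as the script (UTF-32 before UTF-16, because the UTF-32 LE BOM begins with the UTF-16 LE BOM). The function name Get-BomName and the returned labels are hypothetical. A result of No BOM cannot distinguish UTF-8 without BOM from an ANSI file.

```powershell
# Hypothetical helper: names the byte order mark found at the start of a file.
function Get-BomName {
    param(
        [Parameter(Mandatory = $true)]
        [string]$Path
    )

    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    $stream = [System.IO.File]::OpenRead($fullPath)
    try {
        $bytes = New-Object byte[] 4
        $read = $stream.Read($bytes, 0, 4)
    }
    finally {
        $stream.Dispose()
    }

    # Build a hex string from the bytes that were actually read, for example EFBBBF41.
    $hex = ''
    for ($i = 0; $i -lt $read; $i++) {
        $hex += $bytes[$i].ToString('X2')
    }

    if ($hex -like '0000FEFF*') { return 'UTF-32 BE' }
    if ($hex -like 'FFFE0000*') { return 'UTF-32 LE' }
    if ($hex -like 'EFBBBF*')   { return 'UTF-8 with BOM' }
    if ($hex -like 'FFFE*')     { return 'UTF-16 LE' }
    if ($hex -like 'FEFF*')     { return 'UTF-16 BE' }
    return 'No BOM'
}

# Example:
# Get-BomName -Path '.\input.txt'
```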
- Get-Content, Group-Object, and Select-Object only read and analyze the file unless you explicitly add a writing command such as Set-Content or Export-Csv.
- To count the lines in a file, use (Get-Content -LiteralPath ".\input.txt" -Encoding UTF8).Count. For a very large file, this still reads the file through PowerShell and may take time.
- To remove duplicate lines, use find_duplicate_lines.bat. It automatically creates *_without_duplicates.txt and leaves the source file unchanged. The cleaned output keeps the first occurrence of each exact line.
- Group-Object is case-insensitive by default. Add -CaseSensitive when uppercase and lowercase characters must be compared exactly.
- If non-English text is displayed incorrectly, add -Encoding UTF8 immediately after the file path. Changing the console with chcp 65001 alone does not correct how Get-Content decodes the file.
- find_duplicate_lines.bat reads the selected file and creates *_duplicates.txt plus *_without_duplicates.txt. The original text file is never overwritten.

For repeated checks, use find_duplicate_lines.bat. It accepts a file by drag and drop, performs an exact case-sensitive comparison, creates an alphabetically sorted duplicate report, and writes a second UTF-8 file without duplicate occurrences while preserving the original order.
Use PowerShell with Get-Content -Encoding UTF8 and Group-Object when you only need an interactive report or want to customize comparison rules. The Windows sort.exe command is suitable only for manual inspection of small files.