Windows Text File Guide · 2026

How to Find Duplicate Lines
in a Text File Using Windows Tools

Check whether a TXT file contains repeated lines, create a cleaned copy without duplicates, preserve the original order, avoid UTF-8 encoding problems, or use PowerShell and Command Prompt for manual checks.

Windows 10 · Windows 11 · PowerShell · Command Prompt · TXT Files · Built-in Tools

How to Check a Text File for Duplicate Lines in Windows

The easiest reusable method is the BAT script in Method 1. Drag a TXT file onto it to create both a duplicate report and a cleaned copy without repeated lines. The original file is not modified. For a quick console-only check, use the following PowerShell command:

Get-Content -LiteralPath ".\input.txt" -Encoding UTF8 |
Group-Object -CaseSensitive |
Where-Object Count -gt 1 |
Sort-Object Count -Descending |
Select-Object Count, Name

Replace input.txt with the actual file name. This command only reports duplicates; it does not create a cleaned file. The -Encoding UTF8 option is important in Windows PowerShell 5.1 because a UTF-8 file without a byte order mark may otherwise be decoded as the Windows ANSI code page.

Important: The -CaseSensitive option treats Server and server as different lines. Remove this option when letter case should be ignored.

Incorrect Cyrillic or accented characters: If the output contains text such as РіС..., the file was read with the wrong encoding. Keep -Encoding UTF8 for UTF-8 files. Use -Encoding Default for a legacy ANSI file or -Encoding Unicode for UTF-16 little-endian text.

  • Recommended: find_duplicate_lines.bat (find and clean). Creates a duplicate report and a cleaned UTF-8 file while preserving the order of first occurrences.
  • Console check: PowerShell Group-Object (quick report). Shows every repeated line and the number of occurrences, but can be slow on very large files.
  • Manual check: Command Prompt SORT (small files). Places equal lines next to each other, but does not automatically report or remove duplicates.

What Counts as a Duplicate Line in a Windows Text File?

A duplicate line is a line whose comparison value appears more than once in the same file. However, the result depends on the comparison rules you choose.

| Lines | Exact comparison | Case-insensitive comparison |
| --- | --- | --- |
| "Windows" and "Windows" | Duplicate | Duplicate |
| "Windows" and "windows" | Different | Duplicate |
| "example" and "example " | Different because of the trailing space | Still different unless spaces are trimmed |
| Two empty lines | Duplicate blank line | Duplicate blank line |

For a strict check, use a case-sensitive comparison and do not trim whitespace. For lists of names, URLs, domains, or identifiers, you may prefer to ignore letter case and accidental spaces at the beginning or end of a line.
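
The effect of each rule can be checked on single pairs of lines before running a full report. The following is a minimal sketch, not part of the guide's script: the helper name Test-SameLine and its switches are hypothetical, and it uses ordinal .NET comparison, which can differ from the culture-aware comparison Group-Object applies to unusual characters.

```powershell
# Hypothetical helper: compares two lines under the selected rules.
function Test-SameLine {
    param(
        [string]$First,
        [string]$Second,
        [switch]$IgnoreCase,
        [switch]$Trim
    )

    # Optionally drop leading and trailing whitespace from both values.
    if ($Trim) {
        $First = $First.Trim()
        $Second = $Second.Trim()
    }

    # Ordinal compares character codes; OrdinalIgnoreCase also ignores letter case.
    $rule = [System.StringComparison]::Ordinal
    if ($IgnoreCase) {
        $rule = [System.StringComparison]::OrdinalIgnoreCase
    }

    return [string]::Equals($First, $Second, $rule)
}

Test-SameLine 'Windows' 'windows'               # False: exact comparison
Test-SameLine 'Windows' 'windows' -IgnoreCase   # True
Test-SameLine 'example' 'example ' -IgnoreCase  # False: the trailing space remains
Test-SameLine 'example' 'example ' -Trim        # True
```

The four calls mirror the rows of the table: ignoring case alone does not hide a trailing space, so trimming has to be chosen separately.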

Find Duplicate Lines and Create a Clean File with a BAT Script

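For repeated checks, use the following find_duplicate_lines.bat script. Drag a text file onto it or enter the full path. The script creates two UTF-8 files in the same folder:

  • filename_duplicates.txt: the duplicate report, written only when duplicates are found.
  • filename_without_duplicates.txt: the cleaned copy without repeated lines.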

Faster processing without sorting the complete file: The current script does not sort all source lines before counting them. It reads the file sequentially, uses a case-sensitive dictionary and hash set, and writes unique lines as they are found. The cleaned file therefore preserves the original line order. Only the much smaller duplicate report is sorted alphabetically.

The source file is not changed: The BAT script reads the selected file and writes two separate output files. It never overwrites, sorts, or deletes lines from the original file.
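
The order-preserving step can be illustrated on a small in-memory list. This is a simplified sketch of the idea rather than an excerpt from the script: the sample values are invented, and the script itself reads lines from a file and writes them to disk instead of collecting them in a variable.

```powershell
# Sample input; the real script reads lines from a file instead.
$lines = @('beta', 'alpha', 'beta', 'Alpha', 'alpha', 'gamma')

# Ordinal comparison is case-sensitive, matching the script's behavior.
$seen = New-Object 'System.Collections.Generic.HashSet[string]' ([System.StringComparer]::Ordinal)

$unique = foreach ($line in $lines) {
    # Add returns $true only the first time a value is stored.
    if ($seen.Add($line)) {
        $line
    }
}

$unique -join ', '   # beta, alpha, Alpha, gamma
```

Because each line is emitted the moment it is first seen, no sorting is needed and the first occurrences stay in their source order.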

Create find_duplicate_lines.bat

  1. Open Notepad.
  2. Copy the complete script shown below.
  3. Select File → Save As.
  4. Enter find_duplicate_lines.bat in the file name field.
  5. Select All files as the file type and save the BAT file.
@echo off
setlocal EnableExtensions DisableDelayedExpansion
title Find and Remove Duplicate Lines in a Text File

set "INPUT_FILE=%~1"

if not defined INPUT_FILE (
  echo Drag a text file onto this BAT file, or enter its full path below.
  echo.
  set /p "INPUT_FILE=Text file path: "
)

if not defined INPUT_FILE (
  echo.
  echo No file was selected.
  pause
  exit /b 1
)

rem Strip quotation marks, which a pasted path may include.
set "INPUT_FILE=%INPUT_FILE:"=%"

for %%I in ("%INPUT_FILE%") do set "INPUT_FILE=%%~fI"

rem The path is echoed in quotes so that parentheses or ampersands in it cannot break the block.
if not exist "%INPUT_FILE%" (
  echo.
  echo File not found:
  echo "%INPUT_FILE%"
  pause
  exit /b 1
)

for %%I in ("%INPUT_FILE%") do (
  set "DUPLICATE_FILE=%%~dpnI_duplicates.txt"
  set "CLEAN_FILE=%%~dpnI_without_duplicates.txt"
)

echo.
echo Processing:
echo "%INPUT_FILE%"
echo.

set "FD_SCRIPT_FILE=%~f0"
set "FD_INPUT_FILE=%INPUT_FILE%"
set "FD_DUPLICATE_FILE=%DUPLICATE_FILE%"
set "FD_CLEAN_FILE=%CLEAN_FILE%"

powershell.exe -NoProfile -ExecutionPolicy Bypass -Command "$content=[IO.File]::ReadAllText($env:FD_SCRIPT_FILE); $marker=':'+'POWERSHELL'; $code=$content.Substring($content.IndexOf($marker)+$marker.Length); & ([ScriptBlock]::Create($code)) -Path $env:FD_INPUT_FILE -DuplicatePath $env:FD_DUPLICATE_FILE -CleanPath $env:FD_CLEAN_FILE"

set "RESULT=%ERRORLEVEL%"
echo.

rem Paths are echoed in quotes so that parentheses or ampersands in them cannot break the block.
if "%RESULT%"=="0" (
  echo Finished. Two files were created:
  echo Duplicate report:
  echo "%DUPLICATE_FILE%"
  echo.
  echo File without duplicate lines:
  echo "%CLEAN_FILE%"
) else if "%RESULT%"=="2" (
  echo Finished. No exact duplicate lines were found.
  echo The cleaned copy was still created:
  echo "%CLEAN_FILE%"
) else (
  echo The file could not be processed.
)

echo.
pause
exit /b %RESULT%

:POWERSHELL
param(
    [Parameter(Mandatory = $true)]
    [string]$Path,

    [Parameter(Mandatory = $true)]
    [string]$DuplicatePath,

    [Parameter(Mandatory = $true)]
    [string]$CleanPath
)

$ErrorActionPreference = 'Stop'

function Get-TextEncodingInfo {
    param([string]$FilePath)

    $stream = [System.IO.File]::OpenRead($FilePath)

    try {
        $bom = New-Object byte[] 4
        $read = $stream.Read($bom, 0, 4)
    }
    finally {
        $stream.Dispose()
    }

    if ($read -ge 4 -and
        $bom[0] -eq 0x00 -and $bom[1] -eq 0x00 -and
        $bom[2] -eq 0xFE -and $bom[3] -eq 0xFF) {
        return [PSCustomObject]@{
            Encoding = [System.Text.Encoding]::GetEncoding(12001)
            AllowAnsiFallback = $false
        }
    }

    if ($read -ge 4 -and
        $bom[0] -eq 0xFF -and $bom[1] -eq 0xFE -and
        $bom[2] -eq 0x00 -and $bom[3] -eq 0x00) {
        return [PSCustomObject]@{
            Encoding = [System.Text.Encoding]::UTF32
            AllowAnsiFallback = $false
        }
    }

    if ($read -ge 3 -and
        $bom[0] -eq 0xEF -and $bom[1] -eq 0xBB -and
        $bom[2] -eq 0xBF) {
        return [PSCustomObject]@{
            Encoding = New-Object System.Text.UTF8Encoding($true)
            AllowAnsiFallback = $false
        }
    }

    if ($read -ge 2 -and $bom[0] -eq 0xFF -and $bom[1] -eq 0xFE) {
        return [PSCustomObject]@{
            Encoding = [System.Text.Encoding]::Unicode
            AllowAnsiFallback = $false
        }
    }

    if ($read -ge 2 -and $bom[0] -eq 0xFE -and $bom[1] -eq 0xFF) {
        return [PSCustomObject]@{
            Encoding = [System.Text.Encoding]::BigEndianUnicode
            AllowAnsiFallback = $false
        }
    }

    return [PSCustomObject]@{
        Encoding = New-Object System.Text.UTF8Encoding($false, $true)
        AllowAnsiFallback = $true
    }
}

function Read-And-CleanTextFile {
    param(
        [string]$FilePath,
        [System.Text.Encoding]$Encoding,
        [string]$TemporaryCleanPath
    )

    $comparer = [System.StringComparer]::Ordinal
    $counts = New-Object 'System.Collections.Generic.Dictionary[string,int]' ($comparer)
    $seen = New-Object 'System.Collections.Generic.HashSet[string]' ($comparer)
    $utf8WithBom = New-Object System.Text.UTF8Encoding($true)
    $reader = New-Object System.IO.StreamReader($FilePath, $Encoding, $true)
    $writer = New-Object System.IO.StreamWriter($TemporaryCleanPath, $false, $utf8WithBom)
    $totalLines = 0

    try {
        while (($line = $reader.ReadLine()) -ne $null) {
            $totalLines++

            if ($seen.Add($line)) {
                $writer.WriteLine($line)
            }

            $count = 0

            if ($counts.TryGetValue($line, [ref]$count)) {
                $counts[$line] = $count + 1
            }
            else {
                $counts.Add($line, 1)
            }
        }
    }
    finally {
        $reader.Dispose()
        $writer.Dispose()
    }

    return [PSCustomObject]@{
        Counts = $counts
        TotalLines = $totalLines
        UniqueLines = $seen.Count
    }
}

$tempCleanPath = Join-Path ([System.IO.Path]::GetDirectoryName($CleanPath)) ([System.IO.Path]::GetRandomFileName())

try {
    $encodingInfo = Get-TextEncodingInfo -FilePath $Path

    try {
        $data = Read-And-CleanTextFile -FilePath $Path -Encoding $encodingInfo.Encoding -TemporaryCleanPath $tempCleanPath
    }
    catch [System.Text.DecoderFallbackException] {
        if (-not $encodingInfo.AllowAnsiFallback) {
            throw
        }

        if (Test-Path -LiteralPath $tempCleanPath) {
            Remove-Item -LiteralPath $tempCleanPath -Force
        }

        $data = Read-And-CleanTextFile -FilePath $Path -Encoding ([System.Text.Encoding]::Default) -TemporaryCleanPath $tempCleanPath
    }

    if (Test-Path -LiteralPath $CleanPath) {
        Remove-Item -LiteralPath $CleanPath -Force
    }

    Move-Item -LiteralPath $tempCleanPath -Destination $CleanPath

    $duplicates = @(
        $data.Counts.GetEnumerator() |
        Where-Object { $_.Value -gt 1 } |
        Sort-Object Key
    )

    if ($duplicates.Count -eq 0) {
        if (Test-Path -LiteralPath $DuplicatePath) {
            Remove-Item -LiteralPath $DuplicatePath -Force
        }

        Write-Host ("Lines read: {0}" -f $data.TotalLines)
        Write-Host ("Unique lines written: {0}" -f $data.UniqueLines)
        exit 2
    }

    $report = New-Object System.Collections.Generic.List[string]
    $report.Add("Count`tLine")

    foreach ($duplicate in $duplicates) {
        $report.Add(("{0}`t{1}" -f $duplicate.Value, $duplicate.Key))
    }

    $utf8WithBom = New-Object System.Text.UTF8Encoding($true)
    [System.IO.File]::WriteAllLines($DuplicatePath, $report, $utf8WithBom)

    Write-Host ("Lines read: {0}" -f $data.TotalLines)
    Write-Host ("Unique lines written: {0}" -f $data.UniqueLines)
    Write-Host ("Duplicate groups found: {0}" -f $duplicates.Count)
    exit 0
}
catch {
    if (Test-Path -LiteralPath $tempCleanPath) {
        Remove-Item -LiteralPath $tempCleanPath -Force
    }

    Write-Error $_.Exception.Message
    exit 1
}

How to Run the BAT Script

  1. Drag the TXT file onto find_duplicate_lines.bat. Alternatively, double-click the BAT file and paste the full path to the text file.
  2. Wait while the script reads the file and counts exact line matches.
  3. Open filename_duplicates.txt to see repeated lines and their occurrence counts.
  4. Open filename_without_duplicates.txt to get the cleaned list with duplicate occurrences removed.

What the Script Does with Line Order

| File | Sorting behavior | Contents |
| --- | --- | --- |
| filename_duplicates.txt | Sorted alphabetically by the duplicate line | One entry per duplicated value, together with its total count |
| filename_without_duplicates.txt | Not sorted; original order is preserved | Only the first occurrence of each exact line |
| Original TXT file | Not sorted or modified | Remains unchanged |

Duplicate Report Example

Count	Line
4	example.com
2	server-01
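
Because the report is tab-separated, it can be read back into objects for further filtering. The sketch below assumes the layout shown above and uses an in-memory copy of the example; for a real report, Import-Csv with -Delimiter "`t" would take the place of ConvertFrom-Csv. Lines that themselves contain tabs or double quotes may not parse cleanly this way.

```powershell
# Sample report text in the same tab-separated layout as the example.
$reportText = "Count`tLine`n4`texample.com`n2`tserver-01"

# ConvertFrom-Csv returns every field as a string, so cast Count before comparing numbers.
$rows = $reportText -split "`n" |
    ConvertFrom-Csv -Delimiter "`t" |
    ForEach-Object {
        [PSCustomObject]@{
            Count = [int]$_.Count
            Line  = $_.Line
        }
    }

# Keep only the values that occur at least three times.
$rows | Where-Object { $_.Count -ge 3 }
```

Without the [int] cast, the comparison would be made on text, so a count of 10 would sort and compare as if it were smaller than 3.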

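Comparison Rules Used by the Script

The script compares lines ordinally and case-sensitively, exactly as they are read: Server and server count as different lines, leading and trailing spaces are significant, and blank lines are compared like any other line.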

Advantages

  • Creates both a duplicate report and a cleaned file.
  • Preserves the original order in the cleaned output.
  • Does not sort or load the complete source file into an array.
  • Uses built-in Windows components and needs no separate PS1 file.
  • Works by drag and drop.

Limitations

  • Keeps one dictionary entry for every unique line, so files with millions of different lines can still require substantial RAM.
  • Uses exact comparison; normalization of case or spaces requires script changes.
  • Ambiguous legacy encodings without BOM may need manual conversion to UTF-8.
  • Requires Windows PowerShell, included with standard Windows 10 and Windows 11 installations.

How to Show Duplicate Lines and Their Counts with PowerShell

PowerShell reads the file as a collection of lines. -Encoding UTF8 tells Windows PowerShell 5.1 to decode a UTF-8 file correctly, including UTF-8 files without a byte order mark. Group-Object groups equal lines, and Where-Object keeps only groups that contain more than one item.

  1. Place the text file in an easy-to-find folder.
  2. Open that folder in File Explorer.
  3. Click the address bar, type powershell, and press Enter.
  4. Run the command below after replacing input.txt with your file name.
Get-Content -LiteralPath ".\input.txt" -Encoding UTF8 |
Group-Object -CaseSensitive |
Where-Object Count -gt 1 |
Sort-Object Count -Descending |
Select-Object Count, Name

Example Result

Count Name
----- ----
    4 example.com
    3 192.168.1.10
    2 Windows 11

This result means that example.com appears four times, 192.168.1.10 appears three times, and Windows 11 appears twice.
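
The same pipeline can be tried without a file by feeding it a small array. This is a sketch with invented sample data chosen to reproduce the counts in the example result; the array stands in for the output of Get-Content.

```powershell
# Sample lines standing in for Get-Content output.
$lines = @(
    'example.com', '192.168.1.10', 'Windows 11', 'example.com',
    'localhost', '192.168.1.10', 'example.com', 'Windows 11',
    '192.168.1.10', 'example.com'
)

$report = $lines |
    Group-Object -CaseSensitive |           # one group per distinct line
    Where-Object Count -gt 1 |              # drop lines that occur once
    Sort-Object Count -Descending |         # most frequent first
    Select-Object Count, Name

$report
```

The line localhost occurs once, so it forms a group of one and is filtered out before the report is displayed.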

Safe operation: This command only reads the source file. It does not change, sort, overwrite, or delete any lines in the original text file.

Use a Full File Path

You can run the same check from any folder by specifying the full path:

Get-Content -LiteralPath "C:\Users\User\Desktop\input.txt" -Encoding UTF8 |
Group-Object -CaseSensitive |
Where-Object Count -gt 1 |
Select-Object Count, Name

-LiteralPath is preferable when a file name contains characters such as square brackets because PowerShell treats the path exactly as written. The examples use -Encoding UTF8; replace it with the correct encoding parameter when the source file is not UTF-8.
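
The difference is easy to observe with a file whose name contains square brackets. The sketch below creates a scratch folder under the temporary directory; the file name list[1].txt is only an illustration.

```powershell
# Create a scratch folder with a file whose name contains square brackets.
$folder = Join-Path ([System.IO.Path]::GetTempPath()) ([System.IO.Path]::GetRandomFileName())
[void][System.IO.Directory]::CreateDirectory($folder)
$file = Join-Path $folder 'list[1].txt'
[System.IO.File]::WriteAllText($file, "alpha`r`nalpha`r`n")

# -Path treats [1] as a wildcard that matches the single character "1",
# so it looks for list1.txt and does not find the real file.
$foundByPath = Test-Path -Path $file

# -LiteralPath uses the name exactly as written.
$foundByLiteralPath = Test-Path -LiteralPath $file

"-Path: $foundByPath; -LiteralPath: $foundByLiteralPath"

# Clean up the scratch folder.
Remove-Item -LiteralPath $folder -Recurse -Force
```

The same wildcard interpretation applies to Get-Content, which is why the examples in this guide pass the file name through -LiteralPath.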

How to Check Whether Duplicate Lines Exist and Return Only Yes or No

Use this version when you only need to know whether at least one repeated line exists:

$duplicate = Get-Content -LiteralPath ".\input.txt" -Encoding UTF8 |
Group-Object -CaseSensitive |
Where-Object Count -gt 1 |
Select-Object -First 1

if ($null -ne $duplicate) {
    Write-Host "Duplicate lines found."
} else {
    Write-Host "No duplicate lines found."
}

The command returns only the first duplicate group, although PowerShell still has to read and group the file before it can produce that result. It is useful in a script, scheduled task, or repeatable validation workflow.
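
When only a yes or no answer is needed, a hash set can stop at the first repeated line instead of grouping everything. The following is a sketch of that alternative, not the pipeline shown above: the function name Test-DuplicateLine is hypothetical, and the comparison is ordinal and case-sensitive.

```powershell
# Hypothetical helper: returns $true as soon as any line repeats.
function Test-DuplicateLine {
    param($Lines)

    $seen = New-Object 'System.Collections.Generic.HashSet[string]' ([System.StringComparer]::Ordinal)

    foreach ($line in $Lines) {
        # Add returns $false when the value is already in the set.
        if (-not $seen.Add([string]$line)) {
            return $true
        }
    }

    return $false
}

Test-DuplicateLine -Lines @('a', 'b', 'a', 'c')   # True
Test-DuplicateLine -Lines @('a', 'b', 'A')        # False: comparison is case-sensitive

# With a file, pass a lazy line enumerator and a full path, for example:
# Test-DuplicateLine -Lines ([System.IO.File]::ReadLines('C:\Users\User\Desktop\input.txt'))
```

[System.IO.File]::ReadLines decodes the file as UTF-8 unless another encoding is passed, and it needs a full path because .NET resolves relative paths against the process working directory rather than the current PowerShell location.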

One-Line Version

if (Get-Content -LiteralPath ".\input.txt" -Encoding UTF8 | Group-Object -CaseSensitive | Where-Object Count -gt 1 | Select-Object -First 1) { "Duplicate lines found" } else { "No duplicate lines found" }

How to Export Duplicate Lines and Occurrence Counts to CSV

To save the duplicate report instead of displaying it only in the console, export the grouped results to a CSV file:

Get-Content -LiteralPath ".\input.txt" -Encoding UTF8 |
Group-Object -CaseSensitive |
Where-Object Count -gt 1 |
Sort-Object Count -Descending |
Select-Object Count, @{Name="Line"; Expression={$_.Name}} |
Export-Csv -LiteralPath ".\duplicate-report.csv" -NoTypeInformation -Encoding UTF8

The output file duplicate-report.csv can be opened in Excel, LibreOffice Calc, Notepad, or another text editor.

Save Each Duplicate Value Only Once

Use the following command when you need a plain TXT file containing one copy of each line that was duplicated:

Get-Content -LiteralPath ".\input.txt" -Encoding UTF8 |
Group-Object -CaseSensitive |
Where-Object Count -gt 1 |
ForEach-Object Name |
Set-Content -LiteralPath ".\duplicate-lines.txt" -Encoding UTF8

Do not overwrite the source: Write the results to a different file. Do not use the original input path with Set-Content while the same pipeline is still reading that file.
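
In a reusable script, the rule can be enforced with a small guard that runs before the pipeline. This is a sketch under stated assumptions: the helper name Assert-DifferentPath is hypothetical, and it compares path text only, so it will not notice two different names that point to the same file, such as a hard link.

```powershell
# Hypothetical helper: stops before a pipeline can write to the file it is reading.
function Assert-DifferentPath {
    param(
        [string]$InputPath,
        [string]$OutputPath
    )

    # Resolve both paths against the current PowerShell location; the files need not exist.
    $pathTools = $ExecutionContext.SessionState.Path
    $fullInput = $pathTools.GetUnresolvedProviderPathFromPSPath($InputPath)
    $fullOutput = $pathTools.GetUnresolvedProviderPathFromPSPath($OutputPath)

    # Windows file names are case-insensitive, so compare without regard to case.
    if ([string]::Equals($fullInput, $fullOutput, [System.StringComparison]::OrdinalIgnoreCase)) {
        throw "Output path must differ from the input path: $fullInput"
    }
}

# Passes silently because the two names differ.
Assert-DifferentPath -InputPath '.\input.txt' -OutputPath '.\duplicate-lines.txt'
```

Calling the guard with the same file for both parameters raises an error, so the Set-Content step that follows it in a script is never reached.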

How to Ignore Letter Case, Extra Spaces, or Blank Lines

Ignore Letter Case

Group-Object is case-insensitive by default. Remove -CaseSensitive to treat PC, Pc, and pc as the same value:

Get-Content -LiteralPath ".\input.txt" -Encoding UTF8 |
Group-Object |
Where-Object Count -gt 1 |
Select-Object Count, Name

Ignore Spaces at the Beginning and End

Get-Content -LiteralPath ".\input.txt" -Encoding UTF8 |
ForEach-Object { $_.Trim() } |
Group-Object -CaseSensitive |
Where-Object Count -gt 1 |
Select-Object Count, Name

This makes "example", " example", and "example " equivalent. It does not remove spaces inside a line.

Ignore Empty and Whitespace-Only Lines

Get-Content -LiteralPath ".\input.txt" -Encoding UTF8 |
Where-Object { $_.Trim().Length -gt 0 } |
Group-Object -CaseSensitive |
Where-Object Count -gt 1 |
Select-Object Count, Name

Ignore Case, Trim Spaces, and Skip Empty Lines

Get-Content -LiteralPath ".\input.txt" -Encoding UTF8 |
ForEach-Object { $_.Trim() } |
Where-Object { $_.Length -gt 0 } |
Group-Object |
Where-Object Count -gt 1 |
Sort-Object Count -Descending |
Select-Object Count, Name

Choose rules first: Decide whether case differences, leading spaces, trailing spaces, and blank lines are meaningful before interpreting the report. Changing the normalization rules changes which lines qualify as duplicates.
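
The variants above can be folded into one function so that the rules become explicit switches. This is an illustrative sketch: the function name Get-DuplicateLine and its parameters are hypothetical, and it works on an array of lines rather than on a file.

```powershell
# Hypothetical helper that applies the chosen rules before grouping.
function Get-DuplicateLine {
    param(
        [string[]]$Lines,
        [switch]$IgnoreCase,
        [switch]$TrimSpaces,
        [switch]$SkipBlank
    )

    $values = foreach ($line in $Lines) {
        # Optionally remove leading and trailing whitespace.
        $value = if ($TrimSpaces) { $line.Trim() } else { $line }

        # Optionally skip empty and whitespace-only lines.
        if ($SkipBlank -and $value.Trim().Length -eq 0) { continue }

        $value
    }

    $values |
        Group-Object -CaseSensitive:(-not $IgnoreCase) |
        Where-Object Count -gt 1 |
        Sort-Object Count -Descending |
        Select-Object Count, Name
}

$sample = @('PC', 'pc ', '', 'Pc', 'laptop')

# Strict rules: every line is different, so nothing is reported.
Get-DuplicateLine -Lines $sample

# Relaxed rules: PC, pc and Pc become one group with a count of 3.
Get-DuplicateLine -Lines $sample -IgnoreCase -TrimSpaces -SkipBlank
```

For a file, the lines could be supplied with -Lines (Get-Content -LiteralPath ".\input.txt" -Encoding UTF8), which loads the whole file into memory first.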

How to Check Duplicate Lines with Command Prompt and the SORT Command

The built-in Windows sort.exe command can arrange identical lines next to each other. This makes duplicates easier to see manually, but SORT does not automatically identify, count, or remove repeated lines.

  1. Open the folder containing the text file.
  2. Click the File Explorer address bar, type cmd, and press Enter.
  3. Run the following command:
sort "input.txt" /o "sorted.txt"

Open sorted.txt in Notepad. Repeated values will appear next to each other, making a visual check possible.
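
The visual check relies on equal lines being neighbours, and that scan can also be automated. The sketch below is not something sort.exe does itself: the helper name Get-AdjacentDuplicate is hypothetical, the input is an invented array standing in for the contents of sorted.txt, and lines are compared ordinally.

```powershell
# Hypothetical helper: reports runs of identical neighbouring lines in already sorted text.
function Get-AdjacentDuplicate {
    param([string[]]$SortedLines)

    $previous = $null
    $runLength = 0

    foreach ($line in $SortedLines) {
        # Same value as the line before: extend the current run.
        if ($null -ne $previous -and [string]::Equals($line, $previous, [System.StringComparison]::Ordinal)) {
            $runLength++
            continue
        }

        # A new value starts here, so report the finished run if it repeated.
        if ($runLength -gt 1) {
            [PSCustomObject]@{ Count = $runLength; Line = $previous }
        }

        $previous = $line
        $runLength = 1
    }

    # Report the final run as well.
    if ($runLength -gt 1) {
        [PSCustomObject]@{ Count = $runLength; Line = $previous }
    }
}

# Lines in the order a sorted file might contain them.
Get-AdjacentDuplicate -SortedLines @('alpha', 'beta', 'beta', 'beta', 'delta', 'gamma', 'gamma')
```

For this input the function reports beta with a count of 3 and gamma with a count of 2. It only needs to remember the previous line, which is the one advantage of sorting first.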

Advantages

  • Uses a built-in Windows command.
  • Simple for a small file.
  • Does not require a script.
  • Preserves the original file when a separate output path is used.

Limitations

  • Does not print a duplicate report.
  • Does not show occurrence counts.
  • Manual inspection is impractical for long files.
  • Sorting can be affected by locale and character encoding.

For an automatic check from Command Prompt, call Windows PowerShell from CMD. This example also specifies UTF-8 explicitly so Cyrillic and accented characters are not decoded as ANSI in Windows PowerShell 5.1:

powershell.exe -NoProfile -Command "Get-Content -LiteralPath '.\input.txt' -Encoding UTF8 | Group-Object -CaseSensitive | Where-Object Count -gt 1 | Select-Object Count, Name"

How to Check Large Text Files for Duplicate Lines in Windows

The BAT script in Method 1 is more efficient than the short Group-Object examples because it does not read the complete file into an array and does not alphabetically sort every source line. It reads sequentially, writes first occurrences directly to the cleaned output, and stores only the unique keys and their counters in memory.

The direct PowerShell commands remain convenient, but Group-Object can be noticeably slower and may consume substantial RAM on a large file.
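
A middle ground between the two is to count lines in a dictionary while streaming the file from the console. This is a minimal sketch of that approach, assuming a UTF-8 file; it uses a scratch file with invented content so that it can be run as written.

```powershell
# Scratch file standing in for a large input.
$path = Join-Path ([System.IO.Path]::GetTempPath()) ([System.IO.Path]::GetRandomFileName())
[System.IO.File]::WriteAllLines($path, [string[]]@('a', 'b', 'a', 'c', 'a', 'b'))

$counts = New-Object 'System.Collections.Generic.Dictionary[string,int]' ([System.StringComparer]::Ordinal)

# ReadLines yields one line at a time instead of building an array of all lines.
foreach ($line in [System.IO.File]::ReadLines($path)) {
    $current = 0
    [void]$counts.TryGetValue($line, [ref]$current)   # leaves 0 for a new line
    $counts[$line] = $current + 1
}

# Show only the lines that occurred more than once.
$counts.GetEnumerator() |
    Where-Object { $_.Value -gt 1 } |
    Sort-Object Value -Descending |
    ForEach-Object { '{0} x {1}' -f $_.Value, $_.Key }

Remove-Item -LiteralPath $path
```

Memory use still grows with the number of distinct lines, but not with the total number of lines. When adapting this to a real file, pass a full path, because .NET file methods resolve relative paths against the process working directory rather than the current PowerShell location.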

Practical tip for large files: The BAT script does not perform a preliminary alphabetical sort of the complete input. It processes the file in one pass and sorts only the final duplicate groups for the report. However, it still keeps one dictionary entry per unique line, so a multi-gigabyte file containing millions of different lines may require substantial RAM.

How to Fix Incorrect Characters When Checking Duplicate Lines

If PowerShell displays Cyrillic, accented, or other non-English text as sequences such as РіС..., the source file is usually being decoded with the wrong character encoding. The text itself may still be intact; the problem is how Get-Content interprets the file bytes.

Windows PowerShell 5.1 behavior: Windows PowerShell 5.1 can interpret a UTF-8 file without a byte order mark as the current Windows ANSI code page when no encoding is specified. Therefore, the main commands in this guide explicitly use -Encoding UTF8.

Read a UTF-8 File Correctly

Get-Content -LiteralPath ".\input.txt" -Encoding UTF8 |
Group-Object -CaseSensitive |
Where-Object Count -gt 1 |
Sort-Object Count -Descending |
Select-Object Count, Name

This works for UTF-8 text with or without a byte order mark in Windows PowerShell 5.1. In PowerShell 6 and later, UTF-8 without BOM is the default encoding for Get-Content, but specifying it explicitly makes the command unambiguous.
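
The effect of a wrong decoder can be reproduced in memory without any file. In the sketch below, ISO-8859-1 (code page 28591) stands in for a single-byte ANSI code page; this is an assumption made for portability, and the actual ANSI code page on a given Windows system will produce different garbled characters.

```powershell
# Build the sample text from code points so the example does not depend on this script's own encoding.
$text = 'caf' + [char]0x00E9   # "cafe" with an acute accent on the last letter

# A UTF-8 file stores the accented letter as the two bytes C3 A9.
$bytes = [System.Text.Encoding]::UTF8.GetBytes($text)
($bytes | ForEach-Object { $_.ToString('X2') }) -join ' '   # 63 61 66 C3 A9

# Decoding with the matching encoding restores the text.
$correct = [System.Text.Encoding]::UTF8.GetString($bytes)

# Decoding as a single-byte code page turns each byte into its own character.
$latin1 = [System.Text.Encoding]::GetEncoding(28591)
$garbled = $latin1.GetString($bytes)

$correct.Length   # 4
$garbled.Length   # 5
```

The bytes are identical in both cases; only the interpretation differs. This is also why two lines that look different on screen because of a wrong encoding may still be grouped correctly once the right -Encoding value is used.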

Choose the Parameter That Matches the File

| Source file encoding | Get-Content parameter | When to use it |
| --- | --- | --- |
| UTF-8 | -Encoding UTF8 | Recommended for modern TXT files, including UTF-8 without BOM. |
| Windows ANSI code page | -Encoding Default | Use for older files saved in the current Windows system code page. This meaning applies to Windows PowerShell 5.1; in PowerShell 6 and later, Default means UTF-8 without BOM. |
| UTF-16 little-endian | -Encoding Unicode | Use for UTF-16 LE files, often identified by an FF FE BOM. |
| UTF-16 big-endian | -Encoding BigEndianUnicode | Use for UTF-16 BE files, often identified by an FE FF BOM. |
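
When the encoding is unknown, the first bytes of the file can suggest which parameter value to try. The following sketch maps a byte order mark to a Get-Content -Encoding value; the helper name Get-EncodingHint is hypothetical, it does not cover UTF-32 big-endian, and a file without a BOM cannot be identified this way.

```powershell
# Hypothetical helper: suggests a Get-Content -Encoding value from the first bytes of a file.
function Get-EncodingHint {
    param([byte[]]$FirstBytes)

    # Render the bytes as one uppercase hex string, for example EFBBBF61.
    $hex = ($FirstBytes | ForEach-Object { $_.ToString('X2') }) -join ''

    if ($hex.StartsWith('EFBBBF'))   { return 'UTF8' }
    # UTF-32 LE begins with FF FE 00 00, so test it before the shorter UTF-16 LE mark.
    if ($hex.StartsWith('FFFE0000')) { return 'UTF32' }
    if ($hex.StartsWith('FFFE'))     { return 'Unicode' }
    if ($hex.StartsWith('FEFF'))     { return 'BigEndianUnicode' }

    # No BOM: the first bytes alone do not identify the encoding.
    return 'No BOM'
}

Get-EncodingHint -FirstBytes ([byte[]](0xEF, 0xBB, 0xBF, 0x61))   # UTF8
Get-EncodingHint -FirstBytes ([byte[]](0xFF, 0xFE, 0x61, 0x00))   # Unicode
Get-EncodingHint -FirstBytes ([byte[]](0xFE, 0xFF, 0x00, 0x61))   # BigEndianUnicode
Get-EncodingHint -FirstBytes ([byte[]](0x61, 0x62, 0x63, 0x64))   # No BOM
```

For the "No BOM" case, the usual candidates are UTF-8 without BOM and the Windows ANSI code page, which is the same ambiguity the BAT script resolves by trying strict UTF-8 first.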

Read a Legacy ANSI File

Get-Content -LiteralPath ".\input.txt" -Encoding Default |
Group-Object -CaseSensitive |
Where-Object Count -gt 1 |
Select-Object Count, Name

Read a UTF-16 Little-Endian File

Get-Content -LiteralPath ".\input.txt" -Encoding Unicode |
Group-Object -CaseSensitive |
Where-Object Count -gt 1 |
Select-Object Count, Name

Why chcp 65001 is not enough: Running chcp 65001 changes the console code page, but it does not tell Get-Content how to decode the source file. Specify -Encoding UTF8 on Get-Content instead.

The reusable find_duplicate_lines.bat script included in this guide recognizes UTF-8 with or without BOM, supports BOM-marked UTF-16 and UTF-32, and falls back to the Windows ANSI code page when a BOM-less byte sequence is not valid UTF-8. It writes both the duplicate report and the cleaned copy as UTF-8 with BOM.

FAQ: Finding Duplicate Lines in TXT Files on Windows

Q: Does Notepad have a built-in duplicate-line finder?
No. Windows Notepad can search for text, but it does not group lines or report repeated values. Use the BAT script in Method 1 to create both a report and a cleaned file, or use PowerShell for a console-only check.

Q: Does the PowerShell command modify the original file?
No. Commands based on Get-Content, Group-Object, and Select-Object only read and analyze the file unless you explicitly add a writing command such as Set-Content or Export-Csv.

Q: How do I count all lines in the file?
Run (Get-Content -LiteralPath ".\input.txt" -Encoding UTF8).Count. For a very large file, this still reads the file through PowerShell and may take time.

Q: How do I create a file with all unique lines?
Use find_duplicate_lines.bat. It automatically creates *_without_duplicates.txt and leaves the source file unchanged. The cleaned output keeps the first occurrence of each exact line.

Q: How do I keep the original order while removing duplicate lines?
The BAT script in Method 1 already does this. It uses a case-sensitive hash set and writes only the first occurrence of each line, so the cleaned file preserves the source order.

Q: Why does the report treat differently cased lines as duplicates?
Group-Object is case-insensitive by default. Add -CaseSensitive when uppercase and lowercase characters must be compared exactly.

Q: Why does PowerShell display Cyrillic text as strange characters?
Windows PowerShell 5.1 may read a UTF-8 file without BOM as the system ANSI code page. Add -Encoding UTF8 to the Get-Content command. Changing the console with chcp 65001 alone does not correct how Get-Content decodes the file.

Q: Does the BAT script change or remove lines from the source file?
No. find_duplicate_lines.bat reads the selected file and creates *_duplicates.txt plus *_without_duplicates.txt. The original text file is never overwritten.

Best Windows Method for Finding and Removing Duplicate Text Lines

Recommended Approach

For repeated checks, use find_duplicate_lines.bat. It accepts a file by drag and drop, performs an exact case-sensitive comparison, creates an alphabetically sorted duplicate report, and writes a second UTF-8 file without duplicate occurrences while preserving the original order.

Use PowerShell with Get-Content -Encoding UTF8 and Group-Object when you only need an interactive report or want to customize comparison rules. The Windows sort.exe command is suitable only for manual inspection of small files.