Get distinct words from a given file in Java

Carvia Tech | February 13, 2020 | 2 min read


We will extract distinct words from a given file using Java.

Concepts

  • A Set does not allow duplicate elements, so adding every word to a Set filters out the duplicates.

  • Each line of the file has to be split into words. This can be done with a regex (String.split), or with Java's StringTokenizer class, which splits on a set of delimiter characters. The solution below uses StringTokenizer.

  • The input file must be closed to avoid leaking file handles. A try-with-resources statement closes the underlying input stream automatically when the block exits, whether normally or with an exception.
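The first two concepts can be tried on a plain string before any file handling is involved. The following is a minimal sketch, not part of the original solution: the class name DistinctWordsRegexSketch and the sample sentence are made up for illustration, and it uses the regex variant (String.split) with the same delimiter characters as the solution further down. A TreeSet is chosen here only so that the printed order is predictable.

```java
import java.util.Set;
import java.util.TreeSet;

public class DistinctWordsRegexSketch {

    // Splits on runs of the delimiter characters, lower-cases each token,
    // and relies on the Set to drop duplicates.
    static Set<String> distinctWords(String text) {
        Set<String> words = new TreeSet<>(); // sorted, so output order is stable
        for (String token : text.split("[ ,.;:\"]+")) {
            if (!token.isEmpty()) { // a leading delimiter yields one empty token
                words.add(token.toLowerCase());
            }
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(distinctWords("The quick fox. The lazy dog, the end"));
        // prints [dog, end, fox, lazy, quick, the]
    }
}
```

"The" appears three times in the input with different casing, but only once in the result, because every token is lower-cased before it is added to the Set.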

Java 11 code solution

We will use Java 11 to implement the solution for the given coding problem.

DistinctWords.java
import java.io.*;
import java.util.HashSet;
import java.util.Set;
import java.util.StringTokenizer;
import java.util.logging.Level;
import java.util.logging.Logger;

public class DistinctWords {

    private static final Logger LOGGER = Logger.getLogger("DistinctWords");

    public Set<String> getDistinctWords(String fileName) {
        Set<String> wordSet = new HashSet<>();
        try (BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(fileName)))) {
            String line;
            while ((line = br.readLine()) != null) {
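                // Every character in the second argument is treated as a separate delimiter
                StringTokenizer st = new StringTokenizer(line, " ,.;:\"");
                while (st.hasMoreTokens()) {
                    // Lower-case each word so that "The" and "the" count as the same word
                    wordSet.add(st.nextToken().toLowerCase());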
                }
            }
        } catch (IOException e) {
            LOGGER.log(Level.SEVERE, "IOException occurred", e);
        }
        return wordSet;
    }

    public static void main(String[] args) {
        DistinctWords distinctFileWords = new DistinctWords();
        Set<String> wordList = distinctFileWords.getDistinctWords("<path-to-file>");
        for (String str : wordList) {
            System.out.println(str);
        }
    }

}
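As an alternative, the same result can be produced with the Stream API. This is a sketch under a few assumptions rather than a replacement for the solution above: the class name DistinctWordsStreamSketch and the temporary sample file are hypothetical, the file is assumed to be UTF-8 (the default for Files.lines), and a regex split stands in for StringTokenizer.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class DistinctWordsStreamSketch {

    // Pure transformation: lines in, distinct lower-cased words out.
    static Set<String> distinctWords(Stream<String> lines) {
        return lines
                .flatMap(line -> Arrays.stream(line.split("[ ,.;:\"]+")))
                .filter(token -> !token.isEmpty())
                .map(String::toLowerCase)
                .collect(Collectors.toSet());
    }

    // Files.lines keeps the file open until the stream is closed,
    // so the stream is managed by try-with-resources.
    static Set<String> distinctWords(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file)) {
            return distinctWords(lines);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample input written to a temporary file
        Path sample = Files.createTempFile("distinct-words", ".txt");
        Files.writeString(sample, "Hello world, hello Java.\nJava: \"world\"\n");
        // Contains hello, world and java; the iteration order of the Set is unspecified
        System.out.println(distinctWords(sample));
        Files.delete(sample);
    }
}
```

Keeping the word extraction separate from the file handling makes it possible to exercise the logic with an in-memory Stream, without touching the file system.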

Kotlin implementation

The same logic is more concise in Kotlin.

DistinctWords.kt
import java.io.File
import java.util.*

class DistinctWords {
    fun getDistinctWords(fileName: String): Set<String> {
        val wordSet: MutableSet<String> = HashSet()
        File(fileName).forEachLine { line ->
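            // Pass each delimiter as a Char; a single String argument would be matched as one literal delimiter
            line.split(' ', ',', '.', ';', ':', '"')
                .filter { it.isNotEmpty() } // adjacent delimiters produce empty strings
                .forEach { word -> wordSet.add(word.lowercase()) }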
        }
        return wordSet
    }
}

fun main() {
    val distinctFileWords = DistinctWords()
    val wordList = distinctFileWords.getDistinctWords("<path-to-file>")
    wordList.forEach { str ->
        println(str)
    }
}

That’s all.

