Find Words Frequency in a Text File using Java

This Java program is designed to analyze text files and count the occurrences of words, showcasing key concepts for novice programmers. It utilizes Java’s built-in classes, such as BufferedReader for reading input and PrintWriter for writing output. The program employs a TreeMap data structure to efficiently organize words alphabetically, and a custom class WordData to store information about each word, including its frequency count. The use of generics, demonstrated through TreeMap<String, WordData> and Comparator, ensures type safety and flexibility.
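The alphabetical ordering described above comes from the TreeMap itself, which keeps its keys sorted. The following is a minimal sketch of that idea, not the program's own code: the class name TreeMapCountSketch and the method countWords are hypothetical, and a plain Integer count stands in for the WordData class to keep the example short.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo class; it is not part of the WordCount program.
class TreeMapCountSketch {

    // Counts whitespace-separated words; the TreeMap keeps its keys in sorted order.
    static TreeMap<String, Integer> countWords(String text) {
        TreeMap<String, Integer> counts = new TreeMap<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                // Store 1 for a new word, or add 1 to the existing count.
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Iterating over a TreeMap visits the keys alphabetically.
        for (Map.Entry<String, Integer> e : countWords("the cat saw the other cat").entrySet()) {
            System.out.println(e.getKey() + " (" + e.getValue() + ")");
        }
    }
}
```

For the sample sentence, the entries come out in the order cat, other, saw, the, without any explicit sorting step.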

The program demonstrates sorting techniques by sorting a list of words based on their frequency count. Novice programmers can learn about basic file handling, data structures, generics, and sorting algorithms through this program, gaining foundational knowledge applicable to a wide range of Java applications.
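To illustrate sorting by frequency in isolation, here is a small sketch that orders map entries by descending count. It is an assumption-laden simplification rather than the program's approach: the class FrequencySortSketch and the method byFrequency are hypothetical names, and it sorts Map.Entry objects with a library comparator instead of using a custom Comparator class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo class; the names here are illustrative only.
class FrequencySortSketch {

    // Returns the entries of the map ordered from highest to lowest count.
    static List<Map.Entry<String, Integer>> byFrequency(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // List.sort is stable, so entries with equal counts keep the map's iteration order.
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        return entries;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new TreeMap<>();
        counts.put("the", 2);
        counts.put("cat", 2);
        counts.put("other", 1);
        for (Map.Entry<String, Integer> e : byFrequency(counts)) {
            System.out.println(e.getKey() + " (" + e.getValue() + ")");
        }
    }
}
```

Because the sort is stable and the source map is a TreeMap, words with the same count stay in alphabetical order.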

The main functionalities of the program include:

  1. Input Reading: The program reads words from an input file using the BufferedReader class. It processes each word and ignores non-letter characters.
  2. Data Representation: The program tracks word occurrences in a TreeMap<String, WordData>, where WordData is a custom class that stores each word together with its frequency count.
  3. Output Generation: Two distinct outputs are created. The first output displays words in alphabetical order, and the second output presents words ordered by the number of occurrences. This is achieved using the PrintWriter class.
  4. Generic Programming Features: The program uses Java’s generic collections: a TreeMap<String, WordData> for the word table and a List<WordData> that is sorted with a custom comparator. The comparator class, CountCompare, implements Comparator<WordData>, so the compiler checks the element types.
  5. Error Handling: The program reports failures to open, read, or write the files and exits with a non-zero status. The PrintWriter is closed explicitly after all output has been written, and checkError() is then called because PrintWriter records write failures instead of throwing an IOException.
  6. Modern File Handling: The outdated TextReader class has been replaced with BufferedReader, and the program utilizes the try-with-resources statement for file handling to ensure proper resource closure.
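The file-handling points in the list above can be sketched on their own. The example below is a hedged illustration, not an excerpt from the program: the class TryWithResourcesSketch and the method copyNonBlankLines are hypothetical, and the task (copying non-blank lines) is chosen only to show a BufferedReader and a PrintWriter managed by one try-with-resources statement.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

// Hypothetical demo class; it is not part of the WordCount program.
class TryWithResourcesSketch {

    // Copies the non-blank lines of one file to another and returns how many were written.
    static int copyNonBlankLines(String inputName, String outputName) throws IOException {
        int written = 0;
        // Both resources are closed automatically, in reverse order, even if an exception occurs.
        try (BufferedReader reader = new BufferedReader(new FileReader(inputName));
             PrintWriter writer = new PrintWriter(new FileWriter(outputName))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.trim().isEmpty()) {
                    writer.println(line);
                    written++;
                }
            }
            // PrintWriter does not throw on write errors; checkError() flushes and reports them.
            if (writer.checkError()) {
                throw new IOException("Error while writing " + outputName);
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("Usage: java TryWithResourcesSketch <input> <output>");
            return;
        }
        System.out.println(copyNonBlankLines(args[0], args[1]) + " lines copied.");
    }
}
```

Declaring both streams in the try header removes the need for explicit close() calls and for static fields that hold the open streams.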

Find Words Frequency in a Text File Java Program

import java.io.*;
import java.util.*;

public class WordCount {

    static BufferedReader in; // Use BufferedReader for reading input file.
    static PrintWriter out; // Output stream for writing the output file.

    static class WordData {
        String word;
        int count;

        WordData(String w) {
            word = w;
            count = 1;
        }
    }

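    // Orders WordData objects by descending count, so the most frequent word comes first.
    static class CountCompare implements Comparator<WordData> {
        @Override
        public int compare(WordData data1, WordData data2) {
            return Integer.compare(data2.count, data1.count);
        }
    }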

    // Writes one line per entry in the form " word (count)".
    static void printWords(PrintWriter outStream, Collection<WordData> wordData) {
        Iterator<WordData> iter = wordData.iterator();
        while (iter.hasNext()) {
            WordData data = iter.next();
            outStream.println(" " + data.word + " (" + data.count + ")");
        }
    }

    // Reads every line of the input, lower-cases each word, and updates its count in the map.
    static void readWords(BufferedReader inStream, Map<String, WordData> words) {
        try (BufferedReader reader = inStream) { // try-with-resources closes the input file
            String line;
            while ((line = reader.readLine()) != null) {
                String[] wordsInLine = line.split("[^\\p{L}]+"); // Split on runs of non-letter characters
                for (String word : wordsInLine) {
                    if (!word.isEmpty()) { // split() can yield an empty leading token
                        word = word.toLowerCase();
                        WordData data = words.get(word);

                        if (data == null) {
                            words.put(word, new WordData(word));
                        } else {
                            data.count = data.count + 1;
                        }
                    }
                }
            }
        } catch (IOException e) {
            System.out.println("An error occurred while reading the data.");
            System.out.println(e.toString());
            System.exit(1);
        }
    }
    
    static void openFiles(String[] args) {
        if (args.length != 2) {
            System.out.println("Error: Please specify file names on the command line.");
            System.exit(1);
        }

        try {
            in = new BufferedReader(new FileReader(args[0]));
        } catch (IOException e) {
            System.out.println("Error: Can't open input file " + args[0]);
            System.exit(1);
        }

        try {
            out = new PrintWriter(new FileWriter(args[1]));
        } catch (IOException e) {
            System.out.println("Error: Can't open output file " + args[1]);
            System.exit(1);
        }
    }

    public static void main(String[] args) {
        openFiles(args);

        TreeMap<String, WordData> words = new TreeMap<>();
        readWords(in, words);

        List<WordData> wordsByCount = new ArrayList<>(words.values());
        wordsByCount.sort(new CountCompare());

        out.println("Words found in the file named \"" + args[0] + "\".\n");
        out.println("The number of times that the word occurred in the");
        out.println("file is given in parentheses after the word.\n\n");
        out.println("The words from the file in alphabetical order:\n");

        printWords(out, words.values());

        out.println("\n\nThe words in order of frequency:\n");

        printWords(out, wordsByCount);

        out.close(); // Explicitly close the PrintWriter.

        if (out.checkError()) {
            System.out.println("An error occurred while writing the data.");
            System.out.println("Output file might be missing or incomplete.");
            System.exit(1);
        }

        System.out.println(words.size() + " distinct words were found.");
    }
}

M. Saqib: Saqib is a Master-level Senior Software Engineer with over 14 years of experience in designing and developing large-scale software and web applications. He has more than eight years of experience leading software development teams. Saqib provides consultancy to develop software systems and web services for Fortune 500 companies. He has hands-on experience in C/C++, Java, JavaScript, PHP and .NET technologies. Saqib has owned mycplus.com and written its content since 2004.