January 27, 2015

Anagram Sort

An interesting way to sort strings.

It's interesting to examine different methods of sorting, besides strictly ascending and descending order. One such sorting algorithm is Anagram Sort, which is used to sort strings. It simply takes a list of strings outputs the list with all anagrams next to each other. So how can we do something like this efficiently?

The core of the problem is determining which strings are anagrams of each other. To do this, we can sort the characters in each string and then compare them in some way. An efficient method for storing all of the strings associated with one sorted string of characters is a hash map.

For each string, we will sort it and check if the sorted value is already a key in the map. If not, we will add it as a key associated with a list containing the unsorted string. For each anagram we encounter, we will find that the sorted string is already a key in the map, and we can simply add the unsorted string to the list associated with that key. Once we have completed the process for every string in our list, we can simply iterate over the keys in the hash map and output the strings in each list. They will be anagram sorted.

The time complexity of the entire algorithm is $O(k⋅M(n))$ , where k is the numberof strings in the list and $M(n)$ is the time complexity of sorting each string. Using a good comparison sort, this will be $O(n⋅\log(n))$ .

The code (in Java) is below:

public static List<String> anagramSort(List<String> list) {

    Map<String, List<String>> map = new HashMap<String, List<String>>();
    for (String word : list) {

        // Sort the characters in the string.
        char [] charSeq = word.toCharArray();
        Arrays.sort(charSeq);
        String sortedWord = new String(charSeq);

        // If the sorted word is already a key in the map, get its value
        // list and add the unsorted word to it. Otherwise, create a new
        // list with the unsorted word as the first value, and put the list
        // in the map with the sorted word as the key.
        if (map.containsKey(sortedWord)) {
            map.get(sortedWord).add(word);
        } else {
            List<String> keyValues = new ArrayList<String>();
            keyValues.add(word);
            map.put(sortedWord, keyValues);
        }
    }

    List<String> sortedList = new ArrayList<String>();

    // Add the values from each key to the anagram-sorted list. The values
    // in each key are anagrams of each other.
    for (List anagrams : map.values()) {
        for (String word : anagrams)
            sortedList.add(word);
    }

    return sortedList;
}

Another option that I have not written in code would be to create a histogram for each string, and then use the histograms as the keys in the hash map. The tradeoff here is that constructing a histogram for a string should only be a $O(n)$ operation (as opposed to $O(n⋅\log(n))$ to sort a string), where $n$ is the number of characters in the string. However, it could require more memory, depending on the size of the string. Each histogram could simply be an integer array of size 26 (assuming only letters are used), where each element represents the number of occurences of the corresponding letter in the string (Element 0 is A, Element 1 is B, etc.).