12 June 2009

The best match problem

The best match problem is a very common problem that a programmer regularly encounters when working on audit/feedback programs. So I was quite happy to see the fourth problem on TopCoder I went through was one concerning best matches.

The problem however in this kind of exercise is that the quality of code is not the only important part: speed is important too, and thus sometimes you just simply tweak a bit around to gain some precious time. But it's interesting to see how you code in this kind of situation and this is a nice example of that. Let's see what it's all about.

A new online match making company needs some software to help find the "perfect couples".  People who sign up answer a series of multiple-choice questions. Then, when a member makes a "Get Best Mates" request, the software returns a list of users whose gender matches the requested gender and whose answers to the questions were equal to or greater than a similarity factor when compared to the user's answers.

So here is an example of what you can get and what you should return:

members =
{"BETTY F M A A C C",
"TOM M F A D C A",
"SUE F M D D D D",
"ELLEN F M A A C A",
"JOE M F A A C A",
"ED M F A D D A",
"SALLY F M C D A B",
"MARGE F M A A C C"}

currentUser="BETTY"

sf=2

The first element of each member is their name, followed by their gender ("M" or "F") and followed by the gender they are interested in ("M" or "F" again). Finally comes the answers of the member to a maximum of 10 questions ("A", ..., "D").

The currentUser is the user asking for its best matches among all the members, and the sf parameter is explained below.

The method returns a String[] consisting of members' names who have at least sf identical answers to currentUser and are of the requested gender.  The names should be returned in order from most identical answers to least.  If two members have the same number of identical answers as the currentUser, the names should be returned in the same relative order they were inputted.

So how would you code this? In Java, the first thought is to create a class named Member containing the name, the gender, the requested gender and the answers of each member so that it becomes handy to filter and order them out.

 private class Member {
  public String name;
  
  public String gender;
  
  public String requestedGender;
  
  public String[] answers;
  
  public int matchingScore;
  
  protected Member(String str) {
   String[] split = str.split(" ");
   this.name = split[0];
   this.gender = split[1];
   this.requestedGender = split[2];
   
   if (str.length() > 3) {
    answers = new String[split.length-3];
    for (int i=3; i<split.length; i++) {
     answers[i-3] = split[i];
    }
   } else {
    answers = new String[0];
   }
  }
 }

This class automatically fields its fields with the given string that is one given by the members parameter. Now, why would not the class sort itself out so you would only have to add the member to a Set and return it? Hum... the idea is nice but two problems arises:

  • How can you know the place of a member compared to another one if you do not have their matching score with the currentUser?
  • In the case two members have the same matching score, how do you know which one comes first (= which one comes first in the given members list)?

The answer I chose to this problem was to add two fields to the Member class: id and matchingScore. The tweak is then to calculate these values for each member before adding them to the returned Set.

The Member class thus becomes:

 private class Member implements Comparable<Member> {
  public int id;
  
  public String name;
  
  public String gender;
  
  public String requestedGender;
  
  public String[] answers;
  
  public int matchingScore;
  
  protected Member(int id, String str) {
   this.id = id;
   
   String[] split = str.split(" ");
   this.name = split[0];
   this.gender = split[1];
   this.requestedGender = split[2];
   
    if (str.length() > 3) {
    answers = new String[split.length-3];
    for (int i=3; i<split.length; i++) {
     answers[i-3] = split[i];
    }
   } else {
    answers = new String[0];
   }
  }
  
  public int compareTo(Member m) {
   if (this.matchingScore > m.matchingScore) {
    return -1;
   } else if (this.matchingScore < m.matchingScore) {
    return 1;
   } else {
    return new Integer(this.id).compareTo(new Integer(m.id));
   }
  }

And finally, the main method that processes the data (I will just copy/paste the whole class):

import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class MatchMaker {
 public String[] getBestMatches(String[] members, String currentUser, int sf) {
  List<Member> list = new ArrayList<Member>();
  for (int i=0; i<members.length; i++) {
   list.add(new Member(i, members[i]));
  }
  
  Member currentMember = null;
  for (Member m : list) {
   if (m.name.equals(currentUser)) {
    currentMember = m;
    break;
   }
  }
  
  Set<Member> bestMatches = new TreeSet<Member>();
  for (Member m : list) {
   if (!m.name.equals(currentUser)) {
    if (m.gender.equals(currentMember.requestedGender)) {
     int identicalAnswers = 0;
     for (int i=0; i<currentMember.answers.length && i<m.answers.length; i++) {
      if (currentMember.answers[i].equals(m.answers[i])) {
       identicalAnswers++;
      }
     }
     
     if (identicalAnswers >= sf) {
      m.matchingScore = identicalAnswers;
      bestMatches.add(m);
     }
    }
   }
  }
  
  Object[] objectMatches = bestMatches.toArray();
  String[] toArray = new String[objectMatches.length];
  for (int i=0; i<objectMatches.length; i++) {
   toArray[i] = ((Member) objectMatches[i]).name;
  }
  
  return toArray;
 }
 
 private class Member implements Comparable<Member> {
  public int id;
  
  public String name;
  
  public String gender;
  
  public String requestedGender;
  
  public String[] answers;
  
  public int matchingScore;
  
  protected Member(int id, String str) {
   this.id = id;
   
   String[] split = str.split(" ");
   this.name = split[0];
   this.gender = split[1];
   this.requestedGender = split[2];
   
   if (str.length() > 3) {
    answers = new String[split.length-3];
    for (int i=3; i<split.length; i++) {
     answers[i-3] = split[i];
    }
   } else {
    answers = new String[0];
   }
  }
  
  public int compareTo(Member m) {
   if (this.matchingScore > m.matchingScore) {
    return -1;
   } else if (this.matchingScore < m.matchingScore) {
    return 1;
   } else {
    return new Integer(this.id).compareTo(new Integer(m.id));
   }
  }
 }
}

The steps of getBestMatches() are:

  • create one Member object per member and associate an unique and increasing id for each,
  • identify the object corresponding to the currentUser,
  • for every member, if the member is a potential match (= is not the currentUser and is of the requested gender): count the number of matching answers and if the score is > sf, associate it to the member and add it to the Set to return,
  • return the Set.

There is a better way to do this. Writing an ID and the score in each Member isn't the nice way to do it, since each Member created this way would only be valid for a particular currentUser. However, I think this is one of the best ways to solve this problem in Java. We could have even returned the Set instead of an array so that the calling method would also get the scores of each member for the given currentUser.

11 June 2009

Enhancing my own skills

Ok so it's time to start again this blog with less personal messages and more professional ones, since anyway I'm not that sort of guy that talks about its life to everybody (even if this first sentence IS, in fact, exactly what I hate to do).

I've recently subscribed to the TopCoder website (you'll find a link to it on your right), following an advice from a Google recruiter that called me two or three days ago. I won't give the details on the call itself, but you may already know that you need very good and precise skills to work at Google; some of which you will need to resolve problems found on TopCoder too.

So I started to look at the problem you can find there, and resolved already a few ones. I wanted to post this message in order to give my answer to one of the first Algorithm problem, named "Inv 2001 R1", difficulty 250. On TopCoder, each problem has a difficulty score, the higher the score, the harder the problem. I did not see any problem having a score above 1000, but I surely will encounter a few (or a lot) of them while resolving each problem one after the other.

Now let's see what the problem is all about. Ho, and by the way, I'm coding in Java.

 
TopCoder has decided to automate the process of assigning problem difficulty levels to problems.  TopCoder developers have concluded that problem difficulty is related only to the Average Word Length of Words in the problem statement:

If the Average Word Length is less than or equal to 3,  the problem is a 250 point problem.
If the Average Word Length is equal to 4 or 5, the problem is a 500 point problem.
If the Average Word Length is greater than or equal to 6, the problem is a 1000 point problem.

The problem here comes from the fact that a word is determined by the fact that:

  • it contains only letters,
  • it may end with a single period,
  • the period does not count in the number of letters used to calculate the Average Word Length (AWL).

It is quite easy, in Java, to do this problem, since we already have all the tools needed to split the given string, and check if an element is a word. Here is the code:

 
public class HowEasy {
 public int pointVal(String problemStatement) {
  if (problemStatement == null) {
   return 0;
  }
  
  String[] words = problemStatement.split(" ");
  long wordCount = 0;
  long lettersCount = 0;
  for (String word : words) {
   if (word.matches("[a-zA-Z]+.?")) {
    lettersCount += word.length();
    if (word.endsWith(".")) {lettersCount--;}
    wordCount++;
   }
  }
  
  if (wordCount > 0) {
   long avgWordLength = lettersCount/wordCount;
   
   if (avgWordLength < 4) {
    return 250;
   } else if (avgWordLength < 6) {
    return 500;
   } else {
    return 1000;
   }
  }
  
  return 250;
 }
} 

At first, I made a Map containing the values 250, 500, 1000 associated to the corresponding min and max AWL that were identifying the scores, but it was a bit too much for this simple code so I left them "as-is" in the code.

Now, as I said, using Java simplifies a lot the code. Here is how the given string is splitted:

String[] words = problemStatement.split(" ");

...and here is how each element is checked:

if (word.matches("[a-zA-Z]+.?")) {
   ...
}

The rest of the code is quite simple and does not need any explanation.