I thought I’d have a crack at learning Ruby. Strictly speaking this isn’t my first encounter, and so my title isn’t quite true, since a few years ago I did work through the “Rolling with Ruby on Rails” tutorial on the O’Reilly site. But as I was just copying and pasting code from a website, that doesn’t really count. (My opinion of Rails at the time was that barring some quibbles about the JSP-like HTML generation it seemed to be very good for the sort of thing it was aimed at, but that I was very happy not to be doing that sort of thing for a living.)
I’ve decided not to learn the language by buying a book or working through tutorials or suchlike. Instead, the plan is just to re-write tiny programs that I find interesting for any reason, relying on nothing much but whatever online reference documentation I can find together with my acquaintance with Perl and Python. This isn’t likely to be the best way of learning, but I want to see how far I can get with it.
The other day I attended a Stack Overflow DevDay where Michael Foord’s Python presentation walked through Peter Norvig’s 21-line spelling corrector. The basic idea is simple enough (once you know what it is) so that seemed like a good place to start. Here’s what I came up with:
#!/usr/bin/ruby

def initcounts(filename)
  @counts = {}
  f = File.new(filename, 'r')
  f.each_line do |line|
    line.downcase().split(/[^a-z]+/).each do |word|
      @counts[word] = 1 + (@counts[word] || 0)
    end
  end
  f.close()
end

def correct(word)
  candidates = []
  if @counts[word] then
    candidates.push(word)
  end

  if candidates.length == 0 then
    fix1(word) do |fixed|
      if @counts[fixed] then
        candidates.push(fixed)
      end
    end
  end

  if candidates.length == 0 then
    fix2(word) do |fixed|
      if @counts[fixed] then
        candidates.push(fixed)
      end
    end
  end

  candidates.sort! {|x,y| @counts[y] <=> @counts[x]}
  return (candidates[0] || word)
end

def fix2(word)
  fix1(word) do |fixed|
    fix1(fixed) do |refixed|
      yield refixed
    end
  end
end

@letters = 'abcdefghijklmnopqrstuvwxyz'.split(//)
def fix1(word)
  pairs = (0 .. word.length).collect do |i|
    [word[0,i], word[i,word.length]]
  end

  # Correct additions
  pairs.each do |a,b|
    if b.length > 0
      yield a + b.sub(/^./, '')
    end
  end

  # Correct deletions
  pairs.each do |a,b|
    @letters.each do |letter|
      yield a + letter + b
    end
  end

  # Correct substitutions
  pairs.each do |a,b|
    if b.length > 0
      @letters.each do |letter|
        yield a + letter + b.sub(/^./, '')
      end
    end
  end

  # Correct transpositions
  pairs.each do |a,b|
    if b.length >= 2
      yield a + b.sub(/^(.)(.)/, '\2\1')
    end
  end
end

initcounts('words.txt')
ARGV.each do |word|
  puts word + ' => ' + correct(word)
end
Sample run:
$ ./speller.rb teh reaclcitranx obelix xyz yessir vuinty
teh => the
reaclcitranx => recalcitrant
obelix => belie
xyz => by
yessir => lesser
vuinty => vanity
(My corpus consisted of an ispell word list and some Jane Austen novels taken from Project Gutenberg.)
Clearly this isn’t very elegant. Glaring faults: I make a meal out of reading words from a file, and also I think I’m missing a trick regarding the streams of suggested words coming out of fix1 and fix2 which forces me to replicate the logic of filtering out bogus words.
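For what it's worth, here is a rough sketch of how those two faults might be tidied up. It is not a tested replacement for the script above. The names count_words and known are made up for illustration (known echoes the helper of that name in Norvig's Python version), and the two-line sample text stands in for a real corpus. The sketch assumes the counting can take any enumerable of lines, so File.foreach('words.txt') could be passed in place of the array.

```ruby
# Sketch only: count_words, known and the sample text are illustrative,
# not part of the original script.

# Build a word => frequency hash from any enumerable of lines.
# For a real corpus, File.foreach('words.txt') could be passed in.
def count_words(lines)
  counts = Hash.new(0)   # words not seen yet start from 0
  lines.each do |line|
    # scan yields each run of letters, so no empty strings turn up
    line.downcase.scan(/[a-z]+/) { |word| counts[word] += 1 }
  end
  counts
end

# Keep only the candidates that occur in the corpus, dropping duplicates.
def known(candidates, counts)
  candidates.select { |w| counts.key?(w) }.uniq
end

counts = count_words(["The cat sat on the mat", "the end"])
puts known(%w[teh the cat xyz the], counts).inspect
```

With a filter like this in one place, correct would only need to apply it in turn to the word itself, the one-edit words and the two-edit words, and stop at the first non-empty result, rather than repeating the membership test three times.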
It’ll do.
Not a good start: I completely failed to understand how to get substrings using square brackets. (Although it looks like the mistake I made was harmless, in this case.) Serves me right for skimming the documentation so lightly – I knew I didn’t get what was going on, which is why I used regexp substitution to implement transposition.
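For reference, a small sketch of the square-bracket forms, assuming Ruby 1.9 or later (where s[i] returns a one-character string). In s[start, length] the second argument is a length, not an end index, and an overlong length simply runs to the end of the string, which may be why passing word.length as the second argument did no damage. The helper name transpose_head is hypothetical.

```ruby
word = "spelling"

# s[start, length]: the second argument is a length, not an end index.
head = word[0, 3]             # first three characters
tail = word[3, word.length]   # overlong length is clamped to the end

# s[range]: both ends are indices, and -1 means the last character.
same_tail = word[3..-1]

# Swap the first two characters of b without a regexp.
# transpose_head is a hypothetical name; b must have at least two characters.
def transpose_head(a, b)
  a + b[1] + b[0] + b[2..-1]
end

puts head, tail, same_tail
puts transpose_head("sp", "leling")
```

The last line plays the part of the regexp substitution in fix1: given the split "sp" and "leling", it swaps the leading "l" and "e" back into place.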