My first Ruby program

01/11/2009 by rdn32

I thought I’d have a crack at learning Ruby. Strictly speaking this isn’t my first encounter, and so my title isn’t quite true, since a few years ago I did work through the “Rolling with Ruby on Rails” tutorial on the O’Reilly site. But as I was just copying and pasting code from a website, that doesn’t really count. (My opinion of Rails at the time was that barring some quibbles about the JSP-like HTML generation it seemed to be very good for the sort of thing it was aimed at, but that I was very happy not to be doing that sort of thing for a living.)

I’ve decided not to learn the language by buying a book or working through tutorials or suchlike. Instead, the plan is just to re-write tiny programs that I find interesting for any reason, relying on nothing much but whatever online reference documentation I can find together with my acquaintance with Perl and Python. This isn’t likely to be the best way of learning, but I want to see how far I can get with it.

The other day I attended a Stack Overflow DevDay where Michael Foord’s Python presentation walked through Peter Norvig’s 21-line spelling corrector. The basic idea is simple enough (once you know what it is) so that seemed like a good place to start. Here’s what I came up with:



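#!/usr/bin/ruby

# Build @counts, a hash mapping each lower-cased word in the file to its frequency.
def initcounts(filename)
    @counts = {}
    f = File.new(filename, 'r')
    f.each_line do |line|
        line.downcase().split(/[^a-z]+/).each do |word|
            # split leaves an empty first field when a line starts with a non-letter
            next if word.empty?
            @counts[word] = 1 + (@counts[word] || 0)
        end
    end
    f.close()
end
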
# Try the word itself, then one fix, then two fixes; return the most
# frequent known candidate, or the word unchanged if nothing is known.
def correct(word)
    candidates = []
    if @counts[word] then
        candidates.push(word)
    end
    if candidates.length == 0 then
        fix1(word) do |fixed|
            if @counts[fixed] then
                candidates.push(fixed)
            end
        end
    end
    if candidates.length == 0 then
        fix2(word) do |fixed|
            if @counts[fixed] then
                candidates.push(fixed)
            end
        end
    end
    candidates.sort! {|x,y| @counts[y] <=> @counts[x]}
    return (candidates[0] || word)
end

# Yield every string that is two single-character fixes away from word.
def fix2(word)
    fix1(word) do |fixed|
        fix1(fixed) do |refixed|
            yield refixed
        end
    end
end

@letters = 'abcdefghijklmnopqrstuvwxyz'.split(//)

# Yield every string that is one single-character fix away from word.
def fix1(word)
    # Every way of splitting word into a head and a tail
    pairs = (0 .. word.length).collect do |i|
        [word[0,i], word[i,word.length]]
    end

    # Correct additions
    pairs.each do |a,b|
        if b.length > 0
            yield a + b.sub(/^./, '')
        end
    end

    # Correct deletions
    pairs.each do |a,b|
        @letters.each do |letter|
            yield a + letter + b
        end
    end

    # Correct substitutions
    pairs.each do |a,b|
        if b.length > 0
            @letters.each do |letter|
                yield a + letter + b.sub(/^./, '')
            end
        end
    end

    # Correct transpositions
    pairs.each do |a,b|
        if b.length >= 2
            yield a + b.sub(/^(.)(.)/, '\2\1')
        end
    end
end

initcounts('words.txt')

ARGV.each do |word|
    puts word + ' => ' + correct(word)
end

Sample run:

$ ./speller.rb teh reaclcitranx obelix xyz yessir vuinty
teh => the
reaclcitranx => recalcitrant
obelix => belie
xyz => by
yessir => lesser
vuinty => vanity

(My corpus consisted of an ispell word list and some Jane Austen novels taken from Project Gutenberg.)

Clearly this isn’t very elegant. Glaring faults: I make a meal out of reading words from a file, and also I think I’m missing a trick regarding the streams of suggested words coming out of fix1 and fix2 which forces me to replicate the logic of filtering out bogus words.
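For what it's worth, both faults could probably be addressed along the following lines. This is only a sketch, assuming Ruby 1.9.2 or later. The names count_words, edits1, edits2, known and LETTERS are made up for illustration and are not the functions above, and this correct takes the counts as an argument instead of using @counts. The idea is to pull words out of the text with scan, and to have each stage return its suggestions as an array so that the "is this a real word?" filter is written only once.

```ruby
# Illustrative sketch, not the program from the post.
LETTERS = ('a'..'z').to_a

# Count how often each word occurs in a string of text.
def count_words(text)
  counts = Hash.new(0)   # unseen words read as 0 without being stored
  text.downcase.scan(/[a-z]+/) { |word| counts[word] += 1 }
  counts
end

# Every string one edit away from word, as an array.
def edits1(word)
  splits   = (0..word.length).map { |i| [word[0, i], word[i..-1]] }
  nonempty = splits.reject { |_, b| b.empty? }

  deletes    = nonempty.map { |a, b| a + b[1..-1] }
  transposes = splits.select { |_, b| b.length > 1 }
                     .map { |a, b| a + b[1] + b[0] + b[2..-1] }
  replaces   = nonempty.flat_map { |a, b| LETTERS.map { |c| a + c + b[1..-1] } }
  inserts    = splits.flat_map { |a, b| LETTERS.map { |c| a + c + b } }

  (deletes + transposes + replaces + inserts).uniq
end

# Every string two edits away from word.
def edits2(word)
  edits1(word).flat_map { |e| edits1(e) }.uniq
end

# The one place where suggestions are filtered against the corpus.
def known(words, counts)
  words.select { |w| counts.key?(w) }
end

def correct(word, counts)
  # Lambdas delay the expensive stages until the cheaper ones have failed.
  stages = [-> { [word] }, -> { edits1(word) }, -> { edits2(word) }]
  stages.each do |stage|
    found = known(stage.call, counts)
    return found.max_by { |w| counts[w] } unless found.empty?
  end
  word
end

counts = count_words('The cat sat on the mat. The end.')
puts correct('teh', counts)   # prints "the"
```

With a real corpus, count_words(File.read('words.txt')) would replace the open, loop and close, at the cost of reading the whole file into memory at once.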

It’ll do.


One Response

on 02/11/2009 at 11:37 pm | rdn32

Not a good start: I completely failed to understand how to get substrings using square brackets. (Although it looks like the mistake I made was harmless, in this case.) Serves me right for skimming the documentation so lightly – I knew I didn’t get what was going on, which is why I used regexp substitution to implement transposition.
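A short illustration of how the square brackets behave, assuming the mistake in question is passing word.length as the second argument in fix1 (the comment does not say exactly). The second argument of the two-argument form is a length, not an end index, and an over-long length is clamped to the end of the string, which would explain why the result was still correct. The single-character indexing in the last example assumes Ruby 1.9 or later; in 1.8, b[1] gives a character code, not a string.

```ruby
word = 'speller'

# str[start, length]: the second argument is a length.
head = word[0, 3]             # "spe"
tail = word[3, word.length]   # "ller", the over-long length is clamped

# str[range]: both ends are indices, and -1 means the last character.
same_tail = word[3..-1]       # "ller"

# Starting exactly at the end gives "", starting beyond it gives nil.
at_end   = word[7, 1]         # ""
past_end = word[8, 1]         # nil

# Transposing the first two characters without a regexp.
b = 'eh'
swapped = b[1] + b[0] + b[2..-1]   # "he"
```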
