Shared Examples for Testing Session Links

When I was adding tests for user authentication, I noticed an opportunity to DRY up the code a little by moving some duplicate tests into a shared example group. The tests expect to find different links in the header depending on whether or not a user is signed in:

#partials_spec.rb
shared_examples "it has signed in header links" do
  it { should have_link('Sign out', href: signout_path) }
  it { should have_link('Subscriptions', href: user_path(user_id)) }
  it { should have_link('Account', href: edit_user_path(user_id)) }
  it { should_not have_link('Sign in', href: signin_path) }
end

shared_examples "it has signed out header links" do
  it { should_not have_link('Sign out', href: signout_path) }
  it { should have_link('Sign in', href: signin_path) }
end

Moving them into a separate file makes the specs easier to adapt when the links change, because the expectations live in one place instead of being repeated in every spec that references them.

A Minor Oversight (and the easy fix)

Running “rspec” with the new shared example group in place caused RSpec to show this warning:

jimm$ rspec
WARNING: Shared example group 'it has signed in header links' has been previously defined at:
  .../tosback3/spec/support/partials_spec.rb:1
...and you are now defining it at:
  .../tosback3/spec/support/partials_spec.rb:1
The new definition will overwrite the original one.

WARNING: Shared example group 'it has signed out header links' has been previously defined at:
  .../tosback3/spec/support/partials_spec.rb:8
...and you are now defining it at:
  .../tosback3/spec/support/partials_spec.rb:8
The new definition will overwrite the original one.

I knew that spec_helper.rb was loading everything in the “support” directory, but I couldn’t explain why this file was being loaded twice when another file in the same directory was only loaded once. After playing around with it, I finally realized that the name of the file matters too. By default, RSpec loads every file that ends in “_spec.rb”, so partials_spec.rb was picked up once by spec_helper.rb and a second time as a spec file. The warnings disappeared when I changed the filename to “partials_shared.rb” and ran rspec again.
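
The double load can be illustrated without running RSpec at all. This is only a rough sketch: the two glob patterns are assumptions (one stands in for the kind of require line a spec_helper.rb typically uses for the support directory, the other for the runner's spec-file pattern), and load_count is a made-up helper.

```ruby
# Assumed patterns, not copied from the project:
# SUPPORT_GLOB stands in for the require line in spec_helper.rb,
# SPEC_GLOB stands in for the runner's default spec-file pattern.
SUPPORT_GLOB = "spec/support/**/*.rb"
SPEC_GLOB    = "spec/**/*_spec.rb"

# Hypothetical helper: how many of the two patterns pick up this path?
def load_count(path)
  [SUPPORT_GLOB, SPEC_GLOB].count do |glob|
    File.fnmatch?(glob, path, File::FNM_PATHNAME)
  end
end

puts load_count("spec/support/partials_spec.rb")    # => 2
puts load_count("spec/support/partials_shared.rb")  # => 1
```

Any file in the support directory whose name ends in “_spec.rb” matches both patterns, which is why renaming it was enough.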

Adding Dynamic Versions to the Policy Factory

FactoryGirl provides a lot of flexibility when it comes to creating associations, so following their example, TOSBack is using an after(:create) callback to produce some content in the tests that appears to be changing over time:

factory :policy_with_sites_and_versions do
  ignore do
    sites_count 5
    versions_count 5
  end

  after(:create) do |policy, eval|
    eval.sites_count.times { policy.sites << FactoryGirl.create(:site) }
    eval.versions_count.times do |n| 
      policy.versions << FactoryGirl.create(:version, policy: policy, previous_crawl: policy.detail[0..-(n+2)], created_at: (n+1).days.ago)

      # i.e. older the created_at date in the db, the shorter the version is. 
      # e.g. two days old = two characters sliced off.
      # It appears in tests like one character is added per day
    end
  end
end #with_sites

Then, it’s as simple as creating the factory with a “versions_count” attribute:

describe "changes_from_previous()" do
  # policy model has a callback that creates a version 'current version' so
  # @policy really has 4 versions below even though versions_count is 3
  # see factories/policies.rb for details on differences between versions
  let(:vcount) { 3 }

  before(:each) { @policy = FactoryGirl.create(:policy_with_sites_and_versions, sites_count: 1, versions_count: vcount) }

You can see the entire factory file for policies on GitHub.

Verifying the generated test data

I added the following “test” to my version_spec.rb file right after the creation of the factory to make sure everything was working as expected:

it { @policy.versions.each {|v| puts "#{v.created_at}\t#{v.previous_crawl}\n\n"} }

Then, running rspec prints the state of the versions at that point during the tests! (edited to fit)

2013-03-13 21:36:05 UTC	Current Version

2013-03-12 21:36:05 UTC	 <p>500px is founded on the principle of helping ...
    We know that you care about how your personal information is used and...
    By visiting the 500px website, you are accepting the practices outlin...
    <p>This Privacy Policy covers 500px's treatment of personal informati...
    This policy does not apply to the practices of third parties that 500...
    <br> Information Collected by 500px <p>We only collect personal infor...
    This information allows us to provide you with a customized and effic...
    We collect the following types of information from our 500px users:<b...
    <br>

2013-03-11 21:36:05 UTC	 <p>500px is founded on the principle of helping ...
    We know that you care about how your personal information is used and...
    By visiting the 500px website, you are accepting the practices outlin...
    <p>This Privacy Policy covers 500px's treatment of personal informati...
    This policy does not apply to the practices of third parties that 500...
    <br> Information Collected by 500px <p>We only collect personal infor...
    This information allows us to provide you with a customized and effic...
    We collect the following types of information from our 500px users:<b...
    <br

2013-03-10 21:36:05 UTC	 <p>500px is founded on the principle of helping ...
    We know that you care about how your personal information is used and...
    By visiting the 500px website, you are accepting the practices outlin...
    <p>This Privacy Policy covers 500px's treatment of personal informati...
    This policy does not apply to the practices of third parties that 500...
    <br> Information Collected by 500px <p>We only collect personal infor...
    This information allows us to provide you with a customized and effic...
    We collect the following types of information from our 500px users:<b...
    <b

…and it passes (with a cryptic name) :P

example at ./spec/models/version_spec.rb:40

Notice that each version loses one more character for every day older it is in the database. This gives us dynamic, predictable policy versions to use when testing!
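
The slicing in the factory is easy to try on its own. Here is a small sketch with a made-up detail string (the real one is the full policy text) that applies the same detail[0..-(n+2)] expression for n = 0, 1, 2:

```ruby
# Made-up stand-in for policy.detail
detail = "Terms<br>"

# Same range expression as the factory: version n is (n + 1) days old
# and has (n + 1) characters sliced off the end.
versions = 3.times.map { |n| [n + 1, detail[0..-(n + 2)]] }

versions.each { |days_old, text| puts "#{days_old} day(s) old: #{text}" }
# 1 day(s) old: Terms<br
# 2 day(s) old: Terms<b
# 3 day(s) old: Terms<
```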

Screenshot Tour of New TOSBack Features

TOSBack has seen several updates lately, and I want to present a glimpse of what has been changing! But instead of having you set it all up, here are some screenshots from my local environment!

Sessions

TOSBack has some basic authentication and authorization now. In the header is a link to sign in:

Header image

Trying to visit a protected page without a session will redirect you to sign in:

Sign in Page

And of course, the links in the header change based on the current user:

Sign out header links

Additional auth features that need to be finished

  • Mozilla Persona authentication.
  • Admin pages for managing policies.

Views for diffs!

There are some initial views for diffs between policy versions! Originally, we were using the Differ gem, but Diffy has great HTML formatting by default. The wonderful filler text is from hipsteripsum.

Here’s an example showing how Differ handles line by line diffs:

Diffing with Differ

Diffy will automatically create a list from the lines and highlight the exact changes. Here’s a screenshot of it that shows the navigation between changes too:

Diffing with Diffy

Versioning features that need to be completed

  • View changes between any available versions instead of only adjacent versions.
  • Ajax, pjax, or turbolinks for speed :)

What’s next?

It’s nice to start seeing some foundation for the application, but we still have plenty of work ahead! We need an area for managing/approving policies, and most importantly, we need to get some real policies in the database!

If you want to help us refine TOSBack or start adding some features, fork us on GitHub!

Version Callbacks for ToSBack Policies

Our ToSBack policies now have some automatic versioning when the “detail” attribute is changed!

Here’s an example policy in my development environment. Its current version is stored as an attribute in the policy model (detail), but it’s also represented in the versions model:

1.9.3-p327 :001 > pol = Policy.first

1.9.3-p327 :009 >   pol.detail
 => " <p>500px is founded on the principle of helping people discover new photos and photographers..."

1.9.3-p327 :010 > pol.versions.each {|v| puts "#{v.created_at} - #{v.previous_crawl}"}
2013-01-29 07:11:20 UTC - Current Version

I implemented some tests and callback methods today that make tracking changes easier:

1.9.3-p327 :011 > pol.update_attributes(detail:"new crawl 1")
...
=> true
1.9.3-p327 :012 > pol.update_attributes(detail:"new crawl 2")
...
=> true 
1.9.3-p327 :013 > pol.update_attributes(detail:"new crawl 3")
...
=> true 

1.9.3-p327 :014 > pol.versions.each {|v| puts "#{v.created_at} - #{v.previous_crawl}"}
2013-01-29 07:11:20 UTC -  <p>500px is founded on the principle of helping people discover new photos and photographers...
2013-02-01 22:36:36 UTC - new crawl 1
2013-02-01 22:36:46 UTC - new crawl 2
2013-02-01 22:36:49 UTC - Current Version

1.9.3-p327 :015 > pol.detail
=> "new crawl 3"

Each version is stored in the versions table with its creation date, and all we did was update the policy! You can see all the code on GitHub!
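
To make the behavior in that console session concrete, here is a plain-Ruby sketch that mimics it. It is not the ActiveRecord callback code from the app: PolicySketch, Version, and update_detail are names invented for illustration, and the logic is only inferred from the output above.

```ruby
# Invented stand-ins for the real models
Version = Struct.new(:previous_crawl, :created_at)

class PolicySketch
  PLACEHOLDER = "Current Version"

  attr_reader :detail, :versions

  def initialize(detail)
    @detail = detail
    # A new policy starts with a single placeholder version.
    @versions = [Version.new(PLACEHOLDER, Time.now)]
  end

  # When detail changes, the placeholder version takes over the old text
  # and a fresh placeholder is appended for the new current text.
  def update_detail(new_detail)
    return if new_detail == @detail
    @versions.last.previous_crawl = @detail
    @versions << Version.new(PLACEHOLDER, Time.now)
    @detail = new_detail
  end
end

policy = PolicySketch.new("<p>original policy text</p>")
["new crawl 1", "new crawl 2", "new crawl 3"].each { |text| policy.update_detail(text) }
policy.versions.each { |v| puts v.previous_crawl }
# <p>original policy text</p>
# new crawl 1
# new crawl 2
# Current Version
```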

Help Us Develop the New ToSBack!

We could use your help building the new Rails version of ToSBack! The hackathon page has a good overview of basic site functionality if you aren’t familiar, but this page should help you see how the components come together. Open your favorite text editor and start contributing!

What We Want to Build

Here’s a basic description of how it should come together:

  1. Policies in the policies table that aren’t marked as “needs_revision” will be scheduled for scraping: the policy data is scraped from the original site and added to the “crawls” table.
  2. Users will submit new or modified policy information into a “pending_changes” model.
  3. Admins will approve the “pending_changes” and “crawls” (scrape data) before it is visible to most users.
  4. Approving crawls will replace the current data in the policies table and add a row to the versions table.
    • ToSBack will email users that have subscribed to the policy being changed.

And this is how the models should be associated:

Model Associations for ToSBack

Would you like to contribute? Awesome! Here are some general guidelines to remember:

  • Write a failing test before implementing the new feature (Red-Green-Refactor).
  • Keep the controllers and views skinny.
  • ToSBack is using factories and RSpec instead of fixtures and Test::Unit.
  • Add the feature you’re working on to the issue tracker if it’s not there already!

Here are some tasks to get us started!

  • Annotate our models, tests, fixtures, and factories.
  • Use layouts and partials to begin building a DRY header and footer area.
  • Add a planned model or controller (with tests and basic validations) that hasn’t been implemented yet.
  • Modify sites#index to display recently changed policies first.
  • Update policy#show action (individual policy)
    • User can choose between available versions.
    • App should display the diff of the versions chosen.
  • Secure any location that may be vulnerable to XSS.

Not a Rails Developer?

Have Questions?

Join the #tosback channel on irc.oftc.net if you have any questions and we’ll be glad to help!

Troubleshooting ToSBack File Handling

For the past few days, ToSBack has been running so smoothly!

Every day, I scroll through the latest “Crawl” commit to see what’s new and to decide which rules need to be updated. And every day this week, I’ve been pleasantly surprised to find that not many files are being modified!

~# tail -f rubytosback/tosback2/logs/run.log
2012-12-05 01:14:30 -0600 - Script finished! Check errors.log for rules to fix :)
2012-12-06 01:06:02 -0600 - Beginning script!
2012-12-06 01:14:43 -0600 - Script finished! Check errors.log for rules to fix :)
2012-12-07 01:06:02 -0600 - Beginning script!
2012-12-07 01:23:40 -0600 - Script finished! Check errors.log for rules to fix :)
2012-12-08 01:06:02 -0600 - Beginning script!
2012-12-09 01:06:02 -0600 - Beginning script!
2012-12-10 01:06:02 -0600 - Beginning script!
2012-12-11 00:06:02 -0600 - Beginning script!

Oh, wait. This is a bad surprise.

In reality, ToSBack had been throwing an exception for the past few days. The script was stopping at a specific file.

tosback.rb:176:in `initialize': No such file or directory - ../crawl/godaddy.com/Trademark and/or Copyright Infringement Policy.txt (Errno::ENOENT)

In the Godaddy rule file, the document looked like this:

<docname name="Trademark and/or Copyright Infringement Policy">
  <url name="https://www.godaddy.com/agreements/showdoc.aspx?pageid=TRADMARK_COPY" xpath="//td[@class='bodyText']">
    <norecurse name="arbitrary"/>
  </url>
</docname>

And “tosback.rb:176” is referring to line 176 in tosback.rb:

crawl_file = File.open(new_path,"w") # new file or overwrite old file

Because the docname contains a “/” (forward slash), ToSBack tries to open a file called “or Copyright Infringement Policy.txt” in a directory called “Trademark and/”. File.open() will normally create a new file if the file doesn’t exist, but you can see that it won’t create new directories for you:

$ mkdir testdir
$ cd testdir
$ irb
1.9.3-p327 :001 > path = "new_file.txt"
 => "new_file.txt" 
1.9.3-p327 :002 > new_file = File.open(path,"w")
 => #<File:new_file.txt> 
1.9.3-p327 :003 > new_file.puts "all work and no play"
 => nil 
1.9.3-p327 :004 > new_file.close
 => nil 
1.9.3-p327 :005 > exit
$ ls
new_file.txt
$ tail new_file.txt 
all work and no play
$ irb
1.9.3-p327 :001 > path="new/file.txt"
 => "new/file.txt" 
1.9.3-p327 :003 > File.open(path,"w")
Errno::ENOENT: No such file or directory - new/file.txt
	from (irb):3:in `initialize'
	from (irb):3:in `open'
	from (irb):3
	from /Users/jimmy/.rvm/rubies/ruby-1.9.3-p327/bin/irb:16:in `<main>'
1.9.3-p327 :004 > exit
$ mkdir new
$ irb
1.9.3-p327 :001 > path = "new/file.txt"
 => "new/file.txt" 
1.9.3-p327 :002 > file = File.open(path,"w")
 => #<File:new/file.txt> 
1.9.3-p327 :003 > file.close
 => nil 
1.9.3-p327 :005 > exit
$ ls new
file.txt

So I removed the slash in the docname to resolve the issue.
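
Editing the rule file was the fix here, but the same problem could also be guarded against in code. The following is only a sketch of that idea, not something tosback.rb does: safe_docname and try_write are hypothetical helpers, and the dash is an arbitrary choice of replacement character.

```ruby
require "tmpdir"

# Hypothetical helper (not part of tosback.rb): replace path separators so a
# docname always maps to a single file name.
def safe_docname(name)
  name.tr("/", "-")
end

# Try to write a crawl file and report what happened.
def try_write(dir, docname)
  File.open(File.join(dir, "#{docname}.txt"), "w") { |f| f.puts "policy text" }
  :written
rescue Errno::ENOENT
  :missing_directory
end

docname = "Trademark and/or Copyright Infringement Policy"

Dir.mktmpdir do |dir|
  puts try_write(dir, docname)                # => missing_directory
  puts try_write(dir, safe_docname(docname))  # => written
end
```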

Adding a Method to Check for Previous Crawl Data

I added a new feature to the script that will hopefully make the TOSBack crawls more reliable! The app has had some intermittent problems downloading pages; in some cases, the crawl data would come back blank even if the document had downloaded properly before. I decided to write a couple of methods that check for previous crawl data and retry the scrape if there is an existing policy in the crawl folder.

Here’s what it looks like:

  def scrape(checkprev=true) #see below
    download_full_page()
    if @newdata
      apply_xpath()
      strip_tags()
      format_newdata()
    elsif (!@newdata && (checkprev == true))
      check_prev()
    end
  end #scrape

  def check_prev
    prev_path = "#{$results_path}#{@site}/#{@name}.txt"
    # A previous crawl only counts if the file exists and holds more than 32 bytes
    @has_prev = true if File.exist?(prev_path) && File.size(prev_path) > 32
  end #check_prev

Writing “def scrape(checkprev=true)” gives the “checkprev” parameter a default value of true, which is used whenever no argument is passed to .scrape(). I implemented it this way so I wouldn’t have to change the existing calls, but I can still turn off the check for previous data when I call the retry method.

  def retry_docs
    @sites.each do |site|
      site.docs.each do |doc|
        doc.scrape(false) if doc.has_prev == true
      end #@docs
    end #@sites
  end #retry_docs
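
Here is a stripped-down sketch of how the default argument and the retry fit together. DocSketch is an invented stand-in for the real document class: downloads are faked with a queue of results (nil meaning the page came back blank) and the previous crawl file is faked with a byte count.

```ruby
class DocSketch
  attr_reader :newdata, :has_prev, :attempts

  def initialize(download_results, prev_bytes)
    @download_results = download_results # simulated results, nil = blank page
    @prev_bytes = prev_bytes             # simulated size of the previous crawl
    @has_prev = false
    @attempts = 0
  end

  # checkprev defaults to true, so existing calls to scrape keep working.
  def scrape(checkprev = true)
    @attempts += 1
    @newdata = @download_results.shift
    check_prev if @newdata.nil? && checkprev
  end

  def check_prev
    @has_prev = true if @prev_bytes > 32
  end
end

# Retry only the docs that used to have data; skip the check this time.
def retry_docs(docs)
  docs.each { |doc| doc.scrape(false) if doc.has_prev }
end

flaky     = DocSketch.new([nil, "policy text"], 2048) # blank once, had data before
brand_new = DocSketch.new([nil, "policy text"], 0)    # blank, no previous crawl
docs = [flaky, brand_new]

docs.each { |doc| doc.scrape }
retry_docs(docs)

puts flaky.attempts      # => 2
puts brand_new.attempts  # => 1
```

Only the document with previous crawl data gets a second attempt; passing false on the retry keeps it from flagging itself again.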

Passing Arguments to TOSBack

Over time, I’ve added some features to TOSBack that you access by passing arguments to the script. The quickest way to test out new rules/XPath info is to run the script like this:

rubycode$ ruby tosback.rb ../rules/abercrombie.com.xml

Instead of writing the policy to file, it will just print it on the screen for verification. If everything looks good, just run:

rubycode$ ruby tosback.rb ../rules/abercrombie.com.xml -w

to write out the file and update the crawl folder. I’ve added XPath data to about 400 rule files and tested each of them this way, so this has saved me a lot of time.
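
For anyone curious how that kind of switch can be wired up, here is a rough sketch. It is not the actual branching in tosback.rb: parse_args and the mode names are invented, and the behavior with no arguments at all is an assumption.

```ruby
# Hypothetical argument handling; the real script may branch differently.
def parse_args(argv)
  first, second = argv
  if first == "-empty"
    { mode: :find_empty }                   # scan crawl data for small files
  elsif first && second == "-w"
    { mode: :write, rule_file: first }      # scrape one rule and write it out
  elsif first
    { mode: :preview, rule_file: first }    # scrape one rule and print it
  else
    { mode: :default }                      # assumption: no arguments given
  end
end

p parse_args(["../rules/abercrombie.com.xml"])
p parse_args(["../rules/abercrombie.com.xml", "-w"])
```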

Searching Crawl Data for Empty Files

I needed a way to scan the crawl data programmatically and determine if there were blank policies that I didn’t know about. With around 1000 rules in the TOSBack app, I need some ways to double check the data that comes back from the web scraping. I decided to set up a class method that would log any files that I determined to be too small.

Here’s how it’s used!

If I run the script with the “-empty” argument:

elsif ARGV[0] == "-empty"
  
  TOSBackApp.find_empty_crawls($results_path,512)

It calls this method:

  def self.find_empty_crawls(path=$results_path, byte_limit)
    Dir.glob("#{path}*") do |filename| # each dir in crawl
      next if filename == "." || filename == ".."

      if File.directory?(filename)
        files = Dir.glob("#{filename}/*.txt")
        if files.length < 1
          TOSBackSite.log_stuff("#{filename} is an empty directory.",$empty_log)
        else
          files.each do |file|
            TOSBackSite.log_stuff("#{file} is below #{byte_limit} bytes.",$empty_log) if (File.size(file) < byte_limit)
          end # files.each
        end # files.length < 1
      end # if File.directory?(filename)
    end # Dir.glob(path)
  end # find_empty_crawls

Full code is here :)
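
If you want to try the idea without the rest of the app, here is a self-contained variant. It is a sketch rather than the method above: small_crawl_report is an invented name, it returns its messages instead of logging them, and it builds a throwaway crawl folder in a temp directory.

```ruby
require "tmpdir"
require "fileutils"

# Invented variant of find_empty_crawls that returns messages instead of
# writing them to a log file.
def small_crawl_report(path, byte_limit)
  report = []
  Dir.glob(File.join(path, "*")).sort.each do |dir|
    next unless File.directory?(dir)
    files = Dir.glob(File.join(dir, "*.txt")).sort
    if files.empty?
      report << "#{File.basename(dir)} is an empty directory."
    else
      files.each do |file|
        next unless File.size(file) < byte_limit
        report << "#{File.basename(file)} is below #{byte_limit} bytes."
      end
    end
  end
  report
end

Dir.mktmpdir do |crawl|
  # Throwaway crawl folder: one empty site, one site with a tiny policy.
  FileUtils.mkdir_p(File.join(crawl, "empty-site.com"))
  FileUtils.mkdir_p(File.join(crawl, "example.com"))
  File.write(File.join(crawl, "example.com", "Privacy Policy.txt"), "x" * 600)
  File.write(File.join(crawl, "example.com", "Terms.txt"), "tiny")

  puts small_crawl_report(crawl, 512)
  # empty-site.com is an empty directory.
  # Terms.txt is below 512 bytes.
end
```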

Refactoring TOSBack

When it came time to add a new feature to TOSBack, I realized that it was going to be a lot more difficult than it should be. My script was a messy group of methods passing data all around instead of an OOP app with a DRY structure.

I refactored everything into a few nice classes, and now the code is much more organized.