Adding XPath to TOSBack Rules

I quickly found out that the old TOSBack rules would cause problems. Some policies were showing up as modified every day because the app was downloading the full page instead of just the policy on that page. The XML rules that TOSBack uses to decide which policies to download only give us the URL and the policy name. In other words, any site with changing headers, related articles, the local time, the current temperature in Nebraska (or anything else that might change!) will show up as modified in the app even if nothing about the policy itself changed.

So, to get just the information I want, I changed the script to read an XPath expression from the rule file and use it to extract only the policy from the page! Now our rule files are beginning to look something like this:

<docname name="Privacy Policy">
  <url name="" xpath="//div[@id='terms']">
    <norecurse name="arbitrary"/>
  </url>
</docname>

Now I'll see a policy appear as modified only if the part of the page matched by the XPath, the policy itself, changes!
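
Here is a rough sketch of the idea in Python. It is not the actual TOSBack script, just an illustration under a few assumptions: the URL in the rule is a placeholder, the page is a made-up stand-in for a downloaded document, and the helper names `read_rule` and `extract_policy` are hypothetical. It also sticks to the standard library's ElementTree, which only understands a subset of XPath and needs well-formed markup, so a real crawler would more likely use a full HTML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule modeled on the snippet above; the URL is a placeholder.
RULE = """
<docname name="Privacy Policy">
  <url name="https://example.com/privacy" xpath="//div[@id='terms']">
    <norecurse name="arbitrary"/>
  </url>
</docname>
"""

# Made-up stand-in for a downloaded page (well-formed so ElementTree can parse it).
PAGE = """
<html>
  <body>
    <div id="header">Local time: 10:42</div>
    <div id="terms"><p>We collect your email address.</p></div>
    <div id="related">Related articles</div>
  </body>
</html>
"""


def read_rule(rule_xml):
    """Return (url, xpath) from a rule; xpath is None if the attribute is missing."""
    url = ET.fromstring(rule_xml).find("url")
    return url.get("name"), url.get("xpath")


def extract_policy(page, xpath):
    """Return the whitespace-normalized text selected by xpath (whole page if None)."""
    root = ET.fromstring(page)
    if xpath is None:
        nodes = [root]  # old behavior: compare the full page
    else:
        # ElementTree only accepts relative paths, so "//div" becomes ".//div".
        if xpath.startswith("//"):
            xpath = "." + xpath
        nodes = root.findall(xpath)
    text = " ".join("".join(node.itertext()) for node in nodes)
    return " ".join(text.split())


if __name__ == "__main__":
    url, xpath = read_rule(RULE)
    later = PAGE.replace("10:42", "10:43")  # only the header changes
    print(extract_policy(PAGE, xpath) == extract_policy(later, xpath))
    print(extract_policy(PAGE, None) == extract_policy(later, None))
```

With the XPath, the two versions of the page compare as equal because only the header differs; without it, the changed clock is enough to flag the policy as modified.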

