I needed a way to scan the crawl data programmatically and find blank policies that I didn’t know about. With around 1000 rules in the TOSBack app, I needed a way to double-check the data that comes back from the web scraping. I decided to set up a class method that logs any file smaller than a byte limit I pass in.
Here’s how it’s used!
If I run the script with the “-empty” argument:
elsif ARGV[0] == "-empty"
  TOSBackApp.find_empty_crawls($results_path,512)
It calls this method:
def self.find_empty_crawls(path=$results_path, byte_limit)
  Dir.glob("#{path}*") do |filename| # each dir in crawl
    next if filename == "." || filename == ".."
    if File.directory?(filename)
      files = Dir.glob("#{filename}/*.txt")
      if files.length < 1
        TOSBackSite.log_stuff("#{filename} is an empty directory.",$empty_log)
      elsif files.length >= 1
        files.each do |file|
          TOSBackSite.log_stuff("#{file} is below #{byte_limit} bytes.",$empty_log) if (File.size(file) < byte_limit)
        end # files.each
      end # files.length < 1
    end # if File.directory?(filename)
  end # Dir.glob(path)
end # find_empty_crawls
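If you want to try the same idea outside of TOSBack, here is a minimal standalone sketch. It is not the app's code: `small_crawl_report` is a hypothetical helper I made up for illustration, and it returns the messages as an array instead of calling `TOSBackSite.log_stuff`, so it has no dependency on `$results_path` or `$empty_log`. It assumes the same layout as above, with one directory per site and `.txt` files inside each.

```ruby
require "tmpdir"

# Hypothetical helper (not part of TOSBack): walks each directory under
# root and returns a message for every empty directory and every .txt
# file smaller than byte_limit.
def small_crawl_report(root, byte_limit)
  report = []
  Dir.glob(File.join(root, "*")).sort.each do |dir|
    next unless File.directory?(dir)

    files = Dir.glob(File.join(dir, "*.txt")).sort
    if files.empty?
      report << "#{dir} is an empty directory."
    else
      files.each do |file|
        report << "#{file} is below #{byte_limit} bytes." if File.size(file) < byte_limit
      end
    end
  end
  report
end

# Demo against a throwaway directory so nothing real is touched.
Dir.mktmpdir do |root|
  Dir.mkdir(File.join(root, "empty_site"))
  Dir.mkdir(File.join(root, "good_site"))
  File.write(File.join(root, "good_site", "tiny.txt"), "x")        # 1 byte, gets flagged
  File.write(File.join(root, "good_site", "full.txt"), "x" * 1024) # 1024 bytes, passes
  puts small_crawl_report(root, 512)
end
```

Returning the messages rather than logging them directly makes the check easy to test, and the caller can still write each line to whatever log it likes.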
Full code is here :)