pmoz.info strikes again

Stern123

Member
Joined
Jul 14, 2008
Messages
56
This is a follow up to this old thread:
http://www.resource-zone.com/forum/index.php?showtopic=47920

In summary, DMOZ relies on a tool from 1and1.com, a webhosting company that hosts many comment spammers, content scrapers, and hacking/vulnerability tools. For that reason, our website blocks requests from 1and1 servers, although we specifically left open a hole for the pmoz.info bot, which is supposed to come "generally from IPs 74.208.25.165 or 216.15.74.85" according to http://pmoz.info/doc/botinfo.htm

Last month, pmoz.info bot came from 74.208.180.106 which is not mentioned on pmoz.info's bot info page, and was blocked. There were NO attempted followup visits from the semi-"official" IPs (as per above).
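For readers unfamiliar with this kind of setup, the "block the host, but leave a hole for the bot" rule described above can be sketched roughly as follows. This is only an illustration: the blocked range 74.208.0.0/16 and the function name `is_request_allowed` are assumptions made for the example, since the thread does not say which ranges the site actually blocks. Only the two bot IPs come from the bot info page quoted above.

```python
import ipaddress

# ASSUMPTION: the real blocked ranges are not given in the thread.
# 74.208.0.0/16 is used only because the bot addresses mentioned fall inside it.
BLOCKED_NETWORKS = [ipaddress.ip_network("74.208.0.0/16")]

# The two addresses advertised on pmoz.info's bot info page (quoted above).
ALLOWED_BOT_IPS = {
    ipaddress.ip_address("74.208.25.165"),
    ipaddress.ip_address("216.15.74.85"),
}


def is_request_allowed(remote_ip: str) -> bool:
    """Return True if a request from remote_ip should be served."""
    addr = ipaddress.ip_address(remote_ip)
    # The allowlist is checked first, so it punches a hole in the block.
    if addr in ALLOWED_BOT_IPS:
        return True
    # Everything else is allowed unless it falls inside a blocked network.
    return not any(addr in net for net in BLOCKED_NETWORKS)


if __name__ == "__main__":
    for ip in ("74.208.25.165", "74.208.180.106", "203.0.113.7"):
        print(ip, "allowed" if is_request_allowed(ip) else "blocked")
```

With a rule like this, a bot that switches to an unlisted address inside the blocked range (such as 74.208.180.106) is rejected even though the advertised addresses still pass.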

Someone then apparently attempted to manually verify the website from http://pmoz.info/2/see.php5 (this appears to be an open proxy; anyone who knows the URL can use it to mask their IP address while browsing the Internet, for reasons of privacy or for more dubious purposes). Unfortunately, it didn't work, either because this tool uses the same (blocked) IP address as the pmoz.info bot or because they were visiting the wrong page. Perhaps if this (real or imagined) dmoz editor had actually visited our website with a legitimate broadband/dsl connection, they would have seen that the website is up and running just fine. Another way to verify websites is to go to Google and see if the website is listed there. Also, clicking the "Cached" link will show a snapshot of Google's last attempt to visit the page, and of course, there's Google Web Preview.

We have now blocked ALL attempts from 1and1.com, as we feel that the cons outweigh the pros of pleasing the pmoz.info bot. We recognize that this prevents us from being re-listed in dmoz in several weeks or months if we choose to submit a re-inclusion request.

We look forward to the possibility of DMOZ having the funds and resources to use professional, reliable tools for verifying DMOZ listings.
 

pvgool

kEditall/kCatmv
Curlie Meta
Joined
Oct 8, 2002
Messages
10,093
I have sent a message to the editor who maintains this tool. Maybe he can comment.
 

plantrob

Curlie Admin
Curlie Admin
Joined
Mar 29, 2004
Messages
153
You don't mention which website this pertains to. Was it removed from the directory as a result of these visits?
 

Stern123

Member
Joined
Jul 14, 2008
Messages
56
> You don't mention which website this pertains to. Was it removed from the directory as a result of these visits?
Thank you so much for your reply. Yes, I should have clarified that the website was subsequently removed from the ODP. I didn't plan to name the website because I thought it was against the TOS, and I wasn't anticipating any site-specific special treatment on this forum, just providing general feedback. If it helps to troubleshoot, I could send you the website by private message?
 

plantrob

Curlie Admin
Curlie Admin
Joined
Mar 29, 2004
Messages
153
Sure. Please use the "send feedback" link from my public dmoz profile (http://www.dmoz.org/public/profile?editor=plantrob)
 

plantrob

Curlie Admin
Curlie Admin
Joined
Mar 29, 2004
Messages
153
> Feedback sent. Hope it helps.
Thanks. I just attempted to visit the site in question from my regular home broadband connection, and got served a 403 page. It seems you have put a good many protections around the site, keeping not just spambots but also regular visitors out.
 

Stern123

Member
Joined
Jul 14, 2008
Messages
56
Well, that's embarrassing :) I don't have any data to troubleshoot what happened in your case, but yes, we do keep a tiny minority of regular visitors out. For example, we block some chunks of IPs from Russia, China, Korea, etc. where the excessive amount of spam, etc. outweighs the value of a few real visitors who never buy anything. We only ship to USA/Canada, so if a foreign ISP sends us one real visit a year and the rest is referer spam, hacking exploits, and scraping, we make a judgement call. At the end of the day, our job is to generate sales for USA/Canada, and that's our priority when balancing security concerns. That said, if I had to hazard a guess, I'd say that our website is still accessible to 99.9x% of ISPs in North America and Europe and 98% of the rest of the world.

As per the OP, our website can easily be verified in a few seconds by typing site:www.example.com in Google. The content of the website can be verified a second later by clicking the 'Cache' link in Google.
 

windharp

Meta/kMeta
Curlie Meta
Joined
Apr 30, 2002
Messages
9,204
Well, that's why I always advise not to block access to your website. I do appreciate that plantrob is dealing with this matter, but in my eyes it's not our duty to work around every single one of the millions of possible website filters out there. In my eyes his time would be much better spent elsewhere :)

Same situation as "My website is not accessible with the Opera browser, because I don't get any visitors using this browser". What a surprise, when it's not usable with this browser. And since I review websites with Opera, that might reduce the chance of being listed in the ODP a lot, if it's one of the categories I care about.
 

jimnoble

DMOZ Meta
Joined
Mar 26, 2002
Messages
18,915
Location
Southern England
I edit world wide from a UK ISP. When suggested websites needing evaluation aren't available, I simply move on. Life's too short to go jumping through hoops (or proxies) when there are hundreds of millions of other websites to examine.
 

Stern123

Member
Joined
Jul 14, 2008
Messages
56
I'm not here to argue with ODP volunteers. I'm just here to provide feedback from another perspective. Like any organization, you are welcome to be defensive or diplomatically pretend to care or seriously take it under consideration for the future.

The facts are:
1) our website is/was up and running
2) our website is/was relevant to the category
3) the pmoz tool did not successfully detect that
4) therefore, technically, pmoz failed to do the job that it was designed for
5) yes, we are responsible for the way we set up our firewalls
6) but pmoz bot came from an unadvertised, unverifiable IP address on a spam-friendly shared webhost (which initiated this whole situation)

Theoretically, what is the most authoritative objective method for verifying a website?
1) one guy in the Netherlands
2) one bot from an unadvertised unverifiable IP address on a spam-friendly host
3) a manual check at Google, Bing, etc.
4) a bot from a dedicated webspace with verifiable return DNS (as used by most or all professional corporate or open-source organization) and bot help page that is accurate and up-to-date.

I know that, outside the confines of the ODP, the answer is #4 or #3

Within the confines of the ODP, well, that's not my call.

Where there's a will, there's a way. If there's no will, there's no way. You've been clear today, as you were 3 years ago, that there's no will and no way. But I can still try. Thank you for your consideration.
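For anyone wondering what "verifiable return DNS" in option 4 of the list above means in practice, it is commonly done as a forward-confirmed reverse DNS check: look up the hostname for the visiting IP, confirm the hostname belongs to the bot's domain, then resolve that hostname and confirm it maps back to the same IP. The sketch below is a minimal illustration, not anything pmoz.info or DMOZ actually provides; the function name `is_verified_bot` and the domain suffixes in the usage note are hypothetical.

```python
import socket
from typing import Callable, Iterable, List


def default_reverse_lookup(ip: str) -> str:
    # socket.gethostbyaddr returns (hostname, aliaslist, ipaddrlist).
    return socket.gethostbyaddr(ip)[0]


def default_forward_lookup(hostname: str) -> List[str]:
    # socket.gethostbyname_ex returns (hostname, aliaslist, ipaddrlist).
    return socket.gethostbyname_ex(hostname)[2]


def is_verified_bot(
    ip: str,
    allowed_suffixes: Iterable[str],
    reverse_lookup: Callable[[str], str] = default_reverse_lookup,
    forward_lookup: Callable[[str], List[str]] = default_forward_lookup,
) -> bool:
    """Forward-confirmed reverse DNS check for a visiting IP address."""
    suffixes = [s.lower() for s in allowed_suffixes]
    try:
        hostname = reverse_lookup(ip).rstrip(".").lower()
    except OSError:  # socket.herror and socket.gaierror subclass OSError
        return False
    # Step 1: the PTR name must belong to one of the bot operator's domains.
    if not any(hostname == s or hostname.endswith("." + s) for s in suffixes):
        return False
    # Step 2: the name must resolve back to the same IP, otherwise anyone
    # controlling their own reverse zone could claim any hostname.
    try:
        return ip in forward_lookup(hostname)
    except OSError:
        return False
```

A site owner would call something like `is_verified_bot(remote_ip, ["bot.example.org"])` (a placeholder domain) instead of maintaining a list of IPs by hand. The lookup functions are parameters only so the check can be exercised without network access.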
 

jimnoble

DMOZ Meta
Joined
Mar 26, 2002
Messages
18,915
Location
Southern England
We think it's important that editors visit websites using exactly the same methods that the ordinary surfer in the street would use - because we're providing a service for surfers, not website owners.

It's not unknown for websites to serve up different content when they see a dmoz or google referrer as I'm sure you're already aware.

If you, the website owner, choose to sometimes obfuscate your website when it's viewed using conventional browsers, that's entirely your choice. Live with any consequences :)
 

pvgool

kEditall/kCatmv
Curlie Meta
Joined
Oct 8, 2002
Messages
10,093
> Theoretically, what is the most authoritative objective method for verifying a website?
> 1) one guy in the Netherlands
That one guy (plantrob) is not in the Netherlands but in the USA.
If one of our editors cannot access a website while reviewing it, the website will be deleted for not being accessible. Editors are located all over the world. A Chinese or Russian editor might review a website from the USA or Australia.
> 2) one bot from an unadvertised unverifiable IP address on a spam-friendly host
That tool is only there to help editors. It gives a signal: "hey editor, take a look at this website, it might be unavailable".
> 3) a manual check at Google, Bing, etc.
Never. We draw conclusions only from the website itself, not from whether and how it is listed in search engines. A search engine cache may help us find where a website has moved to when it becomes unavailable.
> 4) a bot from a dedicated webspace with verifiable return DNS (as used by most or all professional corporate or
> open-source organization) and bot help page that is accurate and up-to-date.
DMOZ is neither a professional corporate nor an open-source organization. And pmoz is not part of DMOZ; it is just a tool editors use to make their work easier.

> I know that, outside the confines of the ODP, the answer is #4 or #3
> Within the confines of the ODP, well, that's not my call.
Within DMOZ it is option 1. If an editor cannot access the website, chances are high that it will be deleted.

> Where there's a will, there's a way. If there's no will, there's no way.
If you are willing to remove all those blocks, you might not have this problem.
 

Stern123

Member
Joined
Jul 14, 2008
Messages
56
> It's not unknown for websites to serve up different content when they see a dmoz or google referrer as I'm sure you're already aware.
I don't know about cloaking for dmoz, but websites that cloak content from google almost always end up getting caught and banned, AFAIK. Webmasters tend to pander to Google because they're deathly afraid of being banned from the biggest search engine in the world. Google and Bing even have automated stealth checkers looking for cloaked content, and they have the funds and resources to design software that's (no offense intended) a whole other class above pmoz bot. For that reason, I'm not sure how relevant that concern is.

> If you, the website owner, choose to sometimes obfuscate your website when it's viewed using conventional browsers, that's entirely your choice.
Just to clarify, we don't sometimes "obfuscate" (by the definition of that word) our website to conventional browsers. I honestly don't know why plantrob was unable to access our website from his dsl connection and a conventional browser. We don't have any data to troubleshoot it. We can't remove the protection because we don't have the info to identify the problem and know which protection to remove.

> Live with any consequences :)
We were willing to "jump through hoops" to please pmoz from the advertised IPs. So that was fine for a few years. But then came the new development. I checked our log files and we received fewer than a dozen referrals from dmoz.org last year. Only half of those were from Canada/USA. So ya, I guess we can live with the consequences :)
 

plantrob

Curlie Admin
Curlie Admin
Joined
Mar 29, 2004
Messages
153
Listings are never removed just for failing to respond to pmoz.info. There are secondary checks that take place. If those secondary checks also fail, then yes, the site will be unlisted, pending manual review by an editor or a check of Google's cache (which cannot be automated - Google doesn't like that). With four million listings, the number of sites going dead every week is simply too large to pass through a manual review prior to unlisting. There are many checks and balances to avoid unduly removing listings - unfortunately, your site is much more heavily fortified than most, which does mean some of those checks and balances also come up negative. By the way, your site was unreviewed for returning HTTP 503 (usually a sign of being under construction). That's an unusual code to return when trying to keep out unwanted visitors, so I have to assume something else was going on with the site at the time.
The system isn't perfect, but it works very well for the vast majority of cases. We're always trying to improve, so your constructive comments are welcome.
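To make the flow described above concrete, here is a rough sketch of the "primary check, then secondary check, then unreview" logic. It is purely illustrative and not the actual pmoz.info or DMOZ code: the names `triage_listing` and `describe`, the action strings, and the rule that any 2xx or 3xx response counts as "reachable" are all assumptions made for the example.

```python
from http import HTTPStatus
from typing import Optional


def looks_reachable(status: Optional[int]) -> bool:
    # ASSUMPTION: 2xx and 3xx count as reachable; None means no response.
    return status is not None and 200 <= status < 400


def triage_listing(primary: Optional[int], secondary: Optional[int]) -> str:
    """Decide what happens to a listing after two independent checks."""
    # A failed primary check alone never unlists a site.
    if looks_reachable(primary) or looks_reachable(secondary):
        return "keep"
    # Both checks failed: unlist and queue for manual review by an editor.
    return "unreview"


def describe(status: Optional[int]) -> str:
    """Human-readable label for a check result, for an editor's report."""
    if status is None:
        return "no response"
    try:
        return f"{status} {HTTPStatus(status).phrase}"
    except ValueError:
        return f"{status} (unknown status)"


if __name__ == "__main__":
    for primary, secondary in [(403, 200), (403, 503), (None, None)]:
        print(describe(primary), "/", describe(secondary),
              "->", triage_listing(primary, secondary))
```

Note the difference between the two codes mentioned in this thread: 403 is "Forbidden" (the server deliberately refuses the visitor), while 503 is "Service Unavailable" (the server says it cannot handle the request right now), which is why a 503 reads more like an outage than a firewall rule.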
 
This site has been archived and is no longer accepting new content.