Re: Reorganization and why dmoz human editing fails
RE: I am tied up at the moment, but I am not shirking the thread. I will post a rather long paper on a set of possible alternatives to the one-tree problem, with graphics as well as a list of possible redesigns. These are the types of requirements that are useful.
Magnolia >> The tree structure itself is fundamentally a problem; there's only some point to it if you believe that people use it to drill down. My last company in Holland was listed in a government database of companies by type. To find our company you had to first go to one of the 10 top-level company types. For us, who were internet database system architects, you had to start by looking under 'Transport', because we were under: Transport/Communication/Internet/Developers/ <<
So what we really want is a data dimension. Check below to see whether I have that right.
Llisa >>
The editors have quite a pile of feature requests in the internal bugs & features forum. Many of them would significantly enhance how we do our job, but aren't easy (or even possible, in some cases) to implement in the current system. I'm sure the editors would be the first to wish that we could review a site, input its address (if relevant) for geographical placement, check off a variety of boxes to say what kind of site it is (informational, commercial, etc.) and what topics it belongs to, and then have the software put the site in the appropriate categories. This, however, is an entirely different software package than what we currently have.
<<
I've had to solve that intersection for the EPA for a number of GIS datasets (river reach, state, county, zip, watershed, polluting facility, discharge points, etc.) for "assisted" location of different types of sites.
I'm thinking that part of the solution will use the current overall dmoz tree, with categories enriched by "attributes" instead of subcategories. An attribute is a data dimension: an alternative to a category in that it describes a feature of the site, and it is tunable to that category. This is especially useful for regionalization, rather than having a regional subtree hanging off a category through @ links. Some dimensions (like location, or engine part) are shared as a common resource, which a category editor can apply to the category.
This revised structure for dmoz keeps the tree-list paradigm for categories of type, but recasts the data dimensions as (optional) attributes of a site listing, which the category editor can set. This reduces reorganization and the depth of the tree, because an attribute can be set on the site instead of requiring a subtree. A dimension is an optional alternate category (tree) index available to sites in the category, but it is not a branch on the main tree.
Most dimensions would be simple groupings that replace or simplify @-link categories that are too complex or too deep to maintain easily. (This also reduces the depth of the main tree, a good thing.)
Some dimensions of the site information are so common that a utility could maintain important parts of these common files (possibly XML; the format is undecided) from available geologic and geopolitical data. A second, user-facing utility could help submitters resolve location addresses to GIS points on the map for locating a site. (That way, if a country divides, the sites can be re-updated from their physical location on earth and re-categorized by the utility.)
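To make the attribute idea concrete, here is a minimal sketch of the data model, assuming a category carries a set of enabled dimensions and a listing stores one value per dimension. The class names (`Dimension`, `Category`, `Listing`), the `filter_listings` helper, and the `Recreation/Boating` path are hypothetical, chosen only for illustration; this is not how dmoz is actually built.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Dimension:
    """A data dimension: an alternate index over listings, not a branch of the main tree."""
    name: str
    values: frozenset[str]   # allowed attribute values for this dimension
    shared: bool = False     # True for common resources such as location

@dataclass
class Category:
    path: str  # position on the main tree (hypothetical), e.g. "Recreation/Boating"
    dimensions: dict[str, Dimension] = field(default_factory=dict)

    def enable(self, dim: Dimension) -> None:
        """The category editor applies a shared or local dimension to this category."""
        self.dimensions[dim.name] = dim

@dataclass
class Listing:
    url: str
    category: Category
    attributes: dict[str, str] = field(default_factory=dict)

    def set_attribute(self, dim_name: str, value: str) -> None:
        """Tag the listing with a value instead of moving it into a subcategory."""
        dim = self.category.dimensions.get(dim_name)
        if dim is None:
            raise KeyError(f"{dim_name!r} is not enabled for {self.category.path}")
        if value not in dim.values:
            raise ValueError(f"{value!r} is not a valid {dim_name}")
        self.attributes[dim_name] = value

def filter_listings(listings: list[Listing], dim_name: str, value: str) -> list[Listing]:
    """Query the alternate index: matching listings, with no subtree required."""
    return [item for item in listings if item.attributes.get(dim_name) == value]
```

A sailboat site and a powerboat site can then live in the same category and still be found separately by filtering on "boat type", with no Sailboats/Powerboats subtree to maintain.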
Submitters could also update their own listed location if they move, reducing the effort required of editors.
For example, if you are at 1600 Pennsylvania Ave, Washington DC, you are in Washington, in the District of Columbia, in the USA, 30 miles from Dumfries, Virginia, and served by PEPCO electric.
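The re-categorizing utility could work by point-in-polygon tests: keep each region's boundary, and compute every region that contains a site's point. One point then naturally lands in several nested regions at once, and if a boundary changes (a country divides), rerunning the test re-files the site. This is a minimal planar sketch using even-odd ray casting, assuming boundaries are simple (lon, lat) polygons that don't cross the antimeridian; `recategorize` and the square "regions" are made-up examples, and a real system would use a GIS library and real boundary data.

```python
Point = tuple[float, float]  # (longitude, latitude)

def point_in_polygon(pt: Point, polygon: list[Point]) -> bool:
    """Even-odd ray casting: count boundary edges crossed by a ray going east from pt."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's latitude (so y1 != y2)
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def recategorize(site: Point, regions: dict[str, list[Point]]) -> list[str]:
    """Names of every region whose boundary contains the site, sorted by name."""
    return sorted(name for name, poly in regions.items() if point_in_polygon(site, poly))

# Hypothetical toy boundaries.
square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
east = [(5.0, 0.0), (10.0, 0.0), (10.0, 10.0), (5.0, 10.0)]
west = [(0.0, 0.0), (5.0, 0.0), (5.0, 10.0), (0.0, 10.0)]
site = (7.0, 3.0)

print(recategorize(site, {"Country": square, "Province": east}))  # ['Country', 'Province']
print(recategorize(site, {"East": east, "West": west}))           # ['East'] after the split
```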
I think that most category editors would add filters that shape the dimension so that submitted sites can enter location data appropriate for that category. (See the dmoz FAQ on regions for why I recommend this.) A storefront business may enter a single address (resolved to a lat-long), while an electric company in a different category may be allowed to enter a service area.
A category may have a number of data dimensions; for example, boating categories may have location, boat type (e.g. sailboat, powerboat, canoe/kayak), language, and manufacturer. A subcategory of boating such as Associations may have a special-interest dimension (e.g. association types could be safety, manufacturer SIGs, other SIGs) that relates only to that category.
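A per-category location filter might look like the following sketch, assuming two kinds of location (a single point, or a service-area polygon) and a table mapping category paths to the kinds they accept. The category paths, class names, and `accept_location` are hypothetical; the default of "single point only" is an assumption, not a dmoz rule.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class PointLocation:
    lat: float
    lon: float  # e.g. a storefront address resolved to a lat-long

@dataclass(frozen=True)
class AreaLocation:
    boundary: tuple[tuple[float, float], ...]  # (lat, lon) vertices of a service area

Location = Union[PointLocation, AreaLocation]

# Hypothetical filters a category editor might set.
CATEGORY_LOCATION_FILTERS: dict[str, set[type]] = {
    "Business/Retail": {PointLocation},                            # single address only
    "Business/Utilities/Electric": {PointLocation, AreaLocation},  # may declare a service area
}

def accept_location(category: str, loc: Location) -> bool:
    """True if the category's filter allows this kind of location and it is well-formed."""
    allowed = CATEGORY_LOCATION_FILTERS.get(category, {PointLocation})  # assumed default
    if type(loc) not in allowed:
        return False
    if isinstance(loc, AreaLocation):
        return len(loc.boundary) >= 3  # a service area needs at least a triangle
    return -90.0 <= loc.lat <= 90.0 and -180.0 <= loc.lon <= 180.0
```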
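One way a subcategory could pick up its parent's dimensions while adding its own is to union the dimensions set along its path in the main tree. Whether dimensions should be inherited this way is an assumption on my part, as are the `LOCAL_DIMENSIONS` table and the `Recreation/Boating` paths; this is only a sketch.

```python
# Hypothetical per-category dimension assignments, keyed by main-tree path.
LOCAL_DIMENSIONS: dict[str, set[str]] = {
    "Recreation/Boating": {"location", "boat type", "language", "manufacturer"},
    "Recreation/Boating/Associations": {"special interest"},
}

def effective_dimensions(path: str, local: dict[str, set[str]]) -> set[str]:
    """Union of the dimensions set on the category and on each of its ancestors."""
    parts = path.split("/")
    dims: set[str] = set()
    for depth in range(1, len(parts) + 1):
        dims |= local.get("/".join(parts[:depth]), set())
    return dims
```

Under this assumption, Associations sees the four boating dimensions plus "special interest", while "special interest" stays invisible to Boating itself and to its other subcategories.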
- An additional advantage of resolving an address to a geo point (lat, long) is that distances and place names can be used in end-user queries. Many sites are geo-sensitive, such as hospitals, stores, or government offices. A dimension may be enriched by a dimension editor, who may add categories of facts to it, or by a site editor adding a detail to the dimension for their category.
Finding marinas “Within 50 km of Chicago”, “In Mersey”, or “On the Potomac River” (from the river reach file) is possible with this approach, because both the site and the location have geo data (lat/long). The nearest repair dock may not be in my state or country.
LLisa: Thanks for the numbers; I'll use them for the proposed RFC (request for comments) for a more powerful dmoz. Also, thanks for the idea of a tunable category 'bot. I'll think more about that.
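The "Within 50 km of Chicago" query reduces to a great-circle distance filter once every listing has a lat/long. This sketch uses the haversine formula with a spherical Earth of mean radius 6371 km; the marina names and coordinates in the usage are made up, and "On the Potomac River" (distance to a river-reach line) would need a point-to-polyline test that is not shown here.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))  # clamp float error

def within_km(sites: dict[str, tuple[float, float]],
              center: tuple[float, float], radius_km: float) -> list[str]:
    """Names of sites within radius_km of center, nearest first."""
    lat0, lon0 = center
    hits = sorted((haversine_km(lat0, lon0, lat, lon), name) for name, (lat, lon) in sites.items())
    return [name for dist, name in hits if dist <= radius_km]

# Hypothetical listings; Chicago is roughly (41.88, -87.63).
marinas = {"Marina A": (41.89, -87.61), "Marina B": (42.90, -87.63)}
print(within_km(marinas, (41.88, -87.63), 50))  # ['Marina A']
```

Note that nothing here cares about state or country lines, which is the point: the nearest repair dock is found by distance alone.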