There have been several bugs related to the search for some time, but what really hampered things was that the underlying directory data used a wide variety of character encodings, whilst the RDF file upon which search is built was supposed to be all in UTF-8.
The directory itself used ISO-8859-1 for English-language and other Western European categories, and a variety of other encodings for the rest of the world. When the RDF was built, the data was either stored there with a note about its encoding or converted to UTF-8. Some categories had no default encoding defined, which made the data somewhat flaky for some non-English categories. The RDF file is built about once a week, taking several days as a background process, and is then posted to http://rdf.dmoz.org/rdf/ as 2 files (content and structure).
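To give a rough idea of what "converting to UTF-8" involves, here is a minimal Python sketch. It is not the ODP's actual code: the mapping table, the category names in it, and the function names are all made up for illustration, and the real defaults per category may have been quite different. It only shows the general idea of looking up a category's default encoding and refusing to guess when none is defined.

```python
from typing import Optional

# Hypothetical table of category prefixes and their legacy encodings.
# The real directory's defaults are not known here; these are examples only.
DEFAULT_ENCODINGS = {
    "Top/World/Deutsch": "iso-8859-1",
    "Top/World/Russian": "koi8-r",
}


def legacy_encoding_for(category: str) -> Optional[str]:
    """Return the encoding of the longest matching category prefix, if any."""
    best_prefix = None
    for prefix in DEFAULT_ENCODINGS:
        if category == prefix or category.startswith(prefix + "/"):
            if best_prefix is None or len(prefix) > len(best_prefix):
                best_prefix = prefix
    return DEFAULT_ENCODINGS[best_prefix] if best_prefix else None


def to_utf8(raw: bytes, category: str) -> bytes:
    """Re-encode raw bytes from the category's legacy encoding to UTF-8."""
    encoding = legacy_encoding_for(category)
    if encoding is None:
        # No default defined: guessing here is what makes data "flaky".
        raise ValueError(f"no default encoding defined for {category!r}")
    return raw.decode(encoding).encode("utf-8")
```

Raising an error when no default is defined, rather than silently assuming ISO-8859-1, is a design choice in this sketch; it pushes such categories towards manual review.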
It has taken a good 6 months to convert all of the underlying directory data over to UTF-8; that is, every category path and name, and then every site URL, title and description within. Some could be done using automated scripts, and some had to be done by hand-editing each entry (and we're talking about half a million items here). Some clever editors built tools which parsed the RDF and generated a screenful of HTML links, each taking you directly to the edit screen for the error found, with a note of what to look for. Over the last few months the errors have steadily decreased, and the last 4 or so RDF files had just a handful of errors (just 2 in the last one, and 1 in the one before that).
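For the curious, a tool of the kind the editors built might look something like the sketch below. This is only a guess at the general shape, not their code. It assumes the records have already been pulled out of the RDF as (category, field, raw bytes) tuples, and the edit URL is a placeholder, not a real ODP address.

```python
import html
from urllib.parse import quote

# Placeholder address; the real editor-side URL scheme is not shown here.
EDIT_URL = "https://editors.example.org/edit?cat={cat}"


def find_errors(records):
    """Return (category, field, note) for every value that is not valid UTF-8."""
    errors = []
    for category, field, raw in records:
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError as exc:
            note = f"invalid byte 0x{raw[exc.start]:02x} at offset {exc.start}"
            errors.append((category, field, note))
    return errors


def report_html(errors):
    """Build an HTML list with one edit link and one note per error."""
    items = []
    for category, field, note in errors:
        href = EDIT_URL.format(cat=quote(category))
        items.append(
            f'<li><a href="{html.escape(href)}">{html.escape(category)}</a>: '
            f"{html.escape(field)}: {html.escape(note)}</li>"
        )
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```

Feeding it a couple of records, one clean and one containing a stray ISO-8859-1 byte, would produce a list with a single link pointing at the category that needs attention.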
In parallel with that was a project to find out how encoding errors get into the directory data in the first place, and then to find ways to check the data and remove errors before it is stored (hence the small number of errors that crept in last week). We think that everything is now converted, and the checking functions are now robust.
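I can't show the actual checking functions, but a check of this sort could be sketched roughly as follows. Everything here is an assumption for illustration: the function name is invented, and the "double-encoded" test is a simple heuristic that could occasionally flag legitimate text.

```python
def validate_for_storage(raw: bytes) -> str:
    """Return the decoded text, or raise ValueError if it looks mis-encoded."""
    try:
        text = raw.decode("utf-8")  # strict decoding: invalid bytes raise
    except UnicodeDecodeError as exc:
        raise ValueError(
            f"not valid UTF-8: {exc.reason} at byte {exc.start}"
        ) from exc

    # U+FFFD usually means an earlier conversion step already lost data.
    if "\ufffd" in text:
        raise ValueError("contains U+FFFD replacement character")

    # Heuristic (an assumption, not a guarantee): if the non-ASCII text can be
    # squeezed into Latin-1 and those bytes are themselves valid UTF-8, it was
    # probably UTF-8 that got decoded as Latin-1 and then encoded again.
    if not text.isascii():
        try:
            text.encode("latin-1").decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            pass  # the normal case for correctly encoded text
        else:
            raise ValueError("text looks double-encoded")

    return text
```

The point of running such a check at save time is that a bad entry is rejected while the editor is still looking at it, instead of turning up in the RDF a week later.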
The next RDF should have zero errors in it. Once that is achieved, it will be easier to revisit the search, improve its functionality, and iron out some bugs; but search is not the highest-priority job as far as editors are concerned (probably the same for staff, but I cannot speak for them or their priorities at all). Whatever happens with search, it is also likely to have a timescale of many months, not days or weeks. The ODP consists of a complex set of software tools, and work goes on on all of them.