Hello! It's me again. ;-)
Here is my review what you can encounter in <mediadate> tag in content.rdf.u8 ODP dump file.
As I understand, the desired content for that tag is date in the format "YYYY.MM.DD".
First of all is separators. The desired separator is dot but I have found many dates with '-', '/', ',', '_' and ' ' characters.
I grab with sed programm all of contents of <mediadate> tags from dump and use simple PHP function to check if a date is bad:
First, it matches a date against regular expression and later check if a date is valid.
Than I have sorted all bad dates and remove duplicates (many equal dates both bad and good can be found in the dump).
My task was to import all data correctly to MySQL database. So, by term "bad" I mean dates which cannot be recognized by MySQL (not in "YYYY.MM.DD" where separator characters can be any). And I want to store date in field of DATE type and not as text string. ;-) I list only bad dates below (without external pages itself) because in other case this message will be veeery looong.
So, there are completely bad dates which cannot be imported:
Also I have found dates in bad format ("DD.MM.YYYY", "YYYY.DD.MM", "DD.MM.YY") which cannot be imported directly without converting by hands:
And who knows how many dates stored in dump in "YYYY.DD.MM" format with day number less than 13 that they cannot be distinguished from "YYYY.MM.DD".
MySQL allows to store dates with day or day and month set to zero (I call that dates incomplete) to indicate that day or both day and month are not known. MySQL depents on settings will or will not consider as error inserting incomlete dates to tables. So, here is the list ("good" incompletes):
But there are many incomplete dates which cannot be recognized by MySQL because of missing day number or both day and month. Here is the list ("bad" incompletes):
Also I have seen many dates in the past (starting from 1795.12.31). I cannot check whether it is incorrect date or the date of some event in the past. I will not list them here.
Next, I have found dates in the future. It may be errors or dates of future events (but I don't know whether you store information about events or not). And here is the list (near future with incompletes):
Later I have found dates in far future. I'm not sure will I be alive at that times. Do you have a time machine? ;-) So, this is the list (far future with incomplete):
And the last, I have found some dates in "YYYYMMDD" format without separators. MySQL can handle them correctly but they are not in common format. So I list them here:
The total number of external pages with dates which MySQL cannot understand is 18070.
Thanks for your attention.
P.S. Maybe you need at least validate new values of the form? ;-)
Here is my review what you can encounter in <mediadate> tag in content.rdf.u8 ODP dump file.
As I understand, the desired content for that tag is date in the format "YYYY.MM.DD".
First of all is separators. The desired separator is dot but I have found many dates with '-', '/', ',', '_' and ' ' characters.
I grab with sed programm all of contents of <mediadate> tags from dump and use simple PHP function to check if a date is bad:
PHP:
function
is_good_date ($d)
{
if (!ereg ("^([0-9]{4})[-./,_ ]([0-9]{1,2})[-./,_ ]([0-9]{1,2})$", $d, $regs)
|| $regs[1] > 2006 || $regs[1] < 1000
|| !checkdate ($regs[2], $regs[3], $regs[1]))
return false;
return true;
}
Than I have sorted all bad dates and remove duplicates (many equal dates both bad and good can be found in the dump).
My task was to import all data correctly to MySQL database. So, by term "bad" I mean dates which cannot be recognized by MySQL (not in "YYYY.MM.DD" where separator characters can be any). And I want to store date in field of DATE type and not as text string. ;-) I list only bad dates below (without external pages itself) because in other case this message will be veeery looong.
So, there are completely bad dates which cannot be imported:
Code:
June, 1998; yyyy.mm.dd; 200.01.28 ; 200.05.01 ; 200.07.05 ; 1997.06.ne;
1999.02.29; 1999.07.99; 2001.00.01; 2001.02.ry; 2001.06.31; 2001.09.31;
2002/03/62; 2003.0.04 ; 19995.09.0; 20001.02.0; 200310.05
Also I have found dates in bad format ("DD.MM.YYYY", "YYYY.DD.MM", "DD.MM.YY") which cannot be imported directly without converting by hands:
Code:
27/05/1999; 2000.20.05; 02.05.2001; 04.02.2002; 2002.22.02; 23.03.03 ;
11/04/2003; 25.04.2003; 22/06/03 ; 24.08.2003; 2003.24.12; 2004.27.05;
4.6.2004 ; 2004.19.07; 08-12-04 ; 16/01/2005; 01/11/2005; 19/11/2005
MySQL allows to store dates with day or day and month set to zero (I call that dates incomplete) to indicate that day or both day and month are not known. MySQL depents on settings will or will not consider as error inserting incomlete dates to tables. So, here is the list ("good" incompletes):
Code:
1856.02.00; 1856.06.00; 1894.11.00; 1896.09.00; 1897.03.00; 1898.02.00;
1907.00.00; 1910.12.00; 1911.08.00; 1937.06.00; 1949.06.00; 1949.12.00;
1951.11.00; 1958.09.00; 1959.06.00; 1959.12.00; 1960.01.00; 1960.03.00;
1960.05.00; 1960.08.00; 1960.11.00; 1960.12.00; 1961.01.00; 1961.02.00;
1961.03.00; 1961.04.00; 1961.05.00; 1961.07.00; 1961.09.00; 1961.10.00;
1961.11.00; 1962.02.00; 1962.03.00; 1964.03.00; 1964.04.00; 1964.12.00;
1965.05.00; 1965.09.00; 1965.11.00; 1966.03.00; 1966.07.00; 1967.01.00;
1967.11.00; 1968.06.00; 1968.11.00; 1969.09.00; 1969.12.00; 1970.09.00;
1971.01.00; 1971.04.00; 1971.07.00; 1973.03.00; 1973.06.00; 1974.00.00;
1974.04.00; 1975.01.00; 1976.00.00; 1976.01.00; 1977.03.00; 1977.09.00;
1977.12.00; 1978.02.00; 1978.04.00; 1978.05.00; 1978.09.00; 1980.00.00;
1980.08.00; 1980.10.00; 1981.04.00; 1981.06.00; 1981.07.00; 1981.08.00;
1981.09.00; 1981.12.00; 1982.00.00; 1982.03.00; 1982.06.00; 1982.09.00;
1982.10.00; 1983.00.00; 1983.04.00; 1983.05.00; 1983.06.00; 1983.07.00;
1983.09.00; 1983.11.00; 1984.00.00; 1984.01.00; 1984.03.00; 1984.04.00;
1984.05.00; 1984.07.00; 1984.08.00; 1984.09.00; 1984.10.00; 1984.11.00;
1985.00.00; 1985.01.00; 1985.02.00; 1985.03.00; 1985.04.00; 1985.05.00;
1985.06.00; 1985.07.00; 1985.09.00; 1985.10.00; 1985.11.00; 1985.12.00;
1986.00.00; 1986.01.00; 1986.02.00; 1986.03.00; 1986.04.00; 1986.05.00;
1986.06.00; 1986.07.00; 1986.10.00; 1986.11.00; 1987.00.00; 1987.01.00;
1987.04.00; 1987.05.00; 1987.06.00; 1987.07.00; 1987.08.00; 1987.09.00;
1987.10.00; 1987.11.00; 1987.12.00; 1988.00.00; 1988.01.00; 1988.02.00;
1988.04.00; 1988.05.00; 1988.06.00; 1988.07.00; 1988.08.00; 1988.09.00;
1988.10.00; 1988.11.00; 1988.12.00; 1989.00.00; 1989.01.00; 1989.02.00;
1989.03.00; 1989.05.00; 1989.06.00; 1989.07.00; 1989.08.00; 1989.09.00;
1989.10.00; 1989.11.00; 1989.12.00; 1990.00.00; 1990.01.00; 1990.02.00;
1990.03.00; 1990.04.00; 1990.05.00; 1990.06.00; 1990.09.00; 1990.10.00;
1990.12.00; 1991.00.00; 1991.01.00; 1991.02.00; 1991.03.00; 1991.04.00;
1991.05.00; 1991.06.00; 1991.08.00; 1991.09.00; 1991.10.00; 1991.11.00;
1991.12.00; 1992.0.0 ; 1992.00.00; 1992.01.00; 1992.02.00; 1992.03.00;
1992.06.00; 1992.07.00; 1992.08.00; 1992.09.00; 1992.10.00; 1992.11.00;
1992.12.00; 1993.00.00; 1993.01.00; 1993.02.00; 1993.03.00; 1993.04.00;
1993.05.00; 1993.06.00; 1993.09.00; 1993.10.00; 1993.11.00; 1993.12.00
1994.00.00; 1994.01.00; 1994.03.00; 1994.05.00; 1994.06.00; 1994.07.00;
1994.08.00; 1994.09.00; 1994.10.00; 1994.11.00; 1994.12.00; 1995.00.00;
1995.01.00; 1995.02.00; 1995.03.00; 1995.04.00; 1995.05.00; 1995.06.00;
1995.07.00; 1995.08.00; 1995.09.00; 1995.10.00; 1995.11.00; 1995.12.00;
1996.00.00; 1996.01.00; 1996.02.00; 1996.03.00; 1996.04.00; 1996.05.00;
1996.06.00; 1996.07.00; 1996.08.00; 1996.09.00; 1996.10.00; 1996.11.00;
1996.12.00; 1997.00.00; 1997.01.00; 1997.02.00; 1997.03.00; 1997.04.00;
1997.05.00; 1997.06.00; 1997.07.00; 1997.08.00; 1997.09.00; 1997.10.0 ;
1997.10.00; 1997.11.00; 1997.12.00; 1998.00.00; 1998.01.00; 1998.02.00;
1998.03.00; 1998.04.00; 1998.05.00; 1998.06.00; 1998.07.00; 1998.08.00;
1998.09.00; 1998.10.00; 1998.11.00; 1998.12.00; 1999-09-00; 1999.00.00;
1999.00.04; 1999.01.00; 1999.02.00; 1999.03.00; 1999.04.00; 1999.05.00;
1999.06.00; 1999.07.00; 1999.08.00; 1999.09.00; 1999.10.00; 1999.11.00;
1999.12.00; 2000.00.00; 2000/09/00; 2000.01.00; 2000.02.00; 2000.03.00;
2000.04.00; 2000.05.00; 2000.06.00; 2000.07.00; 2000.08.00; 2000.09.00;
2000.10.00; 2000.11.00; 2000.12.00; 2001.00.00. 2001/07/00. 2001.01.00;
2001.02.00; 2001.03.00; 2001.04.00; 2001.05.00; 2001.06.00; 2001.07.00;
2001.08.00; 2001.09.00; 2001.10.00; 2001.11.00; 2001.12.00; 2002.00.00;
2002/00/00; 2002.01.00; 2002.02.00; 2002.03.00; 2002.04.00; 2002.05.00
2002.06.00; 2002.07.00; 2002.08.00; 2002.09.00; 2002.10.00; 2002.11.00;
2002.12.00; 2003.00.00; 2003.01.00; 2003.02.00; 2003.03.00; 2003.04.00;
2003.05.00; 2003.06.00; 2003.07.00; 2003.08.00; 2003.09.00; 2003.10.00;
2003.11.00; 2003.12.00; 2004.00.00; 2004.01.00; 2004.02.00; 2004.03.00;
2004.04.00; 2004.05.00; 2004.06.00; 2004.07.00; 2004.08.00; 2004.09.00;
2004.10.00; 2004.11.00; 2004.12.00; 2005-09-00; 2005.00.00; 2005.01.00;
2005.02.00; 2005.03.00; 2005.04.00; 2005.05.00; 2005.06.00; 2005.07.00;
2005.08.00; 2005.09.00; 2005.10.00; 2005.11.00; 2005.12.00; 2006.00.00;
2006.02.00
But there are many incomplete dates which cannot be recognized by MySQL because of missing day number or both day and month. Here is the list ("bad" incompletes):
Code:
1971-02 ; 1980.06 ; 1983.09. ; 1984.05...; 1987.09 ; 1987.12 ;
1988.01...; 1990.08...; 1991......; 1992.01...; 1992.06 ; 1993.07...;
1994......; 1994.05...; 1995 ; 1995.04 ; 1995..10.3; 1996 ;
1996......; 1995.05 ; 1995.10...; 1996.04 ; 1996.05 ; 1996.10 ;
1996.10...; 1996.11 ; 1997 ; 1997.07. ; 1997.08...; 1997.12 ;
1997.12...; 1998 ; 1998-01 ; 1998. ; 1998......; 1998.01 ;
1998.01..1; 1998.03 ; 1998.04. ; 1998.04...; 1998.05 ; 1998.06 ;
1998.10...; 1998.12 ; 1999 ; 1999. ; 1999.01...; 1999.02 ;
1999.03 ; 1999.04 ; 1999.07 ; 1999.07. ; 1999.07...; 1999.08 ;
1999.08...; 1999.09 ; 1999.10 ; 1999.12 ; 2000-06 ; 2000......;
2000.03 ; 2000.04 ; 2000.05 ; 2000.06...; 2000.08 ; 2000.10 ;
2000.10...; 2000.11 ; 2000.11...; 2000.12 ; 2001 ; 2001- ;
2001......; 2001.01...; 2001.02...; 2001.03 ; 2001.03. ; 2001.04 ;
2001.04...; 2001.04.??; 2001.05...; 2001.07...; 2001.08 ; 2001.08...;
2001.09...; 2001.10...; 2001.11 ; 2001.11...; 2001.12...; 2002......;
2002.01...; 2002.02...; 2002.06. ; 2002.06...; 2002.08 ; 2002.10 ;
2002.11 ; 2002.11...; 2002.12 ; 2002.12...; 2003 ; 2003-03 ;
2003. ; 2003......; 2003.01...; 2003..08.2; 2003.02 ; 2003.04 ;
2003.08 ; 2003.09...; 2004......; 2005 ; 2005.04
Also I have seen many dates in the past (starting from 1795.12.31). I cannot check whether it is incorrect date or the date of some event in the past. I will not list them here.
Next, I have found dates in the future. It may be errors or dates of future events (but I don't know whether you store information about events or not). And here is the list (near future with incompletes):
Code:
2006.05.00; 2006.07.00; 2006.08.00; 2007.00.00; 2007.01.01; 2007.01.02;
2007.04.11; 2007.05.01; 2007.05.28; 2007.06.15; 2007.06.30; 2007.07.05;
2007.07.07; 2007.07.25; 2007.07.28; 2007.08.31; 2007.09.22; 2007.12.30;
2008.06.01; 2008.08.29; 2008.11.00; 2009.00.00; 2009.08.01; 2010.09.01
Later I have found dates in far future. I'm not sure will I be alive at that times. Do you have a time machine? ;-) So, this is the list (far future with incomplete):
Code:
2201.11.30; 2204.01.16; 2991.00.00
And the last, I have found some dates in "YYYYMMDD" format without separators. MySQL can handle them correctly but they are not in common format. So I list them here:
Code:
19990121; 20010919; 20011201; 20030225; 20030227; 20050306; 20050325;
20050408; 20050508; 20050604; 20050912; 20051213
The total number of external pages with dates which MySQL cannot understand is 18070.
Thanks for your attention.
P.S. Maybe you need at least validate new values of the form? ;-)