Hi People!
I just noticed something unholy with the import system
It's noticable only when importing very large tables. In my case I am importing a vb2 forum and the post table has 5.5 million posts.
anyway, the mysql query being used uses a "LIMIT $start, $perpage"
The reality is that when using just a LIMIT $start,offset, mysql actually selects ALL rows, then displays just the ones described in the LIMIT condition.
IN my case it was selecting everyithing from the post table - all 5.5 million records and then just returning the 3000 that I set 'per page'
If you don't beleive me, just try the explain select * from post LIMIT $start,$offset and see for yourself
Actually the same problem is mentioned on mysql site in the comments section here: http://dev.mysql.com/doc/refman/5.0/...imization.html
THe much better way (I actually changed it in 001.php file in systems/vb2/
is to use the "where" clause like where postid>$start limit $perpage
After I made that change the importer is about 50 faster for me.
This is a very important issue because in most cases will go unnoticed especially if table has fewer than a million records.
Also, if gets even more interesting: when you first start the import process, it will go pretty fast, then you may noticed that it is slowing down as it imports more and more posts. This is because of the way mysql does the SELECT with LIMIT clause. For example if you have 5000000 rows in post table, at first your query will look like
SELECT * from post LIMIT 10000,3000
In this case mysql will start selecting ALL records but will stop as soon as it gets the required rows (required by the LIMIT clause), so in this case it will stop the full row scan as soon as it gets the posts from 10000 to 13000 (after only 13000 rows)
but as your importer progresses, and the number or rows Mysql has to scan to satisfy the select will grow. So a statement like SELECT * from post LIMIT 1000000,3000 will result in a scan of one million 3 thousand rows before it gets all the required rows.
You can read carefully that info on mysql website for the description of this behaviour.
This is a MySQL way of doing things. Pretty dumb, but that's the way it is - it does not use index when using only LIMIT clause. That's why you need the add the "where" clause.
and you can't even fool the mysql by adding the 'use index (PRIMARY)' - it will still not use index!
I just noticed something unholy with the import system
It's noticable only when importing very large tables. In my case I am importing a vb2 forum and the post table has 5.5 million posts.
anyway, the mysql query being used uses a "LIMIT $start, $perpage"
The reality is that when using just a LIMIT $start,offset, mysql actually selects ALL rows, then displays just the ones described in the LIMIT condition.
IN my case it was selecting everyithing from the post table - all 5.5 million records and then just returning the 3000 that I set 'per page'
If you don't beleive me, just try the explain select * from post LIMIT $start,$offset and see for yourself
Actually the same problem is mentioned on mysql site in the comments section here: http://dev.mysql.com/doc/refman/5.0/...imization.html
THe much better way (I actually changed it in 001.php file in systems/vb2/
is to use the "where" clause like where postid>$start limit $perpage
After I made that change the importer is about 50 faster for me.
This is a very important issue because in most cases will go unnoticed especially if table has fewer than a million records.
Also, if gets even more interesting: when you first start the import process, it will go pretty fast, then you may noticed that it is slowing down as it imports more and more posts. This is because of the way mysql does the SELECT with LIMIT clause. For example if you have 5000000 rows in post table, at first your query will look like
SELECT * from post LIMIT 10000,3000
In this case mysql will start selecting ALL records but will stop as soon as it gets the required rows (required by the LIMIT clause), so in this case it will stop the full row scan as soon as it gets the posts from 10000 to 13000 (after only 13000 rows)
but as your importer progresses, and the number or rows Mysql has to scan to satisfy the select will grow. So a statement like SELECT * from post LIMIT 1000000,3000 will result in a scan of one million 3 thousand rows before it gets all the required rows.
You can read carefully that info on mysql website for the description of this behaviour.
This is a MySQL way of doing things. Pretty dumb, but that's the way it is - it does not use index when using only LIMIT clause. That's why you need the add the "where" clause.
and you can't even fool the mysql by adding the 'use index (PRIMARY)' - it will still not use index!
Comment