Many cfqueries vs 1 Uber-query...which is faster?
I am working on a project where we need to import data from a file into the database. This data will either be inserted as a new row, or used to update a row if if a row contains a matching column. Sounds easy, right? Well, one problem is that each file imported can have 10's of thousands of rows.
The original code actually called several stored procedures for each row. As you might expect, this caused both CF and the SQK server to grind to a halt. Looking at the stored procedures, I knew we could get more performance if I could take the logic of all the required SPs and combine them. I was able to rip out the logic and put it into 1 query. Now, my next issue was, would it be faster to loop over each row in the file and do a <cfquery> or, create the query inside of a <cfsavecontent> and then use that inside of one <cfquery> tag.
Each presented potential performance boosts and drawbacks. Using multiple queries, I could use <cfqueryparam> which could speed up the queries, but there would be thousands of calls to the DB. Using 1 Uber-query, I would only need to make 1 call to the database, but I would lose some performance due to the lack of <cfqueryparam>.
So, I did some tests. I imported 1000 rows of data at a time using each of the above mentioned methods. I made sure to test each against an identical DB, initially empty (to test inserts) and then with some data already in the DB (to test inserts and updates).
The winner? Looping over each row, and running a <cfquery> for each row. To be honest, I really wasn't sure which one would be faster, but what surprised me was how much faster multiple <cfquery> calls was. Running multiple <cfquery> calls was 2-3 times faster than running one big Uber-query, depending on the dataset. If it wasn't for time constraints in getting this project done, I would run another test, ripping out the <cfueryparam> tags from the multiple-query method and see if that is the major difference between the two. I might revist that if i get a chance.
The original code actually called several stored procedures for each row. As you might expect, this caused both CF and the SQK server to grind to a halt. Looking at the stored procedures, I knew we could get more performance if I could take the logic of all the required SPs and combine them. I was able to rip out the logic and put it into 1 query. Now, my next issue was, would it be faster to loop over each row in the file and do a <cfquery> or, create the query inside of a <cfsavecontent> and then use that inside of one <cfquery> tag.
Each presented potential performance boosts and drawbacks. Using multiple queries, I could use <cfqueryparam> which could speed up the queries, but there would be thousands of calls to the DB. Using 1 Uber-query, I would only need to make 1 call to the database, but I would lose some performance due to the lack of <cfqueryparam>.
So, I did some tests. I imported 1000 rows of data at a time using each of the above mentioned methods. I made sure to test each against an identical DB, initially empty (to test inserts) and then with some data already in the DB (to test inserts and updates).
The winner? Looping over each row, and running a <cfquery> for each row. To be honest, I really wasn't sure which one would be faster, but what surprised me was how much faster multiple <cfquery> calls was. Running multiple <cfquery> calls was 2-3 times faster than running one big Uber-query, depending on the dataset. If it wasn't for time constraints in getting this project done, I would run another test, ripping out the <cfueryparam> tags from the multiple-query method and see if that is the major difference between the two. I might revist that if i get a chance.




Thanks,
Also, here's something you could consider trying out that I found sped things up ENORMOUSLY, even over cfquery: asynchronous gateways. There's a decent article over on HouseofFusion's website about it (http://www.fusionauthority.com/Techniques/4609-The...). Oh, I just also happened to write it. :P But it is a good idea to explore, ESPECIALLY for situations like the one you described.
Doug out.
I can bet that you have quite a few indexes on the table that you are updating / inserting the data into. Remember that the more indexes that you have on a table, the slower your actions will be because the database must update the statistics. This can be increase by using CHECK CONSTRAINTS and creating VIEW for inserts and updates.
The bottom line is that when you make statements like this and there are people out there that will take this as the word of GOD, you need to provide a Database Schema and the CF code used so that someone else can test it properly. Too many times (especially with database stuff) I have seen people claim that method X is faster than method Y, only to see that there is an underline factor in play.
If you use the BULK INSERT command, by default it will not fire insert trigger unless you specify the FIRE_TRIGGERS parameters.
@Andy
Assuming you are using SQL 2000. An easier way would be to use either DTS or a stored procedure to BULK INSERT the data into a temp table. Then you can use a joined update or select insert statement to load the data:
[Assuming you loaded your data into a temp table called #temp (by the way table variables could work)]
-- updating existing records
UPDATE mytable
set col1 = t.col1, col2 = t.col2
FROM #temp t INNER JOIN mytable m
ON t.pk = m.pk
-- insert new records
INSERT INTO mytable
(col1, col2)
SELECT col1, col2
FROM #temp
WHERE pk NOT IN(SELECT pk FROM mytable)
By the way Andy, you might want to look at using SQL 2000 built in XML support.
Once the hurricane is gone, I need to do a blog post about this issue since I see it coming up on some blogs latly
Send an email to my email address and I will send you example code on how to pull a file and bulk insert from a share on a remote server. You can also do it with DTS, but I think a stored procedure is easier.
Plus if you want I will respond with my phone number so you can call me and I can walk you through some stuff if needed.
We have a hurricane coming down here, so I need something to do. :P
your blog comment emails are bouncing:
Reason: Remote host said: 550
No such user here