
Many cfqueries vs 1 Uber-query...which is faster?

I am working on a project where we need to import data from a file into the database.  Each row in the file is either inserted as a new row, or used to update an existing row if a matching column is found.  Sounds easy, right? Well, one problem is that each imported file can have tens of thousands of rows.

The original code actually called several stored procedures for each row.  As you might expect, this caused both CF and the SQL server to grind to a halt.  Looking at the stored procedures, I knew we could get more performance if I took the logic of all the required SPs and combined it.  I was able to rip out that logic and put it into 1 query.  My next question was: would it be faster to loop over each row in the file and run a <cfquery>, or to build the query inside of a <cfsavecontent> and then execute it inside of one <cfquery> tag?

Each presented potential performance boosts and drawbacks.  Using multiple queries, I could use <cfqueryparam>, which could speed up the individual queries, but there would be thousands of calls to the DB.  Using 1 Uber-query, I would only need to make 1 call to the database, but I would lose some performance due to the lack of <cfqueryparam>.
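For reference, here is a rough sketch of the two approaches, assuming a query object named fileData that holds the parsed file, a target table named importTarget, and a datasource named myDSN (those names are made up; the real code has more columns and more logic):

<!--- Approach 1: one <cfquery> per row, using <cfqueryparam> --->
<cfloop query="fileData">
  <cfquery datasource="myDSN">
    UPDATE importTarget
    SET colA = <cfqueryparam cfsqltype="cf_sql_varchar" value="#fileData.colA#">
    WHERE keyCol = <cfqueryparam cfsqltype="cf_sql_varchar" value="#fileData.keyCol#">

    IF @@ROWCOUNT = 0
      INSERT INTO importTarget (keyCol, colA)
      VALUES (
        <cfqueryparam cfsqltype="cf_sql_varchar" value="#fileData.keyCol#">,
        <cfqueryparam cfsqltype="cf_sql_varchar" value="#fileData.colA#">
      )
  </cfquery>
</cfloop>

<!--- Approach 2: build one Uber-query in <cfsavecontent>, run it in a single <cfquery>.
      Values are inlined here, so they need to be sanitized/escaped since <cfqueryparam>
      is not available. --->
<cfsavecontent variable="uberSQL">
  <cfloop query="fileData">
    UPDATE importTarget
    SET colA = '#fileData.colA#'
    WHERE keyCol = '#fileData.keyCol#';

    IF @@ROWCOUNT = 0
      INSERT INTO importTarget (keyCol, colA)
      VALUES ('#fileData.keyCol#', '#fileData.colA#');
  </cfloop>
</cfsavecontent>
<cfquery datasource="myDSN">
  #PreserveSingleQuotes(uberSQL)#
</cfquery>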

So, I did some tests.  I imported 1000 rows of data at a time using each of the above-mentioned methods.  I made sure to test each against an identical DB, initially empty (to test inserts) and then with some data already in the DB (to test inserts and updates).

The winner?  Looping over each row and running a <cfquery> for each one.  To be honest, I really wasn't sure which one would be faster, but what surprised me was how much faster multiple <cfquery> calls were.  Running multiple <cfquery> calls was 2-3 times faster than running one big Uber-query, depending on the dataset.  If it weren't for time constraints in getting this project done, I would run another test, ripping out the <cfqueryparam> tags from the multiple-query method to see if that is the major difference between the two.  I might revisit that if I get a chance.


14 comments

Qasim Rasheed said...
I have been using JDBC bulk insert/update functionality to load data into an Oracle database and it does make a huge difference.

Thanks,
Ben Nadel said...
That is good to know (about the multiple vs. single query). Maybe it has something to do with execution plans and optimization. With a single query per row, it's probably much faster to parse. Who knows... anyway, very cool test.
Doug Boude said...
Hi there. I've had to deal with similar situations in the past, and although sometimes the results were better going with cfquery, it wasn't always the case. I think that every situation will have to be evaluated the same way you did it to really know for sure.
Also, here's something you could consider trying out that I found sped things up ENORMOUSLY, even over cfquery: asynchronous gateways. There's a decent article over on HouseofFusion's website about it (http://www.fusionauthority.com/Techniques/4609-The...). Oh, I just also happened to write it. :P But it is a good idea to explore, ESPECIALLY for situations like the one you described.
Doug out.
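For anyone who has not tried the asynchronous CFML gateway Doug mentions, the basic shape is below; it assumes a gateway instance named importGateway has been set up in the CF Administrator and pointed at a worker CFC (every name here is made up):

<!--- ImportWorker.cfc - the CFC the gateway instance points at --->
<cfcomponent>
  <cffunction name="onIncomingMessage" returntype="void">
    <cfargument name="event" type="struct" required="true">
    <!--- the struct passed to SendGatewayMessage() arrives as event.data --->
    <cfquery datasource="myDSN">
      UPDATE importTarget
      SET colA = <cfqueryparam cfsqltype="cf_sql_varchar" value="#arguments.event.data.colA#">
      WHERE keyCol = <cfqueryparam cfsqltype="cf_sql_varchar" value="#arguments.event.data.keyCol#">
    </cfquery>
  </cffunction>
</cfcomponent>

<!--- In the importing page: hand each row off and return to the user immediately --->
<cfloop query="fileData">
  <cfset msg = StructNew()>
  <cfset msg.colA = fileData.colA>
  <cfset msg.keyCol = fileData.keyCol>
  <cfset SendGatewayMessage("importGateway", msg)>
</cfloop>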
Scott Stroz said...
Doug - I have mentioned the asynchronous gateway, but the client is refusing to listen to my suggestion.
Doug Boude said...
Wow, kinda sad seeing how much of a performance increase could be had. Oh well then. Even so, if you have Enterprise MX around, I highly suggest experimenting with it yourself, if for no other reason than to have that experience under your belt and have it ready for the next similar situation.
Scott Stroz said...
I use the asynchronous gateway all the time. Matter of fact, the process that indexes all my entries runs on an asynchronous gateway.
Tony Petruzzi said...
I would love to see the CF code and the stored procedure. I have done this before and using bulk insert from a stored procedure, temp tables and some optimized update and insert statements, I can insert or update 200,000 rows in 18 seconds.

I can bet that you have quite a few indexes on the table that you are updating/inserting the data into. Remember that the more indexes you have on a table, the slower your actions will be because the database must update the statistics. This can be improved by using CHECK CONSTRAINTS and creating a VIEW for inserts and updates.

The bottom line is that when you make statements like this, and there are people out there who will take them as the word of GOD, you need to provide a database schema and the CF code used so that someone else can test it properly. Too many times (especially with database stuff) I have seen people claim that method X is faster than method Y, only to find that there is an underlying factor in play.
Andy J said...
I had to run a SQL replace/update on a table using 200,000 records from a series of large XML files recently. I was finding this way too process-heavy, especially as CF doesn't like processing large XML files at all. In the end I just used CF to write me a big .sql file with all the replace/update statements, and then ran the script for MySQL from the command line.
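A rough sketch of that approach, assuming MySQL on Windows (the paths, credentials, and table/column names are all placeholders):

<!--- Build one big .sql script from the parsed XML data --->
<cfsavecontent variable="sqlScript">
  <cfloop query="xmlData">
REPLACE INTO importTarget (keyCol, colA)
VALUES ('#xmlData.keyCol#', '#xmlData.colA#');
  </cfloop>
</cfsavecontent>
<cffile action="write" file="C:\imports\load.sql" output="#sqlScript#">

<!--- Run the script through the mysql command-line client --->
<cfexecute name="C:\Windows\System32\cmd.exe"
           arguments="/c C:\mysql\bin\mysql.exe -u importUser -pimportPass importDB < C:\imports\load.sql"
           timeout="600">
</cfexecute>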
Scott Stroz said...
Tony - Another issue I am running into is that there are several insert/update triggers on the table being written to. Reworking the less-than-optimal stored procedures increased performance, but the queries are still dog slow. Today we are running tests to see how much performance we can gain by turning off the triggers during a bulk insert.
Tony Petruzzi said...
@Scott

If you use the BULK INSERT command, by default it will not fire insert triggers unless you specify the FIRE_TRIGGERS parameter.
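For example, a bulk load into a staging table might look something like this (the staging table, file path, and datasource names are made up; the file path must be one the SQL Server itself can reach):

<cfquery datasource="myDSN">
  BULK INSERT import_staging
  FROM 'C:\imports\datafile.csv'
  WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n',
    FIRE_TRIGGERS  -- leave this option out and the insert triggers will not fire
  )
</cfquery>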

@Andy

Assuming you are using SQL 2000, an easier way would be to use either DTS or a stored procedure to BULK INSERT the data into a temp table. Then you can use a joined update or insert/select statement to load the data:

[Assuming you loaded your data into a temp table called #temp (by the way table variables could work)]

-- updating existing records
UPDATE m
SET m.col1 = t.col1, m.col2 = t.col2
FROM mytable m INNER JOIN #temp t
ON t.pk = m.pk

-- insert new records
INSERT INTO mytable
(col1, col2)
SELECT col1, col2
FROM #temp
WHERE pk NOT IN (SELECT pk FROM mytable)

By the way Andy, you might want to look at using SQL 2000's built-in XML support.

Once the hurricane is gone, I need to do a blog post about this issue since I see it coming up on some blogs lately.
Scott Stroz said...
Tony - Thanx for the great info. Unfortunately, we cannot just INSERT every row. Some rows will be inserted, and some will be used to UPDATE. And one requirement is that, when the data is imported, the user can specify which columns will be used when updating rows that already exist.
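That requirement can still ride on top of the per-row query; a minimal sketch, assuming the user's chosen columns arrive as a comma-delimited list in form.updateColumns and have already been validated against a whitelist of real column names (everything here is made up):

<cfloop query="fileData">
  <cfquery datasource="myDSN">
    UPDATE importTarget
    SET
    <!--- build the SET clause only from the columns the user picked;
          column names cannot go through <cfqueryparam>, hence the whitelist --->
    <cfloop list="#form.updateColumns#" index="col">
      #col# = <cfqueryparam cfsqltype="cf_sql_varchar" value="#fileData[col][fileData.currentRow]#">
      <cfif col NEQ ListLast(form.updateColumns)>,</cfif>
    </cfloop>
    WHERE keyCol = <cfqueryparam cfsqltype="cf_sql_varchar" value="#fileData.keyCol#">
  </cfquery>
</cfloop>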
Scott Stroz said...
Tony - Is there a way to do a BULK INSERT with a remote file?
Tony Petruzzi said...
@scott

Send an email to my email address and I will send you example code on how to pull a file and bulk insert from a share on a remote server. You can also do it with DTS, but I think a stored procedure is easier.

Plus if you want I will respond with my phone number so you can call me and I can walk you through some stuff if needed.

We have a hurricane coming down here, so I need something to do. :P
Tony Petruzzi said...
@scott

your blog comment emails are bouncing:

Reason: Remote host said: 550
No such user here