Attachments in dB or file system, is there really a performance difference?


  • [Forum] Attachments in dB or file system, is there really a performance difference?

    I'm curious whether someone with a large site and lots of attachments has actually tested if there is really a performance difference between storing attachments and images in the file system and storing them in the database.

    I guess it all comes down to which storage, search and caching is most efficient. I saw some MS SQL and IIS performance measurements where the difference was marginal. The SQL solution actually had a small performance benefit with average file sizes under 50K, while the file system solution was slightly faster for larger files.

    A decade or more ago, storing binary data in a database was very inefficient, but this has changed. So I wonder if there still is a performance advantage with file system storage or if this is based on old knowledge. From an organisational and backup point of view I really prefer the database storage model. My own database is so far too small to really see any difference.

    So has anyone really tried comparing?

  • #2
    I just made some measurements
    • Apache JMeter application
    • Over the internet, not local network
    • On a live server with typically 500 visitors and 5,000 page views per day
    • Around 500 files in the file attachment area
    • Five pages with an average size of 50K content
      • The forum home
      • The CMS home
      • The Blog home
      • One page with 10 images
      • One section page with 20 article abstracts, each with an image. The slowest page
    • Two test cases
      • Three users doing a total of 2-3 requests per second, in total 500 page loads
      • Ten users doing a total of 6-7 requests per second, in total 500 page loads
    • Two storage models
      • File system storage
      • mySQL storage
    • The tests were repeated a couple of times, with a few hundred page downloads between each test
    Result
    • Average response time 800 ms with 2-3 requests per second
    • Average response time 1700 ms with 6-7 requests per second
    • Hardly measurable performance difference between storage types (1% better with files on file system) for both loads
    • Much bigger difference between each download and load level than between storage types
    More tests
    • More tests were done on a single page with 10 images only and the conclusion is the same: no measurable difference in performance between storage types.
    • Other tests were done requesting the full download every time, with no caching. Result: much slower downloads but a hardly measurable difference between storage models, with about a 3% download time advantage for file system storage over dB storage.
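For anyone who wants to repeat this kind of comparison without JMeter, the measurement above can be roughly sketched in Python. This is only a minimal sketch, not the JMeter setup used for the numbers in this post: the function names `run_load_test` and `relative_difference` are made up for illustration, and `fetch` is assumed to be any callable that downloads one page (for example a small wrapper around `urllib.request.urlopen`).

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def run_load_test(fetch, urls, users, total_requests):
    """Issue total_requests page loads spread over `users` worker threads.

    `fetch` is any callable that takes a URL and blocks until the page
    has been downloaded (hypothetical; supply your own).
    Returns the average response time in milliseconds.
    """
    def timed_fetch(i):
        url = urls[i % len(urls)]  # cycle through the test pages
        start = time.perf_counter()
        fetch(url)
        return (time.perf_counter() - start) * 1000.0

    with ThreadPoolExecutor(max_workers=users) as pool:
        timings = list(pool.map(timed_fetch, range(total_requests)))
    return mean(timings)


def relative_difference(avg_a_ms, avg_b_ms):
    """Percentage by which avg_a_ms exceeds avg_b_ms."""
    return (avg_a_ms - avg_b_ms) / avg_b_ms * 100.0
```

Run it once per storage model with the same pages, user count and request count, then compare the two averages with `relative_difference`. As in the tests above, repeat each run a few times, since the variation between runs can easily be larger than the difference between storage types.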
    Last edited by janaf; Sat 5 Jun '10, 1:47pm.



    • #3
      There should be a difference. Storing attachments in the database would have an effect on performance, compared to storing them in the file system (folder). Plus it will bloat the size of your database much faster as well.

      You need to think in terms of many people calling attachments from the database, not just yourself. What would be the case if, let's say, you had 100 people all calling attachments from the database at the same time?



      • #4
        Originally posted by MRGTB:
        "There should be a difference."
        That's what everybody keeps saying, but my question is whether it is really so.

        The simulation program I use, JMeter, is designed specifically for load and stress simulations and can simulate loads from a number of different users at the same time. I was using 10 users doing up to an average of 20 page downloads per second, which is not that low; it is equivalent to around 70,000 pages per hour. I could of course set it to 100 users, each doing a call every few seconds, and see if it makes a difference. Will do that tomorrow. I could also set it up on 4-5 machines doing the same thing, but it is still a simulation.
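The per-hour figure is just the per-second rate multiplied by 3600. A trivial sketch of that conversion (the helper name `pages_per_hour` is made up for illustration):

```python
def pages_per_hour(pages_per_second):
    """Convert a sustained page rate per second into pages per hour."""
    return pages_per_second * 3600


print(pages_per_hour(20))  # 72000, i.e. "around 70,000 pages per hour"
```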

        A decade or more ago, image BLOBs were often stored internally as raw bitmaps in databases and converted on import or export. Very inefficient. But now they are usually stored in their native JPEG, TIFF or similar format. Much more efficient. There has been a lot of development in database performance.

        So again, I wonder if others have really tested this, or if this is more of an old truth that keeps being repeated but is no longer valid. Any published results, tests of any kind?



        • #5
          It's kind of difficult for most of us to test. What site owner that has a lot of attachments is going to go through the steps to put the attachments back into the database just to test this out? My site is running fine with them in the filesystem and I really have no reason to go move them back into the database just for a test.



          • #6
            Lynne: this is precisely why I ask. I would not expect owners of large sites in operation to do this. I and probably others would like to know when setting up their sites, before they grow large and busy.

            The work involved in moving files between dB and file system is marginal, but I would not want to do it on a large site that is working. If it ain't broke, don't fix it. What if things do go wrong?

            The default vBulletin setting is storage in the dB. If there is no performance downside, I would keep it that way for backup, maintenance and scripting reasons. Others may of course still prefer file storage.

            If there are no other comparisons done, I will do some more tests with different test tools and get back with whatever results I get.



            • #7
              Lynne,

              It would be helpful if you can help us with the below points:

              1) What are the pros and cons of the two storage options, database and file system?
              2) Going through a couple of threads, I see that the database size increases significantly (or gets bloated) if attachments are stored in the database. Is this true?
              3) When I use "Manage attachments" to upload a new attachment, I see question marks instead of attachments. I rebuilt the cache and it worked, but after a couple of days I'm seeing question marks again. I'm storing the images in the database.

              Answers to these questions can help a lot of people who have just set up a VB forum to decide on the option that is better in the long run.

              - Thanks
              Leo



              • #8
                Slow drives, overburdened shared hosting and small attachments will probably have better results in the database.

                Fast drives, VPS/Dedicated, files over 1 MB will probably be better in the file system.

                I say probably because you would need to run thousands of page loads to notice significant differences in most cases.

                Really the options are there to allow users to use the best environment for their situation. If you're on VPS or Dedicated, then in the long run you will be better off storing all uploads in the file system. If you're on a shared host with any company besides 1 & 1, then you're better off in the database for the time being. 1 & 1 users should always use the file system, even if performance is worse, because of their abnormal 100 megabyte database size restriction.
                Wayne Luke


                • #9
                  Originally posted by leo.prasanth:
                  "1) What are the Pros and Cons of using the two storage options - Database and Filesystem
                  2) Going through couple of threads, I see that the database size increases significantly (or gets bloated) if attachments are stored in the database. Is this true?"
                  1) I can think of a few, not related to vB
                  • pro dB:
                    • easy to manage by SQL scripting
                    • easy to back up via database manager or scripting
                    • less stress on OS
                    • easier to check versions, duplicates, unused / unlinked attachments
                    • easier to fix if things go wrong?
                    • simpler rights management for web application
                  • pro FS:
                    • easy to manage manually, by OS scripting, by FTP, file manager or similar
                    • easy to back up via file system
                    • less stress on app (mySQL)
                    • file system management is more intuitive
                    • easier to fix if things go wrong?
                  2) If you add 1GB of attachments, the dB will grow by 1GB and you will recover about 1GB from the file system, so the net space needed is the same, not taking minimum block sizes into consideration. If you are in a hosted environment you can run dB maintenance via the ACP, but not for the file system.

                  3) Please post in another thread
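On the minimum block size remark in point 2: a file on disk occupies whole blocks, so each file wastes a little space at the end of its last block. A rough sketch of that arithmetic follows. The 4096-byte block size is only an assumed default (check your own file system), the function names are made up for illustration, and the database's own per-row and per-page overhead is not modelled here.

```python
def allocated_bytes(file_size, block_size=4096):
    """Space a file occupies on disk, rounded up to whole blocks.

    block_size=4096 is an assumed value, not a measured one.
    """
    blocks = -(-file_size // block_size)  # integer ceiling division
    return blocks * block_size


def slack_bytes(file_sizes, block_size=4096):
    """Total space lost to partially filled last blocks."""
    return sum(allocated_bytes(size, block_size) - size for size in file_sizes)


# 500 attachments of 50,000 bytes each, with the assumed 4096-byte blocks
print(slack_bytes([50_000] * 500))  # 1624000 bytes of slack
```

With a few hundred attachments of this size the slack is in the low megabytes, which supports treating the space difference between the two storage models as negligible for small sites.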

                  I think the performance depends on what type of site you have: whether you have a large number of files that are accessed more or less randomly (bad for caching) or a small number of files that are accessed all the time (good for caching). But from what I can see in my limited tests, the performance difference is very small or none. Definitely smaller than I had expected from all that has been said about the performance advantages of disk storage.
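The caching point can be illustrated with a small simulation. This is only a sketch under stated assumptions: it models a plain LRU cache, which is not necessarily how MySQL, the OS or the browser actually cache attachments, and the file counts, request counts and cache size are arbitrary illustrative numbers.

```python
import random
from collections import OrderedDict


def hit_rate(requests, cache_size):
    """Fraction of requests served from an LRU cache holding cache_size items."""
    cache = OrderedDict()
    hits = 0
    for key in requests:
        if key in cache:
            hits += 1
            cache.move_to_end(key)  # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used item
    return hits / len(requests)


rng = random.Random(0)
# Many files accessed more or less randomly: bad for caching.
scattered = [rng.randrange(10_000) for _ in range(20_000)]
# A small number of files accessed all the time: good for caching.
hot = [rng.randrange(50) for _ in range(20_000)]

print(hit_rate(scattered, cache_size=100))
print(hit_rate(hot, cache_size=100))
```

In the second case nearly every request is a cache hit, so the storage model behind the cache hardly matters. In the first case most requests go through to the underlying storage, which is where a difference between dB and file system would show up, if there is one.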



                  • #10
                    I can tell you from a practical standpoint. I used to store files in the DB, and it was painfully slow. My site has about 12-15K unique users with about 120-150K page views and lots of downloads. It's basically impossible to have the attachments stored in the DB with this kind of load. No need for benchmarks etc. As soon as I moved my attachments to the file system, the site started working normally again.


                    • #11
                      Many hosts now limit db users. And site backups are hard enough to do monthly, so why would you want pics in the db and have to download them all daily? Unless you run a tiny forum, put them in the filesystem.



                      • #12
                        I think Wayne really hit it: the best option depends on the situation. I know of some forums that are extremely image heavy, i.e. cartoon, sketching, drawing and photography forums. These types of forums do not run well when storing images in the database rather than the file system, because you are talking about images that are 5-10 MB in size, and extracting them from the database under load blocks access for others. Just trying to get the pages loaded can be a headache when you have 500 users online all pulling massive images from the database. This is where the file system comes into its own.

                        Quite honestly, the tests I have done have shown little to marginal change if albums / images are rarely used, i.e. on very text-heavy, chatty forums. Honestly, you're better off just leaving the small number of images in the DB in such cases. For image-intensive sites and download-type forums, I absolutely disagree with anyone who says leave them in the database. You're going to kill your site if it's busy by doing so.



                        • #13
                          Ok, thanks for your input guys.

                          It seems there is no straight and simple answer. I have seen a couple of sites with millions of images that report better performance with dB storage. One was a bank that had several million scanned signatures from customers. Another was a web site that reported they ran into problems with file system storage when the number of files exceeded a million. They got better performance by moving them back to the dB. The explanation was that the OS file system was running low on file handles / pointers / resources. I guess it all comes down to which resource is the bottleneck. A dB usually wants to keep as much as possible in RAM, so the advice not to store large BLOBs in the dB makes good sense.

                          The sites I have been working with have had only a few thousand images and attachments, and I have never seen any performance downside with dB storage. With larger sites I guess it would have to be evaluated on a case-by-case basis. Possibly a system with file system storage is less sensitive to resource bottlenecks than a system with dB storage. But until I see problems I will stick to database storage.



                          • #14
                            There is a pretty simple answer, though you have moved away from VB software when referring to banks with signature images. Nobody is saying pulling images from a DB is bad in general, just bad with VB on an image-heavy site. How a bank sets up its DB to serve images is not comparable with VB software. Databases can be very effective tools, and if all you are doing with the one DB is pulling signature images, then that would seem to be a pretty effective means IMHO. But forum software is pulling postbit data and has multiple users at once all doing different things, i.e. pushing and pulling all different types of data from the one DB. That is a little different from a bank DB that stores signature images and nothing else.

                            If you ran a large image site using VB, then you could move the image tables out of the main DB onto a slave and query them there separately, which would restore the performance, as you are no longer accessing one DB but two!



                            • #15
                              1. Keeping the images separate from the database means that in the event of a DB crash you still have your images. If they're in the DB, they're gone as well.
                              2. It reduces DB bloat. Your DB can be backed up / restored a lot quicker.
                              3. Images in the file system can be backed up separately. Although you can't just put them back in the folder they came out of, at least you have them.
                              Washing them through IrfanView in a bulk conversion gives you the images back in a format that is usable outside VB's in-house storage.

                              I have over 4000 LARGE images in my anime forum. The performance difference is minimal. But having suffered through 2 fatal crashes with the images in the DB, I find that having the images outside the DB is a lot safer.
                              I can FTP in and grab the image folder, then cPanel in and grab the database. I'm a lot more comfortable with my site set up like this.
                              ...
