On Fri, Mar 21, 2008 at 03:10:23AM -0500, James Bennett wrote:
>
> On Fri, Mar 21, 2008 at 3:03 AM, David Cramer wrote:
> > It provides uniqueness for all instances of an object throughout
> > memory. Which, right now, causes major headaches ;)
>
> For you ;)
>
> My blog runs on mod_python, with (at peak) maybe a half-dozen
> processes running. Each of those processes might have its own copy or
> even multiple copies of a given Entry object. And I don't care that
> that happens; whether there are a bunch of copies or only one is
> utterly irrelevant to that use case.

I mean this nicely, but a 6 process blog setup isn't quite
representative of some of the things folks are using django for. Bear
in mind I like django, but I'm strongly of the opinion (after nearly a
year of beating django's runtime and memory usage down) that while the
"it works fine for my blog" notion is good, it's pretty easy to get
kicked in the groin by django when you try to scale it up to some
serious req/s, or stray away from minor/quicky webapps.

Hell, even a quicky/minor webapp (displaying QA reports for gentoo)
resulted in django breaking down, which led to the patch on #17. My
first real usage of django was a bit of a "spanish inquisition"
(malcolm's label), and that patch alone accounted for a 30% drop in
runtime; mem usage for rendering also dropped, roughly from >100mb to
60mb.

For the sake of argument, let's just pretend that I'm insane and that
usage scenario is something django doesn't care to optimize for; let's
examine #17 in the context of running a rather large django site that
doesn't have a massive # of objects in memory, and essentially is
nothing more than a fairly dynamic (meaning you're a bit screwed in
caching if you're not doing post render js replacement) set of a large
# of pages.

Try it with ~40 processes running, each sucking down about ~40 mb
(mod_py), ~30 mb (mod_wsgi).
That's just for having the instances hanging around *initialized*,
meaning they've served one request already, iow the majority of the
code is loaded in memory. Now imagine if at any given time, half of
those processes are active, meaning that their memory usage isn't
sitting nicely at the "waiting for a connection" state, and that they
are doing queries, building results in memory. 1.6gb (~40 processes at
~40 mb each) is already pretty damn high; if you're actually *doing*
something with django beyond leaving a bunch of inited mod_py workers
around, the running mem usage is going to exceed 1.6gb pretty quickly.

One of the things that makes that actually manageable is ticket #17.
Yay, it makes memory usage a fair bit saner; what you're ignoring is
that it also makes django's usage of foreignkeys a fair bit better
behaved db wise, by determining in process whether it can duck reaching
out to the db "for a bunch of copies". Upshot, you get a consistent
view w/in your process too.

Realize the stats being thrown around here might sound insane, but
they're the honest to goodness truth. What gets particularly fun is
that the stats stated above are the average running case; I didn't even
get into the occasional week long increase in traffic where the # of
mod_py workers required running at a given time hits 100 (4gb flat out)
for just init'd, let alone running (~6gb).

To be clear, obviously things could be done to push memory usage down
further and further [1], but for the most part there isn't anything
else available that delivers the same kick in the ass for
performance/mem-usage with so little change required.

> And it's irrelevant to a whole lot of use cases, really, because there
> are only two specific situations where it does cause any problems
> you'd be likely to notice:
>
> 1. You have staggering numbers of objects in memory, in which case
> multiple copies hurt resource usage (and, to be perfectly honest,
> using a lighter-weight data structure than an instance of Model
> becomes a much better solution for this most of the time).

You realize you're advocating ducking parts of core django as a
solution for memory usage issues, right? Why not... just fix it
instead? ;)

> While these are real use cases, they are not the majority of use
> cases, and so identity mapping in the ORM is not necessarily a
> must-have feature for 1.0. They are important for you, yes, and they
> are important for a number of situations, yes, but we have much bigger
> fish to fry over in the big belly ;)

If there were massive work to be done for #17, I'd concur. That said,
what remains? Off the top of my head, it lacks
__getstate__/__setstate__ (think memcache usage) and... that's pretty
much it.

The phrase "bigger fish to fry" is only really applicable to me when it
requires work of you; when the work is done (and has 9 months of
production usage on one of the largest django sites out there), I
question what work remains, and more critically, why ignore that the
work's been done when you can lift it for free.

It's y'all's call of course, and I realize this is a bit of a scary
change, but frankly I'm a bit amazed that folks are even arguing over
whether or not to do it; seems fairly obvious to me that it's a
required step.

~brian

[1] modifying django so that it's able to run multiple sites from w/in
a single process image is another fairly serious kick in the ass
performance wise; that ~40mb figure is with that patchset active,
without it the mem usage grows pretty nastily if you have multiple
vhosts (clearing 200mb was unfortunately common). Obviously you could
limit the # of reqs/child to force a restart (thus clearing the mem
usage), but that's duct taping it to try and address the symptoms
instead of the causative agent.
Plus it's ignoring that if you get *really* unlucky and have >20 vhosts
active at a given time and a separate vhost request hits that process,
you'll jack up the memory well before it hits the req/child limit.

Either way, if some adventurous soul is interested in adapting that
patchset, cleaning it up and pushing it upstream, kindly contact me.
Same if someone is after cleaning up an implementation of master/slave
(changes the orm adding force_(master|slave), and some backend
cleanup). Personally not interested in doing the cleanup work, but if
someone else is....
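
For anyone unfamiliar with the in-process identity mapping argued for
above, here is a rough sketch of the general idea. It is not the patch
on #17 and not django code: IdentityMap, Author, load_author and stats
are made-up names for illustration, and the "db" is just a counter. It
only shows how one instance per (model, pk) lets repeated foreignkey
lookups skip the db and share a single object.

```python
import weakref


class IdentityMap:
    """Hypothetical per-process cache: one live instance per (model, pk)."""

    def __init__(self):
        # Weak values: an entry vanishes once nothing else holds the object,
        # so the map itself does not pin instances in memory.
        self._instances = weakref.WeakValueDictionary()

    def get(self, model, pk, load):
        key = (model, pk)
        obj = self._instances.get(key)
        if obj is None:
            # Only reach out to the "db" on a miss.
            obj = load(pk)
            self._instances[key] = obj
        return obj


class Author:
    """Stand-in for a model class (hypothetical)."""

    def __init__(self, pk, name):
        self.pk = pk
        self.name = name


# Counts how often the pretend db is queried.
stats = {"db_hits": 0}


def load_author(pk):
    """Stand-in for a real query."""
    stats["db_hits"] += 1
    return Author(pk, "author-%d" % pk)


identity_map = IdentityMap()

# Three entries that all point at author 1 through a foreign key.
authors = [identity_map.get(Author, 1, load_author) for _ in range(3)]

print(stats["db_hits"], authors[0] is authors[1] is authors[2])
```

Without the map, each of the three lookups would run its own query and
build its own copy of the same row.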