Wednesday, 2009-05-13

mwilkes|phoneSRabbelier: sorry about that mail, fat fingers + andy = fuckups.12:57
SRabbeliermwilkes|phone: aah, you were sending it from your andy? :)13:10
SRabbeliermwilkes|phone: don't worry about it, I tried to send a proper email from my andy once, but it's nigh impossible :)13:10
mwilkes|phoneSRabbelier: even replied to the wrong thing! shall have to write a new client…13:17
SRabbeliermwilkes|phone: yes, I was guessing that it was in reply to the other thread, was really confused for a moment though :P13:18
mwilkes|phoneSRabbelier: useful sometimes though. scheduled an exam while waiting for another to start. yay.13:26
SRabbeliermwilkes|phone: hehe, pretty sweet, how did your exam go?13:28
mwilkes|phoneSRabbelier: apparently it was "really difficult", but I can't have got much below 80%, so either I misunderstood the point of it, or it went really well13:33
mwilkes|phonenow in the library "studying"13:33
SRabbeliermwilkes|phone: I totally know what you mean, had that with my last Ethics test :P13:34
mwilkes|phoneI've just noticed that this library looks a lot like the memorial to the jewish victims of the holocaust in berlin…13:35
* James--Crook waves at Erant16:50
* SRabbelier waves at James--Crook16:51
* James--Crook waves to Sverre16:51
James--CrookErant: take a look at - very well written...16:51
tpbTitle: Articles - doctype - Google Code (at
SRabbelierJames--Crook: fancy!16:52
James--CrookHow is Sverre?16:52
SRabbelierJames--Crook: watching google's Shareholder's Day 2009, at 11:00 there's some guy causing ruckus :P16:54
solydzajsSRabbelier: I stopped watching after 5 minutes :-)16:55
SRabbeliersolydzajs: I'm waiting for this meeting to start :P16:55
dandersonSRabbelier: what about?16:55
solydzajsSRabbelier: ah ok ;-)16:55
SRabbelierdanderson: he wanted to ask a question or something16:55
SRabbelierdanderson: but the Q&A was "later on" :P16:56
SRabbelierdanderson: and he wouldn't STFU :P16:56
dandersonand escort him out16:57
solydzajsJames--Crook: 3 more minutes and we can start16:57
SRabbelierdanderson: they did :D16:57
James--Crooksolydzajs: fine with that.  Anything I should be reading first (I think not, but just checking)16:58
dandersonI see we share the same "no bullshit kthx" attitude :P16:58
solydzajsJames--Crook: nope this is quick meeting ;-) no agenda, you wanted to talk :-)16:58
James--Crooksolydzajs: cool.16:58
SRabbelierdanderson: heheh, lol :D16:59
MerioI'm here too, cheers all ^__^16:59
SRabbelierMerio: O HI :D16:59
solydzajsHey Merio :-)16:59
solydzajsMerio: how is it going ?;-) how is Dublin ?16:59
James--Crookhi Mario!16:59
James--CrookSo, I think we can start...17:00
solydzajsyep :-)17:00
solydzajsJames--Crook: the meeting is all yours :-)17:00
James--Crooksolydzajs: I'm looking at the fact that we have two strong GSoC students both doing statistics.17:00
MerioHi all ^__^ solydzajs: :P after meeting :P17:00
solydzajsJames--Crook: yes17:00
James--CrookSo there are two main things to consider....17:01
solydzajsI'm listening :-)17:01
James--CrookOne is that they don't tread on each other's toes.17:01
solydzajsThat for sure17:01
James--CrookThe other is that getting essential stats done could probably be done by one student by the mid term...17:01
James--CrookBy essential stats I mean basic pie and bar charts.17:02
James--CrookIt's not as if google charting is difficult to use.17:02
solydzajsyes, but it is not just pie and bar charts17:02
James--CrookSo, we need to start thinking about this right now.17:02
solydzajsit's also result tables17:02
James--Crooksolydzajs: please go on...17:02
James--CrookThe hard part of results tables is the 1000 problem and what Dan Bentley is doing...17:03
solydzajsusage of both Visualization API and Chart API, and writing in a way that can be easily extended with new charts17:03
solydzajsnew stats17:03
solydzajsJames--Crook: no it's not17:03
James--CrookFrom Mario's point of view a table is just another view of the data...17:03
Meriosolydzajs: IIRC you can serve the same JSON to a table visualization, to a pie chart and to a bar chart as well17:03
Meriosolydzajs: Never really played with that, but it seems so from the description of the API I've read before applying17:04
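A minimal sketch of the "same JSON for every view" idea, assuming the data is shaped in the cols/rows layout that Visualization API DataTables use. The helper name `to_datatable` and the sample counts are made up for illustration; nothing here is taken from the Melange code.

```python
import json


def to_datatable(label_header, value_header, counts):
    """Build a dict in the cols/rows layout used by Visualization API DataTables."""
    return {
        "cols": [
            {"id": "label", "label": label_header, "type": "string"},
            {"id": "value", "label": value_header, "type": "number"},
        ],
        "rows": [
            {"c": [{"v": name}, {"v": count}]}
            for name, count in sorted(counts.items())
        ],
    }


# Made-up numbers, for illustration only.
proposals_per_org = {"org_b": 3, "org_a": 2, "org_c": 1}
table = to_datatable("Organization", "Proposals", proposals_per_org)

# One serialized payload; a table, a pie chart and a bar chart on the
# client could all be drawn from it.
payload = json.dumps(table)
```

The point of the design is that the server only has to produce one payload per statistic, and the choice of view stays on the client.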
James--CrookWell...  so we're seeing gathering stats and we're seeing presenting stats....17:04
solydzajsMerio: yes we can but like you just pointed out you never played with that and I'm sure it will take some time to make it work nicely17:04
Meriosolydzajs: if you want I can try something before 23 to have a better idea17:05
solydzajsJames--Crook: 1000 limitation is not a problem here because we will gather the data using jobs for example 300 entities per job and just start from last key in next job17:05
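The batching scheme solydzajs describes can be sketched roughly as follows. This is only an illustration: a plain dict stands in for the datastore, and `fetch_batch`, `run_job` and the `state` dict are hypothetical names, not Melange or App Engine APIs.

```python
BATCH_SIZE = 300  # per-job batch size mentioned in the discussion


def fetch_batch(entities, last_key, limit=BATCH_SIZE):
    """Stand-in for a datastore query: entities with key > last_key, in key order."""
    keys = sorted(k for k in entities if last_key is None or k > last_key)
    return [(k, entities[k]) for k in keys[:limit]]


def run_job(entities, state):
    """Process one batch and record where the next job should resume."""
    batch = fetch_batch(entities, state["last_key"])
    for key, _entity in batch:
        state["count"] += 1       # the "stat" here is just a running count
        state["last_key"] = key   # the next job starts after this key
    # A short batch means there is nothing left to fetch.
    state["done"] = len(batch) < BATCH_SIZE
    return state


# 1000 fake entities, enough to need several jobs.
entities = {"student_%04d" % i: {} for i in range(1000)}
state = {"last_key": None, "count": 0, "done": False}
jobs = 0
while not state["done"]:
    run_job(entities, state)
    jobs += 1
```

Because each job only remembers the last key it saw, no single query ever has to return more than one batch, which is why the 1000-result limit does not get in the way.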
James--CrookNow in presenting stats we can (and probably will) go off into the wilder reaches with charts with a slider so that you can see changes over time....17:06
solydzajsJames--Crook: we don't need any specific order of entities for stats17:06
James--CrookAh, so you're reckoning the sorting might be tricky?17:06
solydzajsMerio: well we should focus first on getting the data out from datastore using Jobs17:06
James--CrookOr at least the flexible specification of it?17:06
James--CrookAbsolutely, getting the data is a vital first step.17:07
James--CrookSorting is needed for some kinds of chart too.  Not just tables.17:07
solydzajsJames--Crook: gathering data in sorted format yes it will be a problem, but we don't need that data to be sorted17:07
James--CrookNo, that is not a problem.17:07
solydzajsJames--Crook: we need sorted results and that we can do17:07
James--CrookGathering it unsorted is fine.17:07
solydzajsyep exactly17:08
James--CrookPresenting it sorted is fine too.17:08
James--Crookin javascript.17:08
James--CrookIt works quite well :-)17:08
solydzajsyep depending where we are going to sort :-)17:08
solydzajsthis is doable17:08
solydzajswithout any problems17:08
MerioJames--Crook: actually not always ^__^17:08
James--CrookMerio: go on...17:08
James--CrookMerio: what is the problem with sort?17:09
James--CrookI think our largest 'dataset' is for all applications (not just accepted ones)17:10
MerioJames--Crook: well you would end up doing custom sorting functions17:10
solydzajsJames--Crook: what other questions do you have regarding Stats projects ? We need to start with gathering data using Jobs, I will review all the recent changes to wiki pages tomorrow and also add there my description of how it should work and where to start17:10
solydzajsJames--Crook: our largest dataset is all Students17:10
solydzajsJames--Crook: actually it's all Users but that we don't need17:11
solydzajsJames--Crook: well you are right all applications is the largest one17:11
James--CrookOh, yes, because many students don't submit an application.  Numbers (roughly?)17:11
MerioJames--Crook: IIRC the sort array function in JavaScript sorts alphabetically by default. Furthermore sometimes sorting can be more difficult than just an alphabetical order, if we want multiple sorting17:11
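The pitfall Merio mentions is that JavaScript's array sort, when called without a comparator, compares elements as strings. The same effect, and a multi-key sort, can be sketched in Python (the language James--Crook mentions for the backend). The sample rows are made up for illustration.

```python
# Numbers stored as text sort "alphabetically" unless a key is given.
values = ["10", "9", "100"]
as_text = sorted(values)               # compares character by character
as_numbers = sorted(values, key=int)   # compares numeric value

# Multi-key sort: most accepted proposals first, ties broken by org name.
rows = [
    {"org": "b", "accepted": 5},
    {"org": "a", "accepted": 5},
    {"org": "c", "accepted": 12},
]
ranked = sorted(rows, key=lambda r: (-r["accepted"], r["org"]))
ranked_orgs = [r["org"] for r in ranked]
```

Sorting on a tuple key is the usual way to get "multiple sorting" without writing a custom comparison function.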
solydzajsJames--Crook: I think we had 3600 students and 6000 proposals ? or something like that I would have to look it up17:12
James--CrookMerio: OK, so it's not an instant thing, but it's well known territory.17:12
James--Crooksolydzajs: ballpark is fine.17:12
James--CrookWe're not going to be CPU limited.17:12
solydzajsnope :-)17:12
MerioJames--Crook: oh certainly... is just something that with JavaScript is odd :P17:12
James--CrookI do have other questions, but they are less important than understanding the flavours of the two projects more.17:13
solydzajsif we use datatable jquery plugin we can feed it with the result data and we will get sorting for free17:13
*** dhaun has quit IRC17:14
solydzajsso lets leave sorting issue for now :-)17:14
Meriosolydzajs: yes for sure ^__^ Actually I was thinking of sorting before feeding visualizations, but yes.. it's not a primary concern now :P17:14
James--Crookso we're back to...  we could have good simple stats by mid term with one student's work.17:15
solydzajsMerio: yep we can always do it on backend side, we will figure it out :-) that is our smallest problem :-)17:15
James--CrookTo me what that says is17:15
Meriosolydzajs: agree :)17:15
James--Crookthat we need to be planning for stats features that have spin offs in other areas.17:15
James--CrookFor example17:15
solydzajsJames--Crook: well have you read Merio proposal ?17:15
James--CrookSomething that is really cool is alex picos classification of orgs17:15
solydzajsJames--Crook: he has a lot of ideas of stats.17:16
solydzajsJames--Crook: and I'm sure I will come up with more too :-)17:16
James--Crookyes.  But...  Mario could do that all on his own.  And we'd have timelines and word clouds and lots of other very cool stuff.17:16
James--CrookNow what I see... is that the org classification is an interesting thing to start deriving stats off.17:17
James--CrookSo....  do assembler projects tend (by and large) to be more successful than php projects?17:18
solydzajsJames--Crook: what we need is backend (Jobs), storing results in some model, caching results, displaying results in different formats (Chart API, Visualization API, Table), export of results, Stats settings view where we can configure what kind of stats are supported and what we need to gather, and all the dashboards for different Roles.17:18
solydzajsJames--Crook: it might look like it's simple but it's not that trivial17:18
solydzajsJames--Crook: maybe frontend yes, but not backend, Merio you agree ?17:19
* James--Crook waits for response from Merio17:19
MerioJames--Crook: well frontend and backend might be complex or trivial as anyone wants17:20
solydzajsJames--Crook: I hate both asm and php so it's not a question for me17:20
solydzajsJames--Crook: well I hate PHP more, so well I will go with asm :-)17:20
James--CrookSo, if we KISS.  If we just go for something that kicks ass with regards to simple pie charts and bar charts and tables?17:22
solydzajsMerio: if you look at the big picture of this project I don't think it's trivial, if you want to make it work smoothly, be extensible and manageable then it requires more work than a trivial "Oh I will fetch the data save it as whatever I come up with and I'm done"17:22
James--Crooksolydzajs: please say some more...17:22
Meriosolydzajs: for example if you want a frontend with an iGoogle interface.. then it would be a little bit more complex. The thing I'm most worried about is that we will end up with X different stats with "X" being any high number. Doing a job for every different statistic... without having something really extensible without some programming17:23
solydzajsJames--Crook: if we get the gathering of data part done then I think frontend will not be a problem for either Daniel or Merio17:23
James--Crooksolydzajs: agree...17:23
solydzajsJames--Crook: so lets focus on solving this part first17:23
James--Crooksolydzajs: and i'd add that that is the part that we can elaborate safely.  What we need now is a clearer understanding of collecting the data.17:24
James--Crooksolydzajs: great minds :-)17:24
solydzajsJames--Crook: yes agree, so I will try to describe it in more details till Friday on wiki page17:24
James--Crookso...  on the stats page and in e-mails we outline the spike solution...17:24
James--Crookwhich is a job that loads a json object, adds to it, and writes back to it...17:25
James--CrookThat much looks simple (provided you know about models, python and memcache) :-)17:25
solydzajsyep saves it in some model, cache the results17:25
solydzajsindicates whether stat is complete or not17:25
MerioJames--Crook: studying them during these days ^__^17:25
James--CrookI could see us having a first cut of that by this time next week.  (with zero flexibility to it)17:25
solydzajshas a previous stats always saved too17:25
James--Crooksolydzajs: even better.17:26
solydzajsso while it processes new stats you can always access them17:26
James--CrookI think that is going to work smoothly.17:26
James--CrookAt this stage we don't worry about that cache being invalidated by new changes in the underlying data.17:26
solydzajswe cache only the stats that are finished17:27
James--CrookSo...  I want to check with you Pawel that we are not worried about getting to that stage.17:27
solydzajsthe JSON object that is completed might be cached17:27
solydzajsbut partial JSON won't be17:27
James--Crookwhy not?17:27
MerioIf something goes wrong we've a complete JSON cached anyway I think :)17:28
solydzajswhy would you want to cache something that will change with next Job ?17:28
James--CrookBecause job is adding to it.17:28
solydzajsJames--Crook: imagine the scenario:17:28
James--CrookTerminology issue.17:28
solydzajswe set that stats A will be generated every 1h17:29
James--CrookI'm actually talking about one 'timeslice' of the job.17:29
solydzajsthe gathering of data required for stats A takes about 5 minutes17:29
James--Crookso it takes 5 timeslices.17:29
James--Crookso at timeslice 3 there is a partial object cached.17:30
solydzajsit's 11:01 so the jobs started to work on new stats17:30
solydzajsbut we still need to access the old ones at this time17:30
James--CrookYes, correct, but that is a different instance.17:30
solydzajsthe jobs are appending data to new JSON object17:30
solydzajswhile we are still accessing the old data17:31
James--CrookIt's only when the partial object is complete that we write it to the reference copy. (or whatever name we give to it)17:31
solydzajsand the old data are cached of course17:31
solydzajsyep reference copy is one thing17:31
solydzajsbut memcache is other thing17:31
James--Crookbut solydzajs we can have multiple objects in memcache...17:31
* SRabbelier thinks this is pretty obvious17:32
SRabbelierwe have 2 different cached objects17:32
solydzajswell true17:32
SRabbelier1 "in progress"17:32
SRabbelier2 "done, and serving"17:32
solydzajsyep ok17:32
* SRabbelier goes back to lurking17:32
James--CrookI'm beginning to see why you were worried.17:32
solydzajsok that way Jobs will have quicker access to the partial JSON17:32
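The two-object arrangement agreed on above ("in progress" and "done, and serving") might look something like this. It is a sketch under stated assumptions: a plain dict stands in for memcache, and the key names and the `run_timeslice` / `read_stats` helpers are hypothetical.

```python
import json

cache = {}  # stand-in for memcache; a plain dict for illustration

# Hypothetical key names for one statistic ("stats A" in the discussion).
IN_PROGRESS = "stats_a:in_progress"
SERVING = "stats_a:serving"


def run_timeslice(batch, is_last):
    """Load the partial JSON object, add one batch to it, and write it back."""
    partial = json.loads(cache.get(IN_PROGRESS, "{}"))
    for item in batch:
        partial[item] = partial.get(item, 0) + 1
    if is_last:
        # Only a complete object replaces the copy that readers see.
        cache[SERVING] = json.dumps(partial)
        cache.pop(IN_PROGRESS, None)
    else:
        cache[IN_PROGRESS] = json.dumps(partial)


def read_stats():
    """Readers only ever see the last completed object."""
    return json.loads(cache.get(SERVING, "{}"))


# Old stats stay available while new ones are being gathered.
cache[SERVING] = json.dumps({"old": 1})
run_timeslice(["a", "b"], is_last=False)
during = read_stats()
run_timeslice(["a"], is_last=True)
after = read_stats()
```

Keeping the partial object under its own key gives the jobs quick access to it between timeslices without ever exposing half-finished numbers.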
James--CrookI think we do need clarity in the design.17:32
solydzajsgot it17:32
solydzajsah it's late :D17:32
James--Crookno problem.17:33
James--CrookWell, I am still looking at exactly what will be in the JSON object.17:33
James--CrookBut I think we have enough to get started there.17:33
James--CrookAnd I think this means that the kind of stats (and a lot more) that Danderson produced last year17:34
MerioJames--Crook: my guess in the stats wiki page is based basically only on what's needed for visualization API, perhaps more is needed for dashboard and other interactions17:34
James--Crookwill really be a very realistic goal for mid term.  Even comfortable, barring major GAE FUs.17:34
solydzajsMerio: I will put more info there in the upcoming days17:34
solydzajsJames--Crook: we will see what Google announces during Google I/O :-)17:34
solydzajsJames--Crook: maybe some more fancy stuff ;-)17:35
Meriosolydzajs: ok, I'll try to do something too17:35
solydzajsMerio: ok thanks :-)17:35
James--CrookSo I want to come back to how tags tie into stats.  because I do see some things that need thinking about there.  It would be a disaster if they were designed without stats in mind.17:35
solydzajsJames--Crook: tags meeting is tomorrow17:35
solydzajsJames--Crook: please join us17:36
James--CrookI know, but stats discussion is today.17:36
solydzajsJames--Crook: depending on where we decide to use tags, we can later on do statistics based on the tags17:36
solydzajsJames--Crook: for example like I did for GHOP last year17:36
Merio(solydzajs: when will the tags meeting be?)17:37
solydzajsJames--Crook: tasks based on category, like documentation ,code , translation etc17:37
James--CrookAs I see it the stats experts need to be involved in thought about the design of tags.17:37
solydzajsMerio: melange-soc-dev :-)17:37
James--Crooksolydzajs: interesting, and how did you collect your tables?17:37
James--Crook(i.e. the tables from which you generated the charts)17:38
Merio(solydzajs: ops, didn't download new mail :P)17:38
solydzajsJames--Crook: Oh I was parsing Issue tracker html output :-)17:38
solydzajsJames--Crook: that was the only way ;-)17:38
solydzajsJames--Crook: with Melange it's going to be much simpler :-)17:38
* James--Crook nods17:38
solydzajsJames--Crook: no it was because we used Issue Tracker for last GHOP17:38
solydzajsJames--Crook: it was pain :-)17:38
James--Crookindeed :-)17:39
James--Crookso I bet there were issues beyond just the screen scraping.17:39
solydzajsyep :-)17:39
James--CrookSo, decision, should stats people be involved in a solution this time round?17:40
solydzajsJames--Crook: I was producing those charts :17:40
tpbTitle: GHOP Statistics (at
James--Crook(stats people = Mario and Daniel)17:41
James--Crooksolydzajs: opinion?17:41
SRabbelierJames--Crook: yes, I think they should be involved17:41
solydzajsJames--Crook: yes they should be17:41
* Merio is happy with that :P17:41
James--CrookSRabbelier: I'm not interested in your opinion (well I am) this is a project decision.17:41
James--CrookOK.  (sorry srabbelier, didn't mean to be rude)17:42
SRabbelierJames--Crook: np :)17:42
James--CrookOK.  That clears one very important thing up for me.17:42
James--CrookThe other one is a little bit related.17:43
James--CrookLooking at how LH used stats last year.17:43
James--CrookThey were always embedded in docs.17:43
solydzajsThis project is a team work and we treat our GSoC Students as team members :-)17:43
James--CrookNow I don't think it is a good idea to have lots of chart parameters flying around in a doc.17:43
solydzajsJames--Crook: in what docs ? have you been involved in GHOP ? and what stats are you talking about right now ?17:44
James--CrookI'm talking about the participation-by-region for GSoC that appeared in the blog.17:44
James--CrookGSoC did not have any publishing ability at that time.17:45
solydzajsJames--Crook: yep ok and what about that ? Leslie will be able to export those charts or save them17:45
James--CrookThere was also a later one that Mario pointed me to that had the participation top-ten schools.17:45
solydzajsJames--Crook: stats are not only used for blog posts17:45
solydzajsJames--Crook: ok I don't see any problem ?17:46
James--CrookRather than add a brand new way to put text boxes beside a chart.17:46
James--CrookIt seems to me that a string substitution $Pie(ChartName) will do the job very nicely.17:46
James--CrookIt seems to give us a lot with two or three lines of python code.17:47
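A rough sketch of the `$Pie(ChartName)` substitution James--Crook has in mind. The `$Pie(...)` syntax comes from the discussion; the `render_pie` helper and the HTML it returns are placeholders invented for illustration, since the log does not say what the chart markup would be.

```python
import re

# Matches $Pie(ChartName) and captures the chart name.
PLACEHOLDER = re.compile(r"\$Pie\((\w+)\)")


def render_pie(name):
    """Hypothetical renderer: returns placeholder markup for the named chart."""
    return '<div class="pie-chart" data-chart="%s"></div>' % name


def expand_charts(text):
    """Replace every $Pie(Name) in a document with the rendered chart markup."""
    return PLACEHOLDER.sub(lambda m: render_pie(m.group(1)), text)


doc = "Participation by region: $Pie(ParticipationByRegion)"
html = expand_charts(doc)
```

The document author only writes the chart name; all chart parameters stay out of the document text, which is the concern raised a few lines earlier.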
solydzajsVisualization API is really powerful, we can even provide a button where when LH click on it she will get nice HTML that she can embed on the blog post which will generate the same chart output17:47
solydzajsthis is something we can worry about later I think :-)17:48
solydzajswe need basics first17:48
James--CrookI think probably the other things I'm thinking about can be taken off line.17:48
James--CrookYes.  Basics first.17:48
solydzajscool ;-) agreed here :-)17:49
James--CrookOK.  So I think I am done for the moment.17:49
solydzajsJames--Crook: ok thanks a lot for this discussion17:49
MerioWell.. nothing more to say ATM17:49
James--CrookThank you Pawel.  I wanted to get 'in sync'.17:50
solydzajsJames--Crook: sure thing :-)17:50
solydzajsMerio: thanks for participating17:50
James--Crookso I think that is meeting done then.17:50
solydzajsttyl guys and I'm looking forward to your review of my wiki page changes :-)17:50
James--Crookho ho :-)17:51
solydzajsand g'night :-)17:51
Meriocheers :)17:51
