Monday, 2019-09-09

*** tpb has joined #symbiflow00:00
*** Bertl is now known as Bertl_zZ01:49
*** proteusguy has joined #symbiflow05:47
*** citypw has joined #symbiflow09:28
*** Bertl_zZ is now known as Bertl09:29
*** craigo has joined #symbiflow09:43
*** proteusguy has quit IRC10:13
*** proteusguy has joined #symbiflow10:27
*** proteusguy has quit IRC11:20
*** adjtm has quit IRC11:38
*** craigo has quit IRC11:47
*** citypw has quit IRC12:00
*** freemint has joined #symbiflow12:10
*** adjtm has joined #symbiflow12:13
*** freemint has quit IRC12:33
*** freemint has joined #symbiflow12:33
*** freemint has quit IRC12:36
*** freemint has joined #symbiflow12:37
*** freemint has quit IRC13:29
*** freemint has joined #symbiflow13:32
*** adjtm has quit IRC13:44
*** adjtm has joined #symbiflow13:56
*** freemint has quit IRC14:07
*** rvalles has quit IRC14:24
*** adjtm has quit IRC15:20
*** adjtm has joined #symbiflow15:20
*** Bertl is now known as Bertl_oO15:23
*** rvalles has joined #symbiflow15:35
mithroduck2 set up some benchmarking for the VPR parsing code at https://github.com/duck2/vpr-rrgraph-benchmark15:49
tpbTitle: GitHub - duck2/vpr-rrgraph-benchmark (at github.com)15:49
*** freemint has joined #symbiflow16:16
litghostmithro: Assuming I'm reading that correctly, duck2 XML -> 18 seconds, duck2 capnproto -> 12 seconds16:24
litghostoph16:24
litghostGiven that capnproto doesn't really parse a lot, I'm guessing there is some headroom in the copying for improvement16:30
litghostIf not, the mmap -> in memory datastructures is the next step16:30
hzeller[m]As long as the capnproto data structure is copied to the local data structure, there will still be a lot of overhead. Ideally, we can use the capnproto structs directly, though that might need an abstraction of the access patterns first.16:51
litghosthzeller: I agree there will be some overhead, but 12 seconds seems excessive.16:52
mithrolitghost / hzeller[m]: duck2 is a good person to ask16:56
hackerfooAssuming the amount read >= peak memory usage, that's >= 160MB/s, which seems reasonable for random-ish access. Average random 4k reads for my high end SSD are only ~50MB/s.17:01
hackerfooHow big is the capnproto rr_graph?17:02
hackerfooI guess it shouldn't be random reads, though.17:02
duck2the current copying code is mostly "translated" from the xml reading code. From my past callgrinds, I think the most time is taken by copying the edges, where the gap between the data representations is wide.17:03
litghostduck2: Edges are something we should strongly consider specializing17:04
litghostduck2: e.g. store the edges in a dense blob of ints/shorts17:04
hackerfooduck2: Can you try putting the rr_graph in a ramdisk?17:04
litghostduck2: Because that data is basically just a giant 2D matrix17:04
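The "dense blob of ints/shorts" idea can be sketched as a compressed per-node edge table. This is a minimal illustration under stated assumptions, not VPR's actual data structure: `DenseEdges`, `RawEdge`, and `build_dense_edges` are hypothetical names, and the 32-bit node ids and 16-bit switch ids are assumed widths.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical input record: one edge as it might arrive from the file.
struct RawEdge {
    std::uint32_t src;
    std::uint32_t sink;
    std::uint16_t switch_id;
};

// Hypothetical compact edge table: every edge lives in two flat arrays,
// and first_edge[n]..first_edge[n + 1] is the slice belonging to node n.
struct DenseEdges {
    std::vector<std::uint32_t> first_edge;  // size = num_nodes + 1
    std::vector<std::uint32_t> sink_node;   // one entry per edge
    std::vector<std::uint16_t> switch_id;   // one entry per edge

    std::size_t num_edges(std::size_t node) const {
        return first_edge[node + 1] - first_edge[node];
    }
    std::uint32_t sink(std::size_t node, std::size_t i) const {
        return sink_node[first_edge[node] + i];
    }
    std::uint16_t sw(std::size_t node, std::size_t i) const {
        return switch_id[first_edge[node] + i];
    }
};

// Counting sort by source node: a handful of bulk allocations in total,
// instead of one small growing vector per node.
DenseEdges build_dense_edges(std::size_t num_nodes, const std::vector<RawEdge>& edges) {
    DenseEdges out;
    out.first_edge.assign(num_nodes + 1, 0);
    for (const RawEdge& e : edges) {
        assert(e.src < num_nodes);
        ++out.first_edge[e.src + 1];  // count edges per source node
    }
    for (std::size_t n = 0; n < num_nodes; ++n) {
        out.first_edge[n + 1] += out.first_edge[n];  // prefix sum -> slice starts
    }
    out.sink_node.resize(edges.size());
    out.switch_id.resize(edges.size());
    // cursor[n] is the next free slot inside node n's slice.
    std::vector<std::uint32_t> cursor(out.first_edge.begin(), out.first_edge.end() - 1);
    for (const RawEdge& e : edges) {
        const std::uint32_t slot = cursor[e.src]++;
        out.sink_node[slot] = e.sink;
        out.switch_id[slot] = e.switch_id;
    }
    return out;
}
```

The layout only suits a graph that is not mutated after loading, since adding an edge to one node would shift every later slice.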
duck2hackerfoo: The cap'n proto graph is ~600MB. I do a warmup run in the benchmark, so the file should be in the cache when measuring 12s17:05
*** freemint has quit IRC17:14
*** freemint has joined #symbiflow17:18
duck2litghost: in the file or in memory? Even if we store the edges as such in the file, we still need to do rr_node::add_edge or vpr::add_edge_metadata which do allocations.17:45
litghostduck2: the allocation strategy of the edges is something that should be examined anyways17:51
litghostduck2: now might be a good time17:52
duck2litghost: is the rr_graph read-only enough to try arena allocation? currently every node manages its small vector-ish of edges. I don't know how to deal with metadata, since it's not a simple vector. however currently every node&edge has a t_metadata_dict of its own, which takes an allocation to create and another allocation to populate17:59
litghost> " rr_graph read-only enough to try arena allocation"18:00
litghostThis is true when reading an rr graph18:00
litghostIt is completely read-only18:00
*** adjtm has quit IRC18:07
hackerfooYou could try a simple bump allocator: `void *malloc(size_t size) { void *p = mem; mem += size; return p; }` and `void free(void *p) {}`18:07
hackerfooWhere `mem` points to a large chunk of preallocated memory, and see how much it speeds it up.18:08
hackerfooYou might also want to check for overflow in `malloc`.18:08
hackerfooYou can use `mmap` to get a large chunk of RAM, and even resize it. `sbrk` is not recommended.18:13
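A bump allocator along these lines, with the overflow check included, might look like the sketch below. It is written as a standalone arena class rather than a `malloc` replacement, and it takes its backing memory from `std::malloc` instead of `mmap` to stay portable; `BumpArena` and its member names are hypothetical and not part of VTR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical fixed-capacity bump arena: allocation advances an offset,
// and all memory is released at once when the arena is destroyed.
class BumpArena {
  public:
    explicit BumpArena(std::size_t capacity)
        : base_(static_cast<unsigned char*>(std::malloc(capacity))), capacity_(capacity) {
        if (base_ == nullptr) throw std::bad_alloc();
    }
    ~BumpArena() { std::free(base_); }
    BumpArena(const BumpArena&) = delete;
    BumpArena& operator=(const BumpArena&) = delete;

    // Returns nullptr when the request does not fit (the overflow check).
    // Assumes align is a power of two no larger than alignof(std::max_align_t),
    // which is what the malloc'd base address is guaranteed to satisfy.
    void* allocate(std::size_t size, std::size_t align = alignof(std::max_align_t)) {
        assert(align != 0 && (align & (align - 1)) == 0);
        assert(align <= alignof(std::max_align_t));
        const std::size_t aligned = (used_ + align - 1) & ~(align - 1);
        if (aligned > capacity_ || size > capacity_ - aligned) return nullptr;
        used_ = aligned + size;
        return base_ + aligned;
    }

    // Individual frees are deliberately no-ops.
    void deallocate(void*) {}

    std::size_t used() const { return used_; }

  private:
    unsigned char* base_;
    std::size_t capacity_;
    std::size_t used_ = 0;
};
```

Objects with non-trivial destructors placed in such an arena would still need their destructors run by the caller; the arena only hands back raw bytes.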
duck2also note that a t_metadata_dict is a std::unordered_map<std::string, std::vector<std::string>> and that's allocated for every edge18:14
hackerfooYuck, strings.18:16
duck2hackerfoo: is it possible to just override malloc like that?18:16
hackerfooduck2: In C, yeah. `malloc` is defined in stdlib.h: https://en.cppreference.com/w/c/memory/malloc18:18
tpbTitle: malloc - cppreference.com (at en.cppreference.com)18:18
hackerfooThere's a way to replace `new` in C++ as well, but I haven't done it.18:19
litghosthackerfoo: Overriding malloc like that is a bad idea in C++.  duck2: Just replace the allocator for the relevant objects18:19
litghostduck2: Also unordered_map is likely overkill, a flat_hash_map is likely superior given that we do not mutate the metadata during runtime18:20
hackerfoohttps://en.cppreference.com/w/cpp/language/new#Allocation18:21
tpbTitle: new expression - cppreference.com (at en.cppreference.com)18:21
litghostduck2: VTR does provide a chunk malloc, which is approximately a bump allocator, but accommodates unbounding arena size18:21
litghostunbounded18:21
duck2litghost: also would be good if we don't create std::strings for every metadata string, they come as const char *s. can we provide a hash fn for them?18:22
mithrohttps://github.com/YosysHQ/yosys/pull/136318:23
tpbTitle: Add tests for Xilinx UG901 examples by SergeyDegtyar · Pull Request #1363 · YosysHQ/yosys · GitHub (at github.com)18:23
*** craigo has joined #symbiflow18:23
litghostduck2: Yes, you can define a hash fn, unordered_map has a "class Hash" template parameter that can point to a hash function class thingy18:24
litghostduck2: By default it is std::hash<T>, but it can be anything18:25
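As a rough illustration of the `Hash` template parameter, the sketch below keys a `std::unordered_map` directly on `const char *` without building a `std::string` per key. `CStrHash`, `CStrEq`, and `MetaByCStr` are hypothetical names, the hash is plain 64-bit FNV-1a chosen for brevity, and the `int` value type is a placeholder. A matching equality functor is needed too, since the default would compare pointer values rather than characters.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <unordered_map>

// Hashes the characters of a NUL-terminated string (64-bit FNV-1a).
struct CStrHash {
    std::size_t operator()(const char* s) const noexcept {
        assert(s != nullptr);
        std::uint64_t h = 14695981039346656037ull;  // FNV offset basis
        for (; *s != '\0'; ++s) {
            h ^= static_cast<unsigned char>(*s);
            h *= 1099511628211ull;  // FNV prime
        }
        return static_cast<std::size_t>(h);
    }
};

// Compares string contents instead of pointer values.
struct CStrEq {
    bool operator()(const char* a, const char* b) const noexcept {
        return std::strcmp(a, b) == 0;
    }
};

// The map stores only the pointers, so the strings must outlive the map.
using MetaByCStr = std::unordered_map<const char*, int, CStrHash, CStrEq>;
```

The lifetime requirement is the main caveat: this only works if the buffer holding the key strings stays alive and unchanged for as long as the map is used.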
hackerfooIf you mmap a file, you can use char * + size and not allocate anything else. That's what I do for immutable strings. Probably not an option here, though.18:25
hackerfooIf you dedup the strings, you can just use the pointer as a hash and for comparisons.18:27
hackerfooAnd then the strings just become numbers, essentially.18:27
litghostAnd the number of string keys is low (think order ~10), so it might also make sense to do vector<string> -> sort -> unique -> hand out ids18:27
litghoste.g. a simple interning scheme18:28
hackerfoo^ Yeah, that's more predictable than pointers, and you can put the sizes in the table.18:29
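The vector<string> -> sort -> unique -> ids approach could be sketched as follows. `KeyInterner` is a hypothetical class, and returning -1 for an unknown key is an assumption made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interning table: the id of a key is its index in a sorted,
// de-duplicated vector, so ids are stable once the table is built.
class KeyInterner {
  public:
    explicit KeyInterner(std::vector<std::string> keys) : keys_(std::move(keys)) {
        std::sort(keys_.begin(), keys_.end());
        keys_.erase(std::unique(keys_.begin(), keys_.end()), keys_.end());
    }

    // Binary search for the key; returns its id, or -1 if it was never interned.
    int id_of(const std::string& key) const {
        const auto it = std::lower_bound(keys_.begin(), keys_.end(), key);
        if (it == keys_.end() || *it != key) return -1;
        return static_cast<int>(it - keys_.begin());
    }

    // Maps an id back to its string; the string's size comes along for free.
    const std::string& str(int id) const {
        assert(id >= 0 && static_cast<std::size_t>(id) < keys_.size());
        return keys_[static_cast<std::size_t>(id)];
    }

    std::size_t size() const { return keys_.size(); }

  private:
    std::vector<std::string> keys_;
};
```

With on the order of ten keys, the binary search touches only a few short strings, and everything after the lookup can compare small integers instead of strings.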
litghostTo be clear, I'm talking about the key to the std::unordered_map18:31
litghostThey are keys like "fasm_feature", "fasm_prefix", etc18:31
litghostThe data is basically line noise18:31
duck2wouldn't assuming ~10 unique keys be too specializing to fasm? not that I'm aware of any other use of metadata but...18:34
*** adjtm has joined #symbiflow18:35
litghostduck2: You could do a fallback scheme if the string intern map gets too big18:36
litghostduck2: E.g. intern up to 100 strings, and then fall back to a straightforward hash18:36
litghostduck2: And the key becomes a tagged union, either string intern pointer or std::string18:37
hackerfooduck2: And here's the initial implementation for the fallback: assert(false); /// TODO handle this case18:37
litghostduck2: I think it is safe to assume that most uses of metadata will use a limited number of identifying keys18:37
hackerfooBecause there's no reason to implement the slow path if the fast path isn't fast enough.18:38
duck2litghost: makes sense18:41
*** lopsided98 has quit IRC18:50
*** lopsided98 has joined #symbiflow18:52
*** lopsided98 has quit IRC18:56
*** lopsided98 has joined #symbiflow18:58
*** freemint has quit IRC19:36
*** freemint has joined #symbiflow21:31
*** freemint has quit IRC22:04
mithroduck2: Post your end year report here please :-)22:22
duck2eh, apparently I didn't post before. here it is: https://docs.google.com/document/d/1SZm44g-9Bo-xD2lDfXcxb8bpzYjewGyK7bVyOYj5Gqs/edit?usp=sharing22:43
tpbTitle: GSoC2019 - SymbiFlow - Final Report - Google Docs (at docs.google.com)22:43
*** freemint has joined #symbiflow22:47
