*** tpb has joined #symbiflow | 00:00 | |
*** Bertl is now known as Bertl_zZ | 01:49 | |
*** proteusguy has joined #symbiflow | 05:47 | |
*** citypw has joined #symbiflow | 09:28 | |
*** Bertl_zZ is now known as Bertl | 09:29 | |
*** craigo has joined #symbiflow | 09:43 | |
*** proteusguy has quit IRC | 10:13 | |
*** proteusguy has joined #symbiflow | 10:27 | |
*** proteusguy has quit IRC | 11:20 | |
*** adjtm has quit IRC | 11:38 | |
*** craigo has quit IRC | 11:47 | |
*** citypw has quit IRC | 12:00 | |
*** freemint has joined #symbiflow | 12:10 | |
*** adjtm has joined #symbiflow | 12:13 | |
*** freemint has quit IRC | 12:33 | |
*** freemint has joined #symbiflow | 12:33 | |
*** freemint has quit IRC | 12:36 | |
*** freemint has joined #symbiflow | 12:37 | |
*** freemint has quit IRC | 13:29 | |
*** freemint has joined #symbiflow | 13:32 | |
*** adjtm has quit IRC | 13:44 | |
*** adjtm has joined #symbiflow | 13:56 | |
*** freemint has quit IRC | 14:07 | |
*** rvalles has quit IRC | 14:24 | |
*** adjtm has quit IRC | 15:20 | |
*** adjtm has joined #symbiflow | 15:20 | |
*** Bertl is now known as Bertl_oO | 15:23 | |
*** rvalles has joined #symbiflow | 15:35 | |
mithro | duck2 set up some benchmarking for the VPR parsing code at https://github.com/duck2/vpr-rrgraph-benchmark | 15:49 |
---|---|---|
tpb | Title: GitHub - duck2/vpr-rrgraph-benchmark (at github.com) | 15:49 |
*** freemint has joined #symbiflow | 16:16 | |
litghost | mithro: Assuming I'm reading that correctly, duck2 XML -> 18 seconds, duck2 capnprot -> 12 seconds | 16:24 |
litghost | oph | 16:24 |
litghost | Given that capnproto doesn't really parse a lot, I'm guessing there is a some headroom in the copying for improvment | 16:30 |
litghost | If not, the mmap -> in memory datastructures is the next step | 16:30 |
hzeller[m] | As long as the capnprot data structure is copied to the local data structure, there will still be a lot of overhead. Ideally, we can use the capnproto structs directly; so though that might need an abstraction of the access patterns first. | 16:51 |
litghost | hzeller: I agree there will be some overhead, but 12 seconds seems excessive. | 16:52 |
mithro | litghost / hzeller[m]: duck2 is a good person to ask | 16:56 |
hackerfoo | Assuming the amount read >= peak memory usage, that's >= 160MB/s, which seems reasonable for random-ish access. Averge random 4k reads for my high end SSD are only ~50MB/s. | 17:01 |
hackerfoo | How big is the capnproto rr_graph? | 17:02 |
hackerfoo | I guess it shouldn't be random reads, though. | 17:02 |
duck2 | the current copying code is mostly "translated" from the xml reading code. From my past callgrinds, I think the most time is taken by copying the edges, where the gap between the data representations is wide. | 17:03 |
litghost | duck2: Edges are something we should strongly consider specializing | 17:04 |
litghost | duck2: e.g. store the edges in a dense blob of ints/shorts | 17:04 |
hackerfoo | duck2: Can you try putting the rr_graph in a ramdisk? | 17:04 |
litghost | duck2: Because that data is basically just a giant 2D matrix | 17:04 |
duck2 | hackerfoo: The cap'n proto graph is ~600MB. I do a warmup run in the benchmark, so the file should be in the cache when measuring 12s | 17:05 |
*** freemint has quit IRC | 17:14 | |
*** freemint has joined #symbiflow | 17:18 | |
duck2 | litghost: in the file or in memory? Even if we store the edges as such in the file, we still need to do rr_node::add_edge or vpr::add_edge_metadata which do allocations. | 17:45 |
litghost | duck2: the allocation strategy of the edges is something that should be examined anyways | 17:51 |
litghost | duck2: now might be a good time | 17:52 |
duck2 | litghost: is the rr_graph read-only enough to try arena allocation? currently every node manages its small vector-ish of edges. I don't know how to deal with metadata, since it's not a simple vector. however currently every node&edge has a t_metadata_dict of its own, which takes an allocation to create and another allocation to populate | 17:59 |
litghost | > " rr_graph read-only enough to try arena allocation" | 18:00 |
litghost | This is true when reading an rr graph | 18:00 |
litghost | It is completely read-only | 18:00 |
*** adjtm has quit IRC | 18:07 | |
hackerfoo | You could try a simple bump allocator: `void *malloc(size_t size) { void *p = mem; mem += size; return p; }` and `void free(void *p) {}` | 18:07 |
hackerfoo | Where `mem` points to a large chunk of preallocated memory, and see how much it speeds it up. | 18:08 |
hackerfoo | You might also want to check for overflow in `malloc`. | 18:08 |
hackerfoo | You can use `mmap` to get a large chunk of RAM, and even resize it. `sbrk` is not recommended. | 18:13 |
duck2 | also note that a t_metadata_dict is a std::unordered_map<std::string, std::vector<std::string>> and that's allocated for every edge | 18:14 |
hackerfoo | Yuck, strings. | 18:16 |
duck2 | hackerfoo: is it possible to just override malloc like that? | 18:16 |
hackerfoo | duck2: In C, yeah. `malloc` is defined in stdlib.h: https://en.cppreference.com/w/c/memory/malloc | 18:18 |
tpb | Title: malloc - cppreference.com (at en.cppreference.com) | 18:18 |
hackerfoo | There's a way to replace `new` in C++ as well, but I haven't done it. | 18:19 |
litghost | hackerfoo: Overriding malloc like that is a bad idea in C++. duck2: Just replace the allocator for the relevant objects | 18:19 |
litghost | duck2: Also unordered_map is likely overkill, a flat_hash_map is likely superior given that we do not mutate the metadata during runtime | 18:20 |
hackerfoo | https://en.cppreference.com/w/cpp/language/new#Allocation | 18:21 |
tpb | Title: new expression - cppreference.com (at en.cppreference.com) | 18:21 |
litghost | duck2: VTR does provide a chunk malloc, which is approximately a bump allocator, but accommodates unbounding arena size | 18:21 |
litghost | unbounded | 18:21 |
duck2 | litghost: also would be good if we don't create std::strings for every metadata string, they come as const char *s. can we provide a hash fn for them? | 18:22 |
mithro | https://github.com/YosysHQ/yosys/pull/1363 | 18:23 |
tpb | Title: Add tests for Xilinx UG901 examples by SergeyDegtyar · Pull Request #1363 · YosysHQ/yosys · GitHub (at github.com) | 18:23 |
*** craigo has joined #symbiflow | 18:23 | |
litghost | duck2: Yes, you can define a hash fn, unordered_map has a "class Hash" template parameter that can point to a hash function class thingy | 18:24 |
litghost | duck2: By default it is std::hash<T>, but it can be anything | 18:25 |
hackerfoo | If you mmap a file, you can use char * + size and not allocate anything else. That's what I do for immutable strings. Probably not an option here, though. | 18:25 |
hackerfoo | If you dedup the strings, you can just use the pointer as a hash and for comparisons. | 18:27 |
hackerfoo | And then the strings just become numbers, essentially. | 18:27 |
litghost | And the number of string keys is low (think order ~10), so it might also make sense to do vector<string> -> sort -> unique -> hand out ids | 18:27 |
litghost | e.g. a simple interning schema | 18:28 |
hackerfoo | ^ Yeah, that's more predictable than pointers, and you can put the sizes in the table. | 18:29 |
litghost | To be clear, I'm talking about the key to the std::unordered_map | 18:31 |
litghost | They are keys like "fasm_feature", "fasm_prefix", etc | 18:31 |
litghost | The data is basically line noise | 18:31 |
duck2 | wouldn't assuming ~10 unique keys be too specializing to fasm? not that I'm aware of any other use of metadata but... | 18:34 |
*** adjtm has joined #symbiflow | 18:35 | |
litghost | duck2: You could do a fallback schema if the string intern map gets too big | 18:36 |
litghost | duck2: E.g. intern up to 100 strings, and then fallback to a straight forward hash | 18:36 |
litghost | duck2: And the key becomes a tagged union, either string intern pointer or std::string | 18:37 |
hackerfoo | duck2: And here's the initial implementation for the fallback: assert(false); /// TODO handle this case | 18:37 |
litghost | duck2: I think it is safe to assume that most uses of metadata will use a limit number of identifying keys | 18:37 |
hackerfoo | Because there's no reason to implement the slow path if the fast path isn't fast enough. | 18:38 |
duck2 | litghost: makes sense | 18:41 |
*** lopsided98 has quit IRC | 18:50 | |
*** lopsided98 has joined #symbiflow | 18:52 | |
*** lopsided98 has quit IRC | 18:56 | |
*** lopsided98 has joined #symbiflow | 18:58 | |
*** freemint has quit IRC | 19:36 | |
*** freemint has joined #symbiflow | 21:31 | |
*** freemint has quit IRC | 22:04 | |
mithro | duck2: Post your end year report here please :-) | 22:22 |
duck2 | eh, apparently I didn't post before. here it is: https://docs.google.com/document/d/1SZm44g-9Bo-xD2lDfXcxb8bpzYjewGyK7bVyOYj5Gqs/edit?usp=sharing | 22:43 |
tpb | Title: GSoC2019 - SymbiFlow - Final Report - Google Docs (at docs.google.com) | 22:43 |
*** freemint has joined #symbiflow | 22:47 |
Generated by irclog2html.py 2.13.1 by Marius Gedminas - find it at mg.pov.lt!