Monday, 2019-09-09

*** tpb has joined #symbiflow		00:00
*** Bertl is now known as Bertl_zZ		01:49
*** proteusguy has joined #symbiflow		05:47
*** citypw has joined #symbiflow		09:28
*** Bertl_zZ is now known as Bertl		09:29
*** craigo has joined #symbiflow		09:43
*** proteusguy has quit IRC		10:13
*** proteusguy has joined #symbiflow		10:27
*** proteusguy has quit IRC		11:20
*** adjtm has quit IRC		11:38
*** craigo has quit IRC		11:47
*** citypw has quit IRC		12:00
*** freemint has joined #symbiflow		12:10
*** adjtm has joined #symbiflow		12:13
*** freemint has quit IRC		12:33
*** freemint has joined #symbiflow		12:33
*** freemint has quit IRC		12:36
*** freemint has joined #symbiflow		12:37
*** freemint has quit IRC		13:29
*** freemint has joined #symbiflow		13:32
*** adjtm has quit IRC		13:44
*** adjtm has joined #symbiflow		13:56
*** freemint has quit IRC		14:07
*** rvalles has quit IRC		14:24
*** adjtm has quit IRC		15:20
*** adjtm has joined #symbiflow		15:20
*** Bertl is now known as Bertl_oO		15:23
*** rvalles has joined #symbiflow		15:35
mithro	duck2 set up some benchmarking for the VPR parsing code at https://github.com/duck2/vpr-rrgraph-benchmark	15:49
tpb	Title: GitHub - duck2/vpr-rrgraph-benchmark (at github.com)	15:49
*** freemint has joined #symbiflow		16:16
litghost	mithro: Assuming I'm reading that correctly, duck2 XML -> 18 seconds, duck2 capnprot -> 12 seconds	16:24
litghost	oph	16:24
litghost	Given that capnproto doesn't really parse a lot, I'm guessing there is a some headroom in the copying for improvment	16:30
litghost	If not, the mmap -> in memory datastructures is the next step	16:30
hzeller[m]	As long as the capnprot data structure is copied to the local data structure, there will still be a lot of overhead. Ideally, we can use the capnproto structs directly; so though that might need an abstraction of the access patterns first.	16:51
litghost	hzeller: I agree there will be some overhead, but 12 seconds seems excessive.	16:52
mithro	litghost / hzeller[m]: duck2 is a good person to ask	16:56
hackerfoo	Assuming the amount read >= peak memory usage, that's >= 160MB/s, which seems reasonable for random-ish access. Averge random 4k reads for my high end SSD are only ~50MB/s.	17:01
hackerfoo	How big is the capnproto rr_graph?	17:02
hackerfoo	I guess it shouldn't be random reads, though.	17:02
duck2	the current copying code is mostly "translated" from the xml reading code. From my past callgrinds, I think the most time is taken by copying the edges, where the gap between the data representations is wide.	17:03
litghost	duck2: Edges are something we should strongly consider specializing	17:04
litghost	duck2: e.g. store the edges in a dense blob of ints/shorts	17:04
hackerfoo	duck2: Can you try putting the rr_graph in a ramdisk?	17:04
litghost	duck2: Because that data is basically just a giant 2D matrix	17:04
duck2	hackerfoo: The cap'n proto graph is ~600MB. I do a warmup run in the benchmark, so the file should be in the cache when measuring 12s	17:05
*** freemint has quit IRC		17:14
*** freemint has joined #symbiflow		17:18
duck2	litghost: in the file or in memory? Even if we store the edges as such in the file, we still need to do rr_node::add_edge or vpr::add_edge_metadata which do allocations.	17:45
litghost	duck2: the allocation strategy of the edges is something that should be examined anyways	17:51
litghost	duck2: now might be a good time	17:52
duck2	litghost: is the rr_graph read-only enough to try arena allocation? currently every node manages its small vector-ish of edges. I don't know how to deal with metadata, since it's not a simple vector. however currently every node&edge has a t_metadata_dict of its own, which takes an allocation to create and another allocation to populate	17:59
litghost	> " rr_graph read-only enough to try arena allocation"	18:00
litghost	This is true when reading an rr graph	18:00
litghost	It is completely read-only	18:00
*** adjtm has quit IRC		18:07
hackerfoo	You could try a simple bump allocator: `void malloc(size_t size) { void p = mem; mem += size; return p; }` and `void free(void *p) {}`	18:07
hackerfoo	Where `mem` points to a large chunk of preallocated memory, and see how much it speeds it up.	18:08
hackerfoo	You might also want to check for overflow in `malloc`.	18:08
hackerfoo	You can use `mmap` to get a large chunk of RAM, and even resize it. `sbrk` is not recommended.	18:13
duck2	also note that a t_metadata_dict is a std::unordered_map<std::string, std::vector<std::string>> and that's allocated for every edge	18:14
hackerfoo	Yuck, strings.	18:16
duck2	hackerfoo: is it possible to just override malloc like that?	18:16
hackerfoo	duck2: In C, yeah. `malloc` is defined in stdlib.h: https://en.cppreference.com/w/c/memory/malloc	18:18
tpb	Title: malloc - cppreference.com (at en.cppreference.com)	18:18
hackerfoo	There's a way to replace `new` in C++ as well, but I haven't done it.	18:19
litghost	hackerfoo: Overriding malloc like that is a bad idea in C++. duck2: Just replace the allocator for the relevant objects	18:19
litghost	duck2: Also unordered_map is likely overkill, a flat_hash_map is likely superior given that we do not mutate the metadata during runtime	18:20
hackerfoo	https://en.cppreference.com/w/cpp/language/new#Allocation	18:21
tpb	Title: new expression - cppreference.com (at en.cppreference.com)	18:21
litghost	duck2: VTR does provide a chunk malloc, which is approximately a bump allocator, but accommodates unbounding arena size	18:21
litghost	unbounded	18:21
duck2	litghost: also would be good if we don't create std::strings for every metadata string, they come as const char *s. can we provide a hash fn for them?	18:22
mithro	https://github.com/YosysHQ/yosys/pull/1363	18:23
tpb	Title: Add tests for Xilinx UG901 examples by SergeyDegtyar · Pull Request #1363 · YosysHQ/yosys · GitHub (at github.com)	18:23
*** craigo has joined #symbiflow		18:23
litghost	duck2: Yes, you can define a hash fn, unordered_map has a "class Hash" template parameter that can point to a hash function class thingy	18:24
litghost	duck2: By default it is std::hash<T>, but it can be anything	18:25
hackerfoo	If you mmap a file, you can use char * + size and not allocate anything else. That's what I do for immutable strings. Probably not an option here, though.	18:25
hackerfoo	If you dedup the strings, you can just use the pointer as a hash and for comparisons.	18:27
hackerfoo	And then the strings just become numbers, essentially.	18:27
litghost	And the number of string keys is low (think order ~10), so it might also make sense to do vector<string> -> sort -> unique -> hand out ids	18:27
litghost	e.g. a simple interning schema	18:28
hackerfoo	^ Yeah, that's more predictable than pointers, and you can put the sizes in the table.	18:29
litghost	To be clear, I'm talking about the key to the std::unordered_map	18:31
litghost	They are keys like "fasm_feature", "fasm_prefix", etc	18:31
litghost	The data is basically line noise	18:31
duck2	wouldn't assuming ~10 unique keys be too specializing to fasm? not that I'm aware of any other use of metadata but...	18:34
*** adjtm has joined #symbiflow		18:35
litghost	duck2: You could do a fallback schema if the string intern map gets too big	18:36
litghost	duck2: E.g. intern up to 100 strings, and then fallback to a straight forward hash	18:36
litghost	duck2: And the key becomes a tagged union, either string intern pointer or std::string	18:37
hackerfoo	duck2: And here's the initial implementation for the fallback: assert(false); /// TODO handle this case	18:37
litghost	duck2: I think it is safe to assume that most uses of metadata will use a limit number of identifying keys	18:37
hackerfoo	Because there's no reason to implement the slow path if the fast path isn't fast enough.	18:38
duck2	litghost: makes sense	18:41
*** lopsided98 has quit IRC		18:50
*** lopsided98 has joined #symbiflow		18:52
*** lopsided98 has quit IRC		18:56
*** lopsided98 has joined #symbiflow		18:58
*** freemint has quit IRC		19:36
*** freemint has joined #symbiflow		21:31
*** freemint has quit IRC		22:04
mithro	duck2: Post your end year report here please :-)	22:22
duck2	eh, apparently I didn't post before. here it is: https://docs.google.com/document/d/1SZm44g-9Bo-xD2lDfXcxb8bpzYjewGyK7bVyOYj5Gqs/edit?usp=sharing	22:43
tpb	Title: GSoC2019 - SymbiFlow - Final Report - Google Docs (at docs.google.com)	22:43
*** freemint has joined #symbiflow		22:47

Generated by irclog2html.py 2.13.1 by Marius Gedminas - find it at mg.pov.lt!