full refactoring of conv2d dispatch and peeling, add support for pad 0
- refactor (simplify) the dispatch between the possible conv2d variants
- simplify the peeling for conv2d ixj
- add support for padding 0 by adding a template offset on the output
- reduce memory usage (but this prevents multi-threading at layer level for now)
- general code cleanup
- add a macro to trace function calls (in conv2d only for now)
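
For illustration only, here is a minimal sketch of two of the ideas listed above: an output offset passed as a template parameter so that one loop serves both the padded case and pad 0, and a macro that traces function calls. It is not the code from this change. The names `conv3x3`, `OutOffset`, `TRACE_FUNCTION` and `TRACE_CALLS` are hypothetical, and the single-channel 3x3, stride-1 setting is an assumption chosen to keep the example short.

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

// Hypothetical trace macro: prints the enclosing function name to stderr
// when TRACE_CALLS is defined, and compiles to nothing otherwise.
#ifdef TRACE_CALLS
#define TRACE_FUNCTION() std::fprintf(stderr, "[trace] %s\n", __func__)
#else
#define TRACE_FUNCTION() ((void)0)
#endif

// 3x3 convolution, stride 1, single channel, on a row-major H x W image.
// OutOffset is a compile-time shift applied to the output coordinates:
//   OutOffset = 0 : padding 1, output is H x W, taps outside the image are skipped
//   OutOffset = 1 : padding 0, output is (H-2) x (W-2), every tap is inside the image
template <int OutOffset>
std::vector<float> conv3x3(const std::vector<float>& in, int H, int W,
                           const float (&k)[3][3]) {
  TRACE_FUNCTION();
  assert(H >= 3 && W >= 3);
  const int outH = H - 2 * OutOffset;
  const int outW = W - 2 * OutOffset;
  std::vector<float> out(static_cast<std::size_t>(outH) * outW, 0.f);
  // (i, j) is the kernel centre in input coordinates.
  for (int i = OutOffset; i < H - OutOffset; ++i) {
    for (int j = OutOffset; j < W - OutOffset; ++j) {
      float acc = 0.f;
      for (int di = -1; di <= 1; ++di) {
        for (int dj = -1; dj <= 1; ++dj) {
          const int y = i + di;
          const int x = j + dj;
          if (y >= 0 && y < H && x >= 0 && x < W) {
            acc += k[di + 1][dj + 1] * in[y * W + x];
          }
        }
      }
      // The same centre maps to a shifted position in the smaller output.
      out[(i - OutOffset) * outW + (j - OutOffset)] = acc;
    }
  }
  return out;
}
```

In this sketch, `conv3x3<1>(in, H, W, k)` gives the pad 0 result and `conv3x3<0>(in, H, W, k)` the padded one. Because the offset is a template parameter, the index shift is resolved at compile time rather than inside the inner loop.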
Edited by Franck Galpin