Shrinking the feature size of transistors in modern integrated circuits has led to serious timing problems. Asynchronous digital circuits alleviate these problems to some extent. The most important part of an asynchronous digital circuit is its controlling part, which accounts for approximately 33% of the circuit's area. In fine-grained circuits, each gate together with its corresponding controlling part forms a pipeline stage. One way to decrease the area overhead of the controlling part is to group as many gates as possible into a single cluster that has only one handshaking and controlling unit. In this paper, two clustering algorithms are presented that reduce the area overhead of the controlling part while maintaining the functionality, throughput, and latency of the circuit. These algorithms reduce the controlling part's area by 12.6% on average for some ISCAS benchmark circuits. Furthermore, when slack matching is considered, there is a further reduction in area overhead (7.194%). The algorithms also reduce the clustering run time by 42.41% on average in comparison with the previous clustering method.