After series of TLE, I finally solved this problem by DP combining with bitmask.
We can view each bitmask as a set of "used" sticks (bit = 1 is used, 0 is not). So, bitmask "10011" means a subset of sticks {sticks[0], sticks[1], sticks[4]}.
Here comes the meaty DP part: for each bitmask B, we store:
1. value[B] : which is the sum of used sticks' length in B.
2. dp[B] : the max number of "edges" that B can achieve as part of a square. Each edge is formed by a set of sticks, with sum equals to square's side length. edge (= sum(sticks) / 4).
Let's use an example to see the meaning dp[]. Say sum(sticks) = 20, so
edge = 5. For a bitmask B meaning sticks with length {2, 3, 4, 5, 6}, we can group into 2 edges: {2,3} and {5}, with 4 and 6 remaining not grouped. We name the sum of such ungrouped sticks "residue". In this example, the residue is 10 (4 + 6).
You can observe that when a residue of B > edge, it is possible to form a square starting from B by adding more sticks.
If we flip any bit which is '1' in B to '0', we will get another bitmask B'. A bitmaks B could have multiple such flipped bitmask, as the number of '1' in B. Lets called all this flipped bitmasks "child-bitmask". The index of flipped bit is "flipped-bit" fb', it connects to a stick[fb']
The main idea of DP solution is, for every B, we try all its child bitmask B', to see what is the best dp[B] we can get by adding stick[fb']. This will needs to refer dp[B'] and residue[B'], lets list different cases:
Do this for all B' of B:
1. residue[B'] > edge :
There is no way to construct a square by adding sticks in B', Prune it!
2. If residue[B'] == edge :
This never happens (think about it!)
3. If residue[B'] < edge :
use temp variables new_dp, new_res
There could another three possibilities:
a. residue[B'] + stick[fb'] < edge:
new_dp = dp[B'], new_res = residue[B'] + stick[fb']
b. residue[B'] + stick[fb'] == edge:
new_dp = dp[B'] + 1, new_res = 0
c. residue[B'] + stick[fb'] > edge:
Same as 1. Prune it!
Finally, pick the max new_dp in B' for dp[B], and use the corresponding new_res in residue[B].
In implementation, we can scan the DP array for all B by nature ordering (0000, 0001, 0010, ...1111), since every child bitmask B' will be encountered before its parent B. (think about it!)
In order to get AC, some optimization tips:
1. We only need to try half of all bitmask -- we can always assume stick[m - 1] is not used, because of symmetry (think about it!)
2. Don't precompute for whole bitmask -- this prohibits any potential pruning, and will get you TLE.