Iteration-complexity of a Jacobi-type non-Euclidean ADMM for multi-block linearly constrained nonconvex programs
Abstract
This paper establishes the iteration-complexity of a Jacobi-type non-Euclidean proximal alternating direction method of multipliers (ADMM)
for solving multi-block linearly constrained nonconvex programs. The subproblems of this ADMM
variant can be solved in parallel and hence the method has great potential to solve large-scale multi-block linearly constrained nonconvex programs.
Moreover, our analysis allows the Lagrange multiplier to be updated with a relaxation parameter in the interval $(0,2)$.
2000 Mathematics Subject Classification:
47J22, 49M27, 90C25, 90C26, 90C30, 90C60,
65K10.
Key words: Jacobi multi-block ADMM, nonconvex program, iteration-complexity, first-order methods, non-Euclidean distances.
1 Introduction
This paper considers the following linearly constrained optimization problem
(1) $\min\Big\{\sum_{i=1}^p f_i(x_i) \;:\; \sum_{i=1}^p A_i x_i = b\Big\},$
where $f_i:\mathbb{R}^{n_i}\to(-\infty,\infty]$, $i=1,\ldots,p$, are proper lower semicontinuous functions, $A_i\in\mathbb{R}^{m\times n_i}$, $i=1,\ldots,p$, and $b\in\mathbb{R}^m$.
Optimization problems such as (1) appear in many important applications such as distributed matrix factorization, distributed clustering, sparse zero variance discriminant analysis, tensor decomposition, matrix completion, and asset allocation (see, e.g., [1, 6, 24, 39, 40, 42]). Recently, some variants of the alternating direction method of multipliers (ADMM) have been successfully applied to solve some instances of the previous problem despite the lack of convexity.
In this paper we analyze the Jacobi-type proximal ADMM for solving (1), which recursively computes a sequence $\{(x^k,\lambda^k)\}$ as
(2) $x_i^{k+1} \in \operatorname*{arg\,min}_{x_i}\Big\{\mathcal{L}_\beta\big(x_1^k,\ldots,x_{i-1}^k,\, x_i,\, x_{i+1}^k,\ldots,x_p^k;\lambda^k\big) + d_{w_i}\big(x_i; x_i^k\big)\Big\},\quad i=1,\ldots,p,$
$\lambda^{k+1} = \lambda^k + \theta\beta\Big(\sum_{i=1}^p A_i x_i^{k+1} - b\Big),$
where $\beta>0$ is a penalty parameter, $\theta>0$ is a relaxation parameter, $d_{w_i}$ is a Bregman distance, and
(3) $\mathcal{L}_\beta(x;\lambda) := \sum_{i=1}^p f_i(x_i) + \Big\langle \lambda,\ \sum_{i=1}^p A_i x_i - b\Big\rangle + \frac{\beta}{2}\Big\|\sum_{i=1}^p A_i x_i - b\Big\|^2$
is the augmented Lagrangian function for problem (1). An important feature of this ADMM variant is that the subproblems (2) can be solved in parallel and hence the method has great potential to solve large-scale multi-block linearly constrained nonconvex programs. Under the assumption that the last block matrix $A_p$ has full row rank and $f_p$ is a differentiable function whose gradient is Lipschitz continuous, we establish an iteration-complexity bound for the Jacobi-type ADMM (2) to obtain $(x,\lambda,v)$ satisfying
(4) $v_i \in \partial f_i(x_i) + A_i^{\top}\lambda, \qquad \|v_i\|\le\bar\rho, \qquad i=1,\ldots,p,$
(5) $\Big\|\sum_{i=1}^p A_i x_i - b\Big\| \le \bar\varepsilon,$
for given tolerances $\bar\rho,\bar\varepsilon>0$, where $\partial$ denotes the limiting subdifferential (see for example [32, 34]).
We briefly discuss in this paragraph the development of ADMM in the convex setting. The standard ADMM (i.e., where $p=2$, $d_{w_i}=0$ for $i=1,2$, $\theta=1$, and $x_2^{k+1}$ is obtained as above but with $x_1^k$ replaced by $x_1^{k+1}$) was introduced in [7, 8] and its complexity analysis was first carried out in [31]. Since then several papers have obtained iteration-complexity results for various ADMM variants (see for example [2, 9, 12, 16, 18, 3, 5, 11, 14, 25, 33]). Multi-block ADMM variants have also been extensively studied (see for example [15, 19, 26, 27, 28, 17, 4, 37, 23]). In particular, papers [17, 4, 37, 23] study the convergence and/or complexity of Jacobi-type ADMM variants.
Recently, there has been a lot of interest in the study of ADMM variants for nonconvex problems (see, e.g., [13, 20, 21, 22, 35, 36, 38, 41, 10, 30, 29]). Papers [13, 22, 35, 36, 38, 41] establish convergence of the generated sequence to a stationary point of (1) under conditions which guarantee that a certain potential function associated with the augmented Lagrangian (3) satisfies the Kurdyka-Łojasiewicz property. However, these papers do not study the iteration-complexity of the proximal ADMM, although their theoretical analyses are generally halfway towards accomplishing such a goal. Paper [20] analyzes the convergence of variants of the ADMM for solving nonconvex consensus and sharing problems and establishes the iteration-complexity of ADMM for the consensus problem. Paper [21] studies the iteration-complexity of two linearized variants of the multi-block proximal ADMM applied to a more general problem than (1) where a coupling term is also present in its objective function. Paper [10] studies the iteration-complexity of a proximal ADMM for the two-block optimization problem, i.e., $p=2$, where the relaxation parameter is arbitrarily chosen in the interval $(0,2)$, contrary to the previous related literature where this parameter is taken to be one or at most $(1+\sqrt{5})/2$. Paper [30] analyzes the iteration-complexity of a multi-block proximal ADMM via a general linearization scheme. Finally, while the authors were in the process of finalizing this paper, they learned of the recent paper [29], which studies the asymptotic convergence of a Jacobi-type linearized ADMM for solving nonconvex problems. The latter paper, though, does not deal with the issue of iteration-complexity and considers the case $\theta=1$ only.
Our paper is organized as follows. Subsection 1.1 contains some notation and basic results used in the paper. Section 2 describes our assumptions and contains two subsections. Subsection 2.1 introduces the concept of distance generating functions (and their corresponding Bregman distances) considered in this paper, and formally states the non-Euclidean Jacobi-type ADMM. Subsection 2.2 is devoted to the convergence rate analysis of the latter method. Our main convergence rate result is in this subsection (Theorem 2.11). The appendix contains proofs of some results stated in the paper.
1.1 Notation and basic results
The domain of a function $h:\mathbb{R}^n\to(-\infty,\infty]$ is the set $\operatorname{dom} h := \{x\in\mathbb{R}^n : h(x)<\infty\}$. Moreover, $h$ is said to be proper if $h(x)<\infty$ for some $x\in\mathbb{R}^n$.
Lemma 1.1.
Let $C$ be a nonzero matrix and let $\lambda_+$ denote the smallest positive eigenvalue of $C^{\top}C$. Then, for every $u\in\operatorname{Im}(C^{\top})$, there holds
$\|Cu\|^2 \;\ge\; \lambda_+\,\|u\|^2.$
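On one standard reading of this lemma (an assumption here, since the displayed inequality did not survive extraction), it states that $\|Cu\|^2 \ge \lambda_+\|u\|^2$ for every $u$ in the image of $C^{\top}$, where $\lambda_+$ is the smallest positive eigenvalue of $C^{\top}C$. A quick numerical sanity check of that inequality:

```python
import numpy as np

# Sanity check (under an assumed reading of Lemma 1.1): for a nonzero matrix C
# and lam_plus the smallest positive eigenvalue of C^T C, every u in Im(C^T)
# satisfies ||C u||^2 >= lam_plus * ||u||^2.
rng = np.random.default_rng(0)
C = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 6))  # rank-deficient: rank <= 2
eigs = np.linalg.eigvalsh(C.T @ C)
lam_plus = min(e for e in eigs if e > 1e-10)  # smallest POSITIVE eigenvalue

for _ in range(100):
    u = C.T @ rng.standard_normal(4)  # u lies in Im(C^T) by construction
    lhs, rhs = np.dot(C @ u, C @ u), lam_plus * np.dot(u, u)
    assert lhs >= rhs - 1e-8 * (1.0 + rhs)
```

The restriction to $\operatorname{Im}(C^{\top})$ is essential: for $u$ in the null space of $C$ the left-hand side vanishes while $\|u\|$ need not.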
Definition 1.2.
Let $h:\mathbb{R}^n\to(-\infty,\infty]$ be a proper lower semicontinuous function.

The Fréchet subdifferential of $h$ at $x\in\operatorname{dom} h$, denoted by $\hat\partial h(x)$, is the set of all elements $v\in\mathbb{R}^n$ satisfying
$\liminf_{y\to x,\; y\neq x}\ \frac{h(y)-h(x)-\langle v,\, y-x\rangle}{\|y-x\|}\;\ge\;0.$
When $x\notin\operatorname{dom} h$, we set $\hat\partial h(x)=\emptyset$.

The limiting subdifferential of $h$ at $x\in\operatorname{dom} h$, denoted by $\partial h(x)$, is defined as
$\partial h(x) := \big\{v\in\mathbb{R}^n : \exists\, x^k\to x,\ h(x^k)\to h(x),\ v^k\in\hat\partial h(x^k),\ v^k\to v\big\}.$

A critical (or stationary) point of $h$ is a point $x^*\in\operatorname{dom} h$ satisfying $0\in\partial h(x^*)$.
The following result presents some properties of the limiting subdifferential.
Proposition 1.3.
Let $h:\mathbb{R}^n\to(-\infty,\infty]$ be a proper lower semicontinuous function.

If $x^*$ is a local minimizer of $h$, then $0\in\partial h(x^*)$;

If $g:\mathbb{R}^n\to\mathbb{R}$ is a continuously differentiable function, then $\partial(h+g)(x) = \partial h(x) + \nabla g(x)$ for every $x\in\mathbb{R}^n$.
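A standard one-dimensional example (not from the paper) illustrating the difference between the Fréchet and limiting subdifferentials:

```latex
\text{For } h(x)=|x|:\qquad \hat\partial h(0)=\partial h(0)=[-1,1],
\quad\text{so } 0\in\partial h(0) \text{ and the origin is a critical point.}
\\[4pt]
\text{For } h(x)=-|x|:\qquad \hat\partial h(0)=\emptyset
\quad\text{(no affine minorant touches the graph at } 0\text{), yet}
\\
x^k\downarrow 0 \Rightarrow \hat\partial h(x^k)=\{-1\},\qquad
x^k\uparrow 0 \Rightarrow \hat\partial h(x^k)=\{1\},
\qquad\text{hence}\qquad \partial h(0)=\{-1,1\}.
```

The second function shows that the limiting subdifferential can be strictly larger than the Fréchet one, which is exactly why the limiting construction in Definition 1.2 is needed for nonconvex analysis.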
2 Jacobi-type non-Euclidean proximal ADMM and its convergence rate
We start by recalling the definition of critical points of (1).
Definition 2.1.
An element $(x^*,\lambda^*)\in\mathbb{R}^{n_1}\times\cdots\times\mathbb{R}^{n_p}\times\mathbb{R}^m$ is a critical point of problem (1) if
$0\in\partial f_i(x_i^*) + A_i^{\top}\lambda^*,\quad i=1,\ldots,p,\qquad \sum_{i=1}^p A_i x_i^* = b.$
Under some mild conditions, it can be shown that if $x^*$ is a global minimum of (1), then there exists $\lambda^*$ such that $(x^*,\lambda^*)$ is a critical point of (1).
The augmented Lagrangian $\mathcal{L}_\beta$ associated with problem (1) and with penalty parameter $\beta>0$ is defined as
(6) $\mathcal{L}_\beta(x;\lambda) := \sum_{i=1}^p f_i(x_i) + \Big\langle\lambda,\ \sum_{i=1}^p A_i x_i - b\Big\rangle + \frac{\beta}{2}\Big\|\sum_{i=1}^p A_i x_i - b\Big\|^2.$
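As a concrete aid, here is a small numerical sketch of an augmented Lagrangian of this form. The sign convention $\sum_i f_i(x_i) + \langle\lambda, \sum_i A_i x_i - b\rangle + (\beta/2)\|\sum_i A_i x_i - b\|^2$ is an assumption (conventions differ across the ADMM literature), and all names below are illustrative:

```python
import numpy as np

def aug_lagrangian(fs, As, b, xs, lam, beta):
    """Evaluate L_beta(x; lam) = sum_i f_i(x_i) + <lam, r> + (beta/2)*||r||^2,
    where r = sum_i A_i x_i - b is the feasibility residual.
    The sign convention is an assumption, not necessarily the paper's."""
    r = sum(A @ x for A, x in zip(As, xs)) - b
    return sum(f(x) for f, x in zip(fs, xs)) + lam @ r + 0.5 * beta * (r @ r)

# Two quadratic blocks with A_1 = A_2 = I and b = (1, 1):
fs = [lambda x: 0.5 * x @ x, lambda x: 0.5 * x @ x]
As = [np.eye(2), np.eye(2)]
b = np.ones(2)

# At a feasible point the multiplier and penalty terms vanish,
# so only the objective value sum_i f_i(x_i) remains:
feasible = [0.5 * np.ones(2), 0.5 * np.ones(2)]
val = aug_lagrangian(fs, As, b, feasible, lam=np.ones(2), beta=10.0)  # -> 0.5
```

Note that at feasible points the value is independent of $\lambda$ and $\beta$; both extra terms only penalize constraint violation.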
We assume that problem (1) satisfies the following set of conditions:

The functions $f_i:\mathbb{R}^{n_i}\to(-\infty,\infty]$, $i=1,\ldots,p$, are proper lower semicontinuous;

and ;

$f_p$ is differentiable with gradient $\nabla f_p$ Lipschitz continuous with constant $L>0$.

there exists such that where
2.1 The non-Euclidean proximal Jacobi ADMM
In this subsection, we introduce a class of distance generating functions (and their corresponding Bregman distances) which is suitable for our study. We also formally describe the non-Euclidean proximal Jacobi ADMM for solving problem (1).
Definition 2.2.
For a given set $Z\subseteq\mathbb{R}^n$ and scalars $M\ge m>0$, we let $\mathcal{D}_Z(m,M)$ denote the class of real-valued functions $w$ which are differentiable on $Z$ and satisfy
(7) $\langle \nabla w(z)-\nabla w(z'),\, z-z'\rangle \;\ge\; m\|z-z'\|^2 \qquad \forall\, z,z'\in Z,$
(8) $\|\nabla w(z)-\nabla w(z')\| \;\le\; M\|z-z'\| \qquad \forall\, z,z'\in Z.$
A function $w\in\mathcal{D}_Z(m,M)$ is referred to as a distance generating function and its associated Bregman distance $d_w$ is defined as
(9) $d_w(z;z') := w(z) - w(z') - \langle \nabla w(z'),\, z-z'\rangle \qquad \forall\, z,z'\in Z.$
For every $z'\in Z$, the function $d_w(\cdot\,;z')$ will be denoted by $d_w^{z'}$ so that
$d_w^{z'}(z) = d_w(z;z')$ for every $z\in Z$. Clearly,
(10) $\frac{m}{2}\|z-z'\|^2 \;\le\; d_w(z;z') \;\le\; \frac{M}{2}\|z-z'\|^2 \qquad \forall\, z,z'\in Z.$
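To make the definitions concrete, the following sketch (names assumed, not from the paper) implements the Bregman distance (9) for a quadratic distance generating function $w(z)=\frac12 z^{\top}Pz$; when $mI \preceq P \preceq MI$, the resulting distance satisfies the two-sided bound in (10):

```python
import numpy as np

def bregman(w, grad_w, z, zp):
    """Bregman distance d_w(z; z') = w(z) - w(z') - <grad w(z'), z - z'>, cf. (9)."""
    return w(z) - w(zp) - grad_w(zp) @ (z - zp)

# Quadratic generating function w(z) = 0.5 * z^T P z with P = diag(1, 3),
# so m = 1 and M = 3 in the bound (m/2)||z-z'||^2 <= d_w(z;z') <= (M/2)||z-z'||^2.
P = np.diag([1.0, 3.0])
w = lambda z: 0.5 * z @ P @ z
grad_w = lambda z: P @ z

rng = np.random.default_rng(1)
for _ in range(100):
    z, zp = rng.standard_normal(2), rng.standard_normal(2)
    d = bregman(w, grad_w, z, zp)
    sq = np.dot(z - zp, z - zp)
    assert 0.5 * sq - 1e-12 <= d <= 1.5 * sq + 1e-12
```

For this quadratic choice $d_w(z;z') = \frac12 (z-z')^{\top}P(z-z')$, and taking $P=I$ recovers the usual squared Euclidean proximal term $\frac12\|z-z'\|^2$.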
We now state the non-Euclidean proximal Jacobi ADMM based on the class of distance generating functions introduced in
Definition 2.2.
In its statement and in some technical results, we denote the block of variables $(x_1,\ldots,x_{i-1})$ simply by $x_{<i}$
and the block of variables $(x_{i+1},\ldots,x_p)$ simply by $x_{>i}$.
Hence, the whole vector $x$ can also be denoted as $(x_{<i}, x_i, x_{>i})$
when there is a need to emphasize the $i$-th block. For convenience, we also extend the above notation for $i=1$ and $i=p$. Hence,
$x=(x_1, x_{>1})=(x_{<p}, x_p)$.
Non-Euclidean Proximal Jacobi ADMM (NEPJ-ADMM)

Define for and let $L$ be as in (A3). Let an initial point be given. Choose scalars , , , , and a stepsize parameter such that
(11) where (resp., ) denotes the smallest eigenvalue (resp., positive eigenvalue) of , and is given by
(12) Set $k=1$ and go to step 1.

For each $i=1,\ldots,p$, choose a distance generating function $w_i$ (cf. Definition 2.2) and compute an optimal solution $x_i^{k+1}$ of
(13) $\min_{x_i}\ \Big\{\mathcal{L}_\beta\big(x_1^k,\ldots,x_{i-1}^k,\, x_i,\, x_{i+1}^k,\ldots,x_p^k;\lambda^k\big) + d_{w_i}\big(x_i; x_i^k\big)\Big\}.$
Set
(14) $\lambda^{k+1} = \lambda^k + \theta\beta\Big(\sum_{i=1}^p A_i x_i^{k+1} - b\Big),$
set $k\leftarrow k+1$, and go to step 1.
end
Some comments about the NEPJ-ADMM are in order. First, it is always possible to choose the constants , $i=1,\ldots,p$, sufficiently large so as to guarantee that , $i=1,\ldots,p$, are strictly positive. Second, one of the main features of the NEPJ-ADMM is that its subproblems (13) are completely independent of one another. As a result, they can all be solved in parallel, which shows the potential of the NEPJ-ADMM as a suitable ADMM variant for solving large instances of (1). Third, as in the papers [10, 30], the NEPJ-ADMM allows the choice of a relaxation parameter .
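To illustrate the parallel (Jacobi) structure of the updates, here is a toy numerical sketch, not the paper's exact scheme: all $f_i$ are taken to be $\frac12\|\cdot\|^2$, the Bregman distances are replaced by Euclidean proximal terms $\frac{\alpha}{2}\|x_i-x_i^k\|^2$, $\theta=1$, and the sign and stepsize conventions are assumptions. Each block update reads only the previous iterate, so the $p$ subproblems could run in parallel.

```python
import numpy as np

def nepj_admm(As, b, x0, lam0, beta=1.0, theta=1.0, alpha=1.0, iters=200):
    """Toy Jacobi-type proximal ADMM sketch for
        min sum_i 0.5*||x_i||^2   s.t.   sum_i A_i x_i = b.
    All block updates below use only the PREVIOUS iterate x^k, so a real
    implementation could solve them in parallel. Euclidean prox terms
    (alpha/2)*||x_i - x_i^k||^2 stand in for the Bregman distances d_{w_i};
    conventions here are illustrative assumptions, not the paper's."""
    xs, lam = [x.copy() for x in x0], lam0.copy()
    for _ in range(iters):
        Ax = [A @ x for A, x in zip(As, xs)]
        total = sum(Ax)
        new_xs = []
        for i, (A, x) in enumerate(zip(As, xs)):
            r_i = total - Ax[i] - b  # residual contributed by the OTHER blocks (frozen at x^k)
            n_i = x.shape[0]
            # Closed-form minimizer of the quadratic subproblem for f_i = 0.5*||.||^2:
            # ((1+alpha)I + beta*A^T A) x_i = alpha*x_i^k - A^T lam - beta*A^T r_i
            M = (1.0 + alpha) * np.eye(n_i) + beta * A.T @ A
            rhs = alpha * x - A.T @ lam - beta * A.T @ r_i
            new_xs.append(np.linalg.solve(M, rhs))
        xs = new_xs  # Jacobi sweep done: all blocks updated simultaneously
        lam = lam + theta * beta * (sum(A @ x for A, x in zip(As, xs)) - b)
    return xs, lam
```

On the three-block instance with $A_i=I$ and $b=(1,1)$, the iterates approach the unique solution $x_i=b/3$ (with multiplier $\lambda=-b/3$ under this sign convention).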
2.2 Convergence Rate Analysis of the NEPJ-ADMM
This subsection is dedicated to the convergence rate analysis of the NEPJ-ADMM.
We first present some technical lemmas which are useful to prove our main result (Theorem 2.11). To simplify the notation, we denote by the vector generated by the NEPJ-ADMM.
Lemma 2.3.
Consider the sequence generated by the NEPJ-ADMM. For every , define
(15) 
and
(16) 
where
(17) 
Then, for every , we have:
(18)  
(19) 
where .
Proof.
The next result presents a recursive relation involving the displacements and .
Lemma 2.4.
Consider the sequence generated by the NEPJ-ADMM and define
(20) 
Then, for every , we have
(21) 
where
(22) 
and are as in Lemma 2.3.
Proof.
From (15) and (19), we obtain the following relation
Using this relation and (18) with , we have
(23) 
Hence, in view of (22), relation (21) holds for every . Now, note that (20) is equivalent to . This relation combined with (22) and (23), both with , yields
Hence, in view of , relation (21) also holds for . ∎
Next we consider an auxiliary result to be used to compare consecutive terms of the sequence . See the comments immediately before the NEPJ-ADMM about the notation used hereafter.
Lemma 2.5.
For every , and , we have
Proof.
It is easy to see that the gradient of the function
(24) 
is given by
and its Hessian is equal to zero everywhere in . Hence, the function given in (24) is affine. The conclusion of the lemma now follows by noting that
∎
The next result compares consecutive terms of the sequence .
Lemma 2.6.
For every , we have
Proof.
First note that (13) together with the fact that imply that
Lemma 2.6 is essential to show that a certain sequence associated with is monotonically decreasing. This sequence is defined as
(26) 
where
(27)  
(28) 
(29) 
Before establishing the monotonicity property of the sequence , we first present an upper bound on in terms of some quantities related to , and .
Proof.
From Lemma 2.6 and the definitions of and , we obtain
(33)  
(34)  
(35)  
(36)  
(37) 
where the inequalities are due to the Cauchy-Schwarz inequality and the relation , for and , respectively.
∎
The next result compares with , defined in (31) and (22), respectively, and provides an upper bound for both elements in terms of .
Lemma 2.8.
Proof.
The proof of this lemma is given in Appendix A. ∎
The next proposition shows, in particular, that the sequence is decreasing and bounded below.
Proposition 2.9.
Proof.
The next proposition presents some convergence rate bounds for the displacements , , and in terms of some initial parameters. Our main result will follow easily from this proposition, since the residuals generated by in order to satisfy the Lagrangian system (2.1) (see Lemma 2.3) can be controlled by these displacements.
Proposition 2.10.
Let , i=1,…,p, be as in (2.1) and define