Interleaved Agile Combinatorial Factor Decomposition (IAFD)
Modified 4/24/2019
Version 1.2
================================================
Copyright 2017 Institute for Computational Sustainability
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
================================================
*** Citation ***
This software accompanies the following publications:
Gomes, C. P., Bai, J., Xue, Y., Bjorck, J., Rappazzo, B., Ament, S., Bernstein, R., Kong, S., Suram, S. K., van Dover, R. B., Gregoire, J. M. (2019). CRYSTAL: a multi-agent AI system for automated mapping of materials' crystal structures. MRS Communications, 1-9. DOI: 10.1557/mrc.2019.50
Bai, J., Bjorck, J., Xue, Y., Suram, S. K., Gregoire, J., & Gomes, C. (2017). Relaxation Methods for Constrained Matrix Factorization Problems: Solving the Phase Mapping Problem in Materials Discovery. Fourteenth International Conference on Integration of Artificial Intelligence and Operations Research Techniques in Constraint Programming (CPAIOR), 104-112. DOI: 10.1007/978-3-319-59776-8_9
Bibtex:
@article{Gomes2019,
author = {Gomes, Carla P. and Bai, Junwen and Xue, Yexiang and Bj{\"{o}}rck, Johan and Rappazzo, Brendan and Ament, Sebastian and Bernstein, Richard and Kong, Shufeng and Suram, Santosh K. and van Dover, R. Bruce and Gregoire, John M.},
doi = {10.1557/mrc.2019.50},
issn = {2159-6859},
journal = {MRS Communications},
month = {apr},
pages = {1--9},
title = {{CRYSTAL: a multi-agent AI system for automated mapping of materials' crystal structures}},
url = {https://www.cambridge.org/core/product/identifier/S2159685919000508/type/journal{\_}article},
year = {2019}
}
@article{Bai2017,
author = {Bai, Junwen and Bjorck, Johan and Xue, Yexiang and Suram, Santosh K. and Gregoire, John and Gomes, Carla},
doi = {10.1007/978-3-319-59776-8_9},
journal = {Fourteenth International Conference on Integration of Artificial Intelligence and Operations Research Techniques in Constraint Programming (CPAIOR)},
pages = {104--112},
title = {{Relaxation Methods for Constrained Matrix Factorization Problems: Solving the Phase Mapping Problem in Materials Discovery}},
url = {http://link.springer.com/10.1007/978-3-319-59776-8{\_}9},
year = {2017}
}
*** Introduction and Prerequisites ***
IAFD has two external library dependencies which must be installed:
1) Armadillo, see: http://arma.sourceforge.net
2) ILOG/CPLEX Optimization Studio: https://www.ibm.com/developerworks/downloads/ws/ilogcplex/
IAFD also uses TCLAP (http://tclap.sourceforge.net/) to process command-line arguments and Tino Kluge's spline implementation (http://kluge.in-chemnitz.de/opensource/spline/). These are header-only includes and are provided in the source distribution.
*** Compilation ***
An example Makefile for GNU make and g++ is included, but it must be modified to reflect the installation locations of CPLEX and of the BLAS library (such as OpenBLAS) used with Armadillo. Other compilers will require additional changes.
*** Usage ***
./iafd [-h] [OPTIONS] --inst INSTANCE_FILENAME --m M --k K --sol SOLUTION_FILENAME
Options:
--pointCorrect
correct or further refine a single point without changing anything
else
--initH
the file for H initialization
--MatchSigmaTol
sigma boundary when matching with ICSD patterns
--MatchShiftTol
shift tolerance when matching with ICSD patterns
--ConnectAgent
whether to enforce Connectivity rule or not
--AlloyAgent
whether to enforce Alloy rule or not
--AGrounds
the rounds of AgileFD-Gibbs loop
--sticks
the file containing all the stick patterns
--AlloyTol
the tolerance for calculating shifts
--neighbors
(required) the file listing the neighbors of every sample point
--oneVersion
whether to enforce one-used-shifted-version constraint
--slice
specify the slice constraint
--Gibbs
whether to enforce Gibbs phase rule
--mipgap
(required) mipgap for MIP
--sparsity
The overall sparsity coefficient
--stepsize
Initial step size. The default value 0 means that the standard
step size is used
--sampleInit
The filename containing initialization from single-phase sample points
--valueInit
Initialization file containing seeds for phases and phase freezing
--rec
Whether to output reconstructed signals
--humanInput
Human Input txt file
--shiftInfo
Whether to create a text file that contains the shift information
for each sample point
--beta
The weighting coefficient of the sparsity term
--seed
Random seed for the random number generator. (The default value -1
means time(0).)
--c
Related to termination criterion: In one iteration, if
(old_cost-new_cost)
--time
The maximum time (in seconds) that may be spent training the model
--m
(required) The number of possible different shifts
--k
(required) The number of phases
--sol
(required) The output file name of the solution
--inst
(required) Input instance file
--addNoise
whether to add noise or not
--noiseStd
the standard deviation of the noise, if noise is added
--, --ignore_rest
Ignores the rest of the labeled arguments following this flag.
--version
Displays version information and exits.
-h, --help
Displays usage information and exits.
*** Command Example ***
./iafd --inst input/Ta-Rh-Pd/Ta-Rh-Pd_inst.txt --m 30 --k 6 --time 10 --sol output/Ta-Rh-Pd_output.txt --c 1e-5 --beta 1.0 --mipgap 0.1 --Gibbs --sparsity 0.1 --neighbors input/Ta-Rh-Pd/Ta-Rh-Pd_edges.txt --AGrounds 3 --MatchShiftTol 0.1 --MatchSigmaTol 2.0 --ConnectAgent --AlloyAgent --AlloyTol 0.003 --sticks input/Ta-Rh-Pd/sticks/sticks.txt
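The same invocation can be assembled from a script, which may help when sweeping over parameters. The following is a minimal Python sketch; the helper build_iafd_command is hypothetical and not part of IAFD, and it only uses flags that appear in the Usage section and in the example above.

```python
import shlex

def build_iafd_command(inst, sol, m, k, neighbors, mipgap, switches=(), **options):
    """Assemble an IAFD argument list (hypothetical helper, not part of IAFD)."""
    cmd = ["./iafd", "--inst", inst, "--m", str(m), "--k", str(k),
           "--sol", sol, "--neighbors", neighbors, "--mipgap", str(mipgap)]
    # Value-taking options, e.g. time=10 becomes "--time 10".
    for name, value in options.items():
        cmd += ["--" + name, str(value)]
    # Switches such as --Gibbs are given without a value in the example above.
    cmd += ["--" + name for name in switches]
    return cmd

cmd = build_iafd_command(
    inst="input/Ta-Rh-Pd/Ta-Rh-Pd_inst.txt",
    sol="output/Ta-Rh-Pd_output.txt",
    m=30, k=6,
    neighbors="input/Ta-Rh-Pd/Ta-Rh-Pd_edges.txt",
    mipgap=0.1,
    switches=("Gibbs", "ConnectAgent", "AlloyAgent"),
    time=10, beta=1.0, sparsity=0.1, AGrounds=3,
)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) to launch the solver.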
*** Instance File Format ***
For more information, see this publication and associated datasets:
Le Bras, R., Bernstein, R., Gregoire, J. M., Suram, S. K., Gomes, C. P., Selman, B., & van Dover, R. B. (2014). A Computational Challenge Problem in Materials Discovery: Synthetic Problem Generator and Real-World Datasets. In Twenty-Eighth AAAI Conference on Artificial Intelligence (AAAI'14).
Example:
Description=Human readable description of the instance, including origin and any preprocessing
UUID=unique-identifier
Format_Version=1.0
//Number of elements
M=3
//Element labels
Elements=Ta,Rh,Pd
//Sample count
N=197
//Coordinate labels for coordinate systems, e.g. substrate deposition and composition
Deposition=X,Y
Composition=Ta,Rh,Pd
// Coordinate values data: lists of length N
X=-4.7857,-6.81E-05,4.7857,......
Y=-33.5,-33.5,-33.5,.......
Ta=0.8072182,0.71555525,0.5822375,......
Rh=0.10984136,0.11910693,0.1169005,......
Pd=0.08294043,0.16533777,0.30086198,......
//Q values: data domain: length L
Q=16.000000,16.150000,16.300000,16.450000,16.600000,......
//Intensity measurements: length L for each of N samples
I1=3.508038,6.712338,5.096438,......
......
I197=0.000000,18.186840,19.199840,......
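For checking an instance file before running IAFD, a small reader can be useful. The following Python sketch assumes only what the example above shows (Key=value lines, comment lines beginning with //, comma-separated lists); the function names and the two-sample toy instance are made up for illustration.

```python
def parse_instance(text):
    """Parse 'Key=value' lines; lines starting with // are comments."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("//") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        fields[key.strip()] = value.strip()
    return fields

def as_floats(value):
    """Convert a comma-separated list into floats."""
    return [float(x) for x in value.split(",")]

# Toy instance with N=2 samples and L=3 Q values (illustrative data only).
example = """\
//Number of elements
M=3
Elements=Ta,Rh,Pd
N=2
//Q values
Q=16.0,16.15,16.3
I1=3.508038,6.712338,5.096438
I2=0.0,18.18684,19.19984
"""
fields = parse_instance(example)
n = int(fields["N"])
q = as_floats(fields["Q"])
intensities = [as_floats(fields[f"I{i}"]) for i in range(1, n + 1)]
# Each sample must have one intensity per Q value (length L).
assert all(len(row) == len(q) for row in intensities)
```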
*** Value Initialization File Format ***
FLAG: --valueInit
A value initialization file specifies the basis patterns to use for initialization, as well as related configuration options. The format is as follows:
// Comments can be included as complete lines beginning with two slashes
// Q values corresponding to the basis vectors, in the same units as in the instance file (e.g. nm^-1 or A^-1).
// Basis vectors are resampled, so these values do not need to exactly match the ones in the instance file.
Q=1,1.1,1.2,...
// Basis patterns are only shifted to the right (positive shift) in IAFD,
// so initial or frozen basis patterns should be specified as far
// to the left as expected. The V parameter is a multiplicative shift which
// is applied to the Q vector and affects all basis patterns in the same way.
// The following effectively shifts all basis patterns 1% to the left.
V=0.99
// B1...BK specify the basis patterns you wish to initialize. The indices must be
// sequential, but fewer than K can be specified if desired.
B1=0.3123,0.545234,......
B2=0.4324,0.454243,......
B3=0.42345,1.42344,......
...
// Whether to freeze each basis pattern: 0=seed, 1=freeze
F1=1
F2=0
F3=1
...
// Lists sample indices in which each phase is allowed to appear. For example,
// phase 1 could appear at sample points 1,2,54,65,76......, etc. If S is not
// specified for an initialized basis, it can appear at any sample by default.
S1=1,2,54,65,76,......
S2=1,2,4,7,111,......
S3=67,89,111,......
...
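The effect of V and of resampling can be illustrated with a short Python sketch. It is not IAFD's implementation: it uses linear interpolation for brevity (IAFD ships a spline library), it assumes zero intensity outside the shifted range, and the function name and the toy peak are hypothetical.

```python
import math
from bisect import bisect_right

def resample_basis(q_init, basis, v, q_inst):
    """Scale the Q axis by V, then interpolate linearly onto q_inst."""
    q_shift = [v * q for q in q_init]  # multiplicative shift of the Q axis
    out = []
    for q in q_inst:
        if q < q_shift[0] or q > q_shift[-1]:
            out.append(0.0)  # assumption: no signal outside the shifted range
            continue
        j = min(bisect_right(q_shift, q), len(q_shift) - 1)
        q0, q1 = q_shift[j - 1], q_shift[j]
        t = (q - q0) / (q1 - q0)
        out.append(basis[j - 1] + t * (basis[j] - basis[j - 1]))
    return out

# Toy basis pattern: a single narrow peak at Q = 15 on a grid with step 0.1.
q_init = [10.0 + 0.1 * i for i in range(101)]
basis = [math.exp(-0.5 * ((q - 15.0) / 0.1) ** 2) for q in q_init]
# Instance grid with a different step (0.05), so resampling is required.
q_inst = [10.0 + 0.05 * i for i in range(201)]

resampled = resample_basis(q_init, basis, 0.99, q_inst)
peak_index = max(range(len(resampled)), key=resampled.__getitem__)
peak_q = q_inst[peak_index]  # the peak moves left, to about 0.99 * 15
print(peak_q)
```

With V=0.99 the peak that was specified at Q = 15 ends up near Q = 14.85, which is the 1% shift to the left described in the comments above.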
*** Sample Initialization File Format ***
FLAG: --sampleInit
Sample initialization is an alternative to value initialization, where the desired basis
patterns are taken from samples in the instance file. To initialize phase 1 using sample
point 123 and phase 2 from sample point 56:
B1=123
B2=56
Q should not be specified because the patterns come from the instance file. However,
V, F, and S are set the same as in a value initialization file.
*** H tensor Initialization File Format ***
FLAG: --initH
Usually, a solution file generated by IAFD can be used directly as the initialization file for the H tensor.
To initialize the H tensor from an independent file instead, use the following format:
H[*](1,1)=0.000000,0.000000,0.000000,0.321602,0.357336,0.168312,0.000000,0.000000,0.000000,0.000000
......
H[*](9,1)=0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
H[*](1,2)=0.000000,0.000000,0.000000,0.299825,0.333138,0.110080,0.000000,0.000000,0.000000,0.000000
......
H[*](9,2)=0.000000,0.000000,0.000000,0.056948,0.063276,0.023566,0.000000,0.000000,0.000000,0.000000
......
The m-th value in "H[*](k,n)=......" denotes the activation of the m-th shifted version of phase k at sample n.
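As an illustration of this indexing, the following Python sketch reads H lines into a dictionary keyed by (k, n). The regular expression and function name are assumptions based only on the example lines above, not code taken from IAFD.

```python
import re

# Matches lines such as "H[*](9,2)=0.0,0.1,..." and captures k, n and the values.
H_LINE = re.compile(r"^H\[\*\]\((\d+),(\d+)\)=(.*)$")

def parse_h(text):
    """Map (k, n) to the list of activations, one per shifted version."""
    h = {}
    for line in text.splitlines():
        match = H_LINE.match(line.strip())
        if match:
            k, n = int(match.group(1)), int(match.group(2))
            h[(k, n)] = [float(x) for x in match.group(3).split(",")]
    return h

example = """\
H[*](1,1)=0.000000,0.000000,0.000000,0.321602,0.357336,0.168312,0.000000,0.000000,0.000000,0.000000
H[*](1,2)=0.000000,0.000000,0.000000,0.299825,0.333138,0.110080,0.000000,0.000000,0.000000,0.000000
"""
h = parse_h(example)
m = 4
# Activation of the 4th shifted version of phase 1 at sample 2.
activation = h[(1, 2)][m - 1]
print(activation)
```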
*** ICDD sticks ***
FLAG: --sticks
IAFD has built-in support for matching basis patterns against ICDD sticks, but the sticks must be supplied in advance. Use the "--sticks" flag to give the path to the sticks file.
The file should follow the format:
Q0=6.467400,8.588200,9.204500,12.101500,12.934900,13.392000,16.587100,17.176300,17.400900,17.765100,...
P0=0.126100,0.200200,0.035000,0.032000,0.102100,0.068100,0.010000,0.006000,0.121100,0.211200,1.00000,...
Q1=17.020800,17.102500,21.050000,21.116100,21.590100,24.770400,24.890200,26.424900,29.183400,30.0547,...
P1=0.203200,0.202200,1.000000,0.983000,0.191200,0.264300,0.329300,0.119100,0.001000,0.084100,0.08410,...
...
...
Each pair (Qi,Pi) specifies one ICDD pattern (q values and corresponding intensities).
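A sticks file can be sanity-checked before use with a few lines of Python. This sketch assumes the layout shown above (indices starting at 0, one Qi line and one Pi line per pattern); the function name is hypothetical and the example patterns are shortened to three sticks each.

```python
def parse_sticks(text):
    """Return a list of (q_values, intensities) pairs, one per pattern index."""
    rows = {}
    for line in text.splitlines():
        line = line.strip()
        if "=" not in line or line[0] not in "QP":
            continue
        key, value = line.split("=", 1)
        rows[key] = [float(x) for x in value.split(",")]
    patterns = []
    i = 0
    # Collect consecutive pairs Q0/P0, Q1/P1, ... until one is missing.
    while f"Q{i}" in rows and f"P{i}" in rows:
        q, p = rows[f"Q{i}"], rows[f"P{i}"]
        if len(q) != len(p):
            raise ValueError(f"pattern {i}: Q and P lengths differ")
        patterns.append((q, p))
        i += 1
    return patterns

example = """\
Q0=6.467400,8.588200,9.204500
P0=0.126100,0.200200,0.035000
Q1=17.020800,17.102500,21.050000
P1=0.203200,0.202200,1.000000
"""
patterns = parse_sticks(example)
# Position of the strongest stick in each pattern.
strongest = [q[p.index(max(p))] for q, p in patterns]
print(strongest)
```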
*** Neighbors ***
FLAG: --neighbors
This flag gives the path to a file listing the neighbors of each sample, in the following format:
0,1,7,8,9,18
1,0,9,2
2,9,10,3,1
3,2,4,10,11,12,21
4,13,3,12,5,23
5,36,6,13,24,4
6,5,14,15,24,25
7,16,17,28,0,8
......
Each line lists the neighbors of one sample. For example, "0,1,7,8,9,18" means that sample 0 has neighbors 1, 7, 8, 9, and 18. Note that all edges are bidirectional, so sample 0 must also appear as a neighbor of sample 1.
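Because every edge has to be listed in both directions, it can be worth checking a neighbors file for one-directional entries. The following Python sketch is one way to do so; the function names are hypothetical, and samples that have no line of their own in the excerpt are simply skipped.

```python
def parse_neighbors(text):
    """Map each sample index to the set of neighbors listed on its line."""
    adj = {}
    for line in text.splitlines():
        parts = [int(x) for x in line.split(",") if x.strip()]
        if parts:
            adj[parts[0]] = set(parts[1:])
    return adj

def missing_reverse_edges(adj):
    """Edges (a, b) where b has its own line but does not list a."""
    return sorted((a, b) for a, nbrs in adj.items()
                  for b in nbrs if b in adj and a not in adj[b])

# First three lines of the example above: consistent in both directions.
good = parse_neighbors("0,1,7,8,9,18\n1,0,9,2\n2,9,10,3,1\n")
print(missing_reverse_edges(good))

# Sample 0 lists 1, but sample 1 does not list 0.
bad = parse_neighbors("0,1\n1,2\n2,1\n")
print(missing_reverse_edges(bad))
```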
*** Solution File Format ***
For more information, see this publication and associated datasets:
Le Bras, R., Bernstein, R., Gregoire, J. M., Suram, S. K., Gomes, C. P., Selman, B., & van Dover, R. B. (2014).
A Computational Challenge Problem in Materials Discovery: Synthetic Problem Generator and Real-World Datasets.
In Twenty-Eighth AAAI Conference on Artificial Intelligence (AAAI'14).
Example:
Description=Human-readable description of solution method, parameters, and corresponding instance
UUID=unique-identifier-of-instance
Format_Version=1.0
// Number of phases
K=5
//List of solution models: each entry lists the variable prefixes associated with a particular model
//[Q,R,C] or [Q,R] is required in order to provide an algorithm-agnostic representation (Q assumed to
//match instance file if not listed).
Params=[Q,R,C],[Q,B,C,S],[Q,B,H]
//Values for the listed parameters at each sample; representations for most parameters can be algorithm-specific
//Q values typically match the instance file
Q=......
//Basis vector representation
B1=......
......
B5=......
// Phase concentrations at each sample
C1=......
......
C197=......
//Representation of each phase as reconstructed at each sample
R1_1=......
......
R1_5=......
......
R197_5=......
//Per-phase multiplicative (scalar) shift at each sample
S1=......
......
S197=......
//H[*](k,n) Tensor (specific to IAFD - S is computed as a weighted average using these as weights)
H[*](1,1)=......
......
H[*](5,197)=......
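The relation between H and S can be sketched as a weighted average over the shifted versions of a phase. The Python example below is only an illustration: the shift factor assigned to each shifted version is NOT documented here, so shift_factors holds hypothetical values, and returning None for an inactive phase is likewise an assumption.

```python
def weighted_shift(activations, shift_factors):
    """Average the shift factors, weighted by the activations of one H entry."""
    total = sum(activations)
    if total == 0:
        return None  # assumption: no shift is defined when the phase is inactive
    return sum(a * s for a, s in zip(activations, shift_factors)) / total

# Activations of the shifted versions of one phase at one sample.
h_row = [0.0, 0.0, 0.0, 0.321602, 0.357336, 0.168312, 0.0, 0.0, 0.0, 0.0]
# Hypothetical shift factors, one per shifted version (not taken from IAFD).
shift_factors = [1.0 + 0.001 * m for m in range(len(h_row))]

s = weighted_shift(h_row, shift_factors)
print(s)
```

Only the versions with nonzero activation contribute, so in this example the result lies between the factors of the 4th and 6th shifted versions.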
*** Contact me ***
If you have any questions or suggestions, please feel free to contact me at jb2467@cornell.edu.