深沉的松鼠 · SQL、Pandas和Spark:常用数据查 ...· 2 月前 · |
帅气的酱牛肉 · Using multiple ...· 2 月前 · |
从容的鼠标 · 国家金融监督管理总局· 3 月前 · |
爱笑的瀑布 · 前端性能分析工具-Lighthouse-腾讯 ...· 10 月前 · |
(2Q19)
eXist-db provides a modular mechanism to index data. This eases index development and the development of more-or-less related custom functions (in Java).
eXist ships with four index types based on this mechanism:
An index providing range and field based searches on an index built with Apache Lucene..
A Full text search index which uses Apache Lucene for indexing and query.
An NGram index will store the N-grams contained in the data's characters.
For instance, if the index is configured to index 3-grams,
<data>abcde</data>
will generate these index
entries:
A spatial index that stores some of the geometric characteristics of Geography Markup Language geometries (tested against GML version 2.1.2). For instance:
<gml:Polygon xmlns:gml="http://www.opengis.net/gml" srsName="osgb:BNG">
<gml:outerBoundaryIs>
<gml:LinearRing>
<gml:coordinates>
278515.400,187060.450 278515.150,187057.950 278516.350,187057.150 278546.700,187054.000 278580.550,187050.900 278609.500,187048.100 278609.750,187051.250 278574.750,187054.650 278544.950,187057.450 278515.400,187060.450
</gml:coordinates>
</gml:LinearRing>
</gml:outerBoundaryIs>
</gml:Polygon>
This will generate several index entries. The most important ones
The
spatial referencing system
(
osgb:BNG
for
this polygon)
The polygon itself, stored in a binary form ( Well-Known Binary )
The coordinates of its bounding box
Below is a technical explanation of the java classes that lets you interact with exist's indexing features.
The indexing architecture introduces a package,
org.exist.indexing
, responsible for index management. It is
created by
org.exist.storage.BrokerPool
which allocates
org.exist.storage.DBBroker
s to each process accessing each DB
instance. Each time a DB instance is created (most installations generally have only
one, most often called
exist
), the
initialize()
method contructs an
IndexManager
that will be available through the
getIndexManager()
method of
org.exist.storage.BrokerPool
.
public IndexManager(BrokerPool pool, Configuration config)
This constructor keeps track of the
BrokerPool
that has created
the instance and receives the database's configuration object, usually defined in an
XML file called
conf.xml
. This new entry is expected in the
configuration file:
<modules>
<module id="ngram-index" class="org.exist.indexing.ngram.NGramIndex" file="ngram.dbx" n="3"/>
<module id="spatial-index" class="org.exist.indexing.spatial.GMLHSQLIndex" connectionTimeout="100000" flushAfter="300"/>
</modules>
This defines two indexes, backed-up by their specific classes
(
class
attribute; these classes implement the
org.exist.indexing.Index
interface as will be seen below),
eventually assigns them a human-readable (writable even) identifier and passes them
custom parameters which are implementation-dependant. Then, it configures (by
calling their
configure()
method), opens (by calling their
open()
method) and keeps track of each of them.
org.exist.indexing.IndexManager
also provides these public
methods:
public BrokerPool getBrokerPool()
This returns the
org.exist.storage.BrokerPool
for which this
IndexManager
was created.
public synchronized Index getIndexById(String indexId)
A method that returns an
Index
given its class identifier (see
below). Allows custom functions to access
Index
es whatever their
human-defined name is. This is probably the only method in this class that will be
really needed by a developer.
public synchronized Index getIndexByName(String indexName)
The counterpart of the previous method. Pass the human-readable name of the
Index
as defined in the configuration.
public void shutdown()
This method is called when eXist shuts down.
close()
will be
called for every registered
Index
. That allows them to free the
resources they have allocated.
public void removeIndexes()
This method is called when
repair()
is called from
org.exist.storage.NativeBroker
.
repair()
reconstructs every index (including the
structural one) from what is contained in the persistent DOM (usually
dom.dbx
).
remove()
will be called for every registered
Index
. That allows each index to destroy its persistent storage
if it wants to do so (but it is probably suitable given that
repair()
is called when the DB and/or its indexes are
corrupted).
public void reopenIndexes()
This method is called when
repair()
is called from
org.exist.storage.NativeBroker
.
open()
will be called for every registered
Index
. That allows each index to (re)allocate the resources it
needs for its persistent storage.
Another important class is
org.exist.indexing.IndexController
which, as its name suggests, controls the way data to be indexed are dispatched to
the registered indexes, using
org.exist.indexing.IndexWorker
s
that will be described below. Each
org.exist.storage.DBBroker
constructs such an
IndexController
when it is itself constructed,
using this constructor:
public IndexController(DBBroker broker)
This registers the
broker
's
IndexWorker
s,
once for each registered
Index
. These
IndexWorker
s, that will be described below, are returned by the
getWorker()
method in
org.exist.indexing.Index
, which is usually a good place to create
such an
IndexWorker
, at least the first time it is called.
This
IndexController
will be available through the
getIndexController()
method of
org.exist.storage.DBBroker
.
Here are the other public methods:
public Map configure(NodeList configNodes, Map namespaces)
This method receives the database's configuration object, usually defined in an
XML file called
conf.xml
. Both configuration nodes and namespaces
(remember that some configuration settings including e.g. pathes need namespaces to
be defined) will be passed to the
configure()
method of each
IndexWorker
. The returned object is a
java.util.Map
that will be available from
collection.getIndexConfiguration(broker).getCustomIndexSpec(INDEX_CLASS_IDENTIFIER)
.
public IndexWorker getWorkerByIndexId(String indexId)
A method that returns an
IndexWorker
given the class identifier
of its associated
Index
identifier. Very useful to the developer
since it allows custom functions to access
IndexWorker
s whatever
the human-defined name of their
Index
is. This is probably the
only method in this class that will be really needed by a developer.
public IndexWorker getWorkerByIndexName(String indexName)
The counterpart of the previous method. For the human-readable name of the
Index
as defined in the configuration.
public void setDocument(DocumentImpl doc)
This method sets the
org.exist.dom.DocumentImpl
on which the
IndexWorker
s shall work. Calls
setDocument(doc)
on each registered
IndexWorker
.
public void setMode(int mode)
This method sets the operating mode in which the
IndexWorker
s
shall work. See below for further details on operating modes. Calls
setMode(mode)
on each registered
IndexWorker
.
public void setDocument(DocumentImpl doc, int mode)
A convenience method that sets both the
org.exist.dom.DocumentImpl
and the operating mode. Calls
setDocument(doc, mode)
on each registered
IndexWorker
.
public DocumentImpl getDocument()
Returns the
org.exist.dom.DocumentImpl
on which the
IndexWorker
s will have to work.
public int getMode()
Returns the operating mode in which the
IndexWorker
s will have
to work.
public void flush()
Called in various places when pending operations, obviously data insertion, update
or removal, have to be completed. Calls
flush()
on each
registered
IndexWorker
.
public void removeCollection(Collection collection, DBBroker broker)
Called when a collection is to be removed. That allows to delete index entries for
this collection in a single operation. Calls
removeCollection()
on each registered
IndexWorker
.
public void reindex(Txn transaction, StoredNode reindexRoot, int mode)
Called when a document is to be reindexed. Only the
reindexRoot
node and its descendants will have their index entries updated or removed depending
of the
mode
parameter.
public StoredNode getReindexRoot(StoredNode node, NodePath path)
Determines the node which should be reindexed together with its descendants. Calls
getReindexRoot()
on each registered
IndexWorker
. The top-most node will be the actual node from which
the
DBBroker
will start reindexing.
public StoredNode getReindexRoot(StoredNode node, NodePath path, boolean includeSelf)
Same as above, with more parameters.
public StreamListener getStreamListener()
Returns the first
org.exist.indexing.StreamListener
in the
StreamListener
s pipeline. There is at most one
StreamListener
per
IndexWorker
that will
intercept the (re)indexed nodes stream.
IndexWorker
s that are not
interested in the data (depending of e.g. the document and/or the operating mode)
may return
null
through their
getListener()
method and thus not participate in the (re)indexing process. In other terms, they
will not listen to the indexed nodes.
public void indexNode(Txn transaction, StoredNode node, NodePath path, StreamListener listener)
Index any kind of indexable node (currently elements, attributes and text nodes ; comments and especially processing instructions might be considered in the future).
public void startElement(Txn transaction, ElementImpl node, NodePath path, StreamListener listener)
More specific than
indexNode()
. For an element. Will call
startElement()
on
listener
if it is not
null
. This is similar to Java SAX events.
public void attribute(Txn transaction, AttrImpl node, NodePath path, StreamListener listener)
More specific than
indexNode()
. For an attribute. Will call
attribute()
on
listener
if it is not
null
.
public void characters(Txn transaction, TextImpl node, NodePath path, StreamListener listener)
More specific than
indexNode()
. For a text node. Will call
characters()
on
listener
if it is not
null
.
public void endElement(Txn transaction, ElementImpl node, NodePath path, StreamListener listener)
Signals end of indexing for an element node. Will call
endElement()
on
listener
if it is not
public MatchListener getMatchListener(NodeProxy proxy)
Returns a
org.exist.indexing.MatchListener
for the given
node.
The two classes aim to be essentially used by eXist itself. As a programmer you will probably need to use just one or two of the above methods.
Now let's get into the interfaces and classes that will need to be extended by the
index programmer. The first of them is the interface
org.exist.indexing.Index
which will maintain the index
itself.
As described above, a new instance of the interface will be created by the
constructor of
org.exist.indexing.IndexManager
which calls the
interface's
newInstance()
method. No need for a constructor
then.
Here are the methods that have to be implemented in an implementation:
String getIndexId()
Returns the class identifier of the index.
String getIndexName()
Returns the human-defined name of the index, if one was defined in the configuration file.
BrokerPool getBrokerPool()
Returns the
org.exist.storage.BrokerPool
that has created the
index.
void configure(BrokerPool pool, String dataDir, Element config)
Notifies the
Index
a data directory (normally
${EXIST_HOME}/data
) and the configuration element in which it is
declared.
void open()
Method that is executed when the
Index
is opened, whatever it
means. Consider this method as an initialization and allocate the necessary
resources here.
void close()
Method that is executed when the
Index
is closed, whatever it
means. Consider this method as a finalization and free the allocated resources
here.
void sync()
Unused.
void remove()
Method that is executed when eXist requires the index content to be entitrely deleted, e.g. before repairing a corrupted database.
IndexWorker getWorker(DBBroker broker)
Returns the
IndexWorker
that operates on this
Index
on behalf of
broker
. One may want to
create a new
IndexWorker
here or pick one form a pool.
boolean checkIndex(DBBroker broker)
To be called by applications that want to implement a consistency check on the
Index
.
There is also an abstract class that implements
org.exist.indexing.Index
,
org.exist.indexing.AbstractIndex
that can be used a a basis for
most
Index
implementations. Most of its methods are abstract and
still have to be implemented in the concrete classes. These are the few concrete
methods:
public String getDataDir()
Returns the directory in which this
Index
operates. Usually
defined by
configure()
which itself receives eXist's
configuration settings.
There might be some
Index
es for which the concept of data
directory isn't accurate.
public void configure(BrokerPool pool, String dataDir, Element config)
Its minimal implementation retains the
org.exist.storage.BrokerPool
, the data directory and the
human-defined name, if defined in the configuration file (in an attribute called
id
). Sub-classes may call
super.configure()
to
retain this default behaviour.
This member is protected:
protected static String ID = "Give me an ID !"
This is where the class identifier of the
Index
is defined.
Override this member with, say,
MyClass.class.getName()
to
provide a reasonably unique identifier within your system.
The next important interface that will need to be implemented is
org.exist.indexing.IndexWorker
which is responsible for managing
the data in the index. Remember that each
org.exist.storage.DBBroker
will have such an
IndexWorker
at its disposal and that their
IndexController
will know what method of
IndexWorker
to call and when to call it.
Here are the methods that have to be implemented in the concrete implementations:
public String getIndexId()
Returns the class identifier of the index.
public String getIndexName()
Returns the human-defined name of the index, if one was defined in the configuration file.
Object configure(IndexController controller, NodeList configNodes, Map namespaces)
This method receives the database's configuration object, usually defined in an
XML file called
conf.xml
. Both configuration nodes and namespaces
(remember that some configuration settings including e.g. pathes need namespaces to
be defined) will be passed to the
configure()
method of the
IndexWorker
's
IndexController
. The
IndexWorker
can use this method to retain custom configuration
options in a custom object that will be available in the
java.util.Map
returned by
collection.getIndexConfiguration(broker).getCustomIndexSpec(INDEX_CLASS_IDENTIFIER)
.
The return type is free but will probably generally be an implementation of
java.util.Collection
in order to retain several parameters.
void setDocument(DocumentImpl doc)
This method sets the
org.exist.dom.DocumentImpl
on which this
IndexWorker
will have to work.
void setMode(int mode)
This method sets the operating mode in which this
IndexWorker
will have to work. See below for further details on operating modes.
void setDocument(DocumentImpl doc, int mode)
A convenience method that sets both the
org.exist.dom.DocumentImpl
and the operating mode.
DocumentImpl getDocument()
Returns the
org.exist.dom.DocumentImpl
on which this
IndexWorker
will have to work.
int getMode()
Returns the operating mode in which this
IndexWorker
will have
to work.
void flush()
Called periodically by the
IndexController
or by any other
process. That is where data insertion, update or removal should actually take
place.
void removeCollection(Collection collection, DBBroker broker)
Called when a collection is to be removed. That allows to delete index entries for
this collection in a single operation without a need for a
StreamListener
(see below) or a call to
setMode()
nor
setDocument()
.
StoredNode getReindexRoot(StoredNode node, NodePath path, boolean includeSelf)
Determines the node which should be reindexed together with its descendants. This
will give a hint to the
IndexController
to determine from which
node reindexing should start.
StreamListener getListener()
Returns a
StreamListener
that will intercept the (re)indexed
nodes stream.
IndexWorker
s that are not interested in the data
(depending of e.g. the document and/or the operating mode) may return
null
here.
MatchListener getMatchListener(NodeProxy proxy)
Returns a
org.exist.indexing.MatchListener
for the given
node.
boolean checkIndex(DBBroker broker)
To be called by applications that want to implement a consistency check on the index.
Occurrences[] scanIndex(DocumentSet docs)
Returns an array of
org.exist.dom.DocumentImpl.Occurrences
that
is an
ordered
list of the index entries, in a textual form,
associated with the number of occurences for the entries and a list of the documents
containing them.
For some indexes, the concept of ordered or textual occurrences might not be meaningful.
The interface
org.exist.indexing.StreamListener
has these
public members:
public final static int UNKNOWN = -1;
public final static int STORE = 0;
public final static int REMOVE_ALL_NODES = 1;
public final static int REMOVE_SOME_NODES = 2;
Obviously, they are used by the
setMode()
method in
org.exist.indexing.IndexController
which is istself called by the
different
org.exist.storage.DBBroker
s when they have to (re)index
a node and its descendants. As their name suggests, there is a mode for storing
nodes and two modes for removing them from the indexes. The difference between
StreamListener.REMOVE_ALL_NODES
and
StreamListener.REMOVE_SOME_NODES
is that the former removes all
the nodes from a document whereas the latter removes only some nodes from a
document, usually the descendants of the node returned by
getReindexRoot()
. We thus have the opportunity to trigger a
process that will directly remove all the nodes from a given document without having
to listen to each of them. Such a technique is described below.
Here are the methods that must be implement by an implemetation:
IndexWorker getWorker()
Returns the
IndexWorker
that owns this
StreamListener
.
void setNextInChain(StreamListener listener);
Should not be used. Used to specify which is the next
StreamListener
in the
IndexController
's
StreamListener
s pipeline.
StreamListener getNextInChain();
Returns the next
StreamListener
in the
IndexController
's
StreamListener
s pipeline.
Very important because it is the responsability of the
StreamListener
to forward the event stream to the next
StreamListener
in the pipeline.
void startElement(Txn transaction, ElementImpl element, NodePath path)
Signals the start of an element to the listener.
void attribute(Txn transaction, AttrImpl attrib, NodePath path)
Passes an attribute to the listener.
void characters(Txn transaction, TextImpl text, NodePath path)
Passes some character data to the listener.
void endElement(Txn transaction, ElementImpl element, NodePath path)
Signals the end of an element to the listener. Allow to free any temporary
resource created since the matching
startElement()
has been
called.
Beside the
StreamListener
interface, each custom listener
should extend
org.exist.indexing.AbstractStreamListener
.
This abstract class provides concrete implementations for
setNextInChain()
and
getNextInChain()
that
should normally never be overridden.
It also provides dummy
startElement()
,
attribute()
,
characters()
,
endElement()
methods that do nothing but forwarding the node to
the next
StreamListener
in the
IndexController
's
StreamListener
s
pipeline.
public abstract IndexWorker getWorker()
These remain abstract, since we still can not know what
IndexWorker
will own the
Listener
until we
have a concrete implementation.
To demonstrate how modular eXist
Index
es are, we have decided to
show how a spatial
Index
could be implemented. What makes its design
interesting is that this kind of
Index
doesn't store character data
from the document, nor does it use a
org.exist.storage.index.BFile
to
store the index entries. Instead, we will store WKB index entries in a JDBC database,
namely a
HSQLDB
to keep the distribution as
light as possible and reduce the number of external dependencies, but it wouldn't be too
difficult to use another one like
PostGIS
given that the implementation has itself been designed in a quite
modular way.
In eXist's Git repository, the modularized
Index
es code is in
extensions/indexes
and the file system's architecture is designed to
follow eXist's core architecture, i.e.
org.exist.indexing.*
for the
Index
es and
org.exist.xquery.*
for their
associated
Module
s. There is also a dedicated location for required
external libraries and for the test cases. The build system should normally be able to
download the required libraries and build all the files automatically, and even launch
the tests provided that the DB's configuration file declares the
Index
es (see above) and their associated
Module
s
(see below).
The described spatial
Index
heavily relies on the excellent open
source librairies provided by the
Geotools
project. We have experienced a few problems that will be mentioned
further, but since feedback has been provided, the situation will unquestionably improve
in the future, making current workarounds redundant.
The
Index
has been tested with only one file which is available
from the
Ordnance Survey
of Great-Britain
, a topography layer of Port-Talbot, which is available as
sample data
.
Well, in fact we will start by writing an abstract implementation first. As said
above, we have planned a modular JDBC spatial
Index
, which will
be abstract, and that will be extended by a concrete HSQLDB
Index
.
Let's start with this:
package org.exist.indexing.spatial; public abstract class AbstractGMLJDBCIndex extends AbstractIndex { public final static String ID = AbstractGMLJDBCIndex.class.getName(); private final static Logger LOG = Logger.getLogger(AbstractGMLJDBCIndex.class); protected HashMap workers = new HashMap(); protected Connection conn = null;Here we define an abstract class that extends
org.exist.indexing.AbstractIndex
and thus implementsorg.exist.indexing.Index
. We also define a few members likeID
that will be returned by the unoverridengetIndexId()
fromorg.exist.indexing.AbstractIndex
, aLogger
, ajava.util.HashMap
that will be a "pool" ofIndexWorker
s (one for eachorg.exist.storage.DBBroker
) and ajava.sql.Connection
that will handle the database operations at the index level.Let' now introduce this general purpose interface:
public interface SpatialOperator { public static int UNKNOWN = -1; public static int EQUALS = 1; public static int DISJOINT = 2; public static int INTERSECTS = 3; public static int TOUCHES = 4; public static int CROSSES = 5; public static int WITHIN = 6; public static int CONTAINS = 7; public static int OVERLAPS = 8;This defines the spatial operators that will be used by spatial queries. For more information about the semantics, see the JTS documentation (chapter 11). We will use this wonderful library everytime a spatial computation is required. So does the Geotools project by the way.
Here are a few concrete methods that should be usable by any JDBC-enabled database:
public AbstractGMLJDBCIndex() { public void configure(BrokerPool pool, String dataDir, Element config) throws DatabaseConfigurationException { super.configure(pool, dataDir, config); try { checkDatabase(); } catch (ClassNotFoundException e) { throw new DatabaseConfigurationException(e.getMessage()); } catch (SQLException e) { throw new DatabaseConfigurationException(e.getMessage()); public void open() throws DatabaseConfigurationException { public void close() throws DBException { Iterator i = workers.values().iterator(); while (i.hasNext()) { AbstractGMLJDBCIndexWorker worker = (AbstractGMLJDBCIndexWorker)i.next(); worker.flush(); worker.setDocument(null, StreamListener.UNKNOWN); shutdownDatabase(); public void sync() throws DBException { public void remove() throws DBException { Iterator i = workers.values().iterator(); while (i.hasNext()) { AbstractGMLJDBCIndexWorker worker = (AbstractGMLJDBCIndexWorker)i.next(); worker.flush(); worker.setDocument(null, StreamListener.UNKNOWN); removeIndexContent(); shutdownDatabase(); deleteDatabase(); public boolean checkIndex(DBBroker broker) { return getWorker(broker).checkIndex(broker);First, an empty constructor, not even necessary since the
Index
is created through thenewInstance()
method of its interface (see above).Then, a configuration method that calls its ancestor, whose behaviour fullfills our needs. This method calls a
checkDatabase()
method whose semantics will be dependant of the underlying DB. The basic idea is to prevent eXist to continue its initialization if there is a problem with the DB.Then we will do nothing during
open()
. No need to open a database, which is costly, if we dont need it.The
close()
will flush any pending operation currently queued by theIndexWorker
s and resets their state in order to prevent them to start any further operation, which should never be possible if eXist is their only user. Then it will call ashutdownDatabase()
method whose semantics will be dependant of the underlying DB. They can be fairly simple for DBs that shut down automatically when the virtual machine shuts down.The
sync()
is never called by eXist. It's here to make the interface happy.The
remove()
method is similar toclose()
. It then calls two database-dependant methods that are pretty redundant.deleteDatabase()
will probably not be able to do what its name suggests if eXist doesn't own the admin rights. Conversely,removeIndexContent()
wiould probably have nothing to do if eXist owns the admin rights since physically destroying the DB would probably be more efficient than deleteing table contents.
checkIndex()
will delegate the task to thebroker
'sIndexWorker
.The remaining methods are DB-dependant and thus abstract:
public abstract IndexWorker getWorker(DBBroker broker); protected abstract void checkDatabase() throws ClassNotFoundException, SQLException; protected abstract void shutdownDatabase() throws DBException; protected abstract void deleteDatabase() throws DBException; protected abstract void removeIndexContent() throws DBException; protected abstract Connection acquireConnection(DBBroker broker) throws SQLException; protected abstract void releaseConnection(DBBroker broker) throws SQLException;Let's see now how a HSQL-dependant implementation would be going by describing the concrete class:
package org.exist.indexing.spatial; public class GMLHSQLIndex extends AbstractGMLJDBCIndex { private final static Logger LOG = Logger.getLogger(GMLHSQLIndex.class); public static String db_file_name_prefix = "spatial_index"; public static String TABLE_NAME = "SPATIAL_INDEX_V1"; private DBBroker connectionOwner = null; private long connectionTimeout = 100000L; public GMLHSQLIndex() {Of course, we extend
org.exist.indexing.spatial.AbstractGMLJDBCIndex
, then a few members are defined: aLogger
, a file prefix (which will be required by the files required by HSQLDB storage, namelyspatial_index.lck
,spatial_index.log
,spatial_index.script
andspatial_index.properties
), then a table name in which the spatial index data will be stored, then a variable that will hold theorg.exist.storage.DBBroker
that currently holds a connection to the DB (we could have used anIndexWorker
here, given their 1:1 relationship). The problem is that we run HSQLDB in embedded mode and that only one connection is available at a given time.A more elaborated DBMS, or HSQLDB running in server mode would permit the allocation of one connection per
IndexWorker
, but we have chosen to keep things simple for now. Indeed, ifIndexWorker
s are thread-safe (because eachorg.exist.storage.DBBroker
operates within its own thread), a single connection will have to be controlled by theIndex
which is controlled by theorg.exist.storage.BrokerPool
. See below how we will handle concurrency, given such perequisites.The last member is the timeout when a
Connection
to the DB is requested.As we can see, we have an empty constructor again.
The next method calls its ancestor's
configure()
method and just retains the content of theconnectionTimeout
attribute as defined in the configuration file.public void configure(BrokerPool pool, String dataDir, Element config) throws DatabaseConfigurationException { super.configure(pool, dataDir, config); String param = ((Element)config).getAttribute("connectionTimeout"); if (param != null) { try { connectionTimeout = Long.parseLong(param); } catch (NumberFormatException e) { LOG.error("Invalid value for 'connectionTimeout'", e);The next method is also quite straightforward:
public IndexWorker getWorker(DBBroker broker) { GMLHSQLIndexWorker worker = (GMLHSQLIndexWorker)workers.get(broker); if (worker == null) { worker = new GMLHSQLIndexWorker(this, broker); workers.put(broker, worker); return worker;This picks an
IndexWorker
(more precisely aorg.exist.indexing.spatial.GMLHSQLIndexWorker
that will be described below) for the givenbroker
from the "pool". If needed, namely the first time the method is called with with parameter, it creates one. Notice that thisIndexWorker
is DB-dependant. It will be described below.Then come a few general-purpose methods:
protected void checkDatabase() throws ClassNotFoundException, SQLException { Class.forName("org.hsqldb.jdbcDriver"); protected void shutdownDatabase() throws DBException { try { if (conn != null) { Statement stmt = conn.createStatement(); stmt.executeQuery("SHUTDOWN"); stmt.close(); conn.close(); if (LOG.isDebugEnabled()) LOG.debug("GML index: " + getDataDir() + "/" + db_file_name_prefix + " closed"); } catch (SQLException e) { throw new DBException(e.getMessage()); } finally { conn = null; protected void deleteDatabase() throws DBException { File directory = new File(getDataDir()); File[] files = directory.listFiles( new FilenameFilter() { public boolean accept(File dir, String name) { return name.startsWith(db_file_name_prefix); boolean deleted = true; for (int i = 0; i < files.length ; i++) { deleted &= files[i].delete(); protected void removeIndexContent() throws DBException { try { //Let's be lazy here : we only delete th index content if we have a connection if (conn != null) { Statement stmt = conn.createStatement(); int nodeCount = stmt.executeUpdate("DELETE FROM " + GMLHSQLIndex.TABLE_NAME + ";"); stmt.close(); if (LOG.isDebugEnabled()) LOG.debug("GML index: " + getDataDir() + "/" + db_file_name_prefix + ". " + nodeCount + " nodes removed"); } catch (SQLException e) { throw new DBException(e.getMessage());
checkDatabase()
just checks that we have a suitable driver in the CLASSPATH. We don't want to open the database right now. It costs too much.
shutdownDatabase()
is just one of the many ways to shutdown a HSQLDB.
deleteDatabase()
is just a file system management problem ; remember that the database should be closed at that moment no file locking issues.
removeIndexContent()
deletes the table that contains spatial data. As explained above, this is less efficient than deleteing the whole databse though!The two next methods are totally JDBC-specific and, given the way they are implemented, are totally embedded HSQLDB-specific. The current code is directly adapted from
org.exist.storage.lock.ReentrantReadWriteLock
to show that connection management should be severely controlled given the concurrency context induced by using manyorg.exist.storage.DBBroker
. Despite the factDBBroker
s are thread-safe, access to shared storage must be concurrential, in particular whenflush()
is called.
org.exist.storage.index.BFile
users would callgetLock()
to acquire and release locks on the index files. Our solution is thus very similar.However, since most JDBC databases are able to work in a concurrential context, it would then be better to never call these
Index
-level methods from theIndexWorker
s and let eachIndexWorker
handle its connection to the underlying DB.protected Connection acquireConnection(DBBroker broker) throws SQLException { synchronized (this) { if (connectionOwner == null) { connectionOwner = broker; if (conn == null) initializeConnection(); return conn; } else { long waitTime = connectionTimeout; long waitTime = timeOut_; long start = System.currentTimeMillis(); try { for (;;) { wait(waitTime); if (connectionOwner == null) { connectionOwner = broker; if (conn == null) //We should never get there since the connection should have been initialized //by the first request from a worker initializeConnection(); return conn; } else { waitTime = timeOut_ - (System.currentTimeMillis() - start); if (waitTime <= 0) { LOG.error("Time out while trying to get connection"); } catch (InterruptedException ex) { notify(); throw new RuntimeException("interrupted while waiting for lock"); protected synchronized void releaseConnection(DBBroker broker) throws SQLException { if (connectionOwner == null) throw new SQLException("Attempted to release a connection that wasn't acquired"); connectionOwner = null;
acquireConnection()
acquires an exclusive JDBCConnection
to the storage engine for anIndexWorker
(or aorg.exist.storage.DBBroker
, which roughly means the same thing). This is where aConnection
is created if necessary (see below) and makes the first connection's performance cost due only when needed.
releaseConnection()
marks the connection as being unused. It will thus become available when requested again.The last method concentrates the index-level DB-dependant code in just one place (
removeIndexContent()
is relatively DB-independant).private void initializeConnection() throws SQLException { System.setProperty("hsqldb.cache_scale", "11"); System.setProperty("hsqldb.cache_size_scale", "12"); System.setProperty("hsqldb.default_table_type", "cached"); //Get a connection to the DB... and keep it this.conn = DriverManager.getConnection("jdbc:hsqldb:" + getDataDir() + "/" + db_file_name_prefix, "sa", ""); try { ResultSet rs = this.conn.getMetaData().getTables(null, null, TABLE_NAME, new String[] { "TABLE" }); rs.last(); if (rs.getRow() == 1) { if (LOG.isDebugEnabled()) LOG.debug("Opened GML index: " + getDataDir() + "/" + db_file_name_prefix); //Create the data structure if it doesn't exist } else if (rs.getRow() == 0) { Statement stmt = conn.createStatement(); stmt.executeUpdate("CREATE TABLE " + TABLE_NAME + "(" + /*1*/ "DOCUMENT_URI VARCHAR, " + /*2*/ "NODE_ID_UNITS INTEGER, " + /*3*/ "NODE_ID BINARY, " + /*26*/ "IS_VALID BOOLEAN, " + //Enforce uniqueness "UNIQUE (" + "DOCUMENT_URI, NODE_ID_UNITS, NODE_ID" + ")" + stmt.executeUpdate("CREATE INDEX DOCUMENT_URI ON " + TABLE_NAME + " (DOCUMENT_URI);"); stmt.close(); if (LOG.isDebugEnabled()) LOG.debug("Created GML index: " + getDataDir() + "/" + db_file_name_prefix); } else { throw new SQLException("2 tables with the same name ?"); } finally { if (rs != null) rs.close();This method opens a
Connection
and, if it is a new one (the new one since we only have one), checks that we have a SQL table for the spatial data. If not, i.e. if the spatial index doesn't exist yet, a table is created with the following structure:Uniqueness will be enforced on a
(DOCUMENT_URI, NODE_ID_UNITS, NODE_ID)
basis. Indeed, we can have at most one index entry for a given node in a given document.Also, indexes are created on these fields to help queries:
DOCUMENT_URI
NODE_ID
GEOMETRY_TYPE
EPSG4326_WKB
EPSG4326_MINX
EPSG4326_MAXX
EPSG4326_MINY
EPSG4326_MAXY
EPSG4326_CENTROID_X
EPSG4326_CENTROID_Y
Every geometry will be internally stored in both its original SRS and in the epsg:4326 SRS. Having this kind of common, world-wide applicable, SRS for all geometries in the index allows to make operations on them even if they are originally defined in different SRSes.
Important:
By default, eXist's build will download the lightweight
gt2-epsg-wkt-XXX.jar
library which lacks some parameters, the Bursa-Wolf ones. A better accuracy for geographic transformations might be obtained by using a heavier library likegt2-epsg-hsql-XXX.jar
which is documented here.Writing the concrete implementation of org.exist.indexing.IndexWorker
Just like for
org.exist.indexing.spatial.AbstractGMLJDBCIndex
, we will start to design a database-independant abstract class. This class should normally be the basis of every JDBC spatial index. It will handle most of the hard work.Let's start by a few members and a few general-purpose public methods:
package org.exist.indexing.spatial; public abstract class AbstractGMLJDBCIndexWorker implements IndexWorker { public static String GML_NS = "http://www.opengis.net/gml"; protected final static String INDEX_ELEMENT = "gml"; private static final Logger LOG = Logger.getLogger(AbstractGMLJDBCIndexWorker.class); protected IndexController controller; protected AbstractGMLJDBCIndex index; protected DBBroker broker; protected int currentMode = StreamListener.UNKNOWN; protected DocumentImpl currentDoc = null; private boolean isDocumentGMLAware = false; protected Map geometries = new TreeMap(); NodeId currentNodeId = null; Geometry streamedGeometry = null; boolean documentDeleted = false; int flushAfter = -1; protected GMLHandlerJTS geometryHandler = new GeometryHandler(); protected GMLFilterGeometry geometryFilter = new GMLFilterGeometry(geometryHandler); protected GMLFilterDocument geometryDocument = new GMLFilterDocument(geometryFilter); protected TreeMap transformations = new TreeMap(); protected boolean useLenientMode = false; protected GMLStreamListener gmlStreamListener = new GMLStreamListener(); protected GeometryCoordinateSequenceTransformer coordinateTransformer = new GeometryCoordinateSequenceTransformer(); protected GeometryTransformer gmlTransformer = new GeometryTransformer(); protected WKBWriter wkbWriter = new WKBWriter(); protected WKBReader wkbReader = new WKBReader(); protected WKTWriter wktWriter = new WKTWriter(); protected WKTReader wktReader = new WKTReader(); protected Base64Encoder base64Encoder = new Base64Encoder(); protected Base64Decoder base64Decoder = new Base64Decoder(); public AbstractGMLJDBCIndexWorker(AbstractGMLJDBCIndex index, DBBroker broker) { this.index = index; this.broker = broker; public String getIndexId() { return AbstractGMLJDBCIndex.ID; public String getIndexName() { return index.getIndexName(); public Index getIndex() { return index;Of course,
org.exist.indexing.spatial.AbstractGMLJDBCIndexWorker
implementsorg.exist.indexing.IndexWorker
.
GML_NS
is the GML namespace for which the spatial index is specially designed. Use this public member to avoid redundancy and, worse, inconsistencies.
INDEX_ELEMENT
is the configuration's element name which is accurate for ourIndex
configuration. To configure a collection in order to index its GML data, define such a configuration:<collection xmlns="http://exist-db.org/collection-config/1.0"> <index> <gml flushAfter="200"/> </index> </collection>
Got the
<gml>
element? We will shortly see how this information is able to configure ourIndexWorker
.
controller
,index
andbroker
should now be quite straightforward.
currentMode
andcurrentDoc
should also be straightforward.
geometries
is a collection ofcom.vividsolutions.jts.geom.Geometry
instances that are currently held in memory, waiting for being "flushed" to the database. Depending ofcurrentMode
, they're pending for insertion or removal.
currentNodeId
is used to share the ID of the node currently being processed between the different inner classes.
streamedGeometry
is the lastcom.vividsolutions.jts.geom.Geometry
that has been generated by GML parsing. It isnull
if the geometry is topologically not well formed. This latter case is maybe a too restrictive feature of Geotools parser which also throwsNullPointerException
s (!) if the GML is somehow not well-formed. See GEOT-742 for more information on this issue.
documentDeleted
is a flag indicating that the current document has been deleted and that we don't have to process it any more. Remember thatStreamListener.REMOVE_ALL_NODES
send some events for all nodes.
flushAfter
will hold our configuration's setting.
geometryHandler
is our GML geometries SAX handler that will convert GML to acom.vividsolutions.jts.geom.Geometry
instance. It is included in a handler chain composed ofgeometryFilter
andgeometryDocument
.
transforms
will cache a list a transformations between a source and a target SRS.
useLenientMode
will be set totrue
is the transformation libraries that are in the CLASSPATH don't have the Bursa-Wolf parameters. Transformations will be attempted, but with a precision loss (see above).
gmlStreamListener
is our own implementation oforg.exist.indexing.StreamListener
. Since there is a 1:1 (or even 1:0) relationship with theIndexWorker
, it will be implemented as an inner class and will be described below.
coordinateTransformer
will be needed duringGeometry
transformations to other SRSes.
gmlTransformer
will be needed duringGeometry
transformations to XML.
wkbWriter
andwkbReader
will be needed duringGeometry
serialization and deserialization to and from the database.
wktWriter
andwktReader
will be needed duringGeometry
WKT serialization and deserialization to and from the database. WKT could be dynamically generated fromGeometry
but we have chosen to store it in the HSQLDB.
base64Encoder
andbase64Decoder
will be needed to convert binary date, namely WKB, to XML types, namelyxs:base64Binary
.No need to comment the methods, expect maybe
getIndexId()
that will return the static ID of theIndex
. No chance to be wrong with such a design.The next method is a bit specific:
public Object configure(IndexController controller, NodeList configNodes, Map namespaces) throws DatabaseConfigurationException { this.controller = controller; Map map = null; for(int i = 0; i < configNodes.getLength(); i++) { Node node = configNodes.item(i); if (node.getNodeType() == Node.ELEMENT_NODE && INDEX_ELEMENT.equals(node.getLocalName())) { map = new TreeMap(); GMLIndexConfig config = new GMLIndexConfig(namespaces, (Element)node); map.put(AbstractGMLJDBCIndex.ID, config); return map;It is only interested in the
<gml>
element of the configuration. If it finds one, it creates aorg.exist.indexing.spatial.GMLIndexConfig
instance wich is a very simple class:package org.exist.indexing.spatial; public class GMLIndexConfig { private static final Logger LOG = Logger.getLogger(GMLIndexConfig.class); private final static String FLUSH_AFTER = "flushAfter"; private int flushAfter = -1; public GMLIndexConfig(Map namespaces, Element node) throws DatabaseConfigurationException { String param = ((Element)node).getAttribute(FLUSH_AFTER); if (param != null && !"".equals(param)) {) { try { flushAfter = Integer.parseInt(param); } catch (NumberFormatException e) { LOG.info("Invalid value for '" + FLUSH_AFTER + "'", e); public int getFlushAfter() { return flushAfter;This retains the configuration attribute and provides a getter for it.
This configuration object is saved in a Map with the
Index
ID and will be available as shown in the next method:public void setDocument(DocumentImpl document) { isDocumentGMLAware = false; documentDeleted= false; if (document != null) { IndexSpec idxConf = document.getCollection().getIndexConfiguration(document.getBroker()); if (idxConf != null) { Map collectionConfig = (Map) idxConf.getCustomIndexSpec(AbstractGMLJDBCIndex.ID); if (collectionConfig != null) { isDocumentGMLAware = true; if (collectionConfig.get(AbstractGMLJDBCIndex.ID) != null) flushAfter = ((GMLIndexConfig)collectionConfig. get(AbstractGMLJDBCIndex.ID)).getFlushAfter(); if (isDocumentGMLAware) { currentDoc = document; } else { currentDoc = null; currentMode = StreamListener.UNKNOWN;The objective is to determine if
document
should be indexed by the spatialIndex
.For this, we look up its collection configuration and try to find a "custom" index specification for our
Index
. If one is found, ourdocument
will be processed by theIndexWorker
. We also take advantage of this process to set one of our members. Ifdocument
doesn't interest ourIndexWorker
, we reset some members to avoid having an inconsistent sticky state.The next methods don't require any particular comment:
public void setDocument(DocumentImpl doc, int mode) { setDocument(doc); setMode(mode); public DocumentImpl getDocument() { return currentDoc; public int getMode() { return currentMode;The next method is somehow tricky:
public StreamListener getListener() { if (currentDoc == null || currentMode == StreamListener.REMOVE_ALL_NODES) return null; return gmlStreamListener;It doesn't return any
StreamListener
in theStreamListner.REMOVE_ALL_NODES
. It would be totally unnecessary to listen at every node whereas a JDBC database will be able to delete all the document's nodes in one single statement.The next method is a place holder that needs more thinking. How to highlight a geometric information smartly?
public MatchListener getMatchListener(NodeProxy proxy) { return null;The next method computes the reindexing root. We will go bottom-up form the not to be modified until the top-most element in the GML namespace. Indeed, GML allows "nested" or "multi" geometries. If a single part of such
Geometry
is modified, the whole geometry has to be recomputed.public StoredNode getReindexRoot(StoredNode node, NodePath path, boolean includeSelf) { if (!isDocumentGMLAware) //Not concerned return null; StoredNode topMost = node; StoredNode currentNode = node; for (int i = path.length() ; i > 0; i--) { currentNode = (StoredNode)currentNode.getParentNode(); if (GML_NS.equals(currentNode.getNamespaceURI())) topMost = currentNode; return topMost;The next method delegates the write operations:
public void flush() { if (!isDocumentGMLAware) //Not concerned return; //Is the job already done ? if (currentMode == StreamListener.REMOVE_ALL_NODES && documentDeleted) return; Connection conn = null; try { conn = acquireConnection(); conn.setAutoCommit(false); switch (currentMode) { case StreamListener.STORE : saveDocumentNodes(conn); break; case StreamListener.REMOVE_SOME_NODES : dropDocumentNode(conn); break; case StreamListener.REMOVE_ALL_NODES: removeDocument(conn); documentDeleted = true; break; conn.commit(); } catch (SQLException e) { LOG.error("Document: " + currentDoc + " NodeID: " + currentNodeId, e); try { conn.rollback(); } catch (SQLException ee) { LOG.error(ee); } finally { try { if (conn != null) { conn.setAutoCommit(true); releaseConnection(conn); } catch (SQLException e) { LOG.error(e);Even though its code looks thick, it proves to be a good way to acquire (then release) a
Connection
whatever the way it is provided by theIndexWorker
(see above for these aspects, concurrency in particular). It then delegates the write operations to dedicated methods, which do not have to care about theConnection
. Write operations are embedded in a transaction. Should an exception occur, it would be logged and swallowed: eXist doesn't like exceptions when it flushes its data.The next method delegates node storage:
private void saveDocumentNodes(Connection conn) throws SQLException { if (geometries.size() == 0) return; try { for (Iterator iterator = geometries.entrySet().iterator(); iterator.hasNext();) { Map.Entry entry = (Map.Entry) iterator.next(); NodeId nodeId = (NodeId)entry.getKey(); SRSGeometry srsGeometry = (SRSGeometry)entry.getValue(); try { saveGeometryNode(srsGeometry.getGeometry(), srsGeometry.getSRSName(), currentDoc, nodeId, conn); } finally { //Help the garbage collector srsGeometry = null; } finally { geometries.clear();It will call
saveGeometryNode()
(see below) passing a container inner class that will not be described given its simplicity.The next two methods are built with the same design. The first one destroys the index entry for the currently processed node and the second one removes the index entries for the whole document.
private void dropDocumentNode(Connection conn) throws SQLException { if (currentNodeId == null) return; try { boolean removed = removeDocumentNode(currentDoc, currentNodeId, conn); if (!removed) LOG.error("No data dropped for node " + currentNodeId.toString() + " from GML index"); else { if (LOG.isDebugEnabled()) LOG.debug("Dropped data for node " + currentNodeId.toString() + " from GML index"); } finally { currentNodeId = null; private void removeDocument(Connection conn) throws SQLException { if (LOG.isDebugEnabled()) LOG.debug("Dropping GML index for document " + currentDoc.getURI()); int nodeCount = removeDocument(currentDoc, conn); if (LOG.isDebugEnabled()) LOG.debug("Dropped " + nodeCount + " nodes from GML index");The next method is a mix of the designs described above. It also previously makes a check:
public void removeCollection(Collection collection, DBBroker broker) { boolean isCollectionGMLAware = false; IndexSpec idxConf = collection.getIndexConfiguration(broker); if (idxConf != null) { Map collectionConfig = (Map) idxConf.getCustomIndexSpec(AbstractGMLJDBCIndex.ID); if (collectionConfig != null) { isCollectionGMLAware = (collectionConfig != null); if (!isCollectionGMLAware) return; Connection conn = null; try { conn = acquireConnection(); if (LOG.isDebugEnabled()) LOG.debug("Dropping GML index for collection " + collection.getURI()); int nodeCount = removeCollection(collection, conn); if (LOG.isDebugEnabled()) LOG.debug("Dropped " + nodeCount + " nodes from GML index"); } catch (SQLException e) { LOG.error(e); } finally { try { if (conn != null) releaseConnection(conn); } catch (SQLException e) { LOG.error(e);Indeed, we have to check if the collection is indexable by the
Index
before trying to delete its index entries.The next methods are built on the same design (
Collection
and exception management) and will thus not be described.public NodeSet search(DBBroker broker, NodeSet contextSet, Geometry EPSG4326_geometry, int spatialOp) public Geometry getGeometryForNode(DBBroker broker, NodeProxy p, boolean getEPSG4326) throws SpatialIndexException { protected Geometry[] getGeometriesForNodes(DBBroker broker, NodeSet contextSet, boolean getEPSG4326) throws SpatialIndexException { public AtomicValue getGeometricPropertyForNode(DBBroker broker, NodeProxy p, String propertyName) throws SpatialIndexException { public ValueSequence getGeometricPropertyForNodes(DBBroker broker, NodeSet contextSet, String propertyName) throws SpatialIndexException {All these methods delegate to the following abstract methods that will have to be implemented by the DB-dependant concrete classes:
protected abstract boolean saveGeometryNode(Geometry geometry, String srsName, DocumentImpl doc, NodeId nodeId, Connection conn) throws SQLException; protected abstract boolean removeDocumentNode(DocumentImpl doc, NodeId nodeID, Connection conn) throws SQLException; protected abstract int removeDocument(DocumentImpl doc, Connection conn) throws SQLException; protected abstract int removeCollection(Collection collection, Connection conn) throws SQLException; protected abstract Connection acquireConnection() throws SQLException; protected abstract void releaseConnection(Connection conn) throws SQLException; protected abstract NodeSet search(DBBroker broker, NodeSet contextSet, Geometry EPSG4326_geometry, int spatialOp, Connection conn) throws SQLException; protected abstract Map getGeometriesForDocument(DocumentImpl doc, Connection conn) throws SQLException; protected abstract AtomicValue getGeometricPropertyForNode(DBBroker broker, NodeProxy p, Connection conn, String propertyName) throws SQLException, XPathException; protected abstract ValueSequence getGeometricPropertyForNodes(DBBroker broker, NodeSet contextSet, Connection conn, String propertyName) throws SQLException, XPathException; protected abstract Geometry getGeometryForNode(DBBroker broker, NodeProxy p, boolean getEPSG4326, Connection conn) throws SQLException; protected abstract Geometry[] getGeometriesForNodes(DBBroker broker, NodeSet contextSet, boolean getEPSG4326, Connection conn) throws SQLException; protected abstract boolean checkIndex(DBBroker broker, Connection conn) throws SQLException, SpatialIndexException;Let's have a look however at this method that doesn't need a DB-dependant implementation:
public Occurrences[] scanIndex(DocumentSet docs) { Map occurences = new TreeMap(); Connection conn = null; try { conn = acquireConnection(); //Collect the (normalized) geometries for each document for (Iterator iDoc = docs.iterator(); iDoc.hasNext();) { DocumentImpl doc = (DocumentImpl)iDoc.next(); //TODO : check if document is GML-aware ? //Aggregate the occurences between different documents for (Iterator iGeom = getGeometriesForDocument(doc, conn).entrySet().iterator(); iGeom.hasNext();) { Map.Entry entry = (Map.Entry) iGeom.next(); Geometry key = (Geometry)entry.getKey(); //Do we already have an occurence for this geometry ? Occurrences oc = (Occurrences)occurences.get(key); if (oc != null) { //Yes : increment occurence count oc.addOccurrences(oc.getOccurrences() + 1); //...and reference the document oc.addDocument(doc); } else { //No : create a new occurence with EPSG4326_WKT as "term" oc = new Occurrences((String)entry.getValue()); //... with a count set to 1 oc.addOccurrences(1); //... and reference the document oc.addDocument(doc); occurences.put(key, oc); } catch (SQLException e) { LOG.error(e); return null; } finally { try { if (conn != null) releaseConnection(conn); } catch (SQLException e) { LOG.error(e); return null; Occurrences[] result = new Occurrences[occurences.size()]; occurences.values().toArray(result); return result;Same design (
Collection
and exception management, delegation mechanism). We probably will add more like this in the future.The following methods are utility methods to stream
Geometry
instances to XML and vice-versa.public Geometry streamNodeToGeometry(XQueryContext context, NodeValue node) throws SpatialIndexException { try { context.pushDocumentContext(); try { node.toSAX(context.getBroker(), geometryDocument, null); } finally { context.popDocumentContext(); } catch (SAXException e) { throw new SpatialIndexException(e); return streamedGeometry; public Element streamGeometryToElement(Geometry geometry, String srsName, Receiver receiver) throws SpatialIndexException { String gmlString = null; try { gmlString = gmlTransformer.transform(geometry); } catch (TransformerException e) { throw new SpatialIndexException(e); try { SAXParserFactory factory = SAXParserFactory.newInstance(); factory.setNamespaceAware(true); InputSource src = new InputSource(new StringReader(gmlString)); SAXParser parser = factory.newSAXParser(); XMLReader reader = parser.getXMLReader(); reader.setContentHandler((ContentHandler)receiver); reader.parse(src); Document doc = receiver.getDocument(); return doc.getDocumentElement(); } catch (ParserConfigurationException e) { throw new SpatialIndexException(e); } catch (SAXException e) { throw new SpatialIndexException(e); } catch (IOException e) { throw new SpatialIndexException(e);The first one uses a
org.geotools.gml.GMLFilterDocument
(see below) and the second one uses aorg.geotools.gml.producer.GeometryTransformer
which needs some polishing because, despite it is called a transformer, it doesn't cope easily with aHandler
and returns a...String
! See GEOT-1315.The last method is also a utility method:
public Geometry transformGeometry(Geometry geometry, String sourceCRS, String targetCRS) throws SpatialIndexException { if ("osgb:BNG".equalsIgnoreCase(sourceCRS.trim())) sourceCRS = "EPSG:27700"; if ("osgb:BNG".equalsIgnoreCase(targetCRS.trim())) targetCRS = "EPSG:27700"; MathTransform transform = (MathTransform)transformations.get(sourceCRS + "_" + targetCRS); if (transform == null) { try { try { transform = CRS.findMathTransform(CRS.decode(sourceCRS), CRS.decode(targetCRS), useLenientMode); } catch (OperationNotFoundException e) { LOG.info(e); LOG.info("Switching to lenient mode... beware of precision loss !"); useLenientMode = true; transform = CRS.findMathTransform(CRS.decode(sourceCRS), CRS.decode(targetCRS), useLenientMode); transformations.put(sourceCRS + "_" + targetCRS, transform); LOG.debug("Instantiated transformation from '" + sourceCRS + "' to '" + targetCRS + "'"); } catch (NoSuchAuthorityCodeException e) { LOG.error(e); } catch (FactoryException e) { LOG.error(e); if (transform == null) { throw new SpatialIndexException("Unable to get a transformation from '" + sourceCRS + "' to '" + targetCRS +"'"); coordinateTransformer.setMathTransform(transform); try { return coordinateTransformer.transform(geometry); } catch (TransformException e) { throw new SpatialIndexException(e);It implements a workaround for our test file SRS which isn't yet known by Geotools libraries (see GEOT-1307), then it tries to get the transformation from our cache. If it doesn't succeed, it tries to find one in the libraries that are in the CLASSPATH. Should those libraries lack the Bursa-Wolf parameters, it will make another attempt in lenient mode, which will induce a loss of accuracy. Then, it transforms the
Geometry
from itssourceCRS
to the requiredtargetCRS
.Now, let's study how the abstract methods are implement by the HSQLDB-dependant class:
package org.exist.indexing.spatial; public class GMLHSQLIndexWorker extends AbstractGMLJDBCIndexWorker { private static final Logger LOG = Logger.getLogger(GMLHSQLIndexWorker.class); public GMLHSQLIndexWorker(GMLHSQLIndex index, DBBroker broker) { super(index, broker);The only noticeable point is that we indeed extend our
org.exist.indexing.spatial.AbstractGMLJDBCIndexWorker
Now, this method will do something more interesting, store the
Geometry
associated to a node:protected boolean saveGeometryNode(Geometry geometry, String srsName, DocumentImpl doc, NodeId nodeId, Connection conn) throws SQLException { PreparedStatement ps = conn.prepareStatement("INSERT INTO " + GMLHSQLIndex.TABLE_NAME + "(" + /*1*/ "DOCUMENT_URI, " + /*2*/ "NODE_ID_UNITS, " + /*3*/ "NODE_ID, " + /*4*/ "GEOMETRY_TYPE, " + /*5*/ "SRS_NAME, " + /*6*/ "WKT, " + /*7*/ "WKB, " + /*8*/ "MINX, " + /*9*/ "MAXX, " + /*10*/ "MINY, " + /*11*/ "MAXY, " + /*12*/ "CENTROID_X, " + /*13*/ "CENTROID_Y, " + /*14*/ "AREA, " + //Boundary ? /*15*/ "EPSG4326_WKT, " + /*16*/ "EPSG4326_WKB, " + /*17*/ "EPSG4326_MINX, " + /*18*/ "EPSG4326_MAXX, " + /*19*/ "EPSG4326_MINY, " + /*20*/ "EPSG4326_MAXY, " + /*21*/ "EPSG4326_CENTROID_X, " + /*22*/ "EPSG4326_CENTROID_Y, " + /*23*/ "EPSG4326_AREA," + //Boundary ? /*24*/ "IS_CLOSED, " + /*25*/ "IS_SIMPLE, " + /*26*/ "IS_VALID" + ") VALUES (" + "?, ?, ?, ?, ?, " + "?, ?, ?, ?, ?, " + "?, ?, ?, ?, ?, " + "?, ?, ?, ?, ?, " + "?, ?, ?, ?, ?, " + + ")" try { Geometry EPSG4326_geometry = null; try { EPSG4326_geometry = transformGeometry(geometry, srsName, "EPSG:4326"); } catch (SpatialIndexException e) { //Transforms the exception into an SQLException. SQLException ee = new SQLException(e.getMessage()); ee.initCause(e); throw ee; /*DOCUMENT_URI*/ ps.setString(1, doc.getURI().toString()); /*NODE_ID_UNITS*/ ps.setInt(2, nodeId.units()); byte[] bytes = new byte[nodeId.size()]; nodeId.serialize(bytes, 0); /*NODE_ID*/ ps.setBytes(3, bytes); /*GEOMETRY_TYPE*/ ps.setString(4, geometry.getGeometryType()); /*SRS_NAME*/ ps.setString(5, srsName); /*WKT*/ ps.setString(6, wktWriter.write(geometry)); /*WKB*/ ps.setBytes(7, wkbWriter.write(geometry)); /*MINX*/ ps.setDouble(8, geometry.getEnvelopeInternal().getMinX()); /*MAXX*/ ps.setDouble(9, geometry.getEnvelopeInternal().getMaxX()); /*MINY*/ ps.setDouble(10, geometry.getEnvelopeInternal().getMinY()); /*MAXY*/ ps.setDouble(11, geometry.getEnvelopeInternal().getMaxY()); /*CENTROID_X*/ ps.setDouble(12, geometry.getCentroid().getCoordinate().x); /*CENTROID_Y*/ ps.setDouble(13, geometry.getCentroid().getCoordinate().y); /*AREA*/ ps.setDouble(14, geometry.getArea()); /*EPSG4326_WKT*/ ps.setString(15, wktWriter.write(EPSG4326_geometry)); /*EPSG4326_WKB*/ ps.setBytes(16, wkbWriter.write(EPSG4326_geometry)); /*EPSG4326_MINX*/ ps.setDouble(17, EPSG4326_geometry.getEnvelopeInternal().getMinX()); /*EPSG4326_MAXX*/ ps.setDouble(18, EPSG4326_geometry.getEnvelopeInternal().getMaxX()); /*EPSG4326_MINY*/ ps.setDouble(19, EPSG4326_geometry.getEnvelopeInternal().getMinY()); /*EPSG4326_MAXY*/ ps.setDouble(20, EPSG4326_geometry.getEnvelopeInternal().getMaxY()); /*EPSG4326_CENTROID_X*/ ps.setDouble(21, EPSG4326_geometry.getCentroid().getCoordinate().x); /*EPSG4326_CENTROID_Y*/ ps.setDouble(22, EPSG4326_geometry.getCentroid().getCoordinate().y); //EPSG4326_geometry.getRepresentativePoint() /*EPSG4326_AREA*/ ps.setDouble(23, EPSG4326_geometry.getArea()); //For empty Curves, isClosed is defined to have the value false. /*IS_CLOSED*/ ps.setBoolean(24, !geometry.isEmpty()); /*IS_SIMPLE*/ ps.setBoolean(25, geometry.isSimple()); //Should always be true (the GML SAX parser makes a too severe check) /*IS_VALID*/ ps.setBoolean(26, geometry.isValid()); return (ps.executeUpdate() == 1); } finally { if (ps != null) ps.close(); //Let's help the garbage collector... geometry = null;The generated SQL statement should be straightforward. We make a heavy use of the methods provided by
com.vividsolutions.jts.geom.Geometry
, both on the "native"Geometry
and on its EPSG:4326 transformation. Would could probably store other properties here (like, e.g. the geometry's boundary). OtherIndexWorker
s, especially those accessing a spatially-enabled DBMS, might prefer to store fewer properties if they can be computed dynamically at a cheap price.The next method is even much easier to understand:
protected boolean removeDocumentNode(DocumentImpl doc, NodeId nodeId, Connection conn) throws SQLException { PreparedStatement ps = conn.prepareStatement( "DELETE FROM " + GMLHSQLIndex.TABLE_NAME + " WHERE DOCUMENT_URI = ? AND NODE_ID_UNITS = ? AND NODE_ID = ?;" ps.setString(1, doc.getURI().toString()); ps.setInt(2, nodeId.units()); byte[] bytes = new byte[nodeId.size()]; nodeId.serialize(bytes, 0); ps.setBytes(3, bytes); try { return (ps.executeUpdate() == 1); } finally { if (ps != null) ps.close();This one even more so:
protected int removeDocument(DocumentImpl doc, Connection conn) throws SQLException { PreparedStatement ps = conn.prepareStatement( "DELETE FROM " + GMLHSQLIndex.TABLE_NAME + " WHERE DOCUMENT_URI = ?;" ps.setString(1, doc.getURI().toString()); try { return ps.executeUpdate(); } finally { if (ps != null) ps.close();This one however, is a little bit trickier:
protected int removeCollection(Collection collection, Connection conn) throws SQLException { PreparedStatement ps = conn.prepareStatement( "DELETE FROM " + GMLHSQLIndex.TABLE_NAME + " WHERE SUBSTRING(DOCUMENT_URI, 1, ?) = ?;" ps.setInt(1, collection.getURI().toString().length()); ps.setString(2, collection.getURI().toString()); try { return ps.executeUpdate(); } finally { if (ps != null) ps.close();This is because it makes use of a SQL function to filter the correct documents.
The two next methods are straightforward, now that we have explained that
Connection
s had to be requested from theIndex
to avoid concurrency problems on an embedded HSQLDB instance.protected Connection acquireConnection() throws SQLException { return index.acquireConnection(this.broker); protected void releaseConnection(Connection conn) throws SQLException { index.releaseConnection(this.broker);The next method is much more interesting. This is where is the core of the spatial index is:
protected NodeSet search(DBBroker broker, NodeSet contextSet, Geometry EPSG4326_geometry, int spatialOp, Connection conn) throws SQLException { String extraSelection = null; String bboxConstraint = null; switch (spatialOp) { //BBoxes are equal case SpatialOperator.EQUALS: bboxConstraint = "(EPSG4326_MINX = ? AND EPSG4326_MAXX = ?)" + " AND (EPSG4326_MINY = ? AND EPSG4326_MAXY = ?)"; break; //Nothing much we can do with the BBox at this stage case SpatialOperator.DISJOINT: //Retrieve the BBox though... extraSelection = ", EPSG4326_MINX, EPSG4326_MAXX, EPSG4326_MINY, EPSG4326_MAXY"; break; //BBoxes intersect themselves case SpatialOperator.INTERSECTS: case SpatialOperator.TOUCHES: case SpatialOperator.CROSSES: case SpatialOperator.OVERLAPS: bboxConstraint = "(EPSG4326_MAXX >= ? AND EPSG4326_MINX <= ?)" + " AND (EPSG4326_MAXY >= ? AND EPSG4326_MINY <= ?)"; break; //BBox is fully within case SpatialOperator.WITHIN: bboxConstraint = "(EPSG4326_MINX >= ? AND EPSG4326_MAXX <= ?)" + " AND (EPSG4326_MINY >= ? AND EPSG4326_MAXY <= ?)"; break; //BBox fully contains case SpatialOperator.CONTAINS: bboxConstraint = "(EPSG4326_MINX <= ? AND EPSG4326_MAXX >= ?)" + " AND (EPSG4326_MINY <= ? AND EPSG4326_MAXY >= ?)"; break; default: throw new IllegalArgumentException("Unsupported spatial operator:" + spatialOp); PreparedStatement ps = conn.prepareStatement( "SELECT EPSG4326_WKB, DOCUMENT_URI, NODE_ID_UNITS, NODE_ID" + (extraSelection == null ? "" : extraSelection) + " FROM " + GMLHSQLIndex.TABLE_NAME + (bboxConstraint == null ? "" : " WHERE " + bboxConstraint) + ";" if (bboxConstraint != null) { ps.setDouble(1, EPSG4326_geometry.getEnvelopeInternal().getMinX()); ps.setDouble(2, EPSG4326_geometry.getEnvelopeInternal().getMaxX()); ps.setDouble(3, EPSG4326_geometry.getEnvelopeInternal().getMinY()); ps.setDouble(4, EPSG4326_geometry.getEnvelopeInternal().getMaxY()); ResultSet rs = null; NodeSet result = null; try { int disjointPostFiltered = 0; rs = ps.executeQuery(); result = new ExtArrayNodeSet(); //new ExtArrayNodeSet(docs.getLength(), 250) while (rs.next()) { DocumentImpl doc = null; try { doc = (DocumentImpl)broker.getXMLResource(XmldbURI.create(rs.getString("DOCUMENT_URI"))); } catch (PermissionDeniedException e) { LOG.debug(e); //Ignore since the broker has no right on the document continue; //contextSet == null should be use to scan the whole index if (contextSet == null || contextSet.getDocumentSet().contains(doc.getDocId())) { NodeId nodeId = new DLN(rs.getInt("NODE_ID_UNITS"), rs.getBytes("NODE_ID"), 0); NodeProxy p = new NodeProxy((DocumentImpl)doc, nodeId); if (contextSet == null || contextSet.get(p) != null) { boolean geometryMatches = false; if (spatialOp == SpatialOperator.DISJOINT) { //No BBox intersection : obviously disjoint if (rs.getDouble("EPSG4326_MAXX") < EPSG4326_geometry.getEnvelopeInternal().getMinX() || rs.getDouble("EPSG4326_MINX") > EPSG4326_geometry.getEnvelopeInternal().getMaxX() || rs.getDouble("EPSG4326_MAXY") < EPSG4326_geometry.getEnvelopeInternal().getMinY() || rs.getDouble("EPSG4326_MINY") > EPSG4326_geometry.getEnvelopeInternal().getMaxY()) { geometryMatches = true; disjointPostFiltered++; //Possible match : check the geometry if (!geometryMatches) { try { Geometry geometry = wkbReader.read(rs.getBytes("EPSG4326_WKB")); switch (spatialOp) { case SpatialOperator.EQUALS: geometryMatches = geometry.equals(EPSG4326_geometry); break; case SpatialOperator.DISJOINT: geometryMatches = geometry.disjoint(EPSG4326_geometry); break; case SpatialOperator.INTERSECTS: geometryMatches = geometry.intersects(EPSG4326_geometry); break; case SpatialOperator.TOUCHES: geometryMatches = geometry.touches(EPSG4326_geometry); break; case SpatialOperator.CROSSES: geometryMatches = geometry.crosses(EPSG4326_geometry); break; case SpatialOperator.WITHIN: geometryMatches = geometry.within(EPSG4326_geometry); break; case SpatialOperator.CONTAINS: geometryMatches = geometry.contains(EPSG4326_geometry); break; case SpatialOperator.OVERLAPS: geometryMatches = geometry.overlaps(EPSG4326_geometry); break; } catch (ParseException e) { //Transforms the exception into an SQLException. //Very unlikely to happen though... SQLException ee = new SQLException(e.getMessage()); ee.initCause(e); throw ee; if (geometryMatches) result.add(p); if (LOG.isDebugEnabled()) { LOG.debug(rs.getRow() + " eligible geometries, " + result.getItemCount() + "selected" + (spatialOp == SpatialOperator.DISJOINT ? "(" + disjointPostFiltered + " post filtered)" : "")); return result; } finally { if (rs != null) rs.close(); if (ps != null) ps.close();The trick is to filter the geometries on (fast) BBox operations first (intersecting geometries have BBox intersecting as well) which is possible in every case but for the
Spatial.DISJOINT
operator. For the latter case, we will have to fetch the BBox coordinates in order to apply a further filtering. Then, we examine the results and filter out the documents that are not in thecontextSet
.Spatial.DISJOINT
filtering is then applied to avoid the next step in case the BBoxes are themselves disjoint. Only then, we perform the costly operations, namelyGeometry
deserialization from the DB then performing spatial operations on it. Matching nodes are then returned.The next method is quite straightforward:
protected Map getGeometriesForDocument(DocumentImpl doc, Connection conn) throws SQLException { PreparedStatement ps = conn.prepareStatement( "SELECT EPSG4326_WKB, EPSG4326_WKT FROM " + GMLHSQLIndex.TABLE_NAME + " WHERE DOCUMENT_URI = ?;" ps.setString(1, doc.getURI().toString()); ResultSet rs = null; try { rs = ps.executeQuery(); Map map = new TreeMap(); while (rs.next()) { Geometry EPSG4326_geometry = wkbReader.read(rs.getBytes("EPSG4326_WKB")); //Returns the EPSG:4326 WKT for every geometry to make occurrence aggregation consistent map.put(EPSG4326_geometry, rs.getString("EPSG4326_WKT")); return map; } catch (ParseException e) { //Transforms the exception into an SQLException. //Very unlikely to happen though... SQLException ee = new SQLException(e.getMessage()); ee.initCause(e); throw ee; } finally { if (rs != null) rs.close(); if (ps != null) ps.close();Notice that it will return EPSG:4326
Geometry
ies and that it will rethrow acom.vividsolutions.jts.io.ParseException
as ajava.sql.SQLException
.The next method is a bit more restrictive and modular:
protected Geometry getGeometryForNode(DBBroker broker, NodeProxy p, boolean getEPSG4326, Connection conn) throws SQLException { PreparedStatement ps = conn.prepareStatement( "SELECT " + (getEPSG4326 ? "EPSG4326_WKB" : "WKB") + " FROM " + GMLHSQLIndex.TABLE_NAME + " WHERE DOCUMENT_URI = ? AND NODE_ID_UNITS = ? AND NODE_ID = ?;" ps.setString(1, p.getDocument().getURI().toString()); ps.setInt(2, p.getNodeId().units()); byte[] bytes = new byte[p.getNodeId().size()]; p.getNodeId().serialize(bytes, 0); ps.setBytes(3, bytes); ResultSet rs = null; try { rs = ps.executeQuery(); if (!rs.next()) //Nothing returned return null; Geometry geometry = wkbReader.read(rs.getBytes(1)); if (rs.next()) { //Should be impossible throw new SQLException("More than one geometry for node " + p); return geometry; } catch (ParseException e) { //Transforms the exception into an SQLException. //Very unlikely to happen though... SQLException ee = new SQLException(e.getMessage()); ee.initCause(e); throw ee; } finally { if (rs != null) rs.close(); if (ps != null) ps.close();It directly selects the right node and allows to return either the original
Geometry
, either its EPSG:4326 transformation.The next method is a generalization of the previous one:
protected Geometry[] getGeometriesForNodes(DBBroker broker, NodeSet contextSet, boolean getEPSG4326, Connection conn) throws SQLException { PreparedStatement ps = conn.prepareStatement( "SELECT " + (getEPSG4326 ? "EPSG4326_WKB" : "WKB") + ", DOCUMENT_URI, NODE_ID_UNITS, NODE_ID" + " FROM " + GMLHSQLIndex.TABLE_NAME ResultSet rs = null; try { rs = ps.executeQuery(); Geometry[] result = new Geometry[contextSet.getLength()]; int index= 0; while (rs.next()) { DocumentImpl doc = null; try { doc = (DocumentImpl)broker.getXMLResource(XmldbURI.create(rs.getString("DOCUMENT_URI"))); } catch (PermissionDeniedException e) { LOG.debug(e); result[index++] = null; //Ignore since the broker has no right on the document continue; if (contextSet.getDocumentSet().contains(doc.getDocId())) { NodeId nodeId = new DLN(rs.getInt("NODE_ID_UNITS"), rs.getBytes("NODE_ID"), 0); NodeProxy p = new NodeProxy((DocumentImpl)doc, nodeId); if (contextSet.get(p) != null) { Geometry geometry = wkbReader.read(rs.getBytes(1)); result[index++] = geometry; return result; } catch (ParseException e) { //Transforms the exception into an SQLException. //Very unlikely to happen though... SQLException ee = new SQLException(e.getMessage()); ee.initCause(e); throw ee; } finally { if (rs != null) rs.close(); if (ps != null) ps.close();It queries the whole index for the requested
Geometry
, ignoring the documents that are not in thecontextSet
, and it also ignores the nodes that are not in thecontextSet
. After that theGeometry
is deserialized.This method is not yet used by the spatial functions but it is planned to use it in a future optimization effort.
This is the next method, designed like
getGeometryForNode()
:protected AtomicValue getGeometricPropertyForNode(DBBroker broker, NodeProxy p, Connection conn, String propertyName) throws SQLException, XPathException { PreparedStatement ps = conn.prepareStatement( "SELECT " + propertyName + " FROM " + GMLHSQLIndex.TABLE_NAME + " WHERE DOCUMENT_URI = ? AND NODE_ID_UNITS = ? AND NODE_ID = ?" ps.setString(1, p.getDocument().getURI().toString()); ps.setInt(2, p.getNodeId().units()); byte[] bytes = new byte[p.getNodeId().size()]; p.getNodeId().serialize(bytes, 0); ps.setBytes(3, bytes); ResultSet rs = null; try { rs = ps.executeQuery(); if (!rs.next()) //Nothing returned return AtomicValue.EMPTY_VALUE; AtomicValue result = null; if (rs.getMetaData().getColumnClassName(1).equals(Boolean.class.getName())) { result = new BooleanValue(rs.getBoolean(1)); } else if (rs.getMetaData().getColumnClassName(1).equals(Double.class.getName())) { result = new DoubleValue(rs.getDouble(1)); } else if (rs.getMetaData().getColumnClassName(1).equals(String.class.getName())) { result = new StringValue(rs.getString(1)); } else if (rs.getMetaData().getColumnType(1) == java.sql.Types.BINARY) { result = new Base64Binary(rs.getBytes(1)); } else throw new SQLException("Unable to make an atomic value from '" + rs.getMetaData().getColumnClassName(1) + "'"); if (rs.next()) { //Should be impossible throw new SQLException("More than one geometry for node " + p); return result; } finally { if (rs != null) rs.close(); if (ps != null) ps.close();It directly requests the required property from the DB and returns an appropriate XML atomic value.
The next method is a generalization of the previous one:
protected ValueSequence getGeometricPropertyForNodes(DBBroker broker, NodeSet contextSet, Connection conn, String propertyName) throws SQLException, XPathException { PreparedStatement ps = conn.prepareStatement( "SELECT " + propertyName + ", DOCUMENT_URI, NODE_ID_UNITS, NODE_ID" + " FROM " + GMLHSQLIndex.TABLE_NAME ResultSet rs = null; try { rs = ps.executeQuery(); ValueSequence result = new ValueSequence(contextSet.getLength()); while (rs.next()) { DocumentImpl doc = null; try { doc = (DocumentImpl)broker.getXMLResource(XmldbURI.create(rs.getString("DOCUMENT_URI"))); } catch (PermissionDeniedException e) { LOG.debug(e); if (rs.getMetaData().getColumnClassName(1).equals(Boolean.class.getName())) { result.add(BooleanValue.EMPTY_VALUE); } else if (rs.getMetaData().getColumnClassName(1).equals(Double.class.getName())) { result.add(DoubleValue.EMPTY_VALUE); } else if (rs.getMetaData().getColumnClassName(1).equals(String.class.getName())) { result.add(StringValue.EMPTY_VALUE); } else if (rs.getMetaData().getColumnType(1) == java.sql.Types.BINARY) { result.add(Base64Binary.EMPTY_VALUE); } else throw new SQLException("Unable to make an atomic value from '" + rs.getMetaData().getColumnClassName(1) + "'"); //Ignore since the broker has no right on the document continue; if (contextSet.getDocumentSet().contains(doc.getDocId())) { NodeId nodeId = new DLN(rs.getInt("NODE_ID_UNITS"), rs.getBytes("NODE_ID"), 0); NodeProxy p = new NodeProxy((DocumentImpl)doc, nodeId); if (contextSet.get(p) != null) { if (rs.getMetaData().getColumnClassName(1).equals(Boolean.class.getName())) { result.add(new BooleanValue(rs.getBoolean(1))); } else if (rs.getMetaData().getColumnClassName(1).equals(Double.class.getName())) { result.add(new DoubleValue(rs.getDouble(1))); } else if (rs.getMetaData().getColumnClassName(1).equals(String.class.getName())) { result.add(new StringValue(rs.getString(1))); } else if (rs.getMetaData().getColumnType(1) == java.sql.Types.BINARY) { result.add(new Base64Binary(rs.getBytes(1))); } else throw new SQLException("Unable to make an atomic value from '" + rs.getMetaData().getColumnClassName(1) + "'"); return result; } finally { if (rs != null) rs.close(); if (ps != null) ps.close();It queries the whole index for the requested property, ignoring the documents that are not in the
contextSet
, and it also ignores the nodes that are not in thecontextSet
. Finally the property mapped to the appropriate XML atomic value is returned.This method is not yet used by the spatial functions but it is planned to use it in a future optimization effort.
The last method is a utility method and we will only show a part of its body:
protected boolean checkIndex(DBBroker broker, Connection conn) throws SQLException { PreparedStatement ps = conn.prepareStatement( "SELECT * FROM " + GMLHSQLIndex.TABLE_NAME + ";" ResultSet rs = null; try { rs = ps.executeQuery(); while (rs.next()) { Geometry original_geometry = wkbReader.read(rs.getBytes("WKB")); if (!original_geometry.equals(wktReader.read(rs.getString("WKT")))) { LOG.info("Inconsistent WKT : " + rs.getString("WKT")); return false; Geometry EPSG4326_geometry = wkbReader.read(rs.getBytes("EPSG4326_WKB")); if (!EPSG4326_geometry.equals(wktReader.read(rs.getString("EPSG4326_WKT")))) { LOG.info("Inconsistent WKT : " + rs.getString("EPSG4326_WKT")); return false; if (!original_geometry.getGeometryType().equals(rs.getString("GEOMETRY_TYPE"))) { LOG.info("Inconsistent geometry type: " + rs.getDouble("GEOMETRY_TYPE")); return false; if (original_geometry.getEnvelopeInternal().getMinX() != rs.getDouble("MINX")) { LOG.info("Inconsistent MinX: " + rs.getDouble("MINX")); return false; DocumentImpl doc = null; try { doc = (DocumentImpl)broker.getXMLResource(XmldbURI.create(rs.getString("DOCUMENT_URI"))); } catch (PermissionDeniedException e) { //The broker has no right on the document LOG.error(e); return false; NodeId nodeId = new DLN(rs.getInt("NODE_ID_UNITS"), rs.getBytes("NODE_ID"), 0); StoredNode node = broker.objectWith(new NodeProxy((DocumentImpl)doc, nodeId)); if (node == null) { LOG.info("Node " + nodeId + "doesn't exist"); return false; if (!GMLHSQLIndexWorker.GML_NS.equals(node.getNamespaceURI())) { LOG.info("GML indexed node (" + node.getNodeId()+ ") is in the '" + node.getNamespaceURI() + "' namespace. '" + GMLHSQLIndexWorker.GML_NS + "' was expected !"); return false; if (!original_geometry.getGeometryType().equals(node.getLocalName())) { if ("Box".equals(node.getLocalName()) && "Polygon".equals(original_geometry.getGeometryType())) { LOG.debug("GML indexed node (" + node.getNodeId() + ") is a gml:Box indexed as a polygon"); } else { LOG.info("GML indexed node (" + node.getNodeId() + ") has '" + node.getLocalName() + "' as its local name. '" + original_geometry.getGeometryType() + "' was expected !"); return false; LOG.info(node); return true; } catch (ParseException e) { //Transforms the exception into an SQLException. //Very unlikely to happen though... SQL Exception ee = new SQLException(e.getMessage()); ee.initCause(e); throw ee; } finally { if (rs != null) rs.close(); if (ps != null) ps.close();It deserializes each
Geometry
and checks that its data are consistent with what is stored in the DB.Writing a concrete implementation of org.exist.indexing.StreamListener
The
StreamListener
's main purpose is to generateGeometry
instances, if accurate, from the nodes it listensThis will be done using a
org.geotools.gml.GMLFilterDocument
provided by the Geotools libraries. The trick is to map our STAX events to the expected SAX events.As stated above, our
StreamListener
will be implemented as an inner class oforg.exist.indexing.spatial.AbstractGMLJDBCIndexWorker
. Of course, it will extendorg.exist.indexing.AbstractStreamListener
:private class GMLStreamListener extends AbstractStreamListener { Stack srsNamesStack = new Stack(); ElementImpl deferredElement; public IndexWorker getWorker() { return AbstractGMLJDBCIndexWorker.this;There are only two members.
srsNamesStack
will maintain a (String
)java.util.Stack
for thesrsName
attribute of the elements in the GML namespace (http://www.opengis.net/gml).null
will be pushed if such an attribute doesn't exist, namely because it isn't accurate.
deferredElement
will hold an element whose streaming is deferred, namely because we still haven't received its attributes.The
getWorker()
method should be straightforward.Let's see how the process is performed:
public void startElement(Txn transaction, ElementImpl element, NodePath path) { if (isDocumentGMLAware) { //Release the deferred element if any if (deferredElement != null) processDeferredElement(); //Retain this element deferredElement = element; //Forward the event to the next listener super.startElement(transaction, element, path); public void attribute(Txn transaction, AttrImpl attrib, NodePath path) { //Forward the event to the next listener super.attribute(transaction, attrib, path); public void characters(Txn transaction, TextImpl text, NodePath path) { if (isDocumentGMLAware) { //Release the deferred element if any if (deferredElement != null) processDeferredElement(); try { geometryDocument.characters(text.getData().toCharArray(), 0, text.getLength()); } catch (Exception e) { LOG.error(e); //Forward the event to the next listener super.characters(transaction, text, path); public void endElement(Txn transaction, ElementImpl element, NodePath path) { if (isDocumentGMLAware) { //Release the deferred element if any if (deferredElement != null) processDeferredElement(); //Process the element processCurrentElement(element); //Forward the event to the next listener super.endElement(transaction, element, path);Element deferring occurs only if
currentDoc
is to be indexed of course. If so, an incoming element is deferred but we do not forget to forward the event to the nextStreamListener
in the pipeline.If we have a deferred element, we will process it (see below) in order to collect its attributes and if relevant,
endElement()
, will add an index entry for the current element. The methodcharacters()
also forwards its data to the SAX handler.We could have used
attribute()
to collect the deferred element's attributes. The described design is just a matter of choice.Let's see how the deferred element is processed:
private void processDeferredElement() { //We need to collect the deferred element's attributes in order to feed the SAX handler AttributesImpl attList = new AttributesImpl(); NamedNodeMap attrs = deferredElement.getAttributes(); String whatToPush = null; for (int i = 0; i < attrs.getLength() ; i++) { AttrImpl attrib = (AttrImpl)attrs.item(i); //Store the srs if (GML_NS.equals(deferredElement.getNamespaceURI())) { //Maybe we could assume a configurable default value here if (attrib.getName().equals("srsName")) { whatToPush = attrib.getValue(); srsNamesStack.push(whatToPush); attList.addAttribute(attrib.getNamespaceURI(), attrib.getLocalName(), attrib.getQName().getStringValue(), Integer.toString(attrib.getType()), attrib.getValue()); srsNamesStack.push(whatToPush); try { geometryDocument.startElement(deferredElement.getNamespaceURI(), deferredElement.getLocalName(), deferredElement.getQName().getStringValue(), attList); } catch (Exception e) { e.printStackTrace(); LOG.error(e); } finally { deferredElement = null;We first need to collect its attributes and that's why it is deferred, because attributes events come after the call to
startElement()
. Elements in the GML namespace that carry ansrsName
attribute will push its value. If the element is not in the GML namespace or if nosrsName
attribute exists,null
is pushed.We could have had a smarter mechanism, but we first have to take a decision about the fact that we could define a default SRS here, either from the config, or from a higher-level element. This part of the code will thus probably be revisited once the decision is taken.
When the attributes are collected, we can send a
startElement()
event to the SAX handler, thus marking the end of the deferring process.Processing of the current element with
endElement()
:private void processCurrentElement(ElementImpl element) { String currentSrsName = (String)srsNamesStack.pop(); try { geometryDocument.endElement(element.getNamespaceURI(), element.getLocalName(), element.getQName().getStringValue()); //Some invalid/(yet) incomplete geometries don't have a SRS if (streamedGeometry != null && currentSrsName != null) { currentNodeId = element.getNodeId(); geometries.put(currentNodeId, new SRSGeometry(currentSrsName, streamedGeometry)); if (flushAfter != -1 && geometries.size() >= flushAfter) { //Mmmh... doesn't flush since it is currently dependant from the //number of nodes in the DOM file ; would need refactorings //currentDoc.getBroker().checkAvailableMemory(); currentDoc.getBroker().flush(); ///Aaaaaargl ! final double percent = ((double) Runtime.getRuntime().freeMemory() / (double) Runtime.getRuntime().maxMemory()) * 100; if (percent < 30) { System.gc(); } catch (Exception e) { LOG.error("Unable to collect geometry for node: " + currentNodeId + ". Indexing will be skipped", e); } finally { streamedGeometry = null;We first pop a SRS name from the stack.
null
will indicate that the element doesn't have any and thus that it is an element which doesn't carry enough information to build a complete geometry. That doesn't prevent us to forward this element to the SAX handler.The SAX handler might have been able to build a
Geometry
then. If so, the current index entry (composed ofcurrentSrsName
,streamedGeometry
,currentNodeId
and the "global"org.exist.dom.DocumentImpl
) is added togeometries
(wrapped in the convenienceSRSGeometry
class). We then check if it's time to flush the pending index entries.This is how the
GeometryHandler
looks like. It is also implemented as an inner class oforg.exist.indexing.spatial.AbstractGMLJDBCIndexWorker
private class GeometryHandler extends XMLFilterImpl implements GMLHandlerJTS { public void geometry(Geometry geometry) { streamedGeometry = geometry; //TODO : null geometries can be returned for many reasons, including a (too) strict //topology check done by the Geotools SAX parser. //It would be nice to have static classes extending Geometry to report such geometries if (geometry == null) LOG.error("Collected null geometry for node: " + currentNodeId + ". Indexing will be skipped");Thanks to Geotools SAX parser, it hasn't to be more complicated than setting the
streamedGeometry
"global" member.However, it may throw some
NullPointerException
s as described above.Implementing some functions that cooperate with the spatial index
We currently provide three sets of functions that are able to cooperate with spatial indexes.
The functions are declared in the
org.exist.xquery.modules.spatial.SpatialModule
module which operates in thehttp://exist-db.org/xquery/spatial
namespace (whose default prefix isspatial
).The functions signatures are documented together with the functions themselves in the XQuery function documentation. Here we will only look at their
eval()
methods.The first functions set we will describe is
org.exist.xquery.modules.spatial.FunSpatialSearch
, which performs searches on the spatial index:public Sequence eval(Sequence[] args, Sequence contextSequence) throws XPathException { Sequence result = null; Sequence nodes = args[0]; if (nodes.isEmpty()) result = Sequence.EMPTY_SEQUENCE; else if (args[1].isEmpty()) //TODO : to be discussed. We could also return an empty sequence here result = nodes; else { try { AbstractGMLJDBCIndexWorker indexWorker = (AbstractGMLJDBCIndexWorker) context.getBroker().getIndexController().getWorkerByIndexId(AbstractGMLJDBCIndex.ID); if (indexWorker == null) throw new XPathException("Unable to find a spatial index worker"); Geometry EPSG4326_geometry = null; NodeValue geometryNode = (NodeValue) args[1].itemAt(0); if (geometryNode.getImplementationType() == NodeValue.PERSISTENT_NODE) //Get the geometry from the index if available EPSG4326_geometry = indexWorker.getGeometryForNode(context.getBroker(), (NodeProxy)geometryNode, true); if (EPSG4326_geometry == null) { String sourceCRS = ((Element)geometryNode.getNode()).getAttribute("srsName").trim(); Geometry geometry = indexWorker.streamNodeToGeometry(context, geometryNode); EPSG4326_geometry = indexWorker.transformGeometry(geometry, sourceCRS, "EPSG:4326"); if (EPSG4326_geometry == null) throw new XPathException("Unable to get a geometry from the node"); int spatialOp = SpatialOperator.UNKNOWN; if (isCalledAs("equals")) spatialOp = SpatialOperator.EQUALS; else if (isCalledAs("disjoint")) spatialOp = SpatialOperator.DISJOINT; else if (isCalledAs("intersects")) spatialOp = SpatialOperator.INTERSECTS; else if (isCalledAs("touches")) spatialOp = SpatialOperator.TOUCHES; else if (isCalledAs("crosses")) spatialOp = SpatialOperator.CROSSES; else if (isCalledAs("within")) spatialOp = SpatialOperator.WITHIN; else if (isCalledAs("contains")) spatialOp = SpatialOperator.CONTAINS; else if (isCalledAs("overlaps")) spatialOp = SpatialOperator.OVERLAPS; //Search the EPSG:4326 in the index result = indexWorker.search(context.getBroker(), nodes.toNodeSet(), EPSG4326_geometry, spatialOp); hasUsedIndex = true; } catch (SpatialIndexException e) { throw new XPathException(e); return result;We first build an early result if empty sequences are passed to the function.
Then, we try to access the
XQueryContext
'sAbstractGMLJDBCIndex
(remember that there is a 1:1 relationship betweenXQueryContext
andDBBroker
and a 1:1 relationship betweenDBBroker
andIndexWorker
). If we can not find anAbstractGMLJDBCIndex
we throw anException
since we will need this and its concrete class to delegate spatial operations to (whatever its underlying DB implementation is, thanks to our generic design).Then, we examine if the geometry node is persistent, in which case it might be indexed. If so, we try to get an EPSG:4326
Geometry
from the index.If nothing is returned here, either because the node isn't indexed or because it is an in-memory node, we stream it to a
Geometry
and we transform this into an EPSG:4326Geometry
. Of course, this process is slower than a direct lookup into the index.Then we search for the geometry in the index after having determined the spatial operator from the function's name.
The second functions set is
org.exist.xquery.modules.spatial.FunGeometricProperties
, which retrieves a property for aGeometry
:public Sequence eval(Sequence[] args, Sequence contextSequence) throws XPathException { Sequence result = null; Sequence nodes = args[0]; if (nodes.isEmpty()) result = Sequence.EMPTY_SEQUENCE; else { try { Geometry geometry = null; String sourceCRS = null; AbstractGMLJDBCIndexWorker indexWorker = (AbstractGMLJDBCIndexWorker) context.getBroker().getIndexController().getWorkerByIndexId(AbstractGMLJDBCIndex.ID); if (indexWorker == null) throw new XPathException("Unable to find a spatial index worker"); String propertyName = null; if (isCalledAs("getWKT")) { propertyName = "WKT"; } else if (isCalledAs("getWKB")) { propertyName = "WKB"; } else if (isCalledAs("getMinX")) { propertyName = "MINX"; } else if (isCalledAs("getMaxX")) { propertyName = "MAXX"; } else if (isCalledAs("getMinY")) { propertyName = "MINY"; } else if (isCalledAs("getMaxY")) { propertyName = "MAXY"; } else if (isCalledAs("getCentroidX")) { propertyName = "CENTROID_X"; } else if (isCalledAs("getCentroidY")) { propertyName = "CENTROID_Y"; } else if (isCalledAs("getArea")) { propertyName = "AREA"; } else if (isCalledAs("getEPSG4326WKT")) { propertyName = "EPSG4326_WKT"; } else if (isCalledAs("getEPSG4326WKB")) { propertyName = "EPSG4326_WKB"; } else if (isCalledAs("getEPSG4326MinX")) { propertyName = "EPSG4326_MINX"; } else if (isCalledAs("getEPSG4326MaxX")) { propertyName = "EPSG4326_MAXX"; } else if (isCalledAs("getEPSG4326MinY")) { propertyName = "EPSG4326_MINY"; } else if (isCalledAs("getEPSG4326MaxY")) { propertyName = "EPSG4326_MAXY"; } else if (isCalledAs("getEPSG4326CentroidX")) { propertyName = "EPSG4326_CENTROID_X"; } else if (isCalledAs("getEPSG4326CentroidY")) { propertyName = "EPSG4326_CENTROID_Y"; } else if (isCalledAs("getEPSG4326Area")) { propertyName = "EPSG4326_AREA"; } else if (isCalledAs("getSRS")) { propertyName = "SRS_NAME"; } else if (isCalledAs("getGeometryType")) { propertyName = "GEOMETRY_TYPE"; } else if (isCalledAs("isClosed")) { propertyName = "IS_CLOSED"; } else if (isCalledAs("isSimple")) { propertyName = "IS_SIMPLE"; } else if (isCalledAs("isValid")) { propertyName = "IS_VALID"; } else { throw new XPathException("Unknown spatial property: " + mySignature.getName().getLocalName()); NodeValue geometryNode = (NodeValue) nodes.itemAt(0); if (geometryNode.getImplementationType() == NodeValue.PERSISTENT_NODE) { if (propertyName != null) { //The node should be indexed : get its property result = indexWorker.getGeometricPropertyForNode(context.getBroker(), (NodeProxy)geometryNode, propertyName); hasUsedIndex = true; } else { //Or, at least, its geometry for further processing if (propertyName.indexOf("EPSG4326") != Constants.STRING_NOT_FOUND) { geometry = indexWorker.getGeometryForNode(context.getBroker(), (NodeProxy)geometryNode, true); sourceCRS = "EPSG:4326"; } else { geometry = indexWorker.getGeometryForNode(context.getBroker(), (NodeProxy)geometryNode, false); sourceCRS = indexWorker.getGeometricPropertyForNode(context.getBroker(), (NodeProxy)geometryNode, "SRS_NAME").getStringValue(); if (result == null) { //builds the geometry if (geometry == null) { sourceCRS = ((Element)geometryNode.getNode()).getAttribute("srsName").trim(); geometry = indexWorker.streamNodeToGeometry(context, geometryNode); if (geometry == null) throw new XPathException("Unable to get a geometry from the node"); //Transform the geometry to EPSG:4326 if relevant if (propertyName.indexOf("EPSG4326") != Constants.STRING_NOT_FOUND) { geometry = indexWorker.transformGeometry(geometry, sourceCRS, "EPSG:4326"); if (isCalledAs("getEPSG4326WKT")) { result = new StringValue(wktWriter.write(geometry)); } else if (isCalledAs("getEPSG4326WKB")) { result = new Base64Binary(wkbWriter.write(geometry)); } else if (isCalledAs("getEPSG4326MinX")) { result = new DoubleValue(geometry.getEnvelopeInternal().getMinX()); } else if (isCalledAs("getEPSG4326MaxX")) { result = new DoubleValue(geometry.getEnvelopeInternal().getMaxX()); } else if (isCalledAs("getEPSG4326MinY")) { result = new DoubleValue(geometry.getEnvelopeInternal().getMinY()); } else if (isCalledAs("getEPSG4326MaxY")) { result = new DoubleValue(geometry.getEnvelopeInternal().getMaxY()); } else if (isCalledAs("getEPSG4326CentroidX")) { result = new DoubleValue(geometry.getCentroid().getX()); } else if (isCalledAs("getEPSG4326CentroidY")) { result = new DoubleValue(geometry.getCentroid().getY()); } else if (isCalledAs("getEPSG4326Area")) { result = new DoubleValue(geometry.getArea()); } else if (isCalledAs("getWKT")) { result = new StringValue(wktWriter.write(geometry)); } else if (isCalledAs("getWKB")) { result = new Base64Binary(wkbWriter.write(geometry)); } else if (isCalledAs("getMinX")) { result = new DoubleValue(geometry.getEnvelopeInternal().getMinX()); } else if (isCalledAs("getMaxX")) { result = new DoubleValue(geometry.getEnvelopeInternal().getMaxX()); } else if (isCalledAs("getMinY")) { result = new DoubleValue(geometry.getEnvelopeInternal().getMinY()); } else if (isCalledAs("getMaxY")) { result = new DoubleValue(geometry.getEnvelopeInternal().getMaxY()); } else if (isCalledAs("getCentroidX")) { result = new DoubleValue(geometry.getCentroid().getX()); } else if (isCalledAs("getCentroidY")) { result = new DoubleValue(geometry.getCentroid().getY()); } else if (isCalledAs("getArea")) { result = new DoubleValue(geometry.getArea()); } else if (isCalledAs("getSRS")) { result = new StringValue(((Element)geometryNode).getAttribute("srsName")); } else if (isCalledAs("getGeometryType")) { result = new StringValue(geometry.getGeometryType()); } else if (isCalledAs("isClosed")) { result = new BooleanValue(!geometry.isEmpty()); } else if (isCalledAs("isSimple")) { result = new BooleanValue(geometry.isSimple()); } else if (isCalledAs("isValid")) { result = new BooleanValue(geometry.isValid()); } else { throw new XPathException("Unknown spatial property: " + mySignature.getName().getLocalName()); } catch (SpatialIndexException e) { throw new XPathException(e); return result;The design is very much the same: we build an early result if empty sequences are involved, we get a
AbstractGMLJDBCIndex
, then we set apropertyName
, which is actually a SQL field name, depending on the function's name.An attempt to retrieve the field content from the DB is made and, if unsuccessful, we try to get the node's
Geometry
from the index.Then, if we still haven't got this
Geometry
, either because the node isn't indexed or because it is an in-memory node, we stream it to aGeometry
and we transform this into an EPSG:4326Geometry
if the function's name requires to do so.We then dynamically build the property to be returned.
This mechanism if far from being efficient compared to the index lookup, but it shows how easy it would be to return a property which is not available in a spatial index.
The third functions set,
org.exist.xquery.modules.spatial.FunGMLProducers
, uses the same design:public Sequence eval(Sequence[] args, Sequence contextSequence) throws XPathException { Sequence result = null; try { AbstractGMLJDBCIndexWorker indexWorker = (AbstractGMLJDBCIndexWorker) context.getBroker().getIndexController().getWorkerByIndexId(AbstractGMLJDBCIndex.ID); if (indexWorker == null) throw new XPathException("Unable to find a spatial index worker"); Geometry geometry = null; String targetSRS = null; if (isCalledAs("transform")) { if (args[0].isEmpty()) result = Sequence.EMPTY_SEQUENCE; else { NodeValue geometryNode = (NodeValue) args[0].itemAt(0); //Try to get the geometry from the index String sourceSRS = null; if (geometryNode.getImplementationType() == NodeValue.PERSISTENT_NODE) { sourceSRS = indexWorker.getGeometricPropertyForNode(context.getBroker(), (NodeProxy)geometryNode, "SRS_NAME").getStringValue(); geometry = indexWorker.getGeometryForNode(context.getBroker(), (NodeProxy)geometryNode, false); hasUsedIndex = true; //Otherwise, build it if (geometry == null) { sourceSRS = ((Element)geometryNode.getNode()).getAttribute("srsName").trim(); geometry = indexWorker.streamNodeToGeometry(context, geometryNode); if (geometry == null) throw new XPathException("Unable to get a geometry from the node"); targetSRS = args[1].itemAt(0).getStringValue().trim(); geometry = indexWorker.transformGeometry(geometry, sourceSRS, targetSRS); } else if (isCalledAs("WKTtoGML")) { if (args[0].isEmpty()) result = Sequence.EMPTY_SEQUENCE; else { String wkt = args[0].itemAt(0).getStringValue(); WKTReader wktReader = new WKTReader(); try { geometry = wktReader.read(wkt); } catch (ParseException e) { throw new XPathException(e); if (geometry == null) throw new XPathException("Unable to get a geometry from the node"); targetSRS = args[1].itemAt(0).getStringValue().trim(); } else if (isCalledAs("buffer")) { if (args[0].isEmpty()) result = Sequence.EMPTY_SEQUENCE; else { NodeValue geometryNode = (NodeValue) args[0].itemAt(0); //Try to get the geometry from the index if (geometryNode.getImplementationType() == NodeValue.PERSISTENT_NODE) { targetSRS = indexWorker.getGeometricPropertyForNode(context.getBroker(), (NodeProxy)geometryNode, "SRS_NAME").getStringValue(); geometry = indexWorker.getGeometryForNode(context.getBroker(), (NodeProxy)geometryNode, false); hasUsedIndex = true; //Otherwise, build it if (geometry == null) { targetSRS = ((Element)geometryNode.getNode()).getAttribute("srsName").trim(); geometry = indexWorker.streamNodeToGeometry(context, geometryNode); if (geometry == null) throw new XPathException("Unable to get a geometry from the node"); double distance = ((DoubleValue)args[1].itemAt(0)).getDouble(); int quadrantSegments = 8; int endCapStyle = BufferOp.CAP_ROUND; if (getArgumentCount() > 2 && Type.subTypeOf(args[2].itemAt(0).getType(), Type.INTEGER)) quadrantSegments = ((IntegerValue)args[2].itemAt(0)).getInt(); if (getArgumentCount() > 3 && Type.subTypeOf(args[3].itemAt(0).getType(), Type.INTEGER)) endCapStyle = ((IntegerValue)args[3].itemAt(0)).getInt(); switch (endCapStyle) { case BufferOp.CAP_ROUND: case BufferOp.CAP_BUTT: case BufferOp.CAP_SQUARE: break; default: throw new XPathException("Invalid line end style"); geometry = geometry.buffer(distance, quadrantSegments, endCapStyle); } else if (isCalledAs("getBbox")) { if (args[0].isEmpty()) result = Sequence.EMPTY_SEQUENCE; else { NodeValue geometryNode = (NodeValue) args[0].itemAt(0); //Try to get the geometry from the index if (geometryNode.getImplementationType() == NodeValue.PERSISTENT_NODE) { targetSRS = indexWorker.getGeometricPropertyForNode(context.getBroker(), (NodeProxy)geometryNode, "SRS_NAME").getStringValue(); geometry = indexWorker.getGeometryForNode(context.getBroker(), (NodeProxy)geometryNode, false); hasUsedIndex = true; //Otherwise, build it if (geometry == null) { targetSRS = ((Element)geometryNode.getNode()).getAttribute("srsName").trim(); geometry = indexWorker.streamNodeToGeometry(context, geometryNode); if (geometry == null) throw new XPathException("Unable to get a geometry from the node"); geometry = geometry.getEnvelope(); } else if (isCalledAs("convexHull")) { if (args[0].isEmpty()) result = Sequence.EMPTY_SEQUENCE; else { NodeValue geometryNode = (NodeValue) args[0].itemAt(0); //Try to get the geometry from the index if (geometryNode.getImplementationType() == NodeValue.PERSISTENT_NODE) { targetSRS = indexWorker.getGeometricPropertyForNode(context.getBroker(), (NodeProxy)geometryNode, "SRS_NAME").getStringValue(); geometry = indexWorker.getGeometryForNode(context.getBroker(), (NodeProxy)geometryNode, false); hasUsedIndex = true; //Otherwise, build it if (geometry == null) { targetSRS = ((Element)geometryNode.getNode()).getAttribute("srsName").trim(); geometry = indexWorker.streamNodeToGeometry(context, geometryNode); if (geometry == null) throw new XPathException("Unable to get a geometry from the node"); geometry = geometry.convexHull(); } else if (isCalledAs("boundary")) { if (args[0].isEmpty()) result = Sequence.EMPTY_SEQUENCE; else { NodeValue geometryNode = (NodeValue) args[0].itemAt(0); //Try to get the geometry from the index if (geometryNode.getImplementationType() == NodeValue.PERSISTENT_NODE) { targetSRS = indexWorker.getGeometricPropertyForNode(context.getBroker(), (NodeProxy)geometryNode, "SRS_NAME").getStringValue(); geometry = indexWorker.getGeometryForNode(context.getBroker(), (NodeProxy)geometryNode, false); hasUsedIndex = true; //Otherwise, build it if (geometry == null) { targetSRS = ((Element)geometryNode.getNode()).getAttribute("srsName").trim(); geometry = indexWorker.streamNodeToGeometry(context, geometryNode); if (geometry == null) throw new XPathException("Unable to get a geometry from the node"); geometry = geometry.getBoundary(); } else { Geometry geometry1 = null; Geometry geometry2 = null; if (args[0].isEmpty() && args[1].isEmpty()) result = Sequence.EMPTY_SEQUENCE; else if (!args[0].isEmpty() && args[1].isEmpty()) result = args[0].itemAt(0).toSequence(); else if (args[0].isEmpty() && !args[1].isEmpty()) result = args[1].itemAt(0).toSequence(); else { NodeValue geometryNode1 = (NodeValue) args[0].itemAt(0); NodeValue geometryNode2 = (NodeValue) args[1].itemAt(0); String srsName1 = null; String srsName2 = null; //Try to get the geometries from the index if (geometryNode1.getImplementationType() == NodeValue.PERSISTENT_NODE) { srsName1 = indexWorker.getGeometricPropertyForNode(context.getBroker(), (NodeProxy)geometryNode1, "SRS_NAME").getStringValue(); geometry1 = indexWorker.getGeometryForNode(context.getBroker(), (NodeProxy)geometryNode1, false); hasUsedIndex = true; if (geometryNode2.getImplementationType() == NodeValue.PERSISTENT_NODE) { srsName2 = indexWorker.getGeometricPropertyForNode(context.getBroker(), (NodeProxy)geometryNode2, "SRS_NAME").getStringValue(); geometry2 = indexWorker.getGeometryForNode(context.getBroker(), (NodeProxy)geometryNode2, false); hasUsedIndex = true; //Otherwise build them if (geometry1 == null) { srsName1 = ((Element)geometryNode1.getNode()).getAttribute("srsName").trim(); geometry1 = indexWorker.streamNodeToGeometry(context, geometryNode1); if (geometry2 == null) { srsName2 = ((Element)geometryNode2.getNode()).getAttribute("srsName").trim(); geometry2 = indexWorker.streamNodeToGeometry(context, geometryNode2); if (geometry1 == null) throw new XPathException("Unable to get a geometry from the first node"); if (geometry2 == null) throw new XPathException("Unable to get a geometry from the second node"); //Transform the second geometry in the SRS of the first one if necessary if (!srsName1.equalsIgnoreCase(srsName2)) { geometry2 = indexWorker.transformGeometry(geometry2, srsName1, srsName2); if (isCalledAs("intersection")) { geometry = geometry1.intersection(geometry2); } else if (isCalledAs("union")) { geometry = geometry1.union(geometry2); } else if (isCalledAs("difference")) { geometry = geometry1.difference(geometry2); } else if (isCalledAs("symetricDifference")) { geometry = geometry1.symDifference(geometry2); targetSRS = srsName1; if (result == null) { String gmlPrefix = context.getPrefixForURI(AbstractGMLJDBCIndexWorker.GML_NS); if (gmlPrefix == null) throw new XPathException("'" + AbstractGMLJDBCIndexWorker.GML_NS + "' namespace is not defined"); context.pushDocumentContext(); try { MemTreeBuilder builder = context.getDocumentBuilder(); DocumentBuilderReceiver receiver = new DocumentBuilderReceiver(builder); result = (NodeValue)indexWorker.streamGeometryToElement(geometry, targetSRS, receiver); } finally { context.popDocumentContext(); } catch (SpatialIndexException e) { throw new XPathException(e); return result;It looks more complicated because of the multiple possible argument counts. However, the pinciple remains the same: early result when empty sequences are involved, fetching of the
Geometry
ies (and of its/their SRS) from the DB, streaming if nothing can be fetched, then geometric computations after a transformation of the secondGeometry
if relevant.The final process streams the resulting
Geometry
as the result, in the SRS specified by the relevant argument or in the SRS of the firstGeometry
, depending on the function's name.Playing with the spatial index
Now that we have described the spatial index, it is time to play with it. Only a few of its features will be demonstrated, but we will explain again what happens under the hood.
The first step is to make sure to have a recent enough release version of eXist: 4.7.0 or later, and Java 8.
Our demonstration file is taken from the Ordnance Survey of Great-Britain's WWW site which offers sample data.
The chosen topography layer is of Port-Talbot, which is available as 2182-SS7886-2c1.gz. Download this file, gunzip it, and give to the resulting file a
.gml
extension (port-talbot.gml
) this will allow eXist to recognise it as an XML file.If you have previously executed
mvn test
, the file should have been downloaded and gunzipped for you in$EXIST_HOME/extensions/indexes/spatial/target/test-classes
.Since this file refers to an OSGB-hosted schema, we will need to bypass validation in
${EXIST_HOME}/etc/conf.xml
.Make sure the mode value is set like this:
<validation mode="no">We are now ready to start the demonstration and we will use the interactive client for that. Run either
${EXIST_HOME}/bin/client.bat
or${EXIST_HOME}/bin/client.sh
from the command line (please read elsewhere if you do not know how to start it).Let's start by creating a collection named
spatial
in the/db
collection. The menus might be localised, but in english it isFile/Create a collection...
.Then, we will configure this collection by creating a configuration collection.
Let's navigate to
/db/system/config
If required, let's create a general configuration collection:
File/Create a collection...
name itdb
and get into it.Then let's create a configuration collection for
/db/spatial
:File/Create a collection...
name itspatial
and get into it.We are now in
/db/system/config/db/spatial
.Let's now create a configuration file for this collection:
File/Create an empty document...
name itcollection.xconf
.Double-click on this document and let's replace its auto-generated content:
<template/>
with this one:
<collection xmlns="http://exist-db.org/collection-config/1.0"> <index> <gml flushAfter="200"/> </index> </collection>
Do not forget to save the document before closing the window.
The
/db/system/config/db/spatial
collection is now configured to index GML geometries when they are uploaded. The in-memory index entries will be flushed to the HSQLDB every 200 geometries and will wait at most 100 seconds, the default value, to establish a connection to the HSQL db.Let's navigate to
/db/spatial
.Let's upload
port-talbot.gml
: File/Upload files/directories...On my computer, the operation on this 23.6 Mb file is performed in about 100 seconds, including some default fulltext indexing. Let's close the upload window and quit the interactive client.
Let's look our our GML file looks like on GML Viewer, a free viewer provided by Snowflake software:
If you want to have a look at the spatial index HSQLDB, which, if you are using the default data-dir, is in
${EXIST_HOME}/data/spatial_index.*
there is a dedicated script file in${EXIST_HOME}/extensions/indexes/spatial/
. to launch HSQL's GUI client: Use eitherhsql.bat
orhsql.sh [data-dir]
(you only need to supply data-dir if it is not the default one).If the SQL command
SELECT * FROM SPATIAL_INDEX_V1;
is executed, the result window shows that21961
geometries have been indexed.Let's get back to the interactive client and open the query window (the one we get when clicking on the binocular button in the toolbar).
This query:
declare namespace gml = "http://www.opengis.net/gml"; spatial:transform( <gml:Polygon srsName="osgb:BNG" xmlns:gml='http://www.opengis.net/gml'> <gml:outerBoundaryIs><gml:LinearRing><gml:coordinates> 278200,187600 278400,187600 278400,188000 278200,188000 278200,187600 </gml:coordinates></gml:LinearRing></gml:outerBoundaryIs> </gml:Polygon>, "epsg:4326")Computes in a little bit less than 2 seconds. That could seem high, but there is a cost for the Geotools transformation factories initialization. Subsequent requests will be much faster, although there will always be a small cost for the streaming of the in-memory node to a
Geometry
object.The result is:
<gml:Polygon xmlns:gml="http://www.opengis.net/gml"> <gml:outerBoundaryIs> <gml:LinearRing> <gml:coordinates decimal="." cs="," ts=" "> -3.7578,51.5743 -3.7579,51.5779 -3.755,51.5779 -3.7549,51.5743 -3.7578,51.5743 </gml:coordinates> </gml:LinearRing> </gml:outerBoundaryIs> </gml:Polygon>
Due to the current Geotools limitations, there is no
srsName
attribute on<gml:Polygon>
! See above.It might be more convenient to perform this query:
declare namespace gml = "http://www.opengis.net/gml"; spatial:getEPSG4326WKT( <gml:Polygon srsName="osgb:BNG" xmlns:gml='http://www.opengis.net/gml'> <gml:outerBoundaryIs><gml:LinearRing> <gml:coordinates> 278200,187600 278400,187600 278400,188000 278200,188000 278200,187600 </gml:coordinates> </gml:LinearRing> </gml:outerBoundaryIs> </gml:Polygon>)Which results in:
POLYGON ((-3.7577853800140995 51.57430250509819, -3.7579241102503356 51.57789774692169, -3.7550389942943365 51.57794093512567, -3.754900491457913 51.57434568777196, -3.7577853800140995 51.57430250509819)So, 3 degrees West, 51 deegrees North... we must be indeed northern of Brittany, i.e. in south-western Great Britain.
Let's see what our polygon looks like:
Now, we continue doing something more practical:
declare namespace gml = "http://www.opengis.net/gml"; spatial:intersects(//gml:Polygon, <gml:Polygon srsName="osgb:BNG" xmlns:gml='http://www.opengis.net/gml'> <gml:outerBoundaryIs> <gml:LinearRing> <gml:coordinates> 278200,187600 278400,187600 278400,188000 278200,188000 278200,187600 </gml:coordinates> </gml:LinearRing> </gml:outerBoundaryIs> </gml:Polygon>This query returns 756
<gml:Polygon>
s in about 15 seconds. A subsequent call returns in just about 450 ms, not having the cost for initializations (in particular the first connection to the HSQLDB). A slighly modified query, in order to show the performance without utilising eXist's performant cache:declare namespace gml = "http://www.opengis.net/gml"; spatial:intersects(//gml:Polygon, <gml:Polygon srsName="osgb:BNG" xmlns:gml='http://www.opengis.net/gml'> <gml:outerBoundaryIs> <gml:LinearRing> <gml:coordinates> 278201,187600 278401,187600 278401,188000 278201,188000 278201,187600 </gml:coordinates> </gml:LinearRing> </gml:outerBoundaryIs> </gml:Polygon>... retuns 755
<gml:Polygon>
(one less) in just about 470 ms.The result of our first intersection query looks like this:
Let's try another type of spatial query:
declare namespace gml = "http://www.opengis.net/gml"; spatial:within(//gml:Polygon, <gml:Polygon srsName="osgb:BNG" xmlns:gml='http://www.opengis.net/gml'> <gml:outerBoundaryIs> <gml:LinearRing> <gml:coordinates> 278200,187600 278400,187600 278400,188000 278200,188000 278200,187600 </gml:coordinates> </gml:LinearRing> </gml:outerBoundaryIs> </gml:Polygon>It returns 598
<gml:Polygon>
s in just a little bit more than 400 ms. Here is what they look like:The last query of this session is just to demonstrate some interesting capabilities of the spatial functions:
declare namespace gml = "http://www.opengis.net/gml"; spatial:buffer( <gml:Polygon srsName="osgb:BNG" xmlns:gml='http://www.opengis.net/gml'> <gml:outerBoundaryIs> <gml:LinearRing> <gml:coordinates> 278200,187600 278400,187600 278400,188000 278200,188000 278200,187600 </gml:coordinates> </gml:LinearRing> </gml:outerBoundaryIs> </gml:Polygon>,See the (not so) rounded corners of our 500 metres buffer over Port-Talbot:
Facts and thoughts
The spatial index is in working condition. It provides an interesting set of functionalities and its performance is satisfactory given the lightweight, non spatially-enabled, database engine that stores the
Geometry
objects and their properties.The Spatial Index does not currently support XQuery Update or XUpdate. Using either update facility on GML documents will likely lead to incorrect query results.
Here are still some tentative improvements.
The first improvement is to plug in the
getGeometriesForNodes()
andgetGeometricPropertyForNodes()
(inorg.exist.indexing.spatial.AbstractGMLJDBCIndexWorker
) to allow sequence/set optimization.Indeed, a query like this one on our test file:
declare namespace gml = "http://www.opengis.net/gml"; for $pol in //gml:Polygon return spatial:getArea($pol)Which results in
5339
items through as many calls to the DB in 51 seconds on an initialized index! Intercepting theSINGLE_STEP_EXECUTION
flag when the expression is analyzed would allow to call the 2 above methods rather than their "individual" counterparts, namelygetGeometryForNode()
andgetGeometricPropertyForNode()
. The expected performance improvement would be interesting.A second improvement could be to refine the queries on the HSQLDB. Currently,
search()
(inorg.exist.indexing.spatial.GMLHSQLIndexWorker
) filters the records on the BBox of the searchedGeometry
. It would also be nice to refine the query on the context nodes and, in particular, on their involved collections and/or documents. The like applies for the HSQL implementation ofgetGeometryForNode()
andgetGeometricPropertyForNode()
too.However, we have to be aware that writing such a SQL statement and passing it to the DB server might become counter-productive. The idea is then to define some (configurable) threshold values that would refine the query on the documents if there are fewer than, say, 10 documents in the context nodeset, and if there are more than 10 documents in it, but less than, say, 15 collections, refine the query on the collection.
It would be quite easy to determine those threshold values above which writing a long SQL statement and passing it to the DB server takes more time than filtering the fectched data.
We might also consider the field in which the document's URI is stored (
DOCUMENT_URI
) and possibly split it into two fields, one for the collection and the second one for the document. Of course, having indexed integer values here would probably be interesting.Having some better algorithms to prefilter the
Geometry
ies could probably help as well and, more generally, everything a DB server could bring (caching for instance) should be considered.An other improvement would be to introduce some error margins for
Geometry
ies computations or BBox ones. The Java Topology Suite, widely used by Geotools, has all the necessary material for this.Another interesting improvement would be to compute and return simplified
Geometry
ies depending of a "hint". Applications might want to return simplified polygons and even points at wider scales. Depending on the hint's value, passed in the function's parameters, the index could use the right DB fields in order to work on precomputed simpler (or, more generally, different) entries.This is how a hint configuration could look like:
<gml flushAfter="200"> <!-- divide the number of vertices by 2 --> <hint value="simplify1" method="simplify" parameter="2"/> <!-- divide the number of vertices by 4 --> <hint value="simplify2" method="simplify" parameter="4"/> <!-- reduce to a point --> <hint value="simplify3" method="point"/>
We should also discuss with the other developers about the opportunity to have a
org.exist.indexing.Index
interface in the modularized indexes hierarchy. The abstract classorg.exist.indexing.AbstractIndex
provides some nice general-purpose methods and allows static members that are nearly mandatory (likeID
). The like fororg.exist.indexing.StreamListener
versusorg.exist.indexing.AbstractStreamListener
.More tests should also be driven. The spatial index has only been tested on one file until now although this file is sizeable. It might be interesting to see how it behaves with unusual geometries like rectangles, multi-geometries and collections. It might also be interesting to know more about the error margin when geometries in different SRSes are involved. The accuracy of the referencing libraries available in the CLASSPATH would play an important role here.
As always, the code could be written in a more efficient way. There are probably too many
if (...) ... else if(...) ... else ...constructs in the code for instance. Also, we will have to follow Geotools progress to get rid of some of the more or less elegant workarounds we've had to implement.
org.exist.indexing.IndexManager org.exist.indexing.IndexController org.exist.indexing.Index and org.exist.indexing.AbstractIndex org.exist.indexing.IndexWorker org.exist.indexing.StreamListener and org.exist.indexing.AbstractStreamListener Use case: developing an indexing architecture for GML geometries Writing the concrete implementation of org.exist.indexing.AbstractIndex Writing the concrete implementation of org.exist.indexing.IndexWorker Writing a concrete implementation of org.exist.indexing.StreamListener Implementing some functions that cooperate with the spatial index Playing with the spatial index Facts and thoughts
帅气的酱牛肉 · Using multiple test_clients in parallel to simulate multi-user workflows · Issue #163 · pytest-dev/p 2 月前 |
从容的鼠标 · 国家金融监督管理总局 3 月前 |