Saturday, July 5, 2014

Saving and Indexing Shp resources in Arches

In this post, I am going to explain my work on the internals of Arches data processing and how Arches models heritage data in the back-end. I described how the shapefile reader works in my previous post; here I will explain how shapefile records interact with PostGIS and Elasticsearch.






Sunday, June 15, 2014

Shapefile reader for Arches

In my previous post I introduced how shapefile support can be provided for Arches from a very abstract, front-end perspective. Since then I have started working on the data processing methods, so this post will describe how things happen in the back-end. There are certain things that still need to be glued together in order to complete the workflow, and those will come in a later post.

Shapefile is a file format defined by ESRI for managing geographic data [1][2].
Currently, Arches does not support the shapefile format. With this improvement, Arches will be able to read a user's legacy data from a shapefile and store it inside Arches as Arches resource instances.

The Django GDAL (Geospatial Data Abstraction Library) interface was used in the development [3][4].
The current code is available in the data_import_mapping development branch of the Bitbucket repository trusira/arches [5].

The code for shpreader.py can be found in the branch linked above [5].

The reader returns all the data read from the shapefile in a single container, so that consumers of shpreader's output can work with it easily.
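To give a feel for the format (this is an illustration only, not the actual shpreader.py, which relies on Django's GDAL bindings), the fixed 100-byte main header of a .shp file can be parsed with nothing but the standard library:

```python
import struct

def read_shp_header(buf):
    """Parse the fixed 100-byte main header of a .shp file."""
    file_code, = struct.unpack(">i", buf[0:4])      # big-endian; always 9994
    file_length, = struct.unpack(">i", buf[24:28])  # big-endian; in 16-bit words
    version, shape_type = struct.unpack("<ii", buf[28:36])  # little-endian
    if file_code != 9994:
        raise ValueError("not a shapefile")
    return {"version": version, "shape_type": shape_type,
            "byte_length": file_length * 2}

# Build a minimal header for a point shapefile (shape type 1) to parse.
header = bytearray(100)
header[0:4] = struct.pack(">i", 9994)
header[24:28] = struct.pack(">i", 50)        # 100 bytes = 50 16-bit words
header[28:36] = struct.pack("<ii", 1000, 1)  # version 1000, shape type 1 (point)
info = read_shp_header(bytes(header))
```

The field offsets and byte orders above come from ESRI's shapefile technical description [2]; in practice GDAL hides all of this behind its DataSource abstraction.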

shp_arches_mapping.py takes the user-defined attribute mapping and creates Python dictionaries, which can then be converted into the corresponding JSON objects.


The data attribute mapping and the authority data mapping come as two separate arguments, attr_mapping and auth_mapping. The rationale behind this design is that the user maps shp fields to Arches attributes, whereas the authority data is generated through the Arches GUI itself.

See shp_arches_mapping.py, where the build_dictionary method is defined:

def build_dictionary(self, attr_mapping, auth_mapping, reader_output):


An example mapping instance would be the following.

attr_map = {
    "p_name": "NAME.E41",
    "summary": "SUMMARY.E62",
    "geom_id": "EXTERNAL XREF.E42"
}

auth_map = {
    "NAME TYPE.E55": "Primary",
    "EXTERNAL XREF TYPE.E55": "Legacy"
}

The third argument is reader_output, the raw data read from the shapefile (the .shp and .dbf files).
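As an illustration of how the three arguments fit together, here is a simplified, hypothetical sketch of build_dictionary; the names and the output shape are assumptions for this post, not the actual shp_arches_mapping.py:

```python
# Hypothetical sketch: per-record attributes come from attr_mapping,
# while authority data from auth_mapping is applied uniformly.
def build_dictionary(attr_mapping, auth_mapping, reader_output):
    resources = []
    for record in reader_output:  # one dict per shp/dbf record
        attributes = []
        for shp_field, arches_attr in attr_mapping.items():
            if shp_field in record:
                attributes.append({"entitytypeid": arches_attr,
                                   "value": record[shp_field]})
        # authority data, as chosen in the Arches GUI, applies to every record
        for auth_attr, auth_value in auth_mapping.items():
            attributes.append({"entitytypeid": auth_attr, "value": auth_value})
        resources.append({"attributes": attributes,
                          "geometry": record.get("geom")})
    return resources

attr_map = {"p_name": "NAME.E41", "summary": "SUMMARY.E62"}
auth_map = {"NAME TYPE.E55": "Primary"}
records = [{"p_name": "Old Mill", "summary": "A ruined mill",
            "geom": "POINT(0 0)"}]
result = build_dictionary(attr_map, auth_map, records)
```

The resulting list of dictionaries can then be serialized to JSON and handed to the Arches resource-saving code.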

References

[1] http://en.wikipedia.org/wiki/Shapefile
[2] http://www.esri.com/library/whitepapers/pdfs/shapefile.pdf
[3] https://docs.djangoproject.com/en/1.6/ref/contrib/gis/tutorial/#gdal-interface
[4] https://docs.djangoproject.com/en/1.6/ref/contrib/gis/gdal/
[5] https://bitbucket.org/trusira/arches/branch/data_import_mapping

Sunday, May 25, 2014

Shapefile support for Arches

Shapefiles (http://www.esri.com/library/whitepapers/pdfs/shapefile.pdf) are a widely used data exchange format among GIS data users. The new modifications will allow Arches users to load heritage resource data using Shapefiles.

A shapefile has the extension .shp, but a .shp file alone is of little use: every shapefile is associated with two more mandatory files (there are optional files in addition to these), an index file (*.shx) and a dBase table (*.dbf).

Following is a set of UI mockups for uploading heritage resources from a shapefile.





The user selects the .shp file, and a script checks for the corresponding .shx and .dbf files and loads them. If they are not found, an error message is returned.

 

There is a shapefile record navigator at the top (in the mockup, the 3rd record is selected). We can now modify the 3rd record by adding the other data required to save an Arches heritage resource.



The geometric location of the resource is already read from the shapefile, so modifying it is disabled under the Location theme. This is shown in the figure below.



While the .shp file itself contains the geographic (shape) details, the .dbf (dBase) file holds the other attributes, in a one-to-one mapping with the .shp records; this mapping is resolved through the index file (.shx).
Arches will read these data and allow the user to select the corresponding Arches attributes.
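The one-to-one pairing described above can be pictured as aligning the two files by record index (a simplified sketch with made-up records; a real reader resolves record offsets through the .shx index):

```python
# Simplified: in practice, record offsets come from the .shx index file.
shp_records = ["POINT(1 1)", "POINT(2 2)", "POINT(3 3)"]          # from .shp
dbf_records = [{"NAME": "Mill"}, {"NAME": "Fort"}, {"NAME": "Bridge"}]  # from .dbf

# Record i in the .shp file pairs with row i in the .dbf table.
features = [dict(geom=g, **attrs) for g, attrs in zip(shp_records, dbf_records)]
```

Each combined feature then carries both the geometry and the attribute values the user will map to Arches attributes.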


The component design diagram is given below.




Friday, May 23, 2014

Connect to a Postgres database using pgAdmin III


pgAdmin III is a GUI tool for managing and querying PostgreSQL databases, and it is available through the Ubuntu Software Center. Assuming you have installed both PostgreSQL (in my case, with PostGIS) and pgAdmin, we can proceed as follows.


add new server

Fill in the required fields to set up the connection. The default PostgreSQL port is 5432 (though a particular install may use another, such as 5433). The default user/password details are found in the ~/.pgpass file.

content of ~/.pgpass
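For reference, each line of ~/.pgpass follows the fixed colon-separated format below (the second line is a placeholder example, not actual credentials):

```
hostname:port:database:username:password
localhost:5432:*:postgres:postgres
```

A `*` matches any value for that field, and the file must have permissions 0600 or PostgreSQL will ignore it.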


Once everything is in order, the connection is established and the database is shown in the pgAdmin object browser window.

Connected Arches database


Sunday, April 13, 2014

Build an executable jar from a maven project

Not all jar files are executable: to be executable, a jar package must declare a well-defined entry point (a class with public static void main(String[] args)) in its manifest.

Add the following plugin under plugins in your pom file. For example, if my main method is in a class named MainClass, I add the following (use the fully qualified class name if the class is in a package).

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-jar-plugin</artifactId>
  <version>2.4</version>
  <configuration>
    <archive>
      <manifest>
        <mainClass>MainClass</mainClass>
      </manifest>
    </archive>
  </configuration>
</plugin>

Thursday, April 3, 2014

Package code with dependencies in Maven

1. Add the following part to the project pom file.

<build>
 <plugins>
  <plugin>
   <artifactId>maven-assembly-plugin</artifactId>
   <configuration>
    <descriptorRefs>
     <descriptorRef>jar-with-dependencies</descriptorRef>
    </descriptorRefs>
   </configuration>
  </plugin>

 </plugins>
</build>


2. Package the code with the support of the assembly plugin:

mvn clean package assembly:assembly 

3. This will generate two packages: one without dependencies and another with all the project dependencies. The latter is ready to be shipped directly.