4.3 Implementing Feature Extractors
Feature extraction can delegate / leverage specialized services via plugins
, i.e:
Chainalysis
can extractAML-Compliance
related features.- An adpater for
Chainalaysis API
can be implemented as aplugin
. - Utility of this plugin allow feature extraction to leverage Chainalysis services.
Speialized feature-extraction logic / algorithms can be implemented as gadgets
, i.e:
EVM-Call-Tracer
gadget for fast-tracing of internal calls.Token-Tracker
gadget for tracking all token movements.
As Faults in feature extraction can have downstream impact in the classification process, implementation of a feature extractor must take in account the complexity of the feature extraction process to ensure that:
- The process is reliable and
deterministc
to avoid errors. - THe process is
verifiable
to ensure that the extracted features are correct. - The process is designed to be as fast as possible to avoid delays in the classification process.
- The process is designed to be as efficient as possible to avoid unnecessary resource consumption.
- The process is designed to be as secure as possible to avoid data leaks.
- The process is designed to be as scalable as possible to handle large volumes of data.
4.3.2 Inputs & Outputs:
To elimniate unwanted influence by other components, the feature-extractor is designed to:
- Accept only the Record as input.
- Be integrated with access-control systems with policies to restrict access to the feature-extraction configuration or runtime environment.
- Log any incoming data packets from external sources to ensure data integrity.
Outputs of the feature extraction process is serialized data that is encoded in a verifiable format. The extractor should sign it’s output to ensure that the data is not tampered with during transit.
The output should not be a stateful object such as a document or a database record consisting of the extracted features. Instead it is a set of verfiable JSON Patch operations that can be applied to a known & verified state to derive the extracted features, i.e.:
[
{
"op": 1,
"root": "0x010010",
"path": "HIGH_RISK/DIRECT_SANCTIONED_ENTITY_EXPOSURE",
"value": "0x0f001010",
"merkle-proof": {}
},
{
"op": 2,
"root": "0x010010",
"path": "HIGH_RISK/INDIRECT_SANCTIONED_ENTITY_EXPOSURE",
"value": "0x0f001010",
"merkle-proof": {}
}
]
These operations is applied to a default
feature-document which is a document containing the features for a category (per Schema)
but with default values (encrypted or encoded values).
This document is represented as a merkle-tree where then merkle-proofs is used to verify data integrity.
4.3.3 Plugins & Gadgets:
The feature extraction process should be modularized to allow for easy extension, testing and maintenance. Interfaces to external systems (i.e. Chaainalysis API) should be abstracted to allow dependency injection and supporting testing of feature extraction process in isolation.
Below is an example of a feature-extractor implemented in go.
It utilises plugins
for interfacing with external systems for feature extraction
and gadgets
(i.e. interpreting EVM call-traces) for specialized feature extraction.
package highRiskCat
import (
"encoding/json"
. "github.com/0xbow-io/asp-spec-V1.0/pkg/feature/extraction/storage"
. "github.com/0xbow-io/asp-spec-V1.0/pkg/feature/extraction/extractors"
. "github.com/0xbow-io/asp-spec-V1.0/pkg/feature/extraction/gadgets"
. "github.com/0xbow-io/asp-spec-V1.0/pkg/feature/extraction/plugins"
"github.com/swaggest/jsonschema-go"
)
var (
// Plugin IDs
plugins = []string{
"PLUG_CA_01",
}
// Gadget IDs
gadgets = []string{
"GA_01",
}
// Cache IDs
storages = []string{
"DB_01",
}
)
type feature struct {
ID string `json:"$id"`
Minimum int `json:"Minimum"`
Maximum int `json:"Maximum"`
Type string `json:"Type"`
Default string `json:"Default"`
}
func applyFeatureSchema(feature *feature, spec *jsonschema.Schema) error {
// quickest is to marshall then unmarshall
b, error := spec.MarshalJSON()
if error != nil {
return error
}
return json.Unmarshal(b, feature)
}
type _Extractor struct {
schema []byte
pluginCl PluginCl
gadgetCl GadgetCl
storageCl StorageCl
}
var _ FeatureExtractor = (*_Extractor)(nil)
func Init(schema []byte) *_Extractor {
ex := _Extractor{schema: schema}
// init plugins
for _, id := range plugins {
if ex.pluginCl.Connect(id) != nil {
return nil
}
}
// init gadgets
for _, id := range gadgets {
if ex.gadgetCl.Connect(id) != nil {
return nil
}
}
// init cache
for _, id := range storages {
if ex.storageCl.Connect(id) != nil {
return nil
}
}
return &ex
}
// Implements FeatureExtractorMetadata interface
func (ex *_Extractor) Name() string { return "HIGH_RISK_CATEGORY_EXTRACTOR" }
func (ex *_Extractor) Describe() string { return "Extracting features for the HIGH_RISK Category" }
func (ex *_Extractor) Ver() string { return "0.1.0" }
func (ex *_Extractor) Author() string { return "0box.io" }
func (ex *_Extractor) License() string { return "MIT" }
func (ex *_Extractor) URL() string { return "github.com/0xbow.io/asp-v1.0/" }
func (ex *_Extractor) GetFeatureSchema() []byte { return ex.schema }
// Parses the Schmea to build a feature set
func (ex *_Extractor) featureSet() (set []feature) {
var (
category = jsonschema.Schema{}
)
if category.UnmarshalJSON(ex.schema) == nil {
// extract features
featureSet := category.Properties["features"].TypeObject.Items.SchemaArray
set = make([]feature, len(featureSet))
// iterate over features
for i, feature := range featureSet {
// apply feature schema
applyFeatureSchema(&set[i], feature.TypeObject)
}
}
return
}
func comparator(x [32]byte, y [32]byte) *Op { return nil }
func mtRoot(map[string][32]byte) [32]byte { return [32]byte{} }
func (ex *_Extractor) sign(v []byte) []byte { return nil }
func (ex *_Extractor) ExtractFeatures(record Record) (out []byte) {
return
}