How to make an object type searchable?
by Neophytos Demetriou (k2pts@cytanet.com.cy)
Making an object type searchable involves three steps:
- Choose the object type
- Implement FtsContentProvider
- Add triggers
Choose the object type
In most cases, choosing the object type is straightforward. However, if your object type uses the content repository (CR), make sure that it is a subclass of the "content_revision" class. Also make sure that all content is created using that subclass, rather than simply created with the "content_revision" type.
- Object types that don't use the CR can be specified using acs_object_type__create_type, but those that use the CR need to use content_type__create_type. content_type__create_type overloads acs_object_type__create_type and provides two views for inserting and viewing content data, and the CR depends on these views.
- Whenever you call content_item__new, call it with 'content_revision' as the item_subtype and 'your_content_type' as the content_type.
Implement FtsContentProvider
FtsContentProvider comprises two abstract operations, namely datasource and url. The specification for these operations can be found in packages/search/sql/postgresql/search-sc-create.sql. You have to implement these operations for your object type by writing concrete functions that follow the specification. For example, the implementation of datasource for the object type note looks like this:

    ad_proc notes__datasource {
        object_id
    } {
        @author Neophytos Demetriou
    } {
        db_0or1row notes_datasource {
            select n.note_id as object_id,
                   n.title as title,
                   n.body as content,
                   'text/plain' as mime,
                   '' as keywords,
                   'text' as storage_type
            from notes n
            where note_id = :object_id
        } -column_array datasource

        return [array get datasource]
    }

When you are done with the implementation of the FtsContentProvider operations, you should let the system know of your implementation. This is accomplished by an SQL file which associates the implementation with a contract name. The implementation of FtsContentProvider for the object type note looks like:

    select acs_sc_impl__new(
        'FtsContentProvider',   -- impl_contract_name
        'note',                 -- impl_name
        'notes'                 -- impl_owner_name
    );

You should adapt this association to reflect your implementation. That is, change impl_name to your object type and impl_owner_name to the package key. Next, you have to create associations between the operations of FtsContentProvider and your concrete functions. Here's what an association between an operation and a concrete function looks like:

    select acs_sc_impl_alias__new(
        'FtsContentProvider',   -- impl_contract_name
        'note',                 -- impl_name
        'datasource',           -- impl_operation_name
        'notes__datasource',    -- impl_alias
        'TCL'                   -- impl_pl
    );

Again, you have to make some changes. Change the impl_name from note to your object type and the impl_alias from notes__datasource to the name that you gave to the function that implements the operation datasource.
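The url operation needs a matching association of its own. A minimal sketch, assuming the Tcl function that implements url for note is called notes__url (a hypothetical name used only for illustration; substitute the name of your own function):

```sql
-- Associate the url operation of FtsContentProvider with a Tcl function.
-- 'notes__url' is an assumed function name, not one taken from this guide.
select acs_sc_impl_alias__new(
    'FtsContentProvider',   -- impl_contract_name
    'note',                 -- impl_name
    'url',                  -- impl_operation_name
    'notes__url',           -- impl_alias (hypothetical)
    'TCL'                   -- impl_pl
);
```

As with datasource, only impl_name and impl_alias should need to change for your own object type.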
Add triggers
If your object type uses the content repository to store its items, then you are done. If not, an extra step is required to inform the search_observer_queue of new content items, updates, or deletions. We do this by adding triggers on the table that stores the content items of your object type. Here's what that part looks like for note:

    -- Queue newly inserted notes for indexing.
    create function notes__itrg () returns trigger as $$
    begin
        perform search_observer__enqueue(new.note_id,'INSERT');
        return new;
    end;
    $$ language plpgsql;

    -- Queue deleted notes for removal from the index.
    create function notes__dtrg () returns trigger as $$
    begin
        perform search_observer__enqueue(old.note_id,'DELETE');
        return old;
    end;
    $$ language plpgsql;

    -- Queue updated notes for re-indexing.
    create function notes__utrg () returns trigger as $$
    begin
        perform search_observer__enqueue(old.note_id,'UPDATE');
        return old;
    end;
    $$ language plpgsql;

    create trigger notes__itrg after insert on notes
        for each row execute procedure notes__itrg ();

    create trigger notes__dtrg after delete on notes
        for each row execute procedure notes__dtrg ();

    create trigger notes__utrg after update on notes
        for each row execute procedure notes__utrg ();
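Triggers only fire for changes made after they are created, so rows that already exist in the table never reach the queue on their own. One possible way to queue them is sketched below. It assumes that search_observer__enqueue may be called directly with the same arguments the trigger functions pass, and that queueing an 'INSERT' event for an existing row is enough to get it indexed; neither assumption is confirmed by this guide.

```sql
-- One-off backfill: queue every existing note as if it had just been inserted.
-- Assumption: search_observer__enqueue accepts direct calls with (object_id, event),
-- exactly as the trigger functions for note use it.
select search_observer__enqueue(n.note_id, 'INSERT')
from notes n;
```

Run a statement like this once, after the triggers are in place, so that later changes are picked up by the triggers themselves.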
Questions & Answers
- Q: If the content is a binary file (a PDF stored in file storage, for example), will the content still be indexable/searchable?
A: Each mime type requires a handler that converts it to text. Once a handler is available, e.g. pdf2txt, it is very easy to incorporate support for that mime type into the search package. Content items with unsupported mime types will be ignored by the indexer.
- Q: Can the search package handle lobs and files?
A: Yes, the search package will convert everything into text based on the content and storage_type attributes. Here is the convention to use while writing the implementation of datasource:
- Content is a filename when storage_type='file'.
- Content is a lob id when storage_type='lob'.
- Content is text when storage_type='text'.
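As an illustration of this convention, the query inside a datasource implementation for file-backed content might look like the following sketch. The table my_documents and its columns are hypothetical; only the six column aliases and the :object_id bind variable follow the notes example.

```sql
-- Hypothetical table: my_documents(doc_id, title, file_path, mime_type).
-- With storage_type = 'file', the content column carries a filename,
-- not the text itself.
select d.doc_id    as object_id,
       d.title     as title,
       d.file_path as content,
       d.mime_type as mime,
       ''          as keywords,
       'file'      as storage_type
from my_documents d
where d.doc_id = :object_id
```

For content stored as a lob, the same shape would apply with the lob id in the content column and 'lob' as the storage_type.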