QuickRank  v2.0
QuickRank: A C++ suite of Learning to Rank algorithms
quickrank::data::Dataset Class Reference

This class implements a Dataset to be used for a L-t-R task.

#include <dataset.h>

Public Member Functions

 Dataset (size_t n_instances, size_t n_features)
 Allocates an empty Dataset of given size in horizontal format.

virtual ~Dataset ()

 Dataset (const Dataset &other)=delete
 Avoid inefficient copy constructor.

Dataset & operator= (const Dataset &)=delete
 Avoid inefficient copy assignment.

quickrank::Feature * at (size_t document_id, size_t feature_id)
 Returns a pointer to a specific data item.

Label getLabel (size_t document_id)
 Returns the value of the i-th relevance label.

size_t offset (size_t i) const
 Returns the offset in the internal data structure of the i-th query results list.

std::unique_ptr< QueryResults > getQueryResults (size_t i) const
 Returns the i-th QueryResults in the dataset.

void addInstance (QueryID q_id, Label i_label, std::vector< Feature > i_features)
 Add a new training instance, i.e., a labeled document, to the dataset.

size_t num_features () const
 Returns the number of features used to represent a document.

size_t num_queries () const
 Returns the number of queries in the dataset.

size_t num_instances () const
 Returns the number of documents in the dataset.
 

Private Member Functions

virtual std::ostream & put (std::ostream &os) const
 Prints the data reading time stats.
 

Private Attributes

size_t num_features_
 
size_t num_queries_
 
size_t num_instances_
 
quickrank::Feature * data_ = NULL

quickrank::Label * labels_ = NULL
 
std::vector< size_t > offsets_
 
size_t last_instance_id_
 
size_t max_instances_
 

Friends

std::ostream & operator<< (std::ostream &os, const Dataset &me)
 The output stream operator.
 

Detailed Description

This class implements a Dataset to be used for a L-t-R task.

The internal representation is simple: a single flat vector of num_instances() x num_features() values, laid out horizontally (instances x features). Each training instance is a document. The at() function gives direct access to this internal representation, to support fast access and custom high-performance implementations.
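
As an illustration of the horizontal layout, the following is a minimal sketch and not the QuickRank implementation. MiniDataset and row_sum are hypothetical names, Feature is assumed to be float, and the addressing formula assumes that the features of one document are stored contiguously.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: Feature is a floating-point type; the real typedef may differ.
using Feature = float;

// Hypothetical stand-in for quickrank::data::Dataset (illustration only).
class MiniDataset {
 public:
  MiniDataset(std::size_t n_instances, std::size_t n_features)
      : num_instances_(n_instances),
        num_features_(n_features),
        data_(n_instances * n_features, Feature(0)) {}

  // Horizontal (instances x features) addressing: one document is one
  // contiguous run of num_features_ values.
  Feature* at(std::size_t document_id, std::size_t feature_id) {
    assert(document_id < num_instances_ && feature_id < num_features_);
    return data_.data() + document_id * num_features_ + feature_id;
  }

  std::size_t num_instances() const { return num_instances_; }
  std::size_t num_features() const { return num_features_; }

 private:
  std::size_t num_instances_;
  std::size_t num_features_;
  std::vector<Feature> data_;
};

// Sums the features of one document by walking its contiguous row.
Feature row_sum(MiniDataset& dataset, std::size_t document_id) {
  Feature* row = dataset.at(document_id, 0);
  Feature total = 0;
  for (std::size_t f = 0; f < dataset.num_features(); ++f) total += row[f];
  return total;
}
```

Under this assumed layout, the pointer returned by at(document_id, 0) can be treated as the start of an array of num_features() values, which is what makes tight inner loops over a document cheap.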

Constructor & Destructor Documentation

quickrank::data::Dataset::Dataset ( size_t  n_instances,
size_t  n_features 
)

Allocates an empty Dataset of given size in horizontal format.

Parameters
n_instances  The number of training instances (lines) in the dataset.
n_features  The number of features.
quickrank::data::Dataset::~Dataset ( )
virtual
quickrank::data::Dataset::Dataset ( const Dataset &  other)
delete

Avoid inefficient copy constructor.
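
A consequence of the deleted copy operations is that a dataset has to be passed by reference or held through a smart pointer. The sketch below uses a hypothetical stand-in class, NonCopyableDataset, that mirrors only the special members listed on this page; total_cells and make_dataset are illustrative helpers, not part of QuickRank.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical stand-in: same special members as Dataset, nothing else.
class NonCopyableDataset {
 public:
  NonCopyableDataset(std::size_t n_instances, std::size_t n_features)
      : num_instances_(n_instances), num_features_(n_features) {}
  virtual ~NonCopyableDataset() {}
  NonCopyableDataset(const NonCopyableDataset&) = delete;
  NonCopyableDataset& operator=(const NonCopyableDataset&) = delete;

  std::size_t num_instances() const { return num_instances_; }
  std::size_t num_features() const { return num_features_; }

 private:
  std::size_t num_instances_;
  std::size_t num_features_;
};

static_assert(!std::is_copy_constructible<NonCopyableDataset>::value,
              "copy construction is deleted");
static_assert(!std::is_copy_assignable<NonCopyableDataset>::value,
              "copy assignment is deleted");
// Declaring the copy operations also suppresses the implicit move operations.
static_assert(!std::is_move_constructible<NonCopyableDataset>::value,
              "no implicit move constructor");

// Pass by const reference: no copy is made.
std::size_t total_cells(const NonCopyableDataset& dataset) {
  return dataset.num_instances() * dataset.num_features();
}

// Transfer ownership through a smart pointer instead of copying the object.
std::unique_ptr<NonCopyableDataset> make_dataset(std::size_t n_instances,
                                                 std::size_t n_features) {
  return std::unique_ptr<NonCopyableDataset>(
      new NonCopyableDataset(n_instances, n_features));
}
```

Moving the std::unique_ptr transfers ownership of the dataset without touching the underlying data, so the cost is independent of the dataset size.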

Member Function Documentation

void quickrank::data::Dataset::addInstance ( QueryID  q_id,
Label  i_label,
std::vector< Feature >  i_features
)

Add a new training instance, i.e., a labeled document, to the dataset.

Warning
Currently the addition works only when data is in HORIZ format.
Parameters
q_id  The query ID.
i_label  The relevance label of the result.
i_features  The feature vector of the document.
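
One way such an append could work in horizontal format is sketched below. This is an assumption-laden illustration, not the QuickRank code: AppendOnlyDataset and its feature() accessor are hypothetical, the QueryID, Label and Feature types are assumed, and the sketch assumes that documents of the same query arrive consecutively, that a change of query ID opens a new results list, and that missing trailing features are padded with zeros.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Assumed typedefs; the real QuickRank types may differ.
using QueryID = unsigned int;
using Label = float;
using Feature = float;

// Hypothetical stand-in for the append path of Dataset (illustration only).
class AppendOnlyDataset {
 public:
  AppendOnlyDataset(std::size_t max_instances, std::size_t n_features)
      : num_features_(n_features), max_instances_(max_instances) {
    data_.reserve(max_instances * n_features);
    labels_.reserve(max_instances);
  }

  void addInstance(QueryID q_id, Label i_label,
                   const std::vector<Feature>& i_features) {
    if (labels_.size() >= max_instances_)
      throw std::length_error("dataset is full");
    if (i_features.size() > num_features_)
      throw std::invalid_argument("too many features");
    // A change of query ID opens a new query results list.
    if (offsets_.empty() || q_id != last_query_id_) {
      offsets_.push_back(labels_.size());
      last_query_id_ = q_id;
    }
    labels_.push_back(i_label);
    // Append one row, then pad it with zeros up to num_features_ values.
    data_.insert(data_.end(), i_features.begin(), i_features.end());
    data_.resize(labels_.size() * num_features_, Feature(0));
  }

  std::size_t num_instances() const { return labels_.size(); }
  std::size_t num_queries() const { return offsets_.size(); }
  std::size_t offset(std::size_t i) const { return offsets_.at(i); }
  Label getLabel(std::size_t document_id) const {
    return labels_.at(document_id);
  }
  Feature feature(std::size_t document_id, std::size_t feature_id) const {
    return data_.at(document_id * num_features_ + feature_id);
  }

 private:
  std::size_t num_features_;
  std::size_t max_instances_;
  QueryID last_query_id_ = 0;
  std::vector<Feature> data_;
  std::vector<Label> labels_;
  std::vector<std::size_t> offsets_;
};
```

Appending is cheap in this layout because each new document only extends the end of the flat array, which is consistent with the warning above that addition is supported only in HORIZ format.
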
quickrank::Feature* quickrank::data::Dataset::at ( size_t  document_id,
size_t  feature_id 
)
inline

Returns a pointer to a specific data item.

Parameters
document_id  The document of interest.
feature_id  The feature of interest.
Returns
A pointer to the requested feature value of the given document.
Label quickrank::data::Dataset::getLabel ( size_t  document_id)
inline

Returns the value of the i-th relevance label.

std::unique_ptr< QueryResults > quickrank::data::Dataset::getQueryResults ( size_t  i) const

Returns the i-th QueryResults in the dataset.

Parameters
i  The index of the query results list of interest.
Returns
The requested QueryResults.
size_t quickrank::data::Dataset::num_features ( ) const
inline

Returns the number of features used to represent a document.

size_t quickrank::data::Dataset::num_instances ( ) const
inline

Returns the number of documents in the dataset.

size_t quickrank::data::Dataset::num_queries ( ) const
inline

Returns the number of queries in the dataset.

size_t quickrank::data::Dataset::offset ( size_t  i) const
inline

Returns the offset in the internal data structure of the i-th query results list.

Parameters
i  The index of the query results list of interest.
Returns
The offset of the first document in the i-th query results list. This can be used to later invoke the at() function.
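
The sketch below shows how list offsets can be combined with row addressing to scan the documents of one query. It is an illustration under stated assumptions: query_range and max_feature_in_query are hypothetical helpers, Feature is assumed to be float, the data is assumed to be a flat instances x features array, and the end of the last list is taken to be the total number of instances.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Assumption: Feature is a floating-point type.
using Feature = float;

// Returns the half-open range [begin, end) of document ids in the i-th list.
// offsets[i] plays the role of Dataset::offset(i) in this sketch.
std::pair<std::size_t, std::size_t> query_range(
    const std::vector<std::size_t>& offsets, std::size_t num_instances,
    std::size_t i) {
  std::size_t begin = offsets.at(i);
  std::size_t end = (i + 1 < offsets.size()) ? offsets[i + 1] : num_instances;
  return std::make_pair(begin, end);
}

// Largest value of one feature among the documents of a query results list.
// The index arithmetic stands in for at(doc, feature_id).
Feature max_feature_in_query(const Feature* data, std::size_t num_features,
                             std::pair<std::size_t, std::size_t> range,
                             std::size_t feature_id) {
  assert(range.first < range.second && feature_id < num_features);
  Feature best = data[range.first * num_features + feature_id];
  for (std::size_t doc = range.first + 1; doc < range.second; ++doc)
    best = std::max(best, data[doc * num_features + feature_id]);
  return best;
}
```

With the real class, the same loop would start from offset(i) and read each value through at(doc, feature_id).
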
Dataset& quickrank::data::Dataset::operator= ( const Dataset & )
delete

Avoid inefficient copy assignment.

std::ostream & quickrank::data::Dataset::put ( std::ostream &  os) const
private virtual

Prints the data reading time stats.

Friends And Related Function Documentation

std::ostream& operator<< ( std::ostream &  os,
const Dataset &  me
)
friend

The output stream operator.

Prints the data reading time stats.
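
A friend operator<< paired with a private virtual put() is a common idiom for making stream output overridable in derived classes. The sketch below assumes that the operator simply delegates to put(); PrintableDataset, describe and the printed text are hypothetical and do not reproduce QuickRank's actual output.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in showing only the printing hooks (illustration only).
class PrintableDataset {
 public:
  PrintableDataset(std::size_t n_instances, std::size_t n_features)
      : num_instances_(n_instances), num_features_(n_features) {}
  virtual ~PrintableDataset() {}

 private:
  // Derived classes can override put() to change what gets printed.
  virtual std::ostream& put(std::ostream& os) const {
    return os << "instances: " << num_instances_
              << ", features: " << num_features_;
  }

  // Assumed delegation: the friend operator forwards to the virtual put().
  friend std::ostream& operator<<(std::ostream& os,
                                  const PrintableDataset& me) {
    return me.put(os);
  }

  std::size_t num_instances_;
  std::size_t num_features_;
};

// Formats a dataset into a string through the stream operator.
std::string describe(const PrintableDataset& dataset) {
  std::ostringstream out;
  out << dataset;
  return out.str();
}
```

Because the operator takes the dataset by const reference, it works with the deleted copy constructor.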

Member Data Documentation

quickrank::Feature* quickrank::data::Dataset::data_ = NULL
private
quickrank::Label* quickrank::data::Dataset::labels_ = NULL
private
size_t quickrank::data::Dataset::last_instance_id_
private
size_t quickrank::data::Dataset::max_instances_
private
size_t quickrank::data::Dataset::num_features_
private
size_t quickrank::data::Dataset::num_instances_
private
size_t quickrank::data::Dataset::num_queries_
private
std::vector<size_t> quickrank::data::Dataset::offsets_
private

The documentation for this class was generated from the following files: