mlpack 3.4.2
StringEncoding< EncodingPolicyType, DictionaryType > Class Template Reference

The class translates a set of strings into numbers using various encoding algorithms.

#include <string_encoding.hpp>

Public Member Functions

template<typename ... ArgTypes>
 StringEncoding (ArgTypes &&... args)
 Pass the given arguments to the policy constructor and create the StringEncoding object using the policy.
 
 StringEncoding (const StringEncoding &)
 Default copy-constructor.
 
 StringEncoding (EncodingPolicyType encodingPolicy)
 Construct the class from the given encoding policy.
 
 StringEncoding (StringEncoding &&)
 Default move-constructor.
 
 StringEncoding (StringEncoding &)
 A variant of the copy constructor for non-constant objects.
 
void Clear ()
 Clear the dictionary.
 
template<typename TokenizerType >
void CreateMap (const std::string &input, const TokenizerType &tokenizer)
 Initialize the dictionary using the given corpus.
 
DictionaryType & Dictionary ()
 Modify the dictionary.
 
const DictionaryType & Dictionary () const
 Return the dictionary.
 
template<typename OutputType , typename TokenizerType >
void Encode (const std::vector< std::string > &input, OutputType &output, const TokenizerType &tokenizer)
 Encode the given text and write the result to the given output.
 
EncodingPolicyType & EncodingPolicy ()
 Modify the encoding policy object.
 
const EncodingPolicyType & EncodingPolicy () const
 Return the encoding policy object.
 
StringEncoding & operator= (const StringEncoding &)=default
 Default copy assignment operator.
 
StringEncoding & operator= (StringEncoding &&)=default
 Default move assignment operator.
 
template<typename Archive >
void serialize (Archive &ar, const unsigned int)
 Serialize the class to the given archive.
 

Detailed Description

template<typename EncodingPolicyType, typename DictionaryType>
class mlpack::data::StringEncoding< EncodingPolicyType, DictionaryType >

The class translates a set of strings into numbers using various encoding algorithms.

The encoder writes data in either column-major or row-major order, depending on the output data type.

Template Parameters
EncodingPolicyType    Type of the encoding algorithm itself.
DictionaryType        Type of the dictionary.

Definition at line 35 of file string_encoding.hpp.

Constructor & Destructor Documentation

◆ StringEncoding() [1/5]

StringEncoding ( ArgTypes &&...  args)

Pass the given arguments to the policy constructor and create the StringEncoding object using the policy.

◆ StringEncoding() [2/5]

StringEncoding ( EncodingPolicyType  encodingPolicy)

Construct the class from the given encoding policy.

Parameters
encodingPolicy    The given encoding policy.

◆ StringEncoding() [3/5]

StringEncoding ( StringEncoding< EncodingPolicyType, DictionaryType > &  )

A variant of the copy constructor for non-constant objects.
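
The reference does not say why a separate non-const overload exists. A plausible reason, given the variadic constructor [1/5], is a general C++ overload-resolution effect: a forwarding constructor template is an exact match for a non-const lvalue, so it would be preferred over the const copy constructor. The sketch below illustrates this with a hypothetical `Encoder` class and `Policy` type (neither is part of mlpack).

```cpp
#include <utility>

// Hypothetical policy type used only for this illustration.
struct Policy
{
  int value = 0;
  Policy() = default;
  explicit Policy(int v) : value(v) {}
};

// Hypothetical class mirroring the constructor set listed above.
template<typename PolicyType>
class Encoder
{
 public:
  // Forward any arguments to the policy constructor.
  template<typename... ArgTypes>
  Encoder(ArgTypes&&... args) : policy(std::forward<ArgTypes>(args)...) {}

  // Without this overload, copying a non-const Encoder would select the
  // template above (with ArgTypes = Encoder&) and try to construct the
  // policy from an Encoder, which does not compile for this Policy.
  Encoder(Encoder& other) : policy(other.policy) {}

  Encoder(const Encoder&) = default;
  Encoder(Encoder&&) = default;

  const PolicyType& GetPolicy() const { return policy; }

 private:
  PolicyType policy;
};
```

With the non-const overload in place, `Encoder<Policy> b(a);` copies the policy of a non-const `a` instead of instantiating the forwarding template.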

◆ StringEncoding() [4/5]

StringEncoding ( const StringEncoding< EncodingPolicyType, DictionaryType > &  )

Default copy-constructor.

◆ StringEncoding() [5/5]

StringEncoding ( StringEncoding< EncodingPolicyType, DictionaryType > &&  )

Default move-constructor.

Member Function Documentation

◆ Clear()

void Clear ( )

Clear the dictionary.

◆ CreateMap()

void CreateMap ( const std::string &  input,
const TokenizerType &  tokenizer 
)

Initialize the dictionary using the given corpus.

Template Parameters
TokenizerType    Type of the tokenizer.
Parameters
input        Corpus of text to encode.
tokenizer    The tokenizer object.

The tokenization algorithm has to be an object with two public methods:

  1. operator() which accepts a reference to boost::string_view, extracts the next token from the given view, removes the prefix containing the extracted token and returns the token;
  2. IsTokenEmpty() that accepts a token and returns true if the given token is empty.
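
The two tokenizer requirements above can be sketched as follows. This is a minimal illustration, not mlpack code: `SpaceTokenizer` and `BuildDictionary` are hypothetical names, `std::string_view` (C++17) stands in for `boost::string_view` so that only the standard library is needed, and the sketch assumes that an empty token marks the end of the input. Labels start at 1 purely as a choice of this sketch.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <unordered_map>

// Hypothetical tokenizer that splits on single spaces.
class SpaceTokenizer
{
 public:
  // Extract the next non-empty token from `str` and remove the consumed
  // prefix from the view. Returns an empty token once the input is exhausted.
  std::string_view operator()(std::string_view& str) const
  {
    std::string_view token;
    while (token.empty())
    {
      const std::size_t pos = str.find(' ');
      if (pos == std::string_view::npos)
      {
        token = str;
        str.remove_prefix(str.size());
        return token;
      }
      token = str.substr(0, pos);
      str.remove_prefix(pos + 1);
    }
    return token;
  }

  // Return true if the given token is empty.
  static bool IsTokenEmpty(const std::string_view token)
  {
    return token.empty();
  }
};

// Hypothetical driver: assign each distinct token a label in order of first
// appearance, pulling tokens until the tokenizer reports an empty one.
template<typename TokenizerType>
std::unordered_map<std::string, std::size_t> BuildDictionary(
    const std::string& input, const TokenizerType& tokenizer)
{
  std::unordered_map<std::string, std::size_t> dictionary;
  std::string_view view(input);
  auto token = tokenizer(view);
  while (!tokenizer.IsTokenEmpty(token))
  {
    // emplace() leaves the map unchanged if the token is already present.
    dictionary.emplace(std::string(token), dictionary.size() + 1);
    token = tokenizer(view);
  }
  return dictionary;
}
```

Skipping empty tokens inside `operator()` matters under the stated assumption: otherwise two consecutive delimiters would look like the end of the input to the driving loop.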

◆ Dictionary() [1/2]

DictionaryType & Dictionary ( )
inline

Modify the dictionary.

Definition at line 126 of file string_encoding.hpp.

◆ Dictionary() [2/2]

const DictionaryType & Dictionary ( ) const
inline

Return the dictionary.

Definition at line 124 of file string_encoding.hpp.

◆ Encode()

void Encode ( const std::vector< std::string > &  input,
OutputType &  output,
const TokenizerType &  tokenizer 
)

Encode the given text and write the result to the given output.

The encoder writes data in column-major or row-major order, depending on the output data type.

If the output type is arma::mat or arma::sp_mat, the function writes the result in column-major order. If the output type is a 2D std::vector, the function writes the result in row-major order.

Template Parameters
OutputType       Type of the output container. The function supports the following types: arma::mat, arma::sp_mat, std::vector<std::vector<>>.
TokenizerType    Type of the tokenizer.
Parameters
input        Corpus of text to encode.
output       Output container to store the result.
tokenizer    The tokenizer object.

The tokenization algorithm has to be an object with two public methods:

  1. operator() which accepts a reference to boost::string_view, extracts the next token from the given view, removes the prefix containing the extracted token and returns the token;
  2. IsTokenEmpty() that accepts a token and returns true if the given token is empty.
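
To make the two output layouts described for Encode() concrete, the sketch below converts a row-major result (one inner vector per input string) into a flat column-major buffer. It is an illustration only: `ColumnMajorMatrix` is a hypothetical stand-in for a dense matrix, not arma::mat, and the sketch assumes that each input string occupies one column and that shorter strings are zero-padded. Neither assumption is stated in this reference.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Row-major result: one inner vector of codes per input string, no padding.
using RowMajorCodes = std::vector<std::vector<std::size_t>>;

// Hypothetical stand-in for a dense column-major matrix.
struct ColumnMajorMatrix
{
  std::size_t rows = 0;
  std::size_t cols = 0;
  std::vector<double> data;  // Element (r, c) lives at data[c * rows + r].

  double operator()(const std::size_t r, const std::size_t c) const
  {
    return data[c * rows + r];
  }
};

// Place the codes of string c in column c, padding short columns with zeros.
ColumnMajorMatrix ToColumnMajor(const RowMajorCodes& codes)
{
  ColumnMajorMatrix m;
  m.cols = codes.size();
  for (const auto& line : codes)
    m.rows = std::max(m.rows, line.size());

  m.data.assign(m.rows * m.cols, 0.0);
  for (std::size_t c = 0; c < codes.size(); ++c)
    for (std::size_t r = 0; r < codes[c].size(); ++r)
      m.data[c * m.rows + r] = static_cast<double>(codes[c][r]);
  return m;
}
```

For the codes `{{1, 2, 3}, {4, 2}}`, the row-major form keeps two rows of different lengths, while the column-major buffer holds the first string contiguously, followed by the second string and its padding.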

◆ EncodingPolicy() [1/2]

EncodingPolicyType & EncodingPolicy ( )
inline

Modify the encoding policy object.

Definition at line 131 of file string_encoding.hpp.

◆ EncodingPolicy() [2/2]

const EncodingPolicyType & EncodingPolicy ( ) const
inline

Return the encoding policy object.

Definition at line 129 of file string_encoding.hpp.

◆ operator=() [1/2]

StringEncoding & operator= ( const StringEncoding< EncodingPolicyType, DictionaryType > &  )
default

Default copy assignment operator.

◆ operator=() [2/2]

StringEncoding & operator= ( StringEncoding< EncodingPolicyType, DictionaryType > &&  )
default

Default move assignment operator.

◆ serialize()

void serialize ( Archive &  ar,
const unsigned int   
)

Serialize the class to the given archive.


The documentation for this class was generated from the following file: