咸糖记录编程的地方

Do what you love and the money will follow.

目录
scrapy+sqlalchemy实践
/  

scrapy+sqlalchemy实践

###sqlalchemy

sqlalchemy创建模板对象

from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Column, Integer, String, Text, Date
from sqlalchemy import create_engine
engine = create_engine("mysql+pymysql://kettle:root@192.168.0.210:3306/db_douban")
Base = declarative_base()

class Topic(Base):
    __tablename__ = 'raw_content_perfume_boisdejasmin_topic'
    topic_title = Column(String(60),primary_key=True)
    topic_body = Column(Text())
    create_date = Column(Date())
    crawl_date = Column(Date())
    user_name = Column(String(20))
    like = Column(Integer,default=0)
    site_url = Column(String(255))
    group_type = Column(String(40))
    reply_num = Column(Integer)
    from_site = Column(String(60))
    def __repr__(self):
        return "<Person(name='%s')>" % self.topic_title

我们需要创建一个引擎的对象,可以根据不同的数据库进行配置。案例中我们使用的是
mysql数据库
Mysql 相关库的配置如下:

from sqlalchemy import create_engine
# default
engine = create_engine('mysql://scott:tiger@localhost/foo')

# mysql-python
engine = create_engine('mysql+mysqldb://scott:tiger@localhost/foo')

# MySQL-connector-python
engine = create_engine('mysql+mysqlconnector://scott:tiger@localhost/foo')

# OurSQL
engine = create_engine('mysql+oursql://scott:tiger@localhost/foo')

sqlserver的相关配置环境如下:

from sqlalchemy import create_engine
# pyodbc
engine = create_engine('mssql+pyodbc://scott:tiger@mydsn')

# pymssql
engine = create_engine('mssql+pymssql://scott:tiger@hostname:port/dbname')

sqlalchemy将对象插入数据库

from ORM.my_first_orm import Reply
from sqlalchemy.orm import sessionmaker
from sqlalchemy import create_engine
def do_insert(self):
    engine = create_engine("mysql+pymysql://kettle:root@192.168.0.210:3306/db_douban")
    DBSession = sessionmaker(bind=engine)
    session = DBSession()
    new_reply = Reply(topic_title= self['topic_title'],
                          create_date=self['create_date'], reply_body=self['reply_body'],
                          crawl_date=self['crawl_date'],user_name=self['user_name'],
                          like=self['like'],avatar=self['avatar']
                         )
    session.add(new_reply)
    session.commit()
    session.close()

以回复表为例子,创建一个引擎对象
然后通过这个对象去创建一个session我们可以通过这个
来对新抓取的回复进行插入数据库操作

sqlalchemy动态增加新的字段

from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Column, Table, Integer, String, ForeignKey, UniqueConstraint, Index, Text, Date, MetaData
# from sqlalchemy.orm import sessionmaker, relationship
from sqlalchemy import create_engine
def add_col(engine,table_name_,col):
    col_name = col.compile(dialect=engine.dialect)
    col_type = col.type.compile(engine.dialect)
    engine.execute('ALTER TABLE %s add column %s %s'%(table_name_,col_name,col_type))

if __name__ == '__main__':
    engine = create_engine("mysql+pymysql://kettle:root@192.168.0.210:3306/db_douban")
    col = Column('topic_title',String(60), comment="标题名称")
    table_name = 'raw_content_perfume_boisdejasmin_reply'
    add_col(engine,table_name,col)

首先创建一个Column对象
从这个对象中获取他的标题,他的字段类型
使用引擎的execute()去执行这句补全的sql语句


sqlalchemy 插入带有emoji的字段

我们在爬虫爬取的时候插入字符会触发一个异常:

Incorrect string value: '\xF0\x9F\x91\xBD\xF0\x9F…

经过翻阅各种博客
我发现这个是因为字段编码产生的问题
utf-8只支持1-3个字节
可是emoji需要4个字节
所以就报错了
我们的解决方法是重建数据库
将编码改为:
collate='utftmb4_general_ci'
不过我们还是需要在sqlalchemy上修改编码格式
utf8mb4
engine = create_engine("mysql+pymysql://user:pass@192.168.0.210:3306/db?charset=utf8mb4")
就可以正常的插入数据了


标题:scrapy+sqlalchemy实践
作者:xiantang
地址:http://xiantang.info/articles/2019/06/03/1559551084001.html

评论